Innovating in Data Science and Machine Learning: Modeling Better Together

This article is by Alan Jacobson and originally appeared on the Alteryx Analytics Blog here: https://community.alteryx.com/t5/Analytics-Blog/Innovating-in-Data-Science-and-Machine-Learning-Modeling-Better/ba-p/472421


What is Feature Engineering?

First, before we go too far into all the reasons why we are excited, let’s talk a bit about what feature engineering is all about. Dr. Jason Brownlee, an expert in the artificial intelligence (AI) and machine learning (ML) world, is quoted as saying, “Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.” This process includes things like determining whether a date falls on a weekend or a weekday, or computing the peak value across a set of values. These features may be the key element in allowing you to predict something like when a customer will buy a product. In addition, data augmentation can be leveraged as part of feature engineering to further enrich the feature set. For example, if you have the zip codes of customers and can leverage census data, you can test whether the average income in a customer’s zip code is a strong indicator of purchase behavior.
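To make the examples above concrete, here is a minimal sketch in pandas of the three features just described: a weekend flag derived from a date, a peak value computed across a customer’s history, and an average-income augmentation joined in by zip code. The data, column names, and income figures are all made up for illustration; the article does not specify any particular dataset.

```python
import pandas as pd

# Hypothetical transaction data; names and values are illustrative only.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2],
    "order_date": pd.to_datetime(
        ["2024-03-02", "2024-03-05", "2024-03-09", "2024-03-11"]),
    "amount": [120.0, 80.0, 200.0, 50.0],
    "zip_code": ["48104", "48104", "10001", "10001"],
})

# Feature 1: is the order date a weekend? (dayofweek: Saturday=5, Sunday=6)
orders["is_weekend"] = orders["order_date"].dt.dayofweek >= 5

# Feature 2: peak order amount across each customer's history.
orders["peak_amount"] = orders.groupby("customer_id")["amount"].transform("max")

# Feature 3 (augmentation): join average income by zip code from an
# external source such as census data (figures here are invented).
census = pd.DataFrame({"zip_code": ["48104", "10001"],
                       "avg_income": [72000, 91000]})
features = orders.merge(census, on="zip_code", how="left")

print(features[["customer_id", "is_weekend", "peak_amount", "avg_income"]])
```

Each derived column can then be fed to a model alongside the raw data, letting it pick up patterns (weekend buying, income effects) that the raw fields alone would hide.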

Why is Feature Engineering Important?

The art of feature engineering is incredibly important both in understanding data sets and in creating models. Why are sales fluctuating on different dates? Is it because some days are holidays, weekends, or some other factor? So, how important is feature engineering to analyzing data and ultimately getting positive outcomes with ML? Surveys of AI and ML experts suggest it is the single most important factor in successful outcomes, as shown in this survey from Kaggle.

[Figure: Kaggle survey ranking the importance of various parameters on the outcome of machine learning models]

Data scientists can create these features manually in Alteryx, but automatically exploring complex data sets and adding these features quickly not only saves time, it can ultimately be the difference between successfully seeing the pattern in the data and failing to get the desired business outcome.
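One simplified way to picture what automated feature exploration does: mechanically generate candidate transforms of each input column, then score them against the target to surface the most promising ones. The sketch below is an illustration of that idea only, not Alteryx's implementation; the data and the nonlinear relationship in it are invented for the example.

```python
import numpy as np
import pandas as pd

# Toy data: the target depends nonlinearly on "visits", so a derived
# feature should score better than the independent "tenure_days" column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "visits": rng.integers(1, 50, 200),
    "tenure_days": rng.integers(1, 1000, 200),
})
df["bought"] = (np.log(df["visits"]) + rng.normal(0, 0.3, 200) > 2.5).astype(int)

def candidate_features(data: pd.DataFrame) -> pd.DataFrame:
    """Mechanically generate transformed versions of each numeric column."""
    out = {}
    for col in data.columns:
        out[f"log_{col}"] = np.log1p(data[col])
        out[f"sq_{col}"] = data[col] ** 2
    return pd.DataFrame(out)

cands = candidate_features(df[["visits", "tenure_days"]])
# Rank candidates by absolute correlation with the target.
scores = cands.corrwith(df["bought"]).abs().sort_values(ascending=False)
print(scores)
```

A real tool searches a far richer transform space (aggregations, date parts, joins to external data) and uses stronger scoring than simple correlation, but the generate-and-rank loop is the core idea.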

Andrew Ng described the challenge of feature engineering in this way: “Coming up with features is difficult, time-consuming and requires expert knowledge. ‘Applied machine learning’ is basically feature engineering.” As the co-founder of Google Brain, former chief scientist at Baidu, professor at Stanford University and director of its AI Lab, and co-founder of deeplearning.ai, I believe his point of view comes from a great body of experience.

Automating the feature engineering task will not only speed up the process and increase success rates, but will also help the modern-day data worker upskill and become more capable at machine learning. Putting tools like this in the hands of the workforce will ultimately accelerate the digital transformation of the enterprise.