Ashish Bansal, Senior Director, Enterprise Merchant Insights Lead, Capital One
Machine learning and artificial intelligence (AI) are the hottest terms today. Companies, big or small, are racing to figure out how to incorporate machine learning and AI into their way of working. While there is a lot of hype around this, it is important to understand what machine learning can do and how to apply it in the enterprise intelligently. Applying machine learning intelligently requires developing the right mental model for it, and thinking the right way about its applicability. In short, it requires Machine Thinking.
What is Machine Learning
Tom Mitchell in his book Machine Learning provides the following definition:
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
In other words, a program that improves its execution by performing more tasks can be considered machine learning. Let's consider a program that looks at credit card transactions and detects fraud. This could be built as a set of rules that depend on the transaction amount, the location of the transaction, the time of day, etc. Some combinations of these would be classified as fraud. These rules were coded into software by humans analyzing previous patterns. The program classifies transactions as fraud or not fraud. However, processing more transactions does not improve the accuracy of this algorithm. To improve it, offline analysis needs to be performed and new rules coded, or existing rules modified, before the algorithm becomes better.
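A rule-based detector of this kind can be sketched in a few lines. This is a minimal illustration with made-up field names and thresholds, not an actual fraud system; the point is that the rules are fixed by humans and do not improve no matter how many transactions flow through them.

```python
def is_fraud(txn):
    """Flag a transaction using fixed, human-coded rules (illustrative thresholds)."""
    if txn["amount"] > 5000:                    # unusually large amount
        return True
    if 1 <= txn["hour"] <= 4:                   # odd hours of the night
        return True
    if txn["country"] != txn["home_country"]:   # transaction abroad
        return True
    return False

txn = {"amount": 120.0, "hour": 14, "country": "US", "home_country": "US"}
print(is_fraud(txn))   # no rule fires for this ordinary daytime purchase
```

To change this program's behavior, a human has to analyze new fraud patterns offline and edit the rules by hand.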
In the machine learning regime, every transaction correctly or incorrectly identified as fraud results in a small improvement to the algorithm. Consequently, after processing lots and lots of transactions, the algorithm becomes very good at detecting fraud. It also adapts to the techniques being used to conduct fraud. This methodology may be appropriate for many classes of problems, such as translating between languages, detecting objects in images, diagnosing diseases from x-rays, validating profiles of people, or estimating housing prices. Given a problem, it is always important to understand whether experience at doing the task should result in improved performance. Not every problem fits this description.
The corollary of the above is that not every problem is a machine learning problem. Many problems can be solved using rule-based systems. For example, validating form fields and the correctness of the data being entered could simply be coded as rules.
Machine thinking requires first building a traditional software engineering solution to the problem, using rules or other techniques, irrespective of whether a machine learning solution exists. At the very least, this provides a baseline that the machine learning solution should beat. Further, iterate on the machine learning solution by starting with a simple algorithm (like linear or logistic regression) before going for more complicated methods like random forests or neural networks/deep learning. The key thing to remember is that a complex machine learning algorithm should provide incremental gains in accuracy at accomplishing the task.
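To make the "start simple" point concrete, here is a bare-bones logistic regression trained by gradient descent on a toy, made-up dataset. It is a sketch of the baseline idea only; in practice one would use an established library, but even this few-line model gives a score that any fancier method must beat.

```python
import math

def train_logistic(data, lr=0.1, epochs=200):
    """Fit a logistic regression baseline with per-example gradient descent."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))      # sigmoid prediction
            err = p - y                          # gradient of log loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0

# Toy, linearly separable data: two features, binary label.
data = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.9, 0.8], 1), ([0.8, 0.9], 1)]
w, b = train_logistic(data)
print([predict(w, b, x) for x, _ in data])
```

If a random forest or a neural network cannot meaningfully outperform this baseline on held-out data, the added complexity is not paying for itself.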
Understand Your Data
A lot of progress in recent times has been attributed to deep learning. The state of the art today depends on having large amounts of data available for deep learning to work effectively. Not all problems and organizations have such large data sets available for use. The current state of data science is that 60-70 percent of the time is spent wrangling data, and 30-40 percent on modelling and tuning. So it is critical to understand the data sets prior to building models. Data usually suffers from quality issues, incomplete sets, imbalanced labels, or skewed distributions of attributes. Rectifying this requires cleansing the data, imputing missing values, and normalizing the skews. This is a critical step, and one that is important to manage in the process of getting value from machine learning.
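Two of those cleansing steps, imputing missing values and taming a skewed distribution, can be sketched as follows. The column of transaction amounts here is invented for illustration; the median is used for imputation because, unlike the mean, it is robust to the skewed tail, and a log transform compresses that tail.

```python
import math
from statistics import median

# Hypothetical column with missing values (None) and one right-skewed outlier.
amounts = [12.0, 15.0, None, 14.0, 13.0, None, 900.0]

# Impute missing values with the median, which the outlier barely moves.
med = median(v for v in amounts if v is not None)
cleaned = [med if v is None else v for v in amounts]

# Log-transform (log(1 + x)) to normalize the skewed distribution.
normalized = [math.log1p(v) for v in cleaned]

print(cleaned)      # missing entries filled with the median value
print([round(v, 2) for v in normalized])
```

Had the mean been used instead, the single 900.0 outlier would have pulled every imputed value far above the typical transaction.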
If the data set being used is small, then feature engineering is required. Feature engineering is the art/science of teasing out latent information in the data so that the machine learning algorithm can use it to learn. Let's take an example – consider a data set of retail transactions from a store. This data set may have a date time stamp of the transaction, a set of SKUs (stock keeping units) of the products purchased, a store ID, a customer ID, and the dollar value of the transaction, amongst other fields. Each of these fields has a lot of latent information. Consider the date time stamp – it contains information such as the day of week (weekday or weekend in a simplified scenario), the time of the transaction (or day parts like morning, afternoon, evening, night), whether it was a public holiday, etc. A customer ID may be present if the customer is part of a loyalty program. Using the store ID to get the location of the store, together with the customer's address, a feature can be engineered that calculates the distance the customer travelled to make this particular purchase. Such features may be really important for machine learning. These features can be extracted by data scientists who understand the business domain as well as the problem being solved. The success of a machine learning initiative can often depend on feature engineering and understanding of the data.
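The timestamp and distance features described above can be sketched like this. The record layout and field names are invented for illustration, and the travelled distance is approximated with the haversine (great-circle) formula from assumed store and customer coordinates.

```python
import math
from datetime import datetime

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def engineer_features(txn):
    """Tease latent features out of one raw transaction record."""
    ts = datetime.fromisoformat(txn["timestamp"])
    return {
        "is_weekend": ts.weekday() >= 5,          # Saturday or Sunday
        "day_part": ("morning" if 5 <= ts.hour < 12 else
                     "afternoon" if 12 <= ts.hour < 17 else
                     "evening" if 17 <= ts.hour < 21 else "night"),
        "dist_km": haversine_km(txn["store_lat"], txn["store_lon"],
                                txn["cust_lat"], txn["cust_lon"]),
    }

txn = {"timestamp": "2017-06-03T09:30:00",   # a Saturday morning
       "store_lat": 38.90, "store_lon": -77.03,
       "cust_lat": 38.80, "cust_lon": -77.05}
print(engineer_features(txn))
```

The raw record never says "weekend morning, roughly 11 km from home" – that signal only becomes available to the learning algorithm once a domain-aware data scientist derives it.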
Deep Learning and Deep Thinking
Neural networks and deep learning are dominating the airwaves at the moment. One might feel that by not using these techniques they are missing out. Let's put this in perspective using the Kaggle 2017 State of Data Science and Machine Learning survey:
This chart shows that the top three methods being used are not deep learning methods. State-of-the-art deep learning methods require a lot of data to train today. Goodfellow et al. in their Deep Learning book propose:
As of 2016, a rough rule of thumb is that a supervised deep learning algorithm will generally achieve acceptable performance with around 5,000 labeled examples per category and will match or exceed human performance when trained with a dataset containing at least 10 million labeled examples.
The key phrase to note here is per category. Casting a task as a supervised learning problem, and making sure there are enough examples to train a deep learning algorithm, can be very challenging.
The key to successfully applying the latest algorithms is to cast problems as supervised learning problems. Supervised learning is a class of machine learning problems where the desired output is known along with the input data. For example, the input could be an image and the desired output the label 'cat'. Both of these need to be fed into a deep learning network – in fact many thousands of them – for the algorithm to learn and work effectively. Another way to phrase this is to consider pairs of the form (A -> B). B represents one of the many labels the machine is learning to discriminate between. In the case of the cat detector, these could be the two possible values (cat, not cat). During training, many (A, B) pairs are provided. Once the algorithm is trained, input of the form (A, ?) is passed in and the machine guesses the correct label.
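The (A -> B) framing can be made concrete with a toy classifier. This sketch stands in for the cat detector using invented two-number feature vectors in place of images, and a 1-nearest-neighbour rule in place of a deep network; the shape of the problem – train on (A, B) pairs, then answer (A, ?) queries – is the same.

```python
# Training data as (A, B) pairs: input features paired with the desired label.
train_pairs = [
    ([0.9, 0.8], "cat"), ([0.8, 0.9], "cat"),
    ([0.1, 0.2], "not cat"), ([0.2, 0.1], "not cat"),
]

def predict(a):
    """Answer an (A, ?) query with the label of the nearest training example."""
    dist = lambda p, q: sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return min(train_pairs, key=lambda pair: dist(pair[0], a))[1]

print(predict([0.85, 0.95]))  # near the "cat" examples -> cat
print(predict([0.15, 0.15]))  # near the "not cat" examples -> not cat
```

A real deep learning cat detector replaces the hand-made features with raw pixels and the nearest-neighbour rule with a trained network, but the supervised (A, B) contract is identical.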
To be successful, start by casting your problem as a supervised learning problem. This is often the hard part of Machine Thinking. Once you are able to do that, there is very little to stop you from building amazing machine learning solutions and products.