
Leaders are often confused about when to use generative AI versus predictive AI (machine learning and deep learning) tools. The issue isn’t that one technology is superior: It’s about matching the technology to the specific business problem. This column presents a pragmatic way to help you make the best decision and avoid costly mistakes.
The analytics landscape has evolved significantly during the past decade. Many organizations have progressed from basic statistical modeling to machine learning, and some have added deep learning to their toolkits as well. In this context, the emergence of generative AI — with its ability to create humanlike text, generate images, and write code — introduces new possibilities and new questions.
While generative AI promises to revolutionize everything from customer service to product development, its optimal role alongside predictive AI tools (that is, machine learning and deep learning tools) remains a work in progress. That often leaves leaders asking what the right approach is for addressing a particular problem. This article presents a set of guidelines to help leaders and organizations navigate this tricky and crucial decision.
Machine Learning Versus Deep Learning Versus GenAI
Let’s begin with a quick overview of machine learning, deep learning, and generative AI, focusing on their respective strengths and limitations.
Machine learning: This type of AI identifies patterns in historical data using statistical and computational techniques to make predictions or decisions without being explicitly programmed to do so. It encompasses a range of techniques, including regression analysis, decision trees, random forests, and gradient boosting. Its primary strength lies in handling tabular (structured) data — data that can be arranged in the rows and columns of a spreadsheet or a database table. In tabular data, the columns — known as independent variables or features — are either naturally numeric (such as a patient’s LDL cholesterol level or a loan applicant’s average credit balance) or can be represented numerically. (For example, a patient’s family history of heart disease can be encoded as 1 if present and 0 otherwise.)
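To make this concrete, here is a minimal sketch of a tabular prediction workflow, assuming scikit-learn; the column names and patient data are hypothetical, not drawn from a real dataset.

```python
# A minimal sketch: predicting heart-disease risk from tabular data
# with a gradient-boosted model. The dataset here is hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Each row is a patient; each column is a numeric feature.
# family_history uses the 1/0 encoding described in the text.
df = pd.DataFrame({
    "ldl_cholesterol": [130, 210, 95, 180, 160, 120],
    "family_history":  [1,   1,   0,  1,   0,   0],
    "age":             [52,  61,  40, 58,  47,  35],
    "high_risk":       [0,   1,   0,  1,   1,   0],  # label to predict
})

X = df.drop(columns="high_risk")
y = df["high_risk"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

model = GradientBoostingClassifier().fit(X_train, y_train)
print(model.predict(X_test))
```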
Since problems with tabular input data are ubiquitous in business, machine learning has had a tremendous positive impact. Retailers use machine learning to forecast product demand and inventory needs by analyzing historical sales data and seasonal patterns. Subscription-based businesses use machine learning for customer-churn prediction and prevention. Financial institutions use machine learning to predict the risk of loan defaults.
But machine learning doesn’t work well if the input data is unstructured (such as images, natural language text, or audio). To effectively use traditional machine learning with unstructured data, the data must be manually structured — an expensive task that makes machine learning unattractive for business use cases where the input data is not tabular.
Deep learning: A particular type of machine learning based on neural networks, deep learning is a seminal advancement in analytical capabilities. Deep learning models can process unstructured data such as images, audio, and natural language without the need for upfront manual processing, thereby making numerous use cases viable. Deep learning can accommodate tabular inputs as well. Its ability to handle both structured and unstructured data makes it particularly valuable for tasks where the input data naturally appears in different modalities. A model for disease detection, for example, should be able to process image data (such as radiology scans) alongside tabular data, such as patient test results. But deep learning tends to be more “data-hungry” than machine learning, and it can be more challenging to understand and interpret due to the complexity and size of the underlying neural networks.
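To illustrate why deep learning accommodates mixed inputs, here is a minimal PyTorch sketch of a network with an image branch and a tabular branch; the layer sizes and feature counts are illustrative assumptions, not a production architecture.

```python
# A minimal sketch of a network that accepts both an image and
# tabular features, e.g., a radiology scan plus patient test results.
# Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MultimodalNet(nn.Module):
    def __init__(self, num_tabular_features: int, num_classes: int):
        super().__init__()
        # Image branch: a small convolutional encoder.
        self.image_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),  # -> 16 * 4 * 4 = 256
        )
        # Tabular branch: a small multilayer perceptron.
        self.tabular_branch = nn.Sequential(
            nn.Linear(num_tabular_features, 32), nn.ReLU(),
        )
        # Classifier head over the concatenated representations.
        self.head = nn.Linear(256 + 32, num_classes)

    def forward(self, image, tabular):
        features = torch.cat(
            [self.image_branch(image), self.tabular_branch(tabular)], dim=1
        )
        return self.head(features)

model = MultimodalNet(num_tabular_features=10, num_classes=2)
scan = torch.randn(8, 1, 64, 64)  # batch of 8 grayscale scans
labs = torch.randn(8, 10)         # batch of 8 tabular test results
print(model(scan, labs).shape)    # torch.Size([8, 2])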
Generative AI: GenAI is distinguished from predictive AI by its ability to generate new content rather than simply making predictions. Built on a breakthrough deep learning architecture known as a transformer, these systems can generate coherent text, realistic images, and even functional code and, as a result, promise widespread applicability to a broad swath of knowledge work. For example, a marketing department might use GenAI to draft advertising copy, create visual content variations, or generate personalized customer communications at scale.
The inputs and outputs of generative AI systems like LLMs are typically unstructured. Most commonly, they comprise text and/or images and, more recently, video. Note that the text analyzed and generated by these tools encompasses an astonishing range of types, including software code, protein sequences, music notation, mathematical expressions, and chemical formulas.
How to Identify the Right Approach
How can a leader decide which AI tool to use for a given problem? Let’s assume that the problem has been clearly defined, relevant inputs have been identified, and the desired output has been specified.
A logical starting point is the nature of the problem: Is it a prediction problem or a generation problem?
Generation problems are easy to identify. If the desired output is unstructured — such as text, images, videos, or music — it is a generation problem.
Prediction problems come in two varieties: classification and regression. In classification problems, given an input, the user needs to make a choice from a set of predefined outputs. For example, given data about a patient, a doctor may want to predict whether the patient is at high, medium, or low risk for cardiovascular disease. The key here is that the output categories — high risk, medium risk, and low risk — are predefined, not generated on the fly.
In regression problems, you want to predict a number (or a few numbers). Given data about a patient and treatment details, a doctor may want to predict what their LDL cholesterol level will be six months from now. Or, given past sales data for a product, an organization may want to predict its sales units for the next 24 hours. Note that the distinction between classification and regression can be somewhat fuzzy. For example, regression problems can often be framed as classification problems. Instead of trying to predict the exact LDL cholesterol level of a patient, a doctor may be content with predicting whether it will be high, medium, or low.
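To see that fuzziness concretely, here is a minimal sketch, assuming scikit-learn and hypothetical patient data, that frames the same LDL target both ways.

```python
# The same prediction target framed as regression vs. classification.
# Data and thresholds here are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[45, 0], [60, 1], [52, 1], [38, 0]])  # age, family history
ldl = np.array([110.0, 195.0, 145.0, 95.0])         # future LDL level

# Regression: predict the exact LDL value.
reg = LinearRegression().fit(X, ldl)

# Classification: bin the same target into categories and
# predict the category instead. 0 = low, 1 = medium, 2 = high.
bins = np.digitize(ldl, [130, 160])
clf = LogisticRegression().fit(X, bins)

print(reg.predict([[50, 1]]), clf.predict([[50, 1]]))
```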
With the nature of the problem identified, we can turn to which tool to use.
Let’s start with the easy case. If you have a generation problem to solve, there’s only one game in town: generative AI. Depending on the sort of output you want to generate, you may need to use multimodal LLMs, like OpenAI’s GPT-4, Anthropic’s Claude 3.7 Sonnet, or Google’s Gemini 1.5; text-to-image models, like DALL-E; or special-purpose models that have been built for audio and other domains.
If you have a prediction problem, however, matters become more complicated.
The most straightforward scenario is when the input data is all tabular. In this situation, you should favor traditional machine learning. While deep learning can also solve these problems, it brings burdens that may not be worth shouldering: It may require more effort to “tune” the model to the problem, the model may not lend itself to managerial interpretation due to its black-box nature, and so on. In contrast, machine learning models are much quicker to build and tune, require less “babysitting,” and come in interpretable varieties. There’s also a wide variety of easy-to-use open-source software and a huge pool of people who know how to use these tools.
By choosing machine learning over deep learning, you are not necessarily settling for lower accuracy in exchange for ease of development. Certain widely used machine learning methods (like XGBoost, short for Extreme Gradient Boosting) are not only easier to work with than deep learning but also can be more accurate for tabular data prediction problems.
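As an illustration, here is a minimal XGBoost sketch on a synthetic tabular dataset, assuming the xgboost and scikit-learn Python packages; in practice, you would substitute your own table.

```python
# A minimal XGBoost sketch for a tabular classification problem.
# The dataset is synthetic; in practice you would load your own table.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=200, learning_rate=0.1)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```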
What if you have a prediction problem where the inputs are unstructured, like text or images?
This is arguably the scenario where the “right” answer has changed most in recent years. Before the advent of generative AI, the standard approach would have been to collect data and train a deep learning model. But today’s LLMs are often able to solve these types of problems right off the bat, with no specialized training whatsoever.
Let’s start with an LLM sweet spot: when the input data and the output labels are “everyday” natural language text, as opposed to technical or jargon-laden text from a specialized domain.
For example, let’s say you want to build an AI system that can detect whether a product review on an e-commerce site indicates a potential product improvement idea. Such a review-classification system would allow you to process thousands of reviews efficiently and pass on the important ones to product design teams for further investigation. The text in product reviews is considered everyday text, since reviews are written by consumers. The labels can be designed to be everyday text as well (such as “mentions product improvement idea” or “does not mention product improvement idea”).
A few years ago, we would have solved this problem by collecting thousands of reviews, labeling each review as described above (an expensive manual task), and training a predictive AI model with this data. But since LLMs have been trained on a massive amount of text, they can process everyday text (like product reviews and the labels assigned to them) without any special training.
Consider this product review of an office chair on Wayfair.com: “The curve of the back of the chair does not leave enough room to sit comfortably.” It does seem to indicate a potential product improvement idea. We can simply prompt an LLM to classify the text as follows:
Prompt: Does the following product review indicate a potential product improvement idea? Answer yes or no. Review: The curve of the back of the chair does not leave enough room to sit comfortably.
LLM response: Yes
If the off-the-shelf accuracy of an LLM is not high enough, it can sometimes be improved with prompt engineering and/or by providing the model with a few examples (known as few-shot prompting).
It is straightforward to write code to automate this process. Massive volumes of reviews can be easily classified by executing the code on a regular cadence. Closed-source LLMs like ChatGPT, Claude, or Gemini can certainly be used for this purpose. The API costs for using these systems have been dropping precipitously in recent years, but the cost can be reduced even further by using capable open-source LLMs (like the Llama or Mistral model families).
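As one illustration of that automation, here is a minimal sketch assuming the OpenAI Python SDK; the model name is an assumption, and any capable LLM, open-source or closed-source, could be swapped in.

```python
# A minimal sketch of automating review classification with an LLM.
# Assumes the OpenAI Python SDK; the model name is an assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Does the following product review indicate a potential product "
    "improvement idea? Answer yes or no.\nReview: {review}"
)

def mentions_improvement(review: str) -> bool:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice; use any capable model
        messages=[{"role": "user", "content": PROMPT.format(review=review)}],
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")

reviews = [
    "The curve of the back of the chair does not leave enough room "
    "to sit comfortably.",
    "Arrived quickly and looks great!",
]
for review in reviews:
    print(mentions_improvement(review), "-", review[:50])
```

Running this function over an incoming stream of reviews on a regular cadence is all the "system" many such use cases require.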
While we have examined a text-classification scenario in detail, the approach described above is equally applicable if the input data is images. Many LLMs are now multimodal and can classify images, detect objects in images, or extract structured data from documents with acceptable accuracy. They are especially effective if the input images are everyday images rather than images from a highly specialized technical domain (such as medical images) and the output labels are everyday text.
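The same pattern extends to images. Here is a minimal sketch of image classification with a multimodal model, again assuming the OpenAI SDK; the image URL and model name are placeholders.

```python
# A minimal sketch of image classification with a multimodal LLM.
# The image URL and model name are placeholder assumptions.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # a multimodal model; substitute as appropriate
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Does this product photo show visible damage? "
                     "Answer yes or no."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/product-photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```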
In summary, if the problem is a prediction classification problem, the input data is text or images, and the output labels are everyday text (rather than special-purpose jargon), try to solve it with an LLM first.
However, using an LLM is sometimes not feasible. This can happen for a variety of reasons, including issues related to accuracy, cost, latency, or data privacy. In this situation, taking a predictive AI approach makes sense, and since the input data is unstructured, deep learning is typically a good option.
As noted earlier, deep learning tends to have a voracious appetite for data compared with traditional machine learning models. But this burden can be substantially lessened by using pretrained deep learning models. Model hubs contain hundreds of thousands of pretrained deep learning models. You can search a hub for models that have been pretrained on the same type of unstructured input data that your problem involves. For example, if you are working with medical text, you can look for models that have been pretrained on such text. If you are working with images of industrial products, you can look for models that have been pretrained on such images.
Such pretrained models can be downloaded and fine-tuned quickly with a modest amount of problem-specific data. Instead of collecting and labeling tens of thousands of inputs, you may need only hundreds of inputs.
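Here is a minimal sketch of that fine-tuning workflow, assuming the Hugging Face transformers and datasets libraries; the checkpoint and the two labeled examples are illustrative.

```python
# A minimal sketch of fine-tuning a pretrained model from a model hub.
# Assumes Hugging Face transformers/datasets; checkpoint is illustrative.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"  # swap in a domain-pretrained model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2
)

# A few hundred labeled examples often suffice; two shown for brevity.
data = Dataset.from_dict({
    "text": ["The back of the chair is too curved to sit comfortably.",
             "Fast shipping, very happy."],
    "label": [1, 0],  # 1 = mentions product improvement idea
})
data = data.map(lambda row: tokenizer(
    row["text"], truncation=True, padding="max_length", max_length=64
))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3),
    train_dataset=data,
)
trainer.train()
```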
LLMs can be helpful here as well. You can use them to generate and label the data needed to fine-tune a pretrained model. For example, instead of manually labeling thousands of e-commerce product reviews with “mentions product improvement idea” or “does not mention product improvement idea,” you can use an LLM to do the labeling cheaply and quickly (a technique known as LLM-as-a-judge). If even unlabeled product reviews are not available in sufficient volume (perhaps because the e-commerce site launched only recently), an LLM can be prompted to generate synthetic reviews, using available reviews as a “seed.”
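A minimal sketch of LLM-assisted labeling follows, again assuming the OpenAI SDK; the model name, label strings, and file handling are illustrative assumptions. Its output feeds directly into a fine-tuning workflow like the one above.

```python
# A minimal sketch of using an LLM to label training data in bulk.
# Assumes the OpenAI SDK; the model name and label text are assumptions.
import json
from openai import OpenAI

client = OpenAI()
LABELS = ("mentions product improvement idea",
          "does not mention product improvement idea")

def label_review(review: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice
        messages=[{"role": "user", "content":
                   f"Classify this review as '{LABELS[0]}' or '{LABELS[1]}'. "
                   f"Reply with the label only.\nReview: {review}"}],
    )
    return response.choices[0].message.content.strip()

unlabeled_reviews = ["The chair wobbles on hard floors.",
                     "Great value for the price."]
with open("labeled_reviews.jsonl", "w") as f:
    for review in unlabeled_reviews:
        f.write(json.dumps(
            {"text": review, "label": label_review(review)}
        ) + "\n")
```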
Finally, if the input data is a mix of tabular and unstructured data, I recommend starting with deep learning directly.
My recommendations can be summarized as follows:
- Identify whether the problem is a generation problem or a prediction problem.
- Address generation problems with generative AI tools.
- For prediction problems where the input data is tabular, use predictive AI tools, especially time-tested machine learning tools like regression or gradient boosting.
- For prediction problems where the input data is unstructured and the output labels are everyday text, try using GenAI tools. If this proves to be unacceptable for any reason (due to factors like accuracy, cost, or data confidentiality), try deep learning.
- If you are using deep learning, you can reduce the data burden significantly in two ways. The first is to avoid building models from scratch and instead look for models pretrained on similar types of input data that can be fine-tuned with your data. The second is to use LLMs to cost-effectively label your training data.
The Value of Mixing and Matching AI Approaches
As I have discussed, the choice among traditional machine learning, deep learning, and generative AI should not be viewed as an either-or proposition; these are complementary capabilities that can be mixed, matched, and tailored to the specifics of the problem at hand.
Looking ahead, the boundaries between these technologies will likely continue to blur as new capabilities emerge. The recent rise of pretrained models for tabular data, for example, may point to a data-efficient alternative to building predictive AI models from scratch for tabular data problems.
Business leaders must stay informed about technological advancements while remaining focused on their core business objectives. By following a structured decision framework centered on business value creation, organizations can navigate the complex AI landscape successfully and make AI project decisions with a higher likelihood of paying off.