2022 was the year that generative artificial intelligence (AI) exploded into the public consciousness, and 2023 was the year it began to take root in the business world. 2024 thus stands to be a pivotal year for the future of AI, as researchers and enterprises seek to establish how this evolutionary leap in technology can be most practically integrated into our everyday lives.
The evolution of generative AI has mirrored that of computers, albeit on a dramatically accelerated timeline. Massive, centrally operated mainframe computers from a few players gave way to smaller, more efficient machines accessible to enterprises and research institutions. In the decades that followed, incremental advances yielded home computers that hobbyists could tinker with. In time, powerful personal computers with intuitive no-code interfaces became ubiquitous.
Generative AI has already reached its “hobbyist” phase—and as with computers, further progress aims to attain greater performance in smaller packages. 2023 saw an explosion of increasingly efficient foundation models with open licenses, beginning with the launch of Meta’s LLaMA family of large language models (LLMs) and followed by the likes of StableLM, Falcon, Mistral, and Llama 2. In image generation, open models like DeepFloyd IF and Stable Diffusion have achieved relative parity with leading proprietary models. Enhanced with fine-tuning techniques and datasets developed by the open source community, many open models can now outperform all but the most powerful closed-source models on most benchmarks, despite far smaller parameter counts.
As the pace of progress accelerates, the ever-expanding capabilities of state-of-the-art models will garner the most media attention. But the most impactful developments may be those focused on governance, middleware, training techniques and data pipelines that make generative AI more trustworthy, sustainable and accessible, for enterprises and end users alike.
Here are some important current AI trends to look out for in the coming year.
- Reality check: more realistic expectations
- Multimodal AI
- Small(er) language models and open source advancements
- GPU shortages and cloud costs
- Model optimization is getting more accessible
- Customized local models and data pipelines
- More powerful virtual agents
- Regulation, copyright and ethical AI concerns
- Shadow AI (and corporate AI policies)
Reality check: more realistic expectations
When generative AI first hit mass awareness, a typical business leader’s knowledge came mostly from marketing materials and breathless news coverage. Tangible experience (if any) was limited to messing around with ChatGPT and DALL-E. Now that the dust has settled, the business community has a more refined understanding of AI-powered solutions.
The Gartner Hype Cycle positions Generative AI squarely at the “Peak of Inflated Expectations,” on the cusp of a slide into the “Trough of Disillusionment”[i]—in other words, about to enter a (relatively) underwhelming transition period—while Deloitte’s “State of Generative AI in the Enterprise” report from Q1 2024 indicated that many leaders “expect substantial transformative impacts in the short term.”[ii] The reality will likely fall somewhere in between: generative AI offers unique opportunities and solutions, but it will not be everything to everyone.
How real-world results compare to the hype is partially a matter of perspective. Standalone tools like ChatGPT typically take center stage in the popular imagination, but smooth integration into established services often yields more staying power. Prior to the current hype cycle, generative machine learning tools like the “Smart Compose” feature rolled out by Google in 2018 weren’t heralded as a paradigm shift, despite being harbingers of today’s text-generating services. Similarly, many high-impact generative AI tools are being implemented as integrated elements of enterprise environments that enhance and complement, rather than revolutionize or replace, existing tools: for example, “Copilot” features in Microsoft Office, “Generative Fill” features in Adobe Photoshop or virtual agents in productivity and collaboration apps.
Where generative AI first builds momentum in everyday workflows will have more influence on the future of AI tools than the hypothetical upside of any specific AI capabilities. According to a recent IBM survey of over 1,000 employees at enterprise-scale companies, the top three factors driving AI adoption were advances in AI tools that make them more accessible, the need to reduce costs and automate key processes, and the increasing amount of AI embedded into standard off-the-shelf business applications.
Multimodal AI (and video)
That being said, the ambition of state-of-the-art generative AI is growing. The next wave of advancements will focus not only on enhancing performance within a specific domain, but on multimodal models that can take multiple types of data as input. While models that operate across different data modalities are not a strictly new phenomenon—image-text models like CLIP and speech-to-text models like Wav2Vec have been around for years now—they’ve typically operated in only one direction, and were trained to accomplish a specific task.
The incoming generation of interdisciplinary models, comprising proprietary models like OpenAI’s GPT-4V or Google’s Gemini, as well as open source models like LLaVA, Adept’s Fuyu or Qwen-VL, can move freely between natural language processing (NLP) and computer vision tasks. New models are also bringing video into the fold: in late January, Google announced Lumiere, a text-to-video diffusion model that can also perform image-to-video tasks or use images for style reference.
The most immediate benefit of multimodal AI is more intuitive, versatile AI applications and virtual assistants. Users can, for example, ask about an image and receive a natural language answer, or ask out loud for instructions to repair something and receive visual aids alongside step-by-step text instructions.
On a higher level, multimodal AI allows for a model to process more diverse data inputs, enriching and expanding the information available for training and inference. Video, in particular, offers great potential for holistic learning. “There are cameras that are on 24/7 and they’re capturing what happens just as it happens without any filtering, without any intentionality,” says Peter Norvig, Distinguished Education Fellow at the Stanford Institute for Human-Centered Artificial Intelligence (HAI).[iii] “AI models haven’t had that kind of data before. Those models will just have a better understanding of everything.”
Small(er) language models and open source advancements
In domain-specific models—particularly LLMs—we’ve likely reached the point of diminishing returns from larger parameter counts. Sam Altman, CEO of OpenAI (whose GPT-4 model is rumored to have around 1.76 trillion parameters), suggested as much at MIT’s Imagination in Action event last April: “I think we’re at the end of the era where it’s going to be these giant models, and we’ll make them better in other ways,” he predicted. “I think there’s been way too much focus on parameter count.”
Massive models jumpstarted this ongoing AI golden age, but they’re not without drawbacks. Only the very largest companies have the funds and server space to train and maintain energy-hungry models with hundreds of billions of parameters. According to one estimate from the University of Washington, training a single GPT-3-sized model requires the yearly electricity consumption of over 1,000 households; a standard day of ChatGPT queries rivals the daily energy consumption of 33,000 U.S. households.[iv]
Smaller models, meanwhile, are far less resource-intensive. An influential March 2022 paper from DeepMind (the “Chinchilla” paper) demonstrated that training smaller models on more data yields better performance than training larger models on less data. Much of the ongoing innovation in LLMs has thus focused on yielding greater output from fewer parameters. As the rapid progress of models in the 3–70 billion parameter range demonstrated in 2023—particularly those built upon the LLaMA, Llama 2 and Mistral foundation models—models can be downsized without sacrificing much performance.
The power of open models will continue to grow. In December 2023, Mistral released “Mixtral,” a mixture of experts (MoE) model integrating 8 neural networks of 7 billion parameters each. Mistral claims that Mixtral not only outperforms the 70B parameter variant of Llama 2 on most benchmarks with 6 times faster inference, but that it even matches or outperforms OpenAI’s far larger GPT-3.5 on most standard benchmarks. Shortly thereafter, in January, Meta announced that it had already begun training Llama 3 models and confirmed that they will be open sourced. Though details (like model size) have not been confirmed, it’s reasonable to expect Llama 3 to follow the framework established by the prior two generations.
These advances in smaller models have three important benefits:
- They help democratize AI: smaller models that can be run at lower cost on more attainable hardware empower more amateurs and institutions to study, train and improve existing models.
- They can be run locally on smaller devices: this allows more sophisticated AI in scenarios like edge computing and the internet of things (IoT). Furthermore, running models locally—like on a user’s smartphone—helps to sidestep many privacy and cybersecurity concerns that arise from interaction with sensitive personal or proprietary data.
- They make AI more explainable: the larger the model, the more difficult it is to pinpoint how and where it makes important decisions. Explainable AI is essential to understanding, improving and trusting the output of AI systems.
GPU shortages and cloud costs
The trend toward smaller models will be driven as much by necessity as by entrepreneurial vigor, as cloud computing costs rise and hardware availability declines.
“The big companies (and more of them) are all trying to bring AI capabilities in-house, and there is a bit of a run on GPUs,” says James Landay, Vice-Director and Faculty Director of Research, Stanford HAI. “This will create a huge pressure not only for increased GPU production, but for innovators to come up with hardware solutions that are cheaper and easier to make and use.”[iii]
As a late 2023 O’Reilly report explains, cloud providers currently bear much of the computing burden: relatively few AI adopters maintain their own infrastructure, and hardware shortages will only elevate the hurdles and costs of setting up on-premise servers. In the long term, this may put upward pressure on cloud costs as providers update and optimize their own infrastructure to effectively meet demand from generative AI.[v]
For enterprises, navigating this uncertain landscape requires flexibility in terms of both models (leaning on smaller, more efficient models where necessary, or larger, more performant models when practical) and deployment environment. “We don’t want to constrain where people deploy [a model],” said IBM CEO Arvind Krishna in a December 2023 interview with CNBC, in reference to IBM’s watsonx platform. “So [if] they want to deploy it on a large public cloud, we’ll do it there. If they want to deploy it at IBM, we’ll do it at IBM. If they want to do it on their own, and they happen to have enough infrastructure, we’ll do it there.”
Model optimization is getting more accessible
The trend towards maximizing the performance of more compact models is well served by the recent output of the open source community.
Many key advancements have been (and will continue to be) driven not just by new foundation models, but by new techniques and resources (like open source datasets) for training, tweaking, fine-tuning or aligning pre-trained models. Notable model-agnostic techniques that took hold in 2023 include:
- Low Rank Adaptation (LoRA): Rather than directly fine-tuning billions of model parameters, LoRA entails freezing the pre-trained model weights and injecting trainable layers—which represent the matrix of changes to model weights as two smaller (lower-rank) matrices—into each transformer block. This dramatically reduces the number of parameters that need to be updated, which, in turn, dramatically speeds up fine-tuning and reduces the memory needed to store model updates.
- Quantization: Like lowering the bitrate of audio or video to reduce file size and latency, quantization lowers the precision used to represent model data points—for example, from 16-bit floating point to 8-bit integer—to reduce memory usage and speed up inference. QLoRA techniques combine quantization with LoRA; a minimal sketch of that combination follows this list.
- Direct Preference Optimization (DPO): Chat models typically use reinforcement learning from human feedback (RLHF) to align model outputs to human preferences. Though powerful, RLHF is complex and unstable. DPO promises similar benefits while being computationally lightweight and substantially simpler.
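To make the LoRA and quantization items above concrete, here is a minimal QLoRA-style sketch using the Hugging Face transformers, peft and bitsandbytes libraries. The model ID, rank and target modules shown are illustrative assumptions, not settings prescribed by the techniques’ authors.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantization: load the base model with its weights stored in 4-bit precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",          # illustrative model ID
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA: freeze the quantized weights and inject small trainable low-rank adapters
lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor applied to the adapters
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt (illustrative)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()         # typically well under 1% of all parameters
```

From here, the model can be fine-tuned with an ordinary training loop: only the small adapter matrices are updated and saved, which is what keeps both compute and storage requirements modest.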
Alongside parallel advances in open source models in the 3–70 billion parameter space, these evolving techniques could shift the dynamics of the AI landscape by providing smaller players, like startups and amateurs, with sophisticated AI capabilities that were previously out of reach.
Customized local models and data pipelines
Enterprises in 2024 can thus pursue differentiation through bespoke model development, rather than building wrappers around repackaged services from “Big AI.” With the right data and development framework, existing open source AI models and tools can be tailored to almost any real-world scenario, from customer support to supply chain management to complex document analysis.
Open source models afford organizations the opportunity to develop powerful custom AI models—trained on their proprietary data and fine-tuned for their specific needs—quickly, without prohibitively expensive infrastructure investments. This is especially relevant in domains like legal, healthcare or finance, where highly specialized vocabulary and concepts may not have been learned by foundation models in pre-training.
Legal, finance and healthcare are also prime examples of industries that can benefit from models small enough to be run locally on modest hardware. Keeping AI training, inference and retrieval augmented generation (RAG) local avoids the risk of proprietary data or sensitive personal information being used to train closed-source models or otherwise pass through the hands of third parties. And using RAG to access relevant information rather than storing all knowledge directly within the LLM itself helps reduce model size, further increasing speed and reducing costs.
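As a rough sketch of what keeping RAG local can look like, the following embeds a handful of in-house documents, retrieves the most relevant one for a query, and passes only that snippet to a locally hosted model. The embedding model, the sample documents and the local_llm stub are hypothetical placeholders, not a reference architecture.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# In-house documents stay on local hardware; nothing is sent to a third party.
documents = [
    "Policy 12.4: Claims above $10,000 require two levels of approval.",
    "Policy 3.1: Patient records may only be accessed by the treating physician.",
    "Policy 7.2: Quarterly filings must be reviewed by the compliance team.",
]

# Any locally runnable embedding model works; this one is just an example.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    query_vector = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ query_vector
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def local_llm(prompt: str) -> str:
    """Placeholder: connect this to any small model hosted on-premises."""
    raise NotImplementedError("Swap in your locally deployed model here.")

def answer(query: str) -> str:
    """Ground the model's answer in retrieved context rather than stored knowledge."""
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return local_llm(prompt)
```

Because both the document store and the generation step stay on local infrastructure, sensitive text never leaves the organization, and the model itself can stay small since it only needs to reason over the retrieved context.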
As 2024 continues to level the model playing field, competitive advantage will increasingly be driven by proprietary data pipelines that enable industry-best fine-tuning.
More powerful virtual agents
With more sophisticated, efficient tools and a year’s worth of market feedback at their disposal, businesses are primed to expand the use cases for virtual agents beyond just straightforward customer experience chatbots.
As AI systems speed up and incorporate new streams and formats of information, they expand the possibilities for not just communication and instruction following, but also task automation. “2023 was the year of being able to chat with an AI. Multiple companies launched something, but the interaction was always you type something in and it types something back,” says Stanford’s Norvig. “In 2024, we’ll see the ability for agents to get stuff done for you. Make reservations, plan a trip, connect to other services.”
Multimodal AI, in particular, significantly increases opportunities for seamless interaction with virtual agents. For example, rather than simply asking a bot for recipes, a user can point a camera at an open fridge and request recipes that can be made with available ingredients. Be My Eyes, a mobile app that connects blind and low-vision individuals with volunteers to help with quick tasks, is piloting AI tools that help users directly interact with their surroundings through multimodal AI in lieu of awaiting a human volunteer.
Regulation, copyright and ethical AI concerns
Elevated multimodal capabilities and lowered barriers to entry also open up new doors for abuse: deepfakes, privacy issues, perpetuation of bias and even evasion of CAPTCHA safeguards may become increasingly easy for bad actors. In January of 2024, a wave of explicit celebrity deepfakes hit social media; research from May 2023 indicated that there had been 8 times as many voice deepfakes posted online compared to the same period in 2022.[vi]
Ambiguity in the regulatory environment may slow adoption, or at least more aggressive implementation, in the short to medium term. There is inherent risk to any major, irreversible investment in an emerging technology or practice that might require significant retooling—or even become illegal—following new legislation or changing political headwinds in the coming years.
In December 2023, the European Union (EU) reached provisional agreement on the Artificial Intelligence Act. Among other measures, it prohibits indiscriminate scraping of images to create facial recognition databases, biometric categorization systems with potential for discriminatory bias, “social scoring” systems and the use of AI for social or economic manipulation. It also seeks to define a category of “high-risk” AI systems, with potential to threaten safety, fundamental rights or rule of law, that will be subject to additional oversight. Likewise, it sets transparency requirements for what it calls “general-purpose AI (GPAI)” systems—foundation models—including technical documentation and systemic adversarial testing.
But while some key players, like Mistral, reside in the EU, the majority of groundbreaking AI development is happening in America, where substantive legislation of AI in the private sector will require action from Congress—which may be unlikely in an election year. On October 30, 2023, the Biden administration issued a comprehensive executive order detailing 150 requirements for use of AI technologies by federal agencies; months prior, the administration secured voluntary commitments from prominent AI developers to adhere to certain guardrails for trust and security. Notably, both California and Colorado are actively pursuing their own legislation regarding individuals’ data privacy rights with regard to artificial intelligence.
China has moved more proactively toward formal AI restrictions, banning price discrimination by recommendation algorithms on social media and mandating the clear labeling of AI-generated content. Prospective regulations on generative AI would require that both the training data used to train LLMs and the content subsequently generated by models be “true and accurate,” which experts have taken to indicate measures to censor LLM output.
Meanwhile, the role of copyrighted material in the training of AI models used for content generation, from language models to image generators and video models, remains a hotly contested issue. The outcome of the high-profile lawsuit filed by the New York Times against OpenAI may significantly affect the trajectory of AI legislation. Adversarial tools, like Glaze and Nightshade—both developed at the University of Chicago—have arisen in what may become an arms race of sorts between creators and model developers.
Shadow AI (and corporate AI policies)
For businesses, this escalating potential for legal, regulatory, economic or reputational consequences is compounded by how popular and accessible generative AI tools have become. Organizations must not only have a careful, coherent and clearly articulated corporate policy around generative AI, but also be wary of shadow AI: the “unofficial” personal use of AI in the workplace by employees.
Also dubbed “shadow IT” or “BYOAI,” shadow AI arises when impatient employees seeking quick solutions (or simply wanting to explore new tech faster than a cautious company policy allows) implement generative AI in the workplace without going through IT for approval or oversight. Many consumer-facing services, some free of charge, allow even nontechnical individuals to improvise the use of generative AI tools. In one study from Ernst & Young, 90% of respondents said they use AI at work.[vii]
That enterprising spirit can be great, in a vacuum—but eager employees may lack relevant information or perspective regarding security, privacy or compliance. This can expose businesses to a great deal of risk. For example, an employee might unknowingly feed trade secrets to a public-facing AI model that continually trains on user input, or use copyright-protected material to train a proprietary model for content generation and expose their company to legal action.
Like many ongoing developments, this underscores how the dangers of generative AI rise almost linearly with its capabilities. With great power comes great responsibility.
Moving forward
As we proceed through a pivotal year in artificial intelligence, understanding and adapting to emerging trends is essential to maximizing potential, minimizing risk and responsibly scaling generative AI adoption.
[i] “Gartner Places Generative AI on the Peak of Inflated Expectations on the 2023 Hype Cycle for Emerging Technologies,” Gartner, 16 August 2023
[ii] “Deloitte’s State of Generative AI in the Enterprise: Quarter one report,” Deloitte, January 2024
[iii] “What to Expect in AI in 2024,” Stanford University, 8 December 2023
[iv] “Q&A: UW researcher discusses just how much energy ChatGPT uses,” University of Washington, 27 July 2023
[v] “Generative AI in the Enterprise,” O’Reilly, 28 November 2023
[vi] “Deepfaking it: America’s 2024 election coincides with AI boom,” Reuters, 30 May 2023
[vii] “How organizations can stop skyrocketing AI use from fueling anxiety,” Ernst & Young, December 2023