Sustainability of GenAI?
--
Generative AI (GenAI) has taken the world by storm, promising to revolutionize everything from creative content generation to scientific discovery[1]. However, its long-term viability hinges on its sustainability is still questionable. In this article we will explore the multi-facet aspects and challenges to GenAI’s long-term viability.
Economic
Microeconomic
- LLM Pricing: GPT-4 (source)
— Output: $0.12/1k token
— Input: $0.06/1k token - Current RAG application depends heavily on context from retrievers and most of the time the retrieved context is quite lengthy. This will add to the input token pricing.
- As the number of queries increase over the time, there would be more invocation of LLM models which will again add to the cost.
- Some use-case rely on multiple queries to the model, which in-turn will further add to the cost.
- This is just for the text generation. If we add embedding model pricing to this, the cost goes up even more. Add vision capability to the mix and the pricing may become a bottleneck.
- Most of the GenAI services that are coming up are adding an extra level of enhanced user-experience with no extra cost to the user. However, since these are going to add on the operating cost.
- Furthermore, companies are also developing GenAI based solutions for internal tools, which may not add enough to the revenue when
- Overall, Given the pricing of current LLMs and anticipated prices for upcoming models combined with reliance on proprietary models is going to have an impact on the margins and EBITDA.
- Even if we look at self-host open source models, there are few challenges even here:
— Open source models may not have comparable text-generation quality then GPTs and Claudes.
— LLM models are bulky and hosting them requires enormous amounts of compute resources (Llama 2 70B requires around 130GB of memory. Even Mixtral 8x7B requires around 110GB of memory) which is going to cost (AWS, Azure, GCP or any other cloud service)
— Even if a choice is made for a smaller or quantized version of these models, then there is a drop in text-generation quality. Need a critically balanced trade-off between model size and quality.
— Also, some services may require multiple instances of these models up and running which is again going to add to the cost. - Check out this post Cost of RAG
https://www.linkedin.com/feed/update/urn:li:activity:7181168603906359296/
Macroeconomics
- Policies
— The broader economic impact of GenAI is yet to be fully understood. We need to consider how GenAI will affect national finances and explore relevant policy frameworks. - How GenAI is going to affect the country’s financials?
- Drastic change in the employability of creative work force [4–7]
- The Macroeconomics of Artificial Intelligence
- Displaced investment focus and AI winter
Recently we have seen a great increase in interest in Large Action Models (LAM) which is further fueled by controlled demos given by newer LAM based AI assistant device companies like Humane AI pin and Rabbit R1. Even though there are quite a lot of concerns around practicality and feasibility of these gadgets be it their bizarre subscription models (Humane AI) or their “ability” to do bookings without the need of human intervention (Rabbit R1). Since these gadgets have only speech as a medium of interaction, they have huge reliance on speech recognition capabilities which we have seen to board well in public environments. There are privacy issues with speech based interaction as we do not want everyone around us to be able to listen to every conversation we have with these devices. Also in public and noisy environments, speech recognition may fail to interpret what we are trying to say and since these devices are given access to take actions with our crucial information, a single misinterpreted word may lead to serious financial or information breach.
— Even with these substantial issues we see huge investment going toward these not real world ready devices on the grounds of AI.
— Organizations, instead of identifying a solution to an existing problem, are trying to construct a new problem to try to fit a GenAI solution.
— Companies are diverting substantial part of their investments in GenAI.
— All of this may point towards the rise of an AI bubble.
— The more this bubble grows, the more inflated people’s expectations from GenAI will become. Which may eventually reach a point where expectations go beyond what is realistically possible and will cause this bubble to burst in worst case scenarios or will cause plummet of people’s trust in GenAI.
— By this time investors would have already spent a lot of money in GenAI, investment in other AI technologies may become contracted potentially leading to the third AI winter.
— This contracted investments for new solutions and increasing expenditures as discussed in the previous section, may act as a negative feedback for the already unstable macroeconomic conditions, further putting pressure on the IT and technology market.
Socio-economic
Market Concentration:
- Currently, there are a handful of companies that that have brought major LLMs to the market:
— OpenAI — ChatGPT 3.5 and GPT-4
— Google — Gemini
— Anthropic — Claude
— MistralAI — Mistral 7B and Mixtral 8x7B
— Meta — Llama
— Tesla — Grok - Overall training process cost of major LLMs:
— GPT-4 — $100 million (source)
— Mistral AI — $22 million (source) - Also if we look at the companies owning major market share in the LLM market are OpenAI, Microsoft and Google. This is somewhat of concern as AI power will eventually become concentrated in the hands of certain For-profit organizations. Even the EU has shown similar concern after the Microsoft-MistralAI deal.
— Recently, Microsoft signed a deal with Mistral AI. This makes their second deal a major LLM contributor after OpenAI.
— It can only be left to anticipations as to how this deal will change MistralAI’s take on open sourcing their SoTA models.
Innovation vs. Hype:
Compute support companies have also been extremely supportive and vocal of GenAI being the future of AI.
- We have seen Nvidia being very supportive of the GenAI trend. In the past few months we have also seen tremendous growth in its stock price. This is mostly because of heavy dependency of these LLMs and similar models (LMMs or LAMs) on GPU for training, fine-tuning and even inference.
- Similarly, cloud service providers have also been supporting GenAI and have come up with LLM hosting services like AWS Bedrock or Azure OpenAI (Azure AI studio) or Google AI studio. These cloud service providers not just support hosting LLM models, they also provide compute solutions.
- Since, growth of these companies is hugely propelled by the growth of GenAI, it begs the question that whether these companies are riding the GenAI trend (or hype) or they are creating these trends (or hype)?
Environmental
Whenever GenAI and LLMs in general are the point of discussion, the main focus is on how these models have made our lives easy. However, one major factor that is often left out of these discussions is their impact on the environment.
- According to Sajjad Moazeni, who is an assistant professor at University of Washington, training a large language model as big as ChatGPT-3 could have led up to 10 gigawatt-hour (GWh) of power consumption. This roughly equates to yearly electricity consumption of over 1,000 households in the US.
- On an average ChatGPt has around 180.5 million active users (source) which could cause more than 1GWh of energy consumption. To put this in perspective, this equates to an average daily energy consumption for more than 33,000 households in the US.
- This numbers are just for the one chat service ie. ChatGPT. Now consider Gemini, Claude and many more such models and their energy consumption.
- To put it in Indian perspective, Maharashtra had an average energy demand of around 27.5GW in 2023. (source)
- Take a look at this article which discusses this in much more detail:
https://8billiontrees.com/carbon-offsets-credits/carbon-footprint-of-ai/
Production Readiness
Guardrails
- Most of the guardrails available today are prompt based.
- Recently Nvidia released Nemo guardrail [8] which is an open-source framework for adding different types of rails for LLM applications.
- However, there are some challenges with this too:
— The rails heavily depend on finding semantic similarity between developer provided examples of off-topic or malicious queries and user input query.
— This puts a restriction of how many types of queries a developer can provide. As there is always a chance of encountering a harmful prompt or a query which the developer has not considered.
— Even for a finite number of cases, a developer needs to provide multiple examples for every kind of off-topic query which can grow exponentially when the application needs to be constrained for a very specific use case.
— Also, the modeling language Nemo guardrail introduced, Colang, has its own learning curve which can turn out to be restrictive for quick onboarding of new developers.
— Furthermore, the readme on the official github repository for the project states the the developers of the project recommend not to use the current version of NeMo in production setting.
Ethics
- There is an on-going legal and ethical debate on rights of content producers whose content is used to train these large models.
- Furthermore, there is no set guideline which defines the usability license and nature of AI generated content. Whether the company using the model owns it or it needs to be open since it is generated by an AI and not an human individual.
Evaluation Challenges
- Difficulty of test for reliability of generated text.
- Frameworks like LangFuse, LangSmith do provide some level of observability into how LLM applications came to an answer but they still lack when it comes to explainability. These LLMs are huge black-boxes for which we have no clue as to why they generated what they generated. Chain-of-thought and similar methods try to solve it up to a certain level, but still the chain of thought responses are LLM generated text. So it is still not a white box. At max a gray box.
- There are tools and libraries like RAGA that provide ways for evaluating a Retrieval Augmented Generation (RAG) application. However, the way in which they evaluate response generation is to be tested against a limited set of predefined Q&A pairs which may not be exhaustive enough as the type of questions these GenAI systems may see is only limited by human creativity. Also some ways of evaluating RAGs involve using LLM which like putting out fire with fire.
Security Concerns
- Non-deterministic nature of LLMs can cause failure of LLM based applications.
- Heavy dependency on prompt can lead to potential security risk:
- Prompt Injection
- Jailbreaking
- And more…
- CVE issues:
— SQLDatabaseChain has SQL injection issue
https://github.com/langchain-ai/langchain/issues/5923
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-36189
https://github.com/langchain-ai/langchain/pull/6051
— llm_math chain enables simple remote code execution (RCE) through the Python interpreter
https://twitter.com/rharang/status/1641899743608463365
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-29374
- No concrete solution. Existing solutions involved moving code from langchain to langchain-experimental [2].
- These days lots of applications are being developed which use LLM to generate code, however, given the stochastic nature of these LLMs, it is possible to generate an unsafe line of code which may cause failure of the entire system.
The Road Ahead
GenAI’s potential is undeniable, but its sustainability requires addressing these challenges. Finding more efficient training methods, exploring renewable energy sources, developing robust evaluation frameworks, and prioritizing security are crucial steps. Only through a multi-faceted approach can GenAI achieve its full potential for a sustainable future.
In conclusion, GenAI presents a powerful new paradigm. However, ensuring its long-term sustainability requires addressing these economic, socio-economic, environmental, and production-related challenges. By fostering responsible development and deployment practices, we can harness the potential of GenAI for a positive and sustainable future.
The road to a sustainable GenAI future requires addressing these economic, social, environmental, and production-related challenges. Continuous research into efficient model training, pricing structures, and responsible development practices are crucial. Only through a multi-pronged approach can GenAI fulfill its promise while ensuring a sustainable future.
Reference:
- A. M. Bran, S. Cox, O. Schilter, C. Baldassari, A. White, and P. Schwaller, ‘Augmenting large language models with chemistry tools’, in NeurIPS 2023 AI for Science Workshop, 2023.
- Goodbye CVEs, Hello `langchain_experimental`.
https://blog.langchain.dev/goodbye-cves-hello-langchain_experimental/ - Cost of Rag by Magdalena Kuhn
https://www.linkedin.com/feed/update/urn:li:activity:7181168603906359296/ - GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
https://arxiv.org/abs/2303.10130 - The Short-Term Effects of Generative Artificial Intelligence on Employment: Evidence from an Online Labor Market
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4527336 - The jobs being replaced by AI — an analysis of 5M freelancing jobs
https://bloomberry.com/i-analyzed-5m-freelancing-jobs-to-see-what-jobs-are-being-replaced-by-ai/ - ChatGPT is already stealing work from freelancers
https://www.businessinsider.in/policy/economy/news/chatgpt-is-already-stealing-work-from-freelancers/articleshow/105681113.cms - Nemo Guardrail by Nvidia
https://github.com/NVIDIA/NeMo-Guardrails/tree/main