Which AI strategy should you adopt for your projects?

Cost, confidentiality, performance… Choosing between an open source and a proprietary model requires clearly defining the objectives and constraints of your project.

Open or proprietary? Mixtral, Llama, Zephyr… The major open source language models are closing the performance gap with established proprietary models (GPT-4, Claude 2…). For companies, the choice can be complex. However, a few general guidelines can help narrow down the choice of model in advance.

What performance do you need?

The final decision depends largely on the nature of the use case. Some generative AI projects require a level of performance that only proprietary models can currently provide. “It all depends on the complexity and subtlety of the analysis the model must perform. For example, with a company’s technical documentation, if the reasoning is very complex, a closed, proprietary model like GPT-4 may perform better. Typically, for the most complex use cases, such as contract analysis, we need a very high-performing model,” says Nicolas Gaudemet, chief AI officer at Onepoint.

What knowledge base?

For use cases built on RAG (retrieval-augmented generation), where the model draws on a document base for its responses, performance deteriorates quickly on very long corpora. It may then be necessary to opt for a fine-tuned open source model. “The limit we have seen in practice, with our clients as well as in our own internal use, is that these models can effectively process only documents of up to 160 pages. Beyond that, we observe losses of information and precision. There are some workarounds, such as customizing embeddings, which push this limit a little, but in practice it remains at around 200 pages,” explains Didier Gaultier, head of AI at Orange Business.

Once this limit is reached, it is preferable to split the document base by use case. “If we take the example of use cases in human resources, marketing, and legal, these are three distinct uses and therefore require three separate document bases,” the expert illustrates. If a specific use case still needs more than 300 pages of documentation, the recommendation is to move directly to fine-tuning an open source model.
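The page thresholds quoted above can be turned into a rough triage rule. The following is a minimal Python sketch, not a method described by the experts: the function names and labels are hypothetical, and the handling of the 200-to-300-page range, which the article does not address, is an assumption made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Page thresholds quoted in the article; treat them as rules of thumb.
RAG_COMFORT_PAGES = 160   # documents handled effectively by RAG
RAG_STRETCH_PAGES = 200   # practical ceiling with workarounds (e.g. custom embeddings)
FINE_TUNE_PAGES = 300     # beyond this, fine-tuning an open source model is advised


def recommend_approach(pages: int) -> str:
    """Map the size of one use case's document base to a coarse recommendation."""
    if pages <= RAG_COMFORT_PAGES:
        return "rag"
    if pages <= RAG_STRETCH_PAGES:
        return "rag_with_workarounds"
    if pages <= FINE_TUNE_PAGES:
        # Assumption: the article gives no guidance for this range.
        return "split_or_review"
    return "fine_tune_open_source"


def plan_document_bases(documents: Iterable[Tuple[str, int]]) -> Dict[str, str]:
    """Group (use_case, pages) entries into one base per use case, then triage each."""
    pages_by_use_case: Dict[str, int] = defaultdict(int)
    for use_case, pages in documents:
        pages_by_use_case[use_case] += pages
    return {uc: recommend_approach(total) for uc, total in pages_by_use_case.items()}


# Hypothetical inventory: one entry per document, tagged with its use case.
inventory = [("hr", 90), ("hr", 50), ("marketing", 180), ("legal", 420)]
print(plan_document_bases(inventory))
```

Grouping by use case before applying the thresholds mirrors the advice to build separate document bases for human resources, marketing and legal rather than one large corpus.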

What usage volume, and at what cost?

Another factor to consider is the project’s implementation phase. For tools still in the experimentation phase (proof of concept, MVP, pilot…), with few model calls, “I would rather use a model that I pay for by the token. This is the case for all closed models,” says Nicolas Gaudemet. For developers, implementation is much simpler and allows rapid progress. It is also now possible to use open source models through APIs. This is the case for models developed by Mistral or LightOn, notably in France.

For projects that go into production, the reasoning changes. If the tool generates a significant flow of requests, costs can quickly escalate with a proprietary model. “If we have several thousand employees making a dozen requests per day for the same use case, then it becomes relevant to opt for an open source model where only the fixed cost of hosting is charged, and not the volume of requests. Beyond a certain threshold of use, switching to an open source solution whose infrastructure costs are controlled thus offers a clear economic advantage over proprietary models billed per request,” says the Onepoint AI specialist.

The key is to accurately calculate the economic viability of an open source solution, because running an open source model requires substantial hardware resources. “Today, you need a minimum of two H100s (Nvidia GPUs) per model, or even three with RAG. This represents an investment of about 300,000 euros once optimally integrated (dedicated rack with GPUs, RAM and adequate storage capacity). Such a deployment can certainly serve the needs of several hundred users, but it requires a significant initial investment in infrastructure,” warns Didier Gaultier.
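The trade-off between per-token billing and fixed hosting costs can be estimated with simple arithmetic. The Python sketch below is illustrative only: the 300,000 euro investment and the dozen daily requests come from the quotes above, while the token price, tokens per request, amortization period, running costs and number of working days are hypothetical values to replace with your own figures.

```python
import math


def monthly_api_cost(users: int, requests_per_user_per_day: int,
                     tokens_per_request: int, price_per_1k_tokens: float,
                     working_days: int = 21) -> float:
    """Monthly bill for a model billed per token."""
    requests = users * requests_per_user_per_day * working_days
    return requests * tokens_per_request / 1000 * price_per_1k_tokens


def monthly_self_hosted_cost(hardware_investment: float, amortization_months: int,
                             monthly_running_cost: float) -> float:
    """Fixed monthly cost of self-hosting: amortized hardware plus running costs."""
    return hardware_investment / amortization_months + monthly_running_cost


def break_even_users(requests_per_user_per_day: int, tokens_per_request: int,
                     price_per_1k_tokens: float, hardware_investment: float,
                     amortization_months: int, monthly_running_cost: float) -> int:
    """Smallest number of users at which self-hosting costs no more than the API.

    Note: this ignores capacity. The article states that such a deployment
    serves several hundred users, so larger populations need more hardware.
    """
    per_user = monthly_api_cost(1, requests_per_user_per_day,
                                tokens_per_request, price_per_1k_tokens)
    fixed = monthly_self_hosted_cost(hardware_investment, amortization_months,
                                     monthly_running_cost)
    return math.ceil(fixed / per_user)


# Hypothetical inputs, except the investment and daily request count.
print(break_even_users(
    requests_per_user_per_day=12,   # "a dozen requests per day"
    tokens_per_request=2000,        # assumption
    price_per_1k_tokens=0.03,       # assumption, in euros
    hardware_investment=300_000,    # figure quoted in the article, in euros
    amortization_months=36,         # assumption
    monthly_running_cost=2000,      # assumption: power, hosting, maintenance
))
```

Because the self-hosted cost is fixed while the API cost grows linearly with usage, the break-even point is very sensitive to the assumed token price and request size, so it is worth rerunning the calculation with several scenarios.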

What latency?

The choice of model can also significantly affect the application’s response time. “In terms of execution speed, open source models are currently the most efficient. Thanks to significant efforts to make these models more compact, their reduced weight allows for very fast inference. Their latency is thus unbeatable compared to proprietary models,” argues Nicolas Gaudemet.
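Latency claims are easiest to check on your own prompts and infrastructure. Here is a minimal Python sketch of such a measurement, under stated assumptions: `call_model` is a hypothetical placeholder for whatever function sends a prompt to the model under test, and the nearest-rank 95th percentile is one simple choice among several.

```python
import math
import statistics
import time
from typing import Callable, List


def measure_latency(call_model: Callable[[str], str], prompts: List[str],
                    repeats: int = 3) -> dict:
    """Time each prompt `repeats` times and summarize wall-clock latency in seconds."""
    timings: List[float] = []
    for prompt in prompts:
        for _ in range(repeats):
            start = time.perf_counter()
            call_model(prompt)  # the response is discarded; only duration matters here
            timings.append(time.perf_counter() - start)
    timings.sort()
    # Nearest-rank 95th percentile on the sorted timings.
    p95_index = max(0, math.ceil(0.95 * len(timings)) - 1)
    return {
        "calls": len(timings),
        "median_s": statistics.median(timings),
        "p95_s": timings[p95_index],
    }


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption: replace with your own client)."""
    return prompt.upper()


print(measure_latency(fake_model, ["Summarize this contract.", "Translate to French."]))
```

Running the same prompt set against each candidate model, from the same network location, gives a like-for-like comparison that includes network overhead for hosted APIs.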

For use cases requiring text generation in French, it is best to avoid small open models. “For example, we would consider Mistral’s 8x7B or Meta’s 70B models. Especially since we are in France and need models that speak French correctly, which is not the case for small models like Mistral 7B or Llama 2 7B. Open source is justified with models large enough to handle the complexity of the French language,” notes the Orange Business expert.

Open source or proprietary, the important thing is to accurately calibrate all the aforementioned variables over time. Implementing a generative AI strategy requires thorough preparation and rigorous monitoring. It is important to carry out regular tests to evaluate the relevance, quality, efficiency, and reliability of the chosen models, and to adjust or replace them if necessary.
