The Hidden Challenges of Working with Cloud Large Language Models: A Text2Graph Case Study

By: Gift Kenneth
June 24, 2024

Staying ahead of the curve in machine learning often means adapting to unexpected changes. Recently, our team at Appsilon encountered a situation that highlights the importance of constant monitoring and flexible solutions when working with cloud-based Large Language Models (LLMs). 

Interested in a demo of our Text2Graph application? Reach out to our experts to set up a call today.

Today, we'd like to share our experience with GPT-4 and how it impacted Text2Graph (our R/Shiny application designed to transform your data into insights).

The Scenario: Using LLM APIs in Text2Graph

Our Text2Graph platform relies on GPT-4 to generate code based on specific prompts. The core of our solution involved sending a carefully crafted prompt to GPT-4 and expecting the generated code to be neatly wrapped in triple backticks (```).
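
To make this concrete, here is a minimal sketch of the kind of parsing step this implies. It is an illustration rather than the actual Text2Graph code, and the extract_code() helper name is hypothetical: the application looks for a block wrapped in plain triple backticks and pulls out the code in between.

```r
# Illustrative sketch only (not the actual Text2Graph implementation):
# pull the generated code out of a response wrapped in plain ``` fences.
extract_code <- function(response) {
  # Match everything between an opening ``` fence and a closing ``` fence;
  # (?s) lets '.' span multiple lines of generated code.
  m <- regexpr("(?s)```\\n(.*?)\\n```", response, perl = TRUE)
  if (m == -1) return(NA_character_)
  block <- regmatches(response, m)
  # Strip the fences, keeping only the code itself
  gsub("^```\\n|\\n```$", "", block, perl = TRUE)
}

response <- "Here is the code:\n```\nplot(mtcars$mpg, mtcars$hp)\n```"
extract_code(response)
#> [1] "plot(mtcars$mpg, mtcars$hp)"
```

A parser this strict works perfectly, right up until the model stops producing exactly this format.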

The Unexpected Change: Challenges with LLM API Updates

After a recent update to GPT-4, we noticed something peculiar. In about 50% of cases, the model started including the language identifier in the opening fence, producing outputs like ```r instead of just ```. This change, while seemingly minor, had a significant impact on our application's ability to process the generated code correctly. To be clear: the prompt was exactly the same, but the output had changed in a systematic way!

Our Solution: Adjusting Prompts for LLM APIs

Interestingly, our approach to solving this issue wasn't to modify our application to handle both ``` and ```r. Instead, we found that adjusting our prompt was the most effective solution. This experience underscores the importance of prompt engineering and the delicate balance between the prompt, the model, and the application processing the output.
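
As a purely illustrative example (the production Text2Graph prompt is different), the adjustment can be as simple as spelling out the exact fence format you expect, so the model has no room to improvise:

```r
# Illustrative only: make the expected output format explicit in the prompt.
format_instruction <- paste(
  "Return ONLY the generated code.",
  "Wrap it in plain triple backticks (```),",
  "with no language identifier after the opening fence.",
  sep = "\n"
)

user_request <- "Write R code that plots mpg against hp from mtcars."
prompt <- paste(user_request, format_instruction, sep = "\n\n")
```

Spelling the format out keeps the contract between the prompt and the parsing code explicit, instead of relying on the model's default behavior.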

The Takeaway

This incident brings us to a crucial point that all developers and companies working with cloud LLMs should keep in mind:

With cloud LLMs, there's no guarantee that a solution working today will continue to work tomorrow.

Unlike traditional software where you have control over the version and behavior of your tools, cloud-based AI models can be updated at any time, potentially altering their output in ways that might affect your applications.

Interested in learning how we apply machine learning to drug discovery? Check out Crystal Clear Vision, our model for protein crystal detection.

The Importance of Monitoring

This experience showed us the importance of monitoring production applications that rely on LLMs. 

Here are a few key reasons why:

  1. Regular monitoring can help you quickly identify when model outputs start deviating from expected patterns (see the sketch after this list).
  2. By catching issues early, you can adjust your prompts or application logic to maintain consistent performance.
  3. Continuous monitoring helps ensure that your AI-powered solutions remain reliable and trustworthy for your users.
  4. With proper monitoring in place, you can adapt to changes in model behavior swiftly, minimizing downtime or degradation in service quality.
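
As a minimal sketch of what such a check might look like (the helper name and logging are assumptions, not our production setup), you can validate every response against the format the application expects and flag anything that deviates:

```r
# Illustrative monitoring check: flag responses whose code block
# does not match the format the application expects.
check_llm_output <- function(response) {
  # The parser expects a bare ``` fence, not one like ```r
  has_fence <- grepl("```", response, fixed = TRUE)
  opens_with_language <- grepl("```[A-Za-z]+", response)
  ok <- has_fence && !opens_with_language
  if (!ok) {
    # In production this would feed a logger or an alerting channel.
    message("LLM output deviated from the expected format: ",
            substr(response, 1, 60))
  }
  ok
}

check_llm_output("```\nplot(1:10)\n```")   # TRUE
check_llm_output("```r\nplot(1:10)\n```")  # FALSE, and logged
```

Tracking the rate of such failures over time is often enough to spot a silent model update before your users do.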

Additional Challenges of Using LLM APIs

Beyond unexpected output changes, here are some other challenges to consider when building with LLMs:

  1. Context Window Limitations: LLMs have input size limits, necessitating creative solutions like chunking data or using embeddings (see the sketch after this list).
  2. Latency and Performance Issues: LLMs can be slow, and chaining calls makes latency problems worse. Sometimes you may need to opt for a faster model of lower quality.
  3. Prompt Engineering Complexity: Crafting effective prompts requires continuous experimentation. It’s worth pointing out that an optimal prompt for OpenAI’s GPT-4o will be different from one for Anthropic’s Claude 3.5, and even from the original GPT-4 prompt!
  4. Prompt Injection Risks: Guardrails are needed to mitigate security risks.
  5. Product Development Realities: LLMs are tools for features, not complete products, requiring standard design and validation processes.
  6. Legal and Compliance Concerns: Ensuring data privacy and regulatory compliance is essential.
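
To illustrate the chunking idea from point 1, here is a rough sketch (character-based for simplicity; real implementations typically count tokens and respect sentence or paragraph boundaries):

```r
# Illustrative chunking sketch: split a long text into overlapping pieces
# so that each piece fits within the model's context window.
chunk_text <- function(text, chunk_size = 1000, overlap = 100) {
  starts <- seq(1, nchar(text), by = chunk_size - overlap)
  vapply(
    starts,
    function(s) substr(text, s, min(s + chunk_size - 1, nchar(text))),
    character(1)
  )
}

long_text <- paste(rep("A very long document about our data.", 500), collapse = " ")
chunks <- chunk_text(long_text)
length(chunks)  # number of pieces to send to the model separately
```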

Summing Up Challenges of Working with Cloud LLMs

As we continue to push the boundaries of what's possible with AI and LLMs, we should remember that these powerful tools come with their own set of challenges. By being ready to adapt, we can use these tools to their full potential. Sometimes it’s also worth putting an additional abstraction layer between your application and the LLM, using tools like LangChain.

At Appsilon, we're committed to sharing our experiences and insights as we navigate this exciting and rapidly changing landscape. We hope that by sharing this, we can help other teams better prepare for the unique challenges of working with cloud-based LLMs.

Interested in more insights and best practices in R/Shiny and Machine Learning? Subscribe to our newsletter to stay up to date.

Note: Thank you, Pasza Storożenko, for providing guidance in writing this article.
