Pharma Brief Extended #1: AI Agents in Action, FDA Rollouts, and Open-Source Game Changers

By: Gift Kenneth
June 11, 2025
This is issue #1 of Pharma Brief Extended. If you're reading this on the blog, you can subscribe here to receive future issues in your inbox.

From the Editor

Welcome to the first issue of Pharma Brief Extended, a long-form version of the insights originally shared in the Pharma Brief on LinkedIn.

This newsletter brings you expanded context, curated tools, and actionable updates for professionals shaping the future of pharma, data, and compliance.

Thank you for subscribing. This is a new format, and it will continue to evolve based on your feedback and contributions. We’re building it with the community in mind.

In this issue, we unpack the shift from LLMs to AI agents, explore emerging standards like the Model Context Protocol, and highlight the role of open source in transforming how pharma teams work. You’ll also find upcoming events, tools worth exploring, and spotlighted open-source packages shaping clinical research.

—Gigi Kenneth, Editor

💡 Lead Insights

AI Agents, Everywhere: Pharma’s Quiet Leap Forward
The pharma world is moving fast from prompt engineering to agent-based systems that can operate with greater autonomy and flexibility.


Anthropic’s Model Context Protocol (MCP), an open standard for connecting AI models to external tools and data sources, is quickly becoming a reference point for how these agents are built. The new course from Andrew Ng and DeepLearning.AI walks through real-world use and shows how context-aware AI could transform healthcare workflows.


On the industry side, Benchling is embedding Claude via Amazon Bedrock into core biotech R&D tools, showing real-world time savings from new AI assistants like:

  • Data Entry Assistant (saving up to 2 weeks per workflow)
  • Notebook Check (improving reproducibility)
  • SQL Assistant (cutting query time from hours to seconds)

Meanwhile, the FDA’s internal rollout of generative AI across all centers is set to be completed by June 30. A successful pilot has already shown faster therapy evaluations by automating repetitive review tasks, and a secure, shared platform is being developed to balance center-specific needs with compliance.

As part of this rollout, the agency has also launched Elsa, a generative AI tool designed to speed up reviews, inspections, and data tasks. Elsa operates within a secure platform, and according to the FDA, sensitive internal documents remain confidential and are not used to train external models.


And if you’re still catching up on the shift, BCG’s latest report is a must-read. It breaks down the evolution from chains of prompts to autonomous agents and shows how pharma companies are experimenting with them in trial operations, medical writing, and more.

📈 Signals & Shifts

Emerging ideas and tools that pharma leaders are watching, spanning clinical AI, open-source frameworks, and regulatory tech.

🧪 R-Statistical Environment for In-Silico Trials

The EU-funded SIMCor project has introduced an open-source web application built with R, R Markdown, and Shiny. This tool assists in validating virtual cohorts and conducting in-silico trials. It supports various statistical analyses, including univariate, bivariate, and multivariate comparisons, as well as sample size calculations. The application is available for use via GitHub and Zenodo, and can be deployed locally or accessed through a Software-as-a-Service model.
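For readers less familiar with the sample size calculations mentioned above, the Python sketch below shows the textbook normal-approximation formula for comparing two group means: n = 2(z₁₋α/₂ + z₁₋β)²σ²/δ² subjects per group. It is not taken from the SIMCor application, which is written in R and may use different methods. The function name `n_per_group` and the example inputs are hypothetical, and the sketch assumes equal group sizes, a common standard deviation σ, and a two-sided test.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Normal approximation, equal allocation, common standard deviation:
        n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2
    """
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    z = NormalDist()                  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma**2 / delta**2
    return math.ceil(n)  # always round up to whole subjects


# Detect a 5-unit difference when the SD is 10, at alpha = 0.05 and 80% power.
print(n_per_group(delta=5, sigma=10))  # 63
```

Raising the power to 90% with the same inputs increases the requirement to 85 subjects per group, which shows how quickly sample size grows with stricter design targets.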

🤖 AI-Powered Patient-to-Trial Matching

Researchers have developed TrialMatchAI, an AI-driven system that automates the matching of patients to clinical trials. Using fine-tuned, open-source large language models within a retrieval-augmented generation (RAG) framework, TrialMatchAI processes diverse clinical data, including structured records and unstructured physician notes. In oncology, it has matched patients to relevant trials with high accuracy, exceeding 90% in criterion-level eligibility classification.

🧬 MatchMiner-AI for Cancer Trials

MatchMiner-AI is an open-source pipeline aimed at improving patient enrollment in cancer clinical trials. It extracts key information from electronic health records and ranks potential trial matches based on disease context. The tool includes modules for information extraction, rapid ranking of candidate matches, and classification to assess the suitability of matches. Code and synthetic data are publicly available, facilitating adoption and adaptation by other institutions.

🧠 mAIstro: Automated Development for Medical Imaging Models

mAIstro is an open-source, multi-agent system that automates the development of radiomics and deep learning models for medical imaging. It orchestrates tasks such as exploratory data analysis, feature extraction, image segmentation, and model training through a natural language interface, requiring no coding from the user. The system has been evaluated across various imaging modalities and datasets, demonstrating its versatility and effectiveness.

💡 Want more? Check out the insights shared in the Pharma Brief on LinkedIn.

🔍 Open Source Spotlight

A quick look at promising open-source tools making an impact in pharma, data science, and clinical research.

{muttest} is a mutation testing tool for R. It introduces small changes (mutations) into your code and checks whether your existing tests catch them. Mutations that survive point to gaps that line-coverage numbers miss, making it a useful way to strengthen your test suite.
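Mutation testing is easiest to picture with a small example. The Python sketch below is not {muttest}'s API (that package is for R); it is a hypothetical illustration built on the standard-library `ast` module. It creates one mutant per comparison operator (for example, turning `>=` into `>`), reruns a test suite against each mutant, and counts how many mutants the tests "kill". The function `is_abnormal` and both test suites are invented for illustration.

```python
import ast

# Hypothetical code under test: flags a lab value at or above an upper limit.
SOURCE = """
def is_abnormal(value, upper_limit):
    return value >= upper_limit
"""

# Each comparison operator is swapped for a "neighbouring" one to create a mutant.
SWAPS = {ast.GtE: ast.Gt, ast.Gt: ast.GtE, ast.LtE: ast.Lt,
         ast.Lt: ast.LtE, ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}


class MutateCompare(ast.NodeTransformer):
    """Swap the operator of the `target`-th mutable comparison; count all of them."""

    def __init__(self, target):
        self.target = target
        self.seen = 0

    def visit_Compare(self, node):
        self.generic_visit(node)
        for i, op in enumerate(node.ops):
            if type(op) in SWAPS:
                if self.seen == self.target:
                    node.ops[i] = SWAPS[type(op)]()
                self.seen += 1
        return node


def load(tree, name="is_abnormal"):
    """Compile a (possibly mutated) module AST and return the named function."""
    namespace = {}
    exec(compile(ast.fix_missing_locations(tree), "<mutant>", "exec"), namespace)
    return namespace[name]


def mutation_score(source, tests):
    """Return (killed, total): how many mutants make at least one test fail."""
    counter = MutateCompare(target=-1)  # -1 never matches, so this pass only counts
    counter.visit(ast.parse(source))
    killed = 0
    for k in range(counter.seen):
        mutant = load(MutateCompare(target=k).visit(ast.parse(source)))
        if not all(test(mutant) for test in tests):
            killed += 1  # a failing test "kills" the mutant
    return killed, counter.seen


weak = [lambda f: f(10, 5) is True, lambda f: f(1, 5) is False]
strong = weak + [lambda f: f(5, 5) is True]  # adds the boundary case
print(mutation_score(SOURCE, weak))    # (0, 1): the >= -> > mutant survives
print(mutation_score(SOURCE, strong))  # (1, 1): the boundary test kills it
```

The weak suite never checks the boundary value, so the `>` mutant passes every test and survives, even though both lines of the function are "covered". Adding a single boundary test kills it, which is exactly the kind of gap mutation testing is meant to surface.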

{mergen} uses AI to turn your data analysis questions into executable code, clear explanations, and algorithms. Its built-in self-correction loop boosts performance and accuracy by refining outputs on the fly. With a chat-style interface, mergen makes it easy to interact with your data and extract insights, with no deep coding knowledge required. It is well suited to fast, intuitive analysis in omics and beyond.

{gptstudio} is designed to help R developers integrate large language models (LLMs) into their workflows. It supports a range of LLM use cases in knowledge work, with a focus on transparency and ethical use. For additional functionality, {gpttools} offers a suite of RStudio addins tailored to LLM use.

{kuzco} is a lightweight computer vision assistant library in R, designed to work with local LLMs. It runs models through Ollama and communicates with them via the {ollamar} and {ellmer} packages. kuzco currently supports image classification, recognition, sentiment, text extraction, and alt-text generation.

{cheetahR} is an R package by cynkra that brings the blazing speed of Cheetah Grid to R. Built for performance and efficiency, cheetahR can render millions of rows in milliseconds, making it a powerful alternative to {reactable}, {DT}, and other R table widgets.

The package wraps Cheetah Grid’s JavaScript functions in a seamless R interface, giving users a high-performance, interactive table widget that’s perfect for large datasets and Shiny apps.

Clinical Data Analysis: Open Source in Pharma ebook

📺 What We’re Watching

Presenting aNCA: From Idea to Clinical Impact | ShinyGatherings x Pharmaverse | Appsilon

At the recent ShinyGatherings x Pharmaverse session, Valentin Legras, Jana Spinner, Gerardo Rodriguez, and Mateusz Kołomański shared how they built an open-source Shiny app to automate Non-Compartmental Analysis (NCA), a core method for pharmacokinetics used in early-stage drug development.

What started as an intern-led initiative at Roche has evolved into a full-fledged Shiny app that simplifies pharmacokinetic analysis, from raw data to submission-ready TLGs (tables, listings, and graphs). The app combines usability for scientists (no coding required) with full traceability and customization options.
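The core NCA quantities are simple to state. As a rough illustration (not aNCA's implementation, which is written in R and offers many more options), the Python sketch below computes Cmax, Tmax, AUC to the last sample with the linear trapezoidal rule, and a terminal half-life from a log-linear fit to the last few points. The function name, the default of three terminal points, and the example profile are assumptions for illustration; production NCA tools usually let you choose the AUC method (for example, linear-up/log-down) and which points define the terminal phase.

```python
import math


def nca(times, conc, n_terminal=3):
    """Basic NCA parameters for a single concentration-time profile.

    Assumes `times` is strictly increasing and that the last `n_terminal`
    concentrations are positive and lie in the log-linear elimination phase.
    """
    if len(times) != len(conc) or len(times) < max(2, n_terminal):
        raise ValueError("need matching times/concentrations and enough points")
    cmax = max(conc)
    tmax = times[conc.index(cmax)]  # first time point at which Cmax is observed

    # AUC from the first to the last sample, linear trapezoidal rule
    auc_last = sum(
        (times[i + 1] - times[i]) * (conc[i] + conc[i + 1]) / 2
        for i in range(len(times) - 1)
    )

    # Terminal rate constant: negative least-squares slope of ln(C) versus t
    t = times[-n_terminal:]
    y = [math.log(c) for c in conc[-n_terminal:]]
    t_mean, y_mean = sum(t) / len(t), sum(y) / len(y)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    lambda_z = -slope
    half_life = math.log(2) / lambda_z

    # Extrapolate to infinity: AUCinf = AUClast + Clast / lambda_z
    auc_inf = auc_last + conc[-1] / lambda_z
    return {"cmax": cmax, "tmax": tmax, "auc_last": auc_last,
            "lambda_z": lambda_z, "half_life": half_life, "auc_inf": auc_inf}


# Hypothetical profile: time in hours, concentration in ng/mL
result = nca([0, 1, 2, 4, 6, 8], [0, 10, 8, 4, 2, 1])
print({k: round(v, 2) for k, v in result.items()})
# {'cmax': 10, 'tmax': 1, 'auc_last': 35.0, 'lambda_z': 0.35, 'half_life': 2.0, 'auc_inf': 37.89}
```

The example profile halves every two hours after the peak, so the fitted half-life comes out at 2 hours, which makes it easy to sanity-check the log-linear fit.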

Aga’s Pick from the Pharma Brief


Curated by Aga Rasińska, editor of the Pharma Brief on LinkedIn.

Interpretable machine learning leverages proteomics to improve cardiovascular disease risk prediction and biomarker identification

In a collaboration between Novo Nordisk and Microsoft, researchers have demonstrated the potential of combining machine learning with proteomics and clinical data to improve cardiovascular disease risk prediction.

Using data from the UK Biobank, their study reveals how proteomic markers can significantly enhance risk assessment beyond traditional methods.

Their Explainable Boosting Machine (EBM) model outperforms established clinical scores and other machine learning approaches, offering not only better predictions but also personalised insights into individual risk factors.

This research paves the way for redefining personalised medicine in cardiovascular health by highlighting that similar risks may arise from very different underlying biological contributors.
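The "personalised insights" come from the model's additive structure: an EBM is a generalized additive model whose prediction, on the log-odds scale, is an intercept plus one learned score per feature (and optionally per feature pair). The Python sketch below shows only that prediction-and-explanation step. The features, bin edges, and scores are hypothetical, invented for illustration and not drawn from the study; in practice, libraries such as InterpretML learn these shape functions by boosting. The two example patients end up with the same predicted risk for different reasons, echoing the point above.

```python
import bisect
import math

# Hypothetical shape functions: feature -> (bin edges, log-odds score per bin).
# All numbers are invented for illustration; a fitted EBM learns these from data.
SHAPES = {
    "age": ([50, 60, 70], [-0.6, -0.2, 0.2, 0.6]),
    "protein_x": ([1.0, 2.0], [-0.4, 0.0, 0.6]),
    "systolic_bp": ([130, 150], [-0.2, 0.0, 0.4]),
}
INTERCEPT = -2.0


def explain(patient):
    """Return predicted risk plus each feature's additive log-odds contribution."""
    contributions = {}
    for feature, (edges, scores) in SHAPES.items():
        # bisect_right gives the index of the bin the value falls into
        contributions[feature] = scores[bisect.bisect_right(edges, patient[feature])]
    logit = INTERCEPT + sum(contributions.values())
    risk = 1 / (1 + math.exp(-logit))  # logistic link back to a probability
    return risk, contributions


patient_a = {"age": 72, "protein_x": 0.8, "systolic_bp": 140}
patient_b = {"age": 55, "protein_x": 2.5, "systolic_bp": 125}
for name, patient in [("A", patient_a), ("B", patient_b)]:
    risk, contrib = explain(patient)
    print(name, round(risk, 3), contrib)  # both patients print a risk of 0.142
```

Patient A's risk is driven mainly by age, while patient B's comes mainly from the hypothetical protein marker. Because every contribution is a simple lookup, the explanation is exact rather than a post-hoc approximation.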

🎧 What We’re Listening To

Machine Learning Modeling in Neuroscience Clinical Trials Design

Neuroscience trials are notoriously difficult, expensive, complex, and plagued by high placebo responses and attrition rates. In this episode of Where Technology Meets Science, Jing Dai, Director of Biostatistics at Jazz Pharmaceuticals, joins Appsilon’s Nat to discuss how her team uses machine learning to improve trial design, reproducibility, and regulatory readiness. You’ll hear insights on dealing with bias, interdisciplinary collaboration, and how AI is already shaping real-world trial protocols.

Upcoming Events & Meetings

The Anatomy of a Modern Statistical Computing Environment

Ready to move beyond outdated clinical data systems? Join pharma data leaders from Jazz Pharmaceuticals and Novo Nordisk in this live podcast episode as they share insights on building modern, compliant Statistical Computing Environments.

Speakers:

  • Megan Dunham, Director of Technology Enablement at Jazz Pharmaceuticals
  • Andrew York, Clinical Data Science VP at Novo Nordisk
  • Rafael Pereira, Platform Unit Lead at Appsilon

Get actionable insights from teams who've successfully transitioned to modern, submission-ready computing environments. Whether you're exploring options or planning your next move, this session will help you make informed decisions about your data infrastructure.

Save Your Spot Today

Try this

CDISC Dataset Generator  

This application generates synthetic clinical trial and nonclinical research data that complies with CDISC standards (SDTM, ADaM, SEND). It helps researchers, data managers, and programmers to quickly generate realistic test datasets for training, validation, and software testing purposes.

CDISC Data Generator
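To make the idea of SDTM-shaped synthetic data concrete, here is a small Python sketch that builds a toy Demographics (DM) domain from a subset of standard DM variables (STUDYID, DOMAIN, USUBJID, AGE, SEX, ARM, and so on). It is not the CDISC Dataset Generator's code. The study ID, sites, arms, age range, and allocation scheme are hypothetical, and a real DM dataset must also follow CDISC controlled terminology and include additional variables.

```python
import csv
import io
import random

# Subset of SDTM Demographics (DM) variables; a real DM domain has more
# (for example RFSTDTC, RACE, ETHNIC) plus controlled-terminology rules.
DM_COLUMNS = ["STUDYID", "DOMAIN", "USUBJID", "SUBJID", "SITEID",
              "AGE", "AGEU", "SEX", "ARMCD", "ARM", "COUNTRY"]
ARMS = [("PBO", "Placebo"), ("TRT", "Treatment")]  # hypothetical arms


def synthetic_dm(n_subjects, study_id="STUDY01", sites=("101", "102"), seed=42):
    """Generate a toy, SDTM-shaped DM dataset; every value is synthetic."""
    rng = random.Random(seed)  # seeded so test datasets are reproducible
    rows = []
    for i in range(1, n_subjects + 1):
        site = rng.choice(sites)
        subjid = f"{site}-{i:04d}"
        armcd, arm = ARMS[(i - 1) % len(ARMS)]  # simple 1:1 alternating allocation
        rows.append({
            "STUDYID": study_id,
            "DOMAIN": "DM",
            "USUBJID": f"{study_id}-{subjid}",  # unique across the study
            "SUBJID": subjid,
            "SITEID": site,
            "AGE": rng.randint(18, 85),
            "AGEU": "YEARS",
            "SEX": rng.choice(["M", "F"]),
            "ARMCD": armcd,
            "ARM": arm,
            "COUNTRY": "USA",
        })
    return rows


def to_csv(rows):
    """Serialize DM rows to CSV text with the columns in DM_COLUMNS order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=DM_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv(synthetic_dm(4)))
```

Seeding the random generator means the same call always yields the same dataset, which is useful when synthetic data feeds validation or regression tests.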

Want to contribute?

Have a use case, open-source tool, or insight to share? Just reply to this email or send it to newsletter@appsilon.com. We’re building this with the community, not just for it.

Thank you for reading.
