Why Most Statistical Computing Environments in Pharma Weren't Built for What's Coming Next

By Rafael Pereira
April 16, 2026

Author's note

I recently presented at PhUSE Connect 2026 in Austin. The talk covered what it takes to build a cloud-native, open-source-first statistical computing environment from the ground up: the reasoning, the architecture decisions, and the lessons we learned along the way.

The talk went well, but what stuck with me most were the conversations afterward. Almost every team I spoke with is dealing with the same tension: their computing environment was designed for a different era, and they know it. They can see what modern teams need. They just can't get there from where they are.

This article is about that gap. The specific problems that legacy SCEs create for pharma teams today, and why those problems are getting harder to ignore.

The platform your teams work on was probably designed a decade ago

Most statistical computing environments in pharma were designed when SAS was the default language, studies were simpler, timelines were longer, and "open source in regulated settings" was still a controversial idea.

These platforms did their job. Many still do, technically, and if the track record of legacy systems in other industries is any guide, they will keep running for the foreseeable future.

But the unfortunate reality is that most legacy SCEs are holding teams back. Data volumes have grown. Teams arrive already fluent in R and Python. Regulatory expectations keep rising. The conversation has moved to AI and machine learning. 

These systems can still run analyses, but they can't keep up with how modern teams need to work. 

They need to be changed, and they need to keep being changed. But these older systems, and the processes around them, were not built with that in mind.

Where legacy SCEs break down

In our work with pharma clients, and in conversations across the industry at events like PhUSE, we see the same problems with these environments show up again and again.

They're monolithic. If you want to update one piece, you have to update the whole thing, so what should be a minor update becomes a months-long project. Most platforms we discuss with clients were built 10 to 15 years ago and are still being maintained. That creates a lot of constraints, and a lot of "quick fixes" that ended up becoming permanent features rather than temporary patches.

Validation processes are manual, outdated, and slow. Many teams still treat every R package as a one-off validation project (some don't validate R packages at all). There's no risk-based prioritization and no automated assessment; systems and risk are still handled the way they were decades ago, regardless of complexity. The validation effort ends up looking the same whether a package or tool is low-risk with a strong track record or genuinely new and unproven.

They're disconnected from modern tooling. No Git-based workflows. No Infrastructure as Code. No containers. Teams end up working in environments that look nothing like what the broader technology industry uses, and that disconnect becomes a real barrier when you try to introduce anything new. 

The "two-environment problem." Exploratory work happens in one system. Submission work happens in another, often with completely different tools (usually one modern, the other traditional). Everything gets done twice. Inconsistencies creep in between development and production. Teams spend time redoing work instead of advancing it.

Change management becomes the hidden bottleneck. This is the one I keep coming back to. When I was asked during PhUSE what lessons I carry from recent projects, my answer was immediate: do not underestimate change management. Not because it's technically hard, but because it requires coordination between IT, QA, and the build (in our case, Platform) teams, groups that often don't share a common vocabulary. If QA has no idea what Git is, your deployment process will stall no matter how good your technical components or architecture are.

What modern pharma teams actually expect

The requirements we hear from clients are consistent: multilingual support, cloud-native infrastructure, Git-based version control, and a design that can support AI and ML workloads as the platform matures. Some go further and build entirely on open source, skipping SAS from the start, not for ideological reasons but as a strategic bet: full control over their methods, packages, and reproducibility, with maximum flexibility.

The vision behind these requirements is not a traditional computing environment but what we've come to call an "intelligence platform": a system designed to eventually span from data ingestion through CSR automation, covering functions well beyond biometrics into pre-clinical, bioinformatics, and translational medicine.

These requirements come from teams across the industry. The difference is in the starting point. Organizations building from scratch have a real advantage: no legacy to work around, no inherited architecture forcing compromises.

This enables us to move very fast. In one recent engagement, a greenfield platform went from nothing to a functioning GxP environment in six months. With Infrastructure as Code and automated deployments, environment changes take a couple of hours. In some of our environments, we're close to or already achieving a GxP release every two weeks. That's a pace most SCEs can't come close to matching.

Most organizations don't have that luxury. But instead of rethinking their foundational architecture and processes, many are trying to bolt new capabilities onto platforms that were never designed for them.

The AI readiness question you should be asking

Everyone at PhUSE was talking about AI. That wasn't surprising.

What did surprise me was the gap between ambition and readiness. Companies are investing heavily and moving fast into AI, sometimes into areas where the foundations aren't there yet. We're still working out human-in-the-loop workflows and meaningful augmentation, and the conversation has already jumped to agentic AI, often without a clear understanding of what it costs to run or what it takes to maintain. Speed without foundations isn't progress. It's just expensive chaos.

This reminded me of the cloud migration wave a few years ago. A lot of companies committed hard to full cloud migration, and costs ballooned. Not always because cloud was wrong for them, but because they weren't prepared. Teams untrained, workloads unoptimized, lift-and-shift treated as a strategy rather than a starting point. The lesson wasn't "move slower." It was "don't skip the work that makes speed possible."

The parallel to SCE modernization is direct. I've written before about why open-source adoption works as a practical indicator of AI readiness. The same infrastructure and processes that support open-source workflows (containerization, version control, cloud-native architecture, programmatic interfaces) provide the foundation for AI.

Organizations that have already invested in these capabilities can adopt AI tools with far less additional effort than those who haven't. Those still on legacy platforms face months or years of modernization before AI becomes practically accessible. 

This is a scenario we face every day in our consulting work: clients talking about AI adoption in 2027, 2028, or even 2030, not for lack of ambition, but because the groundwork isn't there yet.

And the groundwork isn't just a technical challenge. It's a process that involves change management, workflows, and how teams actually work: how they review, approve, and hand off.

The organizations that will move fastest with AI are the ones that modernized how they work before adopting it. Fast shouldn't mean careless; it should mean modern.

The foundations you build for open source also happen to be the foundations that make AI accessible when you need it. That creates a compounding disadvantage for those who wait. The longer you delay modernizing your SCE, your workflows, your infrastructure, and your processes, the further behind you fall: not just on current capabilities, but on your ability to adopt whatever comes next.

The problems are organizational, not just technical

To go deeper into this: one of the most important lessons from our work overall is about people, not technology.

Having the right software development practices (Infrastructure as Code, automated deployments, approval gates) isn't enough if the teams those processes depend on aren't aligned. In some engagements, we started with two meetings a week with everyone in the room; it just wasn't productive. We learned to separate problems and address each with the right group, applying modern engineering practices, rather than trying to solve everything together.

The real source of friction isn't resistance to change. It's a depth-of-knowledge gap. It takes time for IT, QA, and (in our case) Platform teams to develop a shared understanding of how a system as complex as an SCE works. If you're planning a modernization, start with that alignment. Technology is the easier part.

Questions worth asking about your current SCE

If your current environment was designed more than five years ago, a few honest questions can help you gauge where you stand:

Can you update a single component without revalidating everything? If not, your architecture is probably monolithic, and every change is more expensive than it needs to be.

Are your teams using the same tools for exploration and submission? The two-environment problem wastes time and introduces risk. Modern environments let you explore in R and submit in R, on the same platform with different control levels.

Is your validation approach risk-based? Treating every R package as a standalone validation project doesn't scale. The R Validation Hub has published a risk-based assessment framework that reduces the validation burden while maintaining rigor where it matters.
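To make "risk-based" concrete, here is a toy sketch in Python of what tiered package assessment can look like. The metrics, weights, and thresholds below are invented for illustration; they are not the R Validation Hub's published framework, whose {riskmetric} R package defines its own, far more thorough set of assessments.

```python
from dataclasses import dataclass

# Illustrative only: a toy composite risk score. Every metric name, weight,
# and threshold here is an assumption made up for this sketch.

@dataclass
class PackageMetrics:
    name: str
    test_coverage: float       # 0.0 to 1.0
    has_maintainer: bool
    years_released: float
    downloads_per_month: int

def risk_score(m: PackageMetrics) -> float:
    """Return a composite score from 0 (low risk) to 1 (high risk)."""
    score = 0.0
    score += 0.4 * (1.0 - m.test_coverage)                 # untested code is riskier
    score += 0.2 * (0.0 if m.has_maintainer else 1.0)      # unmaintained is riskier
    score += 0.2 * (1.0 if m.years_released < 2 else 0.0)  # brand-new packages are riskier
    score += 0.2 * (1.0 if m.downloads_per_month < 1000 else 0.0)  # little community vetting
    return round(score, 2)

def validation_tier(score: float) -> str:
    """Map a risk score to a proportionate validation effort."""
    if score < 0.25:
        return "accept with automated checks"
    if score < 0.6:
        return "targeted review of critical functions"
    return "full validation"

# A mature, well-tested package lands in the lightest tier.
pkg = PackageMetrics("ggplot2", test_coverage=0.9, has_maintainer=True,
                     years_released=15, downloads_per_month=1_000_000)
print(pkg.name, risk_score(pkg), validation_tier(risk_score(pkg)))
```

The point isn't the specific numbers; it's that once assessment is automated and scored, validation effort can scale with actual risk instead of being flat across every package.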

Is your infrastructure defined in code? If environment changes require manual configuration and separate documentation, you're paying a time tax on every update. With the right setup, compliance becomes a byproduct of good engineering practice rather than a separate workstream.
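As a small illustration of why that matters, the sketch below (plain Python, with invented field names) treats an environment definition as data and derives a change record mechanically by diffing two versions. When the environment is defined in code, this kind of audit evidence becomes a byproduct of the change itself rather than a separate documentation task.

```python
# Hypothetical sketch: environment definitions as plain data, with an
# audit-style change record generated by diffing two versions. The field
# names and values are invented for illustration.

env_v1 = {
    "r_version": "4.3.2",
    "packages": {"dplyr": "1.1.4", "survival": "3.5-7"},
    "compute": {"instance_type": "m5.xlarge", "count": 2},
}

env_v2 = {
    "r_version": "4.4.1",
    "packages": {"dplyr": "1.1.4", "survival": "3.5-7", "gtsummary": "2.0.0"},
    "compute": {"instance_type": "m5.xlarge", "count": 4},
}

def change_record(old: dict, new: dict, prefix: str = "") -> list[str]:
    """Recursively diff two environment definitions into audit-style lines."""
    changes = []
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}{key}"
        if key not in old:
            changes.append(f"ADDED   {path} = {new[key]}")
        elif key not in new:
            changes.append(f"REMOVED {path} (was {old[key]})")
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            changes.extend(change_record(old[key], new[key], path + "."))
        elif old[key] != new[key]:
            changes.append(f"CHANGED {path}: {old[key]} -> {new[key]}")
    return changes

for line in change_record(env_v1, env_v2):
    print(line)
```

With the definition under version control, every environment change arrives with its own reviewable, reproducible record, which is what "compliance as a byproduct" means in practice.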

Could your current platform support AI workloads if you needed it to? If the answer is no, or "not without major changes," that tells you something about the gap between where you are and where the industry is heading.

There are no universally right answers here. The right SCE depends on your organization's size, study portfolio, and internal capabilities. But asking these questions honestly is the starting point.

Summing up

A well-built SCE feels invisible when things work and indispensable when you need it. It accelerates your work instead of adding friction, and it can adjust quickly when you need it to, without compromise.

For many organizations, that's not what's happening. Legacy platforms create friction where there should be flow. They force teams to work around limitations rather than through capabilities. And they make it harder to adopt the tools and practices that modern drug development requires.

The fix isn't always a full rebuild. In some cases it's modular improvements targeted at the highest-friction areas; in others, the right move is addressing change management before touching the technology at all.

But the starting point is always the same: an honest assessment of whether your current environment is built for what your teams need today, and for what they'll need not just two to five years from now but a few months from now, given how fast technology is moving.

For a deeper look at modern SCE architecture, see The Anatomy of a Modern Statistical Computing Environment in Pharma.
