Short answer: SAS still works, but R and Python are where pharma is heading. And the real choice is an architecture decision, not a language one.
Author's note
This conversation comes up constantly. In calls with clients, in internal discussions, in industry threads. Someone asks: "Should we go with R or Python?" Someone else pushes back: "SAS still works fine." There's a natural tension every time, and it rarely gets resolved cleanly.
The reason it never gets resolved is that the question is wrong. The choice between R, Python, and SAS isn't a language decision. It's an architecture decision. It's a people decision. And increasingly, it's a decision about how your organization will work five years from now.
In this article, I want to lay out where each language actually stands in a modern statistical computing environment, what the real trade-offs are, and why the AI era changes the calculus in ways that matter right now.
SAS: the known entity
SAS dominated pharmaceutical statistical computing for decades. It earned that position. Stable. Predictable. Well-understood by regulators. Teams knew what they were getting: a controlled, self-contained ecosystem where the vendor handled most of the complexity.
But that dominance came with assumptions that no longer hold.
The FDA never mandated SAS. This is still one of the most persistent myths in the industry. The FDA accepts results from any validated software. SAS became the default because it was the safest institutional choice at the time, not because regulators required it.
The cracks in that default are now visible. SAS licensing costs are significant, and they compound as teams grow. The talent pipeline is thinning. Universities are producing graduates fluent in R and Python, not SAS. The ecosystem is closed: you work within what SAS provides, on SAS's timeline, at SAS's price.
None of this means SAS is broken. It means the reasons organizations chose SAS decades ago are not the reasons that should drive the decision today.
R's pharma moment
R didn't just show up in pharma. It built the infrastructure to be taken seriously.
Start with CRAN. Over 20,000 packages, all following a standardised structure: documentation, testing, dependency declarations, versioning. CRAN isn't just a repository. It's a quality gate. Every package submitted goes through automated and manual checks before it's published. That kind of ecosystem-level discipline is rare.
On top of CRAN, the pharma industry built Pharmaverse: a coordinated, cross-company initiative to create submission-ready R packages for clinical reporting. Packages like {admiral} for ADaM dataset creation, {rtables} and {tfrmt} for TLG generation, and {riskmetric} for package risk assessment. These aren't side projects. They're maintained by teams at Roche, GSK, Novartis, J&J, and others, with governance structures and release cycles.
Then there's the testing and documentation story. {testthat} gives R a mature unit testing framework. {valtools} was built specifically for validation documentation in regulated settings. roxygen2 standardises function-level documentation. The R Validation Hub published a risk-based framework for package assessment that has been adopted across the industry. And R is no longer an experiment: Roche achieved the first end-to-end R-based regulatory submission to the FDA, EMA, and China's NMPA, using open-source Pharmaverse packages for SDTM and ADaM dataset creation and for TLG generation.
What sets R apart is that all of this happened through community coordination. No single vendor decided it. Companies that are competitors in the market collaborated on shared infrastructure because the problem was bigger than any one of them. That's a different model than SAS, and it scales differently.
Python enters the room
Python's strength is breadth. It's the dominant language for machine learning, data engineering, and general-purpose automation. The talent pool is enormous. If you're hiring data scientists or ML engineers, they almost certainly know Python.
In pharma specifically, Python is gaining traction in areas adjacent to traditional statistical computing: data pipeline orchestration, image analysis, NLP for adverse event processing, integration layers between systems. These are real, valuable use cases.
But the pharma-specific validation story for Python is still maturing. There's no equivalent to Pharmaverse, and the gap goes deeper than that: CRAN and PyPI don't have the same level of rigor. CRAN enforces standardised documentation, automated checks, and dependency declarations before a package is published. PyPI does not. You can upload almost anything to PyPI with minimal quality gates. That difference matters in a regulated setting where you need to assess and trust what you're bringing into your environment. Python was built as a general-purpose programming language and adopted for data science. It's being pulled into statistical work, but that's a recent move, not its origin.
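To make the difference concrete: the checks CRAN performs before a package is published are checks a regulated Python environment has to perform for itself, after the fact. Here's a minimal sketch, using only the standard library, of pulling basic quality signals from an installed package's metadata. The signal set is illustrative only, nothing like a validated risk framework:

```python
from importlib import metadata

def quality_signals(package: str) -> dict[str, bool]:
    """Collect simple quality signals from an installed package's metadata.

    Illustrative only: a real risk-based assessment (cf. {riskmetric} in R)
    also weighs testing, maintenance, and community signals, not just the
    fields a maintainer chose to fill in.
    """
    meta = metadata.metadata(package)  # raises PackageNotFoundError if absent
    return {
        "has_license": meta.get("License") is not None,
        "has_summary": meta.get("Summary") is not None,
        "has_homepage": meta.get("Home-page") is not None
                        or meta.get("Project-URL") is not None,
        "declares_requires_python": meta.get("Requires-Python") is not None,
    }
```

On CRAN, gaps like a missing license or undeclared dependencies block publication; on PyPI, surfacing them is the consumer's job.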
This doesn't disqualify Python. It means Python occupies a different position in a modern SCE than R does. It's strong where R is weaker (ML, engineering, integration) and weaker where R is stronger (statistical methodology, regulatory precedent, pharma-specific tooling). A mature SCE should use both.
The automation and AI angle
Here's where the conversation shifts.
SAS can be automated. It supports batch mode execution, has a macro language, and SAS Viya adds REST APIs and Kubernetes support. So the claim that "SAS can't be automated" is not accurate.
But every piece of that automation requires a proprietary licensed runtime. There is no open-source SAS interpreter. You cannot put SAS in a Docker container and run it on GitHub Actions or Airflow without a license. You cannot embed SAS in an open CI/CD pipeline the way you can with R or Python. The automation is real, but it lives inside the SAS ecosystem. You're automating within a walled garden.
R and Python are different. They can be orchestrated end-to-end with open tooling: Docker, GitHub Actions, GitLab CI, Airflow, cron, Kubernetes. No license required. No vendor dependency in the pipeline. The entire chain from code to validated output can be open, auditable, and reproducible using tools the broader technology industry has standardised on.
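As one sketch of what that open chain can look like, here is a minimal GitHub Actions workflow that runs a containerised R analysis on every push. The image tag, script paths, and job layout are illustrative assumptions, not a reference pipeline:

```yaml
# Illustrative sketch: image tag and paths are assumptions, not a standard.
name: analysis-pipeline
on: [push]

jobs:
  run-analysis:
    runs-on: ubuntu-latest
    container:
      image: rocker/verse:4.4.1   # pinned, versioned image for reproducibility
    steps:
      - uses: actions/checkout@v4
      - name: Run unit tests
        run: Rscript -e 'testthat::test_dir("tests")'
      - name: Produce outputs
        run: Rscript scripts/run_tlg.R
```

The same shape works for Python by swapping the container image and the run steps. Nothing in the pipeline requires a vendor license.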
This distinction matters more than it used to because of AI.
Large language models are trained on publicly available code. R and Python codebases are massively represented on GitHub, Stack Overflow, and in open-source documentation. SAS code is not. The proprietary culture around SAS means less public code, less training data, and as a result, weaker AI support. LLMs generate R and Python code with significantly higher quality and coverage than SAS.
That gap will widen. As AI becomes embedded in development workflows (code generation, test writing, documentation, review), the languages with the deepest open-source footprint will benefit most. R and Python were built for the kind of programmatic, scriptable, open workflows that AI tools enhance. SAS was not designed with that model in mind.
This isn't a theoretical concern. Teams using Copilot, Cursor, or similar tools are already seeing the difference. The productivity gain from AI-assisted development is real, and it's disproportionately available to teams working in open-source languages.
What "validated" means in a multi-language SCE
Supporting multiple languages in a regulated environment isn't just about installing R and Python alongside SAS. It requires a validation strategy that works across all of them.
The practical requirements are consistent regardless of language: package management with version pinning, reproducible environments (containers or equivalent), automated testing integrated into the deployment pipeline, and risk-based qualification that scales without treating every package as a bespoke validation project.
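Version pinning is the easiest of those requirements to sketch. A minimal example, assuming nothing beyond the standard library, of comparing a running environment against a pinned manifest; the manifest format here is made up for illustration, where real setups use lockfiles such as an renv.lock or pip-compile output:

```python
from importlib import metadata

def check_environment(pinned: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Compare installed package versions against a pinned manifest.

    Returns {package: (installed_or_'missing', wanted)} for every mismatch;
    an empty dict means the environment matches its pins.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            mismatches[name] = ("missing", wanted)
            continue
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches
```

In a validated pipeline a check like this would run at container start or as a CI gate, so that a drifted environment fails loudly before any output is produced.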
The R ecosystem has a head start here. The R Validation Hub's risk-based framework, tools like {riskmetric}, and the standardised structure of CRAN packages make it possible to assess and qualify packages at scale. Python's ecosystem is catching up, but the tooling is less pharma-specific.
The key architectural principle is that validation should be a property of the environment and process, not of the language. If your SCE enforces version control, automated testing, and traceable deployments, the compliance story holds whether the code is in R, Python, or both. The language becomes a detail. The platform is what guarantees control.
The honest trade-off
There is very little that SAS can do that R and Python cannot. Statistically, R covers the same ground and often goes further. Python fills the engineering and ML gaps. Between the two, the functional coverage is broad.
But SAS is a known entity with known problems. Teams can maintain it without deep programming expertise. The learning curve is shallow for the core use cases. The vendor handles updates, documentation, and support. For organizations with stable workloads and teams that aren't ready to invest in new skills, SAS reduces operational uncertainty.
R and Python bring new capabilities, but they also bring new costs. Training. Hiring or retraining staff. Building and maintaining internal tooling. Standing up governance frameworks. Managing open-source supply chain risk. Some organizations can absorb that. Others can't, at least not yet.
Pretending that switching to open source is free is wrong. So is pretending that staying on SAS is safe. The real question is what your organization can execute on today and what it needs to be capable of tomorrow.
If the concern is SAS language compatibility rather than the SAS Institute specifically, alternatives exist. Altair SLC (now part of Siemens) is one we've pointed clients to. It runs SAS language programs and offers a more open integration story.
Preparing for what's next
The organizations investing in open-source SCEs today aren't just cutting license costs. They're positioning themselves for a future where AI-assisted validation, automated reporting, and programmatic control of statistical workflows are the norm, not the exception.
R and Python are languages that AI models understand deeply. Their ecosystems are open, scriptable, and composable. The infrastructure that supports them (Git, CI/CD, containers, IaC) is the same infrastructure that makes AI integration practical.
SAS was built for a world where statisticians worked in a controlled, self-contained environment and submitted results through manual processes. That world is changing. The SCE decision you make today is a bet on how your teams will work in three to five years.
Organizations that delay modernization don't just fall behind on current capabilities. They fall behind on their ability to adopt whatever comes next. That's a compounding problem, and the gap is already visible.
Asking which language is "better" misses the point. The real question is whether your SCE is built for the way your teams need to work now, and the way the industry will work soon. The language follows from the architecture. Get the architecture right, and the language question answers itself.
About the author
Rafael Pereira leads the Platform Engineering Unit at Appsilon, where he helps enterprise clients, primarily in the Life Sciences sector, scale analytics and statistical computing through secure, cloud-native platforms. With over a decade of experience spanning infrastructure, data platforms, and DevOps, he specializes in designing compliant, scalable environments for R and Python workflows.

