About the Client
The client is a precision genetic medicine company focused on gene therapies. The company is developing genetic regulators and enabling technologies with the potential to impact gene therapy delivery across the industry. They were seeking a partner to build a fully R-based, GxP-compliant statistical computing environment from the ground up: cloud-native, AI-ready, and operational fast enough to support active clinical programs.
The Challenge
What is an SCE, and why does it matter?
A Statistical Computing Environment (SCE) is the platform where pharma and biotech companies run the analyses behind their regulatory submissions. It's where clinical programmers write, test, and execute the code that produces the tables, listings, and figures regulators rely on to evaluate new therapies.
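To make that concrete, here is a toy sketch of the kind of output an SCE produces: a demographics-style summary computed in R from a synthetic, ADSL-like dataset. The data and column names are purely illustrative, not the client's.

```r
# Illustrative only: a tiny synthetic dataset shaped like a clinical
# subject-level analysis dataset (ADSL).
adsl <- data.frame(
  USUBJID = sprintf("SUBJ-%03d", 1:6),
  TRT01A  = rep(c("Placebo", "Drug 10mg"), each = 3),
  AGE     = c(54, 61, 47, 58, 66, 50)
)

# Summary statistics by treatment arm: a minimal "table" in TLF terms.
aggregate(AGE ~ TRT01A, data = adsl,
          FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x)))
```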
An SCE has to meet strict requirements. Every analysis must be reproducible. Every change must be traceable. Every tool must be validated. And the environment itself must comply with GxP, the family of quality regulations that governs clinical development.
For decades, most SCEs were built around SAS, on-premise servers, and manual validation processes. That model worked when studies were simpler and timelines were longer. It is less suited to today's reality: larger datasets, multi-language teams, AI/ML workloads, and the expectation that infrastructure should be as flexible as the science it supports.
What the client needed
As the company's pipeline advanced, they needed in-house analytical capabilities. Their starting point was unusually clean: no legacy SAS footprint, no on-premise servers to phase out, no accumulated technical debt. That removed a few constraints and shortened some phases, but it didn't change the underlying design: the same architecture applies to organizations modernizing an existing SCE. The difference is sequencing, not design.
The setup they had in mind was ambitious: analytical needs that extend beyond traditional clinical programming into bioinformatics, translational medicine, and pre-clinical research. R and Python give access to a wider open-source ecosystem for those use cases.
The primary constraint was time. The client's clinical programs were already active, so the environment needed to be operational fast enough to support real work, not a multi-year build that might only be ready for the next study.
The Solution
Appsilon partnered with the client to design and implement a modular platform, based on our BioVerse framework, with independently deployable components for infrastructure, compliance, and analytics. For this client we deployed the full stack as a greenfield build; other organizations can adopt the whole platform or fit specific components around an existing SCE without rebuilding what already works.
Here are the key layers of the architecture, and why each one matters.
Containers: reproducibility by default
Container technology is the foundation. Each analytical workload runs inside a container that packages the exact R version, all package dependencies at their specific versions, the operating system libraries, and the environment configuration. When a biostatistician runs code in development, the validated production environment runs that same code with the same dependencies. No environmental drift. No surprise version updates. And because each workload runs in isolation with defined resource limits, the blast radius of any potential issue is contained. That isolation is a security property that matters in GxP environments handling clinical data.
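The source doesn't name the dependency-pinning tooling used inside those containers; a minimal sketch assuming renv, a common choice for R images, looks like this (the pinned R version below is a hypothetical example):

```r
# In development: record every package at its exact version into renv.lock,
# the manifest the container image is built from.
renv::snapshot()

# In the container build (or at first launch in production): reinstall
# precisely those versions -- nothing newer, nothing missing.
renv::restore(lockfile = "renv.lock")

# Guardrail: fail fast if the runtime's R version drifts from what was
# validated. The version string here is hypothetical.
stopifnot(getRversion() == "4.3.2")
```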
Containerization also enabled Appsilon to deliver usable containers as interim deliverables while the full platform was still being built. The client's team started running workloads before the entire environment was complete. When the full platform went live, migrating that code was straightforward because the containers' guarantees held: the same code ran the same way in the new environment.
Kubernetes: extensibility at scale
Kubernetes orchestrates the containerized workloads. It handles automated scaling during peak submission periods, restarts failed workloads automatically, and keeps the system state defined in version-controlled manifests.
The practical value is simple: if something can be containerized, it can run on the platform. Each team gets its own copy of the environment, configured for its specific needs. New tools can be brought in, tested, and swapped out.
Infrastructure as Code and Git: compliance as engineering
Everything in the environment, including the Kubernetes resources, is defined in code. All deployments are automated. All changes go through approval gates. No modification reaches the environment without version control and QA review.
Git provides an immutable record of every change: who made it, when, and why. Infrastructure as Code means the entire environment can be provisioned, replicated, or recovered from configuration files alone.
You can see what's running, what changed, and who approved it. When something fails, it's traceable. When an auditor asks how a specific environment was configured at a given point in time, the answer lives in Git. Operations and compliance stop being separate workstreams.
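As a rough illustration of what "the answer lives in Git" can look like from R, here is a sketch using the gert package; the repository path and audit date are hypothetical, not taken from the project.

```r
library(gert)

# Every commit in the environment's configuration repository records who
# changed what and when; the commit message carries the why.
history <- git_log(repo = "/repos/sce-config", max = 1000)

# An auditor's question -- "how was the environment configured on this
# date?" -- becomes a filter over that history.
as_of <- as.POSIXct("2024-03-01", tz = "UTC")
history[history$time <= as_of, c("commit", "author", "time", "message")]
```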
The value showed up in unexpected places. The bioinformatics team, writing hundreds of thousands of lines of exploratory code, found that Git changed how they worked. Being able to retrieve something written four months ago, because it suddenly became relevant again, turned out to be one of the most appreciated capabilities on the platform.
In practice, audit trails are generated automatically through Git workflows and infrastructure logging. What used to require weeks of manual documentation happens as a natural part of the engineering process. The target is GxP releases every two weeks, following modern sprint cycles rather than the months-long change windows common with legacy platforms.
Parallel GxP and non-GxP environments, same toolchain
The platform consists of two parallel environments: a GxP environment for regulated analyses and a non-GxP environment for exploratory work. Both use the same tools (R, Python, Git) and the same deployment patterns; the only difference is the level of control. That symmetry means code developed during exploration can move into the validated environment without re-implementation, and analysts work in the same toolchain end-to-end, as the sketch below shows.
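One way that symmetry can play out in code: the analysis script stays identical across environments, and only configuration differs. A minimal sketch, assuming a hypothetical SCE_ENV variable set per environment (the variable name and paths are illustrative, not the client's):

```r
# Resolve environment-specific settings; the analysis logic below is
# byte-for-byte identical in the GxP and non-GxP environments.
env <- Sys.getenv("SCE_ENV", unset = "non-gxp")

paths <- switch(env,
  "gxp"     = list(adam = "/data/gxp/adam",     out = "/outputs/gxp"),
  "non-gxp" = list(adam = "/data/explore/adam", out = "/outputs/explore")
)

# The same code, promoted unchanged from exploration to validation.
adsl <- read.csv(file.path(paths$adam, "adsl.csv"))
write.csv(summary(adsl), file.path(paths$out, "adsl-summary.csv"))
```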
Results
Six months after kickoff, the client had a functioning, GxP-compliant SCE.
The client's team now runs regulated clinical analyses in the GxP environment and exploratory research in the non-GxP environment, all on one platform. Both environments share the same tools, container images, and deployment patterns.
Because exploration happens on an architectural twin of the validated environment, code proven out in non-GxP moves into GxP as-is: no rewrites, no re-validation of the runtime. The compliance boundary stays exactly where it should, but the path across it is straightforward by design.
Compliance overhead dropped. Audit trails generate themselves as part of the Git-based deployment process. Component updates don't require full system revalidation. Changes are isolated and tested independently, which keeps the platform current without triggering a months-long change cycle.
The architecture is designed to expand beyond biometrics. The client's roadmap includes bioinformatics, translational medicine, and pre-clinical workflows on the same foundation.
There is a compounding effect here. The same infrastructure that supports open-source workflows (containerization, version control, cloud-native architecture, programmatic interfaces) is also the foundation for AI integration. By building the right SCE now, the client is positioned for AI/ML capabilities without a second round of modernization.
Lessons from the Build
Define goals before selecting tools. The client knew they wanted R, Python, cloud-native infrastructure, AI-readiness, and cross-functional support. The specific tooling decisions followed from those goals. Starting with a vendor shortlist or a technology preference would have narrowed the design too early.
Invest in modularity from day one. The ability to update one component without revalidating everything else is what keeps the platform maintainable over time. Monolithic systems turn even small improvements into major projects.
Budget as much time for change management as for the technical build. Getting IT, QA, and biometrics aligned on Git-based workflows and deployment approvals required deliberate coordination. The lesson: bring each stakeholder group in early, understand their specific concerns, and show how the system addresses them.
Choose a partner that speaks your language. Building a cloud-native, GxP-compliant SCE requires deep knowledge of both pharma regulatory requirements and modern platform engineering. The speed of this project depended on a shared vocabulary. When the biometrics team said CDISC, SDTM, ADaM, or TFLs, the platform engineering team knew exactly what computing power and compliance controls those workloads required.
What This Means for Your Organization
Most pharma and biotech companies face some version of this challenge. Maybe you're running a legacy SAS-based SCE that can't keep up with your team's expectations. Maybe you're a growing biotech that has relied on CROs and is ready to build internal capabilities. Maybe you're trying to figure out how to add AI/ML workloads to a platform that wasn't designed for them.
This project showed that an open-source-first, cloud-native SCE can be GxP-compliant and operational in months. The validated environment, the modular architecture, the automated compliance: it's real, it's working, and the team presented it publicly at a major pharma conference.
If you're evaluating your own SCE strategy, Appsilon can help.
Talk to our team about SCE implementation