A Complete Guide to the Drug Development Process

By: Ileana Saenz
April 1, 2026

New drugs do not reach patients quickly. The drug development process usually takes 10 to 15 years, and most programs fail before approval. That long path is not only a science challenge. It is also a data, reporting, and compliance challenge.

Over the past decade, Appsilon has worked with 8 of the world’s 10 largest pharmaceutical companies and dozens of biotech organisations across every stage of this process. What we have seen, consistently, is that the science rarely fails alone. Programs slow down because data is fragmented, reporting is manual, and evidence that should flow cleanly between stages gets stuck at handoffs. This post is about what those bottlenecks actually look like in practice and how teams have resolved them.

In this guide, you will learn:

  • what happens in each phase of the drug development process and where data problems typically surface
  • what bottlenecks we have encountered at each stage working with real pharma and biotech clients
  • how those problems were resolved, with links to the full case studies

What Is the Drug Development Process?

The drug development process is the structured path a therapy follows from early research to regulatory approval. It combines biology, chemistry, clinical science, biostatistics, data operations, and compliance across a sequence of stages where each one depends on the quality of what came before.

The development of pharmaceuticals is shaped by regulatory requirements that run through every stage. The standard of proof that regulatory agencies require is fixed and non-negotiable - teams either meet it or they don't. What they can control is how efficiently they do so, and that depends almost entirely on the quality of their data infrastructure.

The core pre-approval path covered in this article has four stages: Discovery and Development, Preclinical Research, Clinical Development, and Regulatory Review. Post-market surveillance begins after approval and is covered separately at the end.

The reason the process takes so long is structural. Every preclinical study must be traceable. Every clinical trial must be reviewable and reproducible. Every submission package must hold up under agency scrutiny. That standard of proof cannot be skipped; it can only be met well or met poorly.

Drug Development Life Cycle: An Overview

Here is a high-level view of the full pre-approval path:

Stage | What happens | Typical duration
Discovery and Development | Target identification, compound screening, lead optimization, candidate selection | 3–6 years
Preclinical Research | In vitro and in vivo testing, safety, toxicology, PK/PD, IND preparation | 1–3 years
Clinical Development | Phase I, II, and III clinical trials in humans | 6–7 years
Regulatory Review | NDA or BLA submission, agency questions, review, approval decision | 1–2 years

Each handoff between stages is a risk point. Evidence that arrives incomplete or poorly documented forces the next team to stop and reconstruct what should have been handed to them. The drug development life cycle is as much a data chain as it is a scientific one.

The Phases of Drug Development Explained

Stage 1: Discovery and Development

Every drug starts as a question: what mechanism underlies the condition being addressed, and what approach might usefully change it? This stage covers target identification, screening, lead optimization, and drug candidate selection.

Teams screen large chemical libraries, compare small molecule and biologic approaches, and run early ADME assessments. The goal is not yet to prove that the therapy works in people; it is to identify a candidate worth taking into formal preclinical testing.

Key work in this stage includes:

  • target identification and biological validation
  • screening and prioritizing compounds
  • lead optimization and early ADME assessment
  • selecting a drug candidate for preclinical work

The bottleneck we see most often at this stage is data visibility. Discovery teams generate rich experimental data across many tools and systems, but they rarely have a clean, shared way to navigate it. Researchers end up spending time reconstructing context that should be immediately accessible.

What we observed: organoid research data locked in silos

A Fortune 500 healthcare company had accumulated extensive omics and imaging data from its organoid collection: years of research that was effectively invisible to most of the teams who needed it. There was no systematic way to search, filter, or share it across research groups. We built an MVP platform in R Shiny and Python with advanced search and data visualisation tools that made the collection accessible across the organisation. From kickoff to working product: three weeks.

Read the case study: Organoids platform ready in 3 weeks

What we observed: bioinformatics pipelines too fragile to scale

A global biopharmaceutical company was running research workflows built from disconnected scripts with significant manual steps between them. Results were inconsistent across studies because the process depended on individual knowledge rather than on standardised, documented steps. We converted their standalone scripts into production-ready Nextflow pipelines and consolidated the spatial transcriptomics analysis steps into a single flexible workflow.

Read the case study: Nextflow pipelines cutting analysis time

Stage 2: Preclinical Research

Preclinical research determines whether the therapy looks safe and consistent enough to move into human studies. It covers in vitro and in vivo work, toxicology, pharmacokinetics, pharmacodynamics, and bioavailability assessment.

Collecting results is only part of the job. The real deliverable is evidence that is traceable, reproducible, and ready for the Investigational New Drug application. This is where data quality starts to matter most, because outputs from many different study types and systems have to come together into one coherent package.

Common pressure points we encounter here:

  • inconsistent data capture and documentation across studies
  • manual quality checks before key reports are finalized
  • weak traceability between laboratory outputs and later reporting
  • limited visibility across safety, toxicology, and PK/PD review

What we observed: drug dose estimation running on Excel and scripts

A Japanese-owned biotech company was managing their entire drug dose estimation workflow through a combination of Excel files and R scripts. The setup created error risk, slowed collaboration between scientists, chemists, and contractors, and made GxP-compliant outputs unreliable. We built a centralised R Shiny application that brought the whole workflow - dose calculations, allometric analysis, Monte Carlo simulations, and compliant reporting - into one validated platform. The application eliminated over 100 hours of manual work and removed the spreadsheet dependency that had been the main source of errors.

Read the case study: Transforming drug dose estimation processes
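
To make the terms "allometric analysis" and "Monte Carlo simulations" concrete, here is a minimal Python sketch of how the two can fit together. It is not the client's application or method. It assumes simple allometric scaling of clearance by body weight with an exponent around 0.75, and it treats that exponent as uncertain by sampling it from a normal distribution. Every input value, function name, and the spread of the exponent are hypothetical and chosen only for illustration.

```python
import random
import statistics


def scale_clearance(cl_animal, bw_animal_kg, bw_human_kg, exponent=0.75):
    """Simple allometric scaling (assumed model):
    CL_human = CL_animal * (BW_human / BW_animal) ** exponent
    """
    return cl_animal * (bw_human_kg / bw_animal_kg) ** exponent


def simulate_clearance(cl_animal, bw_animal_kg, bw_human_kg,
                       exponent_mean=0.75, exponent_sd=0.05,
                       n_draws=10_000, seed=42):
    """Monte Carlo step: sample the exponent many times and return the
    resulting human clearance estimates, sorted ascending."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    draws = [
        scale_clearance(cl_animal, bw_animal_kg, bw_human_kg,
                        rng.gauss(exponent_mean, exponent_sd))
        for _ in range(n_draws)
    ]
    return sorted(draws)


# Hypothetical inputs: a 0.25 kg animal with clearance 0.5 L/h, a 70 kg human.
point = scale_clearance(cl_animal=0.5, bw_animal_kg=0.25, bw_human_kg=70.0)
draws = simulate_clearance(cl_animal=0.5, bw_animal_kg=0.25, bw_human_kg=70.0)

# Empirical percentiles read directly from the sorted draws.
p05 = draws[int(0.05 * len(draws))]
p50 = statistics.median(draws)
p95 = draws[int(0.95 * len(draws))]

print(f"Point estimate: {point:.1f} L/h")
print(f"5th / 50th / 95th percentile: {p05:.1f} / {p50:.1f} / {p95:.1f} L/h")
```

The point of keeping this in code rather than a spreadsheet is the same one the case study makes: the formula, the random seed, and the inputs are all explicit, so a reviewer can rerun the calculation and get the same numbers.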

What we observed: no unified data layer for laboratory outputs

A Fortune 100 biopharmaceutical company had no unified, secure database connecting their chromatography data system to their reporting workflows. Experiments were being run, but the knowledge being generated was difficult to accumulate systematically. We engineered a tailored ETL process - R, SQL, Jenkins, Amazon S3, and Oracle Database - that connected their data management to automated R reporting workflows. Delivered in three months.

Read the case study: ETL process to unify data and automate reporting
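
For readers less familiar with the pattern, an extract-transform-load flow feeding a report can be sketched in a few lines of Python. This is only an illustration of the general idea, not the delivered system, which used R, SQL, Jenkins, Amazon S3, and Oracle Database. Here the standard-library sqlite3 module stands in for the database, and the export format, column names, and values are all invented.

```python
import csv
import io
import sqlite3

# Hypothetical export from a chromatography data system (invented columns and values).
RAW_EXPORT = """sample_id,analyte,peak_area,run_date
S-001,compound_a,1520.4,2024-01-15
S-002,compound_a,,2024-01-15
S-003,compound_b,987.0,2024-01-16
"""


def extract(text):
    """Extract: parse the raw CSV export into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no peak area and convert types."""
    clean = []
    for row in rows:
        if not row["peak_area"]:
            continue  # skipped silently here; a real pipeline would log the rejection
        clean.append((row["sample_id"], row["analyte"],
                      float(row["peak_area"]), row["run_date"]))
    return clean


def load(conn, records):
    """Load: write the cleaned records into one shared table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "sample_id TEXT PRIMARY KEY, analyte TEXT, peak_area REAL, run_date TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?)", records)
    conn.commit()


def report(conn):
    """Report: summarise results per analyte straight from the database."""
    return conn.execute(
        "SELECT analyte, COUNT(*), AVG(peak_area) "
        "FROM results GROUP BY analyte ORDER BY analyte"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
load(conn, transform(extract(RAW_EXPORT)))
summary = report(conn)
for analyte, n, mean_area in summary:
    print(f"{analyte}: n={n}, mean peak area={mean_area:.1f}")
```

Once every result lands in one table, reports read from that table instead of from individual instrument exports, which is what lets knowledge accumulate across experiments.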

Stage 3: Clinical Development

Clinical development is the most visible and most operationally demanding stage of the drug development process. It includes three distinct phases, and each one asks a different question.

Distinguishing between the phases matters because each one has distinct objectives, patient populations, and regulatory expectations. Treating them as interchangeable leads to planning gaps and reporting problems that surface at the worst possible time.

Phase I: Safety and dosing

A small clinical trial tests tolerability, dose range, and early safety signals. The primary question at this phase is whether the drug can be given to humans safely and at what dose.

Phase II: Efficacy and signal detection

A larger clinical trial asks whether the therapy shows a meaningful effect in the target population. Safety monitoring continues throughout this phase, and the results inform the design of Phase III.

Phase III: Confirmation at scale

The largest and most expensive clinical trial phase confirms efficacy and safety across a broader patient population, multiple sites, and extended follow-up. This is the evidence base the regulatory submission is built on, and the single biggest investment in the entire program.

Across all three phases, the process becomes a data operations challenge. Every clinical trial generates outputs that must be captured, validated, audited, and reported to a standard that satisfies regulators, statisticians, and study teams simultaneously. Statistical Computing Environments, compliant reporting pipelines, and reproducible analysis workflows are not optional infrastructure at this stage; they are how the work gets done.

What we observed: $930K spent annually on proprietary analytics software

A top-50 pharmaceutical company was paying $930,000 a year for proprietary data analytics software with limited flexibility and an upcoming licence renewal deadline. We designed and deployed a cloud-native platform on AWS - Kubernetes, Posit Workbench and Connect, CI/CD automation, full observability. Delivered two days ahead of the deadline. Monthly compute costs dropped from $11,000 to $1,750 in the first month, and the platform now serves 70+ active users across five business units.

Read the case study: Unlocking $930K annual savings with a future-proof data analytics platform

Stage 4: Regulatory Review

Regulatory review is the stage where agencies assess whether the full body of evidence supports approval. For new drugs, that means a New Drug Application. For biologics, it means a Biologics License Application. The work shifts from generating evidence to defending it: clearly, consistently, and under scrutiny.

A weak audit trail, hard-to-reproduce analysis, or an unvalidated computing environment can slow review significantly, even when the underlying science is sound. The teams we work with at this stage are usually dealing with outputs that are difficult to reproduce, environments that have not been formally validated, or manual processes that add time and risk to every submission cycle.

What we observed: a 40GB submission export running overnight

A global pharmaceutical company was running overnight exports of a 40GB dataset that took approximately 12 hours. Error messages were opaque, so when exports failed, teams had no clear way to identify why, and debugging depended on a small group of specialists who knew the regulatory requirements well enough to do it. We built a dedicated R package for submission exports and define.xml generation, embedding the submission logic into a standardised, documented, repeatable process with clear validation messages. The same export now runs in approximately 20 minutes.

Read the case study: Cut submission dataset exports from 12 hours to 20 minutes
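
What "clear validation messages" can mean in practice is easiest to show with a small example. The Python sketch below is not the R package described above and does not reproduce any regulatory rule set. It assumes a hypothetical pre-export check in which each record is tested for required variables and for a maximum text length, and every problem is reported with the row and variable it concerns. The function name, the sample records, and the 200-character limit are invented for illustration.

```python
def validate_records(records, required_fields, max_lengths):
    """Check records before export and return human-readable messages.

    An empty list means every check passed.
    """
    messages = []
    for i, rec in enumerate(records, start=1):
        # Required variables must be present and non-empty.
        for field in required_fields:
            if rec.get(field) in (None, ""):
                messages.append(
                    f"Row {i}: required variable '{field}' is missing or empty."
                )
        # Text variables must not exceed their (hypothetical) length limit.
        for field, limit in max_lengths.items():
            value = rec.get(field)
            if isinstance(value, str) and len(value) > limit:
                messages.append(
                    f"Row {i}: '{field}' has {len(value)} characters; "
                    f"the limit is {limit}."
                )
    return messages


# Invented sample records; variable names are used only as examples.
records = [
    {"STUDYID": "ABC-101", "USUBJID": "ABC-101-0001", "AETERM": "Headache"},
    {"STUDYID": "ABC-101", "USUBJID": "", "AETERM": "Nausea"},
    {"STUDYID": "ABC-101", "USUBJID": "ABC-101-0003", "AETERM": "X" * 250},
]

messages = validate_records(
    records,
    required_fields=["STUDYID", "USUBJID"],
    max_lengths={"AETERM": 200},  # hypothetical limit
)
for message in messages:
    print(message)
```

Running the checks before the export starts, and naming the exact row and variable in each message, is what removes the dependency on a specialist to interpret a failure after the fact.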

What we observed: GxP compliance reports taking five weeks to generate

A research-focused biopharmaceutical company was generating Annual Product Quality Review reports manually, a process that took approximately five weeks per cycle and carried significant risk of human error. We automated the full report generation workflow using R Shiny applications, CI/CD integration with Jenkins and GitHub Actions, and Posit Connect. Report generation time dropped from five weeks to five minutes. The solution has expanded from 150 to 500 users and is projected to deliver over $1M in savings over three years.

Read the case study: GxP Reporting Automation: from 5 Weeks to 5 Minutes

What Happens After Approval?

Once a drug is approved, companies move into post-market surveillance. The focus shifts to monitoring safety, adverse events, and real-world product performance across a much larger and more diverse patient population than any clinical trial could cover. This phase runs for the full commercial life of the product.

Teams need pharmacovigilance workflows, real-world evidence analytics, and ongoing safety reporting that can handle the volume and variability of post-approval data. The data infrastructure decisions made earlier in the drug development process - how traceable outputs are, how automated reporting pipelines are, how validated environments are - have a direct effect on how manageable this phase becomes.

Drug Development Timeline: How Long Does It Take?

The drug development timeline for a new therapy typically spans 10 to 15 years from early discovery to approval. The actual time varies considerably depending on the disease area, the therapy type, and the regulatory pathway.

Most delays occur at transitions between stages. Clinical trial enrollment takes longer than planned. Regulatory agencies issue questions that require additional data or analysis. Manufacturing scale-up creates unexpected complications. Each delay compounds into the next stage rather than staying contained within the one where it started.

Regulators have created mechanisms to shorten the path for therapies addressing serious conditions or significant unmet medical needs. In the United States, FDA designations such as Fast Track, Breakthrough Therapy, Accelerated Approval, and Priority Review can reduce review periods or allow approval on earlier clinical evidence. The EMA offers similar accelerated pathways in Europe. Biosimilar development also follows a shorter timeline than the development of new molecular entities, because the originator product has already established the safety and efficacy baseline.

For most programs, the timeline pressure is less about regulatory speed and more about how efficiently teams generate, document, and hand off evidence between stages. Programs that lose time to rework, re-analysis, and submission gaps are usually dealing with infrastructure problems that could have been addressed earlier.

Key Challenges in the Drug Development Process

Around 90% of drug candidates that enter clinical trials never reach approval. That failure rate is not random; it reflects structural weaknesses that compound across stages. Pharmaceutical companies invest billions annually into programs that still fail, because biology is unpredictable, regulatory requirements are exacting, and data systems rarely keep pace with scientific ambition. Across the pharmaceutical industry, the structural problems tend to look the same regardless of company size or therapy type.

The problems tend to cluster in the same places:

  • high attrition across the pipeline, with most failures concentrated in Phase II and Phase III
  • cost and time pressure throughout the drug development life cycle
  • growing regulatory complexity, particularly around open-source tools and AI in submissions
  • fragmented data across the full process, with inconsistent standards between stages
  • slow review cycles and manual reporting that delay decisions at every handoff

For teams working in drug development, the question is not whether these problems exist. It is whether the data infrastructure and analytics systems in place are strong enough to contain them before they compound.

FAQ

What is the difference between drug discovery and drug development?

Drug discovery is the early phase of identifying a viable therapy: finding a biological target, screening compounds, and selecting a candidate. Drug development is the broader program that takes that candidate through preclinical testing, clinical trials, regulatory submission, and on toward approval. Discovery is one input into development; development is the full process from candidate to market.

Why is drug discovery hard?

Most candidates that show promise in early screening fail when tested more rigorously. Roughly 90% of compounds that enter clinical development never reach approval, often because of unexpected toxicity, insufficient efficacy in a real patient population, or poor bioavailability. The biology rarely behaves in Phase III the way it appeared to in Phase I.

What is the probability of success in drug discovery?

Historically, only about 10% of candidates entering Phase I clinical trials ultimately receive regulatory approval. Better candidate selection, tighter preclinical documentation, and clean data handoffs between stages improve the odds, but no process improvement eliminates the underlying biological uncertainty. What better infrastructure does is reduce the time and cost of failure, even when it cannot prevent it.

Summary

The drug development process is long because the standard of proof is high. From discovery through preclinical research, clinical development, and regulatory review, every stage builds on the evidence quality of the one before it. Weakness at any point compounds forward.

The patterns we have seen working across pharmaceutical companies are consistent: the science rarely fails in isolation. Programs slow down at data handoffs, in reporting bottlenecks, and in infrastructure that was not built for the scale and compliance demands of later stages. Fixing those problems earlier, before they reach the submission package, is where the time and cost savings are.

Explore Appsilon’s work in pharma and biotech
