Pharma Product Creation: The Biggest Data Challenges

Creating new pharma products is a journey fraught with peril, and hardly any compounds make it. “Only 5 out of 5,000 compounds go from preclinical to human testing — and of these, only 1 is approved to go to market.” What’s more, on average, that solitary compound will take 12-15 years to travel from the lab to US patients.

This state of affairs persists even though the development of Covid vaccines demonstrated that pharma products can be researched, tested, and shipped in a much shorter time frame.

Nevertheless, the R&D process for the majority of pharma products is not only lengthy, but also expensive. Some estimates suggest it costs up to $4 billion to bring a drug to market. Part of that cost is explained by the fact that there are stumbling blocks at every stage of the process. From drug discovery to production right through to product marketing, pharma companies face regulatory challenges, market hurdles, and run-of-the-mill operational bottlenecks.

This post highlights some of the most pressing challenges in pharma throughout the entire drug creation process, and how data science can help overcome them. A smoother product pipeline will not only make the innovation-to-sales path more timely and cost-effective, but will also ultimately save more lives.

Overcoming Drug Discovery Challenges

1. Liberating Siloed Data

In a recent paper in the Orphanet Journal of Rare Diseases, researchers argued that the proliferation of data silos is leading to a “balkanization” of potentially useful data, hampering drug development, especially for rare disease patients.

This is because pharma companies and clinical research organizations (CROs) might unwittingly recreate existing clinical data that is trapped in proprietary databases or archived by one or another external partner. Liberating this data is vital. Indeed, a recent literature review of the problem described reusing clinical data as “essential to realize the potential for high quality healthcare.”

Among the review's recommendations:

Implementing good data hygiene practices that support data organization and reuse.
Using tools that can process large volumes of complex data of differing types.
Increasing data sharing and collaboration between organizations.

These can help transform an obstacle into an advantage, giving pharma companies large volumes of data without the associated costs of generating new data.

2. Optimizing Data Pipelines

Researchers, lab managers, and pharmacologists spend up to 80% of their time wrestling with data. Not only do they have to prepare disparate data types for their suite of systems, but they also have to be able to move data upstream quickly — a job made more difficult by suboptimally-organized data pipelines.

This job is all the more difficult for the 5% of pharma companies that still rely on Excel to integrate data. Not only does Excel inevitably lead to delays in analysis, but it also causes bottlenecks throughout the drug discovery process, as it can’t read live analytical data, leaves no audit trails, and has versioning problems and formatting issues.

It may seem astonishing that companies still use Excel for drug discovery, but for many of them the data pipeline evolved gradually, and for circumstantial reasons they remain stuck with the program. Nevertheless, these faulty data pipelines can multiply problems with “data duplication, data integrity, and LOT tracking.”

Life sciences expert Ken Longo of Wave Life Sciences recently argued that the best way to create an efficient data pipeline is to use a platform that works with different data types and sources and integrates with different systems, saving time and limiting problems.

3. Collaborating Externally and Internally

Nearly three quarters of industry experts say pharma companies are not efficient collaborators. This is not just because of corporate reluctance to share data, but also because of regulatory requirements to keep data secure while giving researchers access. This latter problem has been exacerbated by the rise of remote work.

As drug discovery necessitates the use of large datasets, individual collaborators' networks can cause delays when uploading or downloading. By centralizing work on a shared, secure, compliant platform, researchers can keep the process moving, even while working remotely.

Finally, no-code, low-code environments can be used throughout the drug discovery pipeline for collaboration with different types of users. For example, business users can use no-code environments for visualization and reporting, while data scientists use advanced techniques for analysis in the same platform.

Overcoming Drug Development Challenges

1. Digitizing Audit Trails

Of the roughly 10,000 clinically recognized diseases, only 5% have approved treatment options. Stringent regulations designed to protect the public from under-tested pharmaceutical products and “quackery,” as the FDA calls it, present a challenge for bringing new products to market quickly.

One FDA requirement is thorough audit trails of a drug’s production history. The FDA writes that to bring a new drug to market, “a team of physicians, statisticians, chemists, pharmacologists, and other scientists needs to review the company's data and proposed labeling.”

Documenting production history this thoroughly is nearly impossible without systematic audit logging and documentation, and this is where digitized audit trails play a vital role. Audit logging keeps a sequential record of all activity on a system for easy administration, and data science documentation ensures the reproducibility of an entire process. Building that process into the systems used for drug development (or across multiple systems in an overarching way) automatically and efficiently creates the required regulatory trails.
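To make the idea of a sequential audit record concrete, here is a minimal Python sketch. It is an illustration only, not a compliant implementation of any FDA requirement. The class name, field names, and the choice to chain entries with SHA-256 hashes (so that later edits become detectable) are all assumptions made for this example.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class AuditLog:
    """Append-only, sequential record of actions on a system (illustrative only)."""
    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        # Hash a canonical JSON form so the same content always gives the same hash.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, user: str, action: str, timestamp: str) -> dict:
        # Link each entry to the previous one so later edits are detectable.
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {
            "seq": len(self.entries) + 1,
            "user": user,
            "action": action,
            "timestamp": timestamp,
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash and check that the chain is unbroken.
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(body):
                return False
            prev_hash = entry["hash"]
        return True


# Hypothetical users and actions, for illustration.
log = AuditLog()
log.record("analyst_a", "uploaded assay results", "2024-01-05T09:00:00Z")
log.record("analyst_b", "approved batch record", "2024-01-05T10:30:00Z")
print(log.verify())
```

A real system would also need access control, secure storage, and validated timestamps. The sketch only shows why a sequential, linked record makes activity easy to review after the fact.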

2. Unblocking Machine Learning

Unlocking opportunities with ML in drug development requires removing barriers (not limited to costs) to its deployment.

A significant challenge is freeing up data. For ML models to be trained, researchers need to be able to use every available data source. But as highlighted above, data about compounds and experiments can be trapped in proprietary networks or archived improperly. Liberating and preparing this data are challenges in themselves.

Poor data cataloging and archiving practices can not only render ML implementation tough, but can also present challenges for model validation. One solution is to combine ML techniques with platforms which can manage large volumes of data from numerous sources and types, as the University of Oxford did in a study on ML in drug development.

Another solution is to use no-code or low-code platforms which enable ML techniques to be spread around organizations, not just limiting their use to data science experts.

3. Sharing Results and Methods for Reproducibility

To proceed to clinical trials, researchers and organizations have to be as confident as possible in the validity of their results. As a recent NLM review noted: “the ability to rely on published data and process that data from one lab to another is critical for the successful translation of discovery research.” But drug development frequently runs up against the problem of relying on published results.

One hurdle here is again the lack of sharing of results and data. Not only are there ethical arguments for more open, public data sharing in drug development, but it might also accelerate the process by “detecting and deterring selective or inaccurate reporting of research.” Again, centralized and secure platforms for external collaboration have a role to play.

Overcoming Production Challenges

1. Enhancing Visibility through Traceability

The pharma industry suffers from the same supply chain challenges as all other industries, and it still has to be able to track the flow of its raw materials and products. It can do this via traceability.

GS1, the international not-for-profit organization behind barcode standards, defines traceability in healthcare as the process that “enables you to see the movement of prescription drugs or medical devices across the supply chain.” This process is especially vital for monitoring medications with short life spans as they transit from production to use — for example, vaccines, which typically have a limited “use by” date.

Traceability also enables efficient product recalls and supports general regulatory purposes. The Drug Quality and Security Act, enacted in the US in 2013, has key implications for traceability in healthcare.

No-code, low-code data science platforms can not only make traceability work for pharma, but also enable users of different analytics capabilities to detect issues and act swiftly during product recalls.
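As a rough illustration of what traceability data makes possible, the Python sketch below stores scan events for each lot and answers two questions: where has a lot been, and which lots are past their use-by date? The record layout, lot IDs, and locations are hypothetical and are not taken from GS1 standards or any regulation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TraceEvent:
    lot_id: str       # hypothetical lot identifier
    location: str     # where the lot was scanned
    scanned_on: date
    use_by: date


def trace_lot(events, lot_id):
    """Return the locations a lot passed through, in scan order."""
    lot_events = sorted(
        (e for e in events if e.lot_id == lot_id), key=lambda e: e.scanned_on
    )
    return [e.location for e in lot_events]


def expired_lots(events, today):
    """Return the IDs of lots whose use-by date has passed."""
    return sorted({e.lot_id for e in events if e.use_by < today})


# Made-up sample data.
events = [
    TraceEvent("LOT-001", "plant", date(2024, 1, 2), date(2024, 3, 1)),
    TraceEvent("LOT-001", "distributor", date(2024, 1, 10), date(2024, 3, 1)),
    TraceEvent("LOT-002", "plant", date(2024, 1, 5), date(2024, 6, 1)),
    TraceEvent("LOT-001", "pharmacy", date(2024, 1, 20), date(2024, 3, 1)),
]

print(trace_lot(events, "LOT-001"))           # ['plant', 'distributor', 'pharmacy']
print(expired_lots(events, date(2024, 4, 1)))  # ['LOT-001']
```

During a recall, the same lookup tells a team which sites have handled an affected lot, which is the kind of question a visual, low-code tool could expose to non-specialists.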

2. Setting up VPEs to Lower Production Costs

Pharma throws up an industrial challenge for data scientists: translating algorithms from development into production environments. As Pharmaceutical Engineering comments: “Often, a finished algorithm is just a prototype that proves feasibility. Too many promising tools never end up in a production facility.”

Virtual Production Environments (VPEs) act as “digital twins” for real-life settings, and have been shown to increase pharma production efficiencies by up to 30%. By simulating the functioning of individual machines or even entire production lines, manufacturers can test complex production routes virtually.

By combining predictive analytics with robotic process automation connected via the IoT, data scientists in pharma can be given sandboxes to test-run production lines before they become a reality.
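A full digital twin is far richer than anything that fits in a blog post, but a toy Python simulation shows the basic idea of test-running a line before it exists. This sketch assumes stations run in series, each handles one unit at a time, and buffers between stations are unlimited. The function name and cycle times are invented for illustration.

```python
def simulate_line(cycle_times, units):
    """Return the minutes needed to push `units` through stations run in series.

    cycle_times: minutes each station needs per unit (hypothetical values).
    """
    finish = [0.0] * len(cycle_times)  # when each station finished its latest unit
    for _ in range(units):
        ready = 0.0  # when the current unit leaves the previous station
        for i, cycle in enumerate(cycle_times):
            start = max(ready, finish[i])  # wait for both the unit and the station
            finish[i] = start + cycle
            ready = finish[i]
    return finish[-1]


# Compare the current layout with a proposed upgrade to the slowest station.
current = simulate_line([2, 5, 3], units=10)
upgraded = simulate_line([2, 4, 3], units=10)
print(current, upgraded)  # 55.0 45.0
```

Even this simple model shows that the slowest station sets the pace of the whole line, so the sandbox lets a team test an upgrade on paper before touching real equipment.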

3. Achieving Personalization through Modular Manufacturing

Personalization is a generalized trend across most consumer industries, and is predicted to be a core element of “Industry 5.0.” Personalized medicine tailors treatments to each patient based on their genetic profile. With personalized medicines, doctors can eliminate much of the trial and error associated with traditional medical treatment. This can help patients get the precise treatment they need, when they need it, and at the optimal dose.

In the context of pharma production, personalization comes with an increased need for smaller batches of medications. Those batches will also often need customization for each patient. However, a traditional plant layout makes it challenging for a factory to switch over and adjust equipment as needed to produce several different types or strengths of medications.

Modular manufacturing allows pharmaceutical processing companies to produce multiple batches of various types of drugs, all from the same facility. To allow for modular production, some processors are using a ballroom layout, meaning there is no fixed equipment. The facility can be broken down and rearranged as needed, depending on that day's manufacturing needs.

No-code, low-code platforms can improve visibility for modular setups: data visualizations let factory floor workers quickly grasp which setup that day's needs require.

Overcoming Marketing Challenges

1. Upskilling Marketing

Nearly half of life sciences leaders say there’s not enough analytics talent in their companies, with domain experts not able to work effectively with data. And although digital marketers often work with analytics, marketers do not typically possess advanced data science skills.

As such, part of the pharma product pipeline involves giving non-experts the ability to access and manipulate the data they need to best do their jobs. In pharma marketing, that means upskilling the workforce, with a reasonable goal of making 10% of a company’s business experts analytics-literate.

When marketing experts can conduct their own analytics work without the bottleneck of relying on data science experts, they can more quickly channel data-driven insights into strategic decisions. It would be not only a shame but also extremely costly to wait well over a decade to bring a drug to market and then fail to capitalize on data-driven, localized marketing.

This can be best achieved through a combination of data upskilling and the use of no-code, low-code platforms, which free up domain experts to perform complex operations through visual programming.

2. Readying Infrastructure

The global life science analytics market size will be roughly $42 billion by 2025, but these numbers belie the differing levels of analytics maturity among pharma companies.

The COVID-19 pandemic accelerated pharma’s journey toward digital. However, it also exposed many companies’ lack of readiness around marketing. Companies with decentralized platforms and marketing infrastructure were able not only to work remotely, but also to make localized decisions tailored to individual market needs.

However, decentralized infrastructure for localized marketing requires investment — not just in marketing platforms, but also in data science environments which can help with sophisticated ML techniques.

3. Segmenting Customers Intelligently

On average, American consumers spend $1,000 annually on medical products. In such a large, competitive market, pharma companies have to skillfully brand and advertise products for maximum share. No easy task, given that there are some 55,000 trademarked pharmaceutical products from which new products have to be distinguished.

Nevertheless, new products have to abide by strict rules stating that branding or communications must not be misleading in any way. This is especially true for over-the-counter (OTC) drugs, which don’t require prescriptions.

Pharma companies therefore have to identify, anticipate, and provide solutions for customer requirements while adhering to exhaustive rules. Machine-learning-driven customer segmentation helps create dynamic groups of customers based on similarities in large data sets.

These segments are far more detailed than traditional ones, opening up unique branding insights for products, which can then be targeted with hard-to-spot trends in mind. Low-code, no-code platforms can help by giving non-data experts access to advanced techniques, and do so with decentralized access. This can help with local marketing efforts, such as using ML on customer data to identify effective local business channels.
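For readers curious what ML-driven segmentation looks like under the hood, here is a small Python sketch using k-means clustering. K-means is one common grouping technique, chosen here for illustration and not prescribed by anything above. The customer features (annual spend and purchases per year) and the values are made up, and the initialization is deliberately naive so the example stays deterministic.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Group points into k clusters; returns one cluster label per point."""
    centroids = [tuple(p) for p in points[:k]]  # naive, deterministic initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


# Hypothetical customers: (annual spend in dollars, purchases per year).
customers = [(100, 2), (900, 20), (120, 3), (950, 22), (110, 1), (880, 19)]
print(kmeans(customers, k=2))  # [0, 1, 0, 1, 0, 1]
```

In practice a team would use an established library, scale the features first, and work with many more dimensions. The sketch only shows how groups can emerge from similarities in the data without being defined by hand.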

Optimizing Pharma Product Pipelines with Data Science

Shortening the time it takes to bring drugs to market will not happen through machine learning alone, despite what some commentators promise. To optimize the entire pharma process, data science must be applied in a dedicated and meticulous way throughout.

The right data science environment will help overcome problems and delays associated with data preparation by letting people work with different data types and sources in an intuitive and easy way.

It will also help bring down barriers, opening up access to analytics techniques to non-data experts. By combining data upskilling with decentralized platforms, pharma companies can help spread the use of analytics throughout their organizations.

Whether it’s for entities transferring data during drug discovery, or enabling different departments within organizations to collaborate quickly for the needs of traceability, the right data science environment will make every moment count, and get drugs to people who need them faster.