Aligning archival strategy to risk-based requirements
This section outlines good-practice approaches to data archiving and preservation.
Organisations need to understand what good looks like and, importantly, be able to align their archival strategy to their data retention requirements (ideally supported by a risk-based assessment).
We’ll cover four main areas that should be considered:
- Data safeguarding
- Digital preservation
- Consolidation and access management
- Inspection readiness and audits
Together, these four areas help organisations ensure that their digital records and data remain compliant and aligned with the ALCOA+ principles for as long as they are needed.


Data Safeguarding
A critical ALCOA+ principle for any retained data is ‘Enduring’. Organisations must put in place processes to ensure that data can endure for as long as it needs to be retained.
Historically, particularly for documents and records, these would have been stored physically. Paper records and documents would have been maintained in a physical archive and records managers would have been responsible for managing physical access. They would ensure the documents were safe from hazards such as fire and manage the preservation of those records by controlling variables such as temperature.
For digital content, different challenges must be overcome and different processes must apply. These challenges include data corruption, bit rot and loss. It is worth noting that:
- All data is at risk of becoming damaged, all the time. While damage to any individual file is uncommon, the risk increases when retaining hundreds of thousands of records over long periods of time.
- No technology currently exists that completely removes the risk of data becoming corrupted, so we must put processes in place to identify any issues and resolve them as required.
So how do we manage this risk?
Store data in multiple locations
Storing data in multiple cloud locations has several benefits for sponsors. The first is that it protects against the unlikely event of damage to a physical storage location (for example through fire or flood).
There is, though, a much more important benefit: it enables good archival practice, in which any damaged data can be replaced with a healthy copy throughout the retention period.
Please note that this is different to keeping a ‘backup’ of your data. Backups are designed to support disaster recovery and take snapshots of datasets on an ongoing basis. It is typical for a new backup to overwrite previous backups on a rolling basis. If a file were to become corrupt, a backup system would simply keep backing up the damaged data, eventually replacing the healthy copy of the file with the damaged one.
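To make the backup point concrete, here is a minimal sketch of a rolling backup. It is a toy simulation and not a model of any particular backup product: the function name, the three-snapshot window and the timeline are all assumptions chosen for illustration.

```python
from collections import deque

def rolling_backups(daily_states, keep=3):
    """Simulate a rolling backup that keeps only the newest `keep` snapshots."""
    snapshots = deque(maxlen=keep)  # the oldest snapshot is dropped automatically
    for state in daily_states:
        snapshots.append(state)     # each day's backup copies whatever is on disk
    return list(snapshots)

# Hypothetical timeline: the file is healthy for two days, then silently corrupts.
timeline = ["healthy", "healthy", "corrupt", "corrupt", "corrupt"]

# After five days every retained snapshot holds the damaged file.
print(rolling_backups(timeline, keep=3))  # ['corrupt', 'corrupt', 'corrupt']
```

Once the corruption has been present for longer than the backup window, no healthy copy remains. This is why independent copies combined with integrity checks are needed in addition to backups.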
Automated data integrity checks & resolving issues
Where in the past sponsors would only have been able to check a percentage of records, technology today enables them to run data integrity checks over an entire dataset.
In order to safeguard data for decades or longer, it is important to check every file on entry to the archive to ensure that it is as it should be. These checks can then continue on a regular basis for the entirety of the retention period.
If an issue is found, it can be resolved immediately by taking a healthy copy of the file from one of your other storage locations. In this way, an organisation can reduce or remove the risk of data becoming damaged or corrupted. This type of process is critical to ensuring the data endures over the entire retention period.
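The check-and-repair cycle described above can be sketched as follows. This is a simplified illustration, not a description of any specific archive product: it assumes a SHA-256 checksum was recorded when the file entered the archive, that each storage location is reachable as a local path, and the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_and_repair(replicas: list[Path], expected: str) -> list[Path]:
    """Verify every copy against the checksum recorded on ingest.

    Damaged or missing copies are overwritten from a healthy copy.
    Returns the list of copies that had to be repaired.
    """
    healthy = [p for p in replicas if p.exists() and sha256_of(p) == expected]
    if not healthy:
        # Assumption: with no healthy copy left, the case is escalated to a person.
        raise RuntimeError("no healthy replica left; manual intervention needed")
    damaged = [p for p in replicas if p not in healthy]
    for path in damaged:
        shutil.copyfile(healthy[0], path)  # replace the damaged copy
    return damaged
```

Run on a schedule across every file in the archive, a routine along these lines finds silent corruption while at least one healthy copy still exists. A real system would also log each check and repair to the audit trail.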
Digital Preservation
Another critical ALCOA+ principle for maintaining data integrity is ‘Legible’. We see the word referenced directly in regulations and guidelines, including variations such as readable, human-readable or even usable.
Ensuring legibility is more than just safeguarding the data. Across 25+ years you can ensure that data hasn’t become corrupted, but if the device being used in future is unable to recognise or open the data, then the data is effectively useless.
Technology, both hardware and software, evolves over time. New hardware is released, current devices are retired, and software is updated or simply becomes obsolete.
This is why digital preservation is essential.
It is the process of ensuring that digital content can be recognised, opened and read (by a human) not just now, but at whatever point in the future it is needed.
Maintaining long-term preservation copies
The best approach to preserving your data is to create long-term preservation copies of each file (sometimes called file format normalisation).
Backed by digital preservation good practice, this is the process of identifying a file’s format, finding its current best long-term equivalent and then creating a copy of the file in that format.
As a simple example, if I have a Word document today, I would create a PDF/A copy and maintain (and safeguard) both.
If, in future, PDF/A is no longer the recommended long-term format for a Word document, this process can be repeated and a new copy generated.
While it is possible to do this manually, it would take significant time and resource, and would likely result in a number of errors. A digital preservation tool will manage this process automatically, across millions of documents and records.
As a result, when someone tries to access that record in future, they will be able to do so with whatever device they are using at the time.
It’s important to understand that digital preservation is ongoing maintenance of the data, not a one-off task. This is essential for ensuring that data remains legible for as long as it needs to be retained.
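The repeatable nature of this process can be sketched as a planning step that compares each file against the current format policy. This is an illustrative sketch only: the class, the function and the policy table are hypothetical, "FutureFormat" is a made-up placeholder, and the actual format identification and conversion would be handled by a digital preservation tool.

```python
from dataclasses import dataclass, field

@dataclass
class ArchivedFile:
    name: str
    source_format: str
    # Formats in which a preservation copy already exists.
    preservation_copies: set[str] = field(default_factory=set)

def plan_normalisation(files, policy):
    """Return (file name, target format) pairs that still need a preservation copy."""
    plan = []
    for f in files:
        target = policy.get(f.source_format)
        if target is None or target == f.source_format:
            continue  # no recommendation, or the original is already in the target format
        if target not in f.preservation_copies:
            plan.append((f.name, target))
    return plan

# Hypothetical policy: the current best long-term format for each source format.
policy = {"Word": "PDF/A"}
files = [
    ArchivedFile("protocol.docx", "Word"),
    ArchivedFile("consent.docx", "Word", {"PDF/A"}),
]

print(plan_normalisation(files, policy))  # [('protocol.docx', 'PDF/A')]
```

If the policy later changes, for example to `{"Word": "FutureFormat"}`, running the same function again would list both files, because neither has a copy in the new format yet. The originals and earlier preservation copies are kept throughout.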

The Digital Preservation Coalition’s (DPC) Endangered Digital Species
Every two years, the DPC updates its global Bit List, detailing which data types and formats are at risk of becoming extinct or obsolete. In the most recent (2023) edition of the Bit List, commercial software is categorised as ‘Critically Endangered’.
This includes software such as SAP, Oracle, Adobe products and the Microsoft Office Suite.
Quoting directly from the report:
"The entry focuses on the distinct risks relating to the availability and access to software and code, and lack of preservation interest or mandate, by companies that publish them, creating challenges to preserve digital content and software in source code form."
This highlights the risks of not taking a proper approach to digital preservation, and the challenges organisations may face when trying to use records and data in future.
Consolidation and Access Management
We have already discussed the expansion of the number of eClinical systems in use across the clinical trial lifecycle today. During the live trial, it makes sense for the data to remain within these systems, but when the study is approaching its end, there is a strong argument to consolidate all retained records and data into a single, centralised repository (or as few repositories as possible).
The recent release of ICH E6 (R3) has sought to broaden which essential records should be retained by sponsors and investigators after a trial has closed. This only strengthens the argument to consolidate important records and data into a single long-term archive.
Let’s explore some of the additional benefits of consolidating long-term data into a single location:
- Active preservation: Both safeguarding and preservation require active maintenance of data to align with the ALCOA+ principles. This is much easier if the data is in one location and safeguarding and preservation activities do not have to be repeated or duplicated.
- Managed access: Access to a centralised repository is much easier to manage, whether for internal stakeholders or for inspectors during an audit. A centralised repository, with the appropriate regulatory tools, is arguably more inspection-ready than data retained in dispersed eClinical systems.
- Data portability and M&A: A centralised archive can ensure that the data is both accessible and portable. This includes avoiding vendor lock-in with various eClinical solutions. This can be extremely useful in the case of mergers and acquisitions, as trial data and records must be transferred to the new owner. Dispersed (or possibly lost) data could delay the acquisition or even put it at risk.
- Retention management: Managing retention periods can be challenging, particularly if data resides in multiple systems. Consolidating records into one system with flexible retention rules makes this process much simpler. This is particularly useful with more fluid retention regulations (e.g. EU MDR 2017/745, where the trial sponsor will not know the exact retention period at the conclusion of the trial).
- Business continuity: As retention periods extend over decades, it is highly unlikely that the organisational structure will stay the same. Even if staff are still within the organisation, it is highly likely they will have moved positions internally and no longer be responsible for the archive. This only strengthens the case for a single, centrally managed repository, with supporting SOPs. In this way, the management of the data can be easily passed on as staff move or leave.
- Long-term cost: In our experience, it is usually more cost-effective over the entire retention period to have all your data in one repository. This is especially the case if organisations have chosen to retain data within costly eClinical solutions past the end of the trial. In this case, ROI for centralising data would be achieved quickly.
It may seem like a simple recommendation, but any initial effort to centralise essential records and documents will easily see returns in the long run. Organisations will not only be compliant and inspection-ready for the entire retention period but also realise commercial benefits such as efficient M&A activities and reduced total cost of ownership (TCO). This combination of benefits is a powerful tool in building a business case for appropriate investment in digital preservation.
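The flexible retention rules mentioned in the list above can be illustrated with a rule whose start date is not yet known. This is a minimal sketch under stated assumptions: the class, the field names and the 25-year figure are hypothetical and do not represent any regulation’s actual requirement.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class RetentionRule:
    years: int
    # The event that starts the clock; None until that event has happened.
    trigger_date: Optional[date] = None

    def expiry(self) -> Optional[date]:
        """Return the end of the retention period, or None if it cannot be known yet."""
        if self.trigger_date is None:
            return None
        start = self.trigger_date
        try:
            return start.replace(year=start.year + self.years)
        except ValueError:
            # 29 February start date and a non-leap expiry year.
            return start.replace(year=start.year + self.years, day=28)

def may_dispose(rule: RetentionRule, today: date) -> bool:
    """Records may only be disposed of once a known expiry date has passed."""
    expiry = rule.expiry()
    return expiry is not None and today >= expiry

rule = RetentionRule(years=25)               # trigger event not yet known
print(may_dispose(rule, date(2100, 1, 1)))   # False: retention is open-ended for now
rule.trigger_date = date(2030, 6, 1)         # hypothetical trigger date, set later
print(rule.expiry())                         # 2055-06-01
```

The design choice here is that an unknown trigger date never permits disposal. Holding the rule in one central system means it only has to be updated once when the trigger event finally occurs.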

"...Our clinical trial data is kept secure and accessible over the archiving lifespan and allows confidence that it will be available in the event of a future authority inspection."
Inspection Readiness
Organisations' long-term digital archives must be inspection-ready. An inspection-ready archive can not only de-risk that part of an audit but also reduce the resources required to prepare the data before an inspection.
We have already discussed ensuring that records and data are readily accessible and that inspectors can be provided direct access. This section will explore additional steps to promote inspection readiness.
Audit trails and traceability
ICH E6 (R3) states that: "Essential records are used as part of the sponsor oversight or investigator supervision of the trial. These records are used by the sponsor’s independent audit function and during inspections by regulatory authority(ies) to assess the trial conduct and the reliability of the trial results."
Associated audit trails should be retained alongside records and data, and must be considered part of the essential records covered within this guide. Audit trails are essential in helping sponsors, investigators and inspectors assess trial conduct and the reliability of trial results (as stated above).
The retained audit trails should include those migrated from live systems, those covering the migration itself, and those covering the data during the archiving phase. Together these enable full traceability and provide evidence that data integrity both existed during the study and has been maintained ever since. It is important to ensure that these requirements are met when planning for end-of-study activities.
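One way to check this kind of end-to-end traceability is to confirm that every archived record has audit events from all three phases. The sketch below is illustrative only: the phase names, the event structure and the record identifiers are assumptions, and real audit trails carry far more detail (who, when, what changed and why).

```python
from dataclasses import dataclass
from enum import Enum

class Phase(Enum):
    LIVE_SYSTEM = "live system"
    MIGRATION = "migration"
    ARCHIVE = "archive"

@dataclass(frozen=True)
class AuditEvent:
    record_id: str
    phase: Phase
    action: str

def missing_phases(events, record_ids):
    """Map each record to the phases for which no audit event is present."""
    seen = {rid: set() for rid in record_ids}
    for event in events:
        if event.record_id in seen:
            seen[event.record_id].add(event.phase)
    return {
        rid: [phase for phase in Phase if phase not in phases]
        for rid, phases in seen.items()
        if len(phases) < len(Phase)
    }

# Hypothetical events for two records.
events = [
    AuditEvent("CRF-001", Phase.LIVE_SYSTEM, "edited"),
    AuditEvent("CRF-001", Phase.MIGRATION, "exported"),
    AuditEvent("CRF-001", Phase.ARCHIVE, "ingested"),
    AuditEvent("CRF-002", Phase.LIVE_SYSTEM, "created"),
]

# CRF-002 has no migration or archive events, so its traceability has a gap.
print(missing_phases(events, ["CRF-001", "CRF-002"]))
```

A check like this is most useful during end-of-study planning, when gaps can still be closed by exporting the missing audit trails from the live systems.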
Searchability
When managing such large volumes of data and records, it’s important to be able to find what you’re looking for quickly and easily. This consideration feeds into several ALCOA+ principles.
There is also an inspection readiness consideration. As regulators increasingly expect to be given direct access to records, it’s important that they are able to find what they need, when they need it.
We have experience with sponsors who have received a finding during an inspection due to the slow availability of records. Part of the delay was the time it took to locate the requested information.
Let’s explore how we can prepare data to ensure that records can be easily found, both now, and throughout the entire retention period.
The power of metadata
Metadata is an incredibly powerful tool which, if applied properly, will ensure that records can be easily found.
In short, metadata is data about the data. A simple example might be the date a file was created, or the last time a record was modified. Most data storage systems capture and retain basic metadata such as this.
A digital archive should provide users with the ability to create, retain and, most importantly, search against custom metadata fields. In the case of a clinical trial, a simple example would be a metadata field for the site.
Arkivum recommends preparing your metadata before uploading trial data and records into an archive, although it is possible to update it retrospectively within an appropriate system.
The use of metadata enables organisations to quickly find the right files when they need them.
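As a small illustration of searching against custom metadata, the sketch below filters records by exact matches on metadata fields. The record structure, the field names (`site`, `doc_type`) and the file names are hypothetical; a real archive would expose this through its own search interface and indexes.

```python
def search(records, **criteria):
    """Return the records whose metadata matches every given field exactly."""
    return [
        record
        for record in records
        if all(record["metadata"].get(key) == value for key, value in criteria.items())
    ]

# Hypothetical archive contents with a custom 'site' metadata field.
records = [
    {"name": "consent_form_014.pdf", "metadata": {"site": "Site 12", "doc_type": "consent"}},
    {"name": "monitoring_report_03.pdf", "metadata": {"site": "Site 12", "doc_type": "monitoring"}},
    {"name": "consent_form_022.pdf", "metadata": {"site": "Site 07", "doc_type": "consent"}},
]

matches = search(records, site="Site 12", doc_type="consent")
print([record["name"] for record in matches])  # ['consent_form_014.pdf']
```

An inspector asking for the consent forms from one site can be answered with a single query, provided the metadata was applied consistently when the records were uploaded.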