Data Integrity
This chapter will cover:
- ALCOA+ Definition
- Data Integrity
- Data Management Plans
- Risk
ALCOA+
The MHRA GxP Data Integrity Definitions and Guidance for Industry[1] includes the following definition and statement:
"Data Integrity is the degree to which data are complete, consistent, accurate, trustworthy, reliable and that these characteristics of the data are maintained throughout the data life cycle. The data should be collected and maintained in a secure manner, so that they are attributable, legible, contemporaneously recorded, original (or a true copy) and accurate."
Attributable, Legible, Contemporaneous, Original and Accurate are abbreviated as ALCOA.
Similar definitions exist in the data integrity guidelines from the EMA[2], FDA[3], ICH[4], PIC/S[5] and WHO[6]. The regulators, including the MHRA, have extended ALCOA to include additional concepts, notably that data should also be Complete, Consistent, Enduring and Available. Sometimes the list includes Traceable as an explicit requirement and sometimes this is included in the need for data to be Legible (see the table below).
The addition of Complete, Consistent, Enduring and Available to ALCOA is abbreviated as ALCOA-CCEA, but is much more commonly called ALCOA+ or sometimes ALCOA++.
Although not formally part of regulatory guidelines, the concepts of data being Credible and Corroborated are also encountered in the literature[7] around Data Integrity.
For example, the FDA and MHRA note[8] that “clinical trials must be of high quality, address important questions utilising study designs that are suited to the question being asked, and be well conducted so that study results will be credible”.
This reflects good practice, for example that processes should be in place to provide confidence that data can be trusted, believed, and relied upon. Where possible, data should be supported by additional information (e.g. witness statements, equipment validation certification) that confirms the data was generated, collected, processed, analysed, interpreted, and verified in accordance with regulatory requirements, best practice, and established protocols. If guidelines for Data Integrity and GDocP are followed, including the ALCOA+ principles, then achieving data that is Credible and Corroborated should be a natural outcome. These terms are useful to have in mind when checking that a chosen approach to Data Integrity will have the desirable result.
The ALCOA+ principles for Data Integrity, as defined by the EMA[9], are listed below.
Data Integrity: Scope
‘Data’ in the context of Data Integrity is more than just a set of computer files that result from a GXP activity.
Data includes:
- Original records and true copies (sometimes known as certified copies), including source data (sometimes called raw data)
- Metadata, which describes the records and how to interpret and understand them. Metadata provides context. This includes audit logs, timestamps and digital signatures.
- Any subsequent transformations on the data, for example when doing data migrations between systems or performing digital preservation actions.
- Any other associated context needed for the interpretation of the data and meaning of the data to be understood, for example documentation produced according to GDocP.
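Several of the elements above, such as timestamps, user attribution and verification of true copies, can be captured programmatically. As a minimal illustrative sketch (the file name, user name and record fields here are hypothetical, not taken from any guideline), an audit-trail entry supporting the Attributable, Contemporaneous and Original principles might look like:

```python
import hashlib
import json
from datetime import datetime, timezone

def sha256_of_file(path):
    """Compute a SHA-256 checksum, a common way to verify a 'true copy'."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def audit_entry(path, user, action):
    """Build a minimal audit-trail record: Attributable (user),
    Contemporaneous (UTC timestamp), Original (checksum of the record)."""
    return {
        "file": path,
        "user": user,                                         # Attributable
        "action": action,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # Contemporaneous
        "sha256": sha256_of_file(path),                       # Original / true copy
    }

# Usage: write a record, log its creation, then verify integrity later
with open("result.json", "w") as f:
    json.dump({"sample": "A42", "value": 3.14}, f)

entry = audit_entry("result.json", user="jsmith", action="created")
assert entry["sha256"] == sha256_of_file("result.json")  # file unchanged since logging
```

A real system would also protect the audit trail itself (e.g. append-only storage and digital signatures), since metadata is subject to the same integrity requirements as the data it describes.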
Data can be static, e.g. a set of PDF documents, or dynamic, e.g. records in a database that users can interact with and query. For dynamic data, it is important that the ability to interact with and interpret the data is maintained as far as possible, including during archiving.
Data Integrity, covering all of the above aspects of data, needs to be maintained for the full lifecycle of the data, including throughout the entire duration of the required retention / archiving period.
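To make the static/dynamic distinction concrete, here is a small sketch using Python's built-in sqlite3 module (the table, column and file names are illustrative only):

```python
import csv
import sqlite3

# Dynamic data: records in a database that can be queried interactively.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE results (sample TEXT, value REAL)")
con.executemany("INSERT INTO results VALUES (?, ?)",
                [("A1", 1.2), ("A2", 3.4)])

# New questions can be asked of dynamic data at any time.
high = con.execute("SELECT sample FROM results WHERE value > 2").fetchall()

# Static export: a flat CSV snapshot fixes the content but loses the
# ability to run new queries, which is why archiving should preserve a
# queryable form of dynamic data wherever possible.
with open("results_snapshot.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sample", "value"])
    writer.writerows(con.execute("SELECT sample, value FROM results"))
```

The snapshot satisfies Enduring and Legible, but only retaining the database (or a faithful, queryable export of it) keeps the dynamic character of the data through archiving.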
Data Integrity: Data Management Plans
Achieving long-term Data Integrity, for example when data is archived, involves more than just implementing the ALCOA+ principles. It depends on having a quality and risk-based approach, good data governance and data management, computer systems validation when using software systems, and proper oversight and management when using third-party solutions and providers.
It is good practice to describe how data will be managed and Data Integrity will be achieved in a Data Management Plan (DMP). The use of a DMP is explicitly mentioned in the FDA Guidance for Industry: Electronic Source Data in Clinical Investigations[10] and is implied in many of the guidelines and requirements for GCP, such as part of the documentation required in an eTMF. For example, the CDISC TMF Reference Model group has released a TMF Plan Template[11] that has a very similar function to a Data Management Plan.
The need for DMPs in the context of GXP guidelines and regulations is covered in detail by the Society for Clinical Data Management (SCDM) in their document on Data Management Plans[12].
The SCDM DMP guidelines state the following minimum standards for the creation, maintenance and implementation of Data Management Plans:
Data Integrity: Risk and Criticality, Impact and Consequences
Data Integrity is ultimately about creating and retaining quality data (and documents) that provide evidence of the safety, quality, and efficacy of drugs and of the protection of public health, e.g. the safety, rights, and well-being of those involved in clinical trials.
This is why regulators conduct inspections and want to see evidence of compliance with Data Integrity principles. For example, Data Integrity provides evidence of the quality and scientific rigor of a study; this in turn allows the impact on patient health and safety to be assessed – both for the participants in a study and those who might subsequently receive a drug or intervention that is marketed as a result.
The consequences of not maintaining Data Integrity from the perspective of an organisation responsible for the data, e.g. a Sponsor, include: rejection of marketing applications; removal of a drug from market; the need to repeat or conduct additional work; reputational damage; or monetary penalties such as fines.
If Data Integrity cannot be demonstrated, for example when reconstructing a study, then questions will be raised about the quality of the study, whether the study design and its execution were appropriate, and the credibility of the results. This in turn raises issues of the safety of study subjects and wider issues such as whether it is ethical to involve subjects in a study that cannot be demonstrated to have been done to a high standard of quality.
The industry and regulators are moving to a risk-based approach in which the measures needed to ensure Data Integrity should be proportionate to the impact and consequences of failing to achieve it, such as effects on the health and safety of patients.
This can be seen in the regulators' Data Integrity guidelines, an area of active development and evolution: for example, draft guidelines from the WHO in 2020[13] and the EMA in 2021[14], the ongoing collaboration between the FDA and MHRA in workshops on Data Integrity in 2020[15] and 2021[16], and the new version of GAMP 5 released in 2022[17], to name but some.
What is clear is that there needs to be a quality-by-design and risk-based approach in which the criticality of the data and the need for Data Integrity can be matched to the approach taken by an organisation when archiving that data and employing LTDP good practice. This is why this report uses maturity models for LTDP, so that an appropriate level can be selected and matched to the criticality of the data that needs to be archived and preserved by an organisation.
An organisation can perform a risk-based assessment of their data in line with guidance from the regulators and then match this to an appropriate level of LTDP.
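Such an assessment can be as simple as scoring a few risk factors and taking the highest score as the overall risk class. The sketch below is purely hypothetical: the factor names, weights, thresholds and LTDP level names are illustrative and not drawn from any regulatory guideline or maturity model.

```python
# Illustrative LTDP maturity levels (names are hypothetical).
LTDP_LEVELS = [
    "basic bit preservation",
    "managed archiving",
    "full digital preservation with format migration",
]

def assess(criticality, regulatory_impact, retention_years):
    """Score each factor 1-3 and take the highest as the overall risk class."""
    scores = [
        criticality,        # e.g. pivotal trial data = 3
        regulatory_impact,  # e.g. supports a marketing application = 3
        1 if retention_years <= 5 else (2 if retention_years <= 15 else 3),
    ]
    return max(scores)

def ltdp_level(criticality, regulatory_impact, retention_years):
    """Match the overall risk class to an LTDP maturity level."""
    risk = assess(criticality, regulatory_impact, retention_years)
    return LTDP_LEVELS[risk - 1]

# Usage: long-retention, high-criticality data demands the highest level.
print(ltdp_level(criticality=3, regulatory_impact=3, retention_years=25))
```

Taking the maximum (rather than an average) reflects the proportionality principle above: a single high-impact factor is enough to warrant stronger preservation measures.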