Technical Archiving Solution Requirements
This section covers the requirements for a technical digital archiving and preservation solution. It focuses on ensuring that digital content stored within the archive is safeguarded and preserved for as long as it is required, which includes ensuring that records remain findable, accessible and readable over the long term.
Upload Capabilities
Migrating and uploading your data into the archive can be a challenging process. It is important that the selected solution has the technical capability to ensure healthy data is added to the archive. Additionally it can be helpful to understand if your selected 3rd party has expertise you can leverage to make this process as easy and risk-free as possible.
Does the system test for viruses that could place digital content at risk upon processing or uploading data into the system?
Viruses compromise the integrity of data, and therefore it is imperative to avoid this risk throughout the process of transferring data into an archive. Using a system that tests for viruses as data is being transferred into the archive minimises the risk of viruses impacting your archived data.
Does the system check for data integrity issues on transfer to the archive?
It’s important to ensure that healthy data is added into the archive. Uploading bad data to the archive will result in bad data out. An archiving solution should automatically check every record for integrity issues on upload.
Does the system generate reports on the upload process and flag any upload issues?
It’s important to be able to track the progress of transferring data into the archive and to flag any issues as soon as possible, so that they can be resolved quickly.
Can you upload large volumes of digital content and metadata at once?
Systems can struggle with both large data volumes (e.g. multiple gigabytes or terabytes) and large numbers of records (e.g. a TMF can consist of potentially hundreds of thousands of documents). It’s important to have a system which can manage both with ease, reducing the time and effort required to upload the data. This ultimately reduces the risk of issues caused by multiple smaller uploads over a longer period of time.
Do you offer assistance in preparing data for transferring into the archive?
Transferring data into an archive can be overwhelming. Having the support of a dedicated team, experienced in migrating data from stakeholders into the archive, can be highly beneficial. This streamlines the process, reduces the likelihood of errors and provides guidance on how to handle future transfers.
Searchability and Metadata
For records to be findable long into the future, it’s important to have a system which supports a broad range of metadata*. Attributing metadata to each record enables both searches for specific records and broader discovery by individuals now and in the future.
*Metadata is the contextual information related to the document, and can include information such as the creator, the time of creation, where it was created and more. This information attributed to records is crucial for understanding what the document entails and improving searchability of documents.
Can you attribute custom or bespoke metadata to your records?
It’s important that the archive solution can fit with your metadata structure, whether this is an industry standard, or something more bespoke. This will make it significantly simpler to find records in future.
Do you support any metadata standards?
If you don’t have any specific standards that you use, it might make sense to use an existing industry metadata standard like PCDM or Dublin Core.
Can you search against metadata within the archive?
Attributing metadata to records is one thing, but it is also important to be able to search against that metadata within the archive itself. If not, it can be extremely slow and difficult to search for and access records in future.
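As an illustration only, searching against metadata can be sketched in a few lines of Python. The ArchivedRecord class, the search function and the sample values are hypothetical and do not describe any particular vendor's system; the metadata keys borrow Dublin Core element names ("title", "creator", "date"), with "study_id" standing in for a custom field.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivedRecord:
    record_id: str
    # Keys such as "title", "creator" and "date" follow Dublin Core element
    # names; any other key is treated as custom metadata.
    metadata: dict[str, str] = field(default_factory=dict)


def search(records: list[ArchivedRecord], **criteria: str) -> list[ArchivedRecord]:
    """Return records whose metadata matches every criterion.

    Matching is a case-insensitive substring test (an assumption made for
    this sketch; a real archive would typically use an index).
    """
    def matches(record: ArchivedRecord) -> bool:
        return all(
            value.lower() in record.metadata.get(key, "").lower()
            for key, value in criteria.items()
        )

    return [record for record in records if matches(record)]


# Illustrative sample data, not taken from a real archive.
records = [
    ArchivedRecord("rec-001", {"title": "Site Monitoring Report",
                               "creator": "A. Example",
                               "date": "2021-03-04",
                               "study_id": "STUDY-01"}),
    ArchivedRecord("rec-002", {"title": "Protocol Amendment",
                               "creator": "B. Example",
                               "date": "2022-07-19",
                               "study_id": "STUDY-02"}),
]

hits = search(records, title="report", study_id="STUDY-01")
print([record.record_id for record in hits])
```

A record with no metadata at all can never match a criterion in this sketch, which is one practical reason why uploading data without metadata makes it harder to find later.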
Can you update metadata within the system?
A flexible and intuitive archive solution should provide customers with the ability, if required, to edit the metadata of files from within the system.
Do you offer support for customers to map metadata correctly into the archive?
Maintaining the metadata when migrating data from an existing system into an archive can be a complex process. Working with a third party experienced in mapping metadata across systems can make this process simpler and more cost-effective.
Does your system assess the digital content and attribute relevant metadata?
Sometimes when digital content has no attached metadata, it is beneficial for there to be a process whereby metadata is created. Some dedicated archives can attribute metadata to digital content transferred into the archive.
Can you upload data without metadata?
With potentially hundreds of thousands of documents, it can be difficult to keep track of every single one or to apply metadata to each of them. While we would not recommend uploading data without metadata into an archive (mainly for accessibility reasons), sometimes it may be necessary.
Can the metadata and/or digital content generated by your system be harvested by other systems or tools?
By “harvested” we mean the ability to extract or export metadata that has been generated within the vendor solution. If data needs to be transferred into another system, the metadata generated by the existing system will need to stay with the data.
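One possible shape for such an export is sketched below, assuming metadata is exchanged as JSON Lines (one JSON object per line). The function names and the record fields are hypothetical; an actual solution may use a different export format or protocol.

```python
import json


def export_metadata(records: list[dict]) -> str:
    """Serialise metadata records as JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


def import_metadata(exported: str) -> list[dict]:
    """Read JSON Lines back into a list of metadata records."""
    return [json.loads(line) for line in exported.splitlines() if line.strip()]


# Illustrative records: the identifier ties each metadata entry to its file.
generated = [
    {"record_id": "rec-001", "format": "application/pdf", "title": "Site Monitoring Report"},
    {"record_id": "rec-002", "format": "image/tiff", "title": "Signed Consent Form"},
]

exported = export_metadata(generated)
print(exported)
```

Keeping a stable identifier in every exported entry is what allows the receiving system to reattach the metadata to the right file.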
Safeguarding Capability
Over many years, archived data can become lost or corrupted. The process of safeguarding data can mitigate against the risks of this occurring. This section will cover the technical requirements of a solution to safeguard your data.
Does the system generate checksums for each file?
It is beneficial for both the owner of the data and the archive solution to produce checksums on the data. A checksum is a mathematically calculated string of characters derived from the content of a file, so any change to the file, however small, produces a different checksum. Checksums should be verified before and after transferring the data into the archive to confirm that its integrity remains intact, which makes them crucial for verifying data integrity.
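As a minimal sketch, generating a checksum and comparing the values taken before and after a transfer could look like the following in Python. SHA-256 is used here as an assumption; the source does not specify an algorithm, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(source: Path, archived_copy: Path) -> bool:
    """Compare the checksum taken before transfer with the one taken after."""
    return sha256_of_file(source) == sha256_of_file(archived_copy)
```

In practice the checksum calculated by the data owner before transfer would be sent alongside the file, and the archive would recalculate it on receipt rather than reading the source file again.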
Are checksums tested regularly?
To support long-term digital preservation, checksums should be tested regularly (ideally at least annually) to confirm that the data has not been compromised.
Does the system notify admin when the integrity is compromised?
If data integrity is compromised, the issue needs to be actioned promptly. A system that notifies the admin of any compromise allows the issue to be resolved quickly.
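A periodic check of this kind, together with a notification, can be sketched as follows. This assumes a manifest that maps each file's relative path to the SHA-256 checksum recorded at upload; the manifest layout, the function names and the notification hook are all hypothetical, and a real system would send an email or raise an alert rather than return a string.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_fixity(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the relative paths that are missing or no longer match the manifest."""
    failures = []
    for relative_path, expected in sorted(manifest.items()):
        target = root / relative_path
        if not target.is_file() or sha256_of_file(target) != expected:
            failures.append(relative_path)
    return failures


def notify_admin(failures: list[str]) -> str:
    """Hypothetical notification hook: build the message an admin would receive."""
    if not failures:
        return "Fixity audit passed: no integrity issues found."
    return "Fixity audit FAILED for: " + ", ".join(failures)
```

Running audit_fixity on a schedule (for example annually, as suggested above) and passing its result to notify_admin covers both the regular testing and the alerting described in these two questions.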
Is data replicated and stored in multiple geographical locations?
Storing data in multiple geographical locations mitigates risk of losing the data for any reason. If data is lost or becomes corrupted, the copy of the data stored in another location can replace it. An important consideration here will be whether a copy of the data is held in Escrow on a different cloud provider (see QMS section for the question).
Will archived data always be immutable?
This is an important question where data needs to remain unchanged for long periods of time. The answer should always be yes, with changes possible only where they are explicitly authorised.
To what extent does the system comply with ALCOA++ principles?
The ALCOA++ principles are important for ensuring the integrity of data. It can be useful to ask a potential third-party supplier to what extent their approach to long-term retention aligns with the principles (and how much, if at all, they have considered this). A system that complies with these principles provides evidence that data integrity is a priority.
Preservation Capability
Digital preservation is the process of ensuring that digital content remains readable, legible and usable, regardless of future device or software, for as long as required. This helps to mitigate against the risk of future hardware and software obsolescence.
Digital preservation is critical to ensure that organisations are compliant with regulations such as EU Reg 536 where legibility is a requirement.
Can the system automatically create long-term preservation copies of each record?
One of the main principles of digital preservation is to maintain long-term preservation copies of each record where possible. For example, an old WordPerfect file can be converted into a PDF/A, which has a much greater chance of being opened and read on a modern device. A digital preservation solution can automatically maintain these copies in line with current and future requirements.
Does the system provide the option to create preservation copies of data already uploaded to the archive?
It is important to understand how flexible the preservation solution is: can you create preservation copies of records after upload, or must this be done during the upload process? If the system is not capable of preserving records already within the archive, this can create extra work when maintaining copies over the entire retention period.
Can I see the difference between the original and the preservation copy within the archive system?
When it is crucial to keep original copies of records, it is important to be able to tell within the system which is which.
How many (or what) formats does the system support?
Depending on your dataset, it’s important to understand what formats you currently have, and what formats the proposed system can support. Each format is different in terms of its risk profile (i.e. how likely is it to become obsolete in the near or long-term) and how important it is to preserve that record in the given format.
How does the solution/provider handle formats that are not supported?
It is unlikely that every format will be supported by a preservation system, given how many are in existence. Make sure you work out with your provider what is supported, and whether something like software preservation needs to be considered for more bespoke requirements.
Does the system generate a report on preservation activities?
The system should be able to provide a report on how any preservation activities have performed. This report could include which files have been preserved, whether the process was successful and what format they were copied to.
Storage Requirements
It is important to understand your storage requirements at the start of the project, and where possible plan for future needs. From this you can find a third-party provider who will be able to scale appropriately with those needs.
Can you increase (or decrease) your storage capacity at any point?
One of the key benefits of cloud storage is the ability to scale the solution to meet your needs. It’s important to understand what your requirement is now, what it might be in future, and how the prospective solution can scale to meet those needs.
What happens if we accidentally go over our agreed amount?
It’s important to understand what happens if you accidentally use more data than agreed to avoid incurring additional costs.
How do your pricing packages change if we need to increase our storage capacity?
It’s important that there is an element of predictability regarding costs. Knowing the exact fees that can be incurred when increasing storage helps you avoid being blindsided by unexpected costs.
Additional Capabilities
In this section we’ve included additional capability which you may want included within the solution.
How quickly can data be retrieved from the system?
Historically, records could take weeks to be retrieved from storage. One of the main benefits of a digital archive is that records can be retrieved and accessed within minutes.
Can files be accessed from within the solution?
Viewing the files in the solution takes less time than having to download a copy of the file stored on the solution in order to access it. Some solutions offer PDF viewers to quickly display the stored content.
Can the archive store all types of digital content? (Images, videos, sounds, documents)
Digital content comes in many forms. Ensure that the solution you choose can store the formats you need to archive now and those you may need to archive in the future.
Does the system have retention policies or rules that can be assigned to datasets?
Various datasets must be retained for a certain period of time in line with regulations (e.g. EU Regulation 536 stipulates at least 25 years for the eTMF). Retention policies can make the process of managing multiple retention periods much easier within the system.
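To illustrate how a retention rule might be evaluated, here is a small Python sketch. The RetentionPolicy class and its method names are hypothetical, and the event from which the period is counted (the start date) is left to the caller because it depends on the applicable regulation. The 25-year value mirrors the eTMF example above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    name: str
    years: int

    def expiry_date(self, start: date) -> date:
        """Return the first date on which the retention period has elapsed."""
        try:
            return start.replace(year=start.year + self.years)
        except ValueError:
            # The start date is 29 February and the target year is not a leap
            # year: use 1 March so the period is never shortened.
            return date(start.year + self.years, 3, 1)

    def can_dispose(self, start: date, today: date) -> bool:
        """True once the retention period has fully elapsed."""
        return today >= self.expiry_date(start)


etmf_policy = RetentionPolicy(name="eTMF", years=25)
print(etmf_policy.expiry_date(date(2020, 6, 15)))
```

Assigning one policy object to a whole dataset, rather than setting a date on each record, is what keeps multiple retention periods manageable.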
What other reports are available from the system?
It is worthwhile to understand all the reporting information available to users.