Biospecimen Resource Informatics: Data Management and Inventory Control and Tracking
Driven by the scale of data generated by the cutting-edge -omics technologies, informatics systems have become critical to the research enterprise. A minimum set of functional, operational, and legal requirements should be considered best practices (as outlined in this document) and should be incorporated when developing or selecting informatics systems to support biospecimen resources.
B.6.1. Functionality — General
B.6.1.1. Data types
At the biospecimen resource level, informatics systems should be focused on recording data types as described in Section B.5. This includes inventory functions, tracking all phases of biospecimen acquisition, processing, handling, QA/QC, biospecimen quality measurements (such as RNA Integrity Numbers), and distribution from the collection site (research participant) to utilization (researcher).
B.6.1.2. Identifiers
Each biospecimen should have a unique ID assigned to it in the system. The informatics system should have the capability of linking the labels on the physical biospecimen container (e.g., paper labels or barcodes) to other information regarding that biospecimen in the system.
B.6.1.3. Association of biospecimen data with clinical data
Informatics systems should track clinical data associated with a biospecimen and/or link biospecimen data with external sources of clinical data, where applicable.
B.6.1.4. Security
Biospecimen resource informatics systems should provide role- and project-based access control to system functionality and data. The role-based access control (RBAC) should support at least the flat National Institute of Standards and Technology (NIST) level, and preferably the hierarchical level as well [65]. Project-based security should implement a separate RBAC for access to data based on project/study/protocol privileges.
Biospecimen resource informatics systems that store protected health information (PHI)/personally identifiable information (PII) should adhere to all security regulations for such data (e.g., HIPAA, HITECH). These systems should also meet the NIST criteria for data stored and accessed at the FISMA moderate level [66].
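As an illustration only, hierarchical RBAC combined with project-based scoping might be sketched as follows in Python. The role names, permission strings, and the `Role` and `AccessControl` classes are hypothetical; they are not drawn from the NIST model text or from any existing product.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """A role with its own permissions and optional junior roles it inherits from."""
    name: str
    permissions: set[str] = field(default_factory=set)
    juniors: list["Role"] = field(default_factory=list)

    def all_permissions(self) -> set[str]:
        # Hierarchical RBAC: a senior role inherits every permission of its juniors.
        perms = set(self.permissions)
        for junior in self.juniors:
            perms |= junior.all_permissions()
        return perms


class AccessControl:
    """Roles are assigned per (user, project), so privileges do not carry across studies."""

    def __init__(self) -> None:
        self._roles: dict[str, Role] = {}
        self._assignments: dict[tuple[str, str], set[str]] = {}

    def add_role(self, role: Role) -> None:
        self._roles[role.name] = role

    def assign(self, user: str, project: str, role_name: str) -> None:
        if role_name not in self._roles:
            raise KeyError(f"unknown role: {role_name}")
        self._assignments.setdefault((user, project), set()).add(role_name)

    def is_allowed(self, user: str, project: str, permission: str) -> bool:
        role_names = self._assignments.get((user, project), set())
        return any(permission in self._roles[r].all_permissions() for r in role_names)


# Hypothetical roles and permissions, for illustration only.
technician = Role("technician", {"specimen:read", "specimen:update_location"})
manager = Role("manager", {"specimen:distribute"}, juniors=[technician])

acl = AccessControl()
acl.add_role(technician)
acl.add_role(manager)
acl.assign("alice", "STUDY-A", "manager")
acl.assign("bob", "STUDY-A", "technician")
```

In this sketch the manager role inherits the technician permissions, and a user with no assignment in a given project is denied access to that project's data regardless of roles held elsewhere.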
B.6.1.5. Data Access Logs
Biospecimen resource informatics systems should provide vital system statistics and audit logs of all access to PII/PHI in the database.
B.6.2. Functionality — Identification and Tracking of Biospecimens
B.6.2.1. Standard Definitions
For informatics purposes, a biospecimen refers to a physically distinct specimen usually stored in a single container. Multiple physical parts created by extraction, division into aliquots, or other physical division of a biospecimen are considered new biospecimens, each requiring a new identifier. They are referred to in this document as samples and are sometimes referred to elsewhere as derived (or child) samples. The origin of each sample should be recorded. Biospecimen resources should define standard terms for all lineages of biospecimens, from initial collection to subsequent divisions and extractions. Biospecimen resources should employ an existing standard terminology or modify an existing standard to harmonize data elements for semantic interoperability.
B.6.2.2. Unique Identifiers and Labels
There is a functional need either to assign a globally unique identifier (GUID) to each biospecimen or to employ a method that maintains the integrity of its original identifier. Researchers need to verify and trace back to the original source biospecimen when its associated aliquots/derivatives are used. In addition, as biospecimens and derived samples are shared among biospecimen resources, resolving QC questions relies on having a globally unique identifier to ease traceability (see Section B.2.6.4).
Each biospecimen should be assigned a unique identifier or combination of identifiers, such as a number and/or barcode, which should not be reflective of its identity (e.g., current storage location, clinical data, or patient identifiers). This recommendation is most applicable to future biospecimen collections because implementation in existing collections would be laborious. In this context, identifiers need to be unique only within an individual system and the biospecimen resources it supports; however, if a global identifier can be assigned, it should be used wherever possible.
For all biospecimens, labels should be printed in both machine-readable and human-readable formats. The label should link back to the inventory management software.
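A minimal sketch of identifier assignment and lineage tracing is given below, assuming Python and its standard `uuid` module. The `Biospecimen` and `Registry` classes and the specimen type strings are hypothetical. Random (version 4) UUIDs are used here as one possible way to obtain identifiers that carry no location, clinical, or patient information.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Biospecimen:
    specimen_type: str
    parent_id: Optional[str] = None  # None for an originally collected biospecimen
    # uuid4 is random, so the identifier encodes no location, clinical, or patient data.
    specimen_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class Registry:
    def __init__(self) -> None:
        self._by_id: dict[str, Biospecimen] = {}

    def register(self, specimen_type: str, parent: Optional[Biospecimen] = None) -> Biospecimen:
        # Every aliquot or derivative gets a new identifier and records its origin.
        specimen = Biospecimen(specimen_type, parent.specimen_id if parent else None)
        self._by_id[specimen.specimen_id] = specimen
        return specimen

    def lineage(self, specimen_id: str) -> list[Biospecimen]:
        """Return the chain from the given sample back to the original biospecimen."""
        chain = []
        current: Optional[Biospecimen] = self._by_id[specimen_id]
        while current is not None:
            chain.append(current)
            current = self._by_id.get(current.parent_id) if current.parent_id else None
        return chain


registry = Registry()
blood = registry.register("whole blood")
plasma = registry.register("plasma aliquot", parent=blood)
rna = registry.register("RNA extract", parent=plasma)
origin = registry.lineage(rna.specimen_id)[-1]  # the source whole-blood biospecimen
```

The same identifier string could be printed on the label both as a barcode and as text, which is how the label would link back to the record in this sketch.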
B.6.2.3. Tracking Significant Events
The informatics system should be able to track a biospecimen through significant events from collection through freezing/thawing, processing, storage, distribution, and possible destruction. This includes tracking of amount distributed and amount remaining of partially-used biospecimens. Restocking of returned, unused samples from the researcher — while not recommended because of potential effects of unknown handling on sample quality — should also be tracked. Tracking includes cross-referencing multiple, pre-existing, and/or external physical biospecimen identifiers, such as barcodes with non-identifying information. Any data about the sample being compromised should be noted and available to the user.
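One possible shape for such event tracking is sketched below in Python. The event names, the `TrackedSpecimen` class, and the use of millilitres as the unit are assumptions made for illustration; an actual system would define its own event vocabulary and units.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class Event:
    kind: str               # e.g. "collected", "thawed", "distributed", "restocked"
    amount_change: Decimal  # negative when material leaves, positive when it is returned
    note: str
    timestamp: datetime


@dataclass
class TrackedSpecimen:
    specimen_id: str
    initial_amount: Decimal
    events: list[Event] = field(default_factory=list)
    compromised: bool = False

    def remaining(self) -> Decimal:
        return self.initial_amount + sum((e.amount_change for e in self.events), Decimal("0"))

    def record(self, kind: str, amount_change: Decimal = Decimal("0"),
               note: str = "", compromised: bool = False) -> None:
        if self.remaining() + amount_change < 0:
            raise ValueError("cannot distribute more than the amount remaining")
        self.events.append(Event(kind, amount_change, note, datetime.now(timezone.utc)))
        # Once a specimen is flagged as compromised, the flag stays visible to users.
        self.compromised = self.compromised or compromised

    def freeze_thaw_cycles(self) -> int:
        return sum(1 for e in self.events if e.kind == "thawed")


# Hypothetical serum specimen of 2.0 mL.
serum = TrackedSpecimen("S-0001", Decimal("2.0"))
serum.record("thawed")
serum.record("distributed", Decimal("-0.5"), note="sent to researcher")
serum.record("restocked", Decimal("0.2"), note="returned unused; handling unknown", compromised=True)
```

Here the amount remaining is derived from the event history rather than stored separately, so the distributed and remaining amounts cannot drift apart.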
B.6.2.4. Position Identification and Updates to Location
The biospecimen resource database should be updated each time a biospecimen or sample is moved within or out of the biospecimen resource, and the informatics system should be able to track the location changes of the sample. The database must be able to identify each position in storage (i.e., the position in the box, the box, the rack, and the freezer). Different storage configurations should be supported (e.g., upright and chest freezers, LN2 tanks, straws).
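The following Python sketch shows one way position identification and location updates could be modeled. The four-level `Position` (unit, rack, box, slot) and the `LocationIndex` class are assumptions; other storage configurations may need more or fewer levels.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Position:
    unit: str  # freezer, LN2 tank, or other storage unit
    rack: str
    box: str
    slot: str  # position within the box, e.g. "A1"


class LocationIndex:
    def __init__(self) -> None:
        self._current: dict[str, Position] = {}   # specimen_id -> position
        self._occupant: dict[Position, str] = {}  # position -> specimen_id
        self.history: list[tuple[str, Optional[Position], Optional[Position], datetime]] = []

    def move(self, specimen_id: str, to: Optional[Position]) -> None:
        """Move a specimen; to=None records that it left the resource."""
        if to is not None and self._occupant.get(to, specimen_id) != specimen_id:
            raise ValueError(f"{to} is already occupied")
        origin = self._current.pop(specimen_id, None)
        if origin is not None:
            del self._occupant[origin]
        if to is not None:
            self._current[specimen_id] = to
            self._occupant[to] = specimen_id
        # Every move is appended to the history, so location changes can be traced.
        self.history.append((specimen_id, origin, to, datetime.now(timezone.utc)))

    def locate(self, specimen_id: str) -> Optional[Position]:
        return self._current.get(specimen_id)


# Hypothetical storage units and specimen identifiers.
index = LocationIndex()
a1 = Position("Freezer-03", "Rack-2", "Box-17", "A1")
b4 = Position("LN2-Tank-1", "Rack-1", "Box-02", "B4")
index.move("S-0001", a1)
index.move("S-0001", b4)
index.move("S-0002", a1)
```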
B.6.2.5. Query Capability
The biospecimen resource database should provide full query capability throughout the system.
B.6.2.6. Audit Trail
The biospecimen resource database should provide audit trail capability in order to track all changes made to the data, including but not limited to all specimen data, system metadata, and clinical data. The computer-generated and automatic reports should include: original data and new data; date and time changed; how the change was made; who made the changes; why the changes were made.
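A minimal sketch of an audit record carrying the elements listed above (original and new data, date and time, how, who, and why) might look like the following in Python. The `AuditEntry` and `AuditedStore` names and the field names are hypothetical, and the in-memory list stands in for whatever protected storage a real system would use.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AuditEntry:
    record_id: str
    field_name: str
    old_value: Any
    new_value: Any
    changed_at: datetime
    method: str  # how the change was made, e.g. "web form", "API", "bulk import"
    user: str    # who made the change
    reason: str  # why the change was made


class AuditedStore:
    def __init__(self) -> None:
        self._records: dict[str, dict[str, Any]] = {}
        self._trail: list[AuditEntry] = []

    def update(self, record_id: str, field_name: str, new_value: Any, *,
               user: str, method: str, reason: str) -> None:
        if not reason.strip():
            raise ValueError("a reason is required for every change")
        record = self._records.setdefault(record_id, {})
        entry = AuditEntry(record_id, field_name, record.get(field_name), new_value,
                           datetime.now(timezone.utc), method, user, reason)
        record[field_name] = new_value
        self._trail.append(entry)  # entries are only ever appended, never edited

    def trail(self, record_id: str) -> tuple[AuditEntry, ...]:
        return tuple(e for e in self._trail if e.record_id == record_id)


# Hypothetical specimen record and users.
store = AuditedStore()
store.update("S-0001", "volume_ml", 2.0, user="alice", method="web form", reason="initial entry")
store.update("S-0001", "volume_ml", 1.5, user="bob", method="API", reason="0.5 mL distributed")
history = store.trail("S-0001")
```

Because the entry is generated inside `update`, a change cannot be made in this sketch without the corresponding audit record being written.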
B.6.2.7. Annotation
Since a repository may track samples of many different studies or from different collections, consideration should be given to what the inventory management database can contain and what should be stored in an external database and linked to the inventory via a unique identification number (UID). In the case of human specimens, consideration should be given to storing confidential patient clinical information separately from inventory data such as sample information and location.
The informatics system may also be designed to handle digitally-scanned documents related to the sample. Relevant documents may include pathology reports, clinical lab reports, donor consent forms, material transfer agreements or necessary permit documentation.
B.6.3. Interoperability
B.6.3.1. General
Although biospecimen resources may have different informatics requirements based on workflow that require different informatics systems, these systems should be interoperable to integrate clinical and research data and establish distributed biospecimen resources. This interoperability should enable integration with local systems and authorized external systems.
B.6.3.2. Standards
The informatics systems should utilize data elements from a common metadata repository. Even if the systems utilize non-standard data elements for storage internally, the system design should allow for configurable translations to one or more established standards.
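A configurable translation of this kind could be as simple as a mapping table held outside the application logic, as in the Python sketch below. The local field names, the standard element names, and the coded values are invented for illustration; real names would come from the chosen metadata repository.

```python
from typing import Optional

# Hypothetical configuration: local field -> (standard element, optional value map).
LOCAL_TO_STANDARD: dict[str, tuple[str, Optional[dict[str, str]]]] = {
    "spec_type": ("specimen_type", {"WB": "Whole Blood", "PL": "Plasma"}),
    "coll_dt": ("collection_date", None),
}


def translate(record: dict[str, str],
              mapping: dict[str, tuple[str, Optional[dict[str, str]]]] = LOCAL_TO_STANDARD
              ) -> dict[str, str]:
    """Translate a locally stored record into standard data elements."""
    translated = {}
    for local_name, value in record.items():
        if local_name not in mapping:
            # Fail loudly so that gaps in the mapping are noticed rather than silently dropped.
            raise KeyError(f"no standard element configured for {local_name!r}")
        standard_name, value_map = mapping[local_name]
        translated[standard_name] = value_map[value] if value_map else value
    return translated


exported = translate({"spec_type": "WB", "coll_dt": "2015-03-01"})
```

Supporting a second standard would then be a matter of supplying another mapping table rather than changing the stored data.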
B.6.3.3. Regulation
Integration with clinical data systems should conform to HIPAA, HITECH, and other regulations and laws as applicable to the system's purpose, scope, and jurisdiction.
B.6.3.4. Interface
Data systems should provide a published application programming interface (API) for other systems to interact with. Changes to this interface should remain backwards compatible as much as possible in order to minimize disruption for connecting systems. The API implementation should include both automated conformance and interoperability testing to ensure robustness.
B.6.3.5. Security
Interoperability APIs should support a security layer at least as secure as other system interfaces. The API should enforce all business and security rules on connecting systems. Evaluation of the system's API should be measured against National Institute of Standards and Technology (NIST) guidelines, i.e., the NIST Special Publication 800-30, Guide for Conducting Risk Assessments [56].
B.6.3.6. Data Sharing
Biospecimen resource informatics management systems should be capable of sharing appropriate, de-identified biospecimen data with users at remote locations for multiple purposes, including satisfying reporting and regulatory requirements as well as searching for potential biospecimens for a proposed scientific study. The NIH has developed the NCI Specimen Resource Locator [67], a tool for interoperability that aids biospecimen resources in distributing and locating biospecimens.
B.6.3.7. Results Data
If the results data is stored in the biobank information management system, then it must adhere to all of the criteria listed above.
B.6.4. Selection of Biospecimen Resource Informatics Management Systems
B.6.4.1. Organizational Requirements
Biospecimen resources should engage all stakeholders (IT office, clinicians, researchers, etc.) in the requirements gathering phase to identify system features and functionality. The organizational requirements for a tracking system should reflect the needs of all users and should comply with data protection policy. Use case scenarios are a recommended tool to document the needs of all users.
B.6.4.2. Technical Requirements
Biospecimen resources should identify the minimum set of requirements such as:
- Computing platforms
- Scalability requirements
- Performance requirements
- Connectivity requirements
Common requirements to gather and evaluate are: biospecimen tracking, biospecimen processing and history, data entry, data verification, querying and reporting, label printing/scanning, audit trails, interoperability, security, scalability, validation and implementation requirements, infrastructure requirements, IT support requirements, number of users, cost for purchase and maintenance.
B.6.4.3. Information Management Systems Evaluations
Biospecimen resources should use criteria identified above to judge mature and commercially available systems, taking into account the specific organizational and technical requirements. It is critical that the original stakeholders are involved at all phases of the evaluation process.
As part of the evaluations, an assessment of the system provider must be performed for their capability to provide implementation, support, and ongoing maintenance.
B.6.4.4. Build versus Buy System
This is a complex question with many considerations, including resources, personnel, schedules, budgets, politics, and organizational bias. Building a customized system allows the biospecimen resource to have an interface that exactly meets its operational requirements and workflow, but requires resources, funding, and a commitment to ongoing maintenance. Purchasing a system allows the biospecimen resource to take advantage of existing technology at a reduced cost and with a shorter implementation timeline, but with an interface that does not precisely map to the original needs. There is no standard answer to this question; individual biospecimen resources must review the system requirements and make a strategic decision on the best path forward for the organization.
B.6.5. Validation and Operation of Biospecimen Resource Informatics Systems
B.6.5.1. Dependability
Biospecimen resource informatics management systems should have an operational infrastructure that supports access 24 hours a day, 7 days a week.
B.6.5.2. Disaster Recovery
Biospecimen resource informatics management systems should have processes defined and in place to cope with system downtimes and disaster recovery. System backups and restores should be tested on a regular basis to ensure the quality of the backup media and the restore process. All data stored outside the system should be encrypted to secure PHI/PII.
B.6.5.3. Quality Control
Biospecimen resource informatics management systems should be periodically evaluated to ensure that the system is fulfilling the criteria advised in best practices and the latest needs of the biospecimen resource. Random quality control checks should be performed on the physical inventory, confirming that the physical location of stored biospecimens matches that provided in the informatics system. All system tools and methods should be validated to ensure their accuracy in performing their intended tasks.
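The random inventory check described above could be supported by a small reconciliation routine such as the Python sketch below. The position strings, specimen identifiers, and function names are hypothetical; the scan results would in practice come from a barcode reader or manual check.

```python
import random
from typing import Optional


def pick_positions(expected: dict[str, str], k: int, seed: Optional[int] = None) -> list[str]:
    """Randomly choose up to k storage positions to check by hand or scanner."""
    rng = random.Random(seed)
    return rng.sample(sorted(expected), min(k, len(expected)))


def reconcile(expected: dict[str, str],
              scanned: dict[str, Optional[str]]) -> list[tuple[str, Optional[str], Optional[str]]]:
    """Compare scanned positions with the database; return (position, expected, found) mismatches."""
    mismatches = []
    for position, found_id in sorted(scanned.items()):
        expected_id = expected.get(position)
        if expected_id != found_id:
            mismatches.append((position, expected_id, found_id))
    return mismatches


# Hypothetical inventory: storage position -> specimen identifier recorded in the database.
database = {"F03/R2/B17/A1": "S-0001", "F03/R2/B17/A2": "S-0002", "F03/R2/B17/A3": "S-0003"}
# What was actually found during the physical check (None means the slot was empty).
scan = {"F03/R2/B17/A1": "S-0001", "F03/R2/B17/A2": "S-0009", "F03/R2/B17/A3": None}
discrepancies = reconcile(database, scan)
```

In this example two discrepancies are reported: a different specimen found in slot A2 and an empty slot at A3 where the database expects a specimen.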
B.6.5.4. Physical Security
All biospecimen resource databases at an individual institution should be in a secure site monitored by the institution. Resources without the capabilities to provide such infrastructure should seek external hosting arrangements for their informatics system.
B.6.5.5. Software System Validation
Initial validation of the informatics system should be well-documented ensuring data integrity, accurate process workflow, and adequate audit trail. Regulations such as the FDA’s 21 CFR Part 11 dictate requirements to include in the validation plan.
A detailed written validation plan must identify high risk areas in the software and how they will be thoroughly tested. Particularly susceptible areas are data migration points, data flow junctures, system configurable areas, and any customized features.
A new software implementation requires more comprehensive testing than an upgrade to an existing system. A system upgrade should include re-testing of updated program elements and any high risk areas of the program, whether presumed to be updated or not. To adequately test an upgraded system, a copy of the existing data should be used in a separate test environment.
Subsequent validation of each upgrade to the system should replicate a portion of the initial validation, to prevent unidentified regression errors, and should include a full validation of the upgraded portion of the system.
B.6.6. Regulatory Issues Pertaining to Informatics Systems
Besides those issues identified in the Ethical, Legal, and Policy section in these guidelines, the following regulatory issues should be addressed as applicable.
B.6.6.1. Regulations
Biospecimen resources should meet relevant regulatory requirements, including but not limited to:
- State and Federal Regulations
- Privacy Protection
- 508 Compliance
- Security Regulations
B.6.6.2. Security
Biospecimen resources should refer to the NIST Special Publication 800-30 Guide for Conducting Risk Assessments [56], as applicable, to determine the appropriate level of security for informatics systems.
B.6.6.3. HIPAA/HITECH
Any PHI or PII data stored in the informatics system should be flagged as such and masked from incidental viewing. Only those users with specific authorization to view this data should be allowed access. All access to this data should be logged in a secure, non-editable, permanent audit trail.
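As a sketch under stated assumptions, flagging, masking, and access logging might be combined as follows in Python. The set of PHI field names, the `PhiGuard` class, and the mask string are hypothetical, and the in-memory list merely stands in for the secure, non-editable, permanent audit trail that a production system would require.

```python
from datetime import datetime, timezone

# Hypothetical set of fields flagged as PHI/PII in this sketch.
PHI_FIELDS = frozenset({"patient_name", "date_of_birth", "medical_record_number"})
MASK = "****"


class PhiGuard:
    def __init__(self, authorized_users: set[str]) -> None:
        self._authorized = frozenset(authorized_users)
        self._log: list[tuple[datetime, str, str, tuple[str, ...], bool]] = []

    def view(self, user: str, record_id: str, record: dict[str, str]) -> dict[str, str]:
        """Return the record with PHI masked unless the user is specifically authorized."""
        flagged = tuple(sorted(name for name in record if name in PHI_FIELDS))
        granted = user in self._authorized
        if flagged:
            # Every request touching PHI is logged, whether or not it was granted.
            self._log.append((datetime.now(timezone.utc), user, record_id, flagged, granted))
        if granted:
            return dict(record)
        return {k: (MASK if k in PHI_FIELDS else v) for k, v in record.items()}

    def access_log(self) -> tuple:
        return tuple(self._log)  # read-only copy; callers cannot alter the stored log


# Hypothetical users and record.
guard = PhiGuard(authorized_users={"dr_lee"})
record = {"specimen_type": "Plasma", "patient_name": "Jane Doe", "date_of_birth": "1970-01-01"}
masked_view = guard.view("tech_kim", "S-0001", record)
full_view = guard.view("dr_lee", "S-0001", record)
```

Non-PHI fields such as the specimen type remain visible to all users in this sketch, while the flagged fields are shown only to the authorized user.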