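Last edited by 歪说歪有李 on 2024-11-18 22:44
Data quality: what dimensions should it have?
If you have the patience to read this all the way through, you surely won't begrudge it a "professionalism" rating, right?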
Data Governance, Data Integrity, and Data Quality: What’s the Connection?
Published on: November 7, 2024
Chris Burgess
-------------------------------------------------
Thanks to GxP视界 for sharing this, and to the great Chris Burgess for the article.
--------------------------------------------

Abstract

Nomenclature is important. Data governance, data integrity, and data quality are all widely used terms, but what do they actually mean and how are they connected? The purpose of this article is to provide a structured model for these terms, with their definitions and their relationships, in the context of analysis and testing within a pharmaceutical quality system.

The crucial concept is that data quality, including data integrity, is only attainable via data governance, as will be illustrated in the proposed model.

Introduction

In a regulatory context, the establishment of measured values and subsequent reportable results of a predefined quality is an essential activity. Reportable results are then compared with predetermined acceptance criteria and/or standards and specifications. The processes, mechanisms, and control systems necessary to establish measurement values and reportable results of a defined quality are interlinked. These interlinkages may include metrological, procedural, or organizational elements.

The overall proposed model is built up from building blocks, much like a Lego brick model. This approach has been used in previous papers concerning error budgets in measurement uncertainty and Monte Carlo simulation, and data quality within a lifecycle approach (1–2).
An analytical testing flow diagram, shown in Figure 1, gives a high-level view of the traceability and subsequent interactions, from the marketing authorization (new drug application [NDA] or abbreviated new drug application [ANDA]) through to the reportable result and data quality.
Figure 1: A high-level process flow indicating some of the elements of data governance, data quality, and data integrity, as well as the quality management system (QMS).
This process flow has been translated into the Lego brick model shown in Figure 2. The model is based on the data quality within a lifecycle approach model (2), which has been modified and extended as described in some detail below.
Figure 2: Data governance model, modified and extended from the data quality within a lifecycle approach model (taken from reference 2, Figure 15).
Translator's note: in the analytical procedure brick, the term rendered in Chinese as "validation and qualification" (验证与确认) should be understood as "validation & verification".
The crucial concept is that data quality is only attainable via data governance.
The data governance and data quality Lego brick model
Data quality is not an accident but a product of design. It is a combination of data integrity and the functionality and control provided by the Pharmaceutical Quality System, usually termed the quality management system (QMS). These aspects are underpinned by data governance and qualified information technology (IT) infrastructure services. This is essentially a sandwich structure in which the "filling" is provided by metrological integrity, analytical procedure integrity, and quality oversight. These "fillings" are described below.
All the elements shown in Figure 2 are subject to risk assessment and risk management (3), which will not be discussed here. The qualified IT infrastructure services with cybersecurity and access control, also shown in Figure 2, are likewise not covered further in this article.

It is, however, necessary to discuss the key elements of data integrity and the Pharmaceutical Quality System which, when combined, generate data quality on the foundation of data governance.

Data governance is the totality of arrangements to ensure that data, irrespective of the format in which they are generated, are recorded, processed, retained, and used to ensure a complete, consistent, and accurate record throughout the data lifecycle (4).
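The sandwich structure described above can be sketched as a simple dependency check. This is a minimal illustration, not the author's formalism: the brick names follow the text, while the `Bricks` class and the rule that every brick must be present are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Bricks:
    """Hypothetical flags for the bricks of the Lego model described in the text."""
    data_governance: bool                  # foundation
    qualified_it_infrastructure: bool      # foundation: cybersecurity, access control
    metrological_integrity: bool           # filling: instrument/system fit for intended use
    analytical_procedure_integrity: bool   # filling: validated or verified procedure
    quality_oversight: bool                # filling: PQS review and audit


def data_integrity(b: Bricks) -> bool:
    # The two data integrity components named in the article.
    return b.metrological_integrity and b.analytical_procedure_integrity


def data_quality(b: Bricks) -> bool:
    # Data integrity plus PQS quality oversight, and only on top of the
    # data governance and qualified IT infrastructure foundation.
    foundation = b.data_governance and b.qualified_it_infrastructure
    return foundation and data_integrity(b) and b.quality_oversight
```

With every filling brick in place but `data_governance=False`, `data_quality` still returns `False`, mirroring the point that data quality is only attainable via data governance.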
Data integrity

The purpose of an analytical procedure is to provide a reportable result for the analytical characteristic or quality attribute being determined. Analysis and testing require a measurement system and a procedure for applying it to a sample.

Data integrity is underpinned by the first brick: the metrological integrity of the instrument's or system's operational performance, with demonstrable assurance that it is "fit for intended use" within a specific analytical procedure, which is in turn "fit for intended purpose" over the data lifecycle.
Metrological integrity

Analysis and testing usually involve the use of an apparatus, analytical instrument, or system to make a measurement. Establishing "fitness for intended use" for any apparatus, analytical instrument, or system used in analysis and testing is therefore necessary to ensure metrological integrity over the required operational ranges, and it must be established before the analytical procedure is performed.

The main resources for instrument and system requirements are the specific monographs and general chapters in the pharmacopeias. In particular, the United States Pharmacopeia (USP) has a unique general chapter on the lifecycle processes and requirements for ensuring that any apparatus, analytical instrument, or system is "fit for intended use," as seen in Figure 3 (5).
Figure 3: Data quality outline for an analysis and testing quality control (QC) model using USP references.
Assurance lifecycle activities include:
• analytical instrument and system qualification
• application software validation
• calibration over the operational ranges of critical measurement functions
• maintenance and change control
• trend analysis to monitor an ongoing state of control.

The second component of data integrity is a validated or verified analytical procedure performed by a trained analyst.
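The last assurance activity listed above, trend analysis to monitor an ongoing state of control, is not tied to a specific statistical method in the article. As a minimal sketch, assuming simple Shewhart-style limits (baseline mean ± 3 sample standard deviations) rather than any procedure the author prescribes, new results can be flagged when they fall outside the limits. The function names and the example values are hypothetical.

```python
import statistics


def control_limits(baseline, k=3.0):
    """Return (lower, centre, upper) as mean ± k * sample standard deviation."""
    centre = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # requires at least two baseline points
    return centre - k * sigma, centre, centre + k * sigma


def out_of_control(baseline, new_points, k=3.0):
    """Return (index, value) for every new point outside the control limits."""
    lower, _, upper = control_limits(baseline, k)
    return [(i, x) for i, x in enumerate(new_points) if not lower <= x <= upper]


if __name__ == "__main__":
    # Hypothetical check-standard recoveries (%) from an instrument.
    baseline = [100.1, 99.8, 100.0, 100.2, 99.9]
    print(out_of_control(baseline, [100.3, 101.0, 99.4]))
```

A real trending programme would follow the laboratory's own SOP, which may use moving-range estimates of sigma or additional run rules; the fixed 3-sigma rule is kept here only for brevity.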
Analytical procedure integrity

For more than 20 years, analytical methods and procedures have been required to be validated or verified (6–7). Recently, these requirements have been updated by the International Council for Harmonisation (ICH), and a new guideline on analytical procedure development has been issued (8–9). USP General Chapter <1220> on the Analytical Procedure Lifecycle should also be consulted (10).

Critical lifecycle activities include:
• analytical target profile and development
• qualification and verification
• sample management and preparation
• use of reference standards
• trained analysts and second person review
• ongoing performance verification
• deviation management and change control.

Particular attention should be paid to analyst training and second person review within the laboratory. Second person review is essential in ensuring data quality.
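Parts of a second person review can be supported by simple automated pre-checks before the reviewer applies scientific judgement. The sketch below is hypothetical: the `TestRecord` fields and the checks chosen are assumptions for illustration, not requirements stated in the article.

```python
from dataclasses import dataclass, field


@dataclass
class TestRecord:
    """Hypothetical summary of one analytical run submitted for review."""
    analyst: str
    reportable_result: float
    spec_low: float
    spec_high: float
    audit_trail_reviewed: bool
    open_deviations: list = field(default_factory=list)


def review_findings(record: TestRecord, reviewer: str) -> list:
    """Return a list of findings; an empty list means all pre-checks passed."""
    findings = []
    if reviewer == record.analyst:
        findings.append("reviewer must be a different person from the analyst")
    if not record.audit_trail_reviewed:
        findings.append("audit trail has not been reviewed")
    if record.open_deviations:
        findings.append(f"open deviations: {', '.join(record.open_deviations)}")
    if not record.spec_low <= record.reportable_result <= record.spec_high:
        # Not a review failure as such, but it must be investigated.
        findings.append("reportable result outside specification: investigate")
    return findings
```

An empty list only means the mechanical checks passed; the reviewer still examines the data, calculations, and audit trail entries themselves.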
The Pharmaceutical Quality System

Quality control is the guardian of scientific soundness, whereas the quality assurance function is the guardian of compliance. To perform this duty of care, the quality assurance function requires a robust and comprehensive quality management system that enshrines the elements needed to provide and perform the necessary quality oversight over the data lifecycle.

Quality oversight involves not only reviewing and auditing QC activities but also the implementation of the Pharmaceutical Quality System itself, to ensure that it is up to date (11).

Quality oversight covers key areas such as:
• policies
• procedures
• good documentation practice (GDocP)
• training plans and records
• data integrity audits and investigations
• records management and archiving
• second person review (12).
ALCOA models for data integrity

Much has been written on this topic, particularly regarding the three ALCOA models and the meanings of their acronyms (4). These acronyms are summarized below and illustrated in Figure 4.
Figure 4: Pictorial representation of the three ALCOA models.
ALCOA (13)

Attributable
It must be possible to identify the individual or computerized system that performed a recorded task, and when the task was performed. This also applies to any changes made to records, such as corrections, deletions, and modifications, where it is important to know who made a change, when, and why.

Legible
All data, including any associated metadata, should be unambiguously readable throughout the lifecycle. Legibility also extends to any changes or modifications to the original data made by an authorized individual, so that the original entry is not obscured.

Contemporaneous
Data should be recorded, on paper or electronically, at the time the observation is made. All data entries must be dated and signed by the person entering the data.

Original
The original record is the first capture of information, whether recorded on paper (static) or electronically (usually dynamic, depending on the complexity of the system). Data or information originally captured in a dynamic state should remain in that state.

Accurate
To be accurate, records need to be a truthful representation of facts. The original observation(s) must be free of errors, and no editing is permitted without documented amendments or audit trail entries made by authorized personnel. Accuracy is assured and verified by a documented review, including review of audit trails.
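The Attributable and Accurate requirements (who, when, why, and no silent edits) can be illustrated with an append-only record. This is a minimal sketch under stated assumptions: `AttributableRecord` and `AuditEntry` are hypothetical names, and a real computerized system would also handle authentication, electronic signatures, and secure storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One immutable audit trail entry: who, when, what changed, and why."""
    user: str
    timestamp: datetime
    old_value: object
    new_value: object
    reason: str


class AttributableRecord:
    """Hypothetical record whose value changes only via audit-trailed amendments."""

    def __init__(self, value, user):
        # The first entry is the original capture; it is never overwritten.
        self._trail = [AuditEntry(user, datetime.now(timezone.utc), None, value, "original entry")]

    @property
    def value(self):
        # The current value is the new_value of the latest entry.
        return self._trail[-1].new_value

    def amend(self, new_value, user, reason):
        if not reason:
            raise ValueError("a reason is required for every change")
        self._trail.append(
            AuditEntry(user, datetime.now(timezone.utc), self.value, new_value, reason)
        )

    def history(self):
        # Return a copy so callers cannot edit the trail in place.
        return list(self._trail)
```

Because amendments append rather than overwrite, the original entry stays visible in `history()`, which is the property the Legible and Accurate definitions ask for.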
ALCOA+ (14)

Complete
A complete record includes all data from an analysis: the original data and any data generated before and after repeat testing, reanalysis, modification, recalculation, reintegration, and deletion. For hybrid systems, the paper output must be linked to the underlying electronic records used to produce it. A complete record of data generated electronically includes the relevant metadata.

Consistent
Data and information records should be created, processed, and stored in a logical manner that has a defined consistency. This includes policies or procedures that help control or standardize data (such as chronological sequencing, date formats, units of measurement, approaches to rounding, significant digits, etc.).

Enduring
Data are recorded in a permanent, maintainable, authorized media form during the retention period. Records should be kept in such a manner that they continue to exist and are accessible for the entire period during which they are needed. They need to remain intact as an indelible and durable record throughout the record retention period.

Available
Records should be available for review at any time during the required retention period, accessible in a readable format to all applicable personnel responsible for their review, whether for routine release decisions, investigations, trending, annual reports, audits, or inspections.
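Two of the Consistent items, approaches to rounding and date formats, are easy to get subtly wrong in software. As a sketch only, assuming a site convention of half-up rounding to a stated number of significant digits and ISO 8601 timestamps in UTC (neither convention is specified in the article), a shared helper might look like this:

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal, ROUND_HALF_UP


def round_sig(value: str, sig: int) -> Decimal:
    """Round a decimal string to `sig` significant digits, ties away from zero."""
    d = Decimal(value)  # string input avoids binary floating-point artifacts
    if d == 0:
        return d
    exponent = d.adjusted() - sig + 1  # position of the last digit to keep
    return d.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_UP)


def format_timestamp(ts: datetime) -> str:
    """Render a time-zone-aware timestamp as ISO 8601 in UTC, to whole seconds."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamp: attach a time zone before recording")
    return ts.astimezone(timezone.utc).isoformat(timespec="seconds")


if __name__ == "__main__":
    print(round_sig("2.675", 3))  # 2.68; round(2.675, 2) on a binary float gives 2.67
    print(format_timestamp(datetime(2024, 11, 7, 9, 30, tzinfo=timezone(timedelta(hours=8)))))
```

Routing every calculation through one helper like this is what makes the rounding "defined" rather than dependent on whichever spreadsheet or language default an analyst happens to use.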
ALCOA++ (4,15)

Traceable
Data should be traceable throughout the lifecycle. Any changes to data or metadata should be explained and should be traceable without obscuring the original information. Timestamps should be traceable to a trusted time source. Metrological standards and instrument or system qualification should be traceable to international standards wherever possible.
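One technique some systems use to make a change history traceable and tamper-evident is hash chaining, where each audit entry's hash also covers the previous entry's hash. The sketch below is an illustration of that technique, not something the article prescribes; the entry fields are hypothetical, and the UTC timestamps are assumed to come from a clock already synchronized to a trusted time source (time synchronization itself is not shown).

```python
import hashlib
import json

GENESIS = "0" * 64  # starting value before the first entry


def entry_hash(prev_hash: str, entry: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the entry."""
    payload = prev_hash + json.dumps(entry, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(entries):
    """Return (entry, hash) pairs; each hash depends on every earlier entry."""
    chain, prev = [], GENESIS
    for e in entries:
        prev = entry_hash(prev, e)
        chain.append((e, prev))
    return chain


def verify_chain(chain) -> bool:
    """Recompute all hashes. An edited, reordered, or removed entry (other than
    the last) breaks the chain; detecting truncation needs the latest hash
    stored somewhere independent."""
    prev = GENESIS
    for e, h in chain:
        prev = entry_hash(prev, e)
        if prev != h:
            return False
    return True
```

Each entry keeps its explanation (`reason`) alongside the change, so the history can be reconstructed without obscuring the original value.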
Data quality

Data quality is a combination of data integrity and overall control as part of the pharmaceutical quality system.

An example of a quality control data quality outline for analysis and testing, using examples from USP, is illustrated in Figure 3.
Summary

Data quality cannot be assured without a data governance structure supported by qualified IT infrastructure services with cybersecurity and access control.

The proposed Lego brick model provides a structural framework for assuring data quality over the lifecycle.

A short glossary of definitions of key terms is appended.
Acknowledgements

I wish to thank Bob McDowall and Oscar Quatrocchi for their review and helpful comments.

Definitions of key terms

Data governance
The totality of arrangements to ensure that data, irrespective of the format in which they are generated, are recorded, processed, retained, and used to ensure a complete, consistent, and accurate record throughout the data lifecycle (16).

Data integrity
Data integrity is the degree to which data are complete, consistent, accurate, trustworthy, reliable, and that these characteristics of the data are maintained throughout the data lifecycle. The data should be collected and maintained in a secure manner, so that they are attributable, legible, contemporaneously recorded, original (or a true copy), and accurate. Assuring data integrity requires appropriate quality and risk management systems, including adherence to sound scientific principles and good documentation practices (16).

Data lifecycle
All phases in the life of the data (including raw data), from initial generation and recording through processing (including transformation or migration), use, data retention, archive and retrieval, and destruction (17).

Data quality
The assurance that data produced is exactly what was intended to be produced and fit for its intended purpose (17).

Good documentation practices (GDocP)
Those measures that collectively and individually ensure documentation, whether paper or electronic, meet data management and integrity principles, for instance, ALCOA+ (17).
Metadata
Data that describe the attributes of other data, and provide context and meaning. Typically, these are data that describe the structure, data elements, inter-relationships and other characteristics of data, such as audit trails. Metadata also permit data to be attributable to an individual (or if automatically generated, to the original data source). Metadata form an integral part of the original record. Without the context provided by metadata, the data has no meaning (17).

Pharmaceutical Quality System
A model for an effective quality management system for the pharmaceutical industry to direct and control a pharmaceutical company with regard to quality (ICH Q10, based upon ISO 9000:2005) (11).

Quality unit(s)
Quality units are organizational entities within the pharmaceutical quality system, necessarily independent of each other and of production, that fulfill quality control and quality assurance roles and responsibilities.

Raw data
Raw data is defined as the original record (data) which can be described as the first capture of information, whether recorded on paper or electronically. Information that is originally captured in a dynamic state should remain available in that state (16).

However, US regulations for good laboratory practice offer a better definition (18):

Raw data means any laboratory worksheets, records, memoranda, notes, or exact copies thereof, that are the result of original observations and activities of a nonclinical laboratory study, and are necessary for the reconstruction and evaluation of the report of that study.
References

1. Burgess, C. Never Mind the Statistics; Just Tell Me What the Answer Is! PharmTech.com, March 20, 2023.
2. ECA, Guide for Integrated Lifecycle Approach to Analytical Instrument Qualification and System Validation, Version 1 (Analytical Quality Control Group, November 2023).
3. ICH, Q9 Quality Risk Management, Step 5 Version – Revision 1 (2023).
4. McDowall, R. D. Is Traceability the Glue for ALCOA, ALCOA+, or ALCOA++? Spectroscopy 2022, 37 (4) 13–19. DOI: 10.56530/spectroscopy.up8185n1
5. USP, USP General Chapter <1058>, “Analytical Instrument Qualification,” USP-NF (Rockville, Md., 2024). DOI: 10.31003/USPNF_M1124_01_01
6. USP, USP General Chapter <1225>, “Validation of Compendial Procedures,” USP-NF (Rockville, Md., 2024). DOI: 10.31003/USPNF_M99945_04_01
7. USP, USP General Chapter <1226>, “Verification of Compendial Procedures,” USP-NF (Rockville, Md., 2024). DOI: 10.31003/USPNF_M870_03_01
8. ICH, Q2(R2) Validation of Analytical Procedures, Step 5 Version – Revision 1 (2024).
9. ICH, Q14 Analytical Procedure Development, Step 5 Version (2024).
10. USP, USP General Chapter <1220>, “Analytical Procedure Lifecycle,” USP-NF (Rockville, Md., 2022).
11. ICH, Q10 Pharmaceutical Quality System, Step 5 Version (2008).
12. Newton, M. E., and McDowall, R. D. Data Integrity in the Chromatography Laboratory, Part V: Second-Person Review. LCGC North Am. 2018, 36 (8) 527–529.
13. Woollen, S. W. “Data Quality and the Origin of ALCOA,” The Compass Newsletter, Summer 2010.
14. EMA, EMA/INS/GCP/454280/2010, Reflection Paper on Expectations for Electronic Source Data and Data Transcribed to Electronic Data Collection Tools in Clinical Trials (June 9, 2010).
15. EMA, EMA/INS/GCP/112288/2023, Guideline on Computerised Systems and Electronic Data in Clinical Trials (March 9, 2023).
16. MHRA, ‘GXP’ Data Integrity Guidance and Definitions, Revision 1 (March 2018).
17. PIC/S, Good Practices for Data Management and Integrity in Regulated GMP/GDP Environments (July 2021).
18. CFR Title 21, Part 58 (Government Printing Office, Washington, DC) 58367–58380.