RIME model testing ------------------ RIME provides in-depth testing for your models. The following tables and links provide details on the the tests and test categories available. .. toctree:: :maxdepth: 1 /for_data_scientists/explanation/test_categories.md RIME consolidated test database ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The following table lists every test that RIME can perform. Click **Add Condition** to set filtering conditions to the table in order to discover specific tests you're interested in. .. csv-table:: :class: datatable :file: test_db.csv :widths: 8 8 28 28 28 :header-rows: 1 The following table contains supplementary information about literature and regulatory requirements elevant to specific test categories. +--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Test category | Relevant literature | Regulatory requirements | +====================+=================================================================================================================================================================================================================================================================================================+==============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================
===================================================================================================================================================================================================================================================================================================================================================+ | Abnormal input | | Chen, Haihua, Chen, Jiangping, and Ding, Junhua. "Data Evaluation and Enhancement for Quality Improvement of Machine Learning". `IEEE Transactions on Reliability `_ 70, no. 2 (June 2021): 831–47. | | **U.S. Food & Drug Administration (FDA)**: | | | | | | | | | | Rukat, Tammo, Lange, Dustin, Schelter, Sebastian, and Biessman, Felix. "Towards Automated Data Quality Management for Machine Learning", n.d., 3. | | `FDA's Good Machine Learning Practices `_, #2. "Good Software Engineering and Security Practices Are Implemented": the entire stress testing suite fulfills the need to demonstrate "good software engineering practices, data quality assurance, data management, and robust cybersecurity practices," #8. "Testing Demonstrates Device Performance during Clinically Relevant Conditions": our entire test suite is reproducible and #10. "Deployed Models Are Monitored for Performance and Re-training Risks are Managed": our Abnormal Input tests can be continuously tested for ongoing monitoring in real-time. | | | | | | | | | | Schelter, Sebastian, Grafberger, Stefan, Schmidt, Philipp, Rukat, Tammo, Kiessling, Mario, Taptunov, Andrey, Biessmann, Felix, and Lange, Dustin. "Deequ - Data Quality Validation for Machine Learning Pipelines", n.d., 3. | | **Federal Reserve Supervisory Guidance on Model Risk Management (SR 11-7)**: | | | | | | | | | | `SR 11-7 `_ is the key piece of regulation for banking and lending institutions in the United States enforced by the Federal Reserve. Our entire stress testing suite helps fulfill the requirements SR 11-7 specifies under Section IV. 
Model Development, Implementation and Use: "An integral part of model development is testing, in which the various components of a model and its overall functioning are evaluated to determine whether the model is performing as intended. Model testing includes checking the model's accuracy, demonstrating that the model is robust and stable, assessing potential limitations, and evaluating the model's behavior over a range of input values. It should also assess the impact of assumptions and identify situations where the model performs poorly or becomes unreliable." In addition, our entire test suite helps fulfill the requirements specified under Section V. Model Validation: "Documentation and testing should convey an understanding of model limitations and assumptions," which is a key outcome of the entire test suite. Our Continuous Testing capabilities allow for ongoing monitoring, which is another key part of SR 11-7's Section V. Model Validation: "The second core element of the validation process is ongoing monitoring. Such monitoring confirms that the model is appropriately implemented and is being used and is performing as intended. | | | | | Ongoing monitoring is essential to evaluate whether changes in products, exposures, activities, clients, or market conditions necessitate adjustment, redevelopment, or replacement of the model and to verify that any extension of the model beyond its original scope is valid." Further, SR 11-7 requires the evaluation of a model's conceptual soundness in Section V. Model Validation: our Abnormal Input and Distribution Drift test categories both help "verify that models are performing as expected" and identify "potential limitations and assumptions," assessing "their possible impact."
| | | | | | | | | | **EU AI Act**: | | | | | | | | | | The EU AI Act proposes that systems must have risk management systems in place (as seen in Chapter 2, Requirements for High-Risk AI systems) throughout their entire lifecycle (must be able to identify known and foreseeable risks associated with the AI system, estimate and evaluate risks that "may emerge" when the system is "used in accordance with its intended purpose and under conditions of reasonably foreseeable misuse"), including risk management measures. Our full stress testing suite fulfills this all-encompassing risk management system requirement. In addition, Chapter 2 of the Act requires human oversight throughout the AI lifecycle to "facilitate the respect of other fundamental rights by minimising the risk of erroneous or biased AI-assisted decisions in critical areas such as education and training, employment, important services, law enforcement and the judiciary," which the RI platform allows for by having a user-friendly UI and notifications and alerts for any updates on model testing. Model cards also allow for logging and documentation of model testing results, which can be used for reporting and tracking. 
| +--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Attacks | | Goodfellow, Ian J., Shlens, Jonathon, and Szegedy, Christian. `“Explaining and Harnessing Adversarial Examples" `_. arXiv, March 20, 2015. | | **U.S. 
Food & Drug Administration (FDA)**: | | | | | | | | | | Kotyan, Shashank, and Vasconcellos Vargas, Danilo. `"Adversarial Robustness Assessment: Why Both L₀ and L∞ Attacks Are Necessary" `_. arXiv, July 16, 2020. | | FDA's Good Machine Learning Practices, #2. "Good Software Engineering and Security Practices Are Implemented": the entire stress testing suite fulfills the need to demonstrate "good software engineering practices, data quality assurance, data management, and robust cybersecurity practices"; #8. "Testing Demonstrates Device Performance during Clinically Relevant Conditions": our entire test suite is reproducible; and, specific to the Attacks tests, #6. "Model Design Is Tailored to the Available Data and Reflects the Intended Use of the Device": these tests evaluate security risk. | | | | | | | | | | Raj, Sunny, Pullum, Laura, Ramanathan, Arvind, and Jha, Sumit Kumar. `"SATYA: Defending Against Adversarial Attacks Using Statistical Hypothesis Testing" `_. | | **Federal Reserve Supervisory Guidance on Model Risk Management (SR 11-7)**: | | | | In Foundations and Practice of Security, edited by Abdessamad Imine, José M. Fernandez, Jean-Yves Marion, Luigi Logrippo, and Joaquin Garcia-Alfaro, 10723:277–92. `Lecture Notes in Computer Science `_. | | | | | | Cham: Springer International Publishing, 2018. | | SR 11-7 is the key piece of regulation for banking and lending institutions in the United States enforced by the Federal Reserve. Our entire stress testing suite helps fulfill the requirements SR 11-7 specifies under Section IV. Model Development, Implementation and Use: "An integral part of model development is testing, in which the various components of a model and its overall functioning are evaluated to determine whether the model is performing as intended.
Model testing includes checking the model's accuracy, demonstrating that the model is robust and stable, assessing potential limitations, and evaluating the model's behavior over a range of input values. It should also assess the impact of assumptions and identify situations where the model performs poorly or becomes unreliable." In addition, our entire test suite helps fulfill the requirements specified under Section V. Model Validation: "Documentation and testing should convey an understanding of model limitations and assumptions," which is a key outcome of the entire test suite. Our Continuous Testing capabilities allow for ongoing monitoring, which is another key part of SR 11-7's Section V. Model Validation: "The second core element of the validation process is ongoing monitoring. Such monitoring confirms that the model is appropriately implemented and is being used and is performing as intended. | | | | | Ongoing monitoring is essential to evaluate whether changes in products, exposures, activities, clients, or market conditions necessitate adjustment, redevelopment, or replacement of the model and to verify that any extension of the model beyond its original scope is valid." Further, SR 11-7 requires the evaluation of a model's conceptual soundness in Section V. Model Validation: our Attacks and Transformation test categories employ sensitivity analysis tests that "check the impact of small changes in inputs and parameter values on model outputs to make sure they fall within an expected range."
| | | | | | | | | | **EU AI Act**: | | | | | | | | | | The EU AI Act proposes that systems must have risk management systems in place (as seen in Chapter 2, Requirements for High-Risk AI systems) throughout their entire lifecycle (must be able to identify known and foreseeable risks associated with the AI system, estimate and evaluate risks that "may emerge" when the system is "used in accordance with its intended purpose and under conditions of reasonably foreseeable misuse"), including risk management measures. Our full stress testing suite fulfills this all-encompassing risk management system requirement. In addition, Chapter 2 of the Act requires human oversight throughout the AI lifecycle to "facilitate the respect of other fundamental rights by minimising the risk of erroneous or biased AI-assisted decisions in critical areas such as education and training, employment, important services, law enforcement and the judiciary," which the RI platform allows for by having a user-friendly UI and notifications and alerts for any updates on model testing. Model cards also allow for logging and documentation of model testing results, which can be used for reporting and tracking.
| +--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Bias and Fairness | | Audit-AI. Python. 2018. Reprint, Pymetrics, 2022. https://github.com/pymetrics/audit-ai. | | **U.S. Food & Drug Administration (FDA)**: | | | | | | | | | | Corbett-Davies, Sam, and Sharad Goel. 
`"The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning" `_. arXiv, August 14, 2018. | | FDA's Good Machine Learning Practices, #3. "Clinical Study Participants and Data Sets Are Representative of the Intended Patient Population": the Compliance test category helps "ensure that the relevant characteristics of the intended patient population (for example, in terms of age, gender, sex, race, and ethnicity), use, and measurement inputs are sufficiently represented in a sample of adequate size in the clinical study and training and test datasets, so that results can be reasonably generalized to the population of interest." | | | | | | | | | | Ghosh, Avijit, Lea Genuit, and Mary Reagan. `"Characterizing Intersectional Group Fairness with Worst-Case Comparisons" `_. arXiv, May 4, 2022. | | **Federal Reserve Supervisory Guidance on Model Risk Management (SR 11-7)**: | | | | | | | | | | SR 11-7 is the key piece of regulation for banking and lending institutions in the United States enforced by the Federal Reserve. Our entire stress testing suite helps fulfill the requirements SR 11-7 specifies under Section IV. Model Development, Implementation and Use: "An integral part of model development is testing, in which the various components of a model and its overall functioning are evaluated to determine whether the model is performing as intended. Model testing includes checking the model's accuracy, demonstrating that the model is robust and stable, assessing potential limitations, and evaluating the model's behavior over a range of input values. It should also assess the impact of assumptions and identify situations where the model performs poorly or becomes unreliable." In addition, our entire test suite helps fulfill the requirements specified under Section V. Model Validation: "Documentation and testing should convey an understanding of model limitations and assumptions," which is a key outcome of the entire test suite.
Our Continuous Testing capabilities allow for ongoing monitoring, which is another key part of SR 11-7's Section V. Model Validation: "The second core element of the validation process is ongoing monitoring. Such monitoring confirms that the model is appropriately implemented and is being used and is performing as intended. | | | | | Ongoing monitoring is essential to evaluate whether changes in products, exposures, activities, clients, or market conditions necessitate adjustment, redevelopment, or replacement of the model and to verify that any extension of the model beyond its original scope is valid." Further, SR 11-7 requires the evaluation of a model's conceptual soundness in Section V. Model Validation: the guidelines require that "validation should be done by people who are not responsible for development or use and do not have a stake in whether a model is determined to be valid" and that validation guard against bias, which our Compliance test category fulfills. | | | | | | | | | | **EU AI Act**: | | | | | | | | | | The EU AI Act proposes that systems must have risk management systems in place (as seen in Chapter 2, Requirements for High-Risk AI systems) throughout their entire lifecycle (must be able to identify known and foreseeable risks associated with the AI system, estimate and evaluate risks that "may emerge" when the system is "used in accordance with its intended purpose and under conditions of reasonably foreseeable misuse"), including risk management measures. Our full stress testing suite fulfills this all-encompassing risk management system requirement.
In addition, Chapter 2 of the Act requires human oversight throughout the AI lifecycle to "facilitate the respect of other fundamental rights by minimising the risk of erroneous or biased AI-assisted decisions in critical areas such as education and training, employment, important services, law enforcement and the judiciary," which the RI platform allows for by having a user-friendly UI and notifications and alerts for any updates on model testing. Model cards also allow for logging and documentation of model testing results, which can be used for reporting and tracking. | +--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Data cleanliness | | | **U.S. Food & Drug Administration (FDA)**: | | | | | | | | | | FDA's Good Machine Learning Practices, #2. "Good Software Engineering and Security Practices Are Implemented": the entire stress testing suite fulfills the need to demonstrate "good software engineering practices, data quality assurance, data management, and robust cybersecurity practices" and #8 "Testing Demonstrates Device Performance during Clinically Relevant Conditions": our entire test suite is reproducible. | | | | | | | | | | **Federal Reserve Supervisory Guidance on Model Risk Management (SR 11-7)** | | | | | | | | | | SR 11-7 is the key piece of regulation for banking and lending institutions in the United States enforced by the Federal Reserve. Our entire stress testing suite helps fulfill the requirements SR 11-7 specifies under Section IV. Model Development, Implementation and Use: "An integral part of model development is testing, in which the various components of a model and its overall functioning are evaluated to determine whether the model is performing as intended. Model testing includes checking the model's accuracy, demonstrating that the model is robust and stable, assessing potential limitations, and evaluating the model's behavior over a range of input values. It should also assess the impact of assumptions and identify situations where the model performs poorly or becomes unreliable." In addition our entire test suite helps fulfill the requirements specified under Section V. 
Model Validation: "Documentation and testing should convey an understanding of model limitations and assumptions," which is a key outcome of the entire test suite. Our Continuous Testing capabilities allow for ongoing monitoring, which is another key part of SR 11-7's Section V. Model Validation: "The second core element of the validation process is ongoing monitoring. Such monitoring confirms that the model is appropriately implemented and is being used and is performing as intended. | | | | | Ongoing monitoring is essential to evaluate whether changes in products, exposures, activities, clients, or market conditions necessitate adjustment, redevelopment, or replacement of the model and to verify that any extension of the model beyond its original scope is valid." | | | | | | | | | | **Specific to the Feature Leakage test**: | | | | | | | | | | FDA's Good Machine Learning Practices, #4. "Training Data Sets Are Independent of Test Sets": helps verify that "training and test datasets are selected and maintained to be appropriately independent of one another." |
+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Distribution drift | | Ackerman, Samuel, Eitan Farchi, Orna Raz, Marcel Zalmanovici, and Parijat Dube. `“Detection of Data Drift and Outliers Affecting Machine Learning Model Performance over Time” `_. arXiv, January 20, 2021. 
| | **U.S. Food & Drug Administration (FDA)**: | | | | | | | | | | Ackerman, Samuel, Orna Raz, Marcel Zalmanovici, and Aviad Zlotnick. `"Automatically Detecting Data Drift in Machine Learning Classifiers" `_. arXiv, November 10, 2021. | | FDA's Good Machine Learning Practices, #2. "Good Software Engineering and Security Practices Are Implemented": the entire stress testing suite fulfills the need to demonstrate "good software engineering practices, data quality assurance, data management, and robust cybersecurity practices"; #8. "Testing Demonstrates Device Performance during Clinically Relevant Conditions": our entire test suite is reproducible; and #10. "Deployed Models Are Monitored for Performance and Re-training Risks are Managed": our Distribution Drift tests can run continuously for real-time ongoing monitoring. | | | | | | | | | | Mallick, Ankur, Kevin Hsieh, Behnaz Arzani, and Gauri Joshi. "Matchmaker: Data Drift Mitigation in Machine Learning for Large-Scale Systems." Proceedings of Machine Learning and Systems 4 (April 22, 2022): 77–94. | | **Specific to the Distribution Drift tests**: | | | | | | | | | | FDA's Good Machine Learning Practices, #6. "Model Design Is Tailored to the Available Data and Reflects the Intended Use of the Device": these tests evaluate the "available data" and support "the active mitigation of known risks, like overfitting, performance degradation." | | | | | | | | | | **Federal Reserve Supervisory Guidance on Model Risk Management (SR 11-7)**: | | | | | | | | | | SR 11-7 is the key piece of regulation for banking and lending institutions in the United States enforced by the Federal Reserve. Our entire stress testing suite helps fulfill the requirements SR 11-7 specifies under Section IV. Model Development, Implementation and Use: "An integral part of model development is testing, in which the various components of a model and its overall functioning are evaluated to determine whether the model is performing as intended.
Model testing includes checking the model's accuracy, demonstrating that the model is robust and stable, assessing potential limitations, and evaluating the model's behavior over a range of input values. It should also assess the impact of assumptions and identify situations where the model performs poorly or becomes unreliable." In addition, our entire test suite helps fulfill the requirements specified under Section V. Model Validation: "Documentation and testing should convey an understanding of model limitations and assumptions," which is a key outcome of the entire test suite. Our Continuous Testing capabilities allow for ongoing monitoring, which is another key part of SR 11-7's Section V. Model Validation: "The second core element of the validation process is ongoing monitoring. Such monitoring confirms that the model is appropriately implemented and is being used and is performing as intended. | | | | | Ongoing monitoring is essential to evaluate whether changes in products, exposures, activities, clients, or market conditions necessitate adjustment, redevelopment, or replacement of the model and to verify that any extension of the model beyond its original scope is valid." Further, SR 11-7 requires the evaluation of a model's conceptual soundness in Section V. Model Validation: our Abnormal Input and Distribution Drift test categories both help "verify that models are performing as expected" and identify "potential limitations and assumptions," assessing "their possible impact."
| | | | | | | | | | **EU AI Act**: | | | | | | | | | | The EU AI Act proposes that systems must have risk management systems in place (as seen in Chapter 2, Requirements for High-Risk AI systems) throughout their entire lifecycle (must be able to identify known and foreseeable risks associated with the AI system, estimate and evaluate risks that "may emerge" when the system is "used in accordance with its intended purpose and under conditions of reasonably foreseeable misuse"), including risk management measures. Our full stress testing suite fulfills this all-encompassing risk management system requirement. In addition, Chapter 2 of the Act requires human oversight throughout the AI lifecycle to "facilitate the respect of other fundamental rights by minimising the risk of erroneous or biased AI-assisted decisions in critical areas such as education and training, employment, important services, law enforcement and the judiciary," which the RI platform allows for by having a user-friendly UI and notifications and alerts for any updates on model testing. Model cards also allow for logging and documentation of model testing results, which can be used for reporting and tracking.
| +--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Subset performance | | Ackerman, Samuel, Orna Raz, and Marcel Zalmanovici. `"FreaAI: Automated Extraction of Data Slices to Test Machine Learning Models" `_. | | **U.S. 
Food & Drug Administration (FDA)**: | | | | In Engineering Dependable and Secure Machine Learning Systems, edited by Onn Shehory, Eitan Farchi, and Guy Barash, 67–83. `Communications in Computer and Information Science `_. Cham: Springer International Publishing, 2020. | | | | | | | | FDA's Good Machine Learning Practices, #2. "Good Software Engineering and Security Practices Are Implemented": the entire stress testing suite fulfills the need to demonstrate "good software engineering practices, data quality assurance, data management, and robust cybersecurity practices" and #8 "Testing Demonstrates Device Performance during Clinically Relevant Conditions": our entire test suite is reproducible. | | | | Gattermann-Itschert, Theresa, and Ulrich W. Thonemann. "How Training on Multiple Time Slices Improves Performance in Churn Prediction", `European Journal of Operational Research `_ 295, no. 2 (December 1, 2021): 664–74. | | | | | | | | **Specific to the Subset Performance test**: | | | | Wexler, James, Mahima Pushkarna, Tolga Bolukbasi, Martin Wattenberg, Fernanda Viégas, and Jimbo Wilson. "The What-If Tool: Interactive Probing of Machine Learning Models". | | | | | | `IEEE Transactions on Visualization and Computer Graphics `_ 26, no. 1 (January 2020): 56–65. | | FDA's Good Machine Learning Practices, #5. "Selected Reference Datasets Are Based Upon Best Available Methods": ensures "that clinically relevant and well characterized data are collected and the limitations of the reference are understood" | | | | | | | | | | **Federal Reserve Supervisory Guidance on Model Risk Management (SR 11-7)**: | | | | | | | | | | SR 11-7 is the key piece of regulation for banking and lending institutions in the United States enforced by the Federal Reserve. Our entire stress testing suite helps fulfill the requirements SR 11-7 specifies under Section IV. 
Model Development, Implementation and Use: "An integral part of model development is testing, in which the various components of a model and its overall functioning are evaluated to determine whether the model is performing as intended. Model testing includes checking the model's accuracy, demonstrating that the model is robust and stable, assessing potential limitations, and evaluating the model's behavior over a range of input values. It should also assess the impact of assumptions and identify situations where the model performs poorly or becomes unreliable." In addition, our entire test suite helps fulfill the requirements specified under Section V. Model Validation: "Documentation and testing should convey an understanding of model limitations and assumptions," which is a key outcome of the entire test suite. Our Continuous Testing capabilities allow for ongoing monitoring, which is another key part of SR 11-7's Section V. Model Validation: "The second core element of the validation process is ongoing monitoring. Such monitoring confirms that the model is appropriately implemented and is being used and is performing as intended. | | | | | Ongoing monitoring is essential to evaluate whether changes in products, exposures, activities, clients, or market conditions necessitate adjustment, redevelopment, or replacement of the model and to verify that any extension of the model beyond its original scope is valid." | | | | | | | | | | **EU AI Act**: | | | | | | | | | | The EU AI Act proposes that high-risk AI systems must have a risk management system in place throughout their entire lifecycle (see Chapter 2, Requirements for High-Risk AI Systems). This system must identify known and foreseeable risks associated with the AI system, estimate and evaluate risks that ‘may emerge’ when the system is ‘used in accordance with its intended purpose and under conditions of reasonably foreseeable misuse’, and include risk management measures. 
Our full stress testing suite fulfills this all-encompassing risk management system requirement. In addition, Chapter 2 of the Act requires human oversight throughout the AI lifecycle to "facilitate the respect of other fundamental rights by minimising the risk of erroneous or biased AI-assisted decisions in critical areas such as education and training, employment, important services, law enforcement and the judiciary," which the RI platform allows for by having a user-friendly UI and notifications and alerts for any updates on model testing. Model cards also allow for logging and documentation of model testing results, which can be used for reporting and tracking. | +--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Transformations | | Balestriero, Randall, Leon Bottou, and Yann LeCun. `“The Effects of Regularization and Data Augmentation Are Class Dependent.” `_ arXiv, April 8, 2022. | | **U.S. Food & Drug Administration (FDA)**: | | | | | | | | | | Bhagoji, Arjun Nitin, Daniel Cullina, Chawin Sitawarin, and Prateek Mittal. "Enhancing Robustness of Machine Learning Systems via Data Transformations". In `2018 52nd Annual Conference on Information Sciences and Systems (CISS) `_, 1–5, 2018. | | FDA's Good Machine Learning Practices , #2. "Good Software Engineering and Security Practices Are Implemented": the entire stress testing suite fulfills the need to demonstrate "good software engineering practices, data quality assurance, data management, and robust cybersecurity practices" and #8 "Testing Demonstrates Device Performance during Clinically Relevant Conditions": our entire test suite is reproducible. | | | | | | | | | | Woodie, Alex. "Why You Need Data Transformation in Machine Learning" `Datanami `_, November 8, 2019. | | **Federal Reserve Supervisory Guidance on Model Risk Management (SR 11-7)**: | | | | | | | | | | SR 11-7 is the key piece of regulation for banking and lending institutions in the United States enforced by the Federal Reserve. Our entire stress testing suite helps fulfill the requirements SR 11-7 specifies under Section IV. 
Model Development, Implementation and Use: "An integral part of model development is testing, in which the various components of a model and its overall functioning are evaluated to determine whether the model is performing as intended. Model testing includes checking the model's accuracy, demonstrating that the model is robust and stable, assessing potential limitations, and evaluating the model's behavior over a range of input values. | | | | | It should also assess the impact of assumptions and identify situations where the model performs poorly or becomes unreliable." In addition, our entire test suite helps fulfill the requirements specified under Section V. Model Validation: "Documentation and testing should convey an understanding of model limitations and assumptions," which is a key outcome of the entire test suite. Our Continuous Testing capabilities allow for ongoing monitoring, which is another key part of SR 11-7's Section V. Model Validation: "The second core element of the validation process is ongoing monitoring. Such monitoring confirms that the model is appropriately implemented and is being used and is performing as intended. Ongoing monitoring is essential to evaluate whether changes in products, exposures, activities, clients, or market conditions necessitate adjustment, redevelopment, or replacement of the model and to verify that any extension of the model beyond its original scope is valid." Further, SR 11-7's Section V. Model Validation requires an evaluation of a model's conceptual soundness, which our Attacks and Transformations test categories support through sensitivity analysis tests that "check the impact of small changes in inputs and parameter values on model outputs to make sure they fall within an expected range." 
| | | | | | | | | | **EU AI Act**: | | | | | | | | | | The EU AI Act proposes that high-risk AI systems must have a risk management system in place throughout their entire lifecycle (see Chapter 2, Requirements for High-Risk AI Systems). This system must identify known and foreseeable risks associated with the AI system, estimate and evaluate risks that ‘may emerge’ when the system is ‘used in accordance with its intended purpose and under conditions of reasonably foreseeable misuse’, and include risk management measures. Our full stress testing suite fulfills this all-encompassing risk management system requirement. In addition, Chapter 2 of the Act requires human oversight throughout the AI lifecycle to "facilitate the respect of other fundamental rights by minimising the risk of erroneous or biased AI-assisted decisions in critical areas such as education and training, employment, important services, law enforcement and the judiciary," which the RI platform supports through a user-friendly UI and through notifications and alerts for any updates on model testing. Model cards also allow for logging and documenting model testing results, which can be used for reporting and tracking. 
| +--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ All of the RIME tests are listed in the following sections, along with detailed descriptions. Tests ~~~~~~~~~~~~~ Tabular tests are performed on table-formatted data. 
Generally speaking, data that consists of a list of records with various attributes can be considered tabular.

NLP models attempt to correctly parse human language, recognize entities referred to in the text, and analyze implied qualities of the text, such as emotional subtext.

CV models attempt to interpret images to discern specific objects within an image or to classify the contents of an image as a whole.

.. toctree::
   :maxdepth: 1

   /for_data_scientists/explanation/test_categories.md
   explanation/tests/tabular/tests.md