AI News CN (Telegram) - English Translation

🤖 OpenAI Updates Its Preparedness Framework to Address Potential Severe Harms

OpenAI has released an updated version of its Preparedness Framework, the process it uses to track and prepare for advanced AI capabilities that could cause severe harm. Key updates include:

* Prioritizing high-risk capabilities: Assess whether an AI capability could lead to severe harm, categorizing risks against five key criteria: plausibility, measurability, severity, novelty, and irreversibility.
* Refined capability categories:
  * Tracked Categories: biological and chemical capabilities, cybersecurity capabilities, and AI self-improvement capabilities.
  * Research Categories: long-range autonomy, sandbagging (deliberately underperforming), autonomous replication and adaptation, undermining safeguards, and nuclear and radiological capabilities.
* Defined capability levels:
  * High capability: Capabilities that could amplify existing pathways to severe harm. Associated risks must be sufficiently mitigated before deployment.
  * Critical capability: Capabilities that could introduce unprecedented new pathways to severe harm. Associated risks must be sufficiently mitigated even during development.
* Safety Advisory Group (SAG): A cross-functional team that assesses whether safeguards sufficiently minimize severe risks and makes recommendations to OpenAI leadership.
* Scalable evaluations: Build automated evaluation suites, complemented by expert-led "deep dives," to ensure evaluations remain accurate.
* Defined safeguards reporting: Building on capability reports, add detailed reporting on how robust safeguards are designed and how their effectiveness is verified.
* Responding to a changing landscape: If another AI developer releases a high-risk system without comparable safeguards, OpenAI may adjust its own requirements, but only after rigorously confirming that the risk landscape has actually changed and publicly acknowledging the adjustment.

OpenAI will continue to publish its risk-mitigation research and share new benchmarks to support safety efforts across the field.

(@OpenAI)

via Teahouse - Telegram Channel
