CA/Responding To An Incident

From MozillaWiki
Please go to '''https://www.ccadb.org/cas/incident-report''' for detailed information about reporting compliance incidents.


(Researchers who report CA incidents such as misissuances are welcome to include a link to that page in their report to the CA, reminding the CA of Mozilla's expectations for incident reporting.)


This page provides supplemental information on Mozilla's expectations regarding the handling of compliance incidents, incident reporting, remediation, and communication.  It gives guidance to CAs as to how Mozilla expects them to react to reported incidents such as misissuances, and what the best practices are.  


= Overview =
 
An incident arises any time a CA fails to comply with an applicable requirement found in the Mozilla Root Store Policy, the CA/Browser Forum's requirements, or the CCADB's requirements. As noted in section 2.4 of the Mozilla Root Store Policy, a compliance incident can arise from certificate misissuance, delayed revocation, procedural or operational issues, or some other cause.
 
A "misissuance" is defined as any certificate issued in contravention of any applicable standard, process, or document. For example, the certificate could be RFC non-compliant, BR non-compliant, issued contrary to the CA's CP/CPS, or have some other flaw or problem.
 
Sometimes our guidance is framed in terms of misissuance of certificates; it will need to be adapted as necessary for incidents of a different nature, respecting the spirit of the information requests contained in the standard incident-reporting template.
 
Other examples of incidents include misconfigured CRLs and OCSP responders, delayed responses, failures to properly communicate information, and any other event affecting trust in the Web PKI that does not involve the actual contents of certificates.
 
While some forms of incident may be seen as less serious than others, opinions may vary. Mozilla sees all incidents as good opportunities for CA operators to confirm that their incident response processes are working well, and so we expect a similar level of timeliness of response and quality of reporting for all incidents, whatever their adjudged severity.
 
To be clear, the [https://www.ccadb.org/cas/incident-report#incident-report-template incident reporting template] and incident-reporting process provide a set of best practices. Therefore, failing to follow one or more of the recommendations is not, by itself, sanctionable. However, failing to follow them without good reason may affect Mozilla's general opinion of the CA. Our confidence in a CA depends in part on the number and severity of its incidents, but it also depends significantly on the speed and quality of its incident response.


= Immediate Actions =


In misissuance cases, a CA should almost always immediately cease issuance from the affected part of its PKI. In situations not involving misissuance, there also may be processes that need to be stopped until the CA has diagnosed the source of the problem.
 
Once the problem is diagnosed, the CA may restart the affected process before a full fix has been rolled out, provided it has put in place temporary or manual procedures that prevent the problem from recurring. CAs should not restart affected processes until they are confident that the problem will not recur.


'''An initial report should be filed within 72 hours of being made aware of the incident.'''
See https://www.ccadb.org/cas/incident-report#incident-reports
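As a rough illustration, the 72-hour window can be computed from the moment the CA was made aware of the incident. This is a minimal sketch that assumes timestamps are recorded as timezone-aware UTC values; `initial_report_due` and `initial_report_late` are hypothetical helper names, not part of any CCADB or Bugzilla tooling.

```python
from datetime import datetime, timedelta, timezone

# Window for filing the initial report, counted from awareness of the incident.
INITIAL_REPORT_WINDOW = timedelta(hours=72)


def initial_report_due(aware_at: datetime) -> datetime:
    """Latest time by which the initial report should be filed (hypothetical helper)."""
    if aware_at.tzinfo is None:
        # Naive timestamps make the deadline ambiguous across time zones.
        raise ValueError("aware_at must be timezone-aware, e.g. UTC")
    return aware_at + INITIAL_REPORT_WINDOW


def initial_report_late(aware_at: datetime, filed_at: datetime) -> bool:
    """True if the initial report was filed after the 72-hour window closed."""
    return filed_at > initial_report_due(aware_at)


if __name__ == "__main__":
    aware = datetime(2025, 3, 10, 9, 30, tzinfo=timezone.utc)
    print("initial report due by", initial_report_due(aware).isoformat())
```

Requiring timezone-aware input is a design choice: an incident noticed by staff in one time zone and reported in another should not produce two different deadlines.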


= Revocation =
== Mozilla’s Expectations on Revocation ==
CA operators MUST revoke misissued or otherwise problematic TLS server certificates within 24 hours or 5 days, depending on the circumstances set forth in [https://cabforum.org/working-groups/server/baseline-requirements/requirements/#491-circumstances-for-revocation section 4.9.1] of the CA/Browser Forum’s TLS Baseline Requirements (TLS BRs).
Per [https://www.mozilla.org/en-US/about/governance/policies/security-group/certs/policy/#613-delayed-revocation MRSP section 6.1.3], Mozilla does not grant exceptions to the revocation requirements of the TLS BRs.
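The two revocation windows can be sketched as a simple deadline calculation. This is illustrative only: deciding which window applies requires reading the circumstances in section 4.9.1 of the TLS BRs, which the sketch deliberately leaves to the caller, and `RevocationWindow` and `revocation_deadline` are hypothetical names. Consistent with the no-exceptions rule in MRSP section 6.1.3, the sketch has no parameter for extending a deadline.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class RevocationWindow(Enum):
    """The two TLS BR revocation windows.

    Which one applies depends on the circumstances in TLS BR section 4.9.1;
    that mapping is intentionally not encoded here.
    """
    HOURS_24 = timedelta(hours=24)
    DAYS_5 = timedelta(days=5)


def revocation_deadline(clock_start: datetime, window: RevocationWindow) -> datetime:
    """Latest time by which revocation must be complete.

    `clock_start` is the moment the revocation obligation began, as
    determined under the BRs. There is deliberately no extension parameter:
    Mozilla does not grant exceptions to the TLS BR revocation requirements.
    """
    if clock_start.tzinfo is None:
        raise ValueError("clock_start must be timezone-aware, e.g. UTC")
    return clock_start + window.value


if __name__ == "__main__":
    start = datetime(2025, 9, 1, 12, 0, tzinfo=timezone.utc)
    for window in RevocationWindow:
        print(window.name, revocation_deadline(start, window).isoformat())
```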
Furthermore, to ensure compliance with the TLS BRs, beginning September 1, 2025, Mozilla requires that CA operators:


* engage in proactive communication and advise subscribers well in advance about the revocation timelines and explicitly warn them against using publicly-trusted TLS server certificates on systems that cannot tolerate timely revocation;
* include appropriate language in customer agreements requiring subscribers’ timely cooperation in meeting revocation timelines and acknowledging the CA’s obligations to adhere to applicable policies and standards; and
* prepare and maintain comprehensive and actionable plans to address mass revocation events, including detailed procedures for handling mass revocations effectively and communicating rapidly with affected parties, and test those plans annually through tabletop exercises, simulations, parallel testing, or test environments that do not involve the revocation of active certificates.
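One question a tabletop exercise or simulation might answer is whether measured revocation throughput is enough to finish a large revocation inside the applicable window. The sketch below is a deliberately simple model under stated assumptions: preparation work (identifying affected certificates, notifying subscribers) happens before revocations start, throughput stays constant, and the throughput figure comes from a test environment rather than from revoking active certificates. `revocation_fits_window` and its parameters are hypothetical.

```python
from datetime import timedelta


def revocation_fits_window(
    affected_certs: int,
    revocations_per_hour: float,
    preparation: timedelta,
    window: timedelta,
) -> tuple[bool, timedelta]:
    """Estimate whether a mass revocation would finish inside `window`.

    Simple sequential model: `preparation` (e.g. identifying affected
    certificates and notifying subscribers) happens first, then revocations
    proceed at a constant `revocations_per_hour`, a figure assumed to have
    been measured in a test environment. Returns (fits, estimated_duration).
    """
    if affected_certs < 0:
        raise ValueError("affected_certs must be non-negative")
    if revocations_per_hour <= 0:
        raise ValueError("revocations_per_hour must be positive")
    revoking = timedelta(hours=affected_certs / revocations_per_hour)
    total = preparation + revoking
    return total <= window, total


if __name__ == "__main__":
    fits, total = revocation_fits_window(
        affected_certs=300_000,
        revocations_per_hour=10_000,
        preparation=timedelta(hours=4),
        window=timedelta(hours=24),
    )
    print(f"fits={fits}, estimated duration={total}")
```

A result that does not fit is the kind of finding a plan is expected to incorporate, for example by adding revocation capacity or shortening the notification step. Real events may overlap preparation and revocation, so this model errs on the pessimistic side.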


Beginning with the CA operator’s next annual audit cycle starting on or after June 1, 2025, each CA operator MUST engage a third-party assessor to evaluate whether the CA operator has:
* well-documented and actionable plans to handle mass revocation events;
* demonstrated the implementation and feasibility of the plans, through testing exercises including documentation of testing, processes, timelines, results, and remediation steps; and
* incorporated feedback from such testing exercises and other evaluations to enhance readiness and improve future performance.


The above-referenced June 1, 2025, date is to ensure that compliance with the September 1, 2025, requirements will be evaluated within a reasonable timeframe while allowing CA operators to incorporate mass revocation testing into their CA processes and annual audit cycles. However, the assessment does not have to be conducted as part of the CA operator’s ETSI or WebTrust audit unless the CA operator finds it more convenient to include it within that scope. The assessment may be conducted separately by a qualified third-party assessor, provided it meets the stated evaluation criteria.


== Reporting Delayed Revocation Incidents ==


The [https://www.ccadb.org/cas/incident-report CCADB incident reporting process] ensures the Web PKI community is informed and that issues are tracked and resolved effectively. Clear and timely communication fosters trust and accountability, mitigating risks to the ecosystem.


If a CA operator determines that it might delay revocation of certificates beyond the time period required by the TLS BRs, it MUST file a preliminary incident report with a Summary section immediately in Bugzilla, even if the delay has not yet occurred.
 
Consistent with CCADB incident reporting requirements, the CA operator SHALL explain in the "Analysis" section of the incident report the factors and rationale behind the decision to delay revocation, including detailed and substantiated explanations of how extensive harm would result to third parties (such as essential public services or widely relied-upon systems) and why the situation is exceptionally rare and unavoidable.
 
Also, the "Timeline" section should include the time(s) at which the CA Operator actually completed revocation of affected certificates, and the "Action Items" list MUST include steps reasonably calculated to prevent or reduce future revocation delays.
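The key point of the preliminary-report requirement is that it is triggered by an anticipated delay, not by a missed deadline. A minimal sketch of that check follows, assuming the CA tracks a BR deadline and a projected completion time for each certificate awaiting revocation; `PendingRevocation` and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PendingRevocation:
    """Hypothetical tracking record for one certificate awaiting revocation."""
    serial_hex: str
    deadline: datetime              # revocation deadline under the TLS BRs
    projected_completion: datetime  # CA's current estimate of when it will be revoked


def anticipated_delays(pending: list[PendingRevocation]) -> list[PendingRevocation]:
    """Certificates whose projected revocation falls after their deadline."""
    return [p for p in pending if p.projected_completion > p.deadline]


def preliminary_report_required(pending: list[PendingRevocation]) -> bool:
    # Triggered by an *anticipated* delay: as soon as any projection misses
    # its deadline, even though that deadline may not have passed yet.
    return len(anticipated_delays(pending)) > 0
```

One way to use such a check is to re-run it whenever projections change, for example when a subscriber reports that it cannot replace a certificate in time, so that the obligation to file immediately does not depend on someone noticing the problem manually.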
 
== Consequences of Delayed Revocations ==
 
Failing to meet the standards of timely revocation erodes trust in the Web PKI and poses risks to global internet security.  Delayed revocation is a measure of last resort and MUST NOT be used routinely. Repeated incidents of delayed revocation without sufficient justification will result in heightened scrutiny and sanctions, including removal of the CA from the Mozilla Root Store. CA operators must also adhere to the policies and revocation requirements of other Root Store Programs that include their CA certificates.  Additionally, all delayed revocation incidents MUST be listed as findings in the CA operator’s next TLS BR audit statement.


= Follow-Up Actions =
* Work out how the bug or problem was introduced. For a code bug, were the code review processes sufficient? Does your code have automated tests, and if so, why did they not catch this case?


* Work out why the problem was not detected earlier. Were these certificates missed by your linting processes or self-audits? Or are the code or processes you use for these checks insufficient?


* If the problem is a lack of compliance with an RFC, Baseline Requirement, or Mozilla Policy requirement: were you aware of this requirement? If not, why not? If so, was an attempt made to meet it? If not, why not? If so, why was that attempt flawed? Do any processes need updating to ensure that your CA complies with the latest version of the various requirements placed upon it?


* Scan your corpus of certificates to look for others with the same issue. It does not look good for a CA to claim they have revoked all affected certificates and resolved the issue, and then for a researcher to discover another set of certificates with the same or a similar problem.


* Examine whether there are potential related problems which you can also remediate at the same time. For example, if the problem was bad data in a particular field, consider improving the validation of all fields in the certificate prior to issuance. You should be proactively looking for ways, such as pre-issuance lint testing, to harden your issuance pipeline against further problems.


* If, as happens in a regrettably large number of cases, a problem report was sent to your CA but action in accordance with BR section 4.9.5 was not taken within 24 hours, investigate what happened to that report and whether your report-handling processes are adequate.
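Two of the follow-up actions above, scanning the existing corpus and hardening issuance with pre-issuance linting, benefit from sharing one set of checks. The sketch below runs the same check functions as a gate before issuance and as a scan over already-issued certificates. `CertRecord`, the check functions, and the 398-day limit are assumptions for illustration (maximum validity periods change over time, so check the current TLS BRs); a production CA would parse real certificates and would typically rely on an established linter such as zlint or pkilint rather than hand-written checks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class CertRecord:
    """Hypothetical, simplified view of a certificate (issued or about to be)."""
    serial_number: int
    not_before: datetime
    not_after: datetime


# Assumed limit for illustration only; always check the current TLS BRs.
MAX_VALIDITY = timedelta(days=398)


def check_serial_positive(cert: CertRecord) -> Optional[str]:
    # RFC 5280 section 4.1.2.2: the serial number MUST be a positive integer.
    if cert.serial_number <= 0:
        return "serial number is not positive"
    return None


def check_validity_period(cert: CertRecord) -> Optional[str]:
    # RFC 5280 section 4.1.2.5 defines the validity period as notBefore
    # through notAfter, inclusive, hence the extra second.
    period = cert.not_after - cert.not_before + timedelta(seconds=1)
    if period > MAX_VALIDITY:
        return "validity period exceeds maximum"
    return None


# One shared list of checks, used both before issuance and when scanning.
CHECKS: list[Callable[[CertRecord], Optional[str]]] = [
    check_serial_positive,
    check_validity_period,
]


def lint(cert: CertRecord) -> list[str]:
    """Run every check and collect the problems found."""
    problems = []
    for check in CHECKS:
        problem = check(cert)
        if problem is not None:
            problems.append(problem)
    return problems


def pre_issuance_gate(cert: CertRecord) -> None:
    """Refuse to issue a certificate that fails any check."""
    problems = lint(cert)
    if problems:
        raise ValueError(f"refusing to issue serial {cert.serial_number:#x}: {problems}")


def scan_corpus(certs: Iterable[CertRecord]) -> dict[str, list[int]]:
    """Group already-issued certificates by problem, to find every affected one."""
    affected: dict[str, list[int]] = {}
    for cert in certs:
        for problem in lint(cert):
            affected.setdefault(problem, []).append(cert.serial_number)
    return affected
```

Keeping a single `CHECKS` list means that a check added after an incident immediately protects future issuance and can be run retroactively over the corpus. Under the inclusive definition, a certificate whose notAfter is exactly 398 days after its notBefore is one second over the assumed limit.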


= Incident Report =


For guidance on incident reporting, first visit '''https://www.ccadb.org/cas/incident-report'''.


Your CA must submit an incident report by [https://bugzilla.mozilla.org/enter_bug.cgi?product=CA%20Program&component=CA%20Certificate%20Compliance&version=other creating a bug in Bugzilla under the CA Program :: CA Certificate Compliance component]. When the incident is reported only on the CCADB public list or on the [https://groups.google.com/a/mozilla.org/g/dev-security-policy MDSP mailing list], then a bug will be created to track the incident and its resolution in Bugzilla. CAs are encouraged to announce important incidents on public@ccadb.org when they involve the Baseline Requirements, other root programs, or the CCADB; or on the Mozilla dev-security-policy list, when they only involve violations of the Mozilla Root Store Policy.


The incident report should use the markdown template provided on the CCADB website:
 
'''https://www.ccadb.org/cas/incident-report#incident-report-template'''


= Keeping Us Informed =


Once the report is posted, you should respond promptly to questions. In no circumstances should a question go without a response for more than one week, even if the response only acknowledges the question and gives a later date by which an answer will be delivered. You should also provide progress updates at least weekly and confirm when the remediation steps have been completed, unless a root store representative has agreed to a different schedule by setting a “Next Update” date in the “Whiteboard” field of the bug, or has announced that they are considering closing the bug and no further comments have been posted. Updates to important incidents (see e.g. https://www.ccadb.org/cas/public-group#lessons-learned-from-ca-incident-reports) should be posted to the Bugzilla bug and to either the [https://groups.google.com/a/ccadb.org/g/public CCADB Public list] or the [https://groups.google.com/a/mozilla.org/g/dev-security-policy MDSP mailing list]. The bug will be closed when remediation is completed.
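The communication cadence above can be checked mechanically. This is a minimal sketch under stated assumptions: `IncidentBugState` is a hypothetical snapshot a CA might maintain for its own tracking (it is not a Bugzilla API type), the "Next Update" date is assumed to have already been read from the Whiteboard field, and both cadences are modelled as exactly one week.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# Cadences described above: answer questions within a week, update weekly.
MAX_QUESTION_WAIT = timedelta(weeks=1)
UPDATE_INTERVAL = timedelta(weeks=1)


@dataclass
class IncidentBugState:
    """Hypothetical snapshot of an incident bug; not a Bugzilla API type."""
    last_ca_update: datetime
    # When each still-unanswered question was asked.
    open_questions: list[datetime] = field(default_factory=list)
    # "Next Update" date agreed by a root store representative, if any.
    next_update: Optional[datetime] = None
    # True if a root store representative has announced they are considering
    # closing the bug and no further comments have been posted since.
    closure_pending: bool = False


def update_due(state: IncidentBugState) -> Optional[datetime]:
    """When the CA's next progress update is due, or None if none is expected."""
    if state.closure_pending:
        return None
    if state.next_update is not None:
        return state.next_update
    return state.last_ca_update + UPDATE_INTERVAL


def overdue_items(state: IncidentBugState, now: datetime) -> list[str]:
    """List the communication obligations that are currently overdue."""
    overdue = []
    due = update_due(state)
    if due is not None and now > due:
        overdue.append("progress update overdue")
    for asked in state.open_questions:
        if now - asked > MAX_QUESTION_WAIT:
            overdue.append(f"question asked {asked.isoformat()} unanswered for over a week")
    return overdue
```

Note that the sketch still flags unanswered questions when an agreed "Next Update" date or pending closure suspends the weekly progress update, since the text above only relaxes the update schedule.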


= Examples of Good Practice =


Here are some examples of good practice. (These examples will be updated based on experience with [https://www.ccadb.org/cas/incident-report version 3.0 of the CCADB Incident Reporting Guidelines].)
 


== Let's Encrypt: keyCompromise key blocking deviation from CP/CPS ==
https://bugzilla.mozilla.org/show_bug.cgi?id=1886876
* Clear indication of Preliminary and Full Incident Reports.
* Detailed timeline that identifies all policy, process, and software changes that contributed to the root cause, and an indication of when the incident began and ended.
* Detailed Root Cause Analysis that offers background on the various conditions that gave rise to the issue.
* Timely updates in response to questions posed, continued analysis, and changes to Action Items.


== Google Trust Services: Failure to properly validate IP address ==
https://bugzilla.mozilla.org/show_bug.cgi?id=1876593
* Significant amount of background information that informs the timeline of the incident.
* Clear identification, in the Root Cause Analysis, of the factors that contributed to the incident, including how many of them avoided detection.
* Action Items that prevent, mitigate, and detect what didn’t go well.
* Timely and detailed updates conveying Action Item status.


== HARICA: Anomaly in OCSP services after CA software upgrade ==
https://bugzilla.mozilla.org/show_bug.cgi?id=1878106
* Clear Summary that provides just enough context for new readers to understand the rest of the report.
* Effective use of the “5 Whys” Root Cause Analysis methodology where “why” is asked as many times as necessary to identify the root cause of the incident.
* Action Items that prevent and detect what didn’t go well.
* Timely updates in response to questions posed and changes to Action Items.

Latest revision as of 22:30, 11 March 2025

Please go to https://www.ccadb.org/cas/incident-report for detailed information about reporting compliance incidents.

(Researchers who report CA incidents such as misissuances are welcome to include a link to that page in their report to the CA, reminding the CA of Mozilla's expectations for incident reporting.)

This page provides supplemental information on Mozilla's expectations regarding the handling of compliance incidents, incident reporting, remediation, and communication. It gives guidance to CAs as to how Mozilla expects them to react to reported incidents such as misissuances, and what the best practices are.

Overview

An incident arises any time a CA fails to comply with an applicable requirement found in the Mozilla Root Store Policy, the CA/Browser Forum's requirements, or the CCADB's requirements. As noted in section 2.4 of the Mozilla Root Store Policy, a compliance incident can arise from certificate misissuance, delayed revocation, procedural or operational issues, or some other cause.

A "misissuance" is defined as any certificate issued in contravention of any applicable standard, process or document - so it could be RFC non-compliant, BR non-compliant, issued contrary to the CA's CP/CPS, or have some other flaw or problem.

Sometimes our guidance is framed in terms of misissuance of certificates; it will need to be adapted as necessary for incidents of a different nature, respecting the spirit of the information requests contained in the standard incident-reporting template.

Other examples of incidents include misconfigured CRLs and OCSP responders, delayed responses, failures to properly communicate information, and any other event affecting trust in the WebPKI which does not involve the actual contents of certificates.

While some forms of incident may be seen as less serious than others, opinions may vary. Mozilla sees all incidents as good opportunities for CA operators to confirm that their incident response processes are working well, and so we expect a similar level of timeliness of response and quality of reporting for all incidents, whatever their adjudged severity.

To be clear, the incident reporting template and incident-reporting process provide a set of best practices. Therefore, failure to follow one or more of the recommendations alone is not by itself sanctionable. However, failure to do so without good reason may affect Mozilla's general opinion of the CA. Our confidence in a CA is in part affected by the number and severity of incidents, but it is also significantly affected by the speed and quality of incident response.

Immediate Actions

In misissuance cases, a CA should almost always immediately cease issuance from the affected part of its PKI. In situations not involving misissuance, there also may be processes that need to be stopped until the CA has diagnosed the source of the problem.

Once the problem is diagnosed, if the CA is able to put in place temporary or manual procedures to prevent the problem from re-occurring, it may restart the process even if a full fix is not rolled out. CAs should not restart affected processes until they are confident that the problem will not re-occur.

An initial report should be filed within 72 hours of being made aware of the incident. See https://www.ccadb.org/cas/incident-report#incident-reports

Revocation

Mozilla’s Expectations on Revocation

CA operators MUST revoke misissued or otherwise problematic TLS server certificates within 24 hours or 5 days, depending on the circumstances set forth in section 4.9.1 of the CA/Browser Forum’s TLS Baseline Requirements (TLS BRs).

Per MRSP section 6.1.3, Mozilla does not grant exceptions to the revocation requirements of the TLS BRs.

Furthermore, to ensure compliance with the TLS BRs, beginning September 1, 2025, Mozilla requires that CA operators:

  • engage in proactive communication and advise subscribers well in advance about the revocation timelines and explicitly warn them against using publicly-trusted TLS server certificates on systems that cannot tolerate timely revocation;
  • include appropriate language in customer agreements requiring subscribers’ timely cooperation in meeting revocation timelines and acknowledging the CA’s obligations to adhere to applicable policies and standards; and
  • prepare and maintain comprehensive and actionable plans to address mass revocation events, including detailed procedures for handling mass revocations effectively, including rapid communication with affected parties and conducting annual plan testing through tabletop exercises, simulations, parallel testing, or use of test environments, which do not involve the revocation of active certificates.

Beginning with the CA operator’s next annual audit cycle starting on or after June 1, 2025, each CA operator MUST engage a third-party assessor to evaluate whether the CA operator has:

  • well-documented and actionable plans to handle mass revocation events;
  • demonstrated the implementation and feasibility of the plans, through testing exercises including documentation of testing, processes, timelines, results, and remediation steps; and
  • incorporated feedback from such testing exercises and other evaluations to enhance readiness and improve future performance.

The above-referenced June 1, 2025, date is to ensure that compliance with the September 1, 2025, requirements will be evaluated within a reasonable timeframe while allowing CA operators to incorporate mass revocation testing into their CA processes and annual audit cycles. However, the assessment does not have to be conducted as part of the CA operator’s ETSI or WebTrust audit unless the CA operator finds it more convenient to include it within that scope. The assessment may be conducted separately by a qualified third-party assessor, provided it meets the stated evaluation criteria.

Reporting Delayed Revocation Incidents

The CCADB incident reporting process ensures the Web PKI community is informed and that issues are tracked and resolved effectively. Clear and timely communication fosters trust and accountability, mitigating risks to the ecosystem.

If a CA operator determines that it might delay revocation of certificates beyond the time period required by the TLS BRs, it MUST file a preliminary incident report with a Summary section immediately in Bugzilla, even if the delay has not yet occurred.

Consistent with CCADB incident reporting requirements, the CA operator SHALL explain in the "Analysis" section of the incident report those factors and rationales behind the decision to delay revocation (including detailed and substantiated explanations of how extensive harm would result to third parties–such as essential public services or widely relied-upon systems–and why the situation is exceptionally rare and unavoidable).

Also, the "Timeline" section should include the time(s) at which the CA Operator actually completed revocation of affected certificates, and the "Action Items" list MUST include steps reasonably calculated to prevent or reduce future revocation delays.

Consequences of Delayed Revocations

Failing to meet the standards of timely revocation erodes trust in the Web PKI and poses risks to global internet security. Delayed revocation is a measure of last resort and MUST NOT be used routinely. Repeated incidents of delayed revocation without sufficient justification will result in heightened scrutiny and sanctions, including removal of the CA from the Mozilla Root Store. CA operators must also adhere to the policies and revocation requirements of other Root Store Programs that include their CA certificates. Additionally, all delayed revocation incidents MUST be listed as findings in the CA operator’s next TLS BR audit statement.

Follow-Up Actions

  • Work out how the bug or problem was introduced. For a code bug, were the code review processes sufficient? Does your code have automated tests, and if so, why did they not catch this case?
  • Work out why the problem was not detected earlier. Were these certificates missed by your linting processes or self audits? Or is the code or process you use for insufficient?
  • If the problem is lack of compliance to an RFC, Baseline Requirement, or Mozilla Policy requirement: were you aware of this requirement? If not, why not? If so, was an attempt made to meet it? If not, why not? If so, why was that attempt flawed? Do any processes need updating for making sure your CA complies with the latest version of the various requirements placed upon it?
  • Scan your corpus of certificates to look for others with the same issue. It does not look good for a CA to claim they have revoked all affected certificates and resolved the issue, and then for a researcher to discover another set of certificates with the same or a similar problem.
  • Examine whether there are potential related problems which you can also remediate at the same time. For example, if the problem was bad data in a particular field, consider improving the validation of all fields in the certificate prior to issuance. You should be proactively looking for ways, such as pre-issuance lint testing, to harden your issuance pipeline against further problems.
  • If, as happens in a regrettably large number of cases, a problem report was sent to your CA but action in accordance with BR section 4.9.5 was not taken within 24 hours, investigate what happened to that report and whether your report-handling processes are adequate.
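To illustrate the corpus-scanning step, here is a minimal Python sketch. It assumes a hypothetical, already-parsed `CertRecord` view of each issued certificate (a real scan would parse the DER certificates or query the CA's own issuance database), and it uses "subject commonName not among the SAN dNSNames" purely as an example defect. The defect check is passed in as a predicate, so the same scan can be rerun for related problems found while examining the root cause.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Iterable, List


@dataclass
class CertRecord:
    # Hypothetical, simplified record of an issued certificate; a real
    # implementation would populate this from the parsed certificate.
    serial: str
    subject_cn: str
    san_dns: List[str] = field(default_factory=list)
    not_after: datetime = datetime.max.replace(tzinfo=timezone.utc)
    revoked: bool = False


def cn_not_in_san(cert: CertRecord) -> bool:
    """Example defect: the subject commonName is not one of the SAN dNSNames."""
    names = {n.lower() for n in cert.san_dns}
    return bool(cert.subject_cn) and cert.subject_cn.lower() not in names


def find_affected(
    corpus: Iterable[CertRecord],
    has_defect: Callable[[CertRecord], bool],
    now: datetime,
) -> List[CertRecord]:
    """Return unexpired, not-yet-revoked certificates that exhibit the defect."""
    return [
        c for c in corpus
        if not c.revoked and c.not_after > now and has_defect(c)
    ]


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    later = datetime(2025, 1, 1, tzinfo=timezone.utc)
    corpus = [
        CertRecord("01", "a.example", ["a.example"], later),
        CertRecord("02", "b.example", ["www.b.example"], later),
        # Same defect, but already expired, so no revocation is needed.
        CertRecord("03", "c.example", ["d.example"], datetime(2024, 1, 1, tzinfo=timezone.utc)),
    ]
    print([c.serial for c in find_affected(corpus, cn_not_in_san, now)])  # ['02']
```

Filtering out expired and already-revoked certificates keeps the output focused on what still needs action, but the full list (including expired certificates) is still worth recording for the incident report.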

Incident Report

For guidance on incident reporting, first visit https://www.ccadb.org/cas/incident-report.

Your CA must submit an incident report by creating a bug in Bugzilla under the CA Program :: CA Certificate Compliance component. If an incident is reported only on the CCADB Public list or on the MDSP mailing list, a bug will be created in Bugzilla to track the incident and its resolution. CAs are encouraged to announce important incidents on public@ccadb.org when they involve the Baseline Requirements, other root programs, or the CCADB; or on the Mozilla dev-security-policy list when they only involve violations of the Mozilla Root Store Policy.

The incident report should use the markdown template provided on the CCADB website:

https://www.ccadb.org/cas/incident-report#incident-report-template

Keeping Us Informed

Once the report is posted, you should respond promptly to any questions asked. Under no circumstances should a question go without a response for more than one week, even if the response only acknowledges the question and gives a later date by which an answer will be delivered. You should also post progress updates at least weekly and confirm when the remediation steps have been completed, unless a root store representative has agreed to a different schedule by setting a “Next Update” date in the “Whiteboard” field of the bug, or has announced that they are considering closing the bug and no further comments have been posted. Updates to important incidents (see e.g. https://www.ccadb.org/cas/public-group#lessons-learned-from-ca-incident-reports) should be posted to the Bugzilla bug and to either the CCADB Public list or the MDSP mailing list. The bug will be closed when remediation is completed.
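A CA tracking its open incident bugs might encode the one-week rules above roughly as follows. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, the "Next Update" date is assumed to have already been read from the bug's Whiteboard field, and the case where a root store representative has announced they are considering closing the bug is not modeled.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# "No more than one week" between a question and a response, and between updates.
MAX_SILENCE = timedelta(weeks=1)


def next_update_due(last_ca_update: datetime,
                    whiteboard_next_update: Optional[datetime] = None) -> datetime:
    """Date by which the CA's next progress update is due.

    A "Next Update" date agreed by a root store representative (hypothetically
    pre-parsed from the Whiteboard field) overrides the default weekly cadence.
    """
    if whiteboard_next_update is not None:
        return whiteboard_next_update
    return last_ca_update + MAX_SILENCE


def question_overdue(asked_at: datetime, responded: bool, now: datetime) -> bool:
    """True if a question has gone without any response (even an
    acknowledgement) for more than one week."""
    return not responded and now - asked_at > MAX_SILENCE


if __name__ == "__main__":
    utc = timezone.utc
    print(next_update_due(datetime(2024, 6, 1, tzinfo=utc)))  # 2024-06-08 00:00:00+00:00
    print(question_overdue(datetime(2024, 6, 1, tzinfo=utc), False,
                           datetime(2024, 6, 9, tzinfo=utc)))  # True
```

Note that a mere acknowledgement counts as a response here, matching the text; it resets the clock for the question but not the obligation to deliver the promised answer.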

Examples of Good Practice

Here are some examples of good practice. (These examples will be updated based on experience with version 3.0 of the CCADB Incident Reporting Guidelines.)

Let's Encrypt: keyCompromise key blocking deviation from CP/CPS

https://bugzilla.mozilla.org/show_bug.cgi?id=1886876

  • Clear indication of Preliminary and Full Incident Reports.
  • Detailed timeline that identifies all policy, process, and software changes that contributed to the root cause, and an indication of when the incident began and ended.
  • Detailed Root Cause Analysis that offers background on the various conditions that gave rise to the issue.
  • Timely updates in response to questions posed, continued analysis, and changes to Action Items.

Google Trust Services: Failure to properly validate IP address

https://bugzilla.mozilla.org/show_bug.cgi?id=1876593

  • Significant amount of background information that informs the timeline of the incident.
  • Clear identification, in the Root Cause Analysis, of the factors that contributed to the incident, noting how many of them avoided detection.
  • Action Items that prevent, mitigate, and detect what didn’t go well.
  • Timely and detailed updates conveying Action Item status.

HARICA: Anomaly in OCSP services after CA software upgrade

https://bugzilla.mozilla.org/show_bug.cgi?id=1878106

  • Clear Summary that provides just enough context for new readers to understand the rest of the report.
  • Effective use of the “5 Whys” Root Cause Analysis methodology where “why” is asked as many times as necessary to identify the root cause of the incident.
  • Action Items that prevent and detect what didn’t go well.
  • Timely updates in response to questions posed and changes to Action Items.