cloud.mongodb.com unavailable
Incident Report for MongoDB Cloud
Postmortem

We apologize for the downtime of the MongoDB Atlas and Cloud Manager web interfaces, and for the unavailability of logins to the MongoDB Support Portal and MongoDB University.

At approximately 10:50 UTC on 2020-05-30, MongoDB Cloud customers would have noticed an inability to connect to the control plane. MongoDB Support and MongoDB University customers would also have been unable to log in. At approximately 11:00 UTC, MongoDB posted an alert to all customers that MongoDB Cloud was unavailable. We identified the root cause at approximately 10:54 UTC. MongoDB Cloud Operations began rolling out the fix by 11:38 UTC. Around 13:15 UTC, service was restored to the web interface, although some MongoDB Cloud Manager Backup customers would have noticed delayed snapshots while the system continued to heal.

Root Cause
Some certificates used by internal services were using a cross-signed intermediate CA. One of the cross-signing CAs, to which we were still building the trust chain to support some legacy systems, expired. This caused some critical systems to start rejecting certificates, which in turn caused a failure in our internal control plane. That failure prevented access to MongoDB University, MongoDB Support, and MongoDB Cloud.

Remediations
We removed the expired cross-signing elements from our internal certificate chains and verified that all of our systems were operational. To prevent this issue from recurring, we will also enforce full-chain inspection in our existing certificate monitoring and alerting.
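
As an illustration of what full-chain inspection means in practice, the following is a minimal sketch only, not our actual monitoring code. It assumes each certificate in a chain has already been parsed into a simple record; the `ChainCert` type, the `expiring_certs` function, the 30-day warning window, and all names and dates in the sample data are hypothetical. The point it shows is that every element of the chain is checked for expiry, not only the leaf certificate.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ChainCert:
    """One certificate in a presented chain (hypothetical, simplified model)."""
    subject: str
    not_after: datetime  # expiry time, timezone-aware UTC


def expiring_certs(chain, now, warn_within=timedelta(days=30)):
    """Return every cert that is expired or expires within `warn_within`.

    Every element of the chain is inspected, not just the leaf at index 0.
    """
    deadline = now + warn_within
    return [cert for cert in chain if cert.not_after <= deadline]


# Illustrative data only: subjects and dates are made up for this sketch.
now = datetime(2020, 5, 1, tzinfo=timezone.utc)
chain = [
    ChainCert("leaf: internal-service.example",
              datetime(2021, 1, 1, tzinfo=timezone.utc)),
    ChainCert("intermediate CA",
              datetime(2024, 1, 1, tzinfo=timezone.utc)),
    ChainCert("cross-signing CA (legacy)",
              datetime(2020, 5, 30, tzinfo=timezone.utc)),
]

for cert in expiring_certs(chain, now):
    print(f"ALERT: {cert.subject} expires {cert.not_after:%Y-%m-%d}")
```

In this sketch, a check that looked only at the leaf certificate would report nothing, while the full-chain check flags the legacy cross-signing CA ahead of its expiry.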

Posted May 30, 2020 - 18:36 UTC

Resolved
During this time:
- cloud.mongodb.com was unreachable
- university.mongodb.com was unreachable
- logins to the support portal were unavailable, but users who were logged in previously could still create and modify cases
- Atlas clusters were unaffected and were still reachable and usable, but they could not be modified, deleted, or created
- Cloud Manager managed clusters were unaffected and were still reachable and usable, but modifications could not be made
- Alerts across all of MongoDB Cloud were not sent during this time
- Metrics data across all of MongoDB Cloud during this time was lost and cannot be backfilled

Continuing effects:
- A small percentage of Cloud Manager Backup and Atlas Continuous Backup customers have delayed backup snapshots. These will catch up over the next few minutes to hours.
Posted May 30, 2020 - 14:49 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted May 30, 2020 - 13:21 UTC
Update
The cloud.mongodb.com website is currently unavailable. Our operations team has identified the root cause and is working on a solution. Atlas clusters are not affected. Cloud Manager managed processes are also not affected.

It is not presently possible to log in to support.mongodb.com, but users who have previously logged in and have active sessions can still file cases.

MongoDB Stitch is currently unavailable.
Posted May 30, 2020 - 12:52 UTC
Identified
The cloud.mongodb.com website is currently unavailable. Our operations team has identified the root cause and is working on a solution. Atlas clusters are not affected. Cloud Manager managed processes are also not affected. Thank you for your patience.
Posted May 30, 2020 - 12:21 UTC
Investigating
The cloud.mongodb.com website is currently unavailable. We are investigating this issue.

This issue does not impact the functioning of Atlas clusters.
Posted May 30, 2020 - 11:01 UTC
This incident affected: MongoDB Cloud, MongoDB Support Portal, and MongoDB Atlas App Services and Device Sync.