Marketplace/ComponentSLA
Availability Tiers
There are three tiers of service within the infrastructure:
Contracted Availability
These services and workflows have been defined as mission-critical, and have explicitly set levels of availability or maximum durations for processing. Because the requirements for each piece are unique, they are discussed individually below. Each is expected to be included in an MOU.
High Availability
Most of the systems fall into this category. They are monitored closely and are expected to be up. However, no contract specifies an availability value, and it is acceptable for a system to be down for planned maintenance.
Internally, there may be differences in expectations for components in the infrastructure, but those will be reflected in the development resources assigned and the exact monitoring details for each component.
Best Effort
These systems are low-priority, usually documentation or read-only sections. While we will do our best to keep them up, there won't be extensive effort through redundancy or geodistribution to make them High Availability. Even with these qualifications, we still expect them to be up over 98% of the time.
Specific Contracted System Components
These components have been identified as critical pieces of the infrastructure. Note that this does not mean that they will be up 100% of the time, but that each has a defined minimum availability, and that an outage will either cause economic harm or need to be compensated for by the client pieces of the system.
Application Removal
Sometimes, an application needs to be removed from the Marketplace. This may be due to a dangerous coding error, a security issue, or legal concerns. Both the owner of the application and administrators of this system need access to this capability.
Once the application has been removed, it must, within a set time frame, stop appearing on category pages, in searches, and as an app page.
Flow
The user logs on and selects the delete-now button. The item is removed from the database (cached elsewhere offline?) and affiliated searches. Caches involving it are flushed.
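The flow above can be sketched roughly as follows. This is a minimal illustration, not the Marketplace implementation: the `Marketplace` class, its in-memory dictionaries, and the `remove_application` function are all hypothetical stand-ins, and the `archive` field simply assumes one possible answer to the open question about keeping an offline copy.

```python
from dataclasses import dataclass, field


@dataclass
class Marketplace:
    """Hypothetical in-memory stand-in for the real storage layers."""
    database: dict = field(default_factory=dict)      # app_id -> app record
    search_index: dict = field(default_factory=dict)  # search term -> set of app_ids
    cache: dict = field(default_factory=dict)         # cache key -> set of app_ids shown
    archive: dict = field(default_factory=dict)       # offline copies (an assumption)


def remove_application(market, app_id, user, admins=frozenset()):
    """Remove an app from the database, the search index, and the caches.

    Only the app's owner or an admin may remove it. Returns the list of
    steps that completed, in order.
    """
    app = market.database.get(app_id)
    if app is None:
        raise KeyError(f"unknown application: {app_id}")
    if user != app["owner"] and user not in admins:
        raise PermissionError(f"{user} may not remove {app_id}")

    steps = []

    # 1. Remove the record from the database, keeping an offline copy.
    market.archive[app_id] = market.database.pop(app_id)
    steps.append("database")

    # 2. Drop the app from every search term that referenced it.
    for app_ids in market.search_index.values():
        app_ids.discard(app_id)
    steps.append("search")

    # 3. Flush every cache entry that includes the app.
    stale_keys = [key for key, app_ids in market.cache.items() if app_id in app_ids]
    for key in stale_keys:
        del market.cache[key]
    steps.append("cache")

    return steps
```

Returning the completed steps is a design choice for this sketch: if an exception interrupts the flow, a caller or admin can see which stage still needs manual cleanup.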
Failure Scenarios
- User cannot log in
- Database error prevents deletion
- Caches do not flush
Alternate Paths
In the event that a user cannot remove an application, they should be presented with a hotline that lets them communicate with an admin to do the deletion (will we always have an admin available?). The admin channel should be separate from the user channel, and may have more direct access to the system, as well as the ability to manually flush items from the cache if needed. (How can they verify to the admin that they're the owner of the app?)
Guarantees
- Need a guarantee for big-red-button uptime: 99.5%
- Need a guarantee for removal speed: 15 minutes
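To make these two numbers concrete, the sketch below converts an uptime target into a downtime budget and checks a single removal against the 15-minute limit. The function names and the idea of recording a "last seen" timestamp for a removed app are assumptions made for illustration; only the 99.5% and 15-minute figures come from this page.

```python
from datetime import datetime, timedelta

UPTIME_TARGET = 0.995                     # big-red-button uptime guarantee
REMOVAL_DEADLINE = timedelta(minutes=15)  # removal speed guarantee


def downtime_budget(target, period):
    """Maximum downtime allowed within `period` while still meeting `target`."""
    return period * (1 - target)


def removal_met_deadline(requested_at, last_seen_at, deadline=REMOVAL_DEADLINE):
    """True if the app stopped being visible within `deadline` of the request.

    `last_seen_at` is the last time the app was observed on a category page,
    in a search result, or as an app page (a hypothetical measurement).
    """
    return last_seen_at - requested_at <= deadline


if __name__ == "__main__":
    budget = downtime_budget(UPTIME_TARGET, timedelta(days=30))
    print(f"Allowed downtime per 30 days: {budget}")

    requested = datetime(2024, 1, 1, 12, 0)
    print(removal_met_deadline(requested, requested + timedelta(minutes=12)))
```

Over a 30-day period, 99.5% uptime leaves roughly 3.6 hours of allowable downtime for the removal capability.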