Health checks give the hosting provider a way to verify that the application is still running and working as expected.
The most basic health check is enabled by default on the /_system/check route. It only confirms that the process is running and reports nothing about application-specific checks.
To demonstrate the idea, we will describe a simple health check that performs a request to an HTTP resource and returns the appropriate result:
    @s.provides(s.MediaType.ApplicationJson, default=True)
    class SimpleHttpResourceCheck(s.RequestHandler):

        @s.async
        def get(self):
            result = self.environment.http_resource.ping()
            if result.code == 200:
                raise s.HealthCheckOk()
            if result.code == 599:
                raise s.HealthCheckError()
To enable this health check, add it to the environment in your service's run method:
    class MyService(s.Service):

        def run(self):
            self.environment.add_health_check('http_resource',
                                              SimpleHttpResourceCheck)
Once the service is started, you can access the check at /_system/check/http_resource:
$ curl 'http://127.0.0.1/_system/check/http_resource'
{"code": "OK", "ok": true}
The HTTP response code is 200 when everything is OK. Any problem, whether WARNING or ERROR, returns the HTTP code 500. A warning returns the response:
$ curl 'http://127.0.0.1/_system/check/http_resource_with_warning'
{"code": "WARNING", "error": true}
and an error returns a similar one:
$ curl 'http://127.0.0.1/_system/check/http_resource_with_error'
{"code": "ERROR", "error": true}
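The status code rule above can also be read from the monitoring side. The following is a minimal sketch and not part of the framework: interpret_check is a hypothetical helper that takes an HTTP status code and a response body, and it assumes the body has the JSON shape shown in the examples above.

```python
import json


def interpret_check(status_code, body):
    """Return (healthy, code) for one health check response.

    Hypothetical helper: assumes a JSON body with a "code" field and,
    for passing checks, an "ok" field set to true.
    """
    payload = json.loads(body)
    code = payload.get("code", "UNKNOWN")
    # Only HTTP 200 counts as healthy; WARNING and ERROR both arrive as 500.
    healthy = status_code == 200 and payload.get("ok") is True
    return healthy, code


print(interpret_check(200, '{"code": "OK", "ok": true}'))          # (True, 'OK')
print(interpret_check(500, '{"code": "WARNING", "error": true}'))  # (False, 'WARNING')
```

Because WARNING and ERROR share the HTTP code 500, a monitor that needs to tell them apart has to read the "code" field of the body rather than rely on the status alone.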
Exception that a health check raises to indicate an ERROR result (s.HealthCheckError in the example above).
The ERROR state indicates a major problem, such as a failed connection to a database.
Exception that a health check raises to indicate an OK result (s.HealthCheckOk in the example above).
Exception that a health check raises to indicate a WARNING result.
The WARNING state indicates a problem that is not critical to the application, such as long response times.
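As an illustration of how the three states might be told apart, here is a framework-independent sketch. The classify function and the one-second slow_threshold are assumptions made for this example only; in a real check, each branch would raise the corresponding health check exception instead of returning a string.

```python
def classify(status_code, elapsed_seconds, slow_threshold=1.0):
    """Map one probe result to "OK", "WARNING" or "ERROR".

    Hypothetical helper: slow_threshold is an assumed limit in seconds.
    """
    if status_code != 200:
        # A failed request (for example code 599) is a major problem.
        return "ERROR"
    if elapsed_seconds > slow_threshold:
        # The resource answered, but slowly: not critical, worth reporting.
        return "WARNING"
    return "OK"


print(classify(200, 0.05))  # OK
print(classify(200, 3.2))   # WARNING
print(classify(599, 0.01))  # ERROR
```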
The default system health check.
This check returns the following JSON:
{"message": "API running", "code": "OK", "ok": true}
Its primary use is to check whether the process is still running and working as expected. If this request takes too long to respond while all other systems are working correctly, you probably need to create more instances of the service, because the current number of processes cannot handle the number of incoming requests.
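To spot the slow-response situation described above, the default check can be timed from the outside. This is only a sketch using the Python standard library: timed_check is a hypothetical helper, the two-second timeout is an arbitrary assumption, and the opener parameter exists solely so the function can be exercised without a running service.

```python
import time
import urllib.request


def timed_check(url, timeout=2.0, opener=urllib.request.urlopen):
    """Return (ok, elapsed_seconds) for one request to a health check URL."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            response.read()
            ok = response.status == 200
    except OSError:
        # Connection failures, timeouts and HTTP error responses
        # (urllib.error.URLError is an OSError) all count as failed.
        ok = False
    return ok, time.monotonic() - start
```

Calling timed_check('http://127.0.0.1/_system/check') at regular intervals and watching the elapsed time gives a rough signal for when additional instances may be needed.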