Monitoring (Server)

Tip

This page covers all aspects of monitoring at the server level. For information about monitoring individual websites, see Monitoring (Website).

Availability (external)

We closely monitor all aspects of your server. Depending on your service level, our on-call organisation will take appropriate action if required.

Availability (internal)

Monit, nginx and PHP FPM (if installed) status pages are available at http://localhost:2813/:

  • http://localhost:2813/monit/: Monit service manager displaying status of all locally monitored processes

  • http://localhost:2813/nginx/: nginx stub status output

  • http://localhost:2813/fpm-<poolname>/: PHP FPM per pool status page
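The status pages listed above can also be read from a script. The following is a minimal sketch, not part of the platform tooling: the helper names status_url and nginx_active_connections are hypothetical, and the parser assumes that the /nginx/ page returns the standard stub status text, which starts with a line "Active connections: N".

```shell
# Base URL of the local status vhost (taken from the list above).
STATUS_BASE="http://localhost:2813"

# Print the URL of one status page, e.g. "monit", "nginx" or "fpm-<poolname>".
status_url() {
  printf '%s/%s/\n' "$STATUS_BASE" "$1"
}

# Read nginx stub status text on stdin and print the number of active connections.
# Assumption: the page uses the standard format ("Active connections: N").
nginx_active_connections() {
  awk '/^Active connections:/ { print $3 }'
}

# Usage (on the server itself, or through an SSH port forward):
#   curl -fsS "$(status_url nginx)" | nginx_active_connections
```

The functions only build URLs and parse text, so they can be sourced into an existing script without side effects.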

Tip

This status vhost listens on localhost only. To access it from your local computer, forward port 2813 through SSH: ssh <hostname> -L 2813:localhost:2813

Tip

The Monit status can also be checked in the terminal by running monit-status as the devop user (see Generic Admin User).

Reboot

An automatic reboot is initiated to resolve certain high-usage scenarios:

  • 5-minute load average higher than CPU count * 10 for 5 minutes

  • memory usage higher than 95% for 5 minutes
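To see how close a server currently is to these thresholds, the two conditions can be approximated in the shell. This is a sketch under assumptions: the function names are hypothetical, memory usage is taken as total minus available memory (Monit may calculate it differently), and a single snapshot does not capture the "for 5 minutes" part of the rule.

```shell
# Succeed (exit 0) if the 5-minute load average exceeds CPU count * 10.
# $1 = 5-minute load average, $2 = number of CPUs
load_exceeds_threshold() {
  awk -v load="$1" -v cpus="$2" 'BEGIN { exit !(load > cpus * 10) }'
}

# Print memory usage in percent.
# $1 = MemTotal in kB, $2 = MemAvailable in kB
# Assumption: usage = total - available; Monit may use another definition.
mem_usage_percent() {
  awk -v total="$1" -v avail="$2" \
    'BEGIN { printf "%.1f\n", (total - avail) * 100 / total }'
}

# Reading the current values on a Linux host:
#   load5=$(cut -d ' ' -f 2 /proc/loadavg)
#   cpus=$(nproc)
#   total=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
#   avail=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
#   load_exceeds_threshold "$load5" "$cpus" && echo "load above threshold"
#   mem_usage_percent "$total" "$avail"
```

On a server with 4 CPUs, for example, the load condition only applies once the 5-minute load average is above 40.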

Tip

Always make sure that all required services start automatically after a reboot.
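One way to check this is to ask systemd which units are enabled. The sketch below assumes that your services run as systemd units; the function name and the unit names in the usage line are hypothetical examples. Services started by other means (for example cron or a process supervisor) are not covered.

```shell
# Print every given unit that is not enabled for automatic start.
# Returns non-zero if at least one such unit was found.
# Assumption: the services are managed as systemd units.
units_not_enabled() {
  rc=0
  for unit in "$@"; do
    if ! systemctl is-enabled --quiet "$unit"; then
      echo "$unit"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (hypothetical unit names):
#   units_not_enabled nginx.service redis-server.service
```

An empty output means that all listed units are configured to start at boot.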

If your managed server is rebooted too many times within a certain period of time, our monitoring system detects this and notifies you by e-mail.

To understand what happened, you can use the following commands as the devop user (see Generic Admin User):

SSH Session
# see a list of the latest reboots of your server
last reboot

# check whether monit triggered a reboot
grep 'monit-reboot:' /var/log/syslog*
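Older rotated logs are often gzip-compressed, in which case a plain grep does not find matches in them. The following sketch counts Monit-triggered reboots across plain and compressed files; the function name is hypothetical, and whether your syslog files are compressed depends on the logrotate configuration of the server.

```shell
# Count 'monit-reboot:' entries in the given log files.
# gzip -cdf decompresses .gz files and passes plain files through unchanged,
# so rotated logs such as syslog.2.gz are included.
# Note: grep -c prints 0 and returns non-zero if nothing matched.
count_monit_reboots() {
  gzip -cdf -- "$@" | grep -c 'monit-reboot:'
}

# Usage on the server:
#   count_monit_reboots /var/log/syslog*
```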

Out of Memory (OOM)

If the memory of your managed server is exhausted by the running processes, the Linux operating system protects itself by killing processes that consume large amounts of memory. This frees up system memory in order to keep the overall system running and responsive.

If the OOM killer is invoked too often, this is a sign that your managed server could be short of resources. Our monitoring will detect this and notify you by e-mail.

To troubleshoot memory exhaustion, you can use the following commands as the devop user (see Generic Admin User):

SSH Session
# show which process invoked the oom-killer (matching line plus two lines of context)
cat /var/log/syslog* | grep oom-killer -A2

# see which processes were killed and reaped by the oom_reaper
cat /var/log/syslog* | grep oom_reaper
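If many processes were killed, a summary per process name is easier to read than the raw log lines. The sketch below assumes the usual kernel message format "oom_reaper: reaped process <pid> (<name>), now anon-rss:..."; the function name is hypothetical.

```shell
# Read log lines on stdin and print "<count> <process name>" for every
# process reaped by the oom_reaper, most frequent first.
# Assumption: lines follow the usual kernel format
#   oom_reaper: reaped process <pid> (<name>), now anon-rss:...
oom_reaped_summary() {
  sed -n 's/.*oom_reaper: reaped process [0-9][0-9]* (\(.*\)), now anon-rss.*/\1/p' |
    awk '{ n[$0]++ } END { for (name in n) print n[name], name }' |
    sort -rn
}

# Usage on the server:
#   cat /var/log/syslog* | oom_reaped_summary
```

A process name that appears far more often than the others is a good starting point for tuning memory limits.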

Utilization

Netdata

Netdata is a real-time, interactive web dashboard that collects data every second. Metrics are stored in memory and kept for 1 hour only. You can reach its web interface at http://localhost:19999.

To connect from your local computer, either forward port 19999 through SSH (ssh <hostname> -L 19999:localhost:19999), or add a reverse proxy website that forwards requests to http://localhost:19999.

Warning

When using the reverse proxy method, make sure to enable HTTPS and password protection.
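Besides the dashboard, Netdata can return metrics over HTTP, which is handy for quick checks in a terminal. The sketch below assumes that the installed Netdata version offers the v1 REST API with its data endpoint and that a chart named system.cpu exists; the function name is hypothetical.

```shell
# Base URL of the local Netdata instance (from the text above).
NETDATA_BASE="http://localhost:19999"

# Print the URL that returns the last N seconds of one chart as CSV.
# $1 = chart id (for example system.cpu), $2 = seconds to look back
# Assumption: the Netdata v1 REST API is available on this instance.
netdata_csv_url() {
  printf '%s/api/v1/data?chart=%s&after=-%s&format=csv\n' "$NETDATA_BASE" "$1" "$2"
}

# Usage, on the server or with the SSH port forward active:
#   curl -fsS "$(netdata_csv_url system.cpu 60)"
```

Because metrics are kept for 1 hour only, look-back values beyond 3600 seconds are unlikely to return more data.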

collectd

System statistics are collected every 10 seconds by collectd and written to RRD files in /var/lib/collectd. For performance reasons, we don’t create graphs by default, so you have to download the files and render them yourself with a tool of your choice. Please select a rendering tool from the list of frontends in the collectd wiki. We recommend collectd-web.

For Debian-based Linux Distributions

Installation:

sudo apt-get install librrds-perl libjson-perl libhtml-parser-perl
git clone https://github.com/httpdss/collectd-web.git
echo 'datadir: "/tmp/rrd"' | sudo tee /etc/collectd/collection.conf

Fetch data and render graphs:

rsync -avz <server>:/var/lib/collectd/rrd/ /tmp/rrd/
cd /path/to/collectd-web
python runserver.py

Then open collectd-web at http://127.0.0.1:8888/.
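If collectd-web shows no graphs, it can help to check which data the rsync step actually copied. The sketch below assumes collectd's usual directory layout <datadir>/<host>/<plugin>/<type>.rrd; the function name is hypothetical.

```shell
# Print "<host>/<plugin>" for every plugin directory that contains RRD files.
# $1 = local data directory, for example /tmp/rrd
# Assumption: files are laid out as <datadir>/<host>/<plugin>/<type>.rrd.
list_rrd_plugins() {
  (cd "$1" && find . -mindepth 3 -maxdepth 3 -type f -name '*.rrd') |
    sed 's|^\./||; s|/[^/]*$||' |
    sort -u
}

# Usage after the rsync step:
#   list_rrd_plugins /tmp/rrd
```

An empty result suggests that the source path or the datadir setting does not match the location of the synced files.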

collectd-web with Docker

A Docker image is also available.

rsync -avz <server>:/var/lib/collectd/rrd/ /tmp/rrd/
docker run -p 8888:80 --volume /tmp:/tmp -it registry.gitlab.com/opsone_ch/docker-collectd-web:latest

Then open collectd-web at http://127.0.0.1:8888/.