Monitive’s first implementations followed the Agile way of emergent architecture. This tells us not to build for 1,000,000 users from day one, but to expand the system as it grows. So we did that for the automatic 24/7 checking system and for the internationalization mechanism.
Although it was rather simple to build, it soon hit an end. A dead end. Because soon enough, the architecture simply wasn’t built for hundreds of users and thousands of checks every few minutes.
This is why Monitive as it is today started out from day one with a loosely-coupled, scalable architecture: modular, white-labeled, fully internationalized and designed to naturally expand across servers. Roughly, the entire system is composed of a few types of “application roles”:
- the Website, where users find out about us, sign up or confirm their email; this is a stand-alone component that never shares a server with any other role;
- the Jobmanager, the main server that holds all the data and runs all the checks using the available locations;
- the Nodeworker, a small set of scripts scattered across hosting accounts around the globe; these scripts receive check commands from the Jobmanager, execute them by accessing the client’s service, and report back with the outcome and metrics.
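To make the Nodeworker’s job concrete, here is a minimal sketch of what “execute a check and report the outcome and metrics” could look like. This is our illustration, not Monitive’s actual code; the command format and field names are assumptions:

```python
import time
import urllib.request
import urllib.error

def run_check(command):
    """Execute one check command from the Jobmanager and return the outcome.

    `command` is a hypothetical dict such as:
        {"url": "https://example.com/", "timeout": 10}
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(command["url"],
                                    timeout=command.get("timeout", 10)) as resp:
            status = resp.status
            up = 200 <= status < 400
    except urllib.error.HTTPError as exc:
        status, up = exc.code, False          # server answered with an error
    except (urllib.error.URLError, OSError):
        status, up = None, False              # unreachable: DNS, timeout, refused
    elapsed_ms = int((time.monotonic() - start) * 1000)
    # A real Nodeworker would POST this back to the Jobmanager's API.
    return {"url": command["url"], "up": up,
            "status": status, "response_ms": elapsed_ms}
```

Keeping the script this small is what lets it run on cheap shared hosting accounts anywhere in the world.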
Each of these roles is designed to be multiplied across different servers as the system grows. In practice, every few hundred customers we deploy a new Jobmanager and some new Nodeworkers to handle the extra checking.
The Website also gets replicated when needed to run under a different brand, since it is completely white-labeled.
Nodeworker expansion is a matter of uploading a set of three files to a new hosting location and typing the new API endpoint URL into the Jobmanagers that will use it.
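On the Jobmanager side, that registration conceptually amounts to keeping a list of Nodeworker endpoint URLs and spreading checks across them. A toy sketch of the idea, with names and structure that are ours rather than Monitive’s:

```python
class Jobmanager:
    """Toy model of a Jobmanager's Nodeworker registry (illustrative only)."""

    def __init__(self):
        self.nodeworkers = []  # API endpoint URLs, one per hosting location
        self._next = 0

    def register_nodeworker(self, endpoint_url):
        # Adding a new location is just typing in its endpoint URL.
        self.nodeworkers.append(endpoint_url)

    def pick_nodeworker(self):
        # Simple round-robin over the registered locations.
        url = self.nodeworkers[self._next % len(self.nodeworkers)]
        self._next += 1
        return url
```

Because a Nodeworker is addressed purely by URL, adding one never requires touching the checking logic itself.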
And since reliability is our TOP priority, Jobmanagers have replication servers standing by that automatically take over within minutes if the main ones are found dead!
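A failover like that typically boils down to a heartbeat loop on the standby server. A minimal sketch of the idea; the interval, threshold and callables are assumptions of ours, not Monitive’s implementation:

```python
import time

def standby_loop(is_primary_alive, take_over, check_interval=60, max_missed=3):
    """Promote the standby after `max_missed` consecutive failed heartbeats.

    `is_primary_alive` and `take_over` are hypothetical callables supplied by
    the deployment; with a 60 s interval and 3 misses, takeover happens
    within minutes of the primary dying.
    """
    missed = 0
    while True:
        if is_primary_alive():
            missed = 0  # primary recovered; reset the counter
        else:
            missed += 1
            if missed >= max_missed:
                take_over()
                return
        time.sleep(check_interval)
```

Requiring several consecutive misses avoids a needless takeover on a single dropped heartbeat.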
Dead Nodeworkers are automatically skipped, and the only thing that currently doesn’t have a backup plan is the Website. But since our uptime is at 99.99%, that isn’t an issue: if the Website goes down, checks are still being made and alerts are still being sent.
So, a word of advice for fellow system builders: start with a toy, stress it and see where it spills, then rebuild everything by applying what you have learned in the process.