Facebook’s Prineville Data Center

Creating PULSE — A developer-friendly server monitoring tool (Part 3)

This is part of a weekly development blog series, where I will document the creation of an application from the initial idea through to its deployment on a scalable architecture. Even as an experienced developer, I find these stories to be interesting and I usually pick up a tip or two, so if you’d like to come along, and hopefully benefit in some way, let’s dig in!

NOTICE: Pulse has now launched and is available to use. You can create an account by visiting https://pulse.alphametric.co

Building a receipt system

Following on from last week, I needed to build a billing and receipt system to handle payment subscriptions for users. I looked into Stripe / Laravel Cashier for handling this, but found it a little inflexible for my needs.

If you’re building a SaaS that uses a simple subscription model, then I’d highly recommend Cashier as it will save you time and reduce your codebase.

However, deciding to roll my own solution led to a few issues, namely what happens when a user changes plan or cancels their account.

In these situations, I would have to take the price of the plan, multiply it by the number of servers the user has, then calculate the number of days in the billing cycle and put it all together. It’s not friendly code; fortunately, wrapping some of the functionality in support methods made it easier:
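To make the calculation concrete, here is a minimal sketch of the proration math described above. The `prorate()` helper and its parameters are my own illustration, not Pulse’s actual code:

```php
<?php

/**
 * A hypothetical support method for the proration described above:
 * plan price x server count, scaled by the fraction of the billing
 * cycle being charged for.
 *
 * @param float $planPrice   price per server for the plan, in dollars
 * @param int   $serverCount number of servers on the account
 * @param int   $days        days being charged for
 * @param int   $cycleDays   total days in the billing cycle
 */
function prorate(float $planPrice, int $serverCount, int $days, int $cycleDays): float
{
    $fullCharge = $planPrice * $serverCount;

    // Scale the full charge by the portion of the cycle, rounded to cents
    return round($fullCharge * ($days / $cycleDays), 2);
}

// e.g. a $5/server plan, 4 servers, 15 of 30 days in the cycle
echo prorate(5.00, 4, 15, 30); // prints 10
```

Pushing the arithmetic into one small, named function like this keeps the plan-change and cancellation code paths readable.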

Given how important it was to get this functionality right, I dropped down a level and wrote some unit tests for this… it helps me sleep a little easier.
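As an illustration of the kind of checks I mean, here is a self-contained sketch using plain `assert()` so it runs anywhere; in a real project these would live in a PHPUnit test class. The `prorate()` helper is a hypothetical stand-in for the billing support method, redefined here so the example is complete:

```php
<?php

// Hypothetical proration helper (stand-in for the real support method)
function prorate(float $planPrice, int $serverCount, int $days, int $cycleDays): float
{
    return round($planPrice * $serverCount * ($days / $cycleDays), 2);
}

// Half the cycle should cost half the full charge
assert(prorate(5.00, 4, 15, 30) === 10.00);

// A full cycle should cost the full charge
assert(prorate(5.00, 4, 30, 30) === 20.00);

// Zero days should cost nothing
assert(prorate(5.00, 4, 0, 30) === 0.00);

echo "all proration checks passed\n";
```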

The payment data returned by Stripe was easy enough to store in the database, but I needed to create an option that would allow users to print a receipt (either as a hard copy or to a PDF). This allowed me to explore CSS print stylesheets, which were somewhat new to me.

It took a little bit of time to get it right (particularly for responsive design), but I think the result turned out pretty well:

The receipt (as viewed in the browser)
The receipt (as viewed in a PDF)
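For anyone unfamiliar with print stylesheets, the core idea is an `@media print` block that hides the app chrome and flattens the design when the page is printed. The selectors below are illustrative, not Pulse’s actual markup:

```css
/* Applied only when the page is printed or saved to PDF */
@media print {
  /* Hide navigation and anything tagged as screen-only */
  nav,
  .sidebar,
  .no-print {
    display: none;
  }

  /* Flatten the receipt: white background, no shadows, full width */
  body {
    background: #fff;
    color: #000;
  }

  .receipt {
    box-shadow: none;
    width: 100%;
  }
}
```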

Creating a server dashboard

Now that payments were complete, I was able to move on to the actual meat of the application. I started by building a simple management dashboard containing cards for each server. The cards display the basics… name, IP address, time zone, last update time and current status.

I decided on four statuses for servers:

  1. New — a server which has yet to be configured and therefore is not actively monitored by Pulse at all.
  2. Silent — a server that has been configured, but is not reporting any data to Pulse (either because the script isn’t set up, or because the server is down).
  3. Bad — a server that has sent statistics, one or more of which violate a threshold configured by the user, e.g. more than 80% CPU usage.
  4. Good — a server that has sent statistics, all of which are within thresholds.

Next, I opted to add color accents to the cards:

The server management dashboard

These helped to make the state of the server stand out. I also added a default orderByRaw clause to the database query to sort the servers by type:

$query->orderByRaw("FIELD(status, 'New', 'Silent', 'Bad', 'Good')");

This means that no matter what search is performed, New servers will always be shown first, followed by Silent ones, then Bad, and finally Good.

The Server Page

Clicking through to an individual server presents a simple grid detailing the current monitors that have been configured for the server (a new one starts out with the standard monitors — CPU, Memory, Disk Storage & Networking).

You can also see the thresholds that have been set which, if breached, trigger a notification via the channel and route the user has specified. The latest value and its state (whether it has breached the threshold) are also shown:
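The breach check itself boils down to comparing each monitor’s latest value against its threshold. Here is a minimal sketch of that idea; the array shape and field names are illustrative:

```php
<?php

/**
 * Return the monitors whose latest reported value breaches the
 * threshold the user configured for them.
 *
 * Each monitor is a hypothetical array like:
 * ['name' => 'CPU', 'latest' => 91.0, 'threshold' => 80.0]
 */
function breachedMonitors(array $monitors): array
{
    return array_values(array_filter(
        $monitors,
        fn (array $m) => $m['latest'] > $m['threshold']
    ));
}

$breached = breachedMonitors([
    ['name' => 'CPU',    'latest' => 91.0, 'threshold' => 80.0],
    ['name' => 'Memory', 'latest' => 55.0, 'threshold' => 90.0],
]);

// Only CPU breaches its threshold, so a notification would be
// dispatched for that monitor alone
echo $breached[0]['name'], "\n"; // CPU
```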

Monitor configuration for a server

All in all, it’s comprehensive without being overwhelming. There are several other links in the screenshot, but we’ll discuss those in a later post.

A note on premature optimisation

When I initially designed the database tables for the servers and the logs, I decided to store the monitor configuration as an array in a single column, which Laravel would then encode and decode on the fly. The same is true for the logs.

Why did I do this? One reason: performance.

I was concerned that the database would be under too much strain pulling out the server record, the numerous monitor records and the log records in order to process the information sent by a user’s server and determine whether to send out a notification — this would occur every two minutes for EACH server.

Truthfully, this concern is not entirely without merit, and I’ll have to see how things go when it’s in production, but while developing the system with this approach, I discovered something about the code… it looked TERRIBLE.

I also discovered another issue with it. I won’t discuss it here, other than to say it would have been messy to solve using this approach.

Working with nested arrays is just plain nasty. So, I changed course midway and began refactoring the codebase to use a parent server with child monitor records. It took the best part of a day, and I had to rewrite a lot of code, but the net result is a much simpler, easier-to-navigate codebase.
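For context, this is roughly the shape the refactor moved towards: a parent `Server` Eloquent model with child `Monitor` records, rather than one serialized blob. The class and relation names are my own assumptions, not Pulse’s actual schema:

```php
<?php

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\BelongsTo;
use Illuminate\Database\Eloquent\Relations\HasMany;

// Parent record: one row per server
class Server extends Model
{
    public function monitors(): HasMany
    {
        return $this->hasMany(Monitor::class);
    }
}

// Child record: one row per monitor, instead of a nested array
class Monitor extends Model
{
    public function server(): BelongsTo
    {
        return $this->belongsTo(Server::class);
    }
}

// Loading a server with its monitors becomes one readable, eager query:
// $server = Server::with('monitors')->findOrFail($id);
// foreach ($server->monitors as $monitor) { ... }
```

Each monitor is now a first-class record you can query, validate and update on its own, which is exactly what the nested-array approach made painful.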

Why do this though, particularly if app performance might suffer? Simple: a harder-to-understand codebase is ALWAYS worse than a slightly slower system. It’s the lesser of two evils. And you have to ask yourself, do you really want to risk breaking something for a small gain in performance?

Besides, if it really becomes an issue, there are other options available to me. I could upgrade the database server to a higher specification, split traffic between a READ database and a WRITE database, or adopt a more aggressive caching strategy.

In other words, the takeaway here is as follows:

ONLY add complexity to your codebase when you have no other choice. Instead, ALWAYS go with the simplest, easiest option for your code. Change it ONLY when it becomes a real problem, and not before (often it never will).

Wrapping Up

Well, that’s it for this week, but things are moving along pretty well! Next up, we’ll be looking at the logging data and presenting it in a series of charts that make it much easier to digest performance and demand over time.

We will also be examining search functionality to allow a user to drill down to a specific set of logging data. This carries its own set of challenges, as users’ servers may operate in different time zones from Pulse’s servers.

All that is coming in next week’s article. In the meantime, be sure to follow me here on Medium, and also on Twitter for more frequent updates.


Thanks, and happy coding!

Software developer. Most of the time, I’m working with PHP, Laravel, Vue and TailwindCSS.