Prep

Queues

Learning Objectives

Queues are a frequently-seen component of large software systems that involve potentially heavyweight or long-running requests. A queue can act as a form of buffer, smoothing out spikes of load so that the system can deal with work when it has the resources to do so.
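To make the buffering idea concrete, here is a minimal single-process sketch using Python's standard library. It is only an illustration: a real system would put a separate queue service such as Kafka between the components, and the names `run_demo`, `num_requests` and `seconds_per_task` are made up for this example.

```python
import queue
import threading
import time


def run_demo(num_requests=20, seconds_per_task=0.01):
    """Accept a burst of requests at once, then process them at the worker's own pace."""
    tasks = queue.Queue()  # the buffer between the requester and the worker
    completed = []

    def worker():
        while True:
            task = tasks.get()
            if task is None:  # sentinel value: no more work will arrive
                break
            time.sleep(seconds_per_task)  # stand-in for heavyweight work
            completed.append(task)

    thread = threading.Thread(target=worker)
    thread.start()

    # The spike: every request is accepted immediately, without waiting
    # for the worker to finish the previous one.
    for i in range(num_requests):
        tasks.put(i)
    backlog_after_spike = tasks.qsize()  # approximate number still waiting

    tasks.put(None)
    thread.join()
    return backlog_after_spike, completed
```

Calling `run_demo()` returns the approximate backlog right after the spike and the list of completed tasks. The requester never blocks on the slow work; the queue absorbs the burst and the worker drains it in order.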

Reading

Read about the Queue-Based Load-Leveling Pattern.

Make sure you have achieved all of the learning objectives for this prep.

🤔 How can results of tasks be communicated back to users in a queue-based system?

Kafka in a Nutshell

Learning Objectives

Kafka is a commonly-used open-source distributed queue.

Reading

Read Apache Kafka in a Nutshell.

Make sure you have achieved all of the learning objectives for this prep.

Kafka Paper

Learning Objectives

Kafka is a commonly-used open-source distributed queue.

Reading

Read about the core Kafka concepts in the Kafka: a Distributed Messaging System for Log Processing paper.

Make sure you have achieved all of the learning objectives for this prep.

Project: Kafka Cron Scheduler

Learning Objectives

Throughout this module, you will be building a project.

The purpose of the project is to allow a user to specify command lines that should be run on some schedule, and then have those command lines be run on that schedule, on different computers. The more computers we add to the pool of runners, the more command lines we can run at a time.

To simplify deployment, we will use docker compose to simulate having multiple runner computers.

Because you will be learning lots of new things in this project, we will split this project up into steps. We will:

  1. Build a local cron scheduler that parses the file format and runs simplified tasks at the required intervals.
  2. Dockerise this local cron scheduler so we can run it in Docker.
  3. Insert Kafka into the process. Have our cron scheduler produce a message into a Kafka queue, and a consumer pull it out.
  4. Make our producer produce command lines to run, and our consumer run them.
  5. Introduce multiple queues.
  6. Handle errors.
  7. Add monitoring.
  8. Add alerting.

Cron

Learning Objectives

We are going to implement a distributed version of the cron job scheduler (read about cron if you are not familiar with it). A cron job is defined by two attributes: the command to be executed, and the schedule on which it should run, that is, a definition of the times at which the job should execute. The schedule is written in the crontab format.
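As an illustration of those two attributes, the sketch below splits one crontab-style entry into its schedule (the five fields minute, hour, day of month, month, day of week) and its command. The names `CronJob` and `parse_crontab_line` are invented for this example, and the sketch does not check that the schedule fields are valid.

```python
from dataclasses import dataclass


@dataclass
class CronJob:
    schedule: str  # five crontab fields: minute hour day-of-month month day-of-week
    command: str   # the command line to execute


def parse_crontab_line(line: str) -> CronJob:
    """Split a crontab-style entry into its schedule and its command."""
    # Split on whitespace at most five times, so the command keeps its own spaces.
    parts = line.strip().split(maxsplit=5)
    if len(parts) < 6:
        raise ValueError(f"expected 5 schedule fields and a command: {line!r}")
    return CronJob(schedule=" ".join(parts[:5]), command=parts[5])
```

For example, `parse_crontab_line("15 * * * * echo hello world")` gives a job whose schedule is `15 * * * *` and whose command is `echo hello world`.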

Most languages have parsers for the crontab format, so you do not need to write one yourself (though it can be an interesting challenge!). Some examples:

  • For Go, the most widely used is robfig/cron.
  • For Java, Quartz has a Cron parser/scheduler, see this quick start guide for how to use it.
    Note that Quartz parsing requires a leading seconds specifier, which is non-standard. You can convert a regular cron expression to a Quartz-compatible one by adding the prefix "0 ".
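The prefixing described in the Quartz note can be sketched as a small helper. The function name `to_quartz` is hypothetical. It only adds the seconds field and checks the field count; Quartz has other syntax differences from standard cron (for example, in how day-of-month and day-of-week are specified) which this sketch does not handle.

```python
def to_quartz(expression: str) -> str:
    """Prefix a five-field cron expression with a seconds field of 0."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {expression!r}")
    # "0" means: fire at second 0 of each matching minute.
    return "0 " + " ".join(fields)
```

For example, `to_quartz("15 * * * *")` returns `"0 15 * * * *"`.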

The cron tool common to Unix operating systems runs jobs on a schedule. Cron only works on a single host. We want to create a version of cron that can schedule jobs across multiple workers, running on different hosts.

Writing a cron scheduler without Kafka

The first step won’t involve Kafka at all, or running custom jobs. These will come later.

Exercise

Write code which parses a file containing a list of crontab schedules, one per line, and prints “Running job [line number]” for each line according to that line’s schedule. Line numbers count from 0.

e.g. if passed the file:

* * * * *
15 * * * *

Your program should print “Running job 0” every minute, and “Running job 1” once an hour, at quarter past the hour.
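To check your understanding of the example, here is a deliberately simplified sketch of deciding which jobs are due at a given minute. It is not a replacement for a real crontab parser: it assumes each field is either `*` or a single number, requires every field to match, and ignores ranges, lists, steps and names. The function names are made up for this illustration.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """A field matches if it is the wildcard or equals the current value."""
    return field == "*" or int(field) == value


def is_due(schedule: str, now: datetime) -> bool:
    """Check a five-field schedule against one point in time (minute precision)."""
    minute, hour, day, month, weekday = schedule.split()
    return (
        field_matches(minute, now.minute)
        and field_matches(hour, now.hour)
        and field_matches(day, now.day)
        and field_matches(month, now.month)
        # crontab numbers Sunday as 0; isoweekday() numbers it as 7
        and field_matches(weekday, now.isoweekday() % 7)
    )


def due_jobs(schedules: list[str], now: datetime) -> list[int]:
    """Return the zero-based line numbers of the schedules that are due."""
    return [i for i, schedule in enumerate(schedules) if is_due(schedule, now)]
```

With the two-line file above, `due_jobs(["* * * * *", "15 * * * *"], datetime(2024, 1, 1, 10, 15))` returns both jobs, while one minute later only job 0 is due. A scheduler could call something like this once a minute and print “Running job N” for each returned number, although using a library parser is usually the more robust choice.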

Exercise

Create a Docker image for your cron scheduler program. Make sure you can run the image.