Queues
Learning Objectives
Queues are a frequently-seen component of large software systems that involve potentially heavyweight or long-running requests. A queue can act as a form of buffer, smoothing out spikes of load so that the system can deal with work when it has the resources to do so.
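The buffering idea can be sketched in a few lines of Go. This is only an in-process illustration, assuming a buffered channel stands in for a real queue; the names `processAll` and `queueSize` are hypothetical and are not part of any queueing product.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll pushes a burst of tasks into a bounded queue and lets a single
// worker drain it at its own pace. It returns the tasks in the order the
// worker handled them.
func processAll(tasks []string, queueSize int) []string {
	queue := make(chan string, queueSize) // the buffer between producer and worker
	var handled []string
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		for task := range queue {
			handled = append(handled, task) // stand-in for heavyweight work
		}
	}()

	for _, t := range tasks {
		queue <- t // blocks only while the buffer is full
	}
	close(queue)
	wg.Wait() // after this, it is safe to read handled
	return handled
}

func main() {
	fmt.Println(processAll([]string{"resize-1", "resize-2", "resize-3"}, 2))
}
```

The producer can submit work faster than the worker processes it, up to the size of the buffer. A distributed queue plays the same role between separate processes or machines.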
Reading
Read about the Queue-Based Load-Leveling Pattern.
Make sure you have achieved all of the learning objectives for this prep.
💬 No - a user doesn’t see the stdout/stderr of the process running the work. stdout/stderr can be useful for the queue operators to debug things, but generally aren’t useful to end-users who submit tasks.
💬 Yes - queues can send notifications about successes/failures/progress.
💬 Ish. We can build systems that monitor the queue and display results. But in general, when we submit work to a queue, we don’t have a server we can ask to show us progress.
Kafka in a Nutshell
Learning Objectives
Kafka is a commonly-used open-source distributed queue.
Reading
Read Apache Kafka in a Nutshell.
Make sure you have achieved all of the learning objectives for this prep.
Kafka Paper
Learning Objectives
Kafka is a commonly-used open-source distributed queue.
Reading
Read about the core Kafka concepts in the Kafka: a Distributed Messaging System for Log Processing paper.
Make sure you have achieved all of the learning objectives for this prep.
Project: Kafka Cron Scheduler
Learning Objectives
Throughout this module, you will be building a project.
The purpose of the project is to allow a user to specify command lines that should be run on some schedule, and then have those command lines be run on that schedule, on different computers. The more computers we add to the pool of runners, the more command lines we can run at a time.
To simplify deployment, we will use docker compose to simulate having multiple runner computers.
Because you will be learning lots of new things in this project, we will split this project up into steps. We will:
- Build a local cron scheduler that parses the file format and runs simplified tasks at the required intervals.
- Dockerise this local cron scheduler so we can run it in Docker.
- Insert Kafka into the process. Have our cron scheduler produce a message into a Kafka queue, and a consumer pull it out.
- Make our producer produce command lines to run, and our consumer run them.
- Introduce multiple queues.
- Handle errors.
- Add monitoring.
- Add alerting.
Cron
Learning Objectives
We are going to implement a distributed version of the cron job scheduler (read about cron if you are not familiar with it). A cron job is defined by two attributes: the command to be executed, and the schedule, which defines the times at which the job should run. The schedule is written in the crontab format.
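As a rough illustration of those two attributes, the sketch below splits one crontab-style line into its schedule and its command. It assumes the common five-field schedule (minute, hour, day of month, month, day of week) followed by the command; the `Job` type and `parseLine` function are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// Job is a hypothetical type pairing a schedule with the command to run.
type Job struct {
	Schedule string // five-field crontab schedule, e.g. "15 * * * *"
	Command  string // command line to execute
}

// parseLine splits a line into its five schedule fields and the command.
// Note: runs of whitespace inside the command are collapsed to one space,
// which is a simplification of this sketch.
func parseLine(line string) (Job, error) {
	fields := strings.Fields(line)
	if len(fields) < 6 {
		return Job{}, fmt.Errorf("expected 5 schedule fields and a command, got %d fields", len(fields))
	}
	return Job{
		Schedule: strings.Join(fields[:5], " "),
		Command:  strings.Join(fields[5:], " "),
	}, nil
}

func main() {
	job, err := parseLine("15 * * * * echo hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("schedule %q runs command %q\n", job.Schedule, job.Command)
}
```

This only separates the two parts. Interpreting the schedule itself is the job of a crontab parser.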
Most languages have parsers for the crontab format, so you do not need to write one yourself (though it can be an interesting challenge!). Some examples:
- For Go, the most widely used is robfig/cron.
- For Java, Quartz has a cron parser/scheduler; see this quick start guide for how to use it.
Note that Quartz parsing requires a leading seconds specifier, which is non-standard. You can convert a regular cron expression to a Quartz-compatible one by adding the prefix "0 ".
The cron tool common to Unix operating systems runs jobs on a schedule. Cron only works on a single host. We want to create a version of cron that can schedule jobs across multiple workers, running on different hosts.
Writing a cron scheduler without Kafka
The first step won’t involve Kafka at all, or running custom jobs. These will come later.
Exercise
Write a program that parses a file containing a list of crontab schedules, one per line, and prints “Running job [line number]” each time a line's schedule fires.
e.g. if passed the file:
* * * * *
15 * * * *
Your program should print “Running job 0” every minute, and “Running job 1” once an hour at quarter past the hour.