Event-Driven Architecture Sprint 1 Prep

Learning Objectives

List advantages of using queues.
List examples of systems where queues are helpful.
Explain how a queue helps to avoid system overload.
Explain how a queue can help reduce service provisioning and costs.
Identify drawbacks of queue-based systems.

Queues are a frequently-seen component of large software systems that involve potentially heavyweight or long-running requests. A queue can act as a form of buffer, smoothing out spikes of load so that the system can deal with work when it has the resources to do so.
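To make the buffering idea concrete, here is a minimal Go simulation. It assumes time is split into discrete ticks and that a single worker can finish a fixed number of tasks per tick. The names `simulate`, `arrivals` and `perTick` are illustrative, not from any library. A burst of 10 tasks arrives in the first tick. The queue absorbs the burst, and the worker drains the backlog over the following ticks without ever doing more than it can handle.

```go
package main

import "fmt"

// simulate models queue-based load leveling in discrete ticks.
// arrivals[i] is the number of tasks submitted during tick i; perTick is the
// most tasks the worker can finish in one tick. It returns how many tasks were
// processed in each tick and how many were still waiting in the queue afterwards.
func simulate(arrivals []int, perTick int) (processed, backlog []int) {
	queued := 0
	for _, n := range arrivals {
		queued += n // new work is enqueued, never rejected
		done := queued
		if done > perTick {
			done = perTick // the worker never exceeds its capacity
		}
		queued -= done
		processed = append(processed, done)
		backlog = append(backlog, queued)
	}
	return processed, backlog
}

func main() {
	// A spike of 10 tasks, then silence. The worker can only handle 2 per tick.
	processed, backlog := simulate([]int{10, 0, 0, 0, 0}, 2)
	fmt.Println("processed per tick:", processed) // [2 2 2 2 2]
	fmt.Println("backlog after tick:", backlog)   // [8 6 4 2 0]
}
```

Without the queue, the worker would need to be provisioned for the peak of 10 tasks per tick. With the queue, capacity for the average (2 per tick here) is enough. The cost is that later tasks wait longer. If arrivals stay above capacity for long enough, for example `simulate([]int{3, 3, 3}, 2)`, the backlog keeps growing. That extra latency and unbounded growth are typical drawbacks of queue-based systems.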

Reading

Read about the Queue-Based Load-Leveling Pattern.

Make sure you have achieved all of the learning objectives for this prep.

💬 No - a user doesn’t see the stdout/stderr of the process running the work. stdout/stderr can be useful for the queue operators to debug things, but generally aren’t useful to end-users who submit tasks.

💬 Yes - queues can send notifications about successes/failures/progress.

💬 Ish. We can build systems that monitor the queue and display results. But in general, when we submit work to a queue, we don’t have a server we can ask to show us progress.

Learning Objectives

List the components of the Kafka architecture.
Explain the purpose of a producer, consumer, and broker.
Define a record.
Define a topic.
Define a partition.
Explain the relationship (and differences) between topics and partitions.
Explain how Kafka knows when a consumer has successfully handled a record.
Contrast at-most-once and at-least-once delivery.
Explain why exactly-once delivery is very hard to achieve.

Kafka is a commonly used open-source distributed queue.
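The difference between at-most-once and at-least-once delivery comes down to whether the consumer commits its offset before or after handling a record. The Go sketch below is a toy in-memory model, not the Kafka client API; `Partition`, `runConsumer` and `deliveryCounts` are hypothetical names. In it, a consumer crashes once, partway through. It then restarts from the last committed offset.

```go
package main

import "fmt"

// Record is a toy stand-in for a Kafka record: its position in the log plus a value.
type Record struct {
	Offset int
	Value  string
}

// Partition is an append-only log plus the consumer group's committed offset,
// i.e. the offset a restarted consumer will resume reading from.
type Partition struct {
	log       []Record
	committed int
}

func (p *Partition) Append(value string) {
	p.log = append(p.log, Record{Offset: len(p.log), Value: value})
}

// runConsumer reads from the committed offset to the end of the log.
// If atMostOnce is true it commits before handling each record, otherwise after.
// It simulates a crash between those two steps on the record at offset crashAt.
func (p *Partition) runConsumer(atMostOnce bool, crashAt int, handled map[int]int) {
	for off := p.committed; off < len(p.log); off++ {
		if atMostOnce {
			p.committed = off + 1 // commit first...
			if off == crashAt {
				return // ...then crash: this record is never handled
			}
			handled[off]++
		} else {
			handled[off]++ // handle first...
			if off == crashAt {
				return // ...then crash: no commit, so the record is redelivered
			}
			p.committed = off + 1
		}
	}
}

// deliveryCounts returns how many times each of n records was handled when the
// consumer crashes once at crashAt and is then restarted.
func deliveryCounts(atMostOnce bool, n, crashAt int) map[int]int {
	p := &Partition{}
	for i := 0; i < n; i++ {
		p.Append(fmt.Sprintf("job-%d", i))
	}
	handled := map[int]int{}
	p.runConsumer(atMostOnce, crashAt, handled)
	p.runConsumer(atMostOnce, -1, handled) // restart; no crash this time
	return handled
}

func main() {
	fmt.Println("at-most-once: ", deliveryCounts(true, 3, 1))  // map[0:1 2:1], record 1 lost
	fmt.Println("at-least-once:", deliveryCounts(false, 3, 1)) // map[0:1 1:2 2:1], record 1 handled twice
}
```

In this model, exactly-once delivery would need "handle the record" and "commit the offset" to happen as one atomic step, or the handler to be safe to repeat. That is hard to guarantee when the handler's side effects (running a command, writing to another system) live outside Kafka.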

Reading

Read Apache Kafka in a Nutshell.


Learning Objectives

Describe how Kafka stores data internally.
Calculate how many partitions are needed to serve a given number of consumers on a topic.
Contrast push-based and pull-based queueing systems.
Describe what delivery ordering constraints are and aren’t guaranteed by Kafka.
Explain limitations of Kafka compared to systems with acknowledgements or two-phase commits.


Reading

Read about the core Kafka concepts in the Kafka: a Distributed Messaging System for Log Processing paper.
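Two ideas from these objectives can be tried out in a few lines.

- Within one consumer group, each partition is read by only one consumer. A topic therefore needs at least as many partitions as the consumers you want to keep busy.
- Ordering is only guaranteed within a partition. Records that must stay in order should share a key that maps to a single partition.

The Go sketch below makes two simplifying assumptions. Partitions are assigned round-robin, and FNV-1a hashing stands in for Kafka's real partitioner. `assign`, `idle` and `partitionFor` are hypothetical helpers, not Kafka APIs.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// assign spreads partitions over the consumers of one group round-robin.
// result[c] lists the partitions consumer c reads; an empty list means that
// consumer is idle. consumers must be greater than zero.
func assign(partitions, consumers int) [][]int {
	result := make([][]int, consumers)
	for p := 0; p < partitions; p++ {
		c := p % consumers
		result[c] = append(result[c], p)
	}
	return result
}

// idle counts consumers that were given no partition at all.
func idle(partitions, consumers int) int {
	n := 0
	for _, ps := range assign(partitions, consumers) {
		if len(ps) == 0 {
			n++
		}
	}
	return n
}

// partitionFor maps a record key to a partition. The same key always lands on
// the same partition, which is what preserves per-key ordering.
func partitionFor(key string, partitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // writes to a hash.Hash never return an error
	return int(h.Sum32() % uint32(partitions))
}

func main() {
	fmt.Println(assign(3, 5), "idle:", idle(3, 5)) // 5 consumers, 3 partitions: 2 sit idle
	fmt.Println(assign(6, 3), "idle:", idle(6, 3)) // 6 partitions over 3 consumers: none idle
	fmt.Println(partitionFor("job-7", 6) == partitionFor("job-7", 6))
}
```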


Learning Objectives

Explain the objectives of the module project.

Throughout this module, you will be building a project.

The project lets a user specify command lines that should run on a schedule. It then runs those command lines on that schedule, spread across different computers. The more computers we add to the pool of runners, the more command lines we can run at a time.
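"More runners, more command lines at a time" is the classic worker-pool pattern. The Go sketch below is a single-process analogy: goroutines stand in for runner computers, and a channel stands in for the queue. The hypothetical `runPool` hands each command to whichever runner is free. Nothing is actually executed; it only records which runner took each command.

```go
package main

import (
	"fmt"
	"sync"
)

// runPool starts `runners` goroutines (runners must be > 0) that pull command
// lines from one shared queue. Each command is taken by exactly one runner,
// whichever is free first. Nothing is executed: the result only records which
// runner took each command, keyed by the command string (so commands should be unique).
func runPool(commands []string, runners int) map[string]int {
	queue := make(chan string)
	ranBy := make(map[string]int)
	var mu sync.Mutex // guards ranBy, which all runners write to
	var wg sync.WaitGroup

	for r := 0; r < runners; r++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for cmd := range queue { // blocks until work arrives or the queue is closed
				mu.Lock()
				ranBy[cmd] = id
				mu.Unlock()
			}
		}(r)
	}

	for _, cmd := range commands {
		queue <- cmd
	}
	close(queue) // lets each runner's range loop finish
	wg.Wait()
	return ranBy
}

func main() {
	cmds := []string{"echo a", "echo b", "echo c", "echo d", "echo e"}
	for cmd, runner := range runPool(cmds, 2) {
		fmt.Printf("%q ran on runner %d\n", cmd, runner)
	}
}
```

In the project, roughly speaking, the channel's role is taken by a Kafka topic, and each runner becomes a separate consumer process.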

To simplify deployment, we will use Docker Compose to simulate having multiple runner computers.

Because you will be learning lots of new things in this project, we will split this project up into steps. We will:

Build a local cron scheduler that parses the file format and runs simplified tasks at the required intervals.
Dockerise this local cron scheduler so we can run it in Docker.
Insert Kafka into the process. Have our cron scheduler produce a message into a Kafka queue, and a consumer pull it out.
Make our producer produce command lines to run, and our consumer run them.
Introduce multiple queues.
Handle errors.
Add monitoring.
Add alerting.

Learning Objectives

Describe the purpose of cron.
Write a crontab to run a job every minute, or at fixed times.
Write a program to parse files containing crontabs and schedule jobs.

We are going to implement a distributed version of the cron job scheduler (read about cron if you are not familiar with it). A cron job is defined by two attributes: the command to execute, and when to execute it, either as a recurring schedule or as a set of fixed times. The schedule is defined according to the crontab format.
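A cron job's two attributes can be modelled directly. A standard crontab schedule has five space-separated fields: minute, hour, day of month, month and day of week. In a user crontab, the command follows them on the same line. The sketch below assumes that layout and does not handle cron's special strings such as `@hourly`. `Job` and `parseLine` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Job pairs a schedule in crontab format with the command line to run.
type Job struct {
	Schedule string // e.g. "15 * * * *"
	Command  string // e.g. "echo hello"
}

// parseLine splits a crontab line into its five schedule fields and the command.
// Whitespace inside the command is collapsed to single spaces, which is fine
// for a sketch but would break commands that depend on exact spacing.
func parseLine(line string) (Job, error) {
	fields := strings.Fields(line)
	if len(fields) < 6 {
		return Job{}, fmt.Errorf("expected 5 schedule fields and a command, got %q", line)
	}
	return Job{
		Schedule: strings.Join(fields[:5], " "),
		Command:  strings.Join(fields[5:], " "),
	}, nil
}

func main() {
	job, err := parseLine("15 * * * * echo hello world")
	if err != nil {
		panic(err)
	}
	fmt.Printf("schedule=%q command=%q\n", job.Schedule, job.Command)
	// schedule="15 * * * *" command="echo hello world"
}
```

The files in the first exercise contain only schedules. This split only matters once the project moves on to running real command lines.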

Most languages have libraries that parse the crontab format, so you do not need to write a parser yourself (though it can be an interesting challenge!). Some examples:

For Go, the most widely used library is robfig/cron.
For Java, Quartz has a Cron parser/scheduler, see this quick start guide for how to use it.
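Note that Quartz parsing requires a leading seconds field, which is non-standard. Adding the prefix "0 " converts a regular cron expression to Quartz's field layout. However, Quartz also requires one of the day-of-month and day-of-week fields to be "?", so "* * * * *" becomes "0 * * * * ?". Quartz also numbers days of the week from 1 (Sunday) to 7, unlike standard cron, where Sunday is 0.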

The cron tool found on Unix operating systems runs jobs on a schedule, but only on the single host where it runs. We want to create a version of cron that can schedule jobs across multiple workers running on different hosts.

Writing a cron scheduler without Kafka

The first step won’t involve Kafka at all, or running custom jobs. These will come later.

Exercise

Write code that parses a file containing a list of crontab schedules, one per line. Each time a line's schedule is due, print “Running job [line number]”, numbering lines from 0 as in the example below.

e.g. if passed the file:

* * * * *
15 * * * *

Your program should print “Running job 0” every minute, and “Running job 1” once an hour at quarter past the hour.

Exercise

Create a Docker image for your cron scheduler program. Make sure you can run the image.
