A couple of weekends back, I spent some time kicking the tires of my new OpenStack cluster with a toy distributed-processing application that calculates digits of pi across multiple worker machines.
[tl;dr: Screenshots are at the bottom.]
The machines automagically install the Ubuntu packages listed in
fabfile.APT_PACKAGES and the Python libraries listed in
fabfile.PIP_PACKAGES. Next, the machines
git clone my Celery project, named in
settings.GIT_URL, and complete the tasks therein.
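That bootstrap amounts to three shell steps per machine. Here is a sketch of the logic; the package lists and the URL are placeholders (the real values live in fabfile.py and settings.py, which the post doesn't show), and a real fabfile would push these commands through Fabric's sudo()/run() rather than just building strings:

```python
# Sketch of the per-machine bootstrap described above. The list contents
# and GIT_URL are assumptions; the actual values are user-defined in
# fabfile.py and settings.py.

APT_PACKAGES = ["git", "python-pip", "rabbitmq-server"]  # assumed contents
PIP_PACKAGES = ["celery"]                                # assumed contents
GIT_URL = "git@example.com:pzcelery.git"                 # placeholder


def bootstrap_commands():
    """Return the shell commands a fresh machine instance would run."""
    return [
        "apt-get install -y " + " ".join(APT_PACKAGES),
        "pip install " + " ".join(PIP_PACKAGES),
        "git clone " + GIT_URL,
    ]
```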
pzcelery is my Celery project which is deployed to the machine instances.
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well. The execution units, called tasks, are executed concurrently on a single or more worker servers.
This library is what lets a programmer consume a cluster effortlessly. One machine is designated the broker and hosts a message queue; the other machines are designated workers and process tasks. Clusters are useful for problems too large for any one computer to solve quickly.
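Wiring the workers to the broker boils down to pointing every machine at the broker's queue. A minimal config sketch, assuming a RabbitMQ broker (the post doesn't name the transport, and the hostname is a placeholder):

```python
# celeryconfig.py (sketch) -- every worker points at the broker machine.
# The RabbitMQ transport and hostname are assumptions.
BROKER_URL = "amqp://guest@broker-host//"
CELERY_RESULT_BACKEND = "amqp"
```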
The project itself comprises a
tasks.py which defines the tasks and a
client.py which dispatches tasks to workers. Here, I make some pi:
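(The original tasks.py listing didn't survive extraction.) As a stand-in, here is a sketch of what a make_pi task could look like. The Leibniz series is my assumption, not the post's actual algorithm, and the Celery registration is shown in comments so the sketch runs standalone:

```python
# tasks.py (sketch) -- in the real project this function would be
# registered with Celery, e.g.:
#
#   from celery.task import task
#
#   @task
#   def make_pi(n_terms): ...
#
# The series below is an assumption; the post does not show the body.

def make_pi(n_terms):
    """Approximate pi with the first n_terms of the Leibniz series."""
    total = 0.0
    for k in range(n_terms):
        total += (-1.0) ** k / (2 * k + 1)
    return 4.0 * total
```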
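(The original client.py listing is likewise unrecoverable.) A sketch of the dispatch side: the fan-out uses Celery's standard .delay()/.get() calls, while the strategy of averaging independent estimates is my assumption, since the post doesn't show how the real client combines results:

```python
# client.py (sketch) -- dispatch make_pi to the workers and combine the
# partial results. The averaging strategy is an assumption.

def combine_estimates(estimates):
    """Average the workers' independent pi estimates."""
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    # Needs a running broker and workers, so the dispatch is guarded here.
    from tasks import make_pi  # the Celery task from tasks.py

    async_results = [make_pi.delay(1000000) for _ in range(8)]
    pi = combine_estimates([r.get() for r in async_results])
    print(pi)
```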
make_pi is a trivial demonstration task. However, since
GIT_URL is user-defined in
settings.py, I will be able to reuse
pzcluster.git to run other compute projects in the future, beyond
pzcelery.git. This Celery project was merely a proof-of-concept before bigger and better things to come.