This project comes from Innovation Week March 2018. Code can be found at https://github.com/csrster/docker-csr
...
With Docker these can be fired up extremely quickly. The broker can be started directly from the command line:
Code Block |
---|
sudo docker run -p 7676 seges/openmq |
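Note that "-p 7676" with no host-side port publishes the broker's port 7676 on a random free port on the host. To see where it ended up, something like the following should work (the container name here is just a hypothetical example):
Code Block |
---|
# List running containers and their host-port mappings
sudo docker ps
# Or query a specific container (the name is hypothetical)
sudo docker port my-openmq-container 7676 |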
The ftp server isn't much more complicated, except that we need to define a username and password. The database is a bit more work because we need to include the scripts that initialise the database schema and ingest some test data. To do this we extend the base Docker image for postgresql with some scripts. The entire Dockerfile looks like this:
Code Block |
---|
FROM postgres:9.3
COPY harvestdb/0* docker-entrypoint-initdb.d/
COPY harvestdb/data/* ./
RUN mkdir /tsindex
RUN chown postgres:postgres /tsindex |
The first line just imports the basic postgres image. The next two lines copy some scripts and sql-files into the Docker image, while the last two lines create a directory for a tablespace used by the NetarchiveSuite harvest database. The input directory structure looks like this:
Code Block |
---|
.
├── Dockerfile
└── harvestdb
    ├── 00harvestdb_setup.sql
    ├── 01harvestdb_setup.sh
    └── data
        ├── 01netarchivesuite_init.sql
        ├── 02harvestdb.testdata.sql
        └── 03createArchiveDB.pgsql |
The base postgres image ensures that the .sql and .sh files copied into the directory docker-entrypoint-initdb.d are executed in alphanumerically sorted order.
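To try the database image on its own, it can be built and started in the usual way. A minimal sketch, assuming the Dockerfile and its scripts live in a directory called nasdb (the same directory name the docker-compose file below refers to):
Code Block |
---|
# Build the extended postgres image from the nasdb directory
sudo docker build -t nasdb nasdb/
# Run it, publishing postgres' standard port 5432 on the host
sudo docker run -p 5432:5432 nasdb |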
NetarchiveSuite Applications
All NetarchiveSuite applications are basically similar. To start one, you need to know the name of the Application class and provide an xml settings file, a logging configuration file, and a start script that provides the right classpath (a jmxremote.password file is also needed, and the harvesters need a certificate file for the Heritrix 3 https GUI). I created a generic NetarchiveSuite Dockerfile which uses jinja2 templating to convert generic configuration files to application-specific files. For example, the generic start script looks like this:
Code Block |
---|
#!/usr/bin/env bash
echo Starting linux application: {{APP_LABEL}}
export CLASSPATH={{CLASSPATH}}:$CLASSPATH;
java -Xmx1024m -Ddk.netarkivet.settings.file=/nas/settings.xml -Dlogback.configurationFile=/nas/logback.xml {{APP_CLASS}} |
The templating engine just substitutes values for the three named placeholders.
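Rendering such a template with j2cli might look like the following sketch; the template filename and the use of environment variables as the data source are my assumptions here, not necessarily how the entrypoint script actually does it:
Code Block |
---|
# Values for the three placeholders
export APP_LABEL=GUIApplication
export APP_CLASS=dk.netarkivet.common.webinterface.GUIApplication
export CLASSPATH=/nas/lib/netarchivesuite-monitor-core.jar
# Render the jinja2 template, using the environment as the data source
env | j2 --format=env start_app.sh.j2 > start_app.sh
chmod +x start_app.sh |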
The Dockerfile looks like this:
Code Block |
---|
FROM mlaccetti/docker-oracle-java8-ubuntu-16.04
ADD https://sbforge.org/nexus/service/local/repositories/releases/content/org/netarchivesuite/distribution/5.2.2/distribution-5.2.2.zip nas.zip
ADD https://sbforge.org/nexus/service/local/repositories/releases/content/org/netarchivesuite/heritrix3-bundler/5.2.2/heritrix3-bundler-5.2.2.zip h3bundler.zip
RUN apt-get update && apt-get install -y ca-certificates unzip postgresql-client python-setuptools && easy_install j2cli
RUN unzip nas.zip -d nas
RUN unzip h3bundler.zip
RUN mv heritrix-3* bundler
RUN mv bundler/lib/* /nas/lib
WORKDIR /nas
COPY *.j2 /nas/
COPY wait-for-postgres.sh /nas/wait-for-postgres.sh
COPY jmxremote.password /nas/jmxremote.password
COPY docker-entrypoint.sh /
COPY h3server.jks /
RUN chmod 755 /nas/*.j2
RUN chmod 755 /nas/wait-for-postgres.sh
RUN chmod 755 /docker-entrypoint.sh
EXPOSE 8078
CMD ["/docker-entrypoint.sh"] |
As its base it uses an Ubuntu image with Java 8 preinstalled. I copy NetarchiveSuite 5.2.2 into the image and install the jinja2 command-line tool (j2cli). Then comes a little bit of unpacking and renaming of some NetarchiveSuite files. I expose port 8078, although this is really only necessary for the GUI and ViewerProxy applications. Finally I define the command to be run by the container when it is started - docker-entrypoint.sh. What does this script actually do?
Info |
---|
#!/bin/bash -e |
It applies the templates and starts the NetarchiveSuite application.
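The full script isn't reproduced here, but a minimal sketch of the idea, assuming the template names and j2 invocation from the sketch above (both of which are my assumptions), might look like:
Code Block |
---|
#!/bin/bash -e
# Render every generic .j2 template into a concrete file, using the
# APP_LABEL, APP_CLASS and CLASSPATH environment variables supplied
# by docker / docker-compose.
cd /nas
for template in *.j2; do
    env | j2 --format=env "$template" > "${template%.j2}"
done
chmod +x start_app.sh
# Start the NetarchiveSuite application
exec ./start_app.sh |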
Note that on its own this will always fail, because every NetarchiveSuite application requires as an absolute minimum that the JMS broker is also running. So how do we coordinate all that?
Putting it all together with Docker-Compose
Docker-compose is a magical application that takes a single file (in YAML format) specifying all the different Docker containers your application needs, the dependencies amongst them, and which exposed ports they use to talk to each other. For example, for NetarchiveSuite we could start with:
Code Block |
---|
version: "3"
services:
  database:
    build: nasdb
    ports:
      - 5432
  mq:
    image: seges/openmq
    ports:
      - 7676
  ftp:
    image: andrewvos/docker-proftpd
    ports:
      - 20
      - 21
      - "21100-21110:21100-21110"
    environment:
      - USERNAME=jms
  nasgui:
    build: nasapp
    ports:
      - "8078:8078"
    links:
      - database
    depends_on:
      - database
      - mq
    environment:
      - APP_LABEL=GUIApplication
      - APP_CLASS=dk.netarkivet.common.webinterface.GUIApplication
      - CLASSPATH=/nas/lib/netarchivesuite-monitor-core.jar:/nas/lib/netarchivesuite-harvest-scheduler.jar:/nas/lib/netarchivesuite-harvester-core.jar:/nas/lib/netarchivesuite-archive-core.jar
    command: ["/nas/wait-for-postgres.sh", "database", "--", "/docker-entrypoint.sh"] |
This defines three supporting services (the database, the JMS broker and the ftp server) and then a single NetarchiveSuite application.
Note the "ports" variable on the database service, for example. This means that the nasgui application will be able to access the database on the url jdbc://database:5432 . Docker and docker-compose ensure that the name "database" is mapped to the actual container and that the port 5432 which is otherwise only visible internally the database container is made available to the nasgui container. All this magic happens behind the scenes. As far as the humble programmer is concerned, the nasgui is just connecting to a named machine on a the usual port 5432.
Note however that the GUI service specifies the port mapping "8078:8078". This means that the internally exposed port 8078 is mapped to the real port 8078 on the host machine, which is where the GUI can actually be seen.
Finally there is the script "wait-for-postgres.sh". This is just a little script that polls the postgres database and waits until postgres is started before calling docker-entrypoint.sh.
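The script itself is not reproduced here, but it follows the standard pattern described in the Docker documentation; a sketch, where the choice of database user and the exact psql check are assumptions:
Code Block |
---|
#!/bin/bash -e
# wait-for-postgres.sh: block until postgres on host $1 accepts
# connections, then exec the remaining arguments as the real command.
host="$1"
shift
# Drop the "--" separator used in the docker-compose command line
[ "$1" = "--" ] && shift
until psql -h "$host" -U postgres -c '\q' >/dev/null 2>&1; do
  echo "Postgres is unavailable - sleeping"
  sleep 1
done
echo "Postgres is up - executing command"
exec "$@" |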
The full docker-compose.yml file contains all the necessary applications to create a fully-functional containerised NetarchiveSuite instance.
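With that file in place, building the images and starting the whole system is a single command, after which the GUI can be reached on the host:
Code Block |
---|
# Build the images and start all the containers in the foreground
sudo docker-compose up --build
# The NetarchiveSuite GUI is then available at http://localhost:8078 |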
Note that the machine names shown are the container IDs supplied by Docker, which is what each JVM sees as its hostname. Not especially useful!
What Else?
One advantage of using something standardised like Docker/docker-compose is that you can leverage standardised tools. For example the Rancher tool provides a dashboard for all your Docker containers, and has support for deploying docker-compose builds to multiple hosts (supposedly - I didn't test it). Some screenshots: