...

The RabbitMQ management web UI should now be reachable at http://localhost:15672 (user: guest, password: guest).

Then create a directory for the test job:

```shell
# Run from the heritrix-umbra directory
cd heritrix-3.3.0-BDB-5.0.x-NAS-1.0-SNAPSHOT/jobs/
mkdir myTestJob
cd myTestJob
```


Get a Heritrix 3 crawl job configuration file (such as the attached cxml.cxml), put it in heritrix-umbra/heritrix-3.3.0-BDB-5.0.x-NAS-1.0-SNAPSHOT/jobs/myTestJob/ and rename it to crawler-beans.cxml.
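The copy-and-rename step can also be done in one go. This is a minimal sketch, assuming the downloaded configuration was saved as cxml.cxml; install_job_config is a hypothetical helper, not something shipped with Heritrix.

```shell
# Hypothetical helper (not part of Heritrix): copy a crawl job
# configuration into a job directory under the name Heritrix 3 expects.
install_job_config() {
    src="$1"      # downloaded configuration, e.g. cxml.cxml (assumed location)
    job_dir="$2"  # target job directory, e.g. .../jobs/myTestJob
    mkdir -p "$job_dir" || return 1
    # Heritrix 3 reads a job's configuration from crawler-beans.cxml
    cp "$src" "$job_dir/crawler-beans.cxml"
}
```

For example, from the directory containing heritrix-umbra: install_job_config cxml.cxml heritrix-umbra/heritrix-3.3.0-BDB-5.0.x-NAS-1.0-SNAPSHOT/jobs/myTestJob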

At the top of the file, just above the first <bean> tag (not <beans>), insert:

...

  <ref bean="umbraBean"/>
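If you prefer to script this edit, the sketch below inserts a snippet just above the first line that contains a <bean> start tag, skipping <beans> and </bean>. It assumes you have saved the lines to insert (the snippet shown above) in a separate file, and that the first <bean> tag starts a line outside any XML comment; insert_before_first_bean is a hypothetical helper. It is line-based, not an XML parser, so check the result by eye.

```shell
# Hypothetical helper: insert the contents of SNIPPET_FILE just above the
# first line of CONFIG that contains a <bean> start tag. The pattern needs
# a space, tab or '>' right after "<bean", so "<beans" and "</bean>" do not
# match. It works line by line and does not understand XML comments.
insert_before_first_bean() {
    config="$1"
    snippet_file="$2"
    out="$config.tmp"
    awk -v snip="$snippet_file" '
        !done && /<bean[ \t>]/ {
            # print the snippet once, before the first matching line
            while ((getline line < snip) > 0) print line
            close(snip)
            done = 1
        }
        { print }
    ' "$config" > "$out" && mv "$out" "$config"
}
```

Usage: insert_before_first_bean crawler-beans.cxml umbra-snippet.xml (umbra-snippet.xml being whatever file you saved the snippet in).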

Then, still in the myTestJob directory, create a text file called seeds.txt containing a single line: http://netarkivet.dk
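From a shell inside the myTestJob directory, the seeds file can be written with a single command. printf is used instead of echo so the output (one URL followed by one newline) is the same on every shell.

```shell
# Write the single seed URL to seeds.txt in the current directory
# (run this from inside the myTestJob directory).
printf '%s\n' 'http://netarkivet.dk' > seeds.txt
```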




NOTE: Heritrix can be killed by doing a

...