Harvest Templates

Template for daily harvest

A tip for harvesting newspapers: the harvest can be stopped automatically after a day (in practice after 23 hours, i.e. 82800 seconds, so that deduplication has time to finish). This is done with the RuntimeLimitEnforcer, a processor that halts further progress once a fixed amount of time has elapsed since the start of a crawl.

Here is the relevant extract of order.xml:

Code Block
<newObject name="RuntimeLimitEnforcer" class="org.archive.crawler.prefetch.RuntimeLimitEnforcer">
  <boolean name="enabled">true</boolean>
  <newObject name="RuntimeLimitEnforcer#decide-rules" class="org.archive.crawler.deciderules.DecideRuleSequence">
    <map name="rules">
    </map>
  </newObject>
  <long name="runtime-sec">82800</long>
  <string name="end-operation">Terminate job</string>
</newObject>
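
The runtime-sec value is the time limit expressed in seconds, so 23 hours gives 23 × 3600 = 82800. As a minimal sketch for working out the value for another limit, the helper below converts hours to seconds and prints the matching element; the function names are hypothetical and not part of Heritrix.

```python
def runtime_sec(hours: float) -> int:
    """Convert a runtime limit in hours to whole seconds."""
    return round(hours * 3600)


def runtime_element(hours: float) -> str:
    """Render the runtime-sec setting as it appears in the extract above."""
    return f'<long name="runtime-sec">{runtime_sec(hours)}</long>'


if __name__ == "__main__":
    # 23 hours leaves one hour of the day for deduplication to finish.
    print(runtime_element(23))
```

Pasting the printed line in place of the existing runtime-sec element changes the limit without touching the rest of the template.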