A list of crawler traps is simply a plain text file containing crawler-trap regular expressions, one per line. Lists may be active or inactive. When NetarchiveSuite creates a new job for any harvest, the crawler traps from all active lists are added to the crawl template for that job, with duplicates removed.
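The merge described above can be sketched in Python as follows. This is a minimal illustration, not NetarchiveSuite's actual implementation: `TrapList`, `parse_traps` and `collect_active_traps` are hypothetical names, the example expressions are made up, and the sketch assumes that blank lines carry no expression and that leading/trailing whitespace on a line is not significant.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrapList:
    # Hypothetical model of one global crawler-trap list.
    name: str
    active: bool
    text: str  # contents of the uploaded plain text file


def parse_traps(text: str) -> list[str]:
    """Return the expressions in a list file, one regular expression per line.

    Assumption: blank lines are skipped and surrounding whitespace is stripped.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def collect_active_traps(lists: list[TrapList]) -> list[str]:
    """Union of the traps in all active lists, with duplicates removed.

    First-seen order is kept only to make the result predictable.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for trap_list in lists:
        if not trap_list.active:
            continue  # inactive lists contribute nothing to new jobs
        for expr in parse_traps(trap_list.text):
            if expr not in seen:
                seen.add(expr)
                merged.append(expr)
    return merged


# Hypothetical lists: the second repeats an expression, the third is inactive.
lists = [
    TrapList("calendars", True, ".*/calendar/.*\n.*sessionid=.*\n"),
    TrapList("session ids", True, ".*sessionid=.*\n"),
    TrapList("old traps", False, ".*/archive/.*\n"),
]
print(collect_active_traps(lists))  # ['.*/calendar/.*', '.*sessionid=.*']
```

Only the two distinct expressions from the active lists end up in the job's template; the repeated expression is added once and the inactive list is ignored.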

[Image: GlobalCrawlerTraps_1.png]

To upload a list of global crawler traps, first click the Edit link, then fill in a name and a description for the list and the path to the file containing the crawler-trap expressions. You can also choose whether the list should initially be active or inactive. Click Create to upload the list.

[Image: GlobalCrawlerTraps_2.png]

A list can be made active or inactive by clicking the Activate or Deactivate button. Lists can also be viewed (with the Retrieve button), deleted, or edited. Note that the retrieved version of a list may differ from the originally uploaded file: any duplicates are removed during upload, and the order of the lines is not preserved. The Edit action allows a new version of the list to be uploaded.
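Because duplicates are dropped and line order is not preserved, a retrieved list is best compared with the original file as a set of expressions rather than line by line. The following is a minimal sketch; `trap_set` and `same_traps` are hypothetical helpers, not part of NetarchiveSuite, and the sketch assumes blank lines and surrounding whitespace are not significant.

```python
from __future__ import annotations


def trap_set(text: str) -> set[str]:
    # Assumption: blank lines and leading/trailing whitespace are not significant.
    return {line.strip() for line in text.splitlines() if line.strip()}


def same_traps(original: str, retrieved: str) -> bool:
    """True if both files hold the same expressions, ignoring order and duplicates."""
    return trap_set(original) == trap_set(retrieved)


# Hypothetical original file (with a duplicate) and a retrieved version of it.
original = ".*/calendar/.*\n.*sessionid=.*\n.*/calendar/.*\n"
retrieved = ".*sessionid=.*\n.*/calendar/.*\n"
print(original.splitlines() == retrieved.splitlines())  # False: line-by-line differs
print(same_traps(original, retrieved))  # True: same set of expressions
```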

[Image: GlobalCrawlerTraps_3.png]

A side effect of using global crawler-trap lists is that the database grows more rapidly, because the modified crawl template, including all active crawler traps, is stored for every job.
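As a rough back-of-the-envelope illustration: since every job stores its own copy of the template, the extra storage scales with the number of jobs times the size of the active trap set. This sketch is only an estimate under stated assumptions; `extra_template_bytes` is a hypothetical helper, and the per-expression `overhead_per_trap` (for example, markup wrapping each expression in the stored template) is a made-up parameter, since actual sizes depend on how templates are serialised and stored.

```python
from __future__ import annotations


def extra_template_bytes(traps: list[str], jobs: int, overhead_per_trap: int = 0) -> int:
    """Estimate the additional bytes stored across `jobs` jobs.

    Each job stores its own copy of the crawl template, so the trap
    expressions (plus a hypothetical per-expression overhead) are
    repeated once per job.
    """
    per_job = sum(len(t.encode("utf-8")) + overhead_per_trap for t in traps)
    return per_job * jobs


traps = [".*/calendar/.*", ".*sessionid=.*"]  # 14 bytes each
print(extra_template_bytes(traps, jobs=1000))  # 28000
```

Even modest trap lists add up over many jobs, which is why pruning inactive or obsolete expressions keeps the growth in check.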
