We're preparing our first broad crawl for 2023. For this purpose we're writing a Python program to automate the creation of new harvest passes based on a short YAML config file containing values for maxBytes, maxObjects, maxSeconds and ordertemplate per harvest pass. E.g.:

auto:
  P1:
    comment: this is an automatically created harvest pass
    objects: 3
    bytes: 1000
    seconds: 3600
    autostart: true
    previous: false
    template:
      name: broad_harvest_type_1
      placeholder_namespace: KB.
      placeholders:
        MAX_OBJECT_SIZE_BYTES: 400000000
        EXTRACT_JAVASCRIPT: false
  P2:
    previous: true
    objects: ...
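
As a rough illustration of how such a config could be turned into per-pass settings, here is a minimal Python sketch. It is not our actual program. The `HarvestPass` class, the `build_passes` function and the mapping of `objects`, `bytes`, `seconds` and `template.name` onto maxObjects, maxBytes, maxSeconds and ordertemplate are assumptions made for the example, as is the idea of prefixing each placeholder key with `placeholder_namespace`. The input is a plain dict of the kind that `yaml.safe_load` from PyYAML would return for the file above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class HarvestPass:
    """Settings for one harvest pass (hypothetical structure, for illustration)."""
    name: str
    max_objects: Optional[int] = None
    max_bytes: Optional[int] = None
    max_seconds: Optional[int] = None
    order_template: Optional[str] = None
    autostart: bool = False
    previous: bool = False
    comment: str = ""
    placeholders: Dict[str, Any] = field(default_factory=dict)


def build_passes(config: Dict[str, Any]) -> List[HarvestPass]:
    """Convert the parsed config mapping into a list of HarvestPass objects.

    Missing values are left as None or False; nothing is inherited or guessed.
    """
    passes = []
    for name, raw in config.get("auto", {}).items():
        template = raw.get("template", {})
        namespace = template.get("placeholder_namespace", "")
        # Assumption: full placeholder names are namespace + key, e.g. "KB.EXTRACT_JAVASCRIPT".
        placeholders = {
            namespace + key: value
            for key, value in template.get("placeholders", {}).items()
        }
        passes.append(
            HarvestPass(
                name=name,
                max_objects=raw.get("objects"),
                max_bytes=raw.get("bytes"),
                max_seconds=raw.get("seconds"),
                order_template=template.get("name"),
                autostart=bool(raw.get("autostart", False)),
                previous=bool(raw.get("previous", False)),
                comment=raw.get("comment", ""),
                placeholders=placeholders,
            )
        )
    return passes


# The example config as a parsed mapping. The remaining P2 values are
# elided in the example, so only "previous" is set here.
example_config = {
    "auto": {
        "P1": {
            "comment": "this is an automatically created harvest pass",
            "objects": 3,
            "bytes": 1000,
            "seconds": 3600,
            "autostart": True,
            "previous": False,
            "template": {
                "name": "broad_harvest_type_1",
                "placeholder_namespace": "KB.",
                "placeholders": {
                    "MAX_OBJECT_SIZE_BYTES": 400000000,
                    "EXTRACT_JAVASCRIPT": False,
                },
            },
        },
        "P2": {"previous": True},
    }
}

harvest_passes = build_passes(example_config)
```

Keeping the parsing step (YAML to dict) separate from the conversion step makes the conversion easy to test without any files on disk.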

We have ended a number of older selective harvests that were started because of earlier general elections in Sweden, among them a couple of unsuccessful attempts to harvest Twitter, Facebook and Instagram.

We have added selective harvests for local authorities and regions and will soon add government agencies. These harvests are being introduced as part of our work on the e-legal collections, where our other methods of collecting material (RSS-based or OAI-PMH partial harvesting, FTP, web uploading) have been less successful.

Next meetings

  • April 11th
  • May 9th
  • June 6th
  • July 4th
  • September 5th
  • October 3rd
  • November 7th
  • December 5th
  • January 9th 2024
