...

Branch BnF (on fork https://github.com/bnfklm/netarchivesuite/)

https://sbforge.org/jira/browse/NAS-2589
Add an H3 extension to enable the addition of RejectRules

https://sbforge.org/jira/browse/NAS-2588
Add an H3 extension to enable queue budget modification

https://sbforge.org/jira/browse/NAS-2565
Fix orderXMLName and add operator/templateUpdateDate/templateDescription fields in harvestInfo.xml

https://sbforge.org/jira/browse/NAS-2591
Evolutions on the Crawllog page

https://sbforge.org/jira/browse/NAS-2590
Evolutions on the H3 Frontier page

...

Branch NAS-2592 (on fork https://github.com/bnfklm/netarchivesuite/)

https://sbforge.org/jira/browse/NAS-2592
Evolutions on H3 Running Job page (History/history/job/ID/)

https://sbforge.org/jira/browse/NAS-2593
Minor label changes in the "Harvest Status" menu

https://sbforge.org/jira/browse/NAS-2594
Evolutions on Running Job X Details (Harveststatus-running-jobdetails.jsp)

...

Branch NAS-2595 (on fork https://github.com/bnfklm/netarchivesuite/)

https://sbforge.org/jira/browse/NAS-2595
Minor evolutions on Running Jobs (Harveststatus-running.jsp)

...


We are still working on our move to NAS 5 and H3 (see above).

Our annual broad crawl was completed on December 5th, after 8 weeks. We gathered 90.4 TB of data (compressed), which makes it the biggest crawl the BnF has ever carried out. The infrastructure was stable and we did not encounter any technical problems. We will now analyse the data in more detail, focusing on two subjects: regional domains and e-books.

Annick Le Follic has now left the web archiving team to work in the BnF Metadata Department. Lam will also leave our team soon to work on other development projects, and we will welcome Thomas F. Fressin as a new developer.

ONB

We now have an elected president, so we will keep running our presidential elections crawl until the president takes office at the end of January.

We created a new collection, "women/gender", in cooperation with ONB's women/gender documentation department. The crawl was carried out in December.

Our next broad crawl will take place in 2017, as we still run broad crawls every two years. We will also work on a new concept for the web archive, covering harvesting intervals and related questions.


In production we had to switch to NAS 5.2.2 because of a bug in the DeduplicateToCDXAdapter (NAS-2582). Because we also use some NAS libraries in our backend, we generated a lot of corrupt CDX data over the last months. We have now fixed that, and with the new version the CDX generator works correctly again.

...