2017-01-03 Status meeting
- 1 Participants
- 2 Upcoming NAS 5.3 Release
- 3 Status of the production sites
- 3.1 Netarkivet
- 3.2 BnF
- 3.3 ONB
- 3.4 BNE
- 3.5 KB/Sweden
- 3.6 Next meetings
- 3.7 Any other business?
Agenda for the joint BnF, ONB, SB, KB and BNE NetarchiveSuite teleconference 2017-01-03, 13:00-14:00.
Practical information
Login as guest
Write your name
Enter password: wayback
Participants
BNF: Lam, Géraldine, Sara
ONB: Michaela, Andreas
KB/DK: Stephen, Tue
SB: Sabine, Colin
BNE: -
KB/Sweden: -
Upcoming NAS 5.3 Release
Status of developments.
BnF is currently working on the following features:
NAS-2592 Evolutions on H3 Running Job page (History/history/job/ID/)
NAS-2588 Add an H3 extension to enable queue budget modification
NAS-2589 Add an H3 extension to enable the addition of RejectRules
NAS-2590 Evolutions on the H3 Frontier page
NAS-2591 Evolutions on the H3 Crawllog page
NAS-2595 Minor evolutions on Running Jobs (Harveststatus-running.jsp)
NAS-2594 Evolutions on Running Job X Details (Harveststatus-running-jobdetails.jsp)
NAS-2593 Minor label changes in the "Harvest Status" menu
NAS-2563 Give users the ability to search the job list by jobID (Harveststatus-alljobs.jsp)
NAS-2564 Show all jobs in the "All Jobs" page
NAS-2565 Fix orderXMLName and add operator/templateUpdateDate/templateDescription fields in harvestInfo.xml
NAS-2587 Software stated in the warcinfo records of the metadata files cannot be easily parsed
On the fork https://github.com/bnfklm/netarchivesuite, multiple branches were created to make the code easier to proofread at pull-request time.
master --- \BnF ---------- \NAS-2592 ---------- \NAS-2595 ----------
Branch BnF (on fork https://github.com/bnfklm/netarchivesuite/)
https://kb-dk.atlassian.net/browse/NAS-2589
Add an H3 extension to enable the addition of RejectRules
https://kb-dk.atlassian.net/browse/NAS-2588
Add an H3 extension to enable queue budget modification
https://kb-dk.atlassian.net/browse/NAS-2565
Fix orderXMLName and add operator/templateUpdateDate/templateDescription fields in harvestInfo.xml
https://kb-dk.atlassian.net/browse/NAS-2591
Evolutions on the H3 Crawllog page
https://kb-dk.atlassian.net/browse/NAS-2590
Evolutions on the H3 Frontier page
-----------------------------------------------------------
Branch NAS-2592 (on fork https://github.com/bnfklm/netarchivesuite/)
https://kb-dk.atlassian.net/browse/NAS-2592
Evolutions on H3 Running Job page (History/history/job/ID/)
https://kb-dk.atlassian.net/browse/NAS-2593
Minor label changes in the "Harvest Status" menu
https://kb-dk.atlassian.net/browse/NAS-2594
Evolutions on Running Job X Details (Harveststatus-running-jobdetails.jsp)
-----------------------------------------------------------
Branch NAS-2595 (on fork https://github.com/bnfklm/netarchivesuite/)
https://sbforge.org/jira/browse/NAS-2595
Minor evolutions on Running Jobs (Harveststatus-running.jsp)
Status of the production sites
Netarkivet
We are still keeping NAS 5.2.2 in our test environment only, because of a bug that had prevented NAS 5.2 from creating Heritrix jobs: jobs did not start, selective crawls were deactivated by the system, or jobs simply hung without ever contacting H3. The bug has been solved, but in the meantime we had started the fourth broad crawl of 2016 (it ran from Nov. 28 to Dec. 27, 2016), so we have postponed deploying NAS 5.2.2 in our production system until the beginning of 2017.
We are updating our access procedure and our Citrix solution. Because of the rather restrictive Danish data protection law, our user group administration is complicated, and we are having problems with one of the groups.
We are still working on compressing the archive. There is a bug in JWA S, the compression software; once it is solved, we will reschedule the compression project.
BnF
We are still working on our move to NAS 5 and H3 (see above).
Our annual broad crawl was completed on December 5th, after 8 weeks. We gathered 90.4 TB of data (compressed), which makes this crawl the biggest ever carried out at the BnF. The infrastructure was stable and we did not encounter any technical problems. We will analyse the data in more detail on two subjects: regional domains and ebooks.
Annick Le Follic has now left the web archiving team to work in the BnF Metadata department. Lam will also soon leave our team to work on other development projects, and we will welcome Thomas F. as a new developer.
ONB
In the meantime a president has been elected, so we will keep running our presidential election crawl until the new president takes office at the end of January.
We created a new collection “women/gender” in cooperation with ONB’s women/gender-documentation department. The crawl was done in December.
Our next broad crawl will take place in 2017; we still run broad crawls every two years. We will work on a new concept for the web archive regarding harvesting intervals, etc.
In production we had to switch to NAS 5.2.2 due to a bug in the DeduplicateToCDXAdapter (NAS-2582: DeduplicateToCDXAdapter fails to identify new dedup format). Because we also use some NAS libraries in the backend, we generated a lot of corrupt CDX data over the last months. We have now fixed that, and with the new version the CDX generator works fine again.
BNE
KB/Sweden
Next meetings
Any other business?