Agenda for the joint BNF, ONB, SB and KB NetarchiveSuite tele-conference on 14 August 2012, 13:00-14:00.
Practical information
- TDC tele-conference:
- Dial in number (+45) 70 26 50 45
- Dial in code 9064479#
- BridgeIT: The BridgeIT conference will be available about 5 minutes before the start of the meeting. The BridgeIT URL is konf01.statsbiblioteket.dk. The BridgeIT password is sbview.
Participants
- BNF: Nicholas and Sara
- ONB: Michaela and Andreas
- KB: Tue, Søren and Nicholas
- SB: Colin, Mikis and Sabine
- Any other issues to be discussed on today's tele-conference?
Heritrix 3 in NetarchiveSuite
- The week of 17 September.
- Issue for planning: NAS-2066 Heritrix roadmap workshop.
JhoNAS status (Nicholas)
A status update from the beginning of August was sent to the PWG and is available here: jhonas-project-status-aug.pdf
All JHOVE2 modules seem to work. Thomas Ledoux is working on containerMD.xsl.
Thomas Ledoux has been testing the different modules, and a number of issues have been fixed in JWAT/JHOVE2.
Current issues: WARC-Target-URI validation is too strict, unit testing of the modules, and jhove2 does not remove temporary files when run with the -t option.
And, as usual, finishing the JWAT library.
- : Done, needs unit testing.
- : Done, needs unit testing. In addition to a WARCBatchJob, an ArchiveBatchJob has been implemented for batch jobs that run on both ARC and WARC files.
- : Tested in local installation.
- : Done, needs unit testing.
- : Done, needs unit testing. Problems with WARC records where content-length=0.
- : Done, needs unit testing. Problems with WARC records where content-length=0.
- : N/A
- : N/A
- : Currently it is a mirror of the ARC file.
- : N/A
- : N/A
- : N/A
Move the source code to GitHub?
I think we should consider moving the code to GitHub because:
- Git is much more flexible than Subversion; see "3 Reasons to Switch to Git from Subversion", "GitSvnComparison", "svn - git vs Subversion - pros and cons" and "Why You Should Switch from Subversion to Git".
- It would move the code to a standard open source hosting site, which would increase accessibility.
- GitHub is great!
Iteration 52 (3.21 development release) (Mikis)
Status of the production sites
- Netarkivet:
As our broad crawls have been sped up to last less than 2 months, we took advantage of the break between two broad crawls:
- To crawl “very big web sites” in depth (such as the Danish national broadcaster, dr.dk, and the other main Danish TV station, tv2.dk).
- To crawl websites of ministries, departments etc. in depth.
- To capture URLs of YouTube videos about and by political parties.
We started our own event crawl on the London Olympics: entering URLs into the system, QA and monitoring.
As for our selective crawls, it is “business as usual”, that is: analysis of “candidates” (new sites proposed for selective crawls), QA of selective crawls, monitoring of harvest jobs and revision of harvest profiles.
- BNF:
- ONB:
Date for NAS workshop at SB
Mid-October?
Date for the next joint tele-conference
September 11th?