...
Panel |
---|
Our first broad crawl with NAS5 and H3 is finished! We crawled 101.55 TB in 6 weeks. We encountered 4 problems during this crawl:
We received only 5 complaints from web publishers, compared with around 15 in 2016. Over the coming weeks, we will analyse the crawl reports and the quality of the archives to produce a report on the crawl. In parallel, we had scheduling issues: our daily news crawls stopped three times. Two jobs were submitted with the same ID, which changed the status of the selective harvest from active to inactive. |
ONB
Panel |
---|
BNE
Panel |
---|
Dear colleagues, last month we successfully migrated all our web collections to the production environment of NAS 5. We are reasonably happy with the new environment. However, despite the tests we ran on the preproduction environment, we experienced some problems, mainly related to the configuration of templates in NAS 5. Frontpage+1 and frontpage+2 didn’t work as expected. We also noticed that some of the crawls ran very fast but stopped as soon as they encountered any slight problem and didn’t manage to finish. Juan Carlos compared the NAS 5 templates with the ones in NAS 4 and adjusted some parameters. Apparently everything is now working properly: crawls finish faster than before and harvest more objects. The default template is not working yet, though, and my IT colleagues are studying its configuration. We are waiting for the system to become more stable before running the .gal domain crawl. We hope we can launch it before the end of the year. |
Next meetings
- January 9th, 2018
...