Test11B Bitpreservation
Check filelist is correct and database is loaded with missing files
- Click on Bitpreservation
- Click on "Update" under Filelist status
- Open a new tab in your browser and go to System status: http://$GUIadminserver:$http-port/Status/Monitor-JMXsummary.jsp
- Check that you get INFO messages like this: "INFO: The file 'TEST2_999.arc' was not found in the database. Thus creating entry for the file."
(Be aware that the name of the last missing file inserted in the database remains listed until the end.)
- Wait until the filelist update has completed without any errors. For each bitarchive server, the last log message should read something like 'INFO: Received batch ended from bitarchive '172.17.0.176_BitApp_G':
BatchEndedMessage for batch job ID:51371-130.226.228.6(ca:65:66:63:d6:69)-55327-1270553597402 From Bitarchive 172.17.0.176_BitApp_G FilesProcessed = 27569 '
Here is a list of the number of files per archive:
- From Bitarchive 172.17.0.176_BitApp_G FilesProcessed = 27569
- From Bitarchive 172.17.0.176_BitApp_H FilesProcessed = 27566
- From Bitarchive 172.17.0.176_BitApp_E FilesProcessed = 27620
- From Bitarchive 172.17.0.176_BitApp_I FilesProcessed = 27559
- From Bitarchive 172.17.0.176_BitApp_F FilesProcessed = 27555
- From Bitarchive 172.17.0.176_BitApp_J FilesProcessed = 18906
Total: 156775 files
Note 14/09/2010: only 5 bitapps, each with about 25,800 files, with only 25,800 unique files in total.
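The total above can be cross-checked by summing the FilesProcessed values from the "batch ended" log messages. The following Python sketch assumes the relevant log lines have been copied from the status page into a list of strings; the function name sum_files_processed is hypothetical and not part of NetarchiveSuite.

```python
import re

# Matches the "From Bitarchive <name> FilesProcessed = <n>" part of a log line.
FILES_PROCESSED = re.compile(r"From Bitarchive (\S+) FilesProcessed = (\d+)")


def sum_files_processed(lines):
    """Return ({bitarchive name: file count}, total) for the given log lines."""
    counts = {}
    for line in lines:
        match = FILES_PROCESSED.search(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts, sum(counts.values())


if __name__ == "__main__":
    # Lines copied by hand from the status page (assumption: one message per line).
    log_lines = [
        "From Bitarchive 172.17.0.176_BitApp_G FilesProcessed = 27569",
        "From Bitarchive 172.17.0.176_BitApp_H FilesProcessed = 27566",
    ]
    per_archive, total = sum_files_processed(log_lines)
    print(per_archive, total)
```

If a bitarchive appears more than once in the input, the last value wins, so re-running the update does not double-count.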
Check checksum is correct
- Click on Bitpreservation
- Click on "Update" under Checksum status
- Open a new tab in your browser and go to System status: http://$GUIadminserver:$http-port/Status/Monitor-JMXsummary.jsp
- Click on Instance-ID.
- Click on one of the first bitarchive Instance-IDs.
- Click "Show all" in the Index column.
- Verify that you get a log message like "INFO: The batchjob 'class dk.netarkivet.archive.arcrepository.bitpreservation.ChecksumJob' has run for 1938 seconds and has reached file '11297-MB100.arc' which is number 1615 out of 27620" every 30 seconds. (Be aware that the checksum log messages can be delayed by very big files, larger than 1 GB.)
- Wait until the checksum update has completed without any errors. The last log message should read something like 'INFO: Finished batch job dk.netarkivet.archive.arcrepository.bitpreservation.ChecksumJob with result: 0 failures in processing 27620 files at 172.17.0.176_BitApp_E'
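While waiting, the progress message can be used to get a rough idea of how long the checksum job still has to run. The Python sketch below parses a message of the form quoted above and extrapolates linearly; the helper names are hypothetical, and the estimate assumes files take a similar time on average, which very big files will distort.

```python
import re

# Pattern for the ChecksumJob progress message quoted in the step above.
PROGRESS = re.compile(
    r"has run for (\d+) seconds and has reached file '([^']+)' "
    r"which is number (\d+) out of (\d+)"
)


def parse_progress(message):
    """Return (seconds, filename, number, total), or None if the message does not match."""
    match = PROGRESS.search(message)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), int(match.group(3)), int(match.group(4))


def estimate_remaining_seconds(seconds, number, total):
    """Linear extrapolation of the remaining run time; None if no file is done yet."""
    if number <= 0:
        return None
    return seconds * (total - number) / number
```

For example, feeding the values returned by parse_progress into estimate_remaining_seconds gives an approximate number of seconds left for that bitarchive instance.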
Stress test batch jobs
Setup test
export TESTX=TEST11B
cd /home/test/$TESTX/
mkdir batchprogs
scp test@kb-prod-udv-001.kb.dk:/home/test/test-batch/* batchprogs/.
ChecksumJob
Calculates the MD5 checksum of the archive files (takes around 8 hours).
Run the following command:
java -cp lib/dk.netarkivet.archive.jar -Ddk.netarkivet.settings.file=conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=BATCH dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/ChecksumJob.class -Ooutput.checksum
This should write the following messages to the console:
Running batch job 'batchprogs/ChecksumJob.class' on files matching '.*' on replica 'KBN', output written to file 'output.checksum', errors written to stderr
Processed 11 files with 0 failures
Cleaning up dk.netarkivet.common.distribute.JMSConnectionSunMQ
Cleaned up dk.netarkivet.common.distribute.JMSConnectionSunMQ
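If the console output is captured to a file or a variable, the summary line can be checked automatically. This is a minimal Python sketch that only assumes the "Processed N files with M failures" wording shown above; the function name parse_batch_summary is hypothetical.

```python
import re

# Matches the summary line printed at the end of the batch run.
SUMMARY = re.compile(r"Processed (\d+) files with (\d+) failures")


def parse_batch_summary(console_output):
    """Return (files_processed, failures), or None if no summary line is present."""
    match = SUMMARY.search(console_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A result whose second element is not 0, or a result of None, means the run needs a closer look.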
The output is put into the file 'output.checksum'. This file should contain the following text:
1-1-20090316092641-00003-kb-test-har-002.kb.dk.arc##c68b3e18f7b870b76d86de7970a822c2
2-2-20090316092643-00003-kb-test-har-001.kb.dk.arc##7d723dd4d374437c5e29e995521bf014
.......
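The format of 'output.checksum' can be validated with a few lines of Python. The sketch below assumes each line has the form filename##checksum with the MD5 value written as 32 lowercase hexadecimal characters, as in the sample above; the function name parse_checksum_lines is hypothetical.

```python
import re

# An MD5 digest written as 32 lowercase hex characters (assumption based on the sample).
MD5_HEX = re.compile(r"^[0-9a-f]{32}$")


def parse_checksum_lines(lines):
    """Return ({filename: md5}, [malformed lines]) for 'filename##md5' lines."""
    checksums = {}
    malformed = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last '##' so a filename containing '##' would still parse.
        filename, separator, digest = line.rpartition("##")
        if not separator or not filename or not MD5_HEX.match(digest):
            malformed.append(line)
            continue
        checksums[filename] = digest
    return checksums, malformed


if __name__ == "__main__":
    with open("output.checksum", encoding="utf-8") as handle:
        parsed, bad = parse_checksum_lines(handle)
    print(len(parsed), "valid entries;", len(bad), "malformed lines")
```

An empty list of malformed lines means every entry in the file has the expected shape.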
GoodPostProcessingJob
Run the following job:
java -cp lib/dk.netarkivet.archive.jar -Ddk.netarkivet.settings.file=conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=BATCH dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/GoodPostProcessingJob.class -Ogood.out
The output is put into the file 'good.out'. This file should contain the following text (sorted):
0G5.arc
0G5.arc
...
Go to the status page and check the log for the BitarchiveMonitor. There should be the following messages:
Jun 11, 2010 9:24:31 AM dk.netarkivet.archive.bitarchive.distribute.BitarchiveMonitorServer doBatchReply
INFO: BatchReplyMessage: 'BatchReplyMessage for batch job ID:10-130.226.228.6(d3:b5:49:8b:d6:94)-37536-1276241061823 FilesProcessed = 156775 FilesFailed = 0 ID:1906780-130.226.228.6(d4:77:b5:34:b8:ae)-52334-1276241071022: To TEST11B_COMMON_THIS_REPOS_CLIENT_130_226_228_6_GUIA_BATCH ReplyTo TEST11B_KB_THE_BAMON OK' sent from BA monitor to queue: '[Queue 'TEST11B_COMMON_THIS_REPOS_CLIENT_130_226_228_6_GUIA_BATCH']'
Jun 11, 2010 9:24:27 AM dk.netarkivet.common.utils.batch.GoodPostProcessingJob postProcess
INFO: Sorting the filenames
Jun 11, 2010 9:24:27 AM dk.netarkivet.common.utils.batch.GoodPostProcessingJob postProcess
INFO: Reading all the filenames.
Jun 11, 2010 9:24:27 AM dk.netarkivet.archive.bitarchive.distribute.BitarchiveMonitorServer doBatchReply
INFO: Post processing batchjob results for 'dk.netarkivet.common.utils.batch.LoadableFileBatchJob' with id 'ID:10-130.226.228.6(d3:b5:49:8b:d6:94)-37536-1276241061823'
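Whether 'good.out' really is sorted can be checked mechanically rather than by eye. This Python sketch assumes one filename per line and plain character-by-character string ordering; the function name first_unsorted_index is hypothetical.

```python
def first_unsorted_index(names):
    """Return the index of the first entry that sorts before its predecessor, or None."""
    for index in range(1, len(names)):
        if names[index] < names[index - 1]:
            return index
    return None


if __name__ == "__main__":
    with open("good.out", encoding="utf-8") as handle:
        filenames = [line.rstrip("\n") for line in handle if line.strip()]
    position = first_unsorted_index(filenames)
    print("sorted" if position is None else f"out of order at line {position + 1}")
```

Equal neighbours such as two consecutive "0G5.arc" entries count as sorted.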
EvilPostProcessingJob
Run the following job:
java -cp lib/dk.netarkivet.archive.jar -Ddk.netarkivet.settings.file=conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=BATCH dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/EvilPostProcessingJob.class -Oevil.out
The output is put into the file 'evil.out'. This file should contain the following text (unsorted):
0G5.arc
1G5.arc
...
Running method from a jar file
Run the following job (takes around 25 hours):
java -cp lib/dk.netarkivet.archive.jar -Ddk.netarkivet.settings.file=conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=BATCH dk.netarkivet.archive.tools.RunBatch -Jbatchprogs/mime.jar -Nbatchprogs.MimeSize -Omimesize.out
The output is put into the file 'mimesize.out'. This file should contain the following text:
..
text/html##567890
image/jpeg##1234567
...
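To inspect 'mimesize.out' after the run, the lines can be read into a dictionary. The Python sketch below assumes each line has the form mimetype##size with the size as a non-negative integer; it sums the sizes in case a MIME type appears on more than one line, which the sample above does not confirm. The function name total_size_per_mime is hypothetical.

```python
from collections import defaultdict


def total_size_per_mime(lines):
    """Return {mime type: summed size} for 'mimetype##size' lines; other lines are skipped."""
    totals = defaultdict(int)
    for raw in lines:
        line = raw.strip()
        mime, separator, size = line.rpartition("##")
        if not separator or not size.isdecimal():
            continue  # skip blank lines and anything not of the expected form
        totals[mime] += int(size)
    return dict(totals)


if __name__ == "__main__":
    with open("mimesize.out", encoding="utf-8") as handle:
        sizes = total_size_per_mime(handle)
    # Print the largest MIME types first.
    for mime, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
        print(mime, size)
```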