NAS usage resources

(Page under construction.)

What kinds of resources do different organisations use to run NetarchiveSuite and Heritrix?


Denmark

Total H3 harvester capacity (CPH + AAR): 150 instances

104 dedicated broadcrawl harvesters (CPH)

46 dedicated selective harvesters using different channels (CPH+AAR)

The number of H3 instances per server is chosen based on how many alerts (caused by overload) appear in the NetarchiveSuite Running status GUI; there should be fewer than 50 per job.
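As an illustration of the rule above, here is a minimal Python sketch that flags jobs whose alert count has reached the limit of 50. The alert counts are assumed to be read off the Running status GUI by hand; the function name, the job IDs and the sample numbers are hypothetical and not part of NetarchiveSuite.

```python
ALERT_LIMIT = 50  # from the text: alerts per job should stay under 50


def overloaded_jobs(alerts_per_job, limit=ALERT_LIMIT):
    """Return the IDs of jobs whose alert count is at or above the limit."""
    return sorted(job for job, count in alerts_per_job.items() if count >= limit)


# Hypothetical alert counts noted down from the Running status GUI.
sample = {"job-1": 12, "job-2": 50, "job-3": 73}
print(overloaded_jobs(sample))
```

If jobs on a server keep showing up in this list, that would suggest running fewer H3 instances on that server.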


Physical servers: dedicated broadcrawl H3 harvest

Some years ago we tried a virtual farm of 35 broadcrawl harvesters, each running one H3 instance on NFS disk storage. Because of the NFS disk storage it was neither cheap to scale nor stable enough for a broadcrawl, and local storage would have been too expensive compared to physical servers. So we now use physical servers for broadcrawls.


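Quantity | Phy/Vir | Disk TB | RAM GB | Swap GB | CPUs | Year/gen  | H3 inst (broad) | H3 inst (sel) | Country/place | Tot H3 inst (broad) | Tot H3 inst (sel)
---------|---------|---------|--------|---------|------|-----------|-----------------|---------------|---------------|---------------------|------------------
5        | phy     | 4,3     | 32     | 16      | 28   | 2013-2018 | 10              |               | DK/CPH        | 50                  |
1        | phy     | 4,4     | 32     | 16      | 70   | 2020      | 10              |               | DK/CPH        | 10                  |
1        | phy     | 4,4     | 64     | 64      | 32   | 2022      | 18              |               | DK/CPH        | 18                  |
3        | phy     | 4,4     | 32     | 32      | 24   | old       | 8-10            |               | DK/CPH        | 26                  |
5        | vir     | 3       | 20     | 1       | 8    |           |                 | 8             | DK/AAR        |                     | 40
6        | vir     | 1,5     | 4      | 1       | 2    |           |                 | 1             | DK/CPH        |                     | 6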
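As a cross-check, the per-row totals in the Danish table can be summed and compared with the capacity figures stated at the top of the section (104 broadcrawl, 46 selective, 150 in total). The sketch below is illustrative only; the row tuples are transcribed by hand from the table, and the function name is hypothetical.

```python
# (quantity, total H3 instances for the row, harvest type, site),
# transcribed by hand from the Danish table.
rows = [
    (5, 50, "broad", "CPH"),
    (1, 10, "broad", "CPH"),
    (1, 18, "broad", "CPH"),
    (3, 26, "broad", "CPH"),
    (5, 40, "sel", "AAR"),
    (6, 6, "sel", "CPH"),
]


def totals_by_type(rows):
    """Sum the total H3 instances per harvest type."""
    totals = {}
    for _quantity, total_instances, kind, _site in rows:
        totals[kind] = totals.get(kind, 0) + total_instances
    return totals


totals = totals_by_type(rows)
print(totals, sum(totals.values()))
```

The sums come out at 104 broadcrawl and 46 selective instances, 150 in total, which agrees with the stated capacity.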

(Temp DK pasted text:)

001-005: 4,3 TB 32 G RAM 16 G Swap 28xCPU about 5-10 years old HW generation 8xH3 instances per server

013: 7 TB SSD 32 G RAM 32 G swap 70xCPU 3 years old HP HW generation 10xH3 instances

014: 4,7 TB SSD 64 G RAM 64 G SWAP 32x CPU newest HP HW generation  (from last year) 18xH3 instances  (The storage here is a little too low - should have been 3-4 TB - we will see..)

015-016-017: 4,4 TB 32 G RAM 32 G swap 24xCPU very old bitarchive servers with spec for mass processing: 6xH3 instances per server


Virtual servers: dedicated selective harvests and big domains "broadcrawl" selective harvests

006: 1,5 TB 4G RAM 2xCPU 10K disks

007: 400 G 4G RAM 2xCPU 10K disks

008-012: 400 G 4G RAM 2xCPU 10K disks


In AAR: 40 H3 selective harvesters in total

001-004,006: 3 TB, 20 G RAM, 1 G swap, 8xCPU: 8xH3 instances per server

If all 8 instances are running we get a lot of load errors, so the amount of memory and swap is too small, and perhaps also the number of CPUs.
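To put the memory observation in perspective, the following sketch divides each server's RAM by its number of H3 instances, using the specs listed in this section. It is a rough comparison only: it assumes RAM is shared evenly between instances and ignores the operating system and actual Heritrix heap settings, which are not stated here. The function name is hypothetical.

```python
# (label, RAM in GB, H3 instances per server), copied from the specs listed above.
servers = [
    ("CPH 013 (physical)", 32, 10),
    ("CPH 014 (physical)", 64, 18),
    ("AAR 001-004,006 (virtual)", 20, 8),
]


def ram_per_instance(ram_gb, instances):
    """RAM share per H3 instance in GB, ignoring OS and other overhead."""
    return ram_gb / instances


for label, ram_gb, instances in servers:
    share = ram_per_instance(ram_gb, instances)
    print(f"{label}: {share:.2f} GB per H3 instance")
```

The AAR virtual servers end up with the smallest share per instance, which is consistent with the load errors seen when all 8 instances run, though it does not by itself show that memory is the cause.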

Some years ago we also tried to use this virtual platform for broadcrawl harvests. It was not possible.


Preliminary Sweden

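Quantity | Phy/Vir | Disk TB | RAM GB | Swap GB | CPUs | Year/gen | H3 inst (broad) | H3 inst (sel) | Country/place | Tot H3 inst (broad) | Tot H3 inst (sel)
---------|---------|---------|--------|---------|------|----------|-----------------|---------------|---------------|---------------------|------------------
18       | vir     | 0,3     | 10     | 8-18    | 4    |          | 5               | 1             | SE/STH        | 90                  | 18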