NAS usage resources
(Page under construction.)
What kinds of resources do different organisations use to run NetarchiveSuite and Heritrix?
Denmark
Total H3 harvester capacity (CPH + AAR): 150 instances
104 dedicated broadcrawl harvesters (CPH)
46 dedicated selective harvesters using different channels (CPH+AAR)
The number of H3 instances per server is set by how many alerts (caused by overload) appear in the NetarchiveSuite Running status GUI; there should be fewer than 50 per job.
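A minimal sketch of the alert rule, assuming the alert count for each job has already been read off the Running status GUI into a dictionary. The page does not describe any export, so the input format, the job IDs and the function names are all hypothetical. A server is treated as running too many H3 instances as soon as any of its jobs reaches 50 alerts.

```python
# Hypothetical sketch: job IDs and alert counts are illustrative, not taken
# from a real NetarchiveSuite installation or API.
ALERT_LIMIT = 50  # alerts per job should stay below this (from the page)


def jobs_over_limit(alerts_per_job: dict[str, int], limit: int = ALERT_LIMIT) -> list[str]:
    """Return, sorted by ID, the jobs whose alert count is not under the limit."""
    return sorted(job for job, alerts in alerts_per_job.items() if alerts >= limit)


def server_is_overloaded(alerts_per_job: dict[str, int], limit: int = ALERT_LIMIT) -> bool:
    """Treat the server as overloaded if any job on it hits the alert limit."""
    return bool(jobs_over_limit(alerts_per_job, limit))


if __name__ == "__main__":
    sample = {"job-101": 12, "job-102": 57, "job-103": 49}
    print(jobs_over_limit(sample))       # ['job-102']
    print(server_is_overloaded(sample))  # True
```

In practice an overloaded server would then be given fewer H3 instances; how large a step to take is not stated on this page.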
Physical servers: dedicated broadcrawl H3 harvesters
Some years ago we tried a virtual farm of 35 broadcrawl harvesters, each running one H3 instance on NFS disk storage. Because of the NFS storage it was neither cheap to scale nor able to run a stable broadcrawl, and local storage on virtual machines was too expensive compared to physical servers. So we now use physical servers for broad crawl.
Quantity | Phy/Vir | Disk TB | RAM GB | Swap GB | CPUs | Year/gen | H3 inst/server (broad) | H3 inst/server (sel) | Country/place | Tot H3 inst (broad) | Tot H3 inst (sel)
---|---|---|---|---|---|---|---|---|---|---|---
5 | phy | 4.3 | 32 | 16 | 28 | 2013-2018 | 10 | | DK/CPH | 50 |
1 | phy | 4.4 | 32 | 16 | 70 | 2020 | 10 | | DK/CPH | 10 |
1 | phy | 4.4 | 64 | 64 | 32 | 2022 | 18 | | DK/CPH | 18 |
3 | phy | 4.4 | 32 | 32 | 24 | old | 8-10 | | DK/CPH | 26 |
5 | vir | 3 | 20 | 1 | 8 | | | 8 | DK/AAR | | 40
6 | vir | 1.5 | 4 | 1 | 2 | | | 1 | DK/CPH | | 6
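As a consistency check, the reported totals in the table can be summed per harvest type. The sketch below copies the rows into Python (the `Row` type and the function names are illustrative only) and checks that servers × instances per server equals the reported total wherever the table gives a single per-server figure; the 8-10 row is left out of that check because its per-server split is not given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    servers: int
    place: str
    per_server: Optional[int]  # None where the table gives a range (8-10)
    broad_total: int
    sel_total: int


# Rows transcribed from the Danish table; totals copied as reported.
DK_ROWS = [
    Row(5, "DK/CPH", 10, 50, 0),
    Row(1, "DK/CPH", 10, 10, 0),
    Row(1, "DK/CPH", 18, 18, 0),
    Row(3, "DK/CPH", None, 26, 0),  # 8-10 instances per server
    Row(5, "DK/AAR", 8, 0, 40),
    Row(6, "DK/CPH", 1, 0, 6),
]


def capacity(rows: list[Row]) -> dict[str, int]:
    """Sum the reported broad and selective H3 totals."""
    broad = sum(r.broad_total for r in rows)
    sel = sum(r.sel_total for r in rows)
    return {"broad": broad, "selective": sel, "total": broad + sel}


def inconsistent_rows(rows: list[Row]) -> list[Row]:
    """Rows where servers x instances per server differs from the reported total."""
    return [
        r for r in rows
        if r.per_server is not None
        and r.servers * r.per_server != r.broad_total + r.sel_total
    ]


if __name__ == "__main__":
    print(capacity(DK_ROWS))           # {'broad': 104, 'selective': 46, 'total': 150}
    print(inconsistent_rows(DK_ROWS))  # []
```

The sums reproduce the 104 broadcrawl, 46 selective and 150 total instances stated at the top of the section.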
(Temporary DK notes:)
001-005: 4.3 TB, 32 GB RAM, 16 GB swap, 28 CPUs, hardware generation about 5-10 years old, 8 H3 instances per server
013: 7 TB SSD, 32 GB RAM, 32 GB swap, 70 CPUs, HP hardware generation about 3 years old, 10 H3 instances
014: 4.7 TB SSD, 64 GB RAM, 64 GB swap, 32 CPUs, newest HP hardware generation (from last year), 18 H3 instances (the storage here is a little too low; it should have been 3-4 TB, we will see)
015-016-017: 4.4 TB, 32 GB RAM, 32 GB swap, 24 CPUs, very old bitarchive servers specced for mass processing, 6 H3 instances per server
Virtual servers: dedicated selective harvests and "broadcrawl"-style selective harvests of big domains
006: 1.5 TB, 4 GB RAM, 2 CPUs, 10K disks
007: 400 GB, 4 GB RAM, 2 CPUs, 10K disks
008-012: 400 GB, 4 GB RAM, 2 CPUs, 10K disks
AAR: 40 H3 selective harvesters in total
001-004, 006: 3 TB, 20 GB RAM, 1 GB swap, 8 CPUs; 8 H3 instances per server
When all 8 instances are running we get a lot of load errors, so memory and swap are too small, and perhaps the number of CPUs as well.
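To put the load errors in numbers, the sketch below divides RAM, swap and CPUs evenly over the H3 instances on one server, using the table figures for two physical broadcrawl classes and the AAR virtual servers. Even division is a simplification (the page says nothing about how Heritrix heap or threads are configured), and the server labels are illustrative, so treat the output as a rough comparison only.

```python
# Figures copied from the Danish table (disk omitted); "h3" is instances per server.
# Assumes resources are shared evenly between instances, which is a simplification.
SERVERS = {
    "CPH phy 2013-2018": {"ram_gb": 32, "swap_gb": 16, "cpus": 28, "h3": 10},
    "CPH phy 2022": {"ram_gb": 64, "swap_gb": 64, "cpus": 32, "h3": 18},
    "AAR vir": {"ram_gb": 20, "swap_gb": 1, "cpus": 8, "h3": 8},
}


def per_instance(spec: dict[str, int]) -> dict[str, float]:
    """Divide RAM, swap and CPUs evenly across the H3 instances on one server."""
    n = spec["h3"]
    return {
        "ram_gb": round(spec["ram_gb"] / n, 2),
        "swap_gb": round(spec["swap_gb"] / n, 2),
        "cpus": round(spec["cpus"] / n, 2),
    }


if __name__ == "__main__":
    for name, spec in SERVERS.items():
        print(name, per_instance(spec))
```

With these figures an AAR instance gets 2.5 GB RAM and 1 CPU, against 3.2 GB RAM and 2.8 CPUs per instance on the 2013-2018 physical servers.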
Some years ago we also tried to use this virtual platform for broadcrawl harvests. It was not possible.
Sweden (preliminary)
Quantity | Phy/Vir | Disk TB | RAM GB | Swap GB | CPUs | Year/gen | H3 inst/server (broad) | H3 inst/server (sel) | Country/place | Tot H3 inst (broad) | Tot H3 inst (sel)
---|---|---|---|---|---|---|---|---|---|---|---
18 | vir | 0.3 | 10 | 8-18 | 4 | | 5 | 1 | SE/STH | 90 | 18