(Page under construction.)

What kinds of resources do different organisations use to run NetarchiveSuite and Heritrix?


Denmark

In total we have an H3 harvester capacity (CPH + AAR) of 135 instances:

89 dedicated broadcrawl harvesters (CPH)

46 dedicated selective harvesters using different channels (CPH + AAR)

The number of H3 instances per server is based on how many alerts (caused by overload) appear in the NetarchiveSuite Running status GUI; the count should stay under 50 per job.
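The rule above can be sketched as a small check. This is only an illustration, not part of NetarchiveSuite: the function names and the input format (a mapping from job ID to alert count, assumed to be read off the Running status GUI by hand) are hypothetical. Only the limit of 50 alerts per job comes from this page.

```python
# Hypothetical helpers; NetarchiveSuite does not provide these functions.
ALERT_LIMIT_PER_JOB = 50  # limit stated on this page


def overloaded_jobs(alerts_per_job, limit=ALERT_LIMIT_PER_JOB):
    """Return the sorted job IDs whose alert count is not under the limit."""
    return sorted(job for job, alerts in alerts_per_job.items() if alerts >= limit)


def should_reduce_instances(alerts_per_job, limit=ALERT_LIMIT_PER_JOB):
    """True if at least one job on the server reaches the alert limit."""
    return bool(overloaded_jobs(alerts_per_job, limit))


if __name__ == "__main__":
    # Made-up example counts for three jobs on one server.
    sample = {101: 12, 102: 50, 103: 75}
    print(overloaded_jobs(sample))
    print(should_reduce_instances(sample))
```

If any job reaches the limit, the server is probably running too many H3 instances and the count could be lowered before the next harvest.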


Physical servers: dedicated broadcrawl H3 harvest

Some years ago we tried a virtual farm of 35 broadcrawl harvesters, each with one H3 instance and NFS disk storage. Because of the NFS disk storage it was neither cheap to scale nor stable enough for broadcrawls, and local storage was too expensive compared to physical servers. So we now use physical servers for broadcrawls.


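| Quantity | Phy/Vir | Disk TB | RAM GB | Swap GB | CPUs | Year/gen  | H3 inst (broad) | H3 inst (sel) | Country/place | Tot H3 inst (broad) | Tot H3 inst (sel) |
|----------|---------|---------|--------|---------|------|-----------|-----------------|---------------|---------------|---------------------|-------------------|
| 5        | phy     | 4,3     | 32     | 16      | 28   | 2013-2018 | 8               |               | DK/CPH        | 40                  |                   |
| 1        | phy     | 4,4     | 32     | 16      | 70   | 2020      | 10              |               | DK/CPH        | 10                  |                   |
| 1        | phy     | 2,7     | 64     | 64      | 32   | 2022      | 17              |               | DK/CPH        | 17                  |                   |
| 2        | phy     | 4,4     | 32     | 32      | 24   | old       | 6               |               | DK/CPH        | 12                  |                   |
| 1        | vir     | 1,5     | 4      |         | 2    |           |                 |               | DK/CPH        |                     |                   |
| 1        | vir     | 0,4     | 4      |         | 2    |           |                 |               | DK/CPH        |                     |                   |
| 5        | vir     | 0,4     | 4      |         | 2    |           |                 |               | DK/CPH        |                     |                   |
| 5        | vir     | 3       | 20     | 1       | 8    |           |                 | 8             | DK/AAR        |                     | 40                |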

(Temp DK pasted text:)

001-005: 4,3 TB, 32 GB RAM, 16 GB swap, 28xCPU, hardware generation about 5-10 years old, 8xH3 instances per server

013: 4,4 TB SSD, 32 GB RAM, 32 GB swap, 70xCPU, 3-year-old HP hardware generation, 10xH3 instances

014: 2,7 TB SSD, 64 GB RAM, 64 GB swap, 32xCPU, newest HP hardware generation (from last year), 17xH3 instances (the storage here is a little too small; it should have been 3-4 TB, but we will see)

015-016: 4,4 TB, 32 GB RAM, 32 GB swap, 24xCPU, very old bitarchive servers specified for mass processing, 6xH3 instances per server


Virtual servers: dedicated selective harvests and big domains "broadcrawl" selective harvests

006: 1,5 TB, 4 GB RAM, 2xCPU, 10K disks

007: 400 GB, 4 GB RAM, 2xCPU, 10K disks

008-012: 400 GB, 4 GB RAM, 2xCPU, 10K disks


In AAR: 40 H3 selective harvesters in total

001-004,006: 3 TB, 20 GB RAM, 1 GB swap, 8xCPU, 8xH3 instances per server

If all 8 instances are running, we get a lot of load errors, so the amount of memory and swap is too small, and perhaps the number of CPUs as well.
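One rough way to compare the AAR servers with the physical CPH servers is RAM per H3 instance. The sketch below uses only the RAM sizes and instance counts listed on this page. It assumes RAM is shared evenly between instances and ignores the operating system, swap and the actual Heritrix heap settings, so it is an indicator at best and not a sizing rule.

```python
# Server specs copied from this page: (label, RAM in GB, H3 instances per server).
SERVERS = [
    ("CPH 001-005", 32, 8),
    ("CPH 013", 32, 10),
    ("CPH 014", 64, 17),
    ("CPH 015-016", 32, 6),
    ("AAR 001-004,006", 20, 8),
]


def ram_per_instance(ram_gb, instances):
    """Naive even split of server RAM across H3 instances (an assumption)."""
    return ram_gb / instances


if __name__ == "__main__":
    for label, ram_gb, instances in SERVERS:
        share = ram_per_instance(ram_gb, instances)
        print(f"{label}: {share:.1f} GB RAM per H3 instance")
```

With these figures the AAR servers come out lowest, at 2.5 GB per instance, which is consistent with the load errors seen when all 8 instances are running.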

Some years ago we also tried to use this virtual platform for broadcrawl harvests. It was not possible.


Preliminary Sweden

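| Quantity | Phy/Vir | Disk GB | RAM GB | Swap GB | CPUs | Year/gen | H3 inst (broad) | H3 inst (sel) | Country/place | Tot H3 inst (broad) | Tot H3 inst (sel) |
|----------|---------|---------|--------|---------|------|----------|-----------------|---------------|---------------|---------------------|-------------------|
| 12       | vir     | 300     | 10     | 8-18    | 4    |          | 5               | 1             | SE/STH        | 64                  | 8                 |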