...
Once umbra and rabbitmq are installed, you can experiment with umbra without using the heritrix installation. For these commands to work, however, it seems that you need to make a small change to the default rabbitmq setup: bind the queue "urls" to the routing key "urls" in the gui:
(This binding is created automatically when using heritrix.) After this you can use the verbose options to manually see what links umbra finds on different pages:
umbra -v & queue-url -v http://www.netarkivet.dk |
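The binding described above can probably also be created without the gui, through the HTTP API of the rabbitmq management plugin. The following is only a sketch under several assumptions that are not stated on this page: that the management plugin is enabled on localhost:15672, that the default guest/guest account and the default vhost "/" are in use, and that the exchange is called "umbra" (the name that appears in umbra's own debug output). The helper name build_binding_request is made up for this illustration.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_binding_request(base_url="http://localhost:15672", vhost="/",
                          exchange="umbra", queue="urls", routing_key="urls",
                          user="guest", password="guest"):
    """Build (but do not send) a request that binds `queue` to `exchange`.

    All defaults are assumptions about a stock local rabbitmq setup.
    """
    def quote(part):
        # Path segments must be fully percent-encoded; "/" becomes "%2F".
        return urllib.parse.quote(part, safe="")

    url = "{}/api/bindings/{}/e/{}/q/{}".format(
        base_url, quote(vhost), quote(exchange), quote(queue))
    body = json.dumps({"routing_key": routing_key, "arguments": {}})
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": "Basic " + token},
    )


request = build_binding_request()
# To actually create the binding against a running broker, send the request:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before touching the broker.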
Umbra/chrome has an annoying habit of stealing keyboard focus. You can avoid this by starting umbra on a separate X server as follows:
Code Block |
---|
sudo X :1 |
<ctrl> <alt> <f7> (to return to the original server) |
export DISPLAY=:1; umbra -v |
The queue-url command gives output from umbra like:
2015-01-19 13:57:55,348 16121 DEBUG WebsockThread9200-XC3zTY umbra.controller.AmqpBrowserController.on_request(controller.py:180) sending to amqp exchange=umbra routing_key=load_url.0 payload={'parentUrl': 'http://www.netarkivet.dk', 'method': 'GET', 'parentUrlMetadata': {}, 'headers': {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36', 'X-DevTools-Emulate-Network-Conditions-Client-Id': '0B80F796-5795-0FDF-AB44-149ADD52C507'}, 'url': 'http://netarkivet.dk/'} |
2015-01-19 13:57:55,843 16121 DEBUG WebsockThread9200-XC3zTY umbra.controller.AmqpBrowserController.on_request(controller.py:180) sending to amqp exchange=umbra routing_key=load_url.0 payload={'parentUrl': 'http://www.netarkivet.dk', 'method': 'GET', 'parentUrlMetadata': {}, 'headers': {'Accept': 'text/css,*/*;q=0.1', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36', 'X-DevTools-Emulate-Network-Conditions-Client-Id': '0B80F796-5795-0FDF-AB44-149ADD52C507', 'Referer': 'http://netarkivet.dk/'}, 'url': 'http://netarkivet.dk/wp-content/themes/netarkivet/style.css'} |
2015-01-19 13:57:55,843 16121 DEBUG WebsockThread9200-XC3zTY umbra.controller.AmqpBrowserController.on_request(controller.py:180) sending to amqp exchange=umbra routing_key=load_url.0 payload={'parentUrl': 'http://www.netarkivet.dk', 'method': 'GET', 'parentUrlMetadata': {}, 'headers': {'Accept': 'text/css,*/*;q=0.1', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36', 'X-DevTools-Emulate-Network-Conditions-Client-Id': '0B80F796-5795-0FDF-AB44-149ADD52C507', 'Referer': 'http://netarkivet.dk/'}, 'url': 'http://netarkivet.dk/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=3.9.3'} |
2015-01-19 13:57:55,843 16121 DEBUG WebsockThread9200-XC3zTY umbra.controller.AmqpBrowserController.on_request(controller.py:180) sending to amqp exchange=umbra routing_key=load_url.0 payload={'parentUrl': 'http://www.netarkivet.dk', 'method': 'GET', 'parentUrlMetadata': {}, 'headers': {'Accept': 'text/css,*/*;q=0.1', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36', 'X-DevTools-Emulate-Network-Conditions-Client-Id': '0B80F796-5795-0FDF-AB44-149ADD52C507', 'Referer': 'http://netarkivet.dk/'}, 'url': 'http://netarkivet.dk/wp-content/plugins/gallery-to-slideshow//css/gallery-to-slideshow.css?ver=1.4'} |
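Since each debug record carries the discovered link in the 'url' field of its payload, the list of links can be pulled out of the verbose output with a few lines of Python. This is a rough sketch, assuming that each record sits on one line and that the payload is printed as a Python dict literal, as in the output above; the function name parse_umbra_line is made up for this illustration, and the sample is the first record from that output.

```python
import ast
import re

# Matches the tail of an umbra debug record: routing key, then the payload dict.
RECORD_RE = re.compile(r"routing_key=(?P<key>.+?) payload=(?P<payload>\{.*\})")


def parse_umbra_line(line):
    """Return (routing_key, payload_dict) for a record, or None if no match."""
    match = RECORD_RE.search(line)
    if match is None:
        return None
    # literal_eval only accepts Python literals, so it is safe on log text.
    payload = ast.literal_eval(match.group("payload"))
    return match.group("key"), payload


SAMPLE = (
    "2015-01-19 13:57:55,348 16121 DEBUG WebsockThread9200-XC3zTY "
    "umbra.controller.AmqpBrowserController.on_request(controller.py:180) "
    "sending to amqp exchange=umbra routing_key=load_url.0 "
    "payload={'parentUrl': 'http://www.netarkivet.dk', 'method': 'GET', "
    "'parentUrlMetadata': {}, 'headers': {'User-Agent': 'Mozilla/5.0 (X11; "
    "Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu "
    "Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36', "
    "'X-DevTools-Emulate-Network-Conditions-Client-Id': "
    "'0B80F796-5795-0FDF-AB44-149ADD52C507'}, 'url': 'http://netarkivet.dk/'}"
)

key, payload = parse_umbra_line(SAMPLE)
print(key, payload["parentUrl"], "->", payload["url"])
```

To process a whole session, redirect umbra's output to a file and call parse_umbra_line on each line, skipping the lines for which it returns None.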
...