...

Once umbra and RabbitMQ are installed, you can experiment with umbra without a heritrix installation. For the commands below to work, however, it seems you need to make a small change to the default RabbitMQ setup: in the RabbitMQ GUI, bind the queue "urls" to the routing key "urls".
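If you prefer not to click through the GUI, the same binding can probably be created from Python. The following is only a sketch under stated assumptions: it uses the third-party pika client, assumes the broker runs on localhost with default credentials, and assumes the exchange is called "umbra" (the name that appears in umbra's debug output). The helper names are hypothetical. Both the exchange and the "urls" queue must already exist, otherwise the broker will reject the bind.

```python
def bind_urls_queue(channel, exchange="umbra", queue="urls", routing_key="urls"):
    """Bind `queue` to `routing_key` on `exchange` using an open pika channel.

    The exchange name "umbra" is an assumption taken from umbra's log output.
    """
    channel.queue_bind(queue=queue, exchange=exchange, routing_key=routing_key)


def bind_on_local_broker(host="localhost"):
    """Connect to a RabbitMQ broker on `host` and create the binding."""
    import pika  # third-party AMQP client; imported here so the helper above stays dependency-free

    connection = pika.BlockingConnection(pika.ConnectionParameters(host))
    try:
        bind_urls_queue(connection.channel())
    finally:
        connection.close()
```

Calling `bind_on_local_broker()` once, after RabbitMQ is up and umbra has created its exchange and queue, should be enough; binding the same queue twice with the same routing key is harmless.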

(This binding is created automatically when using heritrix.) After this, you can use the verbose options to see for yourself which links umbra finds on a given page:

 

umbra -v &
queue-url -v http://www.netarkivet.dk

Umbra/chrome has an annoying habit of stealing keyboard focus. You can avoid this by starting umbra on a separate X server as follows:

sudo X :1
<ctrl> <alt> <f7>    (to return to the original server)
export DISPLAY=:1; umbra -v

 

Queueing a URL gives output from umbra like:

 

2015-01-19 13:57:55,348 16121 DEBUG WebsockThread9200-XC3zTY umbra.controller.AmqpBrowserController.on_request(controller.py:180) sending to amqp exchange=umbra routing_key=load_url.0 payload={'parentUrl': 'http://www.netarkivet.dk', 'method': 'GET', 'parentUrlMetadata': {}, 'headers': {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36', 'X-DevTools-Emulate-Network-Conditions-Client-Id': '0B80F796-5795-0FDF-AB44-149ADD52C507'}, 'url': 'http://netarkivet.dk/'}
2015-01-19 13:57:55,843 16121 DEBUG WebsockThread9200-XC3zTY umbra.controller.AmqpBrowserController.on_request(controller.py:180) sending to amqp exchange=umbra routing_key=load_url.0 payload={'parentUrl': 'http://www.netarkivet.dk', 'method': 'GET', 'parentUrlMetadata': {}, 'headers': {'Accept': 'text/css,*/*;q=0.1', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36', 'X-DevTools-Emulate-Network-Conditions-Client-Id': '0B80F796-5795-0FDF-AB44-149ADD52C507', 'Referer': 'http://netarkivet.dk/'}, 'url': 'http://netarkivet.dk/wp-content/themes/netarkivet/style.css'}
2015-01-19 13:57:55,843 16121 DEBUG WebsockThread9200-XC3zTY umbra.controller.AmqpBrowserController.on_request(controller.py:180) sending to amqp exchange=umbra routing_key=load_url.0 payload={'parentUrl': 'http://www.netarkivet.dk', 'method': 'GET', 'parentUrlMetadata': {}, 'headers': {'Accept': 'text/css,*/*;q=0.1', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36', 'X-DevTools-Emulate-Network-Conditions-Client-Id': '0B80F796-5795-0FDF-AB44-149ADD52C507', 'Referer': 'http://netarkivet.dk/'}, 'url': 'http://netarkivet.dk/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=3.9.3'}
2015-01-19 13:57:55,843 16121 DEBUG WebsockThread9200-XC3zTY umbra.controller.AmqpBrowserController.on_request(controller.py:180) sending to amqp exchange=umbra routing_key=load_url.0 payload={'parentUrl': 'http://www.netarkivet.dk', 'method': 'GET', 'parentUrlMetadata': {}, 'headers': {'Accept': 'text/css,*/*;q=0.1', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36', 'X-DevTools-Emulate-Network-Conditions-Client-Id': '0B80F796-5795-0FDF-AB44-149ADD52C507', 'Referer': 'http://netarkivet.dk/'}, 'url': 'http://netarkivet.dk/wp-content/plugins/gallery-to-slideshow//css/gallery-to-slideshow.css?ver=1.4'}
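The debug lines are long, so it can help to reduce them to just the URLs umbra reports. The following is a rough sketch rather than part of umbra: the function name found_urls is hypothetical, and it assumes each relevant line contains routing_key=load_url and ends with the payload's 'url' entry, as in the output above.

```python
import re

# Matches the trailing 'url' entry of the payload. The ":" is optional so the
# pattern also copes with output where that punctuation has been lost.
URL_RE = re.compile(r"'url'\s*:?\s*'([^']+)'\}\s*$")


def found_urls(log_lines):
    """Return the distinct URLs from umbra's 'sending to amqp' debug lines,
    in the order they first appear."""
    urls = []
    for line in log_lines:
        if "routing_key=load_url" not in line:
            continue  # not a line reporting a discovered URL
        match = URL_RE.search(line)
        if match and match.group(1) not in urls:
            urls.append(match.group(1))
    return urls
```

For example, after saving umbra's output to a file (here a hypothetical umbra.log), `found_urls(open("umbra.log"))` would return the list of discovered URLs.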

 

...