...

 

sudo apt-get install rabbitmq-server
sudo rabbitmq-plugins enable rabbitmq_management
sudo service rabbitmq-server restart

 

After this, rabbitmq can be managed at http://localhost:15672 (username/password: guest/guest).

...

Once umbra and rabbitmq are installed, you can experiment with umbra without using the heritrix installation. However, for these commands to work, it seems that you need to make a small change to the default rabbitmq setup. Specifically, you need to bind the queue "urls" to the routing key "urls" in the management GUI.
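The same binding can probably also be created from the command line instead of the GUI. The sketch below only prints a candidate command rather than running it. It assumes the classic rabbitmqadmin script (downloadable from the management plugin at http://localhost:15672/cli/) and assumes the exchange is named "umbra", which is inferred from the umbra log output and not confirmed elsewhere on this page. The helper name binding_command is hypothetical.

```shell
#!/bin/sh
# Print (do not run) a rabbitmqadmin command that binds a queue to a
# routing key on an exchange. binding_command is a hypothetical helper.
binding_command() {
    exchange="$1"
    queue="$2"
    routing_key="$3"
    printf 'rabbitmqadmin declare binding source=%s destination_type=queue destination=%s routing_key=%s\n' \
        "$exchange" "$queue" "$routing_key"
}

# Assumption: exchange "umbra"; queue and routing key "urls" as described above.
binding_command umbra urls urls
```

If the printed command looks right for your setup, paste it into a shell to apply it. The queue and the exchange must already exist, so umbra may need to have been started once beforehand.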

(This binding is created automatically when using heritrix.) After this you can use the verbose options to manually see what links umbra finds on different pages:

 

umbra -v &
queue-url -v  http://www.netarkivet.dk

Umbra/chrome has an annoying habit of stealing keyboard focus. You can avoid this by starting umbra on a separate X server as follows:

sudo X :1
# Press <ctrl> <alt> <f7> to return to the original server
export DISPLAY=:1; umbra -v

 

Running queue-url as above gives output from umbra like:

 

2015-01-19 13:57:55,348 16121 DEBUG WebsockThread9200-XC3zTY umbra.controller.AmqpBrowserController.on_request(controller.py:180) sending to amqp exchange=umbra routing_key=load_url.0 payload={'parentUrl': 'http://www.netarkivet.dk', 'method': 'GET', 'parentUrlMetadata': {}, 'headers': {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36', 'X-DevTools-Emulate-Network-Conditions-Client-Id': '0B80F796-5795-0FDF-AB44-149ADD52C507'}, 'url': 'http://netarkivet.dk/'}
2015-01-19 13:57:55,843 16121 DEBUG WebsockThread9200-XC3zTY umbra.controller.AmqpBrowserController.on_request(controller.py:180) sending to amqp exchange=umbra routing_key=load_url.0 payload={'parentUrl': 'http://www.netarkivet.dk', 'method': 'GET', 'parentUrlMetadata': {}, 'headers': {'Accept': 'text/css,*/*;q=0.1', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36', 'X-DevTools-Emulate-Network-Conditions-Client-Id': '0B80F796-5795-0FDF-AB44-149ADD52C507', 'Referer': 'http://netarkivet.dk/'}, 'url': 'http://netarkivet.dk/wp-content/themes/netarkivet/style.css'}
2015-01-19 13:57:55,843 16121 DEBUG WebsockThread9200-XC3zTY umbra.controller.AmqpBrowserController.on_request(controller.py:180) sending to amqp exchange=umbra routing_key=load_url.0 payload={'parentUrl': 'http://www.netarkivet.dk', 'method': 'GET', 'parentUrlMetadata': {}, 'headers': {'Accept': 'text/css,*/*;q=0.1', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36', 'X-DevTools-Emulate-Network-Conditions-Client-Id': '0B80F796-5795-0FDF-AB44-149ADD52C507', 'Referer': 'http://netarkivet.dk/'}, 'url': 'http://netarkivet.dk/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=3.9.3'}
2015-01-19 13:57:55,843 16121 DEBUG WebsockThread9200-XC3zTY umbra.controller.AmqpBrowserController.on_request(controller.py:180) sending to amqp exchange=umbra routing_key=load_url.0 payload={'parentUrl': 'http://www.netarkivet.dk', 'method': 'GET', 'parentUrlMetadata': {}, 'headers': {'Accept': 'text/css,*/*;q=0.1', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36', 'X-DevTools-Emulate-Network-Conditions-Client-Id': '0B80F796-5795-0FDF-AB44-149ADD52C507', 'Referer': 'http://netarkivet.dk/'}, 'url': 'http://netarkivet.dk/wp-content/plugins/gallery-to-slideshow//css/gallery-to-slideshow.css?ver=1.4'}
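The debug lines are long, so it can help to reduce them to just the discovered URLs. The following is a minimal sketch that assumes umbra's output has been captured to a file and that each line ends with a 'url' entry in the payload, as in the lines above. The function name extract_urls and the file name umbra.log are hypothetical.

```shell
#!/bin/sh
# Read umbra debug log lines on stdin and print the value of the final
# 'url' key in each payload. Lines without a 'url' entry are skipped.
# The pattern tolerates an optional ": " between the key and its value.
extract_urls() {
    sed -n "s/.*'url'[: ]*'\([^']*\)'.*/\1/p"
}

# Example usage (umbra.log is a hypothetical capture of umbra's output):
#   extract_urls < umbra.log | sort -u
```

Piping the result through sort -u lists each URL once, which makes it easier to compare what umbra finds on different pages.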

 

...