Grab - site scraping framework

Grab can help you to:

  • Extract data from a website
  • Work with a web service API
  • Automate activity on a website

Important information:

If you want to use Grab on Windows, you should install our build of the pycurl library. It fixes a pycurl bug that causes some POST requests to fail. Download link: pycurl-ssl-7.19.0.win32-py2.7.msi

Discussions in python-grab

  • 20 August 10:36: How do I get the src of every image in a block?
  • 15 August 21:34: How do I get the server response code (404)?
  • 15 August 06:50: Error in https://github.com/lorien/grab/blob/master/grab/spider/cache_backend/postgresql.py
  • 14 August 14:43: grab.doc.select finds nothing if the XPath string contains a non-ASCII character
  • 13 August 19:24: DNS problems when making requests through a SOCKS proxy

Documentation

The docs are at docs.grablib.org. They were originally written in Russian, and I am now translating them into English.

Here are the English docs, which are still incomplete.

How to help the Grab project

  1. Write about Grab on your blog or on a popular discussion board such as reddit or Hacker News
  2. Report a bug and describe the details
  3. Implement a new feature and submit a pull request
  4. Order a site-scraping project from DataLab
