Hpricot GURU-SP by Jonas Alves

Upload: jonas-alves

Post on 26-May-2015

Category:

Technology

DESCRIPTION

Presentation on Hpricot for the GURU-SP meetup. By Jonas Alves. http://github.com/jonasfa/hpricot_gurusp

TRANSCRIPT

  • 1. Hpricot: Extracting data from web pages, by Jonas Alves

2. Jonas Alves

  • Rubyist since 2008
  • At WebGoal since 2009
  • @jonas_alves
  • http://github.com/jonasfa
  • http://br.linkedin.com/in/alvesjonas

3. Scenario?

4. Scenario

  • 10+ people collecting data manually

5. Errors compromise the quality of the service

6. Lots of work == overtime == $$

7. Proposal: automate

8. Tools

  • PHP: DOMDocument
      • Limited
  • Java: HTMLParser
      • Verbose!
  • Ruby: Hpricot
      • Simple and powerful

9. Comparison

  • Hpricot (Ruby)

      require 'hpricot'
      require 'open-uri'

      doc = Hpricot(open('http://www.ruby-lang.org/en/about/'))
      puts (doc/'#content h3').collect { |h3| h3.inner_text }

10. Comparison

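  • HTMLParser (Java)

      import org.htmlparser.NodeFilter;
      import org.htmlparser.beans.FilterBean;
      import org.htmlparser.filters.CssSelectorNodeFilter;
      import org.htmlparser.util.SimpleNodeIterator;

      // Print the text of every h3 inside the element with id "content"
      CssSelectorNodeFilter cssSelector = new CssSelectorNodeFilter("#content h3");
      FilterBean bean = new FilterBean();
      bean.setFilters(new NodeFilter[] {cssSelector});
      bean.setURL("http://www.ruby-lang.org/en/about/");
      SimpleNodeIterator iterator = bean.getNodes().elements();
      while (iterator.hasMoreNodes()) {
          System.out.println(iterator.nextNode().toPlainTextString());
      }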
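
Both comparison snippets do the same job: they select every h3 inside the element whose id is "content" and print its text. The sketch below reproduces that selection offline so the idea can be tried without network access or extra gems. It is a stand-in and not Hpricot's API: it uses REXML with an XPath expression instead of a CSS selector, and REXML only accepts well-formed markup, whereas the slides use Hpricot on real-world pages. The sample page and the `content_headings` helper are hypothetical names introduced for illustration.

```ruby
require 'rexml/document'

# Hypothetical, well-formed sample page standing in for a downloaded document.
SAMPLE_PAGE = <<~HTML
  <html>
    <body>
      <div id="sidebar"><h3>Ignored heading</h3></div>
      <div id="content">
        <h3>First heading</h3>
        <p>Some text.</p>
        <h3>Second heading</h3>
      </div>
    </body>
  </html>
HTML

# Returns the text of every h3 nested under the element whose id is "content".
# The XPath below plays the role of the '#content h3' CSS selector in the slides.
def content_headings(markup)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, "//*[@id='content']//h3").map { |h3| h3.text.strip }
end

puts content_headings(SAMPLE_PAGE)
```

The heading inside the sidebar is skipped because it is not a descendant of the "content" element, which is the same scoping the `#content h3` selector expresses.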

11. Let's code!

12. GitHub: http://github.com/jonasfa/hpricot_gurusp

13. References

  • http://www.hpricot.com/

14. http://github.com/hpricot/hpricot

15. http://wiki.github.com/hpricot/hpricot/

16. Acknowledgements

  • GURU-SP

17. Anderson Leite, Caelum, and the organizers

18. WebGoal