Hpricot GURU-SP by Jonas Alves
Transcript
1. Hpricot: Extracting data from web pages, by Jonas Alves
2. Jonas Alves
- Rubyist since 2008
- At WebGoal since 2009
- @jonas_alves
- http://github.com/jonasfa
- http://br.linkedin.com/in/alvesjonas
3. Scenario?
4. Scenario
- 10+ people collecting data manually
5. Errors compromise the quality of the service
6. Lots of work == overtime == $$
7. Proposal: automate
8. Tools
- PHP: DOMDocument
  - Limited
- Java: HTMLParser
  - Verbose!
- Ruby: Hpricot
  - Simple and powerful
9. Comparison
- Hpricot (Ruby):
    doc = Hpricot(open('http://www.ruby-lang.org/en/about/'))
    puts (doc/'#content h3').collect { |h3| h3.inner_text }
10. Comparison
- HTMLParser (Java):
    CssSelectorNodeFilter cssSelector = new CssSelectorNodeFilter("#content h3");
    FilterBean bean = new FilterBean();
    bean.setFilters(new NodeFilter[] {cssSelector});
    bean.setURL("http://www.ruby-lang.org/en/about/");
    SimpleNodeIterator iterator = bean.getNodes().elements();
    while (iterator.hasMoreNodes()) {
        System.out.println(iterator.nextNode().toPlainTextString());
    }
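Both snippets in the comparison do the same job: find every h3 inside the element with id "content" and print its text. As a rough sketch of that extraction step that runs without network access or extra gems, the example below uses REXML from Ruby's standard distribution instead of Hpricot. That is a swapped-in technique, not Hpricot's API. The sample page, the `content_headings` and `inner_text` helper names, and the XPath translation of the `#content h3` selector are all assumptions made for illustration. Unlike Hpricot, REXML needs well-formed markup, so this would not cope with typical real-world HTML.

```ruby
require 'rexml/document'

# Hypothetical sample page, inlined so the sketch runs offline.
# It has to be well-formed XML, since REXML does not repair sloppy HTML.
SAMPLE_PAGE = <<~HTML
  <html>
    <body>
      <h3>Outside content</h3>
      <div id="content">
        <h3>Ideals of Ruby's Creator</h3>
        <p>Some text.</p>
        <h3>About Ruby's <em>Growth</em></h3>
      </div>
    </body>
  </html>
HTML

# Concatenates all text found under a node, descending into child elements.
# This is an illustrative stand-in for what inner_text does in the slides.
def inner_text(node)
  node.children.map do |child|
    case child
    when REXML::Text    then child.value
    when REXML::Element then inner_text(child)
    else ''
    end
  end.join
end

# Returns the text of every <h3> nested inside the element with id="content".
# The XPath is a hand translation of the CSS selector '#content h3'.
def content_headings(html)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, "//*[@id='content']//h3").map { |h3| inner_text(h3) }
end

puts content_headings(SAMPLE_PAGE)
```

Run against the sample page, this prints the two headings inside the "content" div and skips the h3 outside it. Keeping the data inline makes the selector logic easy to check before pointing a scraper at a live site.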
11. Let's code!
12. GitHub: http://github.com/jonasfa/hpricot_gurusp
13. References
- http://www.hpricot.com/
14. http://github.com/hpricot/hpricot
15. http://wiki.github.com/hpricot/hpricot/
16. Acknowledgements
- GURU-SP
17. Anderson Leite, Caelum, and the organizers
18. WebGoal