singularity @ globo.com hackday 2014-12-02

Singularity: Interactive Computing Environment for Big Data based on Spark and IPython. Ciro Cavani, Personalization, Globo.com HackDay 02/12/2014

Uploaded by ciro-cavani on 11-Jul-2015


TRANSCRIPT

Page 1

Singularity

An interactive computing environment for Big Data based on Spark and IPython.

Ciro Cavani
Personalization

Globo.com

HackDay 02/12/2014

Page 2

Motivation

The technology needed to change how Globo.com does business is already in production:

Hadoop2, Kafka and Spark.

The idea is to steer Globo toward data-driven decision making.

Page 3

Proposal
● access all of the company's data
● run machine learning algorithms
● identify relevant information
● formulate hypotheses and explore the data
● design experiments and A/B tests
● an interactive system

Page 4

Hadoop

Hadoop2 consists of two systems:
● HDFS, a distributed file system;
● YARN, a distributed execution system.

HBase, Pig, Mahout, Solr

image: http://hortonworks.com/hadoop/yarn/
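To make "distributed file system" a bit more concrete, here is a small sketch of the storage a single file takes in HDFS, which splits files into fixed-size blocks and replicates each block across nodes. It assumes the Hadoop 2.x defaults of 128 MB blocks (`dfs.blocksize`) and 3 replicas (`dfs.replication`); both are cluster settings and may be configured differently in production. The function `hdfs_footprint` is a hypothetical helper for illustration, not a Hadoop API.

```python
from typing import Tuple

# Assumed Hadoop 2.x defaults; a real cluster may override them in hdfs-site.xml.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # dfs.blocksize: 128 MB
DEFAULT_REPLICATION = 3                  # dfs.replication


def hdfs_footprint(file_size: int,
                   block_size: int = DEFAULT_BLOCK_SIZE,
                   replication: int = DEFAULT_REPLICATION) -> Tuple[int, int, int]:
    """Hypothetical helper: (blocks, block replicas, raw bytes stored) for one file."""
    # Ceiling division: a partially filled last block still counts as a block.
    blocks = -(-file_size // block_size)
    # The last block only occupies the bytes it holds, so raw usage is size * replication.
    return blocks, blocks * replication, file_size * replication


print(hdfs_footprint(1024 ** 3))  # 1 GiB file -> (8, 24, 3221225472)
```

Replication trades disk space for fault tolerance and gives YARN more choices for running tasks on a node that already holds the data.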

Page 5

Kafka

A message distribution cluster (billions of messages per day) created at LinkedIn.

Performance: high throughput

Scalability: many consumers

Small, unstructured / opaque messages (bytes)

image: http://hortonworks.com/hadoop/kafka/
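Because Kafka treats each message as opaque bytes, producers and consumers have to agree on a serialization format themselves. A minimal sketch, assuming JSON as that format; `encode_event`, `decode_event` and the click-event fields (`user_id`, `action`, `item`) are hypothetical names for illustration, not Globo.com's actual schema.

```python
import json
from typing import Any, Dict


def encode_event(event: Dict[str, Any]) -> bytes:
    """Hypothetical helper: turn an application event into the opaque bytes Kafka stores."""
    # Compact separators and sorted keys give small, deterministic payloads.
    return json.dumps(event, separators=(",", ":"), sort_keys=True).encode("utf-8")


def decode_event(payload: bytes) -> Dict[str, Any]:
    """Hypothetical helper: consumers must know the same format to recover the structure."""
    return json.loads(payload.decode("utf-8"))


# Hypothetical click event; the field names are illustrative only.
event = {"user_id": 42, "action": "click", "item": "article-123"}
payload = encode_event(event)
print(payload)  # b'{"action":"click","item":"article-123","user_id":42}'
assert decode_event(payload) == event
```

With a client library such as kafka-python, functions like these could be passed as the `value_serializer` of a `KafkaProducer` and the `value_deserializer` of a `KafkaConsumer`. JSON is only one option; compact binary formats keep messages smaller.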

Page 6

Spark

A fast and general-purpose cluster computing system.

● High-level APIs in Python
● Spark SQL for SQL and structured data processing
● MLlib for machine learning
● GraphX for graph processing
● Spark Streaming for stream processing

http://spark.apache.org/
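Spark's high-level Python API is functional: a job is a chain of transformations over a distributed dataset. In PySpark the classic word count reads `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`. The sketch below mimics those three steps with plain Python on a local list, so it runs without a cluster; `reduce_by_key` and `word_count` are hypothetical helpers, not part of PySpark.

```python
from collections import defaultdict
from functools import reduce
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def reduce_by_key(pairs: Iterable[Tuple[K, V]],
                  func: Callable[[V, V], V]) -> Dict[K, V]:
    """Local stand-in for RDD.reduceByKey: group values by key, then fold each group with func."""
    groups: Dict[K, List[V]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(func, values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Word count written as the flatMap -> map -> reduceByKey chain used in Spark."""
    # flatMap: each line becomes zero or more words
    words = (word for line in lines for word in line.split())
    # map: pair each word with an initial count of 1
    pairs = ((word, 1) for word in words)
    # reduceByKey: add up the counts of each word
    return reduce_by_key(pairs, lambda a, b: a + b)


print(word_count(["spark and ipython", "spark"]))  # {'spark': 2, 'and': 1, 'ipython': 1}
```

Spark applies the reduce function within and across partitions in no fixed order, so it should be associative and commutative, as addition is; this local version folds each group sequentially and does not depend on that property.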

Page 7

IPython Notebook

A web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document.

Page 10

Wolfram Language (inspiration)

http://youtu.be/_P9HqHVPeik

Stephen Wolfram introduces the Wolfram Language in this video that shows how the symbolic programming language enables powerful functional programming, querying of large databases, flexible interactivity, easy deployment, and much, much more.

Page 11

Databricks Cloud (inspiration)

http://youtu.be/dJQ5lV5Tldw

The Databricks Cloud provides the full power of Spark to you, in the cloud, plus a powerful set of features for exploring and visualizing your data, as well as writing and deploying production data products.

* Visualize data right as you explore it
* Collaborate in real-time
* Export your analysis to production dashboards in seconds

Page 12

Jupyter and Julia (future)

http://youtu.be/jhlVHoeB05A

This talk will begin with an introduction to the Julia language, both explaining why it is able to attain C-like performance in many cases. (...) we will explain how connecting to the IPython "Jupyter" front-end from an IJulia back-end allows Julia to benefit from IPython's rich multimedia notebook interface, and how Julia can even use IPython 2's interactive-widget infrastructure to provide truly interactive computations.

https://github.com/stevengj/Julia-EuroSciPy14

Page 13

Globo.com

Liked it?

Want to work at Globo.com? We're hiring!

https://github.com/globocom/IWantToWorkAtGloboCom

[email protected]
www.linkedin.com/in/cirocavani