Intercon 2016 - SLA vs. agility: using microservices and cloud monitoring

Posted on 07-Jan-2017


October 2016

First 90

SLA vs. Agile: Microservices and cloud monitoring

Why this talk?

This is our vision: Building the foundation to build a 3B company by FY20

Agenda

1. “Old World”: MercadoLivre’s original architecture

2. “Ground Zero”: shifting to microservices on the cloud

3. Monitoring the cloud

4. Alarms: when things go south

5. “Fury”: streamlining DevOps at MercadoLivre

In numbers

+400 deploys/day on +650 apps

+1,000 developers in 8 development centers

+10 programming languages

In numbers

+25,000,000 requests per minute

+22,000 VMs in 7 data centers

+700 DBs in 4 different engines

Old World

Old world architecture

[Diagram: User → ml.jar → huge DB]

Old world properties

● Monolithic

● Highly coupled code

● Unified SVN repository

● Single DB

● Simple infrastructure with little overhead

● Single QA team

● Closed system

Deployments as ML grew

Anyone at anytime

Some people, anytime

Some people, once a week

Only by all the experts together, at 3 AM, on Thursdays not covered by any “freeze”

Ground Zero

Shifting to microservices

[Diagram: the frontend, CRM, mobile apps and 3rd-party devs each consume multiple independent APIs]


Ground zero properties

● Multiple technologies and frameworks (dev’s choice)

● Completely decoupled code in multiple GitHub repositories

● One DB for each app, multiple engines

● Complex infrastructure with potentially high overhead

● QA, testing and continuous integration are done by each team

● Independent deployments, environments and policies

● Open platform

“With great power comes great responsibility”.

Stan Lee


Developer responsibilities

● Developer gets ownership of the entire dev cycle

● Massive empowerment of dev team -> OWNERSHIP

Manage resources

● VMs: create & scale computing pools for dev/test/prod

● DBs and services: choose the support systems required and create them

● Networking: create rules and load balancers to route traffic to the application

Develop

● Code: choose your technology and keep your GitHub repository

● Test: create tests, regressions or CI as needed

Ensure quality

● Define uptime: define what “up” means for your own app (health.sh)

● Measure: create metrics to analyze performance and downtime

React

● Deploy: write all routines for automatically deploying your app on any VM

● React to critical events that affect your app
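The talk only says that each team defines what “up” means in its own health.sh; the script’s contents are not shown. A minimal sketch of that idea, written in Python for illustration and assuming hypothetical probes (a DB check, a latency budget) plus the shell convention that exit code 0 means up and 1 means down:

```python
import sys
from typing import Callable, Dict

# A probe returns True when one aspect of the app is healthy (hypothetical checks).
Probe = Callable[[], bool]


def evaluate_health(probes: Dict[str, Probe]) -> int:
    """Return a shell-style exit code: 0 if every probe passes, 1 otherwise."""
    failed = []
    for name, probe in probes.items():
        try:
            ok = probe()
        except Exception:
            # A probe that crashes counts as a failure, never as "up".
            ok = False
        if not ok:
            failed.append(name)
    if failed:
        print("DOWN: " + ", ".join(sorted(failed)), file=sys.stderr)
        return 1
    return 0


# Hypothetical definition of "up" for one app; a wrapper script would sys.exit(status).
example_probes = {
    "db_reachable": lambda: True,
    "p99_under_500ms": lambda: 320 < 500,
}
status = evaluate_health(example_probes)
```

Keeping each probe independent lets a team extend its own definition of “up” without touching the shared alarm pipeline, which only looks at the exit code.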

DevTools in ML

The developer works with:

● Melicloud API
- Create apps, manage pools (test/prod), manage VMs & load balancers, build & deploy
- Create queues, create DBaaS or KVSaaS, create caches

● GitHub repo: code the app, write the test & deploy strategy, write uptime definitions

● Nginx: write rules to route traffic to your pools

● eventRouting & OpsGenie: write rules to manage alarms, define alarm escalation policies & schedules, manage contact channels

Microservices in ML

Mobile apps

[Diagram: each team owns a module with its own repo, test app and CI; the modules are assembled into the main app, which has an automated build & store deployment]

Monitoring mobile apps

[Diagram: the teams’ modules make up the main app; crash reporting feeds back to the teams]

Monitoring the cloud


New Relic

● Default monitoring in the VMs’ golden image

● No configuration necessary (initially)

HTTP errors

● Unhandled errors: unexpected issues appear in production

● Unsupported params: see if other devs/clients misuse your entry params

● Other services: detect down services affecting you

Stack traces

● Fast debugging: see what’s going on in production

● Unified pool data: all instances’ traces in the same place

Performance metrics

● Transaction traces: see what’s taking so long

● Recognize deviations: graphs to see whether traffic or response time varies relative to another period

● Apdex score
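Apdex condenses response times into a single 0 to 1 score against a threshold T: samples at or under T are “satisfied”, samples above T but at most 4T are “tolerating”, and anything slower is “frustrated”. The score is (satisfied + tolerating / 2) / total. A minimal sketch of that formula (New Relic computes it for you; this only shows the arithmetic):

```python
from typing import Iterable


def apdex(response_times_s: Iterable[float], t: float) -> float:
    """Apdex_T = (satisfied + tolerating / 2) / total samples.

    satisfied:  response time <= T
    tolerating: T < response time <= 4T
    frustrated: response time > 4T (contributes 0)
    """
    satisfied = tolerating = total = 0
    for rt in response_times_s:
        total += 1
        if rt <= t:
            satisfied += 1
        elif rt <= 4 * t:
            tolerating += 1
    if total == 0:
        raise ValueError("Apdex is undefined with no samples")
    return (satisfied + tolerating / 2) / total


# T = 0.5 s: 0.2 and 0.4 are satisfied, 1.0 is tolerating, 3.0 is frustrated.
print(apdex([0.2, 0.4, 1.0, 3.0], t=0.5))  # 0.625
```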


Datadog

● Easy to use for different frameworks

● Good for business-specific metrics

Custom metrics

● Complex metrics: graphs filtered by different dimensions

Infra monitoring

● Full info: more data than New Relic on disk, memory and network

● Scalable: handles aggregating information from many different VMs well

Real-time analysis

● Fast response: almost no latency

● Dashboards: customizable dashboards showing what’s most relevant for each app

● Online filtering

Alarms

● Flexible alarms based on custom metrics

● You can send multiple parameters for events
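The talk does not say how MercadoLivre’s apps emitted custom metrics. One common route is the Datadog agent’s DogStatsD listener, which accepts plain-text UDP datagrams of the form `<name>:<value>|<type>|#tag:value,...`; the tags are the dimensions a graph can later be filtered by. A minimal sketch, assuming a local agent on DogStatsD’s default port 8125; the metric and tag names are hypothetical:

```python
import socket
from typing import Dict, Optional

# DogStatsD types: count, gauge, timer, histogram, set, distribution.
METRIC_TYPES = {"c", "g", "ms", "h", "s", "d"}


def dogstatsd_datagram(name: str, value: float, metric_type: str,
                       tags: Optional[Dict[str, str]] = None) -> str:
    """Build a DogStatsD datagram: <name>:<value>|<type>|#k1:v1,k2:v2."""
    if metric_type not in METRIC_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type}")
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        # Tags become the dimensions dashboards and alarms can filter or group by.
        datagram += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return datagram


def send_metric(datagram: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send to the local agent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))
```

For example, `send_metric(dogstatsd_datagram("checkout.orders", 1, "c", {"country": "br", "payment": "card"}))` counts one order, which a business dashboard could then split by country or payment method. UDP keeps the app from blocking if the agent is slow or down.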


Log collection

● Logs are collected by an agent on all VMs

● They are sent to an Elasticsearch cluster

● Access via a Kibana frontend

● Developers can use special syntax to create queryable dimensions for all logged events

● All instances’ logs in the same place

● Request tracing through multiple applications/APIs (request_id)
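Tracing one request across several APIs only works if every service reuses the caller’s request_id and writes it on every log line, so a single Kibana query can find all of them. The talk does not show MercadoLivre’s log syntax or header name; this minimal sketch assumes a hypothetical `X-Request-Id` header and plain `key=value` log fields, using only the standard library:

```python
import contextvars
import logging
import uuid
from typing import Dict, Mapping

REQUEST_ID_HEADER = "X-Request-Id"  # hypothetical header name

# Holds the request_id of the request currently being handled.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request_id to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def start_request(headers: Mapping[str, str]) -> str:
    """Reuse the caller's request_id if present, otherwise mint a new one."""
    rid = headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex
    request_id_var.set(rid)
    return rid


def outgoing_headers() -> Dict[str, str]:
    """Headers to forward on calls to downstream APIs so the trace continues."""
    return {REQUEST_ID_HEADER: request_id_var.get()}


logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s request_id=%(request_id)s %(message)s"))
handler.addFilter(RequestIdFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

A handler would call `start_request(incoming_headers)` first, log normally with `logger.info(...)`, and pass `outgoing_headers()` on every downstream call; the log agent then ships lines that all carry the same `request_id=` field.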

Alarms

Unified handling of events

[Diagram: health.sh checks and code-triggered alarms both feed into eventRouting]


Event routing

● Rules added by each team

● Check alarm origin, type and importance

● Check “quiet hours”

● Assign escalation policy and forward to OpsGenie
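The routing steps above (match origin, type and importance, honor quiet hours, then pick an escalation policy) can be sketched as a small rule evaluator. Everything here is hypothetical: the rule fields, the importance levels, and the choice to let critical events page even during quiet hours are assumptions, not eventRouting’s actual format, and the OpsGenie hand-off itself is left out:

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Optional

LEVELS = {"low": 0, "high": 1, "critical": 2}  # hypothetical importance scale


@dataclass
class Event:
    origin: str      # e.g. an app name, or "health.sh"
    kind: str        # e.g. "uptime", "error_rate"
    importance: str  # a key of LEVELS


@dataclass
class Rule:
    origin: str
    kind: str
    min_importance: str
    escalation_policy: str
    quiet_start: Optional[time] = None  # quiet hours; the window may wrap past midnight
    quiet_end: Optional[time] = None


def in_quiet_hours(now: time, start: Optional[time], end: Optional[time]) -> bool:
    if start is None or end is None:
        return False
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # e.g. 22:00 to 06:00


def route(event: Event, rules: List[Rule], now: time) -> Optional[str]:
    """Return the escalation policy to forward with the alarm, or None to suppress it."""
    for rule in rules:
        if rule.origin != event.origin or rule.kind != event.kind:
            continue
        if LEVELS[event.importance] < LEVELS[rule.min_importance]:
            continue
        # Assumption: only critical events break through quiet hours.
        if event.importance != "critical" and in_quiet_hours(now, rule.quiet_start, rule.quiet_end):
            return None
        return rule.escalation_policy
    return None
```

Because each team adds its own `Rule` entries, the routing policy stays with the team that owns the app, which matches the ownership model described earlier in the talk.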


OpsGenie

● Manage teams to deal with escalation policies

● Set “on call” schedules (with substitutes & manager escalation)

● Everyone manages their own contact methods (SMS, mail, phone call, app)
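OpsGenie handles schedules and escalation itself; the sketch below only models the concepts from the slide (a weekly rotation, per-day substitutes, and escalation to a manager when nobody acknowledges), so the rotation length, the 15-minute timeout and all names are hypothetical and this is not OpsGenie’s API:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class Schedule:
    rotation: List[str]  # people on call, one week each, in order
    start: date          # first day of rotation[0]'s week
    overrides: Dict[date, str] = field(default_factory=dict)  # substitutes by day
    manager: str = ""

    def on_call(self, day: date) -> str:
        """Substitute for that day if any, otherwise whoever's week it is."""
        if day in self.overrides:
            return self.overrides[day]
        weeks = (day - self.start).days // 7
        return self.rotation[weeks % len(self.rotation)]


def escalation_chain(schedule: Schedule, day: date, minutes_unacked: int,
                     escalate_after_min: int = 15) -> List[str]:
    """Everyone who should have been notified after `minutes_unacked` minutes without an ack."""
    chain = [schedule.on_call(day)]
    if minutes_unacked >= escalate_after_min and schedule.manager:
        chain.append(schedule.manager)
    return chain
```

For example, with a rotation of three people starting on 3 Oct 2016 and a substitute on 12 Oct, `escalation_chain(schedule, date(2016, 10, 5), 20)` returns the first person plus the manager, since the alarm went unacknowledged past the timeout.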

Fury


Evolution

Old World → Ground Zero → Fury

Fury: DevOps to NoOps

● Still microservices

● Fully service-oriented

● Easier dev cycle and learning curve

● Pre-assembled flavors for popular frameworks

● Fewer bash scripts, more UI-based configuration

● Auto-scaling & auto-healing

● Docker based (smaller dev/prod environment gap)

● Designed to run on AWS

● Continuous integration already included
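The slides do not say which scaling policy Fury applies. As an illustration of the auto-scaling & auto-healing bullets, this minimal sketch uses a simple proportional rule (resize the pool so average CPU moves toward a target, clamped to the pool’s min/max) and treats every instance that failed its health check as one to replace; the thresholds and instance IDs are hypothetical:

```python
import math
from typing import Dict, List


def desired_capacity(current: int, observed_cpu: float, target_cpu: float,
                     min_size: int, max_size: int) -> int:
    """Scale the pool so average CPU moves toward target_cpu, within [min_size, max_size]."""
    if current <= 0 or target_cpu <= 0:
        raise ValueError("current and target_cpu must be positive")
    wanted = math.ceil(current * observed_cpu / target_cpu)
    return max(min_size, min(max_size, wanted))


def instances_to_replace(health: Dict[str, bool]) -> List[str]:
    """Auto-healing: every instance whose health check failed gets replaced."""
    return sorted(name for name, healthy in health.items() if not healthy)
```

With 4 instances at 90% CPU and a 60% target, `desired_capacity(4, 90.0, 60.0, 2, 20)` asks for 6 instances; the clamp keeps a traffic spike or a broken metric from scaling the pool without bound.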


Fury dashboard


Dev Cycle in Fury: create app

● Creates repository

● Creates Jenkins CI server

● Creates network infra


Dev Cycle in Fury: create scope

● Creates load balancer (ELB)

● Creates auto scaling group (ASG) for scope instances

● Creates instances

● Initializes logs & metrics services

● Downloads containers to instances

● Starts traffic


Dev Cycle in Fury: deploy

● Creates ASG for new version

● Creates instances for the new ASG

● Initializes logs & metrics services

● Downloads containers to instances

● Progressive traffic switch

● If the candidate is OK, destroys the previous infrastructure
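The last two deploy steps can be sketched as a loop: move traffic to the candidate ASG in increasing percentages, check it after each step, and roll everything back to the previous version on the first failure. The step sizes and the `set_weight` / `candidate_ok` hooks are hypothetical stand-ins for the load balancer and the metrics checks; Fury’s real mechanism is not described in the talk:

```python
from typing import Callable, Sequence


def progressive_switch(steps: Sequence[int],
                       set_weight: Callable[[int], None],
                       candidate_ok: Callable[[int], bool]) -> bool:
    """Shift traffic to the candidate step by step; roll back on the first failed check.

    steps: percentages of traffic sent to the candidate, e.g. (10, 25, 50, 100).
    set_weight: applies a percentage at the load balancer (hypothetical hook).
    candidate_ok: health/metrics check run after each step (hypothetical hook).
    Returns True only if the candidate ends up with 100% of the traffic.
    """
    for pct in steps:
        set_weight(pct)
        if not candidate_ok(pct):
            set_weight(0)  # send all traffic back to the previous version
            return False
    return bool(steps) and steps[-1] == 100
```

The previous ASG is only destroyed once this returns True, so a bad candidate costs a rollback rather than an outage.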

Questions?

Thank you!
