Introduction to Statistics: What Is Statistics? and Descriptive Statistics, ec/files_1011/week 02 -...


Posted on 04-May-2018


  • jlborges@fe.up.pt

    Introduction to Statistics and Descriptive Statistics

    WHAT IS STATISTICS?

    A set of procedures and principles for collecting, compiling, analysing and interpreting data in order to support decision-making in the presence of uncertainty.

    Herbert George Wells, English author, said (circa 1940):

    "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write."

    Average depth 3 ft (0.9144 m)

    Why do we need to understand statistics?

    Reasoning with Uncertainty

    From Peter Donnelly: "How juries are fooled by statistics", http://www.ted.com/index.php/talks/view/id/67

    Ex 1 - Coin Tossing

    Imagine tossing a coin repeatedly and waiting until a particular pattern appears for the first time, say HTT.

    For example, if the sequence of tosses was

    HHTHHTHHTTHHTTTHTH

    the pattern HTT would first appear after the 10th toss.
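
    To make "first appears after the 10th toss" concrete, here is a minimal Python sketch that returns the toss number at which a pattern first completes. The function name `first_occurrence` is our own illustration, not something from the slides.

```python
from typing import Optional


def first_occurrence(tosses: str, pattern: str) -> Optional[int]:
    """Return the toss number (1-based) at which `pattern` first completes, or None."""
    start = tosses.find(pattern)  # 0-based index where the first match begins, -1 if none
    if start == -1:
        return None
    # 1-based position of the last toss in the match
    return start + len(pattern)


sequence = "HHTHHTHHTTHHTTTHTH"
print(first_occurrence(sequence, "HTT"))  # 10
```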

    Ex 1 - Coin Tossing

    Imagine that half of you toss a coin several times, each time until the sequence HTT occurs.

    Record the average number of tosses until HTT occurs.

    The other half of you prefer to count HTH.

    Record the average number of tosses until HTH occurs.
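
    The classroom experiment can also be simulated. Below is a minimal Monte Carlo sketch in Python. The function names, the number of trials and the random seed are arbitrary choices of ours, not part of the slides. Each trial tosses a fair coin until the chosen pattern shows up, and the script prints one average per pattern.

```python
import random


def tosses_until(pattern: str, rng: random.Random) -> int:
    """Toss a fair coin until `pattern` appears; return how many tosses it took."""
    recent = ""  # only the last len(pattern) tosses matter
    count = 0
    while not recent.endswith(pattern):
        recent = (recent + rng.choice("HT"))[-len(pattern):]
        count += 1
    return count


def average_tosses(pattern: str, trials: int = 50_000, seed: int = 1) -> float:
    """Average waiting time for `pattern` over many simulated trials."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return sum(tosses_until(pattern, rng) for _ in range(trials)) / trials


for pattern in ("HTT", "HTH"):
    print(pattern, round(average_tosses(pattern), 2))
```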

    Ex 1 - Coin Tossing

    Which of the following is true?

    A. The average number of tosses until HTH is larger than the average number of tosses until HTT

    B. The average number of tosses until HTH is the same as the average number of tosses until HTT

    C. The average number of tosses until HTH is smaller than the average number of tosses until HTT

    Most people think that B is true, but A is true: the average number of tosses until HTH is 10 and the average number of tosses until HTT is 8.
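
    The values 10 and 8 can be checked without simulation. For a fair coin, a standard result (related to Conway's correlation numbers for Penney's game; the slides do not use it) says the expected waiting time for a pattern of length m is the sum of 2^k over every k, 1 <= k <= m, for which the first k letters of the pattern equal its last k letters. For HTH, k = 1 and k = 3 qualify (2 + 8 = 10); for HTT only k = 3 does (8). A minimal sketch, with a function name of our own choosing:

```python
def expected_tosses(pattern: str) -> int:
    """Expected number of fair-coin tosses until `pattern` first appears.

    Adds 2**k for every k where the length-k prefix equals the length-k suffix,
    i.e. wherever the pattern overlaps with a shifted copy of itself.
    """
    m = len(pattern)
    return sum(2 ** k for k in range(1, m + 1) if pattern[:k] == pattern[m - k:])


print(expected_tosses("HTH"))  # 10
print(expected_tosses("HTT"))  # 8
```

    The self-overlap of HTH (it starts and ends with H) is exactly what the intuitive explanation on the next slide describes: a failed attempt at HTT can leave you with a useful H, while for HTH a failure sends you back to the start.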

    Ex 1 - Coin Tossing

    Intuitive explanation: imagine that you win if HTH occurs.

    If the first toss gives an H you are excited, and you get even more excited if the second is a T. If the third is an H you win, but if it is a T you have to start again and wait for the next H.

    Now imagine that you win if HTT occurs. For the first two tosses the experience is the same.

    However, if the third toss is an H you lose, but that H can start a new attempt, so you are already 1/3 of the way to your pattern.

    Ex 1 - Coin Tossing

    This was an example of a simple question on probabilities that most people get wrong.

    Conclusions from the examples

    Randomness, uncertainty and chance are part of our lives.

    People make logical errors when reasoning under uncertainty.

    Errors in statistics may have serious consequences.

    It is very important to understand statistics!

    What is the problem here?

    On average the temperature is very nice...

    Descriptive Statistics

    Descriptive statistics seeks to summarise the information contained in a data set and present it in an understandable form (by building tables or charts, or by computing summary measures).

    Goal of descriptive statistics: to summarise the information contained in data

    Example: grades in a given course

    Mean                  10.52
    Median                10.51
    Range                 16.29
    Maximum               17.67
    Minimum                1.38
    25% quartile           9.068
    75% quartile          12.68
    Standard deviation     3.208
    Variance              10.291
    Skewness              -0.25
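
    Most of the measures listed above can be computed with Python's standard `statistics` module. The grade data themselves are not in the slides, so the sketch below (function name `summarize` is ours) is shown on a small made-up sample. Quartile conventions differ between tools; using `method="inclusive"` (linear interpolation, as in spreadsheet QUARTILE functions) is an assumption, since the slide does not say which convention produced its values. Skewness is left out here.

```python
import statistics


def summarize(data: list[float]) -> dict[str, float]:
    """Sample-based descriptive measures: centre, spread and quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "maximum": max(data),
        "minimum": min(data),
        "quartile 25%": q1,
        "quartile 75%": q3,
        "standard deviation": statistics.stdev(data),  # divides by N - 1
        "variance": statistics.variance(data),         # divides by N - 1
    }


for name, value in summarize([7, 10, 10, 11, 12, 12, 14, 14]).items():
    print(f"{name:<20} {value:.3f}")
```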

    Mean and Median

    Example: 7, 10, 10, 11, 12, 12, 14, 14

    Mean: $\bar{x} = (7 + 10 + 10 + 11 + 12 + 12 + 14 + 14)/8 = 90/8 = 11.25$

    Median: with the data sorted (7 10 10 11 12 12 14 14), the median is the average of the two middle values, $(11 + 12)/2 = 11.5$

    Mean and Median

    Example: 7, 10, 10, 11, 12, 12, 14, 200

    Mean: $\bar{x} = (7 + 10 + 10 + 11 + 12 + 12 + 14 + 200)/8 = 276/8 = 34.5$

    Median: with the data sorted (7 10 10 11 12 12 14 200), the median is still $(11 + 12)/2 = 11.5$

    The mean is more sensitive to extreme values! E.g. mean salary vs. median salary.
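
    The two examples differ only in their largest value (14 becomes 200). A short Python check with the standard `statistics` module shows the mean jumping while the median stays put; the variable names are ours.

```python
import statistics

original = [7, 10, 10, 11, 12, 12, 14, 14]
with_outlier = [7, 10, 10, 11, 12, 12, 14, 200]  # largest value replaced by an extreme one

for label, data in (("original", original), ("with outlier", with_outlier)):
    print(label, statistics.mean(data), statistics.median(data))
# original 11.25 11.5
# with outlier 34.5 11.5
```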

    Variance and Standard Deviation

    To infer the variability of a population from a sample, the sample variance ($s^2$) is used:

    $s^2 = \frac{1}{N-1} \sum_{n=1}^{N} (x_n - \bar{x})^2$

    The sample standard deviation ($s$), the square root of the sample variance, has the advantage of being expressed in the same units as the data:

    $s = \sqrt{\frac{1}{N-1} \sum_{n=1}^{N} (x_n - \bar{x})^2}$
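
    The sample variance and standard deviation can be written straight from their definitions. The minimal Python sketch below (helper names are ours) also compares them with the standard library, and with the population variance, which divides by N instead of N - 1.

```python
import math
import statistics


def sample_variance(data: list[float]) -> float:
    """s^2 = sum((x_n - mean)^2) / (N - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_std(data: list[float]) -> float:
    """s = sqrt(s^2), in the same units as the data."""
    return math.sqrt(sample_variance(data))


data = [7, 10, 10, 11, 12, 12, 14, 14]
print(sample_variance(data), statistics.variance(data))  # both divide by N - 1
print(sample_std(data), statistics.stdev(data))
print(statistics.pvariance(data))  # population variance: divides by N
```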

    Example: compute the standard deviation of the sample -4, -3, -2, 3, 5 (mean $\bar{X} = -0.2$).

    Xi     Xi - X̄    (Xi - X̄)²
    -4     -3.8       14.44
    -3     -2.8        7.84
    -2     -1.8        3.24
     3      3.2       10.24
     5      5.2       27.04
                Sum = 62.8

    We know that n = 5, so $s^2 = 62.8/(5-1) = 15.7$. The standard deviation is the square root of 15.7: $s \approx 3.96$.
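
    The table can be reproduced step by step. This minimal Python sketch (variable names are ours) prints one row per observation, with its deviation from the mean and the squared deviation, and then the sum, variance and standard deviation.

```python
import math

sample = [-4, -3, -2, 3, 5]
mean = sum(sample) / len(sample)         # -0.2 (up to floating-point rounding)
deviations = [x - mean for x in sample]  # Xi - mean
squares = [d ** 2 for d in deviations]   # (Xi - mean)^2

for x, d, sq in zip(sample, deviations, squares):
    print(f"{x:>3} {d:>5.1f} {sq:>6.2f}")

total = sum(squares)                     # sum of squared deviations
variance = total / (len(sample) - 1)     # divide by n - 1
std = math.sqrt(variance)
print(f"sum = {total:.1f}, s^2 = {variance:.1f}, s = {std:.2f}")
# sum = 62.8, s^2 = 15.7, s = 3.96
```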

    Histogram of the grades

    http://www.stat.tamu.edu/~west/javahtml/Histogram.html

    Skewness coefficient (g1)

    $g_1 = \frac{k_3}{s^3}$, with $k_3 = \frac{N}{(N-1)(N-2)} \sum_{n=1}^{N} (x_n - \bar{x})^3$

    [Figure: example distributions with g1 = 0, g1 > 0 and g1 < 0]
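
    A minimal Python sketch of the skewness coefficient, assuming the slide's g1 is the adjusted form $k_3/s^3$ with $k_3 = \frac{N}{(N-1)(N-2)} \sum (x_n - \bar{x})^3$ and $s$ the sample standard deviation (the form used by spreadsheet SKEW functions). The function name is ours. Symmetric data give 0, a long right tail gives a positive value and a long left tail a negative one.

```python
import math


def skewness_g1(data: list[float]) -> float:
    """Adjusted sample skewness k3 / s^3."""
    n = len(data)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample std (N - 1)
    if s == 0:
        raise ValueError("skewness is undefined for constant data")
    k3 = n / ((n - 1) * (n - 2)) * sum((x - mean) ** 3 for x in data)
    return k3 / s ** 3


print(skewness_g1([1, 2, 3, 4, 5]))     # symmetric
print(skewness_g1([1, 2, 3, 10]))       # long right tail
print(skewness_g1([-10, -3, -2, -1]))   # long left tail
```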

    Box plot: makes it possible to compare the grades from 3 years of Mest

    [Figure: box plots with the 75th percentile, the median and the 25th percentile marked]
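
    The numbers behind a box plot can be computed as below. This is a minimal sketch: the function name is ours, quartiles use the "inclusive" (linear interpolation) convention, and the whiskers follow Tukey's 1.5 × IQR rule, which is an assumed convention since the slide does not say how its whiskers were drawn.

```python
import statistics


def box_plot_summary(data: list[float]) -> dict:
    """Quartiles, Tukey-style whiskers and outliers for one box plot."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "25th percentile": q1,
        "median": median,
        "75th percentile": q3,
        "lower whisker": min(inside),  # most extreme values still within the fences
        "upper whisker": max(inside),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


print(box_plot_summary([7, 10, 10, 11, 12, 12, 14, 200]))
```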

    [Figure: "Box Plot of Home Runs per Season for 4 Great Players When They Were NY Yankees" (Ruth_Y, Mantle_Y, Gehrig_Y, Maris_Y) and "Box Plot of Home Runs per Season for 4 Great Players for Their Entire Careers" (Ruth, Gehrig, Mantle, Maris); x-axis: PLAYERS, y-axis: Home Runs]

    Bivariate samples: quantitative data

    The relationship between the two attributes of a bivariate sample with quantitative data can be revealed by an (X, Y) diagram or, more concisely, by computing how well a given relationship fits the data.

    Production cost is given in contos (1 conto = 1000 escudos).

    LOT   PRODUCTION VOLUME (units)   PRODUCTION COST (contos)
    1     1500                        3100
    2     800                         1900
    3     2600                        4200
    4     1000                        2300
    5     600                         1200
    6     2800                        4900
    7     1200                        2800
    8     900                         2100
    9     400                         1400
    10    1300                        2400
    11    1200                        2400
    12    2000                        3800

    The relationship between two variables can be illustrated with an (x, y) diagram, a scatterplot.

    [Figure: scatterplot of production cost (y-axis) against production volume (x-axis)]

    Scatterplot matrix

    A scatterplot makes it possible to analyse the overall relationship between two variables and the presence of deviations from it.

    Sometimes it is useful to characterise the relationship between two variables and measure its goodness of fit.

    Let us look at the example of a linear relationship.

    Measures of how well a linear relationship fits the data:

    Sample covariance (allows inferences about the population):

    $c_{XY} = \frac{1}{N-1} \sum_{n=1}^{N} (x_n - \bar{x})(y_n - \bar{y})$

    Sample correlation coefficient (a dimensionless measure):

    $r_{XY} = \frac{c_{XY}}{s_X s_Y} = \frac{\frac{1}{N-1} \sum_{n=1}^{N} (x_n - \bar{x})(y_n - \bar{y})}{\sqrt{\frac{1}{N-1} \sum_{n=1}^{N} (x_n - \bar{x})^2} \, \sqrt{\frac{1}{N-1} \sum_{n=1}^{N} (y_n - \bar{y})^2}}$
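
    These measures can be checked on the production-lot data. Below is a minimal Python sketch; the helper names `covariance` and `correlation` and the `ddof` parameter are our own. One caveat worth knowing: the covariance of 757847.22 reported for these data on the following slide equals the sum of cross-deviations divided by N (as spreadsheet functions such as Excel's COVAR do), whereas the N - 1 formula gives about 826742.42. The correlation coefficient is the same either way, because the divisor cancels.

```python
import math

# Production volume (units) and cost (contos) for the 12 lots.
volume = [1500, 800, 2600, 1000, 600, 2800, 1200, 900, 400, 1300, 1200, 2000]
cost = [3100, 1900, 4200, 2300, 1200, 4900, 2800, 2100, 1400, 2400, 2400, 3800]


def covariance(x, y, ddof=1):
    """Sum of cross-deviations divided by N - ddof (1: sample, 0: population)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - ddof)


def correlation(x, y):
    """r = c_XY / (s_X * s_Y); dimensionless."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


print(round(correlation(volume, cost), 7))         # 0.9811009
print(round(covariance(volume, cost, ddof=0), 2))  # 757847.22 (divides by N)
print(round(covariance(volume, cost), 2))          # 826742.42 (divides by N - 1)
```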

    Effect of a change of scale (x and y both multiplied by 1000):

    x      y      1000x     1000y
    1500   3100   1500000   3100000
    800    1900   800000    1900000
    2600   4200   2600000   4200000
    1000   2300   1000000   2300000
    600    1200   600000    1200000
    2800   4900   2800000   4900000
    1200   2800   1200000   2800000
    900    2100   900000    2100000
    400    1400   400000    1400000
    1300   2400   1300000   2400000
    1200   2400   1200000   2400000
    2000   3800   2000000   3800000

                   (x, y)       (1000x, 1000y)
    correlation:   0.9811009    0.9811009
    covariance:    757847.22    7.578E+11

    [Figure: scatterplots of y against x and of 1000y against 1000x]

    For a given relationship, a change of scale alters the value of the covariance, whereas the correlation coefficient is unchanged.

    http://bcs.whfreeman.com/ips4e/cat_010/applets/CorrelationRegression.html


    9657

    Expresso 18 Jan.
