codigos em r

Upload: sandra-vecoso

Post on 05-Apr-2018


  • 8/2/2019 Codigos Em R

    Statistical Computing II

    Activity 2

    Conceição Leal

    Operators in R

    Operator           Description

    [ [[               Indexing
    $                  Component
    ^                  Exponentiation
    :                  Sequence
    %special% %/% %%   Special operators (e.g. 7 %/% 3 gives the largest integer that fits in the division; 7 %% 3 gives the remainder of the division)
    < > <= >= == !=    Ordering and comparison (less than, greater than, less than or equal to, greater than or equal to, equal to, different from)
    !                  Logical negation (NOT)
    & &&               Logical conjunction (AND)
    | ||               Logical disjunction (OR)
    ~                  Formula
    -> ->>             Assignment (left to right)
    <- <<-             Assignment (right to left)
    =                  Assignment to an argument (right to left)
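    A minimal sketch of a few of the operators listed above. The variable names (s, p, a, b, right_assigned, left_assigned) are hypothetical and chosen only for illustration.

```r
# Integer division and remainder with the special %...% operators
q <- 7 %/% 3     # 2, the largest integer that fits in the division
r <- 7 %% 3      # 1, the remainder of the division

# Sequence and exponentiation
s <- 1:5         # 1 2 3 4 5
p <- 2^3         # 8

# Vectorised (&) versus single-value (&&) logical conjunction
a <- c(TRUE, FALSE) & c(TRUE, TRUE)   # TRUE FALSE, element by element
b <- TRUE && FALSE                    # FALSE, a single value

# Assignment in both directions
10 -> right_assigned    # left to right
left_assigned <- 10     # right to left
```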


    Function            Converts the object to

    as.numeric(x)       # a numeric vector (integer or real). Factors are converted to their integer codes.
    as.null(x)          # a NULL
    as.logical(x)       # a logical vector. Non-zero numeric values are converted to TRUE, zero to FALSE.
    as.character(x)     # a character vector
    as.vector(x)        # a vector. For atomic vectors, all attributes (including names) are removed.
    as.factor(x)        # a factor. This is an abbreviated version of factor()
    as.matrix(x)        # a matrix. Any non-numeric elements result in all matrix elements being converted to character strings
    as.list(x)          # a list
    as.data.frame(x)    # a data frame. Matrix columns and list columns are converted into separate vectors of the data frame. Character vectors are converted into factors by default only in R versions before 4.0.0; from 4.0.0 onward this requires stringsAsFactors=TRUE. All previous attributes are removed.
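    The conversions in the table can be checked on small objects. This is only a sketch; the objects x, f, named, plain and m are hypothetical.

```r
x <- c(0, 1, 2.5)
lg <- as.logical(x)        # FALSE TRUE TRUE (zero is FALSE, non-zero is TRUE)
ch <- as.character(x)      # "0" "1" "2.5"

f <- factor(c("low", "high", "low"))
codes <- as.numeric(f)     # 2 1 2, the integer codes (levels sort as "high", "low")

named <- c(a = 1, b = 2)
plain <- as.vector(named)  # the names attribute is dropped

# A data frame with one character column becomes a character matrix
m <- as.matrix(data.frame(n = 1:2, s = c("a", "b")))
```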

    Object manipulation                 Description

    subset(data, condition, select=)    # Subset a vector or data frame according to a set of conditions
    apply(x, MARGIN, FUN)               # Apply the function (FUN) to the margins (MARGIN=1 is rows, MARGIN=2 is columns, MARGIN=c(1,2) is both) of a matrix or array (x)
    tapply(x, factorlist, FUN)          # Apply the function (FUN) to the vector (x) separately for each combination of the list of factors
    lapply(x, FUN)                      # Apply the function (FUN) to each element of the list x
    replicate(n, EXP)                   # Re-evaluate the expression (EXP) n times. Differs from the rep function, which repeats the result of a single evaluation
    aggregate(x, by, FUN)               # Splits data according to a combination of factors and calculates summary statistics on each set
    sort()                              # Sort elements into order, by default omitting NAs
    which.min(x)                        # Index of the minimum element in x
    which.max(x)                        # Index of the maximum element in x
    which(x == a)                       # Each of the elements of x is compared to the value of a, and a vector of the indices for which the logical comparison is true is returned
    match(x,y)                          # A vector of the same length as x with the indices of the first occurrence of each element of x within y
    choose(n,k)                         # Computes the number of unique combinations of k elements that can be chosen from a set of n
    combn(x,k)                          # List all the unique combinations in which the elements of x can be arranged when taken k elements at a time
    with(x,EXP)                         # Evaluate an expression (EXP) (typically a function call) in an environment defined by x
    unique(x)                           # Removes duplicate values
    cumsum(x)                           # Returns a vector whose elements are the cumulative sums of the elements of the vector x (for a data frame, the sums are accumulated within each column)
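    A short sketch of some of these functions applied to toy data. The objects m, x and g are hypothetical examples, not taken from the original activity.

```r
m <- matrix(1:6, nrow = 2)            # 2 x 3 matrix filled by column: rows (1,3,5) and (2,4,6)
row_totals <- apply(m, 1, sum)        # 9 12
col_totals <- apply(m, 2, sum)        # 3 7 11

x <- c(10, 20, 30, 40)
g <- factor(c("a", "b", "a", "b"))
group_means <- tapply(x, g, mean)     # a = 20, b = 30

lens <- lapply(list(1:3, letters), length)   # list of 3 and 26

idx <- which(x == 30)                 # 3
pos <- match(c(20, 99), x)            # 2 NA
n_comb <- choose(4, 2)                # 6
running <- cumsum(1:4)                # 1 3 6 10
```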


    Indexing

    Vectors                     Description

    x[i]                        # Select the ith element
    x[i:j]                      # Select the ith through jth elements inclusive
    x[c(1,5,6,9)]               # Select specific elements
    x[-i]                       # Select all except the ith element
    x["name"]                   # Select the element called "name"
    x[x > 10]                   # Select all elements greater than 10
    x[x > 10 & x < 20]          # Select all elements between 10 and 20 (both conditions must be satisfied)
    x[y == "value"]             # Select all elements of x according to which y elements are equal to "value"
    x[x > 10 | y == "value"]    # Select all elements which satisfy either condition
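    The vector indexing forms can be tried on a small named vector. A sketch with hypothetical data:

```r
x <- c(a = 5, b = 12, c = 18, d = 25)
y <- c("no", "value", "no", "value")

second  <- x[2]                       # b = 12
middle  <- x[2:3]                     # b = 12, c = 18
no_head <- x[-1]                      # everything except the first element
by_name <- x["c"]                     # c = 18
between <- x[x > 10 & x < 20]         # b = 12, c = 18
by_y    <- x[y == "value"]            # b = 12, d = 25
either  <- x[x > 20 | y == "value"]   # b = 12, d = 25
```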

    Matrices                       Description

    x[i,j]                         # Select the element in row i, column j
    x[i,]                          # Select all elements in row i
    x[,j]                          # Select all elements in column j
    x[-i,]                         # Select all elements in each row other than the ith row
    x["name",1:2]                  # Select columns 1 through 2 for the row named "name"
    x[x[,"Var1"]>4,]               # Select all rows for which the value of the column named "Var1" is greater than 4
    x[,x["Var1",]=="value"]        # Select all columns for which the value in the row named "Var1" is equal to "value"

    Lists                  Description

    x[[i]]                 # Select the ith object of the list
    x[["value"]]           # Select the object named "value" from the list
    x[["value"]][1:3]      # Select the first three elements of the object named "value" from the list
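    Matrix and list indexing can be sketched in the same way. The row names r1 to r3 and the list lst are hypothetical.

```r
m <- matrix(1:6, nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("Var1", "Var2")))
cell <- m[2, 1]                  # 2
row3 <- m["r3", 1:2]             # Var1 = 3, Var2 = 6
big  <- m[m[, "Var1"] > 1, ]     # rows r2 and r3

lst <- list(value = c(4, 8, 15, 16), label = "demo")
second_obj  <- lst[[2]]              # "demo"
first_three <- lst[["value"]][1:3]   # 4 8 15
```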

    Data frames                          Description

    Indexing by rows (sampling units)
    x[1:10,]                             # Select the first 10 rows of each of the vectors in the data frame
    x['NOMEVETOR',]                      # Select each of the vectors for the row called NOMEVETOR from the data frame
    x[c(i,j),]                           # Select rows i and j for each column of the data frame
    x[,"name"]                           # Select each row of the column named "name"

    Indexing by columns (variables)
    x[,c(i,j)]                           # Select all rows but just the ith and jth vectors of the data frame
    x[["name"]]                          # Select the column named "name"
    x$name                               # Refer to a vector named "name" within the data frame (x)
    E[,c('X','Y')]                       # Select the X and Y vectors for all sites from the data frame

    Indexing by conditions
    x[x$X>3,]                            # Select the rows whose values in vector X are greater than 3
    x[x$X>3 & x$Z=='DADO',]              # Select the rows with the value DADO in vector Z whose value in vector X is greater than 3
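    A sketch of data frame indexing by rows, columns and conditions. The data frame x, its row names s1 to s4 and its values are hypothetical; only the column names X, Y, Z and the value 'DADO' echo the table.

```r
x <- data.frame(X = c(1, 4, 5, 2),
                Y = c(9, 8, 7, 6),
                Z = c("DADO", "OUTRO", "DADO", "DADO"),
                row.names = c("s1", "s2", "s3", "s4"))

first_two <- x[1:2, ]                       # rows s1 and s2, all columns
one_row   <- x["s3", ]                      # the row called s3
xy        <- x[, c("X", "Y")]               # all rows, columns X and Y only
y_col     <- x$Y                            # the Y column as a vector
large_x   <- x[x$X > 3, ]                   # rows s2 and s3
both      <- x[x$X > 3 & x$Z == "DADO", ]   # row s3 only
```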


    Vector classes

    Vectors Description

    Integer


    cumprod(x)                    # Returns a vector whose elements are the cumulative products of the elements of the vector x
    sd(x)                         # Sample standard deviation
    cor(x,y)                      # Sample correlation between the vectors x and y
    length(x)                     # Number of elements of the vector x
    quantile(x,p)                 # Quantile of order p
    paste(..., sep=)              # Combine multiple vectors together after converting them into character vectors
    sample(x, size)               # Randomly resample size elements from the vector x without replacement. Use the option replace=TRUE to sample with replacement.
    substr(x, start, stop)        # Extract substrings from a character vector
    cut(x, breaks)                # Creates a factor out of a vector by slicing the vector x up into chunks. The option breaks is either a number indicating the number of cuts or else a vector of cut values
    levels(factor)                # Lists the levels (in order) of a factor
    tapply(x, factorlist, FUN)    # Apply the function (FUN) to the vector (x) separately for each combination of the list of factors

    Matrix class: some aspects

    Function                        Description

    matrix(x, nrow = 5)             # Matrix with 5 rows formed from the elements of the vector x, distributed by column.
    matrix(x, nrow = 5, ncol = 2)   # With the option ncol=2, the values of x are distributed by column into two columns. By default the matrix is filled by column. To fill it by row: matrix(x, nrow = 5, byrow=T).
    colnames(MX) or rownames(MX)    # Assigns names to the columns or rows from the elements of a vector of strings. Ex: colnames(MX)
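    The filling order and the naming functions can be sketched as follows. The object names by_col and by_row and the column and row labels are hypothetical.

```r
x <- 1:10
by_col <- matrix(x, nrow = 5)                 # column 1 holds 1..5, column 2 holds 6..10
by_row <- matrix(x, nrow = 5, byrow = TRUE)   # row 1 holds 1 2, row 2 holds 3 4, and so on

# Names are assigned from character vectors (labels chosen for illustration)
colnames(by_col) <- c("first", "second")
rownames(by_col) <- paste0("r", 1:5)
```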


    summary(X)                    # Extracts information about all the columns of the matrix: minimum and maximum, mean, and quartiles. Applying the function to the transpose of the matrix does the same by rows.
    summary(as.numeric(A))        # Gives the summary of the vector formed by all the elements of the matrix
    colSums(X) or rowSums(X)      # Gives the sum of all the elements of each column or row

    Note: The options for operating on matrices can be consulted at http://127.0.0.1:22773/library/base/html/max.col.html

    List class

    Function    Description

    list()      # Stores collections of objects that may be of different types and have different sizes
    with()      # Do a computation using the columns of a specified data frame

    Data frame class

    Function                 Description

    data.frame()             # Combines multiple vectors of the same length so that each vector becomes a column vector. The vectors may have different types. Character vectors are converted into factors (the default before R 4.0.0; from R 4.0.0 onward only with stringsAsFactors=TRUE). If this is not wanted, use the function I to change the class of the object. Ex: in data.frame(x=c(1,2),a=c("A","B")), a is a factor; in data.frame(x=c(1,2),a=I(c("A","B"))), a is not a factor.
    attach(nomedataframe)    # Allows the columns of the data frame to be treated as independent objects. This option simplifies the analysis of the elements of the data frame (alternative: nomedataframe$coluna). When direct access to the columns is no longer needed, use the function detach(nomedataframe) to undo this effect.
    fix(nomedataframe)       # Displays the data frame as a spreadsheet, where all the necessary changes can be made and the columns can be named.
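    A sketch of the factor conversion and of I(). It passes stringsAsFactors = TRUE explicitly so that the result does not depend on the R version; the names df_fac and df_chr are hypothetical.

```r
# Character column converted to a factor (requested explicitly)
df_fac <- data.frame(x = c(1, 2), a = c("A", "B"), stringsAsFactors = TRUE)

# Character column protected with I(): it is kept as is, with class "AsIs"
df_chr <- data.frame(x = c(1, 2), a = I(c("A", "B")), stringsAsFactors = TRUE)

fac_flag <- is.factor(df_fac$a)    # TRUE
chr_flag <- is.factor(df_chr$a)    # FALSE

# with() evaluates an expression using the columns, without attach()
total <- with(df_fac, sum(x))      # 3
```

    Using with() for a single computation avoids having to remember the matching detach() call.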

    Graphs

    Function          Description

    plot(x)           # If x is a numeric vector, this form of the plot() function produces a time series plot, a plot of x against index numbers.
                      > plot(X)
    plot(~x)          # If x is a numeric vector, this form of the plot() function produces a stripchart for x. The same could be achieved with the stripchart() function. The ~ indicates a formula in which the left side is modeled against the right.
                      > plot(~x)
    plot(x,y)         # If x and y are numeric vectors, this form of the plot() function produces a scatterplot of y against x.
                      > plot(X,Y)
    plot(y~expr)      # If y is a numeric vector and expr is an expression, this form of the plot() function plots y against each vector in the expression.
                      > plot(Y ~ X)


    plot(xy)                    # If xy is either a two-column matrix or a list containing the entries x and y, this form of the plot() function produces a plot of y (column 2) against x (column 1). If x is numeric, this will be a scatterplot, otherwise it will be a boxplot.
                                > plot(XY)
    plot(fact)                  # If fact is a factor vector, this form of the plot() function produces a bar graph (bar chart) with the height of the bars representing the number of entries of each level of the factor. The same could be achieved with the barplot() function.
                                > plot(FATOR)
    plot(fact, dv)              # If fact is a factor vector and dv is a numeric vector, this form of the plot() function produces boxplots of dv for each level of fact. The same could be achieved with the boxplot() function.
                                > plot(FATOR, X)
    plot(dv~fact)               # If fact is a factor vector and dv is a numeric vector, this form of the plot() function produces boxplots of dv for each level of fact.
                                > plot(X ~ FATOR)
                                > plot(X, Y, ylab = "Y coordinate", xlab = "")
    pairs(matriz)               # Scatterplots of matrices of variables or formulas (taken two at a time)
    boxplot(x, horizontal=T)    # Box-and-whisker plot for a vector or formula, vertical (by default) or horizontal
    hist(x, breaks, prob=)      # Histogram of the (absolute or relative) frequencies of the vector x. The option breaks specifies how, and how many, classes are built, either through a number or through a vector of break points.
    stem()                      # Stem-and-leaf plot
    pie()                       # Pie chart
    abline(fit)                 # Adds the linear regression line of a fitted model
    qqnorm()                    # Normal probability plot
    qqline()                    # Line that, together with the previous plot, allows assessing how well a data set fits a normal distribution (residual analysis)
    lines(density(x))           # Adds the curve of the estimated (empirical) density of x to an existing plot
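    A sketch that exercises several of these plotting functions on simulated data. The data, seed and labels are hypothetical; pdf(NULL) opens a device that writes no file, so the code can run without a display.

```r
pdf(NULL)                       # null PDF device: nothing is written to disk
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)
fact <- factor(rep(c("a", "b"), each = 25))

plot(x, y, xlab = "x", ylab = "y", main = "Scatterplot")
fit <- lm(y ~ x)
abline(fit)                     # add the fitted regression line

plot(fact, x)                   # boxplots of x for each level of fact

h <- hist(x, breaks = 5, freq = FALSE)   # relative frequencies (density scale)
lines(density(x))               # overlay the kernel density estimate

qqnorm(resid(fit))              # normal probability plot of the residuals
qqline(resid(fit))
invisible(dev.off())
```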

    Parameters of the plot function and other graphs

    xlim and ylim    Description
    xlim=NULL        # Default limits
    xlim=c(a,b)      # Minimum and maximum limits

    xlab and ylab    Description
    xlab=NULL        # Names of the vectors
    xlab="Label"     # Redefines the axis title
    xlab=""          # Suppresses the axis title

    type (plot)      Description
    type="p"         # Points
    type="l"         # Lines
    type="b"         # Points and lines
    type="o"         # Points over the lines
    type="h"         # Histogram-like vertical lines
    type="s"         # Steps
    type="n"         # No points

    log *            Description
    log="x"          # Log x-axis scale
    log="y"          # Log y-axis scale
    log="xy"         # Log x-axis and y-axis scales

    *Note: The log parameter indicates whether, and which, axes are to be drawn on a logarithmic scale.


    Graph parameters: line type

    Parameter    Description

    lty          # The type of line. Specified as either a single integer in the range of 1 to 6 (for predefined line types) or as a string of 2 or 4 numbers that define the relative lengths of dashes and spaces within a repeated sequence.
                 (Figure not reproduced: sample lines labelled lty=1 to lty=7, "1234" and "9111".)
    lwd          # The thickness of a line as a multiple of the default thickness (which is device specific).
                 (Figure not reproduced: sample lines labelled lwd=0.5, lwd=0.75, lwd=1, lwd=2 and lwd=4.)

    Colours, title and other characteristics

    palette()          # Gives access to the names of the eight main colours available
    colors()           # Gives access to the range of colours available by name and by number
    main=              # Assigns a title to the graph
    ylab= and xlab=    # These arguments specify the labels used on the vertical and horizontal axes respectively.
    xlim=NULL          # Default limits
    xlim=c(a,b)        # Minimum and maximum limits
    xlab=NULL          # Names of the vectors
    xlab=""            # Suppresses the axis title
    xlab="Label"       # Redefines the axis title

    DATA TRANSFORMATIONS

    A large part of the tools of parametric inference rests on the assumption that the data are normally distributed. When this assumption is not met, scale transformations of the data can be used.

    The aim of a scale transformation is therefore to normalise the data so as to satisfy the assumptions underlying a statistical analysis. As such, any function can be applied to the data. However, certain types of data respond more favourably to particular transformations because of their characteristics. The most common transformations are those listed in the following table:

    Common data transformations.

    Nature of the data                        Transformation    R syntax

    Measurements (lengths, weights, etc.)     log_e             log(x)
                                              log_10            log(x, 10)
                                              log_10            log10(x)
                                              log(x + 1)        log(x+1)
    Counts (number of individuals, etc.)      square root       sqrt(x)
    Percentages (must be proportions)         arcsin            asin(sqrt(x))*180/pi

    Note: x is the name of the vector (variable) whose values are to be transformed.
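    The transformations in the table can be sketched on small vectors. The data are hypothetical and were picked so that the results are easy to check by hand.

```r
x <- c(1, 10, 100)
ln_x    <- log(x)          # natural logarithm
log10_a <- log(x, 10)      # base-10 logarithm: 0 1 2
log10_b <- log10(x)        # the same through the dedicated function
ln_x1   <- log(x + 1)      # useful when the data contain zeros

counts <- c(0, 4, 9, 16)
root <- sqrt(counts)       # 0 2 3 4

p <- c(0, 0.25, 1)         # proportions, not percentages
arcsin_deg <- asin(sqrt(p)) * 180 / pi   # 0 30 90 (degrees)
```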


    MEASURES OF LOCATION

    Common estimators of population parameters

    Parameter                       Description                                                                                        R syntax

    Estimates of location
    Arithmetic mean (μ)             # The sum of the values divided by the number of values (n)                                       mean(X)
    Trimmed mean                    # The arithmetic mean calculated after a fraction (typically 0.05 or 5%) of the lower and upper values have been discarded    mean(X, trim=0.05)
    Winsorized mean                 # The arithmetic mean calculated after the trimmed values are replaced by the upper and lower trimmed quantiles    library(psych); winsor.mean(X, trim=0.05)
    Median                          # The middle value                                                                                 median(X)
    Minimum, maximum                # Smallest and largest values                                                                      min(X), max(X)

    Estimates of spread
    Variance (σ²)                   # Average squared deviation of observations from the mean                                         var(X)
    Standard deviation (σ)          # Square root of the variance                                                                      sd(X)
    Median absolute deviation       # The median absolute difference of observations from the median value                            mad(X)
    Inter-quartile range            # Difference between the 75% and 25% ranked observations                                          IQR(X)

    Precision and confidence
    Standard error of the mean      # Precision of the estimate of the mean                                                            sd(X)/sqrt(length(X))
    95% confidence interval of the mean    # Interval with a 95% probability of containing the true mean                             library(gmodels); ci(X)

    NOTE: Only L-estimators are provided. L-estimators are linear combinations of weighted statistics on ordered values. M-estimators (of which maximum likelihood is an example) are calculated as the minimum of some function(s).
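    A base-R sketch of these estimates on a hypothetical sample X with one outlier. The confidence interval here is computed by hand from the t distribution, which is an assumption made to avoid extra packages; it is not presented as what the gmodels or psych functions do internally.

```r
X <- c(2, 4, 4, 5, 7, 9, 30)     # hypothetical sample; 30 is an outlier

m_all  <- mean(X)                # arithmetic mean, pulled up by the outlier
m_trim <- mean(X, trim = 0.2)    # drops one value from each end: mean of 4 4 5 7 9 = 5.8
med    <- median(X)              # 5
iqr    <- IQR(X)                 # 4 (third quartile 8 minus first quartile 4)

n  <- length(X)
se <- sd(X) / sqrt(n)            # standard error of the mean

# 95% confidence interval of the mean, assuming a t distribution with n - 1 df
ci <- m_all + c(-1, 1) * qt(0.975, df = n - 1) * se
```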


    HYPOTHESIS TESTS

    Parametric hypothesis tests (the assumptions of normality and homogeneity of variances are met)

    # Perform one-sample t-test
    > t.test(dataset$DV)

    # Perform (separate variances) independent-sample t-test
    one-tailed (H1: A > B)
    > t.test(DV ~ FACTOR, dataset, alternative = "greater")
    two-tailed (H0: A = B)
    > t.test(DV ~ FACTOR, dataset)
    For pooled variances t-tests, include the var.equal=T argument.

    # Perform paired t-test
    one-tailed (H1: A > B)
    > t.test(dataset$DV1, dataset$DV2, alternative = "greater", paired = T)
    > # OR for long format
    > t.test(DV ~ FACTOR, dataset, alternative = "greater", paired = T)
    two-tailed (H0: A = B)
    > t.test(dataset$DV1, dataset$DV2, paired = T)
    > # OR for long format
    > t.test(DV ~ FACTOR, dataset, paired = T)

    Note: When the assumptions are not met, transforming the data can be tried.
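    A sketch of the t-test calls on simulated data. The data frame, seed and group means are hypothetical. The paired test is shown only in the two-vector form, because the formula form with paired = T may be rejected by recent R releases.

```r
set.seed(42)
dataset <- data.frame(
  DV     = c(rnorm(10, mean = 5), rnorm(10, mean = 6)),
  FACTOR = factor(rep(c("A", "B"), each = 10))
)

one    <- t.test(dataset$DV, mu = 5)                            # one-sample test against mu = 5
welch  <- t.test(DV ~ FACTOR, data = dataset)                   # separate variances (default)
pooled <- t.test(DV ~ FACTOR, data = dataset, var.equal = TRUE) # pooled variances

# Paired test, treating the two groups as matched measurements (an assumption)
dv1 <- dataset$DV[dataset$FACTOR == "A"]
dv2 <- dataset$DV[dataset$FACTOR == "B"]
paired <- t.test(dv1, dv2, paired = TRUE)

pooled$p.value    # p-value of the pooled-variances test
```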

    Independent or paired observations, non-homogeneous variances

    (Wilcoxon rank sum nonparametric test)

    # Perform one-sample Wilcoxon (signed rank) test
    > wilcox.test(dataset$DV)

    # Perform independent-sample Mann-Whitney Wilcoxon test
    one-tailed (H1: A > B)
    > wilcox.test(DV ~ FACTOR, dataset, alternative = "greater")
    two-tailed (H0: A = B)
    > wilcox.test(DV ~ FACTOR, dataset)

    # Perform paired Wilcoxon (signed rank) test
    one-tailed (H1: A > B)
    > wilcox.test(dataset$DV1, dataset$DV2, alternative = "greater", paired = T)
    > # OR for long format
    > wilcox.test(DV ~ FACTOR, dataset, alternative = "greater", paired = T)
    two-tailed (H0: A = B)
    > wilcox.test(dataset$DV1, dataset$DV2, paired = T)
    > # OR for long format
    > wilcox.test(DV ~ FACTOR, dataset, paired = T)
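    A sketch of the Wilcoxon tests on a small hypothetical data set in wide format, with values chosen so that there are no ties and every paired difference is negative.

```r
dataset <- data.frame(
  DV1 = c(1.1, 2.3, 3.8, 4.2, 5.9, 6.4),
  DV2 = c(1.6, 3.1, 4.0, 5.5, 6.2, 7.8)
)

# Paired signed rank test: V = 0 because every difference DV1 - DV2 is negative
signed <- wilcox.test(dataset$DV1, dataset$DV2, paired = TRUE)

# One-tailed version with the alternative that DV1 is smaller than DV2
one_sided <- wilcox.test(dataset$DV1, dataset$DV2, paired = TRUE,
                         alternative = "less")

# Independent-sample rank sum (Mann-Whitney) test on the same two vectors
ranksum <- wilcox.test(dataset$DV1, dataset$DV2)

signed$statistic    # the signed rank statistic V
```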

    Adapted from Logan, Murray (2010), Biostatistical Design and Analysis Using R: A Practical Guide, John Wiley & Sons, Inc.