R codes
8/2/2019 Codigos Em R
Computational Statistics II
Activity 2
Conceição Leal
Operators in R
Operator          Description
[ [[              Indexing
$                 Component
^                 Exponentiation
:                 Sequence
%special% %/% %%  Special operators (e.g. 7 %/% 3 gives the integer quotient of the
                  division; 7 %% 3 gives the remainder of the division)
< > <= >= == !=   Ordering and comparison (less than, greater than, less than or equal to,
                  greater than or equal to, equal to, not equal to)
!                 Logical negation
& &&              Logical conjunction (AND)
| ||              Logical disjunction (OR)
~                 Formula
-> ->>            Assignment (left to right)
<- <<-            Assignment (right to left)
=                 Assignment to an argument (right to left)
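The operators in the table above can be tried with a few one-liners. This is a minimal sketch; the variable names (quotient, remainder, in_range and so on) are illustrative and do not come from the original sheet.

```r
# Arithmetic and special operators
quotient  <- 7 %/% 3   # integer quotient of the division
remainder <- 7 %% 3    # remainder of the division
power     <- 2^3       # exponentiation
idx       <- 1:5       # sequence 1, 2, 3, 4, 5

# Comparison and element-wise logic on a vector
x <- c(3, 8, 12)
in_range <- x > 5 & x < 10   # TRUE only where both conditions hold
outside  <- !in_range        # logical negation

# Rightward assignment is legal, although <- is far more common
10 -> ten
```

Note that & and | work element by element on vectors, whereas && and || are meant for single logical values, typically inside if() conditions.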
Function          Converts objects to
as.numeric(x)     # a numeric vector (integer or real). Factors are converted to their integer codes.
as.null(x)        # NULL
as.logical(x)     # a logical vector. For numeric input, zero becomes FALSE and any other value becomes TRUE.
as.character(x)   # a character vector
as.vector(x)      # a vector. All attributes (including names) are removed.
as.factor(x)      # a factor. This is an abbreviated version of factor().
as.matrix(x)      # a matrix. Any non-numeric elements result in all matrix elements
                    being converted to character strings.
as.list(x)        # a list
as.data.frame(x)  # a data frame. Matrix columns and list columns are converted into
                    separate vectors of the data frame. In R versions before 4.0.0 character vectors are
                    converted into factors by default; from 4.0.0 they stay character unless
                    stringsAsFactors = TRUE. All previous attributes are removed.
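The conversion of factors deserves a short illustration, because as.numeric() returns the internal integer codes and not the labels. The sketch below uses made-up values; going through as.character() first is a common idiom, not something stated in the original sheet.

```r
# Character to numeric
n <- as.numeric(c("1", "2", "3"))

# Factor to numeric: codes versus labels
f <- factor(c("10", "20", "10"))
codes  <- as.numeric(f)                # integer codes of the levels: 1 2 1
labels <- as.numeric(as.character(f))  # the original values: 10 20 10

# Numeric to logical: zero is FALSE, anything else is TRUE
flags <- as.logical(c(0, 1, 5))
```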
Object manipulation Description
subset(data, condition, select=) # Subset a vector or data frame according to a set of
 conditions
apply(x, MARGIN, FUN) # Apply the function (FUN) to the margins (MARGIN=1 is
 rows, MARGIN=2 is columns, MARGIN=c(1,2) is both) of a matrix
 or array (x)
tapply(x, factorlist, FUN) # Apply the function (FUN) to the vector (x) separately for
 each combination of the list of factors
lapply(x, FUN) # Apply the function (FUN) to each element of the list x
replicate(n, EXP) # Re-evaluate the expression (EXP) n times. Differs from the rep
 function, which repeats the result of a single evaluation
aggregate(x, by, FUN) # Splits data according to a combination of factors and
 calculates summary statistics on each set
sort() # Sort elements into order, by default omitting NAs
which.min(x) # Index of minimum element in x
which.max(x) # Index of maximum element in x
which(x == a) # Each of the elements of x is compared to the value of a and a
 vector of indices for which the logical comparison is true is returned
match(x,y) # A vector of the same length as x with the indices of the first
 occurrence of each element of x within y
choose(n,k) # Computes the number of unique combinations of k elements
 that can be chosen from a set of n
combn(x,k) # List all the unique combinations in which the elements of x can be
 arranged when taken k elements at a time
with(x,EXP) # Evaluate an expression (EXP) (typically a function) in an
 environment defined by x
unique(x) # Removes duplicate values
cumsum(x) # Returns a vector whose elements are the cumulative sums of the
 elements of the vector x; applied to a data frame, the sums are
 accumulated within each column
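A few of the functions in the table above, applied to small made-up objects. This is only a sketch: the data and the variable names are illustrative assumptions.

```r
# apply() over the margins of a 2 x 3 matrix (filled by column)
m <- matrix(1:6, nrow = 2)
row_totals <- apply(m, 1, sum)   # one total per row
col_totals <- apply(m, 2, sum)   # one total per column

# tapply(): mean of scores within each group
scores <- c(1, 2, 3, 4)
group  <- c("a", "b", "a", "b")
group_means <- tapply(scores, group, mean)

# lapply(): apply a function to each element of a list
squares <- lapply(list(1, 2, 3), function(v) v^2)

# Searching and counting
pos     <- which(c(5, 3, 8) == 3)              # position of the match
idx     <- match(c("b", "z"), c("a", "b", "c")) # NA where there is no match
n_comb  <- choose(5, 2)                         # number of pairs out of 5
running <- cumsum(1:4)                          # cumulative sums
```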
Indexing
Vectors Description
x[i] # Select the ith element
x[i:j] # Select the ith through jth elements inclusive
x[c(1,5,6,9)] # Select specific elements (see
x[-i] # Select all except the ith element
x["name"] # Select the element called "name"
x[x > 10] # Select all elements greater than 10
x[x > 10 & x < 20] # Select all elements between 10 and 20 (both conditions must be satisfied)
x[y == "value"] # Select the elements of x at the positions where the elements of y are equal to
"value"
x[x > 10 | y == "value"] # Select all elements which satisfy either condition
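The vector indexing forms above can be checked on a small named vector. The vectors x and y below are made-up examples, and the result names are illustrative.

```r
x <- c(a = 5, b = 12, c = 18, d = 25)
y <- c("lo", "hi", "hi", "lo")

second   <- x[2]               # by position
middle   <- x[2:3]             # a range of positions
no_first <- x[-1]              # everything except the first element
by_name  <- x["c"]             # by name
between  <- x[x > 10 & x < 20] # both conditions must hold
by_y     <- x[y == "hi"]       # positions chosen by another vector
either   <- x[x > 20 | y == "hi"]  # at least one condition holds
```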
Matrices Description
x[i,j] # Select element in row i, column j
x[i,] # Select all elements in row i
x[,j] # Select all elements in column j
x[-i,] # Select all elements in each row other than the ith row
x["name",1:2] # Select columns 1 through to 2 for the row named "name"
x[x[,"Var1"]>4,] # Select all rows for which the value of the column named "Var1" is greater
than 4
x[,x["Var1",]=="value"] # Select all columns for which the value in the row named "Var1" is
equal to "value"
Lists Description
x[[i]] # Select the ith object of the list
x[["value"]] # Select the object named "value" from the list
x[["value"]][1:3] # Select the first three elements of the object named "value" from the list
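A short sketch of list indexing, using a made-up list. The single-bracket form in the last line is not in the table above; it is included only to contrast it with the double brackets.

```r
x <- list(value = c(10, 20, 30, 40), label = "demo")

second      <- x[[2]]              # the second object itself
vals        <- x[["value"]]        # the object named "value"
first_three <- x[["value"]][1:3]   # first three elements of that object
sub_list    <- x["value"]          # single brackets return a list, not the object
```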
Data frames Description
Indexing by rows (sampling units)
x[1:10,] # Select the first 10 rows of each of the vectors in the data frame
x['NOMEVETOR',] # Select each of the vectors for the row called NOMEVETOR from the data frame
x[c(i,j),] # Select rows i and j for each column of the data frame
x[,"name"] # Select each row of the column named "name"
Indexing by columns (variables)
x[,c(i,j)] # Select all rows but just the ith and jth vectors of the data frame
x[["name"]] # Select the column named "name"
x$name # Refer to a vector named "name" within the data frame (x)
E[,c('X','Y')] # Select the X and Y vectors for all sites from the data frame
Indexing by conditions
x[x$X>3,] # Select the rows whose values in the vector X are greater than 3
x[x$X>3 & x$Z=='DADO',] # Select the rows that have the value DADO in the vector Z and a value
 greater than 3 in the vector X
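The data frame indexing forms can be tried on a small made-up data frame. The column names X and Z and the value "DADO" follow the placeholders used above; the data themselves are an assumption for illustration.

```r
x <- data.frame(X = c(1, 4, 5, 2),
                Z = c("a", "DADO", "b", "DADO"))

first_two <- x[1:2, ]          # rows 1 and 2, all columns
col_X     <- x[["X"]]          # one column as a vector
same_col  <- x$X               # equivalent shorthand
big       <- x[x$X > 3, ]      # rows where X is greater than 3
both      <- x[x$X > 3 & x$Z == "DADO", ]  # both conditions on the same row
```

The comma after the condition matters: it tells R that the logical vector selects rows and that all columns are kept.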
Vector classes
Vectors Description
Integer
cumprod(x) # Returns a vector whose elements are the cumulative products of the
 elements of the vector x
sd(x) # Sample standard deviation
cor(x,y) # Sample correlation between the vectors x and y
length(x) # Number of elements of the vector x
quantile(x,p) # Quantile of order p
paste(..., sep=) # Combine multiple vectors together after converting them into
 character vectors
sample(x, size) # Randomly resample size number of elements from the x vector
 without replacement. Use the option replace=TRUE to sample with
 replacement.
substr(x, start, stop) # Extract substrings from a character vector
cut(x, breaks) # Creates a factor out of a vector by slicing the vector x up into
 chunks. The option breaks is either a number indicating the number
 of intervals or else a vector of cut points
levels(factor) # Lists the levels (in order) of a factor
tapply(x, factorlist, FUN) # Apply the function (FUN) to the vector (x) separately for each
 combination of the list of factors
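Some of these vector functions in action, as a minimal sketch on made-up values. The seed passed to set.seed() is arbitrary and only makes the random draw repeatable.

```r
x <- c(2, 4, 6, 8, 10)

products <- cumprod(1:4)                 # cumulative products
med      <- quantile(x, 0.5)             # quantile of order 0.5 (the median)
ids      <- paste("id", 1:3, sep = "_")  # element-wise concatenation
prefix   <- substr("statistics", 1, 4)   # characters 1 to 4

# cut() with explicit cut points: intervals (0,5] and (5,10]
groups <- cut(x, breaks = c(0, 5, 10))
group_levels <- levels(groups)

# Random resampling without and with replacement
set.seed(123)
draw_without <- sample(x, 3)
draw_with    <- sample(x, 8, replace = TRUE)
```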
Matrix class: some aspects
Function Description
matrix(x, nrow = 5) # Matrix with 5 rows formed from the elements of the vector x,
 distributed by column.
matrix(x, nrow = 5, ncol = 2) # With the option ncol=2 the values of x are distributed,
 by column, over two columns. By default the matrix is
 filled by column. To fill it by
 row: matrix(x, nrow = 5, byrow=T).
colnames(MX) or rownames(MX) # Assigns names to the columns or rows from the elements of a
 vector of strings. E.g.: colnames(MX)
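The difference between filling by column and by row is easiest to see on a small example. This is a sketch assuming a vector of ten values; the column names "A" and "B" are made up.

```r
x <- 1:10

by_col <- matrix(x, nrow = 5)                # default: filled column by column
by_row <- matrix(x, nrow = 5, byrow = TRUE)  # filled row by row

# Name the columns with a character vector
colnames(by_col) <- c("A", "B")

first_row_col <- by_col[1, ]   # 1 and 6
first_row_row <- by_row[1, ]   # 1 and 2
```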
summary(X) # Extracts information on every column of the matrix: minimum and
 maximum, mean, and quartiles. Applying the function to the transpose of the matrix
 gives the same by rows.
summary(as.numeric(A)) # Gives the summary of the vector formed by all the elements of the matrix
colSums(X) or rowSums(X) # Gives the sum of all the elements of each column or row
Note: The options for operating on matrices can be consulted at http://127.0.0.1:22773/library/base/html/max.col.html
List class
Function Description
list() # Stores collections of objects that may be of different types and
 have different sizes
with() # Do computation using columns of specified data frame
Data frame class
Function Description
data.frame() # Combines multiple vectors of the same length so that each vector
 becomes a column vector. The vectors may be of different types. Vectors
 of characters are converted into factors (the default in R versions before 4.0.0).
 When that is not wanted, the function I() is used to change the class of the object. E.g.: in
 data.frame(x=c(1,2),a=c("A","B")), a is a factor; in
 data.frame(x=c(1,2),a=I(c("A","B"))), a is not a factor.
attach(nomedataframe) # Allows the columns of the data frame to be treated as independent objects.
 This option simplifies the analysis of the elements of the data frame (the
 alternative is nomedataframe$coluna). When direct access to the columns is
 no longer needed, the function detach(nomedataframe) should be used to
 undo this effect.
fix(nomedataframe) # Displays the data frame in the form of a spreadsheet, in which
 all the necessary changes can be entered and the columns can be named.
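The effect of I() can be checked directly. The sketch below sets stringsAsFactors = TRUE explicitly so that it behaves the same way in old and recent R versions; that argument and the use of with() in place of attach() are choices made for this example, not something prescribed by the original sheet.

```r
# Character column converted to a factor
df_factor <- data.frame(x = c(1, 2), a = c("A", "B"),
                        stringsAsFactors = TRUE)

# Character column protected by I(): it is kept "as is"
df_asis <- data.frame(x = c(1, 2), a = I(c("A", "B")),
                      stringsAsFactors = TRUE)

a_is_factor   <- is.factor(df_factor$a)   # TRUE
a_kept_as_is  <- is.factor(df_asis$a)     # FALSE

# with() gives direct access to the columns without attach()/detach()
total_x <- with(df_factor, sum(x))
```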
Graphics
Function Description
plot(x) # if x is a numeric vector, this form of the plot() function produces a time
 series plot, a plot of x against index numbers.
 > plot(X)
plot(~x) # if x is a numeric vector, this form of the plot() function produces a
 stripchart for x. The same could be achieved with the stripchart() function.
 The ~ indicates a formula in which the left side is modeled against the right.
 > plot(~x)
plot(x,y) # if x and y are numeric vectors, this form of the plot() function produces a
 scatterplot of y against x.
 > plot(X,Y)
plot(y~expr) # if y is a numeric vector and expr is an expression, this form of the
 plot() function plots y against each vector in the expression.
 > plot(Y ~ X)
plot(xy) # if xy is either a two-column matrix or a list containing the entries x
 and y, this form of the plot() function produces a plot of y (column 2) against
 x (column 1). If x is numeric, this will be a scatterplot, otherwise it will be a boxplot.
 > plot(XY)
plot(fact) # if fact is a factor vector, this form of the plot() function produces a
 bar graph (bar chart) with the height of bars representing the number of
 entries of each level of the factor. The same could be achieved with the
 barplot() function.
 > plot(FATOR)
plot(fact, dv) # if fact is a factor vector and dv is a numeric vector, this form of
 the plot() function produces boxplots of dv for each level of fact. The
 same could be achieved with the boxplot() function.
 > plot(FATOR, X)
plot(dv~fact) # if fact is a factor vector and dv is a numeric vector, this form of the
 plot() function produces boxplots of dv for each level of fact.
 > plot(X ~ FATOR)
 > plot(X, Y, ylab = "Y coordinate", xlab = "")
pairs(matriz) # Scatterplots of matrices of variables or formulas (taken two at a time)
boxplot(x, horizontal=T) # Box-and-whisker plot for a vector or
 formula, vertical (the default) or horizontal
hist(x, breaks, prob=) # Histogram of frequencies (absolute or relative) of the vector x. The option
 breaks specifies how, and how many, classes are built; it can be
 a number or a vector of break points.
stem() # Stem-and-leaf diagram.
pie() # Pie chart
abline(fit) # Adds the linear regression line of a fitted model.
qqnorm() # Normal probability plot
qqline() # Line that, together with the previous plot, allows the fit of a
 data set to a normal distribution to be assessed (analysis of residuals)
lines(density(x)) # Curve of the density estimated from the empirical distribution.
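Several of these functions are usually combined when checking normality graphically. The following is a sketch on simulated data; the seed, the sample size and the variable names are assumptions made for the example.

```r
set.seed(42)
x <- rnorm(100)   # simulated sample, for illustration only

# Histogram on the density scale, with the estimated density curve on top
h <- hist(x, breaks = 10, prob = TRUE)
lines(density(x))

# Normal probability plot and its reference line
qqnorm(x)
qqline(x)

# Scatterplot with the regression line of a fitted linear model
y <- 2 * x + rnorm(100)
fit <- lm(y ~ x)
plot(x, y, xlab = "x", ylab = "y")
abline(fit)
```

With prob = TRUE the bar areas of the histogram add up to 1, which is what makes it comparable with the density curve.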
Parameters of the plot function and other graphics
xlim and ylim Description
xlim=NULL # Default limits
xlim=c(a,b) # Minimum and maximum limits
xlab and ylab Description
xlab=NULL # Names of the vectors
xlab="Label" # Redefines the axis title
xlab="" # Suppresses the axis title
type (plot) Description
type="p" # Points
type="l" # Lines
type="b" # Points and lines
type="o" # Points over the lines
type="h" # Histogram-like vertical lines
type="s" # Steps
type="n" # No points
log * Description
log="x" # Log x-axis scale
log="y" # Log y-axis scale
log="xy" # Log x-axis and y-axis scales
* Note: The log parameter indicates whether, and which, axes are drawn on a logarithmic scale
Graphics parameters: line type
Parameter Description
lty # The type of line. Specified as either a single integer in the range of 1 to 6 (for
 predefined line types) or as a string of 2 or 4 numbers that define the relative
 lengths of dashes and spaces within a repeated sequence.
 [Figure: sample lines for lty=1, lty=2, lty=3, lty=4, lty=5, lty=6, lty=7, lty="1234", lty="9111"]
lwd # The thickness of a line as a multiple of the default thickness (which is device
 specific).
 [Figure: sample lines for lwd=0.5, lwd=0.75, lwd=1, lwd=2, lwd=4]
Colours, title and other characteristics
palette() # Gives access to the names of the eight main colours available
colors() # Gives access to the range of colours available, by name and by number
main= # Assigns a title to the graph
ylab= and xlab= # These arguments specify the labels used on the vertical and horizontal axes respectively.
xlim=NULL # Default limits
xlim=c(a,b) # Minimum and maximum limits
xlab=NULL # Names of the vectors
xlab="" # Suppresses the axis title
xlab="Label" # Redefines the axis title
DATA TRANSFORMATIONS
Many of the tools of parametric inference rest on the assumption that the data are
normally distributed. When this assumption does not hold, scale transformations of the
data can be used.
The purpose of a scale transformation is therefore to normalize the data so as to satisfy the
assumptions underlying a statistical analysis. In principle, any function can be applied to the
data. However, certain types of data respond more favourably to particular
transformations because of their characteristics. The most common transformations are listed in the
following table:
Common data transformations.
Nature of the data                        Transformation   R syntax
Measurements (lengths, weights, etc.)     log_e            log(x)
                                          log_10           log(x, 10)
                                          log_10           log10(x)
                                          log(x + 1)       log(x+1)
Counts (number of individuals, etc.)      square root      sqrt(x)
Percentages (must be proportions)         arcsin           asin(sqrt(x))*180/pi
Note: x is the name of the vector (variable) whose values are to be transformed.
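The transformations in the table can be verified on values whose results are easy to check by hand. The vectors below are made up for that purpose.

```r
# Measurements: logarithms
sizes <- c(1, 10, 100)
log_10_a <- log10(sizes)     # base-10 logarithm
log_10_b <- log(sizes, 10)   # same result through the base argument

# Counts: square root, or log(x + 1) when zeros are present
counts <- c(0, 4, 9)
root_counts <- sqrt(counts)
log_counts  <- log(counts + 1)

# Proportions (between 0 and 1): arcsine of the square root, in degrees
props <- c(0, 0.25, 0.5, 1)
arcsine_deg <- asin(sqrt(props)) * 180 / pi
```

The arcsine transformation only accepts proportions; percentages have to be divided by 100 first, otherwise sqrt(x) exceeds 1 and asin() returns NaN.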
MEASURES OF LOCATION
Common estimators of population parameters
Parameter                     Description                              R syntax
Estimates of location
Arithmetic mean (μ)           # The sum of the values divided by       mean(X)
                                the number of values (n)
Trimmed mean                  # The arithmetic mean calculated         mean(X, trim=0.05)
                                after a fraction (typically 0.05
                                or 5%) of the lower and upper
                                values have been discarded
Winsorized mean               # The arithmetic mean calculated         library(psych)
                                after the trimmed values are           winsor(X, trim=0.05)
                                replaced by the upper and
                                lower trimmed quantiles
Median                        # The middle value                       median(X)
Minimum, maximum              # Smallest and largest values            min(X), max(X)
Estimates of spread
Variance (σ²)                 # Average squared deviation of           var(X)
                                observations from the mean
Standard deviation (σ)        # Square root of the variance            sd(X)
Median absolute deviation     # The median of the absolute             mad(X)
                                differences between the observations
                                and the median value
Inter-quartile range          # Difference between the 75% and         IQR(X)
                                25% ranked observations
Precision and confidence
Standard error of ȳ (s_ȳ)     # Precision of the estimate ȳ            sd(X)/sqrt(length(X))
95% confidence interval of μ  # Interval with a 95% probability of     library(gmodels)
                                containing the true mean               ci(X)
Note: Only L-estimators are provided. L-estimators are linear combinations of weighted statistics on ordered values. M-estimators
(of which maximum likelihood is an example) are calculated as the minimum of some function(s).
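The base R estimators in the table can be compared on a small sample with one extreme value. The data are made up; the constant = 1 argument of mad() is used here to show the raw median absolute deviation, since by default mad() multiplies it by 1.4826.

```r
X <- c(2, 4, 4, 5, 7, 9, 30)   # 30 is an outlier

# Location
avg         <- mean(X)               # pulled upwards by the outlier
trimmed_avg <- mean(X, trim = 0.2)   # drops one value at each end (floor(7 * 0.2) = 1)
med         <- median(X)

# Spread
variance <- var(X)
std_dev  <- sd(X)
raw_mad  <- mad(X, constant = 1)     # median of |X - median(X)|
iqr      <- IQR(X)

# Precision of the mean
std_error <- sd(X) / sqrt(length(X))
```

The trimmed mean and the median stay close to the bulk of the data, while the arithmetic mean does not; the same contrast holds between IQR or MAD and the standard deviation.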
HYPOTHESIS TESTS
Parametric hypothesis tests: the assumptions of normality and homogeneity of variances hold
# Perform one-sample t-test
> t.test(dataset$DV)
# Perform (separate variances) independent-sample t-test
one-tailed (H1 : A > B)
> t.test(DV ~ FACTOR, dataset, alternative = "greater")
two-tailed (H0 : A = B)
> t.test(DV ~ FACTOR, dataset)
For pooled variances t-tests, include the var.equal=T argument
# Perform paired t-test
one-tailed (H1 : A > B)
> with(dataset, t.test(DV1, DV2, alternative = "greater", paired = T))
> t.test(DV ~ FACTOR, dataset, alternative = "greater", paired = T)
two-tailed (H0 : A = B)
> with(dataset, t.test(DV1, DV2, paired = T))
> t.test(DV ~ FACTOR, dataset, paired = T)
Note: When the assumptions do not hold, a transformation of the data can be tried.
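A runnable sketch of the t-test calls, on small made-up data sets. The names dataset, DV, FACTOR, DV1 and DV2 follow the placeholders used above; the data frame wide and all the values are assumptions for illustration. The paired test is written with two vectors, which is the form least dependent on the R version.

```r
# Long format: one response column and one grouping factor
dataset <- data.frame(
  DV = c(5.1, 4.9, 5.6, 5.8, 6.0, 6.3, 6.8, 7.1, 6.5, 7.4),
  FACTOR = factor(rep(c("A", "B"), each = 5))
)

one_sample <- t.test(dataset$DV, mu = 5)                 # H0: mean equals 5
welch      <- t.test(DV ~ FACTOR, data = dataset)        # separate variances
pooled     <- t.test(DV ~ FACTOR, data = dataset, var.equal = TRUE)
one_tailed <- t.test(DV ~ FACTOR, data = dataset, alternative = "less")

# Wide format: two measurements on the same units
wide <- data.frame(DV1 = c(5.1, 4.9, 5.6, 5.8, 6.0),
                   DV2 = c(6.3, 6.8, 7.1, 6.5, 7.4))
paired <- with(wide, t.test(DV1, DV2, paired = TRUE))

# A paired t-test is a one-sample t-test on the differences
on_diffs <- t.test(wide$DV1 - wide$DV2)
```

Each call returns an object of class "htest"; the statistic, degrees of freedom and p-value are available as $statistic, $parameter and $p.value.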
Independent or paired observations, non-homogeneous variances
(Wilcoxon rank-based nonparametric tests)
# Perform one-sample Wilcoxon (signed rank) test
> wilcox.test(dataset$DV)
# Perform independent-sample Mann-Whitney Wilcoxon test
one-tailed (H1 : A > B)
> wilcox.test(DV ~ FACTOR, dataset, alternative = "greater")
two-tailed (H0 : A = B)
> wilcox.test(DV ~ FACTOR, dataset)
# Perform paired Wilcoxon (signed rank) test
one-tailed (H1 : A > B)
> with(dataset, wilcox.test(DV1, DV2, alternative = "greater", paired = T))
> #OR for long format
> wilcox.test(DV ~ FACTOR, dataset, alternative = "greater", paired = T)
two-tailed (H0 : A = B)
> with(dataset, wilcox.test(DV1, DV2, paired = T))
> wilcox.test(DV ~ FACTOR, dataset, paired = T)
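The Wilcoxon tests can be illustrated with samples small enough for the exact p-values to be worked out by hand. The vectors a, b, before and after are made-up data without ties, chosen so that the two groups do not overlap.

```r
# Independent samples (Mann-Whitney): every value of a is below every value of b
a <- c(1.1, 2.3, 3.8, 4.0, 5.6)
b <- c(6.2, 7.4, 8.1, 9.9, 10.5)
mw <- wilcox.test(a, b)
# W counts the pairs with a > b, so here it is 0; of the choose(10, 5) = 252
# possible arrangements only the two most extreme are as unusual: p = 2/252

# Paired samples (signed rank): every difference after - before is positive
before <- c(10, 12, 9, 14, 11)
after  <- c(12.5, 13, 11.1, 17, 11.5)
sr <- wilcox.test(after, before, paired = TRUE, alternative = "greater")
# V is the sum of the ranks of the positive differences: 1 + 2 + 3 + 4 + 5 = 15;
# only 1 of the 2^5 = 32 sign patterns reaches it: p = 1/32
```

With five observations per group the smallest attainable two-sided p-value is 2/252, so very small samples limit what these tests can show.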
Adapted from Logan, Murray (2010), Biostatistical Design and Analysis Using R: A Practical Guide, John Wiley & Sons, Inc.