
PSYCHOMETRIKA--VOL. 12, NO. 1, MARCH, 1947

TEST "RELIABILITY": ITS MEANING AND DETERMINATION

LEE J. CRONBACH

UNIVERSITY OF CHICAGO

The concept of test reliability is examined in terms of general, group, and specific factors among the items, and the stability of scores in these factors from trial to trial. Four essentially different definitions of reliability are distinguished, which may be called the hypothetical self-correlation, the coefficient of equivalence, the coefficient of stability, and the coefficient of stability and equivalence. The possibility of estimating each of these coefficients is discussed. The coefficients are not interchangeable and have different values in corrections for attenuation, standard errors of measurement, and other practical applications.

The literature of testing contains many discussions of test reliability. Each year, new formulations are offered, and new procedures for estimating reliability are championed. There appears to have developed no universally accepted procedure, and several writers have attributed this difficulty to the diversity of definitions for reliability now in use. It has often been suggested that perhaps the only effective way to resolve the conflicts among contending viewpoints is to replace the term "reliability," recognizing that it covers not one, but several concepts. The present paper attempts to restate the conflicting concepts and assumptions now current, and to offer a scheme for separating the various aspects of dependability of measurement.

The physical scientist generally has expressed the accuracy of his observations in terms of the variation of repeated observations of the same event. The mean of the squared deviations of these observations about the obtained mean is the "error variance." This is a measure of precision or reliability. If for the present we regard reliability as the consistency of repeated measurements of the same event by the same process, two fundamental differences between the problem of the physical scientist and the psychologist appear. The physical scientist makes two assumptions, both of which are adequately true for him. First, he assumes that the entity being measured does not change during the measurement process. By controlling the relevant conditions--and he usually knows what these conditions are and can control them--he can hold nearly constant the length of a rod or the pressure of a gas. When measuring a variable


instrument used by this man." If scores obtained by several observers in simultaneous measurements are pooled for comparison, the constant error of each man is included as a source of variation. This procedure studies the reliability of "this measuring instrument used by different men." Since the human takes part in the measurement, one cannot study the reliability of an instrument apart from the men who use it.

Types of "Reliability"

It is known that

$$r_{tt} = 1 - \frac{\sigma_e^2}{\sigma_t^2}, \tag{1}$$

where $r_{tt}$ is the reliability coefficient, $\sigma_e^2$ is the hypothetical error variance--the mean of the squared deviations of all obtained scores for each person from the mean obtained score for that person--and $\sigma_t^2$ is the variance of the scores of all persons on all the hypothetical independent trials.
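Equation (1) can be illustrated with a small numerical sketch. The score matrix below is invented for illustration (three persons, three hypothetical independent trials), and the function name `reliability_from_trials` is an assumption of this example, not something taken from the paper. It assumes every person has the same number of trials, so that pooling the squared deviations gives the mean within-person variance.

```python
from statistics import fmean, pvariance

def reliability_from_trials(scores):
    """scores[p][k] is the obtained score of person p on independent trial k."""
    # Error variance: mean squared deviation of each score from the mean
    # obtained score of the same person, pooled over persons and trials.
    squared_devs = [(x - fmean(row)) ** 2 for row in scores for x in row]
    error_var = fmean(squared_devs)
    # Total variance: all scores of all persons on all trials.
    total_var = pvariance([x for row in scores for x in row])
    return 1.0 - error_var / total_var

# Hypothetical data: 3 persons, 3 trials each.
scores = [
    [10.0, 12.0, 11.0],
    [15.0, 14.0, 16.0],
    [20.0, 19.0, 21.0],
]
r_tt = reliability_from_trials(scores)
print(round(r_tt, 6))  # 0.953125
```

Here the within-person deviations are small relative to the spread between persons, so the coefficient is close to 1.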

It is convenient to consider the possible definitions of error of measurement in terms of variance. Using a bi-factor pattern to describe a test,* the variance of scores from a single testing may be expressed as follows:

$$\sigma_t^2 = \sigma_g^2 + \sigma_{f_1}^2 + \sigma_{f_2}^2 + \cdots + \sigma_{s_1}^2 + \sigma_{s_2}^2 + \cdots + \sigma_{s_n}^2 + \sigma_e^2. \tag{2}$$

The terms have the following meanings:

$\sigma_t^2$ is the variance of obtained scores;

$\sigma_g^2$ is the variance in the general factor (if any) represented in the test items;

$\sigma_{f_1}^2$, $\sigma_{f_2}^2$, etc., are the respective variances in the orthogonal group factors of undetermined number, each of which is represented in two or more items;

$\sigma_{s_1}^2$, $\sigma_{s_2}^2$, etc., are the respective "specificities" of the n items--the part of the reliable variance of scores on the items which cannot be assigned to common factors; and

$\sigma_e^2$ is the residual variance.

* Another factor pattern could be assumed without changing the basic argument (4, 7-9, 107).

The referents for these factors may be illustrated in a hypothetical examination in psychology. The general factor might include general knowledge of psychology, reading ability, motivation, and other characteristics. Group factors might be related to knowledge of separate topics, mathematical skill required in only a few items, and so on. Each item taps, in addition, some specific knowledge not demanded by other items. The specificity variance accounts for individual differences in these elements. The remaining variance may include momentary inattention, guessing, and other random elements.

For reference, the formula will be rewritten thus:

$$\sigma_t^2 = \sigma_g^2 + \sum \sigma_f^2 + \sum \sigma_s^2 + \sigma_e^2. \tag{3}$$

Consider now the scores obtained from a series of independent measurements of the same individuals using the same test.

$$\sigma_t^2 = \sigma_{\bar g}^2 + \sum \sigma_{\bar f}^2 + \sum \sigma_{\bar s}^2 + \sum \sigma_{\Delta g}^2 + \sum\sum \sigma_{\Delta f}^2 + \sum\sum \sigma_{\Delta s}^2 + \sigma_e^2. \tag{4}$$

$\sigma_t^2$ is the variance of all obtained scores about the grand mean;

$\sigma_{\bar g}^2$ is the variance of the mean general factor scores of all individuals about the mean for all individuals--the between-persons variance in g;

$\sigma_{\bar f}^2$ is the between-persons variance in a group factor;

$\sigma_{\bar s}^2$ is the between-persons variance in specificity on any item;

$\sum \sigma_{\Delta g}^2$ is the sum over individuals of the variances of the general-factor scores for each individual about the mean for that individual--the within-persons variance;

$\sum\sum \sigma_{\Delta f}^2$ and $\sum\sum \sigma_{\Delta s}^2$ represent the corresponding within-persons variances in the group factors and specificities, respectively; and

$\sigma_e^2$ is the residual variance.

The between-persons variances represent, as in the case of the single trial, individual differences in the factors. The within-persons variances represent instability of scores for each individual, as a result of changes from test to test.

These formulations permit an exact statement of what a "reliability coefficient" represents. Apparently at least four fundamentally different meanings of reliability are current:

(1) The "error variance" may be permitted to include, in equation (4), the terms $\sum \sigma_{\Delta g}^2$, $\sum\sum \sigma_{\Delta f}^2$, $\sum\sum \sigma_{\Delta s}^2$, and $\sigma_e^2$. That is, instability is regarded as an error of measurement. This is the coefficient defined by the correlation from repeated independent administrations of the same test. The assumption of constancy is made, since any change of score from trial to trial is treated as an error of measurement. If that assumption is true, the instability terms vanish, but such constancy in all the behaviors a test measures is highly unlikely.

(2) The "error variance" may be permitted to include, in equation (4), the terms $\sum \sigma_{\bar s}^2$, $\sum \sigma_{\Delta g}^2$, $\sum\sum \sigma_{\Delta f}^2$, $\sum\sum \sigma_{\Delta s}^2$, and $\sigma_e^2$. Both instability and specificity are treated as errors. This is the "reliability" defined by the correlation between successive independent administrations of equivalent tests. Because different items are used in preparing equivalent forms, the specific-factor scores of individuals on the two tests will be uncorrelated. These, therefore, contribute to changes in score and are treated as error. If the tests do not represent the same group factors, at least part of $\sum \sigma_{\bar f}^2$ is also added to the error variance.

(3) The "error variance" may be permitted to include, in equation (3), the terms $\sum \sigma_s^2$ and $\sigma_e^2$. This defines "reliability" as the correlation between two equivalent tests administered simultaneously. Instability is excluded from consideration, and no assumptions of constancy are made. Specific-factor variances are included in errors of measurement. Depending on the degree of equivalence, part of the group-factor variance may also be treated as error.

(4) The "error variance" may be restricted, in equation (3), to the term $\sigma_e^2$. This is "reliability" defined as the self-correlation of a test (see below). No assumption of constancy is made, and independence is not involved. The specific factors remain the same from test to test and are added to the true-score variance. All real variables measured by the test are treated as quantities estimated, not as errors.
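The four choices of error variance listed above can be compared on a single set of variance components. The sketch below uses invented component values (they are not from the paper) and makes one simplifying assumption that should be read as such: the single-trial variance of equation (3) is taken to equal the total of equation (4), with each single-trial factor variance equal to its between-persons part plus its within-persons part. Each coefficient is then one minus the chosen error variance over the total.

```python
# Hypothetical variance components (illustrative values only).
between = {"g": 40.0, "f": 20.0, "s": 10.0}  # stable individual differences
within = {"g": 5.0, "f": 3.0, "s": 2.0}      # trial-to-trial instability
residual = 20.0

# Total variance as in equation (4).
total = sum(between.values()) + sum(within.values()) + residual

def coefficient(error_variance):
    """Reliability as 1 - (error variance / total variance)."""
    return 1.0 - error_variance / total

instability = sum(within.values())
coefficients = {
    # (1) instability and residual count as error
    "stability": coefficient(instability + residual),
    # (2) specificity, instability, and residual count as error
    "stability_and_equivalence": coefficient(between["s"] + instability + residual),
    # (3) single-trial specificity and residual count as error
    "equivalence": coefficient(between["s"] + within["s"] + residual),
    # (4) only the residual counts as error
    "self_correlation": coefficient(residual),
}

for name, value in coefficients.items():
    print(f"{name}: {value:.2f}")
```

With these values the hypothetical self-correlation is the largest coefficient (0.80) and the coefficient of stability and equivalence the smallest (0.60); the relative order of the other two depends on how large specificity is compared with instability.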

It may now be helpful to restate these definitions and to give them names for reference.

Definition (1): Reliability is the degree to which the test score indicates unchanging* individual differences in any traits. (Coefficient of stability).

Definition (2): Reliability is the degree to which the test score indicates unchanging individual differences in the general and group factors defined by the test. (Coefficient of stability and equivalence).

Definition (3): Reliability is the degree to which the test score indicates the status of the individual at the present instant in the general and group factors defined by the test. (Coefficient of equivalence). Internal consistency tests are generally measures of equivalence. These coefficients predict the correlation of the test with a hypothetical equivalent test, as like the first test as the parts of the first test are like each other.

Definition (4): Reliability is the degree to which the test score indicates individual differences in any traits at the present moment. (Hypothetical self-correlation).

* This may be modified by requiring constancy over some specified period (one year, one day, etc.)

These names are open to criticism, and better suggestions are in order. The important thing is to recognize that in the past all four of these and many approximations to them have been called "the reliability coefficient." No one of these is the "right" coefficient. They measure different things, and each is useful. What is important is to avoid confusing one with another, and using one as an estimate of another. It may be noted that reliability of a test can only be discussed in relation to a particular sample of persons.

The components of error variance under each definition imply that in practice some coefficients will be larger than others for a given test. If stability is not perfect, and if items contain some specificity loading, the hypothetical self-correlation will be greatest, and the coefficient of stability and equivalence will be the smallest of the four.

As Kelley states (7), the concept of reliability is meaningless unless one postulates that two measures of the same function exist. They may be successive measurements of a stable event, or simultaneous measurements of a unique event. But in regard to the non-repeating event which can be observed only once, reliability has only a theoretical interest. In fact, if one accepts a deterministic position, there is no "error" in a measurement of a unique event. The student's responses and his score are determined by many forces, and we do not know what they are; but the resultant of these forces is a particular act, and the act itself, at this instant and with these particular forces, is perfectly reliable. "Chance" and "error" are merely names we give to our ignorance of what determines an event.

All methods of studying reliability make a somewhat fallacious division of variables into "real variables" and "error." It is probably more correct to conceive a continuum between the instantaneous behavior which has an infinitesimal period, through states of longer duration, to the virtually constant individual differences. A test score is made up of all these "real" elements, each of which could be perfectly predicted if our knowledge were adequate. Reliability, according to this conception, becomes a measure of our ignorance of the real factors underlying brief fluctuations of behavior and atypical acts. Perhaps a new statistical method based on the non-Aristotelian conception of a continuum of realities will some day permit us to avoid the troublesome attempt to divide the continuum into "reality" and "error."

For the present, it appears to be necessary to retain the artificial separation. In thinking about the self-correlation of a test--the consistency with which it measures whatever it measures--we may class as chance effects all variables whose period of variation is shorter than the time required to take the test. Momentary fluctuations are therefore "errors," but shifts in fatigue, set, or skill having a longer cycle are possibly worth measuring.

Techniques of Estimation

Each method used in the past to study "reliability" may be associated with one of these definitions. The procedures requiring more than one trial will be discussed first.

Retest method. The retest method calls for giving the same test twice to the same group. The trials are supposed to be independent, but this may well not be true. Shift in relative scores is always treated in the error variance, not the true-score variance; the retest coefficient is therefore an estimate of the coefficient of stability. Failure to attain independent trials may make the estimate too high or too low.

Guttman (3, 263), in a complete reconsideration of reliability theory, defines reliability in terms of the stability of individual differences during a large number of "independent" retests. He shows that the reliability thus defined (a coefficient of stability) may be estimated by the correlation between two independent trials. His definition of independence will be discussed below.

Equivalent tests method. Two "equivalent" or "parallel" tests may be given, with any interval between, and their correlation determined. Experimental independence is assumed, despite the effect experience with one form may have on the second. Constancy is assumed, and all shifts in relative score are treated in the error variance. Specific-factor variances are treated in the error variance. This is therefore an estimate of the coefficient of stability and equivalence. Because the assumption of independence cannot be tested, it is never known whether the estimate is high or low. To interpret a coefficient involving equivalence, one must know how the tests are equivalent. If the tests are alike only in the general factor, group-factor variances are included as error, and the coefficient reflects the extent to which scores are determined by a stable general factor. Parallel tests should ordinarily have the same general and group factors. Were items in the two forms matched to test the same specific items of information or skill, the equivalent tests might to some degree include the same specific factors. The specific factors in the two tests could not be completely the same, however, unless the items were identical. The coefficient of equivalence is a property of a pair of tests and will vary according to the kind of similarity established in equating the tests. To the degree that parallel tests have the same general and group factors, the coefficient indicates the stability of performance in the general and group factors.

The split-half method. The widely used split-half method requires the correlation of half the items in the test with the remaining items. Cronbach has studied the effect of various splits upon the resulting coefficient (1) and has suggested the use of parallel splits, in which the two halves are made nearly equivalent (2). In the parallel split, each part represents the general factor and the group factors of the original test as well as possible. The half-tests should have equal standard deviations. The procedure makes no assumption of constancy, but does include the specific-factor variance as error variance. The split-half estimate is a coefficient of equivalence, estimating the correlation of simultaneously administered parallel tests, as like each other as are the halves of the test given. Any failure in splitting to obtain equivalent halves will tend to lower the correlation obtained.

An assumption of experimental independence is made in considering the split-half correlation an estimate of the parallel-test correlation. In testing by parallel tests, the performance on one form is presumably independent of performance on the other. When items are presented together, however, there is always the possibility of spurious inter-item correlation due to item linkages and brief fluctuations of mood and attention.

Most random or odd-even splits do not represent all factors equally in both halves. If the assumption of experimental independence were valid, the correlation would therefore be an underestimate of the coefficient of equivalence. Guttman (3, 260) states that the corrected split-half coefficient is always a lower bound to "the reliability coefficient," no matter how the test is split. He cautions that this inequality is true only for an indefinitely large sample of persons. Sampling errors in practice preclude taking as one's coefficient the largest of many trial split coefficients. Guttman defines reliability in terms of repeated independent trials of the same (not equivalent) tests. By this definition, the split-half estimate, including specificity as an error of measurement, is a low one. The coefficient of equivalence is a conservative estimate of the hypothetical self-correlation.

The assumptions of the Spearman-Brown formula have been stated in various ways, and this has led to some confusion as to the applicability of the formula. The derivation hypothecates equivalent tests and predicts their correlation from the correlation of equivalent half-tests. Equivalence is the only assumption made, and in the derivation equivalence is defined by requiring equal standard deviations of the half-tests and by requiring that the hypothetical equivalent tests be just as similar as the half-tests ($r_{ab} = r_{aA} = r_{bB} = r_{AB}$). This defines equivalence so that all tests have the same common factor composition. It makes no direct assumption of the equivalence of pairs of items or of the unit-rank among the item intercorrelations.

The items of a test may be considered as a sample of some larger population. One may define the purpose of the test in terms of the population of items to be measured; the test fulfils this purpose insofar as the items are a representative sample of the population. Alternatively, one may consider the test as defined by its items, and think of the population as the entire group of items of which the sample is representative. The coefficient of equivalence (obtained by the parallel-test or internal consistency methods) correlates two samples of items and indicates the extent to which the variance in each may be attributed to common factors. The extent of common-factor loadings is the extent to which test scores are determined by "the population variable." If the samples to be compared must be representative, rather than random, it is necessary, in split-half procedures, to use the parallel split or a split according to a table of specifications.
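The split-half procedure with the Spearman-Brown correction can be sketched as follows. The correction used is the familiar form, full-test r = 2r / (1 + r), where r is the correlation between the half-tests. The response matrix and the function names are hypothetical, chosen only for illustration, and an odd-even split is used for simplicity even though the text above recommends a parallel split where representativeness matters.

```python
from statistics import fmean

def pearson(x, y):
    """Product-moment correlation of two equal-length score lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_brown(r_half):
    """Predict the full-length correlation from the half-test correlation."""
    return 2.0 * r_half / (1.0 + r_half)

def odd_even_split_half(responses):
    """responses[p][i] is the score of person p on item i."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd, even))

# Hypothetical right/wrong responses: 6 persons, 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
split_half = odd_even_split_half(responses)
print(round(split_half, 3))
```

A different split of the same four items would generally give a different coefficient, which is the dependence on the choice of split discussed above.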

The Kuder-Richardson formulas. A radical reformulation of the reliability problem was offered in 1937 by Kuder and Richardson (8). They proposed several alternative formulas which have been widely adopted. The original derivation has been criticized because of the numerous assumptions made, but other writers have developed the same formulas more directly. Perhaps the simplest derivation was published by Jackson and Ferguson (5, 74). They define reliability as a coefficient of equivalence, equivalence being defined by requiring that the two tests have equal variances and that the mean inter-item covariance within each test be equal and equal to the mean inter-item covariance between tests. If these assumptions are satisfied, the Kuder-Richardson formula (20) is an exact estimate of the coefficient of equivalence. This condition is a reasonable one when the items of a test are considered as drawn from a population of items all measuring a single general factor. If group factors are present, even though the two tests measure these group factors equally, then $\overline{r_{ij} s_i s_j} < \overline{r_{iJ} s_i s_J}$,* and the Kuder-Richardson formula gives a conservative estimate of the coefficient of equivalence--how conservative one does not know.

* i.e., the mean inter-item covariance within tests is less than the mean inter-item covariance between tests.

The Guttman lower bounds. The latest statement of the problem is that published by Guttman in 1945 (3). He derives six formulas for estimating a coefficient from data obtained on a single testing, all the estimates being lower than the "true reliability" if the sample is sufficiently great. His estimate $L_3$ is identical to that from Kuder-Richardson formula (20), although the derivations are dissimilar. His $L_4$ is equivalent to the split-half coefficient. $L_2$, which uses item covariances, is an original formula more difficult to compute than $L_3$ and $L_4$. $L_1$, $L_5$, and $L_6$ are expected to have little practical importance.

Guttman defines error as the variation of the score of a person over a universe of independent trials with the same test. His crucial assumption, C~ (3, 265-266), defines independence so that the score of a person on any item on any trial is experimentally independent of his scores on any other items. In practice, changes in motivation, function shift, and other variables cause items administered together to vary together. Guttman classes shifts in the variables measured as errors of measurement and therefore is estimating a coefficient of stability when he demonstrates that the correlation between two independent trials on a large population may be taken as equal to "the reliability coefficient" (3, 268).

In deriving lower-bounds formulas, Guttman deals with hypothetical independent retests in which the mean covariance of two items within trials equals the mean covariance of the same items between trials. Beyond this he makes no assumption. His definition of independence requires that there be no shift in the variables measured between trials; i.e., that the hypothetical trials be simultaneous. Since he is using identical tests simultaneously, he has defined reliability as the hypothetical self-correlation. His formulas lead to underestimates of that coefficient.

One may study the effect on Guttman's results if his assumption of independence within trials is denied. This may occur when one item influences the answer to another by giving a clue, by causing encouragement or discouragement, or by setting up a pattern among the responses. In the derivation of L~, the assumption leads to discarding a positive covariance term from the right member of (28). As a consequence, ~, and L~ are greater than they would be without the assumption, and may overestimate the hypothetical self-correlation as defined. In the derivation of $L_2$, $L_3$, and $L_4$, the assumption is felt in (25), where a positive covariance term is dropped from the right member. Without the assumption,

~'.~j; > ~ ' ~ - y . ~ . , g --/: j ,

and the inequality given in (37) may not hold. The remainder of the derivation therefore may lead to estimates higher than the hypothetical self-correlation, if the assumption of experimental independence of items does not hold.
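The single-trial coefficients discussed in this section can be computed side by side. The sketch below uses the standard published forms of Guttman's first three bounds and of Kuder-Richardson formula (20), written with population (divide-by-N) variances throughout; the response matrix and function names are hypothetical. It only illustrates the arithmetic and does not address the independence assumption questioned above.

```python
from statistics import fmean, pvariance

def covariance(x, y):
    """Population covariance of two equal-length score sequences."""
    mx, my = fmean(x), fmean(y)
    return fmean([(a - mx) * (b - my) for a, b in zip(x, y)])

def guttman_bounds(responses):
    """Return (L1, L2, L3) for a persons-by-items score matrix."""
    n = len(responses[0])                      # number of items
    items = list(zip(*responses))              # one tuple of scores per item
    total_var = pvariance([sum(row) for row in responses])
    item_var_sum = sum(pvariance(col) for col in items)
    # Sum of squared covariances over all ordered pairs of distinct items.
    c2 = sum(covariance(items[i], items[j]) ** 2
             for i in range(n) for j in range(n) if i != j)
    l1 = 1.0 - item_var_sum / total_var
    l2 = l1 + (n / (n - 1) * c2) ** 0.5 / total_var
    l3 = n / (n - 1) * l1
    return l1, l2, l3

def kr20(responses):
    """Kuder-Richardson formula (20) for items scored 0 or 1."""
    n = len(responses[0])
    p = [fmean(col) for col in zip(*responses)]  # proportion passing each item
    total_var = pvariance([sum(row) for row in responses])
    return n / (n - 1) * (1.0 - sum(pi * (1.0 - pi) for pi in p) / total_var)

# Hypothetical right/wrong responses: 6 persons, 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
l1, l2, l3 = guttman_bounds(responses)
print(round(l1, 3), round(l2, 3), round(l3, 3), round(kr20(responses), 3))
```

For items scored 0 or 1 the item variance is p(1 - p), so $L_3$ and formula (20) coincide, and by the Cauchy-Schwarz inequality $L_2$ is never smaller than $L_3$.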

This weakness is common to all estimates of reliability based on a single trial. Lindquist (9, 219) points out that in the split-half method the two halves are falsely assumed to be experimentally independent, and therefore he considers the split-half estimate spuriously high. [He, however, defines reliability as what we have called the coefficient of stability and equivalence (9, 216).] In the Kuder-Richardson formula, as derived by Jackson and Ferguson, the same assumption of independence is made when the mean inter-item covariance between tests is taken as equal to the mean covariance within tests. If motivation, response sets, and other factors common to performance on the various items of a trial are considered part of the general or group factors measured by the test, their contribution to the inter-item correlation within a trial is rightly included in the estimate of accuracy of measurement. But momentary variations which cause random changes in item covariance should not be permitted to raise the estimate obtained. Any estimate of self-correlation or equivalence based on a single trial may be higher than the hypothetical self-correlation. It may be treated as a conservative or exact estimate only if we are willing to assume that the response to each item is an independent behavior, related to response on other items only because of significant conditions in the person tested.

Guttman makes the point that his split-half formula

    $$L_4 = 2\left(1 - \frac{s_a^2 + s_b^2}{s_t^2}\right) \qquad (5)$$

    is superior to the Spearman-Brown formula in that it does not assume the two half-tests to have equal variance. His formula can be derived as an estimate of the coefficient of equivalence, according to the usual proof of the Spearman-Brown formula, except that equivalence is defined so that $\sigma_{a+b} = \sigma_{A+B}$ and $r_{aA}\sigma_a\sigma_A = r_{aB}\sigma_a\sigma_B = r_{Ab}\sigma_A\sigma_b = r_{bB}\sigma_b\sigma_B = r_{ab}\sigma_a\sigma_b$. This leads to a formula identical to Guttman's, or an equivalent form previously derived by Flanagan (see Kelley, 7) which is less readily computed. Values obtained using this formula are smaller (usually by a small amount) than the values from the Spearman-Brown formula, except where $s_a = s_b$. It appears that this formula should replace the Spearman-Brown procedure.
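    The relation between the two split-half formulas can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names and the half-test scores are hypothetical, and population (divide-by-N) variances are assumed throughout. When the half-test correlation is positive, the difference between the two values reduces to the inequality 2 s_a s_b <= s_a^2 + s_b^2, so the variance formula cannot exceed the Spearman-Brown value and equals it when the two half-test variances are equal.

```python
def covariance(x, y):
    """Population covariance of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def guttman_split_half(a, b):
    """Formula (5): 2 * (1 - (s_a^2 + s_b^2) / s_t^2), with t = a + b."""
    total = [x + y for x, y in zip(a, b)]
    return 2 * (1 - (covariance(a, a) + covariance(b, b)) / covariance(total, total))


def spearman_brown(a, b):
    """Stepped-up half-test correlation, 2r / (1 + r)."""
    r = covariance(a, b) / (covariance(a, a) * covariance(b, b)) ** 0.5
    return 2 * r / (1 + r)


# Hypothetical half-test scores for six examinees.
half_a = [3, 5, 6, 8, 9, 11]
half_b_unequal = [2, 3, 3, 5, 5, 6]   # smaller variance than half_a
half_b_equal = [5, 3, 6, 9, 8, 11]    # same values as half_a, reordered: equal variance

for label, half_b in (("unequal variances", half_b_unequal),
                      ("equal variances", half_b_equal)):
    print(label,
          round(guttman_split_half(half_a, half_b), 4),
          round(spearman_brown(half_a, half_b), 4))
```

    With the unequal-variance halves the first value printed is the smaller of the two; with the equal-variance halves the two values coincide up to rounding.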

    Summary

    Four possible definitions of "reliability" have been considered.

    The hypothetical self-correlation requires independent simultaneous identical tests. For psychological variables this is a hypothetical situation, and no one has found an unbiased estimate of this coefficient. Guttman's formula L2 would be a conservative estimate of the hypothetical self-correlation, save for the necessity of assuming that responses to one item are not influenced by responses to another item. Guttman's L2 is ordinarily greater than the estimate from the Kuder-Richardson formula.

    The coefficient of equivalence is lower than the hypothetical self-correlation. Kuder-Richardson formula (20) is an exact estimate of the coefficient of equivalence for tests where the item intercorrelation matrix has rank one; otherwise the estimate is conservative. This, however, like all estimates of equivalence, assumes experimental independence of items within one trial. The parallel-split method gives an estimate of the coefficient of equivalence. For an ideally large population, the highest split-coefficient is the best estimate, and estimates from other splits are conservative, save for the failure of independence of items.

    The coefficient of stability is lower than the hypothetical self-correlation. It is estimated by the test-retest correlation, but carry-over from one test to another may cause the estimate to be faulty.

    The parallel-tests correlation is an estimate of the coefficient of stability and equivalence. It may be unduly high if the two tests are not experimentally independent. Otherwise, the estimate will ordinarily be lower than the coefficient of stability or the coefficient of equivalence.

    A simple table may indicate the different meanings of the various procedures.
    In Table 1, checks indicate the variances which are included in the error of measurement, according to each procedure. In the absence of sampling error, any estimate of reliability is less than the hypothetical self-correlation, assuming experimental independence. Every procedure assumes either the experimental independence of trials or of items within the trials. This condition is rarely satisfied, and any obtained coefficient may therefore be higher than the coefficient supposed to be obtained.

    TABLE 1
    Variances Included in Error Variance of a Test, According to Various Formulations of the Reliability Problem*

    Test-Retest                                x x x x
    Parallel Test                              x x x x x
    Parallel Split                             x x
    Random Split                               x x x
    Kuder-Richardson (20)                      x x x
    Guttman L2                                 x †
    Hypothetical Self-Correlation              x
    Coefficient of Equivalence                 x x
    Coefficient of Stability                   x x x x
    Coefficient of Stability and Equivalence   x x x x x

    * An x indicates that the variance indicated is included in the error of measurement by the procedure or definition listed at the left.
    † In equations (31) and (43), Guttman sets up inequalities which overestimate the item error variance.

    Practical Implications

    No one "best" estimate of reliability exists. If one could validly make the assumption of stability between trials, and independence of trials, the test-retest correlation would be satisfactory. Frequently we must rely on single-trial estimates. Guttman's L2 or a parallel-split used with his L4 will in general give the highest coefficients. Where the test measures a single factor, the Kuder-Richardson formula (Guttman's L3) should be as useful as the other two procedures.
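    The three single-trial procedures named here can be compared on a small data set. The following is a minimal sketch under stated assumptions: the formulas for L1, L2, and L3 are written as they are usually given for Guttman (3), with population variances; the response matrix is hypothetical; and the search over every possible split merely stands in for a deliberately constructed parallel split, which it does not reproduce. For right-wrong items scored 0 or 1, L3 coincides with the Kuder-Richardson formula (20).

```python
from itertools import combinations


def covariance(x, y):
    """Population covariance of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def guttman_bounds(rows):
    """Return (L1, L2, L3) for a persons-by-items score matrix."""
    n = len(rows[0])
    items = [[row[j] for row in rows] for j in range(n)]
    total = [sum(row) for row in rows]
    total_var = covariance(total, total)
    l1 = 1 - sum(covariance(c, c) for c in items) / total_var
    # Sum of squared covariances over all ordered pairs of different items.
    c2 = sum(covariance(items[j], items[g]) ** 2
             for j in range(n) for g in range(n) if j != g)
    l2 = l1 + (n / (n - 1) * c2) ** 0.5 / total_var
    l3 = n / (n - 1) * l1
    return l1, l2, l3


def highest_split_l4(rows):
    """Largest value of formula (5) over all equal-size splits (even item count)."""
    n = len(rows[0])
    total = [sum(row) for row in rows]
    total_var = covariance(total, total)
    best = None
    # Keep item 0 in the first half so each split is visited once.
    for rest in combinations(range(1, n), n // 2 - 1):
        first = (0,) + rest
        a = [sum(row[j] for j in first) for row in rows]
        b = [t - x for t, x in zip(total, a)]
        l4 = 2 * (1 - (covariance(a, a) + covariance(b, b)) / total_var)
        best = l4 if best is None else max(best, l4)
    return best


# Hypothetical right-wrong responses: 8 examinees by 4 items.
responses = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0],
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1],
]
l1, l2, l3 = guttman_bounds(responses)
print(f"L1 {l1:.3f}  L2 {l2:.3f}  L3 (KR-20) {l3:.3f}  "
      f"highest split L4 {highest_split_l4(responses):.3f}")
```

    On any data L2 is at least as large as L3. With so few examinees the highest split value capitalizes on chance, so it illustrates the computation only and is not the large-population case discussed in the text.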

    In many situations, it is appropriate to seek a coefficient other than the hypothetical self-correlation. In correcting for attenuation, any of the coefficients described in this paper may be appropriate. Following the lead of Remmers and Whisler (11), one may distinguish between the "true instantaneous score" in a variable (related to the self-correlation or the coefficient of equivalence) and the "true score" in a trait (related to the coefficient of stability or of stability and equivalence). Sometimes one wishes to know the correlation between true scores in two traits postulated as stable over a period of time ("somatotype" vs. "temperament" is a typical problem). Here the appropriate coefficients for use in the attenuation formula are the coefficient of stability (if the trait is defined operationally by a specific test) or the coefficient of stability and equivalence (if the trait is defined by a family of similar tests). Other problems call for studying the relation between true instantaneous score in one variable (such as an aptitude test) and true score in another defined as stable (such as job performance). For this, the reliability of the former score would be based on a coefficient of equivalence (since the hypothetical self-correlation is not known), and the reliability of the latter would be based on one of the coefficients involving stability. The third possibility, and one of much theoretical importance, is a problem regarding true instantaneous scores in two variables, such as mood and performance. The correction for attenuation here requires use of two coefficients of equivalence.

    Similar reasoning applies to the problem of estimating the significance of changes in test score. If the identical test is given both times, the coefficient of stability is appropriate. The hypothetical self-correlation, if known, would test whether a significant change in behavior had occurred, although this change might be due to normal diurnal fluctuation. The coefficient of stability tests whether the change is greater than that "normally" to be expected due to function fluctuation. If growth is measured by equivalent tests, a coefficient of equivalence, or of stability and equivalence, is relevant.

    In evaluating a test, all four coefficients are of interest. For most purposes, one wishes to measure stable characteristics, so that a coefficient of stability is needed. For research purposes, however, a test having high instantaneous self-correlation or equivalence and low stability may be very satisfactory.

    The coefficient of stability is an abstraction; in reality, there is an indefinitely large number of such coefficients, corresponding to various time intervals between tests.
    For meaningful use of such a coefficient, it must be defined as "the coefficient of stability over one week," or the like. The coefficient also depends on the conditions affecting the subject between testings. Strictly speaking, a coefficient of stability may be carried over to a new situation only when the time interval and the conditions between testings are similar to those under which the coefficient was obtained. The coefficient of stability would be better understood if research were available showing how the coefficient varies with increasing time lapse.

    The following recommendations result from the analysis made above.

    1. Reliability for psychological measurement can never be observed as in the physical sciences, where variables are practically constant and non-hysteretic. All estimates of reliability require assumptions unlikely to be fulfilled.

    2. Several coefficients numerically less than the hypothetical self-correlation can be estimated. A distinction between these various coefficients should be made; the writer proposes the names coefficient of equivalence, coefficient of stability, and coefficient of stability and equivalence.

    3. The coefficient of equivalence may be estimated by the parallel-split method, using formula (5), Guttman's L4. The Kuder-Richardson formula (20) underestimates this coefficient unless the test item matrix has rank one. Guttman's L2 gives an underestimate of the hypothetical self-correlation which may or may not be higher than the coefficient of equivalence. All estimates of reliability or equivalence based on a single trial assume that test items are experimentally independent. To the extent that this is untrue, estimates may be erroneously high.

    4. The coefficient of stability may be estimated by the test-retest method, with an undetermined error due to failure of independence. The coefficient of stability and equivalence may be estimated by the correlation of parallel tests, with a similar error.

    5. In describing a test, the author should provide separate estimates of the coefficient of equivalence and the coefficient of stability. The time interval used in obtaining the coefficient of stability should be reported. If there are multiple forms, the coefficient of stability for each should be given.

    6. In practice, the coefficient of equivalence or the coefficient of stability may be used meaningfully where the reliability coefficient is called for. The coefficients are not interchangeable and have different meanings in corrections for attenuation, standard errors of measurement, and like applications. The hypothetical self-correlation, showing the extent to which a test measures real but possibly momentary differences in performance, is more important to the theory of measurement than to the practical use of tests.
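    The last recommendation can be made concrete with the two standard formulas it mentions: the correction for attenuation, r_xy / sqrt(r_xx r_yy), and the standard error of measurement, s sqrt(1 - r_xx). The following is a minimal sketch; every numerical value in it is hypothetical and chosen only to show that the result depends on which coefficient is entered as "the reliability."

```python
from math import sqrt


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimated correlation between true scores, given the two reliabilities."""
    return r_xy / sqrt(rel_x * rel_y)


def standard_error_of_measurement(sd, reliability):
    """Standard deviation of the errors implied by a reliability coefficient."""
    return sd * sqrt(1.0 - reliability)


# Hypothetical coefficients for one and the same test.
coefficients = {
    "equivalence": 0.90,
    "stability": 0.80,
    "stability and equivalence": 0.75,
}
observed_r = 0.45             # hypothetical test-criterion correlation
criterion_reliability = 0.70  # hypothetical reliability of the criterion
test_sd = 10.0                # hypothetical standard deviation of test scores

for name, rel in coefficients.items():
    corrected = correct_for_attenuation(observed_r, rel, criterion_reliability)
    sem = standard_error_of_measurement(test_sd, rel)
    print(f"{name}: corrected r = {corrected:.3f}, standard error = {sem:.2f}")
```

    The lower the coefficient entered, the larger both the corrected correlation and the standard error of measurement, so the choice among the coefficients should follow from whether instantaneous or stable true scores are in question.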

    REFERENCES

    1. Cronbach, L. J. A case study of the split-half reliability coefficient. J. educ. Psychol., in press.
    2. Cronbach, L. J. On estimates of test reliability. J. educ. Psychol., 1943, 34, 485-494.
    3. Guttman, L. A basis for analyzing test-retest reliability. Psychometrika, 1945, 10, 255-282.
    4. Holzinger, K. J. and Harman, H. Factorial analysis. Chicago: University of Chicago Press, 1941.
    5. Jackson, R. W. B. and Ferguson, G. A. Studies on the reliability of tests. Toronto: Department of Educational Research, Bulletin No. 12, 1941.
    6. Jenkins, J. G. Validity for what? J. consulting Psychol., 1946, 10, 93-98.
    7. Kelley, T. L. The reliability coefficient. Psychometrika, 1942, 7, 75-83.
    8. Kuder, G. F. and Richardson, M. W. The theory of the estimation of test reliability. Psychometrika, 1937, 2, 151-160.
    9. Lindquist, E. F. A first course in statistics. Boston: Houghton-Mifflin, 1942.
    10. London, I. D. Some consequences for history and psychology of Langmuir's concept of convergence and divergence of phenomena. Psychol. Rev., 1946, 53, 170-188.
    11. Remmers, H. H. and Whisler, L. Test reliability as a function of method of computation. J. educ. Psychol., 1938, 29, 81-92.