ORIGINAL ARTICLES

ON STATISTICAL APPROACHES TO THE STUDY OF CERAMIC ARTEFACTS USING GEOCHEMICAL AND PETROGRAPHIC DATA*

M. J. BAXTER and C. C. BEARDAH
School of Biomedical and Natural Sciences, Nottingham Trent University, Clifton, Nottingham NG11 8NS, UK

I. PAPAGEORGIOU
Department of Statistics, Athens University of Economics and Business, 76 Patission Str., 10434 Athens, Greece

M. A. CAU
Institució Catalana de Recerca i Estudis Avançats (ICREA)/ERAUB, Departament de Prehistòria, Història Antiga i Arqueologia, c/ de Baldiri Reixac s/n 08028 Barcelona, Spain

P. M. DAY
Department of Archaeology, University of Sheffield, Northgate House, West Street, Sheffield, S1 4ET, UK

and V. KILIKOGLOU
Laboratory of Archaeometry, Institute of Materials Science, NCSR Demokritos, Aghia Paraskevi, 15310 Attiki, Greece

Archaeometry 50, 1 (2008) 142–157, doi: 10.1111/j.1475-4754.2007.00359.x. Blackwell Publishing Ltd, Oxford, UK. ISSN 0003-813X.
*Received 8 November 2005; accepted 20 September 2006. © University of Oxford, 2007

The scientific analysis of ceramics often has the aim of identifying groups of similar artefacts. Much published work focuses on analysis of data derived from geochemical or mineralogical techniques. The former is more likely to be subjected to quantitative statistical analysis. This paper examines some approaches to the statistical analysis of data arising from both kinds of techniques, including ‘mixed-mode’ methods where both types of data are incorporated into analysis. The approaches are illustrated using data derived from 88 Late Bronze Age transport jars from Kommos, Crete. Results suggest that the mixed-mode approach can provide additional insight into the data.

KEYWORDS: CERAMICS, GEOCHEMICAL, LATE BRONZE AGE, MIXED-MODE, MULTIVARIATE ANALYSIS, PETROGRAPHIC, THIN SECTIONS

© University of Oxford, 2008

INTRODUCTION

The scientific analysis of archaeological ceramics is often undertaken with the aim of identifying groups of similar artefacts. Much published work focuses on the analysis of data derived from either geochemical or mineralogical techniques. Geochemical data lend themselves naturally to analysis by quantitative statistical methods. Since the mid-1970s numerous papers have been published on the use of multivariate analysis for the purpose of grouping such data (e.g., Bieber et al. 1976; Glascock 1992; Beier and Mommsen 1994; Baxter and Buck 2000).



Mineralogical data (and the focus in this paper is on those produced by thin-section petrography) are less frequently recorded in a manner that invites quantitative analysis, and studies in which both geochemical and mineralogical data are used, and combined, in a quantitative way are comparatively rare. The aim in the present paper is to investigate some possible approaches to what we shall call ‘mixed-mode’ analysis, in which both kinds of data are studied jointly in a quantitative fashion. The motivation is that the joint study of both forms of data is likely to be more informative than separate study.

The main idea is to undertake a series of analyses that give different weights to the two types of data. At one extreme only the thin-section data are used, while at the other only chemical data are used. If these tell the same ‘story’, there is no real need to combine them in an analysis. If apparently different patterns are observed using the two different data types then it is possible that using them in combination will show interpretable patterning in the data, not readily seen using either type separately.

The methods developed will be illustrated on a sample of 88 transport jars found in excavations at Kommos, Crete. In the concluding section we review what has been achieved and consider, briefly, other approaches to analysis that might be adopted.

One of the referees for this paper questioned the validity of combining petrographic and geochemical data in a single analysis and concluded that ‘perhaps’ it was acceptable, ‘but it is also dangerous’. One reason for this suggestion (we comment on other reasons later) is that it may be preferable to investigate the petrographic and geochemical data separately, and test the results against each other, investigating, in particular, the reasons for any discrepancies. In fact, as noted in the discussion, our approach allows for this if it is the preferred option, and the issue is discussed in Beardah et al. (2003). The focus in the present paper is on what we have termed ‘mixed-mode’ analysis, but in the light of the referee’s comment we should emphasize that we present it as ‘an’ approach to analysis rather than ‘the’ approach.

PRINCIPLES OF STATISTICAL ANALYSIS

Notation and general principles

Let $X_c$ be an $n \times p$ data matrix describing the chemical composition of $n$ artefacts. If thin sections are available for each artefact, using the methods described in Cau et al. (2004), these can be coded in the form of an $n \times q$ data matrix $X_m$. The $q$ variables are binary, taking on the values 1 or 0, which reflect the presence or absence of qualities of the thin section deemed relevant to the purpose of the analysis in mind.

This method of coding thin sections is a flexible approach that could be undertaken in more than one way. In Cau et al. (2004), and here, a set of primary variables relating to the technology, rock types and rock-forming minerals is initially defined. These are categorical variables with variable $k$ having $L_k$ levels, from which $L_k$ dummy variables corresponding to the levels can be defined. This is done for each variable in turn, giving rise to an $n \times q$ matrix, where $q = \sum_k L_k$. Each row of this matrix consists of 0s and 1s, the 1s corresponding to features observed in the thin section. Full details are given in Cau et al. (2004). The system is not prescriptive, as researchers are at liberty to define those variables considered to be most appropriate to their problem.
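The dummy coding described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `dummy_code` and the two primary variables and their levels are hypothetical, and are not the coding scheme of Cau et al. (2004).

```python
import numpy as np

def dummy_code(primary, levels):
    """Expand categorical primary variables into 0/1 dummy variables.

    primary: list of n rows; each row holds one level label per primary variable.
    levels:  list of K lists; levels[k] holds the L_k admissible labels of variable k.
    Returns an n x q integer matrix, where q is the sum of the L_k.
    """
    n = len(primary)
    q = sum(len(lv) for lv in levels)
    X_m = np.zeros((n, q), dtype=int)
    offset = 0
    for k, lv in enumerate(levels):
        column = {label: j for j, label in enumerate(lv)}
        for i in range(n):
            X_m[i, offset + column[primary[i][k]]] = 1  # mark the observed level
        offset += len(lv)
    return X_m

# Hypothetical primary variables: grain size (L_1 = 3) and dominant inclusion (L_2 = 2).
levels = [["fine", "medium", "coarse"], ["calcareous", "volcanic"]]
primary = [["fine", "volcanic"], ["coarse", "calcareous"], ["medium", "volcanic"]]
X_m = dummy_code(primary, levels)  # 3 x 5 matrix of 0s and 1s
```

With this particular coding every row contains exactly one 1 per primary variable, so the number of columns is $q = 3 + 2 = 5$.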

The same referee who queried the merits of combining petrographic and geochemical data in a single analysis also wondered, at this point, whether this ‘liberty’ allowed researchers to choose variables that supported their preconceived ideas about the petrographic typology. The question is a legitimate and interesting one. The possibility exists, but is not necessarily an evil one. If the quantification ‘supports’ a typology determined by other means, it can be argued that it is doing its job, while allowing a more formal comparison of typologies suggested by different methods (petrographic and chemical). The referee also suggested that combining the data types might adjust the geochemical groupings so that they fall more in line with predetermined typological groupings, resulting in a ‘spurious validity’ being assigned to the groupings; this is also a valid point. We would simply reiterate here that, while we are focusing on ‘mixed-mode’ analysis, our philosophy does allow for the comparison of separate petrographic and geochemical analyses. The quantification of the petrographic data, however undertaken, can be viewed as one way of facilitating this.

The previous paragraph is an attempt to discuss an important issue in a reasonably general way. As far as the present study is concerned, the original petrographic classification, and subsequent coding of the data, were undertaken by different individuals. While the coding was undertaken with knowledge of the classification, it was based on a system developed independently for a different data set (Cau et al. 2004), and judged to be suitable for the purpose to hand. That is, the petrographic attributes used were not specifically chosen to, in the referee’s words, ‘best support the pre-determined typological classification’, although as our results show they do support the classification well. While the purpose of the present paper is primarily to derive and illustrate methodology, researchers tempted by it do need to give careful thought to the issues raised above and in the concluding paragraph of the introduction.

The data matrix $X_c$ may be analysed by standard methods such as principal component analysis (PCA) or cluster analysis, usually after transformation and/or standardization of the variables. If $Z_c$ denotes the $n \times r$ matrix of scores on the first $r$ principal components, the usual hope is that two- or three-dimensional plots based on a subset of the columns of $Z_c$ will reveal interpretable structure in the data.
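As an illustration of how a score matrix such as $Z_c$ might be obtained, the sketch below computes principal component scores from standardized variables through a singular value decomposition. The function name `pca_scores` and the synthetic data are assumptions made for the example; any prior transformation of the chemical data is left to the analyst.

```python
import numpy as np

def pca_scores(X, r=2):
    """Scores on the first r principal components of column-standardized X."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each variable
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)  # Z = U diag(s) Vt
    return U[:, :r] * s[:r]                           # n x r matrix of scores

# Synthetic stand-in for an n x p chemical data matrix (not the Kommos data).
rng = np.random.default_rng(0)
X_c = rng.normal(size=(20, 5))
Z_c = pca_scores(X_c, r=2)
```

The columns of `Z_c` are the coordinates that would be plotted in a two-dimensional component plot.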

The data matrix $X_m$ may be treated in an essentially similar way, allowing for its binary nature, either by using correspondence analysis (CA), which can be thought of as a weighted form of PCA, or by the direct application of PCA, which is equivalent to classical metric multidimensional scaling (MDS). Other forms of (non-metric) MDS are also available. In practice it can be useful to compare different methods, since they can emphasize different (interpretable) structures in the data. Any single analysis results in a matrix of scores, $Z_m$, which can be used in the same way as $Z_c$ for identifying structure in the data.

Two possible approaches to mixed-mode analysis are described. In the first a matrix of scores $Z_\lambda$ is obtained, where $\lambda$ reflects the relative ‘weight’ given to the two kinds of data. For $\lambda = 0$, $Z_m$ is obtained, while as $\lambda$ increases through whole-number values $Z_\lambda \to Z_c$. In the second approach a matrix of scores $Z_\mu$ is obtained, which behaves in a similar way to $Z_\lambda$ as $\mu$ varies smoothly from 0 to 1.

Comparing different analyses

Principles will be discussed first before some important practicalities are noted. Most simply, and informally, two- or three-dimensional plots based on (subsets of) the $r$ components may be compared visually.

More formal comparisons may be undertaken using some form of Procrustes statistic. Such statistics measure how close two $r$-dimensional configurations of data are after rotating, reflecting and rescaling the data to match the configurations as closely as possible. For two sets of scores, $Z_i$ and $Z_j$, one such statistic, developed by Sibson (1978), is defined as


$$\gamma = 1 - \frac{\left\{\operatorname{tr}\left[(Z_i^{T} Z_j Z_j^{T} Z_i)^{1/2}\right]\right\}^{2}}{\operatorname{tr}(Z_i^{T} Z_i)\,\operatorname{tr}(Z_j^{T} Z_j)},$$

where $\operatorname{tr}(\cdot)$ is the trace and $T$ the matrix transpose operator. Other Procrustes statistics could be used; however, this one has the merit of symmetry, as $Z_i$ and $Z_j$ can be interchanged without affecting the result. It takes values between 0 and 1, with 0 arising for identical configurations.
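A minimal sketch of how this statistic might be computed is given below. It relies on the fact that the trace of the matrix square root of $Z_i^{T} Z_j Z_j^{T} Z_i$ equals the sum of the singular values of $Z_i^{T} Z_j$. The function name `sibson_gamma` and the synthetic configurations are assumptions for illustration, and both configurations are assumed to be centred already, as PCA scores are.

```python
import numpy as np

def sibson_gamma(Zi, Zj):
    """Procrustes statistic gamma for two n x r configurations (assumed centred)."""
    # tr[(Zi^T Zj Zj^T Zi)^(1/2)] is the sum of the singular values of Zi^T Zj.
    s = np.linalg.svd(Zi.T @ Zj, compute_uv=False)
    return 1.0 - s.sum() ** 2 / (np.trace(Zi.T @ Zi) * np.trace(Zj.T @ Zj))

rng = np.random.default_rng(1)
Z = rng.normal(size=(30, 2))
Z -= Z.mean(axis=0)                       # centre the configuration

theta = 0.7                               # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Z_rot = 2.5 * Z @ R                       # same configuration, rotated and rescaled

W = rng.normal(size=(30, 2))
W -= W.mean(axis=0)                       # an unrelated configuration

gamma_same = sibson_gamma(Z, Z_rot)       # essentially zero
gamma_diff = sibson_gamma(Z, W)           # clearly greater than zero
```

Rotation and rescaling leave the statistic at zero, which is the matching behaviour described in the text, while unrelated configurations give a value between 0 and 1.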

The main practical consideration is that both visual and formal comparisons can be badly affected by outliers present in either the chemical or mineralogical data sets. The sensible approach here is to identify obvious outliers from either analysis, remove any such outliers from both analyses, repeat the statistical analysis, and proceed in an iterative fashion until outliers are judged not to be a problem. The subset of outliers, if any, identified in this way should not, of course, be ignored, but should be considered separately when substantive interpretation of the data is attempted.

Mixed-mode analysis—first approach

Our first approach to mixed-mode analysis rests on the idea of defining a dissimilarity coefficient between cases using all the available data, and subjecting the resulting dissimilarity matrix to some form of MDS. This requires that dissimilarity between cases be defined. A seminal paper in this regard is Gower (1971). Let $d(i, j)$ be the dissimilarity coefficient between cases $i$ and $j$ and let $m = (p + q)$. Kaufman and Rousseeuw (1990) generalize Gower’s coefficient by defining

$$d(i, j) = \frac{\sum_{k=1}^{m} \delta_{ij}^{(k)} d_{ij}^{(k)}}{\sum_{k=1}^{m} \delta_{ij}^{(k)}} \in [0, 1],$$

where $d_{ij}^{(k)}$ is the contribution of variable $k$ to $d(i, j)$ and $\delta_{ij}^{(k)}$ is the weighting of variable $k$ and depends on the variable type (for details concerning computation, see the appendix).

This is a fairly general definition, and for present purposes we specialize to the case where variables are binary or continuous. For continuous data,

$$d_{ij}^{(k)} = \frac{|x_{ik} - x_{jk}|}{r_k}$$

and $\delta_{ij}^{(k)} = 1$, where $r_k$ is the range of variable $k$, so that the contribution of the variable is between 0 (identical) and 1 (most different). Here, $x_{ik}$ is the value of variable $k$ for case $i$.

Binary variables may be treated symmetrically or asymmetrically. In the former case, 0–0 and 1–1 matches are treated as equally indicative of similarity; in the latter case, 0–0 matches are not regarded as indicative of similarity. In an asymmetric treatment, which is the one used here, the fact that two thin sections do not, for example, include a particular rock type is not regarded as indicative of similarity, whereas the fact that they do is regarded as evidence of similarity. Thus, define $d_{ij}^{(k)}$ to be 0 if $x_{ik} = x_{jk}$ and 1 otherwise, and define $\delta_{ij}^{(k)} = 1$ unless $x_{ik} = x_{jk} = 0$, in which case it is equal to 0.

Notwithstanding suggestions to the contrary in the first edition of Shennan’s (1988) text, Gower’s coefficient does not seem to have been widely used in published archaeological applications (Baxter 2003, 94). One possible reason for this is that such analyses tend to be dominated by the binary data, $X_m$, at the expense of the continuous data $X_c$. For the situation to which we have specialized, a possible way round this potential problem is to generalize the definition of $d(i, j)$ as follows. Define

$$B = \sum_{k=1}^{q} \delta_{ij}^{(k)} d_{ij}^{(k)}$$

and

$$C = \sum_{k=1}^{p} d_{ij}^{(k)},$$

which are the contributions to the numerator of $d(i, j)$ of the binary and continuous variables. Now generalize the definition of $d(i, j)$ to

$$d(i, j) = \frac{B + \lambda C}{b + \lambda p} \in [0, 1],$$

where $b$ is the number of binary variables for which $\delta_{ij}^{(k)} = 1$, and $\lambda$ is a weighting factor that has the value 1 in the original definition.

For practical applications it is necessary to decide on a suitable value of $\lambda$ and calculate the $d(i, j)$. Given software that allows calculation of Gower’s coefficient, or generalizations of it, the following simple method has proved to be effective. Use the notation

‘$X_m + X_c$’ $= [X_m \mid X_c]$

to refer to the partitioned data matrix of the original data. Analysis of this corresponds to using $\lambda = 1$ in the analysis. To give more weight to the chemical data, the idea is to augment this data matrix with copies of $X_c$ so that, for example,

‘$X_m + 2X_c$’ $= [X_m \mid X_c \mid X_c]$

would correspond to a choice of $\lambda = 2$.

Rather than attempting to determine an ‘optimal’ value of $\lambda$, we have found it useful to examine a series of views for different values. For a sufficient (relatively small) number of copies of $X_c$, the analysis is essentially that of the chemical data only; for $\lambda = 0$, analysis is of the mineralogical data only. Computational aspects of this approach are discussed in the appendix.

Another practical concern is that it is to be expected that, for data matrices of the size and type being used, no single low-dimensional projection of the data will reveal all the structure present. This suggests that outliers and groups revealed in initial analysis of the data should be removed, and analysis repeated to reveal further structure in the data. This will be referred to as ‘iterative’ analysis, or ‘peeling off’ of the more obvious outliers and structure in the data.

Mixed-mode analysis—second approach

The mixed-mode analysis described above relies upon the ability to calculate (for example, via Gower’s coefficient) a measure of the dissimilarity between cases when variables are of differing type. The resulting dissimilarity matrix is then subjected to some form of MDS. An alternative ‘weighted’ mixed-mode analysis can be undertaken by separately calculating dissimilarities between cases on the basis of (a) the continuous chemical data and (b) the mineralogical binary data, before combining this information as described below.

Let $D_c$ be the dissimilarity matrix for the chemical compositional data and $D_m$ that for the coded mineralogical binary data, and assume that these are scaled so that entries lie between 0 and 1. We can now form a new dissimilarity matrix, $D_\mu$, using

$$D_\mu = \mu D_c + (1 - \mu) D_m.$$

Here, $\mu$ is a mixing parameter that lies between 0 and 1. If $\mu = 0$, then $D_\mu = D_m$ and the dissimilarities are based only on the mineralogical data. On the other hand, if $\mu = 1$, then $D_\mu = D_c$ and the dissimilarities are based only on the chemical compositional data. Weighted mixed-mode analyses can be performed by choosing intermediate values of $\mu$. Values of $\mu$ close to 0 assign greater weight to the binary data, whilst values close to 1 favour the chemical compositional data; a value of $\mu = 0.5$ would give the two analyses equal weight. For display purposes, the resulting dissimilarity matrices are subjected to some form of MDS. In principle, we prefer to display results for a sequence of values as $\mu$ is changed from 0 to 1, rather than attempting to identify an optimum value of $\mu$, but see the final section for further discussion of this.
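The mixing step and the subsequent display can be sketched as follows. This is an illustration under stated assumptions rather than the computation used for the Kommos data: the data are synthetic, the choice of scaled Euclidean distance for the chemical block and a simple mismatch proportion for the binary block is arbitrary, and classical metric MDS stands in for ‘some form of MDS’. The function names are hypothetical.

```python
import numpy as np

def mix_dissimilarities(D_c, D_m, mu):
    """Return D_mu = mu * D_c + (1 - mu) * D_m for 0 <= mu <= 1."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie between 0 and 1")
    return mu * D_c + (1.0 - mu) * D_m

def classical_mds(D, r=2):
    """Classical metric MDS (principal coordinates) of a dissimilarity matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred squared dissimilarities
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:r]              # indices of the r largest eigenvalues
    return V[:, top] * np.sqrt(np.clip(w[top], 0.0, None))

# Synthetic stand-ins for the chemical and coded petrographic data.
rng = np.random.default_rng(2)
X_c = rng.normal(size=(12, 4))
X_m = rng.integers(0, 2, size=(12, 6))

# Assumed dissimilarities, each scaled to lie between 0 and 1.
D_c = np.linalg.norm(X_c[:, None, :] - X_c[None, :, :], axis=2)
D_c /= D_c.max()
D_m = np.abs(X_m[:, None, :] - X_m[None, :, :]).mean(axis=2)

# One two-dimensional view per value of mu, from petrography only to chemistry only.
views = {mu: classical_mds(mix_dissimilarities(D_c, D_m, mu))
         for mu in (0.0, 0.25, 0.5, 0.75, 1.0)}
```

Each entry of `views` is an $n \times 2$ score matrix that could be plotted, giving the sequence of displays described in the text.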

The approach just described is essentially a specialized version of the method discussed in unpublished work of Neff et al. (1988). They, in turn, ascribe the methodology to Romesburg (1984). So far as we are aware, it has not been illustrated in published archaeometric applications.
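For comparison with the second approach, the weighted coefficient of the first approach, $(B + \lambda C)/(b + \lambda p)$ with asymmetric binary variables, can be sketched directly. The function name `weighted_gower`, the tiny data set, and the convention used when the denominator is zero are assumptions for illustration; the computation described in the appendix may differ.

```python
import numpy as np

def weighted_gower(X_m, X_c, lam=1.0):
    """Dissimilarities (B + lam*C) / (b + lam*p), asymmetric binary variables.

    Assumes every continuous variable has a non-zero range.
    """
    X_m = np.asarray(X_m)
    X_c = np.asarray(X_c, dtype=float)
    n, p = X_c.shape
    ranges = X_c.max(axis=0) - X_c.min(axis=0)           # r_k for each variable
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            delta = ~((X_m[i] == 0) & (X_m[j] == 0))     # 0-0 matches get weight 0
            B = np.sum(delta & (X_m[i] != X_m[j]))       # binary mismatches
            b = np.sum(delta)                            # binary variables counted
            C = np.sum(np.abs(X_c[i] - X_c[j]) / ranges) # range-scaled differences
            denom = b + lam * p
            # denom is zero only if lam = 0 and both binary rows are all zeros;
            # returning 0 in that case is a convention adopted for this sketch.
            D[i, j] = (B + lam * C) / denom if denom > 0 else 0.0
    return D

# Small hypothetical data set: three cases, four binary and two continuous variables.
X_m = np.array([[1, 0, 1, 0],
                [1, 1, 0, 0],
                [0, 0, 1, 1]])
X_c = np.array([[1.0, 10.0],
                [2.0, 30.0],
                [4.0, 20.0]])

D_1 = weighted_gower(X_m, X_c, lam=1)
D_2 = weighted_gower(X_m, X_c, lam=2)
# Appending a second copy of X_c with lam = 1 reproduces the lam = 2 result.
D_aug = weighted_gower(X_m, np.hstack([X_c, X_c]), lam=1)
```

The last line checks the augmentation device numerically: analysing $[X_m \mid X_c \mid X_c]$ with unit weight gives the same dissimilarities as $\lambda = 2$.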

EXAMPLES

To illustrate, data from a set of 88 Late Bronze Age transport jars found in excavations at Kommos, Crete, will be used. Using neutron activation analysis, all samples were analysed for the elements Sm, Lu, Yb, Na, Ca, Ce, Th, Cr, Hf, Cs, Rb, Sc, Fe, Co, Eu, La, As, Sb, U and Tb. For reasons of precision, the last four of these were not used in statistical analysis. Additionally, four samples had incomplete chemical information for the elements used, and have been omitted in all the analyses to follow.

Initially, on the basis of typological evidence, the jars were (separately and independently of statistical analysis) classified as Cretan (34 samples) and imported material (54). The latter group was classified as Canaanite (32) or Egyptian (22). Subsequently, on the basis of the thin sections the Cretan material was divided into 10 fabric groups, and the imported material into 12. Of these 22 groups, nine consist of a single specimen, and a further six of two specimens only. For the purposes of the quantitative analyses to be described here the thin-section data were coded in binary form, as discussed in the first part of the previous section, and described fully in Cau et al. (2004), using a coding system similar to that given there.

In what follows all the data are analysed, to illustrate our general approach. It proves possible to separate out most of the Cretan samples from the imports. A more detailed analysis is then undertaken of the imported material. It should be emphasized that while some discussion of the archaeological import of our analyses is necessary, the prime emphasis is on illustrating methodological matters. A full archaeological discussion will be published elsewhere.

Table 1 is intended as a succinct reminder of the methodologies used.

Example 1—analysis of all the data

Figure 1, based on a correspondence analysis (CA) of the petrographic data only, shows four clear petrographic outliers. Two of these correspond to singleton fabric groups identified in the original interpretation of the thin sections, the remaining two (closely associated) outliers corresponding to a similarly identified group of two cases. All are Canaanite samples. The Cretan material separates out fairly well from the imported material, apart from three samples that seem similar to Egyptian samples. A PCA of the standardized chemical data (not illustrated) identified four clear chemical outliers, one of which was also a petrographic outlier. All four chemical outliers correspond to samples identified as ‘loners’ in the original interpretation of the thin sections. The seven outliers (three petrographic, three chemical, and one both a petrographic and chemical outlier) were subsequently omitted from all analyses, to allow formal comparisons between different plots.

Thus, Figures 2 and 3 repeat the CA and PCA analyses after this omission, and are labelled according to whether the samples are Cretan, Canaanite or Egyptian. In the petrographic analysis of Figure 2, separation between the three groups is reasonably good, although the boundary between the Canaanite and Egyptian samples would be difficult to identify without prior knowledge of the identifications. Two of the Cretan samples seem more akin to Egyptian samples, while one Egyptian sample is firmly located within the Canaanite group. We note, again in response to a request for clarification from a referee, that the labelling ‘Canaanite’ and so on was initially undertaken without reference to the petrography, but there is some ‘overlap’ in the petrographic classification of the Canaanite and Egyptian samples, so that clear separation is not to be expected in the statistical analysis of the petrographic data.

Table 1 In approach 1, a modified version of Gower’s similarity coefficient is used in which the value of λ determines the relative weight given to the geochemical data. For λ = 0, the analysis is based on the petrographic data alone; λ = 1 notionally gives equal weight to both forms of data but, for reasons noted in the text, values greater than 1 may sometimes be preferred. What constitutes ‘Large’ may depend on the data set; in our application values of about 5 or 6 ensured the dominance of the geochemical data. In approach 2, separate dissimilarity matrices are computed for the two kinds of data, which are then combined with a weighting determined by µ, where 0 ≤ µ ≤ 1

                  Petrography    Mixed (equal weight)    Chemistry
Approach 1 (λ)    0              1                       ‘Large’
Approach 2 (µ)    0              0.5                     1

Figure 1 A three-dimensional component plot based on correspondence analysis of the Kommos petrographic data, using both Cretan and imported material. The main aim is to identify four petrographic outliers, omitted from subsequent analyses. Key to samples: o, Cretan; ×, Canaanite; +, Egyptian.

In the chemical analysis of Figure 3, all but two of the Cretan samples separate out clearly from all but two of the imported samples. The separation between the Egyptian and Canaanite material is less good than for the petrographic analysis. The broad patterns evident in the two analyses are similar, but the detail is not, as evidenced by visual comparison of Figures 2 and 3, and the fact that γ = 0.60. This suggests that it is worth attempting a mixed-mode analysis.

Figure 2 A two-dimensional component plot based on correspondence analysis of the Kommos petrographic data, using both Cretan and imported material and omitting the four outliers identified in Figure 1. A further three outliers, identified in analysis of the chemical data, have also been omitted. Key to samples: o, Cretan; ×, Canaanite; +, Egyptian.

Figure 3 A two-dimensional component plot based on PCA of the Kommos chemical data, using both Cretan and imported material and omitting seven chemical or petrographic outliers as in Figure 2. Key to samples: o, Cretan; ×, Canaanite; +, Egyptian.

Figure 4 shows the results from the first approach to mixed-mode analysis, using λ = 1. The second approach to mixed-mode analysis that was outlined produced similar results for µ = 0.5 and is not illustrated. Experimentation with values of λ > 1 and µ ≠ 0.5 did not produce any additional insights into the data. Arguably, the results are more satisfactory than for the chemical and petrographic analyses alone. All the Cretan material now separates from all but two of the imports (which remain located within the Cretan group on inspection of higher-order components). The Egyptian material is possibly slightly better separated from the Canaanite material, one sample apart, than in the petrographic analysis, although this is a fine judgement to make.

To summarize the analysis so far, treating the division into Cretan, Canaanite and Egyptian material as given, we have shown that the ‘mixed-mode’ approach to analysis is slightly more successful at recovering these distinctions than the separate analysis of chemical or (quantitatively formulated) petrographic data. This is after ‘peeling off’ some obvious outliers, and removing them from both the chemical and petrographic data sets to facilitate comparisons.

Example 2—analysis of the imported material only

With the exception of between two and four cases, the Cretan material is convincingly separated from the imported material, and for the purposes of further illustration we shall now concentrate on the latter, with a view to seeing how well the Egyptian and Canaanite samples can be separated, and whether there are subgroups within them.

On the basis of the original thin-section analysis, the imported material was classified into 12 fabric groups. Of these, five groups consisted of only one sample and two groups of only two samples. The remaining five fabric groups contained between three and 20 samples. Four of the fabric groups were divided into subgroups, consisting in some cases of just two or three samples. Three samples had incomplete chemical information.

Figure 4 A two-dimensional component plot based on mixed-mode analysis (λ = 1) of the Kommos chemical and petrographic data, omitting chemical and petrographic outliers. Key to samples: o, Cretan; ×, Canaanite; +, Egyptian.


Initial data analysis using the petrographic data confirmed that most of the very smallgroups were indeed distinctive and, in the ‘peeling-off’ procedure previously described, afterthree iterations most of the singletons and doubleton groups and subgroups were removedfrom the analysis.

A similar analysis based on the chemical data largely identified the same outliers as thepetrographic analysis. The main exception to this generalization is that a fabric group of twosamples was clearly chemically (though not petrographically) distinct. After removing all outliersin advance of further analysis the four largest groups were left, along with one doubleton, anda single survivor from a fabric group of three that had been classed in a subgroup of its own.

In Figures 5–8, Egyptian samples have been labelled with an ‘E’ (these include the doubletonnoted above), while Caananite samples are labelled 1–5 according to their original classificationbased on the thin-section analysis. Some of the finer distinctions made in the original classificationhave been suppressed but, where appropriate, we will note these in our commentary. Caananitegroup 1 consists of specimens that were grouped with the bulk of the Egyptian specimens inthe original thin section analysis. The singleton labelled 5 in the plots—the ‘single survivor’referred to above—has been retained.

Two points should be emphasized here. The first is that the coding system used for quantitativeanalysis is designed to reflect the properties of the thin sections used to define the fabricgroups. It is therefore to be hoped that this analysis will identify the more obvious features of thefabric grouping, such as singletons. That it does so suggests that the approach to quantificationused is sufficiently sensitive to be combined with the chemical data in a mixed-modeapproach. We shall also see shortly that it is capable of suggesting subgroups not identified inthe original thin-section analysis.

The second point is that the emergence of some samples as clear petrographic outliers is only obvious after having ‘peeled off’ the Cretan data. This illustrates that low-dimensional plots are initially dominated by the more obvious structure in the data, with subtler features becoming evident only after that structure is stripped out.
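The masking effect described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: in the analyses reported, outliers were identified by inspecting ordination plots, whereas the function `peel_outliers`, its distance-from-centroid rule and the threshold `k` are hypothetical stand-ins introduced for this example, and the plotting positions are held fixed rather than recomputed after each pass.

```python
from math import dist
from statistics import fmean


def peel_outliers(coords, k=2.0, max_rounds=3):
    """Iteratively remove points lying far from the centroid of a 2-D plot.

    coords: dict mapping a sample label to its (x, y) plotting position.
    A point is flagged when its distance from the centroid exceeds k times
    the mean distance (a hypothetical rule, not the authors' criterion).
    Returns (kept, rounds), where rounds lists the labels removed each pass.
    """
    kept = dict(coords)
    rounds = []
    for _ in range(max_rounds):
        if len(kept) < 3:
            break
        cx = fmean(p[0] for p in kept.values())
        cy = fmean(p[1] for p in kept.values())
        d = {lab: dist(p, (cx, cy)) for lab, p in kept.items()}
        cutoff = k * fmean(d.values())
        flagged = sorted(lab for lab, v in d.items() if v > cutoff)
        if not flagged:
            break
        rounds.append(flagged)
        for lab in flagged:
            del kept[lab]
        # In a real analysis the ordination would be recomputed on the
        # reduced data at this point before looking for further outliers.
    return kept, rounds


# Made-up positions: a tight cluster, one extreme point and one milder one.
example = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (1, 1),
           "e": (0.5, 0.5), "far": (50, 50), "mid": (6, 6)}
kept, rounds = peel_outliers(example)
```

With these made-up positions the milder outlier is flagged only in the second pass, once the extreme point no longer dominates the centroid and the scale.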

Figure 5 A two-dimensional component plot based on correspondence analysis of the Kommos petrographic data for the imported material omitting outliers. Numbers identify Canaanite fabric groups and ‘E’ indicates an Egyptian origin.


Figures 5–7 are similar to Figures 2–4. In Figure 5, based on the quantified petrographic data, the Egyptian samples that separate out to the right of the plot form two subgroups (not separated in the thin-section classification). Intermingled with the larger of these is a Canaanite sample. All these samples were originally classified into the same subgroup. Egyptian and Canaanite samples (coded 1) that do not plot to the right all belong to different groups or subgroups.

Figure 6 A two-dimensional component plot based on PCA of the Kommos chemical data for the imported material omitting outliers. Numbers identify Canaanite fabric groups and ‘E’ indicates an Egyptian origin.

Figure 7 A two-dimensional component plot based on mixed-mode analysis (λ = 1) of the Kommos imported material omitting outliers. Numbers identify Canaanite fabric groups and ‘E’ indicates an Egyptian origin.


The Canaanite fabric group 3 of five samples (only three show, as there are coincident plotting positions) and the singleton fabric 5 plot coherently at the top-centre. The other main concentration to the left of the plot mixes samples from the other two larger Canaanite groups, 2 and 4.

The chemical analysis of Figure 6, as might be expected, does less well at separating out the fabric groups defined on the basis of thin-section analysis. The Egyptian material, with the exceptions noted for Figure 5, plots coherently at the bottom of the plot, along with both the Canaanite samples associated with the Egyptian samples in the original petrographic classification. Three of four group 4 samples plot fairly well together, as do eight of nine group 2 fabrics, and the separation is better than in the petrographic analysis of Figure 5. Group 3 does not plot coherently.

Figure 7 shows the results from our first mixed-mode approach with λ = 1. It does a better job at distinguishing the Egyptian from the Canaanite material, while suggesting that the main Egyptian group could be divided into two. Only one Egyptian sample, from a separate subgroup in the original classification, plots well away from the main concentration. The picture is otherwise fairly similar to Figure 5, with one group 3 sample plotting much further away from the main group 3 concentration.

None of the plots examined so far convincingly isolates fabric groups 2 and 4 from each other and from the remaining groups, but we observed that three of the four samples from group 4 did separate out in Figure 6. This suggests that increasing the weight given to the chemical data might effect better separation, and Figure 8 shows the results of using the first mixed-mode approach with λ = 3. This, more so than the other plots examined, shows most of groups 2 and 4 plotting coherently and separately. There are individual samples within these fabric groups, and others, that are clearly petrographically and/or chemically very distinct from their fellows as originally identified.
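The effect of changing λ can be illustrated with a minimal Python sketch of a weighted Gower-type dissimilarity. It rests on assumptions that are not confirmed by this section: λ is taken to be a weight applied to each chemical variable relative to a weight of 1 for each petrographic variable, the petrographic codes are treated as nominal categories, and the function name `gower_mixed` and the toy values are hypothetical. It is not the `daisy` implementation used for the analyses.

```python
def gower_mixed(chem, petro, lam=1.0):
    """Weighted Gower-type dissimilarity for mixed data.

    chem:  list of rows of continuous (chemical) values.
    petro: list of rows of categorical (petrographic) codes.
    lam:   assumed weight of each chemical variable; each petrographic
           variable has weight 1.
    Returns a symmetric n x n matrix with entries in [0, 1].
    """
    n = len(chem)
    p_chem, p_petro = len(chem[0]), len(petro[0])
    # Range of each chemical variable, used to scale differences to [0, 1].
    ranges = []
    for j in range(p_chem):
        col = [row[j] for row in chem]
        r = max(col) - min(col)
        ranges.append(r if r > 0 else 1.0)
    total_weight = lam * p_chem + p_petro
    D = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            # Continuous part: range-scaled absolute differences.
            s = sum(lam * abs(chem[a][j] - chem[b][j]) / ranges[j]
                    for j in range(p_chem))
            # Categorical part: 1 for each mismatching code.
            s += sum(1.0 for j in range(p_petro) if petro[a][j] != petro[b][j])
            D[a][b] = D[b][a] = s / total_weight
    return D


# Toy data (made up): three samples, two elements, two petrographic codes.
chem = [[10.0, 2.0], [20.0, 4.0], [10.0, 4.0]]
petro = [["fine", "mica"], ["coarse", "mica"], ["fine", "mica"]]
D_equal = gower_mixed(chem, petro, lam=1.0)
D_chem_heavy = gower_mixed(chem, petro, lam=3.0)
```

Raising `lam` makes pairs that differ chemically look more dissimilar relative to pairs that differ only petrographically, which is the kind of change in emphasis that distinguishes Figure 8 from Figure 7. A two-dimensional plot would then be obtained by applying MDS to the resulting matrix.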

Analyses were also undertaken using our second mixed-mode approach, and these produced essentially similar results to those for our first approach. We comment on this in a little more detail in our concluding discussion.

Figure 8 A two-dimensional component plot based on mixed-mode analysis (λ = 3) of the Kommos imported material omitting outliers. Numbers identify Canaanite fabric groups and ‘E’ indicates an Egyptian origin.


DISCUSSION AND CONCLUSION

The main aim of this paper has been to propose, illustrate and evaluate what we have termed a ‘mixed-mode’ approach to the quantitative analysis of chemical and petrographic data obtained in ceramic provenance studies. A novel feature of our approach, we think, is our advocacy of the merits of examining several different views of the data, rather than trying to identify an ‘optimal’ view that captures all the interesting features in the data. These different ‘views’ correspond to the different weighting given to the chemical and petrographic data, with at one extreme only the chemical data being used, and at the other extreme only the petrographic.

A perfectly valid approach to mixed-mode analysis, though we have gone further, would be simply to compare, and synthesize the conclusions from, these two extreme views. This requires that the petrographic data be quantified, a subject discussed in detail in Cau et al. (2004). Another feature of our approach is that we advocate an iterative approach to analysis in which obvious outliers and groups are identified, and then removed from both sets of data before effecting comparisons.

To evaluate the effectiveness of our methodology, we initially took as given the distinction between the three main provenance groups. It proved relatively easy to separate the Cretan material from the rest. Having removed this from the analysis, we then took as given the fabric groups defined on the basis of ‘qualitative’ analysis of the thin sections. From this perspective our approach seems quite successful, and threw up some surprises.

With the caveat that in several of the fabric groups there were single samples that did not cluster well with others from their group, it proved possible to separate the different groups fairly well and, inter alia, separate the Egyptian and Canaanite material. This could not be done in a single view, which reflects one of the advantages of our approach. For example, the petrographic and mixed-mode (λ = 1) analyses fail to separate out fabric groups 2 and 4, whereas analyses that give more weight to the chemical data (e.g., the mixed-mode analysis with λ = 3 in Fig. 8) do so. In Figure 8 one fabric 4 case, to the middle of the plot, separates out quite clearly from the other three cases. It is interesting that in a study of comparative Canaanite material, carried out after the original drafting of this article but independently of it, this isolated fabric 4 case was reclassified as fabric 2, to which it plots more closely in the figure. We have retained the original labelling to emphasize that our approach is capable of identifying anomalies in the classification that may indeed require rectification.

The mixed-mode analysis (λ = 1) successfully separates out most of the Egyptian material from the rest. The petrographic analysis shows some Egyptian samples that separate from the other Egyptian material, but these belong to different subgroups, information that, for clarity, has been suppressed in the plots. The chemical analysis shows that the specimens involved have rather different chemical compositions from the other Egyptian samples. The singleton 5 was most successfully isolated in the mixed-mode analysis.

A feature of the petrographic analysis, not identified in the original classification, was that the main Egyptian group split clearly into two subgroups. This was also evident in the mixed-mode analysis, but not in the chemical analysis, where the group plotted as a coherent chemical group.

How effective would our analysis have been had we lacked the thin-section classification, and interpretation in terms of provenance, that has informed our discussion so far? We could certainly have separated the Cretan from the other material. This might have taken with it two samples (1 Canaanite, 1 Egyptian) that are, however, clear petrographic outliers with respect to the other Canaanite and Egyptian samples.


If one now views Figures 5–7 and ignores the labelling, simply looking for pattern in terms of cluster structure, to our mind the mixed-mode analysis of Figure 7 seems most satisfactory. The two groups to the right of the plot are clear as, we think, are the groups consisting of 3s in the top centre, and of 2s and 4s to the left. Remaining interpretation is more subjective, but we would be inclined to identify 5 and 3 in the top-left as isolated cases and either treat the remaining six cases as a single group or split it into a group of three cases, and three more isolated specimens. However this is done, the groups so identified can be labelled and such labelling applied to other plots. This would immediately show, for example, that in the largest group to the bottom-right of Figure 7, the two Egyptian samples (in fact from a different fabric group from the others) separate out on the petrographic plot and are chemically different from each other, and that the sample labelled 2 is also distinct. Those plots weighted towards the chemical data also show that the 2s and 4s in the leftmost group separate out.

In other words, starting from the mixed-mode plot of Figure 7 without assuming labelling, it is possible to identify all the different fabric groups reasonably successfully, and additionally to split the largest group into two subgroups. Furthermore, it is possible to identify outliers relative to their presumed group whose classification might be questioned and re-evaluated. As this is primarily a methodological paper these issues will be pursued elsewhere. The present claim is that our analyses suggest that the mixed-mode methodology, in which comparison of different analyses plays an integral part, offers potential advantages compared to the quantitative analysis of petrographic or chemical data only.

Two approaches to mixed-mode analysis have been suggested. The second of these is apparently the more sensitive, since it allows finer control over the weighting of the different types of data, but we have not illustrated it here. One reason is that results for λ = 1 and µ = 0.5 were very similar. Another reason is that we found that for values of µ not very different from 0.5, analyses very quickly became similar to an analysis of either the petrographic data alone, or the chemical data alone, depending on which direction µ was varied in. This is illustrated in Figure 9, which shows how Sibson’s coefficient, γ, measuring the similarity with the petrographic and chemical analyses, varies as µ varies. Thus, for the analyses reported here, experimenting with different values of µ added little to those analyses reported. Whether or not this will generally be the case is unclear, and would need more experience to decide.

Figure 9 For the second mixed-mode approach, the graph shows how closely the results compare to the petrographic (solid line) and chemical analyses (dashed line) alone as µ varies, and as measured by Sibson’s coefficient, γ, for the two-dimensional plots.
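A comparison of two two-dimensional plots of the kind summarized in Figure 9 can be sketched as follows. The function below computes a Procrustes-type mismatch statistic in the spirit of Sibson (1978); whether it coincides exactly with the coefficient γ used for Figure 9, and whether γ is oriented so that small or large values indicate agreement, is not stated in this section, so both should be read as assumptions. The name `procrustes_gamma` is hypothetical. Here 0 means the two configurations agree exactly after translation, rotation, reflection and uniform scaling.

```python
def procrustes_gamma(X, Y):
    """Procrustes-type mismatch between two 2-D configurations.

    X, Y: equal-length lists of (x, y) positions for the same samples.
    Points are treated as complex numbers, so the best rotation and
    scaling reduce to multiplication by a single complex scalar.
    Returns a value in [0, 1]; 0 indicates a perfect match.
    """
    if len(X) != len(Y):
        raise ValueError("configurations must contain the same samples")

    def centred(P):
        mx = sum(p[0] for p in P) / len(P)
        my = sum(p[1] for p in P) / len(P)
        return [complex(p[0] - mx, p[1] - my) for p in P]

    x, y = centred(X), centred(Y)
    sx = sum(abs(z) ** 2 for z in x)
    sy = sum(abs(z) ** 2 for z in y)
    if sx == 0 or sy == 0:
        raise ValueError("a configuration has no spread")
    # Best fit allowing rotation only ...
    rot = abs(sum(a.conjugate() * b for a, b in zip(x, y))) ** 2
    # ... and allowing a reflection of Y as well.
    ref = abs(sum(a * b for a, b in zip(x, y))) ** 2
    return 1.0 - max(rot, ref) / (sx * sy)


# Made-up check: Y is X rotated by 90 degrees, doubled in size and shifted.
X = [(0, 0), (1, 0), (0, 1)]
Y = [(5, 3), (5, 5), (3, 3)]
gamma_same = procrustes_gamma(X, Y)
```

Evaluating such a statistic between the mixed-mode plot for each value of µ and the petrographic-only and chemical-only plots would give two curves of the kind shown in Figure 9.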

Early experiments in the cluster analysis of mixed-mode data notwithstanding (Rice and Saffer 1982; Phillip and Ottaway 1983), there seem to have been few attempts to apply such methods in the published archaeometric literature. Our emphasis is on the use of ordination methods, such as PCA and MDS, rather than cluster analysis, and we also place importance on an iterative approach to the analysis, and the comparison of different views of the data.

Computational resources have increased considerably since the early period of experimentation, and our methodology exploits this. Computational aspects, with additional examples, are described in Beardah et al. (2003) and, briefly, in the appendix to this paper. Ours is an exploratory approach to data analysis. Recently, Moustaki and Papageorgiou (2005) have developed a model-based approach to mixed-mode analysis that is applied to archaeometric data identical in kind to ours. Their methodology is more complex than ours, both mathematically and computationally, and, being model-based, dependent on distributional assumptions one might not always wish to make. Such assumptions allow a more formal approach to determining the number of groups in the data, and assessing goodness-of-fit. Both their examples involve data sets where the structure is quite clear, using either petrographic or chemical data, and further experience (and possibly development) is needed to assess how their method would handle large data sets with numerous small groups (including singletons) where the separate chemical and petrographic analyses tell different stories.

ACKNOWLEDGEMENTS

This work forms part of the GEOPRO Research Network funded by the DGXII of the European Commission, under the TMR Network Programme (Contract Number ERBFMRX-CT98-0165). We are grateful to Hector Neff for access to his unpublished work and Jaume Buxeda i Garrigós for suggesting the second of the mixed-mode approaches. For permission to sample and analyse the transport jars from Kommos, we are grateful to J. W. Shaw, J. B. Rutter and the 23rd Ephorate of Prehistoric and Classical Antiquities, Herakleion, Crete. We are also grateful to Archaeometry’s referees for thorough and constructive comment on the original version of this paper.

REFERENCES

Baxter, M. J., 2003, Statistics in archaeology, Arnold, London.

Baxter, M. J., and Buck, C. E., 2000, Data handling and statistical analysis, in Modern analytical methods in art and archaeology (eds. E. Ciliberto and G. Spoto), 681–746, Wiley, New York.

Beardah, C. C., Baxter, M. J., Papageorgiou, I., and Cau, M. A., 2003, ‘Mixed-mode’ approaches to the grouping of ceramic artefacts using S-Plus, in The digital heritage of archaeology: CAA2002 (eds. M. Doerr and A. Sarris), 261–5, Hellenic Ministry of Culture, Greece.

Beier, T., and Mommsen, H., 1994, Modified Mahalanobis filters for grouping pottery by chemical composition, Archaeometry, 36, 287–306.

Bieber, A. M., Brooks, D. W., Harbottle, G., and Sayre, E. V., 1976, Application of multivariate techniques to analytical data on Aegean ceramics, Archaeometry, 18, 59–74.

Cau, M. A., Day, P. M., Baxter, M. J., Papageorgiou, I., Iliopoulos, I., and Montana, G., 2004, Exploring automatic grouping procedures in ceramic petrology, Journal of Archaeological Science, 31, 1325–38.

Glascock, M. D., 1992, Characterization of archaeological ceramics at MURR by neutron activation analysis and multivariate statistics, in Chemical characterization of ceramic pastes in archaeology (ed. H. Neff), 11–26, Prehistory Press, Madison, WI.

Gower, J. C., 1971, A general coefficient of similarity and some of its properties, Biometrics, 27, 857–71.

Kaufman, L., and Rousseeuw, P. J., 1990, Finding groups in data, Wiley, New York.

Moustaki, I., and Papageorgiou, I., 2005, Latent class models for mixed variables with applications in archaeometry, Computational Statistics and Data Analysis, 48, 659–75.

Neff, H., Bishop, R. L., and Rands, R. L., 1988, Similarity/distance measures: solutions to the mixed level data problem, unpublished manuscript.

Phillip, G., and Ottaway, B. S., 1983, Mixed data cluster analysis: an illustration using Cypriot hooked-tang weapons, Archaeometry, 25, 119–33.

Rice, P. M., and Saffer, M. E., 1982, Cluster analysis of mixed-level data: pottery provenience as an example, Journal of Archaeological Science, 9, 395–409.

Romesburg, H. C., 1984, Cluster analysis for researchers, Lifetime Learning Publications, Belmont, CA.

Shennan, S., 1988, Quantifying archaeology, Edinburgh University Press, Edinburgh.

Sibson, R., 1978, Studies in the robustness of multi-dimensional scaling: Procrustes statistics, Journal of the Royal Statistical Society (B), 40, 234–8.

Venables, W. N., and Ripley, B. D., 1999, Modern applied statistics with S-Plus, 3rd edn, Springer, New York.

Venables, W. N., and Ripley, B. D., 2002, Modern applied statistics with S, 4th edn, Springer, New York.

APPENDIX

Computational considerations

All the analyses reported in this paper were undertaken using S-PLUS 2000 for WINDOWS (Venables and Ripley 1999), and our implementation is described in Beardah et al. (2003). This system is now obsolete, having been superseded by later versions of S-PLUS, described in the fourth edition of Venables and Ripley’s (2002) book. Were we to start this research anew we would use the R system, which has similar functionality to S-PLUS with the additional advantage of being Open Source. Venables and Ripley (2002) is a good guide to R as well as S-PLUS, though other texts at various levels are increasingly appearing.

All these systems, or packages of functions written for them, allow methods such as PCA and MDS (metric and non-metric) to be applied with relative ease. For the first of the mixed-mode approaches described we used the daisy function to calculate the generalized Gower’s coefficient, as described in Kaufman and Rousseeuw (1990). Anyone wishing to emulate this should be aware that, using chemical (i.e., continuous) data only, daisy defaults to computing Euclidean distance as a measure of dissimilarity. If used to compute the dissimilarity matrix Dc described in our second approach, this means that Dc needs to be rescaled, so that entries lie between 0 and 1, before implementing the approach.
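The rescaling step can be illustrated with a short Python sketch. Two things here are assumptions rather than statements from this appendix: that Dc is rescaled by dividing by its largest entry (other rescalings are possible), and that the second approach combines the matrices as µDc + (1 − µ)Dp, where Dp is the petrographic dissimilarity matrix. The function names are hypothetical and the code is not a translation of our S-PLUS implementation.

```python
from math import dist


def euclidean_matrix(rows):
    """Pairwise Euclidean distances between rows of continuous data."""
    n = len(rows)
    return [[dist(rows[a], rows[b]) for b in range(n)] for a in range(n)]


def rescale_unit(D):
    """Divide every entry by the largest one so that entries lie in [0, 1]."""
    biggest = max(max(row) for row in D)
    if biggest == 0:
        return [row[:] for row in D]
    return [[v / biggest for v in row] for row in D]


def combine(Dc, Dp, mu):
    """Assumed form of the second approach: mu * Dc + (1 - mu) * Dp."""
    n = len(Dc)
    return [[mu * Dc[a][b] + (1.0 - mu) * Dp[a][b] for b in range(n)]
            for a in range(n)]


# Toy data (made up): three samples, two chemical variables.
chem = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
Dc = rescale_unit(euclidean_matrix(chem))
# A petrographic dissimilarity matrix already on the [0, 1] scale.
Dp = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
D_half = combine(Dc, Dp, mu=0.5)
```

Without the rescaling, the raw Euclidean distances would be on an arbitrary scale and µ would no longer control the relative contribution of the two kinds of data in an interpretable way.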

Interested readers are welcome to approach the second author for further details and access to our code, but should be aware that this might need modification for those systems currently available.