Phylogenetic systematics - botanicaamazonica.wiki.br: shortest cladograms (with maximum...

PHYLOGENETIC SYSTEMATICS

Lecture 6: phylogenetic inference

Parsimony

Inference

Which tree best represents the evolutionary relationships among the species?

An example in Carnivora

Inference

Monophyletic!

> Relationships within the group?

> Dentition, diet...

Inference

An example in Carnivora

Sampling?

> 250 spp.

Which spp. to include?

Fissipeds | Pinnipeds

Inference

2 groups

Aquatic | Terrestrial

> One representative of each family!

Inference

2 groups

Which character to analyze?

> Any character that varies among the terminals!

> ~1990: morphological data

Outgroup

> Point of comparison with the ingroup

> Allows rooting the tree and polarizing the characters

Creodonta: outgroup, extinct

Inference

> Ignored

> Impossible to characterize

Inference

DNA sequences

Hennigian inference

Willi Hennig

(1913-1976)

1) there is one possible tree

2) there is NO homoplasy

Character

state: plesiomorphic | apomorphic

> There are no reverse or independent (parallel) mutations

> Terminals that share an apomorphy = CLADE

Applying the method

in use ~1960 to ~1970

Applying the method

Problems...

> It makes unrealistic assumptions about evolution

Homoplasy is assumed to be absent, but... it does occur!

Two indistinguishable traits can evolve in parallel!

Problems in choosing the character state!

Presence/absence of lower premolar 1

Present

Homoplastic!

Parsimony

> Homoplasy can occur!

> We can minimize it!

The most parsimonious tree is the one that requires the fewest evolutionary events (e.g., nucleotide substitutions, amino acid changes, etc.) to explain the data.
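As a minimal sketch of how "fewest evolutionary events" can be counted, the code below applies Fitch's (1971) procedure to one tree at a time and sums the required changes over all characters. The tree encoding (nested 2-tuples), the function names and the 0/1 matrix are assumptions made for illustration; they do not come from the lecture.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a binary tree.

    tree   : nested 2-tuples of terminal names, e.g. ("out", ("A", "B"))
    states : dict mapping each terminal name to its observed state
    """
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):       # terminal: state set is the observation
            return {states[node]}
        left = visit(node[0])
        right = visit(node[1])
        shared = left & right
        if shared:                      # children agree: no change needed here
            return shared
        steps += 1                      # children disagree: one change required
        return left | right

    visit(tree)
    return steps


def tree_length(tree, matrix):
    """Sum the Fitch steps over every character (column) of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


# Hypothetical data: four terminals scored for three binary characters.
matrix = {"out": "000", "A": "110", "B": "111", "C": "011"}
tree_1 = ("out", ("A", ("B", "C")))
tree_2 = ("out", (("A", "C"), "B"))

print(tree_length(tree_1, matrix))  # 4 steps
print(tree_length(tree_2, matrix))  # 5 steps
```

Under the parsimony criterion, `tree_1` would be preferred over `tree_2` for this toy matrix because it needs one step fewer.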

Types of data used in phylogenetic inference

Types of computational methods:

Clustering algorithms: these use pairwise distances. They are purely algorithmic methods, in which the algorithm defines both the tree topology and the criterion by which the tree is selected. They tend to be computationally very fast and produce a single tree, usually rooted by distance. They have no objective function for comparison with other trees, even when several other trees may explain the data equally well.

Remember: finding a single tree is not necessarily the same as finding the “true” evolutionary tree.
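To make the idea of a clustering algorithm concrete, here is a small sketch of UPGMA (average-linkage clustering), chosen here as one well-known example of this class; the slides do not name a specific algorithm. The function name, the data layout and the distances are hypothetical.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster terminals by average pairwise distance.

    names : list of terminal names
    dist  : dict mapping frozenset({a, b}) to the distance between a and b
    Returns the single resulting tree as nested 2-tuples.
    """
    # Each key is a (sub)tree; each value lists the terminals it contains.
    clusters = {name: [name] for name in names}

    def average(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two clusters with the smallest average distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        members = clusters.pop(c1) + clusters.pop(c2)
        clusters[(c1, c2)] = members
    return next(iter(clusters))


# Hypothetical pairwise distances among four terminals.
raw = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
       ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
names = ["A", "B", "C", "D"]
dist = {frozenset(pair): d for pair, d in raw.items()}

print(upgma(names, dist))
```

Note that the procedure always returns exactly one tree and never scores alternatives, which is the limitation pointed out above.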

Optimality criterion: these methods use both character data and distance data. They first define an optimality criterion (minimum branch length, fewest events, highest likelihood) and then use a specific algorithm to find the trees with the best value of the objective function. They can identify several equally optimal trees, if such trees exist. Remember: finding the “best” tree is not necessarily the same as finding the “true” evolutionary tree.

Parsimony

Applying the method

outgroup | ingroup

character 1 = 1 change

character 2 = 2 changes

1, 3, 4, 5 and 8 = only one reconstruction!

2, 6 and 7 = multiple reconstructions!

Parsimony

Applying the method

11 steps!

Parsimony

Applying the method

9 steps!

1, 2, 3, 5, 6, 7 and 8 = only one reconstruction!

4 = multiple reconstructions!

Tree 3

Parsimony

Applying the method

> informative vs. non-informative characters

> informative characters = 2, 4, 6 and 7

> non-informative characters = 1, 3, 5 and 8 (autapomorphies!)
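The distinction can be checked mechanically. The sketch below uses the usual working definition, which the slide does not spell out: a character is parsimony-informative when at least two states each occur in at least two terminals. The matrix and function names are hypothetical.

```python
from collections import Counter


def is_informative(column):
    """True if at least two states each occur in at least two terminals."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def classify(matrix):
    """Split character numbers (1-based) into informative / non-informative."""
    informative, uninformative = [], []
    for number, column in enumerate(zip(*matrix.values()), start=1):
        target = informative if is_informative(column) else uninformative
        target.append(number)
    return informative, uninformative


# Hypothetical matrix: five terminals, four binary characters.
matrix = {
    "out": "0000",
    "A":   "1110",
    "B":   "1010",
    "C":   "0011",
    "D":   "0011",
}

print(classify(matrix))  # ([1, 4], [2, 3])
```

In this toy matrix, character 2 is an autapomorphy of terminal A, and character 3 separates only the outgroup from everything else, so neither can favor one ingroup arrangement over another.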

Parsimony

Searching for the best tree

outgroup | ingroup

Parsimony

Up to ~20 taxa: branch-and-bound search

Branch-and-bound
1. Traverse a search tree in a depth-first sequence.
2. Select an upper bound (L) on the optimal value of the chosen criterion.
3. Move along the path to the tips and evaluate trees. If a tree is > L, discard the rest of that path.
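The three steps can be sketched for parsimony as follows. This is an illustrative toy, not the implementation of any particular program: trees are nested 2-tuples, the first taxon is treated as the outgroup, partial trees are scored with Fitch parsimony, and the bound L starts at infinity and tightens whenever a complete tree is found (real programs usually seed L from a quick heuristic tree). All names and the data matrix are hypothetical.

```python
def fitch_length(tree, matrix):
    """Fitch parsimony length of a nested-tuple tree over all characters."""
    def visit(node, i):
        if isinstance(node, str):
            return {matrix[node][i]}, 0
        left_set, left_cost = visit(node[0], i)
        right_set, right_cost = visit(node[1], i)
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    n_chars = len(next(iter(matrix.values())))
    return sum(visit(tree, i)[1] for i in range(n_chars))


def insertions(tree, taxon):
    """Yield every tree obtained by attaching taxon to one edge of tree."""
    yield (tree, taxon)                      # edge above this node
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def branch_and_bound(taxa, matrix):
    """Return (shortest length, list of all equally short trees)."""
    out, a, b, *rest = taxa
    best = {"bound": float("inf"), "trees": []}

    def extend(subtree, remaining):
        tree = (out, subtree)
        length = fitch_length(tree, matrix)
        if length > best["bound"]:
            return                           # adding taxa cannot shorten it: prune
        if not remaining:                    # complete tree reached
            if length < best["bound"]:
                best["bound"], best["trees"] = length, []
            best["trees"].append(tree)
            return
        for bigger in insertions(subtree, remaining[0]):
            extend(bigger, remaining[1:])    # depth-first along this path

    extend((a, b), rest)
    return best["bound"], best["trees"]


# Hypothetical matrix: four terminals, three binary characters.
matrix = {"out": "000", "A": "110", "B": "111", "C": "011"}
length, trees = branch_and_bound(["out", "A", "B", "C"], matrix)
print(length, trees)
```

Pruning is safe because adding a terminal to a partial tree can only keep or increase its parsimony length, so a partial tree already longer than L cannot lead to an optimal complete tree.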

Heuristic searches

- Initial trees
  - “stepwise addition”
  - star decomposition

- Search for better trees
  - “branch swapping”

Heuristic algorithms: approximate methods that try to find the optimal tree under the chosen criterion but cannot guarantee it. Heuristic searches often operate in a “hill-climbing” fashion.

Shortest cladograms (with maximum parsimony) without computing all possible cladograms (exhaustive...

Heuristic searches

Parsimony

> More than 20 taxa

> They try to find the optimal tree under the chosen criterion but cannot guarantee it

> They operate in a “hill-climbing” fashion
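The 20-taxon threshold reflects how fast the number of possible trees grows. The sketch below counts the distinct unrooted, fully bifurcating trees for n terminals with the standard formula (2n - 5)!!; the function name is an assumption for illustration.

```python
def count_unrooted_trees(n_taxa):
    """Number of distinct unrooted bifurcating trees: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        return 1
    total = 1
    for k in range(3, 2 * n_taxa - 4, 2):    # 3 * 5 * ... * (2n - 5)
        total *= k
    return total


for n in (4, 5, 10, 20, 50):
    print(n, count_unrooted_trees(n))
```

Ten terminals already give about two million trees, and twenty give on the order of 10^20, which is why exact searches give way to heuristics for larger data sets.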

Heuristic searches

Parsimony

Use of DNA sequences

Seraina Klopfstein, Stockholm, 28 May – 1 June 2012

Pradosia brevipes

Pradosia cochlearia

Pradosia decipiens

> Character = nucleotide position (1, 2, 3...)

> Character state = nucleotide (A, C, T, G)
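A short sketch of this mapping: each column of an alignment becomes one character, and the nucleotide found in each terminal is its state. The terminal names and sequences are invented for illustration and are not real data.

```python
def characters(alignment):
    """Yield (position, {terminal: state}) for each alignment column."""
    names = list(alignment)
    for position, column in enumerate(zip(*alignment.values()), start=1):
        yield position, dict(zip(names, column))


# Hypothetical aligned sequences (invented, equal length).
alignment = {
    "taxon_1": "ATGCA",
    "taxon_2": "ATGTA",
    "taxon_3": "ACGTA",
}

# Positions where more than one state is observed.
variable = [pos for pos, states in characters(alignment)
            if len(set(states.values())) > 1]

print(variable)  # [2, 4]
```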

Parsimony

Use of DNA sequences

in macromolecules (DNA, proteins)...

> First step: alignment

Problems???

A T G A C C T G G C G G C T T T A
A T G T G G A T A T G G C A T T A

Parsimony

Use of DNA sequences

> First step: alignment

A T G A C C T G G - - - - C G G C T - T T A
A T G - - - T G G A T A T - G G C - A T T A

> with the addition of 5 indels

Parsimony

A T G A C C T G G - - - C G G C T T T A
A T G - - - T G G A T A T G G C A T T A

> 2 indels + 2 substitutions

???

> 5 indels

or...

> 2 indels + 2 substitutions
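The two competing alignments can be tallied with a small helper. This is only a sketch: it treats each uninterrupted run of gaps in one sequence as a single indel event and each mismatched pair of nucleotides as one substitution, which is an assumption about how the slide counts them (it reproduces the slide's numbers). The function name is hypothetical.

```python
def count_events(seq_a, seq_b, gap="-"):
    """Return (indels, substitutions) implied by a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    indels = substitutions = 0
    open_gap = None                  # which sequence has a gap run in progress
    for a, b in zip(seq_a, seq_b):
        if a == gap and b != gap:
            side = "a"
        elif b == gap and a != gap:
            side = "b"
        else:
            side = None
            if a != b:               # two different nucleotides
                substitutions += 1
        if side is not None and side != open_gap:
            indels += 1              # a new gap run starts here
        open_gap = side
    return indels, substitutions


# The two alignments shown on the slides, written without spaces.
first = ("ATGACCTGG----CGGCT-TTA",
         "ATG---TGGATAT-GGC-ATTA")
second = ("ATGACCTGG---CGGCTTTA",
          "ATG---TGGATATGGCATTA")

print(count_events(*first))   # (5, 0)
print(count_events(*second))  # (2, 2)
```

Which alignment is "better" depends on the relative cost given to an indel versus a substitution, and that choice is exactly what the slide's "???" is asking about.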

Is Sequence Alignment an Art or a Science?

David A. Morrison

Systematic Biology, Uppsala University, Norbyvagen 18D, 75236 Uppsala, Sweden

Author for correspondence (David.Morrison@ebc.uu.se)

Communicating Editor: Mark P. Simmons

Abstract: Aligning multiple nucleotide sequences is a prerequisite for many if not most comparative sequence analyses in evolutionary biology. These alignments are often recognized as representing the homology relations of the aligned nucleotides, but this is a necessary requirement only for phylogenetic analyses. Unfortunately, existing computer programs for sequence alignment are not based explicitly on detecting the homology of nucleotides, and so there is a notable gap in the existing bioinformatics repertoire. If homology is the goal, then current alignment procedures may be more art than science. To resolve this issue, I present a simple conceptual scheme relating the traditional criteria for homology to the features of nucleotide sequences. These relations can then be used as optimization criteria for nucleotide sequence alignments. I point out the way in which current computer programs for multiple sequence alignment relate to these criteria, noting that each of them usually implements only one criterion. This explains the apparent dissatisfaction with computerized sequence alignment in phylogenetics, as any program that truly tried to produce alignments based on homology would need to simultaneously optimize all of the criteria.

Keywords: Multiple alignment, alignment algorithm, sequence homology.

Multiple sequence alignment software has not yet met its primary aim for evolutionary biologists: maximizing homology of characters. This is in spite of 30 yr of work in the field by scores of people (starting with Hogeweg and Hesper 1984). All of this effort has led to a proliferation of alignment methods that have diverse optimization functions, along with assorted heuristics to search for the optimum alignment. These methods produce detectably different multiple sequence alignments in almost all realistic cases, which leaves the phylogenetics practitioner wondering what to do.

If the goal is to develop an automated procedure for homology assessment, then we currently do not have one, and no one has demonstrated where we might get one in practice. It is worth looking at why, and also how we might make some progress in the near future. My purpose here is therefore to try to conceptualize why there are currently so many different approaches to sequence alignment (e.g. see the lists of programs in Do and Katoh 2009; Anisimova et al. 2010), and see how they relate to each other in the context of homology assessment.

I start by putting aside the automation issue for the moment, and looking first at the actual biological goal (nucleotide homology). I try to identify the traditional paradigm for detecting homology, and then explicitly relate this to nucleotide sequences. Only then do I consider whether / how this paradigm might be automated.

Homology as a Goal for Alignment

Homology is a topic of long-standing interest to biologists (Hall 1994; Bock and Cardew 1999; Wagner 2001; Kleisner 2007). This follows from the idea that both homologies and phylogenies need to be “discovered” within the phenotypic and genotypic data that we have accumulated about biological organisms. How do we go about this discovery?

If we accept the idea that there is no fundamental difference between homology in classical and molecular biology, then for sequence alignment two sequences are homologous if they have descended through a chain of replication from a common precursor molecule, and their residues are also homologous if they have, in turn, descended through a chain of replication from a common precursor set of residues. If a multiple sequence alignment is to represent homology relationships, then all of the nucleotides in any column of the alignment should be homologous, or at least be hypothesized as homologous. Homology is not the only possible criterion for aligning nucleotides, but it is the one that I am addressing here: homology is the relationship among parts of organisms that provides evidence for common ancestry (Brower and de Pinna 2012).

Sequence alignment is one of the core techniques in bioinformatics (Wallace et al. 2005; Edgar and Batzoglou 2006; Kumar and Filipski 2007; Notredame 2007; Pei 2008; Kemena and Notredame 2009). Indeed, some of the most-cited papers in biology describe the most commonly used alignment programs: BLAST for pairwise alignment (papers ranked 12th and 14th in the Science Citation Index) and Clustal for multiple alignment (ranked 10th and 28th) (van Noorden et al. 2014). Bioinformatics lies at the junction of mathematics, computing and biology. The computer programs implement mathematical algorithms in a usable and efficient way, and the algorithms define a procedure for optimizing some objective function. The objective function will be an equation (or set of equations) that mathematically defines some biological notion, so that optimizing the function with respect to any given data will yield a biologically relevant answer. This nexus defines the importance of bioinformatics in modern biological science.

The catch for sequence alignment is that there is no known objective function for identifying homology, and so the bioinformatics nexus breaks down. Homology relations are defined by unique historical events (Donoghue 1992; Brigandt 2003), which by their very nature are unobservable: homology exists independently of our ability to recognize it. Comparative biology is thus based on studying the features of contemporary organisms, on the grounds that they will contain traces of their historical ancestry, from which homology relations might be extracted, however imperfectly. It is, however, very difficult to get any informatics into this biology.

The mathematical argument for current computerized alignment practices is basically this:

similarity = homology + analogy

Parsimony

Analyses

Morphology | DNA

chances: 1 in 2000

Parsimony

Character weighting (character-state weighting)

Fitch parsimony (Fitch 1971)

> Equal weight for all characters!

or... Generalized parsimony

> More weight for the more informative characters!
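One way to see the difference between equal and unequal weights is Sankoff's dynamic-programming algorithm, which accepts an arbitrary cost for each kind of change. The sketch below is an illustration only: the tree encoding, the function names and the transition/transversion cost scheme (1 versus 2) are assumptions, not values given in the lecture.

```python
def sankoff_cost(tree, states, alphabet, cost):
    """Minimum weighted cost of one character on a nested-tuple tree.

    cost : function (ancestor_state, descendant_state) -> non-negative cost
    """
    infinity = float("inf")

    def visit(node):
        # Returns {state: cheapest cost of the subtree if node has that state}.
        if isinstance(node, str):
            return {s: (0 if s == states[node] else infinity) for s in alphabet}
        left, right = visit(node[0]), visit(node[1])
        return {
            s: min(cost(s, t) + left[t] for t in alphabet)
               + min(cost(s, t) + right[t] for t in alphabet)
            for s in alphabet
        }

    return min(visit(tree).values())


PURINES = {"A", "G"}


def equal_cost(x, y):
    """Every change costs 1 (equivalent to Fitch parsimony)."""
    return 0 if x == y else 1


def ts_tv_cost(x, y):
    """Hypothetical scheme: transitions cost 1, transversions cost 2."""
    if x == y:
        return 0
    return 1 if (x in PURINES) == (y in PURINES) else 2


# Hypothetical single nucleotide position scored for four terminals.
tree = (("t1", "t2"), ("t3", "t4"))
states = {"t1": "A", "t2": "G", "t3": "C", "t4": "C"}

print(sankoff_cost(tree, states, "ACGT", equal_cost))  # 2
print(sankoff_cost(tree, states, "ACGT", ts_tv_cost))  # 3
```

With equal costs the position needs two changes; with the weighted scheme the same two changes cost three units, because one of them must be a transversion.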

Parsimony

Problems...

> It does not take branch lengths into account

- The rate of evolution is high

- Branches of different lengths

some information may be lost

> Long-branch attraction

- Long branches are grouped together

> Long-branch attraction
