“Eukaryotic and prokaryotic promoter prediction using hybrid approach”
Hao Lin • Qian-Zhong Li
Theory in Biosciences, 2011
Eukaryotic and prokaryotic promoter prediction using hybrid approach
Introduction
Databases
Proposed approach
Techniques
Experiments
Results
Introduction
Conservation
Oligonucleotide
K-mer
Transcription (exons and introns)
PWM (Position Weight Matrix)
Introduction: Transcription
Introduction: PWM (Position Weight Matrix)
Database
Eukaryotes: 300 bp → -249 to +50 (TSS = 0)
Prokaryotes: 81 bp → -60 to +20 (TSS = 0)
Datasets (5 species)
C. elegans → 598 promoters, 600 coding sequences and 600 introns
B. subtilis → 270 promoters, 300 coding sequences and 300 convergent intergenic sequences
H. sapiens → 1787 promoters, 1800 coding sequences and 1800 introns
D. melanogaster → 1886 promoters, 2859 coding sequences and 1799 introns
E. coli → 741 promoters, 700 coding sequences and 700 convergent intergenic sequences
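
The window sizes above follow from counting the TSS itself as position 0: 249 + 1 + 50 = 300 and 60 + 1 + 20 = 81. A minimal sketch of cutting such a window out of a longer sequence is shown here. The function name `extract_window`, the 0-based `tss` index, and the forward-strand-only handling are assumptions made for illustration, not details taken from the slides.

```python
def extract_window(sequence: str, tss: int, upstream: int, downstream: int) -> str:
    """Return the bases from (TSS - upstream) to (TSS + downstream), inclusive.

    `tss` is the 0-based index in `sequence` of the transcription start site,
    i.e. the base the slide labels as position 0. Forward strand only.
    """
    start = tss - upstream
    end = tss + downstream + 1  # +1 because a slice end is exclusive
    if start < 0 or end > len(sequence):
        raise ValueError("window extends past the end of the sequence")
    return sequence[start:end]


# Window sizes quoted on the slide, as (upstream, downstream) pairs.
EUKARYOTE = (249, 50)   # -249 .. +50 -> 300 bp
PROKARYOTE = (60, 20)   # -60 .. +20  -> 81 bp
```

For example, `len(extract_window(seq, tss, *EUKARYOTE))` is 300 whenever the window fits inside `seq`.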
Proposed approach
Hybrid approach based on a modified Mahalanobis Discriminant for identifying prokaryotic and eukaryotic promoters
Uses 2 techniques to describe signal and compositional features:
Position Correlation Score Function (PCSF) (Li and Lin 2006; Gordon et al. 2006; Kielbasa et al. 2005)
Increment of Diversity (ID) (Laxton 1978)
Proposed approach
PCSF (based on PWM): estimates the occurrence of k-mer sequences at a specific position
ID: measures the similarity in oligonucleotide composition between test and training sequences within specific sub-regions
Modified MD: takes the PCSF and ID results as input and is applied to predict promoters
Oligonucleotide conservation
The larger the value of $M_i^k$, the more conserved the region
Position Correlation Score Function (PCSF)
A trinucleotide probability matrix with 64 rows (one per trinucleotide) and a number of columns given by the conserved regions is built using the equation:
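
The equation itself did not survive in this transcript, so the following is only a sketch of how a 64-row trinucleotide probability matrix could be built from aligned training sequences. The pseudocount-smoothed relative frequency, the function name `trimer_probability_matrix`, and the `start`/`width` parameters are assumptions, not the authors' formula.

```python
from itertools import product

TRIMERS = ["".join(p) for p in product("acgt", repeat=3)]  # the 64 trinucleotides
ROW = {trimer: row for row, trimer in enumerate(TRIMERS)}


def trimer_probability_matrix(seqs, start, width, pseudocount=1.0):
    """Return a 64 x width matrix of position-specific trinucleotide probabilities.

    Entry [r][c] estimates the probability that trinucleotide TRIMERS[r] begins
    at position start + c of an aligned training sequence. The pseudocount
    (an assumption here) keeps every probability above zero; it must be
    positive so that every column total is non-zero.
    """
    counts = [[pseudocount] * width for _ in TRIMERS]
    for seq in seqs:
        for c in range(width):
            trimer = seq[start + c:start + c + 3].lower()
            if trimer in ROW:  # skips truncated windows and ambiguous bases
                counts[ROW[trimer]][c] += 1
    totals = [sum(counts[r][c] for r in range(64)) for c in range(width)]
    return [[counts[r][c] / totals[c] for c in range(width)] for r in range(64)]
```

One such matrix would be built per training class (promoter, coding, non-coding), each from that class's own sequences.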
Position Correlation Score Function (PCSF)
Based on the constructed probability matrix, Equation 3 can be used to compute the score $F_{\text{promoter}}$ of a sequence (and likewise $F_{\text{non-coding}}$ and $F_{\text{coding}}$)
Average background probability →
$F$ indicates how close a sequence is to the source of the matrix
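
Equation 3 is not reproduced in this transcript. As an illustration of the general idea, a position-specific log-odds score can be sketched as below. The summed log-ratio form, the uniform background of 1/64, and the names `pcsf_score` and `matrix` are all assumptions rather than the authors' definition.

```python
import math


def pcsf_score(seq, matrix, start, background=1.0 / 64):
    """Sum, over matrix columns, of log(P[trimer at column] / background).

    `matrix` maps each trinucleotide to its list of per-column probabilities
    (hypothetical layout). Trinucleotides missing from `matrix`, such as ones
    containing ambiguous bases, are scored at the background level and add 0.
    Probabilities must be positive, e.g. estimated with pseudocounts.
    """
    width = len(next(iter(matrix.values())))
    score = 0.0
    for c in range(width):
        trimer = seq[start + c:start + c + 3].lower()
        p = matrix.get(trimer, [background] * width)[c]
        score += math.log(p / background)
    return score
```

Scoring the same sequence against the promoter, coding, and non-coding matrices would give three values playing the roles of $F_{\text{promoter}}$, $F_{\text{coding}}$, and $F_{\text{non-coding}}$. A higher score means the sequence looks more like the data the matrix was built from.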
Increment of Diversity (ID)
According to the concept of diversity, if a sequence X can be described as a d-dimensional vector, then the diversity of the sequence is
Absolute frequency of the i-th k-mer oligonucleotide →
For two sequences, the increment of diversity can be written as
Increment of Diversity (ID)
The smaller the ID, the more similar the 2 sequences
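
The slide formulas are missing from this transcript. The sketch below uses the commonly cited definitions, $D(X) = N \log N - \sum_i n_i \log n_i$ with $N = \sum_i n_i$, and $ID(X, Y) = D(X + Y) - D(X) - D(Y)$, on the assumption that they match the ones the slides showed. The helper names are illustrative.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Absolute frequency n_i of every k-mer occurring in seq."""
    seq = seq.lower()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def diversity(counts):
    """D(X) = N log N - sum_i n_i log n_i, where N = sum_i n_i."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(
        n * math.log(n) for n in counts.values() if n > 0
    )


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); X + Y adds the counts k-mer by k-mer."""
    return diversity(x + y) - diversity(x) - diversity(y)
```

Two sources with identical k-mer composition give an ID of 0, while sources sharing no k-mers give a large positive value, which is the "smaller ID, greater similarity" behaviour stated above.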
Using PCSF and ID, each eukaryotic sequence can be described as a 12-dimensional vector: 3 components from PCSF and 9 from ID
Mahalanobis Discriminant (MD)
Group mean →
Covariance matrix of training dataset →
Inverse matrix →
Determinant →
Prediction function
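
The prediction formula is not preserved in this transcript. The sketch below shows one common Mahalanobis-style discriminant built from the quantities listed above (group mean, covariance matrix, its inverse and its determinant): the squared Mahalanobis distance plus the log-determinant, with the nearest group winning. Whether this is exactly the authors' modified MD is an assumption, and all function names are illustrative. It uses only the standard library.

```python
import math


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, mu):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(rows), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def inverse_and_det(m):
    """Inverse and determinant by Gauss-Jordan elimination with partial pivoting."""
    d = len(m)
    a = [row[:] + [float(i == j) for j in range(d)] for i, row in enumerate(m)]
    det = 1.0
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        p = a[col][col]
        det *= p
        a[col] = [v / p for v in a[col]]
        for r in range(d):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[d:] for row in a], det


def fit_group(rows):
    """Group mean, inverse covariance and determinant for one training class."""
    mu = mean_vector(rows)
    inv, det = inverse_and_det(covariance(rows, mu))
    return mu, inv, det


def md_score(x, group):
    """(x - mu)^T S^-1 (x - mu) + ln|S|; smaller means closer to the group."""
    mu, inv, det = group
    diff = [xi - mi for xi, mi in zip(x, mu)]
    d = len(diff)
    quad = sum(diff[i] * inv[i][j] * diff[j] for i in range(d) for j in range(d))
    return quad + math.log(det)


def predict(x, groups):
    """Name of the group with the smallest discriminant score."""
    return min(groups, key=lambda name: md_score(x, groups[name]))
```

Here `groups` would hold one fitted entry per class, for example `{"promoter": fit_group(...), "coding": fit_group(...), "non-coding": fit_group(...)}`, with each training row being a feature vector of PCSF and ID values.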
Hypothesis
Pooling coding and non-coding sequences into a single negative set may lead to poor performance
Reason: coding and non-coding sequences differ from each other
Sub-regions in eukaryotic promoter sequences
The eukaryotic promoter sequences were divided into 3 regions: transcribed, non-transcribed, and core promoter
Sub-regions in eukaryotic promoter sequences
cctcgatagtgccctcataaggcgcttaaacccaccttacccttaccatcatggctagtcgacgccaaaagcagttcgatcggaagtacagctcctatcggtaggtttggagattctggagctgaaaaaaccaatttt
Figure labels on the example sequence: non-transcribed region, core promoter, TSS, transcribed region
Experiments
Training and test data split into 10 parts; the ratio between them is varied (5 different ratios)
10-fold cross-validation
Comparison with other approaches for D. melanogaster: 400 sequences → 200 promoters, 100 coding sequences, 100 introns
Comparison with other approaches for H. sapiens: 400 sequences → 200 promoters, 100 coding sequences, 100 introns
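
A minimal sketch of how a 10-fold cross-validation split can be generated is shown here. The shuffling, the fixed seed, and the function name `k_fold_indices` are illustrative assumptions. The slides do not say how the folds were drawn, nor which 5 train/test ratios were used.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Every item appears in exactly one test fold; fold sizes differ by at most 1.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

Each round trains on 9 folds and tests on the remaining one, and the reported figure is usually the average over the 10 rounds.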
Performance evaluation
Results
Future work
Use DNA structural information and perform whole-genome prediction