Tópicos Especiais em Aprendizagem Reinaldo Bianchi Centro Universitário da FEI 2012

TRANSCRIPT

  • Slide 1
  • Tópicos Especiais em Aprendizagem Reinaldo Bianchi Centro Universitário da FEI 2012
  • Slide 2
  • Lecture 4, Part B
  • Slide 3
  • The K-means algorithm
  • Slide 4
  • K-Means: A very well-known algorithm for clustering patterns. Used when the number of clusters can be defined in advance: choose the desired number of clusters, then choose the cluster centers and members so as to minimize the error. This cannot be done by exhaustive search: there are too many parameters.
  • Slide 5
  • K-Means: The algorithm: fix the cluster centers; assign each point to the nearest cluster; recompute the cluster centers as the mean of the points each one represents; repeat until the centers stop moving. (A short code sketch of this loop follows below.)
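    A minimal sketch of that loop in Python/NumPy, assuming numeric points, Euclidean distance and given initial centers; the function name and arguments are illustrative, and empty clusters are not handled:

        import numpy as np

        def kmeans(points, centers, max_iter=100):
            # Assign points to the nearest center, recompute each center as the
            # mean of its points, and repeat until the centers stop moving.
            points = np.asarray(points, dtype=float)
            centers = np.asarray(centers, dtype=float)
            for _ in range(max_iter):
                # Distance from every point to every center (Euclidean).
                dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
                labels = dists.argmin(axis=1)
                # New center of each cluster = mean of the points assigned to it.
                new_centers = np.array([points[labels == k].mean(axis=0)
                                        for k in range(len(centers))])
                if np.allclose(new_centers, centers):  # centers stopped moving
                    break
                centers = new_centers
            return centers, labels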
  • Slide 6
  • K-Means: Can be used with any attribute for which a distance can be computed.
  • Slide 7
  • Clustering: Partitioning clustering approach: a typical cluster-analysis approach that partitions the data set iteratively; it constructs a partition of the data set into several non-empty clusters (usually, the number of clusters is given in advance); in principle, the partition is obtained by minimising the sum of squared distances within each cluster (written out below).
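    In the usual notation (a standard formulation added for clarity, not copied from the slides), the criterion being minimised is the within-cluster sum of squared distances

        J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2

    where C_k is the k-th cluster and \mu_k its center; K-means searches for the partition that makes J small.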
  • Slide 8
  • Clustering: Given K, find a partition into K clusters that optimises the chosen partitioning criterion. Global optimum: exhaustively enumerate all partitions. Heuristic method: the K-means algorithm (MacQueen, 1967), in which each cluster is represented by its center and the algorithm converges to stable cluster centers.
  • Slide 9
  • Algorithm: Given the number of clusters K, the K-means algorithm is carried out in three steps after an initialisation that sets the seed points: 1) assign each object to the cluster with the nearest seed point; 2) compute the seed points as the centroids of the clusters of the current partition (the centroid is the centre, i.e. the mean point, of the cluster); 3) go back to step 1), and stop when there are no more new assignments. (The two update rules are written in symbols below.)
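    In symbols (standard notation added here for clarity): step 1 assigns object x_i to the cluster whose seed point is nearest, c(i) = \arg\min_k \lVert x_i - \mu_k \rVert, and step 2 recomputes each seed point as the centroid of its cluster, \mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i.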
  • Slide 10
  • Example: Suppose we have 4 types of medicines and each has two attributes: pH and weight index. Our goal is to group these objects into K = 2 groups of medicine.
  • Slide 11
  • Example (the slide also shows a scatter plot of the points A, B, C and D):
    Medicine  Weight  pH-Index
    A         1       1
    B         2       1
    C         4       3
    D         5       4
  • Slide 12
  • Step 1: Use the initial seed points for partitioning. Assign each object to the cluster with the nearest seed point, using the Euclidean distance. (A worked distance is shown below.)
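    For the data table above, the Euclidean distance between two medicines with attributes (weight, pH-index) is d = \sqrt{(w_1 - w_2)^2 + (p_1 - p_2)^2}; for example, the distance between C = (4, 3) and A = (1, 1) is \sqrt{3^2 + 2^2} = \sqrt{13} \approx 3.61. Which points are used as the initial seeds is not reproduced in this transcript.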
  • Slide 13
  • Step 2: Compute the new centroids of the current partition. Knowing the members of each cluster, we now compute the new centroid of each group based on these new memberships. (An illustrative centroid calculation follows below.)
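    As an illustration of the centroid update (the actual memberships shown on the slide are not reproduced in this transcript): if, say, one cluster contained B = (2, 1), C = (4, 3) and D = (5, 4), its new centroid would be ((2 + 4 + 5)/3, (1 + 3 + 4)/3) = (11/3, 8/3) \approx (3.67, 2.67).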
  • Slide 14
  • Step 2: Renew the memberships based on the new centroids. Compute the distance of all objects to the new centroids and reassign each object to its nearest centroid. (A worked distance follows below.)
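    Continuing the hypothetical grouping above, the distance from B = (2, 1) to the new centroid (11/3, 8/3) would be \sqrt{(2 - 11/3)^2 + (1 - 8/3)^2} = \sqrt{50/9} \approx 2.36, and B would then be reassigned to whichever centroid is now closest.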
  • Slide 15
  • Step 3: Repeat the first two steps until convergence. Knowing the members of each cluster, we again compute the new centroid of each group based on the new memberships.
  • Slide 16
  • Repeat the first two steps until convergence. Compute the distance of all objects to the new centroids; stop when there are no new assignments.
  • Slide 17
  • K-means Demo: 1. The user sets the number of clusters they'd like (e.g. K = 5).
  • Slide 18
  • K-means Demo: 1. The user sets the number of clusters they'd like (e.g. K = 5). 2. Randomly guess K cluster centre locations.
  • Slide 19
  • K-means Demo: 1. The user sets the number of clusters they'd like (e.g. K = 5). 2. Randomly guess K cluster centre locations. 3. Each data point finds out which centre it is closest to (thus each centre owns a set of data points).
  • Slide 20
  • K-means Demo: 1. The user sets the number of clusters they'd like (e.g. K = 5). 2. Randomly guess K cluster centre locations. 3. Each data point finds out which centre it is closest to (thus each centre owns a set of data points). 4. Each centre finds the centroid of the points it owns.
  • Slide 21
  • K-means Demo: 1. The user sets the number of clusters they'd like (e.g. K = 5). 2. Randomly guess K cluster centre locations. 3. Each data point finds out which centre it is closest to (thus each centre owns a set of data points). 4. Each centre finds the centroid of the points it owns... 5. ...and jumps there.
  • Slide 22
  • K-means Demo: 1. The user sets the number of clusters they'd like (e.g. K = 5). 2. Randomly guess K cluster centre locations. 3. Each data point finds out which centre it is closest to (thus each centre owns a set of data points). 4. Each centre finds the centroid of the points it owns... 5. ...and jumps there. 6. Repeat until terminated! (A runnable sketch of these steps follows below.)
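    A small, self-contained sketch of the demo's six steps in Python/NumPy, using synthetic 2-D data in place of the demo's point cloud (the data and all names are illustrative, not taken from the slides):

        import numpy as np

        rng = np.random.default_rng(0)

        # 1. The user chooses the number of clusters (e.g. K = 5).
        K = 5

        # Synthetic 2-D points standing in for the demo's data set.
        points = rng.normal(size=(200, 2)) + 3.0 * rng.integers(0, K, size=(200, 1))

        # 2. Randomly guess K cluster centre locations.
        centers = points[rng.choice(len(points), K, replace=False)]

        while True:
            # 3. Each data point finds the centre it is closest to.
            dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # 4./5. Each centre finds the centroid of the points it owns and jumps
            # there (keeping its old position if it owns no points).
            new_centers = np.array([points[labels == k].mean(axis=0)
                                    if np.any(labels == k) else centers[k]
                                    for k in range(K)])
            # 6. Repeat until terminated, i.e. the centres stop moving.
            if np.allclose(new_centers, centers):
                break
            centers = new_centers

        print(centers)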
  • Slide 23
  • K-means example in Matlab
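    The Matlab session itself is not reproduced in this transcript; as a stand-in (a substitution, not the code from the slide), an equivalent run in Python with scikit-learn on the four-medicine example looks like:

        import numpy as np
        from sklearn.cluster import KMeans

        # The four medicines from the worked example: (weight, pH-index).
        X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)

        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
        print(km.labels_)           # cluster assignment of A, B, C, D
        print(km.cluster_centers_)  # final centroids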
  • Slide 24
  • k-means example on the iPad
  • Slide 25
  • Relevant Issues: Efficient in computation: O(tKn), where n is the number of objects, K is the number of clusters, and t is the number of iterations. Normally, K, t