Re: Discuss advanced data analysis techniques relevant to the problem of characterisi
Here's your answer:
In the clustering process, once the similarity scores have been calculated, the most closely related pair is identified in the upper-diagonal (above-diagonal) score matrix. A node in the hierarchy is created for the highest-scoring pair, and the gene expression profiles of the two joined elements are averaged, with each element weighted by the number of genes it contains. The matrix is then updated by replacing the two joined elements with the new node, and the procedure repeats until only one element remains. However, a gene expression pattern with a high value at an intermediate time point can end up clustered with another gene whose high value occurs at a later time point. Such variations have to be separated in a subsequent step.
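To make the merge step concrete, here is a minimal sketch in plain Python. It assumes Pearson correlation as the similarity score (the post above does not say which score is used), and the function names `pearson` and `cluster` and the toy gene names are made up for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length profiles (assumed score)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cluster(profiles):
    """Build a hierarchy from a dict of gene name -> expression values.

    Returns a nested tuple, e.g. (("a", "b"), "c").
    """
    # Each active element is (subtree, averaged profile, number of genes).
    active = [(name, list(p), 1) for name, p in profiles.items()]
    while len(active) > 1:
        # Scan the above-diagonal part of the score matrix (i < j)
        # for the highest-scoring pair.
        best = None
        for i in range(len(active)):
            for j in range(i + 1, len(active)):
                score = pearson(active[i][1], active[j][1])
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        (tree_i, prof_i, n_i), (tree_j, prof_j, n_j) = active[i], active[j]
        # Average the two profiles, weighting each by its gene count.
        merged = [(n_i * a + n_j * b) / (n_i + n_j)
                  for a, b in zip(prof_i, prof_j)]
        node = ((tree_i, tree_j), merged, n_i + n_j)
        # Replace the two joined elements by the new node.
        active = [e for k, e in enumerate(active) if k not in (i, j)] + [node]
    return active[0][0]


# Toy data: two rising and two falling profiles over four time points.
toy = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8.5],
    "g3": [4, 3, 2, 1],
    "g4": [8, 6, 4.5, 2],
}
print(cluster(toy))
```

The scores are recomputed on every pass to keep the sketch short; a real implementation would update the matrix in place rather than rebuild it.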
Principal Component Analysis:
PCA is a linear mathematical technique that finds basis vectors that span the problem space. These vectors are called principal components (PCs). A PC can be thought of as a major pattern in the data set (e.g. gene expression data). The more PCs are used to span (model) the problem space, the more accurate the representation will be. However, the less significant a PC is (the less variance it accounts for), the more noise it represents. A balance therefore needs to be struck between spanning as much of the problem space as possible and eliminating noise.
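The trade-off can be illustrated with a short NumPy sketch. It is only one way to compute PCs (via a singular value decomposition of the mean-centred data), and the function names `pca` and `reconstruct` as well as the synthetic data are assumptions made for the example, not something taken from a real data set.

```python
import numpy as np


def pca(data, n_components):
    """PCA via SVD of the mean-centred data (rows = genes, columns = conditions).

    Returns (components, scores, explained_ratio, mean).
    """
    X = np.asarray(data, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are orthonormal basis vectors, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    explained = S ** 2 / np.sum(S ** 2)
    return components, scores, explained[:n_components], mean


def reconstruct(scores, components, mean):
    """Map scores back to the original space using the kept PCs only."""
    return scores @ components + mean


# Synthetic data: 50 "genes" x 6 "time points" built from two underlying
# patterns plus a small amount of noise (purely illustrative).
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 6)
patterns = np.vstack([np.sin(2 * np.pi * t), t])
weights = rng.normal(size=(50, 2))
data = weights @ patterns + 0.05 * rng.normal(size=(50, 6))

for k in (1, 2, 6):
    comps, scores, ratio, mean = pca(data, k)
    err = np.linalg.norm(data - reconstruct(scores, comps, mean))
    print(f"PCs={k}  variance explained={ratio.sum():.3f}  error={err:.3f}")
```

With data like this, the reconstruction error shrinks as more PCs are kept, but the PCs beyond the first two mostly model the added noise, which is the balance described above.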