PCA is cool, but I find it maddening when people convert sparse data (like counts of how many words are shared between documents) into dense distance data just to use it.
You can shortcut the whole process by finding the smallest nonzero eigenvalue/eigenvector pairs of the graph Laplacian (the first of these is the Fiedler vector). You need a sparse solver that can find the smallest values/vectors instead of the largest (like LOBPCG), but that is faster than the dense route anyway.
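For anyone who wants to try it, here is a rough sketch of the idea using SciPy's `lobpcg` with `largest=False`. The toy document/word data and the helper names (`toy_doc_word_matrix`, `spectral_embedding`) are made up for illustration, and the tolerance and iteration cap are guesses rather than tuned values. It also assumes the graph is connected, so exactly one eigenvalue is zero.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lobpcg


def toy_doc_word_matrix(docs_per_topic=30, words_per_doc=5):
    """Hypothetical toy data: two topics with disjoint vocabularies.

    Document i of a topic uses a wrapping window of consecutive words from
    that topic's vocabulary. One extra word is shared by a single document
    of each topic so that the resulting graph is connected.
    """
    n_docs = 2 * docs_per_topic
    bridge_word = 2 * docs_per_topic  # column index of the cross-topic word
    rows, cols = [], []
    for doc in range(n_docs):
        topic, i = divmod(doc, docs_per_topic)
        for j in range(words_per_doc):
            rows.append(doc)
            cols.append(topic * docs_per_topic + (i + j) % docs_per_topic)
    rows += [0, docs_per_topic]
    cols += [bridge_word, bridge_word]
    data = np.ones(len(rows))
    return sp.csr_matrix((data, (rows, cols)), shape=(n_docs, bridge_word + 1))


def spectral_embedding(doc_word, k=1, seed=0):
    """Return the k smallest nonzero Laplacian eigenpairs (connected graph assumed)."""
    # W[i, j] = number of words shared by documents i and j; it stays sparse.
    W = (doc_word @ doc_word.T).tocsr()
    W.setdiag(0)
    W.eliminate_zeros()

    # Unnormalized graph Laplacian L = D - W.
    deg = np.asarray(W.sum(axis=1)).ravel()
    L = (sp.diags(deg) - W).tocsr()

    # Random starting block with one extra column for the zero eigenvalue.
    rng = np.random.default_rng(seed)
    X0 = rng.standard_normal((L.shape[0], k + 1))

    # largest=False asks LOBPCG for the smallest eigenpairs.
    vals, vecs = lobpcg(L, X0, largest=False, tol=1e-6, maxiter=500)
    order = np.argsort(vals)
    vals, vecs = vals[order], vecs[:, order]

    # Drop the (near-)zero eigenvalue and its constant eigenvector.
    return vals[1:], vecs[:, 1:]


doc_word = toy_doc_word_matrix()
vals, vecs = spectral_embedding(doc_word, k=1)
fiedler = vecs[:, 0]
# The sign of the Fiedler vector splits the documents into two groups.
groups = fiedler > 0
```

Raising `k` gives more coordinates per document, which you can use the same way you would use the leading PCA components. On a real corpus you would probably also want a preconditioner (the `M` argument of `lobpcg`), which this sketch skips.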