Problem description:

I implemented the default GMM model provided in MLlib for my algorithm.

I repeatedly find that the resulting weights are always equal, no matter how many clusters I initialize with. Is there any specific reason why the weights are not being adjusted? Am I implementing it wrong?
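For context on what "adjusted" weights should look like: in the EM algorithm that GMM training runs, the mixing weight of each component is re-estimated every iteration as the average responsibility that component takes for the data points. The sketch below (plain Scala, not Spark; the name `updatedWeights` is mine, not an MLlib API) shows that update, so equal weights after many iterations mean every component ends up with the same average responsibility:

```scala
object WeightUpdate {
  // resp(i)(k) = responsibility of component k for point i; each row sums to 1.
  // The EM M-step sets weight(k) to the mean responsibility over all points.
  def updatedWeights(resp: Array[Array[Double]]): Array[Double] = {
    val n = resp.length
    val k = resp.head.length
    (0 until k).map(j => resp.map(_(j)).sum / n).toArray
  }
}
```

If the responsibilities are uniform for every point (e.g. all components fit the data equally well), this update leaves the weights equal at 1/k forever.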

import org.apache.spark.mllib.clustering.GaussianMixture
import org.apache.spark.mllib.clustering.GaussianMixtureModel
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.feature.Normalizer
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.DataFrameNaFunctions

// Drop string and long columns, keeping only numeric feature columns
var colnames = df.columns
for (x <- colnames) {
  if (df.select(x).dtypes(0)._2.equals("StringType") || df.select(x).dtypes(0)._2.equals("LongType")) {
    df = df.drop(x)
  }
}
colnames = df.columns

// Assemble the remaining columns into a single feature vector
val assembler = new VectorAssembler().setInputCols(colnames).setOutputCol("features")
val output = assembler.transform(df)

// L2-normalize each feature vector
val normalizer = new Normalizer().setInputCol("features").setOutputCol("normalizedfeatures").setP(2.0)
val normalizedOutput = normalizer.transform(output)

// Extract the vectors as an RDD and fit the GMM
val temp = normalizedOutput.select("normalizedfeatures")
val outputs = temp.rdd.map(_.getAs[org.apache.spark.mllib.linalg.Vector]("normalizedfeatures"))

val gmm = new GaussianMixture().setK(2).setMaxIterations(10000).setSeed(25).run(outputs)
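One thing worth noting about the pipeline above: `Normalizer` with `setP(2.0)` rescales every row to unit Euclidean norm, which projects all points onto the unit sphere and can erase magnitude differences between them. A minimal Spark-free sketch of that per-row transformation (the helper name `l2Normalize` is mine, for illustration only):

```scala
// What L2 (p = 2.0) row normalization does to one feature vector:
// divide every entry by the vector's Euclidean norm, so the result has norm 1.
def l2Normalize(v: Array[Double]): Array[Double] = {
  val norm = math.sqrt(v.map(x => x * x).sum)
  if (norm == 0.0) v else v.map(_ / norm)
}
```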

Output code:

for (i <- 0 until gmm.k) {
  println("weight=%f\nmu=%s\nsigma=\n%s\n" format
    (gmm.weights(i), gmm.gaussians(i).mu, gmm.gaussians(i).sigma))
}

And as a result, all of the points are being predicted into the same cluster.

val ol = gmm.predict(outputs).toDF
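One quick sanity check on the claim that everything lands in one cluster is to tally the predicted labels (here as a plain Scala sketch on a collected label sequence; `clusterCounts` is a hypothetical helper, not a Spark API):

```scala
// Count how many points were assigned to each predicted cluster label.
// A degenerate fit shows up as a map with a single key.
def clusterCounts(labels: Seq[Int]): Map[Int, Int] =
  labels.groupBy(identity).map { case (k, v) => k -> v.size }
```

With Spark, the same tally could be obtained directly from the prediction RDD via `countByValue()`.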
