Dagli: A Machine Learning Library for Java
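Dagli is an open-source machine learning library from LinkedIn for Java (and other JVM languages). According to its development team, it makes it easy to write model pipelines that are bug-resistant, readable, modifiable, maintainable, and easy to deploy, without incurring technical debt. Dagli takes advantage of modern multicore CPUs and increasingly powerful GPUs to train real-world models effectively on a single machine.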
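Below is an introductory example of a text classifier implemented as a pipeline. It uses the active leaves of a gradient boosted decision tree model (XGBoost), together with a high-dimensional set of ngrams, as features in a logistic regression classifier: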
// Placeholders stand in for the values supplied during training and inference
Placeholder<String> text = new Placeholder<>();
Placeholder<LabelType> label = new Placeholder<>();
Tokens tokens = new Tokens().withInput(text);

// Unigram features feed an XGBoost classifier whose active leaves become features
NgramVector unigramFeatures = new NgramVector().withMaxSize(1).withInput(tokens);
Producer<Vector> leafFeatures = new XGBoostClassification<>()
    .withFeaturesInput(unigramFeatures)
    .withLabelInput(label)
    .asLeafFeatures();

// The final classifier combines ngrams of up to three tokens with the leaf features
NgramVector ngramFeatures = new NgramVector().withMaxSize(3).withInput(tokens);
LiblinearClassification<LabelType> prediction = new LiblinearClassification<LabelType>()
    .withFeaturesInput().fromVectors(ngramFeatures, leafFeatures)
    .withLabelInput(label);

// Train the whole pipeline on the examples in textList and labelList
DAG2x1.Prepared<String, LabelType, DiscreteDistribution<LabelType>> trainedModel =
    DAG.withPlaceholders(text, label).withOutput(prediction).prepare(textList, labelList);
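// The prepared DAG's declared output type is a distribution over labels; no label is known at inference time, so null is passed
DiscreteDistribution<LabelType> predictedDistribution = trainedModel.apply("Some text for which to predict a label", null);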
// trainedModel now can be serialized and later loaded on a server, in a CLI app, in a Hive UDF...
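
The closing comment of the example says the trained model can be serialized and loaded elsewhere. A minimal sketch of that round trip using standard Java serialization might look like the following. It assumes the prepared DAG implements java.io.Serializable; the class name ModelSerializationSketch and the helpers save and load are hypothetical and not part of Dagli, and an ArrayList stands in for trainedModel so the sketch runs without Dagli on the classpath.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of Dagli.
public class ModelSerializationSketch {

  /** Writes any Serializable object (for example, a trained model) to a file. */
  static void save(Serializable model, Path file) throws IOException {
    try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
      out.writeObject(model);
    }
  }

  /** Reads back an object written by save(); the caller casts it to the expected type. */
  static Object load(Path file) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for trainedModel (assumption: the real prepared DAG is Serializable).
    ArrayList<String> standIn = new ArrayList<>(List.of("stand-in", "for", "trainedModel"));

    Path file = Files.createTempFile("model", ".bin");
    save(standIn, file);
    Object restored = load(file);

    // The restored object should equal the one that was saved.
    System.out.println(standIn.equals(restored)); // prints true
    Files.delete(file);
  }
}
```

With Dagli itself, the restored object would presumably be cast back to the DAG2x1.Prepared type declared in the example before calling apply on it. Only deserialize files from trusted sources, since Java deserialization can execute code from the classes being loaded.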
