POS tagging a sentence in Scala using Spark NLP.
POS tagging is the process of marking up each word in a corpus with its corresponding part-of-speech tag, based on its context and definition. The task is not straightforward, because the same word can take a different part of speech depending on the context in which it is used. In this tutorial I will use the Spark NLP library by John Snow Labs. Let's dive into the implementation.
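To make the ambiguity concrete, here is a small hand-labelled sketch in plain Scala (no Spark needed). The tags are Penn Treebank tags assigned by hand for illustration, not the output of any model, and the names `TaggedToken`, `PosAmbiguity` and `tagsFor` are hypothetical.

```scala
// Hand-labelled illustration: the word "book" receives a different tag
// depending on the sentence it appears in. Punctuation is left out for brevity.
final case class TaggedToken(word: String, tag: String)

object PosAmbiguity {
  // "I book a flight" -> "book" is used as a verb (VBP).
  val verbUse: Seq[TaggedToken] = Seq(
    TaggedToken("I", "PRP"), TaggedToken("book", "VBP"),
    TaggedToken("a", "DT"), TaggedToken("flight", "NN"))

  // "I bought a book" -> "book" is used as a noun (NN).
  val nounUse: Seq[TaggedToken] = Seq(
    TaggedToken("I", "PRP"), TaggedToken("bought", "VBD"),
    TaggedToken("a", "DT"), TaggedToken("book", "NN"))

  // Collect every tag the given word received in a tagged sentence.
  def tagsFor(word: String, sentence: Seq[TaggedToken]): Seq[String] =
    sentence.filter(_.word == word).map(_.tag)

  def main(args: Array[String]): Unit = {
    println(tagsFor("book", verbUse))
    println(tagsFor("book", nounUse))
  }
}
```

A tagger therefore cannot rely on a word-to-tag lookup table alone; it has to look at the surrounding words, which is what the pretrained pipeline used later in this post does.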
Dependency
name := "scalaExamples"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.4",
  "org.apache.spark" %% "spark-mllib" % "2.4.4" % "compile",
  "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.3.6"
)
POS (part of speech) Tagging
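import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import org.apache.spark.ml.feature.Tokenizer
import org.apache.spark.sql.SparkSession

object Main extends App {
  // Start a local Spark session.
  val spark = SparkSession
    .builder()
    .appName("test")
    .config("spark.master", "local")
    .getOrCreate()

  // Sample sentences, each paired with an id.
  val sent = Seq((1, " I ate dinner."),
    (2, "We had a three-course meal."),
    (3, "Brad came to dinner with us."),
    (4, "He loves fish tacos."),
    (5, "we all felt like we ate too much."),
    (6, "In the end We all agreed; it was a magnificent evening."),
    (7, "the end We all agreed; it was a magnificent evening."))

  var df = spark.createDataFrame(sent).toDF("id", "sentence")
  df.show()

  // Assumed to be Spark ML's Tokenizer: it is the one whose setInputCol/setOutputCol/transform calls match.
  df = new Tokenizer().setInputCol("sentence").setOutputCol("token").transform(df)

  // Load the pretrained pipeline and annotate the "sentence" column.
  val pM = PretrainedPipeline("explain_document_dl")
  val result = pM.annotate(df, "sentence")

  // Show the part-of-speech annotations.
  result.select("pos").show()
}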
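Since the goal is to tag a sentence, it can be handier to see each token next to its tag than to read the raw `pos` column. The sketch below assumes the `annotate(String)` overload of `PretrainedPipeline`, which returns a `Map[String, Seq[String]]` keyed by output column, and assumes the map contains `token` and `pos` keys (as the `pos` column selected above suggests). `TokenTagPairs` and `pairTokensWithTags` are hypothetical helper names, not part of Spark NLP.

```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import org.apache.spark.sql.SparkSession

object TokenTagPairs {
  // Pair each token with the tag at the same position.
  // A missing "token" or "pos" key yields an empty result.
  def pairTokensWithTags(annotations: Map[String, Seq[String]]): Seq[(String, String)] =
    annotations.getOrElse("token", Seq.empty)
      .zip(annotations.getOrElse("pos", Seq.empty))

  def main(args: Array[String]): Unit = {
    // A Spark session must exist before the pipeline is loaded.
    val spark = SparkSession.builder().appName("test").master("local").getOrCreate()

    // Same pretrained pipeline as above, applied to a single string.
    val pipeline = PretrainedPipeline("explain_document_dl")
    val annotations = pipeline.annotate("He loves fish tacos.")

    // Print one "token<TAB>tag" line per token.
    pairTokensWithTags(annotations).foreach { case (token, tag) =>
      println(s"$token\t$tag")
    }

    spark.stop()
  }
}
```

Annotating a plain string avoids building a DataFrame when only a handful of sentences need tagging; the DataFrame route shown earlier is the better fit for larger datasets.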
To get started with Scala and SBT, follow this tutorial: