POS tagging a sentence in Scala using Spark NLP.

POS tagging is the process of marking up each word in a corpus with its corresponding part-of-speech tag, based on the word's context and definition. The task is not straightforward, because a particular word may take a different part of speech depending on the context in which it is used. In this tutorial we will use the Spark NLP library by John Snow Labs, so let's dive into the implementation.
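For example, "book" is a noun in "I read a book" but a verb in "Please book a table." The snippet below is a minimal sketch of that idea: it assumes the explain_document_dl pretrained pipeline and the sbt setup described in the rest of this tutorial, and the QuickCheck object name is just for illustration. Spark NLP's annotate helper for plain strings lets us pair each token with its tag:

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import org.apache.spark.sql.SparkSession

object QuickCheck extends App {

  // A Spark session is required before the pretrained pipeline can be loaded
  val spark = SparkSession
    .builder()
    .appName("quickCheck")
    .config("spark.master", "local")
    .getOrCreate()

  // Downloads explain_document_dl on first use; it includes a POS tagger among its stages
  val pipeline = PretrainedPipeline("explain_document_dl")

  // annotate(String) returns a Map from output column name to annotation results
  val annotations = pipeline.annotate("I read a book. Please book a table.")

  // Pair each token with its tag; the two occurrences of "book" should receive
  // different part-of-speech tags (noun vs. verb) because of their context
  annotations("token").zip(annotations("pos")).foreach(println)
}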

Dependency


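// build.sbt -- project definition with the Spark and Spark NLP dependencies used in this tutorial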
name := "scalaExamples"

version := "0.1"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.4",
  "org.apache.spark" %% "spark-mllib" % "2.4.4" % "compile",
  "com.johnsnowlabs.nlp" %% "spark-nlp" % "2.3.6"
)

POS (part of speech) Tagging


import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import org.apache.spark.ml.feature.Tokenizer
import org.apache.spark.sql.SparkSession

object Main extends App {

  // Start a local Spark session
  val spark = SparkSession
    .builder()
    .appName("test")
    .config("spark.master", "local")
    .getOrCreate()

  // Sample sentences to tag
  val sent = Seq((1, " I ate dinner."),
    (2, "We had a three-course meal."),
    (3, "Brad came to dinner with us."),
    (4, "He loves fish tacos."),
    (5, "we all felt like we ate too much."),
    (6, "In the end We all agreed; it was a magnificent evening."),
    (7, "the end We all agreed; it was a magnificent evening."))

  var df = spark.createDataFrame(sent).toDF("id", "sentence")
  df.show()

  // Optional: split each sentence into words with Spark ML's Tokenizer.
  // The pretrained pipeline below performs its own tokenization as well.
  df = new Tokenizer().setInputCol("sentence").setOutputCol("token").transform(df)

  // Download and apply the explain_document_dl pretrained pipeline,
  // which includes a part-of-speech tagger among its stages
  val pM = PretrainedPipeline("explain_document_dl")
  val result = pM.annotate(df, "sentence")
  result.select("pos").show()
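
  // The "pos" column holds Spark NLP annotation structs. To view each word next to
  // its tag instead of the raw structs, one option (a sketch, assuming the pipeline's
  // "token" and "pos" annotation columns are present in result) is to select the
  // "result" field of both columns side by side:
  result
    .selectExpr("id", "token.result as tokens", "pos.result as tags")
    .show(truncate = false)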
}

To get started with Scala and sbt, follow this tutorial:

Install Scala And Scala Build Tools IN UBUNTU-18.04LTS
How to - install scala,and sbt(scala build tool) on ubuntu 18.04 lts & ubuntu 16.04 lts, and start a hello world project in sbt and shell