Big Data POS tagging a sentence in Scala using Spark NLP. POS tagging is the process of marking up each word in a corpus with a corresponding part-of-speech tag, based on its context and definition. This task is not straightforward, as a particular word may have a different part of speech depending on the context in which the word is used.
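A minimal sketch of the idea, assuming the John Snow Labs Spark NLP library is on the classpath and its pretrained English perceptron POS model is available for download; the pipeline stages and column names here are illustrative:

```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.{Tokenizer, PerceptronModel}
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

object PosTaggingSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("pos").getOrCreate()
  import spark.implicits._

  val data = Seq("Spark NLP tags each word with its part of speech").toDF("text")

  // Turn the raw string into Spark NLP's internal document annotation
  val documentAssembler = new DocumentAssembler()
    .setInputCol("text").setOutputCol("document")
  // Split the document into tokens
  val tokenizer = new Tokenizer()
    .setInputCols(Array("document")).setOutputCol("token")
  // Pretrained averaged-perceptron POS model (downloaded on first use)
  val posTagger = PerceptronModel.pretrained()
    .setInputCols(Array("document", "token")).setOutputCol("pos")

  val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, posTagger))
  pipeline.fit(data).transform(data).select("pos.result").show(false)
  spark.stop()
}
```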
Big Data Basic encoding: label encoding and one-hot encoding in Scala with Apache Spark. If you are starting with machine learning, after cleaning the data you end up normalising it, and this is where encoding techniques come in handy. There are many data encoding techniques, but one-hot encoding and label encoding are the ones we hear about most. Also, the adoption rate for Scala programming
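A minimal sketch of both encodings with Spark MLlib, assuming Spark 3.x (where `OneHotEncoder` is an estimator with `fit`/`transform`); the column names are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.{StringIndexer, OneHotEncoder}

object EncodingSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("encoding").getOrCreate()
  import spark.implicits._

  val df = Seq("red", "green", "blue", "green").toDF("colour")

  // Label encoding: map each category to a numeric index (most frequent -> 0.0)
  val indexer = new StringIndexer()
    .setInputCol("colour").setOutputCol("colourIndex")
  val indexed = indexer.fit(df).transform(df)

  // One-hot encoding: expand the index into a sparse binary vector
  val encoder = new OneHotEncoder()
    .setInputCol("colourIndex").setOutputCol("colourVec")
  encoder.fit(indexed).transform(indexed).show(false)
  spark.stop()
}
```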
Apache Spark 4 Moments: Skew, Kurtosis, Mean, Variance in Scala and Apache Spark. A moment is a specific quantitative measure of the shape of data. In statistics, moments are used to understand the various characteristics of a probability distribution. We usually use moments to characterise data and to identify the shape of a distribution, such as the normal distribution. Moments are used to measure central tendency, dispersion, skewness and kurtosis.
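All four moments are available as built-in aggregate functions in `org.apache.spark.sql.functions`; a minimal sketch on a toy column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{mean, variance, skewness, kurtosis}

object MomentsSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("moments").getOrCreate()
  import spark.implicits._

  val df = Seq(1.0, 2.0, 2.0, 3.0, 9.0).toDF("x")

  // First moment (mean), second (variance), third (skewness), fourth (kurtosis)
  df.select(mean("x"), variance("x"), skewness("x"), kurtosis("x")).show()
  spark.stop()
}
```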
Apache Spark Basic statistics concepts for machine learning in Scala Spark. Before applying a distribution algorithm, a probability density function or a probability mass function, we need to understand some basic concepts of statistics. These concepts may have been taught back in school; we shall start by brushing up on them and implementing them in Scala Spark. Just for an overview, I will
Apache Removing Stop Words in Apache Spark using Scala. Long ago I was working on a pet project where I scraped the description and title from web URLs and indexed the words for granular search and grouping. The project was in Java, and I had to remove a few words which I did not want to index, like "I"
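In Spark this is a one-transformer job: MLlib's `StopWordsRemover` drops English stop words (including "i" and "the") from a tokenised column. A minimal sketch with an illustrative token array:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.StopWordsRemover

object StopWordsSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("stopwords").getOrCreate()
  import spark.implicits._

  // Input must already be tokenised into an array of strings
  val df = Seq(Seq("i", "saw", "the", "red", "balloon")).toDF("raw")

  // Filters out the default English stop-word list, case-insensitively
  val remover = new StopWordsRemover()
    .setInputCol("raw").setOutputCol("filtered")
  remover.transform(df).show(false)
  spark.stop()
}
```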
Apache Spark Featured Apache Spark SQL: Running SQL Queries on a DataFrame using Scala. Apache Spark is a big data processing engine with components such as Spark SQL, Spark MLlib and Spark Streaming. We generally use Apache Spark for processing big data in memory, in batches and in real time; a common use case is to query large data
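The core pattern is to register a DataFrame as a temporary view and then query it with plain SQL; a minimal sketch with an illustrative `people` dataset:

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("sql").getOrCreate()
  import spark.implicits._

  val people = Seq(("Alice", 34), ("Bob", 19)).toDF("name", "age")

  // Register the DataFrame as a temporary view so it can be queried with SQL
  people.createOrReplaceTempView("people")
  spark.sql("SELECT name FROM people WHERE age > 30").show()
  spark.stop()
}
```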
Scala Apache Spark Imputer Usage in Scala. This tutorial explains what the Spark Imputer is, implements the Imputer, covers the basic terminology used with it, and describes the strategies available in the Spark Imputer.
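A minimal sketch of MLlib's `Imputer`, which fills missing numeric values (`NaN` by default) using a column statistic; the `"mean"` and `"median"` strategies are standard, and the column names here are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.Imputer

object ImputerSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("imputer").getOrCreate()
  import spark.implicits._

  // Double.NaN marks the missing values the Imputer will fill in
  val df = Seq((1.0, Double.NaN), (2.0, 3.0), (Double.NaN, 5.0)).toDF("a", "b")

  val imputer = new Imputer()
    .setInputCols(Array("a", "b"))
    .setOutputCols(Array("a_imputed", "b_imputed"))
    .setStrategy("mean") // replace NaN with the column mean; "median" also available

  imputer.fit(df).transform(df).show()
  spark.stop()
}
```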
Apache Spark Apache Spark DataFrame: Basic Data Manipulation using Scala. Overview of this tutorial * Replace data with a new value in a DataFrame * Filter row values with basic conditions in a DataFrame * Type cast a column value in a DataFrame. To start Apache Spark and read data from a CSV file, follow this post [https://blogs.ashrithgn.com/2018/08/06/
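The three operations listed above can be sketched as follows, using an illustrative dataset built in memory rather than read from CSV:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{when, col}

object DataFrameBasicsSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("df-basics").getOrCreate()
  import spark.implicits._

  val df = Seq(("Alice", "25", "Y"), ("Bob", "17", "N")).toDF("name", "age", "active")

  // 1. Replace data with a new value: rewrite "Y"/"N" flags as "Yes"/"No"
  val replaced = df.withColumn("active",
    when(col("active") === "Y", "Yes").otherwise("No"))

  // 2. Type cast a column: age arrives as a string, cast it to integer
  val casted = replaced.withColumn("age", col("age").cast("int"))

  // 3. Filter rows with a basic condition
  casted.filter(col("age") >= 18).show()
  spark.stop()
}
```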
Scala Introduction to Apache Spark in Scala. What is Apache Spark? Apache Spark is often referred to as a big data processing tool or framework, developed under Apache. Spark has various built-in tools such as Spark SQL, Spark Streaming, Spark MLlib and GraphX to handle big data workloads. Overview of things covered in this tutorial * Adding the dependency to a Scala project * Starting