Big Data POS tagging a sentence in Scala using Spark NLP. POS tagging is the process of marking up each word in a corpus with a corresponding part-of-speech tag, based on its context and definition. This task is not straightforward, as a particular word may have a different part of speech depending on the context in which the word is used.
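A minimal sketch of the idea, assuming the John Snow Labs Spark NLP library is on the classpath and its pretrained English perceptron POS model is available for download; the pipeline stages and column names here are illustrative:

```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.{Tokenizer, PerceptronModel}
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

object PosTaggingSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("pos").getOrCreate()
  import spark.implicits._

  val data = Seq("Spark NLP tags each word with its part of speech").toDF("text")

  // Turn the raw string into Spark NLP's internal document annotation
  val documentAssembler = new DocumentAssembler()
    .setInputCol("text").setOutputCol("document")
  // Split the document into tokens
  val tokenizer = new Tokenizer()
    .setInputCols(Array("document")).setOutputCol("token")
  // Pretrained averaged-perceptron POS model (downloaded on first use)
  val posTagger = PerceptronModel.pretrained()
    .setInputCols(Array("document", "token")).setOutputCol("pos")

  val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, posTagger))
  pipeline.fit(data).transform(data).select("pos.result").show(false)
  spark.stop()
}
```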
Big Data Basic encoding: label encoding and one-hot encoding in Scala with Apache Spark. If you are starting with machine learning, after cleaning the data you end up normalising it, and this is where encoding techniques come in handy. There are many data encoding techniques, but one-hot encoding and label encoding are the ones we hear about most. Also, the adoption rate for Scala programming
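A minimal sketch of both encodings with Spark MLlib, assuming Spark 3.x (where `OneHotEncoder` is an estimator with `fit`/`transform`); the column names are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.{StringIndexer, OneHotEncoder}

object EncodingSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("encoding").getOrCreate()
  import spark.implicits._

  val df = Seq("red", "green", "blue", "green").toDF("colour")

  // Label encoding: map each category to a numeric index (most frequent -> 0.0)
  val indexer = new StringIndexer()
    .setInputCol("colour").setOutputCol("colourIndex")
  val indexed = indexer.fit(df).transform(df)

  // One-hot encoding: expand the index into a sparse binary vector
  val encoder = new OneHotEncoder()
    .setInputCol("colourIndex").setOutputCol("colourVec")
  encoder.fit(indexed).transform(indexed).show(false)
  spark.stop()
}
```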
Apache Spark 4 Moments: Skew, Kurtosis, Mean, Variance in Scala and Apache Spark. A moment is a specific quantitative measure of the shape of data. In statistics, moments are used to understand the various characteristics of a probability distribution. We usually use moments to characterise data and to identify the shape of a distribution, such as the normal distribution. Moments are used to measure central tendency, dispersion, skewness and kurtosis.
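All four moments are available as built-in aggregate functions in `org.apache.spark.sql.functions`; a minimal sketch on a toy column:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{mean, variance, skewness, kurtosis}

object MomentsSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("moments").getOrCreate()
  import spark.implicits._

  val df = Seq(1.0, 2.0, 2.0, 3.0, 9.0).toDF("x")

  // First moment (mean), second (variance), third (skewness), fourth (kurtosis)
  df.select(mean("x"), variance("x"), skewness("x"), kurtosis("x")).show()
  spark.stop()
}
```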
Apache Spark Basic statistics concepts for machine learning in Scala Spark. Before applying a distribution algorithm, a probability density function or a probability mass function, we need to understand some basic concepts of statistics. These concepts may have been taught back in school; we shall start by brushing up on them and implementing them in Scala Spark. Just for an overview, I will
Apache Removing Stop Words in Apache Spark using Scala. Long ago I was working on a pet project where I scraped the description and title from web URLs and indexed the words for granular search and grouping. The project was in Java, and I had to remove a few words which I did not want to index, like "I"
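In Spark this is a one-transformer job: MLlib's `StopWordsRemover` drops English stop words (including "i" and "the") from a tokenised column. A minimal sketch with an illustrative token array:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.StopWordsRemover

object StopWordsSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("stopwords").getOrCreate()
  import spark.implicits._

  // Input must already be tokenised into an array of strings
  val df = Seq(Seq("i", "saw", "the", "red", "balloon")).toDF("raw")

  // Filters out the default English stop-word list, case-insensitively
  val remover = new StopWordsRemover()
    .setInputCol("raw").setOutputCol("filtered")
  remover.transform(df).show(false)
  spark.stop()
}
```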
Apache Spark Featured Apache Spark SQL: Running SQL Queries on a DataFrame using Scala. Apache Spark is a big data processing engine with components such as Spark SQL, Spark MLlib and Spark Streaming. We generally use Apache Spark for processing big data in memory, in batches and in real time; a common use case is to query large data
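The core pattern is to register a DataFrame as a temporary view and then query it with plain SQL; a minimal sketch with an illustrative `people` dataset:

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("sql").getOrCreate()
  import spark.implicits._

  val people = Seq(("Alice", 34), ("Bob", 19)).toDF("name", "age")

  // Register the DataFrame as a temporary view so it can be queried with SQL
  people.createOrReplaceTempView("people")
  spark.sql("SELECT name FROM people WHERE age > 30").show()
  spark.stop()
}
```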
Scala Apache Spark Imputer Usage in Scala. This tutorial explains what the Spark Imputer is, implements the Imputer, covers the basic terminology used with it, and describes the strategies available in the Spark Imputer.
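A minimal sketch of MLlib's `Imputer`, which fills missing numeric values (`NaN` by default) using a column statistic; the `"mean"` and `"median"` strategies are standard, and the column names here are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.Imputer

object ImputerSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("imputer").getOrCreate()
  import spark.implicits._

  // Double.NaN marks the missing values the Imputer will fill in
  val df = Seq((1.0, Double.NaN), (2.0, 3.0), (Double.NaN, 5.0)).toDF("a", "b")

  val imputer = new Imputer()
    .setInputCols(Array("a", "b"))
    .setOutputCols(Array("a_imputed", "b_imputed"))
    .setStrategy("mean") // replace NaN with the column mean; "median" also available

  imputer.fit(df).transform(df).show()
  spark.stop()
}
```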
Apache Spark Apache Spark DataFrame: Basic Data Manipulation using Scala. Overview of this tutorial * Replace data with a new value in a DataFrame * Filter row values with basic conditions in a DataFrame * Type cast a column value in a DataFrame. To start Apache Spark and read data from a CSV file, follow this post [https://blogs.ashrithgn.com/2018/08/06/
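The three operations listed above can be sketched as follows, using an illustrative dataset built in memory rather than read from CSV:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{when, col}

object DataFrameBasicsSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("df-basics").getOrCreate()
  import spark.implicits._

  val df = Seq(("Alice", "25", "Y"), ("Bob", "17", "N")).toDF("name", "age", "active")

  // 1. Replace data with a new value: rewrite "Y"/"N" flags as "Yes"/"No"
  val replaced = df.withColumn("active",
    when(col("active") === "Y", "Yes").otherwise("No"))

  // 2. Type cast a column: age arrives as a string, cast it to integer
  val casted = replaced.withColumn("age", col("age").cast("int"))

  // 3. Filter rows with a basic condition
  casted.filter(col("age") >= 18).show()
  spark.stop()
}
```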
Scala Introduction to Apache Spark in Scala. What is Apache Spark? Apache Spark is often referred to as a big data processing tool or framework, developed under Apache. Spark has various built-in tools such as Spark SQL, Spark Streaming, Spark MLlib and GraphX to handle big data workloads. Overview of things covered in this tutorial * Adding the dependency to a Scala project * Starting