Spark installation on Ubuntu

03 May

Spark can be deployed in a variety of ways, provides native bindings for the Java, Scala, Python, and R programming languages, and supports SQL, streaming data, machine learning, and graph processing. Out of the box, Spark can run in a standalone cluster mode that simply requires the Apache Spark framework and a JVM on each machine in the cluster.

In this tutorial we will see how to install Spark on Ubuntu 16.04 by following these steps.

Step 1: Before installing Spark, first ensure that Java 8 is installed:

    sudo add-apt-repository ppa:webupd8team/java
    sudo apt-get update
    sudo apt-get install oracle-java8-installer

Verify that Java is correctly installed:

    java -version
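
The check above can also be scripted. The following is a minimal sketch, not part of the original tutorial: the helper name `java_major_version` is hypothetical, and it assumes the first line of the `java -version` output has the usual `... version "1.8.0_131"` shape.

```shell
#!/bin/sh
# Print the major Java version parsed from the first line of `java -version`.
# Handles both the legacy "1.8.0_131" scheme and the newer "11.0.2" scheme.
# (java_major_version is a hypothetical helper name used for illustration.)
java_major_version() {
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$ver" in
        1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;
        *)   printf '%s\n' "$ver" | cut -d. -f1 ;;
    esac
}

# Only query the JVM if one is on the PATH.
# Note: `java -version` writes to stderr, hence the 2>&1 redirect.
if command -v java >/dev/null 2>&1; then
    major=$(java_major_version "$(java -version 2>&1 | head -n 1)")
    if [ "$major" = "8" ]; then
        echo "Java 8 detected"
    else
        echo "Expected Java 8, found major version: ${major:-unknown}"
    fi
else
    echo "java not found on PATH"
fi
```

If the script reports anything other than Java 8, repeat the installation commands before continuing.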

Configure the Java environment by making Java 8 the default:

    sudo apt-get install oracle-java8-set-default

Step 2: Ensure that Hadoop is successfully installed on your machine
Check this link if you need to know how to install it.

Step 3: Download Apache Spark

Go to the downloads page
Choose a Spark release: 2.1.1 (May 02 2017)
Choose a package type: Pre-built for Hadoop 2.6
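
The choices above determine the name of the file you download. As a rough sketch, the name can be derived in a script as shown here. The helper `spark_archive_name` is hypothetical, and the archive URL is an assumption about where older releases are kept; confirm it against the downloads page before relying on it.

```shell
#!/bin/sh
# Build the archive file name for a given Spark release and Hadoop profile.
# (spark_archive_name is a hypothetical helper name used for illustration.)
spark_archive_name() {
    printf 'spark-%s-bin-hadoop%s.tgz\n' "$1" "$2"
}

SPARK_VERSION=2.1.1
HADOOP_PROFILE=2.6
ARCHIVE=$(spark_archive_name "$SPARK_VERSION" "$HADOOP_PROFILE")

# Assumed location in the Apache release archive; verify on the downloads page.
URL="https://archive.apache.org/dist/spark/spark-$SPARK_VERSION/$ARCHIVE"

echo "File: $ARCHIVE"
echo "URL:  $URL"

# To actually download into the home directory, uncomment:
# wget -P "$HOME" "$URL"
```

The download line is left commented out so that the script only prints what it would fetch.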

Step 4: Complete the installation process

Move the downloaded file “spark-2.1.1-bin-hadoop2.6.tgz” to your home directory (~)
Extract it:

    tar -xf spark-2.1.1-bin-hadoop2.6.tgz

Create a link to the Spark installation:

    sudo ln -s ~/spark-2.1.1-bin-hadoop2.6 /usr/local/spark
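
To see what the link does without touching /usr/local, the same pattern can be tried in a throwaway directory. This is an illustrative sketch only: the temporary directory stands in for the real locations, and the `-f` and `-n` flags are an addition that lets the link be re-pointed later, for example after a Spark upgrade.

```shell
#!/bin/sh
# Demonstrate the symlink pattern in a throwaway directory (no sudo needed).
# The directory names mirror the tutorial; the temp location is illustrative.
demo=$(mktemp -d)
mkdir -p "$demo/spark-2.1.1-bin-hadoop2.6/bin"

# -s: symbolic link, -f: replace an existing link,
# -n: treat an existing link to a directory as a file instead of following it
ln -sfn "$demo/spark-2.1.1-bin-hadoop2.6" "$demo/spark"

# readlink prints the target the link points to.
target=$(readlink "$demo/spark")
echo "spark -> $target"

# The link should resolve to a directory containing bin/.
if [ -d "$demo/spark/bin" ]; then
    echo "link resolves correctly"
fi

# Clean up afterwards with: rm -rf "$demo"
```

Because everything else refers to the link rather than the versioned directory, switching Spark versions later only requires re-pointing the link.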

Edit .bashrc using these commands (sudo is not needed, since the file belongs to your user):

    cd
    gedit .bashrc

Add these lines:

    # - SPARK ENVIRONMENT VARIABLES START - #
    export SPARK_HOME=/usr/local/spark
    export PATH=$SPARK_HOME/bin:$PATH
    # - SPARK ENVIRONMENT VARIABLES END - #
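
If you prefer to add these lines from a script rather than an editor, one possible approach is to append each line only when it is not already present, so that running the script twice does not duplicate entries. This is a sketch under assumptions: `append_once` and `RC_FILE` are hypothetical names, and the example writes to a temporary file rather than your real .bashrc.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
# (append_once is a hypothetical helper name used for illustration.)
append_once() {
    line=$1
    file=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a temporary file; point RC_FILE at ~/.bashrc to apply it for real.
RC_FILE=$(mktemp)
append_once 'export SPARK_HOME=/usr/local/spark' "$RC_FILE"
append_once 'export PATH=$SPARK_HOME/bin:$PATH' "$RC_FILE"
# Running it again adds nothing.
append_once 'export SPARK_HOME=/usr/local/spark' "$RC_FILE"

cat "$RC_FILE"
```

The single quotes matter: they keep `$SPARK_HOME` and `$PATH` unexpanded, so the variables are resolved each time the shell starts instead of being frozen at their current values.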

Reload .bashrc in the current shell:

    . ~/.bashrc

Step 5: Test the installation

Run the following command; if the installation succeeded, it prints the Spark banner and opens a scala> prompt:

    spark-shell
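
If the command is not found, a quick sanity check of the environment can help narrow down the problem. The sketch below is an illustration rather than an official diagnostic: the helper `looks_like_spark_home` is hypothetical, and it only tests whether SPARK_HOME points at a directory containing an executable bin/spark-shell.

```shell
#!/bin/sh
# Return success if the given directory looks like a Spark installation,
# i.e. it contains an executable bin/spark-shell.
# (looks_like_spark_home is a hypothetical helper name used for illustration.)
looks_like_spark_home() {
    [ -n "$1" ] && [ -x "$1/bin/spark-shell" ]
}

if looks_like_spark_home "${SPARK_HOME:-}"; then
    echo "SPARK_HOME looks valid: $SPARK_HOME"
    # spark-submit --version prints the Spark version banner and exits.
    "$SPARK_HOME/bin/spark-submit" --version
else
    echo "SPARK_HOME is unset or does not contain bin/spark-shell"
fi
```

A failure here usually means that .bashrc was not reloaded or that the /usr/local/spark link points at the wrong directory.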

While Structured Streaming provides high-level improvements over Spark Streaming, it currently relies on the same microbatching scheme for handling streaming data. However, the Apache Spark team is working to bring continuous streaming without microbatching to the platform, which should solve many of the problems with low-latency responses.

Author: Nizar Ellouze
