How to Set Up Spark, Scala, and sbt, and Generate Jar Files

Prerequisites
• Install Scala: http://www.scala-lang.org/download/
• Install SBT: http://www.scala-sbt.org/download.html
• Install Eclipse (or Scala IDE): http://scala-ide.org/download/sdk.html
• Install Spark
• Create a main folder for the application and, inside that folder, create a build.sbt file

Configure
Create the following directories and files from inside the application's main folder.

mkdir -p src/{main,test}/{java,resources,scala}
mkdir lib project target
touch build.sbt
touch project/plugins.sbt
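
The brace expansion in the first command is a bash/zsh feature; a plain POSIX sh such as dash treats {main,test} literally and creates a directory with that name. A minimal sketch of a portable equivalent, assuming it is run from the application's main folder (mkdir -p is used throughout so that rerunning it is harmless):

```shell
#!/bin/sh
# Portable equivalent of the two mkdir commands and two touch commands above.
set -eu

for scope in main test; do
  for kind in java resources scala; do
    mkdir -p "src/$scope/$kind"
  done
done

mkdir -p lib project target
touch build.sbt project/plugins.sbt

# Print the resulting directories so the layout can be checked by eye.
find src lib project target -type d | sort
```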

• Paste the following code into the build.sbt file
(Change name to whatever application name you want, and change the Scala and Spark versions in the library dependencies to match your installed Scala and Spark versions.)

name := "Spark2"

version := "1.0"

scalaVersion := "2.11.0"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0"
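
Because the versions in build.sbt have to match what is installed, it can help to confirm which tools are on the PATH before editing the file. A small sketch; report_tool is a hypothetical helper name, not part of any of these tools:

```shell
#!/bin/sh
# report_tool is a hypothetical helper: it prints where a command lives, or says it is missing.
report_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: $(command -v "$1")"
  else
    echo "$1: not found on PATH"
  fi
}

for tool in scala sbt spark-submit; do
  report_tool "$tool"
done

# If Spark is installed, print its version banner, which also names the Scala version it uses.
if command -v spark-submit >/dev/null 2>&1; then
  spark-submit --version 2>&1 || true
fi
```

The Scala version shown in the Spark banner is the one whose binary version (for example 2.11) should match scalaVersion in build.sbt.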

• After creating the folder structure, run the following command from the project root to enter sbt interactive mode
(Make sure the command reports no errors; it should print a message about setting the current project to your project's name. The first run, and the first use of the commands below, can take a while because sbt has to download dependencies.)

sbt

• If this command reports errors, the most likely cause is a mistake in your build.sbt file
• Keep this terminal window open with sbt interactive mode running, as the remaining sbt commands are run here
• If you change build.sbt or plugins.sbt, run the following command in sbt interactive mode

reload

• Add the following lines to project/plugins.sbt, then run reload again so that sbt loads the plugins

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0")

• Run the following command, which the sbteclipse plugin provides, to generate the Eclipse project files

eclipse

• Note: the eclipse command is not available until the plugins have been loaded. Once it finishes, you can open the project in Eclipse or Scala IDE
• sbt interactive mode only needs to be running when you change the configuration (build.sbt or project/plugins.sbt) or build the jar. Otherwise you don't have to keep it running while writing your Spark project
• Whenever you change build.sbt or plugins.sbt and reload in sbt, the changes are not reflected in Eclipse/Scala IDE automatically, so run the eclipse command again and then refresh the project from the IDE menu
• To create the jar file, run the following command in sbt interactive mode

package

• Make sure this command reports no errors. Once it finishes, it displays the full path of the jar file it created
• To submit and run this jar file on a Spark cluster, use a command of the following form

spark-submit --class "YOUR_SCALA_CLASS_NAME" --master SET_MASTER_HERE JAR_FILE_LOCATION

spark-submit --class "SimpleApp" --master yarn /path/to/jar_file.jar
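
The SimpleApp class named in the example command is not shown in this post. A minimal sketch of what it could look like, written to src/main/scala with a shell here-document; the file name and the job itself (counting the even numbers below 100) are illustrative assumptions, and SparkSession requires Spark 2.x with the spark-sql dependency from build.sbt:

```shell
#!/bin/sh
# Write a hypothetical SimpleApp.scala; run this from the project root.
set -eu
mkdir -p src/main/scala

# The quoted 'EOF' stops the shell from expanding $evens inside the Scala source.
cat > src/main/scala/SimpleApp.scala <<'EOF'
import org.apache.spark.sql.SparkSession

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // The master is left unset here so that spark-submit --master decides it.
    val spark = SparkSession.builder.appName("SimpleApp").getOrCreate()

    // Illustrative job: count the even numbers in the range 0 until 100.
    val evens = spark.range(0, 100).filter("id % 2 = 0").count()
    println(s"Even numbers below 100: $evens")

    spark.stop()
  }
}
EOF
```

After writing the file, run package again in sbt and pass the resulting jar to spark-submit. For a quick check without a cluster, --master "local[*]" (quoted so the shell does not treat the brackets as a glob pattern) runs the job on the local machine.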
