Installing Apache Spark and Python
Windows: (keep scrolling for MacOS and Linux)
- Install a JDK (Java Development Kit) from http://www.oracle.com/technetwork/java/javase/downloads/index.html . You must install the JDK into a path with no spaces, for example c:\jdk. Be sure to change the default location for the installation! DO NOT INSTALL JAVA 16. SPARK IS ONLY COMPATIBLE WITH JAVA 8 OR 11.
- Download a pre-built version of Apache Spark 3 from https://spark.apache.org/downloads.html
- If necessary, download and install WinRAR so you can extract the .tgz file you downloaded. http://www.rarlab.com/download.htm
- Extract the Spark archive, and copy its contents into C:\spark after creating that directory. You should end up with directories like c:\spark\bin, c:\spark\conf, etc.
- Download winutils.exe from https://sundog-s3.amazonaws.com/winutils.exe and move it into a C:\winutils\bin folder that you’ve created. (Note: this is a 64-bit application. If you are on a 32-bit version of Windows, you’ll need to search for a 32-bit build of winutils.exe for Hadoop.)
- Create a c:\tmp\hive directory, and cd into c:\winutils\bin, and run winutils.exe chmod 777 c:\tmp\hive
- Open the c:\spark\conf folder, and make sure “File Name Extensions” is checked in the “View” tab of Windows Explorer. Rename the log4j.properties.template file to log4j.properties. Edit this file (using WordPad or something similar) and change the log level from INFO to ERROR for log4j.rootCategory.
- Right-click your Windows menu, select Control Panel, System and Security, and then System. Click on “Advanced System Settings” and then the “Environment Variables” button.
- Add the following new USER variables:
- SPARK_HOME c:\spark
- JAVA_HOME (the path you installed the JDK to in step 1, for example C:\JDK)
- HADOOP_HOME c:\winutils
- PYSPARK_PYTHON python
- Add the following paths to your PATH user variable:
%SPARK_HOME%\bin
%JAVA_HOME%\bin
- Close the environment variable screen and the control panels.
- Install the latest Anaconda for Python 3 from anaconda.com. Don’t install a Python 2.7 version! If you already use some other Python environment, that’s OK – you can use it instead, as long as it is a Python 3 environment.
- Test it out!
- Open up your Start menu and select “Anaconda Prompt” from the Anaconda3 menu.
- Enter cd c:\spark and then dir to get a directory listing.
- Look for a text file we can play with, like README.md or CHANGES.txt
- Enter pyspark
- At this point you should have a >>> prompt. If not, double check the steps above.
- Enter rdd = sc.textFile("README.md") (or whatever text file you’ve found), then enter rdd.count()
- You should get a count of the number of lines in that file! Congratulations, you just ran your first Spark program!
- Enter quit() to exit the spark shell, and close the console window
- You’ve got everything set up! Hooray!
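If pyspark does not start on Windows, the usual suspects are the environment variables from the steps above. Here is a minimal sketch of a checker, assuming the variable names and PATH entries listed in those steps; the function name find_setup_problems and its pathsep parameter are hypothetical helpers, not part of Spark.

```python
import os

# Variable names taken from the Windows steps above.
REQUIRED_VARS = ("SPARK_HOME", "JAVA_HOME", "HADOOP_HOME", "PYSPARK_PYTHON")


def _norm(path):
    """Normalize a path for loose comparison (slashes, trailing slash, case)."""
    return path.replace("\\", "/").rstrip("/").lower()


def find_setup_problems(env, pathsep=os.pathsep):
    """Return a list of problems found in an environment mapping (hypothetical helper)."""
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")

    # The steps above require a JDK path with no spaces.
    if " " in env.get("JAVA_HOME", ""):
        problems.append("JAVA_HOME contains a space")

    # PATH should contain the bin folders of Spark and the JDK.
    path_entries = {_norm(p) for p in env.get("PATH", "").split(pathsep) if p}
    for name in ("SPARK_HOME", "JAVA_HOME"):
        home = env.get(name)
        if home and _norm(home) + "/bin" not in path_entries:
            problems.append(f"PATH does not include %{name}%\\bin")
    return problems


if __name__ == "__main__":
    for problem in find_setup_problems(os.environ) or ["No problems found"]:
        print(problem)
```

Run it from a newly opened Anaconda Prompt, since prompts opened before you edited the variables will not see the new values.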
MacOS
Step 1: Install Apache Spark
Method A: By Hand
If you’ve never used Homebrew, this might be the better way to go for you. The best setup instructions for Spark on MacOS are at the following link:
https://medium.com/luckspark/installing-spark-2-3-0-on-macos-high-sierra-276a127b8b85
Spark 2.3.0 is no longer available, but the same process should work with 2.4.4 or 3.x.
Method B: Using Homebrew
- Install Homebrew if you don’t have it already by entering this from a terminal prompt: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Enter brew install apache-spark
- Create a log4j.properties file via
- cd /usr/local/Cellar/apache-spark/2.0.0/libexec/conf (replace 2.0.0 with the version actually installed)
- cp log4j.properties.template log4j.properties
- Edit the log4j.properties file and change the log level from INFO to ERROR on log4j.rootCategory.
It’s OK if Homebrew does not install Spark 3; the code in the course should work fine with recent 2.x releases as well.
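The copy-and-edit of log4j.properties described above can also be scripted. This is a rough sketch, assuming the conf directory contains a log4j.properties.template with a log4j.rootCategory line as the steps describe; newer Spark releases may ship a differently named template, in which case it does not apply as written. The function name write_quiet_log4j is hypothetical.

```python
from pathlib import Path


def write_quiet_log4j(conf_dir):
    """Create log4j.properties from the template with rootCategory set to ERROR.

    Hypothetical helper: conf_dir is the Spark conf folder from the steps above.
    The template file itself is left untouched.
    """
    conf = Path(conf_dir)
    template = conf / "log4j.properties.template"
    target = conf / "log4j.properties"

    lines = []
    for line in template.read_text().splitlines():
        # Only the root category line changes; other loggers keep their levels.
        if line.startswith("log4j.rootCategory"):
            line = line.replace("INFO", "ERROR", 1)
        lines.append(line)

    target.write_text("\n".join(lines) + "\n")
    return target
```

For example, write_quiet_log4j("/usr/local/Cellar/apache-spark/2.0.0/libexec/conf") would target the Homebrew path shown above, with the version number adjusted to your install.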
Step 2: Install Anaconda
Install the latest Anaconda for Python 3 from anaconda.com
Step 3: Test it out!
- Open up a terminal
- cd into the directory where you installed Spark, and then ls to get a directory listing.
- Look for a text file we can play with, like README.md or CHANGES.txt
- Enter pyspark
- At this point you should have a >>> prompt. If not, double check the steps above.
- Enter rdd = sc.textFile("README.md") (or whatever text file you’ve found), then enter rdd.count()
- You should get a count of the number of lines in that file! Congratulations, you just ran your first Spark program!
- Enter quit() to exit the spark shell, and close the terminal window
- You’ve got everything set up! Hooray!
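To sanity-check the number that rdd.count() prints, you can count the lines with plain Python and compare. The sketch below pairs that with the same Spark calls wrapped in a function for use outside the interactive shell. Both function names are hypothetical, and count_lines_spark assumes the pyspark package is importable from the Python interpreter you run it with, which the steps above do not guarantee.

```python
def count_lines_plain(path):
    """Count lines with plain Python; no Spark required."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def count_lines_spark(path):
    """Count lines the way the shell session above does, from a standalone script."""
    # Imported here so the plain version works even when pyspark is not importable.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local").setAppName("LineCount")
    sc = SparkContext(conf=conf)
    try:
        return sc.textFile(path).count()
    finally:
        sc.stop()
```

Inside the pyspark shell the sc object already exists, so only the textFile and count calls are needed there; the SparkConf and SparkContext setup is what a standalone script has to add.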
Linux
- Install Java, Scala, and Spark according to the particulars of your specific OS. A good starting point is http://www.tutorialspoint.com/apache_spark/apache_spark_installation.htm (but be sure to install Spark 2.4.4 or newer)
- Install the latest Anaconda for Python 3 from anaconda.com
- Test it out!
- Open up a terminal
- cd into the directory where you installed Spark, and do an ls to see what’s in there.
- Look for a text file we can play with, like README.md or CHANGES.txt
- Enter pyspark
- At this point you should have a >>> prompt. If not, double check the steps above.
- Enter rdd = sc.textFile("README.md") (or whatever text file you’ve found), then enter rdd.count()
- You should get a count of the number of lines in that file! Congratulations, you just ran your first Spark program!
- Enter quit() to exit the spark shell, and close the console window
- You’ve got everything set up! Hooray!
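Since package managers often install whatever Java is newest, it is worth confirming that the java on your PATH is one of the versions named in the Windows notes above (8 or 11). Here is a small sketch that parses the output of java -version; the function names are hypothetical, and the supported set simply mirrors that note.

```python
import re
import subprocess

SUPPORTED_MAJORS = {8, 11}  # taken from the compatibility note in the Windows steps


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_281).
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Run `java -version` and return its major version, or None if unavailable."""
    try:
        # `java -version` writes its report to stderr rather than stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return java_major_version(result.stderr)
```

A check such as installed_java_major() in SUPPORTED_MAJORS then tells you whether to switch JDKs before debugging anything else.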