{"id":43628,"date":"2024-10-06T01:02:28","date_gmt":"2024-10-05T17:02:28","guid":{"rendered":"https:\/\/server.hk\/cnblog\/43628\/"},"modified":"2024-10-06T01:02:28","modified_gmt":"2024-10-05T17:02:28","slug":"centos-6-%e4%b8%8b%e9%85%8d%e7%bd%ae-spark-python-%e9%96%8b%e7%99%bc%e7%92%b0%e5%a2%83%e8%a8%98%e9%8c%84","status":"publish","type":"post","link":"https:\/\/server.hk\/cnblog\/43628\/","title":{"rendered":"Notes on Configuring a Spark + Python Development Environment on CentOS 6"},"content":{"rendered":"<h1 id=\"centos-6-%e4%b8%8b%e9%85%8d%e7%bd%ae-spark-python-%e9%96%8b%e7%99%bc%e7%92%b0%e5%a2%83%e8%a8%98%e9%8c%84-OWUJWDnqAN\">Notes on Configuring a Spark + Python Development Environment on CentOS 6<\/h1>\n<p>Apache Spark is an open-source distributed computing framework widely used for big data processing and analysis. Paired with Python's ease of use, it is a practical way to work with large datasets. This article walks through configuring a Spark and Python development environment on CentOS 6 so that developers can get started quickly.<\/p>\n<h2 id=\"%e5%89%8d%e6%9c%9f%e6%ba%96%e5%82%99-OWUJWDnqAN\">Preparation<\/h2>\n<p>Before you begin, bring the CentOS 6 system up to date with the command below. CentOS 6 reached end of life in November 2020, so if the default mirrors no longer respond, the yum repositories may first need to be pointed at the archived packages on vault.centos.org.<\/p>\n<pre><code>sudo yum update<\/code><\/pre>\n<p>Next, install Java, which Spark needs in order to run. Install OpenJDK with the following command:<\/p>\n<pre><code>sudo yum install java-1.8.0-openjdk<\/code><\/pre>\n<p>Once installation finishes, check that Java was installed successfully:<\/p>\n<pre><code>java -version<\/code><\/pre>\n<h2 id=\"%e5%ae%89%e8%a3%9d-python-OWUJWDnqAN\">Installing Python<\/h2>\n<p>Next, install Python. CentOS 6 ships with Python 2.6 by default, so installing Python 3 is recommended. The python34 package comes from the EPEL repository rather than the base repositories, so enable EPEL first and then install Python 3:<\/p>\n<pre><code>sudo yum install epel-release\nsudo yum install python34<\/code><\/pre>\n<p>Once installation finishes, check the Python version. The Spark 3.1.2 release used in this article documents Python 3.6+ as its minimum, so a 3.4 interpreter may be too old for it; confirm the version before continuing:<\/p>\n<pre><code>python3 --version<\/code><\/pre>\n<h2 id=\"%e5%ae%89%e8%a3%9d-spark-OWUJWDnqAN\">Installing Spark<\/h2>\n<p>Now install Apache Spark. First, download a Spark release; the <a href=\"https:\/\/spark.apache.org\/downloads.html\" rel=\"nofollow noopener\" target=\"_blank\">official Apache Spark website<\/a> lists the available versions. The following example uses wget to download Spark 3.1.2 from the Apache archive:<\/p>\n<pre><code>wget https:\/\/archive.apache.org\/dist\/spark\/spark-3.1.2\/spark-3.1.2-bin-hadoop3.2.tgz<\/code><\/pre>\n<p>Once the download finishes, extract the archive:<\/p>\n<pre><code>tar -xvzf spark-3.1.2-bin-hadoop3.2.tgz<\/code><\/pre>\n<p>Then move the extracted directory to \/opt\/spark:<\/p>\n<pre><code>sudo mv spark-3.1.2-bin-hadoop3.2 \/opt\/spark<\/code><\/pre>\n<h2 id=\"%e9%85%8d%e7%bd%ae%e7%92%b0%e5%a2%83%e8%ae%8a%e9%87%8f-OWUJWDnqAN\">Configuring Environment Variables<\/h2>\n<p>To make Spark convenient to use, set its environment variables. Edit the ~\/.bashrc file and add the following:<\/p>\n<pre><code>export SPARK_HOME=\/opt\/spark\nexport PATH=$PATH:$SPARK_HOME\/bin<\/code><\/pre>\n<p>After saving the file, run the following command to apply the changes:<\/p>\n<pre><code>source ~\/.bashrc<\/code><\/pre>\n<h2 id=\"%e5%ae%89%e8%a3%9d-pyspark-OWUJWDnqAN\">Installing PySpark<\/h2>\n<p>To use Spark from Python, install PySpark with pip. Pinning the version keeps the Python package in step with the Spark release installed under \/opt\/spark:<\/p>\n<pre><code>pip3 install pyspark==3.1.2<\/code><\/pre>\n<h2 id=\"%e6%b8%ac%e8%a9%a6%e5%ae%89%e8%a3%9d-OWUJWDnqAN\">Testing the Installation<\/h2>\n<p>Once installation finishes, use the following Python code to test that Spark runs correctly:<\/p>\n<pre><code>from pyspark import SparkContext\n\n# Start a local Spark context named \"test\"\nsc = SparkContext(\"local\", \"test\")\ndata = [1, 2, 3, 4, 5]\n# Distribute the list as an RDD, then sum its elements\ndistData = sc.parallelize(data)\nprint(distData.reduce(lambda a, b: a + b))  # prints 15\nsc.stop()<\/code><\/pre>\n<p>Save the code above as test.py, then run it:<\/p>\n<pre><code>python3 test.py<\/code><\/pre>\n<p>If the script prints 15, the Spark and Python development environment is configured correctly.<\/p>\n<h2 id=\"%e7%b8%bd%e7%b5%90-OWUJWDnqAN\">Summary<\/h2>\n<p>Configuring a Spark and Python development environment on CentOS 6 is relatively straightforward: with Java, Python, Spark, and PySpark installed, developers can quickly start on big data processing and analysis work. For applications that need high-performance computing, choosing a suitable <a href=\"https:\/\/server.hk\">VPS<\/a> plan helps improve performance and stability. Whether you choose a <a href=\"https:\/\/server.hk\">Hong Kong server<\/a> or a service in another region, make sure the configuration meets your needs and supports your development work.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Detailed steps and notes for configuring a Spark and Python development environment on CentOS 6, to help you set up a big data analysis platform.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4924],"tags":[],"class_list":["post-43628","post","type-post","status-publish","format-standard","hentry","category-setup-tutorials"],"_links":{"self":[{"href":"https:\/\/server.hk\/cnblog\/wp-json\/wp\/v2\/posts\/43628","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/server.hk\/cnblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/server.hk\/cnblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/server.hk\/cnblog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/server.hk\/cnblog\/wp-json\/wp\/v2\/comments?post=43628"}],"version-history":[{"count":1,"href":"https:\/\/server.hk\/cnblog\/wp-json\/wp\/v2\/posts\/43628\/revisions"}],"predecessor-version":[{"id":43629,"href":"https:\/\/server.hk\/cnblog\/wp-json\/wp\/v2\/posts\/43628\/revisions\/43629"}],"wp:attachment":[{"href":"https:\/\/server.hk\/cnblog\/wp-json\/wp\/v2\/media?parent=43628"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/server.hk\/cnblog\/wp-json\/wp\/v2\/categories?post=43628"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/server.hk\/cnblog\/wp-json\/wp\/v2\/tags?post=43628"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}