Table of Contents
1 Set the Java and C++ library paths
Edit the ~/.bashrc file and add:
# qfs with spark
export SPARK_CLASSPATH=/your-path/qfs/lib/hadoop-2.5.1-qfs-master.jar:/your-path/qfs/lib/qfs-access-master.jar
export LD_LIBRARY_PATH=/your-path/qfs/lib/
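Before launching spark-shell it can help to confirm the jars actually exist at the configured location. A minimal sketch, assuming a helper named `check_qfs_env` (hypothetical, not part of QFS); the jar names match this article's Hadoop 2.5.1 build and will differ for other versions:

```shell
# Sanity-check the QFS library directory referenced from ~/.bashrc.
# check_qfs_env is a hypothetical helper; pass it the same directory
# used for LD_LIBRARY_PATH (e.g. /your-path/qfs/lib).
check_qfs_env() {
  local qfs_lib="$1"
  local ok=0
  [ -d "$qfs_lib" ] || { echo "missing dir: $qfs_lib"; ok=1; }
  for jar in hadoop-2.5.1-qfs-master.jar qfs-access-master.jar; do
    [ -f "$qfs_lib/$jar" ] || { echo "missing: $qfs_lib/$jar"; ok=1; }
  done
  return $ok
}
```

Running `check_qfs_env /your-path/qfs/lib` prints any missing pieces and returns non-zero, so a misconfigured SPARK_CLASSPATH surfaces before spark-shell fails with a ClassNotFoundException.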
2 Read QFS files with spark-shell
2.1 Simplify spark-shell startup
Add an environment variable to ~/.bashrc so spark-shell connects to the cluster without extra flags:
export MASTER=spark://meta-server-ip:7077
2.2 Set hadoopConfiguration after startup
scala> sc.hadoopConfiguration.set("fs.qfs.impl", "com.quantcast.qfs.hadoop.QuantcastFileSystem")
scala> sc.hadoopConfiguration.set("fs.defaultFS", "qfs://meta-server-ip:20000")
scala> sc.hadoopConfiguration.set("fs.qfs.metaServerHost", "meta-server-ip")
scala> sc.hadoopConfiguration.set("fs.qfs.metaServerPort", "20000")
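Typing these four settings into every session gets tedious. Spark copies any `spark.hadoop.*` properties it receives into `sc.hadoopConfiguration`, so the same configuration can instead be passed on the spark-shell command line. A sketch, where `qfs_conf_flags` is a hypothetical helper and `meta-server-ip` is the placeholder used throughout this article:

```shell
# Emit the four Hadoop settings above as spark-shell --conf flags.
# Relies on Spark's standard forwarding of spark.hadoop.* keys into
# the SparkContext's Hadoop Configuration.
qfs_conf_flags() {
  local host="$1" port="$2"
  echo "--conf spark.hadoop.fs.qfs.impl=com.quantcast.qfs.hadoop.QuantcastFileSystem" \
       "--conf spark.hadoop.fs.defaultFS=qfs://$host:$port" \
       "--conf spark.hadoop.fs.qfs.metaServerHost=$host" \
       "--conf spark.hadoop.fs.qfs.metaServerPort=$port"
}
```

Usage: `spark-shell $(qfs_conf_flags meta-server-ip 20000)`, after which the interactive `sc.hadoopConfiguration.set` calls are unnecessary.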
2.3 Read a file
scala> val file = sc.textFile("/sdm/files/sample")
file: org.apache.spark.rdd.RDD[String] = /sdm/files/sample MapPartitionsRDD[1] at textFile at <console>:24
scala> file.count()
res4: Long = 93676
Note that the file path here carries no qfs:// prefix: because fs.defaultFS points at the QFS metaserver, bare paths resolve against QFS automatically.