昨天捣鼓了一天这个东西,随便写点笔记。arvo:除了著名的hdfs文件,hadoop上常用的另一种序列化存储的文件格式就是arvo。简单的讲,这货就是由一个定义好的schema来读取的二进制文本文件。arvo schema:很像json...比如这里这个:{
"type" : "record",
"name" : "Tweet",
"namespace" : "com.miguno.avro",
"fields" : [ {
"name" : "username",
"type" : "string",
"doc" : "Name of the user account on Twitter.com"
}, {
"name" : "tweet",
"type" : "string",
"doc" : "The content of the user's Twitter message"
}, {
"name" : "timestamp",
"type" : "long",
"doc" : "Unix epoch time in seconds"
} ],
"doc:" : "A basic schema for storing Twitter messages"
}定义好schema之后可以用java去build...arvo to HIVE:可以
...
继续阅读
(101)