编写一个JSON解析器实际上就是一个函数,它的输入是一个表示JSON的字符串,输出是结构化的对应到语言本身的数据结构。
和XML相比,JSON本身结构非常简单,并且仅有几种数据类型,以Java为例,对应的数据结构是:
String
;Long
或Double
;Boolean
;null
;List
或Object[]
;Map
。解析JSON和解析XML类似,最终都是解析为内存的一个对象。出于效率考虑,使用流的方式几乎是唯一选择,也就是解析器只从头扫描一遍JSON字符串,就完整地解析出对应的数据结构。
本质上解析器就是一个状态机,只要按照JSON定义的格式(参考http://www.json.org,正确实现状态转移即可。但是为了简化代码,我们也没必要完整地实现一个字符一个字符的状态转移。
解析器的输入应该是一个字符流,所以,第一步是获得Reader
,以便能不断地读入下一个字符。
在解析的过程中,我们经常要根据下一个字符来决定状态跳转,此时又涉及到回退的问题,就是某些时候不能用next()
取下一个字符,而是用peek()
取下一个字符,但字符流的指针不移动。所以,Reader
接口不能满足这个需求,应当进一步封装一个CharReader
,它可以实现:
Reader
指针;Reader
指针;JSON解析比其他文本解析要简单的地方在于,任何JSON数据类型,只需要根据下一个字符即可确定,仔细总结可以发现,如果peek()
返回的字符是某个字符,就可以期望读取的数据类型:
但是单个字符要匹配的状态太多了,需要进一步把字符流变为Token
,可以总结出如下几种Token
:
然后,将CharReader
进一步封装为TokenReader
,提供以下接口:
由于JSON的Object和Array可以嵌套,在读取过程中,使用一个栈来存储Object和Array是必须的。每当我们读到一个BEGIN_OBJECT
时,就创建一个Map
并压栈;每当读到一个BEGIN_ARRAY
时,就创建一个List
并压栈;每当读到一个END_OBJECT
和END_ARRAY
时,就弹出栈顶元素,并根据新的栈顶元素判断是否压栈。此外,读到Object的Key也必须压栈,读到后面的Value后将Key-Value压入栈顶的Map。
如果读到END_DOCUMENT
时,栈恰好只剩下一个元素,则读取正确,将该元素返回,读取结束。如果栈剩下不止一个元素,则JSON文档格式不正确。
最后,JsonReader
的核心解析代码parse()
就是负责从TokenReader
中不断读取Token,根据当前状态操作,然后设定下一个Token期望的状态,如果与期望状态不符,则JSON的格式无效。起始状态被设定为STATUS_EXPECT_SINGLE_VALUE | STATUS_EXPECT_BEGIN_OBJECT | STATUS_EXPECT_BEGIN_ARRAY
,即期望读取到单个value、{
或[
。循环的退出点是读取到END_DOCUMENT
时。
public class JsonReader {
TokenReader reader;
public Object parse() {
Stack stack = new Stack();
int status = STATUS_EXPECT_SINGLE_VALUE | STATUS_EXPECT_BEGIN_OBJECT | STATUS_EXPECT_BEGIN_ARRAY;
for (;;) {
Token currentToken = reader.readNextToken();
switch (currentToken) {
case BOOLEAN:
if (hasStatus(STATUS_EXPECT_SINGLE_VALUE)) {
// single boolean:
Boolean bool = reader.readBoolean();
stack.push(StackValue.newJsonSingle(bool));
status = STATUS_EXPECT_END_DOCUMENT;
continue;
}
if (hasStatus(STATUS_EXPECT_OBJECT_VALUE)) {
Boolean bool = reader.readBoolean();
String key = stack.pop(StackValue.TYPE_OBJECT_KEY).valueAsKey();
stack.peek(StackValue.TYPE_OBJECT).valueAsObject().put(key, bool);
status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_OBJECT;
continue;
}
if (hasStatus(STATUS_EXPECT_ARRAY_VALUE)) {
Boolean bool = reader.readBoolean();
stack.peek(StackValue.TYPE_ARRAY).valueAsArray().add(bool);
status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_ARRAY;
continue;
}
throw new JsonParseException("Unexpected boolean.", reader.reader.readed);
case NULL:
if (hasStatus(STATUS_EXPECT_SINGLE_VALUE)) {
// single null:
reader.readNull();
stack.push(StackValue.newJsonSingle(null));
status = STATUS_EXPECT_END_DOCUMENT;
continue;
}
if (hasStatus(STATUS_EXPECT_OBJECT_VALUE)) {
reader.readNull();
String key = stack.pop(StackValue.TYPE_OBJECT_KEY).valueAsKey();
stack.peek(StackValue.TYPE_OBJECT).valueAsObject().put(key, null);
status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_OBJECT;
continue;
}
if (hasStatus(STATUS_EXPECT_ARRAY_VALUE)) {
reader.readNull();
stack.peek(StackValue.TYPE_ARRAY).valueAsArray().add(null);
status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_ARRAY;
continue;
}
throw new JsonParseException("Unexpected null.", reader.reader.readed);
case NUMBER:
if (hasStatus(STATUS_EXPECT_SINGLE_VALUE)) {
// single number:
Number number = reader.readNumber();
stack.push(StackValue.newJsonSingle(number));
status = STATUS_EXPECT_END_DOCUMENT;
continue;
}
if (hasStatus(STATUS_EXPECT_OBJECT_VALUE)) {
Number number = reader.readNumber();
String key = stack.pop(StackValue.TYPE_OBJECT_KEY).valueAsKey();
stack.peek(StackValue.TYPE_OBJECT).valueAsObject().put(key, number);
status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_OBJECT;
continue;
}
if (hasStatus(STATUS_EXPECT_ARRAY_VALUE)) {
Number number = reader.readNumber();
stack.peek(StackValue.TYPE_ARRAY).valueAsArray().add(number);
status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_ARRAY;
continue;
}
throw new JsonParseException("Unexpected number.", reader.reader.readed);
case STRING:
if (hasStatus(STATUS_EXPECT_SINGLE_VALUE)) {
// single string:
String str = reader.readString();
stack.push(StackValue.newJsonSingle(str));
status = STATUS_EXPECT_END_DOCUMENT;
continue;
}
if (hasStatus(STATUS_EXPECT_OBJECT_KEY)) {
String str = reader.readString();
stack.push(StackValue.newJsonObjectKey(str));
status = STATUS_EXPECT_COLON;
continue;
}
if (hasStatus(STATUS_EXPECT_OBJECT_VALUE)) {
String str = reader.readString();
String key = stack.pop(StackValue.TYPE_OBJECT_KEY).valueAsKey();
stack.peek(StackValue.TYPE_OBJECT).valueAsObject().put(key, str);
status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_OBJECT;
continue;
}
if (hasStatus(STATUS_EXPECT_ARRAY_VALUE)) {
String str = reader.readString();
stack.peek(StackValue.TYPE_ARRAY).valueAsArray().add(str);
status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_ARRAY;
continue;
}
throw new JsonParseException("Unexpected char \'\"\'.", reader.reader.readed);
case SEP_COLON: // :
if (status == STATUS_EXPECT_COLON) {
status = STATUS_EXPECT_OBJECT_VALUE | STATUS_EXPECT_BEGIN_OBJECT | STATUS_EXPECT_BEGIN_ARRAY;
continue;
}
throw new JsonParseException("Unexpected char \':\'.", reader.reader.readed);
case SEP_COMMA: // ,
if (hasStatus(STATUS_EXPECT_COMMA)) {
if (hasStatus(STATUS_EXPECT_END_OBJECT)) {
status = STATUS_EXPECT_OBJECT_KEY;
continue;
}
if (hasStatus(STATUS_EXPECT_END_ARRAY)) {
status = STATUS_EXPECT_ARRAY_VALUE | STATUS_EXPECT_BEGIN_ARRAY | STATUS_EXPECT_BEGIN_OBJECT;
continue;
}
}
throw new JsonParseException("Unexpected char \',\'.", reader.reader.readed);
case END_ARRAY:
if (hasStatus(STATUS_EXPECT_END_ARRAY)) {
StackValue array = stack.pop(StackValue.TYPE_ARRAY);
if (stack.isEmpty()) {
stack.push(array);
status = STATUS_EXPECT_END_DOCUMENT;
continue;
}
int type = stack.getTopValueType();
if (type == StackValue.TYPE_OBJECT_KEY) {
// key: [ CURRENT ] ,}
String key = stack.pop(StackValue.TYPE_OBJECT_KEY).valueAsKey();
stack.peek(StackValue.TYPE_OBJECT).valueAsObject().put(key, array.value);
status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_OBJECT;
continue;
}
if (type == StackValue.TYPE_ARRAY) {
// xx, xx, [CURRENT] ,]
stack.peek(StackValue.TYPE_ARRAY).valueAsArray().add(array.value);
status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_ARRAY;
continue;
}
}
throw new JsonParseException("Unexpected char: \']\'.", reader.reader.readed);
case END_OBJECT:
if (hasStatus(STATUS_EXPECT_END_OBJECT)) {
StackValue object = stack.pop(StackValue.TYPE_OBJECT);
if (stack.isEmpty()) {
// root object:
stack.push(object);
status = STATUS_EXPECT_END_DOCUMENT;
continue;
}
int type = stack.getTopValueType();
if (type == StackValue.TYPE_OBJECT_KEY) {
String key = stack.pop(StackValue.TYPE_OBJECT_KEY).valueAsKey();
stack.peek(StackValue.TYPE_OBJECT).valueAsObject().put(key, object.value);
status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_OBJECT;
continue;
}
if (type == StackValue.TYPE_ARRAY) {
stack.peek(StackValue.TYPE_ARRAY).valueAsArray().add(object.value);
status = STATUS_EXPECT_COMMA | STATUS_EXPECT_END_ARRAY;
continue;
}
}
throw new JsonParseException("Unexpected char: \'}\'.", reader.reader.readed);
case END_DOCUMENT:
if (hasStatus(STATUS_EXPECT_END_DOCUMENT)) {
StackValue v = stack.pop();
if (stack.isEmpty()) {
return v.value;
}
}
throw new JsonParseException("Unexpected EOF.", reader.reader.readed);
case BEGIN_ARRAY:
if (hasStatus(STATUS_EXPECT_BEGIN_ARRAY)) {
stack.push(StackValue.newJsonArray(this.jsonArrayFactory.createJsonArray()));
status = STATUS_EXPECT_ARRAY_VALUE | STATUS_EXPECT_BEGIN_OBJECT | STATUS_EXPECT_BEGIN_ARRAY | STATUS_EXPECT_END_ARRAY;
continue;
}
throw new JsonParseException("Unexpected char: \'[\'.", reader.reader.readed);
case BEGIN_OBJECT:
if (hasStatus(STATUS_EXPECT_BEGIN_OBJECT)) {
stack.push(StackValue.newJsonObject(this.jsonObjectFactory.createJsonObject()));
status = STATUS_EXPECT_OBJECT_KEY | STATUS_EXPECT_BEGIN_OBJECT | STATUS_EXPECT_END_OBJECT;
continue;
}
throw new JsonParseException("Unexpected char: \'{\'.", reader.reader.readed);
}
}
}
}