* [PATCH v3 0/7] qobject: switch JSON parser to push
@ 2026-05-25 15:04 Paolo Bonzini
2026-05-25 15:04 ` [PATCH v3 1/7] json-parser: constify JSONToken Paolo Bonzini
` (7 more replies)
0 siblings, 8 replies; 11+ messages in thread
From: Paolo Bonzini @ 2026-05-25 15:04 UTC (permalink / raw)
To: qemu-devel
This rewrites the json-parser to use a push parser aka state machine.
While push parsers are inherently more complex than recursive descent,
the grammar for JSON is simple enough that the parser remains readable.
There is therefore no need to use e.g. QEMU coroutines.
Unlike the suggestion in commit 62815d85aed ("json: Redesign the callback
to consume JSON values", 2018-08-24), I kept the json-streamer concept.
It helps in handling input limits, it performs error recovery, and it
converts the token-at-a-time push interface to callbacks; all of these
are more easily done in a separate layer, which keeps the parser clean.
However, there is no need anymore for it to store partial JSON objects
in tokenized form, because the current state is stored in the push
parser's stack.
Another benefit is that QEMU can report the first parsing error
immediately, without waiting for braces and brackets to be balanced or
for a lexing error. Error recovery then proceeds as before (i.e., the
next parse still starts after balanced braces and brackets or a lexing
error).
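To illustrate the "report the first error immediately" point, here is a
minimal sketch of a push parser for a toy grammar (nested arrays of
numbers, one character per token). It is not the code from this series;
all the toy_* names are made up for the example.

```c
#include <assert.h>

/* Toy grammar: value := 'n' | '[' ']' | '[' value (',' value)* ']' */
typedef enum { TOY_BEFORE_VALUE, TOY_AFTER_LSQUARE, TOY_END_OF_VALUE } ToyState;
typedef enum { TOY_MORE, TOY_DONE, TOY_ERROR } ToyResult;

typedef struct ToyParser {
    ToyState state;
    unsigned depth;             /* number of '[' still open */
} ToyParser;

static void toy_init(ToyParser *p)
{
    p->state = TOY_BEFORE_VALUE;
    p->depth = 0;
}

/* A value was completed; either it is the toplevel one or we are in an array */
static ToyResult toy_value_done(ToyParser *p)
{
    if (p->depth == 0) {
        toy_init(p);            /* ready for the next toplevel value */
        return TOY_DONE;
    }
    p->state = TOY_END_OF_VALUE;
    return TOY_MORE;
}

/* Push one token into the parser and return to the caller right away */
static ToyResult toy_feed(ToyParser *p, char tok)
{
    switch (p->state) {
    case TOY_AFTER_LSQUARE:
        if (tok == ']') {       /* empty array */
            p->depth--;
            return toy_value_done(p);
        }
        /* fall through */
    case TOY_BEFORE_VALUE:
        if (tok == 'n') {
            return toy_value_done(p);
        }
        if (tok == '[') {
            p->depth++;
            p->state = TOY_AFTER_LSQUARE;
            return TOY_MORE;
        }
        return TOY_ERROR;
    case TOY_END_OF_VALUE:      /* only reachable with depth > 0 */
        if (tok == ',') {
            p->state = TOY_BEFORE_VALUE;
            return TOY_MORE;
        }
        if (tok == ']') {
            p->depth--;
            return toy_value_done(p);
        }
        return TOY_ERROR;
    }
    return TOY_ERROR;
}

/* Return the index of the first rejected token, or -1 if none is rejected */
static int toy_first_error(const char *tokens)
{
    ToyParser p;

    toy_init(&p);
    for (int i = 0; tokens[i]; i++) {
        if (toy_feed(&p, tokens[i]) == TOY_ERROR) {
            return i;
        }
    }
    return -1;
}
```

With the input "[nn,n]" the error is flagged at the second 'n', before
the closing bracket has been seen; a parser that only runs on a balanced
token queue would have to wait for the ']'.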
On top of the benefits intrinsic to the push architecture, it is now
really easy to add a location to JSON parsing errors, so do that as
well.
The diffstat is unfavorable, but most of the added lines are new
comments explaining the grammar and the state machine.
Paolo
v2->v3:
- accept interpolation for the key of a dictionary
v1->v2:
- remove part of the patch to pass around the lookahead token,
it was hard to review and added little value
- separate patch to reuse the JSONParser
- separate patch to make brace/bracket count unsigned
- add comment with the structure of the stack
- add big comment with the grammar
- split long lines
- remove QObject **value argument to pop_entry()
- add assertions about the type of the top-of-stack
- change error to "key is not a string in object"
- split out json_parser_reset() already in the first patch
- rename json_parser_parse_token() to parse_token()
- do not use single quotes in commit messages
- move initialization of JSONToken close to usage
Paolo Bonzini (7):
json-parser: constify JSONToken
json-parser: replace with a push parser
json-streamer: reuse parser
json-streamer: make brace/bracket count unsigned
json-streamer: remove token queue
json-streamer: do not heap-allocate JSONToken
json-parser: add location to JSON parsing errors
include/qobject/json-parser.h | 16 +-
qobject/json-parser-int.h | 13 +-
qobject/json-lexer.c | 11 +-
qobject/json-parser.c | 580 +++++++++++++++++++---------------
qobject/json-streamer.c | 120 +++----
5 files changed, 415 insertions(+), 325 deletions(-)
--
2.54.0
* [PATCH v3 1/7] json-parser: constify JSONToken
2026-05-25 15:04 [PATCH v3 0/7] qobject: switch JSON parser to push Paolo Bonzini
@ 2026-05-25 15:04 ` Paolo Bonzini
2026-05-25 15:04 ` [PATCH v3 2/7] json-parser: replace with a push parser Paolo Bonzini
` (6 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2026-05-25 15:04 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
qobject/json-parser.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 7483e582fea..f6622b82b0a 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -55,7 +55,8 @@ static QObject *parse_value(JSONParserContext *ctxt);
* Error handler
*/
static void G_GNUC_PRINTF(3, 4) parse_error(JSONParserContext *ctxt,
- JSONToken *token, const char *msg, ...)
+ const JSONToken *token,
+ const char *msg, ...)
{
va_list ap;
char message[1024];
@@ -126,7 +127,7 @@ static int cvt4hex(const char *s)
* - Invalid Unicode characters are rejected.
* - Control characters \x00..\x1F are rejected by the lexer.
*/
-static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
+static QString *parse_string(JSONParserContext *ctxt, const JSONToken *token)
{
const char *ptr = token->str;
GString *str;
@@ -239,14 +240,14 @@ out:
* parser_context_pop_token is deleted as soon as parser_context_pop_token
* is called again.
*/
-static JSONToken *parser_context_pop_token(JSONParserContext *ctxt)
+static const JSONToken *parser_context_pop_token(JSONParserContext *ctxt)
{
g_free(ctxt->current);
ctxt->current = g_queue_pop_head(ctxt->buf);
return ctxt->current;
}
-static JSONToken *parser_context_peek_token(JSONParserContext *ctxt)
+static const JSONToken *parser_context_peek_token(JSONParserContext *ctxt)
{
return g_queue_peek_head(ctxt->buf);
}
@@ -259,7 +260,7 @@ static int parse_pair(JSONParserContext *ctxt, QDict *dict)
QObject *key_obj = NULL;
QString *key;
QObject *value;
- JSONToken *peek, *token;
+ const JSONToken *peek, *token;
peek = parser_context_peek_token(ctxt);
if (peek == NULL) {
@@ -309,7 +310,7 @@ out:
static QObject *parse_object(JSONParserContext *ctxt)
{
QDict *dict = NULL;
- JSONToken *token, *peek;
+ const JSONToken *token, *peek;
token = parser_context_pop_token(ctxt);
assert(token && token->type == JSON_LCURLY);
@@ -363,7 +364,7 @@ out:
static QObject *parse_array(JSONParserContext *ctxt)
{
QList *list = NULL;
- JSONToken *token, *peek;
+ const JSONToken *token, *peek;
token = parser_context_pop_token(ctxt);
assert(token && token->type == JSON_LSQUARE);
@@ -426,7 +427,7 @@ out:
static QObject *parse_keyword(JSONParserContext *ctxt)
{
- JSONToken *token;
+ const JSONToken *token;
token = parser_context_pop_token(ctxt);
assert(token && token->type == JSON_KEYWORD);
@@ -444,7 +445,7 @@ static QObject *parse_keyword(JSONParserContext *ctxt)
static QObject *parse_interpolation(JSONParserContext *ctxt)
{
- JSONToken *token;
+ const JSONToken *token;
token = parser_context_pop_token(ctxt);
assert(token && token->type == JSON_INTERP);
@@ -480,7 +481,7 @@ static QObject *parse_interpolation(JSONParserContext *ctxt)
static QObject *parse_literal(JSONParserContext *ctxt)
{
- JSONToken *token;
+ const JSONToken *token;
token = parser_context_pop_token(ctxt);
assert(token);
@@ -532,7 +533,7 @@ static QObject *parse_literal(JSONParserContext *ctxt)
static QObject *parse_value(JSONParserContext *ctxt)
{
- JSONToken *token;
+ const JSONToken *token;
token = parser_context_peek_token(ctxt);
if (token == NULL) {
--
2.54.0
* [PATCH v3 2/7] json-parser: replace with a push parser
2026-05-25 15:04 [PATCH v3 0/7] qobject: switch JSON parser to push Paolo Bonzini
2026-05-25 15:04 ` [PATCH v3 1/7] json-parser: constify JSONToken Paolo Bonzini
@ 2026-05-25 15:04 ` Paolo Bonzini
2026-06-12 14:21 ` Markus Armbruster
2026-05-25 15:04 ` [PATCH v3 3/7] json-streamer: reuse parser Paolo Bonzini
` (5 subsequent siblings)
7 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2026-05-25 15:04 UTC (permalink / raw)
To: qemu-devel
In order to avoid stashing all the tokens corresponding to a JSON value,
embed the parsing stack and state machine in JSONParser. This is more
efficient and allows for more prompt error recovery; it also does not
make the code substantially larger than the current recursive descent
parser, though the state machine is probably a bit harder to follow.
The stack consists of QLists and QDicts corresponding to open
brackets and braces, plus optionally a QString with the current
key on top of each QDict.
After each value is parsed, it is added to the array or dictionary on
top of the stack; if the stack is empty, json_parser_feed() returns the
complete QObject instead.
For now, json-streamer.c keeps tracking the tokens up until braces
and brackets are balanced, and then shoves the whole queue of tokens
into the push parser. The only logic change is that JSON_END_OF_INPUT
always triggers the emptying of the queue; the parser takes notice and
checks that there is nothing on the stack. Not using brace_count
and bracket_count for this is the first step towards improved separation
of concerns between json-parser.c and json-streamer.c.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/qobject/json-parser.h | 6 +
qobject/json-parser-int.h | 5 +-
qobject/json-parser.c | 551 ++++++++++++++++++++--------------
qobject/json-streamer.c | 21 +-
4 files changed, 345 insertions(+), 238 deletions(-)
diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
index 7345a9bd5cb..05346fa816b 100644
--- a/include/qobject/json-parser.h
+++ b/include/qobject/json-parser.h
@@ -20,6 +20,12 @@ typedef struct JSONLexer {
int x, y;
} JSONLexer;
+typedef struct JSONParserContext {
+ Error *err;
+ GQueue *stack;
+ va_list *ap;
+} JSONParserContext;
+
typedef struct JSONMessageParser {
void (*emit)(void *opaque, QObject *json, Error *err);
void *opaque;
diff --git a/qobject/json-parser-int.h b/qobject/json-parser-int.h
index 8c01f236276..1f435cb8eb2 100644
--- a/qobject/json-parser-int.h
+++ b/qobject/json-parser-int.h
@@ -49,6 +49,9 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
/* json-parser.c */
JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr);
-QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp);
+void json_parser_init(JSONParserContext *ctxt, va_list *ap);
+void json_parser_reset(JSONParserContext *ctxt);
+QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token, Error **errp);
+void json_parser_destroy(JSONParserContext *ctxt);
#endif
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index f6622b82b0a..3b5edc5bae4 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -31,12 +31,105 @@ struct JSONToken {
char str[];
};
-typedef struct JSONParserContext {
- Error *err;
- JSONToken *current;
- GQueue *buf;
- va_list *ap;
-} JSONParserContext;
+/*
+ * The JSON parser is a push parser, returning to the caller after every
+ * token. Therefore it has an explicit representation of its parser
+ * stack; each stack entry consists of a parser state and a QObject:
+ * - a QList, for an array that is being added to
+ * - a QDict, for a dictionary that is being added to
+ * - a QString, for the key of the next pair that will be added to a QDict
+ *
+ * The stack represents an arbitrary nesting of arrays and dictionaries
+ * (whose next key has been parsed); it can also have a dictionary whose
+ * next key has not been parsed, but that can only happen at the top level.
+ * Because of this, the stack contents are always of the form
+ * "(QList | QDict QString)* QDict?".
+ *
+ * An empty stack represents the beginning of the parsing process, with
+ * start state BEFORE_VALUE.
+ */
+
+typedef enum JSONParserState {
+ AFTER_LCURLY,
+ AFTER_LSQUARE,
+ BEFORE_KEY,
+ BEFORE_VALUE,
+ END_OF_KEY,
+ END_OF_VALUE,
+} JSONParserState;
+
+typedef struct JSONParserStackEntry {
+ /*
+ * State when the container is completed or, for the top of the stack,
+ * entry state for the next token.
+ */
+ JSONParserState state;
+
+ /*
+ * A QString with the last parsed key, or a QList/QDict for the current
+ * container.
+ */
+ QObject *partial;
+} JSONParserStackEntry;
+
+/*
+ * This is the JSON grammar that's parsed, with the state transition and
+ * action at each point of the grammar. While this is not a formal
+ * description, "-> action" represents the pseudocode of the action
+ * and "-> STATE" sets the top stack entry's state to STATE.
+ *
+ * // The initial state is BEFORE_VALUE.
+ * input := value -> END_OF_VALUE -> return parsed value
+ * END_OF_INPUT -> check stack is empty
+ *
+ * // entered on BEFORE_VALUE; after any of these rules are processed, the
+ * // parser has completed a QObject and is in the END_OF_VALUE state.
+ * //
+ * // When the parser reaches the END_OF_VALUE state, it examines the
+ * // top of the stack to see if it's coming from "input" (stack empty),
+ * // "array_items" (TOS is a QList) or "dict_pairs" (TOS is a QString; the
+ * // item below will be a QDict). It then proceeds with the corresponding
+ * // actions, which will be one of:
+ * // - return parsed value
+ * // - add value to QList
+ * // - pop QString with the key, add key/value to the QDict
+ * value := literal -> END_OF_VALUE
+ * | '[' -> push empty QList -> AFTER_LSQUARE
+ * after_lsquare -> END_OF_VALUE
+ * | '{' -> push empty QDict -> AFTER_LCURLY
+ * after_lcurly -> END_OF_VALUE
+ *
+ * // non-recursive values, entered on BEFORE_VALUE
+ * literal := INTEGER -> END_OF_VALUE
+ * | FLOAT -> END_OF_VALUE
+ * | KEYWORD -> END_OF_VALUE
+ * | STRING -> END_OF_VALUE
+ * | INTERP -> END_OF_VALUE
+ *
+ * // entered on AFTER_LSQUARE
+ * after_lsquare := ']' -> pop completed QList -> END_OF_VALUE
+ * | ϵ -> BEFORE_VALUE
+ * array_items -> END_OF_VALUE
+ *
+ * // entered on BEFORE_VALUE, with TOS being a QList
+ * array_items := value -> add value to QList -> END_OF_VALUE
+ * (']' -> pop completed QList -> END_OF_VALUE
+ * | ',' -> BEFORE_VALUE
+ * array_items) -> END_OF_VALUE
+ *
+ * // entered on AFTER_LCURLY
+ * after_lcurly := '}' -> pop completed QDict -> END_OF_VALUE
+ * | ϵ -> BEFORE_KEY
+ * dict_pairs -> END_OF_VALUE
+ *
+ * // entered on BEFORE_KEY, with TOS being a QDict
+ * dict_pairs := (STRING | INTERP) -> push QString -> END_OF_KEY
+ * ':' -> BEFORE_VALUE
+ * value -> pop QString + add pair to QDict -> END_OF_VALUE
+ * ('}' -> pop completed QDict -> END_OF_VALUE
+ * | ',' -> BEFORE_KEY
+ * dict_pairs) -> END_OF_VALUE
+ */
#define BUG_ON(cond) assert(!(cond))
@@ -49,7 +142,26 @@ typedef struct JSONParserContext {
* 4) deal with premature EOI
*/
-static QObject *parse_value(JSONParserContext *ctxt);
+static inline JSONParserStackEntry *current_entry(JSONParserContext *ctxt)
+{
+ return g_queue_peek_tail(ctxt->stack);
+}
+
+static void push_entry(JSONParserContext *ctxt, QObject *partial,
+ JSONParserState state)
+{
+ JSONParserStackEntry *entry = g_new(JSONParserStackEntry, 1);
+ entry->partial = partial;
+ entry->state = state;
+ g_queue_push_tail(ctxt->stack, entry);
+}
+
+static JSONParserStackEntry *pop_entry(JSONParserContext *ctxt)
+{
+ JSONParserStackEntry *entry = g_queue_pop_tail(ctxt->stack);
+ g_free(entry);
+ return current_entry(ctxt);
+}
/**
* Error handler
@@ -236,200 +348,10 @@ out:
return NULL;
}
-/* Note: the token object returned by parser_context_peek_token or
- * parser_context_pop_token is deleted as soon as parser_context_pop_token
- * is called again.
- */
-static const JSONToken *parser_context_pop_token(JSONParserContext *ctxt)
+/* Terminals */
+
+static QObject *parse_keyword(JSONParserContext *ctxt, const JSONToken *token)
{
- g_free(ctxt->current);
- ctxt->current = g_queue_pop_head(ctxt->buf);
- return ctxt->current;
-}
-
-static const JSONToken *parser_context_peek_token(JSONParserContext *ctxt)
-{
- return g_queue_peek_head(ctxt->buf);
-}
-
-/**
- * Parsing rules
- */
-static int parse_pair(JSONParserContext *ctxt, QDict *dict)
-{
- QObject *key_obj = NULL;
- QString *key;
- QObject *value;
- const JSONToken *peek, *token;
-
- peek = parser_context_peek_token(ctxt);
- if (peek == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
-
- key_obj = parse_value(ctxt);
- key = qobject_to(QString, key_obj);
- if (!key) {
- parse_error(ctxt, peek, "key is not a string in object");
- goto out;
- }
-
- token = parser_context_pop_token(ctxt);
- if (token == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
-
- if (token->type != JSON_COLON) {
- parse_error(ctxt, token, "missing : in object pair");
- goto out;
- }
-
- value = parse_value(ctxt);
- if (value == NULL) {
- parse_error(ctxt, token, "Missing value in dict");
- goto out;
- }
-
- if (qdict_haskey(dict, qstring_get_str(key))) {
- parse_error(ctxt, token, "duplicate key");
- goto out;
- }
-
- qdict_put_obj(dict, qstring_get_str(key), value);
-
- qobject_unref(key_obj);
- return 0;
-
-out:
- qobject_unref(key_obj);
- return -1;
-}
-
-static QObject *parse_object(JSONParserContext *ctxt)
-{
- QDict *dict = NULL;
- const JSONToken *token, *peek;
-
- token = parser_context_pop_token(ctxt);
- assert(token && token->type == JSON_LCURLY);
-
- dict = qdict_new();
-
- peek = parser_context_peek_token(ctxt);
- if (peek == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
-
- if (peek->type != JSON_RCURLY) {
- if (parse_pair(ctxt, dict) == -1) {
- goto out;
- }
-
- token = parser_context_pop_token(ctxt);
- if (token == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
-
- while (token->type != JSON_RCURLY) {
- if (token->type != JSON_COMMA) {
- parse_error(ctxt, token, "expected separator in dict");
- goto out;
- }
-
- if (parse_pair(ctxt, dict) == -1) {
- goto out;
- }
-
- token = parser_context_pop_token(ctxt);
- if (token == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
- }
- } else {
- (void)parser_context_pop_token(ctxt);
- }
-
- return QOBJECT(dict);
-
-out:
- qobject_unref(dict);
- return NULL;
-}
-
-static QObject *parse_array(JSONParserContext *ctxt)
-{
- QList *list = NULL;
- const JSONToken *token, *peek;
-
- token = parser_context_pop_token(ctxt);
- assert(token && token->type == JSON_LSQUARE);
-
- list = qlist_new();
-
- peek = parser_context_peek_token(ctxt);
- if (peek == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
-
- if (peek->type != JSON_RSQUARE) {
- QObject *obj;
-
- obj = parse_value(ctxt);
- if (obj == NULL) {
- parse_error(ctxt, token, "expecting value");
- goto out;
- }
-
- qlist_append_obj(list, obj);
-
- token = parser_context_pop_token(ctxt);
- if (token == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
-
- while (token->type != JSON_RSQUARE) {
- if (token->type != JSON_COMMA) {
- parse_error(ctxt, token, "expected separator in list");
- goto out;
- }
-
- obj = parse_value(ctxt);
- if (obj == NULL) {
- parse_error(ctxt, token, "expecting value");
- goto out;
- }
-
- qlist_append_obj(list, obj);
-
- token = parser_context_pop_token(ctxt);
- if (token == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
- }
- } else {
- (void)parser_context_pop_token(ctxt);
- }
-
- return QOBJECT(list);
-
-out:
- qobject_unref(list);
- return NULL;
-}
-
-static QObject *parse_keyword(JSONParserContext *ctxt)
-{
- const JSONToken *token;
-
- token = parser_context_pop_token(ctxt);
assert(token && token->type == JSON_KEYWORD);
if (!strcmp(token->str, "true")) {
@@ -443,11 +365,9 @@ static QObject *parse_keyword(JSONParserContext *ctxt)
return NULL;
}
-static QObject *parse_interpolation(JSONParserContext *ctxt)
+static QObject *parse_interpolation(JSONParserContext *ctxt,
+ const JSONToken *token)
{
- const JSONToken *token;
-
- token = parser_context_pop_token(ctxt);
assert(token && token->type == JSON_INTERP);
if (!strcmp(token->str, "%p")) {
@@ -479,11 +399,8 @@ static QObject *parse_interpolation(JSONParserContext *ctxt)
return NULL;
}
-static QObject *parse_literal(JSONParserContext *ctxt)
+static QObject *parse_literal(JSONParserContext *ctxt, const JSONToken *token)
{
- const JSONToken *token;
-
- token = parser_context_pop_token(ctxt);
assert(token);
switch (token->type) {
@@ -531,35 +448,167 @@ static QObject *parse_literal(JSONParserContext *ctxt)
}
}
-static QObject *parse_value(JSONParserContext *ctxt)
+/* Parsing state machine */
+
+static QObject *parse_begin_value(JSONParserContext *ctxt,
+ const JSONToken *token)
{
- const JSONToken *token;
-
- token = parser_context_peek_token(ctxt);
- if (token == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- return NULL;
- }
-
switch (token->type) {
case JSON_LCURLY:
- return parse_object(ctxt);
+ push_entry(ctxt, QOBJECT(qdict_new()), AFTER_LCURLY);
+ return NULL;
case JSON_LSQUARE:
- return parse_array(ctxt);
+ push_entry(ctxt, QOBJECT(qlist_new()), AFTER_LSQUARE);
+ return NULL;
case JSON_INTERP:
- return parse_interpolation(ctxt);
+ return parse_interpolation(ctxt, token);
case JSON_INTEGER:
case JSON_FLOAT:
case JSON_STRING:
- return parse_literal(ctxt);
+ return parse_literal(ctxt, token);
case JSON_KEYWORD:
- return parse_keyword(ctxt);
+ return parse_keyword(ctxt, token);
default:
parse_error(ctxt, token, "expecting value");
return NULL;
}
}
+static QObject *parse_token(JSONParserContext *ctxt, const JSONToken *token)
+{
+ JSONParserStackEntry *entry;
+ JSONParserState state;
+ QString *key;
+ QObject *key_obj = NULL, *value = NULL;
+
+ entry = current_entry(ctxt);
+ state = entry ? entry->state : BEFORE_VALUE;
+ switch (state) {
+ case AFTER_LCURLY:
+ /* Grab '}' for empty object or fall through to BEFORE_KEY */
+ assert(qobject_type(entry->partial) == QTYPE_QDICT);
+ if (token->type == JSON_RCURLY) {
+ value = entry->partial;
+ entry = pop_entry(ctxt);
+ break;
+ }
+ entry->state = BEFORE_KEY;
+ /* fall through */
+
+ case BEFORE_KEY:
+ /* Expecting object key */
+ assert(qobject_type(entry->partial) == QTYPE_QDICT);
+ if (token->type == JSON_STRING || token->type == JSON_INTERP) {
+ key_obj = parse_begin_value(ctxt, token);
+ if (!key_obj) {
+ /* parse error happened */
+ return NULL;
+ }
+ }
+ if (!key_obj || qobject_type(key_obj) != QTYPE_QSTRING) {
+ parse_error(ctxt, token, "key is not a string in object");
+ return NULL;
+ }
+
+ /* Store key in a special entry on the stack */
+ push_entry(ctxt, key_obj, END_OF_KEY);
+ return NULL;
+
+ case END_OF_KEY:
+ /* Expecting ':' after key */
+ assert(qobject_type(entry->partial) == QTYPE_QSTRING);
+ if (token->type == JSON_COLON) {
+ entry->state = BEFORE_VALUE;
+ } else {
+ parse_error(ctxt, token, "expecting ':'");
+ }
+ return NULL;
+
+ case AFTER_LSQUARE:
+ /* Grab ']' for empty array or fall through to BEFORE_VALUE */
+ assert(qobject_type(entry->partial) == QTYPE_QLIST);
+ if (token->type == JSON_RSQUARE) {
+ value = entry->partial;
+ entry = pop_entry(ctxt);
+ break;
+ }
+ entry->state = BEFORE_VALUE;
+ /* fall through */
+
+ case BEFORE_VALUE:
+ /* Expecting value */
+ assert(!entry || qobject_type(entry->partial) != QTYPE_QDICT);
+ value = parse_begin_value(ctxt, token);
+ if (!value) {
+ /* Error or '['/'{' */
+ return NULL;
+ }
+ /* Return value or insert it into a container */
+ break;
+
+ case END_OF_VALUE:
+ /* Grab ',' or ']' for array; ',' or '}' for object */
+ if (qobject_to(QList, entry->partial)) {
+ /* Array */
+ if (token->type != JSON_RSQUARE) {
+ if (token->type == JSON_COMMA) {
+ entry->state = BEFORE_VALUE;
+ } else {
+ parse_error(ctxt, token, "expected ',' or ']'");
+ }
+ return NULL;
+ }
+ } else if (qobject_to(QDict, entry->partial)) {
+ /* Object */
+ if (token->type != JSON_RCURLY) {
+ if (token->type == JSON_COMMA) {
+ entry->state = BEFORE_KEY;
+ } else {
+ parse_error(ctxt, token, "expected ',' or '}'");
+ }
+ return NULL;
+ }
+ } else {
+ g_assert_not_reached();
+ }
+
+ /* Got ']' or '}'; return full value or insert into parent container */
+ value = entry->partial;
+ entry = pop_entry(ctxt);
+ break;
+ }
+
+ assert(value);
+ if (entry == NULL) {
+ /* The toplevel value is complete. */
+ return value;
+ }
+
+ key = qobject_to(QString, entry->partial);
+ if (key) {
+ const char *key_str;
+ QDict *dict;
+
+ entry = pop_entry(ctxt);
+ dict = qobject_to(QDict, entry->partial);
+ assert(dict);
+ key_str = qstring_get_str(key);
+ if (qdict_haskey(dict, key_str)) {
+ parse_error(ctxt, token, "duplicate key");
+ qobject_unref(value);
+ return NULL;
+ }
+ qdict_put_obj(dict, key_str, value);
+ qobject_unref(key);
+ } else {
+ /* Add to array */
+ qlist_append_obj(qobject_to(QList, entry->partial), value);
+ }
+
+ entry->state = END_OF_VALUE;
+ return NULL;
+}
+
JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr)
{
JSONToken *token = g_malloc(sizeof(JSONToken) + tokstr->len + 1);
@@ -572,20 +621,56 @@ JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr)
return token;
}
-QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp)
+void json_parser_reset(JSONParserContext *ctxt)
{
- JSONParserContext ctxt = { .buf = tokens, .ap = ap };
- QObject *result;
+ JSONParserStackEntry *entry;
- result = parse_value(&ctxt);
- assert(ctxt.err || g_queue_is_empty(ctxt.buf));
-
- error_propagate(errp, ctxt.err);
-
- while (!g_queue_is_empty(ctxt.buf)) {
- parser_context_pop_token(&ctxt);
+ ctxt->err = NULL;
+ while ((entry = g_queue_pop_tail(ctxt->stack)) != NULL) {
+ qobject_unref(entry->partial);
+ g_free(entry);
}
- g_free(ctxt.current);
+}
+void json_parser_init(JSONParserContext *ctxt, va_list *ap)
+{
+ ctxt->stack = g_queue_new();
+ ctxt->ap = ap;
+ json_parser_reset(ctxt);
+}
+
+void json_parser_destroy(JSONParserContext *ctxt)
+{
+ json_parser_reset(ctxt);
+ g_queue_free(ctxt->stack);
+ ctxt->stack = NULL;
+}
+
+/*
+ * Advance the parser based on the token that is passed.
+ * Return the finished toplevel value if the token completes it.
+ * If an error is returned, the function must not be called without
+ * first resetting the parser.
+ */
+QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token,
+ Error **errp)
+{
+ QObject *result = NULL;
+
+ assert(!ctxt->err);
+ switch (token->type) {
+ case JSON_END_OF_INPUT:
+ /* Check for premature end of input */
+ if (!g_queue_is_empty(ctxt->stack)) {
+ parse_error(ctxt, token, "premature end of input");
+ }
+ break;
+
+ default:
+ result = parse_token(ctxt, token);
+ break;
+ }
+
+ error_propagate(errp, ctxt->err);
return result;
}
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index b93d97b995f..6c93e6fd78d 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -32,6 +32,7 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
JSONTokenType type, int x, int y)
{
JSONMessageParser *parser = container_of(lexer, JSONMessageParser, lexer);
+ JSONParserContext ctxt;
QObject *json = NULL;
Error *err = NULL;
JSONToken *token;
@@ -56,8 +57,7 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
if (g_queue_is_empty(&parser->tokens)) {
return;
}
- json = json_parser_parse(&parser->tokens, parser->ap, &err);
- goto out_emit;
+ break;
default:
break;
}
@@ -85,11 +85,24 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
g_queue_push_tail(&parser->tokens, token);
if ((parser->brace_count > 0 || parser->bracket_count > 0)
- && parser->brace_count >= 0 && parser->bracket_count >= 0) {
+ && parser->brace_count >= 0 && parser->bracket_count >= 0
+ && type != JSON_END_OF_INPUT) {
return;
}
- json = json_parser_parse(&parser->tokens, parser->ap, &err);
+ json_parser_init(&ctxt, parser->ap);
+
+ /* Process all tokens in the queue */
+ while (!g_queue_is_empty(&parser->tokens)) {
+ token = g_queue_pop_head(&parser->tokens);
+ json = json_parser_feed(&ctxt, token, &err);
+ g_free(token);
+ if (json || err) {
+ break;
+ }
+ }
+
+ json_parser_destroy(&ctxt);
out_emit:
parser->brace_count = 0;
--
2.54.0
* [PATCH v3 3/7] json-streamer: reuse parser
2026-05-25 15:04 [PATCH v3 0/7] qobject: switch JSON parser to push Paolo Bonzini
2026-05-25 15:04 ` [PATCH v3 1/7] json-parser: constify JSONToken Paolo Bonzini
2026-05-25 15:04 ` [PATCH v3 2/7] json-parser: replace with a push parser Paolo Bonzini
@ 2026-05-25 15:04 ` Paolo Bonzini
2026-05-25 15:05 ` [PATCH v3 4/7] json-streamer: make brace/bracket count unsigned Paolo Bonzini
` (4 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2026-05-25 15:04 UTC (permalink / raw)
To: qemu-devel
The push parser can be reset, so reuse it when the json-streamer
detects a completed toplevel object.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/qobject/json-parser.h | 2 +-
qobject/json-streamer.c | 11 ++++-------
2 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
index 05346fa816b..4c3d89f751f 100644
--- a/include/qobject/json-parser.h
+++ b/include/qobject/json-parser.h
@@ -29,8 +29,8 @@ typedef struct JSONParserContext {
typedef struct JSONMessageParser {
void (*emit)(void *opaque, QObject *json, Error *err);
void *opaque;
- va_list *ap;
JSONLexer lexer;
+ JSONParserContext parser;
int brace_count;
int bracket_count;
GQueue tokens;
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 6c93e6fd78d..f3dfdcaea12 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -32,7 +32,6 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
JSONTokenType type, int x, int y)
{
JSONMessageParser *parser = container_of(lexer, JSONMessageParser, lexer);
- JSONParserContext ctxt;
QObject *json = NULL;
Error *err = NULL;
JSONToken *token;
@@ -90,26 +89,23 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
return;
}
- json_parser_init(&ctxt, parser->ap);
-
/* Process all tokens in the queue */
while (!g_queue_is_empty(&parser->tokens)) {
token = g_queue_pop_head(&parser->tokens);
- json = json_parser_feed(&ctxt, token, &err);
+ json = json_parser_feed(&parser->parser, token, &err);
g_free(token);
if (json || err) {
break;
}
}
- json_parser_destroy(&ctxt);
-
out_emit:
parser->brace_count = 0;
parser->bracket_count = 0;
json_message_free_tokens(parser);
parser->token_size = 0;
parser->emit(parser->opaque, json, err);
+ json_parser_reset(&parser->parser);
}
void json_message_parser_init(JSONMessageParser *parser,
@@ -119,12 +115,12 @@ void json_message_parser_init(JSONMessageParser *parser,
{
parser->emit = emit;
parser->opaque = opaque;
- parser->ap = ap;
parser->brace_count = 0;
parser->bracket_count = 0;
g_queue_init(&parser->tokens);
parser->token_size = 0;
+ json_parser_init(&parser->parser, ap);
json_lexer_init(&parser->lexer, !!ap);
}
@@ -144,4 +140,5 @@ void json_message_parser_destroy(JSONMessageParser *parser)
{
json_lexer_destroy(&parser->lexer);
json_message_free_tokens(parser);
+ json_parser_destroy(&parser->parser);
}
--
2.54.0
* [PATCH v3 4/7] json-streamer: make brace/bracket count unsigned
2026-05-25 15:04 [PATCH v3 0/7] qobject: switch JSON parser to push Paolo Bonzini
` (2 preceding siblings ...)
2026-05-25 15:04 ` [PATCH v3 3/7] json-streamer: reuse parser Paolo Bonzini
@ 2026-05-25 15:05 ` Paolo Bonzini
2026-05-25 15:05 ` [PATCH v3 5/7] json-streamer: remove token queue Paolo Bonzini
` (3 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2026-05-25 15:05 UTC (permalink / raw)
To: qemu-devel
There is no reason to let brace_count and bracket_count go negative:
a negative count immediately ends error recovery and resets both counts
to zero anyway. Instead, set them to zero *before* choosing whether to
process the token queue; this makes it possible to have the fields as
unsigned.
Note that JSON_END_OF_INPUT now forces the braces and brackets to
appear balanced, so that the queue is emptied and an error is reported;
hence, the "type != JSON_END_OF_INPUT" condition can be removed.
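The counting scheme can be sketched in isolation as follows. This is
only an illustration with made-up names (NestCount, nest_track) and
single-character tokens, not the streamer code itself: a closer that
would take a count below zero resets both counts instead of
decrementing.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct NestCount {
    unsigned int brace_count;
    unsigned int bracket_count;
} NestCount;

/*
 * Update the counts for one token.  Return true if some brace or bracket
 * is still open, false if the counts are balanced or were reset because
 * of a stray closing token.
 */
static bool nest_track(NestCount *n, char tok)
{
    switch (tok) {
    case '{':
        n->brace_count++;
        break;
    case '}':
        if (n->brace_count == 0) {
            goto reset;
        }
        n->brace_count--;
        break;
    case '[':
        n->bracket_count++;
        break;
    case ']':
        if (n->bracket_count == 0) {
            goto reset;
        }
        n->bracket_count--;
        break;
    default:
        break;
    }
    return n->brace_count > 0 || n->bracket_count > 0;

reset:
    /* Never go below zero: treat the input as balanced from here on */
    n->brace_count = 0;
    n->bracket_count = 0;
    return false;
}
```

Because the check happens before the decrement, the fields never need
to represent a negative value.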
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/qobject/json-parser.h | 4 ++--
qobject/json-streamer.c | 19 ++++++++++++++++---
2 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
index 4c3d89f751f..0cf6932ecdc 100644
--- a/include/qobject/json-parser.h
+++ b/include/qobject/json-parser.h
@@ -31,8 +31,8 @@ typedef struct JSONMessageParser {
void *opaque;
JSONLexer lexer;
JSONParserContext parser;
- int brace_count;
- int bracket_count;
+ unsigned int brace_count;
+ unsigned int bracket_count;
GQueue tokens;
uint64_t token_size;
} JSONMessageParser;
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index f3dfdcaea12..b0bf2083ca6 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -41,12 +41,18 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
parser->brace_count++;
break;
case JSON_RCURLY:
+ if (parser->brace_count <= 0) {
+ goto end_error_recovery;
+ }
parser->brace_count--;
break;
case JSON_LSQUARE:
parser->bracket_count++;
break;
case JSON_RSQUARE:
+ if (parser->bracket_count <= 0) {
+ goto end_error_recovery;
+ }
parser->bracket_count--;
break;
case JSON_ERROR:
@@ -56,6 +62,15 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
if (g_queue_is_empty(&parser->tokens)) {
return;
}
+ end_error_recovery:
+ /*
+ * Cause error recovery to end immediately.
+ * If not in error recovery, the parser will raise an error
+ * (due to JSON_ERROR or unexpected JSON_R{CURLY,SQUARE})
+ * but error recovery won't be entered at all.
+ */
+ parser->brace_count = 0;
+ parser->bracket_count = 0;
break;
default:
break;
@@ -83,9 +98,7 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
g_queue_push_tail(&parser->tokens, token);
- if ((parser->brace_count > 0 || parser->bracket_count > 0)
- && parser->brace_count >= 0 && parser->bracket_count >= 0
- && type != JSON_END_OF_INPUT) {
+ if (parser->brace_count > 0 || parser->bracket_count > 0) {
return;
}
--
2.54.0
* [PATCH v3 5/7] json-streamer: remove token queue
2026-05-25 15:04 [PATCH v3 0/7] qobject: switch JSON parser to push Paolo Bonzini
` (3 preceding siblings ...)
2026-05-25 15:05 ` [PATCH v3 4/7] json-streamer: make brace/bracket count unsigned Paolo Bonzini
@ 2026-05-25 15:05 ` Paolo Bonzini
2026-05-25 15:05 ` [PATCH v3 6/7] json-streamer: do not heap-allocate JSONToken Paolo Bonzini
` (2 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2026-05-25 15:05 UTC (permalink / raw)
To: qemu-devel
Now fully exploit the push parser, feeding it one token at a time
without having to wait until braces and brackets are balanced.
While the nesting counts are retained for error recovery purposes,
the system can now report the first parsing error without waiting
for parentheses to be balanced. This also means that JSON_ERROR
can be handled in json-parser.c, not json-streamer.c.
After reporting the error, json-streamer.c then enters an error recovery
mode where subsequent errors are suppressed. This mimics the previous
error reporting behavior, but it provides prompt feedback on parsing
errors. As an example, here is an interaction with qemu-ga.
BEFORE (error reported only once braces are balanced):
>> {"execute":foo
>> }
<< {"error": {"class": "GenericError", "desc": "JSON parse error, invalid keyword 'foo'"}}
>> {"execute":"somecommand"}
<< {"error": {"class": "CommandNotFound", "desc": "The command somecommand has not been found"}}
AFTER (error reported immediately, but similar error recovery as before):
>> {"execute":foo
<< {"error": {"class": "GenericError", "desc": "JSON parse error, invalid keyword 'foo'"}}
>> }
>> {"execute":"somecommand"}
<< {"error": {"class": "CommandNotFound", "desc": "The command somecommand has not been found"}}
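The report-once-then-suppress behavior shown above can be modeled with a
single flag. This is a rough sketch under simplifying assumptions, not
the streamer code itself: MiniStreamer and mini_feed() are hypothetical,
only braces are tracked, and the character '!' stands for any token the
parser would reject.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of the streamer state. */
typedef struct MiniStreamer {
    unsigned int depth;     /* open braces not yet closed */
    bool error;             /* true while in error recovery */
    unsigned int reported;  /* number of errors passed to the caller */
} MiniStreamer;

/*
 * Feed one token, represented as a single character: '{' and '}' nest,
 * '!' is rejected by the (imaginary) parser, anything else is accepted.
 */
static void mini_feed(MiniStreamer *s, char tok)
{
    assert(s);

    /* Track message boundaries regardless of the error state. */
    if (tok == '{') {
        s->depth++;
    } else if (tok == '}' && s->depth > 0) {
        s->depth--;
    }

    if (!s->error && tok == '!') {
        s->reported++;      /* report the first error right away ... */
        s->error = true;    /* ... and stay quiet until recovery ends */
    }

    if (s->depth == 0) {
        s->error = false;   /* braces balanced: recovery is over */
    }
}
```

Feeding `{`, `!`, `!`, `}` reports exactly one error, at the first `!`,
and the next message starts with a clean state once the closing brace
arrives.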
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/qobject/json-parser.h | 3 +-
qobject/json-parser.c | 4 ++
qobject/json-streamer.c | 100 ++++++++++++++--------------------
3 files changed, 46 insertions(+), 61 deletions(-)
diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
index 0cf6932ecdc..3479e637588 100644
--- a/include/qobject/json-parser.h
+++ b/include/qobject/json-parser.h
@@ -33,7 +33,8 @@ typedef struct JSONMessageParser {
JSONParserContext parser;
unsigned int brace_count;
unsigned int bracket_count;
- GQueue tokens;
+ unsigned int token_count;
+ bool error;
uint64_t token_size;
} JSONMessageParser;
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 3b5edc5bae4..b77baab585f 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -659,6 +659,10 @@ QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token,
assert(!ctxt->err);
switch (token->type) {
+ case JSON_ERROR:
+ parse_error(ctxt, token, "JSON parse error, stray '%s'", token->str);
+ break;
+
case JSON_END_OF_INPUT:
/* Check for premature end of input */
if (!g_queue_is_empty(ctxt->stack)) {
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index b0bf2083ca6..82d2bbc9426 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -1,5 +1,5 @@
/*
- * JSON streaming support
+ * JSON parser - callback interface and error recovery
*
* Copyright IBM, Corp. 2009
*
@@ -19,23 +19,16 @@
#define MAX_TOKEN_COUNT (2ULL << 20)
#define MAX_NESTING (1 << 10)
-static void json_message_free_tokens(JSONMessageParser *parser)
-{
- JSONToken *token;
-
- while ((token = g_queue_pop_head(&parser->tokens))) {
- g_free(token);
- }
-}
-
void json_message_process_token(JSONLexer *lexer, GString *input,
JSONTokenType type, int x, int y)
{
JSONMessageParser *parser = container_of(lexer, JSONMessageParser, lexer);
- QObject *json = NULL;
Error *err = NULL;
- JSONToken *token;
+ parser->token_size += input->len;
+ parser->token_count++;
+
+ /* Detect message boundaries for error recovery purposes. */
switch (type) {
case JSON_LCURLY:
parser->brace_count++;
@@ -56,12 +49,6 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
parser->bracket_count--;
break;
case JSON_ERROR:
- error_setg(&err, "JSON parse error, stray '%s'", input->str);
- goto out_emit;
- case JSON_END_OF_INPUT:
- if (g_queue_is_empty(&parser->tokens)) {
- return;
- }
end_error_recovery:
/*
* Cause error recovery to end immediately.
@@ -76,49 +63,43 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
break;
}
- /*
- * Security consideration, we limit total memory allocated per object
- * and the maximum recursion depth that a message can force.
- */
- if (parser->token_size + input->len + 1 > MAX_TOKEN_SIZE) {
- error_setg(&err, "JSON token size limit exceeded");
- goto out_emit;
- }
- if (g_queue_get_length(&parser->tokens) + 1 > MAX_TOKEN_COUNT) {
- error_setg(&err, "JSON token count limit exceeded");
- goto out_emit;
- }
- if (parser->bracket_count + parser->brace_count > MAX_NESTING) {
- error_setg(&err, "JSON nesting depth limit exceeded");
- goto out_emit;
- }
+ if (parser->error) {
+ /* error recovery, eat tokens until parentheses balance */
+ } else {
+ /*
+ * Safety consideration, we limit total memory allocated per object
+ * and the maximum nesting depth that a message can force.
+ */
+ if (parser->token_size > MAX_TOKEN_SIZE) {
+ error_setg(&err, "JSON token size limit exceeded");
+ } else if (parser->token_count > MAX_TOKEN_COUNT) {
+ error_setg(&err, "JSON token count limit exceeded");
+ } else if (parser->bracket_count + parser->brace_count > MAX_NESTING) {
+ error_setg(&err, "JSON nesting depth limit exceeded");
+ } else {
+ g_autofree JSONToken *token = json_token(type, x, y, input);
+ QObject *json = json_parser_feed(&parser->parser, token, &err);
+ if (json) {
+ parser->emit(parser->opaque, json, NULL);
+ }
+ }
- token = json_token(type, x, y, input);
- parser->token_size += input->len;
-
- g_queue_push_tail(&parser->tokens, token);
-
- if (parser->brace_count > 0 || parser->bracket_count > 0) {
- return;
- }
-
- /* Process all tokens in the queue */
- while (!g_queue_is_empty(&parser->tokens)) {
- token = g_queue_pop_head(&parser->tokens);
- json = json_parser_feed(&parser->parser, token, &err);
- g_free(token);
- if (json || err) {
- break;
+ if (err) {
+ parser->emit(parser->opaque, NULL, err);
+ /* start recovery */
+ parser->error = true;
}
}
-out_emit:
- parser->brace_count = 0;
- parser->bracket_count = 0;
- json_message_free_tokens(parser);
- parser->token_size = 0;
- parser->emit(parser->opaque, json, err);
- json_parser_reset(&parser->parser);
+ if ((parser->brace_count == 0 && parser->bracket_count == 0)
+ || type == JSON_END_OF_INPUT) {
+ parser->error = false;
+ parser->brace_count = 0;
+ parser->bracket_count = 0;
+ parser->token_count = 0;
+ parser->token_size = 0;
+ json_parser_reset(&parser->parser);
+ }
}
void json_message_parser_init(JSONMessageParser *parser,
@@ -128,9 +109,10 @@ void json_message_parser_init(JSONMessageParser *parser,
{
parser->emit = emit;
parser->opaque = opaque;
+ parser->error = false;
parser->brace_count = 0;
parser->bracket_count = 0;
- g_queue_init(&parser->tokens);
+ parser->token_count = 0;
parser->token_size = 0;
json_parser_init(&parser->parser, ap);
@@ -146,12 +128,10 @@ void json_message_parser_feed(JSONMessageParser *parser,
void json_message_parser_flush(JSONMessageParser *parser)
{
json_lexer_flush(&parser->lexer);
- assert(g_queue_is_empty(&parser->tokens));
}
void json_message_parser_destroy(JSONMessageParser *parser)
{
json_lexer_destroy(&parser->lexer);
- json_message_free_tokens(parser);
json_parser_destroy(&parser->parser);
}
--
2.54.0
* [PATCH v3 6/7] json-streamer: do not heap-allocate JSONToken
2026-05-25 15:04 [PATCH v3 0/7] qobject: switch JSON parser to push Paolo Bonzini
` (4 preceding siblings ...)
2026-05-25 15:05 ` [PATCH v3 5/7] json-streamer: remove token queue Paolo Bonzini
@ 2026-05-25 15:05 ` Paolo Bonzini
2026-05-25 15:05 ` [PATCH v3 7/7] json-parser: add location to JSON parsing errors Paolo Bonzini
2026-06-02 8:58 ` [PATCH v3 0/7] qobject: switch JSON parser to push Paolo Bonzini
7 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2026-05-25 15:05 UTC (permalink / raw)
To: qemu-devel; +Cc: Markus Armbruster
Heap allocation is not needed with a push parser. Since it processes
tokens immediately, the JSONToken can be created directly on the stack
and does not need to copy the lexer's string data.
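The difference between an owning and a borrowing token can be sketched
as follows. Token, consume_token() and process() are hypothetical names
used only for illustration; the sketch assumes the consumer never keeps
the pointer after it returns, which is what makes borrowing safe.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token that borrows its text instead of owning a copy. */
typedef struct Token {
    int type;
    int x, y;
    const char *str;    /* points into the caller's buffer; never freed */
} Token;

/* Hypothetical consumer: must be done with tok->str before returning. */
static size_t consume_token(const Token *tok)
{
    assert(tok && tok->str);
    return strlen(tok->str);
}

/* Build the token on the stack and hand it to the consumer at once. */
static size_t process(int type, int x, int y, const char *text)
{
    Token tok = {
        .type = type,
        .x = x,
        .y = y,
        .str = text,
    };
    return consume_token(&tok);
}
```

With a queue of pending tokens this would not work, since the buffer
behind `text` could be reused before a queued token is consumed.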
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
qobject/json-parser-int.h | 8 ++++++--
qobject/json-parser.c | 18 ------------------
qobject/json-streamer.c | 9 +++++++--
3 files changed, 13 insertions(+), 22 deletions(-)
diff --git a/qobject/json-parser-int.h b/qobject/json-parser-int.h
index 1f435cb8eb2..5a6b5c9af90 100644
--- a/qobject/json-parser-int.h
+++ b/qobject/json-parser-int.h
@@ -35,7 +35,12 @@ typedef enum json_token_type {
JSON_MAX = JSON_END_OF_INPUT
} JSONTokenType;
-typedef struct JSONToken JSONToken;
+typedef struct JSONToken {
+ JSONTokenType type;
+ int x;
+ int y;
+ char *str;
+} JSONToken;
/* json-lexer.c */
void json_lexer_init(JSONLexer *lexer, bool enable_interpolation);
@@ -48,7 +53,6 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
JSONTokenType type, int x, int y);
/* json-parser.c */
-JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr);
void json_parser_init(JSONParserContext *ctxt, va_list *ap);
void json_parser_reset(JSONParserContext *ctxt);
QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token, Error **errp);
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index b77baab585f..faf3a9142bd 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -24,13 +24,6 @@
#include "qobject/qstring.h"
#include "json-parser-int.h"
-struct JSONToken {
- JSONTokenType type;
- int x;
- int y;
- char str[];
-};
-
/*
* The JSON parser is a push parser, returning to the caller after every
* token. Therefore it has an explicit representation of its parser
@@ -609,17 +602,6 @@ static QObject *parse_token(JSONParserContext *ctxt, const JSONToken *token)
return NULL;
}
-JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr)
-{
- JSONToken *token = g_malloc(sizeof(JSONToken) + tokstr->len + 1);
-
- token->type = type;
- memcpy(token->str, tokstr->str, tokstr->len);
- token->str[tokstr->len] = 0;
- token->x = x;
- token->y = y;
- return token;
-}
void json_parser_reset(JSONParserContext *ctxt)
{
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 82d2bbc9426..2c1f20fc62b 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -77,8 +77,13 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
} else if (parser->bracket_count + parser->brace_count > MAX_NESTING) {
error_setg(&err, "JSON nesting depth limit exceeded");
} else {
- g_autofree JSONToken *token = json_token(type, x, y, input);
- QObject *json = json_parser_feed(&parser->parser, token, &err);
+ JSONToken token = (JSONToken) {
+ .type = type,
+ .x = x,
+ .y = y,
+ .str = input->str
+ };
+ QObject *json = json_parser_feed(&parser->parser, &token, &err);
if (json) {
parser->emit(parser->opaque, json, NULL);
}
--
2.54.0
* [PATCH v3 7/7] json-parser: add location to JSON parsing errors
2026-05-25 15:04 [PATCH v3 0/7] qobject: switch JSON parser to push Paolo Bonzini
` (5 preceding siblings ...)
2026-05-25 15:05 ` [PATCH v3 6/7] json-streamer: do not heap-allocate JSONToken Paolo Bonzini
@ 2026-05-25 15:05 ` Paolo Bonzini
2026-06-02 8:58 ` [PATCH v3 0/7] qobject: switch JSON parser to push Paolo Bonzini
7 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2026-05-25 15:05 UTC (permalink / raw)
To: qemu-devel
Now that all calls to parse_error have a token, add the line and column
to the message. As far as I can see the two important TODOs (better
errors and better EOI handling) are done, and the others (token range
information and "parsed size"?) do not really matter or are handled
better by json-streamer.c. So remove the list, which had sat unchanged
since 2009.
This needs some adjustments to provide a good x and y for error messages.
First of all, they switch from zero-based to one-based, which is safe
because they were both sitting unused. Second, right now the x and y
are those of the *last* character in the token. Modify json-lexer.c to
freeze tok->x and tok->y at the first character added to the GString.
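The idea of keeping a running position separate from a frozen
token-start position can be sketched independently of the lexer. The
PosTracker type and its helpers are hypothetical, and the point at which
the counters advance is an assumption of this sketch rather than a copy
of what json-lexer.c does.

```c
#include <assert.h>

/* Hypothetical position tracker with one-based line and column. */
typedef struct PosTracker {
    int cur_x, cur_y;   /* position of the next character to be read */
    int x, y;           /* position of the current token's first char */
} PosTracker;

static void pos_init(PosTracker *p)
{
    assert(p);
    p->cur_x = p->cur_y = 1;
    p->x = p->y = 1;
}

/* Call when a new token starts: freeze its start position. */
static void pos_token_start(PosTracker *p)
{
    assert(p);
    p->x = p->cur_x;
    p->y = p->cur_y;
}

/* Call after consuming character ch. */
static void pos_advance(PosTracker *p, char ch)
{
    assert(p);
    if (ch == '\n') {
        p->cur_x = 1;   /* the next line starts at column 1 */
        p->cur_y++;
    } else {
        p->cur_x++;
    }
}
```

For the input `{`, newline, two spaces, `foo`, the frozen position of
`foo` stays at line 2, column 3 while the running column moves on to 6.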
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/qobject/json-parser.h | 1 +
qobject/json-lexer.c | 11 +++++++----
qobject/json-parser.c | 12 ++----------
3 files changed, 10 insertions(+), 14 deletions(-)
diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
index 3479e637588..e078b36b2d5 100644
--- a/include/qobject/json-parser.h
+++ b/include/qobject/json-parser.h
@@ -17,6 +17,7 @@
typedef struct JSONLexer {
int start_state, state;
GString *token;
+ int cur_x, cur_y;
int x, y;
} JSONLexer;
diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 51341d96e49..7753ba6c092 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -277,7 +277,8 @@ void json_lexer_init(JSONLexer *lexer, bool enable_interpolation)
lexer->start_state = lexer->state = enable_interpolation
? IN_START_INTERP : IN_START;
lexer->token = g_string_sized_new(3);
- lexer->x = lexer->y = 0;
+ lexer->cur_x = lexer->cur_y = 1;
+ lexer->x = lexer->y = 1;
}
static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
@@ -285,10 +286,10 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
int new_state;
bool char_consumed = false;
- lexer->x++;
+ lexer->cur_x++;
if (ch == '\n') {
- lexer->x = 0;
- lexer->y++;
+ lexer->cur_x = 1;
+ lexer->cur_y++;
}
while (flush ? lexer->state != lexer->start_state : !char_consumed) {
@@ -316,6 +317,8 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
case IN_START:
g_string_truncate(lexer->token, 0);
new_state = lexer->start_state;
+ lexer->x = lexer->cur_x;
+ lexer->y = lexer->cur_y;
break;
case JSON_ERROR:
json_message_process_token(lexer, lexer->token, JSON_ERROR,
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index faf3a9142bd..8c58ae0349a 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -126,15 +126,6 @@ typedef struct JSONParserStackEntry {
#define BUG_ON(cond) assert(!(cond))
-/**
- * TODO
- *
- * 0) make errors meaningful again
- * 1) add geometry information to tokens
- * 3) should we return a parsed size?
- * 4) deal with premature EOI
- */
-
static inline JSONParserStackEntry *current_entry(JSONParserContext *ctxt)
{
return g_queue_peek_tail(ctxt->stack);
@@ -172,7 +163,8 @@ static void G_GNUC_PRINTF(3, 4) parse_error(JSONParserContext *ctxt,
va_start(ap, msg);
vsnprintf(message, sizeof(message), msg, ap);
va_end(ap);
- error_setg(&ctxt->err, "JSON parse error, %s", message);
+ error_setg(&ctxt->err, "JSON parse error at line %d, column %d, %s",
+ token->y, token->x, message);
}
static int cvt4hex(const char *s)
--
2.54.0
* Re: [PATCH v3 0/7] qobject: switch JSON parser to push
2026-05-25 15:04 [PATCH v3 0/7] qobject: switch JSON parser to push Paolo Bonzini
` (6 preceding siblings ...)
2026-05-25 15:05 ` [PATCH v3 7/7] json-parser: add location to JSON parsing errors Paolo Bonzini
@ 2026-06-02 8:58 ` Paolo Bonzini
7 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2026-06-02 8:58 UTC (permalink / raw)
To: qemu-devel, Armbruster, Markus
Just a heads up that I'm planning to include this in a pull request
around mid June.
Paolo
On Mon, May 25, 2026 at 5:05 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> This rewrites the json-parser to use a push parser aka state machine.
> While push parsers are inherently more complex than recursive descent,
> the grammar for JSON is simple enough that the parser remains readable.
> There is therefore no need to use e.g. QEMU coroutines.
>
> Unlike the suggestion in commit 62815d85aed ("json: Redesign the callback
> to consume JSON values", 2018-08-24), I kept the json-streamer concept.
> It helps in handling input limits, it performs error recovery, and it
> converts the token-at-a-time push interface to callbacks---all things
> that are more easily done in a separate layer to keep the parser clean.
> However, there is no need anymore for it to store partial JSON objects
> in tokenized form, because the current state is stored in the push
> parser's stack.
>
> Another benefit is that QEMU can report the first parsing error
> immediately, without waiting for parentheses to be balanced or for a
> lexing error. Error recovery then proceeds as before (i.e., the next
> parse still starts after balanced parentheses or a lexing error).
>
> On top of the benefits intrinsic in the push architecture, it so happens
> that it's really easy to add a location to JSON parsing errors now, so
> do that as well.
>
> The diffstat is unfavorable, but most of the new lines delta is really
> new comments explaining the grammar and state machines.
>
> Paolo
>
> v2->v3:
> - accept interpolation for the key of a dictionary
>
> v1->v2:
> - remove part of the patch to pass around the lookahead token,
> it was hard to review and added little value
> - separate patch to reuse the JSONParser
> - separate patch to make brace/bracket count unsigned
> - add comment with the structure of the stack
> - add big comment with the grammar
> - split long lines
> - remove QObject **value argument to pop_entry()
> - add assertions about the type of the top-of-stack
> - change error to "key is not a string in object"
> - split out json_parser_reset() already in the first patch
> - rename json_parser_parse_token() to parse_token()
> - do not use single quotes in commit messages
> - move initialization of JSONToken close to usage
>
>
> Paolo Bonzini (7):
> json-parser: constify JSONToken
> json-parser: replace with a push parser
> json-streamer: reuse parser
> json-streamer: make brace/bracket count unsigned
> json-streamer: remove token queue
> json-streamer: do not heap-allocate JSONToken
> json-parser: add location to JSON parsing errors
>
> include/qobject/json-parser.h | 16 +-
> qobject/json-parser-int.h | 13 +-
> qobject/json-lexer.c | 11 +-
> qobject/json-parser.c | 580 +++++++++++++++++++---------------
> qobject/json-streamer.c | 120 +++----
> 5 files changed, 415 insertions(+), 325 deletions(-)
>
> --
> 2.54.0
* Re: [PATCH v3 2/7] json-parser: replace with a push parser
2026-05-25 15:04 ` [PATCH v3 2/7] json-parser: replace with a push parser Paolo Bonzini
@ 2026-06-12 14:21 ` Markus Armbruster
2026-06-12 15:08 ` Paolo Bonzini
0 siblings, 1 reply; 11+ messages in thread
From: Markus Armbruster @ 2026-06-12 14:21 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
Paolo Bonzini <pbonzini@redhat.com> writes:
> In order to avoid stashing all the tokens corresponding to a JSON value,
> embed the parsing stack and state machine in JSONParser. This is more
> efficient and allows for more prompt error recovery; it also does not
> make the code substantially larger than the current recursive descent
> parser, though the state machine is probably a bit harder to follow.
>
> The stack consists of QLists and QDicts corresponding to open
> brackets and braces, plus optionally a QString with the current
> key on top of each QDict.
>
> After each value is parsed, it is added to the top array or dictionary
> or, if the stack is empty, json_parser_feed returns the complete
> QObject.
>
> For now, json-streamer.c keeps tracking the tokens up until braces
> and brackets are balanced, and then shoves the whole queue of tokens
> into the push parser. The only logic change is that JSON_END_OF_INPUT
> always triggers the emptying of the queue; the parser takes notice and
> checks that there is nothing on the stack. Not using brace_count
> and bracket_count for this is the first step towards improved separation
> of concerns between json-parser.c and json-streamer.c.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> include/qobject/json-parser.h | 6 +
> qobject/json-parser-int.h | 5 +-
> qobject/json-parser.c | 551 ++++++++++++++++++++--------------
> qobject/json-streamer.c | 21 +-
> 4 files changed, 345 insertions(+), 238 deletions(-)
>
> diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
> index 7345a9bd5cb..05346fa816b 100644
> --- a/include/qobject/json-parser.h
> +++ b/include/qobject/json-parser.h
> @@ -20,6 +20,12 @@ typedef struct JSONLexer {
> int x, y;
> } JSONLexer;
>
> +typedef struct JSONParserContext {
> + Error *err;
> + GQueue *stack;
> + va_list *ap;
> +} JSONParserContext;
> +
> typedef struct JSONMessageParser {
> void (*emit)(void *opaque, QObject *json, Error *err);
> void *opaque;
> diff --git a/qobject/json-parser-int.h b/qobject/json-parser-int.h
> index 8c01f236276..1f435cb8eb2 100644
> --- a/qobject/json-parser-int.h
> +++ b/qobject/json-parser-int.h
> @@ -49,6 +49,9 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
>
> /* json-parser.c */
> JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr);
> -QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp);
> +void json_parser_init(JSONParserContext *ctxt, va_list *ap);
> +void json_parser_reset(JSONParserContext *ctxt);
> +QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token, Error **errp);
> +void json_parser_destroy(JSONParserContext *ctxt);
>
> #endif
> diff --git a/qobject/json-parser.c b/qobject/json-parser.c
> index f6622b82b0a..3b5edc5bae4 100644
> --- a/qobject/json-parser.c
> +++ b/qobject/json-parser.c
> @@ -31,12 +31,105 @@ struct JSONToken {
> char str[];
> };
>
> -typedef struct JSONParserContext {
> - Error *err;
> - JSONToken *current;
> - GQueue *buf;
> - va_list *ap;
> -} JSONParserContext;
> +/*
> + * The JSON parser is a push parser, returning to the caller after every
> + * token.
The thing that returns after every token is json_parser_feed(), right?
Detail not mentioned here: the value it returns. Leaving that to
json_parser_feed()'s contract feels fine, but pointing from here to
there could be useful.
> Therefore it has an explicit representation of its parser
> + * stack; each stack entry consists of a parser state and a QObject:
> + * - a QList, for an array that is being added to
> + * - a QDict, for a dictionary that is being added to
> + * - a QString, for the key of the next pair that will be added to a QDict
> + *
> + * The stack represents an arbitrary nesting of arrays and dictionaries
> + * (whose next key has been parsed); it can also have a dictionary whose
> + * next key has not been parsed, but that can only happen at the top level.
> + * Because of this, the stack contents are always of the form
> + * "(QList | QDict QString)* QDict?".
> + *
> + * An empty stack represents the beginning of the parsing process, with
> + * start state BEFORE_VALUE.
> + */
> +
> +typedef enum JSONParserState {
> + AFTER_LCURLY,
> + AFTER_LSQUARE,
> + BEFORE_KEY,
> + BEFORE_VALUE,
> + END_OF_KEY,
> + END_OF_VALUE,
> +} JSONParserState;
> +
> +typedef struct JSONParserStackEntry {
> + /*
> + * State when the container is completed or, for the top of the stack,
> + * entry state for the next token.
> + */
> + JSONParserState state;
> +
> + /*
> + * A QString with the last parsed key, or a QList/QDict for the current
> + * container.
> + */
> + QObject *partial;
> +} JSONParserStackEntry;
> +
> +/*
> + * This is the JSON grammar that's parsed, with the state transition and
> + * action at each point of the grammar. While this is not a formal
> + * description, "-> action" represents the pseudocode of the action
> + * and "-> STATE" sets the top stack entry's state to STATE.
> + *
> + * // The initial state is BEFORE_VALUE.
> + * input := value -> END_OF_VALUE -> return parsed value
> + * END_OF_INPUT -> check stack is empty
How can the stack *not* be empty here?
> + *
> + * // entered on BEFORE_VALUE; after any of these rules are processed, the
> + * // parser has completed a QObject and is in the END_OF_VALUE state.
> + * //
> + * // When the parser reaches the END_OF_VALUE state, it examines the
> + * // top of the stack to see if it's coming from "input" (stack empty),
> + * // "array_items" (TOS is a QList) or "dict_pairs" (TOS is a QString; the
> + * // item below will be a QDict). It then proceeds with the corresponding
> + * // actions, which will be one of:
> + * // - return parsed value
> + * // - add value to QList
> + * // - pop QString with the key, add key/value to the QDict
> + * value := literal -> END_OF_VALUE
> + * | '[' -> push empty QList -> AFTER_LSQUARE
> + * after_lsquare -> END_OF_VALUE
> + * | '{' -> push empty QDict -> AFTER_LCURLY
> + * after_lcurly -> END_OF_VALUE
> + *
> + * // non-recursive values, entered on BEFORE_VALUE
> + * literal := INTEGER -> END_OF_VALUE
> + * | FLOAT -> END_OF_VALUE
> + * | KEYWORD -> END_OF_VALUE
> + * | STRING -> END_OF_VALUE
> + * | INTERP -> END_OF_VALUE
> + *
> + * // entered on AFTER_LSQUARE
> + * after_lsquare := ']' -> pop completed QList -> END_OF_VALUE
> + * | ϵ -> BEFORE_VALUE
> + * array_items -> END_OF_VALUE
> + *
> + * // entered on BEFORE_VALUE, with TOS being a QList
> + * array_items := value -> add value to QList -> END_OF_VALUE
> + * (']' -> pop completed QList -> END_OF_VALUE
> + * | ',' -> BEFORE_VALUE
> + * array_items) -> END_OF_VALUE
> + *
> + * // entered on AFTER_LCURLY
> + * after_lcurly := '}' -> pop completed QDict -> END_OF_VALUE
> + * | ϵ -> BEFORE_KEY
> + * dict_pairs -> END_OF_VALUE
> + *
> + * // entered on BEFORE_KEY, with TOS being a QDict
> + * dict_pairs := (STRING | INTERP) -> push QString -> END_OF_KEY
> + * ':' -> BEFORE_VALUE
> + * value -> pop QString + add pair to QDict -> END_OF_VALUE
> + * ('}' -> pop completed QDict -> END_OF_VALUE
> + * | ',' -> BEFORE_KEY
> + * dict_pairs) -> END_OF_VALUE
> + */
This is useful.
It doesn't mention how we do parse errors. Leaving that to
json_parser_feed()'s contract feels fine.
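To make the shape of such a state machine concrete, here is a toy push
parser for a much smaller grammar: value := 'n' | '[' (value (',' value)*)? ']'.
It is only a sketch under stated assumptions. ToyParser, toy_feed() and
the return convention (1 = value complete, 0 = need more tokens,
-1 = parse error) are invented for this example, and because lists are
the only container a depth counter can stand in for the stack.

```c
#include <assert.h>

#define MAX_DEPTH 32

typedef enum ToyState {
    BEFORE_VALUE,   /* a value must start here */
    AFTER_LSQUARE,  /* just saw '[': expect a value or ']' */
    END_OF_VALUE,   /* inside a list, after a value: expect ',' or ']' */
} ToyState;

typedef struct ToyParser {
    ToyState state;
    unsigned int depth;     /* number of open lists */
} ToyParser;

static void toy_init(ToyParser *p)
{
    assert(p);
    p->state = BEFORE_VALUE;
    p->depth = 0;
}

/* Push one token; the caller must call toy_init() again after -1. */
static int toy_feed(ToyParser *p, char tok)
{
    assert(p);

    switch (p->state) {
    case AFTER_LSQUARE:
        if (tok == ']') {
            goto close_list;
        }
        /* fall through: otherwise a value must follow */
    case BEFORE_VALUE:
        if (tok == 'n') {
            goto value_done;
        }
        if (tok == '[' && p->depth < MAX_DEPTH) {
            p->depth++;
            p->state = AFTER_LSQUARE;
            return 0;
        }
        return -1;
    case END_OF_VALUE:
        if (tok == ',') {
            p->state = BEFORE_VALUE;
            return 0;
        }
        if (tok == ']') {
            goto close_list;
        }
        return -1;
    }
    return -1;

close_list:
    p->depth--;
value_done:
    if (p->depth == 0) {
        p->state = BEFORE_VALUE;    /* ready for the next top-level value */
        return 1;
    }
    p->state = END_OF_VALUE;
    return 0;
}
```

Feeding the characters of `[n,[n],[]]` one at a time returns 0 for every
token except the final `]`, which returns 1; `[,` and `[n,]` both fail
at the token that cannot continue the grammar.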
>
> #define BUG_ON(cond) assert(!(cond))
>
> @@ -49,7 +142,26 @@ typedef struct JSONParserContext {
> * 4) deal with premature EOI
> */
>
> -static QObject *parse_value(JSONParserContext *ctxt);
> +static inline JSONParserStackEntry *current_entry(JSONParserContext *ctxt)
> +{
> + return g_queue_peek_tail(ctxt->stack);
> +}
> +
> +static void push_entry(JSONParserContext *ctxt, QObject *partial,
> + JSONParserState state)
> +{
> + JSONParserStackEntry *entry = g_new(JSONParserStackEntry, 1);
> + entry->partial = partial;
> + entry->state = state;
> + g_queue_push_tail(ctxt->stack, entry);
> +}
> +
> +static JSONParserStackEntry *pop_entry(JSONParserContext *ctxt)
> +{
> + JSONParserStackEntry *entry = g_queue_pop_tail(ctxt->stack);
> + g_free(entry);
> + return current_entry(ctxt);
> +}
This pops the stack and returns the entry now on top. Slightly
surprising; pop operations commonly return the entry popped from the
stack.
It's this way because you use it like
// invariant: @entry is the entry on top of ctxt->stack, null if empty
value = entry->partial;
entry = pop_entry(ctxt);
Okay. A function comment might reduce surprise.
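For illustration, a pop operation with this "return the new top"
contract, together with the kind of comment that spells it out, could
look like the sketch below. It uses a hypothetical fixed-size array
stack of ints rather than the GQueue of JSONParserStackEntry from the
patch.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 16

/* Hypothetical fixed-size stack; the patch uses a GQueue instead. */
typedef struct Stack {
    int items[STACK_MAX];
    size_t len;
} Stack;

/* Returns the entry on top of the stack, or NULL if it is empty. */
static int *stack_top(Stack *s)
{
    assert(s);
    return s->len ? &s->items[s->len - 1] : NULL;
}

static void stack_push(Stack *s, int v)
{
    assert(s && s->len < STACK_MAX);
    s->items[s->len++] = v;
}

/*
 * Discards the top entry and returns the entry that is on top
 * afterwards (NULL if the stack became empty).  Note that this is
 * NOT the entry that was popped.
 */
static int *stack_pop(Stack *s)
{
    assert(s && s->len > 0);
    s->len--;
    return stack_top(s);
}
```

A caller that needs the popped entry's contents reads them through the
current top pointer first and only then calls stack_pop(), matching the
usage pattern quoted above.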
>
> /**
> * Error handler
> @@ -236,200 +348,10 @@ out:
> return NULL;
> }
>
> -/* Note: the token object returned by parser_context_peek_token or
> - * parser_context_pop_token is deleted as soon as parser_context_pop_token
> - * is called again.
> - */
> -static const JSONToken *parser_context_pop_token(JSONParserContext *ctxt)
> +/* Terminals */
> +
> +static QObject *parse_keyword(JSONParserContext *ctxt, const JSONToken *token)
> {
> - g_free(ctxt->current);
> - ctxt->current = g_queue_pop_head(ctxt->buf);
> - return ctxt->current;
> -}
> -
> -static const JSONToken *parser_context_peek_token(JSONParserContext *ctxt)
> -{
> - return g_queue_peek_head(ctxt->buf);
> -}
> -
> -/**
> - * Parsing rules
> - */
> -static int parse_pair(JSONParserContext *ctxt, QDict *dict)
> -{
> - QObject *key_obj = NULL;
> - QString *key;
> - QObject *value;
> - const JSONToken *peek, *token;
> -
> - peek = parser_context_peek_token(ctxt);
> - if (peek == NULL) {
> - parse_error(ctxt, NULL, "premature EOI");
> - goto out;
> - }
> -
> - key_obj = parse_value(ctxt);
> - key = qobject_to(QString, key_obj);
> - if (!key) {
> - parse_error(ctxt, peek, "key is not a string in object");
> - goto out;
> - }
> -
> - token = parser_context_pop_token(ctxt);
> - if (token == NULL) {
> - parse_error(ctxt, NULL, "premature EOI");
> - goto out;
> - }
> -
> - if (token->type != JSON_COLON) {
> - parse_error(ctxt, token, "missing : in object pair");
> - goto out;
> - }
> -
> - value = parse_value(ctxt);
> - if (value == NULL) {
> - parse_error(ctxt, token, "Missing value in dict");
> - goto out;
> - }
> -
> - if (qdict_haskey(dict, qstring_get_str(key))) {
> - parse_error(ctxt, token, "duplicate key");
> - goto out;
> - }
> -
> - qdict_put_obj(dict, qstring_get_str(key), value);
> -
> - qobject_unref(key_obj);
> - return 0;
> -
> -out:
> - qobject_unref(key_obj);
> - return -1;
> -}
> -
> -static QObject *parse_object(JSONParserContext *ctxt)
> -{
> - QDict *dict = NULL;
> - const JSONToken *token, *peek;
> -
> - token = parser_context_pop_token(ctxt);
> - assert(token && token->type == JSON_LCURLY);
> -
> - dict = qdict_new();
> -
> - peek = parser_context_peek_token(ctxt);
> - if (peek == NULL) {
> - parse_error(ctxt, NULL, "premature EOI");
> - goto out;
> - }
> -
> - if (peek->type != JSON_RCURLY) {
> - if (parse_pair(ctxt, dict) == -1) {
> - goto out;
> - }
> -
> - token = parser_context_pop_token(ctxt);
> - if (token == NULL) {
> - parse_error(ctxt, NULL, "premature EOI");
> - goto out;
> - }
> -
> - while (token->type != JSON_RCURLY) {
> - if (token->type != JSON_COMMA) {
> - parse_error(ctxt, token, "expected separator in dict");
> - goto out;
> - }
> -
> - if (parse_pair(ctxt, dict) == -1) {
> - goto out;
> - }
> -
> - token = parser_context_pop_token(ctxt);
> - if (token == NULL) {
> - parse_error(ctxt, NULL, "premature EOI");
> - goto out;
> - }
> - }
> - } else {
> - (void)parser_context_pop_token(ctxt);
> - }
> -
> - return QOBJECT(dict);
> -
> -out:
> - qobject_unref(dict);
> - return NULL;
> -}
> -
> -static QObject *parse_array(JSONParserContext *ctxt)
> -{
> - QList *list = NULL;
> - const JSONToken *token, *peek;
> -
> - token = parser_context_pop_token(ctxt);
> - assert(token && token->type == JSON_LSQUARE);
> -
> - list = qlist_new();
> -
> - peek = parser_context_peek_token(ctxt);
> - if (peek == NULL) {
> - parse_error(ctxt, NULL, "premature EOI");
> - goto out;
> - }
> -
> - if (peek->type != JSON_RSQUARE) {
> - QObject *obj;
> -
> - obj = parse_value(ctxt);
> - if (obj == NULL) {
> - parse_error(ctxt, token, "expecting value");
> - goto out;
> - }
> -
> - qlist_append_obj(list, obj);
> -
> - token = parser_context_pop_token(ctxt);
> - if (token == NULL) {
> - parse_error(ctxt, NULL, "premature EOI");
> - goto out;
> - }
> -
> - while (token->type != JSON_RSQUARE) {
> - if (token->type != JSON_COMMA) {
> - parse_error(ctxt, token, "expected separator in list");
> - goto out;
> - }
> -
> - obj = parse_value(ctxt);
> - if (obj == NULL) {
> - parse_error(ctxt, token, "expecting value");
> - goto out;
> - }
> -
> - qlist_append_obj(list, obj);
> -
> - token = parser_context_pop_token(ctxt);
> - if (token == NULL) {
> - parse_error(ctxt, NULL, "premature EOI");
> - goto out;
> - }
> - }
> - } else {
> - (void)parser_context_pop_token(ctxt);
> - }
> -
> - return QOBJECT(list);
> -
> -out:
> - qobject_unref(list);
> - return NULL;
> -}
> -
> -static QObject *parse_keyword(JSONParserContext *ctxt)
> -{
> - const JSONToken *token;
> -
> - token = parser_context_pop_token(ctxt);
> assert(token && token->type == JSON_KEYWORD);
>
> if (!strcmp(token->str, "true")) {
> @@ -443,11 +365,9 @@ static QObject *parse_keyword(JSONParserContext *ctxt)
> return NULL;
> }
>
> -static QObject *parse_interpolation(JSONParserContext *ctxt)
> +static QObject *parse_interpolation(JSONParserContext *ctxt,
> + const JSONToken *token)
> {
> - const JSONToken *token;
> -
> - token = parser_context_pop_token(ctxt);
> assert(token && token->type == JSON_INTERP);
>
> if (!strcmp(token->str, "%p")) {
> @@ -479,11 +399,8 @@ static QObject *parse_interpolation(JSONParserContext *ctxt)
> return NULL;
> }
>
> -static QObject *parse_literal(JSONParserContext *ctxt)
> +static QObject *parse_literal(JSONParserContext *ctxt, const JSONToken *token)
> {
> - const JSONToken *token;
> -
> - token = parser_context_pop_token(ctxt);
> assert(token);
>
> switch (token->type) {
> @@ -531,35 +448,167 @@ static QObject *parse_literal(JSONParserContext *ctxt)
> }
> }
>
> -static QObject *parse_value(JSONParserContext *ctxt)
> +/* Parsing state machine */
> +
> +static QObject *parse_begin_value(JSONParserContext *ctxt,
> + const JSONToken *token)
> {
> - const JSONToken *token;
> -
> - token = parser_context_peek_token(ctxt);
> - if (token == NULL) {
> - parse_error(ctxt, NULL, "premature EOI");
> - return NULL;
> - }
> -
> switch (token->type) {
> case JSON_LCURLY:
> - return parse_object(ctxt);
> + push_entry(ctxt, QOBJECT(qdict_new()), AFTER_LCURLY);
> + return NULL;
> case JSON_LSQUARE:
> - return parse_array(ctxt);
> + push_entry(ctxt, QOBJECT(qlist_new()), AFTER_LSQUARE);
> + return NULL;
> case JSON_INTERP:
> - return parse_interpolation(ctxt);
> + return parse_interpolation(ctxt, token);
> case JSON_INTEGER:
> case JSON_FLOAT:
> case JSON_STRING:
> - return parse_literal(ctxt);
> + return parse_literal(ctxt, token);
> case JSON_KEYWORD:
> - return parse_keyword(ctxt);
> + return parse_keyword(ctxt, token);
> default:
> parse_error(ctxt, token, "expecting value");
> return NULL;
> }
> }
>
> +static QObject *parse_token(JSONParserContext *ctxt, const JSONToken *token)
> +{
> + JSONParserStackEntry *entry;
> + JSONParserState state;
> + QString *key;
> + QObject *key_obj = NULL, *value = NULL;
> +
> + entry = current_entry(ctxt);
> + state = entry ? entry->state : BEFORE_VALUE;
> + switch (state) {
> + case AFTER_LCURLY:
> + /* Grab '}' for empty object or fall through to BEFORE_KEY */
> + assert(qobject_type(entry->partial) == QTYPE_QDICT);
> + if (token->type == JSON_RCURLY) {
> + value = entry->partial;
> + entry = pop_entry(ctxt);
> + break;
> + }
> + entry->state = BEFORE_KEY;
> + /* fall through */
> +
> + case BEFORE_KEY:
> + /* Expecting object key */
> + assert(qobject_type(entry->partial) == QTYPE_QDICT);
> + if (token->type == JSON_STRING || token->type == JSON_INTERP) {
> + key_obj = parse_begin_value(ctxt, token);
v2 used parse_string() here, which broke interpolation with %s. This
version works.
> + if (!key_obj) {
> + /* parse error happened */
> + return NULL;
> + }
> + }
> + if (!key_obj || qobject_type(key_obj) != QTYPE_QSTRING) {
> + parse_error(ctxt, token, "key is not a string in object");
> + return NULL;
> + }
> +
> + /* Store key in a special entry on the stack */
> + push_entry(ctxt, key_obj, END_OF_KEY);
> + return NULL;
> +
> + case END_OF_KEY:
> + /* Expecting ':' after key */
> + assert(qobject_type(entry->partial) == QTYPE_QSTRING);
> + if (token->type == JSON_COLON) {
> + entry->state = BEFORE_VALUE;
> + } else {
> + parse_error(ctxt, token, "expecting ':'");
> + }
> + return NULL;
> +
> + case AFTER_LSQUARE:
> + /* Grab ']' for empty array or fall through to BEFORE_VALUE */
> + assert(qobject_type(entry->partial) == QTYPE_QLIST);
> + if (token->type == JSON_RSQUARE) {
> + value = entry->partial;
> + entry = pop_entry(ctxt);
> + break;
> + }
> + entry->state = BEFORE_VALUE;
> + /* fall through */
> +
> + case BEFORE_VALUE:
> + /* Expecting value */
> + assert(!entry || qobject_type(entry->partial) != QTYPE_QDICT);
> + value = parse_begin_value(ctxt, token);
> + if (!value) {
> + /* Error or '['/'{' */
> + return NULL;
> + }
> + /* Return value or insert it into a container */
> + break;
> +
> + case END_OF_VALUE:
> + /* Grab ',' or ']' for array; ',' or '}' for object */
> + if (qobject_to(QList, entry->partial)) {
> + /* Array */
> + if (token->type != JSON_RSQUARE) {
> + if (token->type == JSON_COMMA) {
> + entry->state = BEFORE_VALUE;
> + } else {
> + parse_error(ctxt, token, "expected ',' or ']'");
> + }
> + return NULL;
> + }
> + } else if (qobject_to(QDict, entry->partial)) {
> + /* Object */
> + if (token->type != JSON_RCURLY) {
> + if (token->type == JSON_COMMA) {
> + entry->state = BEFORE_KEY;
> + } else {
> + parse_error(ctxt, token, "expected ',' or '}'");
> + }
> + return NULL;
> + }
> + } else {
> + g_assert_not_reached();
> + }
> +
> + /* Got ']' or '}'; return full value or insert into parent container */
> + value = entry->partial;
> + entry = pop_entry(ctxt);
> + break;
> + }
> +
> + assert(value);
> + if (entry == NULL) {
> + /* The toplevel value is complete. */
Maybe
/* Parse stack is empty, top level value is complete */
> + return value;
> + }
> +
Suggest
/*
* Parse stack is not empty.
* If we're parsing an object, it's QString (key) on top of
* QDict. Pop off key, and store (key, value) in QDict.
* If we're parsing an array, it's QList. Store value in it.
*/
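To make the suggested comment concrete, here is a toy sketch of that step in plain C. It is not the QEMU code: ToyStack, toy_push and toy_store_value are made-up names, the fixed-size array stands in for the GQueue, and the value itself is left out because only its destination matters here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for QList, QDict and QString; all names are hypothetical. */
typedef enum { TOY_LIST, TOY_DICT, TOY_KEY } ToyKind;

typedef struct {
    ToyKind kind;
    const char *key;  /* TOY_KEY: pending key; TOY_DICT: last key stored */
    int count;        /* containers only: number of values stored */
} ToyEntry;

typedef struct {
    ToyEntry entries[16];
    size_t len;
} ToyStack;

static void toy_push(ToyStack *s, ToyKind kind, const char *key)
{
    assert(s->len < 16);
    s->entries[s->len].kind = kind;
    s->entries[s->len].key = key;
    s->entries[s->len].count = 0;
    s->len++;
}

/*
 * A value has just been completed.  Return 0 if the stack is empty
 * (the top-level value is complete), 1 if the value went into the
 * container on top of the stack.
 */
static int toy_store_value(ToyStack *s)
{
    ToyEntry *top;
    const char *key;

    if (s->len == 0) {
        return 0;
    }
    top = &s->entries[s->len - 1];
    if (top->kind == TOY_KEY) {
        /* Object: pop the key, the dict must be right below it */
        key = top->key;
        s->len--;
        assert(s->len > 0);
        top = &s->entries[s->len - 1];
        assert(top->kind == TOY_DICT);
        top->key = key;
    } else {
        /* Array: the list itself is on top */
        assert(top->kind == TOY_LIST);
    }
    top->count++;
    return 1;
}
```

The point of the sketch is only that a key entry never survives the value that follows it, so after storing a value the top of the stack is always a container.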
> + key = qobject_to(QString, entry->partial);
> + if (key) {
> + const char *key_str;
> + QDict *dict;
> +
> + entry = pop_entry(ctxt);
> + dict = qobject_to(QDict, entry->partial);
> + assert(dict);
> + key_str = qstring_get_str(key);
> + if (qdict_haskey(dict, key_str)) {
> + parse_error(ctxt, token, "duplicate key");
> + qobject_unref(value);
> + return NULL;
> + }
> + qdict_put_obj(dict, key_str, value);
> + qobject_unref(key);
> + } else {
> + /* Add to array */
> + qlist_append_obj(qobject_to(QList, entry->partial), value);
> + }
> +
> + entry->state = END_OF_VALUE;
> + return NULL;
> +}
> +
> JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr)
> {
> JSONToken *token = g_malloc(sizeof(JSONToken) + tokstr->len + 1);
> @@ -572,20 +621,56 @@ JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr)
> return token;
> }
>
> -QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp)
> +void json_parser_reset(JSONParserContext *ctxt)
> {
> - JSONParserContext ctxt = { .buf = tokens, .ap = ap };
> - QObject *result;
> + JSONParserStackEntry *entry;
>
> - result = parse_value(&ctxt);
> - assert(ctxt.err || g_queue_is_empty(ctxt.buf));
> -
> - error_propagate(errp, ctxt.err);
> -
> - while (!g_queue_is_empty(ctxt.buf)) {
> - parser_context_pop_token(&ctxt);
> + ctxt->err = NULL;
> + while ((entry = g_queue_pop_tail(ctxt->stack)) != NULL) {
> + qobject_unref(entry->partial);
> + g_free(entry);
> }
> - g_free(ctxt.current);
> +}
>
> +void json_parser_init(JSONParserContext *ctxt, va_list *ap)
> +{
> + ctxt->stack = g_queue_new();
> + ctxt->ap = ap;
> + json_parser_reset(ctxt);
> +}
> +
> +void json_parser_destroy(JSONParserContext *ctxt)
> +{
> + json_parser_reset(ctxt);
> + g_queue_free(ctxt->stack);
> + ctxt->stack = NULL;
> +}
> +
> +/*
> + * Advance the parser based on the token that is passed.
> + * Return the finished toplevel value if the token completes it.
My dictionary wants "top level" or "top-level".
> + * If an error is returned, the function must not be called without
> + * first resetting the parser.
> + */
> +QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token,
> + Error **errp)
> +{
> + QObject *result = NULL;
> +
> + assert(!ctxt->err);
> + switch (token->type) {
> + case JSON_END_OF_INPUT:
> + /* Check for premature end of input */
> + if (!g_queue_is_empty(ctxt->stack)) {
> + parse_error(ctxt, token, "premature end of input");
> + }
> + break;
> +
> + default:
> + result = parse_token(ctxt, token);
> + break;
> + }
> +
> + error_propagate(errp, ctxt->err);
> return result;
> }
> diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
> index b93d97b995f..6c93e6fd78d 100644
> --- a/qobject/json-streamer.c
> +++ b/qobject/json-streamer.c
> @@ -32,6 +32,7 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
> JSONTokenType type, int x, int y)
> {
> JSONMessageParser *parser = container_of(lexer, JSONMessageParser, lexer);
> + JSONParserContext ctxt;
> QObject *json = NULL;
> Error *err = NULL;
> JSONToken *token;
> @@ -56,8 +57,7 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
> if (g_queue_is_empty(&parser->tokens)) {
> return;
> }
> - json = json_parser_parse(&parser->tokens, parser->ap, &err);
> - goto out_emit;
> + break;
> default:
> break;
> }
> @@ -85,11 +85,24 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
> g_queue_push_tail(&parser->tokens, token);
>
> if ((parser->brace_count > 0 || parser->bracket_count > 0)
> - && parser->brace_count >= 0 && parser->bracket_count >= 0) {
> + && parser->brace_count >= 0 && parser->bracket_count >= 0
> + && type != JSON_END_OF_INPUT) {
> return;
> }
>
> - json = json_parser_parse(&parser->tokens, parser->ap, &err);
> + json_parser_init(&ctxt, parser->ap);
> +
> + /* Process all tokens in the queue */
> + while (!g_queue_is_empty(&parser->tokens)) {
> + token = g_queue_pop_head(&parser->tokens);
> + json = json_parser_feed(&ctxt, token, &err);
> + g_free(token);
> + if (json || err) {
> + break;
> + }
> + }
> +
> + json_parser_destroy(&ctxt);
>
> out_emit:
> parser->brace_count = 0;
* Re: [PATCH v3 2/7] json-parser: replace with a push parser
2026-06-12 14:21 ` Markus Armbruster
@ 2026-06-12 15:08 ` Paolo Bonzini
0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2026-06-12 15:08 UTC (permalink / raw)
To: Markus Armbruster; +Cc: qemu-devel
On 6/12/26 16:21, Markus Armbruster wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
>
>> In order to avoid stashing all the tokens corresponding to a JSON value,
>> embed the parsing stack and state machine in JSONParser. This is more
>> efficient and allows for more prompt error recovery; it also does not
>> make the code substantially larger than the current recursive descent
>> parser, though the state machine is probably a bit harder to follow.
>>
>> The stack consists of QLists and QDicts corresponding to open
>> brackets and braces, plus optionally a QString with the current
>> key on top of each QDict.
>>
>> After each value is parsed, it is added to the top array or dictionary
>> or, if the stack is empty, json_parser_feed returns the complete
>> QObject.
>>
>> For now, json-streamer.c keeps tracking the tokens up until braces
>> and brackets are balanced, and then shoves the whole queue of tokens
>> into the push parser. The only logic change is that JSON_END_OF_INPUT
>> always triggers the emptying of the queue; the parser takes notice and
>> checks that there is nothing on the stack. Not using brace_count
>> and bracket_count for this is the first step towards improved separation
>> of concerns between json-parser.c and json-streamer.c.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>> include/qobject/json-parser.h | 6 +
>> qobject/json-parser-int.h | 5 +-
>> qobject/json-parser.c | 551 ++++++++++++++++++++--------------
>> qobject/json-streamer.c | 21 +-
>> 4 files changed, 345 insertions(+), 238 deletions(-)
>>
>> diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
>> index 7345a9bd5cb..05346fa816b 100644
>> --- a/include/qobject/json-parser.h
>> +++ b/include/qobject/json-parser.h
>> @@ -20,6 +20,12 @@ typedef struct JSONLexer {
>> int x, y;
>> } JSONLexer;
>>
>> +typedef struct JSONParserContext {
>> + Error *err;
>> + GQueue *stack;
>> + va_list *ap;
>> +} JSONParserContext;
>> +
>> typedef struct JSONMessageParser {
>> void (*emit)(void *opaque, QObject *json, Error *err);
>> void *opaque;
>> diff --git a/qobject/json-parser-int.h b/qobject/json-parser-int.h
>> index 8c01f236276..1f435cb8eb2 100644
>> --- a/qobject/json-parser-int.h
>> +++ b/qobject/json-parser-int.h
>> @@ -49,6 +49,9 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
>>
>> /* json-parser.c */
>> JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr);
>> -QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp);
>> +void json_parser_init(JSONParserContext *ctxt, va_list *ap);
>> +void json_parser_reset(JSONParserContext *ctxt);
>> +QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token, Error **errp);
>> +void json_parser_destroy(JSONParserContext *ctxt);
>>
>> #endif
>> diff --git a/qobject/json-parser.c b/qobject/json-parser.c
>> index f6622b82b0a..3b5edc5bae4 100644
>> --- a/qobject/json-parser.c
>> +++ b/qobject/json-parser.c
>> @@ -31,12 +31,105 @@ struct JSONToken {
>> char str[];
>> };
>>
>> -typedef struct JSONParserContext {
>> - Error *err;
>> - JSONToken *current;
>> - GQueue *buf;
>> - va_list *ap;
>> -} JSONParserContext;
>> +/*
>> + * The JSON parser is a push parser, returning to the caller after every
>> + * token.
>
> The thing that returns after every token is json_parser_feed(), right?
>
> Detail not mentioned here: the value it returns. Leaving that to
> json_parser_feed()'s contract feels fine, but pointing from here to
> there could be useful.
"returning a completed top-level object, an error, or NULL (if the
object is incomplete and no error happened) after every token"?
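A toy sketch of that three-way contract, restricted to arrays and written in plain C. All names (ToyParser, toy_feed, FEED_MORE and so on) are made up for illustration, tokens are single characters, and a depth counter stands in for the real stack of QObjects:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the QEMU API. */
typedef enum { BEFORE_VALUE, AFTER_LSQUARE, END_OF_VALUE } ToyState;
typedef enum { FEED_MORE, FEED_DONE, FEED_ERROR } ToyResult;

typedef struct {
    ToyState state;   /* state of the innermost open array */
    unsigned depth;   /* number of open '[' (stands in for the stack) */
} ToyParser;

static void toy_reset(ToyParser *p)
{
    p->state = BEFORE_VALUE;
    p->depth = 0;
}

/*
 * A value just ended: either the top-level value is complete, or we
 * are inside an array and now expect ',' or ']'.
 */
static ToyResult toy_value_done(ToyParser *p)
{
    if (p->depth == 0) {
        p->state = BEFORE_VALUE;
        return FEED_DONE;
    }
    p->state = END_OF_VALUE;
    return FEED_MORE;
}

/* Feed one token: '[', ']', ',' or 'n' (standing for any scalar). */
static ToyResult toy_feed(ToyParser *p, char tok)
{
    switch (p->state) {
    case AFTER_LSQUARE:
        if (tok == ']') {
            p->depth--;
            return toy_value_done(p);
        }
        /* fall through */
    case BEFORE_VALUE:
        if (tok == '[') {
            p->depth++;
            p->state = AFTER_LSQUARE;
            return FEED_MORE;
        }
        return tok == 'n' ? toy_value_done(p) : FEED_ERROR;
    case END_OF_VALUE:
        if (tok == ',') {
            p->state = BEFORE_VALUE;
            return FEED_MORE;
        }
        if (tok == ']') {
            p->depth--;
            return toy_value_done(p);
        }
        return FEED_ERROR;
    }
    return FEED_ERROR;
}

/* Feed tokens until one of them yields a value or an error. */
static ToyResult toy_feed_all(ToyParser *p, const char *toks)
{
    ToyResult r = FEED_MORE;

    for (; *toks && r == FEED_MORE; toks++) {
        r = toy_feed(p, *toks);
    }
    return r;
}
```

FEED_MORE plays the role of the NULL return with no error set; after FEED_ERROR the caller is expected to call toy_reset() before feeding again.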
>> + * // The initial state is BEFORE_VALUE.
>> + * input := value -> END_OF_VALUE -> return parsed value
>> + * END_OF_INPUT -> check stack is empty
>
> How can the stack *not* be empty here?
Right, this is not END_OF_INPUT in the middle of the stream. Will delete.
>> + * // entered on BEFORE_KEY, with TOS being a QDict
>> + * dict_pairs := (STRING | INTERP) -> push QString -> END_OF_KEY
>> + * ':' -> BEFORE_VALUE
>> + * value -> pop QString + add pair to QDict -> END_OF_VALUE
>> + * ('}' -> pop completed QDict -> END_OF_VALUE
>> + * | ',' -> BEFORE_KEY
>> + * dict_pairs) -> END_OF_VALUE
>> + */
>
> This is useful.
>
> It doesn't mention how we do parse errors. Leaving that to
> json_parser_feed()'s contract feels fine.
Right, parse errors are out of scope here because recovery happens in
json-streamer.c.
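As a rough illustration of that split, the boundary check can be sketched in plain C as below. This is a toy, not the json-streamer.c code: toy_skip_balanced is a made-up name and tokens are single characters. It only mirrors the condition visible in the patch, where tokens keep accumulating while something is open and neither count has gone negative.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of QEMU.  Return the index of the
 * first token after the value that starts at toks[0], judged purely
 * by counting braces and brackets.  If the input ends while something
 * is still open, return the input length.
 */
static size_t toy_skip_balanced(const char *toks)
{
    int braces = 0, brackets = 0;
    size_t i;

    for (i = 0; toks[i]; i++) {
        switch (toks[i]) {
        case '{': braces++; break;
        case '}': braces--; break;
        case '[': brackets++; break;
        case ']': brackets--; break;
        default: break;
        }
        /* Keep going only while something is open and nothing is negative */
        if (!((braces > 0 || brackets > 0) && braces >= 0 && brackets >= 0)) {
            return i + 1;
        }
    }
    return i;
}
```

The parser never needs to know about this: it reports the first error, and the layer above decides where the next value starts.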
I can add a note for this and everything else, thanks for the review!
Rewrites are neither the most enticing thing to receive nor the most
polite thing to send.
Paolo
Thread overview: 11+ messages
2026-05-25 15:04 [PATCH v3 0/7] qobject: switch JSON parser to push Paolo Bonzini
2026-05-25 15:04 ` [PATCH v3 1/7] json-parser: constify JSONToken Paolo Bonzini
2026-05-25 15:04 ` [PATCH v3 2/7] json-parser: replace with a push parser Paolo Bonzini
2026-06-12 14:21 ` Markus Armbruster
2026-06-12 15:08 ` Paolo Bonzini
2026-05-25 15:04 ` [PATCH v3 3/7] json-streamer: reuse parser Paolo Bonzini
2026-05-25 15:05 ` [PATCH v3 4/7] json-streamer: make brace/bracket count unsigned Paolo Bonzini
2026-05-25 15:05 ` [PATCH v3 5/7] json-streamer: remove token queue Paolo Bonzini
2026-05-25 15:05 ` [PATCH v3 6/7] json-streamer: do not heap-allocate JSONToken Paolo Bonzini
2026-05-25 15:05 ` [PATCH v3 7/7] json-parser: add location to JSON parsing errors Paolo Bonzini
2026-06-02 8:58 ` [PATCH v3 0/7] qobject: switch JSON parser to push Paolo Bonzini