* [PATCH v4 0/6] qobject: switch JSON parser to push
@ 2026-06-26 10:17 Paolo Bonzini
2026-06-26 10:17 ` [PATCH 1/6] json-parser: replace with a push parser Paolo Bonzini
` (6 more replies)
0 siblings, 7 replies; 11+ messages in thread
From: Paolo Bonzini @ 2026-06-26 10:17 UTC
To: qemu-devel; +Cc: armbru
This rewrites the json-parser to use a push parser aka state machine.
While push parsers are inherently more complex than recursive descent,
the grammar for JSON is simple enough that the parser remains readable.
There is therefore no need to use e.g. QEMU coroutines.
Unlike the suggestion in commit 62815d85aed ("json: Redesign the callback
to consume JSON values", 2018-08-24), I kept the json-streamer concept.
It helps in handling input limits, it performs error recovery, and it
converts the token-at-a-time push interface to callbacks---all things
that are more easily done in a separate layer to keep the parser clean.
However, there is no need anymore for it to store partial JSON objects
in tokenized form, because the current state is stored in the push
parser's stack.
Another benefit is that QEMU can report the first parsing error
immediately, without waiting for parentheses to be balanced or for a
lexing error. Error recovery then proceeds as before (i.e., the next
parse still starts after balanced parentheses or a lexing error).
On top of the benefits intrinsic in the push architecture, it so happens
that it's really easy to add a location to JSON parsing errors now, so
do that as well.
The diffstat is unfavorable, but most of the net increase comes from
new comments explaining the grammar and the state machine.
This is almost the same as v3; the only substantial change is to
restore the "expecting value" (actually now "expecting key") error
for not having a string or interpolation where a key is expected.
Other changes:
- Add some extra comments to json-parser.c
- Consistently use "top-level"
- Move json_parser_reset() right after the out_emit label
- Avoid <= comparisons for unsigned variables
- Add extra comments about error recovery situations in json-streamer.c
- Avoid double "JSON parse error, JSON parse error, stray '%s'"
- Spell location as %d:%d rather than "at line %d, column %d"
Thanks,
Paolo
Paolo Bonzini (6):
json-parser: replace with a push parser
json-streamer: reuse parser
json-streamer: make brace/bracket count unsigned
json-streamer: remove token queue
json-streamer: do not heap-allocate JSONToken
json-parser: add location to JSON parsing errors
include/qobject/json-parser.h | 16 +-
qobject/json-parser-int.h | 13 +-
qobject/json-lexer.c | 11 +-
qobject/json-parser.c | 589 +++++++++++++++++++---------------
qobject/json-streamer.c | 121 +++----
5 files changed, 427 insertions(+), 323 deletions(-)
--
2.54.0
* [PATCH 1/6] json-parser: replace with a push parser
2026-06-26 10:17 [PATCH v4 0/6] qobject: switch JSON parser to push Paolo Bonzini
@ 2026-06-26 10:17 ` Paolo Bonzini
2026-06-29 13:02 ` Markus Armbruster
2026-06-26 10:17 ` [PATCH 2/6] json-streamer: reuse parser Paolo Bonzini
` (5 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2026-06-26 10:17 UTC
To: qemu-devel; +Cc: armbru
In order to avoid stashing all the tokens corresponding to a JSON value,
embed the parsing stack and state machine in JSONParser. This is more
efficient and allows for more prompt error recovery; it also does not
make the code substantially larger than the current recursive descent
parser, though the state machine is probably a bit harder to follow.
The stack consists of QLists and QDicts corresponding to open
brackets and braces, plus optionally a QString with the current
key on top of each QDict.
After each value is parsed, it is added to the top array or dictionary
or, if the stack is empty, json_parser_feed returns the complete
QObject.
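To make the stack discipline above concrete, here is a toy sketch of a
push parser; it is not the code in this patch. It only validates the
token stream and builds no QObjects, so instead of pushing a QString for
the pending key it records the expected next token in the frame of the
enclosing container. All names (Tok, Frame, Parser, parser_feed, the
FEED_* results, MAX_DEPTH) are made up for the illustration; only the
state names are borrowed from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token kinds; the real lexer has more. */
typedef enum { T_LSQUARE, T_RSQUARE, T_LCURLY, T_RCURLY,
               T_COMMA, T_COLON, T_STRING, T_NUMBER } Tok;
typedef enum { AFTER_LCURLY, AFTER_LSQUARE, BEFORE_KEY, BEFORE_VALUE,
               END_OF_KEY, END_OF_VALUE } State;
typedef enum { FEED_MORE, FEED_DONE, FEED_ERROR } Feed;

enum { MAX_DEPTH = 64 };

typedef struct {
    bool is_dict;   /* true for '{', false for '[' */
    State state;    /* what this container expects next */
} Frame;

typedef struct {
    Frame stack[MAX_DEPTH];
    size_t depth;   /* 0: expecting a top-level value */
} Parser;

/* A value is complete: finish, or tell the enclosing container. */
static Feed value_done(Parser *p)
{
    if (p->depth == 0) {
        return FEED_DONE;
    }
    p->stack[p->depth - 1].state = END_OF_VALUE;
    return FEED_MORE;
}

static Feed begin_value(Parser *p, Tok t)
{
    switch (t) {
    case T_LSQUARE:
    case T_LCURLY:
        if (p->depth == MAX_DEPTH) {
            return FEED_ERROR;
        }
        p->stack[p->depth].is_dict = (t == T_LCURLY);
        p->stack[p->depth].state =
            (t == T_LCURLY) ? AFTER_LCURLY : AFTER_LSQUARE;
        p->depth++;
        return FEED_MORE;
    case T_STRING:
    case T_NUMBER:
        return value_done(p);
    default:
        return FEED_ERROR;
    }
}

/* Consume exactly one token and return to the caller. */
static Feed parser_feed(Parser *p, Tok t)
{
    Frame *top = p->depth ? &p->stack[p->depth - 1] : NULL;
    State state = top ? top->state : BEFORE_VALUE;

    switch (state) {
    case AFTER_LCURLY:
        if (t == T_RCURLY) {
            p->depth--;
            return value_done(p);
        }
        /* fall through */
    case BEFORE_KEY:
        if (t != T_STRING) {
            return FEED_ERROR;
        }
        top->state = END_OF_KEY;
        return FEED_MORE;
    case END_OF_KEY:
        if (t != T_COLON) {
            return FEED_ERROR;
        }
        top->state = BEFORE_VALUE;
        return FEED_MORE;
    case AFTER_LSQUARE:
        if (t == T_RSQUARE) {
            p->depth--;
            return value_done(p);
        }
        /* fall through */
    case BEFORE_VALUE:
        return begin_value(p, t);
    case END_OF_VALUE:
        if (t == T_COMMA) {
            top->state = top->is_dict ? BEFORE_KEY : BEFORE_VALUE;
            return FEED_MORE;
        }
        if (t == (top->is_dict ? T_RCURLY : T_RSQUARE)) {
            p->depth--;
            return value_done(p);
        }
        return FEED_ERROR;
    }
    return FEED_ERROR;
}
```

A caller feeds tokens one at a time until it sees FEED_DONE or
FEED_ERROR; after an error the toy expects the caller to set depth back
to 0 before feeding again, which plays the role of json_parser_reset().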
For now, json-streamer.c keeps tracking the tokens up until braces
and brackets are balanced, and then shoves the whole queue of tokens
into the push parser. The only logic change is that JSON_END_OF_INPUT
always triggers the emptying of the queue; the parser takes notice and
checks that there is nothing on the stack. Not using brace_count
and bracket_count for this is the first step towards improved separation
of concerns between json-parser.c and json-streamer.c.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/qobject/json-parser.h | 6 +
qobject/json-parser-int.h | 5 +-
qobject/json-parser.c | 565 ++++++++++++++++++++--------------
qobject/json-streamer.c | 21 +-
4 files changed, 359 insertions(+), 238 deletions(-)
diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
index 7345a9bd5cb..05346fa816b 100644
--- a/include/qobject/json-parser.h
+++ b/include/qobject/json-parser.h
@@ -20,6 +20,12 @@ typedef struct JSONLexer {
int x, y;
} JSONLexer;
+typedef struct JSONParserContext {
+ Error *err;
+ GQueue *stack;
+ va_list *ap;
+} JSONParserContext;
+
typedef struct JSONMessageParser {
void (*emit)(void *opaque, QObject *json, Error *err);
void *opaque;
diff --git a/qobject/json-parser-int.h b/qobject/json-parser-int.h
index 8c01f236276..1f435cb8eb2 100644
--- a/qobject/json-parser-int.h
+++ b/qobject/json-parser-int.h
@@ -49,6 +49,9 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
/* json-parser.c */
JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr);
-QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp);
+void json_parser_init(JSONParserContext *ctxt, va_list *ap);
+void json_parser_reset(JSONParserContext *ctxt);
+QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token, Error **errp);
+void json_parser_destroy(JSONParserContext *ctxt);
#endif
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index f6622b82b0a..845da3699aa 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -31,12 +31,111 @@ struct JSONToken {
char str[];
};
-typedef struct JSONParserContext {
- Error *err;
- JSONToken *current;
- GQueue *buf;
- va_list *ap;
-} JSONParserContext;
+/*
+ * The JSON parser is a push parser, returning to the caller after every
+ * token. Therefore it has an explicit representation of its parser
+ * stack; each stack entry consists of a parser state and a QObject:
+ * - a QList, for an array that is being added to
+ * - a QDict, for a dictionary that is being added to
+ * - a QString, for the key of the next pair that will be added to a QDict
+ *
+ * The stack represents an arbitrary nesting of arrays and dictionaries
+ * (whose next key has been parsed); it can also have a dictionary whose
+ * next key has not been parsed, but that can only happen at the top level.
+ * Because of this, the stack contents are always of the form
+ * "(QList | QDict QString)* QDict?".
+ *
+ * An empty stack represents the beginning of the parsing process, with
+ * start state BEFORE_VALUE.
+ */
+
+typedef enum JSONParserState {
+ AFTER_LCURLY,
+ AFTER_LSQUARE,
+ BEFORE_KEY,
+ BEFORE_VALUE,
+ END_OF_KEY,
+ END_OF_VALUE,
+} JSONParserState;
+
+typedef struct JSONParserStackEntry {
+ /*
+ * State when the container is completed or, for the top of the stack,
+ * entry state for the next token.
+ */
+ JSONParserState state;
+
+ /*
+ * A QString with the last parsed key, or a QList/QDict for the current
+ * container.
+ */
+ QObject *partial;
+} JSONParserStackEntry;
+
+/*
+ * This is the JSON grammar that's parsed, with the state transition and
+ * action at each point of the grammar. While this is not a formal
+ * description, "-> action" represents the pseudocode of the action
+ * and "-> STATE" sets the top stack entry's state to STATE.
+ *
+ * The state alone is enough to tell you what to parse; the state plus
+ * the type of the top of stack tells you which action to take.
+ *
+ * // The initial state is BEFORE_VALUE.
+ * input := value -> END_OF_VALUE -> return parsed value
+ * (input | END_OF_INPUT)
+ *
+ * // entered on BEFORE_VALUE; after any of these rules are processed, the
+ * // parser has completed a QObject and is in the END_OF_VALUE state.
+ * //
+ * // When the parser reaches the END_OF_VALUE state, it examines the
+ * // top of the stack to see if it's coming from "input" (stack empty),
+ * // "array_items" (TOS is a QList) or "dict_pairs" (TOS is a QString; the
+ * // item below will be a QDict). It then proceeds with the corresponding
+ * // actions, which will be one of:
+ * // - return parsed value
+ * // - add value to QList
+ * // - pop QString with the key, add key/value to the QDict
+ * value := literal -> END_OF_VALUE
+ * | '[' -> push empty QList -> AFTER_LSQUARE
+ * after_lsquare -> END_OF_VALUE
+ * | '{' -> push empty QDict -> AFTER_LCURLY
+ * after_lcurly -> END_OF_VALUE
+ *
+ * // non-recursive values, entered on BEFORE_VALUE
+ * literal := INTEGER -> END_OF_VALUE
+ * | FLOAT -> END_OF_VALUE
+ * | KEYWORD -> END_OF_VALUE
+ * | STRING -> END_OF_VALUE
+ * | INTERP -> END_OF_VALUE
+ *
+ * // entered on AFTER_LSQUARE
+ * after_lsquare := ']' -> pop completed QList -> END_OF_VALUE
+ * | ϵ -> BEFORE_VALUE
+ * array_items -> END_OF_VALUE
+ *
+ * // entered on BEFORE_VALUE, with TOS being a QList
+ * array_items := value -> add value to QList -> END_OF_VALUE
+ * (']' -> pop completed QList -> END_OF_VALUE
+ * | ',' -> BEFORE_VALUE
+ * array_items) -> END_OF_VALUE
+ *
+ * // entered on AFTER_LCURLY
+ * after_lcurly := '}' -> pop completed QDict -> END_OF_VALUE
+ * | ϵ -> BEFORE_KEY
+ * dict_pairs -> END_OF_VALUE
+ *
+ * // entered on BEFORE_KEY, with TOS being a QDict
+ * dict_pairs := (STRING | INTERP) -> push QString -> END_OF_KEY
+ * ':' -> BEFORE_VALUE
+ * value -> pop QString + add pair to QDict -> END_OF_VALUE
+ * ('}' -> pop completed QDict -> END_OF_VALUE
+ * | ',' -> BEFORE_KEY
+ * dict_pairs) -> END_OF_VALUE
+ *
+ * Parse errors ignore the token. json_parser_reset() can be
+ * called to restart parsing from scratch, with an empty stack.
+ */
#define BUG_ON(cond) assert(!(cond))
@@ -49,7 +148,27 @@ typedef struct JSONParserContext {
* 4) deal with premature EOI
*/
-static QObject *parse_value(JSONParserContext *ctxt);
+static inline JSONParserStackEntry *current_entry(JSONParserContext *ctxt)
+{
+ return g_queue_peek_tail(ctxt->stack);
+}
+
+static void push_entry(JSONParserContext *ctxt, QObject *partial,
+ JSONParserState state)
+{
+ JSONParserStackEntry *entry = g_new(JSONParserStackEntry, 1);
+ entry->partial = partial;
+ entry->state = state;
+ g_queue_push_tail(ctxt->stack, entry);
+}
+
+/* Drop the top entry and return the new top entry. */
+static JSONParserStackEntry *pop_entry(JSONParserContext *ctxt)
+{
+ JSONParserStackEntry *entry = g_queue_pop_tail(ctxt->stack);
+ g_free(entry);
+ return current_entry(ctxt);
+}
/**
* Error handler
@@ -236,200 +355,10 @@ out:
return NULL;
}
-/* Note: the token object returned by parser_context_peek_token or
- * parser_context_pop_token is deleted as soon as parser_context_pop_token
- * is called again.
- */
-static const JSONToken *parser_context_pop_token(JSONParserContext *ctxt)
+/* Terminals */
+
+static QObject *parse_keyword(JSONParserContext *ctxt, const JSONToken *token)
{
- g_free(ctxt->current);
- ctxt->current = g_queue_pop_head(ctxt->buf);
- return ctxt->current;
-}
-
-static const JSONToken *parser_context_peek_token(JSONParserContext *ctxt)
-{
- return g_queue_peek_head(ctxt->buf);
-}
-
-/**
- * Parsing rules
- */
-static int parse_pair(JSONParserContext *ctxt, QDict *dict)
-{
- QObject *key_obj = NULL;
- QString *key;
- QObject *value;
- const JSONToken *peek, *token;
-
- peek = parser_context_peek_token(ctxt);
- if (peek == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
-
- key_obj = parse_value(ctxt);
- key = qobject_to(QString, key_obj);
- if (!key) {
- parse_error(ctxt, peek, "key is not a string in object");
- goto out;
- }
-
- token = parser_context_pop_token(ctxt);
- if (token == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
-
- if (token->type != JSON_COLON) {
- parse_error(ctxt, token, "missing : in object pair");
- goto out;
- }
-
- value = parse_value(ctxt);
- if (value == NULL) {
- parse_error(ctxt, token, "Missing value in dict");
- goto out;
- }
-
- if (qdict_haskey(dict, qstring_get_str(key))) {
- parse_error(ctxt, token, "duplicate key");
- goto out;
- }
-
- qdict_put_obj(dict, qstring_get_str(key), value);
-
- qobject_unref(key_obj);
- return 0;
-
-out:
- qobject_unref(key_obj);
- return -1;
-}
-
-static QObject *parse_object(JSONParserContext *ctxt)
-{
- QDict *dict = NULL;
- const JSONToken *token, *peek;
-
- token = parser_context_pop_token(ctxt);
- assert(token && token->type == JSON_LCURLY);
-
- dict = qdict_new();
-
- peek = parser_context_peek_token(ctxt);
- if (peek == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
-
- if (peek->type != JSON_RCURLY) {
- if (parse_pair(ctxt, dict) == -1) {
- goto out;
- }
-
- token = parser_context_pop_token(ctxt);
- if (token == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
-
- while (token->type != JSON_RCURLY) {
- if (token->type != JSON_COMMA) {
- parse_error(ctxt, token, "expected separator in dict");
- goto out;
- }
-
- if (parse_pair(ctxt, dict) == -1) {
- goto out;
- }
-
- token = parser_context_pop_token(ctxt);
- if (token == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
- }
- } else {
- (void)parser_context_pop_token(ctxt);
- }
-
- return QOBJECT(dict);
-
-out:
- qobject_unref(dict);
- return NULL;
-}
-
-static QObject *parse_array(JSONParserContext *ctxt)
-{
- QList *list = NULL;
- const JSONToken *token, *peek;
-
- token = parser_context_pop_token(ctxt);
- assert(token && token->type == JSON_LSQUARE);
-
- list = qlist_new();
-
- peek = parser_context_peek_token(ctxt);
- if (peek == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
-
- if (peek->type != JSON_RSQUARE) {
- QObject *obj;
-
- obj = parse_value(ctxt);
- if (obj == NULL) {
- parse_error(ctxt, token, "expecting value");
- goto out;
- }
-
- qlist_append_obj(list, obj);
-
- token = parser_context_pop_token(ctxt);
- if (token == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
-
- while (token->type != JSON_RSQUARE) {
- if (token->type != JSON_COMMA) {
- parse_error(ctxt, token, "expected separator in list");
- goto out;
- }
-
- obj = parse_value(ctxt);
- if (obj == NULL) {
- parse_error(ctxt, token, "expecting value");
- goto out;
- }
-
- qlist_append_obj(list, obj);
-
- token = parser_context_pop_token(ctxt);
- if (token == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- goto out;
- }
- }
- } else {
- (void)parser_context_pop_token(ctxt);
- }
-
- return QOBJECT(list);
-
-out:
- qobject_unref(list);
- return NULL;
-}
-
-static QObject *parse_keyword(JSONParserContext *ctxt)
-{
- const JSONToken *token;
-
- token = parser_context_pop_token(ctxt);
assert(token && token->type == JSON_KEYWORD);
if (!strcmp(token->str, "true")) {
@@ -443,11 +372,9 @@ static QObject *parse_keyword(JSONParserContext *ctxt)
return NULL;
}
-static QObject *parse_interpolation(JSONParserContext *ctxt)
+static QObject *parse_interpolation(JSONParserContext *ctxt,
+ const JSONToken *token)
{
- const JSONToken *token;
-
- token = parser_context_pop_token(ctxt);
assert(token && token->type == JSON_INTERP);
if (!strcmp(token->str, "%p")) {
@@ -479,11 +406,8 @@ static QObject *parse_interpolation(JSONParserContext *ctxt)
return NULL;
}
-static QObject *parse_literal(JSONParserContext *ctxt)
+static QObject *parse_literal(JSONParserContext *ctxt, const JSONToken *token)
{
- const JSONToken *token;
-
- token = parser_context_pop_token(ctxt);
assert(token);
switch (token->type) {
@@ -531,35 +455,174 @@ static QObject *parse_literal(JSONParserContext *ctxt)
}
}
-static QObject *parse_value(JSONParserContext *ctxt)
+/* Parsing state machine */
+
+static QObject *parse_begin_value(JSONParserContext *ctxt,
+ const JSONToken *token)
{
- const JSONToken *token;
-
- token = parser_context_peek_token(ctxt);
- if (token == NULL) {
- parse_error(ctxt, NULL, "premature EOI");
- return NULL;
- }
-
switch (token->type) {
case JSON_LCURLY:
- return parse_object(ctxt);
+ push_entry(ctxt, QOBJECT(qdict_new()), AFTER_LCURLY);
+ return NULL;
case JSON_LSQUARE:
- return parse_array(ctxt);
+ push_entry(ctxt, QOBJECT(qlist_new()), AFTER_LSQUARE);
+ return NULL;
case JSON_INTERP:
- return parse_interpolation(ctxt);
+ return parse_interpolation(ctxt, token);
case JSON_INTEGER:
case JSON_FLOAT:
case JSON_STRING:
- return parse_literal(ctxt);
+ return parse_literal(ctxt, token);
case JSON_KEYWORD:
- return parse_keyword(ctxt);
+ return parse_keyword(ctxt, token);
default:
parse_error(ctxt, token, "expecting value");
return NULL;
}
}
+static QObject *parse_token(JSONParserContext *ctxt, const JSONToken *token)
+{
+ JSONParserStackEntry *entry;
+ JSONParserState state;
+ QString *key;
+ QObject *key_obj = NULL, *value = NULL;
+
+ entry = current_entry(ctxt);
+ state = entry ? entry->state : BEFORE_VALUE;
+ switch (state) {
+ case AFTER_LCURLY:
+ /* Grab '}' for empty object or fall through to BEFORE_KEY */
+ assert(qobject_type(entry->partial) == QTYPE_QDICT);
+ if (token->type == JSON_RCURLY) {
+ value = entry->partial;
+ entry = pop_entry(ctxt);
+ break;
+ }
+ entry->state = BEFORE_KEY;
+ /* fall through */
+
+ case BEFORE_KEY:
+ /* Expecting object key */
+ assert(qobject_type(entry->partial) == QTYPE_QDICT);
+ if (token->type != JSON_STRING && token->type != JSON_INTERP) {
+ parse_error(ctxt, token, "expecting key");
+ return NULL;
+ }
+
+ key_obj = parse_begin_value(ctxt, token);
+ if (!key_obj) {
+ /* Parse error already reported */
+ } else if (qobject_type(key_obj) != QTYPE_QSTRING) {
+ /* An interpolation was valid syntactically but not %s */
+ parse_error(ctxt, token, "key is not a string in object");
+ } else {
+ /* Store key in a special entry on the stack */
+ push_entry(ctxt, key_obj, END_OF_KEY);
+ }
+ return NULL;
+
+ case END_OF_KEY:
+ /* Expecting ':' after key */
+ assert(qobject_type(entry->partial) == QTYPE_QSTRING);
+ if (token->type == JSON_COLON) {
+ entry->state = BEFORE_VALUE;
+ } else {
+ parse_error(ctxt, token, "expecting ':'");
+ }
+ return NULL;
+
+ case AFTER_LSQUARE:
+ /* Grab ']' for empty array or fall through to BEFORE_VALUE */
+ assert(qobject_type(entry->partial) == QTYPE_QLIST);
+ if (token->type == JSON_RSQUARE) {
+ value = entry->partial;
+ entry = pop_entry(ctxt);
+ break;
+ }
+ entry->state = BEFORE_VALUE;
+ /* fall through */
+
+ case BEFORE_VALUE:
+ /* Expecting value */
+ assert(!entry || qobject_type(entry->partial) != QTYPE_QDICT);
+ value = parse_begin_value(ctxt, token);
+ if (!value) {
+ /* Error or '['/'{' */
+ return NULL;
+ }
+ /* Return value or insert it into a container */
+ break;
+
+ case END_OF_VALUE:
+ /* Grab ',' or ']' for array; ',' or '}' for object */
+ if (qobject_to(QList, entry->partial)) {
+ /* Array */
+ if (token->type != JSON_RSQUARE) {
+ if (token->type == JSON_COMMA) {
+ entry->state = BEFORE_VALUE;
+ } else {
+ parse_error(ctxt, token, "expected ',' or ']'");
+ }
+ return NULL;
+ }
+ } else if (qobject_to(QDict, entry->partial)) {
+ /* Object */
+ if (token->type != JSON_RCURLY) {
+ if (token->type == JSON_COMMA) {
+ entry->state = BEFORE_KEY;
+ } else {
+ parse_error(ctxt, token, "expected ',' or '}'");
+ }
+ return NULL;
+ }
+ } else {
+ g_assert_not_reached();
+ }
+
+ /* Got ']' or '}'; return full value or insert into parent container */
+ value = entry->partial;
+ entry = pop_entry(ctxt);
+ break;
+ }
+
+ assert(value);
+ if (entry == NULL) {
+ /* Parse stack now empty, the top-level value is complete. */
+ return value;
+ }
+
+ /*
+ * Parse stack is not empty and entry->partial is the top of stack.
+ * It's a QString with the key (and a QDict is below it) if we're
+ * parsing an object, or a QList if we're parsing an array.
+ */
+ key = qobject_to(QString, entry->partial);
+ if (key) {
+ const char *key_str;
+ QDict *dict;
+
+ /* Pop off key, and store (key, value) in QDict. */
+ entry = pop_entry(ctxt);
+ dict = qobject_to(QDict, entry->partial);
+ assert(dict);
+ key_str = qstring_get_str(key);
+ if (qdict_haskey(dict, key_str)) {
+ parse_error(ctxt, token, "duplicate key");
+ qobject_unref(value);
+ return NULL;
+ }
+ qdict_put_obj(dict, key_str, value);
+ qobject_unref(key);
+ } else {
+ /* Array, just store value in the QList. */
+ qlist_append_obj(qobject_to(QList, entry->partial), value);
+ }
+
+ entry->state = END_OF_VALUE;
+ return NULL;
+}
+
JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr)
{
JSONToken *token = g_malloc(sizeof(JSONToken) + tokstr->len + 1);
@@ -572,20 +635,56 @@ JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr)
return token;
}
-QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp)
+void json_parser_reset(JSONParserContext *ctxt)
{
- JSONParserContext ctxt = { .buf = tokens, .ap = ap };
- QObject *result;
+ JSONParserStackEntry *entry;
- result = parse_value(&ctxt);
- assert(ctxt.err || g_queue_is_empty(ctxt.buf));
-
- error_propagate(errp, ctxt.err);
-
- while (!g_queue_is_empty(ctxt.buf)) {
- parser_context_pop_token(&ctxt);
+ ctxt->err = NULL;
+ while ((entry = g_queue_pop_tail(ctxt->stack)) != NULL) {
+ qobject_unref(entry->partial);
+ g_free(entry);
}
- g_free(ctxt.current);
+}
+void json_parser_init(JSONParserContext *ctxt, va_list *ap)
+{
+ ctxt->stack = g_queue_new();
+ ctxt->ap = ap;
+ json_parser_reset(ctxt);
+}
+
+void json_parser_destroy(JSONParserContext *ctxt)
+{
+ json_parser_reset(ctxt);
+ g_queue_free(ctxt->stack);
+ ctxt->stack = NULL;
+}
+
+/*
+ * Advance the parser based on the token that is passed.
+ * Return the finished top-level value if the token completes it.
+ * If an error is returned, the function must not be called without
+ * first resetting the parser.
+ */
+QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token,
+ Error **errp)
+{
+ QObject *result = NULL;
+
+ assert(!ctxt->err);
+ switch (token->type) {
+ case JSON_END_OF_INPUT:
+ /* Check for premature end of input */
+ if (!g_queue_is_empty(ctxt->stack)) {
+ parse_error(ctxt, token, "premature end of input");
+ }
+ break;
+
+ default:
+ result = parse_token(ctxt, token);
+ break;
+ }
+
+ error_propagate(errp, ctxt->err);
return result;
}
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index b93d97b995f..6c93e6fd78d 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -32,6 +32,7 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
JSONTokenType type, int x, int y)
{
JSONMessageParser *parser = container_of(lexer, JSONMessageParser, lexer);
+ JSONParserContext ctxt;
QObject *json = NULL;
Error *err = NULL;
JSONToken *token;
@@ -56,8 +57,7 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
if (g_queue_is_empty(&parser->tokens)) {
return;
}
- json = json_parser_parse(&parser->tokens, parser->ap, &err);
- goto out_emit;
+ break;
default:
break;
}
@@ -85,11 +85,24 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
g_queue_push_tail(&parser->tokens, token);
if ((parser->brace_count > 0 || parser->bracket_count > 0)
- && parser->brace_count >= 0 && parser->bracket_count >= 0) {
+ && parser->brace_count >= 0 && parser->bracket_count >= 0
+ && type != JSON_END_OF_INPUT) {
return;
}
- json = json_parser_parse(&parser->tokens, parser->ap, &err);
+ json_parser_init(&ctxt, parser->ap);
+
+ /* Process all tokens in the queue */
+ while (!g_queue_is_empty(&parser->tokens)) {
+ token = g_queue_pop_head(&parser->tokens);
+ json = json_parser_feed(&ctxt, token, &err);
+ g_free(token);
+ if (json || err) {
+ break;
+ }
+ }
+
+ json_parser_destroy(&ctxt);
out_emit:
parser->brace_count = 0;
--
2.54.0
* [PATCH 2/6] json-streamer: reuse parser
2026-06-26 10:17 [PATCH v4 0/6] qobject: switch JSON parser to push Paolo Bonzini
2026-06-26 10:17 ` [PATCH 1/6] json-parser: replace with a push parser Paolo Bonzini
@ 2026-06-26 10:17 ` Paolo Bonzini
2026-06-26 13:02 ` Philippe Mathieu-Daudé
2026-06-26 10:17 ` [PATCH 3/6] json-streamer: make brace/bracket count unsigned Paolo Bonzini
` (4 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2026-06-26 10:17 UTC
To: qemu-devel; +Cc: armbru
The push parser can be reset, so reuse it when the json-streamer
detects a completed top-level object.
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/qobject/json-parser.h | 2 +-
qobject/json-streamer.c | 11 ++++-------
2 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
index 05346fa816b..4c3d89f751f 100644
--- a/include/qobject/json-parser.h
+++ b/include/qobject/json-parser.h
@@ -29,8 +29,8 @@ typedef struct JSONParserContext {
typedef struct JSONMessageParser {
void (*emit)(void *opaque, QObject *json, Error *err);
void *opaque;
- va_list *ap;
JSONLexer lexer;
+ JSONParserContext parser;
int brace_count;
int bracket_count;
GQueue tokens;
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 6c93e6fd78d..6c4f99b3e7f 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -32,7 +32,6 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
JSONTokenType type, int x, int y)
{
JSONMessageParser *parser = container_of(lexer, JSONMessageParser, lexer);
- JSONParserContext ctxt;
QObject *json = NULL;
Error *err = NULL;
JSONToken *token;
@@ -90,21 +89,18 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
return;
}
- json_parser_init(&ctxt, parser->ap);
-
/* Process all tokens in the queue */
while (!g_queue_is_empty(&parser->tokens)) {
token = g_queue_pop_head(&parser->tokens);
- json = json_parser_feed(&ctxt, token, &err);
+ json = json_parser_feed(&parser->parser, token, &err);
g_free(token);
if (json || err) {
break;
}
}
- json_parser_destroy(&ctxt);
-
out_emit:
+ json_parser_reset(&parser->parser);
parser->brace_count = 0;
parser->bracket_count = 0;
json_message_free_tokens(parser);
@@ -119,12 +115,12 @@ void json_message_parser_init(JSONMessageParser *parser,
{
parser->emit = emit;
parser->opaque = opaque;
- parser->ap = ap;
parser->brace_count = 0;
parser->bracket_count = 0;
g_queue_init(&parser->tokens);
parser->token_size = 0;
+ json_parser_init(&parser->parser, ap);
json_lexer_init(&parser->lexer, !!ap);
}
@@ -144,4 +140,5 @@ void json_message_parser_destroy(JSONMessageParser *parser)
{
json_lexer_destroy(&parser->lexer);
json_message_free_tokens(parser);
+ json_parser_destroy(&parser->parser);
}
--
2.54.0
* [PATCH 3/6] json-streamer: make brace/bracket count unsigned
2026-06-26 10:17 [PATCH v4 0/6] qobject: switch JSON parser to push Paolo Bonzini
2026-06-26 10:17 ` [PATCH 1/6] json-parser: replace with a push parser Paolo Bonzini
2026-06-26 10:17 ` [PATCH 2/6] json-streamer: reuse parser Paolo Bonzini
@ 2026-06-26 10:17 ` Paolo Bonzini
2026-06-26 10:17 ` [PATCH 4/6] json-streamer: remove token queue Paolo Bonzini
` (3 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2026-06-26 10:17 UTC
To: qemu-devel; +Cc: armbru
It makes no sense to let brace_count and bracket_count go negative,
because a negative count immediately ends error recovery and resets
both counts to zero. Instead, reset them to zero *before* deciding
whether to process the token queue; this makes it possible to
declare the fields as unsigned.
Note that JSON_END_OF_INPUT now forces the parentheses to appear
balanced, so that the queue is emptied and an error is reported;
hence, the "type != JSON_END_OF_INPUT" condition can be removed.
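The pattern can be sketched in isolation as follows. This is only an
illustration, not the code in the patch: the Nesting struct and the two
helper names are hypothetical. The point is that an unsigned count has
to be tested against zero before the decrement, since a ">= 0" test
afterwards would always be true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two counters in JSONMessageParser. */
typedef struct {
    unsigned int brace_count;
    unsigned int bracket_count;
} Nesting;

/*
 * Account for a closing brace.  Returns false, with both counts reset
 * to zero, if there was no matching opening brace.
 */
static bool nesting_close_brace(Nesting *n)
{
    if (n->brace_count == 0) {
        /* Decrementing would wrap around; give up on balancing instead. */
        n->bracket_count = 0;
        return false;
    }
    n->brace_count--;
    return true;
}

/* True while at least one brace or bracket is still open. */
static bool nesting_is_open(const Nesting *n)
{
    return n->brace_count > 0 || n->bracket_count > 0;
}
```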
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/qobject/json-parser.h | 4 ++--
qobject/json-streamer.c | 24 +++++++++++++++++++++---
2 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
index 4c3d89f751f..0cf6932ecdc 100644
--- a/include/qobject/json-parser.h
+++ b/include/qobject/json-parser.h
@@ -31,8 +31,8 @@ typedef struct JSONMessageParser {
void *opaque;
JSONLexer lexer;
JSONParserContext parser;
- int brace_count;
- int bracket_count;
+ unsigned int brace_count;
+ unsigned int bracket_count;
GQueue tokens;
uint64_t token_size;
} JSONMessageParser;
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 6c4f99b3e7f..9e1f650bad8 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -41,21 +41,41 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
parser->brace_count++;
break;
case JSON_RCURLY:
+ if (!parser->brace_count) {
+ goto end_error_recovery;
+ }
parser->brace_count--;
break;
case JSON_LSQUARE:
parser->bracket_count++;
break;
case JSON_RSQUARE:
+ if (!parser->bracket_count) {
+ goto end_error_recovery;
+ }
parser->bracket_count--;
break;
case JSON_ERROR:
error_setg(&err, "JSON parse error, stray '%s'", input->str);
goto out_emit;
case JSON_END_OF_INPUT:
+ /*
+ * Force the parentheses to appear balanced and the queue
+ * to be emptied, causing a parse error if it wasn't.
+ */
if (g_queue_is_empty(&parser->tokens)) {
return;
}
+ end_error_recovery:
+ /*
+ * We get here due to receiving either JSON_END_OF_INPUT or a
+ * JSON_R{CURLY,SQUARE} that is known to be unbalanced.
+ * If in error recovery, end it immediately. If not in
+ * error recovery, json_parser_feed() will raise an error
+ * but error recovery won't be entered at all.
+ */
+ parser->brace_count = 0;
+ parser->bracket_count = 0;
break;
default:
break;
@@ -83,9 +103,7 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
g_queue_push_tail(&parser->tokens, token);
- if ((parser->brace_count > 0 || parser->bracket_count > 0)
- && parser->brace_count >= 0 && parser->bracket_count >= 0
- && type != JSON_END_OF_INPUT) {
+ if (parser->brace_count > 0 || parser->bracket_count > 0) {
return;
}
--
2.54.0
* [PATCH 4/6] json-streamer: remove token queue
2026-06-26 10:17 [PATCH v4 0/6] qobject: switch JSON parser to push Paolo Bonzini
` (2 preceding siblings ...)
2026-06-26 10:17 ` [PATCH 3/6] json-streamer: make brace/bracket count unsigned Paolo Bonzini
@ 2026-06-26 10:17 ` Paolo Bonzini
2026-06-29 13:02 ` Markus Armbruster
2026-06-26 10:17 ` [PATCH 5/6] json-streamer: do not heap-allocate JSONToken Paolo Bonzini
` (2 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2026-06-26 10:17 UTC
To: qemu-devel; +Cc: armbru
Now fully exploit the push parser, feeding it one token at a time
without having to wait until braces and brackets are balanced.
While the nesting counts are retained for error recovery purposes,
the system can now report the first parsing error without waiting
for parentheses to be balanced. This also means that JSON_ERROR
can be handled in json-parser.c, not json-streamer.c.
After reporting the error, json-streamer.c then enters an error recovery
mode where subsequent errors are suppressed. This mimics the previous
error reporting behavior, but it provides prompt feedback on parsing
errors. As an example, here is an interaction with qemu-ga.
BEFORE (error reported only once braces are balanced):
>> {"execute":foo
>> }
<< {"error": {"class": "GenericError", "desc": "JSON parse error, invalid keyword 'foo'"}}
>> {"execute":"somecommand"}
<< {"error": {"class": "CommandNotFound", "desc": "The command somecommand has not been found"}}
AFTER (error reported immediately, but similar error recovery as before):
>> {"execute":foo
<< {"error": {"class": "GenericError", "desc": "JSON parse error, invalid keyword 'foo'"}}
>> }
>> {"execute":"somecommand"}
<< {"error": {"class": "CommandNotFound", "desc": "The command somecommand has not been found"}}
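The recovery behavior in the AFTER transcript can be modeled roughly as
follows. This is only a toy sketch, not the json-streamer.c code: the
DemoStreamer type, the demo_feed() function and the TOK_* names are
made up for illustration, TOK_BAD stands in for any token the real
parser would reject, and unbalanced closers are simply ignored here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical token kinds; TOK_BAD models a token the parser rejects. */
typedef enum {
    TOK_LCURLY, TOK_RCURLY, TOK_LSQUARE, TOK_RSQUARE, TOK_GOOD, TOK_BAD
} DemoTokenType;

typedef struct {
    unsigned brace_count;
    unsigned bracket_count;
    bool error;                 /* true while in error recovery */
    unsigned errors_reported;   /* how many errors reached the client */
} DemoStreamer;

static void demo_feed(DemoStreamer *s, DemoTokenType type)
{
    /* Track nesting so that message boundaries can be detected. */
    switch (type) {
    case TOK_LCURLY:
        s->brace_count++;
        break;
    case TOK_RCURLY:
        if (s->brace_count) {
            s->brace_count--;
        }
        break;
    case TOK_LSQUARE:
        s->bracket_count++;
        break;
    case TOK_RSQUARE:
        if (s->bracket_count) {
            s->bracket_count--;
        }
        break;
    default:
        break;
    }

    if (!s->error && type == TOK_BAD) {
        s->errors_reported++;   /* report the first error right away */
        s->error = true;        /* then suppress errors until balanced */
    }

    /* A balanced message ends recovery; the next message starts clean. */
    if (s->brace_count == 0 && s->bracket_count == 0) {
        s->error = false;
    }
}
```

In this model, a second bad token inside the same unbalanced message
does not bump errors_reported, which mirrors the "subsequent errors are
suppressed" behavior described above.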
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/qobject/json-parser.h | 3 +-
qobject/json-parser.c | 4 ++
qobject/json-streamer.c | 106 +++++++++++++---------------------
3 files changed, 47 insertions(+), 66 deletions(-)
diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
index 0cf6932ecdc..3479e637588 100644
--- a/include/qobject/json-parser.h
+++ b/include/qobject/json-parser.h
@@ -33,7 +33,8 @@ typedef struct JSONMessageParser {
JSONParserContext parser;
unsigned int brace_count;
unsigned int bracket_count;
- GQueue tokens;
+ unsigned int token_count;
+ bool error;
uint64_t token_size;
} JSONMessageParser;
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 845da3699aa..484956deae4 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -673,6 +673,10 @@ QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token,
assert(!ctxt->err);
switch (token->type) {
+ case JSON_ERROR:
+ parse_error(ctxt, token, "stray '%s'", token->str);
+ break;
+
case JSON_END_OF_INPUT:
/* Check for premature end of input */
if (!g_queue_is_empty(ctxt->stack)) {
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 9e1f650bad8..9526f815f00 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -1,5 +1,5 @@
/*
- * JSON streaming support
+ * JSON parser - callback interface and error recovery
*
* Copyright IBM, Corp. 2009
*
@@ -19,23 +19,16 @@
#define MAX_TOKEN_COUNT (2ULL << 20)
#define MAX_NESTING (1 << 10)
-static void json_message_free_tokens(JSONMessageParser *parser)
-{
- JSONToken *token;
-
- while ((token = g_queue_pop_head(&parser->tokens))) {
- g_free(token);
- }
-}
-
void json_message_process_token(JSONLexer *lexer, GString *input,
JSONTokenType type, int x, int y)
{
JSONMessageParser *parser = container_of(lexer, JSONMessageParser, lexer);
- QObject *json = NULL;
Error *err = NULL;
- JSONToken *token;
+ parser->token_size += input->len;
+ parser->token_count++;
+
+ /* Detect message boundaries for error recovery purposes. */
switch (type) {
case JSON_LCURLY:
parser->brace_count++;
@@ -56,19 +49,9 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
parser->bracket_count--;
break;
case JSON_ERROR:
- error_setg(&err, "JSON parse error, stray '%s'", input->str);
- goto out_emit;
- case JSON_END_OF_INPUT:
- /*
- * Force the parentheses to appear balanced and the queue
- * to be emptied, causing a parse error if it wasn't.
- */
- if (g_queue_is_empty(&parser->tokens)) {
- return;
- }
end_error_recovery:
/*
- * We goto here due to receiving either JSON_ERROR or a
+ * We come here due to receiving either JSON_ERROR or a
* JSON_R{CURLY,SQUARE}) that is known to be unbalanced.
* If in error recovery, end it immediately. If not in
* error recovery, json_parser_feed() will raise an error
@@ -81,49 +64,43 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
break;
}
- /*
- * Security consideration, we limit total memory allocated per object
- * and the maximum recursion depth that a message can force.
- */
- if (parser->token_size + input->len + 1 > MAX_TOKEN_SIZE) {
- error_setg(&err, "JSON token size limit exceeded");
- goto out_emit;
- }
- if (g_queue_get_length(&parser->tokens) + 1 > MAX_TOKEN_COUNT) {
- error_setg(&err, "JSON token count limit exceeded");
- goto out_emit;
- }
- if (parser->bracket_count + parser->brace_count > MAX_NESTING) {
- error_setg(&err, "JSON nesting depth limit exceeded");
- goto out_emit;
- }
+ if (parser->error) {
+ /* error recovery, eat tokens until parentheses balance */
+ } else {
+ /*
+ * Safety consideration, we limit total memory allocated per object
+ * and the maximum nesting depth that a message can force.
+ */
+ if (parser->token_size > MAX_TOKEN_SIZE) {
+ error_setg(&err, "JSON token size limit exceeded");
+ } else if (parser->token_count > MAX_TOKEN_COUNT) {
+ error_setg(&err, "JSON token count limit exceeded");
+ } else if (parser->bracket_count + parser->brace_count > MAX_NESTING) {
+ error_setg(&err, "JSON nesting depth limit exceeded");
+ } else {
+ g_autofree JSONToken *token = json_token(type, x, y, input);
+ QObject *json = json_parser_feed(&parser->parser, token, &err);
+ if (json) {
+ parser->emit(parser->opaque, json, NULL);
+ }
+ }
- token = json_token(type, x, y, input);
- parser->token_size += input->len;
-
- g_queue_push_tail(&parser->tokens, token);
-
- if (parser->brace_count > 0 || parser->bracket_count > 0) {
- return;
- }
-
- /* Process all tokens in the queue */
- while (!g_queue_is_empty(&parser->tokens)) {
- token = g_queue_pop_head(&parser->tokens);
- json = json_parser_feed(&parser->parser, token, &err);
- g_free(token);
- if (json || err) {
- break;
+ if (err) {
+ parser->emit(parser->opaque, NULL, err);
+ /* start recovery */
+ parser->error = true;
}
}
-out_emit:
- json_parser_reset(&parser->parser);
- parser->brace_count = 0;
- parser->bracket_count = 0;
- json_message_free_tokens(parser);
- parser->token_size = 0;
- parser->emit(parser->opaque, json, err);
+ if ((parser->brace_count == 0 && parser->bracket_count == 0)
+ || type == JSON_END_OF_INPUT) {
+ json_parser_reset(&parser->parser);
+ parser->error = false;
+ parser->brace_count = 0;
+ parser->bracket_count = 0;
+ parser->token_count = 0;
+ parser->token_size = 0;
+ }
}
void json_message_parser_init(JSONMessageParser *parser,
@@ -133,9 +110,10 @@ void json_message_parser_init(JSONMessageParser *parser,
{
parser->emit = emit;
parser->opaque = opaque;
+ parser->error = false;
parser->brace_count = 0;
parser->bracket_count = 0;
- g_queue_init(&parser->tokens);
+ parser->token_count = 0;
parser->token_size = 0;
json_parser_init(&parser->parser, ap);
@@ -151,12 +129,10 @@ void json_message_parser_feed(JSONMessageParser *parser,
void json_message_parser_flush(JSONMessageParser *parser)
{
json_lexer_flush(&parser->lexer);
- assert(g_queue_is_empty(&parser->tokens));
}
void json_message_parser_destroy(JSONMessageParser *parser)
{
json_lexer_destroy(&parser->lexer);
- json_message_free_tokens(parser);
json_parser_destroy(&parser->parser);
}
--
2.54.0
* [PATCH 5/6] json-streamer: do not heap-allocate JSONToken
2026-06-26 10:17 [PATCH v4 0/6] qobject: switch JSON parser to push Paolo Bonzini
` (3 preceding siblings ...)
2026-06-26 10:17 ` [PATCH 4/6] json-streamer: remove token queue Paolo Bonzini
@ 2026-06-26 10:17 ` Paolo Bonzini
2026-06-26 10:17 ` [PATCH 6/6] json-parser: add location to JSON parsing errors Paolo Bonzini
2026-06-29 13:03 ` [PATCH v4 0/6] qobject: switch JSON parser to push Markus Armbruster
6 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2026-06-26 10:17 UTC (permalink / raw)
To: qemu-devel; +Cc: armbru
This is not needed with a push parser. Since it processes tokens
immediately, the JSONToken can be created directly on the stack
and does not need to copy the lexer's string data.
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
qobject/json-parser-int.h | 8 ++++++--
qobject/json-parser.c | 18 ------------------
qobject/json-streamer.c | 9 +++++++--
3 files changed, 13 insertions(+), 22 deletions(-)
diff --git a/qobject/json-parser-int.h b/qobject/json-parser-int.h
index 1f435cb8eb2..5a6b5c9af90 100644
--- a/qobject/json-parser-int.h
+++ b/qobject/json-parser-int.h
@@ -35,7 +35,12 @@ typedef enum json_token_type {
JSON_MAX = JSON_END_OF_INPUT
} JSONTokenType;
-typedef struct JSONToken JSONToken;
+typedef struct JSONToken {
+ JSONTokenType type;
+ int x;
+ int y;
+ char *str;
+} JSONToken;
/* json-lexer.c */
void json_lexer_init(JSONLexer *lexer, bool enable_interpolation);
@@ -48,7 +53,6 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
JSONTokenType type, int x, int y);
/* json-parser.c */
-JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr);
void json_parser_init(JSONParserContext *ctxt, va_list *ap);
void json_parser_reset(JSONParserContext *ctxt);
QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token, Error **errp);
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 484956deae4..4a3f5866129 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -24,13 +24,6 @@
#include "qobject/qstring.h"
#include "json-parser-int.h"
-struct JSONToken {
- JSONTokenType type;
- int x;
- int y;
- char str[];
-};
-
/*
* The JSON parser is a push parser, returning to the caller after every
* token. Therefore it has an explicit representation of its parser
@@ -623,17 +616,6 @@ static QObject *parse_token(JSONParserContext *ctxt, const JSONToken *token)
return NULL;
}
-JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr)
-{
- JSONToken *token = g_malloc(sizeof(JSONToken) + tokstr->len + 1);
-
- token->type = type;
- memcpy(token->str, tokstr->str, tokstr->len);
- token->str[tokstr->len] = 0;
- token->x = x;
- token->y = y;
- return token;
-}
void json_parser_reset(JSONParserContext *ctxt)
{
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 9526f815f00..6d7c947f94a 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -78,8 +78,13 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
} else if (parser->bracket_count + parser->brace_count > MAX_NESTING) {
error_setg(&err, "JSON nesting depth limit exceeded");
} else {
- g_autofree JSONToken *token = json_token(type, x, y, input);
- QObject *json = json_parser_feed(&parser->parser, token, &err);
+ JSONToken token = (JSONToken) {
+ .type = type,
+ .x = x,
+ .y = y,
+ .str = input->str
+ };
+ QObject *json = json_parser_feed(&parser->parser, &token, &err);
if (json) {
parser->emit(parser->opaque, json, NULL);
}
--
2.54.0
* [PATCH 6/6] json-parser: add location to JSON parsing errors
2026-06-26 10:17 [PATCH v4 0/6] qobject: switch JSON parser to push Paolo Bonzini
` (4 preceding siblings ...)
2026-06-26 10:17 ` [PATCH 5/6] json-streamer: do not heap-allocate JSONToken Paolo Bonzini
@ 2026-06-26 10:17 ` Paolo Bonzini
2026-06-29 13:03 ` [PATCH v4 0/6] qobject: switch JSON parser to push Markus Armbruster
6 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2026-06-26 10:17 UTC (permalink / raw)
To: qemu-devel; +Cc: armbru
Now that all calls to parse_error have a token, add the line and column
to the message. As far as I can see the two important TODOs (better
errors and better EOI handling) are done, and the others (token range
information and "parsed size"?) do not really matter or are handled
better by json-streamer.c. So remove the list, which had sat unchanged
since 2009.
This needs some adjustments to provide a good x and y for error messages.
First of all, they switch from zero-based to one-based, which is safe
because they were both sitting unused. Second, right now the x and y
are those of the *last* character in the token. Modify json-lexer.c to
freeze tok->x and tok->y at the first character added to the GString.
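The idea of freezing the location at the first character of a token can
be sketched as below. This is a simplified illustration, not the
json-lexer.c state machine: DemoLexer and its functions are
hypothetical, and tokens are assumed to be plain whitespace-separated
words. Only the split between a running position (cur_x, cur_y) and a
frozen token start (x, y), both one-based, follows the description
above.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int cur_x, cur_y;   /* one-based position of the next character */
    int x, y;           /* frozen position of the current token's start */
    bool in_token;
} DemoLexer;

static void demo_lexer_init(DemoLexer *lexer)
{
    lexer->cur_x = lexer->cur_y = 1;
    lexer->x = lexer->y = 1;
    lexer->in_token = false;
}

static void demo_lexer_feed_char(DemoLexer *lexer, char ch)
{
    bool is_space = ch == ' ' || ch == '\t' || ch == '\n';

    if (is_space) {
        lexer->in_token = false;
    } else if (!lexer->in_token) {
        /* First character of a new token: freeze its location. */
        lexer->x = lexer->cur_x;
        lexer->y = lexer->cur_y;
        lexer->in_token = true;
    }

    /* Advance the running position past the character just read. */
    if (ch == '\n') {
        lexer->cur_x = 1;
        lexer->cur_y++;
    } else {
        lexer->cur_x++;
    }
}
```

With this split, an error raised while the lexer is already several
characters into a token still points at where the token began.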
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
include/qobject/json-parser.h | 1 +
qobject/json-lexer.c | 11 +++++++----
qobject/json-parser.c | 12 ++----------
3 files changed, 10 insertions(+), 14 deletions(-)
diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
index 3479e637588..e078b36b2d5 100644
--- a/include/qobject/json-parser.h
+++ b/include/qobject/json-parser.h
@@ -17,6 +17,7 @@
typedef struct JSONLexer {
int start_state, state;
GString *token;
+ int cur_x, cur_y;
int x, y;
} JSONLexer;
diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 51341d96e49..7753ba6c092 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -277,7 +277,8 @@ void json_lexer_init(JSONLexer *lexer, bool enable_interpolation)
lexer->start_state = lexer->state = enable_interpolation
? IN_START_INTERP : IN_START;
lexer->token = g_string_sized_new(3);
- lexer->x = lexer->y = 0;
+ lexer->cur_x = lexer->cur_y = 1;
+ lexer->x = lexer->y = 1;
}
static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
@@ -285,10 +286,10 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
int new_state;
bool char_consumed = false;
- lexer->x++;
+ lexer->cur_x++;
if (ch == '\n') {
- lexer->x = 0;
- lexer->y++;
+ lexer->cur_x = 1;
+ lexer->cur_y++;
}
while (flush ? lexer->state != lexer->start_state : !char_consumed) {
@@ -316,6 +317,8 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
case IN_START:
g_string_truncate(lexer->token, 0);
new_state = lexer->start_state;
+ lexer->x = lexer->cur_x;
+ lexer->y = lexer->cur_y;
break;
case JSON_ERROR:
json_message_process_token(lexer, lexer->token, JSON_ERROR,
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 4a3f5866129..a188d58d006 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -132,15 +132,6 @@ typedef struct JSONParserStackEntry {
#define BUG_ON(cond) assert(!(cond))
-/**
- * TODO
- *
- * 0) make errors meaningful again
- * 1) add geometry information to tokens
- * 3) should we return a parsed size?
- * 4) deal with premature EOI
- */
-
static inline JSONParserStackEntry *current_entry(JSONParserContext *ctxt)
{
return g_queue_peek_tail(ctxt->stack);
@@ -179,7 +170,8 @@ static void G_GNUC_PRINTF(3, 4) parse_error(JSONParserContext *ctxt,
va_start(ap, msg);
vsnprintf(message, sizeof(message), msg, ap);
va_end(ap);
- error_setg(&ctxt->err, "JSON parse error, %s", message);
+ error_setg(&ctxt->err, "%d:%d: JSON parse error, %s",
+ token->y, token->x, message);
}
static int cvt4hex(const char *s)
--
2.54.0
* Re: [PATCH 2/6] json-streamer: reuse parser
2026-06-26 10:17 ` [PATCH 2/6] json-streamer: reuse parser Paolo Bonzini
@ 2026-06-26 13:02 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 11+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-06-26 13:02 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel; +Cc: armbru
On 26/6/26 12:17, Paolo Bonzini wrote:
> The push parser can be reset, so reuse it when the json-streamer
> detects a completed toplevel object.
>
> Reviewed-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> include/qobject/json-parser.h | 2 +-
> qobject/json-streamer.c | 11 ++++-------
> 2 files changed, 5 insertions(+), 8 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@oss.qualcomm.com>
* Re: [PATCH 1/6] json-parser: replace with a push parser
2026-06-26 10:17 ` [PATCH 1/6] json-parser: replace with a push parser Paolo Bonzini
@ 2026-06-29 13:02 ` Markus Armbruster
0 siblings, 0 replies; 11+ messages in thread
From: Markus Armbruster @ 2026-06-29 13:02 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
Paolo Bonzini <pbonzini@redhat.com> writes:
> In order to avoid stashing all the tokens corresponding to a JSON value,
> embed the parsing stack and state machine in JSONParser. This is more
> efficient and allows for more prompt error recovery; it also does not
> make the code substantially larger than the current recursive descent
> parser, though the state machine is probably a bit harder to follow.
>
> The stack consists of QLists and QDicts corresponding to open
> brackets and braces, plus optionally a QString with the current
> key on top of each QDict.
>
> After each value is parsed, it is added to the top array or dictionary
> or, if the stack is empty, json_parser_feed returns the complete
> QObject.
>
> For now, json-streamer.c keeps tracking the tokens up until braces
> and brackets are balanced, and then shoves the whole queue of tokens
> into the push parser. The only logic change is that JSON_END_OF_INPUT
> always triggers the emptying of the queue; the parser takes notice and
> checks that there is nothing on the stack. Not using brace_count
> and bracket_count for this is the first step towards improved separation
> of concerns between json-parser.c and json-streamer.c.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[...]
> diff --git a/qobject/json-parser.c b/qobject/json-parser.c
> index f6622b82b0a..845da3699aa 100644
> --- a/qobject/json-parser.c
> +++ b/qobject/json-parser.c
> @@ -31,12 +31,111 @@ struct JSONToken {
> char str[];
> };
>
> -typedef struct JSONParserContext {
> - Error *err;
> - JSONToken *current;
> - GQueue *buf;
> - va_list *ap;
> -} JSONParserContext;
> +/*
> + * The JSON parser is a push parser, returning to the caller after every
> + * token. Therefore it has an explicit representation of its parser
I think you proposed "returning a completed top-level object, an error,
or NULL (if the object is incomplete and no error happened) after every
token". Happy to apply that without a respin.
> + * stack; each stack entry consists of a parser state and a QObject:
> + * - a QList, for an array that is being added to
> + * - a QDict, for a dictionary that is being added to
> + * - a QString, for the key of the next pair that will be added to a QDict
> + *
> + * The stack represents an arbitrary nesting of arrays and dictionaries
> + * (whose next key has been parsed); it can also have a dictionary whose
> + * next key has not been parsed, but that can only happen at the top level.
> + * Because of this, the stack contents are always of the form
> + * "(QList | QDict QString)* QDict?".
> + *
> + * An empty stack represents the beginning of the parsing process, with
> + * start state BEFORE_VALUE.
> + */
[...]
> +/*
> + * Advance the parser based on the token that is passed.
> + * Return the finished top-level value if the token completes it.
> + * If an error is returned, the function must not be called without
> + * first resetting the parser.
> + */
Suggested polish:
/*
* Advance the parser based on the token that is passed.
* Return the finished top-level value if the token completes it,
* else NULL.
* Once an error is returned, the function must not be called again
* without first resetting the parser.
*/
Again, not worth a respin.
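For illustration, the calling convention that comment describes
(complete value, NULL for "not yet", or an error that requires a reset)
could look like this from the caller's side. This is a toy sketch: the
DemoParser below only understands '[' and ']' and is not the QEMU
parser, and all demo_* names are made up.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    unsigned depth;
    bool err;
} DemoParser;

typedef enum { DEMO_INCOMPLETE, DEMO_COMPLETE, DEMO_ERROR } DemoResult;

static void demo_parser_reset(DemoParser *p)
{
    p->depth = 0;
    p->err = false;
}

/* Toy push parser: accepts nested "[...]" only, one token per call. */
static DemoResult demo_parser_feed(DemoParser *p, char tok)
{
    assert(!p->err);    /* caller must reset after an error */
    switch (tok) {
    case '[':
        p->depth++;
        return DEMO_INCOMPLETE;
    case ']':
        if (p->depth == 0) {
            p->err = true;
            return DEMO_ERROR;
        }
        return --p->depth == 0 ? DEMO_COMPLETE : DEMO_INCOMPLETE;
    default:
        p->err = true;
        return DEMO_ERROR;
    }
}

/* Feed every character, counting completed values and errors. */
static unsigned demo_count_values(const char *input, unsigned *errors)
{
    DemoParser p;
    unsigned values = 0;

    *errors = 0;
    demo_parser_reset(&p);
    for (; *input; input++) {
        switch (demo_parser_feed(&p, *input)) {
        case DEMO_COMPLETE:
            values++;
            break;
        case DEMO_ERROR:
            (*errors)++;
            demo_parser_reset(&p);  /* mandatory before the next feed */
            break;
        case DEMO_INCOMPLETE:
            break;
        }
    }
    return values;
}
```

The assert at the top of demo_parser_feed() plays the same role as
assert(!ctxt->err) in the patch: feeding after an error without a reset
is a caller bug.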
> +QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token,
> + Error **errp)
> +{
> + QObject *result = NULL;
> +
> + assert(!ctxt->err);
> + switch (token->type) {
> + case JSON_END_OF_INPUT:
> + /* Check for premature end of input */
> + if (!g_queue_is_empty(ctxt->stack)) {
> + parse_error(ctxt, token, "premature end of input");
> + }
> + break;
> +
> + default:
> + result = parse_token(ctxt, token);
> + break;
> + }
> +
> + error_propagate(errp, ctxt->err);
> return result;
> }
[...]
* Re: [PATCH 4/6] json-streamer: remove token queue
2026-06-26 10:17 ` [PATCH 4/6] json-streamer: remove token queue Paolo Bonzini
@ 2026-06-29 13:02 ` Markus Armbruster
0 siblings, 0 replies; 11+ messages in thread
From: Markus Armbruster @ 2026-06-29 13:02 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
Paolo Bonzini <pbonzini@redhat.com> writes:
> Now fully exploit the push parser, feeding it one token at a time
> without having to wait until braces and brackets are balanced.
>
> While the nesting counts are retained for error recovery purposes,
> the system can now report the first parsing error without waiting
> for parentheses to be balanced. This also means that JSON_ERROR
> can be handled in json-parser.c, not json-streamer.c.
>
> After reporting the error, json-streamer.c then enters an error recovery
> mode where subsequent errors are suppressed. This mimics the previous
> error reporting behavior, but it provides prompt feedback on parsing
> errors. As an example, here is an interaction with qemu-ga.
>
> BEFORE (error reported only once braces are balanced):
>
> >> {"execute":foo
> >> }
> << {"error": {"class": "GenericError", "desc": "JSON parse error, invalid keyword 'foo'"}}
> >> {"execute":"somecommand"}
> << {"error": {"class": "CommandNotFound", "desc": "The command somecommand has not been found"}}
>
> AFTER (error reported immediately, but similar error recovery as before):
>
> >> {"execute":foo
> << {"error": {"class": "GenericError", "desc": "JSON parse error, invalid keyword 'foo'"}}
> >> }
> >> {"execute":"somecommand"}
> << {"error": {"class": "CommandNotFound", "desc": "The command somecommand has not been found"}}
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> include/qobject/json-parser.h | 3 +-
> qobject/json-parser.c | 4 ++
> qobject/json-streamer.c | 106 +++++++++++++---------------------
> 3 files changed, 47 insertions(+), 66 deletions(-)
>
> diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
> index 0cf6932ecdc..3479e637588 100644
> --- a/include/qobject/json-parser.h
> +++ b/include/qobject/json-parser.h
> @@ -33,7 +33,8 @@ typedef struct JSONMessageParser {
> JSONParserContext parser;
> unsigned int brace_count;
> unsigned int bracket_count;
> - GQueue tokens;
> + unsigned int token_count;
> + bool error;
> uint64_t token_size;
> } JSONMessageParser;
>
> diff --git a/qobject/json-parser.c b/qobject/json-parser.c
> index 845da3699aa..484956deae4 100644
> --- a/qobject/json-parser.c
> +++ b/qobject/json-parser.c
> @@ -673,6 +673,10 @@ QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token,
>
> assert(!ctxt->err);
> switch (token->type) {
> + case JSON_ERROR:
> + parse_error(ctxt, token, "stray '%s'", token->str);
> + break;
> +
> case JSON_END_OF_INPUT:
> /* Check for premature end of input */
> if (!g_queue_is_empty(ctxt->stack)) {
> diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
> index 9e1f650bad8..9526f815f00 100644
> --- a/qobject/json-streamer.c
> +++ b/qobject/json-streamer.c
> @@ -1,5 +1,5 @@
> /*
> - * JSON streaming support
> + * JSON parser - callback interface and error recovery
> *
> * Copyright IBM, Corp. 2009
> *
> @@ -19,23 +19,16 @@
> #define MAX_TOKEN_COUNT (2ULL << 20)
> #define MAX_NESTING (1 << 10)
>
> -static void json_message_free_tokens(JSONMessageParser *parser)
> -{
> - JSONToken *token;
> -
> - while ((token = g_queue_pop_head(&parser->tokens))) {
> - g_free(token);
> - }
> -}
> -
> void json_message_process_token(JSONLexer *lexer, GString *input,
> JSONTokenType type, int x, int y)
> {
> JSONMessageParser *parser = container_of(lexer, JSONMessageParser, lexer);
> - QObject *json = NULL;
> Error *err = NULL;
> - JSONToken *token;
>
> + parser->token_size += input->len;
> + parser->token_count++;
> +
> + /* Detect message boundaries for error recovery purposes. */
> switch (type) {
> case JSON_LCURLY:
> parser->brace_count++;
> @@ -56,19 +49,9 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
> parser->bracket_count--;
> break;
> case JSON_ERROR:
> - error_setg(&err, "JSON parse error, stray '%s'", input->str);
> - goto out_emit;
> - case JSON_END_OF_INPUT:
> - /*
> - * Force the parentheses to appear balanced and the queue
> - * to be emptied, causing a parse error if it wasn't.
> - */
> - if (g_queue_is_empty(&parser->tokens)) {
> - return;
> - }
> end_error_recovery:
> /*
> - * We goto here due to receiving either JSON_ERROR or a
> + * We come here due to receiving either JSON_ERROR or a
Line was added in the previous commit. Squash the change into it?
> * JSON_R{CURLY,SQUARE}) that is known to be unbalanced.
> * If in error recovery, end it immediately. If not in
> * error recovery, json_parser_feed() will raise an error
> @@ -81,49 +64,43 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
> break;
> }
>
> - /*
> - * Security consideration, we limit total memory allocated per object
> - * and the maximum recursion depth that a message can force.
> - */
> - if (parser->token_size + input->len + 1 > MAX_TOKEN_SIZE) {
Left operand of > is unincremented token_size plus increment plus 1.
> - error_setg(&err, "JSON token size limit exceeded");
> - goto out_emit;
> - }
> - if (g_queue_get_length(&parser->tokens) + 1 > MAX_TOKEN_COUNT) {
> - error_setg(&err, "JSON token count limit exceeded");
> - goto out_emit;
> - }
> - if (parser->bracket_count + parser->brace_count > MAX_NESTING) {
> - error_setg(&err, "JSON nesting depth limit exceeded");
> - goto out_emit;
> - }
> + if (parser->error) {
> + /* error recovery, eat tokens until parentheses balance */
> + } else {
> + /*
> + * Safety consideration, we limit total memory allocated per object
> + * and the maximum nesting depth that a message can force.
> + */
> + if (parser->token_size > MAX_TOKEN_SIZE) {
Left operand of > is incremented token size.
I believe this is one less than before the patch. Testing... yes:
-blockdev '{"a":"01234567890123456789012345678901234567890123456789012345"}'
With MAX_TOKEN_SIZE hacked to 64, this is rejected before the series and
accepted afterwards.
Obvious fix: change > to >=.
If you'd prefer not to change the code, mention the change in the commit
message.
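A minimal sketch of the two comparisons, to make the off-by-one
concrete. It is not the streamer code: the helper names and
DEMO_MAX_TOKEN_SIZE are made up, the per-message reset is left out, and
the token lengths used to exercise it are only illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical small limit, mirroring the "hacked to 64" experiment. */
#define DEMO_MAX_TOKEN_SIZE 64

/* Pre-series shape: check first, accumulate the size afterwards. */
static bool old_check_rejects(const size_t *lens, size_t n)
{
    uint64_t token_size = 0;

    for (size_t i = 0; i < n; i++) {
        if (token_size + lens[i] + 1 > DEMO_MAX_TOKEN_SIZE) {
            return true;
        }
        token_size += lens[i];
    }
    return false;
}

/* Post-series shape: accumulate first; use_ge selects ">=" over ">". */
static bool new_check_rejects(const size_t *lens, size_t n, bool use_ge)
{
    uint64_t token_size = 0;

    for (size_t i = 0; i < n; i++) {
        token_size += lens[i];
        if (use_ge ? token_size >= DEMO_MAX_TOKEN_SIZE
                   : token_size > DEMO_MAX_TOKEN_SIZE) {
            return true;
        }
    }
    return false;
}
```

For token lengths that sum to exactly the limit, for example
{1, 3, 1, 58, 1}, the old check rejects, the new check with ">" accepts,
and the new check with ">=" rejects again, since for integers
"size + 1 > MAX" and "size >= MAX" are the same condition.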
> + error_setg(&err, "JSON token size limit exceeded");
> + } else if (parser->token_count > MAX_TOKEN_COUNT) {
> + error_setg(&err, "JSON token count limit exceeded");
> + } else if (parser->bracket_count + parser->brace_count > MAX_NESTING) {
> + error_setg(&err, "JSON nesting depth limit exceeded");
> + } else {
> + g_autofree JSONToken *token = json_token(type, x, y, input);
> + QObject *json = json_parser_feed(&parser->parser, token, &err);
> + if (json) {
> + parser->emit(parser->opaque, json, NULL);
> + }
> + }
>
> - token = json_token(type, x, y, input);
> - parser->token_size += input->len;
> -
> - g_queue_push_tail(&parser->tokens, token);
> -
> - if (parser->brace_count > 0 || parser->bracket_count > 0) {
> - return;
> - }
> -
> - /* Process all tokens in the queue */
> - while (!g_queue_is_empty(&parser->tokens)) {
> - token = g_queue_pop_head(&parser->tokens);
> - json = json_parser_feed(&parser->parser, token, &err);
> - g_free(token);
> - if (json || err) {
> - break;
> + if (err) {
> + parser->emit(parser->opaque, NULL, err);
> + /* start recovery */
> + parser->error = true;
> }
> }
>
> -out_emit:
> - json_parser_reset(&parser->parser);
> - parser->brace_count = 0;
> - parser->bracket_count = 0;
> - json_message_free_tokens(parser);
> - parser->token_size = 0;
> - parser->emit(parser->opaque, json, err);
> + if ((parser->brace_count == 0 && parser->bracket_count == 0)
> + || type == JSON_END_OF_INPUT) {
> + json_parser_reset(&parser->parser);
> + parser->error = false;
> + parser->brace_count = 0;
> + parser->bracket_count = 0;
> + parser->token_count = 0;
> + parser->token_size = 0;
> + }
> }
>
> void json_message_parser_init(JSONMessageParser *parser,
> @@ -133,9 +110,10 @@ void json_message_parser_init(JSONMessageParser *parser,
> {
> parser->emit = emit;
> parser->opaque = opaque;
> + parser->error = false;
> parser->brace_count = 0;
> parser->bracket_count = 0;
> - g_queue_init(&parser->tokens);
> + parser->token_count = 0;
> parser->token_size = 0;
>
> json_parser_init(&parser->parser, ap);
> @@ -151,12 +129,10 @@ void json_message_parser_feed(JSONMessageParser *parser,
> void json_message_parser_flush(JSONMessageParser *parser)
> {
> json_lexer_flush(&parser->lexer);
> - assert(g_queue_is_empty(&parser->tokens));
> }
>
> void json_message_parser_destroy(JSONMessageParser *parser)
> {
> json_lexer_destroy(&parser->lexer);
> - json_message_free_tokens(parser);
> json_parser_destroy(&parser->parser);
> }
* Re: [PATCH v4 0/6] qobject: switch JSON parser to push
2026-06-26 10:17 [PATCH v4 0/6] qobject: switch JSON parser to push Paolo Bonzini
` (5 preceding siblings ...)
2026-06-26 10:17 ` [PATCH 6/6] json-parser: add location to JSON parsing errors Paolo Bonzini
@ 2026-06-29 13:03 ` Markus Armbruster
6 siblings, 0 replies; 11+ messages in thread
From: Markus Armbruster @ 2026-06-29 13:03 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
Paolo Bonzini <pbonzini@redhat.com> writes:
> This rewrites the json-parser to use a push parser aka state machine.
> While push parsers are inherently more complex than recursive descent,
> the grammar for JSON is simple enough that the parser remains readable.
> There is therefore no need to use e.g. QEMU coroutines.
>
> Unlike the suggestion in commit 62815d85aed ("json: Redesign the callback
> to consume JSON values", 2018-08-24), I kept the json-streamer concept.
> It helps in handling input limits, it performs error recovery, and it
> converts the token-at-a-time push interface to callbacks---all things
> that are more easily done in a separate layer to keep the parser clean.
> However, there is no need anymore for it to store partial JSON objects
> in tokenized form, because the current state is stored in the push
> parser's stack.
>
> Another benefit is that QEMU can report the first parsing error
> immediately, without waiting for parentheses to be balanced or for a
> lexing error. Error recovery then proceeds as before (i.e., the next
> parse still starts after balanced parentheses or a lexing error).
>
> On top of the benefits intrinsic in the push architecture, it so happens
> that it's really easy to add a location to JSON parsing errors now, so
> do that as well.
>
> The diffstat is unfavorable, but most of the new lines delta is really
> new comments explaining the grammar and state machines.
I found an unintentional, harmless limit change by one, and suggested a
few further comment tweaks.
With the limit change reverted or mentioned in the commit message,
series
Reviewed-by: Markus Armbruster <armbru@redhat.com>
I volunteer to do the pull request, since I have another patch for
qobject/ queued up already. I'd apply the changes I suggested, less
ones you disagree with. Let me know!
[...]