From: Markus Armbruster <armbru@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [PATCH 1/6] json-parser: replace with a push parser
Date: Mon, 29 Jun 2026 15:02:45 +0200
Message-ID: <87a4sdts7e.fsf@pond.sub.org>
In-Reply-To: <20260626101727.1727389-2-pbonzini@redhat.com> (Paolo Bonzini's message of "Fri, 26 Jun 2026 12:17:21 +0200")

Paolo Bonzini <pbonzini@redhat.com> writes:

> In order to avoid stashing all the tokens corresponding to a JSON value,
> embed the parsing stack and state machine in JSONParser. This is more
> efficient and allows for more prompt error recovery; it also does not
> make the code substantially larger than the current recursive descent
> parser, though the state machine is probably a bit harder to follow.
>
> The stack consists of QLists and QDicts corresponding to open
> brackets and braces, plus optionally a QString with the current
> key on top of each QDict.
>
> After each value is parsed, it is added to the top array or dictionary
> or, if the stack is empty, json_parser_feed returns the complete
> QObject.
>
> For now, json-streamer.c keeps tracking the tokens up until braces
> and brackets are balanced, and then shoves the whole queue of tokens
> into the push parser. The only logic change is that JSON_END_OF_INPUT
> always triggers the emptying of the queue; the parser takes notice and
> checks that there is nothing on the stack. Not using brace_count
> and bracket_count for this is the first step towards improved separation
> of concerns between json-parser.c and json-streamer.c.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[...]
> diff --git a/qobject/json-parser.c b/qobject/json-parser.c
> index f6622b82b0a..845da3699aa 100644
> --- a/qobject/json-parser.c
> +++ b/qobject/json-parser.c
> @@ -31,12 +31,111 @@ struct JSONToken {
>      char str[];
>  };
>
> -typedef struct JSONParserContext {
> -    Error *err;
> -    JSONToken *current;
> -    GQueue *buf;
> -    va_list *ap;
> -} JSONParserContext;
> +/*
> + * The JSON parser is a push parser, returning to the caller after every
> + * token. Therefore it has an explicit representation of its parser

I think you proposed "returning a completed top-level object, an error,
or NULL (if the object is incomplete and no error happened) after every
token". Happy to apply that without a respin.

> + * stack; each stack entry consists of a parser state and a QObject:
> + * - a QList, for an array that is being added to
> + * - a QDict, for a dictionary that is being added to
> + * - a QString, for the key of the next pair that will be added to a QDict
> + *
> + * The stack represents an arbitrary nesting of arrays and dictionaries
> + * (whose next key has been parsed); it can also have a dictionary whose
> + * next key has not been parsed, but that can only happen at the top level.
> + * Because of this, the stack contents are always of the form
> + * "(QList | QDict QString)* QDict?".
> + *
> + * An empty stack represents the beginning of the parsing process, with
> + * start state BEFORE_VALUE.
> + */
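
To make the stack-shape claim concrete, here is a minimal, hypothetical
sketch (not code from the patch) that checks a sequence of entry kinds,
listed from the bottom of the stack to the top, against the form
"(QList | QDict QString)* QDict?". The names EntryKind and
stack_shape_is_valid() are invented for illustration and do not exist
in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the QObject types on the parser stack. */
typedef enum { ENTRY_QLIST, ENTRY_QDICT, ENTRY_QSTRING } EntryKind;

/*
 * Return true if the entries match "(QList | QDict QString)* QDict?".
 * entries[0] is the bottom of the stack, entries[n - 1] the top.
 */
static bool stack_shape_is_valid(const EntryKind *entries, size_t n)
{
    size_t i = 0;

    assert(entries || n == 0);

    /* (QList | QDict QString)* */
    while (i < n) {
        if (entries[i] == ENTRY_QLIST) {
            i++;
        } else if (entries[i] == ENTRY_QDICT && i + 1 < n
                   && entries[i + 1] == ENTRY_QSTRING) {
            i += 2;
        } else {
            break;
        }
    }

    /* QDict? (a dictionary whose next key has not been parsed yet) */
    if (i < n && entries[i] == ENTRY_QDICT) {
        i++;
    }

    return i == n;
}
```

Under this reading, a QDict without a pending key is accepted only as
the last (topmost) entry, and a QString is accepted only directly above
a QDict.
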
[...]
> +/*
> + * Advance the parser based on the token that is passed.
> + * Return the finished top-level value if the token completes it.
> + * If an error is returned, the function must not be called without
> + * first resetting the parser.
> + */

Suggested polish:

    /*
     * Advance the parser based on the token that is passed.
     * Return the finished top-level value if the token completes it,
     * else NULL.
     * Once an error is returned, the function must not be called again
     * without first resetting the parser.
     */

Again, not worth a respin.

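
For readers who want the contract spelled out, a toy sketch of the same
three-outcome interface follows. It is an assumption-laden illustration,
not the patch's code: ToyParser, ToyToken, FeedResult, toy_parser_feed()
and toy_parser_reset() are all invented names, the "grammar" is only
nested arrays of scalars with separators ignored, and the nesting stack
is reduced to a depth counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes: value still open, value finished, or error. */
typedef enum { FEED_INCOMPLETE, FEED_COMPLETE, FEED_ERROR } FeedResult;
typedef enum { TOK_LBRACKET, TOK_RBRACKET, TOK_SCALAR, TOK_END } ToyToken;

typedef struct {
    unsigned depth;   /* number of currently open arrays */
    bool failed;      /* set once an error has been reported */
} ToyParser;

static void toy_parser_reset(ToyParser *p)
{
    p->depth = 0;
    p->failed = false;
}

/*
 * Advance the toy parser by one token.
 * After FEED_ERROR, the caller must reset before feeding again.
 */
static FeedResult toy_parser_feed(ToyParser *p, ToyToken tok)
{
    assert(!p->failed);

    switch (tok) {
    case TOK_LBRACKET:
        p->depth++;
        return FEED_INCOMPLETE;
    case TOK_SCALAR:
        /* A scalar completes the value only at the top level */
        return p->depth ? FEED_INCOMPLETE : FEED_COMPLETE;
    case TOK_RBRACKET:
        if (p->depth == 0) {
            break;              /* unbalanced closing bracket */
        }
        p->depth--;
        return p->depth ? FEED_INCOMPLETE : FEED_COMPLETE;
    case TOK_END:
        if (p->depth) {
            break;              /* premature end of input */
        }
        return FEED_INCOMPLETE;
    }

    p->failed = true;
    return FEED_ERROR;
}
```

The assertion at the top plays the role of assert(!ctxt->err) in the
patch: feeding after an error without a reset is treated as a caller
bug rather than a recoverable condition.
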
> +QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token,
> +                          Error **errp)
> +{
> +    QObject *result = NULL;
> +
> +    assert(!ctxt->err);
> +    switch (token->type) {
> +    case JSON_END_OF_INPUT:
> +        /* Check for premature end of input */
> +        if (!g_queue_is_empty(ctxt->stack)) {
> +            parse_error(ctxt, token, "premature end of input");
> +        }
> +        break;
> +
> +    default:
> +        result = parse_token(ctxt, token);
> +        break;
> +    }
> +
> +    error_propagate(errp, ctxt->err);
>      return result;
>  }
[...]