From: Markus Armbruster <armbru@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [PATCH v3 2/7] json-parser: replace with a push parser
Date: Mon, 15 Jun 2026 09:48:51 +0200
Message-ID: <87jys0s0po.fsf@pond.sub.org>
In-Reply-To: <eeff3845-a564-4871-9b7f-bf91ba3f2295@redhat.com> (Paolo Bonzini's message of "Fri, 12 Jun 2026 17:08:34 +0200")

Paolo Bonzini <pbonzini@redhat.com> writes:
> On 6/12/26 16:21, Markus Armbruster wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>>
>>> In order to avoid stashing all the tokens corresponding to a JSON value,
>>> embed the parsing stack and state machine in JSONParser. This is more
>>> efficient and allows for more prompt error recovery; it also does not
>>> make the code substantially larger than the current recursive descent
>>> parser, though the state machine is probably a bit harder to follow.
>>>
>>> The stack consists of QLists and QDicts corresponding to open
>>> brackets and braces, plus optionally a QString with the current
>>> key on top of each QDict.
>>>
>>> After each value is parsed, it is added to the top array or dictionary
>>> or, if the stack is empty, json_parser_feed returns the complete
>>> QObject.
>>>
>>> For now, json-streamer.c keeps tracking the tokens up until braces
>>> and brackets are balanced, and then shoves the whole queue of tokens
>>> into the push parser. The only logic change is that JSON_END_OF_INPUT
>>> always triggers the emptying of the queue; the parser takes notice and
>>> checks that there is nothing on the stack. Not using brace_count
>>> and bracket_count for this is the first step towards improved separation
>>> of concerns between json-parser.c and json-streamer.c.
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>> include/qobject/json-parser.h | 6 +
>>> qobject/json-parser-int.h | 5 +-
>>> qobject/json-parser.c | 551 ++++++++++++++++++++--------------
>>> qobject/json-streamer.c | 21 +-
>>> 4 files changed, 345 insertions(+), 238 deletions(-)
>>>
>>> diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
>>> index 7345a9bd5cb..05346fa816b 100644
>>> --- a/include/qobject/json-parser.h
>>> +++ b/include/qobject/json-parser.h
>>> @@ -20,6 +20,12 @@ typedef struct JSONLexer {
>>> int x, y;
>>> } JSONLexer;
>>> +typedef struct JSONParserContext {
>>> + Error *err;
>>> + GQueue *stack;
>>> + va_list *ap;
>>> +} JSONParserContext;
>>> +
>>> typedef struct JSONMessageParser {
>>> void (*emit)(void *opaque, QObject *json, Error *err);
>>> void *opaque;
>>> diff --git a/qobject/json-parser-int.h b/qobject/json-parser-int.h
>>> index 8c01f236276..1f435cb8eb2 100644
>>> --- a/qobject/json-parser-int.h
>>> +++ b/qobject/json-parser-int.h
>>> @@ -49,6 +49,9 @@ void json_message_process_token(JSONLexer *lexer, GString *input,
>>> /* json-parser.c */
>>> JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr);
>>> -QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp);
>>> +void json_parser_init(JSONParserContext *ctxt, va_list *ap);
>>> +void json_parser_reset(JSONParserContext *ctxt);
>>> +QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token, Error **errp);
>>> +void json_parser_destroy(JSONParserContext *ctxt);
>>>
>>> #endif
>>> diff --git a/qobject/json-parser.c b/qobject/json-parser.c
>>> index f6622b82b0a..3b5edc5bae4 100644
>>> --- a/qobject/json-parser.c
>>> +++ b/qobject/json-parser.c
>>> @@ -31,12 +31,105 @@ struct JSONToken {
>>> char str[];
>>> };
>>> -typedef struct JSONParserContext {
>>> - Error *err;
>>> - JSONToken *current;
>>> - GQueue *buf;
>>> - va_list *ap;
>>> -} JSONParserContext;
>>> +/*
>>> + * The JSON parser is a push parser, returning to the caller after every
>>> + * token.
>>
>> The thing that returns after every token is json_parser_feed(), right?
>> Detail not mentioned here: the value it returns. Leaving that to
>> json_parser_feed()'s contract feels fine, but pointing from here to
>> there could be useful.
>
> "returning a completed top-level object, an error, or NULL (if the object is incomplete and no error happened) after every token"?

I like it!
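
To make the three outcomes concrete, here is a minimal sketch of such a
contract. It is not the code from the patch: every name in it (ToyParser,
toy_parser_feed(), and so on) is hypothetical, the "value" is just a
count of scalars, commas are left out, and a plain string stands in for
Error.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; none of this is QEMU code. */
typedef enum { TOK_LBRACKET, TOK_RBRACKET, TOK_SCALAR, TOK_END } ToyTokenType;

/* Stand-in for the completed top-level object. */
typedef struct { int n_scalars; } ToyValue;

typedef struct {
    int depth;       /* number of currently open brackets */
    int n_scalars;   /* scalars seen in the value under construction */
    ToyValue done;   /* storage for the most recently completed value */
} ToyParser;

static void toy_parser_init(ToyParser *p)
{
    p->depth = 0;
    p->n_scalars = 0;
    p->done.n_scalars = 0;
}

/* Hand the finished value to the caller and start afresh. */
static const ToyValue *toy_complete(ToyParser *p)
{
    p->done.n_scalars = p->n_scalars;
    p->n_scalars = 0;
    return &p->done;
}

/*
 * Feed one token.  Exactly one of three things happens:
 *  - a completed top-level value is returned, *errp stays NULL;
 *  - NULL is returned and *errp is set: parse error;
 *  - NULL is returned and *errp stays NULL: value still incomplete.
 * The caller must pass *errp == NULL.
 */
static const ToyValue *toy_parser_feed(ToyParser *p, ToyTokenType type,
                                       const char **errp)
{
    assert(errp && !*errp);

    switch (type) {
    case TOK_LBRACKET:
        p->depth++;
        return NULL;
    case TOK_SCALAR:
        p->n_scalars++;
        return p->depth == 0 ? toy_complete(p) : NULL;
    case TOK_RBRACKET:
        if (p->depth == 0) {
            *errp = "unexpected ']'";
            return NULL;
        }
        p->depth--;
        return p->depth == 0 ? toy_complete(p) : NULL;
    case TOK_END:
        if (p->depth > 0) {
            *errp = "unexpected end of input";
        }
        return NULL;
    }
    *errp = "unknown token";
    return NULL;
}
```

Feeding '[', a scalar, ']' yields NULL, NULL, and then the value; a
stray ']' afterwards yields NULL with the error set. The sketch leaves
the parser state alone on error, on the assumption that recovery is the
caller's job.
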
>>> + * // The initial state is BEFORE_VALUE.
>>> + * input := value -> END_OF_VALUE -> return parsed value
>>> + * END_OF_INPUT -> check stack is empty
>>
>> How can the stack *not* be empty here?
>
> Right, this is not END_OF_INPUT in the middle of the stream. Will delete.
>>> + * // entered on BEFORE_KEY, with TOS being a QDict
>>> + * dict_pairs := (STRING | INTERP) -> push QString -> END_OF_KEY
>>> + * ':' -> BEFORE_VALUE
>>> + * value -> pop QString + add pair to QDict -> END_OF_VALUE
>>> + * ('}' -> pop completed QDict -> END_OF_VALUE
>>> + * | ',' -> BEFORE_KEY
>>> + * dict_pairs) -> END_OF_VALUE
>>> + */
>>
>> This is useful.
>>
>> It doesn't mention how we do parse errors. Leaving that to
>> json_parser_feed()'s contract feels fine.
>
> Right---parse errors are out of the scope because recovery happens in json-streamer.c.
>
> I can add a note for this and everything else, thanks for the review! Rewrites are not the most enticing form of thing to receive, or the most polite to send.
>
> Paolo

In all fairness, I had moaned about this parser more than once,
e.g. "it's half-assed: it's a push lexer wed to a pull parser with
parenthesis counting."
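
For readers following along, the dict_pairs states quoted above can be
sketched as a small push-style state machine. This is an illustration
under loud assumptions, not the patch's code: all identifiers are
hypothetical, values are limited to plain numbers instead of recursive
JSON values, INTERP keys are omitted, a fixed-size array replaces the
QDict, and pending_key replaces the QString pushed on the stack. Since
the quoted fragment starts in BEFORE_KEY and shows no '}' transition
there, the sketch rejects '}' in that state.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token and state names, loosely modelled on the comment. */
typedef enum { T_STRING, T_COLON, T_COMMA, T_RBRACE, T_NUMBER } TokType;
typedef struct { TokType type; const char *str; int num; } Tok;

typedef enum {
    BEFORE_KEY, END_OF_KEY, BEFORE_VALUE, END_OF_VALUE, DONE, FAILED
} State;

#define MAX_PAIRS 8

typedef struct {
    State state;
    const char *pending_key;      /* stands in for the pushed QString */
    const char *keys[MAX_PAIRS];  /* stands in for the QDict */
    int values[MAX_PAIRS];
    int n_pairs;
} DictParser;

static Tok tok(TokType type, const char *str, int num)
{
    Tok t = { type, str, num };
    return t;
}

static void dict_parser_init(DictParser *p)
{
    memset(p, 0, sizeof(*p));
    p->state = BEFORE_KEY;        /* entered just after '{' */
}

/* Returns 1 when '}' completes the dict, 0 when more tokens are
 * needed, and -1 on a parse error. */
static int dict_parser_feed(DictParser *p, Tok t)
{
    switch (p->state) {
    case BEFORE_KEY:
        if (t.type == T_STRING) {           /* "push QString" */
            p->pending_key = t.str;
            p->state = END_OF_KEY;
            return 0;
        }
        break;
    case END_OF_KEY:
        if (t.type == T_COLON) {
            p->state = BEFORE_VALUE;
            return 0;
        }
        break;
    case BEFORE_VALUE:
        if (t.type == T_NUMBER && p->n_pairs < MAX_PAIRS) {
            /* "pop QString + add pair to QDict" */
            p->keys[p->n_pairs] = p->pending_key;
            p->values[p->n_pairs] = t.num;
            p->n_pairs++;
            p->pending_key = NULL;
            p->state = END_OF_VALUE;
            return 0;
        }
        break;
    case END_OF_VALUE:
        if (t.type == T_COMMA) {
            p->state = BEFORE_KEY;
            return 0;
        }
        if (t.type == T_RBRACE) {           /* "pop completed QDict" */
            p->state = DONE;
            return 1;
        }
        break;
    case DONE:
    case FAILED:
        break;
    }
    p->state = FAILED;
    return -1;
}
```

Feeding the tokens of "a": 1, "b": 2 } one at a time returns 0 for each
token until the closing brace, which returns 1 with two pairs recorded.
Any token that does not fit the current state returns -1.
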