From: Markus Armbruster <armbru@redhat.com>
To: qemu-devel@nongnu.org
Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com,
    eblake@redhat.com
Subject: [Qemu-devel] [PATCH v2 2/6] json: Clean up how lexer consumes "end of input"
Date: Fri, 31 Aug 2018 09:58:37 +0200
Message-ID: <20180831075841.13363-3-armbru@redhat.com>
In-Reply-To: <20180831075841.13363-1-armbru@redhat.com>

When the lexer isn't in its start state at the end of input, it's
working on a token.  To flush it out, it needs to transition to its
start state on "end of input" lookahead.

There are two ways to reach the start state, depending on the current
state:

* If the lexer is in a TERMINAL(JSON_FOO) state, it can emit a
  JSON_FOO token.

* Else, it can go to IN_ERROR state, and emit a JSON_ERROR token.

There are complications, however:

* The transition to IN_ERROR state consumes the input character and
  adds it to the JSON_ERROR token.  The latter is inappropriate for
  the "end of input" character, so we suppress that.  See also recent
  commit a2ec6be72b8 "json: Fix lexer to include the bad character in
  JSON_ERROR token".

* The transition to a TERMINAL(JSON_FOO) state doesn't consume the
  input character.  In that case, the lexer normally loops until it is
  consumed.  We have to suppress that for the "end of input" input
  character.  If we didn't, the lexer would consume it by entering
  IN_ERROR state, emitting a bogus JSON_ERROR token.  We fixed that in
  commit bd3924a33a6.

However, simply breaking the loop this way assumes that the lexer
needs exactly one state transition to reach its start state.  That
assumption is correct now, but it's unclean, and I'll soon break it.

Clean up: instead of breaking the loop after one iteration, break it
once the lexer has reached the start state.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
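For readers who want to try the loop shape in isolation, here is a
minimal sketch.  It is not QEMU code: the toy lexer, its two states,
and every toy_* name are hypothetical, and it recognizes only decimal
integers separated by whitespace.  It only illustrates the idea
described above: when flushing, iterate until the start state is
reached; otherwise iterate until the character has been consumed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy lexer (not QEMU code): decimal integers separated
 * by whitespace; any other character counts as an error. */
enum toy_state { TOY_START, TOY_IN_INT };

struct toy_lexer {
    enum toy_state state;
    char token[32];     /* token being accumulated */
    size_t len;
    char last_int[32];  /* most recently emitted integer token */
    int ints;           /* number of integer tokens emitted */
    int errors;         /* number of error tokens emitted */
};

static void toy_append(struct toy_lexer *lx, char ch)
{
    if (lx->len + 1 < sizeof(lx->token)) {  /* truncate overlong tokens */
        lx->token[lx->len++] = ch;
    }
}

static void toy_emit_int(struct toy_lexer *lx)
{
    lx->token[lx->len] = '\0';
    strcpy(lx->last_int, lx->token);
    lx->ints++;
    lx->len = 0;
    lx->state = TOY_START;
}

static void toy_feed_char(struct toy_lexer *lx, char ch, bool flush)
{
    bool char_consumed = false;

    /* Same loop shape as in the patch: when flushing, run until the
     * start state is reached; else run until ch has been consumed. */
    while (flush ? lx->state != TOY_START : !char_consumed) {
        switch (lx->state) {
        case TOY_START:
            /* Only reachable with flush == false. */
            if (isdigit((unsigned char)ch)) {
                toy_append(lx, ch);
                lx->state = TOY_IN_INT;
            } else if (!isspace((unsigned char)ch)) {
                lx->errors++;
            }
            char_consumed = true;
            break;
        case TOY_IN_INT:
            if (!flush && isdigit((unsigned char)ch)) {
                toy_append(lx, ch);
                char_consumed = true;
            } else {
                /* Token complete.  ch was only lookahead: leave it
                 * unconsumed so the next iteration handles it in
                 * TOY_START, or drop it if this is "end of input". */
                toy_emit_int(lx);
            }
            break;
        }
    }
}

void toy_feed(struct toy_lexer *lx, const char *s)
{
    for (; *s; s++) {
        toy_feed_char(lx, *s, false);
    }
}

void toy_flush(struct toy_lexer *lx)
{
    /* No "already in start state?" guard needed: the loop condition
     * makes this a no-op in that case. */
    toy_feed_char(lx, 0, true);
    assert(lx->state == TOY_START);
}
```

With a zero-initialized struct toy_lexer, toy_feed(&lx, "12 34")
emits "12" and leaves "34" pending; toy_flush(&lx) then emits "34"
without adding the end-of-input character to the token, and a second
toy_flush(&lx) does nothing.
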
 qobject/json-lexer.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 4867839f66..ec3aec726f 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -261,7 +261,8 @@ void json_lexer_init(JSONLexer *lexer, bool enable_interpolation)
 
 static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
 {
-    int char_consumed, new_state;
+    int new_state;
+    bool char_consumed = false;
 
     lexer->x++;
     if (ch == '\n') {
@@ -269,11 +270,12 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
         lexer->y++;
     }
 
-    do {
+    while (flush ? lexer->state != lexer->start_state : !char_consumed) {
         assert(lexer->state <= ARRAY_SIZE(json_lexer));
         new_state = json_lexer[lexer->state][(uint8_t)ch];
-        char_consumed = !TERMINAL_NEEDED_LOOKAHEAD(lexer->state, new_state);
-        if (char_consumed && !flush) {
+        char_consumed = !flush
+            && !TERMINAL_NEEDED_LOOKAHEAD(lexer->state, new_state);
+        if (char_consumed) {
             g_string_append_c(lexer->token, ch);
         }
 
@@ -318,7 +320,7 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
             break;
         }
         lexer->state = new_state;
-    } while (!char_consumed && !flush);
+    }
 
     /* Do not let a single token grow to an arbitrarily large size,
      * this is a security consideration.
@@ -342,9 +344,8 @@ void json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size)
 
 void json_lexer_flush(JSONLexer *lexer)
 {
-    if (lexer->state != lexer->start_state) {
-        json_lexer_feed_char(lexer, 0, true);
-    }
+    json_lexer_feed_char(lexer, 0, true);
+    assert(lexer->state == lexer->start_state);
     json_message_process_token(lexer, lexer->token, JSON_END_OF_INPUT,
                                lexer->x, lexer->y);
 }
--
2.17.1