From: Markus Armbruster <armbru@redhat.com>
To: qemu-devel@nongnu.org
Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com,
eblake@redhat.com
Subject: [Qemu-devel] [PATCH v2 4/6] json: Nicer recovery from lexical errors
Date: Fri, 31 Aug 2018 09:58:39 +0200 [thread overview]
Message-ID: <20180831075841.13363-5-armbru@redhat.com> (raw)
In-Reply-To: <20180831075841.13363-1-armbru@redhat.com>
When the lexer chokes on an input character, it consumes the
character, emits a JSON error token, and enters its start state. This
can lead to suboptimal error recovery. For instance, input
0123 ,
produces the tokens
JSON_ERROR 01
JSON_INTEGER 23
JSON_COMMA ,
Make the lexer skip characters after a lexical error until a
structural character ('[', ']', '{', '}', ':', ','), an ASCII control
character, or '\xFE', or '\xFF'.
Note that we must not skip ASCII control characters, '\xFE', '\xFF',
because those are documented to force the JSON parser into known-good
state, by docs/interop/qmp-spec.txt.
The lexer now produces
JSON_ERROR 01
JSON_COMMA ,
Update qmp-test for the nicer error recovery: QMP now reports just one
error for input %p instead of two. Also drop the newline after %p; it
was needed to tease out the second error.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
qobject/json-lexer.c | 43 +++++++++++++++++++++++++++++--------------
tests/qmp-test.c | 6 +-----
2 files changed, 30 insertions(+), 19 deletions(-)
diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 28582e17d9..39c7ce7adc 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -101,6 +101,7 @@
enum json_lexer_state {
IN_ERROR = 0, /* must really be 0, see json_lexer[] */
+ IN_RECOVERY,
IN_DQ_STRING_ESCAPE,
IN_DQ_STRING,
IN_SQ_STRING_ESCAPE,
@@ -130,6 +131,28 @@ QEMU_BUILD_BUG_ON(IN_START_INTERP != IN_START + 1);
static const uint8_t json_lexer[][256] = {
/* Relies on default initialization to IN_ERROR! */
+ /* error recovery */
+ [IN_RECOVERY] = {
+ /*
+ * Skip characters until a structural character, an ASCII
+ * control character other than '\t', or impossible UTF-8
+ * bytes '\xFE', '\xFF'. Structural characters and line
+ * endings are promising resynchronization points. Clients
+ * may use the others to force the JSON parser into known-good
+ * state; see docs/interop/qmp-spec.txt.
+ */
+ [0 ... 0x1F] = IN_START | LOOKAHEAD,
+ [0x20 ... 0xFD] = IN_RECOVERY,
+ [0xFE ... 0xFF] = IN_START | LOOKAHEAD,
+ ['\t'] = IN_RECOVERY,
+ ['['] = IN_START | LOOKAHEAD,
+ [']'] = IN_START | LOOKAHEAD,
+ ['{'] = IN_START | LOOKAHEAD,
+ ['}'] = IN_START | LOOKAHEAD,
+ [':'] = IN_START | LOOKAHEAD,
+ [','] = IN_START | LOOKAHEAD,
+ },
+
/* double quote string */
[IN_DQ_STRING_ESCAPE] = {
[0x20 ... 0xFD] = IN_DQ_STRING,
@@ -301,26 +324,18 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
/* fall through */
case JSON_SKIP:
g_string_truncate(lexer->token, 0);
+ /* fall through */
+ case IN_START:
new_state = lexer->start_state;
break;
case IN_ERROR:
- /* XXX: To avoid having previous bad input leaving the parser in an
- * unresponsive state where we consume unpredictable amounts of
- * subsequent "good" input, percolate this error state up to the
- * parser by emitting a JSON_ERROR token, then reset lexer state.
- *
- * Also note that this handling is required for reliable channel
- * negotiation between QMP and the guest agent, since chr(0xFF)
- * is placed at the beginning of certain events to ensure proper
- * delivery when the channel is in an unknown state. chr(0xFF) is
- * never a valid ASCII/UTF-8 sequence, so this should reliably
- * induce an error/flush state.
- */
json_message_process_token(lexer, lexer->token, JSON_ERROR,
lexer->x, lexer->y);
+ new_state = IN_RECOVERY;
+ /* fall through */
+ case IN_RECOVERY:
g_string_truncate(lexer->token, 0);
- lexer->state = lexer->start_state;
- return;
+ break;
default:
break;
}
diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index 4ae2245484..04ad7648d2 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -93,11 +93,7 @@ static void test_malformed(QTestState *qts)
g_assert(recovered(qts));
/* lexical error: interpolation */
- qtest_qmp_send_raw(qts, "%%p\n");
- /* two errors, one for "%", one for "p" */
- resp = qtest_qmp_receive(qts);
- g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
- qobject_unref(resp);
+ qtest_qmp_send_raw(qts, "%%p");
resp = qtest_qmp_receive(qts);
g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
qobject_unref(resp);
--
2.17.1
next prev parent reply other threads:[~2018-08-31 7:58 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-31 7:58 [Qemu-devel] [PATCH v2 0/6] json: More fixes, error reporting improvements, cleanups Markus Armbruster
2018-08-31 7:58 ` [Qemu-devel] [PATCH v2 1/6] json: Fix lexer for lookahead character beyond '\x7F' Markus Armbruster
2018-08-31 7:58 ` [Qemu-devel] [PATCH v2 2/6] json: Clean up how lexer consumes "end of input" Markus Armbruster
2018-08-31 7:58 ` [Qemu-devel] [PATCH v2 3/6] json: Make lexer's "character consumed" logic less confusing Markus Armbruster
2018-08-31 7:58 ` Markus Armbruster [this message]
2018-08-31 7:58 ` [Qemu-devel] [PATCH v2 5/6] json: Eliminate lexer state IN_ERROR Markus Armbruster
2018-08-31 7:58 ` [Qemu-devel] [PATCH v2 6/6] json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP Markus Armbruster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180831075841.13363-5-armbru@redhat.com \
--to=armbru@redhat.com \
--cc=eblake@redhat.com \
--cc=marcandre.lureau@redhat.com \
--cc=mdroth@linux.vnet.ibm.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).