From: Anthony Liguori <aliguori@us.ibm.com>
To: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: aliguori@linux.vnet.ibm.com, agl@linux.vnet.ibm.com,
 qemu-devel@nongnu.org, Jes.Sorensen@redhat.com
Subject: [Qemu-devel] Re: [RFC][PATCH v1 01/12] json-lexer: make lexer error-recovery more deterministic
Date: Fri, 25 Mar 2011 16:18:11 -0500
Message-ID: <4D8D0693.5090409@us.ibm.com>
In-Reply-To: <1301082479-4058-2-git-send-email-mdroth@linux.vnet.ibm.com>

On 03/25/2011 02:47 PM, Michael Roth wrote:
> Currently when we reach an error state we effectively flush everything
> fed to the lexer, which can put us in a state where we keep feeding
> tokens into the parser at arbitrary offsets in the stream. This makes it
> difficult for the lexer/tokenizer/parser to get back in sync when bad
> input is made by the client.
>
> With these changes we emit an error state/token up to the tokenizer as
> soon as we reach an error state, and continue processing any data passed
> in rather than bailing out. The reset token will be used to reset the
> tokenizer and parser, such that they'll recover state as soon as the
> lexer begins generating valid token sequences again.
>
> We also map chr(0xFF) to an error state here, since it's an invalid
> UTF-8 character. QMP guest proxy/agent use this to force a flush/reset
> of previous input for reliable delivery of certain events, so also we
> document that thoroughly here.
>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
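
For reference, the recovery behavior described in the commit message can be sketched outside QEMU roughly as follows. This is a minimal illustration, not the json-lexer code: the `mini_lexer` struct, the `TOK_WORD`/`TOK_ERROR` kinds, and the toy grammar (lowercase words separated by spaces) are all hypothetical stand-ins. The only point is that a bad byte emits an error token, drops the partial token, and lets the feed loop carry on instead of returning an error.

```c
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not QEMU's JSONTokenType. */
enum tok { TOK_WORD, TOK_ERROR };

struct mini_lexer {
    size_t len;   /* bytes accumulated in the current word */
    int words;    /* TOK_WORD tokens emitted so far */
    int errors;   /* TOK_ERROR tokens emitted so far */
};

/* Stand-in for the emit callback: just count what was emitted. */
static void emit(struct mini_lexer *lx, enum tok t)
{
    if (t == TOK_WORD) {
        lx->words++;
    } else {
        lx->errors++;
    }
}

/* Feed one byte. This never fails: an unexpected byte emits TOK_ERROR,
 * discards the partial word, and leaves the lexer in its start state. */
static void feed_byte(struct mini_lexer *lx, unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z') {
        lx->len++;
    } else if (ch == ' ') {
        if (lx->len > 0) {
            emit(lx, TOK_WORD);
            lx->len = 0;
        }
    } else {
        emit(lx, TOK_ERROR);
        lx->len = 0;
    }
}

/* Feed a buffer; processing continues past bad bytes. */
static void feed(struct mini_lexer *lx, const char *buf, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        feed_byte(lx, (unsigned char)buf[i]);
    }
}
```

Feeding the nine bytes `"ab\xff" "cd ef "` to a zero-initialized `struct mini_lexer` discards the partial word `ab` at the 0xFF byte and then tokenizes `cd` and `ef` normally, which is the resynchronization the patch is after.
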
> ---
> json-lexer.c | 22 ++++++++++++++++++----
> json-lexer.h | 1 +
> 2 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/json-lexer.c b/json-lexer.c
> index 3462c89..21aa03a 100644
> --- a/json-lexer.c
> +++ b/json-lexer.c
> @@ -105,7 +105,7 @@ static const uint8_t json_lexer[][256] = {
> ['u'] = IN_DQ_UCODE0,
> },
> [IN_DQ_STRING] = {
> - [1 ... 0xFF] = IN_DQ_STRING,
> + [1 ... 0xFE] = IN_DQ_STRING,

We also need to exclude 0xC0, 0xC1, and 0xF5-0xFE (192, 193, 245-254),
as none of these bytes can appear in a valid UTF-8 sequence. See
http://en.wikipedia.org/wiki/UTF-8#Codepage_layout

We probably ought to actually validate UTF-8 multi-byte sequences in the
lexer, but we can leave that as a future exercise.

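
A rough sketch of that exclusion, assuming a table-driven lexer like the one in this patch. The names `ST_ERROR`, `ST_IN_STRING`, `string_row`, `init_string_row`, and `utf8_byte_never_valid` are hypothetical and only serve the illustration; the `'\\'` and `'"'` overrides from the real table are left out. With the GNU range designators json-lexer.c already uses, the equivalent row would be `[1 ... 0xBF]` plus `[0xC2 ... 0xF4]`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state ids for this sketch; 0 doubles as the error state
 * so that unlisted bytes (including NUL) fall into it by default. */
enum { ST_ERROR = 0, ST_IN_STRING = 1 };

/* Bytes that can never appear anywhere in well-formed UTF-8:
 * 0xC0 and 0xC1 could only start overlong two-byte encodings, and
 * 0xF5..0xFF would encode values above U+10FFFF or are not defined
 * as lead or continuation bytes at all. */
static bool utf8_byte_never_valid(uint8_t b)
{
    return b == 0xC0 || b == 0xC1 || b >= 0xF5;
}

/* One row of a transition table: next state while inside a string. */
static uint8_t string_row[256];

/* Fill the row so every byte except NUL and the never-valid bytes
 * keeps the lexer inside the string. */
static void init_string_row(void)
{
    for (int b = 1; b < 256; b++) {
        string_row[b] = utf8_byte_never_valid((uint8_t)b)
                            ? ST_ERROR
                            : ST_IN_STRING;
    }
}
```

This only rejects bytes that are invalid in isolation. It does not check that lead bytes are followed by the right number of continuation bytes, which is the part left as a future exercise.
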
> ['\\'] = IN_DQ_STRING_ESCAPE,
> ['"'] = JSON_STRING,
> },
> @@ -144,7 +144,7 @@ static const uint8_t json_lexer[][256] = {
> ['u'] = IN_SQ_UCODE0,
> },
> [IN_SQ_STRING] = {
> - [1 ... 0xFF] = IN_SQ_STRING,
> + [1 ... 0xFE] = IN_SQ_STRING,
> ['\\'] = IN_SQ_STRING_ESCAPE,
> ['\''] = JSON_STRING,
> },
> @@ -305,10 +305,25 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch)
> new_state = IN_START;
> break;
> case ERROR:
> + /* XXX: To avoid having previous bad input leaving the parser in an
> + * unresponsive state where we consume unpredictable amounts of
> + * subsequent "good" input, percolate this error state up to the
> + * tokenizer/parser by forcing a NULL object to be emitted, then
> + * reset state.
> + *
> + * Also note that this handling is required for reliable channel
> + * negotiation between QMP and the guest agent, since chr(0xFF)
> + * is placed at the beginning of certain events to ensure proper
> + * delivery when the channel is in an unknown state. chr(0xFF) is
> + * never a valid ASCII/UTF-8 sequence, so this should reliably
> + * induce an error/flush state.
> + */
> + lexer->emit(lexer, lexer->token, JSON_ERROR, lexer->x, lexer->y);
> QDECREF(lexer->token);
> lexer->token = qstring_new();
> new_state = IN_START;
> - return -EINVAL;
> + lexer->state = new_state;
> + return 0;
> default:
> break;
> }
> @@ -334,7 +349,6 @@ int json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size)
>
> for (i = 0; i < size; i++) {
> int err;
> -

FWIW, this whitespace change slipped in.

Regards,

Anthony Liguori

> err = json_lexer_feed_char(lexer, buffer[i]);
> if (err < 0) {
> return err;
> diff --git a/json-lexer.h b/json-lexer.h
> index 3b50c46..10bc0a7 100644
> --- a/json-lexer.h
> +++ b/json-lexer.h
> @@ -25,6 +25,7 @@ typedef enum json_token_type {
> JSON_STRING,
> JSON_ESCAPE,
> JSON_SKIP,
> + JSON_ERROR,
> } JSONTokenType;
>
> typedef struct JSONLexer JSONLexer;