qemu-devel.nongnu.org archive mirror
From: Anthony Liguori <aliguori@us.ibm.com>
To: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: aliguori@linux.vnet.ibm.com, agl@linux.vnet.ibm.com,
	qemu-devel@nongnu.org, Jes.Sorensen@redhat.com
Subject: [Qemu-devel] Re: [RFC][PATCH v1 01/12] json-lexer: make lexer error-recovery more deterministic
Date: Fri, 25 Mar 2011 16:18:11 -0500
Message-ID: <4D8D0693.5090409@us.ibm.com>
In-Reply-To: <1301082479-4058-2-git-send-email-mdroth@linux.vnet.ibm.com>

On 03/25/2011 02:47 PM, Michael Roth wrote:
> Currently when we reach an error state we effectively flush everything
> fed to the lexer, which can put us in a state where we keep feeding
> tokens into the parser at arbitrary offsets in the stream. This makes it
> difficult for the lexer/tokenizer/parser to get back in sync when the
> client sends bad input.
>
> With these changes we emit an error state/token up to the tokenizer as
> soon as we reach an error state, and continue processing any data passed
> in rather than bailing out. This error token will be used to reset the
> tokenizer and parser, such that they'll recover state as soon as the
> lexer begins generating valid token sequences again.
>
> We also map chr(0xFF) to an error state here, since it's never a valid
> byte in UTF-8. The QMP guest proxy/agent uses this to force a flush/reset
> of previous input for reliable delivery of certain events, so we also
> document that thoroughly here.
>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>   json-lexer.c |   22 ++++++++++++++++++----
>   json-lexer.h |    1 +
>   2 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/json-lexer.c b/json-lexer.c
> index 3462c89..21aa03a 100644
> --- a/json-lexer.c
> +++ b/json-lexer.c
> @@ -105,7 +105,7 @@ static const uint8_t json_lexer[][256] =  {
>           ['u'] = IN_DQ_UCODE0,
>       },
>       [IN_DQ_STRING] = {
> -        [1 ... 0xFF] = IN_DQ_STRING,
> +        [1 ... 0xFE] = IN_DQ_STRING,

We also need to exclude 192, 193, and 245-254 (0xC0, 0xC1, 0xF5-0xFE), as
none of these bytes can appear in a valid UTF-8 sequence.  See
http://en.wikipedia.org/wiki/UTF-8#Codepage_layout

We probably ought to actually validate UTF-8 multi-byte sequences in the
lexer, but we can keep that as a future exercise.
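
To make the byte ranges concrete, here is a minimal sketch of the check I
have in mind.  It is not code from json-lexer.c: the names
utf8_byte_never_valid(), toy_fill_string_row(), TOY_ERROR and TOY_IN_STRING
are made up for illustration, and it assumes that 0 and every byte that can
never occur in UTF-8 should map to the error state.  It only rejects
individual bytes; it does not validate multi-byte sequences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for two lexer states; not the real enum. */
enum { TOY_ERROR = 0, TOY_IN_STRING = 1 };

/*
 * Return true for bytes that cannot appear anywhere in well-formed UTF-8:
 * 0xC0 and 0xC1 (they would only start overlong 2-byte encodings) and
 * 0xF5..0xFF (lead bytes beyond U+10FFFF, plus 0xFE and 0xFF).
 */
static bool utf8_byte_never_valid(uint8_t b)
{
    return b == 0xC0 || b == 0xC1 || b >= 0xF5;
}

/*
 * Fill one 256-entry transition row for an "inside a string" state.
 * NUL and never-valid bytes go to the error state, everything else
 * stays in the string state.  Quote and backslash handling is omitted.
 */
static void toy_fill_string_row(uint8_t row[256])
{
    assert(row != NULL);
    for (int b = 0; b < 256; b++) {
        if (b == 0 || utf8_byte_never_valid((uint8_t)b)) {
            row[b] = TOY_ERROR;
        } else {
            row[b] = TOY_IN_STRING;
        }
    }
}
```

With range initializers in the style the table already uses, the same set
would presumably be written as two ranges, [1 ... 0xBF] and [0xC2 ... 0xF4],
leaving the remaining entries at their default.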

>           ['\\'] = IN_DQ_STRING_ESCAPE,
>           ['"'] = JSON_STRING,
>       },
> @@ -144,7 +144,7 @@ static const uint8_t json_lexer[][256] =  {
>           ['u'] = IN_SQ_UCODE0,
>       },
>       [IN_SQ_STRING] = {
> -        [1 ... 0xFF] = IN_SQ_STRING,
> +        [1 ... 0xFE] = IN_SQ_STRING,
>           ['\\'] = IN_SQ_STRING_ESCAPE,
>           ['\''] = JSON_STRING,
>       },
> @@ -305,10 +305,25 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch)
>               new_state = IN_START;
>               break;
>           case ERROR:
> +            /* XXX: To avoid having previous bad input leaving the parser in an
> +             * unresponsive state where we consume unpredictable amounts of
> +             * subsequent "good" input, percolate this error state up to the
> +             * tokenizer/parser by forcing a NULL object to be emitted, then
> +             * reset state.
> +             *
> +             * Also note that this handling is required for reliable channel
> +             * negotiation between QMP and the guest agent, since chr(0xFF)
> +             * is placed at the beginning of certain events to ensure proper
> +             * delivery when the channel is in an unknown state. chr(0xFF) is
> +             * never a valid ASCII/UTF-8 sequence, so this should reliably
> +             * induce an error/flush state.
> +             */
> +            lexer->emit(lexer, lexer->token, JSON_ERROR, lexer->x, lexer->y);
>               QDECREF(lexer->token);
>               lexer->token = qstring_new();
>               new_state = IN_START;
> -            return -EINVAL;
> +            lexer->state = new_state;
> +            return 0;
>           default:
>               break;
>           }
> @@ -334,7 +349,6 @@ int json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size)
>
>       for (i = 0; i < size; i++) {
>           int err;
> -

This whitespace change slipped in FWIW.

Regards,

Anthony Liguori

>           err = json_lexer_feed_char(lexer, buffer[i]);
>           if (err < 0) {
>               return err;
> diff --git a/json-lexer.h b/json-lexer.h
> index 3b50c46..10bc0a7 100644
> --- a/json-lexer.h
> +++ b/json-lexer.h
> @@ -25,6 +25,7 @@ typedef enum json_token_type {
>       JSON_STRING,
>       JSON_ESCAPE,
>       JSON_SKIP,
> +    JSON_ERROR,
>   } JSONTokenType;
>
>   typedef struct JSONLexer JSONLexer;
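
For reference, the recovery behaviour the patch aims for (emit an error
token, reset, keep consuming input rather than returning -EINVAL) can be
sketched with a toy lexer.  This is an illustration under simplifying
assumptions, not the QEMU code: ToyLexer, toy_feed_char() and toy_feed() are
hypothetical names, the only real tokens are runs of digits terminated by
whitespace, and "emitting" just bumps a counter.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_TOKEN_NUMBER, TOY_TOKEN_ERROR } ToyTokenType;

typedef struct ToyLexer {
    int in_number;  /* nonzero while inside a run of digits */
    int numbers;    /* NUMBER tokens emitted so far */
    int errors;     /* ERROR tokens emitted so far */
} ToyLexer;

/* Stand-in for the emit callback: just count tokens by type. */
static void toy_emit(ToyLexer *lx, ToyTokenType type)
{
    if (type == TOY_TOKEN_ERROR) {
        lx->errors++;
    } else {
        lx->numbers++;
    }
}

static int toy_feed_char(ToyLexer *lx, unsigned char ch)
{
    if (ch >= '0' && ch <= '9') {
        lx->in_number = 1;
        return 0;
    }
    if (ch == ' ' || ch == '\n') {
        if (lx->in_number) {
            toy_emit(lx, TOY_TOKEN_NUMBER);
        }
        lx->in_number = 0;
        return 0;
    }
    /*
     * Error path: report the error upward, drop any partial token,
     * return to the start state, and keep going instead of failing.
     */
    toy_emit(lx, TOY_TOKEN_ERROR);
    lx->in_number = 0;
    return 0;
}

static int toy_feed(ToyLexer *lx, const char *buffer, size_t size)
{
    assert(lx != NULL && buffer != NULL);
    for (size_t i = 0; i < size; i++) {
        int err = toy_feed_char(lx, (unsigned char)buffer[i]);
        if (err < 0) {
            return err;
        }
    }
    return 0;
}
```

Feeding "12 ", then a 0xFF byte, then "34 56 " yields three NUMBER tokens
and one ERROR token: the input after the bad byte is still tokenized from a
known state, which is the property the guest agent relies on.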

