From: Anthony Liguori <aliguori@us.ibm.com>
To: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: aliguori@linux.vnet.ibm.com, agl@linux.vnet.ibm.com,
	qemu-devel@nongnu.org, Jes.Sorensen@redhat.com
Subject: [Qemu-devel] Re: [RFC][PATCH v1 01/12] json-lexer: make lexer error-recovery more deterministic
Date: Fri, 25 Mar 2011 16:18:11 -0500
Message-ID: <4D8D0693.5090409@us.ibm.com>
In-Reply-To: <1301082479-4058-2-git-send-email-mdroth@linux.vnet.ibm.com>

On 03/25/2011 02:47 PM, Michael Roth wrote:
> Currently when we reach an error state we effectively flush everything
> fed to the lexer, which can put us in a state where we keep feeding
> tokens into the parser at arbitrary offsets in the stream. This makes it
> difficult for the lexer/tokenizer/parser to get back in sync when the
> client sends bad input.
>
> With these changes we emit an error state/token up to the tokenizer as
> soon as we reach an error state, and continue processing any data passed
> in rather than bailing out. The reset token will be used to reset the
> tokenizer and parser, such that they'll recover state as soon as the
> lexer begins generating valid token sequences again.
>
> We also map chr(0xFF) to an error state here, since it's an invalid
> UTF-8 character. QMP guest proxy/agent use this to force a flush/reset
> of previous input for reliable delivery of certain events, so we also
> document that thoroughly here.
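
A minimal sketch of the recovery behaviour described above, assuming a
toy lexer that only understands double-quoted strings. ToyLexer, TokKind,
TokCounts, and count_token are hypothetical names for illustration and
are not part of json-lexer.c; only the idea of emitting an error token
and carrying on from the start state is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token kinds; TOK_ERROR plays the role of JSON_ERROR. */
typedef enum { TOK_STRING, TOK_ERROR } TokKind;

typedef void (*ToyEmit)(void *opaque, TokKind kind);

typedef struct {
    bool in_string;   /* false: between tokens, true: inside "..." */
    ToyEmit emit;     /* callback standing in for the tokenizer */
    void *opaque;
} ToyLexer;

/* Example callback that just counts what the lexer reports. */
typedef struct { int strings; int errors; } TokCounts;

static void count_token(void *opaque, TokKind kind)
{
    TokCounts *counts = opaque;

    if (kind == TOK_ERROR) {
        counts->errors++;
    } else {
        counts->strings++;
    }
}

/* Feed one byte. Bad input is reported upward and the lexer returns
 * to its start state instead of failing the whole feed. */
static void toy_feed_char(ToyLexer *lexer, uint8_t ch)
{
    assert(lexer->emit != NULL);

    if (!lexer->in_string) {
        if (ch == '"') {
            lexer->in_string = true;
        } else if (ch != ' ' && ch != '\n') {
            lexer->emit(lexer->opaque, TOK_ERROR);
        }
        return;
    }

    if (ch == '"') {
        lexer->emit(lexer->opaque, TOK_STRING);
        lexer->in_string = false;
    } else if (ch == 0 || ch == 0xFF) {
        /* Invalid byte: drop the partial token and start over. */
        lexer->emit(lexer->opaque, TOK_ERROR);
        lexer->in_string = false;
    }
}

static void toy_feed(ToyLexer *lexer, const char *buf, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        toy_feed_char(lexer, (uint8_t)buf[i]);
    }
}
```

Feeding an unterminated "abc, then a 0xFF byte, then "ok" through
toy_feed() reports one error token followed by one string token, rather
than abandoning the rest of the buffer.
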
>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>   json-lexer.c |   22 ++++++++++++++++++----
>   json-lexer.h |    1 +
>   2 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/json-lexer.c b/json-lexer.c
> index 3462c89..21aa03a 100644
> --- a/json-lexer.c
> +++ b/json-lexer.c
> @@ -105,7 +105,7 @@ static const uint8_t json_lexer[][256] =  {
>           ['u'] = IN_DQ_UCODE0,
>       },
>       [IN_DQ_STRING] = {
> -        [1 ... 0xFF] = IN_DQ_STRING,
> +        [1 ... 0xFE] = IN_DQ_STRING,

We also need to exclude 192, 193, and 245-254 (0xC0, 0xC1, and
0xF5-0xFE), since none of these bytes can ever appear in a valid UTF-8
sequence.  See http://en.wikipedia.org/wiki/UTF-8#Codepage_layout
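
To make the byte ranges concrete, here is a rough sketch written as plain
functions rather than as a table row. The helper names and the trimmed
state enum are hypothetical, and the assumption that ERROR is state 0
should be checked against json-lexer.c. In table form this would
presumably amount to splitting the range into [1 ... 0xBF] and
[0xC2 ... 0xF4].

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down state set for illustration only. */
enum {
    ERROR = 0,
    IN_DQ_STRING,
    IN_DQ_STRING_ESCAPE,
    JSON_STRING,
};

/* Bytes that never occur anywhere in well-formed UTF-8:
 * 0xC0 and 0xC1 (would only encode overlong forms) and 0xF5-0xFF. */
static bool utf8_byte_never_valid(uint8_t byte)
{
    return byte == 0xC0 || byte == 0xC1 || byte >= 0xF5;
}

/* Next state for a byte seen inside a double-quoted string. */
static int dq_string_next_state(uint8_t byte)
{
    if (byte == 0 || utf8_byte_never_valid(byte)) {
        return ERROR;
    }
    if (byte == '\\') {
        return IN_DQ_STRING_ESCAPE;
    }
    if (byte == '"') {
        return JSON_STRING;
    }
    return IN_DQ_STRING;
}
```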

We probably ought to actually validate UTF-8 multi-byte sequences in the
lexer, but that can be left as a future exercise.
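
For reference, full validation could look roughly like the sketch below.
It checks a complete buffer in one pass, so it is only an illustration of
the rules (lead byte ranges, continuation bytes, overlong forms,
surrogates, the U+10FFFF limit); a streaming lexer would have to carry
the same information across calls as extra states. utf8_is_valid is a
hypothetical name, not an existing QEMU function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if buf[0..len) is well-formed UTF-8. */
static bool utf8_is_valid(const uint8_t *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        uint8_t lead = buf[i];
        size_t extra;        /* continuation bytes expected */
        uint8_t lo = 0x80;   /* allowed range for the next byte */
        uint8_t hi = 0xBF;

        if (lead <= 0x7F) {
            extra = 0;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2;
            if (lead == 0xE0) {
                lo = 0xA0;   /* reject overlong 3-byte forms */
            } else if (lead == 0xED) {
                hi = 0x9F;   /* reject UTF-16 surrogates */
            }
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3;
            if (lead == 0xF0) {
                lo = 0x90;   /* reject overlong 4-byte forms */
            } else if (lead == 0xF4) {
                hi = 0x8F;   /* reject code points above U+10FFFF */
            }
        } else {
            /* Stray continuation byte, 0xC0/0xC1, or 0xF5-0xFF. */
            return false;
        }

        if (len - i - 1 < extra) {
            return false;    /* sequence truncated at end of buffer */
        }
        for (size_t k = 1; k <= extra; k++) {
            uint8_t cont = buf[i + k];

            if (cont < lo || cont > hi) {
                return false;
            }
            /* Only the first continuation byte has a narrowed range. */
            lo = 0x80;
            hi = 0xBF;
        }
        i += extra + 1;
    }
    return true;
}
```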

>           ['\\'] = IN_DQ_STRING_ESCAPE,
>           ['"'] = JSON_STRING,
>       },
> @@ -144,7 +144,7 @@ static const uint8_t json_lexer[][256] =  {
>           ['u'] = IN_SQ_UCODE0,
>       },
>       [IN_SQ_STRING] = {
> -        [1 ... 0xFF] = IN_SQ_STRING,
> +        [1 ... 0xFE] = IN_SQ_STRING,
>           ['\\'] = IN_SQ_STRING_ESCAPE,
>           ['\''] = JSON_STRING,
>       },
> @@ -305,10 +305,25 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch)
>               new_state = IN_START;
>               break;
>           case ERROR:
> +            /* XXX: To avoid having previous bad input leaving the parser in an
> +             * unresponsive state where we consume unpredictable amounts of
> +             * subsequent "good" input, percolate this error state up to the
> +             * tokenizer/parser by forcing a NULL object to be emitted, then
> +             * reset state.
> +             *
> +             * Also note that this handling is required for reliable channel
> +             * negotiation between QMP and the guest agent, since chr(0xFF)
> +             * is placed at the beginning of certain events to ensure proper
> +             * delivery when the channel is in an unknown state. chr(0xFF) is
> +             * never a valid ASCII/UTF-8 sequence, so this should reliably
> +             * induce an error/flush state.
> +             */
> +            lexer->emit(lexer, lexer->token, JSON_ERROR, lexer->x, lexer->y);
>               QDECREF(lexer->token);
>               lexer->token = qstring_new();
>               new_state = IN_START;
> -            return -EINVAL;
> +            lexer->state = new_state;
> +            return 0;
>           default:
>               break;
>           }
> @@ -334,7 +349,6 @@ int json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size)
>
>       for (i = 0; i < size; i++) {
>           int err;
> -

FWIW, this whitespace change slipped in.

Regards,

Anthony Liguori

>           err = json_lexer_feed_char(lexer, buffer[i]);
>           if (err < 0) {
>               return err;
> diff --git a/json-lexer.h b/json-lexer.h
> index 3b50c46..10bc0a7 100644
> --- a/json-lexer.h
> +++ b/json-lexer.h
> @@ -25,6 +25,7 @@ typedef enum json_token_type {
>       JSON_STRING,
>       JSON_ESCAPE,
>       JSON_SKIP,
> +    JSON_ERROR,
>   } JSONTokenType;
>
>   typedef struct JSONLexer JSONLexer;
