[Qemu-devel] [PATCH v1][ 12/14] json-lexer: make lexer error-recovery more deterministic

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Michael Roth <mdroth@linux.vnet.ibm.com>
To: qemu-devel@nongnu.org
Cc: aliguori@linux.vnet.ibm.com, Jes.Sorensen@redhat.com,
	agl@linux.vnet.ibm.com, mdroth@linux.vnet.ibm.com,
	lcapitulino@redhat.com
Subject: [Qemu-devel] [PATCH v1][ 12/14] json-lexer: make lexer error-recovery more deterministic
Date: Wed,  1 Jun 2011 12:14:58 -0500	[thread overview]
Message-ID: <1306948500-15086-13-git-send-email-mdroth@linux.vnet.ibm.com> (raw)
In-Reply-To: <1306948500-15086-1-git-send-email-mdroth@linux.vnet.ibm.com>

Currently when we reach an error state we effectively flush everything
fed to the lexer, which can put us in a state where we keep feeding
tokens into the parser at arbitrary offsets in the stream. This makes it
difficult for the lexer/tokenizer/parser to get back in sync when bad
input is made by the client.

With these changes we emit an error state/token up to the tokenizer as
soon as we reach an error state, and continue processing any data passed
in rather than bailing out. The reset token will be used to reset the
tokenizer and parser, such that they'll recover state as soon as the
lexer begins generating valid token sequences again.

We also map chr(192,193,245-255) to an error state here, since they are
invalid UTF-8 characters. QMP guest proxy/agent will use chr(255) to
force a flush/reset of previous input for reliable delivery of certain
events, so also we document that thoroughly here.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 json-lexer.c |   25 +++++++++++++++++++++----
 json-lexer.h |    1 +
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/json-lexer.c b/json-lexer.c
index 6b49047..c21338f 100644
--- a/json-lexer.c
+++ b/json-lexer.c
@@ -105,7 +105,8 @@ static const uint8_t json_lexer[][256] =  {
         ['u'] = IN_DQ_UCODE0,
     },
     [IN_DQ_STRING] = {
-        [1 ... 0xFF] = IN_DQ_STRING,
+        [1 ... 0xBF] = IN_DQ_STRING,
+        [0xC2 ... 0xF4] = IN_DQ_STRING,
         ['\\'] = IN_DQ_STRING_ESCAPE,
         ['"'] = JSON_STRING,
     },
@@ -144,7 +145,8 @@ static const uint8_t json_lexer[][256] =  {
         ['u'] = IN_SQ_UCODE0,
     },
     [IN_SQ_STRING] = {
-        [1 ... 0xFF] = IN_SQ_STRING,
+        [1 ... 0xBF] = IN_SQ_STRING,
+        [0xC2 ... 0xF4] = IN_SQ_STRING,
         ['\\'] = IN_SQ_STRING_ESCAPE,
         ['\''] = JSON_STRING,
     },
@@ -305,10 +307,25 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
             new_state = IN_START;
             break;
         case IN_ERROR:
+            /* XXX: To avoid having previous bad input leaving the parser in an
+             * unresponsive state where we consume unpredictable amounts of
+             * subsequent "good" input, percolate this error state up to the
+             * tokenizer/parser by forcing a NULL object to be emitted, then
+             * reset state.
+             *
+             * Also note that this handling is required for reliable channel
+             * negotiation between QMP and the guest agent, since chr(0xFF)
+             * is placed at the beginning of certain events to ensure proper
+             * delivery when the channel is in an unknown state. chr(0xFF) is
+             * never a valid ASCII/UTF-8 sequence, so this should reliably
+             * induce an error/flush state.
+             */
+            lexer->emit(lexer, lexer->token, JSON_ERROR, lexer->x, lexer->y);
             QDECREF(lexer->token);
             lexer->token = qstring_new();
             new_state = IN_START;
-            return -EINVAL;
+            lexer->state = new_state;
+            return 0;
         default:
             break;
         }
@@ -346,7 +363,7 @@ int json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size)
 
 int json_lexer_flush(JSONLexer *lexer)
 {
-    return lexer->state == IN_START ? 0 : json_lexer_feed_char(lexer, 0);
+    return lexer->state == IN_START ? 0 : json_lexer_feed_char(lexer, 0, true);
 }
 
 void json_lexer_destroy(JSONLexer *lexer)
diff --git a/json-lexer.h b/json-lexer.h
index 3b50c46..10bc0a7 100644
--- a/json-lexer.h
+++ b/json-lexer.h
@@ -25,6 +25,7 @@ typedef enum json_token_type {
     JSON_STRING,
     JSON_ESCAPE,
     JSON_SKIP,
+    JSON_ERROR,
 } JSONTokenType;
 
 typedef struct JSONLexer JSONLexer;
-- 
1.7.0.4

next prev parent reply	other threads:[~2011-06-01 21:06 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-01 17:14 [Qemu-devel] [QAPI+QGA 1/3] Error propagation and JSON parser fix-ups Michael Roth
2011-06-01 17:14 ` [Qemu-devel] [PATCH v1][ 01/14] QError: Introduce qerror_format_desc() Michael Roth
2011-06-07 19:18   ` Anthony Liguori
2011-06-01 17:14 ` [Qemu-devel] [PATCH v1][ 02/14] QError: Introduce qerror_format() Michael Roth
2011-06-01 17:14 ` [Qemu-devel] [PATCH v1][ 03/14] Introduce the new error framework Michael Roth
2011-06-01 17:14 ` [Qemu-devel] [PATCH v1][ 04/14] json-parser: propagate error from parser Michael Roth
2011-06-01 17:14 ` [Qemu-devel] [PATCH v1][ 05/14] json-streamer: allow recovery after bad input Michael Roth
2011-06-01 17:14 ` [Qemu-devel] [PATCH v1][ 06/14] json-lexer: limit the maximum size of a given token Michael Roth
2011-06-01 17:14 ` [Qemu-devel] [PATCH v1][ 07/14] json-streamer: limit the maximum recursion depth and maximum token count Michael Roth
2011-06-01 17:14 ` [Qemu-devel] [PATCH v1][ 08/14] json-streamer: make sure to reset token_size after emitting a token list Michael Roth
2011-06-01 17:14 ` [Qemu-devel] [PATCH v1][ 09/14] json-parser: detect premature EOI Michael Roth
2011-06-01 17:14 ` [Qemu-devel] [PATCH v1][ 10/14] json-lexer: reset the lexer state on an invalid token Michael Roth
2011-06-01 17:14 ` [Qemu-devel] [PATCH v1][ 11/14] json-lexer: fix flushing logic to not always go to error state Michael Roth
2011-06-01 17:14 ` Michael Roth [this message]
2011-06-01 17:14 ` [Qemu-devel] [PATCH v1][ 13/14] json-streamer: add handling for JSON_ERROR token/state Michael Roth
2011-06-01 17:15 ` [Qemu-devel] [PATCH v1][ 14/14] json-parser: add handling for NULL token list Michael Roth

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:6b49047 dfblob:c21338f dfblob:3b50c46 dfblob:10bc0a7 )
 OR (
bs:"[Qemu-devel] [PATCH v1][ 12/14] json-lexer: make lexer error-recovery more deterministic" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1306948500-15086-13-git-send-email-mdroth@linux.vnet.ibm.com \
    --to=mdroth@linux.vnet.ibm.com \
    --cc=Jes.Sorensen@redhat.com \
    --cc=agl@linux.vnet.ibm.com \
    --cc=aliguori@linux.vnet.ibm.com \
    --cc=lcapitulino@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).