qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Markus Armbruster <armbru@redhat.com>
To: qemu-devel@nongnu.org
Cc: aliguori@us.ibm.com, akong@redhat.com, mdroth@linux.vnet.ibm.com
Subject: [Qemu-devel] [PATCH 3/9] qapi.py: Restructure lexer and parser
Date: Fri, 26 Jul 2013 14:39:41 +0200	[thread overview]
Message-ID: <1374842387-17146-4-git-send-email-armbru@redhat.com> (raw)
In-Reply-To: <1374842387-17146-1-git-send-email-armbru@redhat.com>

The parser has a rather unorthodox structure:

    Until EOF:

        Read a section:

            Generator function get_expr() yields one section after the
            other, as a string.  An unindented, non-empty line that
            isn't a comment starts a new section.

        Lexing:

            Split section into a list of tokens (strings), with help
            of generator function tokenize().

        Parsing:

            Parse the first expression from the list of tokens, with
            parse(), throw away any remaining tokens.

            In parse_schema(): record value of an enum, union or
            struct key (if any) in the appropriate global table,
            append expression to the list of expressions.

    Return list of expressions.

Known issues:

(1) Indentation is significant, unlike in real JSON.

(2) Neither lexer nor parser have any idea of source positions.  Error
    reporting is hard, let's go shopping.

(3) The one error we bother to detect, we "report" via raise.

(4) The lexer silently ignores invalid characters.

(5) If everything in a section gets ignored, the parser crashes.

(6) The lexer treats a string containing a structural character exactly
    like the structural character.

(7) Tokens trailing the first expression in a section are silently
    ignored.

(8) The parser accepts any token in place of a colon.

(9) The parser treats comma as optional.

(10) parse() crashes on unexpected EOF.

(11) parse_schema() crashes when a section's expression isn't a JSON
    object.

Replace this piece of original art by a thoroughly unoriginal design.
Takes care of (1), (2), (5), (6) and (7), and lays the groundwork for
addressing the others.  Generated source files remain unchanged.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 scripts/qapi.py                                | 163 +++++++++++++------------
 tests/qapi-schema/indented-expr.out            |   2 +-
 tests/qapi-schema/missing-colon.out            |   4 +-
 tests/qapi-schema/quoted-structural-chars.err  |   1 +
 tests/qapi-schema/quoted-structural-chars.exit |   2 +-
 tests/qapi-schema/quoted-structural-chars.out  |   3 -
 6 files changed, 88 insertions(+), 87 deletions(-)

diff --git a/scripts/qapi.py b/scripts/qapi.py
index baf1321..cac56b7 100644
--- a/scripts/qapi.py
+++ b/scripts/qapi.py
@@ -2,9 +2,11 @@
 # QAPI helper library
 #
 # Copyright IBM, Corp. 2011
+# Copyright (c) 2013 Red Hat Inc.
 #
 # Authors:
 #  Anthony Liguori <aliguori@us.ibm.com>
+#  Markus Armbruster <armbru@redhat.com>
 #
 # This work is licensed under the terms of the GNU GPLv2.
 # See the COPYING.LIB file in the top-level directory.
@@ -17,91 +19,92 @@ builtin_types = [
     'uint8', 'uint16', 'uint32', 'uint64'
 ]
 
-def tokenize(data):
-    while len(data):
-        ch = data[0]
-        data = data[1:]
-        if ch in ['{', '}', ':', ',', '[', ']']:
-            yield ch
-        elif ch in ' \n':
-            None
-        elif ch == "'":
-            string = ''
-            esc = False
-            while True:
-                if (data == ''):
-                    raise Exception("Mismatched quotes")
-                ch = data[0]
-                data = data[1:]
-                if esc:
-                    string += ch
-                    esc = False
-                elif ch == "\\":
-                    esc = True
-                elif ch == "'":
-                    break
-                else:
-                    string += ch
-            yield string
-
-def parse(tokens):
-    if tokens[0] == '{':
-        ret = OrderedDict()
-        tokens = tokens[1:]
-        while tokens[0] != '}':
-            key = tokens[0]
-            tokens = tokens[1:]
-
-            tokens = tokens[1:] # :
-
-            value, tokens = parse(tokens)
-
-            if tokens[0] == ',':
-                tokens = tokens[1:]
-
-            ret[key] = value
-        tokens = tokens[1:]
-        return ret, tokens
-    elif tokens[0] == '[':
-        ret = []
-        tokens = tokens[1:]
-        while tokens[0] != ']':
-            value, tokens = parse(tokens)
-            if tokens[0] == ',':
-                tokens = tokens[1:]
-            ret.append(value)
-        tokens = tokens[1:]
-        return ret, tokens
-    else:
-        return tokens[0], tokens[1:]
-
-def evaluate(string):
-    return parse(map(lambda x: x, tokenize(string)))[0]
-
-def get_expr(fp):
-    expr = ''
-
-    for line in fp:
-        if line.startswith('#') or line == '\n':
-            continue
-
-        if line.startswith(' '):
-            expr += line
-        elif expr:
-            yield expr
-            expr = line
+class QAPISchema:
+
+    def __init__(self, fp):
+        self.fp = fp
+        self.src = fp.read()
+        if self.src == '' or self.src[-1] != '\n':
+            self.src += '\n'
+        self.cursor = 0
+        self.exprs = []
+        self.accept()
+
+        while self.tok != None:
+            self.exprs.append(self.get_expr())
+
+    def accept(self):
+        while True:
+            bol = self.cursor == 0 or self.src[self.cursor-1] == '\n'
+            self.tok = self.src[self.cursor]
+            self.cursor += 1
+            self.val = None
+
+            if self.tok == '#' and bol:
+                self.cursor = self.src.find('\n', self.cursor)
+            elif self.tok in ['{', '}', ':', ',', '[', ']']:
+                return
+            elif self.tok == "'":
+                string = ''
+                esc = False
+                while True:
+                    ch = self.src[self.cursor]
+                    self.cursor += 1
+                    if ch == '\n':
+                        raise Exception("Mismatched quotes")
+                    if esc:
+                        string += ch
+                        esc = False
+                    elif ch == "\\":
+                        esc = True
+                    elif ch == "'":
+                        self.val = string
+                        return
+                    else:
+                        string += ch
+            elif self.tok == '\n':
+                if self.cursor == len(self.src):
+                    self.tok = None
+                    return
+
+    def get_members(self):
+        expr = OrderedDict()
+        while self.tok != '}':
+            key = self.val
+            self.accept()
+            self.accept()        # :
+            expr[key] = self.get_expr()
+            if self.tok == ',':
+                self.accept()
+        self.accept()
+        return expr
+
+    def get_values(self):
+        expr = []
+        while self.tok != ']':
+            expr.append(self.get_expr())
+            if self.tok == ',':
+                self.accept()
+        self.accept()
+        return expr
+
+    def get_expr(self):
+        if self.tok == '{':
+            self.accept()
+            expr = self.get_members()
+        elif self.tok == '[':
+            self.accept()
+            expr = self.get_values()
         else:
-            expr += line
-
-    if expr:
-        yield expr
+            expr = self.val
+            self.accept()
+        return expr
 
 def parse_schema(fp):
+    schema = QAPISchema(fp)
     exprs = []
 
-    for expr in get_expr(fp):
-        expr_eval = evaluate(expr)
-
+    for expr_eval in schema.exprs:
         if expr_eval.has_key('enum'):
             add_enum(expr_eval['enum'])
         elif expr_eval.has_key('union'):
diff --git a/tests/qapi-schema/indented-expr.out b/tests/qapi-schema/indented-expr.out
index 98ae692..98af89a 100644
--- a/tests/qapi-schema/indented-expr.out
+++ b/tests/qapi-schema/indented-expr.out
@@ -1,3 +1,3 @@
-[OrderedDict([('id', 'eins')])]
+[OrderedDict([('id', 'eins')]), OrderedDict([('id', 'zwei')])]
 []
 []
diff --git a/tests/qapi-schema/missing-colon.out b/tests/qapi-schema/missing-colon.out
index 50f827e..e67068c 100644
--- a/tests/qapi-schema/missing-colon.out
+++ b/tests/qapi-schema/missing-colon.out
@@ -1,3 +1,3 @@
-[OrderedDict([('enum', ','), ('data', ['good', 'bad', 'ugly'])])]
-[',']
+[OrderedDict([('enum', None), ('data', ['good', 'bad', 'ugly'])])]
+[None]
 []
diff --git a/tests/qapi-schema/quoted-structural-chars.err b/tests/qapi-schema/quoted-structural-chars.err
index e69de29..48c849d 100644
--- a/tests/qapi-schema/quoted-structural-chars.err
+++ b/tests/qapi-schema/quoted-structural-chars.err
@@ -0,0 +1 @@
+Crashed: <type 'exceptions.AttributeError'>
diff --git a/tests/qapi-schema/quoted-structural-chars.exit b/tests/qapi-schema/quoted-structural-chars.exit
index 573541a..d00491f 100644
--- a/tests/qapi-schema/quoted-structural-chars.exit
+++ b/tests/qapi-schema/quoted-structural-chars.exit
@@ -1 +1 @@
-0
+1
diff --git a/tests/qapi-schema/quoted-structural-chars.out b/tests/qapi-schema/quoted-structural-chars.out
index 85405be..e69de29 100644
--- a/tests/qapi-schema/quoted-structural-chars.out
+++ b/tests/qapi-schema/quoted-structural-chars.out
@@ -1,3 +0,0 @@
-[OrderedDict([('key1', 'value1'), ('key2', [])])]
-[]
-[]
-- 
1.7.11.7

  parent reply	other threads:[~2013-07-26 12:40 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-26 12:39 [Qemu-devel] [PATCH 0/9] Our QAPI parser is a hack, replace it Markus Armbruster
2013-07-26 12:39 ` [Qemu-devel] [PATCH 1/9] tests: QAPI schema parser tests Markus Armbruster
2013-07-26 12:48   ` Eric Blake
2013-07-26 14:16     ` Markus Armbruster
2013-07-26 14:57       ` Eric Blake
2013-07-26 15:31         ` Markus Armbruster
2013-07-26 12:39 ` [Qemu-devel] [PATCH 2/9] tests: Use qapi-schema-test.json as schema parser test Markus Armbruster
2013-07-26 13:17   ` Eric Blake
2013-07-27 15:34     ` Markus Armbruster
2013-07-26 12:39 ` Markus Armbruster [this message]
2013-07-26 13:54   ` [Qemu-devel] [PATCH 3/9] qapi.py: Restructure lexer and parser Eric Blake
2013-07-26 12:39 ` [Qemu-devel] [PATCH 4/9] qapi.py: Decent syntax error reporting Markus Armbruster
2013-07-26 15:30   ` Eric Blake
2013-07-26 19:33     ` Markus Armbruster
2013-07-26 19:48       ` Paolo Bonzini
2013-07-26 19:57         ` Eric Blake
2013-07-26 12:39 ` [Qemu-devel] [PATCH 5/9] qapi.py: Reject invalid characters in schema file Markus Armbruster
2013-07-26 15:32   ` Eric Blake
2013-07-26 12:39 ` [Qemu-devel] [PATCH 6/9] qapi.py: Fix schema parser to check syntax systematically Markus Armbruster
2013-07-26 15:56   ` Eric Blake
2013-07-26 19:35     ` Markus Armbruster
2013-07-26 19:42       ` Eric Blake
2013-07-26 12:39 ` [Qemu-devel] [PATCH 7/9] qapi.py: Fix diagnosing non-objects at a schema's top-level Markus Armbruster
2013-07-26 16:03   ` Eric Blake
2013-07-26 12:39 ` [Qemu-devel] [PATCH 8/9] qapi.py: Rename expr_eval to expr in parse_schema() Markus Armbruster
2013-07-26 16:13   ` Eric Blake
2013-07-26 12:39 ` [Qemu-devel] [PATCH 9/9] qapi.py: Permit comments starting anywhere on the line Markus Armbruster
2013-07-26 16:15   ` Eric Blake
2013-07-26 14:41 ` [Qemu-devel] [PATCH 0/9] Our QAPI parser is a hack, replace it Anthony Liguori
2013-07-26 15:36   ` Markus Armbruster
2013-07-26 17:47     ` Anthony Liguori
2013-07-26 19:48       ` Markus Armbruster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1374842387-17146-4-git-send-email-armbru@redhat.com \
    --to=armbru@redhat.com \
    --cc=akong@redhat.com \
    --cc=aliguori@us.ibm.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).