qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Markus Armbruster <armbru@redhat.com>
To: qemu-devel@nongnu.org
Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com,
	eblake@redhat.com
Subject: [Qemu-devel] [PATCH 18/56] json: Revamp lexer documentation
Date: Wed,  8 Aug 2018 14:02:56 +0200	[thread overview]
Message-ID: <20180808120334.10970-19-armbru@redhat.com> (raw)
In-Reply-To: <20180808120334.10970-1-armbru@redhat.com>

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-lexer.c | 80 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 71 insertions(+), 9 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index e85e9a78ff..109a7d8bb8 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -18,21 +18,83 @@
 #define MAX_TOKEN_SIZE (64ULL << 20)
 
 /*
- * Required by JSON (RFC 7159):
+ * From RFC 7159 "The JavaScript Object Notation (JSON) Data
+ * Interchange Format", with [comments in brackets]:
  *
- * \"([^\\\"]|\\[\"'\\/bfnrt]|\\u[0-9a-fA-F]{4})*\"
- * -?(0|[1-9][0-9]*)(.[0-9]+)?([eE][-+]?[0-9]+)?
- * [{}\[\],:]
- * [a-z]+   # covers null, true, false
+ * The set of tokens includes six structural characters, strings,
+ * numbers, and three literal names.
  *
- * Extension of '' strings:
+ * These are the six structural characters:
  *
- * '([^\\']|\\[\"'\\/bfnrt]|\\u[0-9a-fA-F]{4})*'
+ *    begin-array     = ws %x5B ws  ; [ left square bracket
+ *    begin-object    = ws %x7B ws  ; { left curly bracket
+ *    end-array       = ws %x5D ws  ; ] right square bracket
+ *    end-object      = ws %x7D ws  ; } right curly bracket
+ *    name-separator  = ws %x3A ws  ; : colon
+ *    value-separator = ws %x2C ws  ; , comma
  *
- * Extension for vararg handling in JSON construction:
+ * Insignificant whitespace is allowed before or after any of the six
+ * structural characters.
+ * [This lexer accepts it before or after any token, which is actually
+ * the same, as the grammar always has structural characters between
+ * other tokens.]
  *
- * %((l|ll|I64)?d|[ipsf])
+ *    ws = *(
+ *           %x20 /              ; Space
+ *           %x09 /              ; Horizontal tab
+ *           %x0A /              ; Line feed or New line
+ *           %x0D )              ; Carriage return
  *
+ * [...] three literal names:
+ *    false null true
+ *  [This lexer accepts [a-z]+, and leaves rejecting unknown literal
+ *  names to the parser.]
+ *
+ * [Numbers:]
+ *
+ *    number = [ minus ] int [ frac ] [ exp ]
+ *    decimal-point = %x2E       ; .
+ *    digit1-9 = %x31-39         ; 1-9
+ *    e = %x65 / %x45            ; e E
+ *    exp = e [ minus / plus ] 1*DIGIT
+ *    frac = decimal-point 1*DIGIT
+ *    int = zero / ( digit1-9 *DIGIT )
+ *    minus = %x2D               ; -
+ *    plus = %x2B                ; +
+ *    zero = %x30                ; 0
+ *
+ * [Strings:]
+ *    string = quotation-mark *char quotation-mark
+ *
+ *    char = unescaped /
+ *        escape (
+ *            %x22 /          ; "    quotation mark  U+0022
+ *            %x5C /          ; \    reverse solidus U+005C
+ *            %x2F /          ; /    solidus         U+002F
+ *            %x62 /          ; b    backspace       U+0008
+ *            %x66 /          ; f    form feed       U+000C
+ *            %x6E /          ; n    line feed       U+000A
+ *            %x72 /          ; r    carriage return U+000D
+ *            %x74 /          ; t    tab             U+0009
+ *            %x75 4HEXDIG )  ; uXXXX                U+XXXX
+ *    escape = %x5C              ; \
+ *    quotation-mark = %x22      ; "
+ *    unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
+ *
+ *
+ * Extensions over RFC 7159:
+ * - Extra escape sequence in strings:
+ *   0x27 (apostrophe) is recognized after escape, too
+ * - Single-quoted strings:
+ *   Like double-quoted strings, except they're delimited by %x27
+ *   (apostrophe) instead of %x22 (quotation mark), and can't contain
+ *   unescaped apostrophe, but can contain unescaped quotation mark.
+ * - Interpolation:
+ *   interpolation = %((l|ll|I64)[du]|[ipsf])
+ *
+ * Note:
+ * - Input must be encoded in UTF-8.
+ * - Decoding and validating is left to the parser.
  */
 
 enum json_lexer_state {
-- 
2.17.1

  parent reply	other threads:[~2018-08-08 12:03 UTC|newest]

Thread overview: 162+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-08 12:02 [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 01/56] check-qjson: Cover multiple JSON objects in same string Markus Armbruster
2018-08-09 13:25   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 02/56] check-qjson: Cover blank and lexically erroneous input Markus Armbruster
2018-08-09 13:29   ` Eric Blake
2018-08-10 13:40     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 03/56] check-qjson: Cover whitespace more thoroughly Markus Armbruster
2018-08-09 13:36   ` Eric Blake
2018-08-10 13:43     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 04/56] qmp-cmd-test: Split off qmp-test Markus Armbruster
2018-08-09 13:38   ` Eric Blake
2018-08-10 13:49     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 05/56] qmp-test: Cover syntax and lexical errors Markus Armbruster
2018-08-09 13:42   ` Eric Blake
2018-08-10 13:52     ` Markus Armbruster
2018-08-10 14:06       ` Eric Blake
2018-08-16 12:44         ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 06/56] test-qga: Clean up how we test QGA synchronization Markus Armbruster
2018-08-09 13:46   ` Eric Blake
2018-08-10 13:57     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 07/56] check-qjson: Cover escaped characters more thoroughly, part 1 Markus Armbruster
2018-08-09 13:54   ` Eric Blake
2018-08-10 14:03     ` Markus Armbruster
2018-08-09 14:00   ` Eric Blake
2018-08-10 14:11     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 08/56] check-qjson: Streamline escaped_string()'s test strings Markus Armbruster
2018-08-09 13:57   ` Eric Blake
2018-08-10 14:15     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 09/56] check-qjson: Cover escaped characters more thoroughly, part 2 Markus Armbruster
2018-08-09 14:03   ` Eric Blake
2018-08-10 14:16     ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 10/56] check-qjson: Drop redundant string tests Markus Armbruster
2018-08-09 14:04   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings Markus Armbruster
2018-08-09 14:17   ` Eric Blake
2018-08-10 14:18     ` Markus Armbruster
2018-08-10 14:59       ` Eric Blake
2018-08-13  6:11         ` Markus Armbruster
2018-08-13 14:53           ` Eric Blake
2018-08-14  6:01             ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 12/56] check-qjson: Simplify utf8_string() Markus Armbruster
2018-08-09 14:20   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 13/56] check-qjson: Fix utf8_string() to test all invalid sequences Markus Armbruster
2018-08-09 14:22   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 14/56] check-qjson qmp-test: Cover control characters more thoroughly Markus Armbruster
2018-08-09 17:24   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 15/56] check-qjson: Cover interpolation " Markus Armbruster
2018-08-09 17:26   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 16/56] json: Fix lexer to include the bad character in JSON_ERROR token Markus Armbruster
2018-08-09 17:42   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters Markus Armbruster
2018-08-09 18:26   ` Eric Blake
2018-08-10 14:26     ` Markus Armbruster
2018-08-08 12:02 ` Markus Armbruster [this message]
2018-08-09 18:49   ` [Qemu-devel] [PATCH 18/56] json: Revamp lexer documentation Eric Blake
2018-08-10 14:31     ` Markus Armbruster
2018-08-10 15:02       ` Eric Blake
2018-08-13  6:12         ` Markus Armbruster
2018-08-08 12:02 ` [Qemu-devel] [PATCH 19/56] json: Tighten and simplify qstring_from_escaped_str()'s loop Markus Armbruster
2018-08-09 18:52   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 20/56] check-qjson: Document we expect invalid UTF-8 to be rejected Markus Armbruster
2018-08-09 18:55   ` Eric Blake
2018-08-08 12:02 ` [Qemu-devel] [PATCH 21/56] json: Reject invalid UTF-8 sequences Markus Armbruster
2018-08-09 22:16   ` Eric Blake
2018-08-10 14:40     ` Markus Armbruster
2018-08-10 15:21       ` Eric Blake
2018-08-16 14:50         ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 22/56] json: Report first rather than last parse error Markus Armbruster
2018-08-10 15:25   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 23/56] json: Leave rejecting invalid UTF-8 to parser Markus Armbruster
2018-08-10 15:36   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 24/56] json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8") Markus Armbruster
2018-08-10 15:48   ` Eric Blake
2018-08-10 16:09     ` Eric Blake
2018-08-13  7:00       ` Markus Armbruster
2018-08-13 14:57         ` Eric Blake
2018-08-14  6:07           ` Markus Armbruster
2018-08-17  7:18         ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 25/56] json: Leave rejecting invalid escape sequences to parser Markus Armbruster
2018-08-10 15:56   ` Eric Blake
2018-08-13  7:05     ` Markus Armbruster
2018-08-13 14:58       ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 26/56] json: Simplify parse_string() Markus Armbruster
2018-08-10 15:59   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 27/56] json: Reject invalid \uXXXX, fix \u0000 Markus Armbruster
2018-08-10 16:10   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 28/56] json: Fix \uXXXX for surrogate pairs Markus Armbruster
2018-08-10 17:18   ` Eric Blake
2018-08-13  7:07     ` Markus Armbruster
2018-08-12  9:52   ` Paolo Bonzini
2018-08-13  7:12     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 29/56] check-qjson: Fix and enable utf8_string()'s disabled part Markus Armbruster
2018-08-10 17:19   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 30/56] json: remove useless return value from lexer/parser Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 31/56] json-parser: simplify and avoid JSONParserContext allocation Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 32/56] json: Have lexer call streamer directly Markus Armbruster
2018-08-10 17:22   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 33/56] json: Redesign the callback to consume JSON values Markus Armbruster
2018-08-13 15:30   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 34/56] json: Don't pass null @tokens to json_parser_parse() Markus Armbruster
2018-08-13 15:32   ` Eric Blake
2018-08-14  6:17     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 35/56] json: Don't create JSON_ERROR tokens that won't be used Markus Armbruster
2018-08-13 15:32   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 36/56] json: Rename token JSON_ESCAPE & friends to JSON_INTERPOL Markus Armbruster
2018-08-13 15:34   ` Eric Blake
2018-08-14  6:28     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 37/56] json: Treat unwanted interpolation as lexical error Markus Armbruster
2018-08-13 15:48   ` Eric Blake
2018-08-14  6:51     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 38/56] json: Pass lexical errors and limit violations to callback Markus Armbruster
2018-08-13 15:51   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 39/56] json: Leave rejecting invalid interpolation to parser Markus Armbruster
2018-08-13 16:12   ` Eric Blake
2018-08-14  7:23     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 40/56] json: Replace %I64d, %I64u by %PRId64, %PRIu64 Markus Armbruster
2018-08-13 16:18   ` Eric Blake
2018-08-14  7:24     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 41/56] json: Nicer recovery from invalid leading zero Markus Armbruster
2018-08-13 16:33   ` Eric Blake
2018-08-14  8:24     ` Markus Armbruster
2018-08-14 13:14       ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 42/56] json: Improve names of lexer states related to numbers Markus Armbruster
2018-08-13 16:36   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 43/56] qjson: Fix qobject_from_json() & friends for multiple values Markus Armbruster
2018-08-14 13:26   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 44/56] json: Fix latent parser aborts at end of input Markus Armbruster
2018-08-16 13:10   ` Eric Blake
2018-08-16 15:19     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 45/56] json: Fix streamer not to ignore trailing unterminated structures Markus Armbruster
2018-08-16 13:12   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 46/56] json: Assert json_parser_parse() consumes all tokens on success Markus Armbruster
2018-08-16 13:13   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 47/56] qjson: Have qobject_from_json() & friends reject empty and blank Markus Armbruster
2018-08-16 13:20   ` Eric Blake
2018-08-16 15:40     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 48/56] json: Enforce token count and size limits more tightly Markus Armbruster
2018-08-16 13:22   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 49/56] json: Streamline json_message_process_token() Markus Armbruster
2018-08-16 13:40   ` Eric Blake
2018-08-16 15:42     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 50/56] json: Unbox tokens queue in JSONMessageParser Markus Armbruster
2018-08-16 13:42   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 51/56] json: Eliminate lexer state IN_ERROR and pseudo-token JSON_MIN Markus Armbruster
2018-08-16 13:45   ` Eric Blake
2018-08-16 15:48     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 52/56] json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP Markus Armbruster
2018-08-16 13:51   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 53/56] json: Make JSONToken opaque outside json-parser.c Markus Armbruster
2018-08-16 13:54   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 54/56] qobject: Drop superfluous includes of qemu-common.h Markus Armbruster
2018-08-16 13:54   ` Eric Blake
2018-08-08 12:03 ` [Qemu-devel] [PATCH 55/56] json: Clean up headers Markus Armbruster
2018-08-16 17:50   ` Eric Blake
2018-08-17  8:22     ` Markus Armbruster
2018-08-08 12:03 ` [Qemu-devel] [PATCH 56/56] docs/interop/qmp-spec: How to force known good parser state Markus Armbruster
2018-08-10 14:30   ` Eric Blake
2018-08-17  8:37     ` Markus Armbruster
2018-08-17 14:34       ` Eric Blake
2018-08-17 11:16     ` Markus Armbruster
2018-08-17 14:35       ` Eric Blake
2018-08-08 14:03 ` [Qemu-devel] [PATCH 00/56] json: Fixes, error reporting improvements, cleanups Markus Armbruster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180808120334.10970-19-armbru@redhat.com \
    --to=armbru@redhat.com \
    --cc=eblake@redhat.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).