qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/4] json-streamer: Fix up code to limit nesting and size
@ 2015-10-29 12:44 Markus Armbruster
  2015-10-29 12:44 ` [Qemu-devel] [PATCH 1/4] json-streamer: Apply nesting limit more sanely Markus Armbruster
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Markus Armbruster @ 2015-10-29 12:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: lcapitulino

We limit nesting depth and input size to defend against input
triggering excessive heap or stack memory use (commit 29c75dd
json-streamer: limit the maximum recursion depth and maximum token
count).  This limiting is flawed in multiple ways.  Fix it up some.

Not yet fixed: this JSON parser is an absurd memory hog; see last
patch.

Markus Armbruster (4):
  json-streamer: Apply nesting limit more sanely
  json-streamer: Don't crash when input exceeds nesting limit
  check-qjson: Add test for JSON nesting depth limit
  json-streamer: Limit number of tokens in addition to total size

 qobject/json-streamer.c |  7 ++++---
 tests/check-qjson.c     | 29 +++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 3 deletions(-)

-- 
2.4.3

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH 1/4] json-streamer: Apply nesting limit more sanely
  2015-10-29 12:44 [Qemu-devel] [PATCH 0/4] json-streamer: Fix up code to limit nesting and size Markus Armbruster
@ 2015-10-29 12:44 ` Markus Armbruster
  2015-10-29 16:22   ` Eric Blake
  2015-10-29 12:44 ` [Qemu-devel] [PATCH 2/4] json-streamer: Don't crash when input exceeds nesting limit Markus Armbruster
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Markus Armbruster @ 2015-10-29 12:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: lcapitulino

The nesting limit from commit 29c75dd "json-streamer: limit the
maximum recursion depth and maximum token count" applies separately to
braces and brackets.  This makes no sense.  Apply it to their sum,
because that's actually a measure of recursion depth.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-streamer.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 1b2f9b1..dced2c7 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -64,8 +64,7 @@ static void json_message_process_token(JSONLexer *lexer, QString *token, JSONTok
          parser->bracket_count == 0)) {
         goto out_emit;
     } else if (parser->token_size > MAX_TOKEN_SIZE ||
-               parser->bracket_count > MAX_NESTING ||
-               parser->brace_count > MAX_NESTING) {
+               parser->bracket_count + parser->brace_count > MAX_NESTING) {
         /* Security consideration, we limit total memory allocated per object
          * and the maximum recursion depth that a message can force.
          */
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH 2/4] json-streamer: Don't crash when input exceeds nesting limit
  2015-10-29 12:44 [Qemu-devel] [PATCH 0/4] json-streamer: Fix up code to limit nesting and size Markus Armbruster
  2015-10-29 12:44 ` [Qemu-devel] [PATCH 1/4] json-streamer: Apply nesting limit more sanely Markus Armbruster
@ 2015-10-29 12:44 ` Markus Armbruster
  2015-10-29 16:25   ` Eric Blake
  2015-10-29 12:44 ` [Qemu-devel] [PATCH 3/4] check-qjson: Add test for JSON nesting depth limit Markus Armbruster
  2015-10-29 12:44 ` [Qemu-devel] [PATCH 4/4] json-streamer: Limit number of tokens in addition to total size Markus Armbruster
  3 siblings, 1 reply; 15+ messages in thread
From: Markus Armbruster @ 2015-10-29 12:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: lcapitulino

We limit nesting depth and input size to defend against input
triggering excessive heap or stack memory use (commit 29c75dd
json-streamer: limit the maximum recursion depth and maximum token
count).  However, when the nesting limit is exceeded,
parser_context_peek_token()'s assertion fails.

Broken in commit 65c0f1e "json-parser: don't replicate tokens at each
level of recursion".

To reproduce stuff 1025 open braces or brackets into QMP.

Fix by taking the error exit instead of the normal one.

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-streamer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index dced2c7..755c74d 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -68,7 +68,7 @@ static void json_message_process_token(JSONLexer *lexer, QString *token, JSONTok
         /* Security consideration, we limit total memory allocated per object
          * and the maximum recursion depth that a message can force.
          */
-        goto out_emit;
+        goto out_emit_bad;
     }
 
     return;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH 3/4] check-qjson: Add test for JSON nesting depth limit
  2015-10-29 12:44 [Qemu-devel] [PATCH 0/4] json-streamer: Fix up code to limit nesting and size Markus Armbruster
  2015-10-29 12:44 ` [Qemu-devel] [PATCH 1/4] json-streamer: Apply nesting limit more sanely Markus Armbruster
  2015-10-29 12:44 ` [Qemu-devel] [PATCH 2/4] json-streamer: Don't crash when input exceeds nesting limit Markus Armbruster
@ 2015-10-29 12:44 ` Markus Armbruster
  2015-10-29 16:36   ` Eric Blake
  2015-10-29 12:44 ` [Qemu-devel] [PATCH 4/4] json-streamer: Limit number of tokens in addition to total size Markus Armbruster
  3 siblings, 1 reply; 15+ messages in thread
From: Markus Armbruster @ 2015-10-29 12:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: lcapitulino

This would have prevented the regression mentioned in the previous
commit.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/check-qjson.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 1cfffa5..2579d79 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -1484,6 +1484,34 @@ static void unterminated_literal(void)
     g_assert(obj == NULL);
 }
 
+static char *make_nest(char *buf, size_t cnt)
+{
+    int i;
+
+    for (i = 0; i < cnt - 1; i++) {
+        buf[i] = '[';
+        buf[2 * cnt - i - 1] = ']';
+    }
+    buf[cnt - 1] = '{';
+    buf[cnt] = '}';
+    buf[2 * cnt] = 0;
+    return buf;
+}
+
+static void limits_nesting(void)
+{
+    enum { max_nesting = 1024 }; /* see qobject/json-streamer.c */
+    char buf[2 * (max_nesting + 1) + 1];
+    QObject *obj;
+
+    obj = qobject_from_json(make_nest(buf, max_nesting));
+    g_assert(obj != NULL);
+    qobject_decref(obj);
+
+    obj = qobject_from_json(make_nest(buf, max_nesting + 1));
+    g_assert(obj == NULL);
+}
+
 int main(int argc, char **argv)
 {
     g_test_init(&argc, &argv, NULL);
@@ -1519,6 +1547,7 @@ int main(int argc, char **argv)
     g_test_add_func("/errors/invalid_array_comma", invalid_array_comma);
     g_test_add_func("/errors/invalid_dict_comma", invalid_dict_comma);
     g_test_add_func("/errors/unterminated/literal", unterminated_literal);
+    g_test_add_func("/errors/limits/nesting", limits_nesting);
 
     return g_test_run();
 }
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH 4/4] json-streamer: Limit number of tokens in addition to total size
  2015-10-29 12:44 [Qemu-devel] [PATCH 0/4] json-streamer: Fix up code to limit nesting and size Markus Armbruster
                   ` (2 preceding siblings ...)
  2015-10-29 12:44 ` [Qemu-devel] [PATCH 3/4] check-qjson: Add test for JSON nesting depth limit Markus Armbruster
@ 2015-10-29 12:44 ` Markus Armbruster
  2015-10-29 16:43   ` Eric Blake
  3 siblings, 1 reply; 15+ messages in thread
From: Markus Armbruster @ 2015-10-29 12:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: lcapitulino

Commit 29c75dd "json-streamer: limit the maximum recursion depth and
maximum token count" attempts to guard against excessive heap usage by
limiting total token size (it says "token count", but that's a lie).

Total token size is a rather imprecise predictor of heap usage: many
small tokens use more space than few large tokens with the same input
size, because there's a constant per-token overhead.

Tighten this up: limit the token count to 128Ki.

If you think 128Ki is too stingy: check-qjson's large_dict test eats a
sweet 500MiB and pegs a core for four minutes on my machine to parse
~100K tokens.  Absurdly wasteful.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-streamer.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index 755c74d..df2b4c1 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -19,6 +19,7 @@
 #include "qapi/qmp/json-streamer.h"
 
 #define MAX_TOKEN_SIZE (64ULL << 20)
+#define MAX_TOKEN_COUNT (128ULL << 10)
 #define MAX_NESTING (1ULL << 10)
 
 static void json_message_process_token(JSONLexer *lexer, QString *token, JSONTokenType type, int x, int y)
@@ -64,6 +65,7 @@ static void json_message_process_token(JSONLexer *lexer, QString *token, JSONTok
          parser->bracket_count == 0)) {
         goto out_emit;
     } else if (parser->token_size > MAX_TOKEN_SIZE ||
+               qlist_size(parser->tokens) > MAX_TOKEN_COUNT ||
                parser->bracket_count + parser->brace_count > MAX_NESTING) {
         /* Security consideration, we limit total memory allocated per object
          * and the maximum recursion depth that a message can force.
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 1/4] json-streamer: Apply nesting limit more sanely
  2015-10-29 12:44 ` [Qemu-devel] [PATCH 1/4] json-streamer: Apply nesting limit more sanely Markus Armbruster
@ 2015-10-29 16:22   ` Eric Blake
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Blake @ 2015-10-29 16:22 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: lcapitulino

[-- Attachment #1: Type: text/plain, Size: 1401 bytes --]

On 10/29/2015 06:44 AM, Markus Armbruster wrote:
> The nesting limit from commit 29c75dd "json-streamer: limit the
> maximum recursion depth and maximum token count" applies separately to
> braces and brackets.  This makes no sense.  Apply it to their sum,
> because that's actually a measure of recursion depth.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  qobject/json-streamer.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
> index 1b2f9b1..dced2c7 100644
> --- a/qobject/json-streamer.c
> +++ b/qobject/json-streamer.c
> @@ -64,8 +64,7 @@ static void json_message_process_token(JSONLexer *lexer, QString *token, JSONTok
>           parser->bracket_count == 0)) {
>          goto out_emit;
>      } else if (parser->token_size > MAX_TOKEN_SIZE ||
> -               parser->bracket_count > MAX_NESTING ||
> -               parser->brace_count > MAX_NESTING) {
> +               parser->bracket_count + parser->brace_count > MAX_NESTING) {
>          /* Security consideration, we limit total memory allocated per object
>           * and the maximum recursion depth that a message can force.
>           */
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] json-streamer: Don't crash when input exceeds nesting limit
  2015-10-29 12:44 ` [Qemu-devel] [PATCH 2/4] json-streamer: Don't crash when input exceeds nesting limit Markus Armbruster
@ 2015-10-29 16:25   ` Eric Blake
  2015-11-23 17:21     ` Markus Armbruster
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2015-10-29 16:25 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: lcapitulino

[-- Attachment #1: Type: text/plain, Size: 1997 bytes --]

On 10/29/2015 06:44 AM, Markus Armbruster wrote:
> We limit nesting depth and input size to defend against input
> triggering excessive heap or stack memory use (commit 29c75dd
> json-streamer: limit the maximum recursion depth and maximum token
> count).  However, when the nesting limit is exceeded,
> parser_context_peek_token()'s assertion fails.
> 
> Broken in commit 65c0f1e "json-parser: don't replicate tokens at each
> level of recursion".
> 
> To reproduce stuff 1025 open braces or brackets into QMP.
> 
> Fix by taking the error exit instead of the normal one.
> 
> Reported-by: Eric Blake <eblake@redhat.com>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  qobject/json-streamer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

However, a couple comments about the context:

>    if (type == JSON_ERROR) {
>        goto out_emit_bad;
>    } else if (parser->brace_count < 0 ||
>        parser->bracket_count < 0 ||
>        (parser->brace_count == 0 &&
>         parser->bracket_count == 0)) {
>        goto out_emit;

Should we go to out_emit_bad for brace_count/bracket_count < 0, and save
out_emit only for the case where brace == bracket == 0?  Can we even
trigger negative counts (probably by attempting unpaired "{]]", but will
that trigger earlier errors?)

>    } else if (parser->token_size > MAX_TOKEN_SIZE ||
>               parser->bracket_count > MAX_NESTING ||
>               parser->brace_count > MAX_NESTING) {
>        /* Security consideration, we limit total memory allocated per
object
>         * and the maximum recursion depth that a message can force.
>         */
>        goto out_emit;
>    }
>
>    return;
>
>out_emit_bad:
>    /* clear out token list and tell the parser to emit and error

Typo: s/and error/an error/

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] check-qjson: Add test for JSON nesting depth limit
  2015-10-29 12:44 ` [Qemu-devel] [PATCH 3/4] check-qjson: Add test for JSON nesting depth limit Markus Armbruster
@ 2015-10-29 16:36   ` Eric Blake
  2015-10-29 18:33     ` Markus Armbruster
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2015-10-29 16:36 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: lcapitulino

[-- Attachment #1: Type: text/plain, Size: 2461 bytes --]

On 10/29/2015 06:44 AM, Markus Armbruster wrote:
> This would have prevented the regression mentioned in the previous
> commit.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  tests/check-qjson.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)

Better late than never.

> +++ b/tests/check-qjson.c
> @@ -1484,6 +1484,34 @@ static void unterminated_literal(void)
>      g_assert(obj == NULL);
>  }
>  
> +static char *make_nest(char *buf, size_t cnt)
> +{
> +    int i;
> +
> +    for (i = 0; i < cnt - 1; i++) {
> +        buf[i] = '[';
> +        buf[2 * cnt - i - 1] = ']';
> +    }
> +    buf[cnt - 1] = '{';
> +    buf[cnt] = '}';
> +    buf[2 * cnt] = 0;
> +    return buf;
> +}

So buf must be at least 2*cnt+1 bytes long.  (Function is static, so
lack of comments don't hurt too badly).  For a cnt of 3 (buffer size at
least 7), this creates "[[{}]]".  Larger cnt adds more outer [] pairs.
The mixed content proves that patch 1/4 covers the combined limit of []
and {} when counting nesting.

Minor optimization - make the for loop bound be 'i < cnt - 2', so you
aren't writing [] in the middle just to rewrite it to {} after the loop
(works as long as caller never passes cnt == 1, which happens to be the
case).

> +
> +static void limits_nesting(void)
> +{
> +    enum { max_nesting = 1024 }; /* see qobject/json-streamer.c */
> +    char buf[2 * (max_nesting + 1) + 1];
> +    QObject *obj;
> +
> +    obj = qobject_from_json(make_nest(buf, max_nesting));
> +    g_assert(obj != NULL);
> +    qobject_decref(obj);

Proves that we can hit our max,

> +
> +    obj = qobject_from_json(make_nest(buf, max_nesting + 1));
> +    g_assert(obj == NULL);

and that we gracefully diagnose one beyond max.

> +}
> +
>  int main(int argc, char **argv)
>  {
>      g_test_init(&argc, &argv, NULL);
> @@ -1519,6 +1547,7 @@ int main(int argc, char **argv)
>      g_test_add_func("/errors/invalid_array_comma", invalid_array_comma);
>      g_test_add_func("/errors/invalid_dict_comma", invalid_dict_comma);
>      g_test_add_func("/errors/unterminated/literal", unterminated_literal);
> +    g_test_add_func("/errors/limits/nesting", limits_nesting);
>  
>      return g_test_run();
>  }
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] json-streamer: Limit number of tokens in addition to total size
  2015-10-29 12:44 ` [Qemu-devel] [PATCH 4/4] json-streamer: Limit number of tokens in addition to total size Markus Armbruster
@ 2015-10-29 16:43   ` Eric Blake
  2015-10-29 18:27     ` Markus Armbruster
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2015-10-29 16:43 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: lcapitulino

[-- Attachment #1: Type: text/plain, Size: 2534 bytes --]

On 10/29/2015 06:44 AM, Markus Armbruster wrote:
> Commit 29c75dd "json-streamer: limit the maximum recursion depth and
> maximum token count" attempts to guard against excessive heap usage by
> limiting total token size (it says "token count", but that's a lie).
> 
> Total token size is a rather imprecise predictor of heap usage: many
> small tokens use more space than few large tokens with the same input
> size, because there's a constant per-token overhead.
> 
> Tighten this up: limit the token count to 128Ki.
> 
> If you think 128Ki is too stingy: check-qjson's large_dict test eats a
> sweet 500MiB and pegs a core for four minutes on my machine to parse
> ~100K tokens.  Absurdly wasteful.

Sounds like we have some quadratic (or worse) scaling in the parser.
Worth fixing some day, but I also agree that we don't have to tackle it
in this series.

I'm assuming you temporarily patched check-qjson to use larger constants
when you hit your ~100K token testing?  Because I am definitely seeing a
lot of execution time spent on large_dict when running tests/check-qjson
by hand, in relation to all the other tests of that file, but not
minutes worth.  Care to post the diff you played with?

> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  qobject/json-streamer.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
> index 755c74d..df2b4c1 100644
> --- a/qobject/json-streamer.c
> +++ b/qobject/json-streamer.c
> @@ -19,6 +19,7 @@
>  #include "qapi/qmp/json-streamer.h"
>  
>  #define MAX_TOKEN_SIZE (64ULL << 20)
> +#define MAX_TOKEN_COUNT (128ULL << 10)
>  #define MAX_NESTING (1ULL << 10)
>  
>  static void json_message_process_token(JSONLexer *lexer, QString *token, JSONTokenType type, int x, int y)
> @@ -64,6 +65,7 @@ static void json_message_process_token(JSONLexer *lexer, QString *token, JSONTok
>           parser->bracket_count == 0)) {
>          goto out_emit;
>      } else if (parser->token_size > MAX_TOKEN_SIZE ||
> +               qlist_size(parser->tokens) > MAX_TOKEN_COUNT ||
>                 parser->bracket_count + parser->brace_count > MAX_NESTING) {
>          /* Security consideration, we limit total memory allocated per object
>           * and the maximum recursion depth that a message can force.
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] json-streamer: Limit number of tokens in addition to total size
  2015-10-29 16:43   ` Eric Blake
@ 2015-10-29 18:27     ` Markus Armbruster
  2015-10-29 23:35       ` Eric Blake
  0 siblings, 1 reply; 15+ messages in thread
From: Markus Armbruster @ 2015-10-29 18:27 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, lcapitulino

Eric Blake <eblake@redhat.com> writes:

> On 10/29/2015 06:44 AM, Markus Armbruster wrote:
>> Commit 29c75dd "json-streamer: limit the maximum recursion depth and
>> maximum token count" attempts to guard against excessive heap usage by
>> limiting total token size (it says "token count", but that's a lie).
>> 
>> Total token size is a rather imprecise predictor of heap usage: many
>> small tokens use more space than few large tokens with the same input
>> size, because there's a constant per-token overhead.
>> 
>> Tighten this up: limit the token count to 128Ki.
>> 
>> If you think 128Ki is too stingy: check-qjson's large_dict test eats a
>> sweet 500MiB and pegs a core for four minutes on my machine to parse
>> ~100K tokens.  Absurdly wasteful.
>
> Sounds like we have some quadratic (or worse) scaling in the parser.
> Worth fixing some day, but I also agree that we don't have to tackle it
> in this series.

I believe it's linear with a criminally negligent constant (several KiB
per token).  The first hog is actually fairly obvious: we use on QDict
per token.

> I'm assuming you temporarily patched check-qjson to use larger constants
> when you hit your ~100K token testing?  Because I am definitely seeing a
> lot of execution time spent on large_dict when running tests/check-qjson
> by hand, in relation to all the other tests of that file, but not
> minutes worth.  Care to post the diff you played with?

I tested on a slow machine.

>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  qobject/json-streamer.c | 2 ++
>>  1 file changed, 2 insertions(+)
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] check-qjson: Add test for JSON nesting depth limit
  2015-10-29 16:36   ` Eric Blake
@ 2015-10-29 18:33     ` Markus Armbruster
  0 siblings, 0 replies; 15+ messages in thread
From: Markus Armbruster @ 2015-10-29 18:33 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, lcapitulino

Eric Blake <eblake@redhat.com> writes:

> On 10/29/2015 06:44 AM, Markus Armbruster wrote:
>> This would have prevented the regression mentioned in the previous
>> commit.
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  tests/check-qjson.c | 29 +++++++++++++++++++++++++++++
>>  1 file changed, 29 insertions(+)
>
> Better late than never.
>
>> +++ b/tests/check-qjson.c
>> @@ -1484,6 +1484,34 @@ static void unterminated_literal(void)
>>      g_assert(obj == NULL);
>>  }
>>  
>> +static char *make_nest(char *buf, size_t cnt)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < cnt - 1; i++) {
>> +        buf[i] = '[';
>> +        buf[2 * cnt - i - 1] = ']';
>> +    }
>> +    buf[cnt - 1] = '{';
>> +    buf[cnt] = '}';
>> +    buf[2 * cnt] = 0;
>> +    return buf;
>> +}
>
> So buf must be at least 2*cnt+1 bytes long.  (Function is static, so
> lack of comments don't hurt too badly).  For a cnt of 3 (buffer size at
> least 7), this creates "[[{}]]".  Larger cnt adds more outer [] pairs.
> The mixed content proves that patch 1/4 covers the combined limit of []
> and {} when counting nesting.
>
> Minor optimization - make the for loop bound be 'i < cnt - 2', so you
> aren't writing [] in the middle just to rewrite it to {} after the loop
> (works as long as caller never passes cnt == 1, which happens to be the
> case).

The loop's first assignment fills 0..cnt-2 inclusive, the second one
fills 2*cnt-1..(2*cnt-1)-(cnt-2) = 2*cnt-1..cnt+1 inclusive.

If I need to respin, I'll change to a pair of memset().

>> +static void limits_nesting(void)
>> +{
>> +    enum { max_nesting = 1024 }; /* see qobject/json-streamer.c */
>> +    char buf[2 * (max_nesting + 1) + 1];
>> +    QObject *obj;
>> +
>> +    obj = qobject_from_json(make_nest(buf, max_nesting));
>> +    g_assert(obj != NULL);
>> +    qobject_decref(obj);
>
> Proves that we can hit our max,
>
>> +
>> +    obj = qobject_from_json(make_nest(buf, max_nesting + 1));
>> +    g_assert(obj == NULL);
>
> and that we gracefully diagnose one beyond max.
>
>> +}
>> +
>>  int main(int argc, char **argv)
>>  {
>>      g_test_init(&argc, &argv, NULL);
>> @@ -1519,6 +1547,7 @@ int main(int argc, char **argv)
>>      g_test_add_func("/errors/invalid_array_comma", invalid_array_comma);
>>      g_test_add_func("/errors/invalid_dict_comma", invalid_dict_comma);
>>      g_test_add_func("/errors/unterminated/literal", unterminated_literal);
>> +    g_test_add_func("/errors/limits/nesting", limits_nesting);
>>  
>>      return g_test_run();
>>  }
>> 
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] json-streamer: Limit number of tokens in addition to total size
  2015-10-29 18:27     ` Markus Armbruster
@ 2015-10-29 23:35       ` Eric Blake
  2015-10-30  7:52         ` Markus Armbruster
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2015-10-29 23:35 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, lcapitulino

[-- Attachment #1: Type: text/plain, Size: 2333 bytes --]

On 10/29/2015 12:27 PM, Markus Armbruster wrote:
>> Sounds like we have some quadratic (or worse) scaling in the parser.
>> Worth fixing some day, but I also agree that we don't have to tackle it
>> in this series.
> 
> I believe it's linear with a criminally negligent constant (several KiB
> per token).  The first hog is actually fairly obvious: we use on QDict
> per token.

Wow.  I just read through the code, and you're right.  We are passing
around QDicts right and left (one per token, with 4 keys, which is
several mallocs per token), and then creating a QList of those tokens.

Prior to commit 65c0f1e9, when we were really incrementing the refcount
of each token on each level of nesting (as part of copying context, for
definite quadratic behavior), the refcounts let us ensure tokens would
be cleaned up at the end.  But I'm hard pressed to see the refcount of
tokens going above one in the current code, which means we aren't
gaining anything by using QDict for reference counting.  For that
matter, JSON is quite linear - the code talks about needing to backtrack
in different contexts, but JSON doesn't have ambiguities that need
backtracking to try multiple different rules.  It seems like the code is
overengineered because it is copied from another language where
backtracking to try several alternate parses actually makes sense.

I suspect that using a C struct per token, and a C array of those
structs, would go a long way towards alleviating memory abuse per token.

Are you tackling that, or would you like me to take a stab at it while
you work on flushing my pending qapi patches?

> 
>> I'm assuming you temporarily patched check-qjson to use larger constants
>> when you hit your ~100K token testing?  Because I am definitely seeing a
>> lot of execution time spent on large_dict when running tests/check-qjson
>> by hand, in relation to all the other tests of that file, but not
>> minutes worth.  Care to post the diff you played with?
> 
> I tested on a slow machine.

I guess it was all the malloc pressure on a low-memory system that would
make it so much slower than what I'm seeing, if you stuck with the
default gen_test_json(gstr, 10, 100).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] json-streamer: Limit number of tokens in addition to total size
  2015-10-29 23:35       ` Eric Blake
@ 2015-10-30  7:52         ` Markus Armbruster
  2015-10-30 15:22           ` Eric Blake
  0 siblings, 1 reply; 15+ messages in thread
From: Markus Armbruster @ 2015-10-30  7:52 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, lcapitulino

Eric Blake <eblake@redhat.com> writes:

> On 10/29/2015 12:27 PM, Markus Armbruster wrote:
>>> Sounds like we have some quadratic (or worse) scaling in the parser.
>>> Worth fixing some day, but I also agree that we don't have to tackle it
>>> in this series.
>> 
>> I believe it's linear with a criminally negligent constant (several KiB
>> per token).  The first hog is actually fairly obvious: we use on QDict
>> per token.
>
> Wow.  I just read through the code, and you're right.  We are passing
> around QDicts right and left (one per token, with 4 keys, which is
> several mallocs per token), and then creating a QList of those tokens.

Disgusting, isn't it?

> Prior to commit 65c0f1e9, when we were really incrementing the refcount
> of each token on each level of nesting (as part of copying context, for
> definite quadratic behavior), the refcounts let us ensure tokens would
> be cleaned up at the end.  But I'm hard pressed to see the refcount of
> tokens going above one in the current code, which means we aren't
> gaining anything by using QDict for reference counting.  For that
> matter, JSON is quite linear - the code talks about needing to backtrack
> in different contexts, but JSON doesn't have ambiguities that need
> backtracking to try multiple different rules.  It seems like the code is
> overengineered because it is copied from another language where
> backtracking to try several alternate parses actually makes sense.

Yes, starting with a copy of an inappropriate solution is a possible
explanation for this mess.

> I suspect that using a C struct per token, and a C array of those
> structs, would go a long way towards alleviating memory abuse per token.

Yes.

> Are you tackling that, or would you like me to take a stab at it while
> you work on flushing my pending qapi patches?

I think I could use a bit of hacking to clear my mind between bouts of
patch review.  This job should be about the right size.  Plus, I better
become more familiar with qobject/json-* --- Luiz is dying to get rid of
it ;)

>>> I'm assuming you temporarily patched check-qjson to use larger constants
>>> when you hit your ~100K token testing?  Because I am definitely seeing a
>>> lot of execution time spent on large_dict when running tests/check-qjson
>>> by hand, in relation to all the other tests of that file, but not
>>> minutes worth.  Care to post the diff you played with?
>> 
>> I tested on a slow machine.
>
> I guess it was all the malloc pressure on a low-memory system that would
> make it so much slower than what I'm seeing, if you stuck with the
> default gen_test_json(gstr, 10, 100).

Something must have been wrong with this machine yesterday, because
today it runs the exact same test much, much faster.  Computers...

I'll update the commit message.  Thanks for your sanity check!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 4/4] json-streamer: Limit number of tokens in addition to total size
  2015-10-30  7:52         ` Markus Armbruster
@ 2015-10-30 15:22           ` Eric Blake
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Blake @ 2015-10-30 15:22 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, lcapitulino

[-- Attachment #1: Type: text/plain, Size: 1275 bytes --]

On 10/30/2015 01:52 AM, Markus Armbruster wrote:

>>>> I'm assuming you temporarily patched check-qjson to use larger constants
>>>> when you hit your ~100K token testing?  Because I am definitely seeing a
>>>> lot of execution time spent on large_dict when running tests/check-qjson
>>>> by hand, in relation to all the other tests of that file, but not
>>>> minutes worth.  Care to post the diff you played with?
>>>
>>> I tested on a slow machine.
>>
>> I guess it was all the malloc pressure on a low-memory system that would
>> make it so much slower than what I'm seeing, if you stuck with the
>> default gen_test_json(gstr, 10, 100).
> 
> Something must have been wrong with this machine yesterday, because
> today it runs the exact same test much, much faster.  Computers...

Or maybe you had still reverted 65c0f1e9 to figure out where the
regression was at asserting rather than gracefully failing as fixed by
patch 2/4?  Still, even if it is much faster, it is noticeably slow
enough to wonder if we have some inefficiencies beyond the poor memory
usage.

> 
> I'll update the commit message.  Thanks for your sanity check!
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] json-streamer: Don't crash when input exceeds nesting limit
  2015-10-29 16:25   ` Eric Blake
@ 2015-11-23 17:21     ` Markus Armbruster
  0 siblings, 0 replies; 15+ messages in thread
From: Markus Armbruster @ 2015-11-23 17:21 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, lcapitulino

Eric Blake <eblake@redhat.com> writes:

> On 10/29/2015 06:44 AM, Markus Armbruster wrote:
>> We limit nesting depth and input size to defend against input
>> triggering excessive heap or stack memory use (commit 29c75dd
>> json-streamer: limit the maximum recursion depth and maximum token
>> count).  However, when the nesting limit is exceeded,
>> parser_context_peek_token()'s assertion fails.
>> 
>> Broken in commit 65c0f1e "json-parser: don't replicate tokens at each
>> level of recursion".
>> 
>> To reproduce stuff 1025 open braces or brackets into QMP.
>> 
>> Fix by taking the error exit instead of the normal one.
>> 
>> Reported-by: Eric Blake <eblake@redhat.com>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  qobject/json-streamer.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> However, a couple comments about the context:
>
>>    if (type == JSON_ERROR) {
>>        goto out_emit_bad;
>>    } else if (parser->brace_count < 0 ||
>>        parser->bracket_count < 0 ||
>>        (parser->brace_count == 0 &&
>>         parser->bracket_count == 0)) {
>>        goto out_emit;
>
> Should we go to out_emit_bad for brace_count/bracket_count < 0, and save
> out_emit only for the case where brace == bracket == 0?  Can we even
> trigger negative counts (probably by attempting unpaired "{]]", but will
> that trigger earlier errors?)

Fair questions.

Unpaired closing brace or bracket makes the count go negative.  A simple
test input is "}".  As far as I can tell, nothing breaks: we emit an
error, and the parser recovers.  Perhaps the error exit would be more
appropriate anyway, but I can't tell.  Let's leave it alone as long as
it works.

>>    } else if (parser->token_size > MAX_TOKEN_SIZE ||
>>               parser->bracket_count > MAX_NESTING ||
>>               parser->brace_count > MAX_NESTING) {
>>        /* Security consideration, we limit total memory allocated per object
>>         * and the maximum recursion depth that a message can force.
>>         */
>>        goto out_emit;
>>    }
>>
>>    return;
>>
>>out_emit_bad:
>>    /* clear out token list and tell the parser to emit and error
>
> Typo: s/and error/an error/

Fixing, thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-11-23 17:21 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-29 12:44 [Qemu-devel] [PATCH 0/4] json-streamer: Fix up code to limit nesting and size Markus Armbruster
2015-10-29 12:44 ` [Qemu-devel] [PATCH 1/4] json-streamer: Apply nesting limit more sanely Markus Armbruster
2015-10-29 16:22   ` Eric Blake
2015-10-29 12:44 ` [Qemu-devel] [PATCH 2/4] json-streamer: Don't crash when input exceeds nesting limit Markus Armbruster
2015-10-29 16:25   ` Eric Blake
2015-11-23 17:21     ` Markus Armbruster
2015-10-29 12:44 ` [Qemu-devel] [PATCH 3/4] check-qjson: Add test for JSON nesting depth limit Markus Armbruster
2015-10-29 16:36   ` Eric Blake
2015-10-29 18:33     ` Markus Armbruster
2015-10-29 12:44 ` [Qemu-devel] [PATCH 4/4] json-streamer: Limit number of tokens in addition to total size Markus Armbruster
2015-10-29 16:43   ` Eric Blake
2015-10-29 18:27     ` Markus Armbruster
2015-10-29 23:35       ` Eric Blake
2015-10-30  7:52         ` Markus Armbruster
2015-10-30 15:22           ` Eric Blake

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).