All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Richard W.M. Jones" <rjones@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH 1/2] util: Refactor json-writer's string sanitizer to be public
Date: Fri, 2 Aug 2024 22:00:27 +0100	[thread overview]
Message-ID: <20240802210027.GW1450@redhat.com> (raw)
In-Reply-To: <20240802194156.2131519-5-eblake@redhat.com>

On Fri, Aug 02, 2024 at 02:26:05PM -0500, Eric Blake wrote:
> My next patch needs to convert text from an untrusted input into an
> output representation that is suitable for display on a terminal is
> useful to more than just the json-writer; the text should normally be
> UTF-8, but blindly allowing all Unicode code points (including ASCII
> ESC) through to a terminal risks remote-code-execution attacks on some
> terminals.  Extract the existing body of json-writer's quoted_strinto
> a new helper routine mod_utf8_sanitize, and generalize it to also work
> on data that is length-limited rather than NUL-terminated.  [I was
> actually surprised that glib does not have such a sanitizer already -
> Google turns up lots of examples of rolling your own string
> sanitizer.]
> 
> If desired in the future, we may want to tweak whether the output is
> guaranteed to be ASCII (using lots of \u escape sequences, including
> surrogate pairs for code points outside the BMP) or if we are okay
> passing printable Unicode through (we still need to escape control
> characters).  But for now, I went for minimal code churn, including
> the fact that the resulting function allows a non-UTF-8 2-byte synonym
> for U+0000.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

I have to admit I'd never heard of "modified UTF-8" before, but it's a
thing:

https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8

As the patch is almost a simple code motion:

Reviewed-by: Richard W.M. Jones <rjones@redhat.com>

Rich.

> ---
>  include/qemu/unicode.h |  3 ++
>  qobject/json-writer.c  | 47 +----------------------
>  util/unicode.c         | 84 ++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 88 insertions(+), 46 deletions(-)
> 
> diff --git a/include/qemu/unicode.h b/include/qemu/unicode.h
> index 7fa10b8e604..e1013b29f72 100644
> --- a/include/qemu/unicode.h
> +++ b/include/qemu/unicode.h
> @@ -4,4 +4,7 @@
>  int mod_utf8_codepoint(const char *s, size_t n, char **end);
>  ssize_t mod_utf8_encode(char buf[], size_t bufsz, int codepoint);
> 
> +void mod_utf8_sanitize(GString *buf, const char *str);
> +void mod_utf8_sanitize_len(GString *buf, const char *str, ssize_t len);
> +
>  #endif
> diff --git a/qobject/json-writer.c b/qobject/json-writer.c
> index 309a31d57a9..5c14574efee 100644
> --- a/qobject/json-writer.c
> +++ b/qobject/json-writer.c
> @@ -104,53 +104,8 @@ static void pretty_newline_or_space(JSONWriter *writer)
> 
>  static void quoted_str(JSONWriter *writer, const char *str)
>  {
> -    const char *ptr;
> -    char *end;
> -    int cp;
> -
>      g_string_append_c(writer->contents, '"');
> -
> -    for (ptr = str; *ptr; ptr = end) {
> -        cp = mod_utf8_codepoint(ptr, 6, &end);
> -        switch (cp) {
> -        case '\"':
> -            g_string_append(writer->contents, "\\\"");
> -            break;
> -        case '\\':
> -            g_string_append(writer->contents, "\\\\");
> -            break;
> -        case '\b':
> -            g_string_append(writer->contents, "\\b");
> -            break;
> -        case '\f':
> -            g_string_append(writer->contents, "\\f");
> -            break;
> -        case '\n':
> -            g_string_append(writer->contents, "\\n");
> -            break;
> -        case '\r':
> -            g_string_append(writer->contents, "\\r");
> -            break;
> -        case '\t':
> -            g_string_append(writer->contents, "\\t");
> -            break;
> -        default:
> -            if (cp < 0) {
> -                cp = 0xFFFD; /* replacement character */
> -            }
> -            if (cp > 0xFFFF) {
> -                /* beyond BMP; need a surrogate pair */
> -                g_string_append_printf(writer->contents, "\\u%04X\\u%04X",
> -                                       0xD800 + ((cp - 0x10000) >> 10),
> -                                       0xDC00 + ((cp - 0x10000) & 0x3FF));
> -            } else if (cp < 0x20 || cp >= 0x7F) {
> -                g_string_append_printf(writer->contents, "\\u%04X", cp);
> -            } else {
> -                g_string_append_c(writer->contents, cp);
> -            }
> -        }
> -    };
> -
> +    mod_utf8_sanitize(writer->contents, str);
>      g_string_append_c(writer->contents, '"');
>  }
> 
> diff --git a/util/unicode.c b/util/unicode.c
> index 8580bc598b3..a419ed4de35 100644
> --- a/util/unicode.c
> +++ b/util/unicode.c
> @@ -154,3 +154,87 @@ ssize_t mod_utf8_encode(char buf[], size_t bufsz, int codepoint)
>      buf[4] = 0;
>      return 4;
>  }
> +
> +/**
> + * mod_utf8_sanitize:
> + * @buf: Destination buffer
> + * @str: Modified UTF-8 string to sanitize
> + *
> + * Append the contents of the NUL-terminated Modified UTF-8 string
> + * @str into @buf, with escape sequences used for non-printable ASCII
> + * and for quote characters.  The result is therefore safe for output
> + * to a terminal.
> + *
> + * Modified UTF-8 is exactly like UTF-8, except U+0000 is encoded as
> + * "\xC0\x80".
> + */
> +void mod_utf8_sanitize(GString *buf, const char *str)
> +{
> +    mod_utf8_sanitize_len(buf, str, -1);
> +}
> +
> +/**
> + * mod_utf8_sanitize:
> + * @buf: Destination buffer
> + * @str: Modified UTF-8 string to sanitize
> + * @len: Length of @str, or negative to stop at NUL terminator
> + *
> + * Append the contents of @len bytes of the Modified UTF-8 string
> + * @str into @buf, with escape sequences used for non-printable ASCII
> + * and for quote characters.  The result is therefore safe for output
> + * to a terminal.
> + *
> + * Modified UTF-8 is exactly like UTF-8, except U+0000 is encoded as
> + * "\xC0\x80".
> + */
> +void mod_utf8_sanitize_len(GString *buf, const char *str, ssize_t len)
> +{
> +    const char *ptr;
> +    char *end;
> +    int cp;
> +
> +    if (len < 0) {
> +        len = strlen(str);
> +    }
> +
> +    for (ptr = str; *ptr; ptr = end) {
> +        cp = mod_utf8_codepoint(ptr, MIN(6, str + len - ptr), &end);
> +        switch (cp) {
> +        case '\"':
> +            g_string_append(buf, "\\\"");
> +            break;
> +        case '\\':
> +            g_string_append(buf, "\\\\");
> +            break;
> +        case '\b':
> +            g_string_append(buf, "\\b");
> +            break;
> +        case '\f':
> +            g_string_append(buf, "\\f");
> +            break;
> +        case '\n':
> +            g_string_append(buf, "\\n");
> +            break;
> +        case '\r':
> +            g_string_append(buf, "\\r");
> +            break;
> +        case '\t':
> +            g_string_append(buf, "\\t");
> +            break;
> +        default:
> +            if (cp < 0) {
> +                cp = 0xFFFD; /* replacement character */
> +            }
> +            if (cp > 0xFFFF) {
> +                /* beyond BMP; need a surrogate pair */
> +                g_string_append_printf(buf, "\\u%04X\\u%04X",
> +                                       0xD800 + ((cp - 0x10000) >> 10),
> +                                       0xDC00 + ((cp - 0x10000) & 0x3FF));
> +            } else if (cp < 0x20 || cp >= 0x7F) {
> +                g_string_append_printf(buf, "\\u%04X", cp);
> +            } else {
> +                g_string_append_c(buf, cp);
> +            }
> +        }
> +    }
> +}
> -- 
> 2.45.2

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v



  reply	other threads:[~2024-08-02 21:01 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-02 19:26 [PATCH for-9.1 0/2] NBD: don't print raw server error text to terminal Eric Blake
2024-08-02 19:26 ` [PATCH 1/2] util: Refactor json-writer's string sanitizer to be public Eric Blake
2024-08-02 21:00   ` Richard W.M. Jones [this message]
2024-08-02 21:38   ` Philippe Mathieu-Daudé
2024-08-07 18:49   ` Daniel P. Berrangé
2024-08-08  7:57     ` Markus Armbruster
2024-08-08  7:54   ` Markus Armbruster
2024-08-08 14:02     ` Eric Blake
2024-08-02 19:26 ` [PATCH 2/2] qemu-img: CVE-XXX Sanitize untrusted output from NBD server Eric Blake
2024-08-02 21:03   ` Richard W.M. Jones
2024-08-07 18:45     ` Daniel P. Berrangé
2024-08-02 21:41   ` Philippe Mathieu-Daudé
2024-08-07 13:43     ` Stefan Hajnoczi
2024-08-02 22:01   ` Richard W.M. Jones
2024-08-03  8:20     ` Richard W.M. Jones
2024-08-05 18:48 ` [PATCH for-9.1 0/2] NBD: don't print raw server error text to terminal Eric Blake
2024-08-05 19:11   ` Richard W.M. Jones
2024-08-07 17:51     ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240802210027.GW1450@redhat.com \
    --to=rjones@redhat.com \
    --cc=armbru@redhat.com \
    --cc=eblake@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.