From: sashiko-bot@kernel.org
To: "Yihan Ding" <dingyihan@uniontech.com>
Cc: bpf@vger.kernel.org
Subject: Re: [PATCH bpf v3 1/2] bpf: allow UTF-8 literals in bpf_bprintf_prepare()
Date: Thu, 16 Apr 2026 13:03:12 +0000
Message-ID: <20260416130312.E370FC2BCAF@smtp.kernel.org>
In-Reply-To: <20260416120142.1420646-2-dingyihan@uniontech.com>

Sashiko AI review found 1 potential issue:
- [Medium] UTF-8 literals are incorrectly rejected when placed immediately after '%s' or '%p' format specifiers.
--

commit c7ffc79748019b47230d65f7c4646b2bba78f020
Author: Yihan Ding <dingyihan@uniontech.com>

bpf: allow UTF-8 literals in bpf_bprintf_prepare()

This commit modifies bpf_bprintf_prepare() to permit bytes >= 0x80 in
plain text, allowing UTF-8 literals to pass through unchanged while still
rejecting ASCII control bytes.

> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index bb95e287b0dc8..2bb60200c2665 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -845,7 +845,13 @@ int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
>  		data->buf = buffers->buf;
>  
>  	for (i = 0; i < fmt_size; i++) {
> -		if ((!isprint(fmt[i]) && !isspace(fmt[i])) || !isascii(fmt[i])) {
> +		unsigned char c = fmt[i];
> +
> +		/*
> +		 * Permit bytes >= 0x80 in plain text so UTF-8 literals can pass
> +		 * through unchanged, while still rejecting ASCII control bytes.
> +		 */
> +		if (isascii(c) && !isprint(c) && !isspace(c)) {
>  			err = -EINVAL;
>  			goto out;
>  		}
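
For reference, the new per-byte rule can be modeled in userspace as follows.
This is only a sketch: fmt_byte_rejected() is a hypothetical name, it uses the
C library's <ctype.h> in the default "C" locale rather than the kernel's
ctype table, and "c < 0x80" stands in for isascii().

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Userspace model (an assumption, not kernel code) of the patched check:
 * a byte is rejected only if it is ASCII and neither printable nor
 * whitespace. Bytes >= 0x80 are let through untouched.
 */
static bool fmt_byte_rejected(unsigned char c)
{
	return c < 0x80 && !isprint(c) && !isspace(c);
}
```

Under this model, 0x01 and 0x7f are rejected, while 'a', '\n' and 0xe4 are
accepted.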

Does this change overlook the lookahead checks performed on characters
immediately following format specifiers like %s and %p?

For example, looking at the parsing logic for %s:

kernel/bpf/helpers.c:bpf_bprintf_prepare() {
    ...
		} else if (fmt[i] == 's') {
			fmt_ptype = fmt[i];
fmt_str:
			if (fmt[i + 1] != 0 &&
			    !isspace(fmt[i + 1]) &&
			    !ispunct(fmt[i + 1])) {
				err = -EINVAL;
				goto out;
			}
    ...
}

Because the initial loop now permits bytes >= 0x80, these bytes can appear
immediately after a format specifier. If a UTF-8 literal is placed immediately
after the specifier, such as in the string "%s你好", fmt[i + 1] will be the
first byte of the UTF-8 character.

Since neither isspace() nor ispunct() matches a UTF-8 lead byte such as
0xE4 (the first byte of "你"), both evaluate to false and the check fails.
Will this cause the parser to incorrectly reject valid UTF-8 strings with
-EINVAL?
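
The %s lookahead can be reproduced in isolation with a small userspace model.
This is a sketch under stated assumptions: next_byte_allowed_after_s() is a
hypothetical name, and it relies on the C library's <ctype.h> in the default
"C" locale instead of the kernel's ctype table.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model (an assumption, not kernel code) of the lookahead that
 * follows an 's' conversion at fmt[i]: the next byte must be NUL,
 * whitespace or punctuation. Returning false corresponds to the -EINVAL
 * path in the excerpt above.
 */
static bool next_byte_allowed_after_s(const char *fmt, size_t i)
{
	/* Cast first: passing a negative char to <ctype.h> is undefined. */
	unsigned char next = (unsigned char)fmt[i + 1];

	return next == 0 || isspace(next) || ispunct(next);
}
```

With i pointing at the 's', "%s world", "%s," and "%s" pass this check, while
"%s\xe4\xbd\xa0" (the bytes of "%s你") does not.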

A similar lookahead check exists for %p:

kernel/bpf/helpers.c:bpf_bprintf_prepare() {
    ...
		if (fmt[i] == 'p') {
			sizeof_cur_arg = sizeof(long);

			if (fmt[i + 1] == 0 || isspace(fmt[i + 1]) ||
			    ispunct(fmt[i + 1])) {
				if (tmp_buf)
					cur_arg = raw_args[num_spec];
				goto nocopy_fmt;
			}
    ...
}

If fmt[i + 1] is the first byte of a multi-byte UTF-8 sequence, such as 0xE4
in "你", it matches neither isspace() nor ispunct(). Parsing then falls
through to the %pI4/%pi4 checks, which also reject it with -EINVAL.

Can the lookahead checks be updated to also permit bytes >= 0x80, matching
the new plain text parsing rules?
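
One possible shape for such a relaxed check, shown as a userspace sketch
rather than a proposed patch: spec_terminator_ok() is a hypothetical helper
name, and it uses the C library's <ctype.h> in place of the kernel's ctype
table.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not in the kernel tree): decides whether the byte
 * following a %s or bare %p specifier may terminate it. In addition to
 * NUL, whitespace and punctuation, any byte >= 0x80 is accepted so that
 * a UTF-8 literal can directly follow the specifier.
 */
static bool spec_terminator_ok(unsigned char next)
{
	return next == 0 || next >= 0x80 || isspace(next) || ispunct(next);
}
```

With this predicate, 0xe4 is accepted after the specifier, while an ASCII
letter such as 'x' is still rejected as before.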

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260416120142.1420646-1-dingyihan@uniontech.com?part=1

