From: Paul Chaignon <paul.chaignon@gmail.com>
To: Yihan Ding <dingyihan@uniontech.com>
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
bpf@vger.kernel.org, shuah@kernel.org,
linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
kernel@uniontech.com
Subject: Re: [PATCH bpf] bpf: allow UTF-8 literals in bpf_bprintf_prepare()
Date: Tue, 14 Apr 2026 17:26:02 +0200
Message-ID: <ad5ciqi7KAq0C9A3@mail.gmail.com>
In-Reply-To: <20260414014001.814324-1-dingyihan@uniontech.com>

On Tue, Apr 14, 2026 at 09:40:01AM +0800, Yihan Ding wrote:
> bpf_bprintf_prepare() currently rejects any non-ASCII byte in the format
> string, such as UTF-8 Chinese text.
>
> All BPF formatted output helpers that go through bpf_bprintf_prepare(),
> such as bpf_trace_printk(), bpf_seq_printf() and bpf_snprintf(), only
> need ASCII parsing for conversion specifiers. Plain text does not need
> that restriction, but today any byte >= 0x80 makes the format fail
> validation.
>
> As a result, UTF-8 text literals are rejected even when they are not part
> of a format specifier. In practice, an ASCII-only bpf_trace_printk()
> format works, while the same format with UTF-8 literal text produces no
> trace output.
>
> Allow non-ASCII bytes in plain text while keeping the existing control
> character checks and keeping format specifiers ASCII-only. This preserves
> the current parsing rules for '%' sequences and allows valid UTF-8 text
> to be emitted.
Nice! I can finally print proper French from BPF :)
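
For readers following along, here is a rough userspace sketch of the old and
new byte checks as described in the commit message above. It is not the
kernel code: old_rule_accepts(), new_rule_accepts() and byte_is_ascii() are
hypothetical names, the hosted <ctype.h> stands in for the kernel's, and the
'%' handling is reduced to the single "byte after '%' must be ASCII" check.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the byte is 7-bit ASCII. */
static bool byte_is_ascii(unsigned char c)
{
	return c <= 0x7f;
}

/* Old rule: every byte must be ASCII and either printable or whitespace. */
static bool old_rule_accepts(const char *fmt)
{
	size_t len;

	assert(fmt != NULL);
	len = strlen(fmt);
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)fmt[i];

		if (!byte_is_ascii(c) || (!isprint(c) && !isspace(c)))
			return false;
	}
	return true;
}

/*
 * New rule (simplified): ASCII control characters are still rejected,
 * bytes >= 0x80 pass in plain text, and the byte right after a '%'
 * must be ASCII. Full specifier parsing is deliberately left out.
 */
static bool new_rule_accepts(const char *fmt)
{
	size_t len;

	assert(fmt != NULL);
	len = strlen(fmt);
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)fmt[i];
		unsigned char next;

		if (byte_is_ascii(c) && !isprint(c) && !isspace(c))
			return false;
		if (c != '%')
			continue;

		/* fmt is NUL-terminated, so fmt[i + 1] is always readable. */
		next = (unsigned char)fmt[i + 1];
		if (next == '%') {
			i++;	/* literal "%%" */
			continue;
		}
		if (!byte_is_ascii(next))
			return false;
	}
	return true;
}
```

With this sketch, a format such as "\xe4\xb8\xad\xe6\x96\x87 %d\n" (UTF-8
bytes written as escapes) is rejected by the old rule and accepted by the new
one, while a control byte in the text or a non-ASCII byte directly after '%'
is still refused.
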
>
> Extend the trace_printk selftest accordingly by emitting both ASCII and
> UTF-8 strings and verifying that both appear in the trace output.
>
> Fixes: 48cac3f4a96d ("bpf: Implement formatted output helpers with bstr_printf")
> Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
Not sure if this should go to bpf or bpf-next. I'll let the maintainers
decide.
> ---
> Testing:
> - Reproduced on x86_64 without this patch: ASCII trace output works, while
> UTF-8 literal text in bpf_trace_printk() is rejected and produces no trace
> output.
> - Verified with tools/testing/selftests/bpf: ./test_progs -t trace_printk
>
> kernel/bpf/helpers.c | 21 +++++++++++++-----
> .../selftests/bpf/prog_tests/trace_printk.c | 22 ++++++++++++++-----
> .../selftests/bpf/progs/trace_printk.c | 5 +++++
> 3 files changed, 38 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 6eb6c82ed2ee..e2f103297e4a 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -845,7 +845,13 @@ int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
> data->buf = buffers->buf;
>
> for (i = 0; i < fmt_size; i++) {
> - if ((!isprint(fmt[i]) && !isspace(fmt[i])) || !isascii(fmt[i])) {
> + unsigned char c = fmt[i];
> +
> + /*
> + * Permit non-ASCII bytes in plain text so UTF-8 messages can be
> + * emitted, while keeping format specifiers ASCII-only.
> + */
> + if (isascii(c) && !isprint(c) && !isspace(c)) {
> err = -EINVAL;
> goto out;
> }
> @@ -867,6 +873,10 @@ int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
> * always access fmt[i + 1], in the worst case it will be a 0
> */
> i++;
> + if (!isascii((unsigned char)fmt[i])) {
> + err = -EINVAL;
> + goto out;
> + }
>
> /* skip optional "[0 +-][num]" width formatting field */
> while (fmt[i] == '0' || fmt[i] == '+' || fmt[i] == '-' ||
> @@ -881,8 +891,9 @@ int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
> if (fmt[i] == 'p') {
> sizeof_cur_arg = sizeof(long);
>
> - if (fmt[i + 1] == 0 || isspace(fmt[i + 1]) ||
> - ispunct(fmt[i + 1])) {
> + if (fmt[i + 1] == 0 ||
> + isspace((unsigned char)fmt[i + 1]) ||
> + ispunct((unsigned char)fmt[i + 1])) {
Why is this change needed? isspace and ispunct already cast to unsigned
char.
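
To illustrate what I mean, here is a small userspace sketch of table-driven
classifiers that cast their argument to unsigned char before indexing, which
is how I remember include/linux/ctype.h doing it. All sk_* names are
hypothetical, and the table contents are only a stand-in filled from the
hosted <ctype.h> for ASCII values; the kernel's real table is different. The
point is only that an explicit cast at the call site changes nothing.

```c
#include <assert.h>
#include <ctype.h>

#define SK_SPACE 0x01
#define SK_PUNCT 0x02

/* Stand-in classification table, one entry per byte value. */
static unsigned char sk_table[256];

static void sk_init(void)
{
	for (int i = 0; i < 256; i++) {
		sk_table[i] = 0;
		if (i < 0x80 && isspace(i))
			sk_table[i] |= SK_SPACE;
		if (i < 0x80 && ispunct(i))
			sk_table[i] |= SK_PUNCT;
	}
}

/* The macro itself casts to unsigned char before indexing the table. */
#define sk_mask(x)	(sk_table[(int)(unsigned char)(x)])
#define sk_isspace(c)	((sk_mask(c) & SK_SPACE) != 0)
#define sk_ispunct(c)	((sk_mask(c) & SK_PUNCT) != 0)

/*
 * Count the char values for which an explicit cast at the call site
 * gives a different answer than passing the (possibly negative) char.
 */
static int count_cast_mismatches(void)
{
	int mismatches = 0;

	sk_init();
	for (int v = -128; v <= 127; v++) {
		signed char c = (signed char)v;

		if (sk_isspace(c) != sk_isspace((unsigned char)c))
			mismatches++;
		if (sk_ispunct(c) != sk_ispunct((unsigned char)c))
			mismatches++;
	}
	return mismatches;
}

/* Convenience wrapper: aborts if the casts ever make a difference. */
static void self_check(void)
{
	assert(count_cast_mismatches() == 0);
}
```

If the kernel macros are built the same way, the extra casts in this hunk are
no-ops and can be dropped from the patch.
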
> if (tmp_buf)
> cur_arg = raw_args[num_spec];
> goto nocopy_fmt;
> @@ -958,8 +969,8 @@ int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
> fmt_ptype = fmt[i];
> fmt_str:
> if (fmt[i + 1] != 0 &&
> - !isspace(fmt[i + 1]) &&
> - !ispunct(fmt[i + 1])) {
> + !isspace((unsigned char)fmt[i + 1]) &&
> + !ispunct((unsigned char)fmt[i + 1])) {
Same here.
> err = -EINVAL;
> goto out;
> }
> diff --git a/tools/testing/selftests/bpf/prog_tests/trace_printk.c b/tools/testing/selftests/bpf/prog_tests/trace_printk.c
> index e56e88596d64..f7b03dc4eaf4 100644
> --- a/tools/testing/selftests/bpf/prog_tests/trace_printk.c
> +++ b/tools/testing/selftests/bpf/prog_tests/trace_printk.c
> @@ -6,18 +6,21 @@
> #include "trace_printk.lskel.h"
>
> #define SEARCHMSG "testing,testing"
> +#define SEARCHMSG_UTF8 "中文,测试"
>
> static void trace_pipe_cb(const char *str, void *data)
> {
> if (strstr(str, SEARCHMSG) != NULL)
> - (*(int *)data)++;
> + ((int *)data)[0]++;
> + if (strstr(str, SEARCHMSG_UTF8) != NULL)
> + ((int *)data)[1]++;
> }
>
> void serial_test_trace_printk(void)
> {
> struct trace_printk_lskel__bss *bss;
> struct trace_printk_lskel *skel;
> - int err = 0, found = 0;
> + int err = 0, found[2] = {};
>
> skel = trace_printk_lskel__open();
> if (!ASSERT_OK_PTR(skel, "trace_printk__open"))
> @@ -46,11 +49,20 @@ void serial_test_trace_printk(void)
> if (!ASSERT_GT(bss->trace_printk_ret, 0, "bss->trace_printk_ret"))
> goto cleanup;
>
> - /* verify our search string is in the trace buffer */
> - ASSERT_OK(read_trace_pipe_iter(trace_pipe_cb, &found, 1000),
> + if (!ASSERT_GT(bss->trace_printk_utf8_ran, 0, "bss->trace_printk_utf8_ran"))
> + goto cleanup;
> +
> + if (!ASSERT_GT(bss->trace_printk_utf8_ret, 0, "bss->trace_printk_utf8_ret"))
> + goto cleanup;
> +
> + /* verify our search strings are in the trace buffer */
> + ASSERT_OK(read_trace_pipe_iter(trace_pipe_cb, found, 1000),
> "read_trace_pipe_iter");
>
> - if (!ASSERT_EQ(found, bss->trace_printk_ran, "found"))
> + if (!ASSERT_EQ(found[0], bss->trace_printk_ran, "found"))
> + goto cleanup;
> +
> + if (!ASSERT_EQ(found[1], bss->trace_printk_utf8_ran, "found_utf8"))
> goto cleanup;
>
> cleanup:
> diff --git a/tools/testing/selftests/bpf/progs/trace_printk.c b/tools/testing/selftests/bpf/progs/trace_printk.c
> index 6695478c2b25..97afe8b149b0 100644
> --- a/tools/testing/selftests/bpf/progs/trace_printk.c
> +++ b/tools/testing/selftests/bpf/progs/trace_printk.c
> @@ -10,13 +10,18 @@ char _license[] SEC("license") = "GPL";
>
> int trace_printk_ret = 0;
> int trace_printk_ran = 0;
> +int trace_printk_utf8_ret = 0;
> +int trace_printk_utf8_ran = 0;
>
> -const char fmt[] = "Testing,testing %d\n";
> +static const char fmt[] = "Testing,testing %d\n";
> +static const char utf8_fmt[] = "中文,测试 %d\n";
The build is failing because you made these static.
>
> SEC("fentry/" SYS_PREFIX "sys_nanosleep")
> int sys_enter(void *ctx)
> {
> trace_printk_ret = bpf_trace_printk(fmt, sizeof(fmt),
> ++trace_printk_ran);
> + trace_printk_utf8_ret = bpf_trace_printk(utf8_fmt, sizeof(utf8_fmt),
> + ++trace_printk_utf8_ran);
> return 0;
> }
Please put the selftest coverage extensions in a second patch.
pw-bot: cr
> --
> 2.20.1
>