* [PATCH bpf v2 0/2] bpf: allow UTF-8 literals in bpf_bprintf_prepare()
From: Yihan Ding @ 2026-04-15  3:21 UTC
To: bpf
Cc: ast, daniel, andrii, shuah, linux-kernel, paul.chaignon, alan.maguire, kernel, Yihan Ding

bpf_bprintf_prepare() currently rejects any non-ASCII byte in format
strings, so helpers such as bpf_trace_printk() fail to emit UTF-8
literal text even when those bytes are not part of a format specifier.

Keep plain text permissive while continuing to parse '%' sequences as
ASCII-only, then extend trace_printk selftests to cover both the valid
UTF-8 literal case and the invalid non-ASCII-after-'%' case.

Changes in v2:
- split the core change and selftest updates into two patches
- drop unnecessary isspace()/ispunct() casts
- add comments to clarify plain-text vs format-specifier handling
- add a negative selftest for non-ASCII bytes inside '%' sequences

Testing:
- Reproduced on x86_64 without the core fix: ASCII trace output works,
  while UTF-8 literal text in bpf_trace_printk() is rejected and
  produces no trace output
- Verified with tools/testing/selftests/bpf:
  ./test_progs -t trace_printk

Yihan Ding (2):
  bpf: allow UTF-8 literals in bpf_bprintf_prepare()
  selftests/bpf: cover UTF-8 trace_printk output

 kernel/bpf/helpers.c                          | 16 +++++++++++-
 .../selftests/bpf/prog_tests/trace_printk.c   | 26 +++++++++++++++----
 .../selftests/bpf/progs/trace_printk.c        |  9 +++++++
 3 files changed, 45 insertions(+), 6 deletions(-)

--
2.20.1

* [PATCH bpf v2 1/2] bpf: allow UTF-8 literals in bpf_bprintf_prepare()
From: Yihan Ding @ 2026-04-15  3:21 UTC
To: bpf
Cc: ast, daniel, andrii, shuah, linux-kernel, paul.chaignon, alan.maguire, kernel, Yihan Ding

bpf_bprintf_prepare() only needs ASCII parsing for conversion
specifiers. Plain text can safely carry bytes >= 0x80, so allow
UTF-8 literals outside '%' sequences while keeping ASCII control
bytes rejected and format specifiers ASCII-only.

This keeps existing parsing rules for format directives unchanged,
while allowing helpers such as bpf_trace_printk() to emit UTF-8
literal text.

Fixes: 48cac3f4a96d ("bpf: Implement formatted output helpers with bstr_printf")
Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
---
 kernel/bpf/helpers.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 6eb6c82ed2ee..6319b39c92f9 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -845,7 +845,13 @@ int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
 		data->buf = buffers->buf;

 	for (i = 0; i < fmt_size; i++) {
-		if ((!isprint(fmt[i]) && !isspace(fmt[i])) || !isascii(fmt[i])) {
+		unsigned char c = fmt[i];
+
+		/*
+		 * Permit bytes >= 0x80 in plain text so UTF-8 literals can pass
+		 * through unchanged, while still rejecting ASCII control bytes.
+		 */
+		if (isascii(c) && !isprint(c) && !isspace(c)) {
 			err = -EINVAL;
 			goto out;
 		}
@@ -867,6 +873,14 @@ int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
 		 * always access fmt[i + 1], in the worst case it will be a 0
 		 */
 		i++;
+		/*
+		 * The format parser below only understands ASCII conversion
+		 * specifiers and modifiers, so reject non-ASCII after '%'.
+		 */
+		if (!isascii((unsigned char)fmt[i])) {
+			err = -EINVAL;
+			goto out;
+		}

 		/* skip optional "[0 +-][num]" width formatting field */
 		while (fmt[i] == '0' || fmt[i] == '+' || fmt[i] == '-' ||
--
2.20.1

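The byte-level rule described in this patch can be tried out in userspace. The following is only a rough sketch under stated assumptions, not the kernel code: `fmt_bytes_ok()` and `fmt_bytes_ok_demo()` are hypothetical names, the C library's `isprint()`/`isspace()` stand in for the kernel's ctype helpers, and the sketch checks only the first byte after '%' without parsing the conversion specifier itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace helper (not a kernel function): returns true
 * when every byte of a NUL-terminated format string would pass the two
 * byte checks described in the patch.
 */
bool fmt_bytes_ok(const char *fmt, size_t fmt_size)
{
	const char *end = memchr(fmt, '\0', fmt_size);
	size_t len, i;

	/* Assumption of this sketch: an unterminated buffer is rejected. */
	if (!end)
		return false;
	len = (size_t)(end - fmt);

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)fmt[i];

		/* Plain text: reject ASCII control bytes, allow bytes >= 0x80. */
		if (c < 0x80 && !isprint(c) && !isspace(c))
			return false;

		if (c != '%')
			continue;

		/* "%%" is a literal percent sign; skip both bytes. */
		if (fmt[i + 1] == '%') {
			i++;
			continue;
		}

		/*
		 * fmt[i + 1] is readable: at worst it is the terminating NUL.
		 * The first byte of a '%' sequence must be ASCII.
		 */
		if ((unsigned char)fmt[i + 1] >= 0x80)
			return false;
	}

	return true;
}

/* Small self-check using the two format strings from the selftest patch. */
void fmt_bytes_ok_demo(void)
{
	static const char utf8_fmt[] = "中文,测试 %d\n";
	static const char utf8_spec_fmt[] = "%中文\n";

	assert(fmt_bytes_ok(utf8_fmt, sizeof(utf8_fmt)));
	assert(!fmt_bytes_ok(utf8_spec_fmt, sizeof(utf8_spec_fmt)));
}
```

The two checks are kept separate on purpose: the first applies to every byte, the second only to the byte that starts a '%' sequence, which mirrors the plain-text versus format-specifier split the commit message describes.
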
* Re: [PATCH bpf v2 1/2] bpf: allow UTF-8 literals in bpf_bprintf_prepare()
From: Paul Chaignon @ 2026-04-15 10:44 UTC
To: Yihan Ding
Cc: bpf, ast, daniel, andrii, shuah, linux-kernel, alan.maguire, kernel

On Wed, Apr 15, 2026 at 11:21:25AM +0800, Yihan Ding wrote:
> bpf_bprintf_prepare() only needs ASCII parsing for conversion
> specifiers. Plain text can safely carry bytes >= 0x80, so allow
> UTF-8 literals outside '%' sequences while keeping ASCII control
> bytes rejected and format specifiers ASCII-only.
>
> This keeps existing parsing rules for format directives unchanged,
> while allowing helpers such as bpf_trace_printk() to emit UTF-8
> literal text.
>
> Fixes: 48cac3f4a96d ("bpf: Implement formatted output helpers with bstr_printf")
> Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>

I don't think this tag is appropriate here. If you want to give credit
for changes made after reviews, you can do so in the Changelogs of the
cover letter :)

> Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
> ---
>  kernel/bpf/helpers.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 6eb6c82ed2ee..6319b39c92f9 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -845,7 +845,13 @@ int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
>  		data->buf = buffers->buf;
>
>  	for (i = 0; i < fmt_size; i++) {
> -		if ((!isprint(fmt[i]) && !isspace(fmt[i])) || !isascii(fmt[i])) {
> +		unsigned char c = fmt[i];
> +
> +		/*
> +		 * Permit bytes >= 0x80 in plain text so UTF-8 literals can pass
> +		 * through unchanged, while still rejecting ASCII control bytes.
> +		 */
> +		if (isascii(c) && !isprint(c) && !isspace(c)) {
>  			err = -EINVAL;
>  			goto out;
>  		}
> @@ -867,6 +873,14 @@ int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
>  		 * always access fmt[i + 1], in the worst case it will be a 0
>  		 */
>  		i++;
> +		/*
> +		 * The format parser below only understands ASCII conversion
> +		 * specifiers and modifiers, so reject non-ASCII after '%'.
> +		 */
> +		if (!isascii((unsigned char)fmt[i])) {
> +			err = -EINVAL;
> +			goto out;
> +		}

Acked-by: Paul Chaignon <paul.chaignon@gmail.com>

>
>  		/* skip optional "[0 +-][num]" width formatting field */
>  		while (fmt[i] == '0' || fmt[i] == '+' || fmt[i] == '-' ||
> --
> 2.20.1

* Re: [PATCH bpf v2 1/2] bpf: allow UTF-8 literals in bpf_bprintf_prepare()
From: Paul Chaignon @ 2026-04-15 10:49 UTC
To: Yihan Ding
Cc: bpf, ast, daniel, andrii, shuah, linux-kernel, alan.maguire, kernel

On Wed, Apr 15, 2026 at 12:45:02PM +0200, Paul Chaignon wrote:
> On Wed, Apr 15, 2026 at 11:21:25AM +0800, Yihan Ding wrote:
> > bpf_bprintf_prepare() only needs ASCII parsing for conversion
> > specifiers. Plain text can safely carry bytes >= 0x80, so allow
> > UTF-8 literals outside '%' sequences while keeping ASCII control
> > bytes rejected and format specifiers ASCII-only.
> >
> > This keeps existing parsing rules for format directives unchanged,
> > while allowing helpers such as bpf_trace_printk() to emit UTF-8
> > literal text.
> >
> > Fixes: 48cac3f4a96d ("bpf: Implement formatted output helpers with bstr_printf")
> > Suggested-by: Paul Chaignon <paul.chaignon@gmail.com>
>
> I don't think this tag is appropriate here. If you want to give credit
> for changes made after reviews, you can do so in the Changelogs of the
> cover letter :)
>
> > Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
> > ---
> >  kernel/bpf/helpers.c | 16 +++++++++++++++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 6eb6c82ed2ee..6319b39c92f9 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -845,7 +845,13 @@ int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
> >  		data->buf = buffers->buf;
> >
> >  	for (i = 0; i < fmt_size; i++) {
> > -		if ((!isprint(fmt[i]) && !isspace(fmt[i])) || !isascii(fmt[i])) {
> > +		unsigned char c = fmt[i];
> > +
> > +		/*
> > +		 * Permit bytes >= 0x80 in plain text so UTF-8 literals can pass
> > +		 * through unchanged, while still rejecting ASCII control bytes.
> > +		 */
> > +		if (isascii(c) && !isprint(c) && !isspace(c)) {
> >  			err = -EINVAL;
> >  			goto out;
> >  		}
> > @@ -867,6 +873,14 @@ int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
> >  		 * always access fmt[i + 1], in the worst case it will be a 0
> >  		 */
> >  		i++;
> > +		/*
> > +		 * The format parser below only understands ASCII conversion
> > +		 * specifiers and modifiers, so reject non-ASCII after '%'.
> > +		 */
> > +		if (!isascii((unsigned char)fmt[i])) {
> > +			err = -EINVAL;
> > +			goto out;
> > +		}
>
> Acked-by: Paul Chaignon <paul.chaignon@gmail.com>

Actually, this patch will require changes to fixup the existing
test_snprintf_negative() selftest that is currently failing. That fixup
needs to be in this commit to not break selftests during bisections.

>
> >
> >  		/* skip optional "[0 +-][num]" width formatting field */
> >  		while (fmt[i] == '0' || fmt[i] == '+' || fmt[i] == '-' ||
> > --
> > 2.20.1

* [PATCH bpf v2 2/2] selftests/bpf: cover UTF-8 trace_printk output
From: Yihan Ding @ 2026-04-15  3:21 UTC
To: bpf
Cc: ast, daniel, andrii, shuah, linux-kernel, paul.chaignon, alan.maguire, kernel, Yihan Ding

Extend trace_printk coverage to verify that UTF-8 literal text is
emitted successfully and that non-ASCII bytes are still rejected once
parsing is inside a '%' format sequence.

Suggested-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
---
 .../selftests/bpf/prog_tests/trace_printk.c   | 26 +++++++++++++++----
 .../selftests/bpf/progs/trace_printk.c        |  9 +++++++
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/trace_printk.c b/tools/testing/selftests/bpf/prog_tests/trace_printk.c
index e56e88596d64..40499f01d228 100644
--- a/tools/testing/selftests/bpf/prog_tests/trace_printk.c
+++ b/tools/testing/selftests/bpf/prog_tests/trace_printk.c
@@ -6,18 +6,21 @@
 #include "trace_printk.lskel.h"

 #define SEARCHMSG "testing,testing"
+#define SEARCHMSG_UTF8 "中文,测试"

 static void trace_pipe_cb(const char *str, void *data)
 {
 	if (strstr(str, SEARCHMSG) != NULL)
-		(*(int *)data)++;
+		((int *)data)[0]++;
+	if (strstr(str, SEARCHMSG_UTF8) != NULL)
+		((int *)data)[1]++;
 }

 void serial_test_trace_printk(void)
 {
 	struct trace_printk_lskel__bss *bss;
 	struct trace_printk_lskel *skel;
-	int err = 0, found = 0;
+	int err = 0, found[2] = {};

 	skel = trace_printk_lskel__open();
 	if (!ASSERT_OK_PTR(skel, "trace_printk__open"))
@@ -46,11 +49,24 @@ void serial_test_trace_printk(void)
 	if (!ASSERT_GT(bss->trace_printk_ret, 0, "bss->trace_printk_ret"))
 		goto cleanup;

-	/* verify our search string is in the trace buffer */
-	ASSERT_OK(read_trace_pipe_iter(trace_pipe_cb, &found, 1000),
+	if (!ASSERT_GT(bss->trace_printk_utf8_ran, 0, "bss->trace_printk_utf8_ran"))
+		goto cleanup;
+
+	if (!ASSERT_GT(bss->trace_printk_utf8_ret, 0, "bss->trace_printk_utf8_ret"))
+		goto cleanup;
+
+	if (!ASSERT_LT(bss->trace_printk_utf8_spec_ret, 0,
+		       "bss->trace_printk_utf8_spec_ret"))
+		goto cleanup;
+
+	/* verify our search strings are in the trace buffer */
+	ASSERT_OK(read_trace_pipe_iter(trace_pipe_cb, found, 1000),
 		 "read_trace_pipe_iter");

-	if (!ASSERT_EQ(found, bss->trace_printk_ran, "found"))
+	if (!ASSERT_EQ(found[0], bss->trace_printk_ran, "found"))
+		goto cleanup;
+
+	if (!ASSERT_EQ(found[1], bss->trace_printk_utf8_ran, "found_utf8"))
 		goto cleanup;

 cleanup:
diff --git a/tools/testing/selftests/bpf/progs/trace_printk.c b/tools/testing/selftests/bpf/progs/trace_printk.c
index 6695478c2b25..62153d8c5eba 100644
--- a/tools/testing/selftests/bpf/progs/trace_printk.c
+++ b/tools/testing/selftests/bpf/progs/trace_printk.c
@@ -10,13 +10,22 @@ char _license[] SEC("license") = "GPL";

 int trace_printk_ret = 0;
 int trace_printk_ran = 0;
+int trace_printk_utf8_spec_ret = 0;
+int trace_printk_utf8_ret = 0;
+int trace_printk_utf8_ran = 0;

 const char fmt[] = "Testing,testing %d\n";
+static const char utf8_fmt[] = "中文,测试 %d\n";
+static const char utf8_spec_fmt[] = "%中文\n";

 SEC("fentry/" SYS_PREFIX "sys_nanosleep")
 int sys_enter(void *ctx)
 {
 	trace_printk_ret = bpf_trace_printk(fmt, sizeof(fmt),
 					    ++trace_printk_ran);
+	trace_printk_utf8_ret = bpf_trace_printk(utf8_fmt, sizeof(utf8_fmt),
+						 ++trace_printk_utf8_ran);
+	trace_printk_utf8_spec_ret = bpf_trace_printk(utf8_spec_fmt,
+						      sizeof(utf8_spec_fmt));
 	return 0;
 }
--
2.20.1

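The selftest change turns the callback's `data` argument from a pointer to one counter into a pointer to an array of two counters, which is why `found` is now passed instead of `&found`. A minimal userspace sketch of that pattern follows. It makes several assumptions: `count_matches_cb()`, `feed_lines()` and `count_matches_demo()` are hypothetical names, `feed_lines()` is only a stand-in for read_trace_pipe_iter(), and the sample lines are invented rather than real trace_pipe output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEARCHMSG "testing,testing"
#define SEARCHMSG_UTF8 "中文,测试"

/* Same shape as the patched trace_pipe_cb(): data points at an int[2]. */
void count_matches_cb(const char *str, void *data)
{
	int *found = data;

	if (strstr(str, SEARCHMSG) != NULL)
		found[0]++;	/* ASCII message counter */
	if (strstr(str, SEARCHMSG_UTF8) != NULL)
		found[1]++;	/* UTF-8 message counter */
}

/* Hypothetical stand-in for read_trace_pipe_iter(): feeds fixed lines to cb. */
void feed_lines(const char *const *lines, size_t n,
		void (*cb)(const char *, void *), void *data)
{
	for (size_t i = 0; i < n; i++)
		cb(lines[i], data);
}

void count_matches_demo(void)
{
	/* Invented sample lines, not real trace_pipe output. */
	static const char *const lines[] = {
		"testing,testing 1",
		"中文,测试 1",
		"unrelated line",
	};
	int found[2] = { 0, 0 };

	/* The array decays to int *, so no '&' is needed here. */
	feed_lines(lines, 3, count_matches_cb, found);
	assert(found[0] == 1);
	assert(found[1] == 1);
}
```

strstr() compares bytes, so it matches the UTF-8 needle without any locale or multibyte handling, as long as both strings use the same encoding.
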
* Re: [PATCH bpf v2 2/2] selftests/bpf: cover UTF-8 trace_printk output
From: Paul Chaignon @ 2026-04-15 10:46 UTC
To: Yihan Ding
Cc: bpf, ast, daniel, andrii, shuah, linux-kernel, alan.maguire, kernel

On Wed, Apr 15, 2026 at 11:21:26AM +0800, Yihan Ding wrote:
> Extend trace_printk coverage to verify that UTF-8 literal text is
> emitted successfully and that non-ASCII bytes are still rejected once
> parsing is inside a '%' format sequence.
>
> Suggested-by: Alan Maguire <alan.maguire@oracle.com>
> Signed-off-by: Yihan Ding <dingyihan@uniontech.com>
> ---
>  .../selftests/bpf/prog_tests/trace_printk.c   | 26 +++++++++++++++----
>  .../selftests/bpf/progs/trace_printk.c        |  9 +++++++
>  2 files changed, 30 insertions(+), 5 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/trace_printk.c b/tools/testing/selftests/bpf/prog_tests/trace_printk.c
> index e56e88596d64..40499f01d228 100644
> --- a/tools/testing/selftests/bpf/prog_tests/trace_printk.c
> +++ b/tools/testing/selftests/bpf/prog_tests/trace_printk.c
> @@ -6,18 +6,21 @@
>  #include "trace_printk.lskel.h"
>
>  #define SEARCHMSG "testing,testing"
> +#define SEARCHMSG_UTF8 "中文,测试"
>
>  static void trace_pipe_cb(const char *str, void *data)
>  {
>  	if (strstr(str, SEARCHMSG) != NULL)
> -		(*(int *)data)++;
> +		((int *)data)[0]++;
> +	if (strstr(str, SEARCHMSG_UTF8) != NULL)
> +		((int *)data)[1]++;
>  }
>
>  void serial_test_trace_printk(void)
>  {
>  	struct trace_printk_lskel__bss *bss;
>  	struct trace_printk_lskel *skel;
> -	int err = 0, found = 0;
> +	int err = 0, found[2] = {};
>
>  	skel = trace_printk_lskel__open();
>  	if (!ASSERT_OK_PTR(skel, "trace_printk__open"))
> @@ -46,11 +49,24 @@ void serial_test_trace_printk(void)
>  	if (!ASSERT_GT(bss->trace_printk_ret, 0, "bss->trace_printk_ret"))
>  		goto cleanup;
>
> -	/* verify our search string is in the trace buffer */
> -	ASSERT_OK(read_trace_pipe_iter(trace_pipe_cb, &found, 1000),
> +	if (!ASSERT_GT(bss->trace_printk_utf8_ran, 0, "bss->trace_printk_utf8_ran"))
> +		goto cleanup;
> +
> +	if (!ASSERT_GT(bss->trace_printk_utf8_ret, 0, "bss->trace_printk_utf8_ret"))
> +		goto cleanup;
> +
> +	if (!ASSERT_LT(bss->trace_printk_utf8_spec_ret, 0,
> +		       "bss->trace_printk_utf8_spec_ret"))
> +		goto cleanup;
> +
> +	/* verify our search strings are in the trace buffer */
> +	ASSERT_OK(read_trace_pipe_iter(trace_pipe_cb, found, 1000),
>  		 "read_trace_pipe_iter");
>
> -	if (!ASSERT_EQ(found, bss->trace_printk_ran, "found"))
> +	if (!ASSERT_EQ(found[0], bss->trace_printk_ran, "found"))
> +		goto cleanup;
> +
> +	if (!ASSERT_EQ(found[1], bss->trace_printk_utf8_ran, "found_utf8"))
>  		goto cleanup;
>
>  cleanup:
> diff --git a/tools/testing/selftests/bpf/progs/trace_printk.c b/tools/testing/selftests/bpf/progs/trace_printk.c
> index 6695478c2b25..62153d8c5eba 100644
> --- a/tools/testing/selftests/bpf/progs/trace_printk.c
> +++ b/tools/testing/selftests/bpf/progs/trace_printk.c
> @@ -10,13 +10,22 @@ char _license[] SEC("license") = "GPL";
>
>  int trace_printk_ret = 0;
>  int trace_printk_ran = 0;
> +int trace_printk_utf8_spec_ret = 0;
> +int trace_printk_utf8_ret = 0;
> +int trace_printk_utf8_ran = 0;
>
>  const char fmt[] = "Testing,testing %d\n";
> +static const char utf8_fmt[] = "中文,测试 %d\n";
> +static const char utf8_spec_fmt[] = "%中文\n";

What's the purpose of the second string here?

>
>  SEC("fentry/" SYS_PREFIX "sys_nanosleep")
>  int sys_enter(void *ctx)
>  {
>  	trace_printk_ret = bpf_trace_printk(fmt, sizeof(fmt),
>  					    ++trace_printk_ran);
> +	trace_printk_utf8_ret = bpf_trace_printk(utf8_fmt, sizeof(utf8_fmt),
> +						 ++trace_printk_utf8_ran);
> +	trace_printk_utf8_spec_ret = bpf_trace_printk(utf8_spec_fmt,
> +						      sizeof(utf8_spec_fmt));
>  	return 0;
>  }
> --
> 2.20.1
>