From: sashiko-bot@kernel.org
Subject: Re: [PATCH bpf v3 1/2] bpf: allow UTF-8 literals in bpf_bprintf_prepare()
Reply-To: sashiko@lists.linux.dev
To: "Yihan Ding"
Cc: bpf@vger.kernel.org
In-Reply-To: <20260416120142.1420646-2-dingyihan@uniontech.com>
References: <20260416120142.1420646-2-dingyihan@uniontech.com>
Date: Thu, 16 Apr 2026 13:03:12 +0000
Message-Id: <20260416130312.E370FC2BCAF@smtp.kernel.org>

Sashiko AI review found 1 potential issue(s):
- [Medium] UTF-8 literals are incorrectly rejected when placed immediately after '%s' or '%p' format specifiers.
--

commit c7ffc79748019b47230d65f7c4646b2bba78f020
Author: Yihan Ding

bpf: allow UTF-8 literals in bpf_bprintf_prepare()

This commit modifies bpf_bprintf_prepare() to permit bytes >= 0x80 in
plain text, allowing UTF-8 literals to pass through unchanged while still
rejecting ASCII control bytes.

> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index bb95e287b0dc8..2bb60200c2665 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -845,7 +845,13 @@ int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
>  	data->buf = buffers->buf;
> 
>  	for (i = 0; i < fmt_size; i++) {
> -		if ((!isprint(fmt[i]) && !isspace(fmt[i])) || !isascii(fmt[i])) {
> +		unsigned char c = fmt[i];
> +
> +		/*
> +		 * Permit bytes >= 0x80 in plain text so UTF-8 literals can pass
> +		 * through unchanged, while still rejecting ASCII control bytes.
> +		 */
> +		if (isascii(c) && !isprint(c) && !isspace(c)) {
>  			err = -EINVAL;
>  			goto out;
>  		}

Does this change overlook the lookahead checks performed on the characters
immediately following format specifiers like %s and %p?

For example, looking at the parsing logic for %s:

kernel/bpf/helpers.c:bpf_bprintf_prepare() {
	...
		} else if (fmt[i] == 's') {
			fmt_ptype = fmt[i];
fmt_str:
			if (fmt[i + 1] != 0 &&
			    !isspace(fmt[i + 1]) &&
			    !ispunct(fmt[i + 1])) {
				err = -EINVAL;
				goto out;
			}
	...
}

Because the initial loop now permits bytes >= 0x80, these bytes can appear
immediately after a format specifier.

If a UTF-8 literal is placed immediately after the specifier, such as in the
string "%s你好", fmt[i + 1] will be 0xE4, the first byte of the UTF-8
encoding of "你".

The kernel's ctype table classifies 0xE4 as a Latin-1 lowercase letter, so
isspace() and ispunct() both evaluate to false. Will this cause the parser
to incorrectly reject valid UTF-8 strings with -EINVAL?

A similar lookahead check exists for %p:

kernel/bpf/helpers.c:bpf_bprintf_prepare() {
	...
		if (fmt[i] == 'p') {
			sizeof_cur_arg = sizeof(long);

			if (fmt[i + 1] == 0 || isspace(fmt[i + 1]) ||
			    ispunct(fmt[i + 1])) {
				if (tmp_buf)
					cur_arg = raw_args[num_spec];
				goto nocopy_fmt;
			}
	...
}

If fmt[i + 1] is the first byte of a UTF-8 character, it won't match
isspace() or ispunct(). It will then fall through to the
%pI4/%pi4/%pI6/%pi6 check and also be rejected with -EINVAL.

Can the lookahead checks be updated to also permit bytes >= 0x80, matching
the new plain text parsing rules?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260416120142.1420646-1-dingyihan@uniontech.com?part=1
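The interaction described in the review can be reproduced with a minimal userspace sketch. It assumes a simplified scanner that models only "%%" and "%s" (the %p path is not modeled). The names plain_byte_ok(), lookahead_ok_current(), lookahead_ok_relaxed() and fmt_accepted() are hypothetical and are not kernel functions. lookahead_ok_relaxed() shows one possible way to update the check, as the review asks, not a confirmed fix. Userspace <ctype.h> in the "C" locale differs from the kernel's _ctype table for bytes 0xA0-0xFF, but both treat the UTF-8 lead byte 0xE4 as neither space nor punctuation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model, not kernel code. In the "C" locale, <ctype.h> treats
 * every byte >= 0x80 as neither space nor punctuation; the kernel's
 * _ctype table additionally classifies Latin-1 bytes 0xA0-0xFF. Both
 * agree that the UTF-8 lead byte 0xE4 is neither space nor punctuation.
 */

/* Plain-text rule from the patch: reject only ASCII control bytes. */
bool plain_byte_ok(unsigned char c)
{
	return !(c < 0x80 && !isprint(c) && !isspace(c));
}

/* Current "%s" lookahead: next byte must be NUL, space or punctuation. */
bool lookahead_ok_current(unsigned char next)
{
	return next == 0 || isspace(next) || ispunct(next);
}

/* Hypothetical relaxed lookahead: additionally accept any byte >= 0x80. */
bool lookahead_ok_relaxed(unsigned char next)
{
	return next >= 0x80 || lookahead_ok_current(next);
}

/*
 * Hypothetical scanner modeling only "%%" and "%s": every byte must pass
 * the plain-text rule, and the byte after each "%s" must pass the given
 * lookahead predicate. All other conversions are ignored.
 */
bool fmt_accepted(const char *fmt, bool (*lookahead_ok)(unsigned char))
{
	for (size_t i = 0; fmt[i] != '\0'; i++) {
		unsigned char c = (unsigned char)fmt[i];

		if (!plain_byte_ok(c))
			return false;
		if (c != '%')
			continue;
		if (fmt[i + 1] == '%') {
			i++;	/* literal "%%": skip the second '%' */
		} else if (fmt[i + 1] == 's') {
			i++;	/* now at 's'; check the byte after it */
			if (!lookahead_ok((unsigned char)fmt[i + 1]))
				return false;
		}
	}
	return true;
}
```

In this model, fmt_accepted("%s\xe4\xbd\xa0\xe5\xa5\xbd", lookahead_ok_current) returns false while the same string with a space after "%s" returns true, which is the behavior the review predicts; lookahead_ok_relaxed() accepts both. Accepting every byte >= 0x80 keeps the lookahead consistent with the new plain-text rule, while ASCII letters such as "%sx" are still rejected.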