From: Matt Bobrowski <mattbobrowski@google.com>
To: Yonghong Song <yonghong.song@linux.dev>
Cc: "Andrii Nakryiko" <andrii@kernel.org>,
"Eduard Zingerman" <eddyz87@gmail.com>, 梅开彦 <kaiyanm@hust.edu.cn>,
"Daniel Borkmann" <daniel@iogearbox.net>,
bpf <bpf@vger.kernel.org>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
hust-os-kernel-patches@googlegroups.com,
"Yinhao Hu" <dddddd@hust.edu.cn>,
dzm91@hust.edu.cn, "KP Singh" <kpsingh@kernel.org>,
"Alexei Starovoitov" <alexei.starovoitov@gmail.com>
Subject: Re: bpf: mmap_file LSM hook allows NULL pointer dereference
Date: Mon, 29 Dec 2025 10:33:24 +0000
Message-ID: <aVJY9H-e83T7ivT4@google.com>
In-Reply-To: <9e402939-40ea-4da2-aad1-43d2afb74a83@linux.dev>
On Thu, Dec 18, 2025 at 02:51:27PM -0800, Yonghong Song wrote:
>
>
> On 12/11/25 1:39 PM, Matt Bobrowski wrote:
> > On Wed, Dec 10, 2025 at 10:02:16AM +0000, Matt Bobrowski wrote:
> > > On Wed, Dec 03, 2025 at 10:23:43AM -0800, Alexei Starovoitov wrote:
> > > > On Wed, Dec 3, 2025 at 12:47 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> > > > > > We can play tricks with __weak. Like:
> > > > > >
> > > > > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > > > > index 7cb6e8d4282c..60d269a85bf1 100644
> > > > > > --- a/kernel/bpf/bpf_lsm.c
> > > > > > +++ b/kernel/bpf/bpf_lsm.c
> > > > > > @@ -21,7 +21,7 @@
> > > > > > * function where a BPF program can be attached.
> > > > > > */
> > > > > > #define LSM_HOOK(RET, DEFAULT, NAME, ...) \
> > > > > > -noinline RET bpf_lsm_##NAME(__VA_ARGS__) \
> > > > > > +__weak noinline RET bpf_lsm_##NAME(__VA_ARGS__) \
> > > > > >
> > > > > > diff kernel/bpf/bpf_lsm_proto.c
> > > > > >
> > > > > > +int bpf_lsm_mmap_file(struct file *file__nullable, unsigned long reqprot,
> > > > > > + unsigned long prot, unsigned long flags)
> > > > > > +{
> > > > > > + return 0;
> > > > > > +}
> > > > > >
> > > > > > and above one with __nullable will be in vmlinux BTF.
> > > > > >
> > > > > > afaik __weak functions are not removed by linker when in non-LTO,
> > > > > > but it's still better than
> > > > > > +#define bpf_lsm_mmap_file bpf_lsm_mmap_file__original
> > > > > > No need to change bpf_lsm.h either.
> > > > > Annotating with a weak attribute would be quite nice, but the compiler
> > > > > will complain about the redefinition of the symbol
> > > > > bpf_lsm_mmap_file. To avoid this, we'd still need to rely on the
> > > > > rename and ignore dance by using the aforementioned define, which at
> > > > > that point would still result in both symbols being exposed in both
> > > > > BTF and the .text section.
> > > > Not quite. You missed this part in the above:
> > > >
> > > > > > diff kernel/bpf/bpf_lsm_proto.c
> > > > it's a different file.
> > > Yes, yes, this will work. However, as discussed, it's fundamentally
> > > reliant on a small "hack" which I've implemented within
> > > kernel/bpf/Makefile here [0] to work around the current pahole
> > > deduplication logic.
> > >
> > > Andrii and Eduard,
> > >
> > > I’d like your input on a pahole BTF generation issue which I've
> > > recently come across. In the series I just sent [0], I had to
> > > implement a workaround to force pahole to process bpf_lsm_proto.o
> > > before bpf_lsm.o.
> > >
> > > This was necessary to ensure pahole generates BTF for the strong
> > > definition of bpf_lsm_mmap_file() (in bpf_lsm_proto.c) rather than the
> > > weak definition (in bpf_lsm.c). Without this forced ordering, pahole
> > > processed the weak definition first, resulting in a state array like
> > > this:
> > >
> > > ```
> > > btf_encoder.func_states.array[N] = bpf_lsm_mmap_file (weak
> > > definition from bpf_lsm.o)
> > >
> > > btf_encoder.func_states.array[N+1] = bpf_lsm_mmap_file (strong
> > > definition from bpf_lsm_proto.o)
> > > ```
> > >
> > > Because the deduplication logic in btf_encoder__add_saved_funcs()
> > > folds duplicates (those determined by saved_functions_combine()) into
> > > the first occurrence, the resulting BTF was derived from the weak
> > > definition. This is incorrect, as the strong definition is the one
> > > actually linked into the final vmlinux image.
> > >
> > > An obvious fix that immediately came to mind here was to essentially
> > > teach pahole about strong function definitions, and have it prefer
> > > to emit BTF for those instead of any weakly defined counterparts.
> > Thinking about this a little more, perhaps whilst in
> > btf_encoder__add_saved_funcs() we should only emit BTF for the
> > instance of a duplicated function within a CU which matches the
> > corresponding entry within the backing ELF symtab? We can do this by
> > checking whether the virtual address stored within DW_AT_low_pc
> > matches the one stored in the st_value field of the
> > corresponding ELF symtab entry. For example, for bpf_lsm_mmap_file we
>
> I think this is the correct way to do it. Basically, we should
> pick the DWARF subprogram entry whose DW_AT_low_pc matches the
> address of the ksym entry with the same name.
Yes, this is exactly what I'm thinking. I'll send out a patch that makes
this amendment.
> > Output from reading the vmlinux symbol table:
> > ```
> > $ readelf -s <input> | grep bpf_lsm_mmap_file
> > 165360: ffffffff8152f9b0 16 FUNC GLOBAL DEFAULT 1 bpf_lsm_mmap_file
> > ```
> > Output from reading the vmlinux DWARF debugging information:
> > ```
> > <2a40982> DW_AT_name : (indirect string, offset: 0x1352ea): bpf_lsm_mmap_file
> > <2a40986> DW_AT_decl_file : 4
> > <2a40987> DW_AT_decl_line : 199
> > <2a40988> DW_AT_decl_column : 1
> > <2a40989> DW_AT_prototyped : 1
> > <2a40989> DW_AT_type : <0x2a1b010>
> > <2a4098d> DW_AT_low_pc : 0xffffffff8152e260
> > <2a40995> DW_AT_high_pc : 0x10
> > <2a4099d> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa)
> > <2a4099f> DW_AT_call_all_calls: 1
> > <2a4099f> DW_AT_sibling : <0x2a409d8>
> > <2><2a409a3>: Abbrev Number: 10 (DW_TAG_formal_parameter)
> > <2a409a4> DW_AT_name : (indirect string, offset: 0x3623df): file
> > <2a409a8> DW_AT_decl_file : 4
> > <2a409a9> DW_AT_decl_line : 199
> > <2a409aa> DW_AT_decl_column : 1
> > <2a409aa> DW_AT_type : <0x2a234ef>
> > <2a409ae> DW_AT_location : 1 byte block: 55 (DW_OP_reg5 (rdi))
> > <2><2a409b0>: Abbrev Number: 10 (DW_TAG_formal_parameter)
> > <2a409b1> DW_AT_name : (indirect string, offset: 0x23a09d): reqprot
> > <2a409b5> DW_AT_decl_file : 4
> > --
> > <2a60e0a> DW_AT_name : (indirect string, offset: 0x1352ea): bpf_lsm_mmap_file
> > <2a60e0e> DW_AT_decl_file : 1
> > <2a60e0f> DW_AT_decl_line : 15
> > <2a60e10> DW_AT_decl_column : 5
> > <2a60e11> DW_AT_prototyped : 1
> > <2a60e11> DW_AT_type : <0x2a42713>
> > <2a60e15> DW_AT_low_pc : 0xffffffff8152f9b0
> > <2a60e1d> DW_AT_high_pc : 0x10
> > <2a60e25> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa)
> > <2a60e27> DW_AT_call_all_calls: 1
> > <2><2a60e27>: Abbrev Number: 82 (DW_TAG_formal_parameter)
> > <2a60e28> DW_AT_name : (indirect string, offset: 0x135ede): file__nullable
> > <2a60e2c> DW_AT_decl_file : 1
> > <2a60e2c> DW_AT_decl_line : 15
> > <2a60e2d> DW_AT_decl_column : 36
> > <2a60e2e> DW_AT_type : <0x2a49f59>
> > <2a60e32> DW_AT_location : 1 byte block: 55 (DW_OP_reg5 (rdi))
> > ```
> >
> > > [0] https://lore.kernel.org/bpf/20251210090701.2753545-1-mattbobrowski@google.com/T/#me14d534fb559a349c46e094f18c63d477644d511
>
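
To make the idea concrete, here is a rough sketch of the check applied to
the two DWARF entries and the symtab entry quoted above. Note that
struct dwarf_func, struct elf_sym, and pick_linked_func() are made-up
names purely for illustration; they are not pahole's actual data
structures or API, and the real change would need to live in the existing
deduplication path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; these are not pahole's real types. */
struct dwarf_func {
	const char *name;        /* DW_AT_name */
	uint64_t low_pc;         /* DW_AT_low_pc */
	const char *first_param; /* name of the first formal parameter */
};

struct elf_sym {
	const char *name; /* symbol name */
	uint64_t value;   /* st_value */
};

/*
 * Return the DWARF candidate whose name and DW_AT_low_pc both match the
 * ELF symbol, or NULL if no candidate matches.
 */
static const struct dwarf_func *
pick_linked_func(const struct dwarf_func *cands, size_t n,
		 const struct elf_sym *sym)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(cands[i].name, sym->name) == 0 &&
		    cands[i].low_pc == sym->value)
			return &cands[i];
	}
	return NULL;
}

/* Values copied from the readelf output quoted above. */
static const struct dwarf_func mmap_file_cands[] = {
	{ "bpf_lsm_mmap_file", 0xffffffff8152e260ULL, "file" },
	{ "bpf_lsm_mmap_file", 0xffffffff8152f9b0ULL, "file__nullable" },
};

static const struct elf_sym mmap_file_sym = {
	"bpf_lsm_mmap_file", 0xffffffff8152f9b0ULL,
};

/* The entry carrying file__nullable is the one that should be selected. */
static void check_mmap_file_example(void)
{
	const struct dwarf_func *f =
		pick_linked_func(mmap_file_cands, 2, &mmap_file_sym);

	assert(f != NULL);
	assert(strcmp(f->first_param, "file__nullable") == 0);
}
```

With a check like this, the order in which the object files are processed
should no longer matter, since the selection is keyed on the address in
the symtab rather than on whichever duplicate happens to be seen first.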