From: Matt Bobrowski <mattbobrowski@google.com>
To: Yonghong Song <yonghong.song@linux.dev>
Cc: "Andrii Nakryiko" <andrii@kernel.org>,
"Eduard Zingerman" <eddyz87@gmail.com>, 梅开彦 <kaiyanm@hust.edu.cn>,
"Daniel Borkmann" <daniel@iogearbox.net>,
bpf <bpf@vger.kernel.org>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
hust-os-kernel-patches@googlegroups.com,
"Yinhao Hu" <dddddd@hust.edu.cn>,
dzm91@hust.edu.cn, "KP Singh" <kpsingh@kernel.org>,
"Alexei Starovoitov" <alexei.starovoitov@gmail.com>
Subject: Re: bpf: mmap_file LSM hook allows NULL pointer dereference
Date: Mon, 29 Dec 2025 10:33:24 +0000
Message-ID: <aVJY9H-e83T7ivT4@google.com>
In-Reply-To: <9e402939-40ea-4da2-aad1-43d2afb74a83@linux.dev>
On Thu, Dec 18, 2025 at 02:51:27PM -0800, Yonghong Song wrote:
>
>
> On 12/11/25 1:39 PM, Matt Bobrowski wrote:
> > On Wed, Dec 10, 2025 at 10:02:16AM +0000, Matt Bobrowski wrote:
> > > On Wed, Dec 03, 2025 at 10:23:43AM -0800, Alexei Starovoitov wrote:
> > > > On Wed, Dec 3, 2025 at 12:47 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> > > > > > We can play tricks with __weak. Like:
> > > > > >
> > > > > > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > > > > > index 7cb6e8d4282c..60d269a85bf1 100644
> > > > > > --- a/kernel/bpf/bpf_lsm.c
> > > > > > +++ b/kernel/bpf/bpf_lsm.c
> > > > > > @@ -21,7 +21,7 @@
> > > > > > * function where a BPF program can be attached.
> > > > > > */
> > > > > > #define LSM_HOOK(RET, DEFAULT, NAME, ...) \
> > > > > > -noinline RET bpf_lsm_##NAME(__VA_ARGS__) \
> > > > > > +__weak noinline RET bpf_lsm_##NAME(__VA_ARGS__) \
> > > > > >
> > > > > > diff kernel/bpf/bpf_lsm_proto.c
> > > > > >
> > > > > > +int bpf_lsm_mmap_file(struct file *file__nullable, unsigned long reqprot,
> > > > > > + unsigned long prot, unsigned long flags)
> > > > > > +{
> > > > > > + return 0;
> > > > > > +}
> > > > > >
> > > > > > and above one with __nullable will be in vmlinux BTF.
> > > > > >
> > > > > > afaik __weak functions are not removed by linker when in non-LTO,
> > > > > > but it's still better than
> > > > > > +#define bpf_lsm_mmap_file bpf_lsm_mmap_file__original
> > > > > > No need to change bpf_lsm.h either.
> > > > > Annotating with a weak attribute would be quite nice, but the compiler
> > > > > will complain about the redefinition of the symbol
> > > > > bpf_lsm_mmap_file. To avoid this, we'd still need to rely on the
> > > > > rename and ignore dance by using the aforementioned define, which at
> > > > > that point would still result in both symbols being exposed in both
> > > > > BTF and the .text section.
> > > > Not quite. You missed this part in the above:
> > > >
> > > > > > diff kernel/bpf/bpf_lsm_proto.c
> > > > it's a different file.
> > > Yes, yes, this will work. However, as discussed, it's fundamentally
> > > reliant on a small "hack" which I've implemented within
> > > kernel/bpf/Makefile here [0] to work around the current pahole
> > > deduplication logic.
> > >
> > > Andrii and Eduard,
> > >
> > > I’d like your input on a pahole BTF generation issue which I've
> > > recently come across. In the series I just sent [0], I had to
> > > implement a workaround to force pahole to process bpf_lsm_proto.o
> > > before bpf_lsm.o.
> > >
> > > This was necessary to ensure pahole generates BTF for the strong
> > > definition of bpf_lsm_mmap_file() (in bpf_lsm_proto.c) rather than the
> > > weak definition (in bpf_lsm.c). Without this forced ordering, pahole
> > > processed the weak definition first, resulting in a state array like
> > > this:
> > >
> > > ```
> > > btf_encoder.func_states.array[N] = bpf_lsm_mmap_file (weak
> > > definition from bpf_lsm.o)
> > >
> > > btf_encoder.func_states.array[N+1] = bpf_lsm_mmap_file (strong
> > > definition from bpf_lsm_proto.o)
> > > ```
> > >
> > > Because the deduplication logic in btf_encoder__add_saved_funcs()
> > > folds duplicates (those determined by saved_functions_combine()) into
> > > the first occurrence, the resulting BTF was derived from the weak
> > > definition. This is incorrect, as the strong definition is the one
> > > actually linked into the final vmlinux image.
> > >
> > > An obvious fix that immediately came to mind here is to teach
> > > pahole about strong function definitions, and have it prefer
> > > emitting BTF for those over any weakly defined counterparts.
> > Thinking about this a little more, perhaps whilst in
> > btf_encoder__add_saved_funcs() we should only emit BTF for the
> > instance of a duplicated function whose CU entry matches the
> > corresponding entry within the backing ELF symtab? We can do this by
> > checking whether the virtual address stored within DW_AT_low_pc
> > matches what's stored in the st_value field of the
> > corresponding ELF symtab entry. For example, for bpf_lsm_mmap_file we
>
> I think this is the correct way to do it. Basically, we should
> pick the DWARF subprogram entry whose name and DW_AT_low_pc both
> match the corresponding ksym entry.
Yes, this is exactly what I'm thinking. I'll send out a patch that makes
this amendment.
> > Output from reading the vmlinux symbol table:
> > ```
> > $ readelf -s <input> | grep bpf_lsm_mmap_file
> > 165360: ffffffff8152f9b0 16 FUNC GLOBAL DEFAULT 1 bpf_lsm_mmap_file
> > ```
> > Output from reading the vmlinux DWARF debugging information:
> > ```
> > <2a40982> DW_AT_name : (indirect string, offset: 0x1352ea): bpf_lsm_mmap_file
> > <2a40986> DW_AT_decl_file : 4
> > <2a40987> DW_AT_decl_line : 199
> > <2a40988> DW_AT_decl_column : 1
> > <2a40989> DW_AT_prototyped : 1
> > <2a40989> DW_AT_type : <0x2a1b010>
> > <2a4098d> DW_AT_low_pc : 0xffffffff8152e260
> > <2a40995> DW_AT_high_pc : 0x10
> > <2a4099d> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa)
> > <2a4099f> DW_AT_call_all_calls: 1
> > <2a4099f> DW_AT_sibling : <0x2a409d8>
> > <2><2a409a3>: Abbrev Number: 10 (DW_TAG_formal_parameter)
> > <2a409a4> DW_AT_name : (indirect string, offset: 0x3623df): file
> > <2a409a8> DW_AT_decl_file : 4
> > <2a409a9> DW_AT_decl_line : 199
> > <2a409aa> DW_AT_decl_column : 1
> > <2a409aa> DW_AT_type : <0x2a234ef>
> > <2a409ae> DW_AT_location : 1 byte block: 55 (DW_OP_reg5 (rdi))
> > <2><2a409b0>: Abbrev Number: 10 (DW_TAG_formal_parameter)
> > <2a409b1> DW_AT_name : (indirect string, offset: 0x23a09d): reqprot
> > <2a409b5> DW_AT_decl_file : 4
> > --
> > <2a60e0a> DW_AT_name : (indirect string, offset: 0x1352ea): bpf_lsm_mmap_file
> > <2a60e0e> DW_AT_decl_file : 1
> > <2a60e0f> DW_AT_decl_line : 15
> > <2a60e10> DW_AT_decl_column : 5
> > <2a60e11> DW_AT_prototyped : 1
> > <2a60e11> DW_AT_type : <0x2a42713>
> > <2a60e15> DW_AT_low_pc : 0xffffffff8152f9b0
> > <2a60e1d> DW_AT_high_pc : 0x10
> > <2a60e25> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa)
> > <2a60e27> DW_AT_call_all_calls: 1
> > <2><2a60e27>: Abbrev Number: 82 (DW_TAG_formal_parameter)
> > <2a60e28> DW_AT_name : (indirect string, offset: 0x135ede): file__nullable
> > <2a60e2c> DW_AT_decl_file : 1
> > <2a60e2c> DW_AT_decl_line : 15
> > <2a60e2d> DW_AT_decl_column : 36
> > <2a60e2e> DW_AT_type : <0x2a49f59>
> > <2a60e32> DW_AT_location : 1 byte block: 55 (DW_OP_reg5 (rdi))
> > ```
> >
> > > [0] https://lore.kernel.org/bpf/20251210090701.2753545-1-mattbobrowski@google.com/T/#me14d534fb559a349c46e094f18c63d477644d511
>