From: Jiri Olsa <olsajiri@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Jiri Olsa <olsajiri@gmail.com>,
Juri Lelli <juri.lelli@redhat.com>, bpf <bpf@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Artem Savkov <asavkov@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
"Jose E. Marchesi" <jose.marchesi@oracle.com>
Subject: Re: NULL pointer deref when running BPF monitor program (6.11.0-rc1)
Date: Thu, 15 Aug 2024 13:48:02 +0200 [thread overview]
Message-ID: <Zr3q8ihbe8cUdpfp@krava> (raw)
In-Reply-To: <CAADnVQLYxdKn-J2-2iXKKKTg=o6xkKWzV2WyYrnmQ-j62b9STA@mail.gmail.com>
On Thu, Aug 08, 2024 at 08:43:05AM -0700, Alexei Starovoitov wrote:
> On Thu, Aug 8, 2024 at 3:46 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Tue, Aug 06, 2024 at 11:44:52AM -0700, Alexei Starovoitov wrote:
> > > On Tue, Aug 6, 2024 at 6:24 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > >
> > > > > Jiri,
> > > > >
> > > > > the verifier removes the check because it assumes that pointers
> > > > > passed by the kernel into tracepoint are valid and trusted.
> > > > > In this case:
> > > > > trace_sched_pi_setprio(p, pi_task);
> > > > >
> > > > > pi_task can be NULL.
> > > > >
> > > > > We cannot make all tracepoint pointers to be PTR_TRUSTED | PTR_MAYBE_NULL
> > > > > by default, since it will break a bunch of progs.
> > > > > Instead we can annotate this tracepoint arg as __nullable and
> > > > > teach the verifier to recognize such special arguments of tracepoints.
> > > >
> > > > ok, so you mean to be able to mark it in event header like:
> > > >
> > > > TRACE_EVENT(sched_pi_setprio,
> > > > TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task __nullable),
> > > >
> > > > I guess we could make pahole to emit DECL_TAG for that argument,
> > > > but I'm not sure how to propagate that __nullable info to pahole
> > > >
> > > > while wondering about that, I tried the direct fix below ;-)
> > >
> > > We don't need to rush such a hack below.
> > > No need to add decl_tag and change pahole either.
> > > The arg name is already vmlinux BTF:
> > > [51371] FUNC_PROTO '(anon)' ret_type_id=0 vlen=3
> > > '__data' type_id=61
> > > 'tsk' type_id=77
> > > 'pi_task' type_id=77
> > > [51372] FUNC '__bpf_trace_sched_pi_setprio' type_id=51371 linkage=static
> > >
> > > just need to rename "pi_task" to "pi_task__nullable"
> > > and teach the verifier.
> >
> > the problem is that btf_trace_<xxx> is typedef
> >
> > typedef void (*btf_trace_##call)(void *__data, proto);
> >
> > and dwarf does not store argument names for subroutine type entry,
> > so it's not in BTF's TYPEDEF either
> >
> > it's the btf_trace_##call typedef ID that verifier has to work with,
> > I wonder we could somehow associate that ID with __bpf_trace_##call
> > subroutine entry which has the argument names
> >
> > we could store __bpf_trace_##call's BTF_ID in __bpf_raw_tp_map record,
> > but we'd need to do the lookup based on the tracepoint name when loading
> > the program .. ATM we do the lookup __bpf_raw_tp_map record only when
> > doing attach, so we would need to move it to program load time
> >
> > or we could 'fix' the argument names in pahole, but that'd probably
> > mean extra setup and hash lookup, so also not great
>
> I would do a simple string search in vmlinux BTF for "__bpf_trace" + tp name.
> No need to add btf_id-s and waste memory to speed up the slow path.
I checked bit more and there are more tracepoints with the same issue,
the first diff stat looks like:
include/trace/events/afs.h | 44 ++++++++++++++++++++++----------------------
include/trace/events/cachefiles.h | 96 ++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------
include/trace/events/ext4.h | 6 +++---
include/trace/events/fib.h | 16 ++++++++--------
include/trace/events/filelock.h | 38 +++++++++++++++++++-------------------
include/trace/events/host1x.h | 10 +++++-----
include/trace/events/huge_memory.h | 24 ++++++++++++------------
include/trace/events/kmem.h | 18 +++++++++---------
include/trace/events/netfs.h | 16 ++++++++--------
include/trace/events/power.h | 6 +++---
include/trace/events/qdisc.h | 8 ++++----
include/trace/events/rxrpc.h | 12 ++++++------
include/trace/events/sched.h | 12 ++++++------
include/trace/events/sunrpc.h | 8 ++++----
include/trace/events/tcp.h | 14 +++++++-------
include/trace/events/tegra_apb_dma.h | 6 +++---
include/trace/events/timer_migration.h | 10 +++++-----
include/trace/events/writeback.h | 16 ++++++++--------
plus there's one case where pointer needs to be checked with IS_ERR in
include/trace/events/rdma_core.h trace_mr_alloc/mr_integ_alloc
I'm not excited about the '_nullable' argument suffix, because it's lot
of extra changes/renames in TP_fast_assign and it does not solve the
IS_ERR case above
I checked on the type tag and with llvm build we get the TYPE_TAG info
nicely in BTF:
[119148] TYPEDEF 'btf_trace_sched_pi_setprio' type_id=119149
[119149] PTR '(anon)' type_id=119150
[119150] FUNC_PROTO '(anon)' ret_type_id=0 vlen=3
'(anon)' type_id=27
'(anon)' type_id=678
'(anon)' type_id=119152
[119151] TYPE_TAG 'nullable' type_id=679
[119152] PTR '(anon)' type_id=119151
[679] STRUCT 'task_struct' size=15424 vlen=277
which we can easily check in verifier.. the tracepoint definition would look like:
- TP_PROTO(struct task_struct *tsk, struct task_struct *pi_task),
+ TP_PROTO(struct task_struct *tsk, struct task_struct __nullable *pi_task),
and no other change in TP_fast_assign is needed
I think using the type tag for this is nicer, but I'm not sure where's
gcc at with btf_type_tag implementation, need to check on that
jirka
next prev parent reply other threads:[~2024-08-15 11:48 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-05 9:20 NULL pointer deref when running BPF monitor program (6.11.0-rc1) Juri Lelli
2024-08-05 16:49 ` Jiri Olsa
2024-08-05 17:00 ` Alexei Starovoitov
2024-08-06 7:08 ` Juri Lelli
2024-08-06 13:17 ` Jiri Olsa
2024-08-06 13:24 ` Jiri Olsa
2024-08-06 18:44 ` Alexei Starovoitov
2024-08-08 10:46 ` Jiri Olsa
2024-08-08 15:43 ` Alexei Starovoitov
2024-08-15 11:48 ` Jiri Olsa [this message]
2024-08-15 12:37 ` Alexei Starovoitov
2024-08-16 14:10 ` Steven Rostedt
2024-08-16 18:59 ` Jiri Olsa
2024-08-16 19:30 ` Steven Rostedt
2024-08-19 11:47 ` Jiri Olsa
2024-08-19 14:05 ` Jiri Olsa
2024-08-19 15:37 ` Steven Rostedt
2024-08-20 10:17 ` Jiri Olsa
2024-08-20 15:05 ` Steven Rostedt
2024-10-02 16:30 ` Jiri Olsa
2024-10-09 20:41 ` Jiri Olsa
2024-10-10 0:33 ` Josh Poimboeuf
2024-10-10 0:56 ` Steven Rostedt
2024-10-10 0:57 ` Steven Rostedt
2024-10-10 3:17 ` Josh Poimboeuf
2024-10-10 9:00 ` Jiri Olsa
2024-10-10 13:49 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zr3q8ihbe8cUdpfp@krava \
--to=olsajiri@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=asavkov@redhat.com \
--cc=bpf@vger.kernel.org \
--cc=jose.marchesi@oracle.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox