Re: [RFC PATCH bpf-next] bpf: change syscall_nr type to int in struct syscall_tp_t

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Artem Savkov <asavkov@redhat.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	bpf@vger.kernel.org, netdev@vger.kernel.org,
	Masami Hiramatsu <mhiramat@kernel.org>,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-rt-users@vger.kernel.org, Jiri Olsa <jolsa@kernel.org>
Subject: Re: [RFC PATCH bpf-next] bpf: change syscall_nr type to int in struct syscall_tp_t
Date: Fri, 13 Oct 2023 08:01:34 +0200	[thread overview]
Message-ID: <ZSjdPqQiPdqa-UTs@wtfbox.lan> (raw)
In-Reply-To: <CAEf4BzZKWkJjOjw8x_eL_hsU-QzFuSzd5bkBH2EHtirN2hnEgA@mail.gmail.com>

On Thu, Oct 12, 2023 at 04:32:51PM -0700, Andrii Nakryiko wrote:
> On Thu, Oct 12, 2023 at 6:43 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Thu, 12 Oct 2023 13:45:50 +0200
> > Artem Savkov <asavkov@redhat.com> wrote:
> >
> > > linux-rt-devel tree contains a patch (b1773eac3f29c ("sched: Add support
> > > for lazy preemption")) that adds an extra member to struct trace_entry.
> > > This causes the offset of args field in struct trace_event_raw_sys_enter
> > > be different from the one in struct syscall_trace_enter:
> > >
> > > struct trace_event_raw_sys_enter {
> > >         struct trace_entry         ent;                  /*     0    12 */
> > >
> > >         /* XXX last struct has 3 bytes of padding */
> > >         /* XXX 4 bytes hole, try to pack */
> > >
> > >         long int                   id;                   /*    16     8 */
> > >         long unsigned int          args[6];              /*    24    48 */
> > >         /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
> > >         char                       __data[];             /*    72     0 */
> > >
> > >         /* size: 72, cachelines: 2, members: 4 */
> > >         /* sum members: 68, holes: 1, sum holes: 4 */
> > >         /* paddings: 1, sum paddings: 3 */
> > >         /* last cacheline: 8 bytes */
> > > };
> > >
> > > struct syscall_trace_enter {
> > >         struct trace_entry         ent;                  /*     0    12 */
> > >
> > >         /* XXX last struct has 3 bytes of padding */
> > >
> > >         int                        nr;                   /*    12     4 */
> > >         long unsigned int          args[];               /*    16     0 */
> > >
> > >         /* size: 16, cachelines: 1, members: 3 */
> > >         /* paddings: 1, sum paddings: 3 */
> > >         /* last cacheline: 16 bytes */
> > > };
> > >
> > > This, in turn, causes perf_event_set_bpf_prog() fail while running bpf
> > > test_profiler testcase because max_ctx_offset is calculated based on the
> > > former struct, while off on the latter:
> > >
> > >   10488         if (is_tracepoint || is_syscall_tp) {
> > >   10489                 int off = trace_event_get_offsets(event->tp_event);
> > >   10490
> > >   10491                 if (prog->aux->max_ctx_offset > off)
> > >   10492                         return -EACCES;
> > >   10493         }
> > >
> > > What bpf program is actually getting is a pointer to struct
> > > syscall_tp_t, defined in kernel/trace/trace_syscalls.c. This patch fixes
> > > the problem by aligning struct syscall_tp_t with with struct
> > > syscall_trace_(enter|exit) and changing the tests to use these structs
> > > to dereference context.
> > >
> > > Signed-off-by: Artem Savkov <asavkov@redhat.com>
> >
> 
> I think these changes make sense regardless, can you please resend the
> patch without RFC tag so that our CI can run tests for it?

Ok, didn't know it was set up like that.

> > Thanks for doing a proper fix.
> >
> > Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> 
> But looking at [0] and briefly reading some of the discussions you,
> Steven, had. I'm just wondering if it would be best to avoid
> increasing struct trace_entry altogether? It seems like preempt_count
> is actually a 4-bit field in trace context, so it doesn't seem like we
> really need to allocate an entire byte for both preempt_count and
> preempt_lazy_count. Why can't we just combine them and not waste 8
> extra bytes for each trace event in a ring buffer?
> 
>   [0] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?id=b1773eac3f29cbdcdfd16e0339f1a164066e9f71

I agree that avoiding increase in struct trace_entry size would be very
desirable, but I have no knowledge whether rt developers had reasons to
do it like this.

Nevertheless I think the issue with verifier running against a wrong
struct still needs to be addressed.

-- 
Regards,
  Artem

next prev parent reply	other threads:[~2023-10-13  6:01 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-02 13:52 [RFC PATCH] tracing: change syscall number type in struct syscall_trace_* Artem Savkov
2023-10-03 22:11 ` Andrii Nakryiko
2023-10-04  7:02   ` Artem Savkov
2023-10-04  1:38 ` Steven Rostedt
2023-10-04 12:55   ` Artem Savkov
2023-10-05 12:34     ` Artem Savkov
2023-10-12 11:45       ` [RFC PATCH bpf-next] bpf: change syscall_nr type to int in struct syscall_tp_t Artem Savkov
2023-10-12 13:44         ` Steven Rostedt
2023-10-12 23:32           ` Andrii Nakryiko
2023-10-13  5:42             ` [PATCH " Artem Savkov
2023-10-13 19:50               ` patchwork-bot+netdevbpf
2023-10-13  6:01             ` Artem Savkov [this message]
2023-10-13 14:00               ` [RFC PATCH " Steven Rostedt
2023-10-13 19:43                 ` Andrii Nakryiko
2023-10-16 15:53                   ` Steven Rostedt
2023-10-13  3:13         ` Rod Webster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZSjdPqQiPdqa-UTs@wtfbox.lan \
    --to=asavkov@redhat.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.