From: Namhyung Kim <namhyung@kernel.org>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jakub Brnak <jbrnak@redhat.com>,
peterz@infradead.org, mingo@redhat.com, irogers@google.com,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
jolsa@kernel.org, adrian.hunter@intel.com,
kan.liang@linux.intel.com, howardchu95@gmail.com
Subject: Re: [PATCH] perf: use standard syscall tracepoint structs for augmentation
Date: Fri, 8 Aug 2025 15:21:53 -0700 [thread overview]
Message-ID: <aJZ4gXyOftCV9tHw@google.com> (raw)
In-Reply-To: <aJUJh8oDaH-2Ptll@x1>
On Thu, Aug 07, 2025 at 05:16:07PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Aug 06, 2025 at 04:07:32PM -0700, Namhyung Kim wrote:
> > Hello,
> >
> > On Wed, Aug 06, 2025 at 03:00:17PM +0200, Jakub Brnak wrote:
> > > Replace custom syscall structs with the standard trace_event_raw_sys_enter
> > > and trace_event_raw_sys_exit from vmlinux.h.
> > > This fixes a data structure misalignment issue discovered on RHEL-9, which
> > > prevented BPF programs from correctly accessing syscall arguments.
> >
> > Can you explain what the alignment issue was? It's not clear to me what
> > makes it misaligned.
>
> Yeah, mentioning a "misalignment" and then not spelling it out precisely
> doesn't help.
>
> Showing the pahole output of the expected structure layout in both
> kernels and what was being used would help us to understand the problem.
>
> For instance, here I have:
>
> acme@x1:~/git/bpf-next$ uname -r
> 6.15.5-200.fc42.x86_64
> acme@x1:~/git/bpf-next$ pahole -E trace_event_raw_sys_enter
> struct trace_event_raw_sys_enter {
> struct trace_entry {
> short unsigned int type; /* 0 2 */
> unsigned char flags; /* 2 1 */
> unsigned char preempt_count; /* 3 1 */
> int pid; /* 4 4 */
> } ent; /* 0 8 */
> long int id; /* 8 8 */
> long unsigned int args[6]; /* 16 48 */
> /* --- cacheline 1 boundary (64 bytes) --- */
> char __data[]; /* 64 0 */
>
> /* size: 64, cachelines: 1, members: 4 */
> };
>
> acme@x1:~/git/bpf-next$
>
> And:
>
> ⬢ [acme@toolbx linux]$ pahole -C syscall_enter_args /tmp/build/linux/util/bpf_skel/.tmp/augmented_raw_syscalls.bpf.o
> struct syscall_enter_args {
> unsigned long long common_tp_fields; /* 0 8 */
> long syscall_nr; /* 8 8 */
> unsigned long args[6]; /* 16 48 */
>
> /* size: 64, cachelines: 1, members: 3 */
> };
>
> ⬢ [acme@toolbx linux]$
>
> So yes, it is "aligned", the 'id' is the 'syscall_nr' and both are at
> offset 8, then we have the syscall args starting at offset 16 in both
> cases.
>
> The layout for rhel9 then we see the issue, the id, syscall_nr, is at
> offset 16, there is the misalignment:
>
> sh-5.1# pahole -E -C trace_event_raw_sys_enter /usr/lib/debug/lib/modules/5.14.0-570.32.1.el9_6.x86_64/vmlinux
> struct trace_event_raw_sys_enter {
> struct trace_entry {
> short unsigned int type; /* 0 2 */
> unsigned char flags; /* 2 1 */
> unsigned char preempt_count; /* 3 1 */
> int pid; /* 4 4 */
> unsigned char preempt_lazy_count; /* 8 1 */
Oh.. RHEL9 has this new field.
> } ent; /* 0 12 */
>
> /* XXX last struct has 3 bytes of padding */
> /* XXX 4 bytes hole, try to pack */
>
> long int id; /* 16 8 */
> long unsigned int args[6]; /* 24 48 */
> /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
> char __data[]; /* 72 0 */
>
> /* size: 72, cachelines: 2, members: 4 */
> /* sum members: 68, holes: 1, sum holes: 4 */
> /* paddings: 1, sum paddings: 3 */
> /* last cacheline: 8 bytes */
> };
>
> sh-5.1#
>
> So if we always use 'struct trace_event_raw_sys_enter' from the
> vmlinux.h generated from the BTF info and have it all as CO-RE enabled
> (preserving the access index of fields, etc) it will work on any kernel
> you install on machine.
Agreed, thanks.
Namhyung
prev parent reply other threads:[~2025-08-08 22:21 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-06 13:00 [PATCH] perf: use standard syscall tracepoint structs for augmentation Jakub Brnak
2025-08-06 23:07 ` Namhyung Kim
2025-08-07 20:16 ` Arnaldo Carvalho de Melo
2025-08-08 22:21 ` Namhyung Kim [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aJZ4gXyOftCV9tHw@google.com \
--to=namhyung@kernel.org \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=howardchu95@gmail.com \
--cc=irogers@google.com \
--cc=jbrnak@redhat.com \
--cc=jolsa@kernel.org \
--cc=kan.liang@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.