From: Namhyung Kim <namhyung@kernel.org>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jakub Brnak <jbrnak@redhat.com>,
peterz@infradead.org, mingo@redhat.com, irogers@google.com,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
jolsa@kernel.org, adrian.hunter@intel.com,
kan.liang@linux.intel.com, howardchu95@gmail.com
Subject: Re: [PATCH] perf: use standard syscall tracepoint structs for augmentation
Date: Fri, 8 Aug 2025 15:21:53 -0700 [thread overview]
Message-ID: <aJZ4gXyOftCV9tHw@google.com> (raw)
In-Reply-To: <aJUJh8oDaH-2Ptll@x1>
On Thu, Aug 07, 2025 at 05:16:07PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Aug 06, 2025 at 04:07:32PM -0700, Namhyung Kim wrote:
> > Hello,
> >
> > On Wed, Aug 06, 2025 at 03:00:17PM +0200, Jakub Brnak wrote:
> > > Replace custom syscall structs with the standard trace_event_raw_sys_enter
> > > and trace_event_raw_sys_exit from vmlinux.h.
> > > This fixes a data structure misalignment issue discovered on RHEL-9, which
> > > prevented BPF programs from correctly accessing syscall arguments.
> >
> > Can you explain what the alignment issue was? It's not clear to me what
> > makes it misaligned.
>
> Yeah, mentioning a "misalignment" and then not spelling it out precisely
> doesn't help.
>
> Showing the pahole output of the expected structure layout in both
> kernels and what was being used would help us to understand the problem.
>
> For instance, here I have:
>
> acme@x1:~/git/bpf-next$ uname -r
> 6.15.5-200.fc42.x86_64
> acme@x1:~/git/bpf-next$ pahole -E trace_event_raw_sys_enter
> struct trace_event_raw_sys_enter {
> struct trace_entry {
> short unsigned int type; /* 0 2 */
> unsigned char flags; /* 2 1 */
> unsigned char preempt_count; /* 3 1 */
> int pid; /* 4 4 */
> } ent; /* 0 8 */
> long int id; /* 8 8 */
> long unsigned int args[6]; /* 16 48 */
> /* --- cacheline 1 boundary (64 bytes) --- */
> char __data[]; /* 64 0 */
>
> /* size: 64, cachelines: 1, members: 4 */
> };
>
> acme@x1:~/git/bpf-next$
>
> And:
>
> ⬢ [acme@toolbx linux]$ pahole -C syscall_enter_args /tmp/build/linux/util/bpf_skel/.tmp/augmented_raw_syscalls.bpf.o
> struct syscall_enter_args {
> unsigned long long common_tp_fields; /* 0 8 */
> long syscall_nr; /* 8 8 */
> unsigned long args[6]; /* 16 48 */
>
> /* size: 64, cachelines: 1, members: 3 */
> };
>
> ⬢ [acme@toolbx linux]$
>
> So yes, it is "aligned", the 'id' is the 'syscall_nr' and both are at
> offset 8, then we have the syscall args starting at offset 16 in both
> cases.
>
> The layout for rhel9 then we see the issue, the id, syscall_nr, is at
> offset 16, there is the misalignment:
>
> sh-5.1# pahole -E -C trace_event_raw_sys_enter /usr/lib/debug/lib/modules/5.14.0-570.32.1.el9_6.x86_64/vmlinux
> struct trace_event_raw_sys_enter {
> struct trace_entry {
> short unsigned int type; /* 0 2 */
> unsigned char flags; /* 2 1 */
> unsigned char preempt_count; /* 3 1 */
> int pid; /* 4 4 */
> unsigned char preempt_lazy_count; /* 8 1 */
Oh.. RHEL9 has this new field.
> } ent; /* 0 12 */
>
> /* XXX last struct has 3 bytes of padding */
> /* XXX 4 bytes hole, try to pack */
>
> long int id; /* 16 8 */
> long unsigned int args[6]; /* 24 48 */
> /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
> char __data[]; /* 72 0 */
>
> /* size: 72, cachelines: 2, members: 4 */
> /* sum members: 68, holes: 1, sum holes: 4 */
> /* paddings: 1, sum paddings: 3 */
> /* last cacheline: 8 bytes */
> };
>
> sh-5.1#
>
> So if we always use 'struct trace_event_raw_sys_enter' from the
> vmlinux.h generated from the BTF info and have it all as CO-RE enabled
> (preserving the access index of fields, etc) it will work on any kernel
> you install on machine.
Agreed, thanks.
Namhyung
prev parent reply other threads:[~2025-08-08 22:21 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-06 13:00 [PATCH] perf: use standard syscall tracepoint structs for augmentation Jakub Brnak
2025-08-06 23:07 ` Namhyung Kim
2025-08-07 20:16 ` Arnaldo Carvalho de Melo
2025-08-08 22:21 ` Namhyung Kim [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aJZ4gXyOftCV9tHw@google.com \
--to=namhyung@kernel.org \
--cc=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=alexander.shishkin@linux.intel.com \
--cc=howardchu95@gmail.com \
--cc=irogers@google.com \
--cc=jbrnak@redhat.com \
--cc=jolsa@kernel.org \
--cc=kan.liang@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).