linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Namhyung Kim <namhyung@kernel.org>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jakub Brnak <jbrnak@redhat.com>,
	peterz@infradead.org, mingo@redhat.com, irogers@google.com,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
	jolsa@kernel.org, adrian.hunter@intel.com,
	kan.liang@linux.intel.com, howardchu95@gmail.com
Subject: Re: [PATCH] perf: use standard syscall tracepoint structs for augmentation
Date: Fri, 8 Aug 2025 15:21:53 -0700	[thread overview]
Message-ID: <aJZ4gXyOftCV9tHw@google.com> (raw)
In-Reply-To: <aJUJh8oDaH-2Ptll@x1>

On Thu, Aug 07, 2025 at 05:16:07PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Aug 06, 2025 at 04:07:32PM -0700, Namhyung Kim wrote:
> > Hello,
> > 
> > On Wed, Aug 06, 2025 at 03:00:17PM +0200, Jakub Brnak wrote:
> > > Replace custom syscall structs with the standard trace_event_raw_sys_enter
> > > and trace_event_raw_sys_exit from vmlinux.h.
> > > This fixes a data structure misalignment issue discovered on RHEL-9, which
> > > prevented BPF programs from correctly accessing syscall arguments.
> > 
> > Can you explain what the alignment issue was?  It's not clear to me what
> > makes it misaligned.
> 
> Yeah, mentioning a "misalignment" and then not spelling it out precisely
> doesn't help.
> 
> Showing the pahole output of the expected structure layout in both
> kernels and what was being used would help us to understand the problem.
> 
> For instance, here I have:
> 
> acme@x1:~/git/bpf-next$ uname -r
> 6.15.5-200.fc42.x86_64
> acme@x1:~/git/bpf-next$ pahole -E trace_event_raw_sys_enter
> struct trace_event_raw_sys_enter {
> 	struct trace_entry {
> 		short unsigned int type;                 /*     0     2 */
> 		unsigned char      flags;                /*     2     1 */
> 		unsigned char      preempt_count;        /*     3     1 */
> 		int                pid;                  /*     4     4 */
> 	} ent; /*     0     8 */
> 	long int                   id;                   /*     8     8 */
> 	long unsigned int          args[6];              /*    16    48 */
> 	/* --- cacheline 1 boundary (64 bytes) --- */
> 	char                       __data[];             /*    64     0 */
> 
> 	/* size: 64, cachelines: 1, members: 4 */
> };
> 
> acme@x1:~/git/bpf-next$
> 
> And:
> 
> ⬢ [acme@toolbx linux]$ pahole -C syscall_enter_args /tmp/build/linux/util/bpf_skel/.tmp/augmented_raw_syscalls.bpf.o 
> struct syscall_enter_args {
> 	unsigned long long         common_tp_fields;     /*     0     8 */
> 	long                       syscall_nr;           /*     8     8 */
> 	unsigned long              args[6];              /*    16    48 */
> 
> 	/* size: 64, cachelines: 1, members: 3 */
> };
> 
> ⬢ [acme@toolbx linux]$
> 
> So yes, it is "aligned", the 'id' is the 'syscall_nr' and both are at
> offset 8, then we have the syscall args starting at offset 16 in both
> cases.
> 
> The layout for rhel9 then we see the issue, the id, syscall_nr, is at
> offset 16, there is the misalignment:
> 
> sh-5.1# pahole -E -C trace_event_raw_sys_enter /usr/lib/debug/lib/modules/5.14.0-570.32.1.el9_6.x86_64/vmlinux
> struct trace_event_raw_sys_enter {
> 	struct trace_entry {
> 		short unsigned int type;                 /*     0     2 */
> 		unsigned char      flags;                /*     2     1 */
> 		unsigned char      preempt_count;        /*     3     1 */
> 		int                pid;                  /*     4     4 */
> 		unsigned char      preempt_lazy_count;   /*     8     1 */

Oh.. RHEL9 has this new field.


> 	} ent; /*     0    12 */
> 
> 	/* XXX last struct has 3 bytes of padding */
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	long int                   id;                   /*    16     8 */
> 	long unsigned int          args[6];              /*    24    48 */
> 	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
> 	char                       __data[];             /*    72     0 */
> 
> 	/* size: 72, cachelines: 2, members: 4 */
> 	/* sum members: 68, holes: 1, sum holes: 4 */
> 	/* paddings: 1, sum paddings: 3 */
> 	/* last cacheline: 8 bytes */
> };
> 
> sh-5.1#
> 
> So if we always use 'struct trace_event_raw_sys_enter' from the
> vmlinux.h generated from the BTF info and have it all as CO-RE enabled
> (preserving the access index of fields, etc) it will work on any kernel
> you install on machine.

Agreed, thanks.
Namhyung


      reply	other threads:[~2025-08-08 22:21 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-06 13:00 [PATCH] perf: use standard syscall tracepoint structs for augmentation Jakub Brnak
2025-08-06 23:07 ` Namhyung Kim
2025-08-07 20:16   ` Arnaldo Carvalho de Melo
2025-08-08 22:21     ` Namhyung Kim [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aJZ4gXyOftCV9tHw@google.com \
    --to=namhyung@kernel.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=howardchu95@gmail.com \
    --cc=irogers@google.com \
    --cc=jbrnak@redhat.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).