All of lore.kernel.org
 help / color / mirror / Atom feed
From: Namhyung Kim <namhyung@kernel.org>
To: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jakub Brnak <jbrnak@redhat.com>,
	peterz@infradead.org, mingo@redhat.com, irogers@google.com,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
	jolsa@kernel.org, adrian.hunter@intel.com,
	kan.liang@linux.intel.com, howardchu95@gmail.com
Subject: Re: [PATCH] perf: use standard syscall tracepoint structs for augmentation
Date: Fri, 8 Aug 2025 15:21:53 -0700	[thread overview]
Message-ID: <aJZ4gXyOftCV9tHw@google.com> (raw)
In-Reply-To: <aJUJh8oDaH-2Ptll@x1>

On Thu, Aug 07, 2025 at 05:16:07PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Aug 06, 2025 at 04:07:32PM -0700, Namhyung Kim wrote:
> > Hello,
> > 
> > On Wed, Aug 06, 2025 at 03:00:17PM +0200, Jakub Brnak wrote:
> > > Replace custom syscall structs with the standard trace_event_raw_sys_enter
> > > and trace_event_raw_sys_exit from vmlinux.h.
> > > This fixes a data structure misalignment issue discovered on RHEL-9, which
> > > prevented BPF programs from correctly accessing syscall arguments.
> > 
> > Can you explain what the alignment issue was?  It's not clear to me what
> > makes it misaligned.
> 
> Yeah, mentioning a "misalignment" and then not spelling it out precisely
> doesn't help.
> 
> Showing the pahole output of the expected structure layout in both
> kernels and what was being used would help us to understand the problem.
> 
> For instance, here I have:
> 
> acme@x1:~/git/bpf-next$ uname -r
> 6.15.5-200.fc42.x86_64
> acme@x1:~/git/bpf-next$ pahole -E trace_event_raw_sys_enter
> struct trace_event_raw_sys_enter {
> 	struct trace_entry {
> 		short unsigned int type;                 /*     0     2 */
> 		unsigned char      flags;                /*     2     1 */
> 		unsigned char      preempt_count;        /*     3     1 */
> 		int                pid;                  /*     4     4 */
> 	} ent; /*     0     8 */
> 	long int                   id;                   /*     8     8 */
> 	long unsigned int          args[6];              /*    16    48 */
> 	/* --- cacheline 1 boundary (64 bytes) --- */
> 	char                       __data[];             /*    64     0 */
> 
> 	/* size: 64, cachelines: 1, members: 4 */
> };
> 
> acme@x1:~/git/bpf-next$
> 
> And:
> 
> ⬢ [acme@toolbx linux]$ pahole -C syscall_enter_args /tmp/build/linux/util/bpf_skel/.tmp/augmented_raw_syscalls.bpf.o 
> struct syscall_enter_args {
> 	unsigned long long         common_tp_fields;     /*     0     8 */
> 	long                       syscall_nr;           /*     8     8 */
> 	unsigned long              args[6];              /*    16    48 */
> 
> 	/* size: 64, cachelines: 1, members: 3 */
> };
> 
> ⬢ [acme@toolbx linux]$
> 
> So yes, it is "aligned", the 'id' is the 'syscall_nr' and both are at
> offset 8, then we have the syscall args starting at offset 16 in both
> cases.
> 
> The layout for rhel9 then we see the issue, the id, syscall_nr, is at
> offset 16, there is the misalignment:
> 
> sh-5.1# pahole -E -C trace_event_raw_sys_enter /usr/lib/debug/lib/modules/5.14.0-570.32.1.el9_6.x86_64/vmlinux
> struct trace_event_raw_sys_enter {
> 	struct trace_entry {
> 		short unsigned int type;                 /*     0     2 */
> 		unsigned char      flags;                /*     2     1 */
> 		unsigned char      preempt_count;        /*     3     1 */
> 		int                pid;                  /*     4     4 */
> 		unsigned char      preempt_lazy_count;   /*     8     1 */

Oh.. RHEL9 has this new field.


> 	} ent; /*     0    12 */
> 
> 	/* XXX last struct has 3 bytes of padding */
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	long int                   id;                   /*    16     8 */
> 	long unsigned int          args[6];              /*    24    48 */
> 	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
> 	char                       __data[];             /*    72     0 */
> 
> 	/* size: 72, cachelines: 2, members: 4 */
> 	/* sum members: 68, holes: 1, sum holes: 4 */
> 	/* paddings: 1, sum paddings: 3 */
> 	/* last cacheline: 8 bytes */
> };
> 
> sh-5.1#
> 
> So if we always use 'struct trace_event_raw_sys_enter' from the
> vmlinux.h generated from the BTF info and have it all as CO-RE enabled
> (preserving the access index of fields, etc) it will work on any kernel
> you install on machine.

Agreed, thanks.
Namhyung


      reply	other threads:[~2025-08-08 22:21 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-06 13:00 [PATCH] perf: use standard syscall tracepoint structs for augmentation Jakub Brnak
2025-08-06 23:07 ` Namhyung Kim
2025-08-07 20:16   ` Arnaldo Carvalho de Melo
2025-08-08 22:21     ` Namhyung Kim [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aJZ4gXyOftCV9tHw@google.com \
    --to=namhyung@kernel.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=howardchu95@gmail.com \
    --cc=irogers@google.com \
    --cc=jbrnak@redhat.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.