Linux Trace Kernel
* Re: [PATCH v4 2/3] perf: enable unprivileged syscall tracing with perf trace
From: Peter Zijlstra @ 2026-05-18 21:41 UTC
  To: Anubhav Shelat
  Cc: mpetlan, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, Thomas Falcon, linux-kernel, linux-trace-kernel,
	linux-perf-users
In-Reply-To: <20260515194010.93725-4-ashelat@redhat.com>

On Fri, May 15, 2026 at 03:40:06PM -0400, Anubhav Shelat wrote:
> Allow unprivileged users to trace their own processes' syscalls using
> perf trace, similar to strace without the intrusive overhead of ptrace().
> 
> Currently, perf trace requires CAP_PERFMON or paranoid level ≤ 1 even
> though the kernel has existing infrastructure (TRACE_EVENT_FL_CAP_ANY)
> specifically designed to mark syscall tracepoints as safe for
> unprivileged access. To fix this:
> 
> 1. Loosen the condition in perf_event_open() which requires privileges
>    for all events with exclude_kernel=0. This allows perf_event_open() to
>    bypass the paranoid check for task-attached tracepoint events. Ensure
>    that sample types which can expose kernel addresses to unprivileged
>    users are blocked. Ensure the PERF_SECURITY_KERNEL LSM hook is
>    preserved.
> 
> 2. Make the format and id tracefs files world-readable only for tracepoints
>    with TRACE_EVENT_FL_CAP_ANY, allowing unprivileged users to see syscall
>    tracepoint ids without exposing sensitive information.
> 
> 3. Add a check to perf_trace_event_perm() to block PERF_SAMPLE_IP on
>    kernel tracepoints for unprivileged users to prevent KASLR bypass. We do
>    this here rather than in kaddr_leak because perf_trace_event_perm() can
>    distinguish between kernel tracepoints and uprobe tracepoints, where the
>    IP is a safe user space address and is necessary for uprobe
>    functionality.
> 
> 4. Restrict pure counting events (no PERF_SAMPLE_RAW) to
>    TRACE_EVENT_FL_CAP_ANY tracepoints preventing unprivileged users from
>    counting internal kernel tracepoints while preserving current
>    behavior for exclude_kernel=1 events.

Typically a patch is supposed to do a single thing; you're listing 4
things. What gives?

> Example usage after this change:
>   $ perf trace ls          # works as unprivileged user
>   $ perf trace             # system-wide, still requires privileges
>   $ perf trace -p 1234     # requires ptrace permission on pid 1234
> 
> Assisted-by: Claude:claude-sonnet-4.5
> Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
> ---
>  kernel/events/core.c            | 28 +++++++++++++++++++++++++---
>  kernel/trace/trace_event_perf.c | 21 ++++++++++++++++++++-
>  kernel/trace/trace_events.c     | 16 ++++++++++++++--
>  3 files changed, 59 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 7935d5663944..ff2d1e9a0b79 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -13873,9 +13873,31 @@ SYSCALL_DEFINE5(perf_event_open,
>  		return err;
>  
>  	if (!attr.exclude_kernel) {
> -		err = perf_allow_kernel();
> -		if (err)
> -			return err;
> +		bool tp_bypass = false;
> +
> +		/* Check unprivileged tracepoints */
> +		if (attr.type == PERF_TYPE_TRACEPOINT && pid != -1) {
> +			/*
> +			 * Block sample types that expose kernel addresses to
> +			 * prevent KASLR bypass
> +			 */
> +			u64 kaddr_leak = PERF_SAMPLE_CALLCHAIN |
> +					 PERF_SAMPLE_BRANCH_STACK |
> +					 PERF_SAMPLE_ADDR |
> +					 PERF_SAMPLE_REGS_INTR;

PERF_SAMPLE_IP should be here too, no?

And I'm not sure if tracepoints can trigger it, but PHYS_ADDR also seems
like something we shouldn't allow.

And we're sure RAW doesn't include pointers?
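A minimal user-space sketch of the stricter mask the review asks about. It uses the real PERF_SAMPLE_* and PERF_TYPE_* values from the uapi header <linux/perf_event.h>; the helper name tp_bypass_allowed() is hypothetical, and the logic only mirrors the tp_bypass computation in the quoted hunk with PERF_SAMPLE_IP and PERF_SAMPLE_PHYS_ADDR added. Note that item 3 of the patch description deliberately keeps PERF_SAMPLE_IP out of this mask so that uprobe events can still sample user-space IPs, so folding it in here would trade that away. The static_assert only records that RAW is still outside the mask, which is the open question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/perf_event.h>	/* uapi PERF_TYPE_* and PERF_SAMPLE_* */

/*
 * Sample types that can carry kernel addresses. The v4 hunk lists
 * CALLCHAIN, BRANCH_STACK, ADDR and REGS_INTR; IP and PHYS_ADDR are
 * the additions raised in review.
 */
#define KADDR_LEAK_MASK						\
	((uint64_t)(PERF_SAMPLE_IP | PERF_SAMPLE_CALLCHAIN |	\
		    PERF_SAMPLE_BRANCH_STACK | PERF_SAMPLE_ADDR |	\
		    PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_PHYS_ADDR))

/* RAW stays allowed; whether its payload can hold pointers is open. */
static_assert((KADDR_LEAK_MASK & PERF_SAMPLE_RAW) == 0,
	      "PERF_SAMPLE_RAW is not part of the leak mask");

/*
 * Hypothetical helper: may a task-bound tracepoint event skip
 * perf_allow_kernel()? pid == -1 means "any task" and stays privileged.
 */
static bool tp_bypass_allowed(uint32_t type, int pid, uint64_t sample_type)
{
	if (type != PERF_TYPE_TRACEPOINT || pid == -1)
		return false;
	return (sample_type & KADDR_LEAK_MASK) == 0;
}
```

For example, tp_bypass_allowed(PERF_TYPE_TRACEPOINT, 1234, PERF_SAMPLE_RAW | PERF_SAMPLE_TID) is true, while adding PERF_SAMPLE_IP or PERF_SAMPLE_PHYS_ADDR, or passing pid == -1, makes it false.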

> +
> +			tp_bypass = !(attr.sample_type & kaddr_leak);
> +		}
> +
> +		if (!tp_bypass) {
> +			err = perf_allow_kernel();
> +			if (err)
> +				return err;
> +		} else {
> +			err = security_perf_event_open(PERF_SECURITY_KERNEL);
> +			if (err)
> +				return err;
> +		}
>  	}
>  
>  	if (attr.namespaces) {
> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index a6bb7577e8c5..466007ed2869 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -72,9 +72,28 @@ static int perf_trace_event_perm(struct trace_event_call *tp_event,
>  			return -EINVAL;
>  	}
>  
> +	/*
> +	 * PERF_SAMPLE_IP on kernel tracepoints exposes a kernel text
> +	 * address, weakening KASLR. Block for unprivileged users unless
> +	 * the tracepoint is a uprobe (userspace IP, safe to expose).
> +	 */
> +	if ((p_event->attr.sample_type & PERF_SAMPLE_IP) &&
> +	    !p_event->attr.exclude_kernel &&
> +	    !(tp_event->flags & TRACE_EVENT_FL_UPROBE) &&
> +	    sysctl_perf_event_paranoid > 1 && !perfmon_capable())
> +		return -EACCES;
> +
>  	/* No tracing, just counting, so no obvious leak */
> -	if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW))
> +	if (!(p_event->attr.sample_type & PERF_SAMPLE_RAW)) {
> +		/* Prevent unprivileged users from counting kernel tracepoints */
> +		if (!p_event->attr.exclude_kernel &&
> +		    sysctl_perf_event_paranoid > 1 && !perfmon_capable()) {
> +			if (!(p_event->attach_state == PERF_ATTACH_TASK &&
> +			      (tp_event->flags & TRACE_EVENT_FL_CAP_ANY)))
> +				return -EACCES;
> +		}
>  		return 0;
> +	}

Maybe use less AI and try and type this yourself. I think you'll find
that repeating the same clauses over and over gets tiresome. IIRC they
invented something for that in the 60s or so :/
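Presumably the repeated clause is `!exclude_kernel && sysctl_perf_event_paranoid > 1 && !perfmon_capable()`, which the hunk spells out twice. A minimal user-space model of the two new checks in perf_trace_event_perm(), with that clause written once in a helper, might look like the following. All names are hypothetical; SAMPLE_IP and SAMPLE_RAW mirror the uapi PERF_SAMPLE_IP and PERF_SAMPLE_RAW bits, the FL_* values are placeholders rather than the real TRACE_EVENT_FL_* values, and only the checks added by this hunk are modeled, not the existing checks that follow them.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits mirroring PERF_SAMPLE_IP and PERF_SAMPLE_RAW. */
#define SAMPLE_IP	(1ULL << 0)
#define SAMPLE_RAW	(1ULL << 10)
/* Placeholders, not the real TRACE_EVENT_FL_* values. */
#define FL_CAP_ANY	(1U << 0)
#define FL_UPROBE	(1U << 1)

struct perm_ctx {
	int paranoid;		/* models sysctl_perf_event_paranoid */
	bool perfmon;		/* models perfmon_capable() */
};

struct tp_req {
	uint64_t sample_type;
	bool exclude_kernel;
	bool attach_task;	/* models attach_state == PERF_ATTACH_TASK */
	unsigned int tp_flags;	/* models tp_event->flags */
};

/* The repeated clause, written once. */
static bool unpriv_kernel_event(const struct perm_ctx *c, const struct tp_req *r)
{
	return !r->exclude_kernel && c->paranoid > 1 && !c->perfmon;
}

/*
 * Models only the two checks the hunk adds; 0 means they pass.
 * The checks that follow in perf_trace_event_perm() are not modeled.
 */
static int tp_new_checks(const struct perm_ctx *c, const struct tp_req *r)
{
	assert(c && r);
	if (!unpriv_kernel_event(c, r))
		return 0;

	/* A kernel IP weakens KASLR; uprobe IPs are user-space addresses. */
	if ((r->sample_type & SAMPLE_IP) && !(r->tp_flags & FL_UPROBE))
		return -EACCES;

	/* Pure counting is limited to task-bound CAP_ANY tracepoints. */
	if (!(r->sample_type & SAMPLE_RAW) &&
	    !(r->attach_task && (r->tp_flags & FL_CAP_ANY)))
		return -EACCES;

	return 0;
}
```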

>  	/* Some events are ok to be traced by non-root users... */
>  	if (p_event->attach_state == PERF_ATTACH_TASK) {
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index c46e623e7e0d..cbd07e2ec528 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -3050,7 +3050,13 @@ static int event_callback(const char *name, umode_t *mode, void **data,
>  	struct trace_event_call *call = file->event_call;
>  
>  	if (strcmp(name, "format") == 0) {
> -		*mode = TRACE_MODE_READ;
> +		/*
> +		 * Make format tracefs file world readable for tracepoints with
> +		 * TRACE_EVENT_FL_CAP_ANY
> +		 */
> +		*mode = (call->flags & TRACE_EVENT_FL_CAP_ANY) ?
> +			(TRACE_MODE_READ | 0004) :
> +			TRACE_MODE_READ;
>  		*fops = &ftrace_event_format_fops;
>  		return 1;
>  	}
> @@ -3086,7 +3092,13 @@ static int event_callback(const char *name, umode_t *mode, void **data,
>  #ifdef CONFIG_PERF_EVENTS
>  	if (call->event.type && call->class->reg &&
>  	    strcmp(name, "id") == 0) {
> -		*mode = TRACE_MODE_READ;
> +		/*
> +		 * Make id tracefs file world readable for tracepoints with
> +		 * TRACE_EVENT_FL_CAP_ANY
> +		 */
> +		*mode = (call->flags & TRACE_EVENT_FL_CAP_ANY) ?
> +			(TRACE_MODE_READ | 0004) :
> +			TRACE_MODE_READ;
>  		*data = (void *)(long)call->event.type;
>  		*fops = &ftrace_event_id_fops;
>  		return 1;

Again, you're doing the same thing in multiple places. If only there was
something to re-use a previous expression.

None of this gives me warm and fuzzy feelings.

* Re: [PATCH v3 10/11] kernel: time, trace: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Remanan Pillai @ 2026-05-18 23:13 UTC
  To: Thomas Gleixner
  Cc: Anna-Maria Behnsen, Frederic Weisbecker, Ingo Molnar,
	Steven Rostedt, Masami Hiramatsu, linux-kernel,
	linux-trace-kernel, Peter Zijlstra
In-Reply-To: <87jyt2xzj6.ffs@tglx>

On Sun, May 17, 2026 at 3:31 AM Thomas Gleixner <tglx@kernel.org> wrote:
>
> On Fri, May 15 2026 at 09:59, Vineeth Pillai wrote:
> > ---
> >  kernel/time/tick-sched.c       | 12 ++++++------
> >  kernel/trace/trace_benchmark.c |  2 +-
> >  2 files changed, 7 insertions(+), 7 deletions(-)
>
> Please split that into a tick/sched and trace patch so each can be picked
> up in the relevant subsystems.
>
Sorry about this; I will split it and send it in the next iteration.

Thanks,
Vineeth

* Re: [PATCH v3 06/11] drm: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Remanan Pillai @ 2026-05-18 23:20 UTC
  To: phasta
  Cc: Alex Deucher, Christian König, David Airlie, Simona Vetter,
	Harry Wentland, Leo Li, Matthew Brost, Danilo Krummrich,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, amd-gfx,
	dri-devel, Steven Rostedt, linux-trace-kernel, Peter Zijlstra
In-Reply-To: <81783d0807a5ffac93f61eddba0d2f595d7f239f.camel@mailbox.org>

On Mon, May 18, 2026 at 11:01 AM Philipp Stanner <phasta@mailbox.org> wrote:
>
> On Fri, 2026-05-15 at 09:59 -0400, Vineeth Pillai (Google) wrote:
> > From: Vineeth Pillai <vineeth@bitbyteword.org>
> >
> > Replace trace_foo() with the new trace_call__foo() at sites already
> > guarded by trace_foo_enabled(), avoiding a redundant
> > static_branch_unlikely() re-evaluation inside the tracepoint.
> > trace_call__foo() calls the tracepoint callbacks directly without
> > utilizing the static branch again.
>
> The "foo" terminology is unusual I think? I always wrote it with regex,
> like "trace_*()".
>
Sorry about the terminology. Some of the patches were already merged with
this wording, so is it okay to keep the terminology for consistency?
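The guarded-call pattern described in the commit message can be modeled in user space as below. This is a sketch under stated assumptions, not the tracepoint implementation: a plain bool stands in for the tracepoint's static key (tested with static_branch_unlikely() in the kernel), a single function stands in for the registered callbacks, and every name beyond the trace_foo()/trace_foo_enabled()/trace_call__foo() naming scheme is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the tracepoint's static key (hypothetical). */
static bool foo_key;
/* Stand-in for the registered callbacks: counts events seen. */
static int foo_events;

static void foo_callbacks(int v)
{
	foo_events += v;
}

static bool trace_foo_enabled(void)
{
	return foo_key;
}

/* trace_foo(): re-checks the key before running the callbacks. */
static void trace_foo(int v)
{
	if (trace_foo_enabled())
		foo_callbacks(v);
}

/* trace_call__foo(): runs the callbacks directly, for call sites that
 * have already checked trace_foo_enabled(). */
static void trace_call__foo(int v)
{
	foo_callbacks(v);
}

/* A guarded call site, shaped like the amdgpu_vm_bo_mapping loop. */
static void trace_each(int n)
{
	assert(n >= 0);
	if (trace_foo_enabled()) {
		for (int i = 0; i < n; i++)
			trace_call__foo(1);	/* no per-iteration key check */
	}
}
```

With the key off, trace_each() does nothing beyond the one check; with it on, the loop body calls the callbacks without testing the key again on every iteration.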

>
>
> >
> > Original v2 series:
> > https://lore.kernel.org/linux-trace-kernel/20260323160052.17528-1-vineeth@bitbyteword.org/
>
> I'd put this in a Link: tag section below.
>
Makes sense, will do. Steve also suggested putting this whole section
after "---" because it isn't relevant to the changes. Will fix this in
the next iteration.

> >
> > Parts of the original v2 series have already been merged in mainline.
> > This patch is being reposted as a follow-up cleanup for the remaining
> > unmerged pieces.
>
> So this v3 series as a whole is a followup to that v2?
>
v3 is a follow-up for the remaining patches that were not merged in the
previous cycle. The core API and a couple of patches went in during the
previous cycle, so this is for the rest of it.

The intention was to send this v3 as direct patches to the individual
subsystem maintainers, but I forgot to remove the numbering, hence the
confusion. Will remove the numbering and send it as a standalone patch
in the next iteration.

> >
> > Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> > Suggested-by: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
> > Assisted-by: Claude:claude-sonnet-4-6
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c            |  2 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c            |  4 ++--
> >  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 +++++-----
> >  drivers/gpu/drm/scheduler/sched_entity.c          |  5 +++--
> >  4 files changed, 11 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > index b24d5d21be5f..cb0b5cb07d57 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > @@ -1004,7 +1004,7 @@ static void trace_amdgpu_cs_ibs(struct amdgpu_cs_parser *p)
> >               struct amdgpu_job *job = p->jobs[i];
> >
> >               for (j = 0; j < job->num_ibs; ++j)
> > -                     trace_amdgpu_cs(p, job, &job->ibs[j]);
> > +                     trace_call__amdgpu_cs(p, job, &job->ibs[j]);
> >       }
> >  }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > index 9ba9de16a27a..a36ae94c425f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > @@ -1415,7 +1415,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
> >
> >       if (trace_amdgpu_vm_bo_mapping_enabled()) {
> >               list_for_each_entry(mapping, &bo_va->valids, list)
> > -                     trace_amdgpu_vm_bo_mapping(mapping);
> > +                     trace_call__amdgpu_vm_bo_mapping(mapping);
> >       }
> >
> >  error_free:
> > @@ -2183,7 +2183,7 @@ void amdgpu_vm_bo_trace_cs(struct amdgpu_vm *vm, struct ww_acquire_ctx *ticket)
> >                               continue;
> >               }
> >
> > -             trace_amdgpu_vm_bo_cs(mapping);
> > +             trace_call__amdgpu_vm_bo_cs(mapping);
> >       }
> >  }
> >
> > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > index 5fc5d5608506..fbdc12cdd6bb 100644
> > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > @@ -5263,11 +5263,11 @@ static void amdgpu_dm_backlight_set_level(struct amdgpu_display_manager *dm,
> >       }
> >
> >       if (trace_amdgpu_dm_brightness_enabled()) {
> > -             trace_amdgpu_dm_brightness(__builtin_return_address(0),
> > -                                        user_brightness,
> > -                                        brightness,
> > -                                        caps->aux_support,
> > -                                        power_supply_is_system_supplied() > 0);
> > +             trace_call__amdgpu_dm_brightness(__builtin_return_address(0),
> > +                                              user_brightness,
> > +                                              brightness,
> > +                                              caps->aux_support,
> > +                                              power_supply_is_system_supplied() > 0);
> >       }
> >
> >       if (caps->aux_support) {
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > index fe174a4857be..185a2636b599 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -429,7 +429,8 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity,
> >
> >       if (trace_drm_sched_job_unschedulable_enabled() &&
> >           !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &entity->dependency->flags))
> > -             trace_drm_sched_job_unschedulable(sched_job, entity->dependency);
> > +             trace_call__drm_sched_job_unschedulable(sched_job,
> > +                                                     entity->dependency);
>
> I would be more happy if you sacrifice a bit of space here and keep it
> a single line since the if condition is already quite convoluted and
> challenging to read.
>
I understand, will fix it in the next iteration.

Thanks,
Vineeth

* Re: [PATCH v3 08/11] scsi: ufs: Use trace_call__##name() at guarded tracepoint call sites
From: Vineeth Remanan Pillai @ 2026-05-18 23:22 UTC
  To: Bart Van Assche
  Cc: Steven Rostedt, James E.J. Bottomley, Martin K. Petersen,
	linux-scsi, linux-trace-kernel, Peter Zijlstra
In-Reply-To: <ebdc020e-419d-458a-9211-36f22af0c1d9@acm.org>

On Fri, May 15, 2026 at 3:22 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 5/15/26 11:50 AM, Steven Rostedt wrote:
> > On Fri, 15 May 2026 08:27:27 -0700
> > Bart Van Assche <bvanassche@acm.org> wrote:
> >
> >> On 5/15/26 6:59 AM, Vineeth Pillai (Google) wrote:
> >>>    static void ufshcd_add_query_upiu_trace(struct ufs_hba *hba,
> >>> @@ -432,8 +432,8 @@ static void ufshcd_add_query_upiu_trace(struct ufs_hba *hba,
> >>>     if (!trace_ufshcd_upiu_enabled())
> >>>             return;
> >>>
> >>> -   trace_ufshcd_upiu(hba, str_t, &rq_rsp->header,
> >>> -                     &rq_rsp->qr, UFS_TSF_OSF);
> >>> +   trace_call__ufshcd_upiu(hba, str_t, &rq_rsp->header,
> >>> +                          &rq_rsp->qr, UFS_TSF_OSF);
> >>>    }
> >>
> >> Instead of making this change, please remove the
> >> trace_ufshcd_upiu_enabled() call because it is redundant.
> >
> > You mean to remove the ufshcd_add_query_upiu_trace() function and just use
> > a tracepoint where it is called?
>
> That would be even better.
>
Will do.

> >>>    static void ufshcd_add_tm_upiu_trace(struct ufs_hba *hba, unsigned int tag,
> >>> @@ -445,15 +445,15 @@ static void ufshcd_add_tm_upiu_trace(struct ufs_hba *hba, unsigned int tag,
> >>>             return;
> >>>
> >>>     if (str_t == UFS_TM_SEND)
> >>> -           trace_ufshcd_upiu(hba, str_t,
> >>> -                             &descp->upiu_req.req_header,
> >>> -                             &descp->upiu_req.input_param1,
> >>> -                             UFS_TSF_TM_INPUT);
> >>> +           trace_call__ufshcd_upiu(hba, str_t,
> >>> +                                   &descp->upiu_req.req_header,
> >>> +                                   &descp->upiu_req.input_param1,
> >>> +                                   UFS_TSF_TM_INPUT);
> >>>     else
> >>> -           trace_ufshcd_upiu(hba, str_t,
> >>> -                             &descp->upiu_rsp.rsp_header,
> >>> -                             &descp->upiu_rsp.output_param1,
> >>> -                             UFS_TSF_TM_OUTPUT);
> >>> +           trace_call__ufshcd_upiu(hba, str_t,
> >>> +                                   &descp->upiu_rsp.rsp_header,
> >>> +                                   &descp->upiu_rsp.output_param1,
> >>> +                                   UFS_TSF_TM_OUTPUT);
> >>>    }
> >>
> >> Same comment here: I think it would be better to remove the
> >> trace_ufshcd_upiu_enabled() call rather than
> >> changing trace_ufshcd_upiu() into trace_call__ufshcd_upiu().
> >
> > Well, removing it here would mean placing the if (str == UFS_TM_SEND) into
> > the code and processing it even when tracing is disabled. With the
> > trace_*_enabled() helper, it's all a nop.
>
> The ufshcd_add_tm_upiu_trace() function is only called from the UFS
> error handler and hence is not performance sensitive. The execution of
> an additional if-test in this function is not a concern at all.
>
Sure, I shall change this.

Thanks,
Vineeth

* [PATCH 00/28] mm/damon: introduce data attributes monitoring
From: SeongJae Park @ 2026-05-18 23:40 UTC
  To: Andrew Morton
  Cc: SeongJae Park, Liam R. Howlett, David Hildenbrand,
	Jonathan Corbet, Lorenzo Stoakes, Masami Hiramatsu,
	Mathieu Desnoyers, Michal Hocko, Mike Rapoport, Shuah Khan,
	Shuah Khan, Steven Rostedt, Suren Baghdasaryan, Vlastimil Babka,
	damon, linux-doc, linux-kernel, linux-kselftest, linux-mm,
	linux-trace-kernel

TL;DR
=====

Extend DAMON to monitor general data attributes other than accesses.
The short-term motivation is lightweight monitoring that is aware of
page types (e.g., the cgroup a page belongs to).  In the long term, this
will help extend DAMON to multiple access event capture primitives
(e.g., page faults and PMU) and eventually pivot DAMON into a "Data
Attributes Monitoring and Operations eNgine".

Background: High Cost of Page Level Properties Monitoring
=========================================================

DAMON was initially introduced as a Data Access MONitor.  It has since
been extended beyond access monitoring to data access-aware system
operations (DAMOS).  However, the monitoring part still covers only
data accesses.

Data access patterns are useful information, but some users need a more
holistic view.  In particular, users want to see access pattern
information together with the types of the memory.  For example, users
who work on making huge pages efficient want to know how much of the
DAMON-found hot/cold regions are backed by huge pages.  Users who run
multiple workloads in different cgroups want to know how much of the
DAMON-found hot/cold regions belong to specific cgroups.

To meet this demand, we developed a DAMOS extension for page level
properties based monitoring [1], which landed in 6.14.  With this
feature, users specify the page level data properties they are
interested in, in a flexible format that uses DAMOS filters.  DAMON then
applies the filters to each folio of the entire DAMON region and reports
how many bytes of memory in each DAMON region passed the given filters.
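To make the cost argument concrete, here is a minimal sketch of page level accounting: every page of the region is inspected, so the work grows linearly with the region size.  The page_info type, the filter signature, and the 4 KiB page size are assumptions for illustration; the real feature walks folios and evaluates DAMOS filters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical per-page metadata. */
struct page_info {
	int memcg_id;
	bool huge;
};

typedef bool (*page_filter_fn)(const struct page_info *pg, const void *arg);

/* Example filter: page belongs to the memcg whose id is *arg. */
static bool filter_memcg(const struct page_info *pg, const void *arg)
{
	return pg->memcg_id == *(const int *)arg;
}

/* Example filter: page is backed by a huge page. */
static bool filter_huge(const struct page_info *pg, const void *arg)
{
	(void)arg;
	return pg->huge;
}

/* Deterministic, but O(nr_pages): every page of the region is checked. */
static unsigned long region_bytes_passing(const struct page_info *pages,
					  size_t nr_pages,
					  page_filter_fn filter,
					  const void *arg)
{
	unsigned long bytes = 0;

	assert(pages != NULL || nr_pages == 0);
	for (size_t i = 0; i < nr_pages; i++)
		if (filter(&pages[i], arg))
			bytes += SKETCH_PAGE_SIZE;
	return bytes;
}
```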

This gives users detailed and deterministic page level information.
But, because the operation is done at page level, the overhead is
proportional to the memory size.  It was useful for testing or
debugging on a small number of machines, but obviously too heavy to be
always enabled on all machines running real user workloads.  For real
world workloads, the recommendation was to use the feature with
user-space controlled sampling.  For example, users could do the page
level monitoring only once per hour, on a randomly selected one percent
of the machines in their fleet.  If the runtime is long enough and the
fleet is big enough, this should provide statistically meaningful data.

But users are too busy to implement such controls on their own.

Data Attributes Monitoring
==========================

Extend DAMON to monitor not only data accesses, but also general data
attributes.  Do so while keeping DAMON's main promise: bounded,
best-effort minimal overhead.

Allow users to specify which data attributes, in addition to data
access, they want to monitor.  For this purpose, users install one 'data
probe' per data attribute of interest.  A data probe can be applied to
any memory and determines whether that memory has the corresponding
data attribute, e.g., whether the memory at physical address 42 belongs
to cgroup A.  Each data probe is configured with filters that are very
similar to DAMOS filters.

When DAMON checks whether the memory at each region's sampling address
has been accessed since the last check, it also applies the registered
data probes, if any.  Just as it counts access check-positive samples in
nr_accesses, it counts each data probe's positive samples in another
per-region counter array, named 'probe_hits'.  When DAMON resets
nr_accesses every aggregation interval, it resets 'probe_hits' as well.
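A minimal sketch of the per-sample accounting described above, assuming the access check result for the sampled address is already known.  The struct layout, the toy probe, and the function names are hypothetical, not the series' code; MAX_PROBES reflects the four-probe limit of this version, and the counters are bytes because the probe_hits tracepoint declares an unsigned char dynamic array.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PROBES 4	/* this version limits data probes to four */

/* A data probe: does the memory at addr have the attribute? */
typedef bool (*probe_fn)(unsigned long addr);

/* Hypothetical region layout, not struct damon_region. */
struct region {
	unsigned long start, end;		/* [start, end) */
	unsigned long sampling_addr;		/* picked inside the region */
	unsigned int nr_accesses;
	unsigned char probe_hits[MAX_PROBES];
};

/* Toy probe for the sketch: memory below 1 MiB. */
static bool probe_below_1m(unsigned long addr)
{
	return addr < (1UL << 20);
}

/* One sampling step: the access result and every probe use the same
 * sampled address, so no extra memory is sampled for the probes. */
static void sample_region(struct region *r, bool accessed,
			  probe_fn const *probes, unsigned int nr_probes)
{
	assert(nr_probes <= MAX_PROBES);
	if (accessed)
		r->nr_accesses++;
	for (unsigned int i = 0; i < nr_probes; i++)
		if (probes[i](r->sampling_addr))
			r->probe_hits[i]++;
}

/* At every aggregation interval both counters are reset together. */
static void reset_aggregated(struct region *r)
{
	r->nr_accesses = 0;
	memset(r->probe_hits, 0, sizeof(r->probe_hits));
}
```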

Users can read 'probe_hits' just before the values are reset.  This way,
users learn how much of the hot/cold memory has the data attributes of
interest, e.g., that 30 percent of this system's hot memory belongs to
cgroup A, and that 80 percent of cgroup A's hot memory is backed by huge
pages.
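Percentages like these can be estimated from the counters.  A minimal sketch, assuming every region received the same number of samples (nr_samples, roughly the aggregation interval divided by the sampling interval) and that a region counts as hot when nr_accesses reaches a user-chosen threshold; both assumptions and all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

struct region_stat {
	unsigned long size;		/* bytes */
	unsigned int nr_accesses;	/* access-positive samples */
	unsigned int probe_hits;	/* one probe's positive samples */
};

/*
 * Estimated share (0..1) of hot memory that has the probed attribute.
 * probe_hits / nr_samples estimates the fraction of a region that has
 * the attribute; hot regions are weighted by their size.
 */
static double hot_share_with_attr(const struct region_stat *rs, size_t n,
				  unsigned int nr_samples,
				  unsigned int hot_min)
{
	double hot = 0, hot_attr = 0;

	assert(nr_samples > 0);
	for (size_t i = 0; i < n; i++) {
		if (rs[i].nr_accesses < hot_min)
			continue;
		hot += rs[i].size;
		hot_attr += (double)rs[i].size * rs[i].probe_hits / nr_samples;
	}
	return hot > 0 ? hot_attr / hot : 0.0;
}
```

For example, with regions of 100, 100 and 200 bytes reporting (nr_accesses, probe_hits) of (20, 20), (18, 0) and (0, 20), 20 samples, and a hotness threshold of 10, only the first two regions are hot and half of the hot memory is estimated to have the attribute.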

Patches Sequence
================

The first eight patches implement the core feature, its interface, and
working support.  Patch 1 introduces the data probe data structure,
damon_probe.  Patch 2 extends damon_ctx for installing data probes.
Patch 3 introduces another data structure for the filters of each data
probe, damon_filter.  Patch 4 updates the damon_ctx commit function to
handle the probes.  Patch 5 extends damon_region with the per-region
per-probe positive samples counter, probe_hits.  Patch 6 extends
damon_operations for applying probes in the underlying DAMON operations
implementation.  Patch 7 updates kdamond_fn() to invoke the
probe-applying callback.  Patch 8 finally implements probe support in
the paddr ops.

Ten user interface changes (patches 9-18) come next.  Patches 9-13
implement the sysfs directories and files for setting up data probes:
the probes directory, probe directory, filters directory, filter
directory, and the files inside the filter directory, respectively.
Patch 14 connects the user inputs made via the sysfs files to the DAMON
core.  The following three patches (15-17) implement sysfs directories
and files for showing probe_hits to users: the probes directory, probe
directory, and hits file, respectively.  Patch 18 introduces a new
tracepoint for exposing probe_hits via tracefs.

Patch 19 adds a selftest for the sysfs files.

Patches 20 and 21 document the design and usage of the new feature,
respectively.

Seven additional patches (22-28) for monitoring which memory cgroup the
memory belongs to follow.  Depending on the feedback, this part might be
split into a separate series in the future.  Patch 22 defines the DAMON
filter type for the new attribute, DAMON_FILTER_TYPE_MEMCG.  Patch 23
adds support for it in the paddr ops.  Patch 24 updates the sysfs
interface for setting up the target memcg.  Patch 25 moves code for easy
reuse in the filter target memcg setup.  Patch 26 connects the user
input to the core layer.  Finally, patches 27 and 28 update the design
and usage documents for the memcg attribute monitoring support.

Discussions
===========

This allows page properties monitoring with overhead low enough to be
always enabled on real world workloads.  Because data attributes are
checked at the same sampling time as accesses, DAMON's upper-bounded,
best-effort minimal overhead is preserved.  Because the same sampled
memory is used for both the access check and the data attributes check,
the additional overhead is minimal.

DAMOS-based page level properties monitoring should still be useful,
because it provides deterministic page level information.  When the
sampling based information is in doubt, running the DAMOS-based
monitoring alongside it and comparing the results can help with
debugging and tuning.

Plan for Dropping RFC tag
=========================

Making changes for feedback from myself, humans and Sashiko should be
the major remaining work.

I'm currently hoping to drop the RFC tag by 7.2-rc1.

Future Works: Mid Term
======================

This version of the implementation limits the maximum number of data
probes to four.  I will try to find a way to remove the limit in the
future.  That said, I think four should be enough for common use cases,
so this is not a high priority at the moment.

Future Works: Long Term
=======================

There are user requests for extending DAMON with more detailed access
information, for example, per-CPU, per-thread, and read/write
monitoring.  For that, I was working [2] on extending DAMON to use page
fault events as another access check primitive, and on making the
infrastructure flexible enough for yet more access check primitives in
the future.  There is also another ongoing work [3] for extending DAMON
with PMU events, though its motivation is reducing the overhead.

In my work [2], I was introducing a new interface for controlling access
sampling primitives.  I now think this data probe interface can be used
for that, too.  That is, data access becomes just one type of data
attribute, and pg_idle-confirmed access, page fault-confirmed access,
and PMU event-confirmed access become different types of data
attributes.

The region adjustment mechanism currently works based on access
information, because DAMON is designed for data access monitoring: data
access information is the primary interest, so DAMON splits regions and
merges adjacent regions with similar access frequencies in a way that
best presents that information.

Once data access becomes just one of the data attributes, there is no
reason to treat data access as special.  Some users may not be
interested in access at all, but want to know where memory of a specific
type is located.  The data probes interface will allow that.  Further,
we could extend the interface to let users set any data attribute as the
'primary' attribute.  Then, DAMON would split and merge regions in a way
that best presents the 'primary' attribute.
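A hypothetical sketch of how a 'primary' attribute could drive region merging.  It borrows the rule DAMON applies to accesses today (merge adjacent regions whose counters differ by at most a threshold, and combine counters as a size-weighted average) but lets the caller choose which counter is primary.  Keeping accesses in slot 0 and probe hits in slots 1-4 is an assumption of the sketch, not the series' data layout.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ATTRS 5	/* hypothetical: slot 0 = accesses, 1..4 = probes */

struct attr_region {
	unsigned long start, end;	/* [start, end) */
	unsigned int counts[NR_ATTRS];
};

/* Merge adjacent regions if their primary counters are close enough. */
static bool should_merge(const struct attr_region *l,
			 const struct attr_region *r,
			 int primary, unsigned int thres)
{
	unsigned int a, b;

	assert(primary >= 0 && primary < NR_ATTRS);
	assert(l->end == r->start);	/* regions must be adjacent */
	a = l->counts[primary];
	b = r->counts[primary];
	return (a > b ? a - b : b - a) <= thres;
}

/* Combine every counter as a size-weighted average, then extend l. */
static void merge_into(struct attr_region *l, const struct attr_region *r)
{
	unsigned long sl = l->end - l->start, sr = r->end - r->start;

	for (int i = 0; i < NR_ATTRS; i++)
		l->counts[i] = (l->counts[i] * sl + r->counts[i] * sr) /
			       (sl + sr);
	l->end = r->end;
}
```

With accesses as the primary attribute, two regions with very different access counts stay separate; with a probe as the primary attribute, the same regions can merge if their probe hits match.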

DAMOS will also be extended to specify targets based not only on the
data access pattern but on all user-registered data attributes.  At that
stage, we may be able to call DAMON a "Data Attributes Monitoring and
Operations eNgine".

[1] https://lore.kernel.org/20250106193401.109161-1-sj@kernel.org
[2] https://lore.kernel.org/20251208062943.68824-1-sj@kernel.org/
[3] https://lore.kernel.org/20260423004211.7037-1-akinobu.mita@gmail.com

Changes from RFC v3
- rfc v3: https://lore.kernel.org/20260516183712.81393-1-sj@kernel.org
- Wordsmithing documentation.
- Drop RFC tag.
- Rebase to mm-new.
Changes from RFC v2.2
- rfc v2.2: https://lore.kernel.org/20260515004433.128933-1-sj@kernel.org
- Rename damon_aggregated_v2 trace event to damon_region_aggregated.
- Address Sashiko issues.
  - Enclose arguments on damon_for_each_{probe,filter}[_safe]() macros.
  - Fix typos in comments and documents.
  - Update probe_hits for region split and merge.
  - Add more documentation for damon_operation->apply_probes() callback.
  - Reduce unnecessary folio_{get,put}() in damon_pa_apply_probes().
  - Define damon_sysfs_probe_attrs as static.
  - Link scheme tried region sysfs dir and increase the count only after
    all internal dirs are populated successfully.
  - Commit damon_filter->memcg_id for newly added filters.
Changes from RFC v2.1
- rfc v2.1: https://lore.kernel.org/20260514140904.119781-1-sj@kernel.org
- Rebase to mm-stable (7.1-rc3) to avoid Sashiko patch apply failure.
Changes from RFC v2
- rfc v2: https://lore.kernel.org/20260512143645.113201-1-sj@kernel.org
- Optimize nr_probes calculation for probe_hits tracepoint.
- Use TRACE_EVENT_CONDITION() for probe_hits tracepoint.
- Rebase to latest mm-new.
Changes from RFC
- rfc: https://lore.kernel.org/all/20260426205222.93895-1-sj@kernel.org/
- Support memcg DAMON filter.
- Use per-probe probe_hits sysfs file.
- Use dynamic_array for probe_hits tracing.
- Fix filter matching field.
- Fix folio leaking in damon_pa_filter_pass().
- Move nr_regions of damon_aggregated_v2 tracepoint after end.
- Rename DAMON_TEST_TYPE_ANON to DAMON_FILTER_TYPE_ANON.

SeongJae Park (28):
  mm/damon/core: introduce struct damon_probe
  mm/damon/core: embed damon_probe objects in damon_ctx
  mm/damon/core: introduce damon_filter
  mm/damon/core: commit probes
  mm/damon/core: introduce damon_region->probe_hits
  mm/damon/core: introduce damon_ops->apply_probes
  mm/damon/core: do data attributes monitoring
  mm/damon/paddr: support data attributes monitoring
  mm/damon/sysfs: implement probes dir
  mm/damon/sysfs: implement probe dir
  mm/damon/sysfs: implement filters directory
  mm/damon/sysfs: implement filter dir
  mm/damon/sysfs: implement filter dir files
  mm/damon/sysfs: setup probes on DAMON core API parameters
  mm/damon/sysfs-schemes: implement tried_regions/<r>/probes/
  mm/damon/sysfs-schemes: implement probe dir
  mm/damon/sysfs-schemes: implement probe/hits file
  mm/damon: trace probe_hits
  selftests/damon/sysfs.sh: test probes dir
  Docs/mm/damon/design: document data attributes monitoring
  Docs/admin-guide/mm/damon/usage: document data attributes monitoring
  mm/damon/core: introduce DAMON_FILTER_TYPE_MEMCG
  mm/damon/paddr: support DAMON_FILTER_TYPE_MEMCG
  mm/damon/sysfs: add filters/<F>/path file
  mm/damon/sysfs-schemes: move memcg_path_to_id() to sysfs-common
  mm/damon/sysfs: setup damon_filter->memcg_id from path
  Docs/mm/damon/design: update for memcg damon filter
  Docs/admin-guide/mm/damon/usage: update for memcg damon filter

 Documentation/admin-guide/mm/damon/usage.rst |  46 +-
 Documentation/mm/damon/design.rst            |  39 ++
 include/linux/damon.h                        |  69 +++
 include/trace/events/damon.h                 |  38 ++
 mm/damon/core.c                              | 211 +++++++
 mm/damon/paddr.c                             |  76 +++
 mm/damon/sysfs-common.c                      |  41 ++
 mm/damon/sysfs-common.h                      |   2 +
 mm/damon/sysfs-schemes.c                     | 224 ++++++--
 mm/damon/sysfs.c                             | 557 +++++++++++++++++++
 tools/testing/selftests/damon/sysfs.sh       |  48 ++
 11 files changed, 1303 insertions(+), 48 deletions(-)


base-commit: b491d3b062a367a23fdc98def7fe3a8cf21bb3b0
-- 
2.47.3

* [PATCH 18/28] mm/damon: trace probe_hits
From: SeongJae Park @ 2026-05-18 23:41 UTC
  To: Andrew Morton
  Cc: SeongJae Park, Masami Hiramatsu, Mathieu Desnoyers,
	Steven Rostedt, damon, linux-kernel, linux-mm, linux-trace-kernel
In-Reply-To: <20260518234119.97569-1-sj@kernel.org>

Introduce a new tracepoint for exposing the per-region per-probe
positive sample count via tracefs.

Signed-off-by: SeongJae Park <sj@kernel.org>
---
 include/trace/events/damon.h | 38 ++++++++++++++++++++++++++++++++++++
 mm/damon/core.c              |  9 +++++++++
 2 files changed, 47 insertions(+)

diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h
index 7e25f4469b81b..78388538acf44 100644
--- a/include/trace/events/damon.h
+++ b/include/trace/events/damon.h
@@ -130,6 +130,44 @@ TRACE_EVENT(damon_monitor_intervals_tune,
 	TP_printk("sample_us=%lu", __entry->sample_us)
 );
 
+TRACE_EVENT_CONDITION(damon_region_aggregated,
+
+	TP_PROTO(unsigned int target_id, struct damon_region *r,
+		unsigned int nr_regions, unsigned int nr_probes),
+
+	TP_ARGS(target_id, r, nr_regions, nr_probes),
+
+	TP_CONDITION(nr_probes > 0),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, target_id)
+		__field(unsigned long, start)
+		__field(unsigned long, end)
+		__field(unsigned int, nr_regions)
+		__field(unsigned int, nr_accesses)
+		__field(unsigned int, age)
+		__dynamic_array(unsigned char, probe_hits, nr_probes)
+	),
+
+	TP_fast_assign(
+		__entry->target_id = target_id;
+		__entry->start = r->ar.start;
+		__entry->end = r->ar.end;
+		__entry->nr_regions = nr_regions;
+		__entry->nr_accesses = r->nr_accesses;
+		__entry->age = r->age;
+		memcpy(__get_dynamic_array(probe_hits), r->probe_hits,
+			sizeof(*r->probe_hits) * nr_probes);
+	),
+
+	TP_printk("target_id=%lu nr_regions=%u %lu-%lu: %u %u probe_hits=%s",
+			__entry->target_id, __entry->nr_regions,
+			__entry->start, __entry->end,
+			__entry->nr_accesses, __entry->age,
+			__print_hex(__get_dynamic_array(probe_hits),
+				__get_dynamic_array_len(probe_hits)))
+);
+
 TRACE_EVENT(damon_aggregated,
 
 	TP_PROTO(unsigned int target_id, struct damon_region *r,
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 433da8781e255..5ba7ad4df4351 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1908,6 +1908,13 @@ static void kdamond_reset_aggregated(struct damon_ctx *c)
 {
 	struct damon_target *t;
 	unsigned int ti = 0;	/* target's index */
+	unsigned int nr_probes = 0;
+	struct damon_probe *probe;
+
+	if (trace_damon_region_aggregated_enabled()) {
+		damon_for_each_probe(probe, c)
+			nr_probes++;
+	}
 
 	damon_for_each_target(t, c) {
 		struct damon_region *r;
@@ -1916,6 +1923,8 @@ static void kdamond_reset_aggregated(struct damon_ctx *c)
 			int i;
 
 			trace_damon_aggregated(ti, r, damon_nr_regions(t));
+			trace_damon_region_aggregated(ti, r,
+					damon_nr_regions(t), nr_probes);
 			damon_warn_fix_nr_accesses_corruption(r);
 			r->last_nr_accesses = r->nr_accesses;
 			r->nr_accesses = 0;
-- 
2.47.3
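The following is a rough userspace model of what the damon_region_aggregated event above records. It is a sketch, not kernel code. struct region, struct region_record, record_region() and format_record() are hypothetical names. The flexible array member stands in for __dynamic_array(). The sketch assumes one byte per probe counter, as the unsigned char dynamic array implies, and the hex output only approximates __print_hex().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct damon_region: only the fields the
 * tracepoint reads. */
struct region {
	unsigned long start, end;
	unsigned int nr_accesses, age;
	const unsigned char *probe_hits;	/* assumed: one byte per probe */
};

/* Mirrors TP_STRUCT__entry: fixed fields plus a variable-length tail. */
struct region_record {
	unsigned long target_id, start, end;
	unsigned int nr_regions, nr_accesses, age;
	unsigned int nr_probes;		/* like __get_dynamic_array_len() */
	unsigned char probe_hits[];	/* like __dynamic_array() */
};

/* Like TP_CONDITION(nr_probes > 0): no record at all without probes. */
static struct region_record *record_region(unsigned int target_id,
					   const struct region *r,
					   unsigned int nr_regions,
					   unsigned int nr_probes)
{
	struct region_record *rec;

	if (nr_probes == 0)
		return NULL;
	rec = malloc(sizeof(*rec) + nr_probes);
	if (!rec)
		return NULL;
	rec->target_id = target_id;
	rec->start = r->start;
	rec->end = r->end;
	rec->nr_regions = nr_regions;
	rec->nr_accesses = r->nr_accesses;
	rec->age = r->age;
	rec->nr_probes = nr_probes;
	memcpy(rec->probe_hits, r->probe_hits, nr_probes);
	return rec;
}

/* Roughly the TP_printk() layout; hits become space-separated hex bytes. */
static int format_record(const struct region_record *rec, char *buf,
			 size_t size)
{
	int n = snprintf(buf, size,
			 "target_id=%lu nr_regions=%u %lu-%lu: %u %u probe_hits=",
			 rec->target_id, rec->nr_regions, rec->start, rec->end,
			 rec->nr_accesses, rec->age);

	for (unsigned int i = 0;
	     i < rec->nr_probes && n >= 0 && (size_t)n < size; i++)
		n += snprintf(buf + n, size - n, i ? " %02x" : "%02x",
			      (unsigned int)rec->probe_hits[i]);
	return n;
}
```

Because the record is only built when the condition holds, contexts without probes pay nothing. This matches the patch's choice to count probes only when trace_damon_region_aggregated_enabled() is true.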

* Re: [PATCH 00/28] mm/damon: introduce data attributes monitoring
From: SeongJae Park @ 2026-05-18 23:53 UTC
  To: SeongJae Park
  Cc: Andrew Morton, Liam R. Howlett, David Hildenbrand,
	Jonathan Corbet, Lorenzo Stoakes, Masami Hiramatsu,
	Mathieu Desnoyers, Michal Hocko, Mike Rapoport, Shuah Khan,
	Shuah Khan, Steven Rostedt, Suren Baghdasaryan, Vlastimil Babka,
	damon, linux-doc, linux-kernel, linux-kselftest, linux-mm,
	linux-trace-kernel
In-Reply-To: <20260518234119.97569-1-sj@kernel.org>

On Mon, 18 May 2026 16:40:48 -0700 SeongJae Park <sj@kernel.org> wrote:

> TL; DR
> ======
> 
> Extend DAMON for monitoring general data attributes other than accesses.
> The short-term motivation is lightweight monitoring that is aware of page
> types (e.g., the cgroup a page belongs to).  In the long term, this will
> help extend DAMON to multiple access event capture primitives (e.g., page
> faults and PMU) and eventually pivot DAMON into a "Data Attributes
> Monitoring and Operations eNgine".
[...]
> Changes from RFC v3
> - rfc v3: https://lore.kernel.org/20260516183712.81393-1-sj@kernel.org
> - Wordsmithing documentation.
> - Drop RFC tag.
> - Rebase to mm-new.

Sashiko failed [1] to review this series because it still has an old
version of mm-new, while this series is based on the latest mm-new.  The
same issue happened with the RFC versions, so I based those on mm-stable
instead and got Sashiko reviews.  On the last version (RFC v3), I confirmed
[2] that Sashiko found no more blockers.  So I believe this is good to go
for more testing in mm-new.  I will, of course, be happy to get different
inputs.

[1] https://sashiko.dev/#/patchset/20260518234119.97569-1-sj%40kernel.org
[2] https://lore.kernel.org/20260516220317.4300-1-sj@kernel.org


Thanks,
SJ

[...]

* Re: [PATCH 00/28] mm/damon: introduce data attributes monitoring
From: Andrew Morton @ 2026-05-19  0:54 UTC
  To: SeongJae Park
  Cc: Liam R. Howlett, David Hildenbrand, Jonathan Corbet,
	Lorenzo Stoakes, Masami Hiramatsu, Mathieu Desnoyers,
	Michal Hocko, Mike Rapoport, Shuah Khan, Shuah Khan,
	Steven Rostedt, Suren Baghdasaryan, Vlastimil Babka, damon,
	linux-doc, linux-kernel, linux-kselftest, linux-mm,
	linux-trace-kernel
In-Reply-To: <20260518234119.97569-1-sj@kernel.org>

On Mon, 18 May 2026 16:40:48 -0700 SeongJae Park <sj@kernel.org> wrote:

> TL; DR
> ======
> 
> Extend DAMON for monitoring general data attributes other than accesses.
> The short-term motivation is lightweight monitoring that is aware of page
> types (e.g., the cgroup a page belongs to).  In the long term, this will
> help extend DAMON to multiple access event capture primitives (e.g., page
> faults and PMU) and eventually pivot DAMON into a "Data Attributes
> Monitoring and Operations eNgine".

Added, thanks.

> Plan for Dropping RFC tag
> =========================
> 
> Making changes for feedback from myself, humans and Sashiko should be
> the major remaining work.
> 
> I'm currently hoping to drop the RFC tag by 7.2-rc1.
> 

I removed this section.



* [PATCH] tools/bootconfig: Fix buf leaks in apply_xbc
From: lihongtao @ 2026-05-19  3:12 UTC
  To: Masami Hiramatsu; +Cc: linux-kernel, linux-trace-kernel, lihongtao

If the calloc() for data fails, free buf before returning.

Fixes: 950313ebf79c ("tools: bootconfig: Add bootconfig command")
Signed-off-by: lihongtao <lihongtao@kylinos.cn>
---
 tools/bootconfig/main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/bootconfig/main.c b/tools/bootconfig/main.c
index 643f707b8f1d..ddabde20585f 100644
--- a/tools/bootconfig/main.c
+++ b/tools/bootconfig/main.c
@@ -390,8 +390,10 @@ static int apply_xbc(const char *path, const char *xbc_path)
 
 	/* Backup the bootconfig data */
 	data = calloc(size + BOOTCONFIG_ALIGN + BOOTCONFIG_FOOTER_SIZE, 1);
-	if (!data)
+	if (!data) {
+		free(buf);
 		return -ENOMEM;
+	}
 	memcpy(data, buf, size);
 
 	/* Check the data format */
-- 
2.25.1
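The fix above frees buf inline in the error branch. Another common C shape for the same rule is a single unwind label: every early return releases whatever was already allocated. The sketch below is hypothetical. backup_config(), test_calloc() and test_free() are illustrative names, not functions in tools/bootconfig. The counting allocator exists only so that the failure path can be exercised.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fault injection: fail the N-th allocation (0-based), -1 = never. */
static int fail_alloc_nr = -1;
static int alloc_nr;
static int live_allocs;		/* allocations not yet freed */

static void *test_calloc(size_t n, size_t sz)
{
	void *p;

	if (alloc_nr++ == fail_alloc_nr)
		return NULL;
	p = calloc(n, sz);
	if (p)
		live_allocs++;
	return p;
}

static void test_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* Sketch of an apply_xbc()-like error path: every exit releases buf. */
static int backup_config(const char *src, size_t size)
{
	char *buf, *data;
	int ret = 0;

	buf = test_calloc(size, 1);
	if (!buf)
		return -ENOMEM;
	memcpy(buf, src, size);

	/* The extra byte loosely stands in for the alignment/footer space. */
	data = test_calloc(size + 1, 1);
	if (!data) {
		ret = -ENOMEM;
		goto out_buf;	/* the case the patch fixes: buf must be freed too */
	}
	memcpy(data, buf, size);
	/* ... the real tool would validate and write data here ... */
	test_free(data);
out_buf:
	test_free(buf);
	return ret;
}
```

The inline free() in the patch is perfectly fine for a single early exit. The label form mainly pays off as more allocations are added.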


* [PATCH] tools/bootconfig: Fix null pointer when free buf
From: lihongtao @ 2026-05-19  3:14 UTC
  To: Masami Hiramatsu; +Cc: linux-kernel, linux-trace-kernel, lihongtao

In show_xbc() and delete_xbc(), buf may be NULL if
load_xbc_from_initrd() fails.

Fixes: 950313ebf79c ("tools: bootconfig: Add bootconfig command")
Signed-off-by: lihongtao <lihongtao@kylinos.cn>
---
 tools/bootconfig/main.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/bootconfig/main.c b/tools/bootconfig/main.c
index ddabde20585f..417d07a46f92 100644
--- a/tools/bootconfig/main.c
+++ b/tools/bootconfig/main.c
@@ -328,7 +328,8 @@ static int show_xbc(const char *path, bool list)
 		xbc_show_compact_tree();
 	ret = 0;
 out:
-	free(buf);
+	if (buf)
+		free(buf);
 
 	return ret;
 }
@@ -360,7 +361,8 @@ static int delete_xbc(const char *path)
 	} /* Ignore if there is no boot config in initrd */
 
 	close(fd);
-	free(buf);
+	if (buf)
+		free(buf);
 
 	return ret;
 }
-- 
2.25.1


* [PATCH v4] tracing/probes: Allow use of BTF names to dereference pointers
From: Steven Rostedt @ 2026-05-19  3:23 UTC
  To: LKML, Linux Trace Kernel, bpf
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Mark Rutland, Peter Zijlstra,
	Namhyung Kim, Takaya Saeki, Douglas Raillard, Tom Zanussi,
	Andrew Morton, Thomas Gleixner, Ian Rogers, Jiri Olsa,
	"Subject:[PATCH  v2]", tracing/pr

From: Steven Rostedt <rostedt@goodmis.org>

Add syntax to the FETCHARGS parsing of probes to be able to typecast a
value to a pointer to a structure.

Currently, a dereference offset must be a number. The user has to work out
manually the offset of the structure member they want to dereference,
unless the value is a function parameter whose pointed-to structure BTF
already knows about.

However, event probes, or generic kprobes that record a register which
happens to hold a pointer to a structure, cannot dereference these values
using BTF names. They must use numerical offsets instead.

For example, to find out what device a sk_buff is pointing to in the
net_dev_xmit trace event, one must first use gdb to find the offsets of the
members of the structures:

 (gdb) p &((struct sk_buff *)0)->dev
 $1 = (struct net_device **) 0x10
 (gdb) p &((struct net_device *)0)->name
 $2 = (char (*)[16]) 0x118

And then use the raw numbers to dereference:

  # echo 'e:xmit net.net_dev_xmit +0x118(+0x10($skbaddr)):string' >> dynamic_events
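In C terms, the raw-offset command performs two steps. First it reads the pointer stored 0x10 bytes into the sk_buff. Then it takes the string 0x118 bytes into the net_device that this pointer refers to. The sketch below uses heavily trimmed, hypothetical stand-ins for both structures, so its offsets differ from a real kernel. It computes the offsets with offsetof() rather than gdb, and it ignores how a probe actually reads kernel memory.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed stand-ins; real layouts depend on the kernel build. */
struct net_device { char pad[40]; char name[16]; };
struct sk_buff { void *next, *prev; struct net_device *dev; };

/* "+NAME_OFF(+DEV_OFF($skbaddr))": load a pointer, then offset into it. */
static const char *deref_by_offsets(const void *skbaddr, size_t dev_off,
				    size_t name_off)
{
	const struct net_device *dev;

	memcpy(&dev, (const char *)skbaddr + dev_off, sizeof(dev));
	return (const char *)dev + name_off;
}

/* "(sk_buff*)$skbaddr->dev->name": the same walk, expressed by type. */
static const char *deref_by_type(const void *skbaddr)
{
	const struct sk_buff *skb = skbaddr;

	return skb->dev->name;
}
```

Both functions reach the same byte. The typecast form lets BTF supply the offsets instead of the user.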

If the kernel has BTF, $skbaddr can instead be typecast to a pointer to
sk_buff and dereferenced with the normal BTF field logic:

  # echo 'e:xmit net.net_dev_xmit (sk_buff*)$skbaddr->dev->name:string' >> dynamic_events
  # echo 1 > events/eprobes/xmit/enable
  # cat trace
[..]
    sshd-session-1022    [000] b..2.   860.249343: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.250061: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.250142: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.263553: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.283820: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.302716: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.322905: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.342828: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.362268: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.382335: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.400856: xmit: (net.net_dev_xmit) arg1="enp7s0"
    sshd-session-1022    [000] b..2.   860.419893: xmit: (net.net_dev_xmit) arg1="enp7s0"

The syntax is simply: (STRUCT*)$VAR->FIELD[->FIELD...]

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
Changes since v3: https://patch.msgid.link/20260518095832.52659a3a@gandalf.local.home

 *** COMPLETE REWRITE FROM V3 ***

- Rewrote it to use typecasting instead of simply replacing BTF names with
  offsets.

 Documentation/trace/kprobetrace.rst |   3 +
 kernel/trace/trace_probe.c          | 110 ++++++++++++++++++++++++----
 kernel/trace/trace_probe.h          |   3 +
 3 files changed, 100 insertions(+), 16 deletions(-)

diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index 3b6791c17e9b..450ac646fe4c 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -54,6 +54,9 @@ Synopsis of kprobe_events
   $retval	: Fetch return value.(\*2)
   $comm		: Fetch current task comm.
   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
+  (STRUCT*)FETCHARG->FIELD[->FIELD] : If BTF is supported, typecast FETCHARG to
+                  a pointer to STRUCT and then dereference the pointer defined by
+                  ->FIELD.
   \IMM		: Store an immediate value to the argument.
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index e0d3a0da26af..b0829eb1cb52 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -464,6 +464,26 @@ static const char *fetch_type_from_btf_type(struct btf *btf,
 	return NULL;
 }
 
+static int query_btf_struct(const char *sname, struct traceprobe_parse_context *ctx)
+{
+	int id;
+
+	if (!ctx->btf) {
+		struct btf *btf;
+		id = bpf_find_btf_id(sname, BTF_KIND_STRUCT, &btf);
+		if (id < 0)
+			return -EINVAL;
+		ctx->btf = btf;
+	} else {
+		id = btf_find_by_name_kind(ctx->btf, sname, BTF_KIND_STRUCT);
+		if (id < 0)
+			return -EINVAL;
+	}
+
+	ctx->last_struct = btf_type_by_id(ctx->btf, id);
+	return 0;
+}
+
 static int query_btf_context(struct traceprobe_parse_context *ctx)
 {
 	const struct btf_param *param;
@@ -471,12 +491,12 @@ static int query_btf_context(struct traceprobe_parse_context *ctx)
 	struct btf *btf;
 	s32 nr;
 
-	if (ctx->btf)
-		return 0;
-
 	if (!ctx->funcname)
 		return -EINVAL;
 
+	if (ctx->btf)
+		return 0;
+
 	type = btf_find_func_proto(ctx->funcname, &btf);
 	if (!type)
 		return -ENOENT;
@@ -514,6 +534,7 @@ static void clear_btf_context(struct traceprobe_parse_context *ctx)
 		ctx->proto = NULL;
 		ctx->params = NULL;
 		ctx->nr_params = 0;
+		ctx->last_struct = NULL;
 	}
 }
 
@@ -554,22 +575,28 @@ static int parse_btf_field(char *fieldname, const struct btf_type *type,
 	struct fetch_insn *code = *pcode;
 	const struct btf_member *field;
 	u32 bitoffs, anon_offs;
+	bool is_struct = ctx->flags & TPARG_FL_STRUCT;
 	char *next;
 	int is_ptr;
 	s32 tid;
 
 	do {
-		/* Outer loop for solving arrow operator ('->') */
-		if (BTF_INFO_KIND(type->info) != BTF_KIND_PTR) {
-			trace_probe_log_err(ctx->offset, NO_PTR_STRCT);
-			return -EINVAL;
-		}
-		/* Convert a struct pointer type to a struct type */
-		type = btf_type_skip_modifiers(ctx->btf, type->type, &tid);
-		if (!type) {
-			trace_probe_log_err(ctx->offset, BAD_BTF_TID);
-			return -EINVAL;
+		if (!is_struct) {
+			/* Outer loop for solving arrow operator ('->') */
+			if (BTF_INFO_KIND(type->info) != BTF_KIND_PTR) {
+				trace_probe_log_err(ctx->offset, NO_PTR_STRCT);
+				return -EINVAL;
+			}
+
+			/* Convert a struct pointer type to a struct type */
+			type = btf_type_skip_modifiers(ctx->btf, type->type, &tid);
+			if (!type) {
+				trace_probe_log_err(ctx->offset, BAD_BTF_TID);
+				return -EINVAL;
+			}
 		}
+		/* Only the first type can skip being a pointer */
+		is_struct = false;
 
 		bitoffs = 0;
 		do {
@@ -635,12 +662,12 @@ static int parse_btf_arg(char *varname,
 {
 	struct fetch_insn *code = *pcode;
 	const struct btf_param *params;
-	const struct btf_type *type;
+	const struct btf_type *type = NULL;
 	char *field = NULL;
 	int i, is_ptr, ret;
 	u32 tid;
 
-	if (WARN_ON_ONCE(!ctx->funcname))
+	if (WARN_ON_ONCE(!ctx->funcname && !(ctx->flags & TPARG_FL_STRUCT)))
 		return -EINVAL;
 
 	is_ptr = split_next_field(varname, &field, ctx);
@@ -704,11 +731,18 @@ static int parse_btf_arg(char *varname,
 			goto found;
 		}
 	}
+
+	if (ctx->flags & TPARG_FL_STRUCT) {
+		type = ctx->last_struct;
+		goto found;
+	}
+
 	trace_probe_log_err(ctx->offset, NO_BTFARG);
 	return -ENOENT;
 
 found:
-	type = btf_type_skip_modifiers(ctx->btf, tid, &tid);
+	if (!type)
+		type = btf_type_skip_modifiers(ctx->btf, tid, &tid);
 	if (!type) {
 		trace_probe_log_err(ctx->offset, BAD_BTF_TID);
 		return -EINVAL;
@@ -952,6 +986,12 @@ static int parse_probe_vars(char *orig_arg, const struct fetch_type *t,
 	int ret = 0;
 	int len;
 
+	if (ctx->flags & TPARG_FL_STRUCT) {
+		ret = parse_btf_arg(orig_arg, pcode, end, ctx);
+		if (ret < 0)
+			return ret;
+	}
+
 	if (ctx->flags & TPARG_FL_TEVENT) {
 		if (code->data)
 			return -EFAULT;
@@ -1231,6 +1271,43 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 				code->op = FETCH_OP_IMM;
 		}
 		break;
+	case '(':
+		tmp = strrchr(arg, ')');
+		if (!tmp) {
+			trace_probe_log_err(ctx->offset + strlen(arg),
+					    DEREF_OPEN_BRACE);
+			return -EINVAL;
+		}
+
+		tmp--;
+		if (*tmp != '*') {
+			trace_probe_log_err(ctx->offset + (tmp - arg),
+					    NO_PTR_STRCT);
+			return -EINVAL;
+		}
+		*tmp = '\0';
+		ret = query_btf_struct(arg + 1, ctx);
+		*tmp = '*';
+
+		if (ret < 0) {
+			trace_probe_log_err(ctx->offset + 1, NO_PTR_STRCT);
+			return -EINVAL;
+		}
+
+		ctx->flags |= TPARG_FL_STRUCT;
+		tmp += 2;
+
+		if (*tmp != '$') {
+			trace_probe_log_err(ctx->offset + (tmp - arg),
+					    BAD_VAR);
+			return -EINVAL;
+		}
+
+		ctx->offset += tmp - arg;
+		ret = parse_probe_vars(tmp, type, pcode, end, ctx);
+		ctx->flags &= ~TPARG_FL_STRUCT;
+		ctx->last_struct = NULL;
+		break;
 	default:
 		if (isalpha(arg[0]) || arg[0] == '_') {	/* BTF variable */
 			if (!tparg_is_function_entry(ctx->flags) &&
@@ -1504,6 +1581,7 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
 	code[FETCH_INSN_MAX - 1].op = FETCH_OP_END;
 
 	ctx->last_type = NULL;
+	ctx->last_struct = NULL;
 	ret = parse_probe_arg(arg, parg->type, &code, &code[FETCH_INSN_MAX - 1],
 			      ctx);
 	if (ret < 0)
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 262d8707a3df..88ab9f6da591 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -394,6 +394,7 @@ static inline int traceprobe_get_entry_data_size(struct trace_probe *tp)
  * TPARG_FL_KERNEL and TPARG_FL_USER are also mutually exclusive.
  * TPARG_FL_FPROBE and TPARG_FL_TPOINT are optional but it should be with
  * TPARG_FL_KERNEL.
+ * TPARG_FL_STRUCT is set if an argument was typecast to a structure.
  */
 #define TPARG_FL_RETURN BIT(0)
 #define TPARG_FL_KERNEL BIT(1)
@@ -402,6 +403,7 @@ static inline int traceprobe_get_entry_data_size(struct trace_probe *tp)
 #define TPARG_FL_USER   BIT(4)
 #define TPARG_FL_FPROBE BIT(5)
 #define TPARG_FL_TPOINT BIT(6)
+#define TPARG_FL_STRUCT BIT(7)
 #define TPARG_FL_LOC_MASK	GENMASK(4, 0)
 
 static inline bool tparg_is_function_entry(unsigned int flags)
@@ -423,6 +425,7 @@ struct traceprobe_parse_context {
 	s32 nr_params;			/* The number of the parameters */
 	struct btf *btf;		/* The BTF to be used */
 	const struct btf_type *last_type;	/* Saved type */
+	const struct btf_type *last_struct;	/* Saved structure */
 	u32 last_bitoffs;		/* Saved bitoffs */
 	u32 last_bitsize;		/* Saved bitsize */
 	struct trace_probe *tp;
-- 
2.53.0
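The string handling in the new '(' case of parse_probe_arg() can be sketched in userspace as follows. This is a simplified, hypothetical model: struct cast_arg and parse_cast_arg() are not kernel names. It only splits the argument into the STRUCT name, the $-variable and the remaining field chain, applying the same checks as the patch: find the last ')', require '*' right before it and '$' right after it. The real code edits the string in place, resolves STRUCT through BTF and hands the rest to parse_probe_vars().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result of splitting a typecast fetch argument. */
struct cast_arg {
	char type[64];		/* STRUCT, e.g. "sk_buff" */
	char var[64];		/* e.g. "$skbaddr" */
	char fields[128];	/* e.g. "dev->name" */
};

/* Split "(STRUCT*)$VAR->FIELD[->FIELD...]". Returns 0, or -1 on bad syntax. */
static int parse_cast_arg(const char *arg, struct cast_arg *out)
{
	const char *close, *var, *arrow;
	size_t len;

	if (arg[0] != '(')
		return -1;
	close = strrchr(arg, ')');	/* like the patch: the last ')' */
	if (!close || close[-1] != '*')	/* "(STRUCT)" without '*' is rejected */
		return -1;
	len = (size_t)((close - 1) - (arg + 1));
	if (len == 0 || len >= sizeof(out->type))
		return -1;
	memcpy(out->type, arg + 1, len);
	out->type[len] = '\0';

	var = close + 1;
	if (*var != '$')		/* like the patch's BAD_VAR check */
		return -1;
	arrow = strstr(var, "->");
	len = arrow ? (size_t)(arrow - var) : strlen(var);
	if (len >= sizeof(out->var))
		return -1;
	memcpy(out->var, var, len);
	out->var[len] = '\0';

	snprintf(out->fields, sizeof(out->fields), "%s", arrow ? arrow + 2 : "");
	return 0;
}
```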


* Re: [PATCH v3 06/11] drm: Use trace_call__##name() at guarded tracepoint call sites
From: Philipp Stanner @ 2026-05-19  7:23 UTC
  To: Vineeth Remanan Pillai, phasta
  Cc: Alex Deucher, Christian König, David Airlie, Simona Vetter,
	Harry Wentland, Leo Li, Matthew Brost, Danilo Krummrich,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, amd-gfx,
	dri-devel, Steven Rostedt, linux-trace-kernel, Peter Zijlstra
In-Reply-To: <CAO7JXPhMBd0xgDRO-gZ2HpSTnrj1OD67c39jrXWEKaowNc9GEA@mail.gmail.com>

On Mon, 2026-05-18 at 19:20 -0400, Vineeth Remanan Pillai wrote:
> On Mon, May 18, 2026 at 11:01 AM Philipp Stanner <phasta@mailbox.org> wrote:
> > 
> > On Fri, 2026-05-15 at 09:59 -0400, Vineeth Pillai (Google) wrote:
> > > From: Vineeth Pillai <vineeth@bitbyteword.org>
> > > 
> > > Replace trace_foo() with the new trace_call__foo() at sites already
> > > guarded by trace_foo_enabled(), avoiding a redundant
> > > static_branch_unlikely() re-evaluation inside the tracepoint.
> > > trace_call__foo() calls the tracepoint callbacks directly without
> > > utilizing the static branch again.
> > 
> > The "foo" terminology is unusual I think? I always wrote it with regex,
> > like "trace_*()".
> > 
> Sorry about the terminology. Some of the patches were already merged
> using it, so is it okay to keep it for consistency?

Sure, no big deal.

> 
> > 
> > 
> > > 
> > > Original v2 series:
> > > https://lore.kernel.org/linux-trace-kernel/20260323160052.17528-1-vineeth@bitbyteword.org/
> > 
> > I'd put this in a Link: tag section below.
> > 
> Makes sense, will do. Steve also suggested putting this whole section
> after "---" because it isn't relevant to the changes. Will fix this in
> the next iteration.

I agree with Steve. We have had some tendency in the past to have all
sorts of versioning information in the git log, which IMO is not useful
to anyone a few months after merging. The commit message should detail
the why, how and what, not the history.

> 
> > > 
> > > Parts of the original v2 series have already been merged in mainline.
> > > This patch is being reposted as a follow-up cleanup for the remaining
> > > unmerged pieces.
> > 
> > So this v3 series as a whole is a followup to that v2?
> > 
> v3 is a follow-up for the remaining patches that were not merged in the
> previous cycle. The core API and a couple of patches went in during the
> previous cycle, so this is for the rest of it.
> 
> The intention was to send this v3 as a direct patch to individual
> subsystem maintainers, but I forgot to remove the numbering, hence the
> possible confusion. I will remove the numbering and send it as a
> standalone patch in the next iteration.

OK, cool.

Thanks,
Philipp


> 
> > > 
> > > Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> > > Suggested-by: Peter Zijlstra <peterz@infradead.org>
> > > Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
> > > Assisted-by: Claude:claude-sonnet-4-6
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c            |  2 +-
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c            |  4 ++--
> > >  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 +++++-----
> > >  drivers/gpu/drm/scheduler/sched_entity.c          |  5 +++--
> > >  4 files changed, 11 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > index b24d5d21be5f..cb0b5cb07d57 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > @@ -1004,7 +1004,7 @@ static void trace_amdgpu_cs_ibs(struct amdgpu_cs_parser *p)
> > >               struct amdgpu_job *job = p->jobs[i];
> > > 
> > >               for (j = 0; j < job->num_ibs; ++j)
> > > -                     trace_amdgpu_cs(p, job, &job->ibs[j]);
> > > +                     trace_call__amdgpu_cs(p, job, &job->ibs[j]);
> > >       }
> > >  }
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > > index 9ba9de16a27a..a36ae94c425f 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> > > @@ -1415,7 +1415,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
> > > 
> > >       if (trace_amdgpu_vm_bo_mapping_enabled()) {
> > >               list_for_each_entry(mapping, &bo_va->valids, list)
> > > -                     trace_amdgpu_vm_bo_mapping(mapping);
> > > +                     trace_call__amdgpu_vm_bo_mapping(mapping);
> > >       }
> > > 
> > >  error_free:
> > > @@ -2183,7 +2183,7 @@ void amdgpu_vm_bo_trace_cs(struct amdgpu_vm *vm, struct ww_acquire_ctx *ticket)
> > >                               continue;
> > >               }
> > > 
> > > -             trace_amdgpu_vm_bo_cs(mapping);
> > > +             trace_call__amdgpu_vm_bo_cs(mapping);
> > >       }
> > >  }
> > > 
> > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > > index 5fc5d5608506..fbdc12cdd6bb 100644
> > > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > > @@ -5263,11 +5263,11 @@ static void amdgpu_dm_backlight_set_level(struct amdgpu_display_manager *dm,
> > >       }
> > > 
> > >       if (trace_amdgpu_dm_brightness_enabled()) {
> > > -             trace_amdgpu_dm_brightness(__builtin_return_address(0),
> > > -                                        user_brightness,
> > > -                                        brightness,
> > > -                                        caps->aux_support,
> > > -                                        power_supply_is_system_supplied() > 0);
> > > +             trace_call__amdgpu_dm_brightness(__builtin_return_address(0),
> > > +                                              user_brightness,
> > > +                                              brightness,
> > > +                                              caps->aux_support,
> > > +                                              power_supply_is_system_supplied() > 0);
> > >       }
> > > 
> > >       if (caps->aux_support) {
> > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > > index fe174a4857be..185a2636b599 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > > @@ -429,7 +429,8 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity,
> > > 
> > >       if (trace_drm_sched_job_unschedulable_enabled() &&
> > >           !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &entity->dependency->flags))
> > > -             trace_drm_sched_job_unschedulable(sched_job, entity->dependency);
> > > +             trace_call__drm_sched_job_unschedulable(sched_job,
> > > +                                                     entity->dependency);
> > 
> > I would be happier if you sacrificed a bit of space here and kept it
> > a single line, since the if condition is already quite convoluted and
> > challenging to read.
> > 
> I understand, will fix it in next iteration.
> 
> Thanks,
> Vineeth
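For readers new to this series, the guarded-call pattern described in the commit message can be modelled in userspace. Everything here is hypothetical. A plain bool stands in for the static key, and a counter records how often the key is tested. Real tracepoints use static_branch_unlikely(), which patches code instead of testing a variable. A single function pointer stands in for the registered probes. The model shows that a caller that has already checked trace_foo_enabled() can call trace_call__foo() for each item without re-testing the key.

```c
#include <stdbool.h>

/* Hypothetical model of one tracepoint "foo". */
static bool foo_key_enabled;		/* stands in for the static key */
static void (*foo_probe)(int value);	/* stands in for registered probes */
static int key_checks;			/* times the key was tested */

static bool trace_foo_enabled(void)
{
	key_checks++;
	return foo_key_enabled;
}

/* Ordinary call site: tests the key itself. */
static void trace_foo(int value)
{
	if (trace_foo_enabled() && foo_probe)
		foo_probe(value);
}

/* Direct call: no key test; only valid after the caller has checked it. */
static void trace_call__foo(int value)
{
	if (foo_probe)
		foo_probe(value);
}

static int sum;
static void sum_probe(int v) { sum += v; }

/* Guarded loop, as in amdgpu_vm_bo_update(): one key test for all items. */
static void emit_all(const int *vals, int n)
{
	if (trace_foo_enabled()) {
		for (int i = 0; i < n; i++)
			trace_call__foo(vals[i]);
	}
}
```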


* Re: [PATCH v2 08/14] verification/rvgen: Add golden and spec folders for tests
From: Gabriele Monaco @ 2026-05-19  7:29 UTC
  To: Nam Cao
  Cc: Thomas Weissschuh, Tomas Glozar, John Kacur, Wen Yang,
	linux-kernel, linux-trace-kernel, Steven Rostedt
In-Reply-To: <87pl2t6qo8.fsf@yellow.woof>

On Mon, 2026-05-18 at 10:57 +0200, Nam Cao wrote:
> Gabriele Monaco <gmonaco@redhat.com> writes:
> > Create reference model specifications and generated files in the golden
> > folder. These can be used as references to validate that rvgen still
> > generates files as expected in automated tests.
> > 
> > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> 
> Didn't look at the "golden" files, I presume those are generated.
> 
> Reviewed-by: Nam Cao <namcao@linutronix.de>

Thanks for the review!

Yes, the golden files are generated. I checked them and had a few AIs run
through them, which is how I spotted the True/true issue.
They aren't guaranteed to be spotless, but again, some test is better than
no test.

Thanks,
Gabriele


* [PATCH 0/3] rv: rtapp monitor update
From: Nam Cao @ 2026-05-19  7:49 UTC
  To: Gabriele Monaco, Steven Rostedt, linux-kernel, linux-trace-kernel; +Cc: Nam Cao

Hi,

A couple of minor improvements to the rtapp monitor: make the monitor more
informative to the user, and update the allow list regarding the
clock_nanosleep syscall.

Nam Cao (3):
  rv/rtapp/sleep: Make the error more informative for user
  rv/rtapp/sleep: Update nanosleep rule
  rv/rtapp: Add wakeup monitor

 kernel/trace/rv/Kconfig                       |   1 +
 kernel/trace/rv/Makefile                      |   1 +
 kernel/trace/rv/monitors/sleep/sleep.c        |  18 +-
 kernel/trace/rv/monitors/sleep/sleep.h        |  52 +++---
 kernel/trace/rv/monitors/wakeup/Kconfig       |  17 ++
 kernel/trace/rv/monitors/wakeup/wakeup.c      | 155 ++++++++++++++++++
 kernel/trace/rv/monitors/wakeup/wakeup.h      |  92 +++++++++++
 .../trace/rv/monitors/wakeup/wakeup_trace.h   |  14 ++
 kernel/trace/rv/rv_trace.h                    |   1 +
 tools/verification/models/rtapp/sleep.ltl     |   2 +-
 tools/verification/models/rtapp/wakeup.ltl    |   5 +
 11 files changed, 318 insertions(+), 40 deletions(-)
 create mode 100644 kernel/trace/rv/monitors/wakeup/Kconfig
 create mode 100644 kernel/trace/rv/monitors/wakeup/wakeup.c
 create mode 100644 kernel/trace/rv/monitors/wakeup/wakeup.h
 create mode 100644 kernel/trace/rv/monitors/wakeup/wakeup_trace.h
 create mode 100644 tools/verification/models/rtapp/wakeup.ltl

-- 
2.47.3


* [PATCH 1/3] rv/rtapp/sleep: Make the error more informative for user
From: Nam Cao @ 2026-05-19  7:49 UTC
  To: Gabriele Monaco, Steven Rostedt, linux-kernel, linux-trace-kernel; +Cc: Nam Cao
In-Reply-To: <cover.1779176466.git.namcao@linutronix.de>

The rtapp/sleep monitor detects real-time tasks that go to sleep in a
real-time-unsafe manner. When this happens, the monitor triggers a trace
event from the sched_wakeup tracepoint's handler.

However, the invoking context of that trace event is not very informative:
the event's stack trace is the wakeup code path, which is not very
helpful:

74.669317: rv:error_sleep: condvar[254]: violation detected
    ltl_validate+0x345 ([kernel.kallsyms])
    handle_sched_wakeup+0x34 ([kernel.kallsyms])
    ttwu_do_activate+0xff ([kernel.kallsyms])
    sched_ttwu_pending+0x104 ([kernel.kallsyms])
    __flush_smp_call_function_queue+0x15b ([kernel.kallsyms])
    __sysvec_call_function_single+0x18 ([kernel.kallsyms])
    sysvec_call_function_single+0x66 ([kernel.kallsyms])
    asm_sysvec_call_function_single+0x1a ([kernel.kallsyms])
    pv_native_safe_halt+0xf ([kernel.kallsyms])
    default_idle+0x9 ([kernel.kallsyms])
    default_idle_call+0x33 ([kernel.kallsyms])
    do_idle+0x234 ([kernel.kallsyms])
    cpu_startup_entry+0x24 ([kernel.kallsyms])
    start_secondary+0xf8 ([kernel.kallsyms])
    common_startup_64+0x13e ([kernel.kallsyms])

What would be much more valuable is the stack trace of the task itself.

Move the update of the WAKE atom from the sched_wakeup tracepoint's handler
to the sched_exit tracepoint's handler. The event then fires in the task's
own context, which makes the stack trace far more informative for the user:

rv:error_sleep: condvar[254]: violation detected
    ltl_validate+0x345 ([kernel.kallsyms])
    handle_sched_exit+0x39 ([kernel.kallsyms])
    __schedule+0x80f ([kernel.kallsyms])
    schedule+0x22 ([kernel.kallsyms])
    futex_do_wait+0x33 ([kernel.kallsyms])
    __futex_wait+0x8c ([kernel.kallsyms])
    futex_wait+0x73 ([kernel.kallsyms])
    do_futex+0xc6 ([kernel.kallsyms])
    __x64_sys_futex+0x121 ([kernel.kallsyms])
    do_syscall_64+0xf3 ([kernel.kallsyms])
    entry_SYSCALL_64_after_hwframe+0x77 ([kernel.kallsyms])
    __futex_abstimed_wait_common64+0xc6 (inlined)
    __futex_abstimed_wait_common+0xc6 (/usr/lib/x86_64-linux-gnu/libc.so.6)

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 kernel/trace/rv/monitors/sleep/sleep.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/rv/monitors/sleep/sleep.c b/kernel/trace/rv/monitors/sleep/sleep.c
index 8dfe5ec13e19..0a36f5519e6b 100644
--- a/kernel/trace/rv/monitors/sleep/sleep.c
+++ b/kernel/trace/rv/monitors/sleep/sleep.c
@@ -92,9 +92,9 @@ static void handle_sched_set_state(void *data, struct task_struct *task, int sta
 		ltl_atom_pulse(task, LTL_ABORT_SLEEP, true);
 }
 
-static void handle_sched_wakeup(void *data, struct task_struct *task)
+static void handle_sched_exit(void *data, bool is_switch)
 {
-	ltl_atom_pulse(task, LTL_WAKE, true);
+	ltl_atom_pulse(current, LTL_WAKE, true);
 }
 
 static void handle_sched_waking(void *data, struct task_struct *task)
@@ -200,7 +200,7 @@ static int enable_sleep(void)
 		return retval;
 
 	rv_attach_trace_probe("rtapp_sleep", sched_waking, handle_sched_waking);
-	rv_attach_trace_probe("rtapp_sleep", sched_wakeup, handle_sched_wakeup);
+	rv_attach_trace_probe("rtapp_sleep", sched_exit_tp, handle_sched_exit);
 	rv_attach_trace_probe("rtapp_sleep", sched_set_state_tp, handle_sched_set_state);
 	rv_attach_trace_probe("rtapp_sleep", contention_begin, handle_contention_begin);
 	rv_attach_trace_probe("rtapp_sleep", contention_end, handle_contention_end);
@@ -213,7 +213,7 @@ static int enable_sleep(void)
 static void disable_sleep(void)
 {
 	rv_detach_trace_probe("rtapp_sleep", sched_waking, handle_sched_waking);
-	rv_detach_trace_probe("rtapp_sleep", sched_wakeup, handle_sched_wakeup);
+	rv_detach_trace_probe("rtapp_sleep", sched_exit_tp, handle_sched_exit);
 	rv_detach_trace_probe("rtapp_sleep", sched_set_state_tp, handle_sched_set_state);
 	rv_detach_trace_probe("rtapp_sleep", contention_begin, handle_contention_begin);
 	rv_detach_trace_probe("rtapp_sleep", contention_end, handle_contention_end);
-- 
2.47.3



* [PATCH 2/3] rv/rtapp/sleep: Update nanosleep rule
From: Nam Cao @ 2026-05-19  7:49 UTC
  To: Gabriele Monaco, Steven Rostedt, linux-kernel, linux-trace-kernel; +Cc: Nam Cao
In-Reply-To: <cover.1779176466.git.namcao@linutronix.de>

CLOCK_REALTIME is the only clock that is commonly misused in real-time
applications. The other clocks are either safe for real-time use
(CLOCK_TAI, CLOCK_MONOTONIC, CLOCK_BOOTTIME) or unlikely to be misused
(CLOCK_AUX, CLOCK_PROCESS_CPUTIME_ID).

The rtapp monitor's purpose is to warn about common mistakes in
real-time design. However, warning about every clock other than
CLOCK_MONOTONIC and CLOCK_TAI generates too many false positives.

Update the monitor to only warn about CLOCK_REALTIME.
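A minimal userspace sketch of what the updated rule treats as RT-friendly, assuming a POSIX system. `rt_friendly_nanosleep()`, `timespec_add_ns()` and `sleep_until_next_period()` are hypothetical helpers, not monitor code: the predicate mirrors RT_FRIENDLY_NANOSLEEP for a clock_nanosleep() call (absolute deadline on any clock except CLOCK_REALTIME), and the sleep helper shows the usual periodic pattern on CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical userspace analogue of RT_FRIENDLY_NANOSLEEP after this patch:
 * an absolute-deadline sleep on any clock other than CLOCK_REALTIME.
 * Like the monitor, it compares flags for equality with TIMER_ABSTIME.
 */
static bool rt_friendly_nanosleep(clockid_t clk, int flags)
{
	return flags == TIMER_ABSTIME && clk != CLOCK_REALTIME;
}

/* Advance an absolute timespec by ns nanoseconds, keeping tv_nsec normalized. */
static void timespec_add_ns(struct timespec *ts, long ns)
{
	ts->tv_nsec += ns;
	while (ts->tv_nsec >= NSEC_PER_SEC) {
		ts->tv_nsec -= NSEC_PER_SEC;
		ts->tv_sec++;
	}
}

/*
 * Periodic-task pattern: sleep until an absolute deadline on CLOCK_MONOTONIC.
 * clock_nanosleep() returns 0 or a positive error number (it does not set errno).
 */
static int sleep_until_next_period(struct timespec *next, long period_ns)
{
	timespec_add_ns(next, period_ns);
	return clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, next, NULL);
}
```

With this change, an absolute clock_nanosleep() on CLOCK_BOOTTIME is also accepted, which the previous MONOTONIC-or-TAI rule would have flagged.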

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 kernel/trace/rv/monitors/sleep/sleep.c    | 10 ++---
 kernel/trace/rv/monitors/sleep/sleep.h    | 52 +++++++++++------------
 tools/verification/models/rtapp/sleep.ltl |  2 +-
 3 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/kernel/trace/rv/monitors/sleep/sleep.c b/kernel/trace/rv/monitors/sleep/sleep.c
index 0a36f5519e6b..e01ac56b3f4a 100644
--- a/kernel/trace/rv/monitors/sleep/sleep.c
+++ b/kernel/trace/rv/monitors/sleep/sleep.c
@@ -43,9 +43,7 @@ static void ltl_atoms_init(struct task_struct *task, struct ltl_monitor *mon, bo
 	ltl_atom_set(mon, LTL_WOKEN_BY_EQUAL_OR_HIGHER_PRIO, false);
 
 	if (task_creation) {
-		ltl_atom_set(mon, LTL_KTHREAD_SHOULD_STOP, false);
-		ltl_atom_set(mon, LTL_NANOSLEEP_CLOCK_MONOTONIC, false);
-		ltl_atom_set(mon, LTL_NANOSLEEP_CLOCK_TAI, false);
+		ltl_atom_set(mon, LTL_NANOSLEEP_CLOCK_REALTIME, false);
 		ltl_atom_set(mon, LTL_NANOSLEEP_TIMER_ABSTIME, false);
 		ltl_atom_set(mon, LTL_CLOCK_NANOSLEEP, false);
 		ltl_atom_set(mon, LTL_FUTEX_WAIT, false);
@@ -136,8 +134,7 @@ static void handle_sys_enter(void *data, struct pt_regs *regs, long id)
 	case __NR_clock_nanosleep_time64:
 #endif
 		syscall_get_arguments(current, regs, args);
-		ltl_atom_set(mon, LTL_NANOSLEEP_CLOCK_MONOTONIC, args[0] == CLOCK_MONOTONIC);
-		ltl_atom_set(mon, LTL_NANOSLEEP_CLOCK_TAI, args[0] == CLOCK_TAI);
+		ltl_atom_set(mon, LTL_NANOSLEEP_CLOCK_REALTIME, args[0] == CLOCK_REALTIME);
 		ltl_atom_set(mon, LTL_NANOSLEEP_TIMER_ABSTIME, args[1] == TIMER_ABSTIME);
 		ltl_atom_update(current, LTL_CLOCK_NANOSLEEP, true);
 		break;
@@ -178,8 +175,7 @@ static void handle_sys_exit(void *data, struct pt_regs *regs, long ret)
 
 	ltl_atom_set(mon, LTL_FUTEX_LOCK_PI, false);
 	ltl_atom_set(mon, LTL_FUTEX_WAIT, false);
-	ltl_atom_set(mon, LTL_NANOSLEEP_CLOCK_MONOTONIC, false);
-	ltl_atom_set(mon, LTL_NANOSLEEP_CLOCK_TAI, false);
+	ltl_atom_set(mon, LTL_NANOSLEEP_CLOCK_REALTIME, false);
 	ltl_atom_set(mon, LTL_NANOSLEEP_TIMER_ABSTIME, false);
 	ltl_atom_set(mon, LTL_EPOLL_WAIT, false);
 	ltl_atom_update(current, LTL_CLOCK_NANOSLEEP, false);
diff --git a/kernel/trace/rv/monitors/sleep/sleep.h b/kernel/trace/rv/monitors/sleep/sleep.h
index 95dc2727c059..ed1ac7ad008e 100644
--- a/kernel/trace/rv/monitors/sleep/sleep.h
+++ b/kernel/trace/rv/monitors/sleep/sleep.h
@@ -20,8 +20,7 @@ enum ltl_atom {
 	LTL_FUTEX_WAIT,
 	LTL_KERNEL_THREAD,
 	LTL_KTHREAD_SHOULD_STOP,
-	LTL_NANOSLEEP_CLOCK_MONOTONIC,
-	LTL_NANOSLEEP_CLOCK_TAI,
+	LTL_NANOSLEEP_CLOCK_REALTIME,
 	LTL_NANOSLEEP_TIMER_ABSTIME,
 	LTL_RT,
 	LTL_SLEEP,
@@ -46,8 +45,7 @@ static const char *ltl_atom_str(enum ltl_atom atom)
 		"fu_wa",
 		"ker_th",
 		"kth_sh_st",
-		"na_cl_mo",
-		"na_cl_ta",
+		"na_cl_re",
 		"na_ti_ab",
 		"rt",
 		"sl",
@@ -87,8 +85,7 @@ static void ltl_start(struct task_struct *task, struct ltl_monitor *mon)
 	bool sleep = test_bit(LTL_SLEEP, mon->atoms);
 	bool rt = test_bit(LTL_RT, mon->atoms);
 	bool nanosleep_timer_abstime = test_bit(LTL_NANOSLEEP_TIMER_ABSTIME, mon->atoms);
-	bool nanosleep_clock_tai = test_bit(LTL_NANOSLEEP_CLOCK_TAI, mon->atoms);
-	bool nanosleep_clock_monotonic = test_bit(LTL_NANOSLEEP_CLOCK_MONOTONIC, mon->atoms);
+	bool nanosleep_clock_realtime = test_bit(LTL_NANOSLEEP_CLOCK_REALTIME, mon->atoms);
 	bool kthread_should_stop = test_bit(LTL_KTHREAD_SHOULD_STOP, mon->atoms);
 	bool kernel_thread = test_bit(LTL_KERNEL_THREAD, mon->atoms);
 	bool futex_wait = test_bit(LTL_FUTEX_WAIT, mon->atoms);
@@ -97,17 +94,17 @@ static void ltl_start(struct task_struct *task, struct ltl_monitor *mon)
 	bool clock_nanosleep = test_bit(LTL_CLOCK_NANOSLEEP, mon->atoms);
 	bool block_on_rt_mutex = test_bit(LTL_BLOCK_ON_RT_MUTEX, mon->atoms);
 	bool abort_sleep = test_bit(LTL_ABORT_SLEEP, mon->atoms);
-	bool val42 = task_is_rcu || task_is_migration;
-	bool val43 = futex_lock_pi || val42;
-	bool val5 = block_on_rt_mutex || val43;
-	bool val34 = abort_sleep || kthread_should_stop;
-	bool val35 = woken_by_nmi || val34;
-	bool val36 = woken_by_hardirq || val35;
-	bool val14 = woken_by_equal_or_higher_prio || val36;
+	bool val41 = task_is_rcu || task_is_migration;
+	bool val42 = futex_lock_pi || val41;
+	bool val5 = block_on_rt_mutex || val42;
+	bool val33 = abort_sleep || kthread_should_stop;
+	bool val34 = woken_by_nmi || val33;
+	bool val35 = woken_by_hardirq || val34;
+	bool val14 = woken_by_equal_or_higher_prio || val35;
 	bool val13 = !wake;
-	bool val26 = nanosleep_clock_monotonic || nanosleep_clock_tai;
-	bool val27 = nanosleep_timer_abstime && val26;
-	bool val18 = clock_nanosleep && val27;
+	bool val25 = !nanosleep_clock_realtime;
+	bool val26 = nanosleep_timer_abstime && val25;
+	bool val18 = clock_nanosleep && val26;
 	bool val20 = val18 || epoll_wait;
 	bool val9 = futex_wait || val20;
 	bool val11 = val9 || kernel_thread;
@@ -138,8 +135,7 @@ ltl_possible_next_states(struct ltl_monitor *mon, unsigned int state, unsigned l
 	bool sleep = test_bit(LTL_SLEEP, mon->atoms);
 	bool rt = test_bit(LTL_RT, mon->atoms);
 	bool nanosleep_timer_abstime = test_bit(LTL_NANOSLEEP_TIMER_ABSTIME, mon->atoms);
-	bool nanosleep_clock_tai = test_bit(LTL_NANOSLEEP_CLOCK_TAI, mon->atoms);
-	bool nanosleep_clock_monotonic = test_bit(LTL_NANOSLEEP_CLOCK_MONOTONIC, mon->atoms);
+	bool nanosleep_clock_realtime = test_bit(LTL_NANOSLEEP_CLOCK_REALTIME, mon->atoms);
 	bool kthread_should_stop = test_bit(LTL_KTHREAD_SHOULD_STOP, mon->atoms);
 	bool kernel_thread = test_bit(LTL_KERNEL_THREAD, mon->atoms);
 	bool futex_wait = test_bit(LTL_FUTEX_WAIT, mon->atoms);
@@ -148,17 +144,17 @@ ltl_possible_next_states(struct ltl_monitor *mon, unsigned int state, unsigned l
 	bool clock_nanosleep = test_bit(LTL_CLOCK_NANOSLEEP, mon->atoms);
 	bool block_on_rt_mutex = test_bit(LTL_BLOCK_ON_RT_MUTEX, mon->atoms);
 	bool abort_sleep = test_bit(LTL_ABORT_SLEEP, mon->atoms);
-	bool val42 = task_is_rcu || task_is_migration;
-	bool val43 = futex_lock_pi || val42;
-	bool val5 = block_on_rt_mutex || val43;
-	bool val34 = abort_sleep || kthread_should_stop;
-	bool val35 = woken_by_nmi || val34;
-	bool val36 = woken_by_hardirq || val35;
-	bool val14 = woken_by_equal_or_higher_prio || val36;
+	bool val41 = task_is_rcu || task_is_migration;
+	bool val42 = futex_lock_pi || val41;
+	bool val5 = block_on_rt_mutex || val42;
+	bool val33 = abort_sleep || kthread_should_stop;
+	bool val34 = woken_by_nmi || val33;
+	bool val35 = woken_by_hardirq || val34;
+	bool val14 = woken_by_equal_or_higher_prio || val35;
 	bool val13 = !wake;
-	bool val26 = nanosleep_clock_monotonic || nanosleep_clock_tai;
-	bool val27 = nanosleep_timer_abstime && val26;
-	bool val18 = clock_nanosleep && val27;
+	bool val25 = !nanosleep_clock_realtime;
+	bool val26 = nanosleep_timer_abstime && val25;
+	bool val18 = clock_nanosleep && val26;
 	bool val20 = val18 || epoll_wait;
 	bool val9 = futex_wait || val20;
 	bool val11 = val9 || kernel_thread;
diff --git a/tools/verification/models/rtapp/sleep.ltl b/tools/verification/models/rtapp/sleep.ltl
index 6f26c4810f78..2637bc48a620 100644
--- a/tools/verification/models/rtapp/sleep.ltl
+++ b/tools/verification/models/rtapp/sleep.ltl
@@ -9,7 +9,7 @@ RT_VALID_SLEEP_REASON = FUTEX_WAIT
 
 RT_FRIENDLY_NANOSLEEP = CLOCK_NANOSLEEP
                     and NANOSLEEP_TIMER_ABSTIME
-                    and (NANOSLEEP_CLOCK_MONOTONIC or NANOSLEEP_CLOCK_TAI)
+                    and not NANOSLEEP_CLOCK_REALTIME
 
 RT_FRIENDLY_WAKE = WOKEN_BY_EQUAL_OR_HIGHER_PRIO
                 or WOKEN_BY_HARDIRQ
-- 
2.47.3



* [PATCH 3/3] rv/rtapp: Add wakeup monitor
From: Nam Cao @ 2026-05-19  7:49 UTC
  To: Gabriele Monaco, Steven Rostedt, linux-kernel, linux-trace-kernel; +Cc: Nam Cao
In-Reply-To: <cover.1779176466.git.namcao@linutronix.de>

Add a wakeup monitor to detect a lower-priority task waking up a
higher-priority task.

The rtapp/sleep monitor already detects this. However, that monitor
triggers an error in the context of the woken task, so the user only gets
the stacktrace of that task. It is also extremely useful to get the stacktrace
of the waking task, which this monitor offers. In other words, this monitor
complements the rtapp/sleep monitor.
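A minimal userspace model of the check this monitor performs, assuming the atoms are already known; the struct and functions below are hypothetical stand-ins for the LTL atoms, not monitor code. `is_lower_prio_waker()` follows the kernel convention that a numerically smaller `prio` is a higher priority (RT tasks use 0 to 99, normal tasks default to 120), which is why the sched_waking handler tests `current->prio > task->prio`. `wakeup_rule_holds()` evaluates one step of the rule, in which "imply" binds more loosely than "or".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the monitor's LTL atoms of one task. */
struct wakeup_atoms {
	bool rt;                  /* RT: real-time/deadline task, incl. PI-boosted */
	bool user_thread;         /* USER_THREAD: not a kernel thread */
	bool woken_by_lower_prio; /* WOKEN_BY_LOWER_PRIO */
	bool woken_by_softirq;    /* WOKEN_BY_SOFTIRQ */
	bool block_on_rt_mutex;   /* allowlisted */
	bool futex_lock_pi;       /* allowlisted */
};

/* Kernel convention: a numerically smaller prio value is a higher priority. */
static bool is_lower_prio_waker(int waker_prio, int wakee_prio)
{
	return waker_prio > wakee_prio;
}

/*
 * One step of
 *   (RT and USER_THREAD) imply
 *     (not (WOKEN_BY_LOWER_PRIO or WOKEN_BY_SOFTIRQ)) or ALLOWLIST
 * Returns false when the monitor would report a violation.
 */
static bool wakeup_rule_holds(const struct wakeup_atoms *a)
{
	bool allowlist = a->block_on_rt_mutex || a->futex_lock_pi;
	bool bad_wake = a->woken_by_lower_prio || a->woken_by_softirq;

	return !(a->rt && a->user_thread) || !bad_wake || allowlist;
}
```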

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 kernel/trace/rv/Kconfig                       |   1 +
 kernel/trace/rv/Makefile                      |   1 +
 kernel/trace/rv/monitors/wakeup/Kconfig       |  17 ++
 kernel/trace/rv/monitors/wakeup/wakeup.c      | 155 ++++++++++++++++++
 kernel/trace/rv/monitors/wakeup/wakeup.h      |  92 +++++++++++
 .../trace/rv/monitors/wakeup/wakeup_trace.h   |  14 ++
 kernel/trace/rv/rv_trace.h                    |   1 +
 tools/verification/models/rtapp/wakeup.ltl    |   5 +
 8 files changed, 286 insertions(+)
 create mode 100644 kernel/trace/rv/monitors/wakeup/Kconfig
 create mode 100644 kernel/trace/rv/monitors/wakeup/wakeup.c
 create mode 100644 kernel/trace/rv/monitors/wakeup/wakeup.h
 create mode 100644 kernel/trace/rv/monitors/wakeup/wakeup_trace.h
 create mode 100644 tools/verification/models/rtapp/wakeup.ltl

diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index 3884b14df375..4d3a14a0bac2 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -76,6 +76,7 @@ source "kernel/trace/rv/monitors/opid/Kconfig"
 source "kernel/trace/rv/monitors/rtapp/Kconfig"
 source "kernel/trace/rv/monitors/pagefault/Kconfig"
 source "kernel/trace/rv/monitors/sleep/Kconfig"
+source "kernel/trace/rv/monitors/wakeup/Kconfig"
 # Add new rtapp monitors here
 
 source "kernel/trace/rv/monitors/stall/Kconfig"
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index 94498da35b37..c2c0e4142eb4 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_RV_MON_OPID) += monitors/opid/opid.o
 obj-$(CONFIG_RV_MON_STALL) += monitors/stall/stall.o
 obj-$(CONFIG_RV_MON_DEADLINE) += monitors/deadline/deadline.o
 obj-$(CONFIG_RV_MON_NOMISS) += monitors/nomiss/nomiss.o
+obj-$(CONFIG_RV_MON_WAKEUP) += monitors/wakeup/wakeup.o
 # Add new monitors here
 obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
 obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/wakeup/Kconfig b/kernel/trace/rv/monitors/wakeup/Kconfig
new file mode 100644
index 000000000000..3cf11c5cd5f7
--- /dev/null
+++ b/kernel/trace/rv/monitors/wakeup/Kconfig
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_WAKEUP
+	depends on RV
+	depends on RV_MON_RTAPP
+	depends on HAVE_SYSCALL_TRACEPOINTS
+	select TRACE_IRQFLAGS
+	default y
+	select LTL_MON_EVENTS_ID
+	bool "wakeup monitor"
+	help
+	  This monitor detects a lower-priority task waking up a
+	  higher-priority task. The RV_MON_SLEEP monitor already
+	  detects this case, but this monitor detects in the context
+	  of the waking task instead. This and RV_MON_SLEEP can be
+	  enabled together to get the stacktrace of both the waking
+	  task and the woken task.
diff --git a/kernel/trace/rv/monitors/wakeup/wakeup.c b/kernel/trace/rv/monitors/wakeup/wakeup.c
new file mode 100644
index 000000000000..534997a7b45c
--- /dev/null
+++ b/kernel/trace/rv/monitors/wakeup/wakeup.c
@@ -0,0 +1,155 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+
+#define MODULE_NAME "wakeup"
+
+#include <trace/events/syscalls.h>
+#include <trace/events/sched.h>
+#include <trace/events/lock.h>
+#include <uapi/linux/futex.h>
+
+#include <rv_trace.h>
+#include <monitors/rtapp/rtapp.h>
+
+
+#ifndef __NR_futex
+#define __NR_futex (-__COUNTER__)
+#endif
+#ifndef __NR_futex_time64
+#define __NR_futex_time64 (-__COUNTER__)
+#endif
+
+#include "wakeup.h"
+#include <rv/ltl_monitor.h>
+
+static void ltl_atoms_fetch(struct task_struct *task, struct ltl_monitor *mon)
+{
+	/*
+	 * This includes "actual" real-time tasks and also PI-boosted
+	 * tasks. A task being PI-boosted means it is blocking an "actual"
+	 * real-task, therefore it should also obey the monitor's rule,
+	 * otherwise the "actual" real-task may be delayed.
+	 */
+	ltl_atom_set(mon, LTL_RT, rt_or_dl_task(task));
+}
+
+static void ltl_atoms_init(struct task_struct *task, struct ltl_monitor *mon, bool task_creation)
+{
+	ltl_atom_set(mon, LTL_WOKEN_BY_LOWER_PRIO, false);
+	ltl_atom_set(mon, LTL_WOKEN_BY_SOFTIRQ, false);
+
+	if (task_creation) {
+		ltl_atom_set(mon, LTL_BLOCK_ON_RT_MUTEX, false);
+		ltl_atom_set(mon, LTL_FUTEX_LOCK_PI, false);
+	}
+
+	ltl_atom_set(mon, LTL_USER_THREAD, !(task->flags & PF_KTHREAD));
+}
+
+static void handle_sched_waking(void *data, struct task_struct *task)
+{
+	if (this_cpu_read(hardirq_context)) {
+		return;
+	} else if (in_task()) {
+		if (current->prio > task->prio)
+			ltl_atom_pulse(task, LTL_WOKEN_BY_LOWER_PRIO, true);
+	} else if (in_serving_softirq()) {
+		ltl_atom_pulse(task, LTL_WOKEN_BY_SOFTIRQ, true);
+	}
+}
+
+static void handle_contention_begin(void *data, void *lock, unsigned int flags)
+{
+	if (flags & LCB_F_RT)
+		ltl_atom_update(current, LTL_BLOCK_ON_RT_MUTEX, true);
+}
+
+static void handle_contention_end(void *data, void *lock, int ret)
+{
+	ltl_atom_update(current, LTL_BLOCK_ON_RT_MUTEX, false);
+}
+
+static void handle_sys_enter(void *data, struct pt_regs *regs, long id)
+{
+	unsigned long args[6];
+	int op, cmd;
+
+	switch (id) {
+	case __NR_futex:
+	case __NR_futex_time64:
+		syscall_get_arguments(current, regs, args);
+		op = args[1];
+		cmd = op & FUTEX_CMD_MASK;
+
+		switch (cmd) {
+		case FUTEX_LOCK_PI:
+		case FUTEX_LOCK_PI2:
+			ltl_atom_update(current, LTL_FUTEX_LOCK_PI, true);
+			break;
+		}
+		break;
+	}
+}
+
+static void handle_sys_exit(void *data, struct pt_regs *regs, long ret)
+{
+	ltl_atom_update(current, LTL_FUTEX_LOCK_PI, false);
+}
+
+static int enable_wakeup(void)
+{
+	int retval;
+
+	retval = ltl_monitor_init();
+	if (retval)
+		return retval;
+
+	rv_attach_trace_probe("rtapp_wakeup", sched_waking, handle_sched_waking);
+	rv_attach_trace_probe("rtapp_wakeup", contention_begin, handle_contention_begin);
+	rv_attach_trace_probe("rtapp_wakeup", contention_end, handle_contention_end);
+	rv_attach_trace_probe("rtapp_wakeup", sys_enter, handle_sys_enter);
+	rv_attach_trace_probe("rtapp_wakeup", sys_exit, handle_sys_exit);
+
+	return 0;
+}
+
+static void disable_wakeup(void)
+{
+	rv_detach_trace_probe("rtapp_wakeup", sched_waking, handle_sched_waking);
+	rv_detach_trace_probe("rtapp_wakeup", contention_begin, handle_contention_begin);
+	rv_detach_trace_probe("rtapp_wakeup", contention_end, handle_contention_end);
+	rv_detach_trace_probe("rtapp_wakeup", sys_enter, handle_sys_enter);
+	rv_detach_trace_probe("rtapp_wakeup", sys_exit, handle_sys_exit);
+
+	ltl_monitor_destroy();
+}
+
+static struct rv_monitor rv_wakeup = {
+	.name = "wakeup",
+	.description = "Monitor that real-time tasks are not woken by lower-priority tasks",
+	.enable = enable_wakeup,
+	.disable = disable_wakeup,
+};
+
+static int __init register_wakeup(void)
+{
+	return rv_register_monitor(&rv_wakeup, &rv_rtapp);
+}
+
+static void __exit unregister_wakeup(void)
+{
+	rv_unregister_monitor(&rv_wakeup);
+}
+
+module_init(register_wakeup);
+module_exit(unregister_wakeup);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Nam Cao <namcao@linutronix.de>");
+MODULE_DESCRIPTION("Monitor that real-time tasks are not woken by lower-priority tasks");
diff --git a/kernel/trace/rv/monitors/wakeup/wakeup.h b/kernel/trace/rv/monitors/wakeup/wakeup.h
new file mode 100644
index 000000000000..6f80da64e0e1
--- /dev/null
+++ b/kernel/trace/rv/monitors/wakeup/wakeup.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * C implementation of Buchi automaton, automatically generated by
+ * tools/verification/rvgen from the linear temporal logic specification.
+ * For further information, see kernel documentation:
+ *   Documentation/trace/rv/linear_temporal_logic.rst
+ */
+
+#include <linux/rv.h>
+
+#define MONITOR_NAME wakeup
+
+enum ltl_atom {
+	LTL_BLOCK_ON_RT_MUTEX,
+	LTL_FUTEX_LOCK_PI,
+	LTL_RT,
+	LTL_USER_THREAD,
+	LTL_WOKEN_BY_LOWER_PRIO,
+	LTL_WOKEN_BY_SOFTIRQ,
+	LTL_NUM_ATOM
+};
+static_assert(LTL_NUM_ATOM <= RV_MAX_LTL_ATOM);
+
+static const char *ltl_atom_str(enum ltl_atom atom)
+{
+	static const char *const names[] = {
+		"bl_on_rt_mu",
+		"fu_lo_pi",
+		"rt",
+		"us_th",
+		"wo_lo_pr",
+		"wo_so",
+	};
+
+	return names[atom];
+}
+
+enum ltl_buchi_state {
+	S0,
+	RV_NUM_BA_STATES
+};
+static_assert(RV_NUM_BA_STATES <= RV_MAX_BA_STATES);
+
+static void ltl_start(struct task_struct *task, struct ltl_monitor *mon)
+{
+	bool woken_by_softirq = test_bit(LTL_WOKEN_BY_SOFTIRQ, mon->atoms);
+	bool woken_by_lower_prio = test_bit(LTL_WOKEN_BY_LOWER_PRIO, mon->atoms);
+	bool user_thread = test_bit(LTL_USER_THREAD, mon->atoms);
+	bool rt = test_bit(LTL_RT, mon->atoms);
+	bool futex_lock_pi = test_bit(LTL_FUTEX_LOCK_PI, mon->atoms);
+	bool block_on_rt_mutex = test_bit(LTL_BLOCK_ON_RT_MUTEX, mon->atoms);
+	bool val9 = block_on_rt_mutex || futex_lock_pi;
+	bool val6 = !woken_by_softirq;
+	bool val5 = !woken_by_lower_prio;
+	bool val8 = val5 && val6;
+	bool val10 = val8 || val9;
+	bool val3 = !user_thread;
+	bool val2 = !rt;
+	bool val4 = val2 || val3;
+	bool val11 = val4 || val10;
+
+	if (val11)
+		__set_bit(S0, mon->states);
+}
+
+static void
+ltl_possible_next_states(struct ltl_monitor *mon, unsigned int state, unsigned long *next)
+{
+	bool woken_by_softirq = test_bit(LTL_WOKEN_BY_SOFTIRQ, mon->atoms);
+	bool woken_by_lower_prio = test_bit(LTL_WOKEN_BY_LOWER_PRIO, mon->atoms);
+	bool user_thread = test_bit(LTL_USER_THREAD, mon->atoms);
+	bool rt = test_bit(LTL_RT, mon->atoms);
+	bool futex_lock_pi = test_bit(LTL_FUTEX_LOCK_PI, mon->atoms);
+	bool block_on_rt_mutex = test_bit(LTL_BLOCK_ON_RT_MUTEX, mon->atoms);
+	bool val9 = block_on_rt_mutex || futex_lock_pi;
+	bool val6 = !woken_by_softirq;
+	bool val5 = !woken_by_lower_prio;
+	bool val8 = val5 && val6;
+	bool val10 = val8 || val9;
+	bool val3 = !user_thread;
+	bool val2 = !rt;
+	bool val4 = val2 || val3;
+	bool val11 = val4 || val10;
+
+	switch (state) {
+	case S0:
+		if (val11)
+			__set_bit(S0, next);
+		break;
+	}
+}
diff --git a/kernel/trace/rv/monitors/wakeup/wakeup_trace.h b/kernel/trace/rv/monitors/wakeup/wakeup_trace.h
new file mode 100644
index 000000000000..7e056183f920
--- /dev/null
+++ b/kernel/trace/rv/monitors/wakeup/wakeup_trace.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Snippet to be included in rv_trace.h
+ */
+
+#ifdef CONFIG_RV_MON_WAKEUP
+DEFINE_EVENT(event_ltl_monitor_id, event_wakeup,
+	     TP_PROTO(struct task_struct *task, char *states, char *atoms, char *next),
+	     TP_ARGS(task, states, atoms, next));
+DEFINE_EVENT(error_ltl_monitor_id, error_wakeup,
+	     TP_PROTO(struct task_struct *task),
+	     TP_ARGS(task));
+#endif /* CONFIG_RV_MON_WAKEUP */
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index 9622c269789c..2f8a932432c9 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -241,6 +241,7 @@ DECLARE_EVENT_CLASS(error_ltl_monitor_id,
 );
 #include <monitors/pagefault/pagefault_trace.h>
 #include <monitors/sleep/sleep_trace.h>
+#include <monitors/wakeup/wakeup_trace.h>
 // Add new monitors based on CONFIG_LTL_MON_EVENTS_ID here
 #endif /* CONFIG_LTL_MON_EVENTS_ID */
 
diff --git a/tools/verification/models/rtapp/wakeup.ltl b/tools/verification/models/rtapp/wakeup.ltl
new file mode 100644
index 000000000000..a5d63ca0811a
--- /dev/null
+++ b/tools/verification/models/rtapp/wakeup.ltl
@@ -0,0 +1,5 @@
+RULE = always (((RT and USER_THREAD) imply
+		(not (WOKEN_BY_LOWER_PRIO or WOKEN_BY_SOFTIRQ)) or ALLOWLIST))
+
+ALLOWLIST = BLOCK_ON_RT_MUTEX
+         or FUTEX_LOCK_PI
-- 
2.47.3



* [PATCH] tracing/blktrace: Use sysfs_emit() for sysfs show callbacks
From: Yu Peng @ 2026-05-19  7:50 UTC
  To: Jens Axboe, Steven Rostedt, Masami Hiramatsu
  Cc: Mathieu Desnoyers, linux-block, linux-kernel, linux-trace-kernel,
	Yu Peng

Use sysfs_emit() and sysfs_emit_at() instead of sprintf() when
formatting blktrace sysfs show output.

No functional change intended.
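To show what the conversion relies on, here is a userspace approximation of sysfs_emit_at(), assuming a 4096-byte buffer in place of PAGE_SIZE; `emit_at()`, `mask2str()` and `struct mask_map_sketch` are hypothetical names. Like the kernel helper, it writes at offset `at`, never past the end of the page, and returns the number of characters actually stored (vscnprintf() semantics), so `p += ...` cannot walk past the buffer. The real sysfs_emit_at() additionally warns when `buf` is not page-aligned, which this sketch does not model.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096 /* stand-in for PAGE_SIZE */

/* Approximates sysfs_emit_at(): bounded write at buf + at, vscnprintf()-style return. */
static int emit_at(char *buf, int at, const char *fmt, ...)
{
	va_list args;
	size_t room;
	int len;

	if (!buf || at < 0 || at >= SKETCH_PAGE_SIZE)
		return 0;

	room = SKETCH_PAGE_SIZE - at;
	va_start(args, fmt);
	len = vsnprintf(buf + at, room, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;
	/* Report what was stored, not what would have been written. */
	return (size_t)len >= room ? (int)(room - 1) : len;
}

struct mask_map_sketch {
	int mask;
	const char *str;
};

/* Same loop shape as blk_trace_mask2str() after the patch. */
static int mask2str(char *buf, int mask, const struct mask_map_sketch *maps, int n)
{
	char *p = buf;

	for (int i = 0; i < n; i++) {
		if (mask & maps[i].mask)
			p += emit_at(buf, p - buf, "%s%s",
				     (p == buf) ? "" : ",", maps[i].str);
	}
	p += emit_at(buf, p - buf, "\n");
	return p - buf;
}
```

For example, with maps {1, "read"}, {2, "write"}, {4, "flush"} and mask 5, `mask2str()` produces "read,flush\n" and returns 11.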

Signed-off-by: Yu Peng <pengyu@kylinos.cn>
---
 kernel/trace/blktrace.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 8cd2520b4c99e..1eda8158883ca 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -2025,11 +2025,11 @@ static ssize_t blk_trace_mask2str(char *buf, int mask)
 
 	for (i = 0; i < ARRAY_SIZE(mask_maps); i++) {
 		if (mask & mask_maps[i].mask) {
-			p += sprintf(p, "%s%s",
+			p += sysfs_emit_at(buf, p - buf, "%s%s",
 				    (p == buf) ? "" : ",", mask_maps[i].str);
 		}
 	}
-	*p++ = '\n';
+	p += sysfs_emit_at(buf, p - buf, "\n");
 
 	return p - buf;
 }
@@ -2048,20 +2048,20 @@ static ssize_t sysfs_blk_trace_attr_show(struct device *dev,
 	bt = rcu_dereference_protected(q->blk_trace,
 				       lockdep_is_held(&q->debugfs_mutex));
 	if (attr == &dev_attr_enable) {
-		ret = sprintf(buf, "%u\n", !!bt);
+		ret = sysfs_emit(buf, "%u\n", !!bt);
 		goto out_unlock_bdev;
 	}
 
 	if (bt == NULL)
-		ret = sprintf(buf, "disabled\n");
+		ret = sysfs_emit(buf, "disabled\n");
 	else if (attr == &dev_attr_act_mask)
 		ret = blk_trace_mask2str(buf, bt->act_mask);
 	else if (attr == &dev_attr_pid)
-		ret = sprintf(buf, "%u\n", bt->pid);
+		ret = sysfs_emit(buf, "%u\n", bt->pid);
 	else if (attr == &dev_attr_start_lba)
-		ret = sprintf(buf, "%llu\n", bt->start_lba);
+		ret = sysfs_emit(buf, "%llu\n", bt->start_lba);
 	else if (attr == &dev_attr_end_lba)
-		ret = sprintf(buf, "%llu\n", bt->end_lba);
+		ret = sysfs_emit(buf, "%llu\n", bt->end_lba);
 
 out_unlock_bdev:
 	blk_debugfs_unlock_nomemrestore(q);
-- 
2.43.0



* Re: [PATCH 07/13] rv: Simply hybrid automata monitors's clock variables
From: Gabriele Monaco @ 2026-05-19  7:58 UTC
  To: Nam Cao
  Cc: Steven Rostedt, Wander Lairson Costa, linux-trace-kernel,
	linux-kernel
In-Reply-To: <87h5o588m4.fsf@yellow.woof>

On Mon, 2026-05-18 at 09:44 +0200, Nam Cao wrote:
> Gabriele Monaco <gmonaco@redhat.com> writes:
> > On Mon, 2026-05-11 at 13:55 +0200, Nam Cao wrote:
> > > That can work, but it's not ideal, because hrtimer will not be usable.
> > 
> > Why not? If we have HA_TIMER_WHEEL, we'd use timer and expire; if we have
> > HA_TIMER_HRTIMER, we'd only need hrtimer with its hrtimer_get_expires():
> > 
> >  union {
> >  	struct hrtimer hrtimer;
> >  	struct {
> >  		struct timer_list timer;
> >  		u64 expire; /* Explicitly store the armed budget */
> >  	};
> > 
> > we already can't use timer and hrtimer interchangeably.
> > What am I missing here?
> 
> Ah, now I understand the trick, thanks.
> 
> We already have an "expires" field in struct timer_list. But I am not
> sure if we are supposed to touch that field. Your proposal looks safer.

Yeah, and even if we did, that would have jiffy granularity, which is not good
if the clock is ns-based.

Let me sketch it out.
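As a rough, hypothetical illustration of the union layout quoted above (not the sketch promised here), the code below uses stand-in types instead of struct hrtimer and struct timer_list. The hrtimer case reads its expiry back from the timer itself, while the timer-wheel case keeps the armed budget in an explicit ns-resolution `expire` field. The last helper shows why reusing timer_list's jiffies-based `expires` would lose precision: with an assumed HZ of 250, a 1.5 ms expiry rounds up to 4 ms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real types are struct hrtimer and struct timer_list. */
struct hrtimer_sketch { uint64_t expires_ns; };
struct timer_list_sketch { unsigned long expires; /* in jiffies */ };

/* Selects the union member; the discussion uses HA_TIMER_WHEEL / HA_TIMER_HRTIMER. */
enum ha_timer_kind_sketch { HA_SKETCH_TIMER_WHEEL, HA_SKETCH_TIMER_HRTIMER };

struct ha_clock_sketch {
	enum ha_timer_kind_sketch kind;
	union {
		struct hrtimer_sketch hrtimer;	/* expiry read back from the timer */
		struct {
			struct timer_list_sketch timer;
			uint64_t expire;	/* explicitly stored armed budget, in ns */
		};
	};
};

/* Remaining budget at now_ns, taken from whichever union member is in use. */
static uint64_t ha_remaining_ns(const struct ha_clock_sketch *c, uint64_t now_ns)
{
	uint64_t exp = c->kind == HA_SKETCH_TIMER_HRTIMER ? c->hrtimer.expires_ns
							  : c->expire;

	return exp > now_ns ? exp - now_ns : 0;
}

/* Assumed HZ = 250, i.e. 4 ms per jiffy. */
#define SKETCH_NSEC_PER_JIFFY (1000000000ULL / 250)

/* Round an ns expiry up to a whole jiffy, as a jiffies-based field would. */
static uint64_t ns_via_jiffies(uint64_t ns)
{
	return (ns + SKETCH_NSEC_PER_JIFFY - 1) / SKETCH_NSEC_PER_JIFFY *
	       SKETCH_NSEC_PER_JIFFY;
}
```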

Anyway, back to the patch: you need to fix the build for HA_TIMER_HRTIMER as
well (too many arguments to function ‘ha_invariant_passed_ns’; expected 3, have
4), and in the title s/Simply/Simplify/.

Thanks,
Gabriele

> 
> > > Looking at the throttle monitor again, is it possible to rewrite
> > > runtime_left_ns() to read .dl_runtime instead of .runtime? I don't know
> > > the deadline scheduler very well, but I think .dl_runtime does not change
> > > like .runtime does?
> > 
> > In theory yes, but since the runtime is consumed only when running, we
> > cannot just set the timeout once. We either save how much was consumed
> > somewhere or do some start/pause mechanism.
> > Neither looks simpler to me.
> 
> Understood.
> 
> Nam



* [PATCH] tracing/branch: Use pr_warn() instead of printk(KERN_WARNING)
From: Yu Peng @ 2026-05-19  8:16 UTC
  To: Steven Rostedt, Masami Hiramatsu
  Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel, Yu Peng

Use pr_warn() instead of printk(KERN_WARNING ...) for the branch tracer
warning messages.

Keep the message text unchanged. The change only removes the open-coded
log level from these warnings.
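The claim that the message text is unchanged rests on C string-literal concatenation: adjacent literals are joined at compile time, so the old split format string is byte-for-byte the same as the new single literal. A userspace check, using stand-ins for KERN_SOH and KERN_WARNING (in the kernel, KERN_WARNING expands to KERN_SOH "4"), and assuming trace_branch.c does not define its own pr_fmt(), so that pr_warn(fmt) passes KERN_WARNING fmt to printk():

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for <linux/kern_levels.h>: KERN_SOH is "\001", KERN_WARNING is KERN_SOH "4". */
#define SKETCH_KERN_SOH "\001"
#define SKETCH_KERN_WARNING SKETCH_KERN_SOH "4"

/* Format string passed to printk() by the old code: split across two literals. */
static const char old_fmt[] = SKETCH_KERN_WARNING "Warning: could not register "
			      "branch events\n";

/* Format string pr_warn() would pass, assuming the default pr_fmt(fmt) == fmt. */
static const char new_fmt[] = SKETCH_KERN_WARNING "Warning: could not register branch events\n";
```

Note that "\001" "4" stays two characters (0x01, '4') because escapes are processed before literals are concatenated.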

Signed-off-by: Yu Peng <pengyu@kylinos.cn>
 kernel/trace/trace_branch.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace_branch.c b/kernel/trace/trace_branch.c
index d1564db95a8f5..d8e97ad798f07 100644
--- a/kernel/trace/trace_branch.c
+++ b/kernel/trace/trace_branch.c
@@ -181,8 +181,7 @@ __init static int init_branch_tracer(void)
 
 	ret = register_trace_event(&trace_branch_event);
 	if (!ret) {
-		printk(KERN_WARNING "Warning: could not register "
-				    "branch events\n");
+		pr_warn("Warning: could not register branch events\n");
 		return 1;
 	}
 	return register_tracer(&branch_trace);
@@ -374,8 +373,7 @@ __init static int init_annotated_branch_stats(void)
 
 	ret = register_stat_tracer(&annotated_branch_stats);
 	if (ret) {
-		printk(KERN_WARNING "Warning: could not register "
-				    "annotated branches stats\n");
+		pr_warn("Warning: could not register annotated branches stats\n");
 		return ret;
 	}
 	return 0;
@@ -439,8 +437,7 @@ __init static int all_annotated_branch_stats(void)
 
 	ret = register_stat_tracer(&all_branch_stats);
 	if (ret) {
-		printk(KERN_WARNING "Warning: could not register "
-				    "all branches stats\n");
+		pr_warn("Warning: could not register all branches stats\n");
 		return ret;
 	}
 	return 0;
-- 
2.43.0


* [PATCH] tracing: Use krealloc_array() for trace option array growth
From: Yu Peng @ 2026-05-19  8:34 UTC
  To: Steven Rostedt, Masami Hiramatsu
  Cc: Mathieu Desnoyers, linux-kernel, linux-trace-kernel, Yu Peng

Use krealloc_array() when growing tr->topts instead of open-coding the
size calculation in krealloc().

This makes the resize path use the helper intended for array allocations
and avoids manual multiplication of the element count and element size.
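krealloc_array() computes the byte count with an overflow check and returns NULL, without resizing, when new_n * new_size would wrap. Here nr_topts + 1 is unlikely to get anywhere near that, so the gain is mainly using the intended helper. A userspace analogue using the GCC/Clang builtin __builtin_mul_overflow(); `realloc_array_checked()` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace analogue of krealloc_array(): refuse the resize when
 * new_n * size overflows instead of passing a wrapped byte count to realloc().
 * On a NULL return the original block is left untouched, as with realloc().
 */
static void *realloc_array_checked(void *p, size_t new_n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(new_n, size, &bytes))
		return NULL;
	return realloc(p, bytes);
}
```

The call in the patch has the same shape: krealloc_array(tr->topts, tr->nr_topts + 1, sizeof(*tr->topts), GFP_KERNEL).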

Signed-off-by: Yu Peng <pengyu@kylinos.cn>
---
 kernel/trace/trace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6eb4d3097a4d5..bde22d693d2e4 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7928,8 +7928,8 @@ create_trace_option_files(struct trace_array *tr, struct tracer *tracer,
 	if (!topts)
 		return 0;
 
-	tr_topts = krealloc(tr->topts, sizeof(*tr->topts) * (tr->nr_topts + 1),
-			    GFP_KERNEL);
+	tr_topts = krealloc_array(tr->topts, tr->nr_topts + 1, sizeof(*tr->topts),
+				  GFP_KERNEL);
 	if (!tr_topts) {
 		kfree(topts);
 		return -ENOMEM;
-- 
2.43.0



* Re: [PATCH 6/9] rv: Ensure synchronous cleanup for HA monitors
From: Gabriele Monaco @ 2026-05-19  9:31 UTC
  To: Wen Yang; +Cc: linux-kernel, Steven Rostedt, Nam Cao, linux-trace-kernel
In-Reply-To: <88a6fc5c08d18e3c1f6d29dc106db80fa688bf87.camel@redhat.com>



On Mon, 2026-05-18 at 13:54 +0200, Gabriele Monaco wrote:
> Something like:
> 
> void __ha_monitor_timer_callback() {
> 	guard(rcu)(); //this is only for waiters, let them wait more
> 
> 	if (unlikely(!da_monitor_handling_event(&ha_mon->da_mon)))
> 		return;
> 	smp_rmb();
> 	curr_state = READ_ONCE(ha_mon->da_mon.curr_state);
> 	...
> }
> 
> void da_monitor_reset() {
> 	da_monitor_reset_hook(da_mon);
> 	WRITE_ONCE(da_mon->monitoring, 0);
> 	smp_wmb();
> 	WRITE_ONCE(da_mon->curr_state, model_get_initial_state());
> }

That's obviously not going to work unless I read curr_state earlier (and use the
acquire/release helpers while at it):

void __ha_monitor_timer_callback() {
	guard(rcu)(); //this is only for waiters, let them wait more

	curr_state = smp_load_acquire(&ha_mon->da_mon.curr_state);
	if (unlikely(!da_monitor_handling_event(&ha_mon->da_mon)))
		return;
	...
}

void da_monitor_reset() {
	da_monitor_reset_hook(da_mon);
	WRITE_ONCE(da_mon->monitoring, 0);
	smp_store_release(&da_mon->curr_state, model_get_initial_state());
}
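The corrected ordering is the usual message-passing pattern: the writer clears `monitoring` and then publishes the reset state with a release store; the reader acquire-loads `curr_state` first and only then checks `monitoring`. If the reader observes the reset state, it cannot observe the pre-reset value of `monitoring`. The first draft read `monitoring` before `curr_state`, so a reader could still see the old `monitoring` value together with the new state. Below is a userspace model with C11 atomics standing in for WRITE_ONCE(), smp_store_release() and smp_load_acquire(); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

enum { SKETCH_INITIAL_STATE = 0 };

/* Hypothetical stand-in for the two da_monitor fields involved. */
struct mon_sketch {
	atomic_int monitoring;
	atomic_int curr_state;
};

/* Writer side, like da_monitor_reset(): WRITE_ONCE() then smp_store_release(). */
static void mon_reset(struct mon_sketch *m)
{
	atomic_store_explicit(&m->monitoring, 0, memory_order_relaxed);
	atomic_store_explicit(&m->curr_state, SKETCH_INITIAL_STATE,
			      memory_order_release);
}

/*
 * Reader side, like the timer callback: smp_load_acquire() of curr_state
 * first, then the monitoring check. If the acquire load returned the value
 * stored by mon_reset(), the release/acquire pairing guarantees the relaxed
 * load below cannot return the value monitoring had before the reset.
 * Returns the state to act on, or -1 if the monitor is not running.
 */
static int mon_timer_callback(struct mon_sketch *m)
{
	int state = atomic_load_explicit(&m->curr_state, memory_order_acquire);

	if (!atomic_load_explicit(&m->monitoring, memory_order_relaxed))
		return -1;
	return state;
}
```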



* Re: [PATCH v4] tracing/probes: Allow use of BTF names to dereference pointers
From: kernel test robot @ 2026-05-19  9:34 UTC
  To: Steven Rostedt, LKML, Linux Trace Kernel, bpf
  Cc: oe-kbuild-all, Masami Hiramatsu, Mathieu Desnoyers, Mark Rutland,
	Peter Zijlstra, Namhyung Kim, Takaya Saeki, Douglas Raillard,
	Tom Zanussi, Andrew Morton, Linux Memory Management List,
	Thomas Gleixner, Ian Rogers, Jiri Olsa
In-Reply-To: <20260518232312.0c78f055@gandalf.local.home>

Hi Steven,

kernel test robot noticed the following build errors:

[auto build test ERROR on trace/for-next]
[also build test ERROR on linus/master v7.1-rc4 next-20260518]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Steven-Rostedt/tracing-probes-Allow-use-of-BTF-names-to-dereference-pointers/20260519-121930
base:   https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace for-next
patch link:    https://lore.kernel.org/r/20260518232312.0c78f055%40gandalf.local.home
patch subject: [PATCH v4] tracing/probes: Allow use of BTF names to dereference pointers
config: sh-defconfig (https://download.01.org/0day-ci/archive/20260519/202605191710.jVjifK67-lkp@intel.com/config)
compiler: sh4-linux-gcc (GCC) 15.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260519/202605191710.jVjifK67-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605191710.jVjifK67-lkp@intel.com/

All errors (new ones prefixed by >>):

   kernel/trace/trace_probe.c: In function 'parse_probe_arg':
>> kernel/trace/trace_probe.c:1289:23: error: implicit declaration of function 'query_btf_struct' [-Wimplicit-function-declaration]
    1289 |                 ret = query_btf_struct(arg + 1, ctx);
         |                       ^~~~~~~~~~~~~~~~
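The report does not say why the declaration is missing. One common cause of this kind of error is a helper that is only declared when a Kconfig option is enabled while its caller is built unconditionally; whether that applies to this sh-defconfig build is an assumption. A sketch of the usual fix, a static inline stub for the disabled case, where the option name, context struct and function signature are hypothetical stand-ins rather than taken from the patch:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct traceprobe_parse_context. */
struct parse_ctx_sketch {
	int offset;
};

/* CONFIG_SKETCH_BTF_ARGS is a hypothetical stand-in for the real Kconfig symbol. */
#ifdef CONFIG_SKETCH_BTF_ARGS
int sketch_query_btf_struct(const char *name, struct parse_ctx_sketch *ctx);
#else
/* Stub so callers still compile (and fail cleanly) when the option is off. */
static inline int sketch_query_btf_struct(const char *name,
					  struct parse_ctx_sketch *ctx)
{
	(void)name;
	(void)ctx;
	return -EOPNOTSUPP;
}
#endif
```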


vim +/query_btf_struct +1289 kernel/trace/trace_probe.c

  1120	
  1121	/* Recursive argument parser */
  1122	static int
  1123	parse_probe_arg(char *arg, const struct fetch_type *type,
  1124			struct fetch_insn **pcode, struct fetch_insn *end,
  1125			struct traceprobe_parse_context *ctx)
  1126	{
  1127		struct fetch_insn *code = *pcode;
  1128		unsigned long param;
  1129		int deref = FETCH_OP_DEREF;
  1130		long offset = 0;
  1131		char *tmp;
  1132		int ret = 0;
  1133	
  1134		switch (arg[0]) {
  1135		case '$':
  1136			ret = parse_probe_vars(arg, type, pcode, end, ctx);
  1137			break;
  1138	
  1139		case '%':	/* named register */
  1140			if (ctx->flags & (TPARG_FL_TEVENT | TPARG_FL_FPROBE)) {
  1141				/* eprobe and fprobe do not handle registers */
  1142				trace_probe_log_err(ctx->offset, BAD_VAR);
  1143				break;
  1144			}
  1145			ret = regs_query_register_offset(arg + 1);
  1146			if (ret >= 0) {
  1147				code->op = FETCH_OP_REG;
  1148				code->param = (unsigned int)ret;
  1149				ret = 0;
  1150			} else
  1151				trace_probe_log_err(ctx->offset, BAD_REG_NAME);
  1152			break;
  1153	
  1154		case '@':	/* memory, file-offset or symbol */
  1155			if (isdigit(arg[1])) {
  1156				ret = kstrtoul(arg + 1, 0, &param);
  1157				if (ret) {
  1158					trace_probe_log_err(ctx->offset, BAD_MEM_ADDR);
  1159					break;
  1160				}
  1161				/* load address */
  1162				code->op = FETCH_OP_IMM;
  1163				code->immediate = param;
  1164			} else if (arg[1] == '+') {
  1165				/* kprobes don't support file offsets */
  1166				if (ctx->flags & TPARG_FL_KERNEL) {
  1167					trace_probe_log_err(ctx->offset, FILE_ON_KPROBE);
  1168					return -EINVAL;
  1169				}
  1170				ret = kstrtol(arg + 2, 0, &offset);
  1171				if (ret) {
  1172					trace_probe_log_err(ctx->offset, BAD_FILE_OFFS);
  1173					break;
  1174				}
  1175	
  1176				code->op = FETCH_OP_FOFFS;
  1177				code->immediate = (unsigned long)offset;  // imm64?
  1178			} else {
  1179				/* uprobes don't support symbols */
  1180				if (!(ctx->flags & TPARG_FL_KERNEL)) {
  1181					trace_probe_log_err(ctx->offset, SYM_ON_UPROBE);
  1182					return -EINVAL;
  1183				}
  1184				/* Preserve symbol for updating */
  1185				code->op = FETCH_NOP_SYMBOL;
  1186				code->data = kstrdup(arg + 1, GFP_KERNEL);
  1187				if (!code->data)
  1188					return -ENOMEM;
  1189				if (++code == end) {
  1190					trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
  1191					return -EINVAL;
  1192				}
  1193				code->op = FETCH_OP_IMM;
  1194				code->immediate = 0;
  1195			}
  1196			/* These are fetching from memory */
  1197			if (++code == end) {
  1198				trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
  1199				return -EINVAL;
  1200			}
  1201			*pcode = code;
  1202			code->op = FETCH_OP_DEREF;
  1203			code->offset = offset;
  1204			break;
  1205	
  1206		case '+':	/* deref memory */
  1207		case '-':
  1208			if (arg[1] == 'u') {
  1209				deref = FETCH_OP_UDEREF;
  1210				arg[1] = arg[0];
  1211				arg++;
  1212			}
  1213			if (arg[0] == '+')
  1214				arg++;	/* Skip '+', because kstrtol() rejects it. */
  1215			tmp = strchr(arg, '(');
  1216			if (!tmp) {
  1217				trace_probe_log_err(ctx->offset, DEREF_NEED_BRACE);
  1218				return -EINVAL;
  1219			}
  1220			*tmp = '\0';
  1221			ret = kstrtol(arg, 0, &offset);
  1222			if (ret) {
  1223				trace_probe_log_err(ctx->offset, BAD_DEREF_OFFS);
  1224				break;
  1225			}
  1226			ctx->offset += (tmp + 1 - arg) + (arg[0] != '-' ? 1 : 0);
  1227			arg = tmp + 1;
  1228			tmp = strrchr(arg, ')');
  1229			if (!tmp) {
  1230				trace_probe_log_err(ctx->offset + strlen(arg),
  1231						    DEREF_OPEN_BRACE);
  1232				return -EINVAL;
  1233			} else {
  1234				const struct fetch_type *t2 = find_fetch_type(NULL, ctx->flags);
  1235				int cur_offs = ctx->offset;
  1236	
  1237				*tmp = '\0';
  1238				ret = parse_probe_arg(arg, t2, &code, end, ctx);
  1239				if (ret)
  1240					break;
  1241				ctx->offset = cur_offs;
  1242				if (code->op == FETCH_OP_COMM ||
  1243				    code->op == FETCH_OP_DATA) {
  1244					trace_probe_log_err(ctx->offset, COMM_CANT_DEREF);
  1245					return -EINVAL;
  1246				}
  1247				if (++code == end) {
  1248					trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
  1249					return -EINVAL;
  1250				}
  1251				*pcode = code;
  1252	
  1253				code->op = deref;
  1254				code->offset = offset;
  1255				/* Reset the last type if used */
  1256				ctx->last_type = NULL;
  1257			}
  1258			break;
  1259		case '\\':	/* Immediate value */
  1260			if (arg[1] == '"') {	/* Immediate string */
  1261				ret = __parse_imm_string(arg + 2, &tmp, ctx->offset + 2);
  1262				if (ret)
  1263					break;
  1264				code->op = FETCH_OP_DATA;
  1265				code->data = tmp;
  1266			} else {
  1267				ret = str_to_immediate(arg + 1, &code->immediate);
  1268				if (ret)
  1269					trace_probe_log_err(ctx->offset + 1, BAD_IMM);
  1270				else
  1271					code->op = FETCH_OP_IMM;
  1272			}
  1273			break;
  1274		case '(':
  1275			tmp = strrchr(arg, ')');
  1276			if (!tmp) {
  1277				trace_probe_log_err(ctx->offset + strlen(arg),
  1278						    DEREF_OPEN_BRACE);
  1279				return -EINVAL;
  1280			}
  1281	
  1282			tmp--;
  1283			if (*tmp != '*') {
  1284				trace_probe_log_err(ctx->offset + (tmp - arg),
  1285						    NO_PTR_STRCT);
  1286				return -EINVAL;
  1287			}
  1288			*tmp = '\0';
> 1289			ret = query_btf_struct(arg + 1, ctx);
  1290			*tmp = '*';
  1291	
  1292			if (ret < 0) {
  1293				trace_probe_log_err(ctx->offset + 1, NO_PTR_STRCT);
  1294				return -EINVAL;
  1295			}
  1296	
  1297			ctx->flags |= TPARG_FL_STRUCT;
  1298			tmp += 2;
  1299	
  1300			if (*tmp != '$') {
  1301				trace_probe_log_err(ctx->offset + (tmp - arg),
  1302						    BAD_VAR);
  1303				return -EINVAL;
  1304			}
  1305	
  1306			ctx->offset += tmp - arg;
  1307			ret = parse_probe_vars(tmp, type, pcode, end, ctx);
  1308			ctx->flags &= ~TPARG_FL_STRUCT;
  1309			ctx->last_struct = NULL;
  1310			break;
  1311		default:
  1312			if (isalpha(arg[0]) || arg[0] == '_') {	/* BTF variable */
  1313				if (!tparg_is_function_entry(ctx->flags) &&
  1314				    !tparg_is_function_return(ctx->flags)) {
  1315					trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
  1316					return -EINVAL;
  1317				}
  1318				ret = parse_btf_arg(arg, pcode, end, ctx);
  1319				break;
  1320			}
  1321		}
  1322		if (!ret && code->op == FETCH_OP_NOP) {
  1323			/* Parsed, but do not find fetch method */
  1324			trace_probe_log_err(ctx->offset, BAD_FETCH_ARG);
  1325			ret = -EINVAL;
  1326		}
  1327		return ret;
  1328	}
  1329	
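The report shows that `query_btf_struct()` is called from `parse_probe_arg()`, but no declaration of it is visible in this configuration. The most likely cause is that the helper was added inside the config-guarded BTF section of trace_probe.c (next to `fetch_type_from_btf_type()`), and these defconfigs compile that section out. The usual remedy is a stub in the `#else` branch. The plain-C sketch below shows that pattern. `HAVE_PROBE_BTF_ARGS` and `lookup_struct_id()` are hypothetical stand-ins, and the signature is simplified (no `ctx`).

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel config switch. Leave it undefined
 * to mimic a defconfig without BTF argument support.
 */
/* #define HAVE_PROBE_BTF_ARGS 1 */

#ifdef HAVE_PROBE_BTF_ARGS
/* Hypothetical BTF lookup: pretend only "sk_buff" is known. */
static int lookup_struct_id(const char *sname)
{
	return strcmp(sname, "sk_buff") == 0 ? 42 : -ENOENT;
}

static int query_btf_struct(const char *sname)
{
	int id = lookup_struct_id(sname);

	return id < 0 ? id : 0;
}
#else
/* Fallback so that callers still compile when BTF support is configured out. */
static int query_btf_struct(const char *sname)
{
	(void)sname;
	return -EOPNOTSUPP;
}
#endif
```

With the stub in place, the caller's existing `ret < 0` check turns the cast syntax into a parse error instead of a build failure.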

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v4] tracing/probes: Allow use of BTF names to dereference pointers
From: Masami Hiramatsu @ 2026-05-19  9:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux Trace Kernel, bpf, Masami Hiramatsu,
	Mathieu Desnoyers, Mark Rutland, Peter Zijlstra, Namhyung Kim,
	Takaya Saeki, Douglas Raillard, Tom Zanussi, Andrew Morton,
	Thomas Gleixner, Ian Rogers, Jiri Olsa,
	"Subject:[PATCH  v2]", tracing/pr
In-Reply-To: <20260518232312.0c78f055@gandalf.local.home>

On Mon, 18 May 2026 23:23:12 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: Steven Rostedt <rostedt@goodmis.org>
> 
> Add syntax to the FETCHARGS parsing of probes to be able to typecast a
> value to a pointer to a structure.
> 
> Currently, a dereference offset must be a number: the user has to manually
> figure out the offset of the structure member they want to dereference,
> unless the value is a function parameter whose pointed-to structure BTF
> already knows about.
> 
> But event probes, or generic kprobes that record a register that happens
> to hold a pointer to a structure, cannot dereference these values by BTF
> names and must use numerical offsets instead.

Thanks for updating!

> 
> For example, to find out what device a sk_buff is pointing to in the
> net_dev_xmit trace event, one must first use gdb to find the offsets of the
> members of the structures:
> 
>  (gdb) p &((struct sk_buff *)0)->dev
>  $1 = (struct net_device **) 0x10
>  (gdb) p &((struct net_device *)0)->name
>  $2 = (char (*)[16]) 0x118
> 
> And then use the raw numbers to dereference:
> 
>   # echo 'e:xmit net.net_dev_xmit +0x118(+0x10($skbaddr)):string' >> dynamic_events
> 
> If the kernel has BTF, $skbaddr can instead be typecast to sk_buff and
> dereferenced with the normal dereference logic.
> 
>   # echo 'e:xmit net.net_dev_xmit (sk_buff*)$skbaddr->dev->name:string' >> dynamic_events

Ah, eprobes support "$PARAM" to access their parameters by name.
That is a bit complicated. Should we allow users to access
parameters without the '$' prefix for eprobes?

>   # echo 1 > events/eprobes/xmit/enable
>   # cat trace
> [..]
>     sshd-session-1022    [000] b..2.   860.249343: xmit: (net.net_dev_xmit) arg1="enp7s0"
>     sshd-session-1022    [000] b..2.   860.250061: xmit: (net.net_dev_xmit) arg1="enp7s0"
>     sshd-session-1022    [000] b..2.   860.250142: xmit: (net.net_dev_xmit) arg1="enp7s0"
>     sshd-session-1022    [000] b..2.   860.263553: xmit: (net.net_dev_xmit) arg1="enp7s0"
>     sshd-session-1022    [000] b..2.   860.283820: xmit: (net.net_dev_xmit) arg1="enp7s0"
>     sshd-session-1022    [000] b..2.   860.302716: xmit: (net.net_dev_xmit) arg1="enp7s0"
>     sshd-session-1022    [000] b..2.   860.322905: xmit: (net.net_dev_xmit) arg1="enp7s0"
>     sshd-session-1022    [000] b..2.   860.342828: xmit: (net.net_dev_xmit) arg1="enp7s0"
>     sshd-session-1022    [000] b..2.   860.362268: xmit: (net.net_dev_xmit) arg1="enp7s0"
>     sshd-session-1022    [000] b..2.   860.382335: xmit: (net.net_dev_xmit) arg1="enp7s0"
>     sshd-session-1022    [000] b..2.   860.400856: xmit: (net.net_dev_xmit) arg1="enp7s0"
>     sshd-session-1022    [000] b..2.   860.419893: xmit: (net.net_dev_xmit) arg1="enp7s0"

Looks very nice!
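The two fetcharg forms quoted above compute the same address. `+0x118(+0x10($skbaddr))` loads the pointer stored at offset 0x10 and then reads the string at offset 0x118 of the object it points to, which is exactly what `->dev->name` expresses. The C sketch below models that equivalence with made-up mini structures (`mini_skb`, `mini_netdev`). Their layouts are hypothetical and do not match the real `sk_buff` or `net_device` offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified layouts; real kernel offsets differ. */
struct mini_netdev {
	long pad[3];
	char name[16];
};

struct mini_skb {
	void *pad;
	struct mini_netdev *dev;
};

/* Numeric form: read the pointer stored at ADDR + OFFS. */
static void *deref_ptr(const void *addr, size_t offs)
{
	void *val;

	memcpy(&val, (const char *)addr + offs, sizeof(val));
	return val;
}

/* Equivalent of +name_off(+dev_off(skb)):string */
static const char *name_by_offsets(const void *skb)
{
	const void *dev = deref_ptr(skb, offsetof(struct mini_skb, dev));

	return (const char *)dev + offsetof(struct mini_netdev, name);
}

/* Equivalent of (mini_skb*)skb->dev->name:string */
static const char *name_by_fields(const void *skb)
{
	return ((const struct mini_skb *)skb)->dev->name;
}
```

The typecast form lets BTF supply the two `offsetof()` values, so the user no longer has to look them up in gdb.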

> 
> The syntax is simply: ([STRUCT]*)(VAR)->FIELD[->FIELD..]

Is STRUCT optional? ([] means optional.) I guess not.

I think we could skip the '*' (or make it optional),
because this is not C-like typecasting: we don't support the "struct"
reserved word, and whitespace is not allowed inside a
fetcharg. In that case, (STRUCT)VAR->FIELD should work.
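Masami's suggestion amounts to accepting both `(STRUCT*)` and `(STRUCT)` as the cast prefix. Below is a minimal sketch of such a parser. It assumes no whitespace and no nested parentheses, as in v4. `parse_cast_prefix()` is a hypothetical helper, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given "(STRUCT*)REST" or "(STRUCT)REST", copy STRUCT
 * into @name and return a pointer to REST, or NULL on a malformed prefix.
 */
static const char *parse_cast_prefix(const char *arg, char *name, size_t size)
{
	const char *close;
	size_t len;

	if (arg[0] != '(')
		return NULL;
	close = strchr(arg, ')');
	if (!close)
		return NULL;
	len = close - (arg + 1);
	/* The trailing '*' is optional: drop it if present. */
	if (len > 0 && arg[len] == '*')
		len--;
	if (len == 0 || len >= size)
		return NULL;
	memcpy(name, arg + 1, len);
	name[len] = '\0';
	return close + 1;
}
```

Both spellings yield the same struct name, so the parser no longer needs the `NO_PTR_STRCT` error for a missing '*'.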

BTW, I'm also considering supporting a new cast syntax, which allows
us to dereference a pointer with "container_of". This pattern is
commonly used in the kernel.

We usually see this pattern:

struct foo {
	unsigned long		data;
	struct list_head	list;
};

void callback(struct list_head *foo_list)
{
	unsigned long data = container_of(foo_list, struct foo, list)->data;
	...
}

To access @data, a simple cast does not work, so we need a
new syntax:

	(STRUCT)(PTR,ASSIGN)->FIELD

So in the above case, we can do:

	data=(foo)(foo_list,list)->data

This naturally extends the type casting to support container_of()
equivalent casting.
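For reference, the container_of() step that the proposed `(foo)(foo_list,list)->data` form would replicate is plain pointer arithmetic: subtract the member's offset from the member's address. Below is a userspace sketch. It uses a local `container_of` macro and a minimal `list_head` as simplified stand-ins for the kernel definitions. The kernel macro also performs type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace versions of the kernel definitions. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct foo {
	unsigned long		data;
	struct list_head	list;
};

/* What a tracer would compute for (foo)(foo_list,list)->data */
static unsigned long callback_data(struct list_head *foo_list)
{
	return container_of(foo_list, struct foo, list)->data;
}
```

In probe terms, the extra information the new syntax carries is the member name (`list`), which BTF can turn into the offset to subtract.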

> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
> Changes since v3: https://patch.msgid.link/20260518095832.52659a3a@gandalf.local.home
> 
>  *** COMPLETE REWRITE FROM V3 ***
> 
> - Rewrote it to use typecasting instead of simply replacing BTF names with
>   offsets.
> 
>  Documentation/trace/kprobetrace.rst |   3 +
>  kernel/trace/trace_probe.c          | 110 ++++++++++++++++++++++++----
>  kernel/trace/trace_probe.h          |   3 +
>  3 files changed, 100 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
> index 3b6791c17e9b..450ac646fe4c 100644
> --- a/Documentation/trace/kprobetrace.rst
> +++ b/Documentation/trace/kprobetrace.rst
> @@ -54,6 +54,9 @@ Synopsis of kprobe_events
>    $retval	: Fetch return value.(\*2)
>    $comm		: Fetch current task comm.
>    +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
> +  (STRUCT*)FETCHARG->FIELD[->FIELD] : If BTF is supported, typecast FETCHARG to
> +                  a pointer to STRUCT and then dereference the pointer defined by
> +                  ->FIELD.
>    \IMM		: Store an immediate value to the argument.
>    NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
>    FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index e0d3a0da26af..b0829eb1cb52 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -464,6 +464,26 @@ static const char *fetch_type_from_btf_type(struct btf *btf,
>  	return NULL;
>  }
>  
> +static int query_btf_struct(const char *sname, struct traceprobe_parse_context *ctx)
> +{
> +	int id;
> +
> +	if (!ctx->btf) {
> +		struct btf *btf;

This needs an empty line here.

> +		id = bpf_find_btf_id(sname, BTF_KIND_STRUCT, &btf);
> +		if (id < 0)
> +			return -EINVAL;

Why not return id (it already holds the corresponding errno)?

> +		ctx->btf = btf;
> +	} else {
> +		id = btf_find_by_name_kind(ctx->btf, sname, BTF_KIND_STRUCT);
> +		if (id < 0)
> +			return -EINVAL;

Ditto.
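The review point is that a negative id from the lookup already carries a meaningful errno. The helper can pass it through instead of collapsing every failure to -EINVAL. The sketch below shows that shape. `find_struct_id()` is a hypothetical stand-in for `bpf_find_btf_id()` / `btf_find_by_name_kind()`, and `query_struct()` is illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical lookup: a non-negative id on success, -errno on failure. */
static int find_struct_id(const char *sname)
{
	if (!sname || !*sname)
		return -EINVAL;
	return strcmp(sname, "sk_buff") == 0 ? 7 : -ENOENT;
}

static int query_struct(const char *sname, int *out_id)
{
	int id = find_struct_id(sname);

	if (id < 0)
		return id;	/* propagate -ENOENT, -EINVAL, ... unchanged */
	*out_id = id;
	return 0;
}
```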

> +	}
> +
> +	ctx->last_struct = btf_type_by_id(ctx->btf, id);
> +	return 0;
> +}
> +
>  static int query_btf_context(struct traceprobe_parse_context *ctx)
>  {
>  	const struct btf_param *param;
> @@ -471,12 +491,12 @@ static int query_btf_context(struct traceprobe_parse_context *ctx)
>  	struct btf *btf;
>  	s32 nr;
>  
> -	if (ctx->btf)
> -		return 0;
> -
>  	if (!ctx->funcname)
>  		return -EINVAL;
>  
> +	if (ctx->btf)
> +		return 0;
> +

Could you tell me why this order was changed?
I think this type casting will allow us to skip checking funcname
because the btf context is already specified.

Ah, BTW, we may need to use a special struct btf* for type
casting. If the target function is in a module and the
cast type is defined in vmlinux, those are stored in
different places.

For example,

 p funcA (foo)$arg1->bar buz

In this case, buz needs to use the BTF that includes funcA.
Maybe we need to introduce ctx->func_btf, and reset ctx->btf to it
in traceprobe_parse_probe_arg_body() where parse_probe_arg()
is called, e.g.

	ctx->last_type = NULL;
+	if (ctx->btf)
+		btf_put(ctx->btf);
+	ctx->btf = ctx->func_btf;
	ret = parse_probe_arg(arg, parg->type, &code, &code[FETCH_INSN_MAX - 1],
			      ctx);
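The per-argument reset described above can be modelled with a reference-counted handle. Each argument starts from the BTF that describes the probed function, and any BTF picked up by a cast is released first. This sketch assumes that ctx->btf holds its own reference only when it differs from ctx->func_btf. That assumption is not stated in the thread. `struct fake_btf`, `fake_btf_put()` and `reset_arg_btf()` are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct btf and btf_put(). */
struct fake_btf {
	int refcnt;
};

static void fake_btf_put(struct fake_btf *btf)
{
	btf->refcnt--;
}

struct parse_ctx {
	struct fake_btf *func_btf;	/* BTF that contains the probed function */
	struct fake_btf *btf;		/* BTF currently used for lookups */
};

/* Called before parsing each argument, like the suggested hunk. */
static void reset_arg_btf(struct parse_ctx *ctx)
{
	/* Assumption: ctx->btf owns a reference only if it is not func_btf. */
	if (ctx->btf && ctx->btf != ctx->func_btf)
		fake_btf_put(ctx->btf);
	ctx->btf = ctx->func_btf;
}
```

The guard is a design choice of this sketch. Without it, putting `ctx->btf` when it already equals `func_btf` would drop a reference that the context still relies on.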


>  	type = btf_find_func_proto(ctx->funcname, &btf);
>  	if (!type)
>  		return -ENOENT;
> @@ -514,6 +534,7 @@ static void clear_btf_context(struct traceprobe_parse_context *ctx)
>  		ctx->proto = NULL;
>  		ctx->params = NULL;
>  		ctx->nr_params = 0;
> +		ctx->last_struct = NULL;
>  	}
>  }
>  
> @@ -554,22 +575,28 @@ static int parse_btf_field(char *fieldname, const struct btf_type *type,
>  	struct fetch_insn *code = *pcode;
>  	const struct btf_member *field;
>  	u32 bitoffs, anon_offs;
> +	bool is_struct = ctx->flags & TPARG_FL_STRUCT;
>  	char *next;
>  	int is_ptr;
>  	s32 tid;
>  
>  	do {
> -		/* Outer loop for solving arrow operator ('->') */
> -		if (BTF_INFO_KIND(type->info) != BTF_KIND_PTR) {
> -			trace_probe_log_err(ctx->offset, NO_PTR_STRCT);
> -			return -EINVAL;
> -		}
> -		/* Convert a struct pointer type to a struct type */
> -		type = btf_type_skip_modifiers(ctx->btf, type->type, &tid);
> -		if (!type) {
> -			trace_probe_log_err(ctx->offset, BAD_BTF_TID);
> -			return -EINVAL;
> +		if (!is_struct) {
> +			/* Outer loop for solving arrow operator ('->') */
> +			if (BTF_INFO_KIND(type->info) != BTF_KIND_PTR) {
> +				trace_probe_log_err(ctx->offset, NO_PTR_STRCT);
> +				return -EINVAL;
> +			}
> +
> +			/* Convert a struct pointer type to a struct type */
> +			type = btf_type_skip_modifiers(ctx->btf, type->type, &tid);
> +			if (!type) {
> +				trace_probe_log_err(ctx->offset, BAD_BTF_TID);
> +				return -EINVAL;
> +			}
>  		}
> +		/* Only the first type can skip being a pointer */
> +		is_struct = false;
>  
>  		bitoffs = 0;
>  		do {
> @@ -635,12 +662,12 @@ static int parse_btf_arg(char *varname,
>  {
>  	struct fetch_insn *code = *pcode;
>  	const struct btf_param *params;
> -	const struct btf_type *type;
> +	const struct btf_type *type = NULL;
>  	char *field = NULL;
>  	int i, is_ptr, ret;
>  	u32 tid;
>  
> -	if (WARN_ON_ONCE(!ctx->funcname))
> +	if (WARN_ON_ONCE(!ctx->funcname && !(ctx->flags & TPARG_FL_STRUCT)))
>  		return -EINVAL;
>  
>  	is_ptr = split_next_field(varname, &field, ctx);
> @@ -704,11 +731,18 @@ static int parse_btf_arg(char *varname,
>  			goto found;
>  		}
>  	}
> +
> +	if (ctx->flags & TPARG_FL_STRUCT) {
> +		type = ctx->last_struct;
> +		goto found;

I'd rather jump to a type_found: label than check
!type. (Or save tid instead of type.)
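The suggested control flow jumps past the `btf_type_skip_modifiers()` step when the cast has already resolved the type, instead of testing `!type` after `found:`. The reduced sketch below models only the branching. `resolve_tid()` and `pick_type()` are hypothetical, and a "type" is just a positive int.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical model: a "type" is a positive int, 0 means unresolved. */
static int resolve_tid(int tid)
{
	return tid > 0 ? tid * 10 : 0;
}

static int pick_type(const char *name, const char *const *params, int nr,
		     int struct_type)
{
	int i, tid = 0, type;

	for (i = 0; i < nr; i++) {
		if (strcmp(name, params[i]) == 0) {
			tid = i + 1;
			goto found;
		}
	}
	if (struct_type) {
		type = struct_type;	/* already resolved by the typecast */
		goto type_found;
	}
	return -ENOENT;

found:
	type = resolve_tid(tid);
type_found:
	if (!type)
		return -EINVAL;
	return type;
}
```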

> +	}
> +
>  	trace_probe_log_err(ctx->offset, NO_BTFARG);
>  	return -ENOENT;
>  
>  found:
> -	type = btf_type_skip_modifiers(ctx->btf, tid, &tid);
> +	if (!type)
> +		type = btf_type_skip_modifiers(ctx->btf, tid, &tid);

type_found:

>  	if (!type) {
>  		trace_probe_log_err(ctx->offset, BAD_BTF_TID);
>  		return -EINVAL;
> @@ -952,6 +986,12 @@ static int parse_probe_vars(char *orig_arg, const struct fetch_type *t,
>  	int ret = 0;
>  	int len;
>  
> +	if (ctx->flags & TPARG_FL_STRUCT) {
> +		ret = parse_btf_arg(orig_arg, pcode, end, ctx);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
>  	if (ctx->flags & TPARG_FL_TEVENT) {
>  		if (code->data)
>  			return -EFAULT;
> @@ -1231,6 +1271,43 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
>  				code->op = FETCH_OP_IMM;
>  		}
>  		break;
> +	case '(':
> +		tmp = strrchr(arg, ')');

OK, at this step we don't support nested casts etc., so this works.
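Using `strrchr()` to find the closing parenthesis is only safe while a single cast is allowed. With the container_of-style form proposed earlier, `(foo)(foo_list,list)->data`, `strrchr()` lands on the second `)`, so the parser would need the parenthesis that matches the opening one. Below is a minimal depth-counting sketch. `match_paren()` is a hypothetical helper, not from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the ')' that closes the '(' at @open, or NULL if the
 * parentheses are unbalanced.
 */
static const char *match_paren(const char *open)
{
	int depth = 0;
	const char *p;

	for (p = open; *p; p++) {
		if (*p == '(')
			depth++;
		else if (*p == ')' && --depth == 0)
			return p;
	}
	return NULL;
}
```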

> +		if (!tmp) {
> +			trace_probe_log_err(ctx->offset + strlen(arg),
> +					    DEREF_OPEN_BRACE);
> +			return -EINVAL;
> +		}
> +
> +		tmp--;
> +		if (*tmp != '*') {
> +			trace_probe_log_err(ctx->offset + (tmp - arg),
> +					    NO_PTR_STRCT);
> +			return -EINVAL;
> +		}

So I think this can be optional, not an error.

> +		*tmp = '\0';
> +		ret = query_btf_struct(arg + 1, ctx);
> +		*tmp = '*';
> +
> +		if (ret < 0) {
> +			trace_probe_log_err(ctx->offset + 1, NO_PTR_STRCT);
> +			return -EINVAL;
> +		}
> +
> +		ctx->flags |= TPARG_FL_STRUCT;
> +		tmp += 2;
> +
> +		if (*tmp != '$') {
> +			trace_probe_log_err(ctx->offset + (tmp - arg),
> +					    BAD_VAR);
> +			return -EINVAL;
> +		}

Ok, this limitation will be removed afterwards.

Thanks,

> +
> +		ctx->offset += tmp - arg;
> +		ret = parse_probe_vars(tmp, type, pcode, end, ctx);
> +		ctx->flags &= ~TPARG_FL_STRUCT;
> +		ctx->last_struct = NULL;
> +		break;
>  	default:
>  		if (isalpha(arg[0]) || arg[0] == '_') {	/* BTF variable */
>  			if (!tparg_is_function_entry(ctx->flags) &&
> @@ -1504,6 +1581,7 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
>  	code[FETCH_INSN_MAX - 1].op = FETCH_OP_END;
>  
>  	ctx->last_type = NULL;
> +	ctx->last_struct = NULL;
>  	ret = parse_probe_arg(arg, parg->type, &code, &code[FETCH_INSN_MAX - 1],
>  			      ctx);
>  	if (ret < 0)
> diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> index 262d8707a3df..88ab9f6da591 100644
> --- a/kernel/trace/trace_probe.h
> +++ b/kernel/trace/trace_probe.h
> @@ -394,6 +394,7 @@ static inline int traceprobe_get_entry_data_size(struct trace_probe *tp)
>   * TPARG_FL_KERNEL and TPARG_FL_USER are also mutually exclusive.
>   * TPARG_FL_FPROBE and TPARG_FL_TPOINT are optional but it should be with
>   * TPARG_FL_KERNEL.
> + * TPARG_FL_STRUCT is set if an argument was typecast to a structure.
>   */
>  #define TPARG_FL_RETURN BIT(0)
>  #define TPARG_FL_KERNEL BIT(1)
> @@ -402,6 +403,7 @@ static inline int traceprobe_get_entry_data_size(struct trace_probe *tp)
>  #define TPARG_FL_USER   BIT(4)
>  #define TPARG_FL_FPROBE BIT(5)
>  #define TPARG_FL_TPOINT BIT(6)
> +#define TPARG_FL_STRUCT BIT(7)
>  #define TPARG_FL_LOC_MASK	GENMASK(4, 0)
>  
>  static inline bool tparg_is_function_entry(unsigned int flags)
> @@ -423,6 +425,7 @@ struct traceprobe_parse_context {
>  	s32 nr_params;			/* The number of the parameters */
>  	struct btf *btf;		/* The BTF to be used */
>  	const struct btf_type *last_type;	/* Saved type */
> +	const struct btf_type *last_struct;	/* Saved structure */
>  	u32 last_bitoffs;		/* Saved bitoffs */
>  	u32 last_bitsize;		/* Saved bitsize */
>  	struct trace_probe *tp;
> -- 
> 2.53.0
> 
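The new `TPARG_FL_STRUCT` in the trace_probe.h hunk takes bit 7. That lies outside `TPARG_FL_LOC_MASK` (`GENMASK(4, 0)`, bits 0 to 4), so code that masks the location bits never sees it. Below is a compile-time check of that arithmetic. The local `BIT()` and `GENMASK()` definitions are simplified userspace stand-ins for the kernel macros.

```c
#include <assert.h>

/* Simplified userspace versions of the kernel's BIT() and GENMASK(). */
#define BIT(n)		(1UL << (n))
#define GENMASK(h, l)	(((~0UL) >> (8 * sizeof(unsigned long) - 1 - (h))) & \
			 (~0UL << (l)))

#define TPARG_FL_TPOINT		BIT(6)
#define TPARG_FL_STRUCT		BIT(7)
#define TPARG_FL_LOC_MASK	GENMASK(4, 0)

/* The new flag must not collide with the location bits or TPOINT. */
_Static_assert((TPARG_FL_STRUCT & TPARG_FL_LOC_MASK) == 0, "overlaps LOC_MASK");
_Static_assert((TPARG_FL_STRUCT & TPARG_FL_TPOINT) == 0, "overlaps TPOINT");
```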


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH v4] tracing/probes: Allow use of BTF names to dereference pointers
From: kernel test robot @ 2026-05-19 10:10 UTC (permalink / raw)
  To: Steven Rostedt, LKML, Linux Trace Kernel, bpf
  Cc: llvm, oe-kbuild-all, Masami Hiramatsu, Mathieu Desnoyers,
	Mark Rutland, Peter Zijlstra, Namhyung Kim, Takaya Saeki,
	Douglas Raillard, Tom Zanussi, Andrew Morton,
	Linux Memory Management List, Thomas Gleixner, Ian Rogers,
	Jiri Olsa, Subject:[PATCH v2]
In-Reply-To: <20260518232312.0c78f055@gandalf.local.home>

Hi Steven,

kernel test robot noticed the following build errors:

[auto build test ERROR on trace/for-next]
[also build test ERROR on linus/master v7.1-rc4 next-20260518]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Steven-Rostedt/tracing-probes-Allow-use-of-BTF-names-to-dereference-pointers/20260519-121930
base:   https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace for-next
patch link:    https://lore.kernel.org/r/20260518232312.0c78f055%40gandalf.local.home
patch subject: [PATCH v4] tracing/probes: Allow use of BTF names to dereference pointers
config: sparc64-defconfig (https://download.01.org/0day-ci/archive/20260519/202605191828.Y3E73pH1-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260519/202605191828.Y3E73pH1-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605191828.Y3E73pH1-lkp@intel.com/

All errors (new ones prefixed by >>):

>> kernel/trace/trace_probe.c:1289:9: error: call to undeclared function 'query_btf_struct'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    1289 |                 ret = query_btf_struct(arg + 1, ctx);
         |                       ^
   1 error generated.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


