Linux Trace Kernel

Linux Trace Kernel
 help / color / mirror / Atom feed

* Re: [PATCH v7 07/42] KVM: guest_memfd: Only prepare folios for private pages
From: Michael Roth @ 2026-06-03 13:54 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Suzuki K Poulose, aik, andrew.jones, binbin.wu, brauner,
	chao.p.peng, david, ira.weiny, jmattson, jthoughton, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, tabba, willy, wyihan, yan.y.zhao, forkloop,
	pratyush, aneesh.kumar, liam, Paolo Bonzini, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <CAEvNRgGzOnA34WyOHtkOx5MZDZhOHaXAe+nD75AiJsZ-PsTSFQ@mail.gmail.com>

On Tue, Jun 02, 2026 at 01:46:09PM -0700, Ackerley Tng wrote:
> Suzuki K Poulose <suzuki.poulose@arm.com> writes:
> 
> > On 23/05/2026 01:17, Ackerley Tng via B4 Relay wrote:
> >> From: Ackerley Tng <ackerleytng@google.com>
> >>
> >> All-shared guest_memfd used to be only supported for non-CoCo VMs where
> >> preparation doesn't apply. INIT_SHARED is about to be supported for
> >> non-CoCo VMs in a later patch in this series.
> >
> > nit: s/non-CoCo/CoCo ?
> >
> 
> Yes, thanks!
> 
> >>
> >> In addition, KVM_SET_MEMORY_ATTRIBUTES2 is about to be supported in
> >> guest_memfd in a later patch in this series.
> >>
> >> This means that the kvm fault handler may now call kvm_gmem_get_pfn() on a
> >> shared folio for a CoCo VM where preparation applies.
> >>
> >> Add a check to make sure that preparation is only performed for private
> >> folios.
> >>
> >> Preparation will be undone on freeing (see kvm_gmem_free_folio()) and on
> >> conversion to shared.
> >>
> >> Signed-off-by: Michael Roth <michael.roth@amd.com>
> >
> > nit: Missing Co-Developed-by: ?
> >
> 
> IIRC this should have been
> 
> Suggested-by: Michael Roth <michael.roth@amd.com>
> 
> IIRC Michael suggested this on one of the guest_memfd calls, Michael
> please let me know if you remember otherwise!

That rings a bell. Feel free to add, or just drop the stray SoB, either
way.

-Mike

> 
> >>
> >> [...snip...]
> >>

^ permalink raw reply

* Re: [RFC PATCH 07/10] rcu: Wake NOCB rcuog kthreads on expedited grace period completion
From: Frederic Weisbecker @ 2026-06-03 14:07 UTC (permalink / raw)
  To: Puranjay Mohan
  Cc: rcu, linux-kernel, linux-trace-kernel, Paul E. McKenney,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
In-Reply-To: <20260417231203.785172-8-puranjay@kernel.org>

Le Fri, Apr 17, 2026 at 04:11:55PM -0700, Puranjay Mohan a écrit :
> When an expedited grace period completes, rcu_exp_wait_wake() wakes
> waiters on rnp->exp_wq[] but does not notify NOCB rcuog kthreads.  These
> kthreads may be sleeping waiting for a grace period to complete.
> Without this wakeup, callbacks on offloaded CPUs that could benefit from
> the expedited GP must wait until the rcuog kthread wakes for some other
> reason (e.g., next normal GP completion or a timer).
> 
> Add rcu_exp_wake_nocb() which wakes rcuog kthreads for leaf-node CPUs,
> deduplicating via rdp->nocb_gp_rdp since multiple CPUs share one rcuog
> kthread.  Uses for_each_leaf_node_possible_cpu() because offline CPUs
> can have pending callbacks. The function is defined in tree_nocb.h with
> an empty stub for CONFIG_RCU_NOCB_CPU=n builds.
> 
> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
>  kernel/rcu/tree.h      |  1 +
>  kernel/rcu/tree_exp.h  |  1 +
>  kernel/rcu/tree_nocb.h | 29 +++++++++++++++++++++++++++++
>  3 files changed, 31 insertions(+)
> 
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 7dfc57e9adb1..40f778453591 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -500,6 +500,7 @@ static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
>  static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
>  static void rcu_init_one_nocb(struct rcu_node *rnp);
>  static bool wake_nocb_gp(struct rcu_data *rdp);
> +static void rcu_exp_wake_nocb(struct rcu_node *rnp);
>  static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
>  				  unsigned long j, bool lazy);
>  static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head,
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 82cada459e5d..0df1009c6e97 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -708,6 +708,7 @@ static void rcu_exp_wait_wake(unsigned long s)
>  		}
>  		smp_mb(); /* All above changes before wakeup. */
>  		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
> +		rcu_exp_wake_nocb(rnp);
>  	}
>  	trace_rcu_exp_grace_period(rcu_state.name, s, TPS("endwake"));
>  	mutex_unlock(&rcu_state.exp_wake_mutex);
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 7462cd5e2507..f37ee56d62a9 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -190,6 +190,31 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
>  	init_swait_queue_head(&rnp->nocb_gp_wq[1]);
>  }
>  
> +/*
> + * Wake NOCB rcuog kthreads for leaf-node CPUs so that they can advance
> + * callbacks that were waiting for the just-completed expedited GP.
> + * Deduplicate via nocb_gp_rdp since multiple CPUs share one rcuog
> + * kthread.  Use for_each_leaf_node_possible_cpu() because offline CPUs
> + * may have pending callbacks.
> + */
> +static void rcu_exp_wake_nocb(struct rcu_node *rnp)

Please consolidate the naming to match rcu_nocb_gp_cleanup().

> +{
> +	struct rcu_data *last_rdp_gp = NULL;
> +	int cpu;
> +
> +	if (!rcu_is_leaf_node(rnp))
> +		return;
> +
> +	for_each_leaf_node_possible_cpu(rnp, cpu) {
> +		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> +
> +		if (rdp->nocb_gp_rdp == last_rdp_gp)
> +			continue;
> +		last_rdp_gp = rdp->nocb_gp_rdp;
> +		wake_nocb_gp(rdp);
> +	}

There are two waitqueues for rcuog wake-ups:

1) rdp->rdp_gp->nocb_gp_wq: to wait for callbacks on the queue
2) rnp->nocb_gp_wq: to wait for grace periods

So you're waking up the wrong one.

Something like the below? (untested)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0e43866dc4cd..436e12e313c2 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2193,8 +2193,13 @@ static noinline void rcu_gp_cleanup(void)
 			dump_blkd_tasks(rnp, 10);
 		WARN_ON_ONCE(rnp->qsmask);
 		WRITE_ONCE(rnp->gp_seq, new_gp_seq);
-		if (!rnp->parent)
-			smp_mb(); // Order against failing poll_state_synchronize_rcu_full().
+		if (!rnp->parent) {
+			/*
+			 * Order against failing poll_state_synchronize_rcu_full().
+			 * and also rcu_nocb_cleanup_wake() -> swait_active()
+			 */
+			smp_mb();
+		}
 		rdp = this_cpu_ptr(&rcu_data);
 		if (rnp == rdp->mynode)
 			needgp = __note_gp_changes(rnp, rdp) || needgp;
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 40f778453591..8f272cb4e4f3 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -253,7 +253,7 @@ struct rcu_data {
 	u8 nocb_gp_sleep;		/* Is the nocb GP thread asleep? */
 	u8 nocb_gp_bypass;		/* Found a bypass on last scan? */
 	u8 nocb_gp_gp;			/* GP to wait for on last scan? */
-	unsigned long nocb_gp_seq;	/*  If so, ->gp_seq to wait for. */
+	struct rcu_gp_oldstate nocb_gp_seq; /*  If so, ->gp_seq to wait for. */
 	unsigned long nocb_gp_loops;	/* # passes through wait code. */
 	struct swait_queue_head nocb_gp_wq; /* For nocb kthreads to sleep on. */
 	bool nocb_cb_sleep;		/* Is the nocb CB thread asleep? */
@@ -498,9 +498,9 @@ static bool rcu_preempt_need_deferred_qs(struct task_struct *t);
 static void zero_cpu_stall_ticks(struct rcu_data *rdp);
 static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp);
 static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq);
+static void rcu_nocb_exp_cleanup(struct rcu_node *rnp);
 static void rcu_init_one_nocb(struct rcu_node *rnp);
 static bool wake_nocb_gp(struct rcu_data *rdp);
-static void rcu_exp_wake_nocb(struct rcu_node *rnp);
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
 				  unsigned long j, bool lazy);
 static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head,
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 0df1009c6e97..43c167a8a145 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -708,7 +708,8 @@ static void rcu_exp_wait_wake(unsigned long s)
 		}
 		smp_mb(); /* All above changes before wakeup. */
 		wake_up_all(&rnp->exp_wq[rcu_seq_ctr(s) & 0x3]);
-		rcu_exp_wake_nocb(rnp);
+		if (rcu_is_leaf_node(rnp))
+			rcu_nocb_exp_cleanup(rnp);
 	}
 	trace_rcu_exp_grace_period(rcu_state.name, s, TPS("endwake"));
 	mutex_unlock(&rcu_state.exp_wake_mutex);
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f37ee56d62a9..60d4182b9509 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -170,13 +170,59 @@ static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp)
 		lockdep_assert_held(&rdp->nocb_lock);
 }
 
+static void rcu_nocb_cleanup_wake(struct swait_queue_head *sq)
+{
+	if (swait_active(sq))
+		swake_up_all(sq);
+}
+
 /*
  * Wake up any no-CBs CPUs' kthreads that were waiting on the just-ended
  * grace period.
  */
 static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq)
 {
-	swake_up_all(sq);
+	/*
+	 * swait active() can be checked first because:
+	 *
+	 * rcu_gp_cleanup()                              nocb_gp_wait()
+	 * ---------------                               --------------
+	 * WRITE_ONCE(root->gp_seq, new_gp_seq);         swait_event_interruptible_exclusive(sq)
+	 * smp_mb()                                         prepare_to_swait()
+	 * if swait_active(sq)                                 list_add_tail(&wait->task_list, &q->task_list);
+	 *    swake_up_all(sq)                                 set_current_state()
+	 *                                                       smp_mb()
+	 *                                                  if (poll_state_synchronize_rcu_full())
+	 *                                                     if (rcu_seq_done_exact(&root->gp_seq, rgosp->rgos_norm))
+	 *                                                        ...
+	 */
+	rcu_nocb_cleanup_wake(sq);
+}
+
+/*
+ * Wake NOCB rcuog kthreads for leaf-node CPUs so that they can advance
+ * callbacks that were waiting for the just-completed expedited GP.
+ * Wake-up waitqueues for both even and odd GP numbers because exp and
+ * normal sequences don't match.
+ */
+static void rcu_nocb_exp_cleanup(struct rcu_node *rnp)
+{
+/*
+ * swait active() can be checked first because:
+ *
+ * rcu_exp_wait_wake()                           nocb_gp_wait()
+ * ---------------                               --------------
+ * rcu_seq_end(&rcu_state.expedited_sequence);   swait_event_interruptible_exclusive(sq)
+ * smp_mb()                                         prepare_to_swait()
+ * if swait_active(sq)                                 list_add_tail(&wait->task_list, &q->task_list);
+ *    swake_up_all(sq)                                 set_current_state()
+ *                                                        smp_mb()
+ *                                                  if (poll_state_synchronize_rcu_full())
+ *                                                     if (rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp))
+ *                                                        ...
+ */
+	rcu_nocb_cleanup_wake(&rnp->nocb_gp_wq[0]);
+	rcu_nocb_cleanup_wake(&rnp->nocb_gp_wq[1]);
 }
 
 static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp)
@@ -190,31 +236,6 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
 	init_swait_queue_head(&rnp->nocb_gp_wq[1]);
 }
 
-/*
- * Wake NOCB rcuog kthreads for leaf-node CPUs so that they can advance
- * callbacks that were waiting for the just-completed expedited GP.
- * Deduplicate via nocb_gp_rdp since multiple CPUs share one rcuog
- * kthread.  Use for_each_leaf_node_possible_cpu() because offline CPUs
- * may have pending callbacks.
- */
-static void rcu_exp_wake_nocb(struct rcu_node *rnp)
-{
-	struct rcu_data *last_rdp_gp = NULL;
-	int cpu;
-
-	if (!rcu_is_leaf_node(rnp))
-		return;
-
-	for_each_leaf_node_possible_cpu(rnp, cpu) {
-		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
-
-		if (rdp->nocb_gp_rdp == last_rdp_gp)
-			continue;
-		last_rdp_gp = rdp->nocb_gp_rdp;
-		wake_nocb_gp(rdp);
-	}
-}
-
 /* Clear any pending deferred wakeup timer (nocb_gp_lock must be held). */
 static void nocb_defer_wakeup_cancel(struct rcu_data *rdp_gp)
 {
@@ -684,7 +705,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 {
 	bool bypass = false;
 	int __maybe_unused cpu = my_rdp->cpu;
-	struct rcu_gp_oldstate cur_gp_seq_full;
+	struct rcu_gp_oldstate wait_gp_seq = {0}; //remove uninitialized warning
 	unsigned long flags;
 	bool gotcbs = false;
 	unsigned long j = jiffies;
@@ -694,7 +715,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	bool needwake_gp;
 	struct rcu_data *rdp, *rdp_toggling = NULL;
 	struct rcu_node *rnp;
-	unsigned long wait_gp_seq = 0; // Suppress "use uninitialized" warning.
 	bool wasempty = false;
 
 	/*
@@ -718,6 +738,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	 * won't be ignored for long.
 	 */
 	list_for_each_entry(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp) {
+		struct rcu_gp_oldstate cur_gp_seq;
 		long bypass_ncbs;
 		bool flush_bypass = false;
 		long lazy_ncbs;
@@ -755,8 +776,8 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		needwake_gp = false;
 		if (!rcu_segcblist_restempty(&rdp->cblist,
 					     RCU_NEXT_READY_TAIL) ||
-		    (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq_full) &&
-		     poll_state_synchronize_rcu_full(&cur_gp_seq_full))) {
+		    (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
+		     poll_state_synchronize_rcu_full(&cur_gp_seq))) {
 			raw_spin_lock_rcu_node(rnp); /* irqs disabled. */
 			needwake_gp = rcu_advance_cbs(rnp, rdp);
 			wasempty = rcu_segcblist_restempty(&rdp->cblist,
@@ -777,11 +798,15 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		 * numbers from rcu_accelerate_cbs() inside
 		 * rcu_advance_cbs() and will be handled on the next pass.
 		 */
-		if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq_full) &&
-		    !poll_state_synchronize_rcu_full(&cur_gp_seq_full)) {
+		if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
+		    !poll_state_synchronize_rcu_full(&cur_gp_seq)) {
+			if (!needwait_gp ||
+			    ULONG_CMP_LT(cur_gp_seq.rgos_norm, wait_gp_seq.rgos_norm))
+				wait_gp_seq.rgos_norm = cur_gp_seq.rgos_norm;
 			if (!needwait_gp ||
-			    ULONG_CMP_LT(cur_gp_seq_full.rgos_norm, wait_gp_seq))
-				wait_gp_seq = cur_gp_seq_full.rgos_norm;
+			    ULONG_CMP_LT(cur_gp_seq.rgos_exp, wait_gp_seq.rgos_exp))
+				wait_gp_seq.rgos_exp = cur_gp_seq.rgos_exp;
+
 			needwait_gp = true;
 			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
 					    TPS("NeedWaitGP"));
@@ -803,7 +828,8 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 
 	my_rdp->nocb_gp_bypass = bypass;
 	my_rdp->nocb_gp_gp = needwait_gp;
-	my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0;
+	if (needwait_gp)
+		my_rdp->nocb_gp_seq = wait_gp_seq;
 
 	// At least one child with non-empty ->nocb_bypass, so set
 	// timer in order to avoid stranding its callbacks.
@@ -838,12 +864,12 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		nocb_gp_sleep(my_rdp, cpu);
 	} else {
 		rnp = my_rdp->mynode;
-		trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq, TPS("StartWait"));
+		trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq.rgos_norm, TPS("StartWait"));
 		swait_event_interruptible_exclusive(
-			rnp->nocb_gp_wq[rcu_seq_ctr(wait_gp_seq) & 0x1],
-			rcu_seq_done(&rnp->gp_seq, wait_gp_seq) ||
+			rnp->nocb_gp_wq[rcu_seq_ctr(wait_gp_seq.rgos_norm) & 0x1],
+			poll_state_synchronize_rcu_full(&wait_gp_seq) ||
 			!READ_ONCE(my_rdp->nocb_gp_sleep));
-		trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq, TPS("EndWait"));
+		trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq.rgos_norm, TPS("EndWait"));
 	}
 
 	if (!rcu_nocb_poll) {
@@ -877,7 +903,8 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		swake_up_one(&rdp_toggling->nocb_state_wq);
 	}
 
-	my_rdp->nocb_gp_seq = -1;
+	my_rdp->nocb_gp_seq.rgos_norm = -1;
+	my_rdp->nocb_gp_seq.rgos_exp = -1;
 	WARN_ON(signal_pending(current));
 }
 
@@ -1561,7 +1588,7 @@ static void show_rcu_nocb_gp_state(struct rcu_data *rdp)
 {
 	struct rcu_node *rnp = rdp->mynode;
 
-	pr_info("nocb GP %d %c%c%c%c%c %c[%c%c] %c%c:%ld rnp %d:%d %lu %c CPU %d%s\n",
+	pr_info("nocb GP %d %c%c%c%c%c %c[%c%c] %c%c:%ld/%ld rnp %d:%d %lu %c CPU %d%s\n",
 		rdp->cpu,
 		"kK"[!!rdp->nocb_gp_kthread],
 		"lL"[raw_spin_is_locked(&rdp->nocb_gp_lock)],
@@ -1573,7 +1600,8 @@ static void show_rcu_nocb_gp_state(struct rcu_data *rdp)
 		".W"[swait_active(&rnp->nocb_gp_wq[1])],
 		".B"[!!rdp->nocb_gp_bypass],
 		".G"[!!rdp->nocb_gp_gp],
-		(long)rdp->nocb_gp_seq,
+		(long)rdp->nocb_gp_seq.rgos_norm,
+		(long)rdp->nocb_gp_seq.rgos_exp,
 		rnp->grplo, rnp->grphi, READ_ONCE(rdp->nocb_gp_loops),
 		rdp->nocb_gp_kthread ? task_state_to_char(rdp->nocb_gp_kthread) : '.',
 		rdp->nocb_gp_kthread ? (int)task_cpu(rdp->nocb_gp_kthread) : -1,
@@ -1684,16 +1712,16 @@ static void rcu_nocb_gp_cleanup(struct swait_queue_head *sq)
 {
 }
 
-static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp)
+static void rcu_nocb_exp_cleanup(struct rcu_node *rnp)
 {
-	return NULL;
 }
 
-static void rcu_init_one_nocb(struct rcu_node *rnp)
+static struct swait_queue_head *rcu_nocb_gp_get(struct rcu_node *rnp)
 {
+	return NULL;
 }
 
-static void rcu_exp_wake_nocb(struct rcu_node *rnp)
+static void rcu_init_one_nocb(struct rcu_node *rnp)
 {
 }
 

^ permalink raw reply related

* Re: [PATCH v6 5/7] locking: Add contended_release tracepoint to qspinlock
From: Dmitry Ilvokhin @ 2026-06-03 14:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	Thomas Bogendoerfer, Juergen Gross, Ajay Kaher, Alexey Makhalov,
	Broadcom internal kernel review list, Thomas Gleixner,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Arnd Bergmann,
	Dennis Zhou, Tejun Heo, Christoph Lameter, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, linux-kernel, linux-mips,
	virtualization, linux-arch, linux-mm, linux-trace-kernel,
	kernel-team, Paul E. McKenney
In-Reply-To: <20260603120811.GW3493090@noisy.programming.kicks-ass.net>

On Wed, Jun 03, 2026 at 02:08:11PM +0200, Peter Zijlstra wrote:
> On Thu, May 14, 2026 at 12:34:55PM +0000, Dmitry Ilvokhin wrote:
> 
> > Baseline, in best case scenario of least number of executed
> > instructions.
> > 
> >     3e0:  endbr64                          ; 4 bytes (always executed)
> >     3e4:  movb $0x0,(%rdi)                 ; 3 bytes (unlock,
> >                                            ; always executed)
> >     3e7:  decl %gs:__preempt_count         ; 7 bytes (always executed)
> >     3ee:  je   3f5                         ; 2 bytes (always executed)
> >     3f0:  jmp  __x86_return_thunk          ; 5 bytes (executed if above
> >                                            ; je is not taken)
> >                                            ; rest is not executed
> >     3f5:  call __SCT__preempt_schedule     ; 5 bytes
> >     3fa:  jmp  __x86_return_thunk          ; 5 bytes
> > 
> > Tracepoint (again same case of least number of executed instructions).
> > 
> >     bc0:  endbr64                          ; 4 bytes (always executed)
> >     bc4:  xchg %ax,%ax                     ; 2 bytes (always executed, this is an
> >                                            ; only addition on the execution path).
> >     bc6:  movb $0x0,(%rdi)                 ; 3 bytes (unlock, always executed)
> >     bc9:  decl %gs:__preempt_count         ; 7 bytes (always executed)
> >     bd0:  je   bde                         ; 2 bytes (always executed)
> >     bd2:  jmp  __x86_return_thunk          ; 5 bytes (executed if above
> >                                            ; je is not taken)
> >                                            ; rest is not executed
> >     bd7:  call queued_spin_release_traced  ; 5 bytes
> >     bdc:  jmp  bc9                         ; 2 bytes
> >     bde:  call __SCT__preempt_schedule     ; 5 bytes
> >     be3:  jmp  __x86_return_thunk          ; 5 bytes
> > 
> 
> So I've been playing with this a bit, and it is all really sad.
> 
> Now, since pretty much everybody+dog will have PARAVIRT_SPINLOCK=y, the
> 'best' solution would be changing that paravirt call with a
> static_call(), that actually shrinks the code by 1 byte.
> 
> And then this tracepoint nonsense can simply use a different unlock
> function, just like paravirt.
> 
> 0000 00000000000001d0 <_raw_spin_unlock>:
> 0000  1d0:	f3 0f 1e fa          	endbr64
> 0004  1d4:	ff 15 00 00 00 00    	call   *0x0(%rip)        # 1da <_raw_spin_unlock+0xa>	1d6: R_X86_64_PC32	pv_ops_lock+0x4
> 000a  1da:	65 ff 0d 00 00 00 00 	decl   %gs:0x0(%rip)        # 1e1 <_raw_spin_unlock+0x11>	1dd: R_X86_64_PC32	__preempt_count-0x4
> 0011  1e1:	74 06                	je     1e9 <_raw_spin_unlock+0x19>
> 0013  1e3:	2e e9 00 00 00 00    	cs jmp 1e9 <_raw_spin_unlock+0x19>	1e5: R_X86_64_PLT32	__x86_return_thunk-0x4
> 0019  1e9:	e8 00 00 00 00       	call   1ee <_raw_spin_unlock+0x1e>	1ea: R_X86_64_PLT32	__SCT__preempt_schedule-0x4
> 001e  1ee:	2e e9 00 00 00 00    	cs jmp 1f4 <_raw_spin_unlock+0x24>	1f0: R_X86_64_PLT32	__x86_return_thunk-0x4
> 
> 
> 0000 00000000000001d0 <_raw_spin_unlock>:
> 0000  1d0:	f3 0f 1e fa          	endbr64
> 0004  1d4:	e8 00 00 00 00       	call   1d9 <_raw_spin_unlock+0x9>	1d5: R_X86_64_PLT32	__SCT__queued_spin_unlock-0x4
> 0009  1d9:	65 ff 0d 00 00 00 00 	decl   %gs:0x0(%rip)        # 1e0 <_raw_spin_unlock+0x10>	1dc: R_X86_64_PC32	__preempt_count-0x4
> 0010  1e0:	74 06                	je     1e8 <_raw_spin_unlock+0x18>
> 0012  1e2:	2e e9 00 00 00 00    	cs jmp 1e8 <_raw_spin_unlock+0x18>	1e4: R_X86_64_PLT32	__x86_return_thunk-0x4
> 0018  1e8:	e8 00 00 00 00       	call   1ed <_raw_spin_unlock+0x1d>	1e9: R_X86_64_PLT32	__SCT__preempt_schedule-0x4
> 001d  1ed:	2e e9 00 00 00 00    	cs jmp 1f3 <_raw_spin_unlock+0x23>	1ef: R_X86_64_PLT32	__x86_return_thunk-0x4
> 
> 
> Something a little like so, which is completely untested, except to
> build kernel/locking/spinlock.o (with clang-23).

Thanks a lot for taking a look, Peter.

I like the static_call idea. It's truly zero cost on x86 (and, as you
note, even a byte smaller). The one caveat is that it relies on
HAVE_STATIC_CALL_INLINE to stay free. 

So my plan would be: static_call where HAVE_STATIC_CALL_INLINE is
available (x86), and a static branch fallback elsewhere, gated behind a
default-off config so it imposes nothing on arches/kernels that don't
opt in. I'm mostly interested in x86, but would like arm64 to work too,
which would use the fallback.

Concretely:

1. Split the sleepable-lock patches out and send them separately.
   They're independent of the static call work and look far less
   controversial.

2. Convert the paravirt spinlock unlock to a static_call, as the
   foundation for the unlock tracepoint. I'm happy to take a stab at it.
   Let me know if you'd rather do it yourself.

3. Build the unlock tracepoint on top: static_call where it's cheap,
   config-gated static_branch fallback where it isn't.

Does this plan sound reasonable to you?

> 
> Also, I think someone should go do some performance runs with
> ARCH_INLINE_SPIN_* set for x86 just like for s390.

That's a good point, I'll run benchmarks and report back with the
results.

^ permalink raw reply

* Re: [PATCH v6 5/7] locking: Add contended_release tracepoint to qspinlock
From: Peter Zijlstra @ 2026-06-03 14:26 UTC (permalink / raw)
  To: Dmitry Ilvokhin
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	Thomas Bogendoerfer, Juergen Gross, Ajay Kaher, Alexey Makhalov,
	Broadcom internal kernel review list, Thomas Gleixner,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Arnd Bergmann,
	Dennis Zhou, Tejun Heo, Christoph Lameter, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, linux-kernel, linux-mips,
	virtualization, linux-arch, linux-mm, linux-trace-kernel,
	kernel-team, Paul E. McKenney
In-Reply-To: <aiA3dzjKVq2_cpSY@shell.ilvokhin.com>

On Wed, Jun 03, 2026 at 02:17:27PM +0000, Dmitry Ilvokhin wrote:

> > Something a little like so, which is completely untested, except to
> > build kernel/locking/spinlock.o (with clang-23).
> 
> Thanks a lot for taking a look, Peter.
> 
> I like the static_call idea. It's truly zero cost on x86 (and, as you
> note, even a byte smaller). The one caveat is that it relies on
> HAVE_STATIC_CALL_INLINE to stay free. 
> 
> So my plan would be: static_call where HAVE_STATIC_CALL_INLINE is
> available (x86), and a static branch fallback elsewhere, gated behind a
> default-off config so it imposes nothing on arches/kernels that don't
> opt in. I'm mostly interested in x86, but would like arm64 to work too,
> which would use the fallback.

(i386 doesn't have STATIC_CALL_INLINE, but nobody cares about the
performance on that target, so anything goes really ;-)

> 
> Concretely:
> 
> 1. Split the sleepable-lock patches out and send them separately.
>    They're independent of the static call work and look far less
>    controversial.
> 
> 2. Convert the paravirt spinlock unlock to a static_call, as the
>    foundation for the unlock tracepoint. I'm happy to take a stab at it.
>    Let me know if you'd rather do it yourself.

Yeah, I think that patch as-is *should* work, but like said, I haven't
even tried it, so it could be terribly broken :-)

> 3. Build the unlock tracepoint on top: static_call where it's cheap,
>    config-gated static_branch fallback where it isn't.

Right, so I think we need some sort of custom callback for tracepoint
enable/disable. Its been a minute since I dug through the tracepoint
code, but I don't think it provides that with a convenient wrapper, but
it should be doable.

One thing to note is that when you set the tracepoint unlock function,
it should either tail-call into the original function, or you have to
create two unlock_trace functions, one for native and one for paravirt
and pick the right one.

> Does this plan sound reasonable to you?

Yeah, should work.

> > Also, I think someone should go do some performance runs with
> > ARCH_INLINE_SPIN_* set for x86 just like for s390.
> 
> That's a good point, I'll run benchmarks and report back with the
> results.

Thanks!

^ permalink raw reply

* Re: [RFC PATCH 00/10] RCU: Enable callbacks to benefit from expedited grace periods
From: Puranjay Mohan @ 2026-06-03 14:39 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: rcu, linux-kernel, linux-trace-kernel, Paul E. McKenney,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Masami Hiramatsu, Davidlohr Bueso
In-Reply-To: <ahgxkIbBLkTJyUVt@localhost.localdomain>

Hi Frederic,

Thanks a lot for the detailed feedback on all patches, I will send the
next version soon with requested changes.

On Thu, May 28, 2026 at 1:14 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> Hi,
>
> Le Fri, Apr 17, 2026 at 04:11:48PM -0700, Puranjay Mohan a écrit :
> > RCU callbacks only track normal grace period sequence numbers. This
> > means callbacks must wait for normal GPs even when expedited GPs have
> > already elapsed.
> >
> > This series tracks both normal and expedited GP sequences in the
> > callback infrastructure using struct rcu_gp_oldstate, so callbacks
> > advance when either GP type completes.
>
> What is the motivation behind this? When most callbacks don't have much latency
> requirements between queue and invocation?

Right now there's an asymmetry: synchronize_rcu_expedited() callers
get fast object freeing, but call_rcu() callers never benefit from
those same grace periods even though they prove the exact same thing.
This series closes that gap, the callback infrastructure treats a
grace period as a grace period regardless of how it was driven. The
practical effect is faster memory reclaim: objects that could already
be safely freed stop sitting around waiting for the next normal GP.
I'll make this motivation clearer in the next version.

Thanks,
Puranjay

^ permalink raw reply

* Re: [PATCH v2 01/13] verification/rvgen: Switch LTL parser to Lark
From: Gabriele Monaco @ 2026-06-03 14:49 UTC (permalink / raw)
  To: Nam Cao
  Cc: Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
	linux-kernel
In-Reply-To: <e1736ef618e32712eeb65d1714a2fea76298057b.1779956342.git.namcao@linutronix.de>

On Thu, 2026-05-28 at 10:27 +0200, Nam Cao wrote:
> The LTL parser is built using Ply. However, Ply is no longer
> maintained [1].
> 
> Switch to use Lark instead. In addition to being actively maintained,
> Lark
> also offers additional features (namely, automatically creating the
> abstract syntax tree) which make the parser simpler.
> 
> Link:
> https://github.com/dabeaz/ply/commit/9d7c40099e23ff78f9d86ef69a26c1e8a83e706a
>  [1]
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---

Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>

> v2:
>   - fix identifier starting with a digit is allowed [Wander]
>   - fixup ast node uid [Gabriele]
>   - Fix up Literal AST node construction [Wander, Sashiko]
>   - FIx up unary op error message [Sashiko]
>   - Add nice exception handling [Gabriele]
> ---
>  tools/verification/rvgen/__main__.py     |   5 +-
>  tools/verification/rvgen/rvgen/ltl2ba.py | 202 +++++++++------------
> --
>  2 files changed, 82 insertions(+), 125 deletions(-)
> 
> diff --git a/tools/verification/rvgen/__main__.py
> b/tools/verification/rvgen/__main__.py
> index 5c923dc10d0f..0915cf86e43b 100644
> --- a/tools/verification/rvgen/__main__.py
> +++ b/tools/verification/rvgen/__main__.py
> @@ -14,6 +14,7 @@ if __name__ == '__main__':
>      from rvgen.container import Container
>      from rvgen.ltl2k import ltl2k
>      from rvgen.automata import AutomataError
> +    from rvgen.ltl2ba import LTLError
>      import argparse
>      import sys
>  
> @@ -57,8 +58,8 @@ if __name__ == '__main__':
>                  sys.exit(1)
>          else:
>              monitor = Container(vars(params))
> -    except AutomataError as e:
> -        print(f"There was an error processing {params.spec}: {e}",
> file=sys.stderr)
> +    except (AutomataError, LTLError) as e:
> +        print(f"There was an error processing {params.spec}:\n{e}",
> file=sys.stderr)
>          sys.exit(1)
>  
>      print(f"Writing the monitor into the directory {monitor.name}")
> diff --git a/tools/verification/rvgen/rvgen/ltl2ba.py
> b/tools/verification/rvgen/rvgen/ltl2ba.py
> index 016e7cf93bbb..7cebda61bce8 100644
> --- a/tools/verification/rvgen/rvgen/ltl2ba.py
> +++ b/tools/verification/rvgen/rvgen/ltl2ba.py
> @@ -7,9 +7,7 @@
>  # https://doi.org/10.1007/978-0-387-34892-6_1
>  # With extra optimizations
>  
> -from ply.lex import lex
> -from ply.yacc import yacc
> -from .automata import AutomataError
> +import lark
>  
>  # Grammar:
>  # 	ltl ::= opd | ( ltl ) | ltl binop ltl | unop ltl
> @@ -30,42 +28,41 @@ from .automata import AutomataError
>  #       imply
>  #       equivalent
>  
> -tokens = (
> -   'AND',
> -   'OR',
> -   'IMPLY',
> -   'UNTIL',
> -   'ALWAYS',
> -   'EVENTUALLY',
> -   'NEXT',
> -   'VARIABLE',
> -   'LITERAL',
> -   'NOT',
> -   'LPAREN',
> -   'RPAREN',
> -   'ASSIGN',
> -)
> -
> -t_AND = r'and'
> -t_OR = r'or'
> -t_IMPLY = r'imply'
> -t_UNTIL = r'until'
> -t_ALWAYS = r'always'
> -t_NEXT = r'next'
> -t_EVENTUALLY = r'eventually'
> -t_VARIABLE = r'[A-Z_0-9]+'
> -t_LITERAL = r'true|false'
> -t_NOT = r'not'
> -t_LPAREN = r'\('
> -t_RPAREN = r'\)'
> -t_ASSIGN = r'='
> -t_ignore_COMMENT = r'\#.*'
> -t_ignore = ' \t\n'
> -
> -def t_error(t):
> -    raise AutomataError(f"Illegal character '{t.value[0]}'")
> -
> -lexer = lex()
> +GRAMMAR = r'''
> +start: assign+
> +
> +assign: VARIABLE "=" _ltl
> +
> +_ltl: _opd | binop | unop
> +
> +_opd : VARIABLE
> +     | LITERAL
> +     | "(" _ltl ")"
> +
> +unop: UNOP _ltl
> +UNOP: "always"
> +     | "eventually"
> +     | "next"
> +     | "not"
> +
> +binop: _opd BINOP _ltl
> +BINOP: "until"
> +      | "and"
> +      | "or"
> +      | "imply"
> +
> +VARIABLE: /[A-Z_][A-Z0-9_]*/
> +LITERAL: "true" | "false"
> +
> +COMMENT: "#" /.*/ "\n"
> +%ignore COMMENT
> +
> +%import common.WS
> +%ignore WS
> +'''
> +
> +class LTLError(Exception):
> +    "Exception raised for malformed linear temporal logic"
>  
>  class GraphNode:
>      uid = 0
> @@ -97,7 +94,7 @@ class GraphNode:
>          return self.id < other.id
>  
>  class ASTNode:
> -    uid = 1
> +    uid = 0
>  
>      def __init__(self, op):
>          self.op = op
> @@ -433,90 +430,49 @@ class Literal:
>          node.old |= {n}
>          return node.expand(node_set)
>  
> -def p_spec(p):
> -    '''
> -    spec : assign
> -         | assign spec
> -    '''
> -    if len(p) == 3:
> -        p[2].append(p[1])
> -        p[0] = p[2]
> -    else:
> -        p[0] = [p[1]]
> -
> -def p_assign(p):
> -    '''
> -    assign : VARIABLE ASSIGN ltl
> -    '''
> -    p[0] = (p[1], p[3])
> -
> -def p_ltl(p):
> -    '''
> -    ltl : opd
> -        | binop
> -        | unop
> -    '''
> -    p[0] = p[1]
> -
> -def p_opd(p):
> -    '''
> -    opd : VARIABLE
> -        | LITERAL
> -        | LPAREN ltl RPAREN
> -    '''
> -    if p[1] == "true":
> -        p[0] = ASTNode(Literal(True))
> -    elif p[1] == "false":
> -        p[0] = ASTNode(Literal(False))
> -    elif p[1] == '(':
> -        p[0] = p[2]
> -    else:
> -        p[0] = ASTNode(Variable(p[1]))
> -
> -def p_unop(p):
> -    '''
> -    unop : ALWAYS ltl
> -         | EVENTUALLY ltl
> -         | NEXT ltl
> -         | NOT ltl
> -    '''
> -    if p[1] == "always":
> -        op = AlwaysOp(p[2])
> -    elif p[1] == "eventually":
> -        op = EventuallyOp(p[2])
> -    elif p[1] == "next":
> -        op = NextOp(p[2])
> -    elif p[1] == "not":
> -        op = NotOp(p[2])
> -    else:
> -        raise AutomataError(f"Invalid unary operator {p[1]}")
> -
> -    p[0] = ASTNode(op)
> -
> -def p_binop(p):
> -    '''
> -    binop : opd UNTIL ltl
> -          | opd AND ltl
> -          | opd OR ltl
> -          | opd IMPLY ltl
> -    '''
> -    if p[2] == "and":
> -        op = AndOp(p[1], p[3])
> -    elif p[2] == "until":
> -        op = UntilOp(p[1], p[3])
> -    elif p[2] == "or":
> -        op = OrOp(p[1], p[3])
> -    elif p[2] == "imply":
> -        op = ImplyOp(p[1], p[3])
> -    else:
> -        raise AutomataError(f"Invalid binary operator {p[2]}")
> -
> -    p[0] = ASTNode(op)
> -
> -parser = yacc()
> +class Transform(lark.visitors.Transformer):
> +    def unop(self, node):
> +        if node[0] == "always":
> +            return ASTNode(AlwaysOp(node[1]))
> +        if node[0] == "eventually":
> +            return ASTNode(EventuallyOp(node[1]))
> +        if node[0] == "next":
> +            return ASTNode(NextOp(node[1]))
> +        if node[0] == "not":
> +            return ASTNode(NotOp(node[1]))
> +        raise ValueError("Unknown operator %s" % node[0])
> +
> +    def binop(self, node):
> +        if node[1] == "until":
> +            return ASTNode(UntilOp(node[0], node[2]))
> +        if node[1] == "and":
> +            return ASTNode(AndOp(node[0], node[2]))
> +        if node[1] == "or":
> +            return ASTNode(OrOp(node[0], node[2]))
> +        if node[1] == "imply":
> +            return ASTNode(ImplyOp(node[0], node[2]))
> +        raise ValueError("Unknown operator %s" % node[1])
> +
> +    def VARIABLE(self, args):
> +        return ASTNode(Variable(args))
> +
> +    def LITERAL(self, args):
> +        return ASTNode(Literal(args == "true"))
> +
> +    def start(self, node):
> +        return node
> +
> +    def assign(self, node):
> +        return node[0].op.name, node[1]
> +
> +parser = lark.Lark(GRAMMAR)
>  
>  def parse_ltl(s: str) -> ASTNode:
> -    spec = parser.parse(s)
> +    try:
> +        spec = parser.parse(s)
> +    except lark.exceptions.UnexpectedInput as e:
> +        raise LTLError(str(e))
> +    spec = Transform().transform(spec)
>  
>      rule = None
>      subexpr = {}
> @@ -528,7 +484,7 @@ def parse_ltl(s: str) -> ASTNode:
>              subexpr[assign[0]] = assign[1]
>  
>      if rule is None:
> -        raise AutomataError("Please define your specification in the
> \"RULE = <LTL spec>\" format")
> +        raise LTLError("Please define your specification in the
> \"RULE = <LTL spec>\" format")
>  
>      for node in rule:
>          if not isinstance(node.op, Variable):


^ permalink raw reply

* Re: [PATCH v2 02/13] verification/rvgen: Introduce a parse tree for automata using Lark
From: Gabriele Monaco @ 2026-06-03 14:55 UTC (permalink / raw)
  To: Nam Cao, Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
	linux-kernel
In-Reply-To: <13a2f241b74e02b13efa7fe188be3fa7e6148b4a.1779956342.git.namcao@linutronix.de>

On Thu, 2026-05-28 at 10:27 +0200, Nam Cao wrote:
> The DOT parsing scripts directly parse the raw text and they are
> quite
> fragile. If the input dot files' formats are slightly changed (for
> instance, by breaking long some lines which is allowed by the DOT
> language
> defined by graphviz), the scripts would fail.
> 
> To make the scripts robust, the parser should be implemented based on
> the
> dot language specification, not based on how the existing dot files
> look.
> 
> As a first step, use Lark to implement a Parser based on the graphviz
> dot
> language specification. The resulting parse tree is not used yet, but
> the
> existing scripts will be converted one by one to use this new parse
> tree in
> the follow-up commits.
> 
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---

Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>

> v2:
>   - switch to use Lark's CNAME for identifier [Wander]
>   - switch to use Lark's ESCAPED_STRING for string [Sashiko]
>   - clean up variable name shadowing [Sashiko]
>   - Properly catch Lark exception
> ---
>  tools/verification/rvgen/rvgen/automata.py | 186
> +++++++++++++++++++++
>  1 file changed, 186 insertions(+)
> 
> diff --git a/tools/verification/rvgen/rvgen/automata.py
> b/tools/verification/rvgen/rvgen/automata.py
> index b9f8149f7118..8649d982383d 100644
> --- a/tools/verification/rvgen/rvgen/automata.py
> +++ b/tools/verification/rvgen/rvgen/automata.py
> @@ -13,6 +13,191 @@ import re
>  from typing import Iterator
>  from itertools import islice
>  
> +import lark
> +
> +class ParseTree:
> +    # based on https://graphviz.org/doc/info/lang.html
> +    # with the irrelevant stuffs (port and compass) removed
> +    grammar = r'''
> +    start: "strict"? ("graph" | "digraph") ID? "{" stmt_list "}"
> +
> +    stmt_list: (stmt ";"? stmt_list)?
> +
> +    stmt: node_stmt
> +        | edge_stmt
> +        | attr_stmt
> +        | ID "=" ID
> +        | subgraph
> +
> +    attr_stmt: attr_type attr_list
> +
> +    attr_type: "graph" -> graph
> +            | "node"  -> node
> +            | "edge"  -> edge
> +
> +    attr_list: "[" a_list? "]" attr_list?
> +
> +    a_list: ID "=" ID (";" | ",")? a_list?
> +
> +    edge_stmt: (node_id | subgraph) edgerhs attr_list?
> +
> +    edgerhs: edgeop (node_id | subgraph) edgerhs?
> +
> +    edgeop: "->" | "--"
> +
> +    node_stmt: node_id attr_list?
> +
> +    node_id: ID
> +
> +    subgraph: ("subgraph" ID?)? "{" stmt_list "}"
> +
> +    ID: CNAME
> +      | /-?(\.[0-9]+|[0-9]+(\.[0-9]*))/
> +      | ESCAPED_STRING
> +
> +    %import common.CNAME
> +    %import common.ESCAPED_STRING
> +    %import common.WS
> +    %ignore WS
> +    '''
> +
> +    @staticmethod
> +    def parse_edge(tree: lark.Tree) -> tuple[str, str]:
> +        # only support a simple node-to-node edge
> +        nodes = []
> +        for node in tree.iter_subtrees_topdown():
> +            if node.data == "node_id":
> +                nodes.append(node.children[0].strip('"'))
> +
> +        if len(nodes) != 2:
> +            raise AutomataError("Only state-to-state transition is
> supported")
> +
> +        return tuple(nodes)
> +
> +    class ParseNodes(lark.visitors.Visitor):
> +        def __init__(self, *args, **kwargs):
> +            self.nodes = set()
> +            super().__init__(*args, **kwargs)
> +
> +        def node_stmt(self, tree):
> +            node_id = tree.children[0]
> +            node = node_id.children[0].strip('"')
> +            self.nodes.add(node)
> +
> +    class ParseEdges(lark.visitors.Visitor):
> +        def __init__(self, *args, **kwargs):
> +            self.edges = set()
> +            super().__init__(*args, **kwargs)
> +
> +        def edge_stmt(self, tree):
> +            edge = ParseTree.parse_edge(tree)
> +            self.edges.add(edge)
> +
> +    class ParseAttributes(lark.visitors.Interpreter):
> +        def __init__(self, *args, **kwargs):
> +            '''
> +            Stacks of default attributes. [0] is the default
> +            attributes for the outermost scope, while [-1] is the
> +            default attributes for the current scope.
> +            '''
> +            self.default_node_attrs = [{}]
> +            self.default_edge_attrs = [{}]
> +
> +            self.node_attrs = {}
> +            self.edge_attrs = {}
> +
> +            super().__init__(*args, **kwargs)
> +
> +        @staticmethod
> +        def __get_attrs(stmt: lark.Tree) -> dict[str, str]:
> +            attrs = {}
> +
> +            for node in stmt.iter_subtrees():
> +                if node.data == "a_list":
> +                    attrs[node.children[0]] =
> node.children[1].strip('"')
> +
> +            return attrs
> +
> +
> +        def subgraph(self, tree):
> +            # We are entering a new scope, inherit the default
> +            # attributes of the outer scope
> +            self.default_node_attrs.append(self.default_node_attrs[-
> 1].copy())
> +            self.default_edge_attrs.append(self.default_edge_attrs[-
> 1].copy())
> +
> +            children = self.visit_children(tree)
> +
> +            # Exiting the scope
> +            del self.default_node_attrs[-1]
> +            del self.default_edge_attrs[-1]
> +
> +            return children
> +
> +        def node_stmt(self, tree):
> +            node_id = tree.children[0]
> +            node = node_id.children[0].strip('"')
> +
> +            attrs = self.default_node_attrs[-1].copy()
> +            attrs |= self.__get_attrs(tree)
> +
> +            if attrs:
> +                if node in self.node_attrs:
> +                    self.node_attrs[node] = attrs |
> self.node_attrs[node]
> +                else:
> +                    self.node_attrs[node] = attrs
> +
> +            return self.visit_children(tree)
> +
> +        def edge_stmt(self, tree):
> +            edge = ParseTree.parse_edge(tree)
> +
> +            attrs = self.default_edge_attrs[-1].copy()
> +            attrs |= self.__get_attrs(tree)
> +
> +            if attrs:
> +                if edge in self.edge_attrs:
> +                    self.edge_attrs[edge] = attrs |
> self.edge_attrs[edge]
> +                else:
> +                    self.edge_attrs[edge] = attrs
> +
> +            return self.visit_children(tree)
> +
> +        def attr_stmt(self, tree):
> +            attr_type = tree.children[0].data
> +            attrs = self.__get_attrs(tree)
> +
> +            if attr_type == "node":
> +                self.default_node_attrs[-1] |= attrs
> +            elif attr_type == "edge":
> +                self.default_edge_attrs[-1] |= attrs
> +            else:
> +                # graph attributes are irrelevant
> +                pass
> +
> +            self.visit_children(tree)
> +
> +    def __init__(self, dot_file):
> +        parser = lark.Lark(self.grammar, parser='lalr')
> +        node_parser = self.ParseNodes()
> +        edge_parser = self.ParseEdges()
> +        attributes_parser = self.ParseAttributes()
> +
> +        try:
> +            with open(dot_file, "r") as f:
> +                tree = parser.parse(f.read())
> +                attributes_parser.visit(tree)
> +                node_parser.visit(tree)
> +                edge_parser.visit(tree)
> +        except OSError as exc:
> +            raise AutomataError(exc.strerror) from exc
> +        except lark.exceptions.UnexpectedInput as exc:
> +            raise AutomataError(str(exc))
> +
> +        self.nodes = node_parser.nodes
> +        self.edges = edge_parser.edges
> +        self.node_attrs = attributes_parser.node_attrs
> +        self.edge_attrs = attributes_parser.edge_attrs
> +
>  class _ConstraintKey:
>      """Base class for constraint keys."""
>  
> @@ -66,6 +251,7 @@ class Automata:
>          self.__dot_path = file_path
>          self.name = model_name or self.__get_model_name()
>          self.__dot_lines = self.__open_dot()
> +        self.__parse_tree = ParseTree(file_path)
>          self.states, self.initial_state, self.final_states =
> self.__get_state_variables()
>          self.env_types = {}
>          self.env_stored = set()


^ permalink raw reply

* Re: [PATCH v2 03/13] verification/rvgen: Implement state and transition parser based on Lark
From: Gabriele Monaco @ 2026-06-03 14:55 UTC (permalink / raw)
  To: Nam Cao, Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
	linux-kernel
In-Reply-To: <e84c39bab831f1623142e6b83a9513ba99a702f1.1779956342.git.namcao@linutronix.de>

On Thu, 2026-05-28 at 10:27 +0200, Nam Cao wrote:
> The DOT parsing scripts directly parse the raw text and they are
> quite
> fragile. If the input dot files' formats are slightly changed (for
> instance, by breaking long some lines which is allowed by the DOT
> language), the scripts would fail.
> 
> Prepare to move away from the raw text processing, implement parsers
> based
> on Lark which parse states, transitions and constraints.
> 
> The parse results are not used yet. The existing scripts will be
> converted
> one by one to them, and the raw text processing will eventually be
> removed.
> 
> Signed-off-by: Nam Cao <namcao@linutronix.de>

Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>

> ---
> v2:
>   - fixup guard grammar [Gabriele]
>   - fixup inconsistent types in a list [Wander]
>   - compiling the parsers only once to avoid overhead [Sashiko]
>   - fix up the signature of State.__init__() [Sashiko]
>   - gracefully handle node statement without label [Sashiko]
>   - lark parse exception handling
> ---
>  tools/verification/rvgen/rvgen/automata.py | 216
> +++++++++++++++++++++
>  1 file changed, 216 insertions(+)
> 
> diff --git a/tools/verification/rvgen/rvgen/automata.py
> b/tools/verification/rvgen/rvgen/automata.py
> index 8649d982383d..b86275e7bf28 100644
> --- a/tools/verification/rvgen/rvgen/automata.py
> +++ b/tools/verification/rvgen/rvgen/automata.py
> @@ -198,6 +198,164 @@ class ParseTree:
>          self.node_attrs = attributes_parser.node_attrs
>          self.edge_attrs = attributes_parser.edge_attrs
>  
> +class ConstraintCondition:
> +    def __init__(self, env: str, op: str, val: str, unit=None):
> +        self.env = env
> +        self.op = op
> +        self.val = val
> +        self.unit = unit
> +        if unit is None:
> +            # try to infer unit from constants or parameters
> +            val_for_unit = val.lower().replace("()", "")
> +            if val_for_unit.endswith("_ns"):
> +                self.unit = "ns"
> +            if val_for_unit.endswith("_jiffies"):
> +                self.unit = "j"
> +
> +class ConstraintRule:
> +    grammar = r'''
> +        rule: condition (OP condition)*
> +
> +        OP: "&&" | "||"
> +
> +        condition: ENV CMP_OP VAL UNIT?
> +
> +        ENV: CNAME
> +
> +        CMP_OP: "==" | "<=" | "<" | ">=" | ">"
> +
> +        VAL: /[0-9]+/
> +           | /[A-Z_]+\(\)/
> +           | /[A-Z_]+/
> +           | /[a-z_]+\(\)/
> +           | /[a-z_]+/
> +
> +        UNIT: "ns" | "us" | "ms" | "s"
> +    '''
> +
> +    def __init__(self, c: ConstraintCondition):
> +        '''
> +        A list of pairs of
> +          - the condition (e.g. is_constr_dl == 1)
> +          - the logical operator ("||" or "&&") combining this
> +            condition with the next one if it exists, otherwise None
> +
> +        TODO: Perhaps use an abstract syntax tree instead, because
> +              this representation cannot capture precedence
> +        '''
> +        self.rules = [[c, None]]
> +
> +    def chain(self, op: str, c: ConstraintCondition):
> +        self.rules[-1][1] = op
> +        self.rules.append([c, None])
> +
> +class ConstraintReset:
> +    def __init__(self, env):
> +        self.env = env
> +
> +class StateLabelParser:
> +    grammar = r'''
> +    label: CNAME ("\\n" condition)?
> +
> +    %import common.CNAME
> +    %import common.WS
> +    %ignore WS
> +    ''' + ConstraintRule.grammar
> +
> +    parser = lark.Lark(grammar, parser='lalr', start="label")
> +
> +    def __init__(self, label: str):
> +        try:
> +            tree = self.parser.parse(label)
> +        except lark.exceptions.UnexpectedInput as exc:
> +            raise(AutomataError(f"Unrecognised state
> \"{label}\"\n{exc}"))
> +
> +        self.state = tree.children[0]
> +        self.constraint = None
> +
> +        if len(tree.children) == 2:
> +            self.constraint =
> ConstraintCondition(*tree.children[1].children)
> +            if self.constraint.op not in ("<", "<="):
> +                raise AutomataError("State constraints must be clock
> expirations like"
> +                                    f" clk<N ({label})")
> +
> +class EventLabelParser:
> +    grammar = r'''
> +    events: event ("\\n" event)*
> +
> +    event: name (";" guard)?
> +
> +    guard: reset
> +         | rule
> +         | rule ";" reset
> +         | reset ";" rule
> +
> +    name: CNAME
> +
> +    reset: "reset" "(" ENV ")"
> +
> +    %import common.CNAME
> +    %import common.WS
> +    %ignore WS
> +    ''' + ConstraintRule.grammar
> +
> +    parser = lark.Lark(grammar, parser='lalr', start="events")
> +
> +    class GetEvents(lark.visitors.Transformer):
> +        def guard(self, args):
> +            reset = None
> +            rule = None
> +            for arg in args:
> +                if arg.data == "reset":
> +                    reset = ConstraintReset(arg.children[0])
> +                elif arg.data == "rule":
> +                    conditions = arg.children
> +                    rule = ConstraintRule(conditions[0])
> +                    for i in range(1, len(conditions), 2):
> +                        rule.chain(conditions[i], conditions[i + 1])
> +            return reset, rule
> +
> +        def OP(self, args):
> +            return args
> +
> +        def condition(self, args):
> +            return ConstraintCondition(*args)
> +
> +        def event(self, args):
> +            assert(len(args) <= 2)
> +            name = args[0]
> +            rule, reset = None, None
> +            if len(args) == 2:
> +                reset, rule = args[1]
> +            return name, reset, rule
> +
> +        def events(self, args):
> +            return args
> +
> +        def name(self, args):
> +            return args[0]
> +
> +    def __init__(self, label: str):
> +        try:
> +            tree = self.parser.parse(label)
> +            self.events = self.GetEvents().transform(tree)
> +        except lark.exceptions.UnexpectedInput as exc:
> +            raise(AutomataError(f"Unrecognised event
> \"{label}\"\n{exc}"))
> +
> +class Transition:
> +    def __init__(self, src: str, dst: str, event: str,
> +                 reset: ConstraintReset, rule: ConstraintRule):
> +        self.src = src
> +        self.dst = dst
> +        self.event = event
> +        self.rule = rule
> +        self.reset = reset
> +
> +class State:
> +    def __init__(self, name: str, inv: ConstraintCondition):
> +        self.name = name
> +        self.inv = inv
> +
>  class _ConstraintKey:
>      """Base class for constraint keys."""
>  
> @@ -252,6 +410,8 @@ class Automata:
>          self.name = model_name or self.__get_model_name()
>          self.__dot_lines = self.__open_dot()
>          self.__parse_tree = ParseTree(file_path)
> +        self.transitions = self.__parse_transitions()
> +        self._states, self._initial_state, self._final_states =
> self.__parse_states()
>          self.states, self.initial_state, self.final_states =
> self.__get_state_variables()
>          self.env_types = {}
>          self.env_stored = set()
> @@ -327,6 +487,62 @@ class Automata:
>  
>          return cursor
>  
> +    def __parse_transitions(self):
> +        transitions = []
> +
> +        for edge in self.__parse_tree.edges:
> +            attr = self.__parse_tree.edge_attrs.get(edge)
> +            if not attr:
> +                continue
> +
> +            label = attr.get("label")
> +
> +            src, dst = edge
> +
> +            parser = EventLabelParser(label)
> +            for event, reset, rule in parser.events:
> +                transitions.append(Transition(src, dst, event,
> reset, rule))
> +
> +        transitions.sort(key=lambda t : (t.src, t.event))
> +        return transitions
> +
> +    def __parse_states(self):
> +        initial_state = ""
> +        states = []
> +        final_states = []
> +
> +        for node in self.__parse_tree.nodes:
> +            attr = self.__parse_tree.node_attrs[node]
> +            label = attr.get("label")
> +
> +            if node.startswith(Automata.init_marker):
> +                initial_state = node[len(Automata.init_marker):]
> +
> +            if not label:
> +                continue
> +
> +            parser = StateLabelParser(label)
> +            state = State(parser.state, parser.constraint)
> +
> +            states.append(state)
> +
> +            shape = attr.get("shape")
> +            if shape in ("doublecircle", "ellipse"):
> +                final_states.append(state)
> +
> +
> +        initial_state = next((s for s in states if s.name ==
> initial_state), None)
> +        if not initial_state:
> +            raise AutomataError("The automaton doesn't have an
> initial state")
> +
> +        if not final_states:
> +            final_states.append(initial_state)
> +
> +        states.remove(initial_state)
> +        states.sort(key=lambda s : s.name)
> +        states.insert(0, initial_state)
> +        return states, initial_state, final_states
> +
>      def __get_state_variables(self) -> tuple[list[str], str,
> list[str]]:
>          # wait for node declaration
>          states = []


^ permalink raw reply

* Re: [PATCH v2 04/13] verification/rvgen: Convert __fill_verify_invariants_func() to Lark
From: Gabriele Monaco @ 2026-06-03 14:56 UTC (permalink / raw)
  To: Nam Cao, Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
	linux-kernel
In-Reply-To: <d1fbbafd8af64a9663b3982845ecb2c3c5a4eded.1779956342.git.namcao@linutronix.de>

On Thu, 2026-05-28 at 10:27 +0200, Nam Cao wrote:
> Convert __fill_verify_invariants_func() to use the parsed states
> information from Lark, prepare to remove the old raw text parsing
> code.
> 
> Signed-off-by: Nam Cao <namcao@linutronix.de>

Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>

> ---
> v2: fix up __start_to_invariant_check()'s signature [Sashiko]
> ---
>  tools/verification/rvgen/rvgen/dot2k.py | 32 ++++++++++++++++-------
> --
>  1 file changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/tools/verification/rvgen/rvgen/dot2k.py
> b/tools/verification/rvgen/rvgen/dot2k.py
> index e6f476b903b0..a344cbbcb346 100644
> --- a/tools/verification/rvgen/rvgen/dot2k.py
> +++ b/tools/verification/rvgen/rvgen/dot2k.py
> @@ -12,6 +12,7 @@ from collections import deque
>  from .dot2c import Dot2c
>  from .generator import Monitor
>  from .automata import _EventConstraintKey, _StateConstraintKey,
> AutomataError
> +from .automata import ConstraintRule, ConstraintCondition
>  
>  
>  class dot2k(Monitor, Dot2c):
> @@ -177,6 +178,14 @@ class ha2k(dot2k):
>              raise AutomataError("Detected deterministic automaton,
> use the 'da' class")
>          self.trace_h = self._read_template_file("trace_hybrid.h")
>          self.__parse_constraints()
> +        self.has_invariant = False
> +        self.has_guard = False
> +        for state in self._states:
> +            if state.inv:
> +                self.has_invariant = True
> +        for transition in self.transitions:
> +            if transition.rule or transition.reset:
> +                self.has_guard = True
>  
>      def fill_monitor_class_type(self) -> str:
>          if self._is_id_monitor():
> @@ -218,14 +227,13 @@ class ha2k(dot2k):
>          assert env.rstrip(f"_{self.name}") in self.envs
>          return env
>  
> -    def __start_to_invariant_check(self, constr: str) -> str:
> +    def __start_to_invariant_check(self, inv: ConstraintCondition) -
> > str:
>          # by default assume the timer has ns expiration
> -        env = self.__get_constraint_env(constr)
>          clock_type = "ns"
> -        if self.env_types.get(env.rstrip(f"_{self.name}")) == "j":
> +        if inv.unit == "j":
>              clock_type = "jiffy"
>  
> -        return f"return ha_check_invariant_{clock_type}(ha_mon,
> {env}, time_ns)"
> +        return f"return ha_check_invariant_{clock_type}(ha_mon,
> {inv.env}_{self.name}, time_ns)"
>  
>      def __start_to_conv(self, constr: str) -> str:
>          """
> @@ -320,20 +328,22 @@ class ha2k(dot2k):
>                  self.invariants[key] = rules[0]
>  
>      def __fill_verify_invariants_func(self) -> list[str]:
> -        buff = []
> -        if not self.invariants:
> +        if not self.has_invariant:
>              return []
>  
> -        buff.append(
> +        buff = [
>  f"""static inline bool ha_verify_invariants(struct ha_monitor
> *ha_mon,
>  \t\t\t\t\tenum {self.enum_states_def} curr_state, enum
> {self.enum_events_def} event,
>  \t\t\t\t\tenum {self.enum_states_def} next_state, u64 time_ns)
> -{{""")
> +{{"""]
>  
>          _else = ""
> -        for state, constr in sorted(self.invariants.items()):
> -            check_str = self.__start_to_invariant_check(constr)
> -            buff.append(f"\t{_else}if (curr_state ==
> {self.states[state]}{self.enum_suffix})")
> +        for state in self._states:
> +            if not state.inv:
> +                continue
> +
> +            check_str = self.__start_to_invariant_check(state.inv)
> +            buff.append(f"\t{_else}if (curr_state ==
> {state.name}{self.enum_suffix})")
>              buff.append(f"\t\t{check_str};")
>              _else = "else "
>  


^ permalink raw reply

* Re: [PATCH v2 05/13] verification/rvgen: Convert __fill_setup_invariants_func() to Lark
From: Gabriele Monaco @ 2026-06-03 15:24 UTC (permalink / raw)
  To: Nam Cao, Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
	linux-kernel
In-Reply-To: <0b2cf5e1bb03d0e3a667b0fc2c7093123ef0a78c.1779956342.git.namcao@linutronix.de>

On Thu, 2026-05-28 at 10:27 +0200, Nam Cao wrote:
> Prepare for self.invariants and __parse_constraints() to be removed.
> convert __fill_setup_invariants_func() to use the new parsed states
> from Lark.
> 
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---

Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>

> v2: add missing time conversion [Sashiko]
> ---
>  tools/verification/rvgen/rvgen/dot2k.py | 44 ++++++++++++++++++++---
> --
>  1 file changed, 35 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/verification/rvgen/rvgen/dot2k.py
> b/tools/verification/rvgen/rvgen/dot2k.py
> index a344cbbcb346..d9f8e1c7737a 100644
> --- a/tools/verification/rvgen/rvgen/dot2k.py
> +++ b/tools/verification/rvgen/rvgen/dot2k.py
> @@ -250,6 +250,26 @@ class ha2k(dot2k):
>          return (f"ha_start_timer_{clock_type}(ha_mon,
> {rule["env"]}{self.enum_suffix},"
>                  f" {value}, time_ns)")
>  
> +    def __parse_invariant(self, inv):
> +        # by default assume the timer has ns expiration
> +        clock_type = "ns"
> +        if inv.unit == "j":
> +            clock_type = "jiffy"
> +
> +        env = inv.env + self.enum_suffix
> +        val = inv.val.replace("()", "(ha_mon)")
> +
> +        match inv.unit:
> +            case "us":
> +                val *= 10**3
> +            case "ms":
> +                val *= 10**6
> +            case "s":
> +                val *= 10**9
> +
> +        return (f"ha_start_timer_{clock_type}(ha_mon, {env},"
> +                f" {val}, time_ns)")
> +
>      def __format_guard_rules(self, rules: list[str]) -> list[str]:
>          """
>          Merge guard constraints as a single C return statement.
> @@ -463,15 +483,14 @@ f"""static inline bool ha_verify_guards(struct
> ha_monitor *ha_mon,
>          return conflict_guards, conflict_invs
>  
>      def __fill_setup_invariants_func(self) -> list[str]:
> -        buff = []
> -        if not self.invariants:
> +        if not self.has_invariant:
>              return []
>  
> -        buff.append(
> +        buff = [
>  f"""static inline void ha_setup_invariants(struct ha_monitor
> *ha_mon,
>  \t\t\t\t       enum {self.enum_states_def} curr_state, enum
> {self.enum_events_def} event,
>  \t\t\t\t       enum {self.enum_states_def} next_state, u64 time_ns)
> -{{""")
> +{{"""]
>  
>          conditions = ["next_state == curr_state"]
>          conditions += [f"event != {e}{self.enum_suffix}"
> @@ -480,13 +499,20 @@ f"""static inline void
> ha_setup_invariants(struct ha_monitor *ha_mon,
>          buff.append(f"\tif ({condition_str})\n\t\treturn;")
>  
>          _else = ""
> -        for state, constr in sorted(self.invariants.items()):
> -            buff.append(f"\t{_else}if (next_state ==
> {self.states[state]}{self.enum_suffix})")
> -            buff.append(f"\t\t{constr};")
> +        for state in self._states:
> +            inv = state.inv
> +            if not inv:
> +                continue
> +            inv = self.__parse_invariant(inv)
> +            buff.append(f"\t{_else}if (next_state ==
> {state.name}{self.enum_suffix})")
> +            buff.append(f"\t\t{inv};")
>              _else = "else "
>  
> -        for state in self.invariants:
> -            buff.append(f"\telse if (curr_state ==
> {self.states[state]}{self.enum_suffix})")
> +        for state in self._states:
> +            inv = state.inv
> +            if not inv:
> +                continue
> +            buff.append(f"\telse if (curr_state ==
> {state.name}{self.enum_suffix})")
>              buff.append("\t\tha_cancel_timer(ha_mon);")
>  
>          buff.append("}\n")


^ permalink raw reply

* [PATCH v3] tracing: fix CFI violation in probestub test
From: Eva Kurchatova @ 2026-06-03 15:31 UTC (permalink / raw)
  To: mhiramat, rostedt
  Cc: linux-trace-kernel, linux-kernel, mathieu.desnoyers, peterz,
	jpoimboe, samitolvanen, eva.kurchatova

When multiple callbacks are registered on the same tracepoint,
callbacks will be indirectly called via traceiter helper.

Pointers to __probestub_* callbacks reside in __tracepoints section,
which is excluded from ENDBR checks in objtool, causing objtool to
assume those functions are never indirectly called.

Registering multiple callbacks using sched_wakeup test will result
in #CP exception due to missing ENDBR in __probestub_sched_wakeup
on a CFI-enabled machine.

Fix this by adding CFI_NOSEAL annotation to probestub declaration.

Fixes: d5173f753750 ("objtool: Exclude __tracepoints data from ENDBR checks")
Signed-off-by: Eva Kurchatova <eva.kurchatova@virtuozzo.com>
---
 include/linux/tracepoint.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 763eea4d80d8..2d2b9f8cdda4 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -20,6 +20,7 @@
 #include <linux/rcupdate_trace.h>
 #include <linux/tracepoint-defs.h>
 #include <linux/static_call.h>
+#include <linux/cfi.h>
 
 struct module;
 struct tracepoint;
@@ -389,6 +390,13 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 	void __probestub_##_name(void *__data, proto)			\
 	{								\
 	}								\
+	/*								\
+	 * Annotate the probestub 'CFI_NOSEAL' to stop objtool from	\
+	 * requesting the kernel remove the ENDBR, because the only	\
+	 * references to the function are in the __tracepoint section,	\
+	 * that objtool doesn't scan.					\
+	 */								\
+	CFI_NOSEAL(__probestub_##_name);				\
 	DEFINE_STATIC_CALL(tp_func_##_name, __traceiter_##_name);	\
 	DEFINE_RUST_DO_TRACE(_name, TP_PROTO(proto), TP_ARGS(args))
 
-- 
2.54.0


^ permalink raw reply related

* Re: [PATCH v2 13/13] verification/rvgen: Remove dead code
From: Gabriele Monaco @ 2026-06-03 15:36 UTC (permalink / raw)
  To: Nam Cao, Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
	linux-kernel
In-Reply-To: <9120d2a5987b79b646dde4001381581b703c0dd7.1779956342.git.namcao@linutronix.de>

On Thu, 2026-05-28 at 10:28 +0200, Nam Cao wrote:
> The conversion to use Lark left some dead code behind. Remove them.
> 
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---

You might want to remove unused imports (linters should help you with
that too):
* re, typing.Iterator, and itertools.islice from automata.py
* deque and ConstraintRule from dot2k

Other than that

Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>

>  tools/verification/rvgen/rvgen/automata.py | 154 -------------------
> --
>  tools/verification/rvgen/rvgen/dot2k.py    |  28 +---
>  2 files changed, 1 insertion(+), 181 deletions(-)
> 
> diff --git a/tools/verification/rvgen/rvgen/automata.py
> b/tools/verification/rvgen/rvgen/automata.py
> index a3be327c2a73..b6ff5fceb820 100644
> --- a/tools/verification/rvgen/rvgen/automata.py
> +++ b/tools/verification/rvgen/rvgen/automata.py
> @@ -356,19 +356,6 @@ class State:
>          self.name = name
>          self.inv = inv
>  
> -class _ConstraintKey:
> -    """Base class for constraint keys."""
> -
> -class _StateConstraintKey(_ConstraintKey, int):
> -    """Key for a state constraint. Under the hood just state_id."""
> -    def __new__(cls, state_id: int):
> -        return super().__new__(cls, state_id)
> -
> -class _EventConstraintKey(_ConstraintKey, tuple):
> -    """Key for an event constraint. Under the hood just
> tuple(state_id,event_id)."""
> -    def __new__(cls, state_id: int, event_id: int):
> -        return super().__new__(cls, (state_id, event_id))
> -
>  class AutomataError(Exception):
>      """Exception raised for errors in automata parsing and
> validation.
>  
> @@ -387,28 +374,10 @@ class Automata:
>  
>      invalid_state_str = "INVALID_STATE"
>      init_marker = "__init_"
> -    node_marker = "{node"
> -    # val can be numerical, uppercase (constant or macro), lowercase
> (parameter or function)
> -    # only numerical values should have units
> -    constraint_rule = re.compile(r"""
> -        ^
> -        (?P<env>[a-zA-Z_][a-zA-Z0-9_]+)  # C-like identifier for the
> env var
> -        (?P<op>[!<=>]{1,2})              # operator
> -        (?P<val>
> -            [0-9]+ |                     # numerical value
> -            [A-Z_]+\(\) |                # macro
> -            [A-Z_]+ |                    # constant
> -            [a-z_]+\(\) |                # function
> -            [a-z_]+                      # parameter
> -        )
> -        (?P<unit>[a-z]{1,2})?            # optional unit for
> numerical values
> -        """, re.VERBOSE)
> -    constraint_reset = re.compile(r"^reset\((?P<env>[a-zA-Z_][a-zA-
> Z0-9_]+)\)")
>  
>      def __init__(self, file_path, model_name=None):
>          self.__dot_path = file_path
>          self.name = model_name or self.__get_model_name()
> -        self.__dot_lines = self.__open_dot()
>          self.__parse_tree = ParseTree(file_path)
>          self.transitions = self.__parse_transitions()
>          self.states, self.initial_state, self.final_states =
> self.__parse_states()
> @@ -435,57 +404,6 @@ class Automata:
>  
>          return model_name
>  
> -    def __open_dot(self) -> list[str]:
> -        dot_lines = []
> -        try:
> -            with open(self.__dot_path) as dot_file:
> -                dot_lines = dot_file.readlines()
> -        except OSError as exc:
> -            raise AutomataError(exc.strerror) from exc
> -
> -        if not dot_lines:
> -            raise AutomataError(f"{self.__dot_path} is empty")
> -
> -        # checking the first line:
> -        line = dot_lines[0].split()
> -
> -        if len(line) < 2 or line[0] != "digraph" or line[1] !=
> "state_automaton":
> -            raise AutomataError(f"Not a valid .dot format:
> {self.__dot_path}")
> -
> -        return dot_lines
> -
> -    def __get_cursor_begin_states(self) -> int:
> -        for cursor, line in enumerate(self.__dot_lines):
> -            split_line = line.split()
> -
> -            if len(split_line) and split_line[0] ==
> self.node_marker:
> -                return cursor
> -
> -        raise AutomataError("Could not find a beginning state")
> -
> -    def __get_cursor_begin_events(self) -> int:
> -        state = 0
> -        cursor = 0 # make pyright happy
> -
> -        for cursor, line in enumerate(self.__dot_lines):
> -            line = line.split()
> -            if not line:
> -                continue
> -
> -            if state == 0:
> -                if line[0] == self.node_marker:
> -                    state = 1
> -            elif line[0] != self.node_marker:
> -                break
> -        else:
> -            raise AutomataError("Could not find beginning event")
> -
> -        cursor += 1 # skip initial state transition
> -        if cursor == len(self.__dot_lines):
> -            raise AutomataError("Dot file ended after event
> beginning")
> -
> -        return cursor
> -
>      def __parse_transitions(self):
>          transitions = []
>  
> @@ -542,51 +460,6 @@ class Automata:
>          states.insert(0, initial_state)
>          return states, initial_state, final_states
>  
> -    def __get_state_variables(self) -> tuple[list[str], str,
> list[str]]:
> -        # wait for node declaration
> -        states = []
> -        final_states = []
> -        initial_state = ""
> -
> -        has_final_states = False
> -        cursor = self.__get_cursor_begin_states()
> -
> -        # process nodes
> -        for line in islice(self.__dot_lines, cursor, None):
> -            split_line = line.split()
> -            if not split_line or split_line[0] != self.node_marker:
> -                break
> -
> -            raw_state = split_line[-1]
> -
> -            #  "enabled_fired"}; -> enabled_fired
> -            state = raw_state.replace('"', '').replace('};',
> '').replace(',', '_')
> -            if state.startswith(self.init_marker):
> -                initial_state = state[len(self.init_marker):]
> -            else:
> -                states.append(state)
> -                if "doublecircle" in line:
> -                    final_states.append(state)
> -                    has_final_states = True
> -
> -                if "ellipse" in line:
> -                    final_states.append(state)
> -                    has_final_states = True
> -
> -        if not initial_state:
> -            raise AutomataError("The automaton doesn't have an
> initial state")
> -
> -        states = sorted(set(states))
> -        states.remove(initial_state)
> -
> -        # Insert the initial state at the beginning of the states
> -        states.insert(0, initial_state)
> -
> -        if not has_final_states:
> -            final_states.append(initial_state)
> -
> -        return states, initial_state, final_states
> -
>      def __get_event_variables(self) -> tuple[list[str], list[str]]:
>          events: list[str] = []
>          envs: list[str] = []
> @@ -609,26 +482,6 @@ class Automata:
>  
>          return sorted(set(events)), sorted(set(envs))
>  
> -    def _split_constraint_expr(self, constr: list[str]) ->
> Iterator[tuple[str,
> -
>                                                                       
>     str | None]]:
> -        """
> -        Get a list of strings of the type constr1 && constr2 and
> returns a list of
> -        constraints and separators: [[constr1,"&&"],[constr2,None]]
> -        """
> -        exprs = []
> -        seps = []
> -        for c in constr:
> -            while "&&" in c or "||" in c:
> -                a = c.find("&&")
> -                o = c.find("||")
> -                pos = a if o < 0 or 0 < a < o else o
> -                exprs.append(c[:pos].replace(" ", ""))
> -                seps.append(c[pos:pos + 2].replace(" ", ""))
> -                c = c[pos + 2:].replace(" ", "")
> -            exprs.append(c)
> -            seps.append(None)
> -        return zip(exprs, seps)
> -
>      def __extract_env_var(self, constraint: ConstraintCondition):
>          if constraint.unit:
>              self.env_types[constraint.env] = constraint.unit
> @@ -697,10 +550,3 @@ class Automata:
>  
>      def is_hybrid_automata(self) -> bool:
>          return bool(self.envs)
> -
> -    def is_event_constraint(self, key: _ConstraintKey) -> bool:
> -        """
> -        Given the key in self.constraints return true if it is an
> event
> -        constraint, false if it is a state constraint
> -        """
> -        return isinstance(key, _EventConstraintKey)
> diff --git a/tools/verification/rvgen/rvgen/dot2k.py
> b/tools/verification/rvgen/rvgen/dot2k.py
> index 49cb3e724a93..d6779ac6b7dd 100644
> --- a/tools/verification/rvgen/rvgen/dot2k.py
> +++ b/tools/verification/rvgen/rvgen/dot2k.py
> @@ -11,9 +11,7 @@
>  from collections import deque
>  from .dot2c import Dot2c
>  from .generator import Monitor
> -from .automata import _EventConstraintKey, _StateConstraintKey,
> AutomataError
> -from .automata import ConstraintRule, ConstraintCondition
> -
> +from .automata import ConstraintRule, ConstraintCondition,
> AutomataError
>  
>  class dot2k(Monitor, Dot2c):
>      template_dir = "dot2k"
> @@ -217,9 +215,6 @@ class ha2k(dot2k):
>                  value *= 10**9
>          return str(value) + "ull"
>  
> -    def __parse_single_constraint(self, rule: dict, value: str) ->
> str:
> -        return f"ha_get_env(ha_mon, {rule["env"]}{self.enum_suffix},
> time_ns) {rule["op"]} {value}"
> -
>      def __parse_guard_rule(self, rule) -> list[str]:
>          buff = []
>          for c, sep in rule.rules:
> @@ -233,12 +228,6 @@ class ha2k(dot2k):
>              buff.append(cond)
>          return buff
>  
> -    def __get_constraint_env(self, constr: str) -> str:
> -        """Extract the second argument from an ha_ function"""
> -        env =
> constr.split("(")[1].split()[1].rstrip(")").rstrip(",")
> -        assert env.rstrip(f"_{self.name}") in self.envs
> -        return env
> -
>      def __start_to_invariant_check(self, inv: ConstraintCondition) -
> > str:
>          # by default assume the timer has ns expiration
>          clock_type = "ns"
> @@ -249,21 +238,6 @@ class ha2k(dot2k):
>  
>          return f"return ha_check_invariant_{clock_type}(ha_mon,
> {inv.env}_{self.name}, time_ns, {value})"
>  
> -    def __start_to_conv(self, constr: str) -> str:
> -        """
> -        Undo the storage conversion done by ha_start_timer_
> -        """
> -        return "ha_inv_to_guard" + constr[constr.find("("):]
> -
> -    def __parse_timer_constraint(self, rule: dict, value: str) ->
> str:
> -        # by default assume the timer has ns expiration
> -        clock_type = "ns"
> -        if self.env_types.get(rule["env"]) == "j":
> -            clock_type = "jiffy"
> -
> -        return (f"ha_start_timer_{clock_type}(ha_mon,
> {rule["env"]}{self.enum_suffix},"
> -                f" {value}, time_ns)")
> -
>      def __parse_invariant(self, inv):
>          # by default assume the timer has ns expiration
>          clock_type = "ns"


^ permalink raw reply

* Re: [PATCH v2 06/13] verification/rvgen: Convert __fill_verify_guards_func() to Lark
From: Gabriele Monaco @ 2026-06-03 16:00 UTC (permalink / raw)
  To: Nam Cao, Wander Lairson Costa, Steven Rostedt, linux-trace-kernel,
	linux-kernel
In-Reply-To: <2157c2e5fe34251c403604f0d58a3a686a6f6a8b.1779956342.git.namcao@linutronix.de>

On Thu, 2026-05-28 at 10:27 +0200, Nam Cao wrote:
> Prepare to remove self.guards and self.__parse_constraints(), convert
> __fill_verify_guards_func() to use the parsed transitions from Lark.
> 
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---

Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>

> v2:
>   - fix up the ';' separator [Gabriele]
>   - make return type consistent with the function's signature
> [Wander]
>   - fix up __parse_guard_rule()'s signature
> ---
>  tools/verification/rvgen/rvgen/dot2k.py | 38 +++++++++++++++++++----
> --
>  1 file changed, 30 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/verification/rvgen/rvgen/dot2k.py
> b/tools/verification/rvgen/rvgen/dot2k.py
> index d9f8e1c7737a..ebaa6c9135c2 100644
> --- a/tools/verification/rvgen/rvgen/dot2k.py
> +++ b/tools/verification/rvgen/rvgen/dot2k.py
> @@ -221,6 +221,19 @@ class ha2k(dot2k):
>      def __parse_single_constraint(self, rule: dict, value: str) ->
> str:
>          return f"ha_get_env(ha_mon, {rule["env"]}{self.enum_suffix},
> time_ns) {rule["op"]} {value}"
>  
> +    def __parse_guard_rule(self, rule) -> list[str]:
> +        buff = []
> +        for c, sep in rule.rules:
> +            env = c.env + self.enum_suffix
> +            op = c.op
> +            val = self.__adjust_value(c.val, c.unit)
> +
> +            cond = f"ha_get_env(ha_mon, {env}, time_ns) {op} {val}"
> +            if sep:
> +                cond += f" {sep}"
> +            buff.append(cond)
> +        return buff
> +
>      def __get_constraint_env(self, constr: str) -> str:
>          """Extract the second argument from an ha_ function"""
>          env =
> constr.split("(")[1].split()[1].rstrip(")").rstrip(",")
> @@ -287,7 +300,7 @@ class ha2k(dot2k):
>          rules = invalid_checks + rules
>  
>          separator = "\n\t\t      " if sum(len(r) for r in rules) >
> 80 else " "
> -        return ["res = " + separator.join(rules)]
> +        return ["res = " + separator.join(rules) + ";"]
>  
>      def __validate_constraint(self, key: tuple[int, int] | int,
> constr: str,
>                                rule, reset) -> None:
> @@ -406,7 +419,8 @@ f"""static inline void
> ha_convert_inv_guard(struct ha_monitor *ha_mon,
>  
>      def __fill_verify_guards_func(self) -> list[str]:
>          buff = []
> -        if not self.guards:
> +
> +        if not self.has_guard:
>              return []
>  
>          buff.append(
> @@ -418,14 +432,22 @@ f"""static inline bool ha_verify_guards(struct
> ha_monitor *ha_mon,
>  """)
>  
>          _else = ""
> -        for edge, constr in sorted(self.guards.items()):
> +        for transition in self.transitions:
> +            if not transition.rule and not transition.reset:
> +                continue
> +
>              buff.append(f"\t{_else}if (curr_state == "
> -                        f"{self.states[edge[0]]}{self.enum_suffix}
> && "
> -                        f"event ==
> {self.events[edge[1]]}{self.enum_suffix})")
> -            if constr.count(";") > 0:
> +                        f"{transition.src}{self.enum_suffix} && "
> +                        f"event ==
> {transition.event}{self.enum_suffix})")
> +            rule = transition.rule
> +            reset = transition.reset
> +            if rule and reset:
>                  buff[-1] += " {"
> -            buff += [f"\t\t{c};" for c in constr.split(";")]
> -            if constr.count(";") > 0:
> +            if rule:
> +                buff.append("\t\t" +
> self.__format_guard_rules(self.__parse_guard_rule(rule))[0])
> +            if reset:
> +                buff.append(f"\t\tha_reset_env(ha_mon,
> {reset.env}{self.enum_suffix}, time_ns);")
> +            if rule and reset:
>                  _else = "} else "
>              else:
>                  _else = "else "


^ permalink raw reply

* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Steven Rostedt @ 2026-06-03 16:17 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Zhuo, Qiuxu, mchehab+huawei@kernel.org, Luck, Tony, bp@alien8.de,
	akpm@linux-foundation.org, linmiaohe@huawei.com,
	xieyuanbin1@huawei.com, Lai, Yi1, linux-kernel@vger.kernel.org,
	linux-edac@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org, Linus Torvalds
In-Reply-To: <b930678d-a1ae-458c-8705-7ca9680d4cb6@kernel.org>

On Wed, 3 Jun 2026 15:44:54 +0200
"David Hildenbrand (Arm)" <david@kernel.org> wrote:

> Likely the latter. BPF [1] documents:
> 
> Q: Are tracepoints part of the stable ABI?
> A: NO. Tracepoints are tied to internal implementation details hence they are
> subject to change and can break with newer kernels. BPF programs need to change
> accordingly when this happens.
> 
> The Kernel ABI document explicitly doesn't list them AFAIKS.
> 
> There were previous discussions on the stability of tracepints [2], I don't know
> what changed in the meantime. CCing Steve
> 
> [1] https://www.kernel.org/doc/html/latest/bpf/bpf_design_QA.html
> [2] https://lwn.net/Articles/747256/
> [3] https://www.kernel.org/doc/html/latest/admin-guide/abi.html

Tracepoints are not stable or BPF programs only. But other applications
they are[1].

Adding Linus as he's the Supreme Judge on the matter.

-- Steve

[1] https://lwn.net/Articles/442113/

^ permalink raw reply

* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Borislav Petkov @ 2026-06-03 16:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David Hildenbrand (Arm), Zhuo, Qiuxu, mchehab+huawei@kernel.org,
	Luck, Tony, akpm@linux-foundation.org, linmiaohe@huawei.com,
	xieyuanbin1@huawei.com, Lai, Yi1, linux-kernel@vger.kernel.org,
	linux-edac@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org, Linus Torvalds
In-Reply-To: <20260603121707.7eccb9fb@gandalf.local.home>

On Wed, Jun 03, 2026 at 12:17:07PM -0400, Steven Rostedt wrote:
> On Wed, 3 Jun 2026 15:44:54 +0200
> "David Hildenbrand (Arm)" <david@kernel.org> wrote:
> 
> > Likely the latter. BPF [1] documents:
> > 
> > Q: Are tracepoints part of the stable ABI?
> > A: NO. Tracepoints are tied to internal implementation details hence they are
> > subject to change and can break with newer kernels. BPF programs need to change
> > accordingly when this happens.
> > 
> > The Kernel ABI document explicitly doesn't list them AFAIKS.
> > 
> > There were previous discussions on the stability of tracepints [2], I don't know
> > what changed in the meantime. CCing Steve
> > 
> > [1] https://www.kernel.org/doc/html/latest/bpf/bpf_design_QA.html
> > [2] https://lwn.net/Articles/747256/
> > [3] https://www.kernel.org/doc/html/latest/admin-guide/abi.html
> 
> Tracepoints are not stable or BPF programs only. But other applications
> they are[1].
> 
> Adding Linus as he's the Supreme Judge on the matter.

I *think* tools or libtraceevent can't really anticipate the TP namespace
change so we might have to revert, I'm afraid...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: David Hildenbrand (Arm) @ 2026-06-03 16:26 UTC (permalink / raw)
  To: Borislav Petkov, Steven Rostedt
  Cc: Zhuo, Qiuxu, mchehab+huawei@kernel.org, Luck, Tony,
	akpm@linux-foundation.org, linmiaohe@huawei.com,
	xieyuanbin1@huawei.com, Lai, Yi1, linux-kernel@vger.kernel.org,
	linux-edac@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org, Linus Torvalds
In-Reply-To: <20260603161947.GBaiBUI7C8WWPwD84S@fat_crate.local>

On 6/3/26 18:19, Borislav Petkov wrote:
> On Wed, Jun 03, 2026 at 12:17:07PM -0400, Steven Rostedt wrote:
>> On Wed, 3 Jun 2026 15:44:54 +0200
>> "David Hildenbrand (Arm)" <david@kernel.org> wrote:
>>
>>> Likely the latter. BPF [1] documents:
>>>
>>> Q: Are tracepoints part of the stable ABI?
>>> A: NO. Tracepoints are tied to internal implementation details hence they are
>>> subject to change and can break with newer kernels. BPF programs need to change
>>> accordingly when this happens.
>>>
>>> The Kernel ABI document explicitly doesn't list them AFAIKS.
>>>
>>> There were previous discussions on the stability of tracepints [2], I don't know
>>> what changed in the meantime. CCing Steve
>>>
>>> [1] https://www.kernel.org/doc/html/latest/bpf/bpf_design_QA.html
>>> [2] https://lwn.net/Articles/747256/
>>> [3] https://www.kernel.org/doc/html/latest/admin-guide/abi.html
>>
>> Tracepoints are not stable or BPF programs only. But other applications
>> they are[1].
>>
>> Adding Linus as he's the Supreme Judge on the matter.
> 
> I *think* tools or libtraceevent can't really anticipate the TP namespace
> change so we might have to revert, I'm afraid...

Yeah, I was fearing that when I read in [2]:

	"It has become clear in the past that this promise extends to
	 tracepoints, most notably in 2011 when a tracepoint change broke
	 powertop and had to be reverted."

Which means that I now also fully understand

	"Some kernel maintainers prohibit or severely restrict the addition of
	 tracepoints to their subsystems out of fear that a similar thing could
	 happen to them. "

Whatever the result of this discussion will be, I'll try to document it.

-- 
Cheers,

David

^ permalink raw reply

* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Steven Rostedt @ 2026-06-03 17:00 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Borislav Petkov, Zhuo, Qiuxu, mchehab+huawei@kernel.org,
	Luck, Tony, akpm@linux-foundation.org, linmiaohe@huawei.com,
	xieyuanbin1@huawei.com, Lai, Yi1, linux-kernel@vger.kernel.org,
	linux-edac@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org, Linus Torvalds
In-Reply-To: <0c16bf3d-7c6d-4e28-b200-03b7d0ef714a@kernel.org>

On Wed, 3 Jun 2026 18:26:24 +0200
"David Hildenbrand (Arm)" <david@kernel.org> wrote:

> Yeah, I was fearing that when I read in [2]:
> 
> 	"It has become clear in the past that this promise extends to
> 	 tracepoints, most notably in 2011 when a tracepoint change broke
> 	 powertop and had to be reverted."

Technically the issue is with trace events and not tracepoints. The
difference is that a trace event is created via the TRACE_EVENT() macro
which defines what is to be collected from the tracepoint and exposes that
information to tracefs which applications can easily see.

A tracepoint is simply the hook in the code that you can attach to. Trace
events create a callback from that hook to extract the data from the
tracepoint to fill in the fields.

> 
> Which means that I now also fully understand
> 
> 	"Some kernel maintainers prohibit or severely restrict the addition of
> 	 tracepoints to their subsystems out of fear that a similar thing could
> 	 happen to them. "
> 
> Whatever the result of this discussion will be, I'll try to document it.

You can still create a tracepoint without creating a trace event by using
the DECLARE_TRACE() macro. The scheduler subsystem uses that quite
extensively. That creates a tracepoint without exposing it to tracefs. The
runtime verifier uses these hooks to monitor the scheduler.

But you can still connect to these tracepoints from tracefs via a tprobe. A
tprobe hooks to tracepoints that you need the source code to find (just
like a fprobe hooks to any function). Thus applications *can't* rely on
them because there's nothing there to tell you it exists or not.

For example, for the given tracepoint:

 # cd /sys/kernel/tracing
 # echo 't:rfail memory_failure_event pfn=pfn type=type result=result' > dynamic_events
 # cat events/tracepoints/rfail/format 
name: rfail
ID: 1894
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:unsigned long __probe_ip;	offset:8;	size:8;	signed:0;
	field:u64 pfn;	offset:16;	size:8;	signed:0;
	field:s32 type;	offset:24;	size:4;	signed:1;
	field:s32 result;	offset:28;	size:4;	signed:1;

print fmt: "(%lx) pfn=%Lu type=%d result=%d", REC->__probe_ip, REC->pfn, REC->type, REC->result

It requires that BTF exists and the above doesn't annotate the result as
nicely. But you can get data directly from tracepoints this way.

-- Steve

^ permalink raw reply

* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: David Hildenbrand (Arm) @ 2026-06-03 19:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Borislav Petkov, Zhuo, Qiuxu, mchehab+huawei@kernel.org,
	Luck, Tony, akpm@linux-foundation.org, linmiaohe@huawei.com,
	xieyuanbin1@huawei.com, Lai, Yi1, linux-kernel@vger.kernel.org,
	linux-edac@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org, Linus Torvalds
In-Reply-To: <20260603130006.7d2c4a62@gandalf.local.home>

On 6/3/26 19:00, Steven Rostedt wrote:
> On Wed, 3 Jun 2026 18:26:24 +0200
> "David Hildenbrand (Arm)" <david@kernel.org> wrote:
> 
>> Yeah, I was fearing that when I read in [2]:
>>
>> 	"It has become clear in the past that this promise extends to
>> 	 tracepoints, most notably in 2011 when a tracepoint change broke
>> 	 powertop and had to be reverted."
> 
> Technically the issue is with trace events and not tracepoints. The
> difference is that a trace event is created via the TRACE_EVENT() macro
> which defines what is to be collected from the tracepoint and exposes that
> information to tracefs which applications can easily see.
> 
> A tracepoint is simply the hook in the code that you can attach to. Trace
> events create a callback from that hook to extract the data from the
> tracepoint to fill in the fields.
> 
>>
>> Which means that I now also fully understand
>>
>> 	"Some kernel maintainers prohibit or severely restrict the addition of
>> 	 tracepoints to their subsystems out of fear that a similar thing could
>> 	 happen to them. "
>>
>> Whatever the result of this discussion will be, I'll try to document it.
> 
> You can still create a tracepoint without creating a trace event by using
> the DECLARE_TRACE() macro. The scheduler subsystem uses that quite
> extensively. That creates a tracepoint without exposing it to tracefs. The
> runtime verifier uses these hooks to monitor the scheduler.
> 
> But you can still connect to these tracepoints from tracefs via a tprobe. A
> tprobe hooks to tracepoints that you need the source code to find (just
> like a fprobe hooks to any function). Thus applications *can't* rely on
> them because there's nothing there to tell you it exists or not.

Thanks, that makes sense!

So, would it be fair to say that, in general, what's exposed through

	/sys/kernel/tracing/events/

is stable ABI?


Would the following be sufficient to avoid a full revert and the dependency on CONFIG_RAS?

diff --git a/include/trace/events/memory-failure.h b/include/trace/events/memory-failure.h
index aa57cc8f896b..c46b17602578 100644
--- a/include/trace/events/memory-failure.h
+++ b/include/trace/events/memory-failure.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #undef TRACE_SYSTEM
-#define TRACE_SYSTEM memory_failure
+/* Some user space relies on ras/memory_failure_event */
+#define TRACE_SYSTEM ras
 #define TRACE_INCLUDE_FILE memory-failure
 
 #if !defined(_TRACE_MEMORY_FAILURE_H) || defined(TRACE_HEADER_MULTI_READ)


-- 
Cheers,

David

^ permalink raw reply related

* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Steven Rostedt @ 2026-06-03 19:30 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Borislav Petkov, Zhuo, Qiuxu, mchehab+huawei@kernel.org,
	Luck, Tony, akpm@linux-foundation.org, linmiaohe@huawei.com,
	xieyuanbin1@huawei.com, Lai, Yi1, linux-kernel@vger.kernel.org,
	linux-edac@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org, Linus Torvalds
In-Reply-To: <b637ede2-73da-49f0-a7eb-70ec79e79624@kernel.org>

On Wed, 3 Jun 2026 21:13:30 +0200
"David Hildenbrand (Arm)" <david@kernel.org> wrote:

> Would the following be sufficient to avoid a full revert and the dependency on CONFIG_RAS?
> 
> diff --git a/include/trace/events/memory-failure.h b/include/trace/events/memory-failure.h
> index aa57cc8f896b..c46b17602578 100644
> --- a/include/trace/events/memory-failure.h
> +++ b/include/trace/events/memory-failure.h
> @@ -1,6 +1,7 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>  #undef TRACE_SYSTEM
> -#define TRACE_SYSTEM memory_failure
> +/* Some user space relies on ras/memory_failure_event */
> +#define TRACE_SYSTEM ras

If that puts back the original path then yeah, all would be good.

-- Steve
  

>  #define TRACE_INCLUDE_FILE memory-failure
>  
>  #if !defined(_TRACE_MEMORY_FAILURE_H) || defined(TRACE_HEADER_MULTI_READ)


^ permalink raw reply

* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Steven Rostedt @ 2026-06-03 19:31 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Borislav Petkov, Zhuo, Qiuxu, mchehab+huawei@kernel.org,
	Luck, Tony, akpm@linux-foundation.org, linmiaohe@huawei.com,
	xieyuanbin1@huawei.com, Lai, Yi1, linux-kernel@vger.kernel.org,
	linux-edac@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org, Linus Torvalds
In-Reply-To: <b637ede2-73da-49f0-a7eb-70ec79e79624@kernel.org>

On Wed, 3 Jun 2026 21:13:30 +0200
"David Hildenbrand (Arm)" <david@kernel.org> wrote:

> Thanks, that makes sense!
> 
> So, would it be fair to say that, in general, what's exposed through
> 
> 	/sys/kernel/tracing/events/
> 
> is stable ABI?

It's only stable if something depends on it. It changes all the time.
It's only when someone complains about it that it becomes "stable"!

-- Steve

^ permalink raw reply

* Re: mm/memory-failure tracepoint change breaks userspace rasdaemon
From: Andrew Morton @ 2026-06-03 19:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David Hildenbrand (Arm), Borislav Petkov, Zhuo, Qiuxu,
	mchehab+huawei@kernel.org, Luck, Tony, linmiaohe@huawei.com,
	xieyuanbin1@huawei.com, Lai, Yi1, linux-kernel@vger.kernel.org,
	linux-edac@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org, Linus Torvalds
In-Reply-To: <20260603130006.7d2c4a62@gandalf.local.home>

On Wed, 3 Jun 2026 13:00:06 -0400 Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 3 Jun 2026 18:26:24 +0200
> "David Hildenbrand (Arm)" <david@kernel.org> wrote:
> 
> > Yeah, I was fearing that when I read in [2]:
> > 
> > 	"It has become clear in the past that this promise extends to
> > 	 tracepoints, most notably in 2011 when a tracepoint change broke
> > 	 powertop and had to be reverted."
> 
> Technically the issue is with trace events and not tracepoints. The
> difference is that a trace event is created via the TRACE_EVENT() macro
> which defines what is to be collected from the tracepoint and exposes that
> information to tracefs which applications can easily see.
> 
> A tracepoint is simply the hook in the code that you can attach to. Trace
> events create a callback from that hook to extract the data from the
> tracepoint to fill in the fields.

The problem here appears to be that "ras:memory_failure_event" became
"memory_failure:memory_failure_event".

Perhaps we can add infrastructure to permit aliasing "ras" onto
"memory_failure".  So if we make these namespace alterations we can
easily preserve back-compatibility?


^ permalink raw reply

* Re: [PATCH v7 00/42] guest_memfd: In-place conversion support
From: Ackerley Tng @ 2026-06-03 21:27 UTC (permalink / raw)
  To: Ackerley Tng via B4 Relay, aik, andrew.jones, binbin.wu, brauner,
	chao.p.peng, david, ira.weiny, jmattson, jthoughton, michael.roth,
	oupton, pankaj.gupta, qperret, rick.p.edgecombe, rientjes,
	shivankg, steven.price, tabba, willy, wyihan, yan.y.zhao,
	forkloop, pratyush, suzuki.poulose, aneesh.kumar, liam,
	Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
	Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka
  Cc: kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <20260522-gmem-inplace-conversion-v7-0-2f0fae496530@google.com>

Ackerley Tng via B4 Relay <devnull+ackerleytng.google.com@kernel.org>
writes:

> This is v7 of guest_memfd in-place conversion support.
>

Here's the outstanding items after going over everyone's comments
including Sashiko's:

+ KVM: TDX: Make source page optional for KVM_TDX_INIT_MEM_REGION
    + Need to move page clearing into __kvm_gmem_get_pfn to resolve
      leak where populate can put initialized kernel memory into TDX
      guest
    + See suggested fix at [1]
+ KVM: guest_memfd: Only prepare folios for private pages,
    + s/non-CoCo/CoCo in commit message "INIT_SHARED is about to be
      supported for non-CoCo VMs in a later patch in this series
    + Use Suggested-by: Michael Roth <michael.roth@amd.com>
+ KVM: selftests: Test that shared/private status is consistent across
  processes
    + Improve test reliability using pthread_mutex
    + I have a fixup patch offline.
	
I would like feedback on these:
	
+ KVM: selftests: Test conversion with elevated page refcount
    + Askar pointed out that soon vmsplice may not pin pages. Should I
      pin pages through CONFIG_GUP_TEST like in [2]? I prefer not to
      take a dependency on CONFIG_GUP_TEST.
+ KVM: selftests: Add script to exercise private_mem_conversions_test
    + Would like to know what people think of a wrapper script before
      I address Sashiko's comments.

[1] https://lore.kernel.org/all/CAEvNRgEVC=fFuKVgZYvWyZD7t_zvUZihFG8hrACjvtkD5cwugw@mail.gmail.com/
[2] https://lore.kernel.org/all/baa8838f623102931e755cf34c86314b305af49c.1747264138.git.ackerleytng@google.com/

>
> [...snip...]
>

^ permalink raw reply

* Re: [PATCH bpf-next v2 2/3] tracing: Expose tracepoint BTF ids via tracefs
From: Andrii Nakryiko @ 2026-06-03 22:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mykyta Yatsenko, bpf, Mykyta Yatsenko, linux-trace-kernel,
	Andrii Nakryiko, Alexei Starovoitov
In-Reply-To: <20260526215741.1e5b3e42@fedora>

On Tue, May 26, 2026 at 6:57 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 26 May 2026 11:07:56 +0100
> Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> wrote:
>
> > Hi Steven,
> >
> > Gentle ping on this patch from the series.
> >
> > Since this part touches tracing, I’d appreciate your thoughts on the
> > tracing changes whenever you have a chance.
> >
>
> Hi,
>
> I've been looking at this and was wondering if there are ways to not
> extend the trace_event_class structure. It's added for most trace
> events (actually each DECLARE_EVENT_CLASS). Although when things like
> BTF is enabled, this is a very small amount of extra memory.
>
> I haven't been ignoring this. I've just been thinking about other
> approaches, but haven't come up with anything. Of course, I haven't
> been spending that much time on it, as I've been focused on other
> things.

Just in theory, what alternative would there be besides having one
extra pointer in trace_event_class struct? Some sort of lookup by name
or something? E.g., if we know "call" part at runtime for any given
tracepoint for

.btf_ids                = __bpf_trace_btf_ids_##call,

we can probably lookup symbol from kallsyms and fetch BTF IDs that way
without extending the struct?

FWIW, this type info for tracepoints (classic and raw both) are very
useful, because right now one needs to do a bunch of work (subject to
break due to kernel type/name changes, etc) to find this, and for
various generic tracing tooling this type information is actually a
necessity to be useful.

That's just to say that it would be great to have this in some form of
shape, so please help getting this into acceptable form, thanks!

>
> -- Steve

^ permalink raw reply

* Re: [PATCH bpf-next v2 2/3] tracing: Expose tracepoint BTF ids via tracefs
From: Steven Rostedt @ 2026-06-03 22:50 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Mykyta Yatsenko, bpf, Mykyta Yatsenko, linux-trace-kernel,
	Andrii Nakryiko, Alexei Starovoitov
In-Reply-To: <CAEf4BzY5MTGJyys369qDS3b63-tKq3okCpFMqUdfzy0dXk7vLg@mail.gmail.com>

On Wed, 3 Jun 2026 15:41:50 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> That's just to say that it would be great to have this in some form of
> shape, so please help getting this into acceptable form, thanks!

As I said, I'm not against it. I was hoping to find something to help
alleviate the memory usage.

I'll likely take these as is, but I'm currently on vacation and it will
have to wait until next week.

-- Steve
 

^ permalink raw reply

* Re: [GIT PULL] rv fixes for v7.1
From: Steven Rostedt @ 2026-06-03 23:16 UTC (permalink / raw)
  To: Gabriele Monaco; +Cc: linux-kernel, linux-trace-kernel, unknownbbqrx, Wen Yang
In-Reply-To: <20260603125056.75559-1-gmonaco@redhat.com>

On Wed,  3 Jun 2026 14:50:56 +0200
Gabriele Monaco <gmonaco@redhat.com> wrote:

> unknownbbqrx (1):
>       tools/rv: Ensure monitor name and desc are NUL-terminated

Hi Gabriele,

What is this? All commits need to be authored by and signed off by from
a real person with their official name.

 https://docs.kernel.org/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin

-- Steve

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox