Live Patching

Live Patching
 help / color / mirror / Atom feed

* [RFC PATCH 1/2] kernel/notifier: replace single-linked list with double-linked list for reverse traversal
From: chensong_2000 @ 2026-04-15  7:01 UTC (permalink / raw)
  To: rafael, lenb, mturquette, sboyd, viresh.kumar, agk, snitzer,
	mpatocka, bmarzins, song, yukuai, linan122, jason.wessel, danielt,
	dianders, horms, davem, edumazet, kuba, pabeni, paulmck, frederic,
	mcgrof, petr.pavlu, da.gomez, samitolvanen, atomlin, jpoimboe,
	jikos, mbenes, pmladek, joe.lawrence, rostedt, mhiramat,
	mark.rutland, mathieu.desnoyers
  Cc: linux-modules, linux-kernel, linux-trace-kernel, linux-acpi,
	linux-clk, linux-pm, live-patching, dm-devel, linux-raid,
	kgdb-bugreport, netdev, Song Chen

From: Song Chen <chensong_2000@189.cn>

The current notifier chain implementation uses a single-linked list
(struct notifier_block *next), which only supports forward traversal
in priority order. This makes it difficult to handle cleanup/teardown
scenarios that require notifiers to be called in reverse priority order.

A concrete example is the ordering dependency between ftrace and
livepatch during module load/unload. see the detail here [1].

This patch replaces the single-linked list in struct notifier_block
with a struct list_head, converting the notifier chain into a
doubly-linked list sorted in descending priority order. Based on
this, a new function notifier_call_chain_reverse() is introduced,
which traverses the chain in reverse (ascending priority order).
The corresponding blocking_notifier_call_chain_reverse() is also
added as the locking wrapper for blocking notifier chains.

The internal notifier_call_chain_robust() is updated to use
notifier_call_chain_reverse() for rollback: on error, it records
the failing notifier (last_nb) and the count of successfully called
notifiers (nr), then rolls back exactly those nr-1 notifiers in
reverse order starting from last_nb's predecessor, without needing
to know the total length of the chain.

With this change, subsystems with symmetric setup/teardown ordering
requirements can register a single notifier_block with one priority
value, and rely on blocking_notifier_call_chain() for forward
traversal and blocking_notifier_call_chain_reverse() for reverse
traversal, without needing hard-coded call sequences or separate
notifier registrations for each direction.

[1]:https://lore.kernel.org/all
	/alpine.LNX.2.00.1602172216491.22700@cbobk.fhfr.pm/

Signed-off-by: Song Chen <chensong_2000@189.cn>
---
 drivers/acpi/sleep.c      |   1 -
 drivers/clk/clk.c         |   2 +-
 drivers/cpufreq/cpufreq.c |   2 +-
 drivers/md/dm-integrity.c |   1 -
 drivers/md/md.c           |   1 -
 include/linux/notifier.h  |  26 ++---
 kernel/debug/debug_core.c |   1 -
 kernel/notifier.c         | 219 ++++++++++++++++++++++++++++++++------
 net/ipv4/nexthop.c        |   2 +-
 9 files changed, 201 insertions(+), 54 deletions(-)

diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c
index 132a9df98471..b776dbd5a382 100644
--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
@@ -56,7 +56,6 @@ static int tts_notify_reboot(struct notifier_block *this,
 
 static struct notifier_block tts_notifier = {
 	.notifier_call	= tts_notify_reboot,
-	.next		= NULL,
 	.priority	= 0,
 };
 
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 47093cda9df3..b6fe380d0468 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -4862,7 +4862,7 @@ int clk_notifier_unregister(struct clk *clk, struct notifier_block *nb)
 			clk->core->notifier_count--;
 
 			/* XXX the notifier code should handle this better */
-			if (!cn->notifier_head.head) {
+			if (list_empty(&cn->notifier_head.head)) {
 				srcu_cleanup_notifier_head(&cn->notifier_head);
 				list_del(&cn->node);
 				kfree(cn);
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 277884d91913..12637e742ffa 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -445,7 +445,7 @@ static void cpufreq_list_transition_notifiers(void)
 
 	mutex_lock(&cpufreq_transition_notifier_list.mutex);
 
-	for (nb = cpufreq_transition_notifier_list.head; nb; nb = nb->next)
+	list_for_each_entry(nb, &cpufreq_transition_notifier_list.head, entry)
 		pr_info("%pS\n", nb->notifier_call);
 
 	mutex_unlock(&cpufreq_transition_notifier_list.mutex);
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index 06e805902151..ccdf75c40b62 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -3909,7 +3909,6 @@ static void dm_integrity_resume(struct dm_target *ti)
 	}
 
 	ic->reboot_notifier.notifier_call = dm_integrity_reboot;
-	ic->reboot_notifier.next = NULL;
 	ic->reboot_notifier.priority = INT_MAX - 1;	/* be notified after md and before hardware drivers */
 	WARN_ON(register_reboot_notifier(&ic->reboot_notifier));
 
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3ce6f9e9d38e..8249e78636ab 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -10480,7 +10480,6 @@ static int md_notify_reboot(struct notifier_block *this,
 
 static struct notifier_block md_notifier = {
 	.notifier_call	= md_notify_reboot,
-	.next		= NULL,
 	.priority	= INT_MAX, /* before any real devices */
 };
 
diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 01b6c9d9956f..b2abbdfcaadd 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -53,41 +53,41 @@ typedef	int (*notifier_fn_t)(struct notifier_block *nb,
 
 struct notifier_block {
 	notifier_fn_t notifier_call;
-	struct notifier_block __rcu *next;
+	struct list_head __rcu entry;
 	int priority;
 };
 
 struct atomic_notifier_head {
 	spinlock_t lock;
-	struct notifier_block __rcu *head;
+	struct list_head __rcu head;
 };
 
 struct blocking_notifier_head {
 	struct rw_semaphore rwsem;
-	struct notifier_block __rcu *head;
+	struct list_head __rcu head;
 };
 
 struct raw_notifier_head {
-	struct notifier_block __rcu *head;
+	struct list_head __rcu head;
 };
 
 struct srcu_notifier_head {
 	struct mutex mutex;
 	struct srcu_usage srcuu;
 	struct srcu_struct srcu;
-	struct notifier_block __rcu *head;
+	struct list_head __rcu head;
 };
 
 #define ATOMIC_INIT_NOTIFIER_HEAD(name) do {	\
 		spin_lock_init(&(name)->lock);	\
-		(name)->head = NULL;		\
+		INIT_LIST_HEAD(&(name)->head);		\
 	} while (0)
 #define BLOCKING_INIT_NOTIFIER_HEAD(name) do {	\
 		init_rwsem(&(name)->rwsem);	\
-		(name)->head = NULL;		\
+		INIT_LIST_HEAD(&(name)->head);		\
 	} while (0)
 #define RAW_INIT_NOTIFIER_HEAD(name) do {	\
-		(name)->head = NULL;		\
+		INIT_LIST_HEAD(&(name)->head);		\
 	} while (0)
 
 /* srcu_notifier_heads must be cleaned up dynamically */
@@ -97,17 +97,17 @@ extern void srcu_init_notifier_head(struct srcu_notifier_head *nh);
 
 #define ATOMIC_NOTIFIER_INIT(name) {				\
 		.lock = __SPIN_LOCK_UNLOCKED(name.lock),	\
-		.head = NULL }
+		.head = LIST_HEAD_INIT((name).head) }
 #define BLOCKING_NOTIFIER_INIT(name) {				\
 		.rwsem = __RWSEM_INITIALIZER((name).rwsem),	\
-		.head = NULL }
+		.head = LIST_HEAD_INIT((name).head) }
 #define RAW_NOTIFIER_INIT(name)	{				\
-		.head = NULL }
+		.head = LIST_HEAD_INIT((name).head) }
 
 #define SRCU_NOTIFIER_INIT(name, pcpu)				\
 	{							\
 		.mutex = __MUTEX_INITIALIZER(name.mutex),	\
-		.head = NULL,					\
+		.head = LIST_HEAD_INIT((name).head),					\
 		.srcuu = __SRCU_USAGE_INIT(name.srcuu),		\
 		.srcu = __SRCU_STRUCT_INIT(name.srcu, name.srcuu, pcpu, 0), \
 	}
@@ -170,6 +170,8 @@ extern int atomic_notifier_call_chain(struct atomic_notifier_head *nh,
 		unsigned long val, void *v);
 extern int blocking_notifier_call_chain(struct blocking_notifier_head *nh,
 		unsigned long val, void *v);
+extern int blocking_notifier_call_chain_reverse(struct blocking_notifier_head *nh,
+		unsigned long val, void *v);
 extern int raw_notifier_call_chain(struct raw_notifier_head *nh,
 		unsigned long val, void *v);
 extern int srcu_notifier_call_chain(struct srcu_notifier_head *nh,
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 0b9495187fba..a26a7683d142 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -1054,7 +1054,6 @@ dbg_notify_reboot(struct notifier_block *this, unsigned long code, void *x)
 
 static struct notifier_block dbg_reboot_notifier = {
 	.notifier_call		= dbg_notify_reboot,
-	.next			= NULL,
 	.priority		= INT_MAX,
 };
 
diff --git a/kernel/notifier.c b/kernel/notifier.c
index 2f9fe7c30287..6f4d887771c4 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -14,39 +14,47 @@
  *	are layered on top of these, with appropriate locking added.
  */
 
-static int notifier_chain_register(struct notifier_block **nl,
+static int notifier_chain_register(struct list_head *nl,
 				   struct notifier_block *n,
 				   bool unique_priority)
 {
-	while ((*nl) != NULL) {
-		if (unlikely((*nl) == n)) {
+	struct notifier_block *cur;
+
+	list_for_each_entry(cur, nl, entry) {
+		if (unlikely(cur == n)) {
 			WARN(1, "notifier callback %ps already registered",
 			     n->notifier_call);
 			return -EEXIST;
 		}
-		if (n->priority > (*nl)->priority)
-			break;
-		if (n->priority == (*nl)->priority && unique_priority)
+
+		if (n->priority == cur->priority && unique_priority)
 			return -EBUSY;
-		nl = &((*nl)->next);
+
+		if (n->priority > cur->priority) {
+			list_add_tail(&n->entry, &cur->entry);
+			goto out;
+		}
 	}
-	n->next = *nl;
-	rcu_assign_pointer(*nl, n);
+
+	list_add_tail(&n->entry, nl);
+out:
 	trace_notifier_register((void *)n->notifier_call);
 	return 0;
 }
 
-static int notifier_chain_unregister(struct notifier_block **nl,
+static int notifier_chain_unregister(struct list_head *nl,
 		struct notifier_block *n)
 {
-	while ((*nl) != NULL) {
-		if ((*nl) == n) {
-			rcu_assign_pointer(*nl, n->next);
+	struct notifier_block *cur;
+
+	list_for_each_entry(cur, nl, entry) {
+		if (cur == n) {
+			list_del(&n->entry);
 			trace_notifier_unregister((void *)n->notifier_call);
 			return 0;
 		}
-		nl = &((*nl)->next);
 	}
+
 	return -ENOENT;
 }
 
@@ -59,25 +67,25 @@ static int notifier_chain_unregister(struct notifier_block **nl,
  *			value of this parameter is -1.
  *	@nr_calls:	Records the number of notifications sent. Don't care
  *			value of this field is NULL.
+ *	@last_nb:  Records the last called notifier block for rolling back
  *	Return:		notifier_call_chain returns the value returned by the
  *			last notifier function called.
  */
-static int notifier_call_chain(struct notifier_block **nl,
+static int notifier_call_chain(struct list_head *nl,
 			       unsigned long val, void *v,
-			       int nr_to_call, int *nr_calls)
+			       int nr_to_call, int *nr_calls,
+				   struct notifier_block **last_nb)
 {
 	int ret = NOTIFY_DONE;
-	struct notifier_block *nb, *next_nb;
-
-	nb = rcu_dereference_raw(*nl);
+	struct notifier_block *nb;
 
-	while (nb && nr_to_call) {
-		next_nb = rcu_dereference_raw(nb->next);
+	if (!nr_to_call)
+		return ret;
 
+	list_for_each_entry(nb, nl, entry) {
 #ifdef CONFIG_DEBUG_NOTIFIERS
 		if (unlikely(!func_ptr_is_kernel_text(nb->notifier_call))) {
 			WARN(1, "Invalid notifier called!");
-			nb = next_nb;
 			continue;
 		}
 #endif
@@ -87,15 +95,118 @@ static int notifier_call_chain(struct notifier_block **nl,
 		if (nr_calls)
 			(*nr_calls)++;
 
+		if (last_nb)
+			*last_nb = nb;
+
 		if (ret & NOTIFY_STOP_MASK)
 			break;
-		nb = next_nb;
-		nr_to_call--;
+
+		if (nr_to_call-- == 0)
+			break;
 	}
 	return ret;
 }
 NOKPROBE_SYMBOL(notifier_call_chain);
 
+/**
+ * notifier_call_chain_reverse - Informs the registered notifiers
+ *			about an event reversely.
+ *	@nl:		Pointer to head of the blocking notifier chain
+ *	@val:		Value passed unmodified to notifier function
+ *	@v:		Pointer passed unmodified to notifier function
+ *	@nr_to_call:	Number of notifier functions to be called. Don't care
+ *			value of this parameter is -1.
+ *	@nr_calls:	Records the number of notifications sent. Don't care
+ *			value of this field is NULL.
+ *	Return:		notifier_call_chain returns the value returned by the
+ *			last notifier function called.
+ */
+static int notifier_call_chain_reverse(struct list_head *nl,
+					struct notifier_block *start,
+					unsigned long val, void *v,
+					int nr_to_call, int *nr_calls)
+{
+	int ret = NOTIFY_DONE;
+	struct notifier_block *nb;
+	bool do_call = (start == NULL);
+
+	if (!nr_to_call)
+		return ret;
+
+	list_for_each_entry_reverse(nb, nl, entry) {
+		if (!do_call) {
+			if (nb == start)
+				do_call = true;
+			continue;
+		}
+#ifdef CONFIG_DEBUG_NOTIFIERS
+		if (unlikely(!func_ptr_is_kernel_text(nb->notifier_call))) {
+			WARN(1, "Invalid notifier called!");
+			continue;
+		}
+#endif
+		trace_notifier_run((void *)nb->notifier_call);
+		ret = nb->notifier_call(nb, val, v);
+
+		if (nr_calls)
+			(*nr_calls)++;
+
+		if (ret & NOTIFY_STOP_MASK)
+			break;
+
+		if (nr_to_call-- == 0)
+			break;
+	}
+	return ret;
+}
+NOKPROBE_SYMBOL(notifier_call_chain_reverse);
+
+/**
+ * notifier_call_chain_rcu - Informs the registered notifiers
+ *			about an event for srcu notifier chain.
+ *	@nl:		Pointer to head of the blocking notifier chain
+ *	@val:		Value passed unmodified to notifier function
+ *	@v:		Pointer passed unmodified to notifier function
+ *	@nr_to_call:	Number of notifier functions to be called. Don't care
+ *			value of this parameter is -1.
+ *	@nr_calls:	Records the number of notifications sent. Don't care
+ *			value of this field is NULL.
+ *	Return:		notifier_call_chain returns the value returned by the
+ *			last notifier function called.
+ */
+static int notifier_call_chain_rcu(struct list_head *nl,
+			       unsigned long val, void *v,
+			       int nr_to_call, int *nr_calls)
+{
+	int ret = NOTIFY_DONE;
+	struct notifier_block *nb;
+
+	if (!nr_to_call)
+		return ret;
+
+	list_for_each_entry_rcu(nb, nl, entry) {
+#ifdef CONFIG_DEBUG_NOTIFIERS
+		if (unlikely(!func_ptr_is_kernel_text(nb->notifier_call))) {
+			WARN(1, "Invalid notifier called!");
+			continue;
+		}
+#endif
+		trace_notifier_run((void *)nb->notifier_call);
+		ret = nb->notifier_call(nb, val, v);
+
+		if (nr_calls)
+			(*nr_calls)++;
+
+		if (ret & NOTIFY_STOP_MASK)
+			break;
+
+		if (nr_to_call-- == 0)
+			break;
+	}
+	return ret;
+}
+NOKPROBE_SYMBOL(notifier_call_chain_rcu);
+
 /**
  * notifier_call_chain_robust - Inform the registered notifiers about an event
  *                              and rollback on error.
@@ -111,15 +222,16 @@ NOKPROBE_SYMBOL(notifier_call_chain);
  *
  * Return:	the return value of the @val_up call.
  */
-static int notifier_call_chain_robust(struct notifier_block **nl,
+static int notifier_call_chain_robust(struct list_head *nl,
 				     unsigned long val_up, unsigned long val_down,
 				     void *v)
 {
 	int ret, nr = 0;
+	struct notifier_block *last_nb = NULL;
 
-	ret = notifier_call_chain(nl, val_up, v, -1, &nr);
+	ret = notifier_call_chain(nl, val_up, v, -1, &nr, &last_nb);
 	if (ret & NOTIFY_STOP_MASK)
-		notifier_call_chain(nl, val_down, v, nr-1, NULL);
+		notifier_call_chain_reverse(nl, last_nb, val_down, v, nr-1, NULL);
 
 	return ret;
 }
@@ -220,7 +332,7 @@ int atomic_notifier_call_chain(struct atomic_notifier_head *nh,
 	int ret;
 
 	rcu_read_lock();
-	ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
+	ret = notifier_call_chain(&nh->head, val, v, -1, NULL, NULL);
 	rcu_read_unlock();
 
 	return ret;
@@ -238,7 +350,7 @@ NOKPROBE_SYMBOL(atomic_notifier_call_chain);
  */
 bool atomic_notifier_call_chain_is_empty(struct atomic_notifier_head *nh)
 {
-	return !rcu_access_pointer(nh->head);
+	return list_empty(&nh->head);
 }
 
 /*
@@ -340,7 +452,7 @@ int blocking_notifier_call_chain_robust(struct blocking_notifier_head *nh,
 	 * racy then it does not matter what the result of the test
 	 * is, we re-check the list after having taken the lock anyway:
 	 */
-	if (rcu_access_pointer(nh->head)) {
+	if (!list_empty(&nh->head)) {
 		down_read(&nh->rwsem);
 		ret = notifier_call_chain_robust(&nh->head, val_up, val_down, v);
 		up_read(&nh->rwsem);
@@ -375,15 +487,52 @@ int blocking_notifier_call_chain(struct blocking_notifier_head *nh,
 	 * racy then it does not matter what the result of the test
 	 * is, we re-check the list after having taken the lock anyway:
 	 */
-	if (rcu_access_pointer(nh->head)) {
+	if (!list_empty(&nh->head)) {
 		down_read(&nh->rwsem);
-		ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
+		ret = notifier_call_chain(&nh->head, val, v, -1, NULL, NULL);
 		up_read(&nh->rwsem);
 	}
 	return ret;
 }
 EXPORT_SYMBOL_GPL(blocking_notifier_call_chain);
 
+/**
+ *	blocking_notifier_call_chain_reverse - Call functions reversely in
+ *				a blocking notifier chain
+ *	@nh: Pointer to head of the blocking notifier chain
+ *	@val: Value passed unmodified to notifier function
+ *	@v: Pointer passed unmodified to notifier function
+ *
+ *	Calls each function in a notifier chain in turn.  The functions
+ *	run in a process context, so they are allowed to block.
+ *
+ *	If the return value of the notifier can be and'ed
+ *	with %NOTIFY_STOP_MASK then blocking_notifier_call_chain()
+ *	will return immediately, with the return value of
+ *	the notifier function which halted execution.
+ *	Otherwise the return value is the return value
+ *	of the last notifier function called.
+ */
+
+int blocking_notifier_call_chain_reverse(struct blocking_notifier_head *nh,
+		unsigned long val, void *v)
+{
+	int ret = NOTIFY_DONE;
+
+	/*
+	 * We check the head outside the lock, but if this access is
+	 * racy then it does not matter what the result of the test
+	 * is, we re-check the list after having taken the lock anyway:
+	 */
+	if (!list_empty(&nh->head)) {
+		down_read(&nh->rwsem);
+		ret = notifier_call_chain_reverse(&nh->head, NULL, val, v, -1, NULL);
+		up_read(&nh->rwsem);
+	}
+	return ret;
+}
+EXPORT_SYMBOL_GPL(blocking_notifier_call_chain_reverse);
+
 /*
  *	Raw notifier chain routines.  There is no protection;
  *	the caller must provide it.  Use at your own risk!
@@ -450,7 +599,7 @@ EXPORT_SYMBOL_GPL(raw_notifier_call_chain_robust);
 int raw_notifier_call_chain(struct raw_notifier_head *nh,
 		unsigned long val, void *v)
 {
-	return notifier_call_chain(&nh->head, val, v, -1, NULL);
+	return notifier_call_chain(&nh->head, val, v, -1, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(raw_notifier_call_chain);
 
@@ -543,7 +692,7 @@ int srcu_notifier_call_chain(struct srcu_notifier_head *nh,
 	int idx;
 
 	idx = srcu_read_lock(&nh->srcu);
-	ret = notifier_call_chain(&nh->head, val, v, -1, NULL);
+	ret = notifier_call_chain_rcu(&nh->head, val, v, -1, NULL);
 	srcu_read_unlock(&nh->srcu, idx);
 	return ret;
 }
@@ -566,7 +715,7 @@ void srcu_init_notifier_head(struct srcu_notifier_head *nh)
 	mutex_init(&nh->mutex);
 	if (init_srcu_struct(&nh->srcu) < 0)
 		BUG();
-	nh->head = NULL;
+	INIT_LIST_HEAD(&nh->head);
 }
 EXPORT_SYMBOL_GPL(srcu_init_notifier_head);
 
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index c942f1282236..0afcba2967c7 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -90,7 +90,7 @@ static const struct nla_policy rtm_nh_res_bucket_policy_get[] = {
 
 static bool nexthop_notifiers_is_empty(struct net *net)
 {
-	return !net->nexthop.notifier_chain.head;
+	return list_empty(&net->nexthop.notifier_chain.head);
 }
 
 static void
-- 
2.43.0


^ permalink raw reply related

* Re: [RFC PATCH 2/2] kernel/module: Decouple klp and ftrace from load_module
From: Song Chen @ 2026-04-15  6:43 UTC (permalink / raw)
  To: Petr Pavlu
  Cc: rafael, lenb, mturquette, sboyd, viresh.kumar, agk, snitzer,
	mpatocka, bmarzins, song, yukuai, linan122, jason.wessel, danielt,
	dianders, horms, davem, edumazet, kuba, pabeni, paulmck, frederic,
	mcgrof, da.gomez, samitolvanen, atomlin, jpoimboe, jikos, mbenes,
	pmladek, joe.lawrence, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-modules, linux-kernel,
	linux-trace-kernel, linux-acpi, linux-clk, linux-pm,
	live-patching, dm-devel, linux-raid, kgdb-bugreport, netdev
In-Reply-To: <1191caf5-6a61-4622-a15e-854d3701f4fc@suse.com>

Hi,

On 4/14/26 22:33, Petr Pavlu wrote:
> On 4/13/26 10:07 AM, chensong_2000@189.cn wrote:
>> From: Song Chen <chensong_2000@189.cn>
>>
>> ftrace and livepatch currently have their module load/unload callbacks
>> hard-coded in the module loader as direct function calls to
>> ftrace_module_enable(), klp_module_coming(), klp_module_going()
>> and ftrace_release_mod(). This tight coupling was originally introduced
>> to enforce strict call ordering that could not be guaranteed by the
>> module notifier chain, which only supported forward traversal. Their
>> notifiers were moved in and out back and forth. see [1] and [2].
> 
> I'm unclear about what is meant by the notifiers being moved back and
> forth. The links point to patches that converted ftrace+klp from using
> module notifiers to explicit callbacks due to ordering issues, but this
> switch occurred only once. Have there been other attempts to use
> notifiers again?
> 

Yes,only once,i will rephrase.

>>
>> Now that the notifier chain supports reverse traversal via
>> blocking_notifier_call_chain_reverse(), the ordering can be enforced
>> purely through notifier priority. As a result, the module loader is now
>> decoupled from the implementation details of ftrace and livepatch.
>> What's more, adding a new subsystem with symmetric setup/teardown ordering
>> requirements during module load/unload no longer requires modifying
>> kernel/module/main.c; it only needs to register a notifier_block with an
>> appropriate priority.
>>
>> [1]:https://lore.kernel.org/all/
>> 	alpine.LNX.2.00.1602172216491.22700@cbobk.fhfr.pm/
>> [2]:https://lore.kernel.org/all/
>> 	20160301030034.GC12120@packer-debian-8-amd64.digitalocean.com/
> 
> Nit: Avoid wrapping URLs, as it breaks autolinking and makes the links
> harder to copy.
> 
> Better links would be:
> [1] https://lore.kernel.org/all/1455661953-15838-1-git-send-email-jeyu@redhat.com/
> [2] https://lore.kernel.org/all/1458176139-17455-1-git-send-email-jeyu@redhat.com/
> 
> The first link is the final version of what landed as commit
> 7dcd182bec27 ("ftrace/module: remove ftrace module notifier"). The
> second is commit 7e545d6eca20 ("livepatch/module: remove livepatch
> module notifier").
> 

Thank you, i will update.

>>
>> Signed-off-by: Song Chen <chensong_2000@189.cn>
>> ---
>>   include/linux/module.h  |  8 ++++++++
>>   kernel/livepatch/core.c | 29 ++++++++++++++++++++++++++++-
>>   kernel/module/main.c    | 34 +++++++++++++++-------------------
>>   kernel/trace/ftrace.c   | 38 ++++++++++++++++++++++++++++++++++++++
>>   4 files changed, 89 insertions(+), 20 deletions(-)
>>
>> diff --git a/include/linux/module.h b/include/linux/module.h
>> index 14f391b186c6..0bdd56f9defd 100644
>> --- a/include/linux/module.h
>> +++ b/include/linux/module.h
>> @@ -308,6 +308,14 @@ enum module_state {
>>   	MODULE_STATE_COMING,	/* Full formed, running module_init. */
>>   	MODULE_STATE_GOING,	/* Going away. */
>>   	MODULE_STATE_UNFORMED,	/* Still setting it up. */
>> +	MODULE_STATE_FORMED,
> 
> I don't see a reason to add a new module state. Why is it necessary and
> how does it fit with the existing states?
> 
because once notifier fails in state MODULE_STATE_UNFORMED (now only 
ftrace has someting to do in this state), notifier chain will roll back 
by calling blocking_notifier_call_chain_robust, i'm afraid 
MODULE_STATE_GOING is going to jeopardise the notifers which don't 
handle it appropriately, like:

case MODULE_STATE_COMING:
      kmalloc();
case MODULE_STATE_GOING:
      kfree();


>> +};
>> +
>> +enum module_notifier_prio {
>> +	MODULE_NOTIFIER_PRIO_LOW = INT_MIN,	/* Low prioroty, coming last, going first */
>> +	MODULE_NOTIFIER_PRIO_MID = 0,	/* Normal priority. */
>> +	MODULE_NOTIFIER_PRIO_SECOND_HIGH = INT_MAX - 1,	/* Second high priorigy, coming second*/
>> +	MODULE_NOTIFIER_PRIO_HIGH = INT_MAX,	/* High priorigy, coming first, going late. */
> 
> I suggest being explicit about how the notifiers are ordered. For
> example:
> 
> enum module_notifier_prio {
> 	MODULE_NOTIFIER_PRIO_NORMAL,	/* Normal priority, coming last, going first. */
> 	MODULE_NOTIFIER_PRIO_LIVEPATCH,
> 	MODULE_NOTIFIER_PRIO_FTRACE,	/* High priority, coming first, going late. */
> };
> 

accepted.

>>   };
>>   
>>   struct mod_tree_node {
>> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
>> index 28d15ba58a26..ce78bb23e24b 100644
>> --- a/kernel/livepatch/core.c
>> +++ b/kernel/livepatch/core.c
>> @@ -1375,13 +1375,40 @@ void *klp_find_section_by_name(const struct module *mod, const char *name,
>>   }
>>   EXPORT_SYMBOL_GPL(klp_find_section_by_name);
>>   
>> +static int klp_module_callback(struct notifier_block *nb, unsigned long op,
>> +			void *module)
>> +{
>> +	struct module *mod = module;
>> +	int err = 0;
>> +
>> +	switch (op) {
>> +	case MODULE_STATE_COMING:
>> +		err = klp_module_coming(mod);
>> +		break;
>> +	case MODULE_STATE_LIVE:
>> +		break;
>> +	case MODULE_STATE_GOING:
>> +		klp_module_going(mod);
>> +		break;
>> +	default:
>> +		break;
>> +	}
> 
> klp_module_coming() and klp_module_going() are now used only in
> kernel/livepatch/core.c where they are also defined. This means the
> functions can be static and their declarations removed from
> include/linux/livepatch.h.
> 
> Nit: The MODULE_STATE_LIVE and default cases in the switch can be
> removed.
> 

accepted.

>> +
>> +	return notifier_from_errno(err);
>> +}
>> +
>> +static struct notifier_block klp_module_nb = {
>> +	.notifier_call = klp_module_callback,
>> +	.priority = MODULE_NOTIFIER_PRIO_SECOND_HIGH
>> +};
>> +
>>   static int __init klp_init(void)
>>   {
>>   	klp_root_kobj = kobject_create_and_add("livepatch", kernel_kobj);
>>   	if (!klp_root_kobj)
>>   		return -ENOMEM;
>>   
>> -	return 0;
>> +	return register_module_notifier(&klp_module_nb);
>>   }
>>   
>>   module_init(klp_init);
>> diff --git a/kernel/module/main.c b/kernel/module/main.c
>> index c3ce106c70af..226dd5b80997 100644
>> --- a/kernel/module/main.c
>> +++ b/kernel/module/main.c
>> @@ -833,10 +833,8 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
>>   	/* Final destruction now no one is using it. */
>>   	if (mod->exit != NULL)
>>   		mod->exit();
>> -	blocking_notifier_call_chain(&module_notify_list,
>> +	blocking_notifier_call_chain_reverse(&module_notify_list,
>>   				     MODULE_STATE_GOING, mod);
>> -	klp_module_going(mod);
>> -	ftrace_release_mod(mod);
>>   
>>   	async_synchronize_full();
>>   
>> @@ -3135,10 +3133,8 @@ static noinline int do_init_module(struct module *mod)
>>   	mod->state = MODULE_STATE_GOING;
>>   	synchronize_rcu();
>>   	module_put(mod);
>> -	blocking_notifier_call_chain(&module_notify_list,
>> +	blocking_notifier_call_chain_reverse(&module_notify_list,
>>   				     MODULE_STATE_GOING, mod);
>> -	klp_module_going(mod);
>> -	ftrace_release_mod(mod);
>>   	free_module(mod);
>>   	wake_up_all(&module_wq);
>>   
> 
> The patch unexpectedly leaves a call to ftrace_free_mem() in
> do_init_module().

Thanks for pointing it out, it was removed when i implemented and 
tested, but when i organized the patch, it was left. I will remove it.

> 
>> @@ -3281,20 +3277,14 @@ static int complete_formation(struct module *mod, struct load_info *info)
>>   	return err;
>>   }
>>   
>> -static int prepare_coming_module(struct module *mod)
>> +static int prepare_module_state_transaction(struct module *mod,
>> +			unsigned long val_up, unsigned long val_down)
>>   {
>>   	int err;
>>   
>> -	ftrace_module_enable(mod);
>> -	err = klp_module_coming(mod);
>> -	if (err)
>> -		return err;
>> -
>>   	err = blocking_notifier_call_chain_robust(&module_notify_list,
>> -			MODULE_STATE_COMING, MODULE_STATE_GOING, mod);
>> +			val_up, val_down, mod);
>>   	err = notifier_to_errno(err);
>> -	if (err)
>> -		klp_module_going(mod);
>>   
>>   	return err;
>>   }
>> @@ -3468,14 +3458,21 @@ static int load_module(struct load_info *info, const char __user *uargs,
>>   	init_build_id(mod, info);
>>   
>>   	/* Ftrace init must be called in the MODULE_STATE_UNFORMED state */
>> -	ftrace_module_init(mod);
>> +	err = prepare_module_state_transaction(mod,
>> +				MODULE_STATE_UNFORMED, MODULE_STATE_FORMED);
> 
> I believe val_down should be MODULE_STATE_GOING to reverse the
> operation. Why is the new state MODULE_STATE_FORMED needed here?
to avoid this:

case MODULE_STATE_COMING:
      kmalloc();
case MODULE_STATE_GOING:
      kfree();



> 
>> +	if (err)
>> +		goto ddebug_cleanup;
>>   
>>   	/* Finally it's fully formed, ready to start executing. */
>>   	err = complete_formation(mod, info);
>> -	if (err)
>> +	if (err) {
>> +		blocking_notifier_call_chain_reverse(&module_notify_list,
>> +				MODULE_STATE_FORMED, mod);
>>   		goto ddebug_cleanup;
>> +	}
>>   
>> -	err = prepare_coming_module(mod);
>> +	err = prepare_module_state_transaction(mod,
>> +				MODULE_STATE_COMING, MODULE_STATE_GOING);
>>   	if (err)
>>   		goto bug_cleanup;
>>   
>> @@ -3522,7 +3519,6 @@ static int load_module(struct load_info *info, const char __user *uargs,
>>   	destroy_params(mod->kp, mod->num_kp);
>>   	blocking_notifier_call_chain(&module_notify_list,
>>   				     MODULE_STATE_GOING, mod);
> 
> My understanding is that all notifier chains for MODULE_STATE_GOING
> should be reversed.
yes, all, from lowest priority notifier to highest.
I will resend patch 1 which was failed due to my proxy setting.

> 
>> -	klp_module_going(mod);
>>    bug_cleanup:
>>   	mod->state = MODULE_STATE_GOING;
>>   	/* module_bug_cleanup needs module_mutex protection */
> 
> The patch removes the klp_module_going() cleanup call in load_module().
> Similarly, the ftrace_release_mod() call under the ddebug_cleanup label
> should be removed and appropriately replaced with a cleanup via
> a notifier.
> 
     err = prepare_module_state_transaction(mod,
                 MODULE_STATE_UNFORMED, MODULE_STATE_FORMED);
     if (err)
         goto ddebug_cleanup;

ftrace will be cleanup in blocking_notifier_call_chain_robust rolling back.

     err = prepare_module_state_transaction(mod,
                 MODULE_STATE_COMING, MODULE_STATE_GOING);

each notifier including ftrace and klp will be cleanup in 
blocking_notifier_call_chain_robust rolling back.

if all notifiers are successful in MODULE_STATE_COMING, they all will be 
clean up in
  coming_cleanup:
     mod->state = MODULE_STATE_GOING;
     destroy_params(mod->kp, mod->num_kp);
     blocking_notifier_call_chain(&module_notify_list,
                      MODULE_STATE_GOING, mod);

if  something wrong underneath.

>> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
>> index 8df69e702706..efedb98d3db4 100644
>> --- a/kernel/trace/ftrace.c
>> +++ b/kernel/trace/ftrace.c
>> @@ -5241,6 +5241,44 @@ static int __init ftrace_mod_cmd_init(void)
>>   }
>>   core_initcall(ftrace_mod_cmd_init);
>>   
>> +static int ftrace_module_callback(struct notifier_block *nb, unsigned long op,
>> +			void *module)
>> +{
>> +	struct module *mod = module;
>> +
>> +	switch (op) {
>> +	case MODULE_STATE_UNFORMED:
>> +		ftrace_module_init(mod);
>> +		break;
>> +	case MODULE_STATE_COMING:
>> +		ftrace_module_enable(mod);
>> +		break;
>> +	case MODULE_STATE_LIVE:
>> +		ftrace_free_mem(mod, mod->mem[MOD_INIT_TEXT].base,
>> +				mod->mem[MOD_INIT_TEXT].base + mod->mem[MOD_INIT_TEXT].size);
>> +		break;
>> +	case MODULE_STATE_GOING:
>> +	case MODULE_STATE_FORMED:
>> +		ftrace_release_mod(mod);
>> +		break;
>> +	default:
>> +		break;
>> +	}
> 
> ftrace_module_init(), ftrace_module_enable(), ftrace_free_mem() and
> ftrace_release_mod() should be newly used only in kernel/trace/ftrace.c
> where they are also defined. The functions can then be made static and
> removed from include/linux/ftrace.h.
> 
> Nit: The default case in the switch can be removed.
> 

accepted.

>> +
>> +	return notifier_from_errno(0);
> 
> Nit: This can be simply "return NOTIFY_OK;".

accepted
> 
>> +}
>> +
>> +static struct notifier_block ftrace_module_nb = {
>> +	.notifier_call = ftrace_module_callback,
>> +	.priority = MODULE_NOTIFIER_PRIO_HIGH
>> +};
>> +
>> +static int __init ftrace_register_module_notifier(void)
>> +{
>> +	return register_module_notifier(&ftrace_module_nb);
>> +}
>> +core_initcall(ftrace_register_module_notifier);
>> +
>>   static void function_trace_probe_call(unsigned long ip, unsigned long parent_ip,
>>   				      struct ftrace_ops *op, struct ftrace_regs *fregs)
>>   {
> 

Best regards

Song


^ permalink raw reply

* Re: [RFC PATCH 0/4] trace, livepatch: Allow kprobe return overriding for livepatched functions
From: Masami Hiramatsu @ 2026-04-15  0:48 UTC (permalink / raw)
  To: Yafang Shao
  Cc: jpoimboe, jikos, mbenes, pmladek, joe.lawrence, rostedt,
	mathieu.desnoyers, kpsingh, mattbobrowski, song, jolsa, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, yonghong.song,
	live-patching, linux-kernel, linux-trace-kernel, bpf
In-Reply-To: <CALOAHbAOx=C4b+4xQwRf59xvY0vbMPfOjO5LMDghC4Ryksv++Q@mail.gmail.com>

On Sun, 12 Apr 2026 21:50:31 +0800
Yafang Shao <laoar.shao@gmail.com> wrote:

> On Fri, Apr 10, 2026 at 12:38 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > Hi Yafang,
> >
> > On Thu,  2 Apr 2026 17:26:03 +0800
> > Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > > Livepatching allows for rapid experimentation with new kernel features
> > > without interrupting production workloads. However, static livepatches lack
> > > the flexibility required to tune features based on task-specific attributes,
> > > such as cgroup membership, which is critical in multi-tenant k8s
> > > environments. Furthermore, hardcoding logic into a livepatch prevents
> > > dynamic adjustments based on the runtime environment.
> > >
> > > To address this, we propose a hybrid approach using BPF. Our production use
> > > case involves:
> > >
> > > 1. Deploying a Livepatch function to serve as a stable BPF hook.
> > >
> > > 2. Utilizing bpf_override_return() to dynamically modify the return value
> > >    of that hook based on the current task's context.
> >
> > First of all, I don't like this approach to test a new feature in the
> > kernel, because it sounds like allowing multiple different generations
> > of implementations to coexist simultaneously. The standard kernel code
> > is not designed to withstand such implementations.
> 
> However, this approach is invaluable for rapidly deploying new kernel
> features to production servers without downtime. Upgrading kernels
> across a large fleet remains a significant challenge.

I think that downtime should be accepted as a cost for stability in
general. If your new kernel feature has a bug and causes a crash,
anyway it gets your servers down.

> >
> > For example, if you implement a well-designed framework in a specific
> > subsystem, like Schedext, which allows multiple implementations extended
> > with BPF to coexist, there's no problem (at least it's debatable).
> >
> > But if it is for any function, it is dangerous feature. Bugs that occur
> > in kernels that use this functionality cannot be addressed here. They
> > need to be treated the same way as out-of-tree drivers or forked kernels.
> > I mean, add a tainted flag for this feature. And we don't care of it.
> 
> Agreed. This should be handled as an OOT module rather than part of
> the core kernel.
> 
> >
> > >
> > > A significant challenge arises when atomic-replace is enabled. In this
> > > mode, deploying a new livepatch changes the target function's address,
> > > forcing a re-attachment of the BPF program. This re-attachment latency is
> > > unacceptable in critical paths, such as those handling networking policies.
> > >
> > > To solve this, we introduce a hybrid livepatch mode that allows specific
> > > patches to remain non-replaceable, ensuring the function address remains
> > > stable and the BPF program stays attached.
> >
> > Can you share your actual problem to be solved?
> 
> Here is an example we recently deployed on our production servers:
> 
>   https://lore.kernel.org/bpf/CALOAHbDnNba_w_nWH3-S9GAXw0+VKuLTh1gy5hy9Yqgeo4C0iA@mail.gmail.com/
> 
> In one of our specific clusters, we needed to send BGP traffic out
> through specific NICs based on the destination IP. To achieve this
> without interrupting service, we live-patched
> bond_xmit_3ad_xor_slave_get(), added a new hook called
> bond_get_slave_hook(), and then ran a BPF program attached to that
> hook to select the outgoing NIC from the SKB. This allowed us to
> rapidly deploy the feature with zero downtime.

In this case, you can make specific livepatch or kernel module
to replace the kernel function without using BPF on your server.

The BGP trafic in bonding device seems very specific, so it may not
cause a trouble, but this is very generic change, which allows
user to change more core kernel feature, e.g. memory management
or scheduler etc.

Excessive degrees of freedom introduce uncertainty and instability
into a system. While the functionality is interesting, it would be
a way to generalize schedext in an uncontrolled way.

At a minimum, some form of build time and runtime constraint, along
with a taint flag that clearly indicates in the crash logs that this
feature is being used, would be necessary. (It means this is should
not be used in production environment.)

Thank you,
> 
> [...]
> 
> -- 
> Regards
> Yafang


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH v3 0/8] unwind, arm64: add sframe unwinder for kernel
From: Jens Remus @ 2026-04-14 16:10 UTC (permalink / raw)
  To: Dylan Hatch, Roman Gushchin, Weinan Liu, Will Deacon,
	Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Steven Rostedt,
	Catalin Marinas, Jiri Kosina
  Cc: Mark Rutland, Prasanna Kumar T S M, Puranjay Mohan, Song Liu,
	joe.lawrence, linux-toolchains, linux-kernel, live-patching,
	linux-arm-kernel
In-Reply-To: <20260406185000.1378082-1-dylanbhatch@google.com>

On 4/6/2026 8:49 PM, Dylan Hatch wrote:

> Dylan Hatch (7):
>   sframe: Allow kernelspace sframe sections.
>   arm64, unwind: build kernel with sframe V3 info
>   sframe: Provide PC lookup for vmlinux .sframe section.
>   sframe: Allow unsorted FDEs.
>   arm64/module, sframe: Add sframe support for modules.
>   sframe: Introduce in-kernel SFRAME_VALIDATION.
>   unwind: arm64: Use sframe to unwind interrupt frames.

Nit: Trailing dot in commit subjects seems unusual.  Remove?

> Weinan Liu (1):
>   arm64: entry: add unwind info for various kernel entries
Regards,
Jens
-- 
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com

IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/


^ permalink raw reply

* Re: [RFC PATCH 2/2] kernel/module: Decouple klp and ftrace from load_module
From: Petr Pavlu @ 2026-04-14 14:33 UTC (permalink / raw)
  To: chensong_2000
  Cc: rafael, lenb, mturquette, sboyd, viresh.kumar, agk, snitzer,
	mpatocka, bmarzins, song, yukuai, linan122, jason.wessel, danielt,
	dianders, horms, davem, edumazet, kuba, pabeni, paulmck, frederic,
	mcgrof, da.gomez, samitolvanen, atomlin, jpoimboe, jikos, mbenes,
	pmladek, joe.lawrence, rostedt, mhiramat, mark.rutland,
	mathieu.desnoyers, linux-modules, linux-kernel,
	linux-trace-kernel, linux-acpi, linux-clk, linux-pm,
	live-patching, dm-devel, linux-raid, kgdb-bugreport, netdev
In-Reply-To: <20260413080701.180976-1-chensong_2000@189.cn>

On 4/13/26 10:07 AM, chensong_2000@189.cn wrote:
> From: Song Chen <chensong_2000@189.cn>
> 
> ftrace and livepatch currently have their module load/unload callbacks
> hard-coded in the module loader as direct function calls to
> ftrace_module_enable(), klp_module_coming(), klp_module_going()
> and ftrace_release_mod(). This tight coupling was originally introduced
> to enforce strict call ordering that could not be guaranteed by the
> module notifier chain, which only supported forward traversal. Their
> notifiers were moved in and out back and forth. see [1] and [2].

I'm unclear about what is meant by the notifiers being moved back and
forth. The links point to patches that converted ftrace+klp from using
module notifiers to explicit callbacks due to ordering issues, but this
switch occurred only once. Have there been other attempts to use
notifiers again?

> 
> Now that the notifier chain supports reverse traversal via
> blocking_notifier_call_chain_reverse(), the ordering can be enforced
> purely through notifier priority. As a result, the module loader is now
> decoupled from the implementation details of ftrace and livepatch.
> What's more, adding a new subsystem with symmetric setup/teardown ordering
> requirements during module load/unload no longer requires modifying
> kernel/module/main.c; it only needs to register a notifier_block with an
> appropriate priority.
> 
> [1]:https://lore.kernel.org/all/
> 	alpine.LNX.2.00.1602172216491.22700@cbobk.fhfr.pm/
> [2]:https://lore.kernel.org/all/
> 	20160301030034.GC12120@packer-debian-8-amd64.digitalocean.com/

Nit: Avoid wrapping URLs, as it breaks autolinking and makes the links
harder to copy.

Better links would be:
[1] https://lore.kernel.org/all/1455661953-15838-1-git-send-email-jeyu@redhat.com/
[2] https://lore.kernel.org/all/1458176139-17455-1-git-send-email-jeyu@redhat.com/

The first link is the final version of what landed as commit
7dcd182bec27 ("ftrace/module: remove ftrace module notifier"). The
second is commit 7e545d6eca20 ("livepatch/module: remove livepatch
module notifier").

> 
> Signed-off-by: Song Chen <chensong_2000@189.cn>
> ---
>  include/linux/module.h  |  8 ++++++++
>  kernel/livepatch/core.c | 29 ++++++++++++++++++++++++++++-
>  kernel/module/main.c    | 34 +++++++++++++++-------------------
>  kernel/trace/ftrace.c   | 38 ++++++++++++++++++++++++++++++++++++++
>  4 files changed, 89 insertions(+), 20 deletions(-)
> 
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 14f391b186c6..0bdd56f9defd 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -308,6 +308,14 @@ enum module_state {
>  	MODULE_STATE_COMING,	/* Full formed, running module_init. */
>  	MODULE_STATE_GOING,	/* Going away. */
>  	MODULE_STATE_UNFORMED,	/* Still setting it up. */
> +	MODULE_STATE_FORMED,

I don't see a reason to add a new module state. Why is it necessary and
how does it fit with the existing states?

> +};
> +
> +enum module_notifier_prio {
> +	MODULE_NOTIFIER_PRIO_LOW = INT_MIN,	/* Low prioroty, coming last, going first */
> +	MODULE_NOTIFIER_PRIO_MID = 0,	/* Normal priority. */
> +	MODULE_NOTIFIER_PRIO_SECOND_HIGH = INT_MAX - 1,	/* Second high priorigy, coming second*/
> +	MODULE_NOTIFIER_PRIO_HIGH = INT_MAX,	/* High priorigy, coming first, going late. */

I suggest being explicit about how the notifiers are ordered. For
example:

enum module_notifier_prio {
	MODULE_NOTIFIER_PRIO_NORMAL,	/* Normal priority, coming last, going first. */
	MODULE_NOTIFIER_PRIO_LIVEPATCH,
	MODULE_NOTIFIER_PRIO_FTRACE,	/* High priority, coming first, going late. */
};

>  };
>  
>  struct mod_tree_node {
> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> index 28d15ba58a26..ce78bb23e24b 100644
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -1375,13 +1375,40 @@ void *klp_find_section_by_name(const struct module *mod, const char *name,
>  }
>  EXPORT_SYMBOL_GPL(klp_find_section_by_name);
>  
> +static int klp_module_callback(struct notifier_block *nb, unsigned long op,
> +			void *module)
> +{
> +	struct module *mod = module;
> +	int err = 0;
> +
> +	switch (op) {
> +	case MODULE_STATE_COMING:
> +		err = klp_module_coming(mod);
> +		break;
> +	case MODULE_STATE_LIVE:
> +		break;
> +	case MODULE_STATE_GOING:
> +		klp_module_going(mod);
> +		break;
> +	default:
> +		break;
> +	}

klp_module_coming() and klp_module_going() are now used only in
kernel/livepatch/core.c where they are also defined. This means the
functions can be static and their declarations removed from
include/linux/livepatch.h.

Nit: The MODULE_STATE_LIVE and default cases in the switch can be
removed.

> +
> +	return notifier_from_errno(err);
> +}
> +
> +static struct notifier_block klp_module_nb = {
> +	.notifier_call = klp_module_callback,
> +	.priority = MODULE_NOTIFIER_PRIO_SECOND_HIGH
> +};
> +
>  static int __init klp_init(void)
>  {
>  	klp_root_kobj = kobject_create_and_add("livepatch", kernel_kobj);
>  	if (!klp_root_kobj)
>  		return -ENOMEM;
>  
> -	return 0;
> +	return register_module_notifier(&klp_module_nb);
>  }
>  
>  module_init(klp_init);
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index c3ce106c70af..226dd5b80997 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -833,10 +833,8 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
>  	/* Final destruction now no one is using it. */
>  	if (mod->exit != NULL)
>  		mod->exit();
> -	blocking_notifier_call_chain(&module_notify_list,
> +	blocking_notifier_call_chain_reverse(&module_notify_list,
>  				     MODULE_STATE_GOING, mod);
> -	klp_module_going(mod);
> -	ftrace_release_mod(mod);
>  
>  	async_synchronize_full();
>  
> @@ -3135,10 +3133,8 @@ static noinline int do_init_module(struct module *mod)
>  	mod->state = MODULE_STATE_GOING;
>  	synchronize_rcu();
>  	module_put(mod);
> -	blocking_notifier_call_chain(&module_notify_list,
> +	blocking_notifier_call_chain_reverse(&module_notify_list,
>  				     MODULE_STATE_GOING, mod);
> -	klp_module_going(mod);
> -	ftrace_release_mod(mod);
>  	free_module(mod);
>  	wake_up_all(&module_wq);
>  

The patch unexpectedly leaves a call to ftrace_free_mem() in
do_init_module().

> @@ -3281,20 +3277,14 @@ static int complete_formation(struct module *mod, struct load_info *info)
>  	return err;
>  }
>  
> -static int prepare_coming_module(struct module *mod)
> +static int prepare_module_state_transaction(struct module *mod,
> +			unsigned long val_up, unsigned long val_down)
>  {
>  	int err;
>  
> -	ftrace_module_enable(mod);
> -	err = klp_module_coming(mod);
> -	if (err)
> -		return err;
> -
>  	err = blocking_notifier_call_chain_robust(&module_notify_list,
> -			MODULE_STATE_COMING, MODULE_STATE_GOING, mod);
> +			val_up, val_down, mod);
>  	err = notifier_to_errno(err);
> -	if (err)
> -		klp_module_going(mod);
>  
>  	return err;
>  }
> @@ -3468,14 +3458,21 @@ static int load_module(struct load_info *info, const char __user *uargs,
>  	init_build_id(mod, info);
>  
>  	/* Ftrace init must be called in the MODULE_STATE_UNFORMED state */
> -	ftrace_module_init(mod);
> +	err = prepare_module_state_transaction(mod,
> +				MODULE_STATE_UNFORMED, MODULE_STATE_FORMED);

I believe val_down should be MODULE_STATE_GOING to reverse the
operation. Why is the new state MODULE_STATE_FORMED needed here?

> +	if (err)
> +		goto ddebug_cleanup;
>  
>  	/* Finally it's fully formed, ready to start executing. */
>  	err = complete_formation(mod, info);
> -	if (err)
> +	if (err) {
> +		blocking_notifier_call_chain_reverse(&module_notify_list,
> +				MODULE_STATE_FORMED, mod);
>  		goto ddebug_cleanup;
> +	}
>  
> -	err = prepare_coming_module(mod);
> +	err = prepare_module_state_transaction(mod,
> +				MODULE_STATE_COMING, MODULE_STATE_GOING);
>  	if (err)
>  		goto bug_cleanup;
>  
> @@ -3522,7 +3519,6 @@ static int load_module(struct load_info *info, const char __user *uargs,
>  	destroy_params(mod->kp, mod->num_kp);
>  	blocking_notifier_call_chain(&module_notify_list,
>  				     MODULE_STATE_GOING, mod);

My understanding is that all notifier chains for MODULE_STATE_GOING
should be reversed.

> -	klp_module_going(mod);
>   bug_cleanup:
>  	mod->state = MODULE_STATE_GOING;
>  	/* module_bug_cleanup needs module_mutex protection */

The patch removes the klp_module_going() cleanup call in load_module().
Similarly, the ftrace_release_mod() call under the ddebug_cleanup label
should be removed and appropriately replaced with a cleanup via
a notifier.

> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 8df69e702706..efedb98d3db4 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -5241,6 +5241,44 @@ static int __init ftrace_mod_cmd_init(void)
>  }
>  core_initcall(ftrace_mod_cmd_init);
>  
> +static int ftrace_module_callback(struct notifier_block *nb, unsigned long op,
> +			void *module)
> +{
> +	struct module *mod = module;
> +
> +	switch (op) {
> +	case MODULE_STATE_UNFORMED:
> +		ftrace_module_init(mod);
> +		break;
> +	case MODULE_STATE_COMING:
> +		ftrace_module_enable(mod);
> +		break;
> +	case MODULE_STATE_LIVE:
> +		ftrace_free_mem(mod, mod->mem[MOD_INIT_TEXT].base,
> +				mod->mem[MOD_INIT_TEXT].base + mod->mem[MOD_INIT_TEXT].size);
> +		break;
> +	case MODULE_STATE_GOING:
> +	case MODULE_STATE_FORMED:
> +		ftrace_release_mod(mod);
> +		break;
> +	default:
> +		break;
> +	}

ftrace_module_init(), ftrace_module_enable(), ftrace_free_mem() and
ftrace_release_mod() should be newly used only in kernel/trace/ftrace.c
where they are also defined. The functions can then be made static and
removed from include/linux/ftrace.h.

Nit: The default case in the switch can be removed.

> +
> +	return notifier_from_errno(0);

Nit: This can be simply "return NOTIFY_OK;".

> +}
> +
> +static struct notifier_block ftrace_module_nb = {
> +	.notifier_call = ftrace_module_callback,
> +	.priority = MODULE_NOTIFIER_PRIO_HIGH
> +};
> +
> +static int __init ftrace_register_module_notifier(void)
> +{
> +	return register_module_notifier(&ftrace_module_nb);
> +}
> +core_initcall(ftrace_register_module_notifier);
> +
>  static void function_trace_probe_call(unsigned long ip, unsigned long parent_ip,
>  				      struct ftrace_ops *op, struct ftrace_regs *fregs)
>  {

-- 
Thanks,
Petr

^ permalink raw reply

* Re: [PATCH v3 2/8] arm64, unwind: build kernel with sframe V3 info
From: Jens Remus @ 2026-04-14 12:43 UTC (permalink / raw)
  To: Dylan Hatch, Roman Gushchin, Weinan Liu, Will Deacon,
	Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Steven Rostedt,
	Catalin Marinas, Jiri Kosina
  Cc: Mark Rutland, Prasanna Kumar T S M, Puranjay Mohan, Song Liu,
	joe.lawrence, linux-toolchains, linux-kernel, live-patching,
	linux-arm-kernel, Heiko Carstens, Vasily Gorbik, Ilya Leoshkevich
In-Reply-To: <20260406185000.1378082-3-dylanbhatch@google.com>

On 4/6/2026 8:49 PM, Dylan Hatch wrote:
> Build with -Wa,--gsframe-3 flags to generate a .sframe section. This
> will be used for in-kernel reliable stacktrace in cases where the frame
> pointer alone is insufficient.
> 
> Currently, the sframe format only supports arm64, x86_64 and s390x
> architectures.
> 
> Signed-off-by: Weinan Liu <wnliu@google.com>
> Signed-off-by: Dylan Hatch <dylanbhatch@google.com>
> Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
> diff --git a/MAINTAINERS b/MAINTAINERS

> @@ -27561,6 +27561,7 @@ STACK UNWINDING
>  M:	Josh Poimboeuf <jpoimboe@kernel.org>
>  M:	Steven Rostedt <rostedt@goodmis.org>
>  S:	Maintained
> +F:	arch/*/include/asm/unwind_sframe.h
>  F:	include/linux/sframe.h
>  F:	include/linux/unwind*.h
>  F:	kernel/unwind/

Good catch!  Should we rather add the following in the series you are
basing on, as there are already arch-specific unwind_user.h and
unwind_user_sframe.h?

F:	arch/*/include/asm/unwind*.h

On the other hand I wonder whether the arch-specific headers should
remain maintained by the respective arch maintainers?  How is that
handled in general?

> diff --git a/arch/Kconfig b/arch/Kconfig

> @@ -520,6 +520,13 @@ config SFRAME_VALIDATION
>  
>  	  If unsure, say N.
>  
> +config ARCH_SUPPORTS_SFRAME_UNWINDER
> +	bool
> +	help
> +	  An architecture can select this if it  enables the sframe (Simple
> +	  Frame) unwinder for unwinding kernel stack traces. It uses unwind
> +	  table that is directly generatedby toolchain based on DWARF CFI information.

Nit: s/sframe/SFrame/

> +
>  config HAVE_PERF_REGS
>  	bool
>  	help

> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig

> @@ -112,6 +112,7 @@ config ARM64
>  	select ARCH_SUPPORTS_SCHED_SMT
>  	select ARCH_SUPPORTS_SCHED_CLUSTER
>  	select ARCH_SUPPORTS_SCHED_MC
> +	select ARCH_SUPPORTS_SFRAME_UNWINDER
>  	select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
>  	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
>  	select ARCH_WANT_DEFAULT_BPF_JIT

> diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug

> @@ -20,4 +20,17 @@ config ARM64_RELOC_TEST
>  	depends on m
>  	tristate "Relocation testing module"
>  
> +config SFRAME_UNWINDER

Why do you introduce this for arm64 (and debug) only?  If s390 were to
add support (as replacement for s390 backchain), this would have to be
moved or duplicated.  It would not suffice to enable
ARCH_SUPPORTS_SFRAME_UNWINDER to also enable SFRAME_UNWINDER.

As mentioned in my feedback on the previous patch in this series:
Would it make sense to align the naming to the existing 
HAVE_UNWIND_USER_SFRAME, for instance:

  HAVE_UNWIND_KERNEL_SFRAME

> +	bool "Sframe unwinder"
> +	depends on AS_SFRAME3
> +	depends on 64BIT
> +	depends on ARCH_SUPPORTS_SFRAME_UNWINDER
> +	select SFRAME_LOOKUP
> +	help
> +	  This option enables the sframe (Simple Frame) unwinder for unwinding
> +	  kernel stack traces. It uses unwind table that is directly generated
> +	  by toolchain based on DWARF CFI information. In, practice this can
> +	  provide more reliable stacktrace results than unwinding with frame
> +	  pointers alone.

Nit: s/sframe/SFrame/

> +
>  source "drivers/hwtracing/coresight/Kconfig"

You are introducing two new Kconfig options (SFRAME_UNWINDER and
ARCH_SUPPORTS_SFRAME_UNWINDER).  I wonder whether they could somehow be
combined into a single new option.  Although I am not sure how an option
can be both selectable and depending at the same time, so that the ARM64
config could select it, but it would also depend on the above.

> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h

> @@ -491,6 +491,8 @@
>  		*(.rodata1)						\
>  	}								\
>  									\
> +	SFRAME								\
> +									\
>  	/* PCI quirks */						\
>  	.pci_fixup        : AT(ADDR(.pci_fixup) - LOAD_OFFSET) {	\
>  		BOUNDED_SECTION_PRE_LABEL(.pci_fixup_early,  _pci_fixups_early,  __start, __end) \
> @@ -911,6 +913,19 @@
>  #define TRACEDATA
>  #endif
>  
> +#ifdef CONFIG_SFRAME_UNWINDER
> +#define SFRAME							\
> +	/* sframe */						\
> +	.sframe : AT(ADDR(.sframe) - LOAD_OFFSET) {		\
> +		__start_sframe_header = .;			\

		__start_sframe[_section] = .;

> +		KEEP(*(.sframe))				\
> +		KEEP(*(.init.sframe))				\
> +		__stop_sframe_header = .;			\

		__stop_sframe[_section] = .;

Unless I am missing something both are not the start/stop of the .sframe
header (in the .sframe section) but the .sframe section itself (see also
your subsequent "[PATCH v3 4/8] sframe: Provide PC lookup for vmlinux
.sframe section." where you assign both to kernel_sfsec.sframe_start
and kernel_sfsec.sframe_end.

> +	}
> +#else
> +#define SFRAME
> +#endif
> +
>  #ifdef CONFIG_PRINTK_INDEX
>  #define PRINTK_INDEX							\
>  	.printk_index : AT(ADDR(.printk_index) - LOAD_OFFSET) {		\

Regards,
Jens
-- 
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com

IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/


^ permalink raw reply

* Re: [PATCH v2 0/6] kselftests: livepatch: Adapt tests to be executed on 4.12 kernels
From: Marcos Paulo de Souza @ 2026-04-14 12:12 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan
  Cc: live-patching, linux-kselftest, linux-kernel
In-Reply-To: <20260413-lp-tests-old-fixes-v2-0-367c7cb5006f@suse.com>

On Mon, 2026-04-13 at 14:26 -0300, Marcos Paulo de Souza wrote:
> A new version of the patchset, with fewer patches now. Please take a
> look!
> 
> Original cover-letter:
> These patches don't really change how the patches are run, just skip
> some tests on kernels that don't support a feature (like kprobe and
> livepatched living together) or when a livepatch sysfs attribute is
> missing.
> 
> The last patch slightly adjusts check_result function to skip dmesg
> messages on SLE kernels when a livepatch is removed.
> 
> These patches are based on printk/for-next branch.
> 
> Please review! Thanks!
> 
> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>


I saw some checks made by sashiko, and besides they not being so
horrible, I believe that I should fix me an send a v3 to address them.
But still, feel free to take a look :)

https://sashiko.dev/#/patchset/20260413-lp-tests-old-fixes-v2-0-367c7cb5006f%40suse.com

> ---
> Changes in v2:
> - Patch descriptions were changed to remove "test-X", since it was
> polluting the commit subjects (Miroslav Benes)
> - Patch 8 was dropped since it was checking for a message from an
> out-of-tree patch. (Petr Mladek)
> - Patch 3 was dropped as should be treated as expected failure for
> older kernels. (Petr Mladek)
> - Patch 2 was changed to use y/n instead of 1/0, since it's more
> natural to use it.
> - Patch 1 was changed to handle ppc and loongson, and error out if
> dealing with a different architecture that sets
>   CONFIG_ARCH_HAS_SYSCALL_WRAPPER and haven't changed the test to
> include the proper wrapper prefix.
> - Patch 4 was changed to invert the return of the bash function to
> return 1 in failure, like
>   a normal bash function (Joe Lawrence)
> - Patches 5, 6 an 7 were changed to not split the tests, but to only
> run the tests
>   when the attribute were present (Miroslav Benes)
> - Link to v1:
> https://patch.msgid.link/20260313-lp-tests-old-fixes-v1-0-71ac6dfb3253@suse.com
> 
> ---
> Marcos Paulo de Souza (6):
>       selftests: livepatch: Check for ARCH_HAS_SYSCALL_WRAPPER config
>       selftests: livepatch: Replace true/false module parameter by
> y/n
>       selftests: livepatch: Introduce does_sysfs_exists function
>       selftests: livepatch: Check if patched sysfs attribute exists
>       selftests: livepatch: Check if replace sysfs attribute exists
>       selftests: livepatch: Check if stack_order sysfs attribute
> exists
> 
>  tools/testing/selftests/livepatch/functions.sh     |  10 ++
>  tools/testing/selftests/livepatch/test-kprobe.sh   |   8 +-
>  tools/testing/selftests/livepatch/test-sysfs.sh    | 120
> ++++++++++++---------
>  .../livepatch/test_modules/test_klp_syscall.c      |  17 ++-
>  4 files changed, 99 insertions(+), 56 deletions(-)
> ---
> base-commit: 712c0756828becbfc629ff8d8b82deff5d1115e4
> change-id: 20260309-lp-tests-old-fixes-f955abc8ec27
> 
> Best regards,
> --  
> Marcos Paulo de Souza <mpdesouza@suse.com>

^ permalink raw reply

* Re: [PATCH v3 1/8] sframe: Allow kernelspace sframe sections.
From: Jens Remus @ 2026-04-14 12:09 UTC (permalink / raw)
  To: Dylan Hatch, Roman Gushchin, Weinan Liu, Will Deacon,
	Josh Poimboeuf, Indu Bhagat, Peter Zijlstra, Steven Rostedt,
	Catalin Marinas, Jiri Kosina
  Cc: Mark Rutland, Prasanna Kumar T S M, Puranjay Mohan, Song Liu,
	joe.lawrence, linux-toolchains, linux-kernel, live-patching,
	linux-arm-kernel
In-Reply-To: <20260406185000.1378082-2-dylanbhatch@google.com>

Hello Dylan!

On 4/6/2026 8:49 PM, Dylan Hatch wrote:
> Generalize the sframe lookup code to support kernelspace sections. This
> is done by defining a SFRAME_LOOKUP option that can be activated
> separate from UNWIND_USER_SFRAME, as there will be other clients to this
> library than just userspace unwind.

Nit: s/UNWIND_USER_SFRAME/HAVE_UNWIND_USER_SFRAME/

This actually uses the following two new Kconfig options (with
SFRAME_UNWINDER technically being introduced in the next patch):

  SFRAME_LOOKUP
  SFRAME_UNWINDER

IIUC SFRAME_UNWINDER is the kernel counterpart to the existing
HAVE_UNWIND_USER_SFRAME.  Would it therefore make sense to align the
naming as follows?

  HAVE_UNWIND_KERNEL_SFRAME (instead of SFRAME_UNWINDER)
  HAVE_UNWIND_USER_SFRAME

> Sframe section location is now tracked in a separate sec_type field to
> determine whether user-access functions are necessary to read the sframe
> data. Relevant type delarations are moved and renamed to reflect the
> non-user sframe support.
> 
> Signed-off-by: Dylan Hatch <dylanbhatch@google.com>

> diff --git a/arch/Kconfig b/arch/Kconfig

> @@ -486,6 +486,9 @@ config AS_SFRAME3
>  	def_bool $(as-instr,.cfi_startproc\n.cfi_endproc,-Wa$(comma)--gsframe-3)
>  	select AS_SFRAME
>  
> +config SFRAME_LOOKUP
> +	bool
> +
>  config UNWIND_USER
>  	bool
>  
> @@ -496,6 +499,7 @@ config HAVE_UNWIND_USER_FP
>  config HAVE_UNWIND_USER_SFRAME
>  	bool
>  	select UNWIND_USER
> +	select SFRAME_LOOKUP
>  
>  config SFRAME_VALIDATION
>  	bool "Enable .sframe section debugging"

IIUC SFRAME_LOOKUP only exists to pull in the common (kernel and user)
sframe lookup code if SFRAME_UNWINDER and/or UNWIND_USER_SFRAME are
enabled.  Given there is currently no other use case than kernel/user
stacktrace unwinding, would it make sense to rename it as follows to
group all of the related options with the UNWIND prefix?

  UNWIND_SFRAME[_LOOKUP]

> diff --git a/include/linux/sframe.h b/include/linux/sframe.h

> @@ -4,36 +4,85 @@
>  
>  #include <linux/mm_types.h>
>  #include <linux/srcu.h>
> -#include <linux/unwind_user_types.h>
>  
> -#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
> +#define UNWIND_RULE_DEREF			BIT(31)
> +
> +enum unwind_cfa_rule {
> +	UNWIND_CFA_RULE_SP_OFFSET,		/* CFA = SP + offset */
> +	UNWIND_CFA_RULE_FP_OFFSET,		/* CFA = FP + offset */
> +	UNWIND_CFA_RULE_REG_OFFSET,	/* CFA = reg + offset */
> +	/* DEREF variants */
> +	UNWIND_CFA_RULE_REG_OFFSET_DEREF =	/* CFA = *(reg + offset) */
> +		UNWIND_CFA_RULE_REG_OFFSET | UNWIND_RULE_DEREF,
> +};
> +
> +struct unwind_cfa_rule_data {
> +	enum unwind_cfa_rule rule;
> +	s32 offset;
> +	unsigned int regnum;
> +};
> +
> +enum unwind_rule {
> +	UNWIND_RULE_RETAIN,		/* entity = entity */
> +	UNWIND_RULE_CFA_OFFSET,		/* entity = CFA + offset */
> +	UNWIND_RULE_REG_OFFSET,		/* entity = register + offset */
> +	/* DEREF variants */
> +	UNWIND_RULE_CFA_OFFSET_DEREF =	/* entity = *(CFA + offset) */
> +		UNWIND_RULE_CFA_OFFSET | UNWIND_RULE_DEREF,
> +	UNWIND_RULE_REG_OFFSET_DEREF =	/* entity = *(register + offset) */
> +		UNWIND_RULE_REG_OFFSET | UNWIND_RULE_DEREF,
> +};
> +
> +struct unwind_rule_data {
> +	enum unwind_rule rule;
> +	s32 offset;
> +	unsigned int regnum;
> +};
> +
> +struct unwind_frame {
> +	struct unwind_cfa_rule_data cfa;
> +	struct unwind_rule_data ra;
> +	struct unwind_rule_data fp;
> +	bool outermost;
> +};

You are moving (and renaming to generalize for kernel and user unwind
use) the above definitions from include/linux/unwind_user_types.h to
include/linux/sframe.h.  Given the definitions are used in
kernel/unwind/user.c for FP and SFRAME unwinding this seems wrong to
me.  The definitions should better be moved (and renamed as you did)
into a new include/linux/unwind_types.h (or the like).

> diff --git a/include/linux/unwind_user_types.h b/include/linux/unwind_user_types.h

> @@ -27,47 +27,6 @@ struct unwind_stacktrace {
>  	unsigned long	*entries;
>  };
>  
> -#define UNWIND_USER_RULE_DEREF			BIT(31)
> -
> -enum unwind_user_cfa_rule {
> -	UNWIND_USER_CFA_RULE_SP_OFFSET,		/* CFA = SP + offset */
> -	UNWIND_USER_CFA_RULE_FP_OFFSET,		/* CFA = FP + offset */
> -	UNWIND_USER_CFA_RULE_REG_OFFSET,	/* CFA = reg + offset */
> -	/* DEREF variants */
> -	UNWIND_USER_CFA_RULE_REG_OFFSET_DEREF =	/* CFA = *(reg + offset) */
> -		UNWIND_USER_CFA_RULE_REG_OFFSET | UNWIND_USER_RULE_DEREF,
> -};
> -
> -struct unwind_user_cfa_rule_data {
> -	enum unwind_user_cfa_rule rule;
> -	s32 offset;
> -	unsigned int regnum;
> -};
> -
> -enum unwind_user_rule {
> -	UNWIND_USER_RULE_RETAIN,		/* entity = entity */
> -	UNWIND_USER_RULE_CFA_OFFSET,		/* entity = CFA + offset */
> -	UNWIND_USER_RULE_REG_OFFSET,		/* entity = register + offset */
> -	/* DEREF variants */
> -	UNWIND_USER_RULE_CFA_OFFSET_DEREF =	/* entity = *(CFA + offset) */
> -		UNWIND_USER_RULE_CFA_OFFSET | UNWIND_USER_RULE_DEREF,
> -	UNWIND_USER_RULE_REG_OFFSET_DEREF =	/* entity = *(register + offset) */
> -		UNWIND_USER_RULE_REG_OFFSET | UNWIND_USER_RULE_DEREF,
> -};
> -
> -struct unwind_user_rule_data {
> -	enum unwind_user_rule rule;
> -	s32 offset;
> -	unsigned int regnum;
> -};
> -
> -struct unwind_user_frame {
> -	struct unwind_user_cfa_rule_data cfa;
> -	struct unwind_user_rule_data ra;
> -	struct unwind_user_rule_data fp;
> -	bool outermost;
> -};
> -
>  struct unwind_user_state {
>  	unsigned long				ip;
>  	unsigned long				sp;

> diff --git a/kernel/unwind/Makefile b/kernel/unwind/Makefile

> @@ -1,2 +1,2 @@
>   obj-$(CONFIG_UNWIND_USER)		+= user.o deferred.o
> - obj-$(CONFIG_HAVE_UNWIND_USER_SFRAME)	+= sframe.o
> + obj-$(CONFIG_SFRAME_LOOKUP)	+= sframe.o

> diff --git a/kernel/unwind/sframe.c b/kernel/unwind/sframe.c

> @@ -44,8 +43,6 @@ struct sframe_fre_internal {
>  	unsigned char	dw_size;
>  };
>  
> -DEFINE_STATIC_SRCU(sframe_srcu);
> -
>  static __always_inline unsigned char fre_type_to_size(unsigned char fre_type)
>  {
>  	if (fre_type > 2)
> @@ -60,6 +57,78 @@ static __always_inline unsigned char dataword_size_enum_to_size(unsigned char da
>  	return 1 << dataword_size;
>  }
>  
> +#ifdef CONFIG_HAVE_UNWIND_USER_SFRAME
> +
> +DEFINE_STATIC_SRCU(sframe_srcu);
> +
> +#define UNSAFE_USER_COPY(to, from, size, label)				\
> +	unsafe_copy_from_user(to, (void __user *)from, size, label)
> +
> +#define UNSAFE_USER_GET(to, from, type, label)				\
> +	unsafe_get_user(to, (type __user *)from, label)
> +
> +#else /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
> +
> +#define UNSAFE_USER_COPY(to, from, size, label) do {			\
> +	(void)to; (void)from; (void)size;				\
> +	goto label;							\
> +} while (0)
> +
> +#define UNSAFE_USER_GET(to, from, type, label) do {			\
> +	(void)to; (void)from;						\
> +	goto label;							\
> +} while (0)
> +
> +#endif /* !CONFIG_HAVE_UNWIND_USER_SFRAME */
> +
> +#ifdef CONFIG_SFRAME_UNWINDER
> +
> +#define KERNEL_COPY(to, from, size) memcpy(to, (void *)from, size)
> +#define KERNEL_GET(to, from, type) ({ (to) = *(type *)(from); })
> +
> +#else /* !CONFIG_SFRAME_UNWINDER */
> +
> +#define KERNEL_COPY(to, from, size) do {				\
> +	(void)(to); (void)(from); (void)size;				\
> +	return -EFAULT;							\
> +} while (0)
> +
> +#define KERNEL_GET(to, from, type) do {					\
> +	(void)(to); (void)(from);					\
> +	return -EFAULT;							\
> +} while (0)

The error return value in above dummy implementations is never used
(see DATA_COPY() and DATA_GET() below).  Maybe better define the KERNEL
flavors with the same interface as the UNSAFE_USER ones (with error
label) and have the dummy implementations goto that label?

> +
> +

Nit: Two instead of one empty line.

> +#endif /* !CONFIG_SFRAME_UNWINDER */
> +
> +#define DATA_COPY(sec, to, from, size, label)			\
> +({								\
> +	switch (sec->sec_type) {				\
> +	case SFRAME_KERNEL:					\
> +		KERNEL_COPY(to, from, size);			\
> +		break;						\
> +	case SFRAME_USER:					\
> +		UNSAFE_USER_COPY(to, from, size, label);	\
> +		break;						\
> +	default:						\
> +		return -EFAULT;					\
> +	}							\
> +})
> +
> +#define DATA_GET(sec, to, from, type, label)			\
> +({								\
> +	switch (sec->sec_type) {				\
> +	case SFRAME_KERNEL:					\
> +		KERNEL_GET(to, from, type);			\
> +		break;						\
> +	case SFRAME_USER:					\
> +		UNSAFE_USER_GET(to, from, type, label);		\
> +		break;						\
> +	default:						\
> +		return -EFAULT;					\
> +	}							\
> +})
> +
>  static __always_inline int __read_fde(struct sframe_section *sec,
>  				      unsigned int fde_num,
>  				      struct sframe_fde_internal *fde)

Regards,
Jens
-- 
Jens Remus
Linux on Z Development (D3303)
jremus@de.ibm.com / jremus@linux.ibm.com

IBM Deutschland Research & Development GmbH; Vorsitzender des Aufsichtsrats: Wolfgang Wendt; Geschäftsführung: David Faller; Sitz der Gesellschaft: Ehningen; Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement: https://www.ibm.com/privacy/


^ permalink raw reply

* [PATCH v2 6/6] selftests: livepatch: Check if stack_order sysfs attribute exists
From: Marcos Paulo de Souza @ 2026-04-13 17:26 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan
  Cc: live-patching, linux-kselftest, linux-kernel,
	Marcos Paulo de Souza
In-Reply-To: <20260413-lp-tests-old-fixes-v2-0-367c7cb5006f@suse.com>

In order to run the selftests on older kernels, check if given kernel
has support for the attribute. If the attribute is not supported, skip
the checks.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
 tools/testing/selftests/livepatch/test-sysfs.sh | 43 ++++++++++++++-----------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/livepatch/test-sysfs.sh b/tools/testing/selftests/livepatch/test-sysfs.sh
index 0cdaeef00983..77f515529646 100755
--- a/tools/testing/selftests/livepatch/test-sysfs.sh
+++ b/tools/testing/selftests/livepatch/test-sysfs.sh
@@ -10,6 +10,7 @@ MOD_LIVEPATCH3=test_klp_syscall
 
 HAS_PATCH_ATTR=0
 HAS_REPLACE_ATTR=0
+HAS_STACK_ORDER_ATTR=0
 
 setup_config
 
@@ -23,8 +24,6 @@ check_sysfs_rights "$MOD_LIVEPATCH" "" "drwxr-xr-x"
 check_sysfs_rights "$MOD_LIVEPATCH" "enabled" "-rw-r--r--"
 check_sysfs_value  "$MOD_LIVEPATCH" "enabled" "1"
 check_sysfs_rights "$MOD_LIVEPATCH" "force" "--w-------"
-check_sysfs_rights "$MOD_LIVEPATCH" "stack_order" "-r--r--r--"
-check_sysfs_value  "$MOD_LIVEPATCH" "stack_order" "1"
 check_sysfs_rights "$MOD_LIVEPATCH" "transition" "-r--r--r--"
 check_sysfs_value  "$MOD_LIVEPATCH" "transition" "0"
 
@@ -39,6 +38,12 @@ if does_sysfs_exists "$MOD_LIVEPATCH" "replace"; then
 	HAS_REPLACE_ATTR=1
 fi
 
+if does_sysfs_exists "$MOD_LIVEPATCH" "stack_order"; then
+	check_sysfs_rights "$MOD_LIVEPATCH" "stack_order" "-r--r--r--"
+	check_sysfs_value  "$MOD_LIVEPATCH" "stack_order" "1"
+	HAS_STACK_ORDER_ATTR=1
+fi
+
 disable_lp $MOD_LIVEPATCH
 
 unload_lp $MOD_LIVEPATCH
@@ -150,33 +155,34 @@ livepatch: '$MOD_LIVEPATCH': unpatching complete
 % rmmod $MOD_LIVEPATCH"
 fi
 
-start_test "sysfs test stack_order value"
+if [[ "$HAS_STACK_ORDER_ATTR" == "1" ]]; then
+	start_test "sysfs test stack_order value"
 
-load_lp $MOD_LIVEPATCH
+	load_lp $MOD_LIVEPATCH
 
-check_sysfs_value  "$MOD_LIVEPATCH" "stack_order" "1"
+	check_sysfs_value  "$MOD_LIVEPATCH" "stack_order" "1"
 
-load_lp $MOD_LIVEPATCH2
+	load_lp $MOD_LIVEPATCH2
 
-check_sysfs_value  "$MOD_LIVEPATCH2" "stack_order" "2"
+	check_sysfs_value  "$MOD_LIVEPATCH2" "stack_order" "2"
 
-load_lp $MOD_LIVEPATCH3
+	load_lp $MOD_LIVEPATCH3
 
-check_sysfs_value  "$MOD_LIVEPATCH3" "stack_order" "3"
+	check_sysfs_value  "$MOD_LIVEPATCH3" "stack_order" "3"
 
-disable_lp $MOD_LIVEPATCH2
-unload_lp $MOD_LIVEPATCH2
+	disable_lp $MOD_LIVEPATCH2
+	unload_lp $MOD_LIVEPATCH2
 
-check_sysfs_value  "$MOD_LIVEPATCH" "stack_order" "1"
-check_sysfs_value  "$MOD_LIVEPATCH3" "stack_order" "2"
+	check_sysfs_value  "$MOD_LIVEPATCH" "stack_order" "1"
+	check_sysfs_value  "$MOD_LIVEPATCH3" "stack_order" "2"
 
-disable_lp $MOD_LIVEPATCH3
-unload_lp $MOD_LIVEPATCH3
+	disable_lp $MOD_LIVEPATCH3
+	unload_lp $MOD_LIVEPATCH3
 
-disable_lp $MOD_LIVEPATCH
-unload_lp $MOD_LIVEPATCH
+	disable_lp $MOD_LIVEPATCH
+	unload_lp $MOD_LIVEPATCH
 
-check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
+	check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 livepatch: '$MOD_LIVEPATCH': starting patching transition
@@ -216,5 +222,6 @@ livepatch: '$MOD_LIVEPATCH': starting unpatching transition
 livepatch: '$MOD_LIVEPATCH': completing unpatching transition
 livepatch: '$MOD_LIVEPATCH': unpatching complete
 % rmmod $MOD_LIVEPATCH"
+fi
 
 exit 0

-- 
2.52.0


^ permalink raw reply related

* [PATCH v2 5/6] selftests: livepatch: Check if replace sysfs attribute exists
From: Marcos Paulo de Souza @ 2026-04-13 17:26 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan
  Cc: live-patching, linux-kselftest, linux-kernel,
	Marcos Paulo de Souza
In-Reply-To: <20260413-lp-tests-old-fixes-v2-0-367c7cb5006f@suse.com>

In order to run the selftests on older kernels, check if given kernel
has support for the attribute. If the attribute is not supported, skip
the checks.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
 tools/testing/selftests/livepatch/test-sysfs.sh | 39 +++++++++++++++----------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/livepatch/test-sysfs.sh b/tools/testing/selftests/livepatch/test-sysfs.sh
index a2d649404a63..0cdaeef00983 100755
--- a/tools/testing/selftests/livepatch/test-sysfs.sh
+++ b/tools/testing/selftests/livepatch/test-sysfs.sh
@@ -9,6 +9,7 @@ MOD_LIVEPATCH2=test_klp_callbacks_demo
 MOD_LIVEPATCH3=test_klp_syscall
 
 HAS_PATCH_ATTR=0
+HAS_REPLACE_ATTR=0
 
 setup_config
 
@@ -22,7 +23,6 @@ check_sysfs_rights "$MOD_LIVEPATCH" "" "drwxr-xr-x"
 check_sysfs_rights "$MOD_LIVEPATCH" "enabled" "-rw-r--r--"
 check_sysfs_value  "$MOD_LIVEPATCH" "enabled" "1"
 check_sysfs_rights "$MOD_LIVEPATCH" "force" "--w-------"
-check_sysfs_rights "$MOD_LIVEPATCH" "replace" "-r--r--r--"
 check_sysfs_rights "$MOD_LIVEPATCH" "stack_order" "-r--r--r--"
 check_sysfs_value  "$MOD_LIVEPATCH" "stack_order" "1"
 check_sysfs_rights "$MOD_LIVEPATCH" "transition" "-r--r--r--"
@@ -34,6 +34,11 @@ if does_sysfs_exists "$MOD_LIVEPATCH/vmlinux" "patched"; then
 	HAS_PATCH_ATTR=1
 fi
 
+if does_sysfs_exists "$MOD_LIVEPATCH" "replace"; then
+	check_sysfs_rights "$MOD_LIVEPATCH" "replace" "-r--r--r--"
+	HAS_REPLACE_ATTR=1
+fi
+
 disable_lp $MOD_LIVEPATCH
 
 unload_lp $MOD_LIVEPATCH
@@ -96,18 +101,19 @@ livepatch: 'test_klp_callbacks_demo': unpatching complete
 % rmmod test_klp_callbacks_demo"
 fi
 
-start_test "sysfs test replace enabled"
+if [[ "$HAS_REPLACE_ATTR" == "1" ]]; then
+	start_test "sysfs test replace enabled"
 
-MOD_LIVEPATCH=test_klp_atomic_replace
-load_lp $MOD_LIVEPATCH replace=1
+	MOD_LIVEPATCH=test_klp_atomic_replace
+	load_lp $MOD_LIVEPATCH replace=1
 
-check_sysfs_rights "$MOD_LIVEPATCH" "replace" "-r--r--r--"
-check_sysfs_value  "$MOD_LIVEPATCH" "replace" "1"
+	check_sysfs_rights "$MOD_LIVEPATCH" "replace" "-r--r--r--"
+	check_sysfs_value  "$MOD_LIVEPATCH" "replace" "1"
 
-disable_lp $MOD_LIVEPATCH
-unload_lp $MOD_LIVEPATCH
+	disable_lp $MOD_LIVEPATCH
+	unload_lp $MOD_LIVEPATCH
 
-check_result "% insmod test_modules/$MOD_LIVEPATCH.ko replace=1
+	check_result "% insmod test_modules/$MOD_LIVEPATCH.ko replace=1
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 livepatch: '$MOD_LIVEPATCH': starting patching transition
@@ -120,17 +126,17 @@ livepatch: '$MOD_LIVEPATCH': completing unpatching transition
 livepatch: '$MOD_LIVEPATCH': unpatching complete
 % rmmod $MOD_LIVEPATCH"
 
-start_test "sysfs test replace disabled"
+	start_test "sysfs test replace disabled"
 
-load_lp $MOD_LIVEPATCH replace=0
+	load_lp $MOD_LIVEPATCH replace=0
 
-check_sysfs_rights "$MOD_LIVEPATCH" "replace" "-r--r--r--"
-check_sysfs_value  "$MOD_LIVEPATCH" "replace" "0"
+	check_sysfs_rights "$MOD_LIVEPATCH" "replace" "-r--r--r--"
+	check_sysfs_value  "$MOD_LIVEPATCH" "replace" "0"
 
-disable_lp $MOD_LIVEPATCH
-unload_lp $MOD_LIVEPATCH
+	disable_lp $MOD_LIVEPATCH
+	unload_lp $MOD_LIVEPATCH
 
-check_result "% insmod test_modules/$MOD_LIVEPATCH.ko replace=0
+	check_result "% insmod test_modules/$MOD_LIVEPATCH.ko replace=0
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 livepatch: '$MOD_LIVEPATCH': starting patching transition
@@ -142,6 +148,7 @@ livepatch: '$MOD_LIVEPATCH': starting unpatching transition
 livepatch: '$MOD_LIVEPATCH': completing unpatching transition
 livepatch: '$MOD_LIVEPATCH': unpatching complete
 % rmmod $MOD_LIVEPATCH"
+fi
 
 start_test "sysfs test stack_order value"
 

-- 
2.52.0


^ permalink raw reply related

* [PATCH v2 4/6] selftests: livepatch: Check if patched sysfs attribute exists
From: Marcos Paulo de Souza @ 2026-04-13 17:26 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan
  Cc: live-patching, linux-kselftest, linux-kernel,
	Marcos Paulo de Souza
In-Reply-To: <20260413-lp-tests-old-fixes-v2-0-367c7cb5006f@suse.com>

In order to run the selftests on older kernels, check if given kernel
has support for the attribute. If the attribute is not supported, skip
the checks.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
 tools/testing/selftests/livepatch/test-sysfs.sh | 38 +++++++++++++++----------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/livepatch/test-sysfs.sh b/tools/testing/selftests/livepatch/test-sysfs.sh
index 58fe1d96997c..a2d649404a63 100755
--- a/tools/testing/selftests/livepatch/test-sysfs.sh
+++ b/tools/testing/selftests/livepatch/test-sysfs.sh
@@ -8,6 +8,8 @@ MOD_LIVEPATCH=test_klp_livepatch
 MOD_LIVEPATCH2=test_klp_callbacks_demo
 MOD_LIVEPATCH3=test_klp_syscall
 
+HAS_PATCH_ATTR=0
+
 setup_config
 
 # - load a livepatch and verifies the sysfs entries work as expected
@@ -25,8 +27,12 @@ check_sysfs_rights "$MOD_LIVEPATCH" "stack_order" "-r--r--r--"
 check_sysfs_value  "$MOD_LIVEPATCH" "stack_order" "1"
 check_sysfs_rights "$MOD_LIVEPATCH" "transition" "-r--r--r--"
 check_sysfs_value  "$MOD_LIVEPATCH" "transition" "0"
-check_sysfs_rights "$MOD_LIVEPATCH" "vmlinux/patched" "-r--r--r--"
-check_sysfs_value  "$MOD_LIVEPATCH" "vmlinux/patched" "1"
+
+if does_sysfs_exists "$MOD_LIVEPATCH/vmlinux" "patched"; then
+	check_sysfs_rights "$MOD_LIVEPATCH" "vmlinux/patched" "-r--r--r--"
+	check_sysfs_value  "$MOD_LIVEPATCH" "vmlinux/patched" "1"
+	HAS_PATCH_ATTR=1
+fi
 
 disable_lp $MOD_LIVEPATCH
 
@@ -45,23 +51,24 @@ livepatch: '$MOD_LIVEPATCH': completing unpatching transition
 livepatch: '$MOD_LIVEPATCH': unpatching complete
 % rmmod $MOD_LIVEPATCH"
 
-start_test "sysfs test object/patched"
+if [[ "$HAS_PATCH_ATTR" == "1" ]]; then
+	start_test "sysfs test object/patched"
 
-MOD_LIVEPATCH=test_klp_callbacks_demo
-MOD_TARGET=test_klp_callbacks_mod
-load_lp $MOD_LIVEPATCH
+	MOD_LIVEPATCH=test_klp_callbacks_demo
+	MOD_TARGET=test_klp_callbacks_mod
+	load_lp $MOD_LIVEPATCH
 
-# check the "patch" file changes as target module loads/unloads
-check_sysfs_value  "$MOD_LIVEPATCH" "$MOD_TARGET/patched" "0"
-load_mod $MOD_TARGET
-check_sysfs_value  "$MOD_LIVEPATCH" "$MOD_TARGET/patched" "1"
-unload_mod $MOD_TARGET
-check_sysfs_value  "$MOD_LIVEPATCH" "$MOD_TARGET/patched" "0"
+	# check the "patch" file changes as target module loads/unloads
+	check_sysfs_value  "$MOD_LIVEPATCH" "$MOD_TARGET/patched" "0"
+	load_mod $MOD_TARGET
+	check_sysfs_value  "$MOD_LIVEPATCH" "$MOD_TARGET/patched" "1"
+	unload_mod $MOD_TARGET
+	check_sysfs_value  "$MOD_LIVEPATCH" "$MOD_TARGET/patched" "0"
 
-disable_lp $MOD_LIVEPATCH
-unload_lp $MOD_LIVEPATCH
+	disable_lp $MOD_LIVEPATCH
+	unload_lp $MOD_LIVEPATCH
 
-check_result "% insmod test_modules/test_klp_callbacks_demo.ko
+	check_result "% insmod test_modules/test_klp_callbacks_demo.ko
 livepatch: enabling patch 'test_klp_callbacks_demo'
 livepatch: 'test_klp_callbacks_demo': initializing patching transition
 test_klp_callbacks_demo: pre_patch_callback: vmlinux
@@ -87,6 +94,7 @@ livepatch: 'test_klp_callbacks_demo': completing unpatching transition
 test_klp_callbacks_demo: post_unpatch_callback: vmlinux
 livepatch: 'test_klp_callbacks_demo': unpatching complete
 % rmmod test_klp_callbacks_demo"
+fi
 
 start_test "sysfs test replace enabled"
 

-- 
2.52.0


^ permalink raw reply related

* [PATCH v2 3/6] selftests: livepatch: Introduce does_sysfs_exists function
From: Marcos Paulo de Souza @ 2026-04-13 17:26 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan
  Cc: live-patching, linux-kselftest, linux-kernel,
	Marcos Paulo de Souza
In-Reply-To: <20260413-lp-tests-old-fixes-v2-0-367c7cb5006f@suse.com>

Return 1 if the livepatch sysfs attribute exists, and 0 otherwise. This
new function will be used in the next patches.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
 tools/testing/selftests/livepatch/functions.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/testing/selftests/livepatch/functions.sh b/tools/testing/selftests/livepatch/functions.sh
index 8ec0cb64ad94..382596eaaf01 100644
--- a/tools/testing/selftests/livepatch/functions.sh
+++ b/tools/testing/selftests/livepatch/functions.sh
@@ -339,6 +339,16 @@ function check_result {
 	fi
 }
 
+# does_sysfs_exists(modname, attr) - check sysfs attribute existence
+#	modname - livepatch module creating the sysfs interface
+#	attr - attribute name to be checked
+function does_sysfs_exists() {
+	local mod="$1"; shift
+	local attr="$1"; shift
+
+	[[ -f "$SYSFS_KLP_DIR/$mod/$attr" ]]
+}
+
 # check_sysfs_rights(modname, rel_path, expected_rights) - check sysfs
 # path permissions
 #	modname - livepatch module creating the sysfs interface

-- 
2.52.0


^ permalink raw reply related

* [PATCH v2 2/6] selftests: livepatch: Replace true/false module parameter by y/n
From: Marcos Paulo de Souza @ 2026-04-13 17:26 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan
  Cc: live-patching, linux-kselftest, linux-kernel,
	Marcos Paulo de Souza
In-Reply-To: <20260413-lp-tests-old-fixes-v2-0-367c7cb5006f@suse.com>

Older kernels don't support true/false for boolean module parameters
because they lack commit 0d6ea3ac94ca
("lib/kstrtox.c: add "false"/"true" support to kstrtobool()"). Replace
true/false by y/n so the test module can be loaded on older kernels.

No functional changes.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
 tools/testing/selftests/livepatch/test-kprobe.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/livepatch/test-kprobe.sh b/tools/testing/selftests/livepatch/test-kprobe.sh
index b67dfad03d97..7ced4082cff3 100755
--- a/tools/testing/selftests/livepatch/test-kprobe.sh
+++ b/tools/testing/selftests/livepatch/test-kprobe.sh
@@ -20,11 +20,11 @@ start_test "livepatch interaction with kprobed function with post_handler"
 
 echo 1 > "$SYSFS_KPROBES_DIR/enabled"
 
-load_mod $MOD_KPROBE has_post_handler=true
+load_mod $MOD_KPROBE has_post_handler=y
 load_failing_mod $MOD_LIVEPATCH
 unload_mod $MOD_KPROBE
 
-check_result "% insmod test_modules/test_klp_kprobe.ko has_post_handler=true
+check_result "% insmod test_modules/test_klp_kprobe.ko has_post_handler=y
 % insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
@@ -39,14 +39,14 @@ insmod: ERROR: could not insert module test_modules/$MOD_LIVEPATCH.ko: Device or
 
 start_test "livepatch interaction with kprobed function without post_handler"
 
-load_mod $MOD_KPROBE has_post_handler=false
+load_mod $MOD_KPROBE has_post_handler=n
 load_lp $MOD_LIVEPATCH
 
 unload_mod $MOD_KPROBE
 disable_lp $MOD_LIVEPATCH
 unload_lp $MOD_LIVEPATCH
 
-check_result "% insmod test_modules/test_klp_kprobe.ko has_post_handler=false
+check_result "% insmod test_modules/test_klp_kprobe.ko has_post_handler=n
 % insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition

-- 
2.52.0


^ permalink raw reply related

* [PATCH v2 1/6] selftests: livepatch: Check for ARCH_HAS_SYSCALL_WRAPPER config
From: Marcos Paulo de Souza @ 2026-04-13 17:26 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan
  Cc: live-patching, linux-kselftest, linux-kernel,
	Marcos Paulo de Souza
In-Reply-To: <20260413-lp-tests-old-fixes-v2-0-367c7cb5006f@suse.com>

Older kernels that lack CONFIG_ARCH_HAS_SYSCALL_WRAPPER config don't
have any prefixes for their syscalls. The same applies to current
powerpc and loongarch, covering all currently supported architectures
that support livepatch.

The other supported architectures have specific prefixes, so error out
when a new architecture adds livepatch support with wrappes but didn't
update the test to include it.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
 .../selftests/livepatch/test_modules/test_klp_syscall.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c b/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c
index dd802783ea84..b5527a288a7c 100644
--- a/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c
+++ b/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c
@@ -12,15 +12,26 @@
 #include <linux/slab.h>
 #include <linux/livepatch.h>
 
-#if defined(__x86_64__)
+/*
+ * Before CONFIG_ARCH_HAS_SYSCALL_WRAPPER was introduced there were no
+ * prefixes for system calls.
+ * Both ppc and loongarch does not set prefixes for their system calls either.
+ */
+#if !defined(CONFIG_ARCH_HAS_SYSCALL_WRAPPER) ||  defined(__powerpc__) || \
+	defined(__loongarch__)
+#define FN_PREFIX
+#elif defined(__x86_64__)
 #define FN_PREFIX __x64_
 #elif defined(__s390x__)
 #define FN_PREFIX __s390x_
 #elif defined(__aarch64__)
 #define FN_PREFIX __arm64_
-#else
-/* powerpc does not select ARCH_HAS_SYSCALL_WRAPPER */
+#elif defined(__powerpc__)
+#define FN_PREFIX
+#elif defined(__loongarch__)
 #define FN_PREFIX
+#else
+#error "Missing syscall wrapper for the given architecture."
 #endif
 
 /* Protects klp_pids */

-- 
2.52.0


^ permalink raw reply related

* [PATCH v2 0/6] kselftests: livepatch: Adapt tests to be executed on 4.12 kernels
From: Marcos Paulo de Souza @ 2026-04-13 17:26 UTC (permalink / raw)
  To: Josh Poimboeuf, Jiri Kosina, Miroslav Benes, Petr Mladek,
	Joe Lawrence, Shuah Khan
  Cc: live-patching, linux-kselftest, linux-kernel,
	Marcos Paulo de Souza

A new version of the patchset, with fewer patches now. Please take a look!

Original cover-letter:
These patches don't really change how the patches are run, just skip
some tests on kernels that don't support a feature (like kprobe and
livepatched living together) or when a livepatch sysfs attribute is
missing.

The last patch slightly adjusts check_result function to skip dmesg
messages on SLE kernels when a livepatch is removed.

These patches are based on printk/for-next branch.

Please review! Thanks!

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
Changes in v2:
- Patch descriptions were changed to remove "test-X", since it was polluting the commit subjects (Miroslav Benes)
- Patch 8 was dropped since it was checking for a message from an out-of-tree patch. (Petr Mladek)
- Patch 3 was dropped as should be treated as expected failure for older kernels. (Petr Mladek)
- Patch 2 was changed to use y/n instead of 1/0, since it's more natural to use it.
- Patch 1 was changed to handle ppc and loongson, and error out if dealing with a different architecture that sets
  CONFIG_ARCH_HAS_SYSCALL_WRAPPER and haven't changed the test to include the proper wrapper prefix.
- Patch 4 was changed to invert the return of the bash function to return 1 in failure, like
  a normal bash function (Joe Lawrence)
- Patches 5, 6 an 7 were changed to not split the tests, but to only run the tests
  when the attribute were present (Miroslav Benes)
- Link to v1: https://patch.msgid.link/20260313-lp-tests-old-fixes-v1-0-71ac6dfb3253@suse.com

---
Marcos Paulo de Souza (6):
      selftests: livepatch: Check for ARCH_HAS_SYSCALL_WRAPPER config
      selftests: livepatch: Replace true/false module parameter by y/n
      selftests: livepatch: Introduce does_sysfs_exists function
      selftests: livepatch: Check if patched sysfs attribute exists
      selftests: livepatch: Check if replace sysfs attribute exists
      selftests: livepatch: Check if stack_order sysfs attribute exists

 tools/testing/selftests/livepatch/functions.sh     |  10 ++
 tools/testing/selftests/livepatch/test-kprobe.sh   |   8 +-
 tools/testing/selftests/livepatch/test-sysfs.sh    | 120 ++++++++++++---------
 .../livepatch/test_modules/test_klp_syscall.c      |  17 ++-
 4 files changed, 99 insertions(+), 56 deletions(-)
---
base-commit: 712c0756828becbfc629ff8d8b82deff5d1115e4
change-id: 20260309-lp-tests-old-fixes-f955abc8ec27

Best regards,
--  
Marcos Paulo de Souza <mpdesouza@suse.com>


^ permalink raw reply

* [RFC PATCH 0/2] Decouple ftrace/livepatch from module loader via notifier priority and reverse traversal
From: chensong_2000 @ 2026-04-13  8:01 UTC (permalink / raw)
  To: rafael, lenb, mturquette, sboyd, viresh.kumar, agk, snitzer,
	mpatocka, bmarzins, song, yukuai, linan122, jason.wessel, danielt,
	dianders, horms, davem, edumazet, kuba, pabeni, paulmck, frederic,
	mcgrof, petr.pavlu, da.gomez, samitolvanen, atomlin, jpoimboe,
	jikos, mbenes, pmladek, joe.lawrence, rostedt, mhiramat,
	mark.rutland, mathieu.desnoyers
  Cc: linux-modules, linux-kernel, linux-trace-kernel, linux-acpi,
	linux-clk, linux-pm, live-patching, dm-devel, linux-raid,
	kgdb-bugreport, netdev, Song Chen

From: Song Chen <chensong_2000@189.cn>

This patchset addresses a long-standing tight coupling between the
module loader and two of its key consumers: ftrace and livepatch.

Background:

The module loader currently hard-codes direct calls to
ftrace_module_enable(), klp_module_coming(), klp_module_going() and
ftrace_release_mod() inside prepare_coming_module() and the module
unload path. This hard-coding was necessary because the module notifier
chain could not guarantee the strict call ordering that ftrace and
livepatch require:

  During MODULE_STATE_COMING, ftrace must run before livepatch, so
  that per-module function records are ready before livepatch registers
  its ftrace hooks.

  During MODULE_STATE_GOING, livepatch must run before ftrace, so that
  livepatch removes its hooks before ftrace releases those records.

This symmetric setup/teardown ordering could not be expressed through
the notifier chain because the chain only supported forward (descending
priority) traversal. Without reverse traversal, it was impossible to
guarantee that the GOING order would be the strict inverse of the
COMING order using a single priority value per notifier.

Patch 1 - notifier: replace single-linked list with double-linked list.
Patch 2 - ftrace/klp: decouple from module loader using notifier
priority.

headsup: somehow the smtp of my mailbox doesn't work very well lately, 
if i receive return letter, i have to resend, sorry in advance.

Song Chen (2):
  kernel/notifier: replace single-linked list with double-linked list
    for reverse traversal
  kernel/module: Decouple klp and ftrace from load_module

 drivers/acpi/sleep.c      |   1 -
 drivers/clk/clk.c         |   2 +-
 drivers/cpufreq/cpufreq.c |   2 +-
 drivers/md/dm-integrity.c |   1 -
 drivers/md/md.c           |   1 -
 include/linux/module.h    |   8 ++
 include/linux/notifier.h  |  26 ++---
 kernel/debug/debug_core.c |   1 -
 kernel/livepatch/core.c   |  29 ++++-
 kernel/module/main.c      |  34 +++---
 kernel/notifier.c         | 219 ++++++++++++++++++++++++++++++++------
 kernel/trace/ftrace.c     |  38 +++++++
 net/ipv4/nexthop.c        |   2 +-
 13 files changed, 290 insertions(+), 74 deletions(-)

-- 
2.43.0

^ permalink raw reply

* Re: [PATCH] kernel/trace/ftrace: introduce ftrace module notifier
From: Song Chen @ 2026-04-12 14:10 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Steven Rostedt, Miroslav Benes, mcgrof, petr.pavlu, da.gomez,
	samitolvanen, atomlin, mhiramat, mark.rutland, mathieu.desnoyers,
	linux-modules, linux-kernel, linux-trace-kernel, live-patching
In-Reply-To: <aaqk-GrpCTqO36xj@pathway.suse.cz>

Hi,


在 2026/3/6 17:57, Petr Mladek 写道:
> On Fri 2026-02-27 09:34:59, Song Chen wrote:
>> Hi,
>>
>> 在 2026/2/27 01:30, Steven Rostedt 写道:
>>> On Thu, 26 Feb 2026 11:51:53 +0100 (CET)
>>> Miroslav Benes <mbenes@suse.cz> wrote:
>>>
>>>>> Let me see if there is any way to use notifier and remain below calling
>>>>> sequence:
>>>>>
>>>>> ftrace_module_enable
>>>>> klp_module_coming
>>>>> blocking_notifier_call_chain_robust(MODULE_STATE_COMING)
>>>>>
>>>>> blocking_notifier_call_chain(MODULE_STATE_GOING)
>>>>> klp_module_going
>>>>> ftrace_release_mod
>>>>
>>>> Both klp and ftrace used module notifiers in the past. We abandoned that
>>>> and opted for direct calls due to issues with ordering at the time. I do
>>>> not have the list of problems at hand but I remember it was very fragile.
>>>>
>>>> See commits 7dcd182bec27 ("ftrace/module: remove ftrace module
>>>> notifier"), 7e545d6eca20 ("livepatch/module: remove livepatch module
>>>> notifier") and their surroundings.
>>>>
>>>> So unless there is a reason for the change (which should be then carefully
>>>> reviewed and properly tested), I would prefer to keep it as is. What is
>>>> the motivation? I am failing to find it in the commit log.
>>
>> There is no special motivation, i just read btf initialization in module
>> loading and found direct calls of ftrace and klp, i thought they were just
>> forgotten to use notifier and i even didn't search git log to verify, sorry
>> about that.
>>
>>>
>>> Honestly, I do think just decoupling ftrace and live kernel patching from
>>> modules is rationale enough, as it makes the code a bit cleaner. But to do
>>> so, we really need to make sure there is absolutely no regressions.
>>>
>>> Thus, to allow such a change, I would ask those that are proposing it, show
>>> a full work flow of how ftrace, live kernel patching, and modules work with
>>> each other and why those functions are currently injected in the module code.
>>>
>>> As Miroslav stated, we tried to do it via notifiers in the past and it
>>> failed. I don't want to find out why they failed by just adding them back
>>> to notifiers again. Instead, the reasons must be fully understood and
>>> updates made to make sure they will not fail in the future.
>>
>> Yes, you are right, i read commit msg of 7dcd182bec27, this patch just
>> reverses it simply and will introduce order issue back. I will try to find
>> out the problem in the past at first.
> 
> AFAIK, the root of the problem is that livepatch uses the ftrace
> framework. It means that:
> 
>     + ftrace must be initialized before livepatch gets enabled
>     + livepatch must be disabled before ftrace support gets removed
> 
> My understanding is that this can't be achieved by notifiers easily
> because they are always proceed in the same order.
> 
> An elegant solution would be to introduce  notifier_reverse_call_chain()
> which would process the callbacks in the reverse order. But it might
> be non-trivial:
> 
>    + We would need to make sure that it does not break some
>      existing "hidden" dependencies.
> 
Thanks so much, this is the solution i'm working on. I replaced next 
with a list_head in notifier_block and implemented 
anotifier_call_chain_reverse to address the order issues, like your 
suggestion. And a new robust revision for rolling back.

+static int notifier_call_chain_reverse(struct list_head *nl,
+                    struct notifier_block *start,
+                    unsigned long val, void *v,
+                    int nr_to_call, int *nr_calls)
+{
+    int ret = NOTIFY_DONE;
+    struct notifier_block *nb;
+    bool do_call = (start == NULL);
+
+    if (!nr_to_call)
+        return ret;
+
+    list_for_each_entry_reverse(nb, nl, entry) {
+        if (!do_call) {
+            if (nb == start)
+                do_call = true;
+            continue;
+        }
+#ifdef CONFIG_DEBUG_NOTIFIERS
+        if (unlikely(!func_ptr_is_kernel_text(nb->notifier_call))) {
+            WARN(1, "Invalid notifier called!");
+            continue;
+        }
+#endif
+        trace_notifier_run((void *)nb->notifier_call);
+        ret = nb->notifier_call(nb, val, v);
+
+        if (nr_calls)
+            (*nr_calls)++;
+
+        if (ret & NOTIFY_STOP_MASK)
+            break;
+
+        if (nr_to_call-- == 0)
+            break;
+    }
+    return ret;
+}
+NOKPROBE_SYMBOL(notifier_call_chain_reverse);

I'll send the patches for review soon.

Best regards

Song
>    + notifier_call_chain() uses RCU to process the list of registered
>      callbacks. I am not sure how complicated would be to make it safe
>      in both directions.
> 
> Best Regards,
> Petr
> 
> 

^ permalink raw reply

* Re: [RFC PATCH 0/4] trace, livepatch: Allow kprobe return overriding for livepatched functions
From: Yafang Shao @ 2026-04-12 13:50 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: jpoimboe, jikos, mbenes, pmladek, joe.lawrence, rostedt,
	mathieu.desnoyers, kpsingh, mattbobrowski, song, jolsa, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, yonghong.song,
	live-patching, linux-kernel, linux-trace-kernel, bpf
In-Reply-To: <20260410133844.56ab7964da7628d1c3482acb@kernel.org>

On Fri, Apr 10, 2026 at 12:38 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> Hi Yafang,
>
> On Thu,  2 Apr 2026 17:26:03 +0800
> Yafang Shao <laoar.shao@gmail.com> wrote:
>
> > Livepatching allows for rapid experimentation with new kernel features
> > without interrupting production workloads. However, static livepatches lack
> > the flexibility required to tune features based on task-specific attributes,
> > such as cgroup membership, which is critical in multi-tenant k8s
> > environments. Furthermore, hardcoding logic into a livepatch prevents
> > dynamic adjustments based on the runtime environment.
> >
> > To address this, we propose a hybrid approach using BPF. Our production use
> > case involves:
> >
> > 1. Deploying a Livepatch function to serve as a stable BPF hook.
> >
> > 2. Utilizing bpf_override_return() to dynamically modify the return value
> >    of that hook based on the current task's context.
>
> First of all, I don't like this approach to test a new feature in the
> kernel, because it sounds like allowing multiple different generations
> of implementations to coexist simultaneously. The standard kernel code
> is not designed to withstand such implementations.

However, this approach is invaluable for rapidly deploying new kernel
features to production servers without downtime. Upgrading kernels
across a large fleet remains a significant challenge.

>
> For example, if you implement a well-designed framework in a specific
> subsystem, like Schedext, which allows multiple implementations extended
> with BPF to coexist, there's no problem (at least it's debatable).
>
> But if it is for any function, it is dangerous feature. Bugs that occur
> in kernels that use this functionality cannot be addressed here. They
> need to be treated the same way as out-of-tree drivers or forked kernels.
> I mean, add a tainted flag for this feature. And we don't care of it.

Agreed. This should be handled as an OOT module rather than part of
the core kernel.

>
> >
> > A significant challenge arises when atomic-replace is enabled. In this
> > mode, deploying a new livepatch changes the target function's address,
> > forcing a re-attachment of the BPF program. This re-attachment latency is
> > unacceptable in critical paths, such as those handling networking policies.
> >
> > To solve this, we introduce a hybrid livepatch mode that allows specific
> > patches to remain non-replaceable, ensuring the function address remains
> > stable and the BPF program stays attached.
>
> Can you share your actual problem to be solved?

Here is an example we recently deployed on our production servers:

  https://lore.kernel.org/bpf/CALOAHbDnNba_w_nWH3-S9GAXw0+VKuLTh1gy5hy9Yqgeo4C0iA@mail.gmail.com/

In one of our specific clusters, we needed to send BGP traffic out
through specific NICs based on the destination IP. To achieve this
without interrupting service, we live-patched
bond_xmit_3ad_xor_slave_get(), added a new hook called
bond_get_slave_hook(), and then ran a BPF program attached to that
hook to select the outgoing NIC from the SKB. This allowed us to
rapidly deploy the feature with zero downtime.

[...]

-- 
Regards
Yafang

^ permalink raw reply

* Re: [RFC PATCH 0/4] trace, livepatch: Allow kprobe return overriding for livepatched functions
From: Yafang Shao @ 2026-04-12 13:30 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Song Liu, jpoimboe, jikos, pmladek, joe.lawrence, rostedt,
	mhiramat, mathieu.desnoyers, kpsingh, mattbobrowski, jolsa, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, yonghong.song,
	live-patching, linux-kernel, linux-trace-kernel, bpf
In-Reply-To: <alpine.LSU.2.21.2604091205250.31526@pobox.suse.cz>

On Thu, Apr 9, 2026 at 6:08 PM Miroslav Benes <mbenes@suse.cz> wrote:
>
> > Can we add something like ALLOW_LIVEPATCH_ERROR_INJECTION() to allow
> > error injection on functions defined inside a livepatch?
>
> No.
>
> I am sorry but you always seem to find band aids to your set up and how
> you deal with live patches internally. While I can see that something like
> a hybrid mode might be useful to people if done right (and we are not
> there yet), the combination of it with bpf overrides or anything like that
> is not something I would like to see in upstream.

The upstream kernel already allows for combining BPF and livepatch to
override functions. Song’s patch offers a great reference for
implementing this without changing the kernel:

  https://lore.kernel.org/bpf/20260408175217.1011024-1-song@kernel.org/

-- 
Regards
Yafang

^ permalink raw reply

* Re: [RFC PATCH 2/4] trace: Allow kprobes to override livepatched functions
From: Yafang Shao @ 2026-04-12 13:08 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: jpoimboe, jikos, pmladek, joe.lawrence, rostedt, mhiramat,
	mathieu.desnoyers, kpsingh, mattbobrowski, song, jolsa, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, yonghong.song,
	live-patching, linux-kernel, linux-trace-kernel, bpf
In-Reply-To: <alpine.LSU.2.21.2604091145340.31526@pobox.suse.cz>

On Thu, Apr 9, 2026 at 5:47 PM Miroslav Benes <mbenes@suse.cz> wrote:
>
> Hi,
>
> On Thu, 2 Apr 2026, Yafang Shao wrote:
>
> > Introduce the ability for kprobes to override the return values of
> > functions that have been livepatched. This functionality is guarded by the
> > CONFIG_KPROBE_OVERRIDE_KLP_FUNC configuration option.
>
> this is imprecise if I read the code correctly. You want to override live
> patch functions, not the original ones which are live patched.

Correct. The BPF program will override the livepatched functions
rather than the original ones.

>
> I also think that if nothing else, it needs to be more specific then just
> checking mod->klp. It should check if a function itself in klp module is
> overridable to keep it as limited as possible.

That is exactly what I am currently implementing in
trace_kprobe_klp_func_overridable().

> Even with that, the
> concerns expressed by the others still apply.

-- 
Regards
Yafang

^ permalink raw reply

* Re: [RFC PATCH 3/4] livepatch: Add "replaceable" attribute to klp_patch
From: Yafang Shao @ 2026-04-12 12:18 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Song Liu, Joe Lawrence, Dylan Hatch, jpoimboe, jikos, mbenes,
	rostedt, mhiramat, mathieu.desnoyers, kpsingh, mattbobrowski,
	jolsa, ast, daniel, andrii, martin.lau, eddyz87, memxor,
	yonghong.song, live-patching, linux-kernel, linux-trace-kernel,
	bpf
In-Reply-To: <addW_-whBavyHY-Z@pathway.suse.cz>

On Thu, Apr 9, 2026 at 3:36 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Wed 2026-04-08 11:19:50, Song Liu wrote:
> > On Wed, Apr 8, 2026 at 4:43 AM Petr Mladek <pmladek@suse.com> wrote:
> > [...]
> > > > >
> > > > > This is weird semantic. Which livepatch tag would be allowed to
> > > > > supersede it, please?
> > > > >
> > > > > Do we still need this category?
> > > >
> > > > It can be superseded by any livepatch that has a non-zero tag set.
> > >
> > > And this exactly the weird thing.
> > >
> > > A patch with the .replace flag set is supposed to obsolete all already
> > > installed livepatches. It means that it should provide all existing
> > > fixes and features.
> > >
> > > Now, we want to introduce a replace flag/set which would allow to
> > > replace/obsolete only the livepatch with the same tag/set number.
> > > And we want to prevent conflicts by making sure that livepatches with
> > > different tag/set number will never livepatch the same function.
> > >
> > > Obviously, livepatches with different tag/set number could not
> > > obsolete the same no-replace livepatch. They would need to livepatch
> > > the same functions touched by the no-replace livepatch and would
> > > conflict.
> > >
> > > So, I suggest to remove the no-replace mode completely. It should
> > > not be needed. A livepatch which should be installed in parallel
> > > will simply use another unique tag/set number.
> >
> > I think I see your point now. Existing code works as:
> > - replace=false doesn't replace anything
> > - replace=true replaces everything
> >
> > If we assume false=0 and true=1, it is technically possible to define:
> > - replace_set=0 doesn't replace anything
> > - replace_set=1 replaces everything
> > - replace_set=2+ only replace the same replace_set
>
> Yes. This well describes my point.
>
> > This is probably a little too complicated.
> >
> > > > This ensures backward compatibility: while a non-atomic-replace
> > > > livepatch can be superseded by an atomic-replace one, the reverse is
> > > > not permitted—an atomic-replace livepatch cannot be superseded by a
> > > > non-atomic one.
> > >
> > > IMHO, the backward compatibility would just create complexity and mess
> > > in this case.
> >
> > Given that livepatch is for expert users, I think we can make this work
> > without backward compatibility. But breaking compatibility is always not
> > preferred.
>
> I believe that it is acceptable because:
>
>   1. It was always hard to combine no-replace and replace livepatches.
>      I wonder if anyone combines them at all.

Because 'replace' patches can supersede 'no-replace' ones, users have
to maintain a strict loading order. I doubt anyone actually combines
them in production.

>
>   2. I believe that nobody tries to load the same livepatch module on
>      different kernel versions. Instead, everyone prepares a custom
>      livepatch module for each livepatched kernel version/release.

Correct. We always build and apply distinct livepatches for each
specific kernel version.

>
>      And the tooling for creating livepatches will need to be updated
>      to use "number" instead of "true/false" anyway.
>
> That said, it is easier to always use "0" for non-replace patches
> instead of assigning an unique "number" to avoid replacing. But
> I do not think that this would justify the complexity of having
> different semantic for 0, 1, and 2+ replace_set numbers.

Fair enough. Let's drop backward compatibility to keep the
implementation simple.

--
Regards
Yafang

^ permalink raw reply

* Re: [RFC PATCH 3/4] livepatch: Add "replaceable" attribute to klp_patch
From: Yafang Shao @ 2026-04-12 12:09 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Song Liu, Joe Lawrence, Dylan Hatch, jpoimboe, jikos, mbenes,
	rostedt, mhiramat, mathieu.desnoyers, kpsingh, mattbobrowski,
	jolsa, ast, daniel, andrii, martin.lau, eddyz87, memxor,
	yonghong.song, live-patching, linux-kernel, linux-trace-kernel,
	bpf
In-Reply-To: <adY_WgA54CDtWBq6@pathway.suse.cz>

On Wed, Apr 8, 2026 at 7:43 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Wed 2026-04-08 10:40:10, Yafang Shao wrote:
> > On Tue, Apr 7, 2026 at 11:08 PM Petr Mladek <pmladek@suse.com> wrote:
> > >
> > > On Tue 2026-04-07 17:45:31, Yafang Shao wrote:
> > > > On Tue, Apr 7, 2026 at 11:16 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > >
> > > > > On Tue, Apr 7, 2026 at 10:54 AM Song Liu <song@kernel.org> wrote:
> > > > > >
> > > > > > On Mon, Apr 6, 2026 at 2:12 PM Joe Lawrence <joe.lawrence@redhat.com> wrote:
> > > > > > [...]
> > > > > > > > > > - The regular livepatches are cumulative, have the replace flag; and
> > > > > > > > > >   are replaceable.
> > > > > > > > > > - The occasional "off-band" livepatches do not have the replace flag,
> > > > > > > > > >   and are not replaceable.
> > > > > > > > > >
> > > > > > > > > > With this setup, for systems with off-band livepatches loaded, we can
> > > > > > > > > > still release a cumulative livepatch to replace the previous cumulative
> > > > > > > > > > livepatch. Is this the expected use case?
> > > > > > > > >
> > > > > > > > > That matches our expected use case.
> > > > > > > >
> > > > > > > > If we really want to serve use cases like this, I think we can introduce
> > > > > > > > some replace tag concept: Each livepatch will have a tag, u32 number.
> > > > > > > > Newly loaded livepatch will only replace existing livepatch with the
> > > > > > > > same tag. We can even reuse the existing "bool replace" in klp_patch,
> > > > > > > > and make it u32: replace=0 means no replace; replace > 0 are the
> > > > > > > > replace tag.
> > > > > > > >
> > > > > > > > For current users of cumulative patches, all the livepatch will have the
> > > > > > > > same tag, say 1. For your use case, you can assign each user a
> > > > > > > > unique tag. Then all these users can do atomic upgrades of their
> > > > > > > > own livepatches.
> > > > > > > >
> > > > > > > > We may also need to check whether two livepatches of different tags
> > > > > > > > touch the same kernel function. When that happens, the later
> > > > > > > > livepatch should fail to load.
> > >
> > > I still think how to make the hybrid mode more secure:
> > >
> > >     + The isolated sets of livepatched functions look like a good rule.
> > >     + What about isolating the shadow variables/states as well?
> >
> > We might consider extending the klp_shadow_* API to support the new
> > livepatch tag.
>
> It would be nice to associate shadow variables with states so that
> we could check which shadow variables are used by each livepatch.
>
> It is partially implemented in my earlier RFC, see
> https://lore.kernel.org/all/20250115082431.5550-3-pmladek@suse.com/

This patch is still pending acceptance, but it offers a nice
improvement. With your modifications, the remaining task would be to
integrate a new replace_set into klp_state and update the API
accordingly

[...]

-- 
Regards
Yafang

^ permalink raw reply

* Re: [PATCH 1/1] objtool/klp: Fix is_uncorrelated_static_local() for Clang naming
From: Song Liu @ 2026-04-10 22:19 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: linux-kernel, live-patching, Josh Poimboeuf, Jiri Kosina,
	Miroslav Benes, Petr Mladek
In-Reply-To: <20260408144919.3825518-1-joe.lawrence@redhat.com>

On Wed, Apr 8, 2026 at 7:49 AM Joe Lawrence <joe.lawrence@redhat.com> wrote:
>
> For naming function-local static locals, GCC uses "<var>.<id>":
>   e.g. __already_done.15
>
> while Clang uses "<func>.<var>" with optional ".<id>"
>   e.g. create_worker.__already_done.111
>
> The existing is_uncorrelated_static_local() check only matches the GCC
> convention where the variable name is a prefix.  Handle both cases by
> checking for a prefix match (GCC) and by checking after the first dot
> separator (Clang).
>
> Fixes: dd590d4d57eb ("objtool/klp: Introduce klp diff subcommand for diffing object files")
> Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>

LGTM.

Acked-by: Song Liu <song@kernel.org>

^ permalink raw reply

* Re: [RFC PATCH 0/4] trace, livepatch: Allow kprobe return overriding for livepatched functions
From: Masami Hiramatsu @ 2026-04-10  4:38 UTC (permalink / raw)
  To: Yafang Shao
  Cc: jpoimboe, jikos, mbenes, pmladek, joe.lawrence, rostedt,
	mathieu.desnoyers, kpsingh, mattbobrowski, song, jolsa, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, yonghong.song,
	live-patching, linux-kernel, linux-trace-kernel, bpf
In-Reply-To: <20260402092607.96430-1-laoar.shao@gmail.com>

Hi Yafang,

On Thu,  2 Apr 2026 17:26:03 +0800
Yafang Shao <laoar.shao@gmail.com> wrote:

> Livepatching allows for rapid experimentation with new kernel features
> without interrupting production workloads. However, static livepatches lack
> the flexibility required to tune features based on task-specific attributes,
> such as cgroup membership, which is critical in multi-tenant k8s
> environments. Furthermore, hardcoding logic into a livepatch prevents
> dynamic adjustments based on the runtime environment.
> 
> To address this, we propose a hybrid approach using BPF. Our production use
> case involves:
> 
> 1. Deploying a Livepatch function to serve as a stable BPF hook.
> 
> 2. Utilizing bpf_override_return() to dynamically modify the return value
>    of that hook based on the current task's context.

First of all, I don't like this approach to test a new feature in the
kernel, because it sounds like allowing multiple different generations
of implementations to coexist simultaneously. The standard kernel code
is not designed to withstand such implementations.

For example, if you implement a well-designed framework in a specific
subsystem, like Schedext, which allows multiple implementations extended
with BPF to coexist, there's no problem (at least it's debatable).

But if it is for any function, it is dangerous feature. Bugs that occur
in kernels that use this functionality cannot be addressed here. They
need to be treated the same way as out-of-tree drivers or forked kernels.
I mean, add a tainted flag for this feature. And we don't care of it.

> 
> A significant challenge arises when atomic-replace is enabled. In this
> mode, deploying a new livepatch changes the target function's address,
> forcing a re-attachment of the BPF program. This re-attachment latency is
> unacceptable in critical paths, such as those handling networking policies.
> 
> To solve this, we introduce a hybrid livepatch mode that allows specific
> patches to remain non-replaceable, ensuring the function address remains
> stable and the BPF program stays attached.

Can you share your actual problem to be solved? 
If the specific problem and the specific subsystem are clear,
I think there is room to discuss it with the subsystem maintainer.

> 
> Furthermore, this mechanism provides a lower-maintenance alternative to
> out-of-tree BPF hooks. Given the complexities of upstreaming custom BPF
> hooks (e.g., [0], [1]), this hybrid mode allows for the maintenance of
> stable, minimal hook points via livepatching with significantly reduced
> maintenance burden.

Maintenance cost is the same. We need to add out-of-tree BPF hooks on
source code. Maybe deploying cost will be reduced.

Thank you,

> 
> Link: https://lwn.net/Articles/1054030/ [0]
> Link: https://lwn.net/Articles/1043548/ [1]
> 
> Yafang Shao (4):
>   trace: Simplify kprobe overridable function check
>   trace: Allow kprobes to override livepatched functions
>   livepatch: Add "replaceable" attribute to klp_patch
>   livepatch: Implement livepatch hybrid mode
> 
>  include/linux/livepatch.h   |  2 ++
>  kernel/livepatch/core.c     | 50 +++++++++++++++++++++++++++++++
>  kernel/trace/Kconfig        | 14 +++++++++
>  kernel/trace/bpf_trace.c    | 14 ++++++---
>  kernel/trace/trace_kprobe.c | 49 ++++++++++++------------------
>  kernel/trace/trace_probe.h  | 59 +++++++++++++++++++++++++++----------
>  6 files changed, 139 insertions(+), 49 deletions(-)
> 
> -- 
> 2.47.3
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [RFC PATCH 0/4] trace, livepatch: Allow kprobe return overriding for livepatched functions
From: Miroslav Benes @ 2026-04-09 10:08 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Song Liu, jpoimboe, jikos, pmladek, joe.lawrence, rostedt,
	mhiramat, mathieu.desnoyers, kpsingh, mattbobrowski, jolsa, ast,
	daniel, andrii, martin.lau, eddyz87, memxor, yonghong.song,
	live-patching, linux-kernel, linux-trace-kernel, bpf
In-Reply-To: <CALOAHbAmTAfamStF9sZtO6efWYJ1sbXJp3PbsVapZf7dba91ig@mail.gmail.com>

> Can we add something like ALLOW_LIVEPATCH_ERROR_INJECTION() to allow
> error injection on functions defined inside a livepatch?

No.

I am sorry but you always seem to find band aids to your set up and how 
you deal with live patches internally. While I can see that something like 
a hybrid mode might be useful to people if done right (and we are not 
there yet), the combination of it with bpf overrides or anything like that 
is not something I would like to see in upstream.

Miroslav

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox