* [PATCH] klp: use stop machine to check and expedite transition for running tasks
@ 2026-02-02 9:13 Li Zhe
2026-02-04 2:20 ` Josh Poimboeuf
0 siblings, 1 reply; 5+ messages in thread
From: Li Zhe @ 2026-02-02 9:13 UTC (permalink / raw)
To: jpoimboe, jikos, mbenes, pmladek, joe.lawrence
Cc: live-patching, linux-kernel, lizhe.67, qirui.001
In the current KLP transition implementation, the strategy for running
tasks is to wait for a context switch, at which point the kernel
attempts to clear the TIF_PATCH_PENDING flag; alternatively, it
determines whether the flag can be cleared by inspecting the task's
stack once the task has yielded the CPU. However, this approach proves
problematic in certain environments.
Consider a scenario where the majority of system CPUs are configured
with nohz_full and isolcpus, each dedicated to a VM with a vCPU pinned
to that physical core and configured with idle=poll within the guest.
Under such conditions, these vCPUs rarely leave the CPU. Combined with
the high core counts typical of modern server platforms, this results
in transition completion times that are not only excessively prolonged
but also highly unpredictable.
This patch resolves the issue by queueing a callback on the target CPU
via the stop-machine infrastructure (stop_one_cpu_nowait()). The
callback attempts to transition the running task on that CPU. In a VM
environment configured with 32 CPUs, the live-patching operation
completes promptly after the SIGNALS_TIMEOUT period with this patch
applied; without it, the transition barely completes under the same
scenario.
Co-developed-by: Rui Qi <qirui.001@bytedance.com>
Signed-off-by: Rui Qi <qirui.001@bytedance.com>
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
kernel/livepatch/transition.c | 62 ++++++++++++++++++++++++++++++++---
1 file changed, 58 insertions(+), 4 deletions(-)
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 2351a19ac2a9..9c078b9bd755 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -10,6 +10,7 @@
#include <linux/cpu.h>
#include <linux/stacktrace.h>
#include <linux/static_call.h>
+#include <linux/stop_machine.h>
#include "core.h"
#include "patch.h"
#include "transition.h"
@@ -297,6 +298,61 @@ static int klp_check_and_switch_task(struct task_struct *task, void *arg)
return 0;
}
+enum klp_stop_work_bit {
+ KLP_STOP_WORK_PENDING_BIT,
+};
+
+struct klp_stop_work_info {
+ struct task_struct *task;
+ unsigned long flag;
+};
+
+static DEFINE_PER_CPU(struct cpu_stop_work, klp_transition_stop_work);
+static DEFINE_PER_CPU(struct klp_stop_work_info, klp_stop_work_info);
+
+static int klp_check_task(struct task_struct *task, void *old_name)
+{
+ if (task == current)
+ return klp_check_and_switch_task(current, old_name);
+ else
+ return task_call_func(task, klp_check_and_switch_task, old_name);
+}
+
+static int klp_transition_stop_work_fn(void *arg)
+{
+ struct klp_stop_work_info *info = (struct klp_stop_work_info *)arg;
+ struct task_struct *task = info->task;
+ const char *old_name;
+
+ clear_bit(KLP_STOP_WORK_PENDING_BIT, &info->flag);
+
+ if (likely(klp_patch_pending(task)))
+ klp_check_task(task, &old_name);
+
+ put_task_struct(task);
+
+ return 0;
+}
+
+static void klp_try_transition_running_task(struct task_struct *task)
+{
+ int cpu = task_cpu(task);
+
+ if (klp_signals_cnt && !(klp_signals_cnt % SIGNALS_TIMEOUT)) {
+ struct klp_stop_work_info *info =
+ per_cpu_ptr(&klp_stop_work_info, cpu);
+
+ if (test_and_set_bit(KLP_STOP_WORK_PENDING_BIT, &info->flag))
+ return;
+
+ info->task = get_task_struct(task);
+ if (!stop_one_cpu_nowait(cpu, klp_transition_stop_work_fn, info,
+ per_cpu_ptr(&klp_transition_stop_work,
+ cpu)))
+ put_task_struct(task);
+ }
+}
+
/*
* Try to safely switch a task to the target patch state. If it's currently
* running, or it's sleeping on a to-be-patched or to-be-unpatched function, or
@@ -323,10 +379,7 @@ static bool klp_try_switch_task(struct task_struct *task)
* functions. If all goes well, switch the task to the target patch
* state.
*/
- if (task == current)
- ret = klp_check_and_switch_task(current, &old_name);
- else
- ret = task_call_func(task, klp_check_and_switch_task, &old_name);
+ ret = klp_check_task(task, &old_name);
switch (ret) {
case 0: /* success */
@@ -335,6 +388,7 @@ static bool klp_try_switch_task(struct task_struct *task)
case -EBUSY: /* klp_check_and_switch_task() */
pr_debug("%s: %s:%d is running\n",
__func__, task->comm, task->pid);
+ klp_try_transition_running_task(task);
break;
case -EINVAL: /* klp_check_and_switch_task() */
pr_debug("%s: %s:%d has an unreliable stack\n",
--
2.20.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] klp: use stop machine to check and expedite transition for running tasks
2026-02-02 9:13 [PATCH] klp: use stop machine to check and expedite transition for running tasks Li Zhe
@ 2026-02-04 2:20 ` Josh Poimboeuf
2026-02-04 2:47 ` Li Zhe
2026-02-09 19:12 ` Peter Zijlstra
0 siblings, 2 replies; 5+ messages in thread
From: Josh Poimboeuf @ 2026-02-04 2:20 UTC (permalink / raw)
To: Li Zhe
Cc: jikos, mbenes, pmladek, joe.lawrence, live-patching, linux-kernel,
qirui.001, Peter Zijlstra
On Mon, Feb 02, 2026 at 05:13:34PM +0800, Li Zhe wrote:
> In the current KLP transition implementation, the strategy for running
> tasks is to wait for a context switch, at which point the kernel
> attempts to clear the TIF_PATCH_PENDING flag; alternatively, it
> determines whether the flag can be cleared by inspecting the task's
> stack once the task has yielded the CPU. However, this approach proves
> problematic in certain environments.
>
> Consider a scenario where the majority of system CPUs are configured
> with nohz_full and isolcpus, each dedicated to a VM with a vCPU pinned
> to that physical core and configured with idle=poll within the guest.
> Under such conditions, these vCPUs rarely leave the CPU. Combined with
> the high core counts typical of modern server platforms, this results
> in transition completion times that are not only excessively prolonged
> but also highly unpredictable.
>
> This patch resolves the issue by queueing a callback on the target CPU
> via the stop-machine infrastructure (stop_one_cpu_nowait()). The
> callback attempts to transition the running task on that CPU. In a VM
> environment configured with 32 CPUs, the live-patching operation
> completes promptly after the SIGNALS_TIMEOUT period with this patch
> applied; without it, the transition barely completes under the same
> scenario.
>
> Co-developed-by: Rui Qi <qirui.001@bytedance.com>
> Signed-off-by: Rui Qi <qirui.001@bytedance.com>
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
PeterZ, what's your take on this?
I wonder if we could instead do resched_cpu() or something similar to
trigger the call to klp_sched_try_switch() in __schedule()?
> ---
> kernel/livepatch/transition.c | 62 ++++++++++++++++++++++++++++++++---
> 1 file changed, 58 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index 2351a19ac2a9..9c078b9bd755 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -10,6 +10,7 @@
> #include <linux/cpu.h>
> #include <linux/stacktrace.h>
> #include <linux/static_call.h>
> +#include <linux/stop_machine.h>
> #include "core.h"
> #include "patch.h"
> #include "transition.h"
> @@ -297,6 +298,61 @@ static int klp_check_and_switch_task(struct task_struct *task, void *arg)
> return 0;
> }
>
> +enum klp_stop_work_bit {
> + KLP_STOP_WORK_PENDING_BIT,
> +};
> +
> +struct klp_stop_work_info {
> + struct task_struct *task;
> + unsigned long flag;
> +};
> +
> +static DEFINE_PER_CPU(struct cpu_stop_work, klp_transition_stop_work);
> +static DEFINE_PER_CPU(struct klp_stop_work_info, klp_stop_work_info);
> +
> +static int klp_check_task(struct task_struct *task, void *old_name)
> +{
> + if (task == current)
> + return klp_check_and_switch_task(current, old_name);
> + else
> + return task_call_func(task, klp_check_and_switch_task, old_name);
> +}
> +
> +static int klp_transition_stop_work_fn(void *arg)
> +{
> + struct klp_stop_work_info *info = (struct klp_stop_work_info *)arg;
> + struct task_struct *task = info->task;
> + const char *old_name;
> +
> + clear_bit(KLP_STOP_WORK_PENDING_BIT, &info->flag);
> +
> + if (likely(klp_patch_pending(task)))
> + klp_check_task(task, &old_name);
> +
> + put_task_struct(task);
> +
> + return 0;
> +}
> +
> +static void klp_try_transition_running_task(struct task_struct *task)
> +{
> + int cpu = task_cpu(task);
> +
> + if (klp_signals_cnt && !(klp_signals_cnt % SIGNALS_TIMEOUT)) {
> + struct klp_stop_work_info *info =
> + per_cpu_ptr(&klp_stop_work_info, cpu);
> +
> + if (test_and_set_bit(KLP_STOP_WORK_PENDING_BIT, &info->flag))
> + return;
> +
> + info->task = get_task_struct(task);
> + if (!stop_one_cpu_nowait(cpu, klp_transition_stop_work_fn, info,
> + per_cpu_ptr(&klp_transition_stop_work,
> + cpu)))
> + put_task_struct(task);
> + }
> +}
> +
> /*
> * Try to safely switch a task to the target patch state. If it's currently
> * running, or it's sleeping on a to-be-patched or to-be-unpatched function, or
> @@ -323,10 +379,7 @@ static bool klp_try_switch_task(struct task_struct *task)
> * functions. If all goes well, switch the task to the target patch
> * state.
> */
> - if (task == current)
> - ret = klp_check_and_switch_task(current, &old_name);
> - else
> - ret = task_call_func(task, klp_check_and_switch_task, &old_name);
> + ret = klp_check_task(task, &old_name);
>
> switch (ret) {
> case 0: /* success */
> @@ -335,6 +388,7 @@ static bool klp_try_switch_task(struct task_struct *task)
> case -EBUSY: /* klp_check_and_switch_task() */
> pr_debug("%s: %s:%d is running\n",
> __func__, task->comm, task->pid);
> + klp_try_transition_running_task(task);
> break;
> case -EINVAL: /* klp_check_and_switch_task() */
> pr_debug("%s: %s:%d has an unreliable stack\n",
> --
> 2.20.1
--
Josh
* Re: [PATCH] klp: use stop machine to check and expedite transition for running tasks
2026-02-04 2:20 ` Josh Poimboeuf
@ 2026-02-04 2:47 ` Li Zhe
2026-02-09 2:54 ` Li Zhe
2026-02-09 19:12 ` Peter Zijlstra
1 sibling, 1 reply; 5+ messages in thread
From: Li Zhe @ 2026-02-04 2:47 UTC (permalink / raw)
To: jpoimboe
Cc: jikos, joe.lawrence, linux-kernel, live-patching, lizhe.67,
mbenes, peterz, pmladek, qirui.001
On Tue, 3 Feb 2026 18:20:22 -0800, jpoimboe@kernel.org wrote:
> On Mon, Feb 02, 2026 at 05:13:34PM +0800, Li Zhe wrote:
> > In the current KLP transition implementation, the strategy for running
> > tasks is to wait for a context switch, at which point the kernel
> > attempts to clear the TIF_PATCH_PENDING flag; alternatively, it
> > determines whether the flag can be cleared by inspecting the task's
> > stack once the task has yielded the CPU. However, this approach proves
> > problematic in certain environments.
> >
> > Consider a scenario where the majority of system CPUs are configured
> > with nohz_full and isolcpus, each dedicated to a VM with a vCPU pinned
> > to that physical core and configured with idle=poll within the guest.
> > Under such conditions, these vCPUs rarely leave the CPU. Combined with
> > the high core counts typical of modern server platforms, this results
> > in transition completion times that are not only excessively prolonged
> > but also highly unpredictable.
> >
> > This patch resolves the issue by queueing a callback on the target CPU
> > via the stop-machine infrastructure (stop_one_cpu_nowait()). The
> > callback attempts to transition the running task on that CPU. In a VM
> > environment configured with 32 CPUs, the live-patching operation
> > completes promptly after the SIGNALS_TIMEOUT period with this patch
> > applied; without it, the transition barely completes under the same
> > scenario.
> >
> > Co-developed-by: Rui Qi <qirui.001@bytedance.com>
> > Signed-off-by: Rui Qi <qirui.001@bytedance.com>
> > Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
>
> PeterZ, what's your take on this?
>
> I wonder if we could instead do resched_cpu() or something similar to
> trigger the call to klp_sched_try_switch() in __schedule()?
klp_sched_try_switch() only invokes __klp_sched_try_switch() after
verifying that the corresponding task has the TASK_FREEZABLE flag
set. I remain uncertain whether this approach adequately resolves
the issue.
Thanks,
Zhe
* Re: [PATCH] klp: use stop machine to check and expedite transition for running tasks
2026-02-04 2:47 ` Li Zhe
@ 2026-02-09 2:54 ` Li Zhe
0 siblings, 0 replies; 5+ messages in thread
From: Li Zhe @ 2026-02-09 2:54 UTC (permalink / raw)
To: lizhe.67
Cc: jikos, joe.lawrence, jpoimboe, linux-kernel, live-patching,
mbenes, peterz, pmladek, qirui.001
On Mon, 2 Feb 2026 17:13:34 +0800, lizhe.67@bytedance.com wrote:
> In the current KLP transition implementation, the strategy for running
> tasks is to wait for a context switch, at which point the kernel
> attempts to clear the TIF_PATCH_PENDING flag; alternatively, it
> determines whether the flag can be cleared by inspecting the task's
> stack once the task has yielded the CPU. However, this approach proves
> problematic in certain environments.
>
> Consider a scenario where the majority of system CPUs are configured
> with nohz_full and isolcpus, each dedicated to a VM with a vCPU pinned
> to that physical core and configured with idle=poll within the guest.
> Under such conditions, these vCPUs rarely leave the CPU. Combined with
> the high core counts typical of modern server platforms, this results
> in transition completion times that are not only excessively prolonged
> but also highly unpredictable.
>
> This patch resolves the issue by queueing a callback on the target CPU
> via the stop-machine infrastructure (stop_one_cpu_nowait()). The
> callback attempts to transition the running task on that CPU. In a VM
> environment configured with 32 CPUs, the live-patching operation
> completes promptly after the SIGNALS_TIMEOUT period with this patch
> applied; without it, the transition barely completes under the same
> scenario.
>
> Co-developed-by: Rui Qi <qirui.001@bytedance.com>
> Signed-off-by: Rui Qi <qirui.001@bytedance.com>
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> ---
> kernel/livepatch/transition.c | 62 ++++++++++++++++++++++++++++++++---
> 1 file changed, 58 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index 2351a19ac2a9..9c078b9bd755 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -10,6 +10,7 @@
> #include <linux/cpu.h>
> #include <linux/stacktrace.h>
> #include <linux/static_call.h>
> +#include <linux/stop_machine.h>
> #include "core.h"
> #include "patch.h"
> #include "transition.h"
> @@ -297,6 +298,61 @@ static int klp_check_and_switch_task(struct task_struct *task, void *arg)
> return 0;
> }
>
> +enum klp_stop_work_bit {
> + KLP_STOP_WORK_PENDING_BIT,
> +};
> +
> +struct klp_stop_work_info {
> + struct task_struct *task;
> + unsigned long flag;
> +};
> +
> +static DEFINE_PER_CPU(struct cpu_stop_work, klp_transition_stop_work);
> +static DEFINE_PER_CPU(struct klp_stop_work_info, klp_stop_work_info);
> +
> +static int klp_check_task(struct task_struct *task, void *old_name)
> +{
> + if (task == current)
> + return klp_check_and_switch_task(current, old_name);
> + else
> + return task_call_func(task, klp_check_and_switch_task, old_name);
> +}
> +
> +static int klp_transition_stop_work_fn(void *arg)
> +{
> + struct klp_stop_work_info *info = (struct klp_stop_work_info *)arg;
> + struct task_struct *task = info->task;
> + const char *old_name;
> +
> + clear_bit(KLP_STOP_WORK_PENDING_BIT, &info->flag);
> +
> + if (likely(klp_patch_pending(task)))
> + klp_check_task(task, &old_name);
> +
> + put_task_struct(task);
> +
> + return 0;
> +}
> +
> +static void klp_try_transition_running_task(struct task_struct *task)
> +{
> + int cpu = task_cpu(task);
> +
> + if (klp_signals_cnt && !(klp_signals_cnt % SIGNALS_TIMEOUT)) {
> + struct klp_stop_work_info *info =
> + per_cpu_ptr(&klp_stop_work_info, cpu);
> +
> + if (test_and_set_bit(KLP_STOP_WORK_PENDING_BIT, &info->flag))
> + return;
> +
> + info->task = get_task_struct(task);
> + if (!stop_one_cpu_nowait(cpu, klp_transition_stop_work_fn, info,
> + per_cpu_ptr(&klp_transition_stop_work,
> + cpu)))
> + put_task_struct(task);
> + }
> +}
> +
> /*
> * Try to safely switch a task to the target patch state. If it's currently
> * running, or it's sleeping on a to-be-patched or to-be-unpatched function, or
> @@ -323,10 +379,7 @@ static bool klp_try_switch_task(struct task_struct *task)
> * functions. If all goes well, switch the task to the target patch
> * state.
> */
> - if (task == current)
> - ret = klp_check_and_switch_task(current, &old_name);
> - else
> - ret = task_call_func(task, klp_check_and_switch_task, &old_name);
> + ret = klp_check_task(task, &old_name);
>
> switch (ret) {
> case 0: /* success */
> @@ -335,6 +388,7 @@ static bool klp_try_switch_task(struct task_struct *task)
> case -EBUSY: /* klp_check_and_switch_task() */
> pr_debug("%s: %s:%d is running\n",
> __func__, task->comm, task->pid);
> + klp_try_transition_running_task(task);
> break;
> case -EINVAL: /* klp_check_and_switch_task() */
> pr_debug("%s: %s:%d has an unreliable stack\n",
> --
> 2.20.1
Hi all,
Just a gentle ping on this patch.
Please let me know if there's anything I can improve or if you need
more information.
Thanks,
Zhe
* Re: [PATCH] klp: use stop machine to check and expedite transition for running tasks
2026-02-04 2:20 ` Josh Poimboeuf
2026-02-04 2:47 ` Li Zhe
@ 2026-02-09 19:12 ` Peter Zijlstra
1 sibling, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2026-02-09 19:12 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Li Zhe, jikos, mbenes, pmladek, joe.lawrence, live-patching,
linux-kernel, qirui.001, vschneid, dave.hansen
On Tue, Feb 03, 2026 at 06:20:22PM -0800, Josh Poimboeuf wrote:
> On Mon, Feb 02, 2026 at 05:13:34PM +0800, Li Zhe wrote:
> > In the current KLP transition implementation, the strategy for running
> > tasks is to wait for a context switch, at which point the kernel
> > attempts to clear the TIF_PATCH_PENDING flag; alternatively, it
> > determines whether the flag can be cleared by inspecting the task's
> > stack once the task has yielded the CPU. However, this approach proves
> > problematic in certain environments.
> >
> > Consider a scenario where the majority of system CPUs are configured
> > with nohz_full and isolcpus, each dedicated to a VM with a vCPU pinned
> > to that physical core and configured with idle=poll within the guest.
> > Under such conditions, these vCPUs rarely leave the CPU. Combined with
> > the high core counts typical of modern server platforms, this results
> > in transition completion times that are not only excessively prolonged
> > but also highly unpredictable.
> >
> > This patch resolves the issue by queueing a callback on the target CPU
> > via the stop-machine infrastructure (stop_one_cpu_nowait()). The
> > callback attempts to transition the running task on that CPU. In a VM
> > environment configured with 32 CPUs, the live-patching operation
> > completes promptly after the SIGNALS_TIMEOUT period with this patch
> > applied; without it, the transition barely completes under the same
> > scenario.
> >
> > Co-developed-by: Rui Qi <qirui.001@bytedance.com>
> > Signed-off-by: Rui Qi <qirui.001@bytedance.com>
> > Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
>
> PeterZ, what's your take on this?
>
> I wonder if we could instead do resched_cpu() or something similar to
> trigger the call to klp_sched_try_switch() in __schedule()?
Yeah, this is broken. So the whole point of NOHZ_FULL is to not have the
CPU disturbed, *ever*.
People are working really hard to remove any and all disturbance from
these CPUs with the eventual goal of making any disturbance a fatal
condition (userspace will get a fatal signal if disturbed or so).
Explicitly adding disturbance to NOHZ_FULL is an absolute no-no.
NAK
There are two ways this can be solved:
1) make it a user problem -- userspace wants to load kernel patch,
userspace can force their QEMU or whatnot through a system call to make
progress
2) fix it properly and do it like the deferred IPI stuff; recognise
that as long as the task is in userspace, it doesn't care about kernel
text changes.
https://lkml.kernel.org/r/20251114150133.1056710-1-vschneid@redhat.com
While 2 sounds easy, the tricky part comes from the fact that you have
to deal with the task coming back to kernel space eventually, possibly
in the middle of your KLP patching. So you've got to do things like that
patch series above, and make sure the whole of KLP happens while the
other CPU is in USER/GUEST context or waits for things when it tries to
leave while things are in progress.