* [PATCH 0/3] livepatch: Some improvements
@ 2025-02-11 6:24 Yafang Shao
2025-02-11 6:24 ` [PATCH 1/3] livepatch: Add comment to clarify klp_add_nops() Yafang Shao
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Yafang Shao @ 2025-02-11 6:24 UTC (permalink / raw)
To: jpoimboe, jikos, mbenes, pmladek, joe.lawrence; +Cc: live-patching, Yafang Shao
#1: Clarify klp_add_nops()
#2: Avoid hard lockup in klp transition
#3: Avoid RCU stalls in klp transition
Yafang Shao (3):
livepatch: Add comment to clarify klp_add_nops()
livepatch: Avoid blocking tasklist_lock too long
livepatch: Avoid potential RCU stalls in klp transition
kernel/livepatch/core.c | 3 +++
kernel/livepatch/transition.c | 33 ++++++++++++++++++++++++++++++++-
2 files changed, 35 insertions(+), 1 deletion(-)
--
2.43.5
* [PATCH 1/3] livepatch: Add comment to clarify klp_add_nops()
2025-02-11 6:24 [PATCH 0/3] livepatch: Some improvements Yafang Shao
@ 2025-02-11 6:24 ` Yafang Shao
2025-02-12 12:51 ` Petr Mladek
2025-02-11 6:24 ` [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long Yafang Shao
2025-02-11 6:24 ` [PATCH 3/3] livepatch: Avoid potential RCU stalls in klp transition Yafang Shao
2 siblings, 1 reply; 30+ messages in thread
From: Yafang Shao @ 2025-02-11 6:24 UTC (permalink / raw)
To: jpoimboe, jikos, mbenes, pmladek, joe.lawrence; +Cc: live-patching, Yafang Shao
Add detailed comments to clarify the purpose of the klp_add_nops() function.
The comments are based on Petr's explanation [0].
Link: https://lore.kernel.org/all/Z6XUA7D0eU_YDMVp@pathway.suse.cz/ [0]
Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
---
kernel/livepatch/core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 0cd39954d5a1..5b2a52e7c2f6 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -604,6 +604,9 @@ static int klp_add_object_nops(struct klp_patch *patch,
* Add 'nop' functions which simply return to the caller to run
* the original function. The 'nop' functions are added to a
* patch to facilitate a 'replace' mode.
+ *
+ * The 'nop' entries are added only for functions which are currently
+ * livepatched but are no longer included in the new livepatch.
*/
static int klp_add_nops(struct klp_patch *patch)
{
--
2.43.5
* [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-11 6:24 [PATCH 0/3] livepatch: Some improvements Yafang Shao
2025-02-11 6:24 ` [PATCH 1/3] livepatch: Add comment to clarify klp_add_nops() Yafang Shao
@ 2025-02-11 6:24 ` Yafang Shao
2025-02-12 0:40 ` Josh Poimboeuf
2025-02-13 11:19 ` Find root of the stall: was: " Petr Mladek
2025-02-11 6:24 ` [PATCH 3/3] livepatch: Avoid potential RCU stalls in klp transition Yafang Shao
2 siblings, 2 replies; 30+ messages in thread
From: Yafang Shao @ 2025-02-11 6:24 UTC (permalink / raw)
To: jpoimboe, jikos, mbenes, pmladek, joe.lawrence; +Cc: live-patching, Yafang Shao
I encountered a hard lockup when attempting to reproduce the panic issue that
occurred on our production servers [0]. The hard lockup looks as follows:
[15852778.150191] livepatch: klp_try_switch_task: grpc_executor:421106 is sleeping on function do_exit
[15852778.169471] livepatch: klp_try_switch_task: grpc_executor:421244 is sleeping on function do_exit
[15852778.188746] livepatch: klp_try_switch_task: grpc_executor:421457 is sleeping on function do_exit
[15852778.208021] livepatch: klp_try_switch_task: grpc_executor:422407 is sleeping on function do_exit
[15852778.227292] livepatch: klp_try_switch_task: grpc_executor:423184 is sleeping on function do_exit
[15852778.246576] livepatch: klp_try_switch_task: grpc_executor:423582 is sleeping on function do_exit
[15852778.265863] livepatch: klp_try_switch_task: grpc_executor:423738 is sleeping on function do_exit
[15852778.285149] livepatch: klp_try_switch_task: grpc_executor:423739 is sleeping on function do_exit
[15852778.304446] livepatch: klp_try_switch_task: grpc_executor:423833 is sleeping on function do_exit
[15852778.323738] livepatch: klp_try_switch_task: grpc_executor:423893 is sleeping on function do_exit
[15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894 is sleeping on function do_exit
[15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976 is sleeping on function do_exit
[15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977 is sleeping on function do_exit
[15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610 is sleeping on function do_exit
[15852778.412319] NMI watchdog: Watchdog detected hard LOCKUP on cpu 15
...
[15852778.412374] CPU: 15 PID: 1 Comm: systemd Kdump: loaded Tainted: G S W O K 6.1.52-3
[15852778.412378] RIP: 0010:queued_write_lock_slowpath+0x75/0x135
...
[15852778.412397] Call Trace:
[15852778.412398] <NMI>
[15852778.412400] ? show_regs.cold+0x1a/0x1f
[15852778.412403] ? watchdog_overflow_callback.cold+0x1e/0x70
[15852778.412406] ? __perf_event_overflow+0x102/0x1e0
[15852778.412409] ? perf_event_overflow+0x19/0x20
[15852778.412411] ? x86_pmu_handle_irq+0xf7/0x160
[15852778.412415] ? flush_tlb_one_kernel+0xe/0x30
[15852778.412418] ? __set_pte_vaddr+0x2d/0x40
[15852778.412421] ? set_pte_vaddr_p4d+0x3d/0x50
[15852778.412423] ? set_pte_vaddr+0x6d/0xa0
[15852778.412424] ? __native_set_fixmap+0x28/0x40
[15852778.412426] ? native_set_fixmap+0x54/0x60
[15852778.412428] ? ghes_copy_tofrom_phys+0x75/0x120
[15852778.412431] ? __ghes_peek_estatus.isra.0+0x4e/0xb0
[15852778.412434] ? ghes_in_nmi_queue_one_entry.constprop.0+0x3d/0x240
[15852778.412437] ? amd_pmu_handle_irq+0x48/0xc0
[15852778.412438] ? perf_event_nmi_handler+0x2d/0x50
[15852778.412440] ? nmi_handle+0x60/0x120
[15852778.412443] ? default_do_nmi+0x45/0x120
[15852778.412446] ? exc_nmi+0x118/0x150
[15852778.412447] ? end_repeat_nmi+0x16/0x67
[15852778.412450] ? copy_process+0xf01/0x19f0
[15852778.412452] ? queued_write_lock_slowpath+0x75/0x135
[15852778.412455] ? queued_write_lock_slowpath+0x75/0x135
[15852778.412457] ? queued_write_lock_slowpath+0x75/0x135
[15852778.412459] </NMI>
[15852778.412460] <TASK>
[15852778.412461] _raw_write_lock_irq+0x43/0x50
[15852778.412463] copy_process+0xf01/0x19f0
[15852778.412466] kernel_clone+0x9d/0x3e0
[15852778.412468] ? autofs_dev_ioctl_requester+0x100/0x100
[15852778.412471] __do_sys_clone+0x66/0x90
[15852778.412475] __x64_sys_clone+0x25/0x30
[15852778.412477] do_syscall_64+0x38/0x90
[15852778.412478] entry_SYSCALL_64_after_hwframe+0x64/0xce
[15852778.412481] RIP: 0033:0x7f426bb3b9c1
...
Notably, dynamic_debug was enabled to collect debug information while applying
the livepatch, so a large amount of debug output was generated.
Because klp_try_complete_transition() holds the tasklist_lock while calling
klp_try_switch_task() for every task, any other task trying to acquire the
lock has to spin on it. In the copy_process() path the lock is taken with
write_lock_irq(), i.e. with interrupts disabled, so the spinning CPU can no
longer be interrupted and the NMI watchdog eventually reports the hard lockup.
We can avoid this by checking whether the lock is contended and bailing out of
the loop.
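For context, the write side that ends up spinning looks roughly like this (a
simplified, illustrative sketch of the relevant part of copy_process() in
kernel/fork.c; the surrounding code is omitted):

	/* kernel/fork.c: copy_process(), simplified sketch */
	write_lock_irq(&tasklist_lock);	/* spins with IRQs disabled while the
					 * KLP loop read-holds the lock */
	/* ... link the new task into the task lists ... */
	write_unlock_irq(&tasklist_lock);

With interrupts disabled on the spinning CPU, only the NMI watchdog can still
fire, which is why it is the one reporting the lockup above.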
This change is based on code originally developed by Petr[1].
Link: https://lore.kernel.org/all/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@mail.gmail.com/ [0]
Link: https://lore.kernel.org/all/Z6Tmqro6CSm0h-E3@pathway.suse.cz/ [1]
Originally-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
kernel/livepatch/transition.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index ba069459c101..04704a19dcfe 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -450,6 +450,7 @@ static void klp_send_signals(void)
*/
void klp_try_complete_transition(void)
{
+ unsigned long timeout, proceed_pending_processes;
unsigned int cpu;
struct task_struct *g, *task;
struct klp_patch *patch;
@@ -467,9 +468,30 @@ void klp_try_complete_transition(void)
* unless the patch includes changes to a very common function.
*/
read_lock(&tasklist_lock);
- for_each_process_thread(g, task)
+ timeout = jiffies + HZ;
+ proceed_pending_processes = 0;
+ for_each_process_thread(g, task) {
+ /* check if this task has already switched over */
+ if (task->patch_state == klp_target_state)
+ continue;
+
+ proceed_pending_processes++;
+
if (!klp_try_switch_task(task))
complete = false;
+
+ /*
+ * Prevent hardlockup by not blocking tasklist_lock for too long.
+ * But guarantee the forward progress by making sure at least
+ * some pending processes were checked.
+ */
+ if (rwlock_is_contended(&tasklist_lock) &&
+ time_after(jiffies, timeout) &&
+ proceed_pending_processes > 100) {
+ complete = false;
+ break;
+ }
+ }
read_unlock(&tasklist_lock);
/*
--
2.43.5
* [PATCH 3/3] livepatch: Avoid potential RCU stalls in klp transition
2025-02-11 6:24 [PATCH 0/3] livepatch: Some improvements Yafang Shao
2025-02-11 6:24 ` [PATCH 1/3] livepatch: Add comment to clarify klp_add_nops() Yafang Shao
2025-02-11 6:24 ` [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long Yafang Shao
@ 2025-02-11 6:24 ` Yafang Shao
2025-02-12 0:52 ` Josh Poimboeuf
2 siblings, 1 reply; 30+ messages in thread
From: Yafang Shao @ 2025-02-11 6:24 UTC (permalink / raw)
To: jpoimboe, jikos, mbenes, pmladek, joe.lawrence; +Cc: live-patching, Yafang Shao
During the livepatch loading process, several RCU warnings were triggered on
our production servers, as shown below:
[20329161.705294] livepatch: enabling patch 'livepatch_61_release6'
[20329161.713184] livepatch: 'livepatch_61_release6': starting patching transition
[20329172.998661] rcu_tasks_wait_gp: rcu_tasks grace period 1109713 is 10099 jiffies old.
[20329193.049536] rcu_tasks_wait_gp: rcu_tasks grace period 1109717 is 10059 jiffies old.
[20329213.131403] rcu_tasks_wait_gp: rcu_tasks grace period 1109725 is 10037 jiffies old.
[20329213.934005] livepatch: 'livepatch_61_release6': patching complete
The cause of these warnings was that the KLP transition held the
tasklist_lock (and with it an RCU read-side critical section) for too long.
To resolve this, we should avoid holding the lock for an extended period.
By checking need_resched() and breaking out of the loop, the RCU warnings
no longer appear.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
kernel/livepatch/transition.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 04704a19dcfe..3d1f8d3d0f5a 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -491,9 +491,18 @@ void klp_try_complete_transition(void)
complete = false;
break;
}
+
+ /* Avoid potential RCU stalls */
+ if (need_resched()) {
+ complete = false;
+ break;
+ }
}
read_unlock(&tasklist_lock);
+ /* The above operation might be expensive. */
+ cond_resched();
+
/*
* Ditto for the idle "swapper" tasks.
*/
--
2.43.5
* Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-11 6:24 ` [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long Yafang Shao
@ 2025-02-12 0:40 ` Josh Poimboeuf
2025-02-12 2:34 ` Yafang Shao
2025-02-13 11:19 ` Find root of the stall: was: " Petr Mladek
1 sibling, 1 reply; 30+ messages in thread
From: Josh Poimboeuf @ 2025-02-12 0:40 UTC (permalink / raw)
To: Yafang Shao; +Cc: jikos, mbenes, pmladek, joe.lawrence, live-patching
On Tue, Feb 11, 2025 at 02:24:36PM +0800, Yafang Shao wrote:
> void klp_try_complete_transition(void)
> {
> + unsigned long timeout, proceed_pending_processes;
> unsigned int cpu;
> struct task_struct *g, *task;
> struct klp_patch *patch;
> @@ -467,9 +468,30 @@ void klp_try_complete_transition(void)
> * unless the patch includes changes to a very common function.
> */
> read_lock(&tasklist_lock);
> - for_each_process_thread(g, task)
> + timeout = jiffies + HZ;
> + proceed_pending_processes = 0;
> + for_each_process_thread(g, task) {
> + /* check if this task has already switched over */
> + if (task->patch_state == klp_target_state)
> + continue;
> +
> + proceed_pending_processes++;
> +
> if (!klp_try_switch_task(task))
> complete = false;
> +
> + /*
> + * Prevent hardlockup by not blocking tasklist_lock for too long.
> + * But guarantee the forward progress by making sure at least
> + * some pending processes were checked.
> + */
> + if (rwlock_is_contended(&tasklist_lock) &&
> + time_after(jiffies, timeout) &&
> + proceed_pending_processes > 100) {
> + complete = false;
> + break;
> + }
> + }
> read_unlock(&tasklist_lock);
Instead of all this can we not just use rcu_read_lock() instead of
tasklist_lock?
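Just to illustrate, a minimal sketch of what that loop could look like
(assuming the task-list walk itself stays safe under plain RCU; the fork/exit
races are exactly the concern discussed below):

	rcu_read_lock();
	for_each_process_thread(g, task)
		if (!klp_try_switch_task(task))
			complete = false;
	rcu_read_unlock();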
Petr, I know you mentioned that would widen the race window for the
do_exit() path, but don't we need to fix that race anyway?
--
Josh
* Re: [PATCH 3/3] livepatch: Avoid potential RCU stalls in klp transition
2025-02-11 6:24 ` [PATCH 3/3] livepatch: Avoid potential RCU stalls in klp transition Yafang Shao
@ 2025-02-12 0:52 ` Josh Poimboeuf
2025-02-12 2:42 ` Yafang Shao
0 siblings, 1 reply; 30+ messages in thread
From: Josh Poimboeuf @ 2025-02-12 0:52 UTC (permalink / raw)
To: Yafang Shao; +Cc: jikos, mbenes, pmladek, joe.lawrence, live-patching
On Tue, Feb 11, 2025 at 02:24:37PM +0800, Yafang Shao wrote:
> +++ b/kernel/livepatch/transition.c
> @@ -491,9 +491,18 @@ void klp_try_complete_transition(void)
> complete = false;
> break;
> }
> +
> + /* Avoid potential RCU stalls */
> + if (need_resched()) {
> + complete = false;
> + break;
> + }
> }
> read_unlock(&tasklist_lock);
>
> + /* The above operation might be expensive. */
> + cond_resched();
> +
This is also nasty, yet another reason to use rcu_read_lock() if we can.
Also, with the new lazy preemption model, I believe cond_resched() is
pretty much deprecated.
Also, for future patch sets, please also add lkml to cc. Thanks.
--
Josh
* Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-12 0:40 ` Josh Poimboeuf
@ 2025-02-12 2:34 ` Yafang Shao
2025-02-12 11:54 ` Yafang Shao
0 siblings, 1 reply; 30+ messages in thread
From: Yafang Shao @ 2025-02-12 2:34 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: jikos, mbenes, pmladek, joe.lawrence, live-patching
On Wed, Feb 12, 2025 at 8:40 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Tue, Feb 11, 2025 at 02:24:36PM +0800, Yafang Shao wrote:
> > void klp_try_complete_transition(void)
> > {
> > + unsigned long timeout, proceed_pending_processes;
> > unsigned int cpu;
> > struct task_struct *g, *task;
> > struct klp_patch *patch;
> > @@ -467,9 +468,30 @@ void klp_try_complete_transition(void)
> > * unless the patch includes changes to a very common function.
> > */
> > read_lock(&tasklist_lock);
> > - for_each_process_thread(g, task)
> > + timeout = jiffies + HZ;
> > + proceed_pending_processes = 0;
> > + for_each_process_thread(g, task) {
> > + /* check if this task has already switched over */
> > + if (task->patch_state == klp_target_state)
> > + continue;
> > +
> > + proceed_pending_processes++;
> > +
> > if (!klp_try_switch_task(task))
> > complete = false;
> > +
> > + /*
> > + * Prevent hardlockup by not blocking tasklist_lock for too long.
> > + * But guarantee the forward progress by making sure at least
> > + * some pending processes were checked.
> > + */
> > + if (rwlock_is_contended(&tasklist_lock) &&
> > + time_after(jiffies, timeout) &&
> > + proceed_pending_processes > 100) {
> > + complete = false;
> > + break;
> > + }
> > + }
> > read_unlock(&tasklist_lock);
>
> Instead of all this can we not just use rcu_read_lock() instead of
> tasklist_lock?
I’m concerned that there’s a potential race condition in fork() if we
use RCU, as illustrated below:
CPU0 CPU1
write_lock_irq(&tasklist_lock);
klp_copy_process(p);
parent->patch_state=klp_target_state
list_add_tail_rcu(&p->tasks, &init_task.tasks);
write_unlock_irq(&tasklist_lock);
In this scenario, after the parent executes klp_copy_process(p) to
copy its patch_state to the child, but before adding the child to the
task list, the parent’s patch_state might be updated by the KLP
transition. This could result in the child being left with an outdated
state.
We need to ensure that klp_copy_process() and list_add_tail_rcu() are
treated as a single atomic operation.
--
Regards
Yafang
* Re: [PATCH 3/3] livepatch: Avoid potential RCU stalls in klp transition
2025-02-12 0:52 ` Josh Poimboeuf
@ 2025-02-12 2:42 ` Yafang Shao
2025-02-13 1:58 ` Josh Poimboeuf
0 siblings, 1 reply; 30+ messages in thread
From: Yafang Shao @ 2025-02-12 2:42 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: jikos, mbenes, pmladek, joe.lawrence, live-patching
On Wed, Feb 12, 2025 at 8:52 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Tue, Feb 11, 2025 at 02:24:37PM +0800, Yafang Shao wrote:
> > +++ b/kernel/livepatch/transition.c
> > @@ -491,9 +491,18 @@ void klp_try_complete_transition(void)
> > complete = false;
> > break;
> > }
> > +
> > + /* Avoid potential RCU stalls */
> > + if (need_resched()) {
> > + complete = false;
> > + break;
> > + }
> > }
> > read_unlock(&tasklist_lock);
> >
> > + /* The above operation might be expensive. */
> > + cond_resched();
> > +
>
> This is also nasty, yet another reason to use rcu_read_lock() if we can.
The RCU stalls still happen even if we use rcu_read_lock(), because the loop
is still inside an RCU read-side critical section.
>
> Also, with the new lazy preemption model, I believe cond_resched() is
> pretty much deprecated.
I'm not familiar with the newly introduced PREEMPT_LAZY, but it
appears to be a configuration option. Therefore, we still need this
cond_resched() for users whose kernels don't have PREEMPT_LAZY enabled.
>
> Also, for future patch sets, please also add lkml to cc. Thanks.
Sure.
--
Regards
Yafang
* Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-12 2:34 ` Yafang Shao
@ 2025-02-12 11:54 ` Yafang Shao
2025-02-12 15:42 ` Petr Mladek
2025-02-13 2:47 ` Josh Poimboeuf
0 siblings, 2 replies; 30+ messages in thread
From: Yafang Shao @ 2025-02-12 11:54 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: jikos, mbenes, pmladek, joe.lawrence, live-patching
On Wed, Feb 12, 2025 at 10:34 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Wed, Feb 12, 2025 at 8:40 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > On Tue, Feb 11, 2025 at 02:24:36PM +0800, Yafang Shao wrote:
> > > void klp_try_complete_transition(void)
> > > {
> > > + unsigned long timeout, proceed_pending_processes;
> > > unsigned int cpu;
> > > struct task_struct *g, *task;
> > > struct klp_patch *patch;
> > > @@ -467,9 +468,30 @@ void klp_try_complete_transition(void)
> > > * unless the patch includes changes to a very common function.
> > > */
> > > read_lock(&tasklist_lock);
> > > - for_each_process_thread(g, task)
> > > + timeout = jiffies + HZ;
> > > + proceed_pending_processes = 0;
> > > + for_each_process_thread(g, task) {
> > > + /* check if this task has already switched over */
> > > + if (task->patch_state == klp_target_state)
> > > + continue;
> > > +
> > > + proceed_pending_processes++;
> > > +
> > > if (!klp_try_switch_task(task))
> > > complete = false;
> > > +
> > > + /*
> > > + * Prevent hardlockup by not blocking tasklist_lock for too long.
> > > + * But guarantee the forward progress by making sure at least
> > > + * some pending processes were checked.
> > > + */
> > > + if (rwlock_is_contended(&tasklist_lock) &&
> > > + time_after(jiffies, timeout) &&
> > > + proceed_pending_processes > 100) {
> > > + complete = false;
> > > + break;
> > > + }
> > > + }
> > > read_unlock(&tasklist_lock);
> >
> > Instead of all this can we not just use rcu_read_lock() instead of
> > tasklist_lock?
>
> I’m concerned that there’s a potential race condition in fork() if we
> use RCU, as illustrated below:
>
> CPU0 CPU1
>
> write_lock_irq(&tasklist_lock);
> klp_copy_process(p);
>
> parent->patch_state=klp_target_state
>
> list_add_tail_rcu(&p->tasks, &init_task.tasks);
> write_unlock_irq(&tasklist_lock);
>
> In this scenario, after the parent executes klp_copy_process(p) to
> copy its patch_state to the child, but before adding the child to the
> task list, the parent’s patch_state might be updated by the KLP
> transition. This could result in the child being left with an outdated
> state.
>
> We need to ensure that klp_copy_process() and list_add_tail_rcu() are
> treated as a single atomic operation.
Before the newly forked task is added to the task list, it doesn’t
execute any code and can always be considered safe during the KLP
transition. Therefore, we could replace klp_copy_process() with
klp_init_process(), where we simply set patch_state to
KLP_TRANSITION_IDLE, as shown below:
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2544,7 +2544,9 @@ __latent_entropy struct task_struct *copy_process(
p->exit_signal = args->exit_signal;
}
- klp_copy_process(p);
+ // klp_init_process(p);
+ clear_tsk_thread_flag(p, TIF_PATCH_PENDING);
+ p->patch_state = KLP_TRANSITION_IDLE;
sched_core_fork(p);
Some additional changes may be needed, such as removing
WARN_ON_ONCE(patch_state == KLP_TRANSITION_IDLE) in
klp_ftrace_handler().
This would allow us to safely convert tasklist_lock to rcu_read_lock()
during the KLP transition.
Of course, we also need to address the race condition in do_exit().
--
Regards
Yafang
* Re: [PATCH 1/3] livepatch: Add comment to clarify klp_add_nops()
2025-02-11 6:24 ` [PATCH 1/3] livepatch: Add comment to clarify klp_add_nops() Yafang Shao
@ 2025-02-12 12:51 ` Petr Mladek
2025-02-13 5:49 ` Yafang Shao
0 siblings, 1 reply; 30+ messages in thread
From: Petr Mladek @ 2025-02-12 12:51 UTC (permalink / raw)
To: Yafang Shao; +Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching
On Tue 2025-02-11 14:24:35, Yafang Shao wrote:
> Add detailed comments to clarify the purpose of klp_add_nops() function.
> These comments are based on Petr's explanation[0].
>
> Link: https://lore.kernel.org/all/Z6XUA7D0eU_YDMVp@pathway.suse.cz/ [0]
> Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Petr Mladek <pmladek@suse.com>
> ---
> kernel/livepatch/core.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> index 0cd39954d5a1..5b2a52e7c2f6 100644
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -604,6 +604,9 @@ static int klp_add_object_nops(struct klp_patch *patch,
> * Add 'nop' functions which simply return to the caller to run
> * the original function. The 'nop' functions are added to a
> * patch to facilitate a 'replace' mode.
> + *
> + * The 'nop' entries are added only for functions which are currently
> + * livepatched but are no longer included in the new livepatch.
> */
The new comment makes perfect sense. But I would re-shuffle the text a bit
to make it clearer that it is used only in the 'replace' mode.
Something like:
/*
* Add 'nop' functions which simply return to the caller to run the original
* function.
*
* They are added only when the atomic replace mode is used and only for
* functions which are currently livepatched but are no longer included
* in the new livepatch.
*/
> static int klp_add_nops(struct klp_patch *patch)
> {
Best Regards,
Petr
* Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-12 11:54 ` Yafang Shao
@ 2025-02-12 15:42 ` Petr Mladek
2025-02-13 1:36 ` Josh Poimboeuf
2025-02-13 2:47 ` Josh Poimboeuf
1 sibling, 1 reply; 30+ messages in thread
From: Petr Mladek @ 2025-02-12 15:42 UTC (permalink / raw)
To: Yafang Shao; +Cc: Josh Poimboeuf, jikos, mbenes, joe.lawrence, live-patching
On Wed 2025-02-12 19:54:21, Yafang Shao wrote:
> On Wed, Feb 12, 2025 at 10:34 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Wed, Feb 12, 2025 at 8:40 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > >
> > > On Tue, Feb 11, 2025 at 02:24:36PM +0800, Yafang Shao wrote:
> > > > void klp_try_complete_transition(void)
> > > > {
> > > > + unsigned long timeout, proceed_pending_processes;
> > > > unsigned int cpu;
> > > > struct task_struct *g, *task;
> > > > struct klp_patch *patch;
> > > > @@ -467,9 +468,30 @@ void klp_try_complete_transition(void)
> > > > * unless the patch includes changes to a very common function.
> > > > */
> > > > read_lock(&tasklist_lock);
> > > > - for_each_process_thread(g, task)
> > > > + timeout = jiffies + HZ;
> > > > + proceed_pending_processes = 0;
> > > > + for_each_process_thread(g, task) {
> > > > + /* check if this task has already switched over */
> > > > + if (task->patch_state == klp_target_state)
> > > > + continue;
> > > > +
> > > > + proceed_pending_processes++;
> > > > +
> > > > if (!klp_try_switch_task(task))
> > > > complete = false;
> > > > +
> > > > + /*
> > > > + * Prevent hardlockup by not blocking tasklist_lock for too long.
> > > > + * But guarantee the forward progress by making sure at least
> > > > + * some pending processes were checked.
> > > > + */
> > > > + if (rwlock_is_contended(&tasklist_lock) &&
> > > > + time_after(jiffies, timeout) &&
> > > > + proceed_pending_processes > 100) {
> > > > + complete = false;
> > > > + break;
> > > > + }
> > > > + }
> > > > read_unlock(&tasklist_lock);
> > >
> > > Instead of all this can we not just use rcu_read_lock() instead of
> > > tasklist_lock?
> >
> > I’m concerned that there’s a potential race condition in fork() if we
> > use RCU, as illustrated below:
> >
> > CPU0 CPU1
> >
> > write_lock_irq(&tasklist_lock);
> > klp_copy_process(p);
> >
> > parent->patch_state=klp_target_state
> >
> > list_add_tail_rcu(&p->tasks, &init_task.tasks);
> > write_unlock_irq(&tasklist_lock);
> >
> > In this scenario, after the parent executes klp_copy_process(p) to
> > copy its patch_state to the child, but before adding the child to the
> > task list, the parent’s patch_state might be updated by the KLP
> > transition. This could result in the child being left with an outdated
> > state.
A similar race actually existed even before.
> > We need to ensure that klp_copy_process() and list_add_tail_rcu() are
> > treated as a single atomic operation.
>
> Before the newly forked task is added to the task list, it doesn’t
> execute any code and can always be considered safe during the KLP
> transition. Therefore, we could replace klp_copy_process() with
> klp_init_process(), where we simply set patch_state to
> KLP_TRANSITION_IDLE, as shown below:
This might work when a new process is created, for example, using
execve(). But it would fail when the process is just forked (man 2 fork)
and both parent and child continue running the same code.
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2544,7 +2544,9 @@ __latent_entropy struct task_struct *copy_process(
> p->exit_signal = args->exit_signal;
> }
>
> - klp_copy_process(p);
> + // klp_init_process(p);
> + clear_tsk_thread_flag(p, TIF_PATCH_PENDING);
> + p->patch_state = KLP_TRANSITION_IDLE;
This is exactly the location where we depend on the synchronization
with the tasklist_lock. It allows us to synchronize both the
TIF_PATCH_PENDING flag and p->patch_state between the parent
and child; see the comment in klp_copy_process():
/* Called from copy_process() during fork */
void klp_copy_process(struct task_struct *child)
{
/*
* The parent process may have gone through a KLP transition since
* the thread flag was copied in setup_thread_stack earlier. Bring
* the task flag up to date with the parent here.
*
* The operation is serialized against all klp_*_transition()
* operations by the tasklist_lock. The only exceptions are
* klp_update_patch_state(current) and __klp_sched_try_switch(), but we
* cannot race with them because we are current.
*/
if (test_tsk_thread_flag(current, TIF_PATCH_PENDING))
set_tsk_thread_flag(child, TIF_PATCH_PENDING);
else
clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
child->patch_state = current->patch_state;
}
When using rcu_read_lock() in klp_try_complete_transition(),
child->patch_state might get outdated information. The following
race comes to my mind:
CPU1 CPU1
klp_try_complete_transition()
taskA:
+ fork()
+ klp_copy_process()
child->patch_state = KLP_PATCH_UNPATCHED
klp_try_switch_task(taskA)
// safe
child->patch_state = KLP_PATCH_PATCHED
all processes patched
klp_finish_transition()
list_add_tail_rcu(&p->thread_node,
&p->signal->thread_head);
BANG: The forked task has KLP_PATCH_UNPATCHED so that
klp_ftrace_handler() will redirect it to the old code.
But CPU1 thinks that all tasks are migrated and is going
to finish the transition
My opinion:
I am afraid that using rcu_read_lock() in klp_try_complete_transition()
might cause more harm than good. The code is already complicated,
and this might make it even trickier.
I would first like to understand how exactly the stall happens.
It is possible that even rcu_read_lock() won't help here!
If it takes too long to check the backtraces of all pending
processes, then even rcu_read_lock() might trigger the RCU stall
warning as well.
Best Regards,
Petr
* Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-12 15:42 ` Petr Mladek
@ 2025-02-13 1:36 ` Josh Poimboeuf
2025-02-13 5:53 ` Yafang Shao
2025-02-13 9:48 ` Petr Mladek
0 siblings, 2 replies; 30+ messages in thread
From: Josh Poimboeuf @ 2025-02-13 1:36 UTC (permalink / raw)
To: Petr Mladek; +Cc: Yafang Shao, jikos, mbenes, joe.lawrence, live-patching
On Wed, Feb 12, 2025 at 04:42:39PM +0100, Petr Mladek wrote:
> CPU1 CPU1
>
> klp_try_complete_transition()
>
>
> taskA:
> + fork()
> + klp_copy_process()
> child->patch_state = KLP_PATCH_UNPATCHED
>
> klp_try_switch_task(taskA)
> // safe
>
> child->patch_state = KLP_PATCH_PATCHED
>
> all processes patched
>
> klp_finish_transition()
>
>
> list_add_tail_rcu(&p->thread_node,
> &p->signal->thread_head);
>
>
> BANG: The forked task has KLP_PATCH_UNPATCHED so that
> klp_ftrace_handler() will redirect it to the old code.
>
> But CPU1 thinks that all tasks are migrated and is going
> to finish the transition
Maybe klp_try_complete_transition() could iterate the tasks in two
passes? The first pass would use rcu_read_lock(). Then if all tasks
appear to be patched, try again with tasklist_lock.
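Roughly, and only as a sketch of the idea (the existing bookkeeping around
the loop is omitted):

	/* Pass 1: scan without blocking fork/exit; may race with a fork. */
	complete = true;
	rcu_read_lock();
	for_each_process_thread(g, task)
		if (!klp_try_switch_task(task))
			complete = false;
	rcu_read_unlock();

	/* Pass 2: only if pass 1 looked complete, confirm under tasklist_lock. */
	if (complete) {
		read_lock(&tasklist_lock);
		for_each_process_thread(g, task)
			if (!klp_try_switch_task(task))
				complete = false;
		read_unlock(&tasklist_lock);
	}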
Or, we could do something completely different. There's no need for
klp_copy_process() to copy the parent's state: a newly forked task can
be patched immediately because it has no stack.
So instead, just initialize it to KLP_TRANSITION_IDLE with
TIF_PATCH_PENDING cleared. Then when klp_ftrace_handler() encounters a
KLP_TRANSITION_IDLE task, it considers its state to be 'klp_target_state'.
// called from copy_process()
void klp_init_task(struct task_struct *child)
{
/* klp_ftrace_handler() will transition the task immediately */
child->patch_state = KLP_TRANSITION_IDLE;
clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
}
klp_ftrace_handler():
patch_state = current->patch_state;
if (patch_state == KLP_TRANSITION_IDLE)
patch_state = klp_target_state;
...
Hm?
> I would first like to understand how exactly the stall happens.
> It is possible that even rcu_read_lock() won't help here!
>
> If the it takes too long time to check backtraces of all pending
> processes then even rcu_read_lock() might trigger the RCU stall
> warning as well.
Yeah, based on Yafang's reply it appears there are RCU stalls either
way.
--
Josh
* Re: [PATCH 3/3] livepatch: Avoid potential RCU stalls in klp transition
2025-02-12 2:42 ` Yafang Shao
@ 2025-02-13 1:58 ` Josh Poimboeuf
2025-02-13 5:51 ` Yafang Shao
0 siblings, 1 reply; 30+ messages in thread
From: Josh Poimboeuf @ 2025-02-13 1:58 UTC (permalink / raw)
To: Yafang Shao; +Cc: jikos, mbenes, pmladek, joe.lawrence, live-patching
On Wed, Feb 12, 2025 at 10:42:33AM +0800, Yafang Shao wrote:
> On Wed, Feb 12, 2025 at 8:52 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > On Tue, Feb 11, 2025 at 02:24:37PM +0800, Yafang Shao wrote:
> > > +++ b/kernel/livepatch/transition.c
> > > @@ -491,9 +491,18 @@ void klp_try_complete_transition(void)
> > > complete = false;
> > > break;
> > > }
> > > +
> > > + /* Avoid potential RCU stalls */
> > > + if (need_resched()) {
> > > + complete = false;
> > > + break;
> > > + }
> > > }
> > > read_unlock(&tasklist_lock);
> > >
> > > + /* The above operation might be expensive. */
> > > + cond_resched();
> > > +
> >
> > This is also nasty, yet another reason to use rcu_read_lock() if we can.
>
> The RCU stalls still happen even if we use rcu_read_lock() as it is
> still in the RCU read-side critical section.
>
> >
> > Also, with the new lazy preemption model, I believe cond_resched() is
> > pretty much deprecated.
>
> I'm not familiar with the newly introduced PREEMPT_LAZY, but it
> appears to be a configuration option. Therefore, we still need this
> cond_resched() for users who don't have PREEMPT_LAZY set as the
> default.
IIRC, the goal is to get rid of PREEMPT_NONE and PREEMPT_VOLUNTARY (and
PREEMPT_DYNAMIC) and to remove almost all the cond_resched() calls. So
we should really avoid adding them at this point.
The patch already breaks out of the loop for need_resched(), is the
cond_resched() really needed there? For the PREEMPT_VOLUNTARY case I
think it should already get preempted anyway when it releases the lock.
And regardless, by that point it's fairly close to scheduling the
delayed work and returning back to the user. That could happen even
sooner by skipping the "swapper" task loop.
--
Josh
* Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-12 11:54 ` Yafang Shao
2025-02-12 15:42 ` Petr Mladek
@ 2025-02-13 2:47 ` Josh Poimboeuf
1 sibling, 0 replies; 30+ messages in thread
From: Josh Poimboeuf @ 2025-02-13 2:47 UTC (permalink / raw)
To: Yafang Shao; +Cc: jikos, mbenes, pmladek, joe.lawrence, live-patching
On Wed, Feb 12, 2025 at 07:54:21PM +0800, Yafang Shao wrote:
> Before the newly forked task is added to the task list, it doesn’t
> execute any code and can always be considered safe during the KLP
> transition. Therefore, we could replace klp_copy_process() with
> klp_init_process(), where we simply set patch_state to
> KLP_TRANSITION_IDLE, as shown below:
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2544,7 +2544,9 @@ __latent_entropy struct task_struct *copy_process(
> p->exit_signal = args->exit_signal;
> }
>
> - klp_copy_process(p);
> + // klp_init_process(p);
> + clear_tsk_thread_flag(p, TIF_PATCH_PENDING);
> + p->patch_state = KLP_TRANSITION_IDLE;
>
> sched_core_fork(p);
>
> Some additional changes may be needed, such as removing
> WARN_ON_ONCE(patch_state == KLP_TRANSITION_IDLE) in
> klp_ftrace_handler().
Oops, I managed to miss this email before my reply. Looks like we had a
similar idea :-)
--
Josh
* Re: [PATCH 1/3] livepatch: Add comment to clarify klp_add_nops()
2025-02-12 12:51 ` Petr Mladek
@ 2025-02-13 5:49 ` Yafang Shao
0 siblings, 0 replies; 30+ messages in thread
From: Yafang Shao @ 2025-02-13 5:49 UTC (permalink / raw)
To: Petr Mladek; +Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching
On Wed, Feb 12, 2025 at 8:51 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Tue 2025-02-11 14:24:35, Yafang Shao wrote:
> > Add detailed comments to clarify the purpose of klp_add_nops() function.
> > These comments are based on Petr's explanation[0].
> >
> > Link: https://lore.kernel.org/all/Z6XUA7D0eU_YDMVp@pathway.suse.cz/ [0]
> > Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: Petr Mladek <pmladek@suse.com>
> > ---
> > kernel/livepatch/core.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> > index 0cd39954d5a1..5b2a52e7c2f6 100644
> > --- a/kernel/livepatch/core.c
> > +++ b/kernel/livepatch/core.c
> > @@ -604,6 +604,9 @@ static int klp_add_object_nops(struct klp_patch *patch,
> > * Add 'nop' functions which simply return to the caller to run
> > * the original function. The 'nop' functions are added to a
> > * patch to facilitate a 'replace' mode.
> > + *
> > + * The 'nop' entries are added only for functions which are currently
> > + * livepatched but are no longer included in the new livepatch.
> > */
>
> The new comment makes perfect sense. But I would re-shuffle the text a bit
> to to make it more clear that it is used only in the 'replace' mode.
> Something like:
>
> /*
> * Add 'nop' functions which simply return to the caller to run the original
> * function.
> *
> * They are added only when the atomic replace mode is used and only for
> * functions which are currently livepatched but are no longer included
> * in the new livepatch.
> */
Thanks for your suggestion. I’ll make the change in the next version.
--
Regards
Yafang
* Re: [PATCH 3/3] livepatch: Avoid potential RCU stalls in klp transition
2025-02-13 1:58 ` Josh Poimboeuf
@ 2025-02-13 5:51 ` Yafang Shao
0 siblings, 0 replies; 30+ messages in thread
From: Yafang Shao @ 2025-02-13 5:51 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: jikos, mbenes, pmladek, joe.lawrence, live-patching
On Thu, Feb 13, 2025 at 9:58 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Wed, Feb 12, 2025 at 10:42:33AM +0800, Yafang Shao wrote:
> > On Wed, Feb 12, 2025 at 8:52 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > >
> > > On Tue, Feb 11, 2025 at 02:24:37PM +0800, Yafang Shao wrote:
> > > > +++ b/kernel/livepatch/transition.c
> > > > @@ -491,9 +491,18 @@ void klp_try_complete_transition(void)
> > > > complete = false;
> > > > break;
> > > > }
> > > > +
> > > > + /* Avoid potential RCU stalls */
> > > > + if (need_resched()) {
> > > > + complete = false;
> > > > + break;
> > > > + }
> > > > }
> > > > read_unlock(&tasklist_lock);
> > > >
> > > > + /* The above operation might be expensive. */
> > > > + cond_resched();
> > > > +
> > >
> > > This is also nasty, yet another reason to use rcu_read_lock() if we can.
> >
> > The RCU stalls still happen even if we use rcu_read_lock() as it is
> > still in the RCU read-side critical section.
> >
> > >
> > > Also, with the new lazy preemption model, I believe cond_resched() is
> > > pretty much deprecated.
> >
> > I'm not familiar with the newly introduced PREEMPT_LAZY, but it
> > appears to be a configuration option. Therefore, we still need this
> > cond_resched() for users who don't have PREEMPT_LAZY set as the
> > default.
>
> IIRC, the goal is to get rid of PREEMPT_NONE and PREEMPT_VOLUNTARY (and
> PREEMPT_DYNAMIC) and to remove almost all the cond_resched() calls. So
> we should really avoid adding them at this point.
>
> The patch already breaks out of the loop for need_resched(), is the
> cond_resched() really needed there? For the PREEMPT_VOLUNTARY case I
> think it should already get preempted anyway when it releases the lock.
>
> And regardless, by that point it's fairly close to scheduling the
> delayed work and returning back to the user. That could happen even
> sooner by skipping the "swapper" task loop.
You're correct. I’ve verified that it can avoid the RCU stalls even
without the cond_resched().
--
Regards
Yafang
* Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-13 1:36 ` Josh Poimboeuf
@ 2025-02-13 5:53 ` Yafang Shao
2025-02-13 9:48 ` Petr Mladek
1 sibling, 0 replies; 30+ messages in thread
From: Yafang Shao @ 2025-02-13 5:53 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: Petr Mladek, jikos, mbenes, joe.lawrence, live-patching
On Thu, Feb 13, 2025 at 9:36 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Wed, Feb 12, 2025 at 04:42:39PM +0100, Petr Mladek wrote:
> > CPU1 CPU1
> >
> > klp_try_complete_transition()
> >
> >
> > taskA:
> > + fork()
> > + klp_copy_process()
> > child->patch_state = KLP_PATCH_UNPATCHED
> >
> > klp_try_switch_task(taskA)
> > // safe
> >
> > child->patch_state = KLP_PATCH_PATCHED
> >
> > all processes patched
> >
> > klp_finish_transition()
> >
> >
> > list_add_tail_rcu(&p->thread_node,
> > &p->signal->thread_head);
> >
> >
> > BANG: The forked task has KLP_PATCH_UNPATCHED so that
> > klp_ftrace_handler() will redirect it to the old code.
> >
> > But CPU1 thinks that all tasks are migrated and is going
> > to finish the transition
>
>
> Maybe klp_try_complete_transition() could iterate the tasks in two
> passes? The first pass would use rcu_read_lock(). Then if all tasks
> appear to be patched, try again with tasklist_lock.
This option is much simpler, easier to understand, and more
maintainable. I prefer this approach.
>
> Or, we could do something completely different. There's no need for
> klp_copy_process() to copy the parent's state: a newly forked task can
> be patched immediately because it has no stack.
>
> So instead, just initialize it to KLP_TRANSITION_IDLE with
> TIF_PATCH_PENDING cleared. Then when klp_ftrace_handler() encounters a
> KLP_TRANSITION_IDLE task, it considers its state to be 'klp_target_state'.
>
> // called from copy_process()
> void klp_init_task(struct task_struct *child)
> {
> /* klp_ftrace_handler() will transition the task immediately */
> child->patch_state = KLP_TRANSITION_IDLE;
> clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
> }
>
>
> klp_ftrace_handler():
>
> patch_state = current->patch_state;
>
> if (patch_state == KLP_TRANSITION_IDLE)
> patch_state = klp_target_state;
> ...
>
> Hm?
--
Regards
Yafang
* Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-13 1:36 ` Josh Poimboeuf
2025-02-13 5:53 ` Yafang Shao
@ 2025-02-13 9:48 ` Petr Mladek
2025-02-13 17:32 ` Josh Poimboeuf
1 sibling, 1 reply; 30+ messages in thread
From: Petr Mladek @ 2025-02-13 9:48 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: Yafang Shao, jikos, mbenes, joe.lawrence, live-patching
On Wed 2025-02-12 17:36:03, Josh Poimboeuf wrote:
> On Wed, Feb 12, 2025 at 04:42:39PM +0100, Petr Mladek wrote:
> > CPU1 CPU1
> >
> > klp_try_complete_transition()
> >
> >
> > taskA:
> > + fork()
> > + klp_copy_process()
> > child->patch_state = KLP_PATCH_UNPATCHED
> >
> > klp_try_switch_task(taskA)
> > // safe
> >
> > child->patch_state = KLP_PATCH_PATCHED
> >
> > all processes patched
> >
> > klp_finish_transition()
> >
> >
> > list_add_tail_rcu(&p->thread_node,
> > &p->signal->thread_head);
> >
> >
> > BANG: The forked task has KLP_PATCH_UNPATCHED so that
> > klp_ftrace_handler() will redirect it to the old code.
> >
> > But CPU1 thinks that all tasks are migrated and is going
> > to finish the transition
>
>
> Maybe klp_try_complete_transition() could iterate the tasks in two
> passes? The first pass would use rcu_read_lock(). Then if all tasks
> appear to be patched, try again with tasklist_lock.
>
> Or, we could do something completely different. There's no need for
> klp_copy_process() to copy the parent's state: a newly forked task can
> be patched immediately because it has no stack.
Is this true, please?
If I get it correctly, copy_process() is also used by fork(2), where
the child continues from the fork(2) call. I can't find it in the code,
but I suppose that the child should use a copy of the parent's stack
in this case.
Or am I wrong?
Best Regards,
Petr
* Find root of the stall: was: Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-11 6:24 ` [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long Yafang Shao
2025-02-12 0:40 ` Josh Poimboeuf
@ 2025-02-13 11:19 ` Petr Mladek
2025-02-13 12:32 ` Yafang Shao
1 sibling, 1 reply; 30+ messages in thread
From: Petr Mladek @ 2025-02-13 11:19 UTC (permalink / raw)
To: Yafang Shao; +Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching
On Tue 2025-02-11 14:24:36, Yafang Shao wrote:
> I encountered a hard lockup when attempting to reproduce the panic issue
> occurred on our production servers [0]. The hard lockup is as follows,
>
> [15852778.150191] livepatch: klp_try_switch_task: grpc_executor:421106 is sleeping on function do_exit
> [15852778.169471] livepatch: klp_try_switch_task: grpc_executor:421244 is sleeping on function do_exit
> [15852778.188746] livepatch: klp_try_switch_task: grpc_executor:421457 is sleeping on function do_exit
> [15852778.208021] livepatch: klp_try_switch_task: grpc_executor:422407 is sleeping on function do_exit
> [15852778.227292] livepatch: klp_try_switch_task: grpc_executor:423184 is sleeping on function do_exit
> [15852778.246576] livepatch: klp_try_switch_task: grpc_executor:423582 is sleeping on function do_exit
> [15852778.265863] livepatch: klp_try_switch_task: grpc_executor:423738 is sleeping on function do_exit
> [15852778.285149] livepatch: klp_try_switch_task: grpc_executor:423739 is sleeping on function do_exit
> [15852778.304446] livepatch: klp_try_switch_task: grpc_executor:423833 is sleeping on function do_exit
> [15852778.323738] livepatch: klp_try_switch_task: grpc_executor:423893 is sleeping on function do_exit
> [15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894 is sleeping on function do_exit
> [15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976 is sleeping on function do_exit
> [15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977 is sleeping on function do_exit
> [15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610 is sleeping on function do_exit
This message does not exist in the vanilla kernel. It looks like an extra
debug message. And many extra messages might create stalls on their own.
AFAIK, you reproduced the problem even without these extra messages.
At least, I see the following at
https://lore.kernel.org/r/CALOAHbB8j6RrpJAyRkzPx2U6YhjWEipRspoQQ_7cvQ+M0zgdXg@mail.gmail.com
<paste>
[20329703.332453] livepatch: enabling patch 'livepatch_61_release6'
[20329703.340417] livepatch: 'livepatch_61_release6': starting
patching transition
[20329715.314215] rcu_tasks_wait_gp: rcu_tasks grace period 1109765 is
10166 jiffies old.
[20329737.126207] rcu_tasks_wait_gp: rcu_tasks grace period 1109769 is
10219 jiffies old.
[20329752.018236] rcu_tasks_wait_gp: rcu_tasks grace period 1109773 is
10199 jiffies old.
[20329754.848036] livepatch: 'livepatch_61_release6': patching complete
</paste>
Could you please confirm that this is the original _non-filtered_ log?
I mean that the debug messages were _not_ printed and later filtered out?
I would like to know more about the system where RCU reported the
stall. How many processes are running there on average?
A rough number is enough. I wonder if it is about 1000, 10000, or
50000?
Also what is the CONFIG_HZ value, please?
Also, we should get some statistics on how long klp_try_switch_task()
lasts on average. I have never done it, but I guess that
it should be rather easy with perf. Or maybe just by looking
at the function_graph trace.
I would like to be more sure that klp_try_complete_transition() really
could block RCU for that long. I would like to confirm that
the following is the reality:
num_processes * average_klp_try_switch_task > 10 seconds
If it is true, then we really need to break the cycle after some
timeout. And rcu_read_lock() is _not_ a solution because it would
block RCU the same way.
Does it make sense, please?
Best Regards,
Petr
* Re: Find root of the stall: was: Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-13 11:19 ` Find root of the stall: was: " Petr Mladek
@ 2025-02-13 12:32 ` Yafang Shao
2025-02-13 12:39 ` Yafang Shao
2025-02-14 9:46 ` Petr Mladek
0 siblings, 2 replies; 30+ messages in thread
From: Yafang Shao @ 2025-02-13 12:32 UTC (permalink / raw)
To: Petr Mladek; +Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching
On Thu, Feb 13, 2025 at 7:19 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Tue 2025-02-11 14:24:36, Yafang Shao wrote:
> > I encountered a hard lockup when attempting to reproduce the panic issue
> > occurred on our production servers [0]. The hard lockup is as follows,
> >
> > [15852778.150191] livepatch: klp_try_switch_task: grpc_executor:421106 is sleeping on function do_exit
> > [15852778.169471] livepatch: klp_try_switch_task: grpc_executor:421244 is sleeping on function do_exit
> > [15852778.188746] livepatch: klp_try_switch_task: grpc_executor:421457 is sleeping on function do_exit
> > [15852778.208021] livepatch: klp_try_switch_task: grpc_executor:422407 is sleeping on function do_exit
> > [15852778.227292] livepatch: klp_try_switch_task: grpc_executor:423184 is sleeping on function do_exit
> > [15852778.246576] livepatch: klp_try_switch_task: grpc_executor:423582 is sleeping on function do_exit
> > [15852778.265863] livepatch: klp_try_switch_task: grpc_executor:423738 is sleeping on function do_exit
> > [15852778.285149] livepatch: klp_try_switch_task: grpc_executor:423739 is sleeping on function do_exit
> > [15852778.304446] livepatch: klp_try_switch_task: grpc_executor:423833 is sleeping on function do_exit
> > [15852778.323738] livepatch: klp_try_switch_task: grpc_executor:423893 is sleeping on function do_exit
> > [15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894 is sleeping on function do_exit
> > [15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976 is sleeping on function do_exit
> > [15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977 is sleeping on function do_exit
> > [15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610 is sleeping on function do_exit
>
> This message does not exist in vanilla kernel. It looks like an extra
> debug message. And many extra messages might create stalls on its own.
Right, the dynamic_debug is enabled:
$ echo 'file kernel/* +p' > /sys/kernel/debug/dynamic_debug/control
>
> AFAIK, your reproduced the problem even without these extra messages.
There are two issues during the KLP transition:
1. RCU warnings
2. hard lockup
RCU stalls can be easily reproduced without the extra messages.
However, hard lockups cannot be reproduced unless dynamic debugging is
enabled.
> At least, I see the following at
> https://lore.kernel.org/r/CALOAHbB8j6RrpJAyRkzPx2U6YhjWEipRspoQQ_7cvQ+M0zgdXg@mail.gmail.com
That's correct, this is related to the RCU warnings issue.
>
> <paste>
> [20329703.332453] livepatch: enabling patch 'livepatch_61_release6'
> [20329703.340417] livepatch: 'livepatch_61_release6': starting
> patching transition
> [20329715.314215] rcu_tasks_wait_gp: rcu_tasks grace period 1109765 is
> 10166 jiffies old.
> [20329737.126207] rcu_tasks_wait_gp: rcu_tasks grace period 1109769 is
> 10219 jiffies old.
> [20329752.018236] rcu_tasks_wait_gp: rcu_tasks grace period 1109773 is
> 10199 jiffies old.
> [20329754.848036] livepatch: 'livepatch_61_release6': patching complete
> </paste>
>
> Could you please confirm that this the original _non-filtered_ log?
Right.
> I mean that the debug messages were _not_ printed and later filtered?
Right.
>
> I would like to know more about the system where RCU reported the
> stall. How many processes are running there in average?
> A rough number is enough. I wonder if it is about 1000, 10000, or
> 50000?
Most of the servers have between 5,000 and 10,000 threads.
>
> Also what is the CONFIG_HZ value, please?
CONFIG_HZ_PERIODIC=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000
>
> Also we should get some statistics how long klp_try_switch_task()
> lasts in average. I have never did it but I guess that
> it should be rather easy with perf. Or maybe just by looking
> at function_graph trace.
Currently, on one of my production servers, CPU usage is around 40%,
and the number of threads is approximately 9,100. The livepatch is
applied every 5 seconds. We can adjust the frequency, but the
difference won't be significant, as RCU stalls are easy to reproduce
even at very low frequencies.
The duration of klp_try_switch_task() is as follows:
$ /usr/share/bcc/tools/funclatency klp_try_switch_task.part.0 -d 60
Tracing 1 functions for "klp_try_switch_task.part.0"... Hit Ctrl-C to end.
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 1 | |
1024 -> 2047 : 26665 |*********** |
2048 -> 4095 : 93834 |****************************************|
4096 -> 8191 : 2695 |* |
8192 -> 16383 : 268 | |
16384 -> 32767 : 24 | |
32768 -> 65535 : 2 | |
avg = 2475 nsecs, total: 305745369 nsecs, count: 123489
>
> I would like to be more sure that klp_try_complete_transition() really
> could block RCU for that long. I would like to confirm that
> the following is the reality:
>
> num_processes * average_klp_try_switch_task > 10second
9100 * 2.475 / 1000 is around 22 seconds
>
> If it is true than we really need to break the cycle after some
> timeout. And rcu_read_lock() is _not_ a solution because it would
> block RCU the same way.
>
> Does it make sense, please?
Makes sense.
I hope this clarifies things.
--
Regards
Yafang
* Re: Find root of the stall: was: Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-13 12:32 ` Yafang Shao
@ 2025-02-13 12:39 ` Yafang Shao
2025-02-14 2:44 ` Yafang Shao
2025-02-14 9:46 ` Petr Mladek
1 sibling, 1 reply; 30+ messages in thread
From: Yafang Shao @ 2025-02-13 12:39 UTC (permalink / raw)
To: Petr Mladek; +Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching
On Thu, Feb 13, 2025 at 8:32 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Thu, Feb 13, 2025 at 7:19 PM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Tue 2025-02-11 14:24:36, Yafang Shao wrote:
> > > I encountered a hard lockup when attempting to reproduce the panic issue
> > > occurred on our production servers [0]. The hard lockup is as follows,
> > >
> > > [15852778.150191] livepatch: klp_try_switch_task: grpc_executor:421106 is sleeping on function do_exit
> > > [15852778.169471] livepatch: klp_try_switch_task: grpc_executor:421244 is sleeping on function do_exit
> > > [15852778.188746] livepatch: klp_try_switch_task: grpc_executor:421457 is sleeping on function do_exit
> > > [15852778.208021] livepatch: klp_try_switch_task: grpc_executor:422407 is sleeping on function do_exit
> > > [15852778.227292] livepatch: klp_try_switch_task: grpc_executor:423184 is sleeping on function do_exit
> > > [15852778.246576] livepatch: klp_try_switch_task: grpc_executor:423582 is sleeping on function do_exit
> > > [15852778.265863] livepatch: klp_try_switch_task: grpc_executor:423738 is sleeping on function do_exit
> > > [15852778.285149] livepatch: klp_try_switch_task: grpc_executor:423739 is sleeping on function do_exit
> > > [15852778.304446] livepatch: klp_try_switch_task: grpc_executor:423833 is sleeping on function do_exit
> > > [15852778.323738] livepatch: klp_try_switch_task: grpc_executor:423893 is sleeping on function do_exit
> > > [15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894 is sleeping on function do_exit
> > > [15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976 is sleeping on function do_exit
> > > [15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977 is sleeping on function do_exit
> > > [15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610 is sleeping on function do_exit
> >
> > This message does not exist in vanilla kernel. It looks like an extra
> > debug message. And many extra messages might create stalls on its own.
>
> Right, the dynamic_debug is enabled:
>
> $ echo 'file kernel/* +p' > /sys/kernel/debug/dynamic_debug/control
>
> >
> > AFAIK, your reproduced the problem even without these extra messages.
>
> There are two issues during the KLP transition:
> 1. RCU warnings
> 2. hard lockup
>
> RCU stalls can be easily reproduced without the extra messages.
> However, hard lockups cannot be reproduced unless dynamic debugging is
> enabled.
>
> > At least, I see the following at
> > https://lore.kernel.org/r/CALOAHbB8j6RrpJAyRkzPx2U6YhjWEipRspoQQ_7cvQ+M0zgdXg@mail.gmail.com
>
> That's correct, this is related to the RCU warnings issue.
>
> >
> > <paste>
> > [20329703.332453] livepatch: enabling patch 'livepatch_61_release6'
> > [20329703.340417] livepatch: 'livepatch_61_release6': starting
> > patching transition
> > [20329715.314215] rcu_tasks_wait_gp: rcu_tasks grace period 1109765 is
> > 10166 jiffies old.
> > [20329737.126207] rcu_tasks_wait_gp: rcu_tasks grace period 1109769 is
> > 10219 jiffies old.
> > [20329752.018236] rcu_tasks_wait_gp: rcu_tasks grace period 1109773 is
> > 10199 jiffies old.
> > [20329754.848036] livepatch: 'livepatch_61_release6': patching complete
> > </paste>
> >
> > Could you please confirm that this is the original _non-filtered_ log?
>
> Right.
>
> > I mean that the debug messages were _not_ printed and later filtered?
>
> Right.
>
> >
> > I would like to know more about the system where RCU reported the
> > stall. How many processes are running there on average?
> > A rough number is enough. I wonder if it is about 1000, 10000, or
> > 50000?
>
> Most of the servers have between 5,000 and 10,000 threads.
>
> >
> > Also what is the CONFIG_HZ value, please?
>
> CONFIG_HZ_PERIODIC=y
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
>
> >
> > Also, we should get some statistics on how long klp_try_switch_task()
> > lasts on average. I have never done it, but I guess that
> > it should be rather easy with perf. Or maybe just by looking
> > at a function_graph trace.
>
> Currently, on one of my production servers, CPU usage is around 40%,
> and the number of threads is approximately 9,100. The livepatch is
> applied every 5 seconds. We can adjust the frequency, but the
> difference won't be significant, as RCU stalls are easy to reproduce
> even at very low frequencies.
> The duration of klp_try_switch_task() is as follows:
>
> $ /usr/share/bcc/tools/funclatency klp_try_switch_task.part.0 -d 60
> Tracing 1 functions for "klp_try_switch_task.part.0"... Hit Ctrl-C to end.
> nsecs : count distribution
> 0 -> 1 : 0 | |
> 2 -> 3 : 0 | |
> 4 -> 7 : 0 | |
> 8 -> 15 : 0 | |
> 16 -> 31 : 0 | |
> 32 -> 63 : 0 | |
> 64 -> 127 : 0 | |
> 128 -> 255 : 0 | |
> 256 -> 511 : 0 | |
> 512 -> 1023 : 1 | |
> 1024 -> 2047 : 26665 |*********** |
> 2048 -> 4095 : 93834 |****************************************|
> 4096 -> 8191 : 2695 |* |
> 8192 -> 16383 : 268 | |
> 16384 -> 32767 : 24 | |
> 32768 -> 65535 : 2 | |
>
> avg = 2475 nsecs, total: 305745369 nsecs, count: 123489
>
> >
> > I would like to be more sure that klp_try_complete_transition() really
> > could block RCU for that long. I would like to confirm that
> > the following is the reality:
> >
> > num_processes * average_klp_try_switch_task > 10second
>
> 9100 * 2.475 / 1000 is around 22 seconds
It looks like I misread the nsec as usec, so the total average
duration is actually around 22ms. ;)
I’ll double-check it to confirm.
>
> >
> > If it is true then we really need to break the cycle after some
> > timeout. And rcu_read_lock() is _not_ a solution because it would
> > block RCU the same way.
> >
> > Does it make sense, please?
>
> Makes sense.
> I hope this clarifies things.
>
> --
> Regards
> Yafang
--
Regards
Yafang
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-13 9:48 ` Petr Mladek
@ 2025-02-13 17:32 ` Josh Poimboeuf
2025-02-14 14:44 ` Petr Mladek
0 siblings, 1 reply; 30+ messages in thread
From: Josh Poimboeuf @ 2025-02-13 17:32 UTC (permalink / raw)
To: Petr Mladek; +Cc: Yafang Shao, jikos, mbenes, joe.lawrence, live-patching
On Thu, Feb 13, 2025 at 10:48:27AM +0100, Petr Mladek wrote:
> On Wed 2025-02-12 17:36:03, Josh Poimboeuf wrote:
> > Or, we could do something completely different. There's no need for
> > klp_copy_process() to copy the parent's state: a newly forked task can
> > be patched immediately because it has no stack.
>
> Is this true, please?
>
> If I get it correctly then copy_process() is used also by fork(2) where
> the child continues from fork(2) call. I can't find it in the code
> but I suppose that the child should use a copy of the parent's stack
> in this case.
The child's *user* stack is a copy, but the kernel stack is empty.
On x86, before adding it to the task list, copy_process() ->
copy_thread() sets the child's kernel stack pointer to empty (pointing
to 'struct inactive_task_frame' adjacent to user pt_regs) and sets the
saved instruction pointer (frame->ret_addr) to 'ret_from_fork_asm'.
Then later when the child first gets scheduled, __switch_to_asm()
switches to the new stack and pops most of the inactive_task_frame,
except for the 'ret_from_fork_asm' return address which remains on the top
of the stack. Then it jumps to __switch_to() which then "returns" to
ret_from_fork_asm().
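
To make that concrete, here is a minimal sketch of what klp_copy_process()
could do under this idea (purely illustrative and untested; the current
upstream code copies the parent's state instead, and this assumes
klp_target_state and klp_transition_patch stay stable for the whole
transition):

	/* kernel/livepatch/transition.c -- hypothetical variant */
	void klp_copy_process(struct task_struct *child)
	{
		/*
		 * A freshly forked task has an empty kernel stack, so it
		 * can be considered migrated right away: take the target
		 * state and clear the pending flag copied from the parent
		 * earlier during fork.
		 */
		if (klp_transition_patch)
			child->patch_state = klp_target_state;
		else
			child->patch_state = current->patch_state;

		clear_tsk_thread_flag(child, TIF_PATCH_PENDING);
	}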
--
Josh
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Find root of the stall: was: Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-13 12:39 ` Yafang Shao
@ 2025-02-14 2:44 ` Yafang Shao
2025-02-14 8:36 ` Josh Poimboeuf
0 siblings, 1 reply; 30+ messages in thread
From: Yafang Shao @ 2025-02-14 2:44 UTC (permalink / raw)
To: Petr Mladek; +Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching
On Thu, Feb 13, 2025 at 8:39 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Thu, Feb 13, 2025 at 8:32 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Thu, Feb 13, 2025 at 7:19 PM Petr Mladek <pmladek@suse.com> wrote:
> > >
> > > On Tue 2025-02-11 14:24:36, Yafang Shao wrote:
> > > > I encountered a hard lockup when attempting to reproduce the panic issue
> > > > that occurred on our production servers [0]. The hard lockup is as follows,
> > > >
> > > > [15852778.150191] livepatch: klp_try_switch_task: grpc_executor:421106 is sleeping on function do_exit
> > > > [15852778.169471] livepatch: klp_try_switch_task: grpc_executor:421244 is sleeping on function do_exit
> > > > [15852778.188746] livepatch: klp_try_switch_task: grpc_executor:421457 is sleeping on function do_exit
> > > > [15852778.208021] livepatch: klp_try_switch_task: grpc_executor:422407 is sleeping on function do_exit
> > > > [15852778.227292] livepatch: klp_try_switch_task: grpc_executor:423184 is sleeping on function do_exit
> > > > [15852778.246576] livepatch: klp_try_switch_task: grpc_executor:423582 is sleeping on function do_exit
> > > > [15852778.265863] livepatch: klp_try_switch_task: grpc_executor:423738 is sleeping on function do_exit
> > > > [15852778.285149] livepatch: klp_try_switch_task: grpc_executor:423739 is sleeping on function do_exit
> > > > [15852778.304446] livepatch: klp_try_switch_task: grpc_executor:423833 is sleeping on function do_exit
> > > > [15852778.323738] livepatch: klp_try_switch_task: grpc_executor:423893 is sleeping on function do_exit
> > > > [15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894 is sleeping on function do_exit
> > > > [15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976 is sleeping on function do_exit
> > > > [15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977 is sleeping on function do_exit
> > > > [15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610 is sleeping on function do_exit
> > >
> > > This message does not exist in the vanilla kernel. It looks like an extra
> > > debug message. And many extra messages might create stalls on their own.
> >
> > Right, the dynamic_debug is enabled:
> >
> > $ echo 'file kernel/* +p' > /sys/kernel/debug/dynamic_debug/control
> >
> > >
> > > AFAIK, you reproduced the problem even without these extra messages.
> >
> > There are two issues during the KLP transition:
> > 1. RCU warnings
> > 2. hard lockup
> >
> > RCU stalls can be easily reproduced without the extra messages.
> > However, hard lockups cannot be reproduced unless dynamic debugging is
> > enabled.
> >
> > > At least, I see the following at
> > > https://lore.kernel.org/r/CALOAHbB8j6RrpJAyRkzPx2U6YhjWEipRspoQQ_7cvQ+M0zgdXg@mail.gmail.com
> >
> > That's correct, this is related to the RCU warnings issue.
> >
> > >
> > > <paste>
> > > [20329703.332453] livepatch: enabling patch 'livepatch_61_release6'
> > > [20329703.340417] livepatch: 'livepatch_61_release6': starting
> > > patching transition
> > > [20329715.314215] rcu_tasks_wait_gp: rcu_tasks grace period 1109765 is
> > > 10166 jiffies old.
> > > [20329737.126207] rcu_tasks_wait_gp: rcu_tasks grace period 1109769 is
> > > 10219 jiffies old.
> > > [20329752.018236] rcu_tasks_wait_gp: rcu_tasks grace period 1109773 is
> > > 10199 jiffies old.
> > > [20329754.848036] livepatch: 'livepatch_61_release6': patching complete
> > > </paste>
> > >
> > > Could you please confirm that this is the original _non-filtered_ log?
> >
> > Right.
> >
> > > I mean that the debug messages were _not_ printed and later filtered?
> >
> > Right.
> >
> > >
> > > I would like to know more about the system where RCU reported the
> > > stall. How many processes are running there on average?
> > > A rough number is enough. I wonder if it is about 1000, 10000, or
> > > 50000?
> >
> > Most of the servers have between 5,000 and 10,000 threads.
> >
> > >
> > > Also what is the CONFIG_HZ value, please?
> >
> > CONFIG_HZ_PERIODIC=y
> > CONFIG_HZ_1000=y
> > CONFIG_HZ=1000
> >
> > >
> > > Also, we should get some statistics on how long klp_try_switch_task()
> > > lasts on average. I have never done it, but I guess that
> > > it should be rather easy with perf. Or maybe just by looking
> > > at a function_graph trace.
> >
> > Currently, on one of my production servers, CPU usage is around 40%,
> > and the number of threads is approximately 9,100. The livepatch is
> > applied every 5 seconds. We can adjust the frequency, but the
> > difference won't be significant, as RCU stalls are easy to reproduce
> > even at very low frequencies.
> > The duration of klp_try_switch_task() is as follows:
> >
> > $ /usr/share/bcc/tools/funclatency klp_try_switch_task.part.0 -d 60
> > Tracing 1 functions for "klp_try_switch_task.part.0"... Hit Ctrl-C to end.
> > nsecs : count distribution
> > 0 -> 1 : 0 | |
> > 2 -> 3 : 0 | |
> > 4 -> 7 : 0 | |
> > 8 -> 15 : 0 | |
> > 16 -> 31 : 0 | |
> > 32 -> 63 : 0 | |
> > 64 -> 127 : 0 | |
> > 128 -> 255 : 0 | |
> > 256 -> 511 : 0 | |
> > 512 -> 1023 : 1 | |
> > 1024 -> 2047 : 26665 |*********** |
> > 2048 -> 4095 : 93834 |****************************************|
> > 4096 -> 8191 : 2695 |* |
> > 8192 -> 16383 : 268 | |
> > 16384 -> 32767 : 24 | |
> > 32768 -> 65535 : 2 | |
> >
> > avg = 2475 nsecs, total: 305745369 nsecs, count: 123489
> >
> > >
> > > I would like to be more sure that klp_try_complete_transition() really
> > > could block RCU for that long. I would like to confirm that
> > > the following is the reality:
> > >
> > > num_processes * average_klp_try_switch_task > 10second
> >
> > 9100 * 2.475 / 1000 is around 22 seconds
>
> It looks like I misread the nsec as usec, so the total average
> duration is actually around 22ms. ;)
> I’ll double-check it to confirm.
I've confirmed that during RCU stalls, the average_klp_try_switch_task
value isn't unusually large. However, the real issue lies in the
extended duration of klp_try_complete_transition() itself.
- The RCU stall
[Fri Feb 14 10:14:56 2025] livepatch: enabling patch 'livepatch_61_release6'
[Fri Feb 14 10:14:56 2025] livepatch: 'livepatch_61_release6':
starting patching transition
[Fri Feb 14 10:15:10 2025] rcu_tasks_wait_gp: rcu_tasks grace period
32001 is 10073 jiffies old.
[Fri Feb 14 10:15:14 2025] livepatch: 'livepatch_61_release6': patching complete
- klp_try_switch_task
10:17:50
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 2 | |
256 -> 511 : 5 | |
512 -> 1023 : 3 | |
1024 -> 2047 : 806 |* |
2048 -> 4095 : 30382 |****************************************|
4096 -> 8191 : 10806 |************** |
8192 -> 16383 : 65 | |
16384 -> 32767 : 9 | |
avg = 3562 nsecs, total: 149912461 nsecs, count: 42078
The avg of klp_try_switch_task is only 3.5us.
- number of threads
Fri Feb 14 10:14:48
12106
Fri Feb 14 10:14:59
11900
Fri Feb 14 10:15:10
12004
Fri Feb 14 10:15:20
11899
Fri Feb 14 10:15:31
12293
Fri Feb 14 10:15:42
11831
At that point, the system has around 12,000 threads, so the total
duration comes out to approximately 12,000 * 3.5µs, or about 42ms—well
below the 10s threshold.
However, the duration of klp_try_complete_transition() exceeds 10 seconds.
10:15:41
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 0 | |
1024 -> 2047 : 0 | |
2048 -> 4095 : 0 | |
4096 -> 8191 : 0 | |
8192 -> 16383 : 0 | |
16384 -> 32767 : 0 | |
32768 -> 65535 : 0 | |
65536 -> 131071 : 0 | |
131072 -> 262143 : 0 | |
262144 -> 524287 : 0 | |
524288 -> 1048575 : 0 | |
1048576 -> 2097151 : 0 | |
2097152 -> 4194303 : 0 | |
4194304 -> 8388607 : 0 | |
8388608 -> 16777215 : 0 | |
16777216 -> 33554431 : 0 | |
33554432 -> 67108863 : 1 |********************|
67108864 -> 134217727 : 1 |********************|
134217728 -> 268435455 : 0 | |
268435456 -> 536870911 : 0 | |
536870912 -> 1073741823 : 0 | |
1073741824 -> 2147483647 : 0 | |
2147483648 -> 4294967295 : 0 | |
4294967296 -> 8589934591 : 1 |********************|
8589934592 -> 17179869183 : 1 |********************|
avg = 5630258522 nsecs, total: 22521034091 nsecs, count: 4
The longest duration of klp_try_complete_transition() ranges from 8.5
to 17.2 seconds.
It appears that the RCU stall is not only driven by num_processes *
average_klp_try_switch_task, but also by contention within
klp_try_complete_transition(), particularly around the tasklist_lock.
Interestingly, even after replacing "read_lock(&tasklist_lock)" with
"rcu_read_lock()", the RCU stall persists. My verification shows that
the only way to prevent the stall is by checking need_resched() during
each iteration of the loop.
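
Something along these lines, just to illustrate the idea (an untested sketch
against the loop in klp_try_complete_transition(), not a polished patch):

	read_lock(&tasklist_lock);
	for_each_process_thread(g, task) {
		if (!klp_try_switch_task(task))
			complete = false;

		/* Don't hog the CPU; let klp_transition_work retry later. */
		if (need_resched()) {
			complete = false;
			break;
		}
	}
	read_unlock(&tasklist_lock);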
--
Regards
Yafang
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Find root of the stall: was: Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-14 2:44 ` Yafang Shao
@ 2025-02-14 8:36 ` Josh Poimboeuf
2025-02-14 11:37 ` Petr Mladek
0 siblings, 1 reply; 30+ messages in thread
From: Josh Poimboeuf @ 2025-02-14 8:36 UTC (permalink / raw)
To: Yafang Shao; +Cc: Petr Mladek, jikos, mbenes, joe.lawrence, live-patching
On Fri, Feb 14, 2025 at 10:44:59AM +0800, Yafang Shao wrote:
> The longest duration of klp_try_complete_transition() ranges from 8.5
> to 17.2 seconds.
>
> It appears that the RCU stall is not only driven by num_processes *
> average_klp_try_switch_task, but also by contention within
> klp_try_complete_transition(), particularly around the tasklist_lock.
> Interestingly, even after replacing "read_lock(&tasklist_lock)" with
> "rcu_read_lock()", the RCU stall persists. My verification shows that
> the only way to prevent the stall is by checking need_resched() during
> each iteration of the loop.
I'm confused... rcu_read_lock() shouldn't cause any contention, right?
So if klp_try_switch_task() isn't the problem, then what is?
I wonder if those function timings might be misleading. If
klp_try_complete_transition() gets preempted immediately when it
releases the lock, it could take a while before it eventually returns.
So that funclatency might not be telling the whole story.
Though 8.5 - 17.2 seconds is a bit excessive...
--
Josh
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Find root of the stall: was: Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-13 12:32 ` Yafang Shao
2025-02-13 12:39 ` Yafang Shao
@ 2025-02-14 9:46 ` Petr Mladek
1 sibling, 0 replies; 30+ messages in thread
From: Petr Mladek @ 2025-02-14 9:46 UTC (permalink / raw)
To: Yafang Shao; +Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching
On Thu 2025-02-13 20:32:19, Yafang Shao wrote:
> On Thu, Feb 13, 2025 at 7:19 PM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Tue 2025-02-11 14:24:36, Yafang Shao wrote:
> > > I encountered a hard lockup when attempting to reproduce the panic issue
> > > that occurred on our production servers [0]. The hard lockup is as follows,
> > >
> > > [15852778.150191] livepatch: klp_try_switch_task: grpc_executor:421106 is sleeping on function do_exit
> > > [15852778.169471] livepatch: klp_try_switch_task: grpc_executor:421244 is sleeping on function do_exit
> > > [15852778.188746] livepatch: klp_try_switch_task: grpc_executor:421457 is sleeping on function do_exit
> > > [15852778.208021] livepatch: klp_try_switch_task: grpc_executor:422407 is sleeping on function do_exit
> > > [15852778.227292] livepatch: klp_try_switch_task: grpc_executor:423184 is sleeping on function do_exit
> > > [15852778.246576] livepatch: klp_try_switch_task: grpc_executor:423582 is sleeping on function do_exit
> > > [15852778.265863] livepatch: klp_try_switch_task: grpc_executor:423738 is sleeping on function do_exit
> > > [15852778.285149] livepatch: klp_try_switch_task: grpc_executor:423739 is sleeping on function do_exit
> > > [15852778.304446] livepatch: klp_try_switch_task: grpc_executor:423833 is sleeping on function do_exit
> > > [15852778.323738] livepatch: klp_try_switch_task: grpc_executor:423893 is sleeping on function do_exit
> > > [15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894 is sleeping on function do_exit
> > > [15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976 is sleeping on function do_exit
> > > [15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977 is sleeping on function do_exit
> > > [15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610 is sleeping on function do_exit
> >
> > This message does not exist in the vanilla kernel. It looks like an extra
> > debug message. And many extra messages might create stalls on their own.
>
> Right, the dynamic_debug is enabled:
>
> $ echo 'file kernel/* +p' > /sys/kernel/debug/dynamic_debug/control
>
> >
> > AFAIK, you reproduced the problem even without these extra messages.
>
> There are two issues during the KLP transition:
> 1. RCU warnings
> 2. hard lockup
>
> RCU stalls can be easily reproduced without the extra messages.
> However, hard lockups cannot be reproduced unless dynamic debugging is
> enabled.
OK, I would ignore the hard lockup for now. I believe that it is
related to flushing the debug messages on the console, and a solution
for the RCU stall would likely solve this as well. Also, the debug
messages are not enabled on production systems...
We should debug the RCU warning without these debug messages!
Best Regards,
Petr
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Find root of the stall: was: Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-14 8:36 ` Josh Poimboeuf
@ 2025-02-14 11:37 ` Petr Mladek
2025-02-18 2:19 ` Yafang Shao
0 siblings, 1 reply; 30+ messages in thread
From: Petr Mladek @ 2025-02-14 11:37 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: Yafang Shao, jikos, mbenes, joe.lawrence, live-patching
On Fri 2025-02-14 00:36:03, Josh Poimboeuf wrote:
> On Fri, Feb 14, 2025 at 10:44:59AM +0800, Yafang Shao wrote:
> > The longest duration of klp_try_complete_transition() ranges from 8.5
> > to 17.2 seconds.
> >
> > It appears that the RCU stall is not only driven by num_processes *
> > average_klp_try_switch_task, but also by contention within
> > klp_try_complete_transition(), particularly around the tasklist_lock.
> > Interestingly, even after replacing "read_lock(&tasklist_lock)" with
> > "rcu_read_lock()", the RCU stall persists. My verification shows that
> > the only way to prevent the stall is by checking need_resched() during
> > each iteration of the loop.
>
> I'm confused... rcu_read_lock() shouldn't cause any contention, right?
> So if klp_try_switch_task() isn't the problem, then what is?
I agree that it does not make much sense.
> I wonder if those function timings might be misleading. If
> klp_try_complete_transition() gets preempted immediately when it
> releases the lock, it could take a while before it eventually returns.
> So that funclatency might not be telling the whole story.
The scheduling might be an explanation.
> Though 8.5 - 17.2 seconds is a bit excessive...
If klp_try_complete_transition() was scheduled out and we see this delay,
then the system likely had a pretty high load at that moment.
Is that possible?
Yafang, just to be sure. Have you seen these numbers with
the original klp_try_complete_transition() code and with debug
messages disabled?
Or did you see them with some extra debugging code or other
modifications?
Also just to be sure. Is this on bare metal?
Finally, what preemption mode are you using? Which CONFIG_PREEMPT*?
Best regards,
Petr
PS: JFYI, I have vacation the following week and won't have
access to mails...
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-13 17:32 ` Josh Poimboeuf
@ 2025-02-14 14:44 ` Petr Mladek
2025-02-14 18:12 ` Josh Poimboeuf
0 siblings, 1 reply; 30+ messages in thread
From: Petr Mladek @ 2025-02-14 14:44 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: Yafang Shao, jikos, mbenes, joe.lawrence, live-patching
On Thu 2025-02-13 09:32:53, Josh Poimboeuf wrote:
> On Thu, Feb 13, 2025 at 10:48:27AM +0100, Petr Mladek wrote:
> > On Wed 2025-02-12 17:36:03, Josh Poimboeuf wrote:
> > > Or, we could do something completely different. There's no need for
> > > klp_copy_process() to copy the parent's state: a newly forked task can
> > > be patched immediately because it has no stack.
> >
> > Is this true, please?
> >
> > If I get it correctly then copy_process() is used also by fork(2) where
> > the child continues from fork(2) call. I can't find it in the code
> > but I suppose that the child should use a copy of the parent's stack
> > in this case.
>
> The child's *user* stack is a copy, but the kernel stack is empty.
>
> On x86, before adding it to the task list, copy_process() ->
> copy_thread() sets the child's kernel stack pointer to empty (pointing
> to 'struct inactive_task_frame' adjacent to user pt_regs) and sets the
> saved instruction pointer (frame->ret_addr) to 'ret_from_fork_asm'.
>
> Then later when the child first gets scheduled, __switch_to_asm()
> switches to the new stack and pops most of the inactive_task_frame,
> except for the 'ret_from_fork_asm' return address which remains on the top
> of the stack. Then it jumps to __switch_to() which then "returns" to
> ret_from_fork_asm().
Right. Only the *user* stack is a copy.
I guess that we really could consider the new task as migrated
and clear TIF_PATCH_PENDING.
But we can't set child->patch_state to KLP_TRANSITION_IDLE. It won't
work when the transition gets reverted. [*]
Best Regards,
Petr
[*] I gave this a few brain cycles but I did not find any elegant
way to set this safely and still allow using rcu_read_lock()
in klp_try_complete_transition().
It might be because it is Friday evening and I am leaving for
a trip tomorrow. Also, I am not motivated enough to think about it
because Yafang saw the RCU stall even with that rcu_read_lock().
So I am sending this just for the record.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-14 14:44 ` Petr Mladek
@ 2025-02-14 18:12 ` Josh Poimboeuf
2025-02-18 2:37 ` Yafang Shao
0 siblings, 1 reply; 30+ messages in thread
From: Josh Poimboeuf @ 2025-02-14 18:12 UTC (permalink / raw)
To: Petr Mladek; +Cc: Yafang Shao, jikos, mbenes, joe.lawrence, live-patching
On Fri, Feb 14, 2025 at 03:44:10PM +0100, Petr Mladek wrote:
> I guess that we really could consider the new task as migrated
> and clear TIF_PATCH_PENDING.
>
> But we can't set child->patch_state to KLP_TRANSITION_IDLE. It won't
> work when the transition gets reverted. [*]
Hm, why not? I don't see any difference between patching and unpatching?
klp_init_transition() has a barrier to enforce the order of the
klp_target_state and func->transition writes, as read by the
klp_ftrace_handler().
So in the ftrace handler, if func->transition is set and the task is
KLP_TRANSITION_IDLE, it can use klp_target_state to decide which
function to use. Its patch state would effectively be the same as any
other already-transitioned task, whether it's patching or unpatching.
Then in klp_complete_transition(), after func->transition gets set to
false, klp_synchronize_transition() flushes out any running ftrace
handlers. From that point on, func->transition is false, so the ftrace
handler would no longer read klp_target_state.
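
In other words, the transition branch in klp_ftrace_handler() might look
something like this (only a sketch: the KLP_TRANSITION_IDLE branch is
hypothetical and would require making klp_target_state visible to the
handler, and the existing empty-stack/unlock handling after the fallback
is omitted):

	if (unlikely(func->transition)) {
		int patch_state = current->patch_state;

		if (patch_state == KLP_TRANSITION_IDLE)
			/* freshly forked task: follow the global target state */
			patch_state = klp_target_state;

		if (patch_state == KLP_TRANSITION_UNPATCHED)
			/* use the previously patched version of the function,
			 * as the existing handler already does */
			func = list_entry_rcu(func->stack_node.next,
					      struct klp_func, stack_node);
	}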
> [*] I gave this a few brain cycles but I did not find any elegant
> way to set this safely and still allow using rcu_read_lock()
> in klp_try_complete_transition().
>
> It might be because it is Friday evening and I am leaving for
> a trip tomorrow. Also, I am not motivated enough to think about it
> because Yafang saw the RCU stall even with that rcu_read_lock().
> So I am sending this just for the record.
Even if it doesn't fix the RCU stalls, I think we should still try to
avoid holding the tasklist_lock. It's a global lock which can be
contended, and we want the livepatch transition to be as unobtrusive as
possible.
If the system is doing a lot of forking across many CPUs, holding the
lock could block all the forking tasks and trigger system-wide
scheduling latencies. And that could be compounded by the unnecessary
transitioning of new tasks every time the delayed work runs.
--
Josh
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Find root of the stall: was: Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-14 11:37 ` Petr Mladek
@ 2025-02-18 2:19 ` Yafang Shao
0 siblings, 0 replies; 30+ messages in thread
From: Yafang Shao @ 2025-02-18 2:19 UTC (permalink / raw)
To: Petr Mladek; +Cc: Josh Poimboeuf, jikos, mbenes, joe.lawrence, live-patching
On Fri, Feb 14, 2025 at 7:37 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Fri 2025-02-14 00:36:03, Josh Poimboeuf wrote:
> > On Fri, Feb 14, 2025 at 10:44:59AM +0800, Yafang Shao wrote:
> > > The longest duration of klp_try_complete_transition() ranges from 8.5
> > > to 17.2 seconds.
> > >
> > > It appears that the RCU stall is not only driven by num_processes *
> > > average_klp_try_switch_task, but also by contention within
> > > klp_try_complete_transition(), particularly around the tasklist_lock.
> > > Interestingly, even after replacing "read_lock(&tasklist_lock)" with
> > > "rcu_read_lock()", the RCU stall persists. My verification shows that
> > > the only way to prevent the stall is by checking need_resched() during
> > > each iteration of the loop.
> >
> > I'm confused... rcu_read_lock() shouldn't cause any contention, right?
> > So if klp_try_switch_task() isn't the problem, then what is?
>
> I agree that it does not make much sense.
I'm confused too and trying to understand it better.
>
> > I wonder if those function timings might be misleading. If
> > klp_try_complete_transition() gets preempted immediately when it
> > releases the lock, it could take a while before it eventually returns.
> > So that funclatency might not be telling the whole story.
>
> The scheduling might be an explanation.
>
> > Though 8.5 - 17.2 seconds is a bit excessive...
>
> If klp_try_complete_transition() was scheduled out and we see this delay,
> then the system likely had a pretty high load at that moment.
> Is that possible?
It appears to be workload-related. The RCU warning occurred at
specific time periods, likely due to certain workloads running at
those times, though I haven't confirmed it yet.
>
> Yafang, just to be sure. Have you seen these numbers with
> the original klp_try_complete_transition() code and with debug
> messages disabled?
Right. These RCU warnings appeared on our production servers without
any debugging enabled, and klp_try_complete_transition() hasn't
changed either.
>
> Or did you see them with some extra debugging code or other
> modifications?
No, these are the default production settings as they originally were.
>
> Also just to be sure. Is this on bare metal?
Yes.
>
> Finally, what preemption mode are you using? Which CONFIG_PREEMPT*?
The preemption configuration is as follows:
CONFIG_PREEMPT_BUILD=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_DEBUG_PREEMPT is not set
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
> PS: JFYI, I have vacation the following week and won't have
> access to mails...
Enjoy your holiday
--
Regards
Yafang
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long
2025-02-14 18:12 ` Josh Poimboeuf
@ 2025-02-18 2:37 ` Yafang Shao
0 siblings, 0 replies; 30+ messages in thread
From: Yafang Shao @ 2025-02-18 2:37 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: Petr Mladek, jikos, mbenes, joe.lawrence, live-patching
On Sat, Feb 15, 2025 at 2:12 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Fri, Feb 14, 2025 at 03:44:10PM +0100, Petr Mladek wrote:
> > I guess that we really could consider the new task as migrated
> > and clear TIF_PATCH_PENDING.
> >
> > But we can't set child->patch_state to KLP_TRANSITION_IDLE. It won't
> > work when the transition gets reverted. [*]
>
> Hm, why not? I don't see any difference between patching and unpatching?
>
> klp_init_transition() has a barrier to enforce the order of the
> klp_target_state and func->transition writes, as read by the
> klp_ftrace_handler().
>
> So in the ftrace handler, if func->transition is set and the task is
> KLP_TRANSITION_IDLE, it can use klp_target_state to decide which
> function to use. Its patch state would effectively be the same as any
> other already-transitioned task, whether it's patching or unpatching.
>
> Then in klp_complete_transition(), after func->transition gets set to
> false, klp_synchronize_transition() flushes out any running ftrace
> handlers. From that point on, func->transition is false, so the ftrace
> handler would no longer read klp_target_state.
>
> > [*] I gave this a few brain cycles but I did not find any elegant
> > way to set this safely and still allow using rcu_read_lock()
> > in klp_try_complete_transition().
> >
> > It might be because it is Friday evening and I am leaving for
> > a trip tomorrow. Also, I am not motivated enough to think about it
> > because Yafang saw the RCU stall even with that rcu_read_lock().
> > So I am sending this just for the record.
>
> Even if it doesn't fix the RCU stalls, I think we should still try to
> avoid holding the tasklist_lock. It's a global lock which can be
> contended, and we want the livepatch transition to be as unobtrusive as
> possible.
I agree. Since it's feasible to eliminate this global lock, I think
it's worth pursuing this optimization.
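
For reference, the rcu_read_lock() variant I tried is roughly this (a sketch
of the loop in klp_try_complete_transition(); the fork-time handling discussed
above would still be needed to make it safe):

	rcu_read_lock();
	for_each_process_thread(g, task)
		if (!klp_try_switch_task(task))
			complete = false;
	rcu_read_unlock();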
>
> If the system is doing a lot of forking across many CPUs, holding the
> lock could block all the forking tasks and trigger system-wide
> scheduling latencies. And that could be compounded by the unnecessary
> transitioning of new tasks every time the delayed work runs.
--
Regards
Yafang
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread
Thread overview: 30+ messages
2025-02-11 6:24 [PATCH 0/3] livepatch: Some improvements Yafang Shao
2025-02-11 6:24 ` [PATCH 1/3] livepatch: Add comment to clarify klp_add_nops() Yafang Shao
2025-02-12 12:51 ` Petr Mladek
2025-02-13 5:49 ` Yafang Shao
2025-02-11 6:24 ` [PATCH 2/3] livepatch: Avoid blocking tasklist_lock too long Yafang Shao
2025-02-12 0:40 ` Josh Poimboeuf
2025-02-12 2:34 ` Yafang Shao
2025-02-12 11:54 ` Yafang Shao
2025-02-12 15:42 ` Petr Mladek
2025-02-13 1:36 ` Josh Poimboeuf
2025-02-13 5:53 ` Yafang Shao
2025-02-13 9:48 ` Petr Mladek
2025-02-13 17:32 ` Josh Poimboeuf
2025-02-14 14:44 ` Petr Mladek
2025-02-14 18:12 ` Josh Poimboeuf
2025-02-18 2:37 ` Yafang Shao
2025-02-13 2:47 ` Josh Poimboeuf
2025-02-13 11:19 ` Find root of the stall: was: " Petr Mladek
2025-02-13 12:32 ` Yafang Shao
2025-02-13 12:39 ` Yafang Shao
2025-02-14 2:44 ` Yafang Shao
2025-02-14 8:36 ` Josh Poimboeuf
2025-02-14 11:37 ` Petr Mladek
2025-02-18 2:19 ` Yafang Shao
2025-02-14 9:46 ` Petr Mladek
2025-02-11 6:24 ` [PATCH 3/3] livepatch: Avoid potential RCU stalls in klp transition Yafang Shao
2025-02-12 0:52 ` Josh Poimboeuf
2025-02-12 2:42 ` Yafang Shao
2025-02-13 1:58 ` Josh Poimboeuf
2025-02-13 5:51 ` Yafang Shao