linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/2] livepatch: Add support for hybrid mode
@ 2025-01-27  6:35 Yafang Shao
  2025-01-27  6:35 ` [RFC PATCH 1/2] livepatch: Add replaceable attribute Yafang Shao
                   ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Yafang Shao @ 2025-01-27  6:35 UTC (permalink / raw)
  To: jpoimboe, jikos, mbenes, pmladek, joe.lawrence
  Cc: live-patching, linux-kernel, Yafang Shao

The atomic replace livepatch mechanism was introduced to handle scenarios
where we want to unload a specific livepatch without unloading others.
However, its current implementation has significant shortcomings, making
it less than ideal in practice. Below are the key downsides:

- It is expensive

  During testing with frequent replacements of an old livepatch, random RCU
  warnings were observed:

  [19578271.779605] rcu_tasks_wait_gp: rcu_tasks grace period 642409 is 10024 jiffies old.
  [19578390.073790] rcu_tasks_wait_gp: rcu_tasks grace period 642417 is 10185 jiffies old.
  [19578423.034065] rcu_tasks_wait_gp: rcu_tasks grace period 642421 is 10150 jiffies old.
  [19578564.144591] rcu_tasks_wait_gp: rcu_tasks grace period 642449 is 10174 jiffies old.
  [19578601.064614] rcu_tasks_wait_gp: rcu_tasks grace period 642453 is 10168 jiffies old.
  [19578663.920123] rcu_tasks_wait_gp: rcu_tasks grace period 642469 is 10167 jiffies old.
  [19578872.990496] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 10215 jiffies old.
  [19578903.190292] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 40415 jiffies old.
  [19579017.965500] rcu_tasks_wait_gp: rcu_tasks grace period 642577 is 10174 jiffies old.
  [19579033.981425] rcu_tasks_wait_gp: rcu_tasks grace period 642581 is 10143 jiffies old.
  [19579153.092599] rcu_tasks_wait_gp: rcu_tasks grace period 642625 is 10188 jiffies old.
  
  This indicates that atomic replacement can cause performance issues,
  particularly with RCU synchronization under frequent use.

- Potential Risks During Replacement 

  One known issue involves replacing livepatched versions of critical
  functions such as do_exit(). During the replacement process, a panic
  might occur, as highlighted in [0]. Other potential risks may also arise
  due to inconsistencies or race conditions during transitions.

- Temporary Loss of Patching 

  During the replacement process, the old patch is set to a NOP (no-operation)
  before the new patch is fully applied. This creates a window where the
  function temporarily reverts to its original, unpatched state. If the old
  patch fixed a critical issue (e.g., one that prevented a system panic), the
  system could become vulnerable to that issue during the transition.

The current atomic replacement approach replaces all old livepatches,
even when such a sweeping change is unnecessary. This can be improved
by introducing a hybrid mode, which allows the coexistence of both
atomic replace and non atomic replace livepatches.

In the hybrid mode:

- Specific livepatches can be marked as "non-replaceable" to ensure they
  remain active and unaffected during replacements.

- Other livepatches can be marked as "replaceable", allowing targeted
  replacements of only those patches.

This selective approach would reduce unnecessary transitions, lower the
risk of temporary patch loss, and mitigate performance issues during
livepatch replacement.
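
To illustrate, here is a minimal sketch of a livepatch module that opts out
of replacement with the proposed flag. It is modeled on
samples/livepatch/livepatch-sample.c; the .replaceable field only exists with
this series applied, and the patched function is simply the one used by that
sample:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/seq_file.h>
#include <linux/livepatch.h>

static int livepatch_cmdline_proc_show(struct seq_file *m, void *v)
{
	seq_printf(m, "%s\n", "this has been live patched");
	return 0;
}

static struct klp_func funcs[] = {
	{
		.old_name = "cmdline_proc_show",
		.new_func = livepatch_cmdline_proc_show,
	}, { }
};

static struct klp_object objs[] = {
	{
		/* name being NULL means vmlinux */
		.funcs = funcs,
	}, { }
};

static struct klp_patch patch = {
	.mod = THIS_MODULE,
	.objs = objs,
	.replace = false,	/* not a cumulative patch */
	.replaceable = false,	/* proposed flag: survive a later atomic replace */
};

static int livepatch_init(void)
{
	return klp_enable_patch(&patch);
}

static void livepatch_exit(void)
{
}

module_init(livepatch_init);
module_exit(livepatch_exit);
MODULE_LICENSE("GPL");
MODULE_INFO(livepatch, "Y");

The flag could later be flipped from user space through the sysfs file added
in patch 1, /sys/kernel/livepatch/<patch>/replaceable, before loading a
cumulative patch that is meant to absorb this one.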


Future work:
- Support it in kpatch[1]

Link: https://lore.kernel.org/live-patching/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@mail.gmail.com/ [0]
Link: https://github.com/dynup/kpatch [1]

Yafang Shao (2):
  livepatch: Add replaceable attribute
  livepatch: Implement livepatch hybrid mode

 include/linux/livepatch.h |  2 ++
 kernel/livepatch/core.c   | 50 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

-- 
2.43.5


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC PATCH 1/2] livepatch: Add replaceable attribute
  2025-01-27  6:35 [RFC PATCH 0/2] livepatch: Add support for hybrid mode Yafang Shao
@ 2025-01-27  6:35 ` Yafang Shao
  2025-01-27  6:35 ` [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode Yafang Shao
  2025-01-27 13:46 ` [RFC PATCH 0/2] livepatch: Add support for " Petr Mladek
  2 siblings, 0 replies; 36+ messages in thread
From: Yafang Shao @ 2025-01-27  6:35 UTC (permalink / raw)
  To: jpoimboe, jikos, mbenes, pmladek, joe.lawrence
  Cc: live-patching, linux-kernel, Yafang Shao

Add a new attribute "replaceable" to allow atomic-replace and
non-atomic-replace livepatches to coexist. If replaceable is set to 0,
the livepatch won't be replaced by an atomic replace livepatch.

This is a preparation for the follow-up patch.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/livepatch.h |  2 ++
 kernel/livepatch/core.c   | 44 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 51a258c24ff5..f2e962aab5b0 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -147,6 +147,7 @@ struct klp_state {
  * @objs:	object entries for kernel objects to be patched
  * @states:	system states that can get modified
  * @replace:	replace all actively used patches
+ * @replaceable:	whether this patch can be replaced or not
  * @list:	list node for global list of actively used patches
  * @kobj:	kobject for sysfs resources
  * @obj_list:	dynamic list of the object entries
@@ -161,6 +162,7 @@ struct klp_patch {
 	struct klp_object *objs;
 	struct klp_state *states;
 	bool replace;
+	bool replaceable;
 
 	/* internal */
 	struct list_head list;
diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 0cd39954d5a1..5e0c2caa0af8 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -347,6 +347,7 @@ int klp_apply_section_relocs(struct module *pmod, Elf_Shdr *sechdrs,
  * /sys/kernel/livepatch/<patch>/transition
  * /sys/kernel/livepatch/<patch>/force
  * /sys/kernel/livepatch/<patch>/replace
+ * /sys/kernel/livepatch/<patch>/replaceable
  * /sys/kernel/livepatch/<patch>/stack_order
  * /sys/kernel/livepatch/<patch>/<object>
  * /sys/kernel/livepatch/<patch>/<object>/patched
@@ -474,17 +475,60 @@ static ssize_t stack_order_show(struct kobject *kobj,
 	return sysfs_emit(buf, "%d\n", stack_order);
 }
 
+static ssize_t replaceable_store(struct kobject *kobj, struct kobj_attribute *attr,
+				const char *buf, size_t count)
+{
+	struct klp_patch *patch;
+	bool replaceable;
+	int ret;
+
+	ret = kstrtobool(buf, &replaceable);
+	if (ret)
+		return ret;
+
+	patch = container_of(kobj, struct klp_patch, kobj);
+
+	mutex_lock(&klp_mutex);
+
+	if (patch->replaceable == replaceable)
+		goto out;
+
+	if (patch == klp_transition_patch) {
+		ret = -EAGAIN;
+		goto out;
+	}
+
+	patch->replaceable = replaceable;
+
+out:
+	mutex_unlock(&klp_mutex);
+
+	if (ret)
+		return ret;
+	return count;
+}
+static ssize_t replaceable_show(struct kobject *kobj,
+			       struct kobj_attribute *attr, char *buf)
+{
+	struct klp_patch *patch;
+
+	patch = container_of(kobj, struct klp_patch, kobj);
+	return sysfs_emit(buf, "%d\n", patch->replaceable);
+}
+
 static struct kobj_attribute enabled_kobj_attr = __ATTR_RW(enabled);
 static struct kobj_attribute transition_kobj_attr = __ATTR_RO(transition);
 static struct kobj_attribute force_kobj_attr = __ATTR_WO(force);
 static struct kobj_attribute replace_kobj_attr = __ATTR_RO(replace);
 static struct kobj_attribute stack_order_kobj_attr = __ATTR_RO(stack_order);
+static struct kobj_attribute replaceable_kobj_attr = __ATTR_RW(replaceable);
 static struct attribute *klp_patch_attrs[] = {
 	&enabled_kobj_attr.attr,
 	&transition_kobj_attr.attr,
 	&force_kobj_attr.attr,
 	&replace_kobj_attr.attr,
 	&stack_order_kobj_attr.attr,
+	&replaceable_kobj_attr.attr,
 	NULL
 };
 ATTRIBUTE_GROUPS(klp_patch);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
  2025-01-27  6:35 [RFC PATCH 0/2] livepatch: Add support for hybrid mode Yafang Shao
  2025-01-27  6:35 ` [RFC PATCH 1/2] livepatch: Add replaceable attribute Yafang Shao
@ 2025-01-27  6:35 ` Yafang Shao
  2025-01-27 14:31   ` Petr Mladek
  2025-02-07  2:31   ` Josh Poimboeuf
  2025-01-27 13:46 ` [RFC PATCH 0/2] livepatch: Add support for " Petr Mladek
  2 siblings, 2 replies; 36+ messages in thread
From: Yafang Shao @ 2025-01-27  6:35 UTC (permalink / raw)
  To: jpoimboe, jikos, mbenes, pmladek, joe.lawrence
  Cc: live-patching, linux-kernel, Yafang Shao

The atomic replace livepatch mechanism was introduced to handle scenarios
where we want to unload a specific livepatch without unloading others.
However, its current implementation has significant shortcomings, making
it less than ideal in practice. Below are the key downsides:

- It is expensive

  During testing with frequent replacements of an old livepatch, random RCU
  warnings were observed:

  [19578271.779605] rcu_tasks_wait_gp: rcu_tasks grace period 642409 is 10024 jiffies old.
  [19578390.073790] rcu_tasks_wait_gp: rcu_tasks grace period 642417 is 10185 jiffies old.
  [19578423.034065] rcu_tasks_wait_gp: rcu_tasks grace period 642421 is 10150 jiffies old.
  [19578564.144591] rcu_tasks_wait_gp: rcu_tasks grace period 642449 is 10174 jiffies old.
  [19578601.064614] rcu_tasks_wait_gp: rcu_tasks grace period 642453 is 10168 jiffies old.
  [19578663.920123] rcu_tasks_wait_gp: rcu_tasks grace period 642469 is 10167 jiffies old.
  [19578872.990496] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 10215 jiffies old.
  [19578903.190292] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 40415 jiffies old.
  [19579017.965500] rcu_tasks_wait_gp: rcu_tasks grace period 642577 is 10174 jiffies old.
  [19579033.981425] rcu_tasks_wait_gp: rcu_tasks grace period 642581 is 10143 jiffies old.
  [19579153.092599] rcu_tasks_wait_gp: rcu_tasks grace period 642625 is 10188 jiffies old.
  
  This indicates that atomic replacement can cause performance issues,
  particularly with RCU synchronization under frequent use.

- Potential Risks During Replacement 

  One known issue involves replacing livepatched versions of critical
  functions such as do_exit(). During the replacement process, a panic
  might occur, as highlighted in [0]. Other potential risks may also arise
  due to inconsistencies or race conditions during transitions.

- Temporary Loss of Patching 

  During the replacement process, the old patch is set to a NOP (no-operation)
  before the new patch is fully applied. This creates a window where the
  function temporarily reverts to its original, unpatched state. If the old
  patch fixed a critical issue (e.g., one that prevented a system panic), the
  system could become vulnerable to that issue during the transition.

The current atomic replacement approach replaces all old livepatches,
even when such a sweeping change is unnecessary. This can be improved
by introducing a hybrid mode, which allows the coexistence of both
atomic replace and non atomic replace livepatches.

In the hybrid mode:

- Specific livepatches can be marked as "non-replaceable" to ensure they
  remain active and unaffected during replacements.

- Other livepatches can be marked as "replaceable," allowing targeted
  replacements of only those patches.

This selective approach would reduce unnecessary transitions, lower the
risk of temporary patch loss, and mitigate performance issues during
livepatch replacement.

Link: https://lore.kernel.org/live-patching/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@mail.gmail.com/ [0]
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/livepatch/core.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 5e0c2caa0af8..f820b50c1b26 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -658,6 +658,8 @@ static int klp_add_nops(struct klp_patch *patch)
 		klp_for_each_object(old_patch, old_obj) {
 			int err;
 
+			if (!old_patch->replaceable)
+				continue;
 			err = klp_add_object_nops(patch, old_obj);
 			if (err)
 				return err;
@@ -830,6 +832,8 @@ void klp_free_replaced_patches_async(struct klp_patch *new_patch)
 	klp_for_each_patch_safe(old_patch, tmp_patch) {
 		if (old_patch == new_patch)
 			return;
+		if (!old_patch->replaceable)
+			continue;
 		klp_free_patch_async(old_patch);
 	}
 }
@@ -1232,6 +1236,8 @@ void klp_unpatch_replaced_patches(struct klp_patch *new_patch)
 		if (old_patch == new_patch)
 			return;
 
+		if (!old_patch->replaceable)
+			continue;
 		old_patch->enabled = false;
 		klp_unpatch_objects(old_patch);
 	}
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-01-27  6:35 [RFC PATCH 0/2] livepatch: Add support for hybrid mode Yafang Shao
  2025-01-27  6:35 ` [RFC PATCH 1/2] livepatch: Add replaceable attribute Yafang Shao
  2025-01-27  6:35 ` [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode Yafang Shao
@ 2025-01-27 13:46 ` Petr Mladek
  2025-01-27 14:22   ` Yafang Shao
  2 siblings, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2025-01-27 13:46 UTC (permalink / raw)
  To: Yafang Shao
  Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching,
	linux-kernel

On Mon 2025-01-27 14:35:24, Yafang Shao wrote:
> The atomic replace livepatch mechanism was introduced to handle scenarios
> where we want to unload a specific livepatch without unloading others.
> However, its current implementation has significant shortcomings, making
> it less than ideal in practice. Below are the key downsides:
> 
> - It is expensive
> 
>   During testing with frequent replacements of an old livepatch, random RCU
>   warnings were observed:
> 
>   [19578271.779605] rcu_tasks_wait_gp: rcu_tasks grace period 642409 is 10024 jiffies old.
>   [19578390.073790] rcu_tasks_wait_gp: rcu_tasks grace period 642417 is 10185 jiffies old.
>   [19578423.034065] rcu_tasks_wait_gp: rcu_tasks grace period 642421 is 10150 jiffies old.
>   [19578564.144591] rcu_tasks_wait_gp: rcu_tasks grace period 642449 is 10174 jiffies old.
>   [19578601.064614] rcu_tasks_wait_gp: rcu_tasks grace period 642453 is 10168 jiffies old.
>   [19578663.920123] rcu_tasks_wait_gp: rcu_tasks grace period 642469 is 10167 jiffies old.
>   [19578872.990496] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 10215 jiffies old.
>   [19578903.190292] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 40415 jiffies old.
>   [19579017.965500] rcu_tasks_wait_gp: rcu_tasks grace period 642577 is 10174 jiffies old.
>   [19579033.981425] rcu_tasks_wait_gp: rcu_tasks grace period 642581 is 10143 jiffies old.
>   [19579153.092599] rcu_tasks_wait_gp: rcu_tasks grace period 642625 is 10188 jiffies old.
>   
>   This indicates that atomic replacement can cause performance issues,
>   particularly with RCU synchronization under frequent use.

Please, provide more details about the test:

  + List of patched functions.

  + What exactly is meant by frequent replacements (busy loop?, once a minute?)

  + Size of the systems (number of CPUs, number of running processes)

  + Were there any extra changes in the livepatch code code,
    e.g. debugging messages?


> - Potential Risks During Replacement 
> 
>   One known issue involves replacing livepatched versions of critical
>   functions such as do_exit(). During the replacement process, a panic
>   might occur, as highlighted in [0].

I would rather prefer to make the atomic replace safe. I mean to
block the transition until all exiting processes are gone.

Josh made a good point. We should look how this is handled by
RCU, tracing, or another subsystems which might have similar
problems.


> Other potential risks may also arise
>   due to inconsistencies or race conditions during transitions.

What inconsistencies and race conditions you have in mind, please?


> - Temporary Loss of Patching 
> 
>   During the replacement process, the old patch is set to a NOP (no-operation)
>   before the new patch is fully applied. This creates a window where the
>   function temporarily reverts to its original, unpatched state. If the old
>   patch fixed a critical issue (e.g., one that prevented a system panic), the
>   system could become vulnerable to that issue during the transition.

This is not true!

Please, look where klp_patch_object() and klp_unpatch_objects() are
called. Also look at how ops->func_stack is handled in
klp_ftrace_handler().

Also you might want to read Documentation/livepatch/livepatch.rst


> The current atomic replacement approach replaces all old livepatches,
> even when such a sweeping change is unnecessary. This can be improved
> by introducing a hybrid mode, which allows the coexistence of both
> atomic replace and non atomic replace livepatches.
> 
> In the hybrid mode:
> 
> - Specific livepatches can be marked as "non-replaceable" to ensure they
>   remain active and unaffected during replacements.
> 
> - Other livepatches can be marked as "replaceable", allowing targeted
>   replacements of only those patches.

Honestly, I consider this as a workaround for a problem which should
be fixed a proper way.

The main advantage of the atomic replace is that it simplifies the
maintenance and debugging. It reduces the number of possible
combinations. The hybrid mode brings back the jungle.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-01-27 13:46 ` [RFC PATCH 0/2] livepatch: Add support for " Petr Mladek
@ 2025-01-27 14:22   ` Yafang Shao
  2025-01-31 13:18     ` Miroslav Benes
  0 siblings, 1 reply; 36+ messages in thread
From: Yafang Shao @ 2025-01-27 14:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching,
	linux-kernel

On Mon, Jan 27, 2025 at 9:46 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2025-01-27 14:35:24, Yafang Shao wrote:
> > The atomic replace livepatch mechanism was introduced to handle scenarios
> > where we want to unload a specific livepatch without unloading others.
> > However, its current implementation has significant shortcomings, making
> > it less than ideal in practice. Below are the key downsides:
> >
> > - It is expensive
> >
> >   During testing with frequent replacements of an old livepatch, random RCU
> >   warnings were observed:
> >
> >   [19578271.779605] rcu_tasks_wait_gp: rcu_tasks grace period 642409 is 10024 jiffies old.
> >   [19578390.073790] rcu_tasks_wait_gp: rcu_tasks grace period 642417 is 10185 jiffies old.
> >   [19578423.034065] rcu_tasks_wait_gp: rcu_tasks grace period 642421 is 10150 jiffies old.
> >   [19578564.144591] rcu_tasks_wait_gp: rcu_tasks grace period 642449 is 10174 jiffies old.
> >   [19578601.064614] rcu_tasks_wait_gp: rcu_tasks grace period 642453 is 10168 jiffies old.
> >   [19578663.920123] rcu_tasks_wait_gp: rcu_tasks grace period 642469 is 10167 jiffies old.
> >   [19578872.990496] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 10215 jiffies old.
> >   [19578903.190292] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 40415 jiffies old.
> >   [19579017.965500] rcu_tasks_wait_gp: rcu_tasks grace period 642577 is 10174 jiffies old.
> >   [19579033.981425] rcu_tasks_wait_gp: rcu_tasks grace period 642581 is 10143 jiffies old.
> >   [19579153.092599] rcu_tasks_wait_gp: rcu_tasks grace period 642625 is 10188 jiffies old.
> >
> >   This indicates that atomic replacement can cause performance issues,
> >   particularly with RCU synchronization under frequent use.
>
> Please, provide more details about the test:
>
>   + List of patched functions.

$ ls /sys/kernel/livepatch/livepatch_61_release12/vmlinux/
__d_move,1               blk_mq_handle_expired,1  blk_throtl_cancel_bios,1
d_delete,1               do_exit,1                do_iter_readv_writev,1
high_work_func,1         patched                  try_charge_memcg,1
update_rq_clock,1

$ ls /sys/kernel/livepatch/livepatch_61_release12/xfs/
patched  xfs_extent_busy_flush,1  xfs_iget,1  xfs_inode_mark_reclaimable,1

$ ls /sys/kernel/livepatch/livepatch_61_release12/fuse/
fuse_send_init,1  patched  process_init_reply,1

$ ls /sys/kernel/livepatch/livepatch_61_release12/nf_tables/
nft_data_init,1  patched

>
>   + What exactly is meant by frequent replacements (busy loop?, once a minute?)

The script:

#!/bin/bash
while true; do
        yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
        ./apply_livepatch_61.sh # it will sleep 5s
        yum erase -y kernel-livepatch-6.1.12-0.x86_64
        yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
        ./apply_livepatch_61.sh  # it will sleep 5s
done


>
>   + Size of the systems (number of CPUs, number of running processes)

top - 22:08:17 up 227 days,  3:29,  1 user,  load average: 50.66, 32.92, 20.77
Tasks: 1349 total,   2 running, 543 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.4 us,  0.9 sy,  0.0 ni, 88.1 id,  2.9 wa,  0.3 hi,  0.4 si,  0.0 st
KiB Mem : 39406304+total,  8899864 free, 43732568 used, 34143062+buff/cache
KiB Swap:        0 total,        0 free,        0 used. 32485065+avail Mem

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    64
Socket(s):             1
NUMA node(s):          1
Vendor ID:             AuthenticAMD
CPU family:            25
Model:                 1
Model name:            AMD EPYC 7Q83 64-Core Processor
Stepping:              1
CPU MHz:               3234.573
CPU max MHz:           3854.4431
CPU min MHz:           1500.0000
BogoMIPS:              4890.66
Virtualization:        AMD-V
L1d cache:             32K
L1i cache:             32K
L2 cache:              512K
L3 cache:              32768K
NUMA node0 CPU(s):     0-127

>
>   + Were there any extra changes in the livepatch code code,
>     e.g. debugging messages?

Not at all.

>
>
> > - Potential Risks During Replacement
> >
> >   One known issue involves replacing livepatched versions of critical
> >   functions such as do_exit(). During the replacement process, a panic
> >   might occur, as highlighted in [0].
>
> I would rather prefer to make the atomic replace safe. I mean to
> block the transition until all exiting processes are not gone.
>
> Josh made a good point. We should look how this is handled by
> RCU, tracing, or another subsystems which might have similar
> problems.

I'm not against this fix.

>
>
> > Other potential risks may also arise
> >   due to inconsistencies or race conditions during transitions.
>
> What inconsistencies and race conditions you have in mind, please?

I have explained it at
https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354

  klp_ftrace_handler()
      if (unlikely(func->transition)) {
          WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
      }

Why is WARN_ON_ONCE() placed here? What issues have we encountered in the past
that led to the decision to add this warning?

>
>
> > - Temporary Loss of Patching
> >
> >   During the replacement process, the old patch is set to a NOP (no-operation)
> >   before the new patch is fully applied. This creates a window where the
> >   function temporarily reverts to its original, unpatched state. If the old
> >   patch fixed a critical issue (e.g., one that prevented a system panic), the
> >   system could become vulnerable to that issue during the transition.
>
> This is not true!
>
> Please, look where klp_patch_object() and klp_unpatch_objects() is
> called. Also look at how ops->func_stack is handled in
> klp_ftrace_handler().
>
> Also you might want to read Documentation/livepatch/livepatch.rst

Thanks for your guidance.

>
>
> > The current atomic replacement approach replaces all old livepatches,
> > even when such a sweeping change is unnecessary. This can be improved
> > by introducing a hybrid mode, which allows the coexistence of both
> > atomic replace and non atomic replace livepatches.
> >
> > In the hybrid mode:
> >
> > - Specific livepatches can be marked as "non-replaceable" to ensure they
> >   remain active and unaffected during replacements.
> >
> > - Other livepatches can be marked as "replaceable", allowing targeted
> >   replacements of only those patches.
>
> Honestly, I consider this as a workaround for a problem which should
> be fixed a proper way.

This is not a workaround. We should avoid replacing functions that do not
differ between the two versions. Since automatic handling is not
possible for now, we can provide a sysfs interface
to allow users to perform it manually.

>
> The main advantage of the atomic replace is simplify the maintenance
> and debugging.

Is it worth the high overhead on production servers?
Can you provide examples of companies that use atomic replacement at
scale in their production environments?

>  It reduces the amount of possible combinations. The
> hybrid mode brings back the jungle.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
  2025-01-27  6:35 ` [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode Yafang Shao
@ 2025-01-27 14:31   ` Petr Mladek
  2025-01-27 15:34     ` Yafang Shao
  2025-02-07  2:31   ` Josh Poimboeuf
  1 sibling, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2025-01-27 14:31 UTC (permalink / raw)
  To: Yafang Shao
  Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching,
	linux-kernel

On Mon 2025-01-27 14:35:26, Yafang Shao wrote:
> The atomic replace livepatch mechanism was introduced to handle scenarios
> where we want to unload a specific livepatch without unloading others.
> However, its current implementation has significant shortcomings, making
> it less than ideal in practice. Below are the key downsides:

[...]

> In the hybrid mode:
> 
> - Specific livepatches can be marked as "non-replaceable" to ensure they
>   remain active and unaffected during replacements.
> 
> - Other livepatches can be marked as "replaceable," allowing targeted
>   replacements of only those patches.
> 
> This selective approach would reduce unnecessary transitions, lower the
> risk of temporary patch loss, and mitigate performance issues during
> livepatch replacement.
> 
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -658,6 +658,8 @@ static int klp_add_nops(struct klp_patch *patch)
>  		klp_for_each_object(old_patch, old_obj) {
>  			int err;
>  
> +			if (!old_patch->replaceable)
> +				continue;

This is one example where things might get very complicated.

The same function might be livepatched by multiple livepatches, see
ops->func_stack. For example, let's have funcA and three livepatches:
  + lp1:
	.replace = false,
	.non-replace = true,
	.func =	{
			.old_name = "funcA",
			.new_func = lp1_funcA,
		}, { }

  + lp2:
	.replace = false,
	.non-replace = false,
	.func =	{
			.old_name = "funcA",
			.new_func = lp2_funcA,
		},{
			.old_name = "funcB",
			.new_func = lp2_funcB,
		}, { }


  + lp3:
	.replace = true,
	.non-replace = false,
	.func =	{
			.old_name = "funcB",
			.new_func = lp3_funcB,
		}, { }


Now, apply lp1:

      + funcA() gets redirected to lp1_funcA()

Then, apply lp2

      + funcA() gets redirected to lp2_funcA()

Finally, apply lp3:

      + The proposed code would add "nop()" for
	funcA() because	it exists in lp2 and does not exist in lp3.

      + funcA() will get redirected to the original code
	because of the nop() during transition

      + nop() will get removed in klp_complete_transition() and
	funcA() will get suddenly redirected to lp1_funcA()
	because it will still be on ops->func_stack even
	after the "nop" and lp2_funcA() gets removed.

	   => The entire system will start using another funcA
	      implementation at some random time

	   => this would violate the consistency model


The proper solution might be tricky:

1. We would need to detect this situation and do _not_ add
   the "nop" for lp3 and funcA().

2. We would need a more complicate code for handling the task states.

   klp_update_patch_state() sets task->patch_state using
   the global "klp_target_state". But in the above example,
   when enabling lp3:

    + funcA would need to get transitioned _backward_:
	 KLP_TRANSITION_PATCHED -> KLP_TRANSITION_UNPATCHED
      , so that it goes on ops->func_stack:
	 lp2_funcA -> lp1_funcA

   while:

    + funcB would need to get transitioned forward:
	 KLP_TRANSITION_UNPATCHED -> KLP_TRANSITION_PATCHED
      , so that it goes on ops->func_stack:
	 lp2_funcB -> lp3_funcB


=> the hybrid mode would complicate the life for both livepatch
   creators/maintainers and kernel code developers/maintainers.

   I am afraid that this complexity is not acceptable if there are
   better solutions for the original problem.

>  			err = klp_add_object_nops(patch, old_obj);
>  			if (err)
>  				return err;

I am sorry but I am quite strongly against this approach!

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
  2025-01-27 14:31   ` Petr Mladek
@ 2025-01-27 15:34     ` Yafang Shao
  2025-02-04 13:21       ` Petr Mladek
  0 siblings, 1 reply; 36+ messages in thread
From: Yafang Shao @ 2025-01-27 15:34 UTC (permalink / raw)
  To: Petr Mladek
  Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching,
	linux-kernel

On Mon, Jan 27, 2025 at 10:31 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2025-01-27 14:35:26, Yafang Shao wrote:
> > The atomic replace livepatch mechanism was introduced to handle scenarios
> > where we want to unload a specific livepatch without unloading others.
> > However, its current implementation has significant shortcomings, making
> > it less than ideal in practice. Below are the key downsides:
>
> [...]
>
> > In the hybrid mode:
> >
> > - Specific livepatches can be marked as "non-replaceable" to ensure they
> >   remain active and unaffected during replacements.
> >
> > - Other livepatches can be marked as "replaceable," allowing targeted
> >   replacements of only those patches.
> >
> > This selective approach would reduce unnecessary transitions, lower the
> > risk of temporary patch loss, and mitigate performance issues during
> > livepatch replacement.
> >
> > --- a/kernel/livepatch/core.c
> > +++ b/kernel/livepatch/core.c
> > @@ -658,6 +658,8 @@ static int klp_add_nops(struct klp_patch *patch)
> >               klp_for_each_object(old_patch, old_obj) {
> >                       int err;
> >
> > +                     if (!old_patch->replaceable)
> > +                             continue;
>
> This is one example where things might get very complicated.

Why does this example even exist in the first place?
If hybrid mode is enabled, this scenario could have been avoided entirely.

>
> The same function might be livepatched by more livepatches, see
> ops->func_stack. For example, let's have funcA and three livepatches:
> a
>   + lp1:
>         .replace = false,
>         .non-replace = true,
>         .func = {
>                         .old_name = "funcA",
>                         .new_func = lp1_funcA,
>                 }, { }
>
>   + lp2:
>         .replace = false,
>         .non-replace = false,
>         .func = {
>                         .old_name = "funcA",
>                         .new_func = lp2_funcA,
>                 },{
>                         .old_name = "funcB",
>                         .new_func = lp2_funcB,
>                 }, { }
>
>
>   + lp3:
>         .replace = true,
>         .non-replace = false,
>         .func = {
>                         .old_name = "funcB",
>                         .new_func = lp3_funcB,
>                 }, { }
>
>
> Now, apply lp1:
>
>       + funcA() gets redirected to lp1_funcA()
>
> Then, apply lp2
>
>       + funcA() gets redirected to lp2_funcA()
>
> Finally, apply lp3:
>
>       + The proposed code would add "nop()" for
>         funcA() because it exists in lp2 and does not exist in lp3.
>
>       + funcA() will get redirected to the original code
>         because of the nop() during transition
>
>       + nop() will get removed in klp_complete_transition() and
>         funcA() will get suddenly redirected to lp1_funcA()
>         because it will still be on ops->func_stack even
>         after the "nop" and lp2_funcA() gets removed.
>
>            => The entire system will start using another funcA
>               implementation at some random time
>
>            => this would violate the consistency model
>
>
> The proper solution might be tricky:
>
> 1. We would need to detect this situation and do _not_ add
>    the "nop" for lp3 and funcA().
>
> 2. We would need a more complicate code for handling the task states.
>
>    klp_update_patch_state() sets task->patch_state using
>    the global "klp_target_state". But in the above example,
>    when enabling lp3:
>
>     + funcA would need to get transitioned _backward_:
>          KLP_TRANSITION_PATCHED -> KLP_TRANSITION_UNPATCHED
>       , so that it goes on ops->func_stack:
>          lp2_funcA -> lp1->funcA
>
>    while:
>
>     + funcA would need to get transitioned forward:
>          KLP_TRANSITION_UNPATCHED -> KLP_TRANSITION_PATCHED
>       , so that it goes on ops->func_stack:
>          lp2_funcB -> lp3->funcB
>
>
> => the hybrid mode would complicate the life for both livepatch
>    creators/maintainers and kernel code developers/maintainers.
>

I don’t believe they should be held responsible for poor
configurations. These settings could have been avoided from the start.
There are countless tunable knobs in the system—should we try to
accommodate every possible combination? No, that’s not how systems are
designed to operate. Instead, we should identify and follow best
practices to ensure optimal functionality.

>    I am afraid that this complexity is not acceptable if there are
>    better solutions for the original problem.
>
> >                       err = klp_add_object_nops(patch, old_obj);
> >                       if (err)
> >                               return err;
>
> I am sorry but I am quite strongly against this approach!
>
> Best Regards,
> Petr



--
Regards
Yafang

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-01-27 14:22   ` Yafang Shao
@ 2025-01-31 13:18     ` Miroslav Benes
  2025-02-03  9:44       ` Yafang Shao
  0 siblings, 1 reply; 36+ messages in thread
From: Miroslav Benes @ 2025-01-31 13:18 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Petr Mladek, jpoimboe, jikos, joe.lawrence, live-patching,
	linux-kernel

> >
> >   + What exactly is meant by frequent replacements (busy loop?, once a minute?)
> 
> The script:
> 
> #!/bin/bash
> while true; do
>         yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
>         ./apply_livepatch_61.sh # it will sleep 5s
>         yum erase -y kernel-livepatch-6.1.12-0.x86_64
>         yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
>         ./apply_livepatch_61.sh  # it will sleep 5s
> done
 
A live patch application is a slowpath. It is expected not to run 
frequently (in a relative sense). If you stress it like this, it is quite 
expected that it will have an impact. Especially on a large busy system.

> >
> > > Other potential risks may also arise
> > >   due to inconsistencies or race conditions during transitions.
> >
> > What inconsistencies and race conditions you have in mind, please?
> 
> I have explained it at
> https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354
> 
>  klp_ftrace_handler
>       if (unlikely(func->transition)) {
>           WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
>   }
> 
> Why is WARN_ON_ONCE() placed here? What issues have we encountered in the past
> that led to the decision to add this warning?

A safety measure for something which really should not happen.

> > The main advantage of the atomic replace is simplify the maintenance
> > and debugging.
> 
> Is it worth the high overhead on production servers?

Yes, because the overhead once a live patch is applied is negligible.

> Can you provide examples of companies that use atomic replacement at
> scale in their production environments?

At least SUSE uses it as a solution for its customers. Not many problems
have been reported since we started ~10 years ago.

Regards,
Miroslav

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-01-31 13:18     ` Miroslav Benes
@ 2025-02-03  9:44       ` Yafang Shao
  2025-02-03 21:53         ` Song Liu
  2025-02-04 13:05         ` Petr Mladek
  0 siblings, 2 replies; 36+ messages in thread
From: Yafang Shao @ 2025-02-03  9:44 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Petr Mladek, jpoimboe, jikos, joe.lawrence, live-patching,
	linux-kernel

On Fri, Jan 31, 2025 at 9:18 PM Miroslav Benes <mbenes@suse.cz> wrote:
>
> > >
> > >   + What exactly is meant by frequent replacements (busy loop?, once a minute?)
> >
> > The script:
> >
> > #!/bin/bash
> > while true; do
> >         yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
> >         ./apply_livepatch_61.sh # it will sleep 5s
> >         yum erase -y kernel-livepatch-6.1.12-0.x86_64
> >         yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
> >         ./apply_livepatch_61.sh  # it will sleep 5s
> > done
>
> A live patch application is a slowpath. It is expected not to run
> frequently (in a relative sense).

The frequency isn’t the main concern here; _scalability_ is the key issue.
Running livepatches once per day (a relatively low frequency) across all of our
production servers (hundreds of thousands) isn’t feasible. Instead, we need to
periodically run tests on a subset of test servers.


> If you stress it like this, it is quite
> expected that it will have an impact. Especially on a large busy system.

It seems you agree that the current atomic-replace process lacks scalability.
When deploying a livepatch across a large fleet of servers, it’s impossible to
ensure that the servers are idle, as their workloads are constantly varying and
are not deterministic.

The challenges are very different when managing 1K servers versus 1M
servers. Similarly, the issues differ significantly between patching a
single function and patching 100 functions, especially when some of
those functions are critical. That’s what scalability is all about.

Since we transitioned from the old livepatch mode to the new
atomic-replace mode, our SREs have consistently reported that one or
more servers become stalled during the upgrade (replacement).

>
> > >
> > > > Other potential risks may also arise
> > > >   due to inconsistencies or race conditions during transitions.
> > >
> > > What inconsistencies and race conditions you have in mind, please?
> >
> > I have explained it at
> > https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354
> >
> >  klp_ftrace_handler
> >       if (unlikely(func->transition)) {
> >           WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
> >   }
> >
> > Why is WARN_ON_ONCE() placed here? What issues have we encountered in the past
> > that led to the decision to add this warning?
>
> A safety measure for something which really should not happen.

Unfortunately, this issue occurs during my stress tests.

>
> > > The main advantage of the atomic replace is simplify the maintenance
> > > and debugging.
> >
> > Is it worth the high overhead on production servers?
>
> Yes, because the overhead once a live patch is applied is negligible.

If you’re managing a large fleet of servers, this issue is far from negligible.

>
> > Can you provide examples of companies that use atomic replacement at
> > scale in their production environments?
>
> At least SUSE uses it as a solution for its customers. No many problems
> have been reported since we started ~10 years ago.

Perhaps we’re running different workloads.
Going back to the original purpose of livepatching: is it designed to address
security vulnerabilities, or to deploy new features?
If it’s the latter, then there’s definitely a lot of room for improvement.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-02-03  9:44       ` Yafang Shao
@ 2025-02-03 21:53         ` Song Liu
  2025-02-05 14:42           ` Yafang Shao
  2025-02-04 13:05         ` Petr Mladek
  1 sibling, 1 reply; 36+ messages in thread
From: Song Liu @ 2025-02-03 21:53 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Miroslav Benes, Petr Mladek, jpoimboe, jikos, joe.lawrence,
	live-patching, linux-kernel

On Mon, Feb 3, 2025 at 1:45 AM Yafang Shao <laoar.shao@gmail.com> wrote:
[...]
>
> If you’re managing a large fleet of servers, this issue is far from negligible.
>
> >
> > > Can you provide examples of companies that use atomic replacement at
> > > scale in their production environments?
> >
> > At least SUSE uses it as a solution for its customers. No many problems
> > have been reported since we started ~10 years ago.

We (Meta) always use atomic replacement for our live patches.

>
> Perhaps we’re running different workloads.
> Going back to the original purpose of livepatching: is it designed to address
> security vulnerabilities, or to deploy new features?
> If it’s the latter, then there’s definitely a lot of room for improvement.

We only use KLP to fix bugs and security vulnerabilities. We do not use
live patches to deploy new features.

Thanks,
Song

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-02-03  9:44       ` Yafang Shao
  2025-02-03 21:53         ` Song Liu
@ 2025-02-04 13:05         ` Petr Mladek
  2025-02-05  6:16           ` Yafang Shao
  1 sibling, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2025-02-04 13:05 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Miroslav Benes, jpoimboe, jikos, joe.lawrence, live-patching,
	linux-kernel

On Mon 2025-02-03 17:44:52, Yafang Shao wrote:
> On Fri, Jan 31, 2025 at 9:18 PM Miroslav Benes <mbenes@suse.cz> wrote:
> >
> > > >
> > > >   + What exactly is meant by frequent replacements (busy loop?, once a minute?)
> > >
> > > The script:
> > >
> > > #!/bin/bash
> > > while true; do
> > >         yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
> > >         ./apply_livepatch_61.sh # it will sleep 5s
> > >         yum erase -y kernel-livepatch-6.1.12-0.x86_64
> > >         yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
> > >         ./apply_livepatch_61.sh  # it will sleep 5s
> > > done
> >
> > A live patch application is a slowpath. It is expected not to run
> > frequently (in a relative sense).
> 
> The frequency isn’t the main concern here; _scalability_ is the key issue.
> Running livepatches once per day (a relatively low frequency) across all of our
> production servers (hundreds of thousands) isn’t feasible. Instead, we need to
> periodically run tests on a subset of test servers.

I am confused. The original problem was a system crash when
livepatching the do_exit() function, see
https://lore.kernel.org/r/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@mail.gmail.com

The rcu watchdog warning was first mentioned in this patchset.
Do you see the rcu watchdog warning in production or just
with this artificial test, please?


> > If you stress it like this, it is quite
> > expected that it will have an impact. Especially on a large busy system.
> 
> It seems you agree that the current atomic-replace process lacks scalability.
> When deploying a livepatch across a large fleet of servers, it’s impossible to
> ensure that the servers are idle, as their workloads are constantly varying and
> are not deterministic.

Do you see the scalability problem in production, please?
And could you prove that it was caused by livepatching, please?


> The challenges are very different when managing 1K servers versus 1M servers.
> Similarly, the issues differ significantly between patching a single
> function and
> patching 100 functions, especially when some of those functions are critical.
> That’s what scalability is all about.
> 
> Since we transitioned from the old livepatch mode to the new
> atomic-replace mode,

What do you mean with the old livepatch mode, please?

Did you allow to install more livepatches in parallel?
What was the motivation to switch to the atomic replace, please?

> our SREs have consistently reported that one or more servers become
> stalled during
> the upgrade (replacement).

What is SRE, please?
Could you please show some log from a production system?


> > > > > Other potential risks may also arise
> > > > >   due to inconsistencies or race conditions during transitions.
> > > >
> > > > What inconsistencies and race conditions you have in mind, please?
> > >
> > > I have explained it at
> > > https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354
> > >
> > >  klp_ftrace_handler
> > >       if (unlikely(func->transition)) {
> > >           WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
> > >   }
> > >
> > > Why is WARN_ON_ONCE() placed here? What issues have we encountered in the past
> > > that led to the decision to add this warning?
> >
> > A safety measure for something which really should not happen.
> 
> Unfortunately, this issue occurs during my stress tests.

I am confused. Do you see the above WARN_ON_ONCE() during your
stress test? Could you please provide a log?

> > > > The main advantage of the atomic replace is simplify the maintenance
> > > > and debugging.
> > >
> > > Is it worth the high overhead on production servers?
> >
> > Yes, because the overhead once a live patch is applied is negligible.
> 
> If you’re managing a large fleet of servers, this issue is far from negligible.
> 
> >
> > > Can you provide examples of companies that use atomic replacement at
> > > scale in their production environments?
> >
> > At least SUSE uses it as a solution for its customers. No many problems
> > have been reported since we started ~10 years ago.
> 
> Perhaps we’re running different workloads.
> Going back to the original purpose of livepatching: is it designed to address
> security vulnerabilities, or to deploy new features?

We (SUSE) use livepatches only for fixing CVEs and serious bugs.


> If it’s the latter, then there’s definitely a lot of room for improvement.

You might be right. I am just not sure whether the hybrid mode would
be the right solution.

If you have problems with the atomic replace then you might stop using
it completely and just install more livepatches in parallel.


My view:

More livepatches installed in parallel are more prone to
inconsistencies. A good example is the thread about introducing
stack order sysfs interface, see
https://lore.kernel.org/all/AAD198C9-210E-4E31-8FD7-270C39A974A8@gmail.com/

The atomic replace helps to keep the livepatched functions consistent.

The hybrid model would allow installing more livepatches in parallel except
that one livepatch could be replaced atomically. It would create even
more scenarios than allowing all livepatches in parallel.

What would be the rules, please?

Which functionality will be livepatched by the atomic replace, please?

Which functionality will be handled by the extra non-replaceable
livepatches, please?

How would you keep the livepatches consistent, please?

How would you manage dependencies between livepatches, please?

What is the advantage of the hybrid model over allowing
all livepatches in parallel, please?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
  2025-01-27 15:34     ` Yafang Shao
@ 2025-02-04 13:21       ` Petr Mladek
  2025-02-05  2:54         ` Yafang Shao
  0 siblings, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2025-02-04 13:21 UTC (permalink / raw)
  To: Yafang Shao
  Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching,
	linux-kernel

On Mon 2025-01-27 23:34:50, Yafang Shao wrote:
> On Mon, Jan 27, 2025 at 10:31 PM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Mon 2025-01-27 14:35:26, Yafang Shao wrote:
> > > The atomic replace livepatch mechanism was introduced to handle scenarios
> > > where we want to unload a specific livepatch without unloading others.
> > > However, its current implementation has significant shortcomings, making
> > > it less than ideal in practice. Below are the key downsides:
> >
> > [...]
> >
> > > In the hybrid mode:
> > >
> > > - Specific livepatches can be marked as "non-replaceable" to ensure they
> > >   remain active and unaffected during replacements.
> > >
> > > - Other livepatches can be marked as "replaceable," allowing targeted
> > >   replacements of only those patches.
> > >
> > > This selective approach would reduce unnecessary transitions, lower the
> > > risk of temporary patch loss, and mitigate performance issues during
> > > livepatch replacement.
> > >
> > > --- a/kernel/livepatch/core.c
> > > +++ b/kernel/livepatch/core.c
> > > @@ -658,6 +658,8 @@ static int klp_add_nops(struct klp_patch *patch)
> > >               klp_for_each_object(old_patch, old_obj) {
> > >                       int err;
> > >
> > > +                     if (!old_patch->replaceable)
> > > +                             continue;
> >
> > This is one example where things might get very complicated.
> 
> Why does this example even exist in the first place?
> If hybrid mode is enabled, this scenario could have been avoided entirely.

How exactly this scenario could be avoided, please?

In real life, livepatches are used to fix many bugs.
And every bug is fixed by livepatching several functions.

Fix1 livepatches: funcA
Fix2 livepatches: funcA, funcB
Fix3 livepatches: funcB

How would you handle this with the hybrid model?

Which fix will be handled by livepatches installed in parallel?
Which fix will be handled by atomic replace?
How would you keep it consistent?

How would it work when there are hundreds of fixes and thousands
of livepatched functions?

Where exactly is the advantage of the hybrid model?

> >
> > The same function might be livepatched by more livepatches, see
> > ops->func_stack. For example, let's have funcA and three livepatches:
> > a
> >   + lp1:
> >         .replace = false,
> >         .non-replace = true,
> >         .func = {
> >                         .old_name = "funcA",
> >                         .new_func = lp1_funcA,
> >                 }, { }
> >
> >   + lp2:
> >         .replace = false,
> >         .non-replace = false,
> >         .func = {
> >                         .old_name = "funcA",
> >                         .new_func = lp2_funcA,
> >                 },{
> >                         .old_name = "funcB",
> >                         .new_func = lp2_funcB,
> >                 }, { }
> >
> >
> >   + lp3:
> >         .replace = true,
> >         .non-replace = false,
> >         .func = {
> >                         .old_name = "funcB",
> >                         .new_func = lp3_funcB,
> >                 }, { }
> >
> >
> > Now, apply lp1:
> >
> >       + funcA() gets redirected to lp1_funcA()
> >
> > Then, apply lp2
> >
> >       + funcA() gets redirected to lp2_funcA()
> >
> > Finally, apply lp3:
> >
> >       + The proposed code would add "nop()" for
> >         funcA() because it exists in lp2 and does not exist in lp3.
> >
> >       + funcA() will get redirected to the original code
> >         because of the nop() during transition
> >
> >       + nop() will get removed in klp_complete_transition() and
> >         funcA() will get suddenly redirected to lp1_funcA()
> >         because it will still be on ops->func_stack even
> >         after the "nop" and lp2_funcA() gets removed.
> >
> >            => The entire system will start using another funcA
> >               implementation at some random time
> >
> >            => this would violate the consistency model
> >
> >
> > The proper solution might be tricky:
> >
> > 1. We would need to detect this situation and do _not_ add
> >    the "nop" for lp3 and funcA().
> >
> > 2. We would need a more complicate code for handling the task states.
> >
> >    klp_update_patch_state() sets task->patch_state using
> >    the global "klp_target_state". But in the above example,
> >    when enabling lp3:
> >
> >     + funcA would need to get transitioned _backward_:
> >          KLP_TRANSITION_PATCHED -> KLP_TRANSITION_UNPATCHED
> >       , so that it goes on ops->func_stack:
> >          lp2_funcA -> lp1->funcA
> >
> >    while:
> >
> >     + funcA would need to get transitioned forward:
> >          KLP_TRANSITION_UNPATCHED -> KLP_TRANSITION_PATCHED
> >       , so that it goes on ops->func_stack:
> >          lp2_funcB -> lp3->funcB
> >
> >
> > => the hybrid mode would complicate the life for both livepatch
> >    creators/maintainers and kernel code developers/maintainers.
> >
> 
> I don’t believe they should be held responsible for poor
> configurations. These settings could have been avoided from the start.
> There are countless tunable knobs in the system—should we try to
> accommodate every possible combination? No, that’s not how systems are
> designed to operate. Instead, we should identify and follow best
> practices to ensure optimal functionality.

I am sorry but I do not understand the above paragraph at all.

Livepatches modify functions.
How is it related to system configuration or tunable knobs?
What best practices are you talking about, please?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
  2025-02-04 13:21       ` Petr Mladek
@ 2025-02-05  2:54         ` Yafang Shao
  2025-02-05 16:03           ` Petr Mladek
  0 siblings, 1 reply; 36+ messages in thread
From: Yafang Shao @ 2025-02-05  2:54 UTC (permalink / raw)
  To: Petr Mladek
  Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching,
	linux-kernel

On Tue, Feb 4, 2025 at 9:21 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2025-01-27 23:34:50, Yafang Shao wrote:
> > On Mon, Jan 27, 2025 at 10:31 PM Petr Mladek <pmladek@suse.com> wrote:
> > >
> > > On Mon 2025-01-27 14:35:26, Yafang Shao wrote:
> > > > The atomic replace livepatch mechanism was introduced to handle scenarios
> > > > where we want to unload a specific livepatch without unloading others.
> > > > However, its current implementation has significant shortcomings, making
> > > > it less than ideal in practice. Below are the key downsides:
> > >
> > > [...]
> > >
> > > > In the hybrid mode:
> > > >
> > > > - Specific livepatches can be marked as "non-replaceable" to ensure they
> > > >   remain active and unaffected during replacements.
> > > >
> > > > - Other livepatches can be marked as "replaceable," allowing targeted
> > > >   replacements of only those patches.
> > > >
> > > > This selective approach would reduce unnecessary transitions, lower the
> > > > risk of temporary patch loss, and mitigate performance issues during
> > > > livepatch replacement.
> > > >
> > > > --- a/kernel/livepatch/core.c
> > > > +++ b/kernel/livepatch/core.c
> > > > @@ -658,6 +658,8 @@ static int klp_add_nops(struct klp_patch *patch)
> > > >               klp_for_each_object(old_patch, old_obj) {
> > > >                       int err;
> > > >
> > > > +                     if (!old_patch->replaceable)
> > > > +                             continue;
> > >
> > > This is one example where things might get very complicated.
> >
> > Why does this example even exist in the first place?
> > If hybrid mode is enabled, this scenario could have been avoided entirely.
>

You have many questions, but it seems you haven’t taken the time to read even
a single line of this patchset. I’ll try to address them to save you some time.

> How exactly this scenario could be avoided, please?
>
> In the real life, livepatches are used to fix many bugs.
> And every bug is fixed by livepatching several functions.
>
> Fix1 livepatches: funcA
> Fix2 livepatches: funcA, funcB
> Fix3 livepatches: funcB
>
> How would you handle this with the hybrid model?

In your scenario, each Fix will replace the applied livepatches, so
they must be set to replaceable.
To clarify the hybrid model, I'll illustrate it with the following Fixes:

Fix1 livepatches: funcA
Fix2 livepatches: funcC
Fix3 livepatches: funcA, funcB
Fix4 livepatches: funcD
Fix5 livepatches: funcB

Each FixN represents a single /sys/kernel/livepatch/FixN.
By default, all Fixes are set to non-replaceable.

Step-by-step process:
1. Loading Fix1
   Loaded Fixes: Fix1
2. Loading Fix2
   Loaded Fixes: Fix1 Fix2
3. Loading Fix3
   Before loading, set Fix1 to replaceable and replace it with Fix3,
   which is a cumulative patch of Fix1 and Fix3.
   Loaded Fixes: Fix2 Fix3
4. Loading Fix4
   Loaded Fixes: Fix2 Fix3 Fix4
5. Loading Fix5
   Similar to Step 3, set Fix3 to replaceable and replace it with Fix5.
   Loaded Fixes: Fix2 Fix4 Fix5

This hybrid model ensures that irrelevant Fixes remain unaffected
during replacements.

>
> Which fix will be handled by livepatches installed in parallel?
> Which fix will be handled by atomic replace?
> How would you keep it consistent?
>
> How would it work when there are hundreds of fixes and thousands
> of livepatched functions?

The key point is that if a new Fix modifies a function already changed
by previous Fixes, all the affected old Fixes should be set to
replaceable, merged into the new Fix, and then the old Fixes should be
replaced with the new one.
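
A minimal sketch of that flow from userspace (assuming the
/sys/kernel/livepatch/<patch>/replaceable knob added by this patchset;
module names are hypothetical):

  # Fix3 touches funcA, which is already livepatched by Fix1, so Fix1
  # is folded into the cumulative Fix3 and marked replaceable first.
  echo 1 > /sys/kernel/livepatch/Fix1/replaceable

  # Fix3 is built with .replace = true; enabling it replaces only the
  # livepatches currently marked replaceable (Fix1 here), while Fix2
  # stays loaded and untouched.
  insmod livepatch-Fix3.ko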

>
> Where exactly is the advantage of the hybrid model?

That can be seen as a form of "deferred replacement"—in other words,
only replacing the old Fixes when absolutely necessary. This approach
can significantly reduce unnecessary work.

>
> > >
> > > The same function might be livepatched by more livepatches, see
> > > ops->func_stack. For example, let's have funcA and three livepatches:
> > > a
> > >   + lp1:
> > >         .replace = false,
> > >         .non-replace = true,
> > >         .func = {
> > >                         .old_name = "funcA",
> > >                         .new_func = lp1_funcA,
> > >                 }, { }
> > >
> > >   + lp2:
> > >         .replace = false,
> > >         .non-replace = false,
> > >         .func = {
> > >                         .old_name = "funcA",
> > >                         .new_func = lp2_funcA,
> > >                 },{
> > >                         .old_name = "funcB",
> > >                         .new_func = lp2_funcB,
> > >                 }, { }
> > >
> > >
> > >   + lp3:
> > >         .replace = true,
> > >         .non-replace = false,
> > >         .func = {
> > >                         .old_name = "funcB",
> > >                         .new_func = lp3_funcB,
> > >                 }, { }
> > >
> > >
> > > Now, apply lp1:
> > >
> > >       + funcA() gets redirected to lp1_funcA()
> > >
> > > Then, apply lp2
> > >
> > >       + funcA() gets redirected to lp2_funcA()
> > >
> > > Finally, apply lp3:
> > >
> > >       + The proposed code would add "nop()" for
> > >         funcA() because it exists in lp2 and does not exist in lp3.
> > >
> > >       + funcA() will get redirected to the original code
> > >         because of the nop() during transition
> > >
> > >       + nop() will get removed in klp_complete_transition() and
> > >         funcA() will get suddenly redirected to lp1_funcA()
> > >         because it will still be on ops->func_stack even
> > >         after the "nop" and lp2_funcA() gets removed.
> > >
> > >            => The entire system will start using another funcA
> > >               implementation at some random time
> > >
> > >            => this would violate the consistency model
> > >
> > >
> > > The proper solution might be tricky:
> > >
> > > 1. We would need to detect this situation and do _not_ add
> > >    the "nop" for lp3 and funcA().
> > >
> > > 2. We would need a more complicate code for handling the task states.
> > >
> > >    klp_update_patch_state() sets task->patch_state using
> > >    the global "klp_target_state". But in the above example,
> > >    when enabling lp3:
> > >
> > >     + funcA would need to get transitioned _backward_:
> > >          KLP_TRANSITION_PATCHED -> KLP_TRANSITION_UNPATCHED
> > >       , so that it goes on ops->func_stack:
> > >          lp2_funcA -> lp1->funcA
> > >
> > >    while:
> > >
> > >     + funcA would need to get transitioned forward:
> > >          KLP_TRANSITION_UNPATCHED -> KLP_TRANSITION_PATCHED
> > >       , so that it goes on ops->func_stack:
> > >          lp2_funcB -> lp3->funcB
> > >
> > >
> > > => the hybrid mode would complicate the life for both livepatch
> > >    creators/maintainers and kernel code developers/maintainers.
> > >
> >
> > I don’t believe they should be held responsible for poor
> > configurations. These settings could have been avoided from the start.
> > There are countless tunable knobs in the system—should we try to
> > accommodate every possible combination? No, that’s not how systems are
> > designed to operate. Instead, we should identify and follow best
> > practices to ensure optimal functionality.
>
> I am sorry but I do not understand the above paragraph at all.
>
> Livepatches modify functions.
> How is it related to system configuration or tunable knobs?

/sys/kernel/livepatch/FixN/replaceable is a tunable knob.

+static struct kobj_attribute replaceable_kobj_attr = __ATTR_RW(replaceable);

> What best practices are you talking, please?

As mentioned earlier.

--
Regards
Yafang

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-02-04 13:05         ` Petr Mladek
@ 2025-02-05  6:16           ` Yafang Shao
  2025-02-07 11:00             ` Petr Mladek
  0 siblings, 1 reply; 36+ messages in thread
From: Yafang Shao @ 2025-02-05  6:16 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Miroslav Benes, jpoimboe, jikos, joe.lawrence, live-patching,
	linux-kernel

On Tue, Feb 4, 2025 at 9:05 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2025-02-03 17:44:52, Yafang Shao wrote:
> > On Fri, Jan 31, 2025 at 9:18 PM Miroslav Benes <mbenes@suse.cz> wrote:
> > >
> > > > >
> > > > >   + What exactly is meant by frequent replacements (busy loop?, once a minute?)
> > > >
> > > > The script:
> > > >
> > > > #!/bin/bash
> > > > while true; do
> > > >         yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
> > > >         ./apply_livepatch_61.sh # it will sleep 5s
> > > >         yum erase -y kernel-livepatch-6.1.12-0.x86_64
> > > >         yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
> > > >         ./apply_livepatch_61.sh  # it will sleep 5s
> > > > done
> > >
> > > A live patch application is a slowpath. It is expected not to run
> > > frequently (in a relative sense).
> >
> > The frequency isn’t the main concern here; _scalability_ is the key issue.
> > Running livepatches once per day (a relatively low frequency) across all of our
> > production servers (hundreds of thousands) isn’t feasible. Instead, we need to
> > periodically run tests on a subset of test servers.
>
> I am confused. The original problem was a system crash when
> livepatching do_exit() function, see
> https://lore.kernel.org/r/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@mail.gmail.com

Why do you view this patchset as a solution to the original problem?

>
> The rcu watchdog warning was first mentioned in this patchset.
> Do you see rcu watchdog warning in production or just
> with this artificial test, please?

So, we shouldn’t run any artificial tests on the livepatch, correct?
What exactly is the issue with these test cases?

>
>
> > > If you stress it like this, it is quite
> > > expected that it will have an impact. Especially on a large busy system.
> >
> > It seems you agree that the current atomic-replace process lacks scalability.
> > When deploying a livepatch across a large fleet of servers, it’s impossible to
> > ensure that the servers are idle, as their workloads are constantly varying and
> > are not deterministic.
>
> Do you see the scalability problem in production, please?

Yes, the livepatch transition was stalled.


> And could you prove that it was caused by livepatching, please?

When the livepatch transition is stalled, running `kpatch list` will
display the stalled information.
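
For completeness, a rough way to inspect a stalled transition with the
standard livepatch interfaces (the patch name matches the logs further
below; the commands are only illustrative):

  # reports 1 while a transition is still in progress
  cat /sys/kernel/livepatch/livepatch_61_release6/transition

  # during a patching transition, tasks still reporting 0 here have not
  # been transitioned yet and are what keeps the transition alive
  grep -l '^0' /proc/*/patch_state 2>/dev/null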

>
>
> > The challenges are very different when managing 1K servers versus 1M servers.
> > Similarly, the issues differ significantly between patching a single
> > function and
> > patching 100 functions, especially when some of those functions are critical.
> > That’s what scalability is all about.
> >
> > Since we transitioned from the old livepatch mode to the new
> > atomic-replace mode,
>
> What do you mean with the old livepatch mode, please?

$ kpatch-build -R

>
> Did you allow to install more livepatches in parallel?

No.

> What was the motivation to switch to the atomic replace, please?

This is the default behavior of kpatch [1] after upgrading to a new version.

[1].  https://github.com/dynup/kpatch/tree/master

>
> > our SREs have consistently reported that one or more servers become
> > stalled during
> > the upgrade (replacement).
>
> What is SRE, please?

From Wikipedia: https://en.wikipedia.org/wiki/Site_reliability_engineering

> Could you please show some log from a production system?

When the SREs initially reported that the livepatch transition was
stalled, I simply advised them to try again. However, after
experiencing these crashes, I dug deeper into the livepatch code and
realized that scalability is a concern. As a result, periodically
replacing an old livepatch triggers RCU warnings on our production
servers.

[Wed Feb  5 10:56:10 2025] livepatch: enabling patch 'livepatch_61_release6'
[Wed Feb  5 10:56:10 2025] livepatch: 'livepatch_61_release6': starting patching transition
[Wed Feb  5 10:56:24 2025] rcu_tasks_wait_gp: rcu_tasks grace period 1126113 is 10078 jiffies old.
[Wed Feb  5 10:56:38 2025] rcu_tasks_wait_gp: rcu_tasks grace period 1126117 is 10199 jiffies old.
[Wed Feb  5 10:56:52 2025] rcu_tasks_wait_gp: rcu_tasks grace period 1126121 is 10047 jiffies old.
[Wed Feb  5 10:56:57 2025] livepatch: 'livepatch_61_release6': patching complete

PS: You might argue again about the frequency. If you believe this is
just a frequency issue, please suggest a suitable frequency.

>
>
> > > > > > Other potential risks may also arise
> > > > > >   due to inconsistencies or race conditions during transitions.
> > > > >
> > > > > What inconsistencies and race conditions you have in mind, please?
> > > >
> > > > I have explained it at
> > > > https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354
> > > >
> > > >  klp_ftrace_handler
> > > >       if (unlikely(func->transition)) {
> > > >           WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
> > > >   }
> > > >
> > > > Why is WARN_ON_ONCE() placed here? What issues have we encountered in the past
> > > > that led to the decision to add this warning?
> > >
> > > A safety measure for something which really should not happen.
> >
> > Unfortunately, this issue occurs during my stress tests.
>
> I am confused. Do you see the above WARN_ON_ONCE() during your
> stress test? Could you please provide a log?

Could you please read my reply carefully?
https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354

>
> > > > > The main advantage of the atomic replace is simplify the maintenance
> > > > > and debugging.
> > > >
> > > > Is it worth the high overhead on production servers?
> > >
> > > Yes, because the overhead once a live patch is applied is negligible.
> >
> > If you’re managing a large fleet of servers, this issue is far from negligible.
> >
> > >
> > > > Can you provide examples of companies that use atomic replacement at
> > > > scale in their production environments?
> > >
> > > At least SUSE uses it as a solution for its customers. No many problems
> > > have been reported since we started ~10 years ago.
> >
> > Perhaps we’re running different workloads.
> > Going back to the original purpose of livepatching: is it designed to address
> > security vulnerabilities, or to deploy new features?
>
> We (SUSE) use livepatches only for fixing CVEs and serious bugs.
>
>
> > If it’s the latter, then there’s definitely a lot of room for improvement.
>
> You might be right. I am just not sure whether the hybrid mode would
> be the right solution.
>
> If you have problems with the atomic replace then you might stop using
> it completely and just install more livepatches in parallel.

Why do we need to install livepatches in parallel if atomic replace is disabled?
We only need to install the additional new livepatch. Parallel
installation is only necessary at boot time.

>
>
> My view:
>
> More livepatches installed in parallel are more prone to

I’m confused as to why you consider this a parallel installation issue.

> inconsistencies. A good example is the thread about introducing
> stack order sysfs interface, see
> https://lore.kernel.org/all/AAD198C9-210E-4E31-8FD7-270C39A974A8@gmail.com/
>
> The atomic replace helps to keep the livepatched functions consistent.
>
> The hybrid model would allow to install more livepatches in parallel except
> that one livepatch could be replaced atomically. It would create even
> more scenarios than allowing all livepatches in parallel.
>
> What would be the rules, please?
>
> Which functionality will be livepatched by the atomic replace, please?
>
> Which functionality will be handled by the extra non-replaceable
> livepatches, please?
>
> How would you keep the livepatches consistent, please?
>
> How would you manage dependencies between livepatches, please?
>
> What is the advantage of the hybrid model over allowing
> all livepatches in parallel, please?

I can’t answer your questions if you insist on framing this as a
parallel installation issue.

--
Regards

Yafang

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-02-03 21:53         ` Song Liu
@ 2025-02-05 14:42           ` Yafang Shao
  2025-02-05 17:59             ` Song Liu
  0 siblings, 1 reply; 36+ messages in thread
From: Yafang Shao @ 2025-02-05 14:42 UTC (permalink / raw)
  To: Song Liu, bpf
  Cc: Miroslav Benes, Petr Mladek, jpoimboe, jikos, joe.lawrence,
	live-patching, linux-kernel

On Tue, Feb 4, 2025 at 5:53 AM Song Liu <song@kernel.org> wrote:
>
> On Mon, Feb 3, 2025 at 1:45 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> [...]
> >
> > If you’re managing a large fleet of servers, this issue is far from negligible.
> >
> > >
> > > > Can you provide examples of companies that use atomic replacement at
> > > > scale in their production environments?
> > >
> > > At least SUSE uses it as a solution for its customers. No many problems
> > > have been reported since we started ~10 years ago.
>
> We (Meta) always use atomic replacement for our live patches.
>
> >
> > Perhaps we’re running different workloads.
> > Going back to the original purpose of livepatching: is it designed to address
> > security vulnerabilities, or to deploy new features?
> > If it’s the latter, then there’s definitely a lot of room for improvement.
>
> We only use KLP to fix bugs and security vulnerabilities. We do not use
> live patches to deploy new features.

+BPF

Hello Song,

Since bpf_fexit also uses trampolines, I was curious about what would
happen if I attached do_exit() to fexit. Unfortunately, it triggers a
bug in BPF as well. The BPF program is as follows:

SEC("fexit/do_exit")
int BPF_PROG(fexit_do_exit)
{
    return 0;
}

After the fexit program exits, the trampoline is still left over:

$ bpftool link show    <<<< no output
$ grep "bpf_trampoline_[0-9]" /proc/kallsyms
ffffffffc04cb000 t bpf_trampoline_6442526459    [bpf]

We could either add functions annotated as "__noreturn" to the deny
list for fexit as follows, or we could explore a more generic
solution, such as embedding the "noreturn" information into the BTF
and extracting it when attaching fexit.

Any thoughts?

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d77abb87ffb1..37192888473c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -22742,6 +22742,13 @@ BTF_ID(func, __rcu_read_unlock)
 #endif
 BTF_SET_END(btf_id_deny)

+BTF_SET_START(fexit_deny)
+BTF_ID_UNUSED
+/* do_exit never returns */
+/* TODO: Add other functions annotated with __noreturn or figure out a generic solution */
+BTF_ID(func, do_exit)
+BTF_SET_END(fexit_deny)
+
 static bool can_be_sleepable(struct bpf_prog *prog)
 {
        if (prog->type == BPF_PROG_TYPE_TRACING) {
@@ -22830,6 +22837,9 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
        } else if (prog->type == BPF_PROG_TYPE_TRACING &&
                   btf_id_set_contains(&btf_id_deny, btf_id)) {
                return -EINVAL;
+       } else if (prog->expected_attach_type == BPF_TRACE_FEXIT &&
+                  btf_id_set_contains(&fexit_deny, btf_id)) {
+               return -EINVAL;
        }

        key = bpf_trampoline_compute_key(tgt_prog, prog->aux->attach_btf, btf_id);


--
Regards
Yafang

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
  2025-02-05  2:54         ` Yafang Shao
@ 2025-02-05 16:03           ` Petr Mladek
  2025-02-06  2:35             ` Yafang Shao
  0 siblings, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2025-02-05 16:03 UTC (permalink / raw)
  To: Yafang Shao
  Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching,
	linux-kernel

On Wed 2025-02-05 10:54:47, Yafang Shao wrote:
> On Tue, Feb 4, 2025 at 9:21 PM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Mon 2025-01-27 23:34:50, Yafang Shao wrote:
> > > On Mon, Jan 27, 2025 at 10:31 PM Petr Mladek <pmladek@suse.com> wrote:
> > > >
> > > > On Mon 2025-01-27 14:35:26, Yafang Shao wrote:
> > > > > The atomic replace livepatch mechanism was introduced to handle scenarios
> > > > > where we want to unload a specific livepatch without unloading others.
> > > > > However, its current implementation has significant shortcomings, making
> > > > > it less than ideal in practice. Below are the key downsides:
> > > >
> > > > [...]
> > > >
> > > > > In the hybrid mode:
> > > > >
> > > > > - Specific livepatches can be marked as "non-replaceable" to ensure they
> > > > >   remain active and unaffected during replacements.
> > > > >
> > > > > - Other livepatches can be marked as "replaceable," allowing targeted
> > > > >   replacements of only those patches.
> > > > >
> > > > > This selective approach would reduce unnecessary transitions, lower the
> > > > > risk of temporary patch loss, and mitigate performance issues during
> > > > > livepatch replacement.
> > > > >
> > > > > --- a/kernel/livepatch/core.c
> > > > > +++ b/kernel/livepatch/core.c
> > > > > @@ -658,6 +658,8 @@ static int klp_add_nops(struct klp_patch *patch)
> > > > >               klp_for_each_object(old_patch, old_obj) {
> > > > >                       int err;
> > > > >
> > > > > +                     if (!old_patch->replaceable)
> > > > > +                             continue;
> > > >
> > > > This is one example where things might get very complicated.
> > >
> > > Why does this example even exist in the first place?
> > > If hybrid mode is enabled, this scenario could have been avoided entirely.
> >
> 
> You have many questions, but it seems you haven’t taken the time to read even
> a single line of this patchset. I’ll try to address them to save you some time.

What?

> > How exactly this scenario could be avoided, please?
> >
> > In the real life, livepatches are used to fix many bugs.
> > And every bug is fixed by livepatching several functions.
> >
> > Fix1 livepatches: funcA
> > Fix2 livepatches: funcA, funcB
> > Fix3 livepatches: funcB
> >
> > How would you handle this with the hybrid model?
> 
> In your scenario, each Fix will replace the applied livepatches, so
> they must be set to replaceable.
> To clarify the hybrid model, I'll illustrate it with the following Fixes:
> 
> Fix1 livepatches: funcA
> Fix2  livepatches: funcC
> Fix3 livepatches: funcA, funcB

How does the livepatched funcA look here?
Does it contain changes only from Fix3?
Or is it a cumulative livepatched funcA which includes both Fix1 + Fix3?

> Fix4  livepatches: funcD
> Fix5 livepatches: funcB
> 
> Each FixN represents a single /sys/kernel/livepatch/FixN.
> By default, all Fixes are set to non-replaceable.
> 
> Step-by-step process:
> 1. Loading Fix1
>    Loaded Fixes: Fix1
> 2. Loading Fix2
>    Loaded Fixes: Fix1 Fix2
> 3. Loading Fix3
>     Before loading, set Fix1 to replaceable and replace it with Fix3,
> which is a cumulative patch of Fix1 and Fix3.

Who will modify the "replaceable" flag of "Fix1"? Userspace or kernel?

How would the livepatch modules get installed to / removed from the
filesystem?

We (SUSE) distribute the livepatches using RPM packages. Every new
version of the livepatch module is packaged in an RPM package with
the same name and a higher version. A post-install script loads
the module into the kernel and removes disabled livepatch modules.

The package update causes the new version of the livepatch module
to be enabled and the old version to be removed. The old module is
also removed from the filesystem together with the old version of
the RPM package.

This matches the behavior of the atomic replace. There is always only
one version of the livepatch RPM package installed and only one
livepatch module loaded/enabled. And when it works correctly, the
version of the installed package matches the version of the loaded
livepatch module.

This might be hard to achieve with the hybrid model. Every livepatch
module would need to be packaged in a separate (different name)
RPM package. And some userspace service would need to clean up both
kernel modules and livepatch RPM packages (remove the unused ones).

This might add a lot of extra complexity.

>     Loaded Fixes:  Fix2 Fix3
> 4. Loading Fix4
>     Loaded Fixes:  Fix2 Fix3 Fix4
> 5. Loading Fix5
>     Similar to Step 3, set Fix3 to replaceable and replace it with Fix5.
>     Loaded Fixes:  Fix2 Fix4 Fix5

Let's imagine another situation:

Fix1 livepatches: funcA, funcB
Fix2  livepatches: funcB, funcC
Fix3 livepatches: funcA, funcC

Variant A:

 1. Loading Fix1
    Loaded Fixes: Fix1
    In use: funcA_fix1, funcB_fix1

 2. Loading Fix2
    Loaded Fixes: Fix1 Fix2
    In use: funcA_fix1, funcB_fix2, funcC_fix2

 3. Loading Fix3
    Loaded Fixes: Fix2 Fix3
    In use: funcA_fix3, funcB_fix2, funcC_fix3

    This will be correct only when:

	+ funcA_fix3 contains both changes from Fix1 and Fix3
	+ funcC_fix3 contains both changes from Fix2 and Fix3


Variant B:

 1. Loading Fix1
    Loaded Fixes: Fix1
    In use: funcA_fix1, funcB_fix1

 2. Loading Fix2 (fails for some reason or is skipped)
    Loaded Fixes: Fix1
    In use: funcA_fix1, funcB_fix1

 3. Loading Fix3
    Loaded Fixes: Fix1 Fix3
    In use: funcA_fix3, funcB_fix1, funcC_fix3

    This will be correct only when:

	+ funcA_fix3 contains both changes from Fix1 and Fix3
	    and the still-active funcB_fix1 is compatible with funcA_fix3
	+ funcC_fix3 contains changes only from Fix3,
	    and it must be compatible with funcB without the Fix2 changes,
	    because Fix2 was never applied

I want to say that this would work only when all livepatches
are loaded in the right order. Otherwise, it might break
the system.

How do you want to ensure this?

Is it really that easy?

> 3. Loading Fix3
>     Before loading, set Fix1 to replaceable and replace it with Fix3,

> This hybrid model ensures that irrelevant Fixes remain unaffected
> during replacements.

It makes some sense now. But IMHO, it is not as easy as it looks.
The complexity is in the details.

> >
> > Which fix will be handled by livepatches installed in parallel?
> > Which fix will be handled by atomic replace?
> > How would you keep it consistent?
> >
> > How would it work when there are hundreds of fixes and thousands
> > of livepatched functions?
> 
> The key point is that if a new Fix modifies a function already changed
> by previous Fixes, all the affected old Fixes should be set to
> replaceable, merged into the new Fix, and then the old Fixes should be
> replaced with the new one.

As I tried to explain above, this might be hard to use in practice.

We would either need to enforce loading all livepatches in the right
order, which might be hard to make user friendly.

Or it would need a lot of extra code to ensure that only
compatible livepatches can be loaded.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-02-05 14:42           ` Yafang Shao
@ 2025-02-05 17:59             ` Song Liu
  2025-02-06  2:54               ` Yafang Shao
  0 siblings, 1 reply; 36+ messages in thread
From: Song Liu @ 2025-02-05 17:59 UTC (permalink / raw)
  To: Yafang Shao
  Cc: bpf, Miroslav Benes, Petr Mladek, jpoimboe, jikos, joe.lawrence,
	live-patching, linux-kernel

On Wed, Feb 5, 2025 at 6:43 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Tue, Feb 4, 2025 at 5:53 AM Song Liu <song@kernel.org> wrote:
> >
> > On Mon, Feb 3, 2025 at 1:45 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > [...]
> > >
> > > If you’re managing a large fleet of servers, this issue is far from negligible.
> > >
> > > >
> > > > > Can you provide examples of companies that use atomic replacement at
> > > > > scale in their production environments?
> > > >
> > > > At least SUSE uses it as a solution for its customers. No many problems
> > > > have been reported since we started ~10 years ago.
> >
> > We (Meta) always use atomic replacement for our live patches.
> >
> > >
> > > Perhaps we’re running different workloads.
> > > Going back to the original purpose of livepatching: is it designed to address
> > > security vulnerabilities, or to deploy new features?
> > > If it’s the latter, then there’s definitely a lot of room for improvement.
> >
> > We only use KLP to fix bugs and security vulnerabilities. We do not use
> > live patches to deploy new features.
>
> +BPF
>
> Hello Song,
>
> Since bpf_fexit also uses trampolines, I was curious about what would
> happen if I attached do_exit() to fexit. Unfortunately, it triggers a
> bug in BPF as well. The BPF program is as follows:
>
> SEC("fexit/do_exit")
> int fexit_do_exit
> {
>     return 0;
> }
>
> After the fexit program exits, the trampoline is still left over:
>
> $ bpftool  link show  <<<< nothing output
> $ grep "bpf_trampoline_[0-9]" /proc/kallsyms
> ffffffffc04cb000 t bpf_trampoline_6442526459    [bpf]

I think we should first understand why the trampoline is not
freed.

> We could either add functions annotated as "__noreturn" to the deny
> list for fexit as follows, or we could explore a more generic
> solution, such as embedding the "noreturn" information into the BTF
> and extracting it when attaching fexit.

I personally don't think this is really necessary. It is good to have.
But a reasonable user should not expect a noreturn function to
generate fexit events.

Thanks,
Song

> Any thoughts?
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d77abb87ffb1..37192888473c 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -22742,6 +22742,13 @@ BTF_ID(func, __rcu_read_unlock)
>  #endif
>  BTF_SET_END(btf_id_deny)
>
> +BTF_SET_START(fexit_deny)
> +BTF_ID_UNUSED
> +/* do_exit never returns */
> +/* TODO: Add other functions annotated with __noreturn or figure out
> a generic solution */
> +BTF_ID(func, do_exit)
> +BTF_SET_END(fexit_deny)
> +
>  static bool can_be_sleepable(struct bpf_prog *prog)
>  {
>         if (prog->type == BPF_PROG_TYPE_TRACING) {
> @@ -22830,6 +22837,9 @@ static int check_attach_btf_id(struct
> bpf_verifier_env *env)
>         } else if (prog->type == BPF_PROG_TYPE_TRACING &&
>                    btf_id_set_contains(&btf_id_deny, btf_id)) {
>                 return -EINVAL;
> +       } else if (prog->expected_attach_type == BPF_TRACE_FEXIT &&
> +                  btf_id_set_contains(&fexit_deny, btf_id)) {
> +               return -EINVAL;
>         }
>
>         key = bpf_trampoline_compute_key(tgt_prog,
> prog->aux->attach_btf, btf_id);

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
  2025-02-05 16:03           ` Petr Mladek
@ 2025-02-06  2:35             ` Yafang Shao
  2025-02-07 13:58               ` Petr Mladek
  0 siblings, 1 reply; 36+ messages in thread
From: Yafang Shao @ 2025-02-06  2:35 UTC (permalink / raw)
  To: Petr Mladek
  Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching,
	linux-kernel

On Thu, Feb 6, 2025 at 12:03 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Wed 2025-02-05 10:54:47, Yafang Shao wrote:
> > On Tue, Feb 4, 2025 at 9:21 PM Petr Mladek <pmladek@suse.com> wrote:
> > >
> > > On Mon 2025-01-27 23:34:50, Yafang Shao wrote:
> > > > On Mon, Jan 27, 2025 at 10:31 PM Petr Mladek <pmladek@suse.com> wrote:
> > > > >
> > > > > On Mon 2025-01-27 14:35:26, Yafang Shao wrote:
> > > > > > The atomic replace livepatch mechanism was introduced to handle scenarios
> > > > > > where we want to unload a specific livepatch without unloading others.
> > > > > > However, its current implementation has significant shortcomings, making
> > > > > > it less than ideal in practice. Below are the key downsides:
> > > > >
> > > > > [...]
> > > > >
> > > > > > In the hybrid mode:
> > > > > >
> > > > > > - Specific livepatches can be marked as "non-replaceable" to ensure they
> > > > > >   remain active and unaffected during replacements.
> > > > > >
> > > > > > - Other livepatches can be marked as "replaceable," allowing targeted
> > > > > >   replacements of only those patches.
> > > > > >
> > > > > > This selective approach would reduce unnecessary transitions, lower the
> > > > > > risk of temporary patch loss, and mitigate performance issues during
> > > > > > livepatch replacement.
> > > > > >
> > > > > > --- a/kernel/livepatch/core.c
> > > > > > +++ b/kernel/livepatch/core.c
> > > > > > @@ -658,6 +658,8 @@ static int klp_add_nops(struct klp_patch *patch)
> > > > > >               klp_for_each_object(old_patch, old_obj) {
> > > > > >                       int err;
> > > > > >
> > > > > > +                     if (!old_patch->replaceable)
> > > > > > +                             continue;
> > > > >
> > > > > This is one example where things might get very complicated.
> > > >
> > > > Why does this example even exist in the first place?
> > > > If hybrid mode is enabled, this scenario could have been avoided entirely.
> > >
> >
> > You have many questions, but it seems you haven’t taken the time to read even
> > a single line of this patchset. I’ll try to address them to save you some time.
>
> What?

Apologies if my words offended you.

>
> > > How exactly this scenario could be avoided, please?
> > >
> > > In the real life, livepatches are used to fix many bugs.
> > > And every bug is fixed by livepatching several functions.
> > >
> > > Fix1 livepatches: funcA
> > > Fix2 livepatches: funcA, funcB
> > > Fix3 livepatches: funcB
> > >
> > > How would you handle this with the hybrid model?
> >
> > In your scenario, each Fix will replace the applied livepatches, so
> > they must be set to replaceable.
> > To clarify the hybrid model, I'll illustrate it with the following Fixes:
> >
> > Fix1 livepatches: funcA
> > Fix2  livepatches: funcC
> > Fix3 livepatches: funcA, funcB
>
> How does look the livepatched FuncA here?
> Does it contain changes only for Fix3?
> Or is it cummulative livepatches_funA includes both Fix1 + Fix3?

It is cumulative: the livepatched funcA includes both Fix1 and Fix3.

>
> > Fix4  livepatches: funcD
> > Fix5 livepatches: funcB
> >
> > Each FixN represents a single /sys/kernel/livepatch/FixN.
> > By default, all Fixes are set to non-replaceable.
> >
> > Step-by-step process:
> > 1. Loading Fix1
> >    Loaded Fixes: Fix1
> > 2. Loading Fix2
> >    Loaded Fixes: Fix1 Fix2
> > 3. Loading Fix3
> >     Before loading, set Fix1 to replaceable and replace it with Fix3,
> > which is a cumulative patch of Fix1 and Fix3.
>
> Who will modify the "replaceable" flag of "Fix1"? Userspace or kernel?

Userspace scripts. They modify it before loading a new one.

>
> How the livepatch modules would get installed/removed to/from the
> filesystem?

It doesn't make any difference.

>
> We (SUSE) distribute the livepatches using RPM packages. Every new
> version of the livepatch module is packaged in a RPM package with
> the same name and higher version. A post install script loads
> the module into the kernel and removes disabled livepatch modules.

Similar to us.

>
> The package update causes that the new version of the livepatch module
> is enabled and the old version is removed. And also the old version
> of the module and is removed from the filesystem together with the old
> version of the RPM package.
>
> This matches the behavior of the atomic replace. There is always only
> one version of the livepatch RPM package installed and only one
> livepatch module loaded/enabled. And when it works correcly then
> the version of the installed package matches the version of the loaded
> livepatch module.
>
> This might be hard to achieve with the hybrid model. Every livepatch
> module would need to be packaged in a separate (different name)
> RPM package.

Before switching to atomic replace, we packaged all the livepatch
modules into a single RPM. The RPM installation handled them quite
well.

> And some userspace service would need to clean up both
> kernel modules and livepatch RPM packages (remove the unused ones).
>
> This might add a lot of extra complexity.

It's not that complex—just like what we did before atomic replacement
was enabled.

>
> >     Loaded Fixes:  Fix2 Fix3
> > 4. Loading Fix4
> >     Loaded Fixes:  Fix2 Fix3 Fix4
> > 5. Loading Fix5
> >     Similar to Step 3, set Fix3 to replaceable and replace it with Fix5.
> >     Loaded Fixes:  Fix2 Fix4 Fix5
>
> Let's imagine another situation:
>
> Fix1 livepatches: funcA, funcB
> Fix2  livepatches: funcB, funcC
> Fix3 livepatches: funcA, funcC
>
> Variant A:
>
>  1. Loading Fix1
>     Loaded Fixes: Fix1
>     In use:: funcA_fix1, funcB_fix1
>
>  2. Loading Fix2
>     Loaded Fixes: Fix1 Fix2
>     In use: funcA_fix1, funcB_fix2, funcC_fix2
>
>  3. Loading Fix3
>     Loaded Fixes: Fix2 Fix3
>     In use: funcA_fix3, funcB_fix2, funcC_fix3
>
>     This will be correct only when:
>
>         + funcA_fix3 contains both changes from Fix1 and Fix3
>         + funcC_fix3 contains both changes from Fix2 and Fix3
>
>
> Variant B:
>
>  1. Loading Fix1
>     Loaded Fixes: Fix1
>     In use:: funcA_fix1, funcB_fix1
>
>  2. Loading Fix2 (fails from some reason or is skipped)
>     Loaded Fixes: Fix1
>     In use:: funcA_fix1, funcB_fix1
>
>  3. Loading Fix3
>     Loaded Fixes: Fix1 Fix2
>     In use: funcA_fix3, funcB_fix1, funcC_fix3
>
>     This will be correct only when:
>
>         + funcA_fix3 contains both changes from Fix1 and Fix3
>             and stays funcB_fix1 is compatible with funcA_fix3
>         + funcC_fix3 contains changes only from Fix3,
>             it must be compatible with the original funcB because
>
> I want to say that this would work only when all livepatches
> are loaded in the right order. Otherwise, it might break
> the system.
>
> How do you want to ensure this?

As we discussed earlier, if there’s an overlap between two
livepatches, we merge them into a cumulative patch and replace the old
one.

>
> Is it really that easy?
>
> > 3. Loading Fix3
> >     Before loading, set Fix1 to replaceable and replace it with Fix3,
>
> > This hybrid model ensures that irrelevant Fixes remain unaffected
> > during replacements.
>
> It makes some some now. But IMHO, it is not as easy as it looks.
> The complexity is in details.
>
> > >
> > > Which fix will be handled by livepatches installed in parallel?
> > > Which fix will be handled by atomic replace?
> > > How would you keep it consistent?
> > >
> > > How would it work when there are hundreds of fixes and thousands
> > > of livepatched functions?
> >
> > The key point is that if a new Fix modifies a function already changed
> > by previous Fixes, all the affected old Fixes should be set to
> > replaceable, merged into the new Fix, and then the old Fixes should be
> > replaced with the new one.
>
> As I tried to explain above. This might be hard to use in practice.
>
> We would either need to enforce loading all livepatches in the right
> order. It might be hard to make it user friendly.
>
> Or it would need a lot of extra code which would ensure that only
> compatible livepatches can be loaded.

That’s related to the configuration. If you decide to use it, you
should familiarize yourself with the best practices. Once you
understand those, it becomes much simpler and less complex.

BTW, based on my experience, the likelihood of overlap between two
livepatches is very low in real production environments. I haven't
encountered that case so far on our production servers.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-02-05 17:59             ` Song Liu
@ 2025-02-06  2:54               ` Yafang Shao
  2025-02-06 18:00                 ` Song Liu
  0 siblings, 1 reply; 36+ messages in thread
From: Yafang Shao @ 2025-02-06  2:54 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, Miroslav Benes, Petr Mladek, jpoimboe, jikos, joe.lawrence,
	live-patching, linux-kernel

On Thu, Feb 6, 2025 at 1:59 AM Song Liu <song@kernel.org> wrote:
>
> On Wed, Feb 5, 2025 at 6:43 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Tue, Feb 4, 2025 at 5:53 AM Song Liu <song@kernel.org> wrote:
> > >
> > > On Mon, Feb 3, 2025 at 1:45 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > [...]
> > > >
> > > > If you’re managing a large fleet of servers, this issue is far from negligible.
> > > >
> > > > >
> > > > > > Can you provide examples of companies that use atomic replacement at
> > > > > > scale in their production environments?
> > > > >
> > > > > At least SUSE uses it as a solution for its customers. No many problems
> > > > > have been reported since we started ~10 years ago.
> > >
> > > We (Meta) always use atomic replacement for our live patches.
> > >
> > > >
> > > > Perhaps we’re running different workloads.
> > > > Going back to the original purpose of livepatching: is it designed to address
> > > > security vulnerabilities, or to deploy new features?
> > > > If it’s the latter, then there’s definitely a lot of room for improvement.
> > >
> > > We only use KLP to fix bugs and security vulnerabilities. We do not use
> > > live patches to deploy new features.
> >
> > +BPF
> >
> > Hello Song,
> >
> > Since bpf_fexit also uses trampolines, I was curious about what would
> > happen if I attached do_exit() to fexit. Unfortunately, it triggers a
> > bug in BPF as well. The BPF program is as follows:
> >
> > SEC("fexit/do_exit")
> > int fexit_do_exit
> > {
> >     return 0;
> > }
> >
> > After the fexit program exits, the trampoline is still left over:
> >
> > $ bpftool  link show  <<<< nothing output
> > $ grep "bpf_trampoline_[0-9]" /proc/kallsyms
> > ffffffffc04cb000 t bpf_trampoline_6442526459    [bpf]
>
> I think we should first understand why the trampoline is not
> freed.

IIUC, fexit works as follows:

  bpf_trampoline
    + __bpf_tramp_enter
       + percpu_ref_get(&tr->pcref);

    + call do_exit()

    + __bpf_tramp_exit
       + percpu_ref_put(&tr->pcref);

Since do_exit() never returns, the refcnt of the trampoline image is
never decremented, preventing it from being freed.

>
> > We could either add functions annotated as "__noreturn" to the deny
> > list for fexit as follows, or we could explore a more generic
> > solution, such as embedding the "noreturn" information into the BTF
> > and extracting it when attaching fexit.
>
> I personally don't think this is really necessary. It is good to have.
> But a reasonable user should not expect noreturn function to
> generate fexit events.

If we don't plan to fix it, we should clearly document it to guide
users and advise them against using it.

--
Regards
Yafang

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-02-06  2:54               ` Yafang Shao
@ 2025-02-06 18:00                 ` Song Liu
  2025-02-08  6:41                   ` Yafang Shao
  0 siblings, 1 reply; 36+ messages in thread
From: Song Liu @ 2025-02-06 18:00 UTC (permalink / raw)
  To: Yafang Shao
  Cc: bpf, Miroslav Benes, Petr Mladek, jpoimboe, jikos, joe.lawrence,
	live-patching, linux-kernel

On Wed, Feb 5, 2025 at 6:55 PM Yafang Shao <laoar.shao@gmail.com> wrote:
[...]
> > I think we should first understand why the trampoline is not
> > freed.
>
> IIUC, the fexit works as follows,
>
>   bpf_trampoline
>     + __bpf_tramp_enter
>        + percpu_ref_get(&tr->pcref);
>
>     + call do_exit()
>
>     + __bpf_tramp_exit
>        + percpu_ref_put(&tr->pcref);
>
> Since do_exit() never returns, the refcnt of the trampoline image is
> never decremented, preventing it from being freed.

Thanks for the explanation. In this case, I think it makes sense to
disallow attaching fexit programs to __noreturn functions. I am not
sure what the best solution for it is, though.

Thanks,
Song


> >
> > > We could either add functions annotated as "__noreturn" to the deny
> > > list for fexit as follows, or we could explore a more generic
> > > solution, such as embedding the "noreturn" information into the BTF
> > > and extracting it when attaching fexit.
> >
> > I personally don't think this is really necessary. It is good to have.
> > But a reasonable user should not expect noreturn function to
> > generate fexit events.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
  2025-01-27  6:35 ` [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode Yafang Shao
  2025-01-27 14:31   ` Petr Mladek
@ 2025-02-07  2:31   ` Josh Poimboeuf
  2025-02-07  3:16     ` Yafang Shao
  1 sibling, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2025-02-07  2:31 UTC (permalink / raw)
  To: Yafang Shao
  Cc: jikos, mbenes, pmladek, joe.lawrence, live-patching, linux-kernel

On Mon, Jan 27, 2025 at 02:35:26PM +0800, Yafang Shao wrote:
> The atomic replace livepatch mechanism was introduced to handle scenarios
> where we want to unload a specific livepatch without unloading others.
> However, its current implementation has significant shortcomings, making
> it less than ideal in practice. Below are the key downsides:
> 
> - It is expensive
> 
>   During testing with frequent replacements of an old livepatch, random RCU
>   warnings were observed:
> 
>   [19578271.779605] rcu_tasks_wait_gp: rcu_tasks grace period 642409 is 10024 jiffies old.
>   [19578390.073790] rcu_tasks_wait_gp: rcu_tasks grace period 642417 is 10185 jiffies old.
>   [19578423.034065] rcu_tasks_wait_gp: rcu_tasks grace period 642421 is 10150 jiffies old.
>   [19578564.144591] rcu_tasks_wait_gp: rcu_tasks grace period 642449 is 10174 jiffies old.
>   [19578601.064614] rcu_tasks_wait_gp: rcu_tasks grace period 642453 is 10168 jiffies old.
>   [19578663.920123] rcu_tasks_wait_gp: rcu_tasks grace period 642469 is 10167 jiffies old.
>   [19578872.990496] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 10215 jiffies old.
>   [19578903.190292] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 40415 jiffies old.
>   [19579017.965500] rcu_tasks_wait_gp: rcu_tasks grace period 642577 is 10174 jiffies old.
>   [19579033.981425] rcu_tasks_wait_gp: rcu_tasks grace period 642581 is 10143 jiffies old.
>   [19579153.092599] rcu_tasks_wait_gp: rcu_tasks grace period 642625 is 10188 jiffies old.
>   
>   This indicates that atomic replacement can cause performance issues,
>   particularly with RCU synchronization under frequent use.

Why does this happen?

> - Potential Risks During Replacement 
> 
>   One known issue involves replacing livepatched versions of critical
>   functions such as do_exit(). During the replacement process, a panic
>   might occur, as highlighted in [0]. Other potential risks may also arise
>   due to inconsistencies or race conditions during transitions.

That needs to be fixed.

> - Temporary Loss of Patching 
> 
>   During the replacement process, the old patch is set to a NOP (no-operation)
>   before the new patch is fully applied. This creates a window where the
>   function temporarily reverts to its original, unpatched state. If the old
>   patch fixed a critical issue (e.g., one that prevented a system panic), the
>   system could become vulnerable to that issue during the transition.

Are you saying that atomic replace is not atomic?  If so, this sounds
like another bug.

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
  2025-02-07  2:31   ` Josh Poimboeuf
@ 2025-02-07  3:16     ` Yafang Shao
  2025-02-07  9:36       ` Petr Mladek
  2025-02-07 16:59       ` Josh Poimboeuf
  0 siblings, 2 replies; 36+ messages in thread
From: Yafang Shao @ 2025-02-07  3:16 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: jikos, mbenes, pmladek, joe.lawrence, live-patching, linux-kernel

On Fri, Feb 7, 2025 at 10:31 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Mon, Jan 27, 2025 at 02:35:26PM +0800, Yafang Shao wrote:
> > The atomic replace livepatch mechanism was introduced to handle scenarios
> > where we want to unload a specific livepatch without unloading others.
> > However, its current implementation has significant shortcomings, making
> > it less than ideal in practice. Below are the key downsides:
> >
> > - It is expensive
> >
> >   During testing with frequent replacements of an old livepatch, random RCU
> >   warnings were observed:
> >
> >   [19578271.779605] rcu_tasks_wait_gp: rcu_tasks grace period 642409 is 10024 jiffies old.
> >   [19578390.073790] rcu_tasks_wait_gp: rcu_tasks grace period 642417 is 10185 jiffies old.
> >   [19578423.034065] rcu_tasks_wait_gp: rcu_tasks grace period 642421 is 10150 jiffies old.
> >   [19578564.144591] rcu_tasks_wait_gp: rcu_tasks grace period 642449 is 10174 jiffies old.
> >   [19578601.064614] rcu_tasks_wait_gp: rcu_tasks grace period 642453 is 10168 jiffies old.
> >   [19578663.920123] rcu_tasks_wait_gp: rcu_tasks grace period 642469 is 10167 jiffies old.
> >   [19578872.990496] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 10215 jiffies old.
> >   [19578903.190292] rcu_tasks_wait_gp: rcu_tasks grace period 642529 is 40415 jiffies old.
> >   [19579017.965500] rcu_tasks_wait_gp: rcu_tasks grace period 642577 is 10174 jiffies old.
> >   [19579033.981425] rcu_tasks_wait_gp: rcu_tasks grace period 642581 is 10143 jiffies old.
> >   [19579153.092599] rcu_tasks_wait_gp: rcu_tasks grace period 642625 is 10188 jiffies old.
> >
> >   This indicates that atomic replacement can cause performance issues,
> >   particularly with RCU synchronization under frequent use.
>
> Why does this happen?

It occurs during the KLP transition. It seems like the KLP transition
is taking too long.

[20329703.332453] livepatch: enabling patch 'livepatch_61_release6'
[20329703.340417] livepatch: 'livepatch_61_release6': starting patching transition
[20329715.314215] rcu_tasks_wait_gp: rcu_tasks grace period 1109765 is 10166 jiffies old.
[20329737.126207] rcu_tasks_wait_gp: rcu_tasks grace period 1109769 is 10219 jiffies old.
[20329752.018236] rcu_tasks_wait_gp: rcu_tasks grace period 1109773 is 10199 jiffies old.
[20329754.848036] livepatch: 'livepatch_61_release6': patching complete

>
> > - Potential Risks During Replacement
> >
> >   One known issue involves replacing livepatched versions of critical
> >   functions such as do_exit(). During the replacement process, a panic
> >   might occur, as highlighted in [0]. Other potential risks may also arise
> >   due to inconsistencies or race conditions during transitions.
>
> That needs to be fixed.
>
> > - Temporary Loss of Patching
> >
> >   During the replacement process, the old patch is set to a NOP (no-operation)
> >   before the new patch is fully applied. This creates a window where the
> >   function temporarily reverts to its original, unpatched state. If the old
> >   patch fixed a critical issue (e.g., one that prevented a system panic), the
> >   system could become vulnerable to that issue during the transition.
>
> Are you saying that atomic replace is not atomic?  If so, this sounds
> like another bug.

From my understanding, there’s a window where the original function is
not patched.

klp_enable_patch
+ klp_init_patch
   + if (patch->replace)
          klp_add_nops(patch);  <<<< set all old patches to nop

+ __klp_enable_patch
   + klp_patch_object
      + klp_patch_func
         + ops = klp_find_ops(func->old_func);
            + if (ops)
                   // add the new patch to the func_stack list
                   list_add_rcu(&func->stack_node, &ops->func_stack);


klp_ftrace_handler
+ func = list_first_or_null_rcu(&ops->func_stack, struct klp_func, stack_node);
+ if (func->nop)
       goto unlock;
+ ftrace_regs_set_instruction_pointer(fregs, (unsigned long)func->new_func);

Before the new atomic replace patch is added to the func_stack list,
the old patch is already set to nop. If klp_ftrace_handler() is
triggered at this point, it will effectively do nothing—in other
words, it will execute the original function.
I might be wrong.

--
Regards
Yafang

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
  2025-02-07  3:16     ` Yafang Shao
@ 2025-02-07  9:36       ` Petr Mladek
  2025-02-08  2:14         ` Yafang Shao
  2025-02-07 16:59       ` Josh Poimboeuf
  1 sibling, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2025-02-07  9:36 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Josh Poimboeuf, jikos, mbenes, joe.lawrence, live-patching,
	linux-kernel

On Fri 2025-02-07 11:16:45, Yafang Shao wrote:
> On Fri, Feb 7, 2025 at 10:31 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > On Mon, Jan 27, 2025 at 02:35:26PM +0800, Yafang Shao wrote:
> > > - Temporary Loss of Patching
> > >
> > >   During the replacement process, the old patch is set to a NOP (no-operation)
> > >   before the new patch is fully applied. This creates a window where the
> > >   function temporarily reverts to its original, unpatched state. If the old
> > >   patch fixed a critical issue (e.g., one that prevented a system panic), the
> > >   system could become vulnerable to that issue during the transition.
> >
> > Are you saying that atomic replace is not atomic?  If so, this sounds
> > like another bug.
> 
> From my understanding, there’s a window where the original function is
> not patched.

This is a misunderstanding.

> klp_enable_patch
> + klp_init_patch
>    + if (patch->replace)
>           klp_add_nops(patch);  <<<< set all old patches to nop

1. The "nop" entry is added into the _new_ (to-be-enabled) livepatch,
   see klp_add_nops(patch). The parameter is the _newly_ enabled patch.

2. The "nop" entries are added only for functions which are currently
   livepatched but are no longer livepatched in the new
   livepatch, see:

static int klp_add_object_nops(struct klp_patch *patch,
			       struct klp_object *old_obj)
{
[...]
	klp_for_each_func(old_obj, old_func) {
		func = klp_find_func(obj, old_func);
		if (func)
			continue;	<------ Do not allocate nop
						when the fuction is
						implemeted in the new
						livepatch.

		func = klp_alloc_func_nop(old_func, obj);
		if (!func)
			return -ENOMEM;
	}

	return 0;
}


> + __klp_enable_patch
>    + klp_patch_object
>       + klp_patch_func
>          + ops = klp_find_ops(func->old_func);
>             + if (ops)
>                    // add the new patch to the func_stack list
>                    list_add_rcu(&func->stack_node, &ops->func_stack);
> 
> 
> klp_ftrace_handler
> + func = list_first_or_null_rcu(&ops->func_stack, struct klp_func

3. You omitted this important part of the code:

	if (unlikely(func->transition)) {
		patch_state = current->patch_state;
		if (patch_state == KLP_TRANSITION_UNPATCHED) {
			/*
---->			 * Use the previously patched version of the function.  
---->			 * If no previous patches exist, continue with the
---->			 * original function.
			 */
			func = list_entry_rcu(func->stack_node.next,
					      struct klp_func, stack_node);


	The condition "patch_state == KLP_TRANSITION_UNPATCHED" might
	be a bit misleading.

	The state "KLP_TRANSITION_UNPATCHED" means that it can't use
	the code from the "new" livepatch => it has to fallback
	to the previously used code => previous livepatch.


> + if (func->nop)
>        goto unlock;
> + ftrace_regs_set_instruction_pointer(fregs, (unsigned long)func->new_func);

> Before the new atomic replace patch is added to the func_stack list,
> the old patch is already set to nop.
      ^^^ 
     
     The nops are set in the _new_ patch for functions which will
     no longer get livepatched, see the commit e1452b607c48c642
     ("livepatch: Add atomic replace") for more details.
     
> If klp_ftrace_handler() is
> triggered at this point, it will effectively do nothing—in other
> words, it will execute the original function.
> I might be wrong.

Fortunately, you are wrong. This would be a serious violation of
the consistency model, and livepatches modifying semantics would
blow up systems.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-02-05  6:16           ` Yafang Shao
@ 2025-02-07 11:00             ` Petr Mladek
  2025-02-08  2:49               ` Yafang Shao
  0 siblings, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2025-02-07 11:00 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Miroslav Benes, jpoimboe, jikos, joe.lawrence, live-patching,
	linux-kernel

On Wed 2025-02-05 14:16:42, Yafang Shao wrote:
> On Tue, Feb 4, 2025 at 9:05 PM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Mon 2025-02-03 17:44:52, Yafang Shao wrote:
> > > On Fri, Jan 31, 2025 at 9:18 PM Miroslav Benes <mbenes@suse.cz> wrote:
> > > >
> > > > > >
> > > > > >   + What exactly is meant by frequent replacements (busy loop?, once a minute?)
> > > > >
> > > > > The script:
> > > > >
> > > > > #!/bin/bash
> > > > > while true; do
> > > > >         yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
> > > > >         ./apply_livepatch_61.sh # it will sleep 5s
> > > > >         yum erase -y kernel-livepatch-6.1.12-0.x86_64
> > > > >         yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
> > > > >         ./apply_livepatch_61.sh  # it will sleep 5s
> > > > > done
> > > >
> > > > A live patch application is a slowpath. It is expected not to run
> > > > frequently (in a relative sense).
> > >
> > > The frequency isn’t the main concern here; _scalability_ is the key issue.
> > > Running livepatches once per day (a relatively low frequency) across all of our
> > > production servers (hundreds of thousands) isn’t feasible. Instead, we need to
> > > periodically run tests on a subset of test servers.
> >
> > I am confused. The original problem was a system crash when
> > livepatching do_exit() function, see
> > https://lore.kernel.org/r/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@mail.gmail.com
> 
> Why do you view this patchset as a solution to the original problem?

You discovered the hardlockup warnings when trying to reproduce the
original problem. At least, you mentioned this at
https://lore.kernel.org/r/20250122085146.41553-1-laoar.shao@gmail.com

And using the hybrid mode would allow livepatching do_exit() only
once and never touching it again. It is not the right solution,
but it would be a workaround.


> > The rcu watchdog warning was first mentioned in this patchset.
> > Do you see rcu watchdog warning in production or just
> > with this artificial test, please?
> 
> So, we shouldn’t run any artificial tests on the livepatch, correct?
> What exactly is the issue with these test cases?

Some tests might be too artificial. They might find problems which
do not exist in practice.

Disclaimer: I am not saying that this is the case. You actually prove
	later in this mail that the hardlockup warnings happen
	even in production.

Anyway, if an artificial test finds a potential problem and the fix is
complicated then we need to decide if it is worth it.

It does not make sense to complicate the code too much when
the fixed problem does not happen in practice.

  + Too complicated code might introduce regressions which are
    more serious than the original problem.

  + Too complicated code increases the maintenance cost. It is
    more complicated to add new features or fix bugs.


> > > > If you stress it like this, it is quite
> > > > expected that it will have an impact. Especially on a large busy system.
> > >
> > > It seems you agree that the current atomic-replace process lacks scalability.
> > > When deploying a livepatch across a large fleet of servers, it’s impossible to
> > > ensure that the servers are idle, as their workloads are constantly varying and
> > > are not deterministic.
> >
> > Do you see the scalability problem in production, please?
> 
> Yes, the livepatch transition was stalled.

Good to know.

> 
> > And could you prove that it was caused by livepatching, please?
> 
> When the livepatch transition is stalled, running `kpatch list` will
> display the stalled information.

OK.

> > > The challenges are very different when managing 1K servers versus 1M servers.
> > > Similarly, the issues differ significantly between patching a single
> > > function and
> > > patching 100 functions, especially when some of those functions are critical.
> > > That’s what scalability is all about.
> > >
> > > Since we transitioned from the old livepatch mode to the new
> > > atomic-replace mode,
> >
> > What do you mean with the old livepatch mode, please?
> 
> $ kpatch-build -R

I am not familiar with kpatch-build. OK, I see the following at
https://github.com/dynup/kpatch/blob/master/kpatch-build/kpatch-build

echo "		-R, --non-replace       Disable replace patch (replace is on by default)" >&2

> >
> > Did you allow to install more livepatches in parallel?
> 
> No.

I guess that there is a misunderstanding. I am sorry my wording was
not clear enough.

By "installing" more livepatches in parallel I meant to "have enabled"
more livepatches in parallel. It is possible only when you do not
use the atomic replace.

In other words, if you use "kpatch-build -R" then you could have
enabled more livepatches in parallel.


> > What was the motivation to switch to the atomic replace, please?
> 
> This is the default behavior of kpatch [1] after upgrading to a new version.
> 
> [1].  https://github.com/dynup/kpatch/tree/master

OK. I wonder if the atomic replace simplified some actions for you.

I ask because the proposed "hybrid" model is very similar to the "old"
model which did not use the atomic replace.

What are the advantages of the "hybrid" model over the "old" model, please?


> > > our SREs have consistently reported that one or more servers become
> > > stalled during
> > > the upgrade (replacement).
> >
> > What is SRE, please?
> 
> From Wikipedia: https://en.wikipedia.org/wiki/Site_reliability_engineering

Good to know.

> > Could you please show some log from a production system?
> 
> When the SREs initially reported that the livepatch transition was
> stalled, I simply advised them to try again. However, after
> experiencing these crashes, I dug deeper into the livepatch code and
> realized that scalability is a concern. As a result, periodically
> replacing an old livepatch triggers RCU warnings on our production
> servers.
> 
> [Wed Feb  5 10:56:10 2025] livepatch: enabling patch 'livepatch_61_release6'
> [Wed Feb  5 10:56:10 2025] livepatch: 'livepatch_61_release6':
> starting patching transition
> [Wed Feb  5 10:56:24 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> 1126113 is 10078 jiffies old.
> [Wed Feb  5 10:56:38 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> 1126117 is 10199 jiffies old.
> [Wed Feb  5 10:56:52 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> 1126121 is 10047 jiffies old.
> [Wed Feb  5 10:56:57 2025] livepatch: 'livepatch_61_release6': patching complete

I guess that this happens primarily because there are many processes
running in kernel code:

       + many processes => long task list
       + in kernel code => need to check stack

I wondered how much it is caused by livepatching do_exit() which
takes tasklist_lock. The idea was:

	+ The processes calling do_exit() are blocked at
	  write_lock_irq(&tasklist_lock) when
	  klp_try_complete_transition() takes the tasklist_lock.

	+ These processes can't be transitioned because do_exit()
	  is on the stack => more klp_try_complete_transition()
	  rounds are needed.

	  => livepatching do_exit() reduces the chance of
	     klp_try_complete_transition() success.

	Well, it should not be that big a problem because the next
	 klp_try_complete_transition() should be much faster.
	 It will skip already transitioned processes quickly.

       Summary: I think that livepatching of do_exit() should not be the main
		problem for the stall.
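
For context, the task-switching part of klp_try_complete_transition()
looks roughly like this (a simplified sketch from memory; see
kernel/livepatch/transition.c for the real code):

	/* Try to switch every task; a task with a to-be-patched function
	 * on its stack cannot be switched and forces another round. */
	read_lock(&tasklist_lock);
	for_each_process_thread(g, task)
		if (!klp_try_switch_task(task))
			complete = false;
	read_unlock(&tasklist_lock);

So the transition only takes the read side of tasklist_lock. Tasks
blocked at write_lock_irq() in do_exit() simply fail the stack check
and have to wait for a later round.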


> PS: You might argue again about the frequency. If you believe this is
> just a frequency issue, please suggest a suitable frequency.

I do not know. The livepatch transition might block some processes.
It is a kind of stress for the system, similar to other
housekeeping operations.

It depends on the load and the amount and type of livepatched
functions. It might take some time until the system recovers
from the stress and the system load drops back to normal.

If you create the stress (livepatch transition) too frequently
and the system does not get a chance to recover in between the
stress situations, then the bad effects might accumulate
and might be much worse.

I have no idea if it is the case here. The rule of thumb would be:

  + If you see the hardlockup warning _only_ when running the stress
    test "while true; do apply_livepatch ; done;" then
    the problem might be rather theoretical.

  + If you see the hardlockup warning on production systems where
    you apply a livepatch only occasionally (once per day)
    then the problem is real and we should fix it.


> > > > > > > Other potential risks may also arise
> > > > > > >   due to inconsistencies or race conditions during transitions.
> > > > > >
> > > > > > What inconsistencies and race conditions you have in mind, please?
> > > > >
> > > > > I have explained it at
> > > > > https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354
> > > > >
> > > > >  klp_ftrace_handler
> > > > >       if (unlikely(func->transition)) {
> > > > >           WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
> > > > >   }
> > > > >
> > > > > Why is WARN_ON_ONCE() placed here? What issues have we encountered in the past
> > > > > that led to the decision to add this warning?
> > > >
> > > > A safety measure for something which really should not happen.
> > >
> > > Unfortunately, this issue occurs during my stress tests.
> >
> > I am confused. Do you see the above WARN_ON_ONCE() during your
> > stress test? Could you please provide a log?
> 
> Could you please read my reply seriously?

This is pretty harsh and offensive after so many details I have already
provided!

It is easy to miss a detail in a flood of long mails. Also I am
working on many other things in parallel.

> https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354

Ah, I have missed that you triggered this exact WARNING. It is great.
It confirms the theory about the race in do_exit(). I mean that
the transition finishes early because the processes in do_exit()
are no longer visible in the tasklist.


> > > > > > The main advantage of the atomic replace is simplify the maintenance
> > > > > > and debugging.
> >
> > If you have problems with the atomic replace then you might stop using
> > it completely and just install more livepatches in parallel.
> 
> Why do we need to install livepatches in parallel if atomic replace is disabled?
> We only need to install the additional new livepatch. Parallel
> installation is only necessary at boot time.

This is a misunderstanding. By "installed" livepatches in parallel I
mean "enabled" livepatches in parallel, aka, without atomic replace.

If you have problems with atomic replace, you might stop using it.
Honestly, I do not see that big an advantage in the hybrid model
over the non-atomic-replace model.

That said, I think that the hybrid mode will not prevent the
hardlockup warning. It seems that you have reproduced the hardlockup
even with a relatively simple livepatch, see
https://lore.kernel.org/all/CALOAHbBZc6ORGzXwBRwe+rD2=YGf1jub5TEr989_GpK54P2o1A@mail.gmail.com/

IMHO, we should rather detect and break the stall in
klp_try_complete_transition(). I mean to go the way explored in
the thread
https://lore.kernel.org/all/20250122085146.41553-1-laoar.shao@gmail.com/

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
  2025-02-06  2:35             ` Yafang Shao
@ 2025-02-07 13:58               ` Petr Mladek
  2025-02-08  3:08                 ` Yafang Shao
  0 siblings, 1 reply; 36+ messages in thread
From: Petr Mladek @ 2025-02-07 13:58 UTC (permalink / raw)
  To: Yafang Shao
  Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching,
	linux-kernel

On Thu 2025-02-06 10:35:11, Yafang Shao wrote:
> On Thu, Feb 6, 2025 at 12:03 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Wed 2025-02-05 10:54:47, Yafang Shao wrote:
> > > On Tue, Feb 4, 2025 at 9:21 PM Petr Mladek <pmladek@suse.com> wrote:
> > > >
> > > > On Mon 2025-01-27 23:34:50, Yafang Shao wrote:

I am not sure if you would still want the hybrid model.
It is possible that the timeout in klp_try_complete_transition()
would remove the hardlockup watchdog warnings, see
https://lore.kernel.org/r/Z6Tmqro6CSm0h-E3@pathway.suse.cz

I reply to this mail just for the record because there were some
unanswered questions...

> > > To clarify the hybrid model, I'll illustrate it with the following Fixes:
> > >
> > > Fix1 livepatches: funcA
> > > Fix2  livepatches: funcC
> > > Fix3 livepatches: funcA, funcB
> >
> > How does look the livepatched FuncA here?
> > Does it contain changes only for Fix3?
> > Or is it cummulative livepatches_funA includes both Fix1 + Fix3?
> 
> It is cumulative livepatches_funA includes both Fix1 + Fix3.

It makes sense.

I have missed this in the previous mail. I see it there now (after
re-reading). But the trick was somewhat hidden in the long sentences.


> > > Fix4  livepatches: funcD
> > > Fix5 livepatches: funcB
> > >
> > > Each FixN represents a single /sys/kernel/livepatch/FixN.
> > > By default, all Fixes are set to non-replaceable.
> > >
> > > Step-by-step process:
> > > 1. Loading Fix1
> > >    Loaded Fixes: Fix1
> > > 2. Loading Fix2
> > >    Loaded Fixes: Fix1 Fix2
> > > 3. Loading Fix3
> > >     Before loading, set Fix1 to replaceable and replace it with Fix3,
> > > which is a cumulative patch of Fix1 and Fix3.
> >
> > Who will modify the "replaceable" flag of "Fix1"? Userspace or kernel?
> 
> Userspace scripts. Modify it before loading a new one.

This is one of my concerns. Such a userspace script would be
more complex for the hybrid model than for cumulative livepatches.
Any user of the hybrid model would have their own scripts
and, eventually, their own bugs.

Anyway, the more possibilities, the more ways to break things
and the more complicated it is to debug eventual problems.

If anyone still wants the hybrid model then I would like
to enforce some safe behavior from the kernel. I mean to do
as much as possible in the common (kernel) code.

I have the following ideas:

  1. Allow to load a livepatch with .replace enabled only when
     all conflicting[*] livepatches are allowed to be replaced.

  2. Automatically replace all conflicting[*] livepatches.

  3. Allow to define a list of to-be-replaced livepatches
     into struct patch.


The 1st or 2nd idea would make the hybrid model safer.

The 2nd idea would even work without the .replaceable flag.

The 3rd idea would allow replacing even non-conflicting[*]
patches.

[*] I would define that two livepatches are "conflicting" when
    at least one function is modified by both of them. e.g.

	+ Patch1: funcA, funcB   \
	+ Patch2: funcC, funcD   - non-conflicting 

	+ Patch1: funcA, funcB          \
	+ Patch2:        funcB, funcC   - conflicting 

    Or a bit weaker definition. Two patches are "conflicting"
    when the new livepatch provides only a partial replacement
    for already livepatched functions, e.g.

	+ Patch1: funcA, funcB                \
	+ Patch2:               funcC, funcD  - non-conflicting anyway

	+ Patch1: funcA, funcB          - Patch1 can't replace Patch2 (conflict)
	+ Patch2: funcA, funcB, funcC   - Patch2 can replace Patch1 (no conflict)

	+ Patch1: funcA, funcB          \
	+ Patch2:        funcB, funcC   - conflicting anyway


Especially, the automatic replacement of conflicting patches might
make the hybrid model safe and easier to use. And it would resolve
most of my fears with this approach.
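
If it helps to make this more concrete, a minimal sketch of such a
conflict check could look like the function below. It is a hypothetical
helper (not existing code) implementing the stronger definition above,
reusing the existing klp_find_object()/klp_find_func() helpers:

	static bool klp_patches_conflict(struct klp_patch *old_patch,
					 struct klp_patch *new_patch)
	{
		struct klp_object *old_obj, *obj;
		struct klp_func *old_func;

		klp_for_each_object(old_patch, old_obj) {
			obj = klp_find_object(new_patch, old_obj);
			if (!obj)
				continue;

			klp_for_each_func(old_obj, old_func) {
				/* The same function is modified by both patches. */
				if (klp_find_func(obj, old_func))
					return true;
			}
		}

		return false;
	}

The kernel could then refuse to enable a .replace livepatch while a
conflicting one is not replaceable (idea 1), or automatically replace
only the patches for which the check returns true (idea 2).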

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
  2025-02-07  3:16     ` Yafang Shao
  2025-02-07  9:36       ` Petr Mladek
@ 2025-02-07 16:59       ` Josh Poimboeuf
  2025-02-08  3:38         ` Yafang Shao
  1 sibling, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2025-02-07 16:59 UTC (permalink / raw)
  To: Yafang Shao
  Cc: jikos, mbenes, pmladek, joe.lawrence, live-patching, linux-kernel

On Fri, Feb 07, 2025 at 11:16:45AM +0800, Yafang Shao wrote:
> On Fri, Feb 7, 2025 at 10:31 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > Why does this happen?
> 
> It occurs during the KLP transition. It seems like the KLP transition
> is taking too long.
> 
> [20329703.332453] livepatch: enabling patch 'livepatch_61_release6'
> [20329703.340417] livepatch: 'livepatch_61_release6': starting
> patching transition
> [20329715.314215] rcu_tasks_wait_gp: rcu_tasks grace period 1109765 is
> 10166 jiffies old.
> [20329737.126207] rcu_tasks_wait_gp: rcu_tasks grace period 1109769 is
> 10219 jiffies old.
> [20329752.018236] rcu_tasks_wait_gp: rcu_tasks grace period 1109773 is
> 10199 jiffies old.
> [20329754.848036] livepatch: 'livepatch_61_release6': patching complete

How specifically does the KLP transition trigger rcu_tasks warnings?

> Before the new atomic replace patch is added to the func_stack list,
> the old patch is already set to nop. If klp_ftrace_handler() is
> triggered at this point, it will effectively do nothing—in other
> words, it will execute the original function.
> I might be wrong.

That's not actually how it works.  klp_add_nops() probably needs some
better comments.

It adds nops to the *new* patch so that all the functions in the old
patch(es) get replaced, even those which don't have a corresponding
function in the new patch.

The justification for your patch seems to be "here are some bugs, this
patch helps work around them", which isn't very convincing.  Instead we
need to understand the original bugs and fix them.

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
  2025-02-07  9:36       ` Petr Mladek
@ 2025-02-08  2:14         ` Yafang Shao
  0 siblings, 0 replies; 36+ messages in thread
From: Yafang Shao @ 2025-02-08  2:14 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Josh Poimboeuf, jikos, mbenes, joe.lawrence, live-patching,
	linux-kernel

On Fri, Feb 7, 2025 at 5:36 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Fri 2025-02-07 11:16:45, Yafang Shao wrote:
> > On Fri, Feb 7, 2025 at 10:31 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > > On Mon, Jan 27, 2025 at 02:35:26PM +0800, Yafang Shao wrote:
> > > > - Temporary Loss of Patching
> > > >
> > > >   During the replacement process, the old patch is set to a NOP (no-operation)
> > > >   before the new patch is fully applied. This creates a window where the
> > > >   function temporarily reverts to its original, unpatched state. If the old
> > > >   patch fixed a critical issue (e.g., one that prevented a system panic), the
> > > >   system could become vulnerable to that issue during the transition.
> > >
> > > Are you saying that atomic replace is not atomic?  If so, this sounds
> > > like another bug.
> >
> > >From my understanding, there’s a window where the original function is
> > not patched.
>
> This is a misunderstanding.
>
> > klp_enable_patch
> > + klp_init_patch
> >    + if (patch->replace)
> >           klp_add_nops(patch);  <<<< set all old patches to nop
>
> 1. The "nop" entry is added into the _new_ (to-be-enabled) livepatch,
>    see klp_add_nops(patch). The parameter is the _newly_ enabled patch.
>
> 2. The "nop" entries are added only for functions which are currently
>    livepatched but they are not longer livepatched in the new
>    livepatch, see:
>
> static int klp_add_object_nops(struct klp_patch *patch,
>                                struct klp_object *old_obj)
> {
> [...]
>         klp_for_each_func(old_obj, old_func) {
>                 func = klp_find_func(obj, old_func);
>                 if (func)
>                         continue;       <------ Do not allocate nop
>                                                 when the function is
>                                                 implemented in the new
>                                                 livepatch.
>
>                 func = klp_alloc_func_nop(old_func, obj);
>                 if (!func)
>                         return -ENOMEM;
>         }
>
>         return 0;
> }

Thanks for your explanation.

>
>
> > + __klp_enable_patch
> >    + klp_patch_object
> >       + klp_patch_func
> >          + ops = klp_find_ops(func->old_func);
> >             + if (ops)
> >                    // add the new patch to the func_stack list
> >                    list_add_rcu(&func->stack_node, &ops->func_stack);
> >
> >
> > klp_ftrace_handler
> > + func = list_first_or_null_rcu(&ops->func_stack, struct klp_func
>
> 3. You omitted this important part of the code:
>
>         if (unlikely(func->transition)) {
>                 patch_state = current->patch_state;
>                 if (patch_state == KLP_TRANSITION_UNPATCHED) {
>                         /*
> ---->                    * Use the previously patched version of the function.
> ---->                    * If no previous patches exist, continue with the
> ---->                    * original function.
>                          */
>                         func = list_entry_rcu(func->stack_node.next,
>                                               struct klp_func, stack_node);
>
>
>         The condition "patch_state == KLP_TRANSITION_UNPATCHED" might
>         be a bit misleading.
>
>         The state "KLP_TRANSITION_UNPATCHED" means that it can't use
>         the code from the "new" livepatch => it has to fallback
>         to the previously used code => previous livepatch.

Understood.

>
>
> > + if (func->nop)
> >        goto unlock;
> > + ftrace_regs_set_instruction_pointer(fregs, (unsigned long)func->new_func);
>
> > Before the new atomic replace patch is added to the func_stack list,
> > the old patch is already set to nop.
>       ^^^
>
>      The nops are set in the _new_ patch for functions which will
>      not longer get livepatched, see the commit e1452b607c48c642
>      ("livepatch: Add atomic replace") for more details.
>
> > If klp_ftrace_handler() is
> > triggered at this point, it will effectively do nothing—in other
> > words, it will execute the original function.
> > I might be wrong.
>
> Fortunately, you are wrong. This would be a serious violation of
> the consistency model and livepatches modifying some semantic would
> blow up systems.

That is great. Thanks for your help.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-02-07 11:00             ` Petr Mladek
@ 2025-02-08  2:49               ` Yafang Shao
  2025-02-10  2:50                 ` Yafang Shao
  0 siblings, 1 reply; 36+ messages in thread
From: Yafang Shao @ 2025-02-08  2:49 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Miroslav Benes, jpoimboe, jikos, joe.lawrence, live-patching,
	linux-kernel

On Fri, Feb 7, 2025 at 7:01 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Wed 2025-02-05 14:16:42, Yafang Shao wrote:
> > On Tue, Feb 4, 2025 at 9:05 PM Petr Mladek <pmladek@suse.com> wrote:
> > >
> > > On Mon 2025-02-03 17:44:52, Yafang Shao wrote:
> > > > On Fri, Jan 31, 2025 at 9:18 PM Miroslav Benes <mbenes@suse.cz> wrote:
> > > > >
> > > > > > >
> > > > > > >   + What exactly is meant by frequent replacements (busy loop?, once a minute?)
> > > > > >
> > > > > > The script:
> > > > > >
> > > > > > #!/bin/bash
> > > > > > while true; do
> > > > > >         yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
> > > > > >         ./apply_livepatch_61.sh # it will sleep 5s
> > > > > >         yum erase -y kernel-livepatch-6.1.12-0.x86_64
> > > > > >         yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
> > > > > >         ./apply_livepatch_61.sh  # it will sleep 5s
> > > > > > done
> > > > >
> > > > > A live patch application is a slowpath. It is expected not to run
> > > > > frequently (in a relative sense).
> > > >
> > > > The frequency isn’t the main concern here; _scalability_ is the key issue.
> > > > Running livepatches once per day (a relatively low frequency) across all of our
> > > > production servers (hundreds of thousands) isn’t feasible. Instead, we need to
> > > > periodically run tests on a subset of test servers.
> > >
> > > I am confused. The original problem was a system crash when
> > > livepatching do_exit() function, see
> > > https://lore.kernel.org/r/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@mail.gmail.com
> >
> > Why do you view this patchset as a solution to the original problem?
>
> You discovered the hardlockup warnings when trying to reproduce the
> original problem. At least, you mentioned this at
> https://lore.kernel.org/r/20250122085146.41553-1-laoar.shao@gmail.com
>
> And using the hybrid model would allow livepatching do_exit() only
> once and never touching it again. It is not the right solution
> but it would be a workaround.
>
>
> > > The rcu watchdog warning was first mentioned in this patchset.
> > > Do you see rcu watchdog warning in production or just
> > > with this artificial test, please?
> >
> > So, we shouldn’t run any artificial tests on the livepatch, correct?
> > What exactly is the issue with these test cases?
>
> Some tests might be too artificial. They might find problems which
> do not exist in practice.
>
> Disclaimer: I do not say that this is the case. You actually prove
>         later in this mail that the hardlockup warning happen
>         even in production.
>
> Anyway, if an artificial test finds a potential problem and the fix is
> complicated then we need to decide if it is worth it.
>
> It does not make sense to complicate the code too much when
> the fixed problem does not happen in practice.
>
>   + Too complicated code might introduce regressions which are
>     more serious than the original problem.
>
>   + Too complicated code increases the maintenance cost. It is
>     more complicated to add new features or fix bugs.
>
>
> > > > > If you stress it like this, it is quite
> > > > > expected that it will have an impact. Especially on a large busy system.
> > > >
> > > > It seems you agree that the current atomic-replace process lacks scalability.
> > > > When deploying a livepatch across a large fleet of servers, it’s impossible to
> > > > ensure that the servers are idle, as their workloads are constantly varying and
> > > > are not deterministic.
> > >
> > > Do you see the scalability problem in production, please?
> >
> > Yes, the livepatch transition was stalled.
>
> Good to know.
>
> >
> > > And could you prove that it was caused by livepatching, please?
> >
> > When the livepatch transition is stalled, running `kpatch list` will
> > display the stalled information.
>
> OK.
>
> > > > The challenges are very different when managing 1K servers versus 1M servers.
> > > > Similarly, the issues differ significantly between patching a single
> > > > function and
> > > > patching 100 functions, especially when some of those functions are critical.
> > > > That’s what scalability is all about.
> > > >
> > > > Since we transitioned from the old livepatch mode to the new
> > > > atomic-replace mode,
> > >
> > > What do you mean with the old livepatch mode, please?
> >
> > $ kpatch-build -R
>
> I am not familiar with kpatch-build. OK, I see the following at
> https://github.com/dynup/kpatch/blob/master/kpatch-build/kpatch-build
>
> echo "          -R, --non-replace       Disable replace patch (replace is on by default)" >&2
>
> > >
> > > Did you allow to install more livepatches in parallel?
> >
> > No.
>
> I guess that there is a misunderstanding. I am sorry my wording was
> not clear enough.
>
> By "installing" more livepatches in parallel I meant to "have enabled"
> more livepatches in parallel. It is possible only when you do not
> use the atomic replace.
>
> By other words, if you use "kpatch-build -R" then you could have
> enabled more livepatches in parallel.
>
>
> > > What was the motivation to switch to the atomic replace, please?
> >
> > This is the default behavior of kpatch [1] after upgrading to a new version.
> >
> > [1].  https://github.com/dynup/kpatch/tree/master
>
> OK. I wonder if the atomic replace simplified some actions for you.

Actually, it simplifies the livepatch deployment since it only
involves a single livepatch.

>
> I ask because the proposed "hybrid" model is very similar to the "old"
> model which did not use the atomic replace.

It’s essentially a hybrid of the old model and the atomic replace model.

>
> What are the advantages of the "hybrid" model over the "old" model, please?

- Advantages compared to the old model
  In the old model, it’s not possible to replace a specific livepatch
with a new one. In the hybrid model, however, you can replace specific
livepatches as needed.

- Advantages and disadvantages compared to the atomic replace model
   - Advantage
     In the atomic replace model, you must replace all old
livepatches, regardless of whether they are relevant to the new
livepatch. In the hybrid model, you only need to replace the relevant
livepatches.

   - Disadvantage
     In the atomic replace model, you only need to maintain a single
livepatch. However, in the hybrid model, you need to manage multiple
livepatches.

>
>
> > > > our SREs have consistently reported that one or more servers become
> > > > stalled during
> > > > the upgrade (replacement).
> > >
> > > What is SRE, please?
> >
> > From Wikipedia: https://en.wikipedia.org/wiki/Site_reliability_engineering
>
> Good to know.
>
> > > Could you please show some log from a production system?
> >
> > When the SREs initially reported that the livepatch transition was
> > stalled, I simply advised them to try again. However, after
> > experiencing these crashes, I dug deeper into the livepatch code and
> > realized that scalability is a concern. As a result, periodically
> > replacing an old livepatch triggers RCU warnings on our production
> > servers.
> >
> > [Wed Feb  5 10:56:10 2025] livepatch: enabling patch 'livepatch_61_release6'
> > [Wed Feb  5 10:56:10 2025] livepatch: 'livepatch_61_release6':
> > starting patching transition
> > [Wed Feb  5 10:56:24 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> > 1126113 is 10078 jiffies old.
> > [Wed Feb  5 10:56:38 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> > 1126117 is 10199 jiffies old.
> > [Wed Feb  5 10:56:52 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> > 1126121 is 10047 jiffies old.
> > [Wed Feb  5 10:56:57 2025] livepatch: 'livepatch_61_release6': patching complete
>
> I guess that this happens primary because there are many processes
> running in kernel code:
>
>        + many processes => long task list
>        + in kernel code => need to check stack
>
> I wondered how much it is caused by livepatching do_exit() which
> takes tasklist_lock. The idea was:

I’ll drop the changes in do_exit() and run the test again.

>
>         + The processes calling do_exit() are blocked at
>           write_lock_irq(&tasklist_lock) when
>           klp_try_complete_transition() takes the tasklist_lock.
>
>         + These processes can't be transitioned because do_exit()
>           is on the stack => more klp_try_complete_transition()
>           rounds are needed.
>
>           => livepatching do_exit() reduces the chance of
>              klp_try_complete_transition() success.
>
>         Well, it should not be that big a problem because the next
>          klp_try_complete_transition() should be much faster.
>          It will skip already transitioned processes quickly.
>
>        Summary: I think that livepatching of do_exit() should not be the main
>                 problem for the stall.
>
>
> > PS: You might argue again about the frequency. If you believe this is
> > just a frequency issue, please suggest a suitable frequency.
>
> I do not know. The livepatch transition might block some processes.
> It is a kind of stress for the system. Similar to another
> housekeeping operations.
>
> It depends on the load and the amount and type of livepatched
> functions. It might take some time until the system recovers
> from the stress and the system load drops back to normal.
>
> If you create the stress (livepatch transition) too frequently
> and the system does not get chance to recover in between the
> stress situations then the bad effects might accumulate
> and might be much worse.
>
> I have no idea if it is the case here. The rule of thumb would be:

Note that I’ve just deployed the test to a few of our production
servers. The test runs at a relatively low frequency, so it doesn’t
introduce significant side effects to the original workload, aside
from some latency spikes during deployment. These spikes are
short-lived and disappear quickly.

>
>   + If you see the hardlockup warning _only_ when running the stress
>     test "while true: do apply_livepatch ; done;" then
>     the problem might be rather theoretical.
>
>   + If you see the hardlockup warning on production systems where
>     the you apply a livepatch only occasionally (one per day)
>     then the problem is real and we should fix it.

I can't run it once per day, as it might take too long to reproduce
the issue. If I wanted to reproduce it quickly, I’d need to deploy the
test to more production servers, which our sysadmins would likely
complain about. Notably, the RCU warning occurs even when I
run the test once every 4 hours. The reason I run the test
periodically is that the workloads change throughout the day, so
periodic testing helps monitor the system's behavior across
different workloads.

>
>
> > > > > > > > Other potential risks may also arise
> > > > > > > >   due to inconsistencies or race conditions during transitions.
> > > > > > >
> > > > > > > What inconsistencies and race conditions you have in mind, please?
> > > > > >
> > > > > > I have explained it at
> > > > > > https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354
> > > > > >
> > > > > >  klp_ftrace_handler
> > > > > >       if (unlikely(func->transition)) {
> > > > > >           WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
> > > > > >   }
> > > > > >
> > > > > > Why is WARN_ON_ONCE() placed here? What issues have we encountered in the past
> > > > > > that led to the decision to add this warning?
> > > > >
> > > > > A safety measure for something which really should not happen.
> > > >
> > > > Unfortunately, this issue occurs during my stress tests.
> > >
> > > I am confused. Do you see the above WARN_ON_ONCE() during your
> > > stress test? Could you please provide a log?
> >
> > Could you please read my reply seriously?
>
> This is pretty harsh and offensive after so many details I have already
> provided!

Sorry about that. You have really helped me a lot.

>
> It is easy to miss a detail in a flood of long mails. Also I am
> working on many other things in parallel.
>
> > https://lore.kernel.org/live-patching/Z5DHQG4geRsuIflc@pathway.suse.cz/T/#m5058583fa64d95ef7ac9525a6a8af8ca865bf354
>
> Ah, I have missed that you triggered this exact WARNING. It is great.
> It confirms the theory about the race in do_exit(). I mean that
> the transition finishes early because the processes in do_exit()
> are no longer visible in the tasklist.

It seems so.

>
>
> > > > > > > The main advantage of the atomic replace is simplify the maintenance
> > > > > > > and debugging.
> > >
> > > If you have problems with the atomic replace then you might stop using
> > > it completely and just install more livepatches in parallel.
> >
> > Why do we need to install livepatches in parallel if atomic replace is disabled?
> > We only need to install the additional new livepatch. Parallel
> > installation is only necessary at boot time.
>
> This is misunderstanding. By "installed" livepatches in parallel I
> mean "enabled" livepatches in parallel, aka, without atomic replace.
>
> If you have problems with atomic replace, you might stop using it.
> Honestly, I do not see that big advantage in the hybrid model
> over the non-atomic-replace model.

I believe the ability to selectively replace a specific livepatch is a
significant advantage over both the non-atomic-replace and
atomic-replace models.

>
> That said, I think that the hybrid mode will not prevent the
> hardlockup warning. It seems that you have reproduced the hardlockup
> even with a relatively simple livepatch, see
> https://lore.kernel.org/all/CALOAHbBZc6ORGzXwBRwe+rD2=YGf1jub5TEr989_GpK54P2o1A@mail.gmail.com/
>
> IMHO, we should rather detect and break the stall in
> klp_try_complete_transition(). I mean to go the way explored in
> the thread
> https://lore.kernel.org/all/20250122085146.41553-1-laoar.shao@gmail.com/

That’s a separate issue, which we can discuss separately.

--
Regards
Yafang

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
  2025-02-07 13:58               ` Petr Mladek
@ 2025-02-08  3:08                 ` Yafang Shao
  0 siblings, 0 replies; 36+ messages in thread
From: Yafang Shao @ 2025-02-08  3:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: jpoimboe, jikos, mbenes, joe.lawrence, live-patching,
	linux-kernel

On Fri, Feb 7, 2025 at 9:58 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Thu 2025-02-06 10:35:11, Yafang Shao wrote:
> > On Thu, Feb 6, 2025 at 12:03 AM Petr Mladek <pmladek@suse.com> wrote:
> > >
> > > On Wed 2025-02-05 10:54:47, Yafang Shao wrote:
> > > > On Tue, Feb 4, 2025 at 9:21 PM Petr Mladek <pmladek@suse.com> wrote:
> > > > >
> > > > > On Mon 2025-01-27 23:34:50, Yafang Shao wrote:
>
> I am not sure if you still would want the hybrid model.

I believe the ability to selectively replace specific livepatches will
be a highly valuable feature.

> It is possible that the timeout in klp_try_complete_transition()
> would remove the hardlockup watchdog warnings, see
> https://lore.kernel.org/r/Z6Tmqro6CSm0h-E3@pathway.suse.cz
>
> I reply to this mail just for record because there were some
> unanswered questions...
>
> > > > To clarify the hybrid model, I'll illustrate it with the following Fixes:
> > > >
> > > > Fix1 livepatches: funcA
> > > > Fix2  livepatches: funcC
> > > > Fix3 livepatches: funcA, funcB
> > >
> > > How does look the livepatched FuncA here?
> > > Does it contain changes only for Fix3?
> > > Or is it cummulative livepatches_funA includes both Fix1 + Fix3?
> >
> > It is cumulative livepatches_funA includes both Fix1 + Fix3.
>
> It makes sense.
>
> I have missed this in the previous mail. I see it there now (after
> re-reading). But trick was somehow encrypted in long sentences.
>
>
> > > > Fix4  livepatches: funcD
> > > > Fix5 livepatches: funcB
> > > >
> > > > Each FixN represents a single /sys/kernel/livepatch/FixN.
> > > > By default, all Fixes are set to non-replaceable.
> > > >
> > > > Step-by-step process:
> > > > 1. Loading Fix1
> > > >    Loaded Fixes: Fix1
> > > > 2. Loading Fix2
> > > >    Loaded Fixes: Fix1 Fix2
> > > > 3. Loading Fix3
> > > >     Before loading, set Fix1 to replaceable and replace it with Fix3,
> > > > which is a cumulative patch of Fix1 and Fix3.
> > >
> > > Who will modify the "replaceable" flag of "Fix1"? Userspace or kernel?
> >
> > Userspace scripts. Modify it before loading a new one.
>
> This is one of my concerns. Such a userspace script would be
> more complex for the hybrid model than for cumulative livepatches.
> Any user of the hybrid model would have their own scripts
> and, eventually, their own bugs.

In the old model, we maintained a large script to deploy individual
livepatches. In the atomic replace model, we maintain another complex
script to deploy cumulative livepatches. We always end up creating our
own scripts to adapt to the complexities of the production
environment.

>
> Anyway, the more possibilities, the more ways to break things
> and the more complicated it is to debug eventual problems.

We still have some servers running the old 4.19 kernel, with over 20
livepatches deployed. These livepatches are managed using the old
model since the atomic replace model isn't supported yet. The
maintenance of these livepatches has been running smoothly with user
scripts. Things would be much simpler if we could rely on user scripts
to handle this process.

>
> If anyone would still like the hybrid model then I would like
> to enforce some safe behavior from the kernel. I mean to do
> as much as possible in the common (kernel) code.
>
> I have the following ideas:
>
>   1. Allow to load a livepatch with .replace enabled only when
>      all conflicting[*] livepatches are allowed to be replaced.

Makes sense.

>
>   2. Automatically replace all conflicting[*] livepatches.

Makes sense.

>
>   3. Allow to define a list of to-be-replaced livepatches
>      into struct patch.

Seems like a great idea.

>
>
> The 1st or 2nd idea would make the hybrid model more safe.
>
> The 2nd idea would even work without the .replaceable flag.
>
> The 3rd idea would allow to replace even non-conflicting[*]
> patches.
>
> [*] I would define that two livepatches are "conflicting" when
>     at least one function is modified by both of them. e.g.
>
>         + Patch1: funcA, funcB   \
>         + Patch2: funcC, funcD   - non-conflicting
>
>         + Patch1: funcA, funcB          \
>         + Patch2:        funcB, funcC   - conflicting
>
>     Or a bit weaker definition. Two patches are "conflicting"
>     when the new livepatch provides only partial replacement
>     for already livepatch functions, .e.g.
>
>         + Patch1: funcA, funcB                \
>         + Patch2:               funcC, funcD  - non-conflicting anyway
>
>         + Patch1: funcA, funcB          - Patch1 can't replace Patch2 (conflict)
>         + Patch2: funcA, funcB, funcC   - Patch2 can replace Patch1 (no conflict)
>
>         + Patch1: funcA, funcB          \
>         + Patch2:        funcB, funcC   - conflicting anyway
>
>
> Especially, the automatic replacement of conflicting patches might
> make the hybrid model safe and easier to use. And it would resolve
> most of my fears with this approach.

Many thanks for your suggestion. It’s been incredibly helpful. I’ll
definitely give it some thought.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode
  2025-02-07 16:59       ` Josh Poimboeuf
@ 2025-02-08  3:38         ` Yafang Shao
  0 siblings, 0 replies; 36+ messages in thread
From: Yafang Shao @ 2025-02-08  3:38 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: jikos, mbenes, pmladek, joe.lawrence, live-patching, linux-kernel

On Sat, Feb 8, 2025 at 12:59 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Fri, Feb 07, 2025 at 11:16:45AM +0800, Yafang Shao wrote:
> > On Fri, Feb 7, 2025 at 10:31 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > > Why does this happen?
> >
> > It occurs during the KLP transition. It seems like the KLP transition
> > is taking too long.
> >
> > [20329703.332453] livepatch: enabling patch 'livepatch_61_release6'
> > [20329703.340417] livepatch: 'livepatch_61_release6': starting
> > patching transition
> > [20329715.314215] rcu_tasks_wait_gp: rcu_tasks grace period 1109765 is
> > 10166 jiffies old.
> > [20329737.126207] rcu_tasks_wait_gp: rcu_tasks grace period 1109769 is
> > 10219 jiffies old.
> > [20329752.018236] rcu_tasks_wait_gp: rcu_tasks grace period 1109773 is
> > 10199 jiffies old.
> > [20329754.848036] livepatch: 'livepatch_61_release6': patching complete
>
> How specifically does the KLP transition trigger rcu_tasks warnings?

I believe the reason is that the livepatch transition holds
tasklist_lock for too long, though I haven't tried to prove it yet.

>
> > Before the new atomic replace patch is added to the func_stack list,
> > the old patch is already set to nop. If klp_ftrace_handler() is
> > triggered at this point, it will effectively do nothing—in other
> > words, it will execute the original function.
> > I might be wrong.
>
> That's not actually how it works.  klp_add_nops() probably needs some
> better comments.

With Petr's help, I now understand how it works.

What do you think about adding the following comments? These comments
are copied from Petr's reply [0].

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index f2071a20e5f0..64a026af53e1 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -559,6 +559,9 @@ static int klp_add_object_nops(struct klp_patch *patch,
  * Add 'nop' functions which simply return to the caller to run
  * the original function. The 'nop' functions are added to a
  * patch to facilitate a 'replace' mode.
+ *
+ * The nop entries are added only for functions which are currently
+ * livepatched but they are no longer livepatched in the new livepatch.
  */
 static int klp_add_nops(struct klp_patch *patch)
 {


[0]. https://lore.kernel.org/live-patching/CALOAHbBs5sfAxSw4HA6KwjWbH3GmhkHJqcni0d4iB7oVZ_3vjw@mail.gmail.com/T/#m96263bd4e0b2a781e5847aee4fe74f7a17ed186c

>
> It adds nops to the *new* patch so that all the functions in the old
> patch(es) get replaced, even those which don't have a corresponding
> function in the new patch.
>
> The justification for your patch seems to be "here are some bugs, this
> patch helps work around them", which isn't very convincing.

The statement "here are some bugs, the hybrid model can workaround
them" is correct. However, the most important part is "This selective
approach would reduce unnecessary transitions," which will be a
valuable improvement to the livepatch system.

In the old 4.19 kernel, we faced an issue where we had to unload an
already-loaded livepatch and replace it with a new one. Without atomic
replace, we had to first unload the old livepatch (along with others)
and then load the new one, which was less than ideal. In the 6.1
kernel, atomic replace is supported, and it works perfectly for that
situation. This is why we prefer using the atomic replace model over
the old one.

However, atomic replace is not entirely ideal, as it replaces all old
livepatches, even if they are not relevant to the new changes. In
reality, we should not replace livepatches unless they are relevant.
We only need to replace the ones that conflict with the new livepatch.
This is where the hybrid model excels, allowing us to achieve this
selective replacement.

> Instead we
> need to understand the original bugs and fix them.

Yes, we should continue working on fixing them.

--
Regards

Yafang

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-02-06 18:00                 ` Song Liu
@ 2025-02-08  6:41                   ` Yafang Shao
  2025-02-08 15:47                     ` Alexei Starovoitov
  0 siblings, 1 reply; 36+ messages in thread
From: Yafang Shao @ 2025-02-08  6:41 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, Miroslav Benes, Petr Mladek, jpoimboe, jikos, joe.lawrence,
	live-patching, linux-kernel

On Fri, Feb 7, 2025 at 2:01 AM Song Liu <song@kernel.org> wrote:
>
> On Wed, Feb 5, 2025 at 6:55 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> [...]
> > > I think we should first understand why the trampoline is not
> > > freed.
> >
> > IIUC, the fexit works as follows,
> >
> >   bpf_trampoline
> >     + __bpf_tramp_enter
> >        + percpu_ref_get(&tr->pcref);
> >
> >     + call do_exit()
> >
> >     + __bpf_tramp_exit
> >        + percpu_ref_put(&tr->pcref);
> >
> > Since do_exit() never returns, the refcnt of the trampoline image is
> > never decremented, preventing it from being freed.
>
> Thanks for the explanation. In this case, I think it makes sense to
> disallow attaching fexit programs on __noreturn functions. I am not
> sure what is the best solution for it though.

There is a tools/objtool/noreturns.h. Perhaps we could create a
similar noreturns.h under kernel/bpf and add all relevant functions to
the fexit deny list.

--
Regards
Yafang

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-02-08  6:41                   ` Yafang Shao
@ 2025-02-08 15:47                     ` Alexei Starovoitov
  2025-02-08 19:32                       ` Josh Poimboeuf
  0 siblings, 1 reply; 36+ messages in thread
From: Alexei Starovoitov @ 2025-02-08 15:47 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Song Liu, bpf, Miroslav Benes, Petr Mladek, Josh Poimboeuf,
	Jiri Kosina, Joe Lawrence, live-patching, LKML

On Fri, Feb 7, 2025 at 10:42 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Fri, Feb 7, 2025 at 2:01 AM Song Liu <song@kernel.org> wrote:
> >
> > On Wed, Feb 5, 2025 at 6:55 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > [...]
> > > > I think we should first understand why the trampoline is not
> > > > freed.
> > >
> > > IIUC, the fexit works as follows,
> > >
> > >   bpf_trampoline
> > >     + __bpf_tramp_enter
> > >        + percpu_ref_get(&tr->pcref);
> > >
> > >     + call do_exit()
> > >
> > >     + __bpf_tramp_exit
> > >        + percpu_ref_put(&tr->pcref);
> > >
> > > Since do_exit() never returns, the refcnt of the trampoline image is
> > > never decremented, preventing it from being freed.
> >
> > Thanks for the explanation. In this case, I think it makes sense to
> > disallow attaching fexit programs on __noreturn functions. I am not
> > sure what is the best solution for it though.
>
> There is a tools/objtool/noreturns.h. Perhaps we could create a
> similar noreturns.h under kernel/bpf and add all relevant functions to
> the fexit deny list.

Pls avoid copy paste if possible.
Something like:

BTF_SET_START(fexit_deny)
#define NORETURN(fn) BTF_ID(func, fn)
#include "../../tools/objtool/noreturns.h"

Should work?

Josh,
maybe we should move noreturns.h to some common location?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-02-08 15:47                     ` Alexei Starovoitov
@ 2025-02-08 19:32                       ` Josh Poimboeuf
  2025-02-09  3:56                         ` Alexei Starovoitov
  0 siblings, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2025-02-08 19:32 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Yafang Shao, Song Liu, bpf, Miroslav Benes, Petr Mladek,
	Jiri Kosina, Joe Lawrence, live-patching, LKML

On Sat, Feb 08, 2025 at 07:47:12AM -0800, Alexei Starovoitov wrote:
> On Fri, Feb 7, 2025 at 10:42 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Fri, Feb 7, 2025 at 2:01 AM Song Liu <song@kernel.org> wrote:
> > >
> > > On Wed, Feb 5, 2025 at 6:55 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > [...]
> > > > > I think we should first understand why the trampoline is not
> > > > > freed.
> > > >
> > > > IIUC, the fexit works as follows,
> > > >
> > > >   bpf_trampoline
> > > >     + __bpf_tramp_enter
> > > >        + percpu_ref_get(&tr->pcref);
> > > >
> > > >     + call do_exit()
> > > >
> > > >     + __bpf_tramp_exit
> > > >        + percpu_ref_put(&tr->pcref);
> > > >
> > > > Since do_exit() never returns, the refcnt of the trampoline image is
> > > > never decremented, preventing it from being freed.
> > >
> > > Thanks for the explanation. In this case, I think it makes sense to
> > > disallow attaching fexit programs on __noreturn functions. I am not
> > > sure what is the best solution for it though.
> >
> > There is a tools/objtool/noreturns.h. Perhaps we could create a
> > similar noreturns.h under kernel/bpf and add all relevant functions to
> > the fexit deny list.
> 
> Pls avoid copy paste if possible.
> Something like:
> 
> BTF_SET_START(fexit_deny)
> #define NORETURN(fn) BTF_ID(func, fn)
> #include "../../tools/objtool/noreturns.h"
> 
> Should work?
> 
> Josh,
> maybe we should move noreturns.h to some common location?

The tools code is meant to be independent from the kernel, but it could
be synced by copying it to both include/linux and tools/include/linux,
and then making sure it stays in sync with tools/objtool/sync-check.sh.

However, noreturns.h is manually edited, and only for some arches.  And
even for those arches it's likely not exhaustive: we only add to it when
we notice an objtool warning, and not all calls to noreturns will
necessarily trigger a warning.  So I'd be careful about relying on that.

Also, that file is intended to be temporary; there have been proposals to
add compiler support for annotating noreturns.  That hasn't been
implemented yet, help wanted!

I think the noreturn info is available in DWARF, can that be converted
to BTF?

Or is there some way to release outstanding trampolines in do_exit()?

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-02-08 19:32                       ` Josh Poimboeuf
@ 2025-02-09  3:56                         ` Alexei Starovoitov
  2025-02-10  2:39                           ` Yafang Shao
  0 siblings, 1 reply; 36+ messages in thread
From: Alexei Starovoitov @ 2025-02-09  3:56 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Yafang Shao, Song Liu, bpf, Miroslav Benes, Petr Mladek,
	Jiri Kosina, Joe Lawrence, live-patching, LKML

On Sat, Feb 8, 2025 at 11:32 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Sat, Feb 08, 2025 at 07:47:12AM -0800, Alexei Starovoitov wrote:
> > On Fri, Feb 7, 2025 at 10:42 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Fri, Feb 7, 2025 at 2:01 AM Song Liu <song@kernel.org> wrote:
> > > >
> > > > On Wed, Feb 5, 2025 at 6:55 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > [...]
> > > > > > I think we should first understand why the trampoline is not
> > > > > > freed.
> > > > >
> > > > > IIUC, the fexit works as follows,
> > > > >
> > > > >   bpf_trampoline
> > > > >     + __bpf_tramp_enter
> > > > >        + percpu_ref_get(&tr->pcref);
> > > > >
> > > > >     + call do_exit()
> > > > >
> > > > >     + __bpf_tramp_exit
> > > > >        + percpu_ref_put(&tr->pcref);
> > > > >
> > > > > Since do_exit() never returns, the refcnt of the trampoline image is
> > > > > never decremented, preventing it from being freed.
> > > >
> > > > Thanks for the explanation. In this case, I think it makes sense to
> > > > disallow attaching fexit programs on __noreturn functions. I am not
> > > > sure what is the best solution for it though.
> > >
> > > There is a tools/objtool/noreturns.h. Perhaps we could create a
> > > similar noreturns.h under kernel/bpf and add all relevant functions to
> > > the fexit deny list.
> >
> > Pls avoid copy paste if possible.
> > Something like:
> >
> > BTF_SET_START(fexit_deny)
> > #define NORETURN(fn) BTF_ID(func, fn)
> > #include "../../tools/objtool/noreturns.h"
> >
> > Should work?
> >
> > Josh,
> > maybe we should move noreturns.h to some common location?
>
> The tools code is meant to be independent from the kernel, but it could
> be synced by copying it to both include/linux and tools/include/linux,
> and then make sure it stays in sync with tools/objtool/sync-check.sh.
>
> However, noreturns.h is manually edited, and only for some arches.  And
> even for those arches it's likely not exhaustive: we only add to it when
> we notice an objtool warning, and not all calls to noreturns will
> necessarily trigger a warning.  So I'd be careful about relying on that.
>
> Also that file is intended to be temporary, there have been proposals to
> add compiler support for annotating noreturns.  That hasn't been
> implemented yet, help wanted!
>
> I think the noreturn info is available in DWARF, can that be converted
> to BTF?

There are 30k+ noreturn funcs in dwarf. So pahole would need to have
some heuristic to filter out the noise.
It's doable, but we need to stop the bleeding the simplest way
and the fix would need to be backported too.
We can copy paste noreturns.h or #include it from the current location
for now and think of better ways for -next.

> Or is there some way to release outstanding trampolines in do_exit()?

we can walk all trampolines in do_exit,
but we'd still need to:
if (trampoline->func.addr == do_exit || ...addr == __x64_sys_exit ||
before dropping the refcnt.
which is the same thing, but worse.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-02-09  3:56                         ` Alexei Starovoitov
@ 2025-02-10  2:39                           ` Yafang Shao
  0 siblings, 0 replies; 36+ messages in thread
From: Yafang Shao @ 2025-02-10  2:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Josh Poimboeuf, Song Liu, bpf, Miroslav Benes, Petr Mladek,
	Jiri Kosina, Joe Lawrence, live-patching, LKML

On Sun, Feb 9, 2025 at 11:57 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sat, Feb 8, 2025 at 11:32 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > On Sat, Feb 08, 2025 at 07:47:12AM -0800, Alexei Starovoitov wrote:
> > > On Fri, Feb 7, 2025 at 10:42 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > On Fri, Feb 7, 2025 at 2:01 AM Song Liu <song@kernel.org> wrote:
> > > > >
> > > > > On Wed, Feb 5, 2025 at 6:55 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > [...]
> > > > > > > I think we should first understand why the trampoline is not
> > > > > > > freed.
> > > > > >
> > > > > > IIUC, the fexit works as follows,
> > > > > >
> > > > > >   bpf_trampoline
> > > > > >     + __bpf_tramp_enter
> > > > > >        + percpu_ref_get(&tr->pcref);
> > > > > >
> > > > > >     + call do_exit()
> > > > > >
> > > > > >     + __bpf_tramp_exit
> > > > > >        + percpu_ref_put(&tr->pcref);
> > > > > >
> > > > > > Since do_exit() never returns, the refcnt of the trampoline image is
> > > > > > never decremented, preventing it from being freed.
> > > > >
> > > > > Thanks for the explanation. In this case, I think it makes sense to
> > > > > disallow attaching fexit programs on __noreturn functions. I am not
> > > > > sure what is the best solution for it though.
> > > >
> > > > There is a tools/objtool/noreturns.h. Perhaps we could create a
> > > > similar noreturns.h under kernel/bpf and add all relevant functions to
> > > > the fexit deny list.
> > >
> > > Pls avoid copy paste if possible.
> > > Something like:
> > >
> > > BTF_SET_START(fexit_deny)
> > > #define NORETURN(fn) BTF_ID(func, fn)
> > > #include "../../tools/objtool/noreturns.h"
> > >
> > > Should work?
> > >
> > > Josh,
> > > maybe we should move noreturns.h to some common location?
> >
> > The tools code is meant to be independent from the kernel, but it could
> > be synced by copying it to both include/linux and tools/include/linux,
> > and then make sure it stays in sync with tools/objtool/sync-check.sh.
> >
> > However, noreturns.h is manually edited, and only for some arches.  And
> > even for those arches it's likely not exhaustive: we only add to it when
> > we notice an objtool warning, and not all calls to noreturns will
> > necessarily trigger a warning.  So I'd be careful about relying on that.
> >
> > Also that file is intended to be temporary, there have been proposals to
> > add compiler support for annotating noreturns.  That hasn't been
> > implemented yet, help wanted!
> >
> > I think the noreturn info is available in DWARF, can that be converted
> > to BTF?
>
> There are 30k+ noreturn funcs in dwarf. So pahole would need to have
> some heuristic to filter out the noise.
> It's doable, but we need to stop the bleeding the simplest way
> and the fix would need to be backported too.
> We can copy paste noreturns.h or #include it from the current location
> for now and think of better ways for -next.

That seems like the simplest way forward for now.
I will send a patch.
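
Roughly something along these lines -- an untested sketch just to confirm I
understand the suggestion; the set name, the include path, and where the
check lives are placeholders and may differ in the actual patch:

/* kernel/bpf/verifier.c (illustrative only) */
#include <linux/btf_ids.h>

/* Build a BTF ID set from objtool's noreturn list. */
BTF_SET_START(fexit_deny)
#define NORETURN(fn) BTF_ID(func, fn)
#include "../../tools/objtool/noreturns.h"
#undef NORETURN
BTF_SET_END(fexit_deny)

/* ... and in the attach-target checks, reject fexit on those functions: */
if (prog->expected_attach_type == BPF_TRACE_FEXIT &&
    btf_id_set_contains(&fexit_deny, btf_id))
	return -EINVAL;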


--
Regards
Yafang

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH 0/2] livepatch: Add support for hybrid mode
  2025-02-08  2:49               ` Yafang Shao
@ 2025-02-10  2:50                 ` Yafang Shao
  0 siblings, 0 replies; 36+ messages in thread
From: Yafang Shao @ 2025-02-10  2:50 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Miroslav Benes, jpoimboe, jikos, joe.lawrence, live-patching,
	linux-kernel

On Sat, Feb 8, 2025 at 10:49 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Fri, Feb 7, 2025 at 7:01 PM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Wed 2025-02-05 14:16:42, Yafang Shao wrote:
> > > On Tue, Feb 4, 2025 at 9:05 PM Petr Mladek <pmladek@suse.com> wrote:
> > > >
> > > > On Mon 2025-02-03 17:44:52, Yafang Shao wrote:
> > > > > On Fri, Jan 31, 2025 at 9:18 PM Miroslav Benes <mbenes@suse.cz> wrote:
> > > > > >
> > > > > > > >
> > > > > > > >   + What exactly is meant by frequent replacements (busy loop?, once a minute?)
> > > > > > >
> > > > > > > The script:
> > > > > > >
> > > > > > > #!/bin/bash
> > > > > > > while true; do
> > > > > > >         yum install -y ./kernel-livepatch-6.1.12-0.x86_64.rpm
> > > > > > >         ./apply_livepatch_61.sh # it will sleep 5s
> > > > > > >         yum erase -y kernel-livepatch-6.1.12-0.x86_64
> > > > > > >         yum install -y ./kernel-livepatch-6.1.6-0.x86_64.rpm
> > > > > > >         ./apply_livepatch_61.sh  # it will sleep 5s
> > > > > > > done
> > > > > >
> > > > > > A live patch application is a slowpath. It is expected not to run
> > > > > > frequently (in a relative sense).
> > > > >
> > > > > The frequency isn’t the main concern here; _scalability_ is the key issue.
> > > > > Running livepatches once per day (a relatively low frequency) across all of our
> > > > > production servers (hundreds of thousands) isn’t feasible. Instead, we need to
> > > > > periodically run tests on a subset of test servers.
> > > >
> > > > I am confused. The original problem was a system crash when
> > > > livepatching do_exit() function, see
> > > > https://lore.kernel.org/r/CALOAHbA9WHPjeZKUcUkwULagQjTMfqAdAg+akqPzbZ7Byc=qrw@mail.gmail.com
> > >
> > > Why do you view this patchset as a solution to the original problem?
> >
> > You discovered the hardlockup warnings when trying to reproduce the
> > original problem. At least, you mentioned this at
> > https://lore.kernel.org/r/20250122085146.41553-1-laoar.shao@gmail.com
> >
> > And using the hybrid module would allow to livepatch do_exit() only
> > once and do not touch it any longer. It is not the right solution
> > but it would be a workaround.
> >
> >
> > > > The rcu watchdog warning was first mentioned in this patchset.
> > > > Do you see rcu watchdog warning in production or just
> > > > with this artificial test, please?
> > >
> > > So, we shouldn’t run any artificial tests on the livepatch, correct?
> > > What exactly is the issue with these test cases?
> >
> > Some tests might be too artificial. They might find problems which
> > do not exist in practice.
> >
> > Disclaimer: I do not say that this is the case. You actually prove
> >         later in this mail that the hardlockup warnings happen
> >         even in production.
> >
> > Anyway, if an artificial test finds a potential problem and the fix is
> > complicated then we need to decide if it is worth it.
> >
> > It does not make sense to complicate the code too much when
> > the fixed problem does not happen in practice.
> >
> >   + Too complicated code might introduce regressions which are
> >     more serious than the original problem.
> >
> >   + Too complicated code increases the maintenance cost. It is
> >     more complicated to add new features or fix bugs.
> >
> >
> > > > > > If you stress it like this, it is quite
> > > > > > expected that it will have an impact. Especially on a large busy system.
> > > > >
> > > > > It seems you agree that the current atomic-replace process lacks scalability.
> > > > > When deploying a livepatch across a large fleet of servers, it’s impossible to
> > > > > ensure that the servers are idle, as their workloads are constantly varying and
> > > > > are not deterministic.
> > > >
> > > > Do you see the scalability problem in production, please?
> > >
> > > Yes, the livepatch transition was stalled.
> >
> > Good to know.
> >
> > >
> > > > And could you prove that it was caused by livepatching, please?
> > >
> > > When the livepatch transition is stalled, running `kpatch list` will
> > > display the stalled information.
> >
> > OK.
> >
> > > > > The challenges are very different when managing 1K servers versus 1M servers.
> > > > > Similarly, the issues differ significantly between patching a single
> > > > > function and
> > > > > patching 100 functions, especially when some of those functions are critical.
> > > > > That’s what scalability is all about.
> > > > >
> > > > > Since we transitioned from the old livepatch mode to the new
> > > > > atomic-replace mode,
> > > >
> > > > What do you mean with the old livepatch mode, please?
> > >
> > > $ kpatch-build -R
> >
> > I am not familiar with kpatch-build. OK, I see the following at
> > https://github.com/dynup/kpatch/blob/master/kpatch-build/kpatch-build
> >
> > echo "          -R, --non-replace       Disable replace patch (replace is on by default)" >&2
> >
> > > >
> > > > Did you allow to install more livepatches in parallel?
> > >
> > > No.
> >
> > I guess that there is a misunderstanding. I am sorry my wording was
> > not clear enough.
> >
> > By "installing" more livepatches in parallel I meant to "have enabled"
> > more livepatches in parallel. It is possible only when you do not
> > use the atomic replace.
> >
> > In other words, if you use "kpatch-build -R" then you could have
> > enabled more livepatches in parallel.
> >
> >
> > > > What was the motivation to switch to the atomic replace, please?
> > >
> > > This is the default behavior of kpatch [1] after upgrading to a new version.
> > >
> > > [1].  https://github.com/dynup/kpatch/tree/master
> >
> > OK. I wonder if the atomic replace simplified some actions for you.
>
> Actually, it simplifies the livepatch deployment since it only
> involves a single livepatch.
>
> >
> > I ask because the proposed "hybrid" model is very similar to the "old"
> > model which did not use the atomic replace.
>
> It’s essentially a hybrid of the old model and the atomic replace model.
>
> >
> > What are the advantages of the "hybrid" model over the "old" model, please?
>
> - Advantages compared to the old model
>   In the old model, it’s not possible to replace a specific livepatch
> with a new one. In the hybrid model, however, you can replace specific
> livepatches as needed.
>
> - Advantages and disadvantages compared to the atomic replace model
>    - Advantage
>      In the atomic replace model, you must replace all old
> livepatches, regardless of whether they are relevant to the new
> livepatch. In the hybrid model, you only need to replace the relevant
> livepatches.
>
>    - Disadvantage
>      In the atomic replace model, you only need to maintain a single
> livepatch. However, in the hybrid model, you need to manage multiple
> livepatches.
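
For illustration, a hypothetical module under the proposed hybrid mode could
opt out of replacement roughly like this (the "replaceable" field name is
taken from patch 1/2 of this series; its name and default may change in the
final version):

#include <linux/livepatch.h>

/* 'objs' is the usual klp_object array, defined elsewhere in the module. */
static struct klp_patch patch = {
	.mod = THIS_MODULE,
	.objs = objs,
	.replace = false,	/* do not atomically replace other patches */
	.replaceable = false,	/* proposed: a later 'replace' patch must leave this one alone */
};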
>
> >
> >
> > > > > our SREs have consistently reported that one or more servers become
> > > > > stalled during
> > > > > the upgrade (replacement).
> > > >
> > > > What is SRE, please?
> > >
> > > From Wikipedia: https://en.wikipedia.org/wiki/Site_reliability_engineering
> >
> > Good to know.
> >
> > > > Could you please show some log from a production system?
> > >
> > > When the SREs initially reported that the livepatch transition was
> > > stalled, I simply advised them to try again. However, after
> > > experiencing these crashes, I dug deeper into the livepatch code and
> > > realized that scalability is a concern. Indeed, periodically
> > > replacing an old livepatch triggers RCU warnings on our production
> > > servers.
> > >
> > > [Wed Feb  5 10:56:10 2025] livepatch: enabling patch 'livepatch_61_release6'
> > > [Wed Feb  5 10:56:10 2025] livepatch: 'livepatch_61_release6':
> > > starting patching transition
> > > [Wed Feb  5 10:56:24 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> > > 1126113 is 10078 jiffies old.
> > > [Wed Feb  5 10:56:38 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> > > 1126117 is 10199 jiffies old.
> > > [Wed Feb  5 10:56:52 2025] rcu_tasks_wait_gp: rcu_tasks grace period
> > > 1126121 is 10047 jiffies old.
> > > [Wed Feb  5 10:56:57 2025] livepatch: 'livepatch_61_release6': patching complete
> >
> > I guess that this happens primarily because there are many processes
> > running in kernel code:
> >
> >        + many processes => long task list
> >        + in kernel code => need to check stack
> >
> > I wondered how much it is caused by livepatching do_exit() which
> > takes tasklist_lock. The idea was:
>
> I’ll drop the changes in do_exit() and run the test again.

The RCU warning is still triggered even when do_exit() is not
livepatched, though it occurs slightly less often.

However, with the following change the RCU warning no longer appears,
even when do_exit() is included in the livepatch and the test is run
at a high frequency (once every 5 seconds).

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 30187b1..c436aa6 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -400,11 +400,22 @@ void klp_try_complete_transition(void)
         * unless the patch includes changes to a very common function.
         */
        read_lock(&tasklist_lock);
-       for_each_process_thread(g, task)
+       for_each_process_thread(g, task) {
+               if (task->patch_state == klp_target_state)
+                       continue;
                if (!klp_try_switch_task(task))
                        complete = false;
+
+               if (need_resched()) {
+                       complete = false;
+                       break;
+               }
+       }
        read_unlock(&tasklist_lock);

+       /* The above operation might be expensive. */
+       cond_resched();
+
        /*
         * Ditto for the idle "swapper" tasks.
         */
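
Bailing out of the loop early should be safe because an incomplete
transition is simply retried later from the delayed work in
kernel/livepatch/transition.c (quoted roughly from memory, so treat it
as approximate):

/*
 * When klp_try_complete_transition() cannot switch every task, it
 * re-arms itself via:
 *   schedule_delayed_work(&klp_transition_work, round_jiffies_relative(HZ));
 * and the work item just tries again:
 */
static void klp_transition_work_fn(struct work_struct *work)
{
	mutex_lock(&klp_mutex);

	if (klp_transition_patch)
		klp_try_complete_transition();

	mutex_unlock(&klp_mutex);
}
static DECLARE_DELAYED_WORK(klp_transition_work, klp_transition_work_fn);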

-- 
Regards
Yafang

^ permalink raw reply related	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2025-02-10  2:50 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-27  6:35 [RFC PATCH 0/2] livepatch: Add support for hybrid mode Yafang Shao
2025-01-27  6:35 ` [RFC PATCH 1/2] livepatch: Add replaceable attribute Yafang Shao
2025-01-27  6:35 ` [RFC PATCH 2/2] livepatch: Implement livepatch hybrid mode Yafang Shao
2025-01-27 14:31   ` Petr Mladek
2025-01-27 15:34     ` Yafang Shao
2025-02-04 13:21       ` Petr Mladek
2025-02-05  2:54         ` Yafang Shao
2025-02-05 16:03           ` Petr Mladek
2025-02-06  2:35             ` Yafang Shao
2025-02-07 13:58               ` Petr Mladek
2025-02-08  3:08                 ` Yafang Shao
2025-02-07  2:31   ` Josh Poimboeuf
2025-02-07  3:16     ` Yafang Shao
2025-02-07  9:36       ` Petr Mladek
2025-02-08  2:14         ` Yafang Shao
2025-02-07 16:59       ` Josh Poimboeuf
2025-02-08  3:38         ` Yafang Shao
2025-01-27 13:46 ` [RFC PATCH 0/2] livepatch: Add support for " Petr Mladek
2025-01-27 14:22   ` Yafang Shao
2025-01-31 13:18     ` Miroslav Benes
2025-02-03  9:44       ` Yafang Shao
2025-02-03 21:53         ` Song Liu
2025-02-05 14:42           ` Yafang Shao
2025-02-05 17:59             ` Song Liu
2025-02-06  2:54               ` Yafang Shao
2025-02-06 18:00                 ` Song Liu
2025-02-08  6:41                   ` Yafang Shao
2025-02-08 15:47                     ` Alexei Starovoitov
2025-02-08 19:32                       ` Josh Poimboeuf
2025-02-09  3:56                         ` Alexei Starovoitov
2025-02-10  2:39                           ` Yafang Shao
2025-02-04 13:05         ` Petr Mladek
2025-02-05  6:16           ` Yafang Shao
2025-02-07 11:00             ` Petr Mladek
2025-02-08  2:49               ` Yafang Shao
2025-02-10  2:50                 ` Yafang Shao
