* [BUG] Kernel Crash during replacement of livepatch patching do_exit()
@ 2025-01-21 9:38 Yafang Shao
2025-01-22 6:36 ` Yafang Shao
2025-01-22 10:24 ` Petr Mladek
0 siblings, 2 replies; 12+ messages in thread
From: Yafang Shao @ 2025-01-21 9:38 UTC (permalink / raw)
To: Josh Poimboeuf, jikos, Miroslav Benes, Petr Mladek, Joe Lawrence,
live-patching
Hello,
We encountered a panic while upgrading our livepatch, specifically
replacing an old livepatch with a new version on our production
servers.
[156821.048318] livepatch: enabling patch 'livepatch_61_release12'
[156821.061580] livepatch: 'livepatch_61_release12': starting patching
transition
[156821.122212] livepatch: 'livepatch_61_release12': patching complete
[156821.175871] kernel tried to execute NX-protected page - exploit
attempt? (uid: 10524)
[156821.176011] BUG: unable to handle page fault for address: ffffffffc0ded7fa
[156821.176121] #PF: supervisor instruction fetch in kernel mode
[156821.176211] #PF: error_code(0x0011) - permissions violation
[156821.176302] PGD 986c15067 P4D 986c15067 PUD 986c17067 PMD
184f53b067 PTE 800000194c08e163
[156821.176435] Oops: 0011 [#1] PREEMPT SMP NOPTI
[156821.176506] CPU: 70 PID: 783972 Comm: java Kdump: loaded Tainted:
G S W O K 6.1.52-3 #3.pdd
[156821.176654] Hardware name: Inspur SA5212M5/SA5212M5, BIOS 4.1.20 05/05/2021
[156821.176766] RIP: 0010:0xffffffffc0ded7fa
[156821.176841] Code: 0a 00 00 48 89 42 08 48 89 10 4d 89 a6 08 0a 00
00 4c 89 f7 4d 89 a6 10 0a 00 00 4d 8d a7 08 0a 00 00 4d 89 fe e8 00
00 00 00 <49> 8b 87 08 0a 00 00 48 2d 08 0a 00 00 4d 39 ec 75 aa 48 89
df e8
[156821.177138] RSP: 0018:ffffba6f273dbd30 EFLAGS: 00010282
[156821.177222] RAX: 0000000000000000 RBX: ffff94cd316f0000 RCX:
000000008020000d
[156821.177338] RDX: 000000008020000e RSI: 000000008020000d RDI:
ffff94cd316f0000
[156821.177452] RBP: ffffba6f273dbd88 R08: ffff94cd316f13f8 R09:
0000000000000001
[156821.177567] R10: 0000000000000000 R11: 0000000000000000 R12:
ffffba6f273dbd48
[156821.177682] R13: ffffba6f273dbd48 R14: ffffba6f273db340 R15:
ffffba6f273db340
[156821.177797] FS: 0000000000000000(0000) GS:ffff94e321180000(0000)
knlGS:0000000000000000
[156821.177926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[156821.178019] CR2: ffffffffc0ded7fa CR3: 000000015909c006 CR4:
00000000007706e0
[156821.178133] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[156821.178248] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[156821.178363] PKRU: 55555554
[156821.178407] Call Trace:
[156821.178449] <TASK>
[156821.178492] ? show_regs.cold+0x1a/0x1f
[156821.178559] ? __die_body+0x20/0x70
[156821.178617] ? __die+0x2b/0x37
[156821.178669] ? page_fault_oops+0x136/0x2b0
[156821.178734] ? search_bpf_extables+0x63/0x90
[156821.178805] ? search_exception_tables+0x5f/0x70
[156821.178881] ? kernelmode_fixup_or_oops+0xa2/0x120
[156821.178957] ? __bad_area_nosemaphore+0x176/0x1b0
[156821.179034] ? bad_area_nosemaphore+0x16/0x20
[156821.179105] ? do_kern_addr_fault+0x77/0x90
[156821.179175] ? exc_page_fault+0xc6/0x160
[156821.179239] ? asm_exc_page_fault+0x27/0x30
[156821.179310] do_group_exit+0x35/0x90
[156821.179371] get_signal+0x909/0x950
[156821.179429] ? wake_up_q+0x50/0x90
[156821.179486] arch_do_signal_or_restart+0x34/0x2a0
[156821.183207] exit_to_user_mode_prepare+0x149/0x1b0
[156821.186963] syscall_exit_to_user_mode+0x1e/0x50
[156821.190723] do_syscall_64+0x48/0x90
[156821.194500] entry_SYSCALL_64_after_hwframe+0x64/0xce
[156821.198195] RIP: 0033:0x7f967feb5a35
[156821.201769] Code: Unable to access opcode bytes at 0x7f967feb5a0b.
[156821.205283] RSP: 002b:00007f96664ee670 EFLAGS: 00000246 ORIG_RAX:
00000000000000ca
[156821.208790] RAX: fffffffffffffe00 RBX: 00007f967808a650 RCX:
00007f967feb5a35
[156821.212305] RDX: 000000000000000f RSI: 0000000000000080 RDI:
00007f967808a654
[156821.215785] RBP: 00007f96664ee6c0 R08: 00007f967808a600 R09:
0000000000000007
[156821.219273] R10: 0000000000000000 R11: 0000000000000246 R12:
00007f967808a600
[156821.222727] R13: 00007f967808a628 R14: 00007f967f691220 R15:
00007f96664ee750
[156821.226155] </TASK>
[156821.229470] Modules linked in: livepatch_61_release12(OK)
ebtable_filter ebtables af_packet_diag netlink_diag xt_DSCP xt_owner
iptable_mangle iptable_raw xt_CT cls_bpf sch_ingress bpf_preload
binfmt_misc raw_diag unix_diag tcp_diag udp_diag inet_diag
iptable_filter bpfilter xt_conntrack nf_nat nf_conntrack_netlink
nfnetlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay af_packet
bonding tls intel_rapl_msr intel_rapl_common intel_uncore_frequency
intel_uncore_frequency_common isst_if_common skx_edac nfit
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
rapl vfat fat intel_cstate iTCO_wdt xfs intel_uncore pcspkr ses
enclosure mei_me i2c_i801 input_leds lpc_ich acpi_ipmi ioatdma
i2c_smbus mei mfd_core dca wmi ipmi_si intel_pch_thermal ipmi_devintf
ipmi_msghandler acpi_cpufreq acpi_pad acpi_power_meter ip_tables ext4
mbcache jbd2 sd_mod sg mpt3sas raid_class scsi_transport_sas
megaraid_sas crct10dif_pclmul crc32_pclmul crc32c_intel
polyval_clmulni polyval_generic
[156821.229555] ghash_clmulni_intel sha512_ssse3 aesni_intel
crypto_simd cryptd nvme nvme_core t10_pi i40e ptp pps_core ahci
libahci libata deflate zlib_deflate
[156821.259012] Unloaded tainted modules:
livepatch_61_release6(OK):14089 livepatch_61_release12(OK):14088 [last
unloaded: livepatch_61_release6(OK)]
[156821.275421] CR2: ffffffffc0ded7fa
Although the issue was observed on an older 6.1 kernel, I suspect it
persists in the upstream kernel as well. Due to the significant effort
required to deploy the upstream kernel in our production environment,
I have not yet attempted to reproduce the issue with the latest
upstream version.
Crash Analysis:
=============
crash> bt
PID: 783972 TASK: ffff94cd316f0000 CPU: 70 COMMAND: "java"
#0 [ffffba6f273db9a8] machine_kexec at ffffffff990632ad
#1 [ffffba6f273dba08] __crash_kexec at ffffffff9915c8af
#2 [ffffba6f273dbad0] crash_kexec at ffffffff9915db0c
#3 [ffffba6f273dbae0] oops_end at ffffffff99024bc9
#4 [ffffba6f273dbaf0] _MODULE_START_livepatch_61_release6 at
ffffffffc0ded7fa [livepatch_61_release6]
#5 [ffffba6f273dbb80] _MODULE_START_livepatch_61_release6 at
ffffffffc0ded7fa [livepatch_61_release6]
#6 [ffffba6f273dbbf8] _MODULE_START_livepatch_61_release6 at
ffffffffc0ded7fa [livepatch_61_release6]
#7 [ffffba6f273dbc80] asm_exc_page_fault at ffffffff99c00bb7
[exception RIP: _MODULE_START_livepatch_61_release6+14330]
RIP: ffffffffc0ded7fa RSP: ffffba6f273dbd30 RFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff94cd316f0000 RCX: 000000008020000d
RDX: 000000008020000e RSI: 000000008020000d RDI: ffff94cd316f0000
RBP: ffffba6f273dbd88 R8: ffff94cd316f13f8 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffffba6f273dbd48
R13: ffffba6f273dbd48 R14: ffffba6f273db340 R15: ffffba6f273db340
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffffba6f273dbd90] do_group_exit at ffffffff99092395
#9 [ffffba6f273dbdc0] get_signal at ffffffff990a1c69
#10 [ffffba6f273dbdd0] wake_up_q at ffffffff990ce060
#11 [ffffba6f273dbe48] arch_do_signal_or_restart at ffffffff990209b4
#12 [ffffba6f273dbee0] exit_to_user_mode_prepare at ffffffff9912fdf9
#13 [ffffba6f273dbf20] syscall_exit_to_user_mode at ffffffff99aeb87e
#14 [ffffba6f273dbf38] do_syscall_64 at ffffffff99ae70b8
#15 [ffffba6f273dbf50] entry_SYSCALL_64_after_hwframe at ffffffff99c000dc
RIP: 00007f967feb5a35 RSP: 00007f96664ee670 RFLAGS: 00000246
RAX: fffffffffffffe00 RBX: 00007f967808a650 RCX: 00007f967feb5a35
RDX: 000000000000000f RSI: 0000000000000080 RDI: 00007f967808a654
RBP: 00007f96664ee6c0 R8: 00007f967808a600 R9: 0000000000000007
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f967808a600
R13: 00007f967808a628 R14: 00007f967f691220 R15: 00007f96664ee750
ORIG_RAX: 00000000000000ca CS: 0033 SS: 002b
The crash occurred at address 0xffffffffc0ded7fa, which lies within
livepatch_61_release6. However, the kernel log shows that this module
had already been replaced by livepatch_61_release12. We can verify
this with the crash utility:
crash> dis do_exit
dis: do_exit: duplicate text symbols found:
ffffffff99091700 (T) do_exit
/root/rpmbuild/BUILD/kernel-6.1.52/kernel/exit.c: 806
ffffffffc0e038d0 (t) do_exit [livepatch_61_release12]
crash> dis ffffffff99091700
0xffffffff99091700 <do_exit>: call 0xffffffffc08b9000
0xffffffff99091705 <do_exit+5>: push %rbp
Here, do_exit() has been patched by livepatch_61_release12: the call
instruction at its entry goes to the ftrace trampoline at
0xffffffffc08b9000, which redirects execution to the new
implementation.
Next, we checked the klp_ops struct to verify the livepatch operations:
crash> list klp_ops.node -H klp_ops -s klp_ops -x
...
ffff94f3ab8ec900
struct klp_ops {
node = {
next = 0xffff94f3ab8ecc00,
prev = 0xffff94f3ab8ed500
},
func_stack = {
next = 0xffff94cd4a856238,
prev = 0xffff94cd4a856238
},
...
crash> struct -o klp_func.stack_node
struct klp_func {
[112] struct list_head stack_node;
}
crash> klp_func ffff94cd4a8561c8
struct klp_func {
old_name = 0xffffffffc0e086c8 "do_exit",
new_func = 0xffffffffc0e038d0,
old_sympos = 0,
old_func = 0xffffffff99091700 <do_exit>,
kobj = {
name = 0xffff94f379c519c0 "do_exit,1",
entry = {
next = 0xffff94cd4a8561f0,
prev = 0xffff94cd4a8561f0
},
parent = 0xffff94e487064ad8,
So do_exit() from livepatch_61_release6 had been successfully replaced
by the updated version in livepatch_61_release12, yet the crashing
task was still executing the old do_exit() from livepatch_61_release6.
This was confirmed when we checked the symbol mapping for livepatch_61_release6:
crash> sym -m livepatch_61_release6
ffffffffc0dea000 MODULE START: livepatch_61_release6
ffffffffc0dff000 MODULE END: livepatch_61_release6
We identified that the crash occurred at offset 0x37fa within the old
livepatch module, inside its do_exit() function, immediately after the
call to release_task(). (Note that the disassembly below comes from a
freshly loaded copy of livepatch_61_release6, so the absolute
addresses differ, but the offsets are the same.)
0xffffffffc0db07eb <do_exit+1803>: lea 0xa08(%r15),%r12
0xffffffffc0db07f2 <do_exit+1810>: mov %r15,%r14
0xffffffffc0db07f5 <do_exit+1813>: call 0xffffffff9a08fc00 <release_task>
0xffffffffc0db07fa <do_exit+1818>: mov 0xa08(%r15),%rax
<<<<<<<
0xffffffffc0db0801 <do_exit+1825>: sub $0xa08,%rax
Interestingly, every crash occurred immediately after returning from
release_task(): out of roughly 50,000 servers, four crashed, all at
this same spot.
This suggests a potential synchronization issue between release_task()
and klp_try_complete_transition(). It is possible that
klp_try_switch_task() failed to detect the task executing
release_task(), or that klp_synchronize_transition() failed to wait
for release_task() to finish.
I suspect we need a change along the following lines:
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -220,6 +220,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
kprobe_flush_task(tsk);
rethook_flush_task(tsk);
+ klp_flush_task(tsk);
perf_event_delayed_put(tsk);
trace_sched_process_free(tsk);
put_task_struct(tsk);
Any suggestions?
--
Regards
Yafang
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Kernel Crash during replacement of livepatch patching do_exit()
2025-01-21 9:38 [BUG] Kernel Crash during replacement of livepatch patching do_exit() Yafang Shao
@ 2025-01-22 6:36 ` Yafang Shao
2025-01-22 11:45 ` Petr Mladek
2025-01-22 10:24 ` Petr Mladek
1 sibling, 1 reply; 12+ messages in thread
From: Yafang Shao @ 2025-01-22 6:36 UTC (permalink / raw)
To: Josh Poimboeuf, jikos, Miroslav Benes, Petr Mladek, Joe Lawrence,
live-patching
On Tue, Jan 21, 2025 at 5:38 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> Hello,
>
> We encountered a panic while upgrading our livepatch, specifically
> replacing an old livepatch with a new version on our production
> servers.
>
> [...]
>
> This suggests a potential synchronization issue between release_task()
> and klp_try_complete_transition(). It is possible that
> klp_try_switch_task() failed to detect the task executing
> release_task(), or that klp_synchronize_transition() failed to wait
> for release_task() to finish.
>
> [...]
>
> Any suggestions ?
Hello,
After further analysis, my best guess is that the task stack is being
freed in release_task() while klp_try_switch_task() is still
attempting to access it. It seems we should consider calling
try_get_task_stack() in klp_check_stack() to address this.
I plan to reproduce the issue with this change. Please let me know if
my assessment is incorrect.
--
Regards
Yafang
* Re: [BUG] Kernel Crash during replacement of livepatch patching do_exit()
2025-01-21 9:38 [BUG] Kernel Crash during replacement of livepatch patching do_exit() Yafang Shao
2025-01-22 6:36 ` Yafang Shao
@ 2025-01-22 10:24 ` Petr Mladek
1 sibling, 0 replies; 12+ messages in thread
From: Petr Mladek @ 2025-01-22 10:24 UTC (permalink / raw)
To: Yafang Shao
Cc: Josh Poimboeuf, jikos, Miroslav Benes, Joe Lawrence,
live-patching
On Tue 2025-01-21 17:38:45, Yafang Shao wrote:
> Hello,
>
> We encountered a panic while upgrading our livepatch, specifically
> replacing an old livepatch with a new version on our production
> servers.
>
> [... full crash report and analysis quoted verbatim; see the original
> message above ...]
> livepatch_61_release6, so while the address differs, the offset
> remains the same.)
>
> 0xffffffffc0db07eb <do_exit+1803>: lea 0xa08(%r15),%r12
> 0xffffffffc0db07f2 <do_exit+1810>: mov %r15,%r14
> 0xffffffffc0db07f5 <do_exit+1813>: call 0xffffffff9a08fc00 <release_task>
> 0xffffffffc0db07fa <do_exit+1818>: mov 0xa08(%r15),%rax
> <<<<<<<
> 0xffffffffc0db0801 <do_exit+1825>: sub $0xa08,%rax
>
> Interestingly, the crash occurred immediately after returning from the
> release_task() function. Four servers crashed out of around 50K, all
> after returning from release_task().
release_task() removes the process from the task list. It happens in:
+ release_task()
+ __exit_signal()
+ __unhash_process()
As a result, the task is no longer visited by the
for_each_process_thread() loop in klp_complete_transition(). It becomes
invisible to the livepatching core and can't block the transition.
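The race can be sketched in a few lines of plain userspace C. This is an illustrative simulation, not kernel code: the struct fields and function names are stand-ins for the real livepatch internals, chosen only to show why an unhashed task cannot block the transition.

```c
/*
 * Minimal userspace sketch (NOT kernel code) of the race described
 * above: a task that __unhash_process() has already removed from the
 * task list is never visited by the transition loop, so the transition
 * can complete while that task still runs the old function. All names
 * are illustrative stand-ins for the real livepatch internals.
 */
struct task {
	int on_task_list;	/* cleared by __unhash_process() */
	int patched;		/* switched to the new livepatch */
	int running_old;	/* still executing the old do_exit() */
};

/* Stand-in for klp_try_complete_transition(): only tasks still on the
 * task list can block the transition. */
static int try_complete_transition(const struct task *tasks, int n)
{
	for (int i = 0; i < n; i++)
		if (tasks[i].on_task_list && !tasks[i].patched)
			return 0;	/* transition must wait */
	return 1;			/* "patching complete" */
}

int transition_misses_unhashed_task(void)
{
	struct task tasks[] = {
		{ 1, 1, 0 },	/* already switched */
		{ 1, 1, 0 },	/* already switched */
		{ 0, 0, 1 },	/* exiting: unhashed, still in old code */
	};

	/* Completes even though tasks[2] still runs the old do_exit();
	 * freeing the old module now is a use-after-free. */
	return try_complete_transition(tasks, 3) && tasks[2].running_old;
}
```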
> This suggests a potential synchronization issue between release_task()
> and klp_try_complete_transition(). It is possible that
> klp_try_switch_task() failed to detect the task executing
> release_task(), or that klp_synchronize_transition() failed to wait
> for release_task() to finish.
>
> I suspect we need a change along these lines:
>
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -220,6 +220,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
>
> kprobe_flush_task(tsk);
> rethook_flush_task(tsk);
> + klp_flush_task(tsk);
What exactly should klp_flush_task() do, please?
> perf_event_delayed_put(tsk);
> trace_sched_process_free(tsk);
> put_task_struct(tsk);
>
> Any suggestions?
I am playing with some ideas for how to block the transition until
delayed_put_task_struct() has been called for all pending processes.
But it is not trivial.
Anyway, livepatching of do_exit() is not safe at the moment.
I think the same problem exists with fork().
In principle, it might be dangerous to livepatch functions which
might add/remove tasks to/from the task list used by
for_each_process_thread().
I am not sure at the moment how complicated it would be
to make this safe.
Best Regards,
Petr
PS: It was a great analysis. The information from the crash dump
was really helpful.
* Re: [BUG] Kernel Crash during replacement of livepatch patching do_exit()
2025-01-22 6:36 ` Yafang Shao
@ 2025-01-22 11:45 ` Petr Mladek
2025-01-22 13:30 ` Yafang Shao
0 siblings, 1 reply; 12+ messages in thread
From: Petr Mladek @ 2025-01-22 11:45 UTC (permalink / raw)
To: Yafang Shao
Cc: Josh Poimboeuf, jikos, Miroslav Benes, Joe Lawrence,
live-patching
On Wed 2025-01-22 14:36:55, Yafang Shao wrote:
> On Tue, Jan 21, 2025 at 5:38 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > Hello,
> >
> > We encountered a panic while upgrading our livepatch, specifically
> > replacing an old livepatch with a new version on our production
> > servers.
> >
> > [156821.048318] livepatch: enabling patch 'livepatch_61_release12'
> > [156821.061580] livepatch: 'livepatch_61_release12': starting patching
> > transition
> > [156821.122212] livepatch: 'livepatch_61_release12': patching complete
> > [156821.175871] kernel tried to execute NX-protected page - exploit
> > attempt? (uid: 10524)
> > [156821.176011] BUG: unable to handle page fault for address: ffffffffc0ded7fa
> > [156821.176121] #PF: supervisor instruction fetch in kernel mode
> > [156821.176211] #PF: error_code(0x0011) - permissions violation
> > [156821.176302] PGD 986c15067 P4D 986c15067 PUD 986c17067 PMD
> > 184f53b067 PTE 800000194c08e163
> > [156821.176435] Oops: 0011 [#1] PREEMPT SMP NOPTI
> > [156821.176506] CPU: 70 PID: 783972 Comm: java Kdump: loaded Tainted:
> > G S W O K 6.1.52-3 #3.pdd
> > [156821.176654] Hardware name: Inspur SA5212M5/SA5212M5, BIOS 4.1.20 05/05/2021
> > [156821.176766] RIP: 0010:0xffffffffc0ded7fa
> > [156821.176841] Code: 0a 00 00 48 89 42 08 48 89 10 4d 89 a6 08 0a 00
> > 00 4c 89 f7 4d 89 a6 10 0a 00 00 4d 8d a7 08 0a 00 00 4d 89 fe e8 00
> > 00 00 00 <49> 8b 87 08 0a 00 00 48 2d 08 0a 00 00 4d 39 ec 75 aa 48 89
> > df e8
> > [156821.177138] RSP: 0018:ffffba6f273dbd30 EFLAGS: 00010282
> > [156821.177222] RAX: 0000000000000000 RBX: ffff94cd316f0000 RCX:
> > 000000008020000d
> > [156821.177338] RDX: 000000008020000e RSI: 000000008020000d RDI:
> > ffff94cd316f0000
> > [156821.177452] RBP: ffffba6f273dbd88 R08: ffff94cd316f13f8 R09:
> > 0000000000000001
> > [156821.177567] R10: 0000000000000000 R11: 0000000000000000 R12:
> > ffffba6f273dbd48
> > [156821.177682] R13: ffffba6f273dbd48 R14: ffffba6f273db340 R15:
> > ffffba6f273db340
> > [156821.177797] FS: 0000000000000000(0000) GS:ffff94e321180000(0000)
> > knlGS:0000000000000000
> > [156821.177926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [156821.178019] CR2: ffffffffc0ded7fa CR3: 000000015909c006 CR4:
> > 00000000007706e0
> > [156821.178133] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [156821.178248] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > 0000000000000400
> > [156821.178363] PKRU: 55555554
> > [156821.178407] Call Trace:
> > [156821.178449] <TASK>
> > [156821.178492] ? show_regs.cold+0x1a/0x1f
> > [156821.178559] ? __die_body+0x20/0x70
> > [156821.178617] ? __die+0x2b/0x37
> > [156821.178669] ? page_fault_oops+0x136/0x2b0
> > [156821.178734] ? search_bpf_extables+0x63/0x90
> > [156821.178805] ? search_exception_tables+0x5f/0x70
> > [156821.178881] ? kernelmode_fixup_or_oops+0xa2/0x120
> > [156821.178957] ? __bad_area_nosemaphore+0x176/0x1b0
> > [156821.179034] ? bad_area_nosemaphore+0x16/0x20
> > [156821.179105] ? do_kern_addr_fault+0x77/0x90
> > [156821.179175] ? exc_page_fault+0xc6/0x160
> > [156821.179239] ? asm_exc_page_fault+0x27/0x30
> > [156821.179310] do_group_exit+0x35/0x90
> > [156821.179371] get_signal+0x909/0x950
> > [156821.179429] ? wake_up_q+0x50/0x90
> > [156821.179486] arch_do_signal_or_restart+0x34/0x2a0
> > [156821.183207] exit_to_user_mode_prepare+0x149/0x1b0
> > [156821.186963] syscall_exit_to_user_mode+0x1e/0x50
> > [156821.190723] do_syscall_64+0x48/0x90
> > [156821.194500] entry_SYSCALL_64_after_hwframe+0x64/0xce
> > [156821.198195] RIP: 0033:0x7f967feb5a35
> > [156821.201769] Code: Unable to access opcode bytes at 0x7f967feb5a0b.
> > [156821.205283] RSP: 002b:00007f96664ee670 EFLAGS: 00000246 ORIG_RAX:
> > 00000000000000ca
> > [156821.208790] RAX: fffffffffffffe00 RBX: 00007f967808a650 RCX:
> > 00007f967feb5a35
> > [156821.212305] RDX: 000000000000000f RSI: 0000000000000080 RDI:
> > 00007f967808a654
> > [156821.215785] RBP: 00007f96664ee6c0 R08: 00007f967808a600 R09:
> > 0000000000000007
> > [156821.219273] R10: 0000000000000000 R11: 0000000000000246 R12:
> > 00007f967808a600
> > [156821.222727] R13: 00007f967808a628 R14: 00007f967f691220 R15:
> > 00007f96664ee750
> > [156821.226155] </TASK>
> > [156821.229470] Modules linked in: livepatch_61_release12(OK)
> > ebtable_filter ebtables af_packet_diag netlink_diag xt_DSCP xt_owner
> > iptable_mangle iptable_raw xt_CT cls_bpf sch_ingress bpf_preload
> > binfmt_misc raw_diag unix_diag tcp_diag udp_diag inet_diag
> > iptable_filter bpfilter xt_conntrack nf_nat nf_conntrack_netlink
> > nfnetlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay af_packet
> > bonding tls intel_rapl_msr intel_rapl_common intel_uncore_frequency
> > intel_uncore_frequency_common isst_if_common skx_edac nfit
> > x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
> > rapl vfat fat intel_cstate iTCO_wdt xfs intel_uncore pcspkr ses
> > enclosure mei_me i2c_i801 input_leds lpc_ich acpi_ipmi ioatdma
> > i2c_smbus mei mfd_core dca wmi ipmi_si intel_pch_thermal ipmi_devintf
> > ipmi_msghandler acpi_cpufreq acpi_pad acpi_power_meter ip_tables ext4
> > mbcache jbd2 sd_mod sg mpt3sas raid_class scsi_transport_sas
> > megaraid_sas crct10dif_pclmul crc32_pclmul crc32c_intel
> > polyval_clmulni polyval_generic
> > [156821.229555] ghash_clmulni_intel sha512_ssse3 aesni_intel
> > crypto_simd cryptd nvme nvme_core t10_pi i40e ptp pps_core ahci
> > libahci libata deflate zlib_deflate
> > [156821.259012] Unloaded tainted modules:
> > livepatch_61_release6(OK):14089 livepatch_61_release12(OK):14088 [last
> > unloaded: livepatch_61_release6(OK)]
> > [156821.275421] CR2: ffffffffc0ded7fa
> >
> > Although the issue was observed on an older 6.1 kernel, I suspect it
> > persists in the upstream kernel as well. Due to the significant effort
> > required to deploy the upstream kernel in our production environment,
> > I have not yet attempted to reproduce the issue with the latest
> > upstream version.
> >
> > Crash Analysis:
> > =============
> >
> > crash> bt
> > PID: 783972 TASK: ffff94cd316f0000 CPU: 70 COMMAND: "java"
> > #0 [ffffba6f273db9a8] machine_kexec at ffffffff990632ad
> > #1 [ffffba6f273dba08] __crash_kexec at ffffffff9915c8af
> > #2 [ffffba6f273dbad0] crash_kexec at ffffffff9915db0c
> > #3 [ffffba6f273dbae0] oops_end at ffffffff99024bc9
> > #4 [ffffba6f273dbaf0] _MODULE_START_livepatch_61_release6 at
> > ffffffffc0ded7fa [livepatch_61_release6]
> > #5 [ffffba6f273dbb80] _MODULE_START_livepatch_61_release6 at
> > ffffffffc0ded7fa [livepatch_61_release6]
> > #6 [ffffba6f273dbbf8] _MODULE_START_livepatch_61_release6 at
> > ffffffffc0ded7fa [livepatch_61_release6]
> > #7 [ffffba6f273dbc80] asm_exc_page_fault at ffffffff99c00bb7
> > [exception RIP: _MODULE_START_livepatch_61_release6+14330]
> > RIP: ffffffffc0ded7fa RSP: ffffba6f273dbd30 RFLAGS: 00010282
> > RAX: 0000000000000000 RBX: ffff94cd316f0000 RCX: 000000008020000d
> > RDX: 000000008020000e RSI: 000000008020000d RDI: ffff94cd316f0000
> > RBP: ffffba6f273dbd88 R8: ffff94cd316f13f8 R9: 0000000000000001
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffffba6f273dbd48
> > R13: ffffba6f273dbd48 R14: ffffba6f273db340 R15: ffffba6f273db340
> > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > #8 [ffffba6f273dbd90] do_group_exit at ffffffff99092395
> > #9 [ffffba6f273dbdc0] get_signal at ffffffff990a1c69
> > #10 [ffffba6f273dbdd0] wake_up_q at ffffffff990ce060
> > #11 [ffffba6f273dbe48] arch_do_signal_or_restart at ffffffff990209b4
> > #12 [ffffba6f273dbee0] exit_to_user_mode_prepare at ffffffff9912fdf9
> > #13 [ffffba6f273dbf20] syscall_exit_to_user_mode at ffffffff99aeb87e
> > #14 [ffffba6f273dbf38] do_syscall_64 at ffffffff99ae70b8
> > #15 [ffffba6f273dbf50] entry_SYSCALL_64_after_hwframe at ffffffff99c000dc
> > RIP: 00007f967feb5a35 RSP: 00007f96664ee670 RFLAGS: 00000246
> > RAX: fffffffffffffe00 RBX: 00007f967808a650 RCX: 00007f967feb5a35
> > RDX: 000000000000000f RSI: 0000000000000080 RDI: 00007f967808a654
> > RBP: 00007f96664ee6c0 R8: 00007f967808a600 R9: 0000000000000007
> > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f967808a600
> > R13: 00007f967808a628 R14: 00007f967f691220 R15: 00007f96664ee750
> > ORIG_RAX: 00000000000000ca CS: 0033 SS: 002b
> >
> > The crash occurred at the address 0xffffffffc0ded7fa, which is within
> > the livepatch_61_release6. However, from the kernel log, it's clear
> > that this module was replaced by livepatch_61_release12. We can verify
> > this with the crash utility:
> >
> > crash> dis do_exit
> > dis: do_exit: duplicate text symbols found:
> > ffffffff99091700 (T) do_exit
> > /root/rpmbuild/BUILD/kernel-6.1.52/kernel/exit.c: 806
> > ffffffffc0e038d0 (t) do_exit [livepatch_61_release12]
> >
> > crash> dis ffffffff99091700
> > 0xffffffff99091700 <do_exit>: call 0xffffffffc08b9000
> > 0xffffffff99091705 <do_exit+5>: push %rbp
> >
> > Here, do_exit was patched in livepatch_61_release12, with the
> > trampoline address of the new implementation being 0xffffffffc08b9000.
> >
> > Next, we checked the klp_ops struct to verify the livepatch operations:
> >
> > crash> list klp_ops.node -H klp_ops -s klp_ops -x
> > ...
> > ffff94f3ab8ec900
> > struct klp_ops {
> > node = {
> > next = 0xffff94f3ab8ecc00,
> > prev = 0xffff94f3ab8ed500
> > },
> > func_stack = {
> > next = 0xffff94cd4a856238,
> > prev = 0xffff94cd4a856238
> > },
> > ...
> >
> > crash> struct -o klp_func.stack_node
> > struct klp_func {
> > [112] struct list_head stack_node;
> > }
> >
> > crash> klp_func ffff94cd4a8561c8
> > struct klp_func {
> > old_name = 0xffffffffc0e086c8 "do_exit",
> > new_func = 0xffffffffc0e038d0,
> > old_sympos = 0,
> > old_func = 0xffffffff99091700 <do_exit>,
> > kobj = {
> > name = 0xffff94f379c519c0 "do_exit,1",
> > entry = {
> > next = 0xffff94cd4a8561f0,
> > prev = 0xffff94cd4a8561f0
> > },
> > parent = 0xffff94e487064ad8,
> >
> > The do_exit function from livepatch_61_release6 was successfully
> > replaced by the updated version in livepatch_61_release12, but the
> > task causing the crash was still executing the older do_exit() from
> > livepatch_61_release6.
> >
> > This was confirmed when we checked the symbol mapping for livepatch_61_release6:
> >
> > crash> sym -m livepatch_61_release6
> > ffffffffc0dea000 MODULE START: livepatch_61_release6
> > ffffffffc0dff000 MODULE END: livepatch_61_release6
> >
> > We identified that the crash occurred at offset 0x37fa within the old
> > livepatch module, specifically right after the release_task()
> > function. This crash took place within the do_exit() function. (Note
> > that the instruction shown below is decoded from the newly loaded
> > livepatch_61_release6, so while the address differs, the offset
> > remains the same.)
> >
> > 0xffffffffc0db07eb <do_exit+1803>: lea 0xa08(%r15),%r12
> > 0xffffffffc0db07f2 <do_exit+1810>: mov %r15,%r14
> > 0xffffffffc0db07f5 <do_exit+1813>: call 0xffffffff9a08fc00 <release_task>
> > 0xffffffffc0db07fa <do_exit+1818>: mov 0xa08(%r15),%rax
> > <<<<<<<
> > 0xffffffffc0db0801 <do_exit+1825>: sub $0xa08,%rax
> >
> > Interestingly, the crash occurred immediately after returning from the
> > release_task() function. Four servers crashed out of around 50K, all
> > after returning from release_task().
> >
> > This suggests a potential synchronization issue between release_task()
> > and klp_try_complete_transition(). It is possible that
> > klp_try_switch_task() failed to detect the task executing
> > release_task(), or that klp_synchronize_transition() failed to wait
> > for release_task() to finish.
> >
> > I suspect we need a change along these lines:
> >
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -220,6 +220,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
> >
> > kprobe_flush_task(tsk);
> > rethook_flush_task(tsk);
> > + klp_flush_task(tsk);
> > perf_event_delayed_put(tsk);
> > trace_sched_process_free(tsk);
> > put_task_struct(tsk);
> >
> > Any suggestions?
>
> Hello,
>
> After further analysis, my best guess is that the task stack is being
> freed in release_task() while klp_try_switch_task() is still
> attempting to access it. It seems we should consider calling
> try_get_task_stack() in klp_check_stack() to address this.
I do not agree here.
My understanding is that the system crashed when it was running
the obsolete livepatch_61_release6 code. Why do you think that
it was in klp_try_switch_task()?
The ordering of messages is:
[156821.122212] livepatch: 'livepatch_61_release12': patching complete
[156821.175871] kernel tried to execute NX-protected page - exploit
attempt? (uid: 10524)
[156821.176011] BUG: unable to handle page fault for address: ffffffffc0ded7fa
So the livepatch transition had completed before the crash.
I can't see which process or CPU would be running
klp_try_switch_task() at this point.
My theory is that the transition finished and some other process
started removing the older livepatch module. I guess that the memory
holding the livepatch_61_release6 code was freed on another CPU.
That would crash a process still running the freed do_exit()
function. The process could not block the transition because it had
been removed from the task list in the middle of do_exit().
Maybe you could confirm this in the existing crash dump.
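The theory lines up with the addresses already in the report. As a sanity check, here is a small C sketch (userspace, illustrative only) that plugs in the reported module mapping and faulting RIP: once the old module's memory is treated as freed, the task's return address falls inside the freed region, which matches the NX-protected-page oops.

```c
/*
 * Sketch of the theory above as an address check, using the values
 * from the report: livepatch_61_release6 was mapped at
 * 0xffffffffc0dea000-0xffffffffc0dff000 and the faulting RIP was
 * 0xffffffffc0ded7fa (offset 0x37fa into the module). Names are
 * illustrative, not kernel API.
 */
enum { MOD_LOADED, MOD_FREED };

struct old_module {
	int state;
	unsigned long start, end;
};

static int ra_points_into_freed_module(const struct old_module *m,
				       unsigned long ra)
{
	return m->state == MOD_FREED && ra >= m->start && ra < m->end;
}

int crash_is_expected(void)
{
	struct old_module rel6 = {
		MOD_LOADED, 0xffffffffc0dea000UL, 0xffffffffc0dff000UL
	};
	unsigned long task_ra = 0xffffffffc0ded7faUL;

	/* transition "complete" -> another CPU unloads the old patch */
	rel6.state = MOD_FREED;

	/* returning from release_task() to task_ra now executes freed,
	 * NX-protected memory: the reported oops */
	return ra_points_into_freed_module(&rel6, task_ra);
}
```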
Best Regards,
Petr
* Re: [BUG] Kernel Crash during replacement of livepatch patching do_exit()
2025-01-22 11:45 ` Petr Mladek
@ 2025-01-22 13:30 ` Yafang Shao
2025-01-22 14:01 ` Yafang Shao
0 siblings, 1 reply; 12+ messages in thread
From: Yafang Shao @ 2025-01-22 13:30 UTC (permalink / raw)
To: Petr Mladek
Cc: Josh Poimboeuf, jikos, Miroslav Benes, Joe Lawrence,
live-patching
On Wed, Jan 22, 2025 at 7:45 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Wed 2025-01-22 14:36:55, Yafang Shao wrote:
> > On Tue, Jan 21, 2025 at 5:38 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > Hello,
> > >
> > > We encountered a panic while upgrading our livepatch, specifically
> > > replacing an old livepatch with a new version on our production
> > > servers.
> > >
> > > [156821.048318] livepatch: enabling patch 'livepatch_61_release12'
> > > [156821.061580] livepatch: 'livepatch_61_release12': starting patching
> > > transition
> > > [156821.122212] livepatch: 'livepatch_61_release12': patching complete
> > > [156821.175871] kernel tried to execute NX-protected page - exploit
> > > attempt? (uid: 10524)
> > > [156821.176011] BUG: unable to handle page fault for address: ffffffffc0ded7fa
> > > [156821.176121] #PF: supervisor instruction fetch in kernel mode
> > > [156821.176211] #PF: error_code(0x0011) - permissions violation
> > > [156821.176302] PGD 986c15067 P4D 986c15067 PUD 986c17067 PMD
> > > 184f53b067 PTE 800000194c08e163
> > > [156821.176435] Oops: 0011 [#1] PREEMPT SMP NOPTI
> > > [156821.176506] CPU: 70 PID: 783972 Comm: java Kdump: loaded Tainted:
> > > G S W O K 6.1.52-3 #3.pdd
> > > [156821.176654] Hardware name: Inspur SA5212M5/SA5212M5, BIOS 4.1.20 05/05/2021
> > > [156821.176766] RIP: 0010:0xffffffffc0ded7fa
> > > [156821.176841] Code: 0a 00 00 48 89 42 08 48 89 10 4d 89 a6 08 0a 00
> > > 00 4c 89 f7 4d 89 a6 10 0a 00 00 4d 8d a7 08 0a 00 00 4d 89 fe e8 00
> > > 00 00 00 <49> 8b 87 08 0a 00 00 48 2d 08 0a 00 00 4d 39 ec 75 aa 48 89
> > > df e8
> > > [156821.177138] RSP: 0018:ffffba6f273dbd30 EFLAGS: 00010282
> > > [156821.177222] RAX: 0000000000000000 RBX: ffff94cd316f0000 RCX:
> > > 000000008020000d
> > > [156821.177338] RDX: 000000008020000e RSI: 000000008020000d RDI:
> > > ffff94cd316f0000
> > > [156821.177452] RBP: ffffba6f273dbd88 R08: ffff94cd316f13f8 R09:
> > > 0000000000000001
> > > [156821.177567] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > ffffba6f273dbd48
> > > [156821.177682] R13: ffffba6f273dbd48 R14: ffffba6f273db340 R15:
> > > ffffba6f273db340
> > > [156821.177797] FS: 0000000000000000(0000) GS:ffff94e321180000(0000)
> > > knlGS:0000000000000000
> > > [156821.177926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [156821.178019] CR2: ffffffffc0ded7fa CR3: 000000015909c006 CR4:
> > > 00000000007706e0
> > > [156821.178133] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > 0000000000000000
> > > [156821.178248] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > 0000000000000400
> > > [156821.178363] PKRU: 55555554
> > > [156821.178407] Call Trace:
> > > [156821.178449] <TASK>
> > > [156821.178492] ? show_regs.cold+0x1a/0x1f
> > > [156821.178559] ? __die_body+0x20/0x70
> > > [156821.178617] ? __die+0x2b/0x37
> > > [156821.178669] ? page_fault_oops+0x136/0x2b0
> > > [156821.178734] ? search_bpf_extables+0x63/0x90
> > > [156821.178805] ? search_exception_tables+0x5f/0x70
> > > [156821.178881] ? kernelmode_fixup_or_oops+0xa2/0x120
> > > [156821.178957] ? __bad_area_nosemaphore+0x176/0x1b0
> > > [156821.179034] ? bad_area_nosemaphore+0x16/0x20
> > > [156821.179105] ? do_kern_addr_fault+0x77/0x90
> > > [156821.179175] ? exc_page_fault+0xc6/0x160
> > > [156821.179239] ? asm_exc_page_fault+0x27/0x30
> > > [156821.179310] do_group_exit+0x35/0x90
> > > [156821.179371] get_signal+0x909/0x950
> > > [156821.179429] ? wake_up_q+0x50/0x90
> > > [156821.179486] arch_do_signal_or_restart+0x34/0x2a0
> > > [156821.183207] exit_to_user_mode_prepare+0x149/0x1b0
> > > [156821.186963] syscall_exit_to_user_mode+0x1e/0x50
> > > [156821.190723] do_syscall_64+0x48/0x90
> > > [156821.194500] entry_SYSCALL_64_after_hwframe+0x64/0xce
> > > [156821.198195] RIP: 0033:0x7f967feb5a35
> > > [156821.201769] Code: Unable to access opcode bytes at 0x7f967feb5a0b.
> > > [156821.205283] RSP: 002b:00007f96664ee670 EFLAGS: 00000246 ORIG_RAX:
> > > 00000000000000ca
> > > [156821.208790] RAX: fffffffffffffe00 RBX: 00007f967808a650 RCX:
> > > 00007f967feb5a35
> > > [156821.212305] RDX: 000000000000000f RSI: 0000000000000080 RDI:
> > > 00007f967808a654
> > > [156821.215785] RBP: 00007f96664ee6c0 R08: 00007f967808a600 R09:
> > > 0000000000000007
> > > [156821.219273] R10: 0000000000000000 R11: 0000000000000246 R12:
> > > 00007f967808a600
> > > [156821.222727] R13: 00007f967808a628 R14: 00007f967f691220 R15:
> > > 00007f96664ee750
> > > [156821.226155] </TASK>
> > > [156821.229470] Modules linked in: livepatch_61_release12(OK)
> > > ebtable_filter ebtables af_packet_diag netlink_diag xt_DSCP xt_owner
> > > iptable_mangle iptable_raw xt_CT cls_bpf sch_ingress bpf_preload
> > > binfmt_misc raw_diag unix_diag tcp_diag udp_diag inet_diag
> > > iptable_filter bpfilter xt_conntrack nf_nat nf_conntrack_netlink
> > > nfnetlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay af_packet
> > > bonding tls intel_rapl_msr intel_rapl_common intel_uncore_frequency
> > > intel_uncore_frequency_common isst_if_common skx_edac nfit
> > > x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
> > > rapl vfat fat intel_cstate iTCO_wdt xfs intel_uncore pcspkr ses
> > > enclosure mei_me i2c_i801 input_leds lpc_ich acpi_ipmi ioatdma
> > > i2c_smbus mei mfd_core dca wmi ipmi_si intel_pch_thermal ipmi_devintf
> > > ipmi_msghandler acpi_cpufreq acpi_pad acpi_power_meter ip_tables ext4
> > > mbcache jbd2 sd_mod sg mpt3sas raid_class scsi_transport_sas
> > > megaraid_sas crct10dif_pclmul crc32_pclmul crc32c_intel
> > > polyval_clmulni polyval_generic
> > > [156821.229555] ghash_clmulni_intel sha512_ssse3 aesni_intel
> > > crypto_simd cryptd nvme nvme_core t10_pi i40e ptp pps_core ahci
> > > libahci libata deflate zlib_deflate
> > > [156821.259012] Unloaded tainted modules:
> > > livepatch_61_release6(OK):14089 livepatch_61_release12(OK):14088 [last
> > > unloaded: livepatch_61_release6(OK)]
> > > [156821.275421] CR2: ffffffffc0ded7fa
> > >
> > > Although the issue was observed on an older 6.1 kernel, I suspect it
> > > persists in the upstream kernel as well. Due to the significant effort
> > > required to deploy the upstream kernel in our production environment,
> > > I have not yet attempted to reproduce the issue with the latest
> > > upstream version.
> > >
> > > Crash Analysis:
> > > =============
> > >
> > > crash> bt
> > > PID: 783972 TASK: ffff94cd316f0000 CPU: 70 COMMAND: "java"
> > > #0 [ffffba6f273db9a8] machine_kexec at ffffffff990632ad
> > > #1 [ffffba6f273dba08] __crash_kexec at ffffffff9915c8af
> > > #2 [ffffba6f273dbad0] crash_kexec at ffffffff9915db0c
> > > #3 [ffffba6f273dbae0] oops_end at ffffffff99024bc9
> > > #4 [ffffba6f273dbaf0] _MODULE_START_livepatch_61_release6 at
> > > ffffffffc0ded7fa [livepatch_61_release6]
> > > #5 [ffffba6f273dbb80] _MODULE_START_livepatch_61_release6 at
> > > ffffffffc0ded7fa [livepatch_61_release6]
> > > #6 [ffffba6f273dbbf8] _MODULE_START_livepatch_61_release6 at
> > > ffffffffc0ded7fa [livepatch_61_release6]
> > > #7 [ffffba6f273dbc80] asm_exc_page_fault at ffffffff99c00bb7
> > > [exception RIP: _MODULE_START_livepatch_61_release6+14330]
> > > RIP: ffffffffc0ded7fa RSP: ffffba6f273dbd30 RFLAGS: 00010282
> > > RAX: 0000000000000000 RBX: ffff94cd316f0000 RCX: 000000008020000d
> > > RDX: 000000008020000e RSI: 000000008020000d RDI: ffff94cd316f0000
> > > RBP: ffffba6f273dbd88 R8: ffff94cd316f13f8 R9: 0000000000000001
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffffba6f273dbd48
> > > R13: ffffba6f273dbd48 R14: ffffba6f273db340 R15: ffffba6f273db340
> > > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > > #8 [ffffba6f273dbd90] do_group_exit at ffffffff99092395
> > > #9 [ffffba6f273dbdc0] get_signal at ffffffff990a1c69
> > > #10 [ffffba6f273dbdd0] wake_up_q at ffffffff990ce060
> > > #11 [ffffba6f273dbe48] arch_do_signal_or_restart at ffffffff990209b4
> > > #12 [ffffba6f273dbee0] exit_to_user_mode_prepare at ffffffff9912fdf9
> > > #13 [ffffba6f273dbf20] syscall_exit_to_user_mode at ffffffff99aeb87e
> > > #14 [ffffba6f273dbf38] do_syscall_64 at ffffffff99ae70b8
> > > #15 [ffffba6f273dbf50] entry_SYSCALL_64_after_hwframe at ffffffff99c000dc
> > > RIP: 00007f967feb5a35 RSP: 00007f96664ee670 RFLAGS: 00000246
> > > RAX: fffffffffffffe00 RBX: 00007f967808a650 RCX: 00007f967feb5a35
> > > RDX: 000000000000000f RSI: 0000000000000080 RDI: 00007f967808a654
> > > RBP: 00007f96664ee6c0 R8: 00007f967808a600 R9: 0000000000000007
> > > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f967808a600
> > > R13: 00007f967808a628 R14: 00007f967f691220 R15: 00007f96664ee750
> > > ORIG_RAX: 00000000000000ca CS: 0033 SS: 002b
> > >
> > > The crash occurred at the address 0xffffffffc0ded7fa, which is within
> > > the livepatch_61_release6. However, from the kernel log, it's clear
> > > that this module was replaced by livepatch_61_release12. We can verify
> > > this with the crash utility:
> > >
> > > crash> dis do_exit
> > > dis: do_exit: duplicate text symbols found:
> > > ffffffff99091700 (T) do_exit
> > > /root/rpmbuild/BUILD/kernel-6.1.52/kernel/exit.c: 806
> > > ffffffffc0e038d0 (t) do_exit [livepatch_61_release12]
> > >
> > > crash> dis ffffffff99091700
> > > 0xffffffff99091700 <do_exit>: call 0xffffffffc08b9000
> > > 0xffffffff99091705 <do_exit+5>: push %rbp
> > >
> > > Here, do_exit was patched in livepatch_61_release12, with the
> > > trampoline address of the new implementation being 0xffffffffc08b9000.
> > >
> > > Next, we checked the klp_ops struct to verify the livepatch operations:
> > >
> > > crash> list klp_ops.node -H klp_ops -s klp_ops -x
> > > ...
> > > ffff94f3ab8ec900
> > > struct klp_ops {
> > > node = {
> > > next = 0xffff94f3ab8ecc00,
> > > prev = 0xffff94f3ab8ed500
> > > },
> > > func_stack = {
> > > next = 0xffff94cd4a856238,
> > > prev = 0xffff94cd4a856238
> > > },
> > > ...
> > >
> > > crash> struct -o klp_func.stack_node
> > > struct klp_func {
> > > [112] struct list_head stack_node;
> > > }
> > >
> > > crash> klp_func ffff94cd4a8561c8
> > > struct klp_func {
> > > old_name = 0xffffffffc0e086c8 "do_exit",
> > > new_func = 0xffffffffc0e038d0,
> > > old_sympos = 0,
> > > old_func = 0xffffffff99091700 <do_exit>,
> > > kobj = {
> > > name = 0xffff94f379c519c0 "do_exit,1",
> > > entry = {
> > > next = 0xffff94cd4a8561f0,
> > > prev = 0xffff94cd4a8561f0
> > > },
> > > parent = 0xffff94e487064ad8,
> > >
> > > The do_exit function from livepatch_61_release6 was successfully
> > > replaced by the updated version in livepatch_61_release12, but the
> > > task causing the crash was still executing the older do_exit() from
> > > livepatch_61_release6.
> > >
> > > This was confirmed when we checked the symbol mapping for livepatch_61_release6:
> > >
> > > crash> sym -m livepatch_61_release6
> > > ffffffffc0dea000 MODULE START: livepatch_61_release6
> > > ffffffffc0dff000 MODULE END: livepatch_61_release6
> > >
> > > We identified that the crash occurred at offset 0x37fa within the old
> > > livepatch module, specifically right after the release_task()
> > > function. This crash took place within the do_exit() function. (Note
> > > that the instruction shown below is decoded from the newly loaded
> > > livepatch_61_release6, so while the address differs, the offset
> > > remains the same.)
> > >
> > > 0xffffffffc0db07eb <do_exit+1803>: lea 0xa08(%r15),%r12
> > > 0xffffffffc0db07f2 <do_exit+1810>: mov %r15,%r14
> > > 0xffffffffc0db07f5 <do_exit+1813>: call 0xffffffff9a08fc00 <release_task>
> > > 0xffffffffc0db07fa <do_exit+1818>: mov 0xa08(%r15),%rax
> > > <<<<<<<
> > > 0xffffffffc0db0801 <do_exit+1825>: sub $0xa08,%rax
> > >
> > > Interestingly, the crash occurred immediately after returning from the
> > > release_task() function. Four servers crashed out of around 50K, all
> > > after returning from release_task().
> > >
> > > This suggests a potential synchronization issue between release_task()
> > > and klp_try_complete_transition(). It is possible that
> > > klp_try_switch_task() failed to detect the task executing
> > > release_task(), or that klp_synchronize_transition() failed to wait
> > > for release_task() to finish.
> > >
> > > I suspect we need a change along these lines:
> > >
> > > --- a/kernel/exit.c
> > > +++ b/kernel/exit.c
> > > @@ -220,6 +220,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
> > >
> > > kprobe_flush_task(tsk);
> > > rethook_flush_task(tsk);
> > > + klp_flush_task(tsk);
> > > perf_event_delayed_put(tsk);
> > > trace_sched_process_free(tsk);
> > > put_task_struct(tsk);
> > >
> > > Any suggestions?
> >
> > Hello,
> >
> > After further analysis, my best guess is that the task stack is being
> > freed in release_task() while klp_try_switch_task() is still
> > attempting to access it. It seems we should consider calling
> > try_get_task_stack() in klp_check_stack() to address this.
>
> I do not agree here.
>
> My understanding is that the system crashed when it was running
> the obsolete livepatch_61_release6 code. Why do you think that
> it was in klp_try_switch_task()?
I suspect that klp_try_switch_task() is misinterpreting the task's
stack when the task is in release_task() or immediately after it. All
crashes occurred right after executing release_task(), which doesn't
seem like a coincidence.
>
> The ordering of messages is:
>
> [156821.122212] livepatch: 'livepatch_61_release12': patching complete
> [156821.175871] kernel tried to execute NX-protected page - exploit
> attempt? (uid: 10524)
> [156821.176011] BUG: unable to handle page fault for address: ffffffffc0ded7fa
>
> So the livepatch transition had completed before the crash.
> I can't see which process or CPU would be running
> klp_try_switch_task() at this point.
I agree with you that klp_try_switch_task() is not currently running.
As I mentioned earlier, it appears that klp_try_switch_task() simply
missed this task.
>
> My theory is that the transition has finished and some other process
> started removing the older livepatch module. I guess that the memory
> with the livepatch_61_release6 code has been freed on another CPU.
>
> It would cause a crash of a process still running the freed do_exit()
> function. The process would not block the transition after it was
> removed from the task list in the middle of do_exit().
>
> Maybe, you could confirm this in the existing crash dump.
That's correct, I can confirm this. Below are the details:
crash> bt
PID: 783972 TASK: ffff94cd316f0000 CPU: 70 COMMAND: "java"
#0 [ffffba6f273db9a8] machine_kexec at ffffffff990632ad
#1 [ffffba6f273dba08] __crash_kexec at ffffffff9915c8af
#2 [ffffba6f273dbad0] crash_kexec at ffffffff9915db0c
#3 [ffffba6f273dbae0] oops_end at ffffffff99024bc9
#4 [ffffba6f273dbaf0] _MODULE_START_livepatch_61_release6 at
ffffffffc0ded7fa [livepatch_61_release6]
#5 [ffffba6f273dbb80] _MODULE_START_livepatch_61_release6 at
ffffffffc0ded7fa [livepatch_61_release6]
#6 [ffffba6f273dbbf8] _MODULE_START_livepatch_61_release6 at
ffffffffc0ded7fa [livepatch_61_release6]
#7 [ffffba6f273dbc80] asm_exc_page_fault at ffffffff99c00bb7
[exception RIP: _MODULE_START_livepatch_61_release6+14330]
RIP: ffffffffc0ded7fa RSP: ffffba6f273dbd30 RFLAGS: 00010282
crash> task_struct.tgid ffff94cd316f0000
tgid = 783848,
crash> task_struct.tasks -o init_task
struct task_struct {
[ffffffff9ac1b310] struct list_head tasks;
}
crash> list task_struct.tasks -H ffffffff9ac1b310 -s task_struct.tgid
| grep 783848
tgid = 783848,
The thread group leader remains on the task list, but the thread has
already been removed from the thread_head list.
crash> task 783848
PID: 783848 TASK: ffff94cd603eb000 CPU: 18 COMMAND: "java"
struct task_struct {
thread_info = {
flags = 16388,
crash> task_struct.signal ffff94cd603eb000
signal = 0xffff94cc89d11b00,
crash> signal_struct.thread_head -o 0xffff94cc89d11b00
struct signal_struct {
[ffff94cc89d11b10] struct list_head thread_head;
}
crash> list task_struct.thread_node -H ffff94cc89d11b10 -s task_struct.pid
ffff94cd603eb000
pid = 783848,
ffff94ccd8343000
pid = 783879,
crash> signal_struct.nr_threads,thread_head 0xffff94cc89d11b00
nr_threads = 2,
thread_head = {
next = 0xffff94cd603eba70,
prev = 0xffff94ccd8343a70
},
crash> ps -g 783848
PID: 783848 TASK: ffff94cd603eb000 CPU: 18 COMMAND: "java"
PID: 783879 TASK: ffff94ccd8343000 CPU: 81 COMMAND: "java"
PID: 783972 TASK: ffff94cd316f0000 CPU: 70 COMMAND: "java"
PID: 784023 TASK: ffff94d644b48000 CPU: 24 COMMAND: "java"
PID: 784025 TASK: ffff94dd30250000 CPU: 65 COMMAND: "java"
PID: 785242 TASK: ffff94ccb5963000 CPU: 48 COMMAND: "java"
PID: 785412 TASK: ffff94cd3eaf8000 CPU: 92 COMMAND: "java"
PID: 785415 TASK: ffff94cd6606b000 CPU: 23 COMMAND: "java"
PID: 785957 TASK: ffff94dfea4e3000 CPU: 16 COMMAND: "java"
PID: 787125 TASK: ffff94e70547b000 CPU: 27 COMMAND: "java"
PID: 787445 TASK: ffff94e49a2bb000 CPU: 28 COMMAND: "java"
PID: 787502 TASK: ffff94e41e0f3000 CPU: 36 COMMAND: "java"
It seems like fixing this will be a challenging task.
--
Regards
Yafang
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Kernel Crash during replacement of livepatch patching do_exit()
2025-01-22 13:30 ` Yafang Shao
@ 2025-01-22 14:01 ` Yafang Shao
2025-01-22 15:56 ` Petr Mladek
0 siblings, 1 reply; 12+ messages in thread
From: Yafang Shao @ 2025-01-22 14:01 UTC (permalink / raw)
To: Petr Mladek
Cc: Josh Poimboeuf, jikos, Miroslav Benes, Joe Lawrence,
live-patching
On Wed, Jan 22, 2025 at 9:30 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Wed, Jan 22, 2025 at 7:45 PM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Wed 2025-01-22 14:36:55, Yafang Shao wrote:
> > > On Tue, Jan 21, 2025 at 5:38 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > We encountered a panic while upgrading our livepatch, specifically
> > > > replacing an old livepatch with a new version on our production
> > > > servers.
> > > >
> > > > [156821.048318] livepatch: enabling patch 'livepatch_61_release12'
> > > > [156821.061580] livepatch: 'livepatch_61_release12': starting patching
> > > > transition
> > > > [156821.122212] livepatch: 'livepatch_61_release12': patching complete
> > > > [156821.175871] kernel tried to execute NX-protected page - exploit
> > > > attempt? (uid: 10524)
> > > > [156821.176011] BUG: unable to handle page fault for address: ffffffffc0ded7fa
> > > > [156821.176121] #PF: supervisor instruction fetch in kernel mode
> > > > [156821.176211] #PF: error_code(0x0011) - permissions violation
> > > > [156821.176302] PGD 986c15067 P4D 986c15067 PUD 986c17067 PMD
> > > > 184f53b067 PTE 800000194c08e163
> > > > [156821.176435] Oops: 0011 [#1] PREEMPT SMP NOPTI
> > > > [156821.176506] CPU: 70 PID: 783972 Comm: java Kdump: loaded Tainted:
> > > > G S W O K 6.1.52-3 #3.pdd
> > > > [156821.176654] Hardware name: Inspur SA5212M5/SA5212M5, BIOS 4.1.20 05/05/2021
> > > > [156821.176766] RIP: 0010:0xffffffffc0ded7fa
> > > > [156821.176841] Code: 0a 00 00 48 89 42 08 48 89 10 4d 89 a6 08 0a 00
> > > > 00 4c 89 f7 4d 89 a6 10 0a 00 00 4d 8d a7 08 0a 00 00 4d 89 fe e8 00
> > > > 00 00 00 <49> 8b 87 08 0a 00 00 48 2d 08 0a 00 00 4d 39 ec 75 aa 48 89
> > > > df e8
> > > > [156821.177138] RSP: 0018:ffffba6f273dbd30 EFLAGS: 00010282
> > > > [156821.177222] RAX: 0000000000000000 RBX: ffff94cd316f0000 RCX:
> > > > 000000008020000d
> > > > [156821.177338] RDX: 000000008020000e RSI: 000000008020000d RDI:
> > > > ffff94cd316f0000
> > > > [156821.177452] RBP: ffffba6f273dbd88 R08: ffff94cd316f13f8 R09:
> > > > 0000000000000001
> > > > [156821.177567] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > > ffffba6f273dbd48
> > > > [156821.177682] R13: ffffba6f273dbd48 R14: ffffba6f273db340 R15:
> > > > ffffba6f273db340
> > > > [156821.177797] FS: 0000000000000000(0000) GS:ffff94e321180000(0000)
> > > > knlGS:0000000000000000
> > > > [156821.177926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [156821.178019] CR2: ffffffffc0ded7fa CR3: 000000015909c006 CR4:
> > > > 00000000007706e0
> > > > [156821.178133] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > > 0000000000000000
> > > > [156821.178248] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > > 0000000000000400
> > > > [156821.178363] PKRU: 55555554
> > > > [156821.178407] Call Trace:
> > > > [156821.178449] <TASK>
> > > > [156821.178492] ? show_regs.cold+0x1a/0x1f
> > > > [156821.178559] ? __die_body+0x20/0x70
> > > > [156821.178617] ? __die+0x2b/0x37
> > > > [156821.178669] ? page_fault_oops+0x136/0x2b0
> > > > [156821.178734] ? search_bpf_extables+0x63/0x90
> > > > [156821.178805] ? search_exception_tables+0x5f/0x70
> > > > [156821.178881] ? kernelmode_fixup_or_oops+0xa2/0x120
> > > > [156821.178957] ? __bad_area_nosemaphore+0x176/0x1b0
> > > > [156821.179034] ? bad_area_nosemaphore+0x16/0x20
> > > > [156821.179105] ? do_kern_addr_fault+0x77/0x90
> > > > [156821.179175] ? exc_page_fault+0xc6/0x160
> > > > [156821.179239] ? asm_exc_page_fault+0x27/0x30
> > > > [156821.179310] do_group_exit+0x35/0x90
> > > > [156821.179371] get_signal+0x909/0x950
> > > > [156821.179429] ? wake_up_q+0x50/0x90
> > > > [156821.179486] arch_do_signal_or_restart+0x34/0x2a0
> > > > [156821.183207] exit_to_user_mode_prepare+0x149/0x1b0
> > > > [156821.186963] syscall_exit_to_user_mode+0x1e/0x50
> > > > [156821.190723] do_syscall_64+0x48/0x90
> > > > [156821.194500] entry_SYSCALL_64_after_hwframe+0x64/0xce
> > > > [156821.198195] RIP: 0033:0x7f967feb5a35
> > > > [156821.201769] Code: Unable to access opcode bytes at 0x7f967feb5a0b.
> > > > [156821.205283] RSP: 002b:00007f96664ee670 EFLAGS: 00000246 ORIG_RAX:
> > > > 00000000000000ca
> > > > [156821.208790] RAX: fffffffffffffe00 RBX: 00007f967808a650 RCX:
> > > > 00007f967feb5a35
> > > > [156821.212305] RDX: 000000000000000f RSI: 0000000000000080 RDI:
> > > > 00007f967808a654
> > > > [156821.215785] RBP: 00007f96664ee6c0 R08: 00007f967808a600 R09:
> > > > 0000000000000007
> > > > [156821.219273] R10: 0000000000000000 R11: 0000000000000246 R12:
> > > > 00007f967808a600
> > > > [156821.222727] R13: 00007f967808a628 R14: 00007f967f691220 R15:
> > > > 00007f96664ee750
> > > > [156821.226155] </TASK>
> > > > [156821.229470] Modules linked in: livepatch_61_release12(OK)
> > > > ebtable_filter ebtables af_packet_diag netlink_diag xt_DSCP xt_owner
> > > > iptable_mangle iptable_raw xt_CT cls_bpf sch_ingress bpf_preload
> > > > binfmt_misc raw_diag unix_diag tcp_diag udp_diag inet_diag
> > > > iptable_filter bpfilter xt_conntrack nf_nat nf_conntrack_netlink
> > > > nfnetlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay af_packet
> > > > bonding tls intel_rapl_msr intel_rapl_common intel_uncore_frequency
> > > > intel_uncore_frequency_common isst_if_common skx_edac nfit
> > > > x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
> > > > rapl vfat fat intel_cstate iTCO_wdt xfs intel_uncore pcspkr ses
> > > > enclosure mei_me i2c_i801 input_leds lpc_ich acpi_ipmi ioatdma
> > > > i2c_smbus mei mfd_core dca wmi ipmi_si intel_pch_thermal ipmi_devintf
> > > > ipmi_msghandler acpi_cpufreq acpi_pad acpi_power_meter ip_tables ext4
> > > > mbcache jbd2 sd_mod sg mpt3sas raid_class scsi_transport_sas
> > > > megaraid_sas crct10dif_pclmul crc32_pclmul crc32c_intel
> > > > polyval_clmulni polyval_generic
> > > > [156821.229555] ghash_clmulni_intel sha512_ssse3 aesni_intel
> > > > crypto_simd cryptd nvme nvme_core t10_pi i40e ptp pps_core ahci
> > > > libahci libata deflate zlib_deflate
> > > > [156821.259012] Unloaded tainted modules:
> > > > livepatch_61_release6(OK):14089 livepatch_61_release12(OK):14088 [last
> > > > unloaded: livepatch_61_release6(OK)]
> > > > [156821.275421] CR2: ffffffffc0ded7fa
> > > >
> > > > Although the issue was observed on an older 6.1 kernel, I suspect it
> > > > persists in the upstream kernel as well. Due to the significant effort
> > > > required to deploy the upstream kernel in our production environment,
> > > > I have not yet attempted to reproduce the issue with the latest
> > > > upstream version.
> > > >
> > > > Crash Analysis:
> > > > =============
> > > >
> > > > crash> bt
> > > > PID: 783972 TASK: ffff94cd316f0000 CPU: 70 COMMAND: "java"
> > > > #0 [ffffba6f273db9a8] machine_kexec at ffffffff990632ad
> > > > #1 [ffffba6f273dba08] __crash_kexec at ffffffff9915c8af
> > > > #2 [ffffba6f273dbad0] crash_kexec at ffffffff9915db0c
> > > > #3 [ffffba6f273dbae0] oops_end at ffffffff99024bc9
> > > > #4 [ffffba6f273dbaf0] _MODULE_START_livepatch_61_release6 at
> > > > ffffffffc0ded7fa [livepatch_61_release6]
> > > > #5 [ffffba6f273dbb80] _MODULE_START_livepatch_61_release6 at
> > > > ffffffffc0ded7fa [livepatch_61_release6]
> > > > #6 [ffffba6f273dbbf8] _MODULE_START_livepatch_61_release6 at
> > > > ffffffffc0ded7fa [livepatch_61_release6]
> > > > #7 [ffffba6f273dbc80] asm_exc_page_fault at ffffffff99c00bb7
> > > > [exception RIP: _MODULE_START_livepatch_61_release6+14330]
> > > > RIP: ffffffffc0ded7fa RSP: ffffba6f273dbd30 RFLAGS: 00010282
> > > > RAX: 0000000000000000 RBX: ffff94cd316f0000 RCX: 000000008020000d
> > > > RDX: 000000008020000e RSI: 000000008020000d RDI: ffff94cd316f0000
> > > > RBP: ffffba6f273dbd88 R8: ffff94cd316f13f8 R9: 0000000000000001
> > > > R10: 0000000000000000 R11: 0000000000000000 R12: ffffba6f273dbd48
> > > > R13: ffffba6f273dbd48 R14: ffffba6f273db340 R15: ffffba6f273db340
> > > > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > > > #8 [ffffba6f273dbd90] do_group_exit at ffffffff99092395
> > > > #9 [ffffba6f273dbdc0] get_signal at ffffffff990a1c69
> > > > #10 [ffffba6f273dbdd0] wake_up_q at ffffffff990ce060
> > > > #11 [ffffba6f273dbe48] arch_do_signal_or_restart at ffffffff990209b4
> > > > #12 [ffffba6f273dbee0] exit_to_user_mode_prepare at ffffffff9912fdf9
> > > > #13 [ffffba6f273dbf20] syscall_exit_to_user_mode at ffffffff99aeb87e
> > > > #14 [ffffba6f273dbf38] do_syscall_64 at ffffffff99ae70b8
> > > > #15 [ffffba6f273dbf50] entry_SYSCALL_64_after_hwframe at ffffffff99c000dc
> > > > RIP: 00007f967feb5a35 RSP: 00007f96664ee670 RFLAGS: 00000246
> > > > RAX: fffffffffffffe00 RBX: 00007f967808a650 RCX: 00007f967feb5a35
> > > > RDX: 000000000000000f RSI: 0000000000000080 RDI: 00007f967808a654
> > > > RBP: 00007f96664ee6c0 R8: 00007f967808a600 R9: 0000000000000007
> > > > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f967808a600
> > > > R13: 00007f967808a628 R14: 00007f967f691220 R15: 00007f96664ee750
> > > > ORIG_RAX: 00000000000000ca CS: 0033 SS: 002b
> > > >
> > > > The crash occurred at the address 0xffffffffc0ded7fa, which is within
> > > > the livepatch_61_release6. However, from the kernel log, it's clear
> > > > that this module was replaced by livepatch_61_release12. We can verify
> > > > this with the crash utility:
> > > >
> > > > crash> dis do_exit
> > > > dis: do_exit: duplicate text symbols found:
> > > > ffffffff99091700 (T) do_exit
> > > > /root/rpmbuild/BUILD/kernel-6.1.52/kernel/exit.c: 806
> > > > ffffffffc0e038d0 (t) do_exit [livepatch_61_release12]
> > > >
> > > > crash> dis ffffffff99091700
> > > > 0xffffffff99091700 <do_exit>: call 0xffffffffc08b9000
> > > > 0xffffffff99091705 <do_exit+5>: push %rbp
> > > >
> > > > Here, do_exit was patched in livepatch_61_release12, with the
> > > > trampoline address of the new implementation being 0xffffffffc08b9000.
> > > >
> > > > Next, we checked the klp_ops struct to verify the livepatch operations:
> > > >
> > > > crash> list klp_ops.node -H klp_ops -s klp_ops -x
> > > > ...
> > > > ffff94f3ab8ec900
> > > > struct klp_ops {
> > > > node = {
> > > > next = 0xffff94f3ab8ecc00,
> > > > prev = 0xffff94f3ab8ed500
> > > > },
> > > > func_stack = {
> > > > next = 0xffff94cd4a856238,
> > > > prev = 0xffff94cd4a856238
> > > > },
> > > > ...
> > > >
> > > > crash> struct -o klp_func.stack_node
> > > > struct klp_func {
> > > > [112] struct list_head stack_node;
> > > > }
> > > >
> > > > crash> klp_func ffff94cd4a8561c8
> > > > struct klp_func {
> > > > old_name = 0xffffffffc0e086c8 "do_exit",
> > > > new_func = 0xffffffffc0e038d0,
> > > > old_sympos = 0,
> > > > old_func = 0xffffffff99091700 <do_exit>,
> > > > kobj = {
> > > > name = 0xffff94f379c519c0 "do_exit,1",
> > > > entry = {
> > > > next = 0xffff94cd4a8561f0,
> > > > prev = 0xffff94cd4a8561f0
> > > > },
> > > > parent = 0xffff94e487064ad8,
> > > >
> > > > The do_exit function from livepatch_61_release6 was successfully
> > > > replaced by the updated version in livepatch_61_release12, but the
> > > > task causing the crash was still executing the older do_exit() from
> > > > livepatch_61_release6.
> > > >
> > > > This was confirmed when we checked the symbol mapping for livepatch_61_release6:
> > > >
> > > > crash> sym -m livepatch_61_release6
> > > > ffffffffc0dea000 MODULE START: livepatch_61_release6
> > > > ffffffffc0dff000 MODULE END: livepatch_61_release6
> > > >
> > > > We identified that the crash occurred at offset 0x37fa within the old
> > > > livepatch module, specifically right after the release_task()
> > > > function. This crash took place within the do_exit() function. (Note
> > > > that the instruction shown below is decoded from the newly loaded
> > > > livepatch_61_release6, so while the address differs, the offset
> > > > remains the same.)
> > > >
> > > > 0xffffffffc0db07eb <do_exit+1803>: lea 0xa08(%r15),%r12
> > > > 0xffffffffc0db07f2 <do_exit+1810>: mov %r15,%r14
> > > > 0xffffffffc0db07f5 <do_exit+1813>: call 0xffffffff9a08fc00 <release_task>
> > > > 0xffffffffc0db07fa <do_exit+1818>: mov 0xa08(%r15),%rax
> > > > <<<<<<<
> > > > 0xffffffffc0db0801 <do_exit+1825>: sub $0xa08,%rax
> > > >
> > > > Interestingly, the crash occurred immediately after returning from the
> > > > release_task() function. Four servers crashed out of around 50K, all
> > > > after returning from release_task().
> > > >
> > > > This suggests a potential synchronization issue between release_task()
> > > > and klp_try_complete_transition(). It is possible that
> > > > klp_try_switch_task() failed to detect the task executing
> > > > release_task(), or that klp_synchronize_transition() failed to wait
> > > > for release_task() to finish.
> > > >
> > > > I suspect we need a change along the following lines:
> > > >
> > > > --- a/kernel/exit.c
> > > > +++ b/kernel/exit.c
> > > > @@ -220,6 +220,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
> > > >
> > > > kprobe_flush_task(tsk);
> > > > rethook_flush_task(tsk);
> > > > + klp_flush_task(tsk);
> > > > perf_event_delayed_put(tsk);
> > > > trace_sched_process_free(tsk);
> > > > put_task_struct(tsk);
> > > >
> > > > Any suggestions ?
> > >
> > > Hello,
> > >
> > > After further analysis, my best guess is that the task stack is being
> > > freed in release_task() while klp_try_switch_task() is still
> > > attempting to access it. It seems we should consider calling
> > > try_get_task_stack() in klp_check_stack() to address this.
> >
> > I do not agree here.
> >
> > My understanding is that the system crashed when it was running
> > the obsolete livepatch_61_release6 code. Why do you think that
> > it was in klp_try_switch_task()?
>
> I suspect that klp_try_switch_task() is misinterpreting the task's
> stack when the task is in release_task() or immediately after it. All
> crashes occurred right after executing release_task(), which doesn't
> seem like a coincidence.
>
> >
> > The ordering of messages is:
> >
> > [156821.122212] livepatch: 'livepatch_61_release12': patching complete
> > [156821.175871] kernel tried to execute NX-protected page - exploit
> > attempt? (uid: 10524)
> > [156821.176011] BUG: unable to handle page fault for address: ffffffffc0ded7fa
> >
> > So that the livepatch transition has completed before the crash.
> > I can't see which process or CPU would be running
> > klp_try_switch_task() at this point.
>
> I agree with you that klp_try_switch_task() is not currently running.
> As I mentioned earlier, it appears that klp_try_switch_task() simply
> missed this task.
>
> >
> > My theory is that the transition has finished and some other process
> > started removing the older livepatch module. I guess that the memory
> > with the livepatch_61_release6 code has been freed on another CPU.
> >
> > It would cause a crash of a process still running the freed do_exit()
> > function. The process would not block the transition after it was
> > removed from the task list in the middle of do_exit().
> >
> > Maybe, you could confirm this in the existing crash dump.
>
> That's correct, I can confirm this. Below are the details:
>
> crash> bt
> PID: 783972 TASK: ffff94cd316f0000 CPU: 70 COMMAND: "java"
> #0 [ffffba6f273db9a8] machine_kexec at ffffffff990632ad
> #1 [ffffba6f273dba08] __crash_kexec at ffffffff9915c8af
> #2 [ffffba6f273dbad0] crash_kexec at ffffffff9915db0c
> #3 [ffffba6f273dbae0] oops_end at ffffffff99024bc9
> #4 [ffffba6f273dbaf0] _MODULE_START_livepatch_61_release6 at
> ffffffffc0ded7fa [livepatch_61_release6]
> #5 [ffffba6f273dbb80] _MODULE_START_livepatch_61_release6 at
> ffffffffc0ded7fa [livepatch_61_release6]
> #6 [ffffba6f273dbbf8] _MODULE_START_livepatch_61_release6 at
> ffffffffc0ded7fa [livepatch_61_release6]
> #7 [ffffba6f273dbc80] asm_exc_page_fault at ffffffff99c00bb7
> [exception RIP: _MODULE_START_livepatch_61_release6+14330]
> RIP: ffffffffc0ded7fa RSP: ffffba6f273dbd30 RFLAGS: 00010282
>
> crash> task_struct.tgid ffff94cd316f0000
> tgid = 783848,
>
> crash> task_struct.tasks -o init_task
> struct task_struct {
> [ffffffff9ac1b310] struct list_head tasks;
> }
>
> crash> list task_struct.tasks -H ffffffff9ac1b310 -s task_struct.tgid
> | grep 783848
> tgid = 783848,
>
> The thread group leader remains on the task list, but the thread has
> already been removed from the thread_head list.
>
> crash> task 783848
> PID: 783848 TASK: ffff94cd603eb000 CPU: 18 COMMAND: "java"
> struct task_struct {
> thread_info = {
> flags = 16388,
>
> crash> task_struct.signal ffff94cd603eb000
> signal = 0xffff94cc89d11b00,
>
> crash> signal_struct.thread_head -o 0xffff94cc89d11b00
> struct signal_struct {
> [ffff94cc89d11b10] struct list_head thread_head;
> }
>
> crash> list task_struct.thread_node -H ffff94cc89d11b10 -s task_struct.pid
> ffff94cd603eb000
> pid = 783848,
> ffff94ccd8343000
> pid = 783879,
>
> crash> signal_struct.nr_threads,thread_head 0xffff94cc89d11b00
> nr_threads = 2,
> thread_head = {
> next = 0xffff94cd603eba70,
> prev = 0xffff94ccd8343a70
> },
>
> crash> ps -g 783848
> PID: 783848 TASK: ffff94cd603eb000 CPU: 18 COMMAND: "java"
> PID: 783879 TASK: ffff94ccd8343000 CPU: 81 COMMAND: "java"
> PID: 783972 TASK: ffff94cd316f0000 CPU: 70 COMMAND: "java"
> PID: 784023 TASK: ffff94d644b48000 CPU: 24 COMMAND: "java"
> PID: 784025 TASK: ffff94dd30250000 CPU: 65 COMMAND: "java"
> PID: 785242 TASK: ffff94ccb5963000 CPU: 48 COMMAND: "java"
> PID: 785412 TASK: ffff94cd3eaf8000 CPU: 92 COMMAND: "java"
> PID: 785415 TASK: ffff94cd6606b000 CPU: 23 COMMAND: "java"
> PID: 785957 TASK: ffff94dfea4e3000 CPU: 16 COMMAND: "java"
> PID: 787125 TASK: ffff94e70547b000 CPU: 27 COMMAND: "java"
> PID: 787445 TASK: ffff94e49a2bb000 CPU: 28 COMMAND: "java"
> PID: 787502 TASK: ffff94e41e0f3000 CPU: 36 COMMAND: "java"
>
> It seems like fixing this will be a challenging task.
>
Hello Petr,
I believe this case highlights the need for a hybrid livepatch
mode—where we allow the coexistence of atomic-replace and
non-atomic-replace patches. If a livepatch is set to non-replaceable,
it should neither be replaced by other livepatches nor replace any
other patches itself.
We’ve deployed this livepatch, including the change to do_exit(), to
nearly all of our servers—hundreds of thousands in total. It’s a real
tragedy that we can't unload it. Moving forward, we’ll have no choice
but to create non-atomic-replace livepatches to avoid this issue...
--
Regards
Yafang
* Re: [BUG] Kernel Crash during replacement of livepatch patching do_exit()
2025-01-22 14:01 ` Yafang Shao
@ 2025-01-22 15:56 ` Petr Mladek
2025-01-23 2:19 ` Yafang Shao
2025-01-23 15:55 ` Petr Mladek
0 siblings, 2 replies; 12+ messages in thread
From: Petr Mladek @ 2025-01-22 15:56 UTC (permalink / raw)
To: Yafang Shao
Cc: Josh Poimboeuf, jikos, Miroslav Benes, Joe Lawrence,
live-patching
On Wed 2025-01-22 22:01:31, Yafang Shao wrote:
> On Wed, Jan 22, 2025 at 9:30 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Wed, Jan 22, 2025 at 7:45 PM Petr Mladek <pmladek@suse.com> wrote:
> > >
> > > On Wed 2025-01-22 14:36:55, Yafang Shao wrote:
> > > > On Tue, Jan 21, 2025 at 5:38 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > We encountered a panic while upgrading our livepatch, specifically
> > > > > replacing an old livepatch with a new version on our production
> > > > > servers.
> > > > >
> > > My theory is that the transition has finished and some other process
> > > started removing the older livepatch module. I guess that the memory
> > > with the livepatch_61_release6 code has been freed on another CPU.
> > >
> > > It would cause a crash of a process still running the freed do_exit()
> > > function. The process would not block the transition after it was
> > > removed from the task list in the middle of do_exit().
> > >
> > > Maybe, you could confirm this in the existing crash dump.
> >
> > That's correct, I can confirm this. Below are the details:
> >
> > crash> bt
> > PID: 783972 TASK: ffff94cd316f0000 CPU: 70 COMMAND: "java"
> > #0 [ffffba6f273db9a8] machine_kexec at ffffffff990632ad
> > #1 [ffffba6f273dba08] __crash_kexec at ffffffff9915c8af
> > #2 [ffffba6f273dbad0] crash_kexec at ffffffff9915db0c
> > #3 [ffffba6f273dbae0] oops_end at ffffffff99024bc9
> > #4 [ffffba6f273dbaf0] _MODULE_START_livepatch_61_release6 at
> > ffffffffc0ded7fa [livepatch_61_release6]
> > #5 [ffffba6f273dbb80] _MODULE_START_livepatch_61_release6 at
> > ffffffffc0ded7fa [livepatch_61_release6]
> > #6 [ffffba6f273dbbf8] _MODULE_START_livepatch_61_release6 at
> > ffffffffc0ded7fa [livepatch_61_release6]
> > #7 [ffffba6f273dbc80] asm_exc_page_fault at ffffffff99c00bb7
> > [exception RIP: _MODULE_START_livepatch_61_release6+14330]
> > RIP: ffffffffc0ded7fa RSP: ffffba6f273dbd30 RFLAGS: 00010282
> >
> > crash> task_struct.tgid ffff94cd316f0000
> > tgid = 783848,
> >
> > crash> task_struct.tasks -o init_task
> > struct task_struct {
> > [ffffffff9ac1b310] struct list_head tasks;
> > }
> >
> > crash> list task_struct.tasks -H ffffffff9ac1b310 -s task_struct.tgid
> > | grep 783848
> > tgid = 783848,
> >
> > The thread group leader remains on the task list, but the thread has
> > already been removed from the thread_head list.
> >
> > crash> task 783848
> > PID: 783848 TASK: ffff94cd603eb000 CPU: 18 COMMAND: "java"
> > struct task_struct {
> > thread_info = {
> > flags = 16388,
> >
> > crash> task_struct.signal ffff94cd603eb000
> > signal = 0xffff94cc89d11b00,
> >
> > crash> signal_struct.thread_head -o 0xffff94cc89d11b00
> > struct signal_struct {
> > [ffff94cc89d11b10] struct list_head thread_head;
> > }
> >
> > crash> list task_struct.thread_node -H ffff94cc89d11b10 -s task_struct.pid
> > ffff94cd603eb000
> > pid = 783848,
> > ffff94ccd8343000
> > pid = 783879,
> >
> > crash> signal_struct.nr_threads,thread_head 0xffff94cc89d11b00
> > nr_threads = 2,
> > thread_head = {
> > next = 0xffff94cd603eba70,
> > prev = 0xffff94ccd8343a70
> > },
> >
> > crash> ps -g 783848
> > PID: 783848 TASK: ffff94cd603eb000 CPU: 18 COMMAND: "java"
> > PID: 783879 TASK: ffff94ccd8343000 CPU: 81 COMMAND: "java"
> > PID: 783972 TASK: ffff94cd316f0000 CPU: 70 COMMAND: "java"
> > PID: 784023 TASK: ffff94d644b48000 CPU: 24 COMMAND: "java"
> > PID: 784025 TASK: ffff94dd30250000 CPU: 65 COMMAND: "java"
> > PID: 785242 TASK: ffff94ccb5963000 CPU: 48 COMMAND: "java"
> > PID: 785412 TASK: ffff94cd3eaf8000 CPU: 92 COMMAND: "java"
> > PID: 785415 TASK: ffff94cd6606b000 CPU: 23 COMMAND: "java"
> > PID: 785957 TASK: ffff94dfea4e3000 CPU: 16 COMMAND: "java"
> > PID: 787125 TASK: ffff94e70547b000 CPU: 27 COMMAND: "java"
> > PID: 787445 TASK: ffff94e49a2bb000 CPU: 28 COMMAND: "java"
> > PID: 787502 TASK: ffff94e41e0f3000 CPU: 36 COMMAND: "java"
> >
> > It seems like fixing this will be a challenging task.
Could you please check whether another CPU or process is running "rmmod"
to remove the replaced livepatch_61_release6?
>
> Hello Petr,
>
> I believe this case highlights the need for a hybrid livepatch
> mode—where we allow the coexistence of atomic-replace and
> non-atomic-replace patches. If a livepatch is set to non-replaceable,
> it should neither be replaced by other livepatches nor replace any
> other patches itself.
>
> We’ve deployed this livepatch, including the change to do_exit(), to
> nearly all of our servers—hundreds of thousands in total. It’s a real
> tragedy that we can't unload it. Moving forward, we’ll have no choice
> but to create non-atomic-replace livepatches to avoid this issue...
If my theory is correct, then a workaround would be to keep the
replaced livepatch module loaded until all pending do_exit() calls
have finished, so that it stays in memory as long as the code can
still be executed.
It might be enough to update the scripting and call the rmmod after
some delay.
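Something along these lines might work (an untested sketch; the module name, delay, and dry-run echo are placeholders to adapt, not a validated procedure):

```shell
#!/bin/sh
# Hypothetical wrapper for the upgrade scripting: keep the replaced
# livepatch loaded for a grace period so tasks still inside the old
# do_exit() can finish before the module memory is freed.

OLD_PATCH=${1:-livepatch_61_release6}
GRACE_SECONDS=${2:-0}   # choose something generous (minutes) in production

unload_old_patch() {
    sleep "$GRACE_SECONDS"
    if grep -q "^$OLD_PATCH " /proc/modules 2>/dev/null; then
        # Replace this echo with the real rmmod once validated.
        echo "rmmod $OLD_PATCH"
    else
        echo "$OLD_PATCH already unloaded"
    fi
}

unload_old_patch
```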
I doubt that non-atomic-replace patches would make life easier.
They would just create an even more complicated scenario. But I might
be wrong.
Anyway, I am working on a POC that would make it possible to track
to-be-released processes. It would finish the transition only when
all the to-be-released processes already use the new code, and it
would not allow removing the disabled livepatch prematurely.
Best Regards,
Petr
* Re: [BUG] Kernel Crash during replacement of livepatch patching do_exit()
2025-01-22 15:56 ` Petr Mladek
@ 2025-01-23 2:19 ` Yafang Shao
2025-01-23 15:55 ` Petr Mladek
1 sibling, 0 replies; 12+ messages in thread
From: Yafang Shao @ 2025-01-23 2:19 UTC (permalink / raw)
To: Petr Mladek
Cc: Josh Poimboeuf, jikos, Miroslav Benes, Joe Lawrence,
live-patching
On Wed, Jan 22, 2025 at 11:56 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Wed 2025-01-22 22:01:31, Yafang Shao wrote:
> > On Wed, Jan 22, 2025 at 9:30 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Wed, Jan 22, 2025 at 7:45 PM Petr Mladek <pmladek@suse.com> wrote:
> > > >
> > > > On Wed 2025-01-22 14:36:55, Yafang Shao wrote:
> > > > > On Tue, Jan 21, 2025 at 5:38 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > We encountered a panic while upgrading our livepatch, specifically
> > > > > > replacing an old livepatch with a new version on our production
> > > > > > servers.
> > > > > >
> > > > My theory is that the transition has finished and some other process
> > > > started removing the older livepatch module. I guess that the memory
> > > > with the livepatch_61_release6 code has been freed on another CPU.
> > > >
> > > > It would cause a crash of a process still running the freed do_exit()
> > > > function. The process would not block the transition after it was
> > > > removed from the task list in the middle of do_exit().
> > > >
> > > > Maybe, you could confirm this in the existing crash dump.
> > >
> > > That's correct, I can confirm this. Below are the details:
> > >
> > > crash> bt
> > > PID: 783972 TASK: ffff94cd316f0000 CPU: 70 COMMAND: "java"
> > > #0 [ffffba6f273db9a8] machine_kexec at ffffffff990632ad
> > > #1 [ffffba6f273dba08] __crash_kexec at ffffffff9915c8af
> > > #2 [ffffba6f273dbad0] crash_kexec at ffffffff9915db0c
> > > #3 [ffffba6f273dbae0] oops_end at ffffffff99024bc9
> > > #4 [ffffba6f273dbaf0] _MODULE_START_livepatch_61_release6 at
> > > ffffffffc0ded7fa [livepatch_61_release6]
> > > #5 [ffffba6f273dbb80] _MODULE_START_livepatch_61_release6 at
> > > ffffffffc0ded7fa [livepatch_61_release6]
> > > #6 [ffffba6f273dbbf8] _MODULE_START_livepatch_61_release6 at
> > > ffffffffc0ded7fa [livepatch_61_release6]
> > > #7 [ffffba6f273dbc80] asm_exc_page_fault at ffffffff99c00bb7
> > > [exception RIP: _MODULE_START_livepatch_61_release6+14330]
> > > RIP: ffffffffc0ded7fa RSP: ffffba6f273dbd30 RFLAGS: 00010282
> > >
> > > crash> task_struct.tgid ffff94cd316f0000
> > > tgid = 783848,
> > >
> > > crash> task_struct.tasks -o init_task
> > > struct task_struct {
> > > [ffffffff9ac1b310] struct list_head tasks;
> > > }
> > >
> > > crash> list task_struct.tasks -H ffffffff9ac1b310 -s task_struct.tgid
> > > | grep 783848
> > > tgid = 783848,
> > >
> > > The thread group leader remains on the task list, but the thread has
> > > already been removed from the thread_head list.
> > >
> > > crash> task 783848
> > > PID: 783848 TASK: ffff94cd603eb000 CPU: 18 COMMAND: "java"
> > > struct task_struct {
> > > thread_info = {
> > > flags = 16388,
> > >
> > > crash> task_struct.signal ffff94cd603eb000
> > > signal = 0xffff94cc89d11b00,
> > >
> > > crash> signal_struct.thread_head -o 0xffff94cc89d11b00
> > > struct signal_struct {
> > > [ffff94cc89d11b10] struct list_head thread_head;
> > > }
> > >
> > > crash> list task_struct.thread_node -H ffff94cc89d11b10 -s task_struct.pid
> > > ffff94cd603eb000
> > > pid = 783848,
> > > ffff94ccd8343000
> > > pid = 783879,
> > >
> > > crash> signal_struct.nr_threads,thread_head 0xffff94cc89d11b00
> > > nr_threads = 2,
> > > thread_head = {
> > > next = 0xffff94cd603eba70,
> > > prev = 0xffff94ccd8343a70
> > > },
> > >
> > > crash> ps -g 783848
> > > PID: 783848 TASK: ffff94cd603eb000 CPU: 18 COMMAND: "java"
> > > PID: 783879 TASK: ffff94ccd8343000 CPU: 81 COMMAND: "java"
> > > PID: 783972 TASK: ffff94cd316f0000 CPU: 70 COMMAND: "java"
> > > PID: 784023 TASK: ffff94d644b48000 CPU: 24 COMMAND: "java"
> > > PID: 784025 TASK: ffff94dd30250000 CPU: 65 COMMAND: "java"
> > > PID: 785242 TASK: ffff94ccb5963000 CPU: 48 COMMAND: "java"
> > > PID: 785412 TASK: ffff94cd3eaf8000 CPU: 92 COMMAND: "java"
> > > PID: 785415 TASK: ffff94cd6606b000 CPU: 23 COMMAND: "java"
> > > PID: 785957 TASK: ffff94dfea4e3000 CPU: 16 COMMAND: "java"
> > > PID: 787125 TASK: ffff94e70547b000 CPU: 27 COMMAND: "java"
> > > PID: 787445 TASK: ffff94e49a2bb000 CPU: 28 COMMAND: "java"
> > > PID: 787502 TASK: ffff94e41e0f3000 CPU: 36 COMMAND: "java"
> > >
> > > It seems like fixing this will be a challenging task.
>
> Could you please check if another CPU or process is running "rmmod"
> which is removing the replaced livepatch_61_release6, please?
Unfortunately, I couldn't find such a task in any of the vmcores. It's
possible that it had already exited.
>
> >
> > Hello Petr,
> >
> > I believe this case highlights the need for a hybrid livepatch
> > mode—where we allow the coexistence of atomic-replace and
> > non-atomic-replace patches. If a livepatch is set to non-replaceable,
> > it should neither be replaced by other livepatches nor replace any
> > other patches itself.
> >
> > We’ve deployed this livepatch, including the change to do_exit(), to
> > nearly all of our servers—hundreds of thousands in total. It’s a real
> > tragedy that we can't unload it. Moving forward, we’ll have no choice
> > but to create non-atomic-replace livepatches to avoid this issue...
>
> If my theory is correct then a workaround would be to keep the
> replaced livepatch module loaded until all pending do_exit() calls
> are finished. So that it stays in the memory as long as the code
> is accessed.
Yes, we’ve been running this test case on a test server, and it’s
still working fine so far. We’ll roll it out to more test servers, and
hopefully, it’ll serve as a viable workaround.
By the way, isn't this a general property of kernel modules? If tasks
are still executing code from a module when you try to rmmod it,
shouldn't the unload be deferred until all tasks have finished
executing its code?
>
> It might be enough to update the scripting and call the rmmod after
> some delay.
>
> I doubt that non-atomic-replace patches would make life easier.
> They would just create an even more complicated scenario. But I might
> be wrong.
The hybrid livepatch mode should serve as a fallback for the
atomic-replace mode. Livepatching has significantly improved our
workflow, but if issues arise within a livepatch, there should be a
fallback mechanism to ensure stability.
>
> Anyway, I am working on a POC which would allow to track
> to-be-released processes. It would finish the transition only
> when all the to-be-released processes already use the new code.
> It won't allow to remove the disabled livepatch prematurely.
Great. Thanks for your help.
--
Regards
Yafang
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Kernel Crash during replacement of livepatch patching do_exit()
2025-01-22 15:56 ` Petr Mladek
2025-01-23 2:19 ` Yafang Shao
@ 2025-01-23 15:55 ` Petr Mladek
2025-01-24 1:08 ` Josh Poimboeuf
2025-01-24 3:11 ` Yafang Shao
1 sibling, 2 replies; 12+ messages in thread
From: Petr Mladek @ 2025-01-23 15:55 UTC (permalink / raw)
To: Yafang Shao
Cc: Josh Poimboeuf, jikos, Miroslav Benes, Joe Lawrence,
live-patching
On Wed 2025-01-22 16:56:31, Petr Mladek wrote:
> Anyway, I am working on a POC which would allow to track
> to-be-released processes. It would finish the transition only
> when all the to-be-released processes already use the new code.
> It won't allow to remove the disabled livepatch prematurely.
Here is the POC. I am not sure if it is the right path, see
the warning and open questions in the commit message.
I am going to wait for some feedback before investing more time
into this.
The patch is against current Linus' master, aka, v6.13-rc7.
From ac7287d99aaeca7a4536e8ade61b9bd0c8ec7fdc Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Thu, 23 Jan 2025 09:04:09 +0100
Subject: [PATCH] livepatching: Block transition until
delayed_put_task_struct()
WARNING: This patch is just a POC. It will blow up the system because
RCU callbacks are handled in softirq context, which by default
runs on exit from IRQ handlers, and it is not allowed
to take sleeping locks there; see the backtrace at the end
of the commit message.
We would need to synchronize the counting of the exiting
processes with the livepatch transition another way.
Hmm, I guess that spin_lock is legal in softirq context.
It might be the easiest approach.
In the worst case, we would need to use a lockless
algorithm, which might make things even more complicated.
Here is the description of the problem and the solution:
The livepatching core code uses for_each_process_thread() cycle for setting
and checking the state of processes on the system. It works well as long
as the livepatch touches only code which is used only by processes in
the task list.
The problem starts when the livepatch replaces code which might be
used by a process which is no longer in the task list. It is
mostly about processes which are being removed. They
disappear from the list here:
+ release_task()
+ __exit_signal()
+ __unhash_process()
There are basically two groups of problems:
1. The livepatching core no longer updates TIF_PATCH_PENDING
and p->patch_state. In this case, the ftrace handler
klp_check_stack_func() might make a wrong decision and
use an incompatible variant of the code.
This might become a real problem only when the code modifies
the semantics.
2. The livepatching core no longer checks the stack and
could finish the livepatch transition even when these
to-be-removed processes have not been transitioned yet.
This might even cause an Oops when the to-be-removed processes
are running code from a disabled livepatch which might
be removed in the meantime.
This patch tries to address the 2nd problem which most likely caused
the following crash:
[156821.048318] livepatch: enabling patch 'livepatch_61_release12'
[156821.061580] livepatch: 'livepatch_61_release12': starting patching
transition
[156821.122212] livepatch: 'livepatch_61_release12': patching complete
[156821.175871] kernel tried to execute NX-protected page - exploit
attempt? (uid: 10524)
[156821.176011] BUG: unable to handle page fault for address: ffffffffc0ded7fa
[156821.176121] #PF: supervisor instruction fetch in kernel mode
[156821.176211] #PF: error_code(0x0011) - permissions violation
[156821.176302] PGD 986c15067 P4D 986c15067 PUD 986c17067 PMD
184f53b067 PTE 800000194c08e163
[156821.176435] Oops: 0011 [#1] PREEMPT SMP NOPTI
[156821.176506] CPU: 70 PID: 783972 Comm: java Kdump: loaded Tainted:
G S W O K 6.1.52-3 #3.pdd
[156821.176654] Hardware name: Inspur SA5212M5/SA5212M5, BIOS 4.1.20 05/05/2021
[156821.176766] RIP: 0010:0xffffffffc0ded7fa
[156821.176841] Code: 0a 00 00 48 89 42 08 48 89 10 4d 89 a6 08 0a 00
00 4c 89 f7 4d 89 a6 10 0a 00 00 4d 8d a7 08 0a 00 00 4d 89 fe e8 00
00 00 00 <49> 8b 87 08 0a 00 00 48 2d 08 0a 00 00 4d 39 ec 75 aa 48 89
df e8
[156821.177138] RSP: 0018:ffffba6f273dbd30 EFLAGS: 00010282
[156821.177222] RAX: 0000000000000000 RBX: ffff94cd316f0000 RCX:
000000008020000d
[156821.177338] RDX: 000000008020000e RSI: 000000008020000d RDI:
ffff94cd316f0000
[156821.177452] RBP: ffffba6f273dbd88 R08: ffff94cd316f13f8 R09:
0000000000000001
[156821.177567] R10: 0000000000000000 R11: 0000000000000000 R12:
ffffba6f273dbd48
[156821.177682] R13: ffffba6f273dbd48 R14: ffffba6f273db340 R15:
ffffba6f273db340
[156821.177797] FS: 0000000000000000(0000) GS:ffff94e321180000(0000)
knlGS:0000000000000000
[156821.177926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[156821.178019] CR2: ffffffffc0ded7fa CR3: 000000015909c006 CR4:
00000000007706e0
[156821.178133] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[156821.178248] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[156821.178363] PKRU: 55555554
[156821.178407] Call Trace:
[156821.178449] <TASK>
[156821.178492] ? show_regs.cold+0x1a/0x1f
[156821.178559] ? __die_body+0x20/0x70
[156821.178617] ? __die+0x2b/0x37
[156821.178669] ? page_fault_oops+0x136/0x2b0
[156821.178734] ? search_bpf_extables+0x63/0x90
[156821.178805] ? search_exception_tables+0x5f/0x70
[156821.178881] ? kernelmode_fixup_or_oops+0xa2/0x120
[156821.178957] ? __bad_area_nosemaphore+0x176/0x1b0
[156821.179034] ? bad_area_nosemaphore+0x16/0x20
[156821.179105] ? do_kern_addr_fault+0x77/0x90
[156821.179175] ? exc_page_fault+0xc6/0x160
[156821.179239] ? asm_exc_page_fault+0x27/0x30
[156821.179310] do_group_exit+0x35/0x90
[156821.179371] get_signal+0x909/0x950
[156821.179429] ? wake_up_q+0x50/0x90
[156821.179486] arch_do_signal_or_restart+0x34/0x2a0
[156821.183207] exit_to_user_mode_prepare+0x149/0x1b0
[156821.186963] syscall_exit_to_user_mode+0x1e/0x50
[156821.190723] do_syscall_64+0x48/0x90
[156821.194500] entry_SYSCALL_64_after_hwframe+0x64/0xce
[156821.198195] RIP: 0033:0x7f967feb5a35
[156821.201769] Code: Unable to access opcode bytes at 0x7f967feb5a0b.
[156821.205283] RSP: 002b:00007f96664ee670 EFLAGS: 00000246 ORIG_RAX:
00000000000000ca
[156821.208790] RAX: fffffffffffffe00 RBX: 00007f967808a650 RCX:
00007f967feb5a35
[156821.212305] RDX: 000000000000000f RSI: 0000000000000080 RDI:
00007f967808a654
[156821.215785] RBP: 00007f96664ee6c0 R08: 00007f967808a600 R09:
0000000000000007
[156821.219273] R10: 0000000000000000 R11: 0000000000000246 R12:
00007f967808a600
[156821.222727] R13: 00007f967808a628 R14: 00007f967f691220 R15:
00007f96664ee750
[156821.226155] </TASK>
[156821.229470] Modules linked in: livepatch_61_release12(OK)
ebtable_filter ebtables af_packet_diag netlink_diag xt_DSCP xt_owner
iptable_mangle iptable_raw xt_CT cls_bpf sch_ingress bpf_preload
binfmt_misc raw_diag unix_diag tcp_diag udp_diag inet_diag
iptable_filter bpfilter xt_conntrack nf_nat nf_conntrack_netlink
nfnetlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 overlay af_packet
bonding tls intel_rapl_msr intel_rapl_common intel_uncore_frequency
intel_uncore_frequency_common isst_if_common skx_edac nfit
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
rapl vfat fat intel_cstate iTCO_wdt xfs intel_uncore pcspkr ses
enclosure mei_me i2c_i801 input_leds lpc_ich acpi_ipmi ioatdma
i2c_smbus mei mfd_core dca wmi ipmi_si intel_pch_thermal ipmi_devintf
ipmi_msghandler acpi_cpufreq acpi_pad acpi_power_meter ip_tables ext4
mbcache jbd2 sd_mod sg mpt3sas raid_class scsi_transport_sas
megaraid_sas crct10dif_pclmul crc32_pclmul crc32c_intel
polyval_clmulni polyval_generic
[156821.229555] ghash_clmulni_intel sha512_ssse3 aesni_intel
crypto_simd cryptd nvme nvme_core t10_pi i40e ptp pps_core ahci
libahci libata deflate zlib_deflate
[156821.259012] Unloaded tainted modules:
livepatch_61_release6(OK):14089 livepatch_61_release12(OK):14088 [last
unloaded: livepatch_61_release6(OK)]
[156821.275421] CR2: ffffffffc0ded7fa
This patch tries to avoid the crash by tracking the number of
to-be-released processes. They block the current transition
until delayed_put_task_struct() is called.
It is just a POC. There are many open questions:
1. Does it help at all?
It looks to me that release_task() is always called from another
task. For example, exit_notify() seems to call it for dead
children. It is not clear to me whether the released task
is still running do_exit() at this point.
Well, for example, wait_task_zombie() calls release_task()
in some special cases.
2. Is it enough to block the transition until delayed_put_task_struct()?
I do not fully understand the maze of code. It might still be too
early.
It seems that put_task_struct() can delay the release even more.
Maybe we should drop the klp reference count only on the final
put_task_struct().
3. Is this worth the effort?
It is probably a bad idea to livepatch do_exit() in the first place.
But it is not obvious why it is a problem. It would be nice to at
least detect the problem and warn about it.
Finally, here is the backtrace showing the problem with taking klp_mutex in
the RCU handler.
[ 0.986614][ C3] =============================
[ 0.987234][ C3] [ BUG: Invalid wait context ]
[ 0.987882][ C3] 6.13.0-rc7-default+ #234 Tainted: G W
[ 0.988733][ C3] -----------------------------
[ 0.989219][ C3] swapper/3/0 is trying to lock:
[ 0.989698][ C3] ffffffff86447030 (klp_mutex){+.+.}-{4:4}, at: klp_put_releasing_task+0x1b/0x90
[ 0.990283][ C3] other info that might help us debug this:
[ 0.990283][ C3] context-{3:3}
[ 0.990283][ C3] 1 lock held by swapper/3/0:
[ 0.990283][ C3] #0: ffffffff86377fc0 (rcu_callback){....}-{0:0}, at: rcu_do_batch+0x1a8/0xa40
[ 0.990283][ C3] stack backtrace:
[ 0.990283][ C3] CPU: 3 UID: 0 PID: 0 Comm: swapper/3 Tainted: G W 6.13.0-rc7-default+ #234 bfeca4b35f98fc672d5ef9f6b720fe908580ae2c
[ 0.990283][ C3] Tainted: [W]=WARN
[ 0.990283][ C3] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014
[ 0.990283][ C3] Call Trace:
[ 0.990283][ C3] <IRQ>
[ 0.990283][ C3] dump_stack_lvl+0x6c/0xa0
[ 0.990283][ C3] __lock_acquire+0x919/0xb70
[ 0.990283][ C3] lock_acquire.part.0+0xad/0x220
[ 0.990283][ C3] ? klp_put_releasing_task+0x1b/0x90
[ 0.990283][ C3] ? rcu_is_watching+0x11/0x50
[ 0.990283][ C3] ? lock_acquire+0x107/0x140
[ 0.990283][ C3] ? klp_put_releasing_task+0x1b/0x90
[ 0.990283][ C3] __mutex_lock+0xb5/0xe00
[ 0.990283][ C3] ? klp_put_releasing_task+0x1b/0x90
[ 0.990283][ C3] ? __lock_acquire+0x551/0xb70
[ 0.990283][ C3] ? klp_put_releasing_task+0x1b/0x90
[ 0.990283][ C3] ? lock_acquire.part.0+0xbd/0x220
[ 0.990283][ C3] ? klp_put_releasing_task+0x1b/0x90
[ 0.990283][ C3] klp_put_releasing_task+0x1b/0x90
[ 0.990283][ C3] delayed_put_task_struct+0x4b/0x150
[ 0.990283][ C3] ? rcu_do_batch+0x1d2/0xa40
[ 0.990283][ C3] rcu_do_batch+0x1d4/0xa40
[ 0.990283][ C3] ? rcu_do_batch+0x1a8/0xa40
[ 0.990283][ C3] ? lock_is_held_type+0xd9/0x130
[ 0.990283][ C3] rcu_core+0x3bb/0x4f0
[ 0.990283][ C3] handle_softirqs+0xe2/0x400
[ 0.990283][ C3] __irq_exit_rcu+0xd9/0x150
[ 0.990283][ C3] irq_exit_rcu+0xe/0x30
[ 0.990283][ C3] sysvec_apic_timer_interrupt+0x8d/0xb0
[ 0.990283][ C3] </IRQ>
[ 0.990283][ C3] <TASK>
[ 0.990283][ C3] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 0.990283][ C3] RIP: 0010:pv_native_safe_halt+0xf/0x20
[ 0.990283][ C3] Code: 22 d7 c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 03 92 2f 00 fb f4 <c3> cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
[ 0.990283][ C3] RSP: 0000:ffffb95bc00c3eb8 EFLAGS: 00000246
[ 0.990283][ C3] RAX: 0000000000000111 RBX: 00000000001fd2cc RCX: 0000000000000000
[ 0.990283][ C3] RDX: 0000000000000000 RSI: ffffffff85f06d6a RDI: ffffffff85ed7907
[ 0.990283][ C3] RBP: ffff9cd980883740 R08: 0000000000000001 R09: 0000000000000000
[ 0.990283][ C3] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
[ 0.990283][ C3] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 0.990283][ C3] default_idle+0x9/0x20
[ 0.990283][ C3] default_idle_call+0x84/0x1e0
[ 0.990283][ C3] cpuidle_idle_call+0x134/0x170
[ 0.990283][ C3] ? tsc_verify_tsc_adjust+0x45/0xd0
[ 0.990283][ C3] do_idle+0x93/0xf0
[ 0.990283][ C3] cpu_startup_entry+0x29/0x30
[ 0.990283][ C3] start_secondary+0x121/0x140
[ 0.990283][ C3] common_startup_64+0x13e/0x141
[ 0.990283][ C3] </TASK>
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
include/linux/livepatch.h | 3 ++
include/linux/sched.h | 1 +
kernel/exit.c | 5 ++
kernel/livepatch/transition.c | 89 +++++++++++++++++++++++++++++------
4 files changed, 83 insertions(+), 15 deletions(-)
diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 51a258c24ff5..63e9e56ca6fe 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -201,6 +201,9 @@ void klp_module_going(struct module *mod);
void klp_copy_process(struct task_struct *child);
void klp_update_patch_state(struct task_struct *task);
+void klp_get_releasing_task(struct task_struct *task);
+void klp_put_releasing_task(struct task_struct *task);
+
static inline bool klp_patch_pending(struct task_struct *task)
{
return test_tsk_thread_flag(task, TIF_PATCH_PENDING);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 64934e0830af..d8a587208212 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1542,6 +1542,7 @@ struct task_struct {
#endif
#ifdef CONFIG_LIVEPATCH
int patch_state;
+ bool klp_exit_counted;
#endif
#ifdef CONFIG_SECURITY
/* Used by LSM modules for access restriction: */
diff --git a/kernel/exit.c b/kernel/exit.c
index 1dcddfe537ee..a2a9672077d5 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -224,6 +224,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
rethook_flush_task(tsk);
perf_event_delayed_put(tsk);
trace_sched_process_free(tsk);
+ klp_put_releasing_task(tsk);
put_task_struct(tsk);
}
@@ -242,6 +243,10 @@ void release_task(struct task_struct *p)
struct task_struct *leader;
struct pid *thread_pid;
int zap_leader;
+
+ /* Block the transition until the very end. */
+ klp_get_releasing_task(p);
+
repeat:
/* don't need to get the RCU readlock here - the process is dead and
* can't be modifying its own credentials. But shut RCU-lockdep up */
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index ba069459c101..6403af34f231 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -25,6 +25,14 @@ struct klp_patch *klp_transition_patch;
static int klp_target_state = KLP_TRANSITION_IDLE;
+/*
+ * Allow livepatching the do_exit() function by counting processes which
+ * are being removed from the task list. They will block the transition
+ * almost until the task struct is released.
+ */
+unsigned int klp_releasing_tasks_cnt;
+bool klp_track_releasing_tasks;
+
static unsigned int klp_signals_cnt;
/*
@@ -87,7 +95,7 @@ static void klp_synchronize_transition(void)
* The transition to the target patch state is complete. Clean up the data
* structures.
*/
-static void klp_complete_transition(void)
+static void __klp_complete_transition(void)
{
struct klp_object *obj;
struct klp_func *func;
@@ -156,6 +164,25 @@ static void klp_complete_transition(void)
klp_transition_patch = NULL;
}
+static void klp_complete_transition(void)
+{
+ struct klp_patch *patch;
+
+ klp_cond_resched_disable();
+ patch = klp_transition_patch;
+ __klp_complete_transition();
+
+ /*
+ * It would make more sense to free the unused patches in
+ * klp_complete_transition() but it is called also
+ * from klp_cancel_transition().
+ */
+ if (!patch->enabled)
+ klp_free_patch_async(patch);
+ else if (patch->replace)
+ klp_free_replaced_patches_async(patch);
+}
+
/*
* This is called in the error path, to cancel a transition before it has
* started, i.e. klp_init_transition() has been called but
@@ -171,7 +198,7 @@ void klp_cancel_transition(void)
klp_transition_patch->mod->name);
klp_target_state = KLP_TRANSITION_UNPATCHED;
- klp_complete_transition();
+ __klp_complete_transition();
}
/*
@@ -452,7 +479,6 @@ void klp_try_complete_transition(void)
{
unsigned int cpu;
struct task_struct *g, *task;
- struct klp_patch *patch;
bool complete = true;
WARN_ON_ONCE(klp_target_state == KLP_TRANSITION_IDLE);
@@ -507,20 +533,14 @@ void klp_try_complete_transition(void)
return;
}
- /* Done! Now cleanup the data structures. */
- klp_cond_resched_disable();
- patch = klp_transition_patch;
- klp_complete_transition();
-
/*
- * It would make more sense to free the unused patches in
- * klp_complete_transition() but it is called also
- * from klp_cancel_transition().
+ * All tasks in the task list are migrated. Stop counting releasing
+ * processes. The last one to be released will finish the transition, if any.
*/
- if (!patch->enabled)
- klp_free_patch_async(patch);
- else if (patch->replace)
- klp_free_replaced_patches_async(patch);
+ klp_track_releasing_tasks = false;
+
+ /* Done! Now cleanup the data structures. */
+ klp_complete_transition();
}
/*
@@ -582,6 +602,8 @@ void klp_init_transition(struct klp_patch *patch, int state)
klp_transition_patch = patch;
+ klp_track_releasing_tasks = true;
+
/*
* Set the global target patch state which tasks will switch to. This
* has no effect until the TIF_PATCH_PENDING flags get set later.
@@ -715,6 +737,43 @@ void klp_copy_process(struct task_struct *child)
child->patch_state = current->patch_state;
}
+void klp_get_releasing_task(struct task_struct *p)
+{
+ mutex_lock(&klp_mutex);
+
+ if (klp_track_releasing_tasks) {
+ klp_releasing_tasks_cnt++;
+ p->klp_exit_counted = true;
+ }
+
+ mutex_unlock(&klp_mutex);
+}
+
+void klp_put_releasing_task(struct task_struct *p)
+{
+ mutex_lock(&klp_mutex);
+
+ if (!p->klp_exit_counted)
+ goto out;
+
+ if (WARN_ON_ONCE(!klp_releasing_tasks_cnt))
+ goto out;
+
+ if (--klp_releasing_tasks_cnt)
+ goto out;
+
+ /*
+ * Do not finish the transition when there are still non-migrated
+ * processes in the task list.
+ */
+ if (klp_track_releasing_tasks)
+ goto out;
+
+ klp_complete_transition();
+out:
+ mutex_unlock(&klp_mutex);
+}
+
/*
* Drop TIF_PATCH_PENDING of all tasks on admin's request. This forces an
* existing transition to finish.
--
2.48.1
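One way to resolve the sleeping-lock problem described in the WARNING above would be to take only a spinlock in the RCU callback and defer the sleeping work (klp_mutex, transition cleanup) to process context via a workqueue. A rough, uncompiled kernel-style sketch — klp_releasing_lock and the work item are assumed additions, not part of the patch:

```c
/* kernel-style pseudocode, not compiled or tested */
static DEFINE_SPINLOCK(klp_releasing_lock);

static void klp_releasing_work_fn(struct work_struct *work);
static DECLARE_WORK(klp_releasing_work, klp_releasing_work_fn);

void klp_put_releasing_task(struct task_struct *p)
{
	unsigned long flags;
	bool last = false;

	if (!p->klp_exit_counted)
		return;

	/* Safe in softirq context: only a spinlock, no sleeping. */
	spin_lock_irqsave(&klp_releasing_lock, flags);
	if (!WARN_ON_ONCE(!klp_releasing_tasks_cnt))
		last = !--klp_releasing_tasks_cnt;
	spin_unlock_irqrestore(&klp_releasing_lock, flags);

	/* Defer the sleeping work to process context. */
	if (last)
		schedule_work(&klp_releasing_work);
}

static void klp_releasing_work_fn(struct work_struct *work)
{
	mutex_lock(&klp_mutex);
	if (!klp_track_releasing_tasks && !klp_releasing_tasks_cnt)
		klp_complete_transition();
	mutex_unlock(&klp_mutex);
}
```

This keeps the RCU callback atomic while still letting the last releasing task trigger the (sleeping) transition completion.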
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [BUG] Kernel Crash during replacement of livepatch patching do_exit()
2025-01-23 15:55 ` Petr Mladek
@ 2025-01-24 1:08 ` Josh Poimboeuf
2025-01-24 3:11 ` Yafang Shao
1 sibling, 0 replies; 12+ messages in thread
From: Josh Poimboeuf @ 2025-01-24 1:08 UTC (permalink / raw)
To: Petr Mladek
Cc: Yafang Shao, jikos, Miroslav Benes, Joe Lawrence, live-patching,
Peter Zijlstra
On Thu, Jan 23, 2025 at 04:55:35PM +0100, Petr Mladek wrote:
> On Wed 2025-01-22 16:56:31, Petr Mladek wrote:
> > Anyway, I am working on a POC which would allow to track
> > to-be-released processes. It would finish the transition only
> > when all the to-be-released processes already use the new code.
> > It won't allow to remove the disabled livepatch prematurely.
Can we just keep a list of exiting tasks, and use something like
for_each_executing_task() in the transition code?
Tasks-RCU is actually doing something like that already, see
exit_tasks_rcu_start() and exit_tasks_rcu_finish(). Maybe we
could use the same list.
--
Josh
* Re: [BUG] Kernel Crash during replacement of livepatch patching do_exit()
2025-01-23 15:55 ` Petr Mladek
2025-01-24 1:08 ` Josh Poimboeuf
@ 2025-01-24 3:11 ` Yafang Shao
2025-01-24 7:07 ` Yafang Shao
1 sibling, 1 reply; 12+ messages in thread
From: Yafang Shao @ 2025-01-24 3:11 UTC (permalink / raw)
To: Petr Mladek
Cc: Josh Poimboeuf, jikos, Miroslav Benes, Joe Lawrence,
live-patching
On Thu, Jan 23, 2025 at 11:55 PM Petr Mladek <pmladek@suse.com> wrote:
>
> On Wed 2025-01-22 16:56:31, Petr Mladek wrote:
> > Anyway, I am working on a POC which would allow to track
> > to-be-released processes. It would finish the transition only
> > when all the to-be-released processes already use the new code.
> > It won't allow to remove the disabled livepatch prematurely.
>
> Here is the POC. I am not sure if it is the right path, see
> the warning and open questions in the commit message.
>
> I am going to wait for some feedback before investing more time
> into this.
>
> The patch is against current Linus' master, aka, v6.13-rc7.
>
> From ac7287d99aaeca7a4536e8ade61b9bd0c8ec7fdc Mon Sep 17 00:00:00 2001
> From: Petr Mladek <pmladek@suse.com>
> Date: Thu, 23 Jan 2025 09:04:09 +0100
> Subject: [PATCH] livepatching: Block transition until
> delayed_put_task_struct()
>
> WARNING: This patch is just a POC. It will blow up the system because
> RCU callbacks are handled by softirq context which are handled
> by default on exit from IRQ handlers. And it is not allowed
> to take sleeping locks here, see the backtrace at the end
> of the commit message.
>
> We would need to synchronize the counting of the exiting
> processes with the livepatch transition another way.
>
> Hmm, I guess that spin_lock is legal in softirq context.
> It might be the easiest approach.
>
> > In the worst case, we would need to use a lockless
> > algorithm, which might make things even more complicated.
>
> Here is the description of the problem and the solution:
>
> The livepatching core code uses for_each_process_thread() cycle for setting
> and checking the state of processes on the system. It works well as long
> as the livepatch touches only code which is used only by processes in
> the task list.
>
> The problem starts when the livepatch replaces code which might be
> used by a process which is no longer in the task list. It is
> mostly about processes which are being removed. They
> disappear from the list here:
>
> + release_task()
> + __exit_signal()
> + __unhash_process()
>
> There are basically two groups of problems:
>
> 1. The livepatching core no longer updates TIF_PATCH_PENDING
> and p->patch_state. In this case, the ftrace handler
> klp_check_stack_func() might make a wrong decision and
> use an incompatible variant of the code.
I believe I was able to reproduce the issue while attempting to
trigger the panic. The warning fired at the following location:

klp_ftrace_handler()
    if (unlikely(func->transition)) {
        WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
    }

The full warning message is as follows:
[58063.291589] livepatch: enabling patch 'livepatch_61_release12'
[58063.297580] livepatch: 'livepatch_61_release12': starting patching transition
[58063.323340] ------------[ cut here ]------------
[58063.323343] WARNING: CPU: 58 PID: 3851051 at
kernel/livepatch/patch.c:98 klp_ftrace_handler+0x136/0x150
[58063.323349] Modules linked in: livepatch_61_release12(OK)
livepatch_61_release6(OK) ebtable_filter ebtables af_packet_diag
netlink_diag xt_DSCP xt_owner iptable_mangle raw_diag unix_diag
udp_diag iptable_raw xt_CT tcp_diag inet_diag cls_bpf sch_ingress
bpf_preload binfmt_misc iptable_filter bpfilter xt_conntrack nf_nat
nf_conntrack_netlink nfnetlink nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 overlay af_packet bonding tls intel_rapl_msr
intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common
isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp
coretemp kvm_intel kvm irqbypass rapl vfat intel_cstate fat iTCO_wdt
xfs intel_uncore pcspkr ses enclosure input_leds i2c_i801 i2c_smbus
mei_me lpc_ich ioatdma mei mfd_core intel_pch_thermal dca acpi_ipmi
wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq acpi_pad
acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod sg mpt3sas
raid_class scsi_transport_sas megaraid_sas crct10dif_pclmul
crc32_pclmul crc32c_intel
[58063.323402] polyval_clmulni polyval_generic ghash_clmulni_intel
sha512_ssse3 aesni_intel nvme crypto_simd cryptd nvme_core t10_pi i40e
ptp pps_core ahci libahci libata deflate zlib_deflate
[58063.323413] Unloaded tainted modules:
livepatch_61_release6(OK):3369 livepatch_61_release12(OK):3370 [last
unloaded: livepatch_61_release12(OK)]
[58063.323418] CPU: 58 PID: 3851051 Comm: docker Kdump: loaded
Tainted: G S O K 6.1.52-3
[58063.323421] Hardware name: Inspur SA5212M5/SA5212M5, BIOS 4.1.20 05/05/2021
[58063.323423] RIP: 0010:klp_ftrace_handler+0x136/0x150
[58063.323425] Code: eb b3 0f 1f 44 00 00 eb b5 8b 89 f4 23 00 00 83
f9 ff 74 16 85 c9 75 89 48 8b 00 48 8d 50 90 48 39 c6 0f 85 79 ff ff
ff eb 8b <0f> 0b e9 70 ff ff ff e8 ae 24 9c 00 66 66 2e 0f 1f 84 00 00
00 00
[58063.323428] RSP: 0018:ffffa87b2367fbb8 EFLAGS: 00010046
[58063.323429] RAX: ffff8b0ee59229a0 RBX: ffff8b26fa47b000 RCX: 00000000ffffffff
[58063.323432] RDX: ffff8b0ee5922930 RSI: ffff8b2d41e53f10 RDI: ffffa87b2367fbd8
[58063.323433] RBP: ffffa87b2367fbc8 R08: 0000000000000000 R09: fffffffffffffff7
[58063.323434] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b2520eaf240
[58063.323435] R13: ffff8b26fa47b000 R14: 000000000002f240 R15: 0000000000000000
[58063.323436] FS: 0000000000000000(0000) GS:ffff8b2520e80000(0000)
knlGS:0000000000000000
[58063.323438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[58063.323440] CR2: 00007f19d7eaf000 CR3: 000000187a676004 CR4: 00000000007706e0
[58063.323441] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[58063.323442] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[58063.323444] PKRU: 55555554
[58063.323445] Call Trace:
[58063.323445] <TASK>
[58063.323449] ? show_regs.cold+0x1a/0x1f
[58063.323454] ? klp_ftrace_handler+0x136/0x150
[58063.323455] ? __warn+0x84/0xd0
[58063.323457] ? klp_ftrace_handler+0x136/0x150
[58063.323459] ? report_bug+0x105/0x180
[58063.323463] ? handle_bug+0x40/0x70
[58063.323467] ? exc_invalid_op+0x19/0x70
[58063.323469] ? asm_exc_invalid_op+0x1b/0x20
[58063.323474] ? klp_ftrace_handler+0x136/0x150
[58063.323476] ? kmem_cache_free+0x155/0x470
[58063.323479] 0xffffffffc0876099
[58063.323495] ? update_rq_clock+0x5/0x250
[58063.323498] update_rq_clock+0x5/0x250
[58063.323500] __schedule+0xed/0x8f0
[58063.323504] ? update_rq_clock+0x5/0x250
[58063.323506] ? __schedule+0xed/0x8f0
[58063.323508] ? trace_hardirqs_off+0x36/0xf0
[58063.323512] do_task_dead+0x44/0x50
[58063.323515] do_exit+0x7cd/0xb90 [livepatch_61_release6]
[58063.323525] ? xfs_inode_mark_reclaimable+0x320/0x320 [livepatch_61_release6]
[58063.323533] do_group_exit+0x35/0x90
[58063.323536] get_signal+0x909/0x950
[58063.323539] ? get_futex_key+0xa4/0x4f0
[58063.323543] arch_do_signal_or_restart+0x34/0x2a0
[58063.323547] exit_to_user_mode_prepare+0x149/0x1b0
[58063.323551] syscall_exit_to_user_mode+0x1e/0x50
[58063.323555] do_syscall_64+0x48/0x90
[58063.323557] entry_SYSCALL_64_after_hwframe+0x64/0xce
[58063.323560] RIP: 0033:0x5601df8122f3
[58063.323563] Code: Unable to access opcode bytes at 0x5601df8122c9.
[58063.323564] RSP: 002b:00007f9ce6ffccc0 EFLAGS: 00000286 ORIG_RAX:
00000000000000ca
[58063.323566] RAX: fffffffffffffe00 RBX: 000000c42053e000 RCX: 00005601df8122f3
[58063.323567] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000c42053e148
[58063.323568] RBP: 00007f9ce6ffcd08 R08: 0000000000000000 R09: 0000000000000000
[58063.323569] R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000000000
[58063.323570] R13: 0000000000801000 R14: 0000000000000000 R15: 00007f9ce6ffd700
[58063.323573] </TASK>
[58063.323574] ---[ end trace 0000000000000000 ]---
>
> This might become a real problem only when the code modifies
> the semantics.
>
> 2. The livepatching core no longer checks the stack and
> could finish the livepatch transition even when these
> to-be-removed processes have not been transitioned yet.
>
> This might even cause an Oops when the to-be-removed processes
> are running code from a disabled livepatch which might
> be removed in the meantime.
>
> This patch tries to address the 2nd problem which most likely caused
> the following crash:
> [...]
>
> This patch tries to avoid the crash by tracking the number of
> to-be-released processes. They block the current transition
> until delayed_put_task_struct() is called.
>
> It is just a POC. There are many open questions:
>
> 1. Does it help at all?
>
> It looks to me that release_task() is always called from another
> task. For example, exit_notify() seems to call it for dead
> children. It is not clear to me whether the released task
> is still running do_exit() at this point.
If the task is a thread (but not the thread group leader), it should
call release_task(), correct? Below is the trace from our production
server:
$ bpftrace -e 'k:release_task {$tsk=(struct task_struct *)arg0; if
($tsk->pid == tid){@stack[kstack()]=count()}}'
@stack[
release_task+1
kthread_exit+41
kthread+200
ret_from_fork+31
]: 1
@stack[
release_task+1
do_group_exit+53
__x64_sys_exit_group+24
do_syscall_64+56
entry_SYSCALL_64_after_hwframe+100
]: 2
@stack[
release_task+1
do_group_exit+53
get_signal+2313
arch_do_signal_or_restart+52
exit_to_user_mode_prepare+329
syscall_exit_to_user_mode+30
do_syscall_64+72
entry_SYSCALL_64_after_hwframe+100
]: 20
@stack[
release_task+1
__x64_sys_exit+27
do_syscall_64+56
entry_SYSCALL_64_after_hwframe+100
]: 26
>
> Well, for example, wait_task_zombie() calls release_task()
> in some special cases.
>
> 2. Is it enough to block the transition until delayed_put_task_struct()?
>
> I do not fully understand the maze of code. It might still be too
> early.
>
> It seems that put_task_struct() can delay the release even more.
After delayed_put_task_struct(), there is still some code that needs
to be executed. It appears that this just reduces the likelihood of
the issue occurring, but does not completely prevent it.
> Maybe, we should drop the klp reference count at the final
> put_task_struct().
>
> 3. Is this worth the effort?
I believe it’s worth the effort. If we can find a solution to mitigate
this limitation, we should definitely try to implement it.
>
> It is probably a bad idea to livepatch do_exit() in the first place.
>
> But it is not obvious why it is a problem. It would be nice to at
> least detect the problem and warn about it.
--
Regards
Yafang
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [BUG] Kernel Crash during replacement of livepatch patching do_exit()
2025-01-24 3:11 ` Yafang Shao
@ 2025-01-24 7:07 ` Yafang Shao
0 siblings, 0 replies; 12+ messages in thread
From: Yafang Shao @ 2025-01-24 7:07 UTC (permalink / raw)
To: Petr Mladek
Cc: Josh Poimboeuf, jikos, Miroslav Benes, Joe Lawrence,
live-patching
On Fri, Jan 24, 2025 at 11:11 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Thu, Jan 23, 2025 at 11:55 PM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Wed 2025-01-22 16:56:31, Petr Mladek wrote:
> > > Anyway, I am working on a POC which would allow tracking
> > > to-be-released processes. It would finish the transition only
> > > when all the to-be-released processes already use the new code.
> > > It won't allow removing the disabled livepatch prematurely.
> >
> > Here is the POC. I am not sure if it is the right path, see
> > the warning and open questions in the commit message.
> >
> > I am going to wait for some feedback before investing more time
> > into this.
> >
> > The patch is against current Linus' master, aka, v6.13-rc7.
> >
> > From ac7287d99aaeca7a4536e8ade61b9bd0c8ec7fdc Mon Sep 17 00:00:00 2001
> > From: Petr Mladek <pmladek@suse.com>
> > Date: Thu, 23 Jan 2025 09:04:09 +0100
> > Subject: [PATCH] livepatching: Block transition until
> > delayed_put_task_struct()
> >
> > WARNING: This patch is just a POC. It will blow up the system because
> > RCU callbacks run in softirq context, which is handled
> > by default on exit from IRQ handlers. And it is not allowed
> > to take sleeping locks there, see the backtrace at the end
> > of the commit message.
> >
> > We would need to synchronize the counting of the exiting
> > processes with the livepatch transition another way.
> >
> > Hmm, I guess that spin_lock is legal in softirq context.
> > It might be the easiest approach.
> >
> > In the worst case, we would need to use a lockless
> > algorithm which might make things even more complicated.
> >
> > Here is the description of the problem and the solution:
> >
> > The livepatching core code uses a for_each_process_thread() loop for setting
> > and checking the state of processes on the system. It works well as long
> > as the livepatch touches only code which is used only by processes in
> > the task list.
> >
> > The problem starts when the livepatch replaces code which might be
> > used by a process which is no longer in the task list. It is
> > mostly about the processes which are being removed. They
> > disappear from the list here:
> >
> > + release_task()
> > + __exit_signal()
> > + __unhash_process()
> >
> > There are basically two groups of problems:
> >
> > 1. The livepatching core no longer updates TIF_PATCH_PENDING
> > and p->patch_state. In this case, the ftrace handler
> > klp_check_stack_func() might make the wrong decision and
> > use an incompatible variant of the code.
>
> I believe I was able to reproduce the issue while attempting to
> trigger the panic. The warning message is as follows:
>
> The warning occurred at the following location:
>
> klp_ftrace_handler
> if (unlikely(func->transition)) {
> WARN_ON_ONCE(patch_state == KLP_UNDEFINED);
> }
>
> [58063.291589] livepatch: enabling patch 'livepatch_61_release12'
> [58063.297580] livepatch: 'livepatch_61_release12': starting patching transition
> [58063.323340] ------------[ cut here ]------------
> [58063.323343] WARNING: CPU: 58 PID: 3851051 at
> kernel/livepatch/patch.c:98 klp_ftrace_handler+0x136/0x150
> [58063.323349] Modules linked in: livepatch_61_release12(OK)
> livepatch_61_release6(OK) ebtable_filter ebtables af_packet_diag
> netlink_diag xt_DSCP xt_owner iptable_mangle raw_diag unix_diag
> udp_diag iptable_raw xt_CT tcp_diag inet_diag cls_bpf sch_ingress
> bpf_preload binfmt_misc iptable_filter bpfilter xt_conntrack nf_nat
> nf_conntrack_netlink nfnetlink nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 overlay af_packet bonding tls intel_rapl_msr
> intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common
> isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp
> coretemp kvm_intel kvm irqbypass rapl vfat intel_cstate fat iTCO_wdt
> xfs intel_uncore pcspkr ses enclosure input_leds i2c_i801 i2c_smbus
> mei_me lpc_ich ioatdma mei mfd_core intel_pch_thermal dca acpi_ipmi
> wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq acpi_pad
> acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod sg mpt3sas
> raid_class scsi_transport_sas megaraid_sas crct10dif_pclmul
> crc32_pclmul crc32c_intel
> [58063.323402] polyval_clmulni polyval_generic ghash_clmulni_intel
> sha512_ssse3 aesni_intel nvme crypto_simd cryptd nvme_core t10_pi i40e
> ptp pps_core ahci libahci libata deflate zlib_deflate
> [58063.323413] Unloaded tainted modules:
> livepatch_61_release6(OK):3369 livepatch_61_release12(OK):3370 [last
> unloaded: livepatch_61_release12(OK)]
> [58063.323418] CPU: 58 PID: 3851051 Comm: docker Kdump: loaded
> Tainted: G S O K 6.1.52-3
> [58063.323421] Hardware name: Inspur SA5212M5/SA5212M5, BIOS 4.1.20 05/05/2021
> [58063.323423] RIP: 0010:klp_ftrace_handler+0x136/0x150
> [58063.323425] Code: eb b3 0f 1f 44 00 00 eb b5 8b 89 f4 23 00 00 83
> f9 ff 74 16 85 c9 75 89 48 8b 00 48 8d 50 90 48 39 c6 0f 85 79 ff ff
> ff eb 8b <0f> 0b e9 70 ff ff ff e8 ae 24 9c 00 66 66 2e 0f 1f 84 00 00
> 00 00
> [58063.323428] RSP: 0018:ffffa87b2367fbb8 EFLAGS: 00010046
> [58063.323429] RAX: ffff8b0ee59229a0 RBX: ffff8b26fa47b000 RCX: 00000000ffffffff
> [58063.323432] RDX: ffff8b0ee5922930 RSI: ffff8b2d41e53f10 RDI: ffffa87b2367fbd8
> [58063.323433] RBP: ffffa87b2367fbc8 R08: 0000000000000000 R09: fffffffffffffff7
> [58063.323434] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b2520eaf240
> [58063.323435] R13: ffff8b26fa47b000 R14: 000000000002f240 R15: 0000000000000000
> [58063.323436] FS: 0000000000000000(0000) GS:ffff8b2520e80000(0000)
> knlGS:0000000000000000
> [58063.323438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [58063.323440] CR2: 00007f19d7eaf000 CR3: 000000187a676004 CR4: 00000000007706e0
> [58063.323441] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [58063.323442] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [58063.323444] PKRU: 55555554
> [58063.323445] Call Trace:
> [58063.323445] <TASK>
> [58063.323449] ? show_regs.cold+0x1a/0x1f
> [58063.323454] ? klp_ftrace_handler+0x136/0x150
> [58063.323455] ? __warn+0x84/0xd0
> [58063.323457] ? klp_ftrace_handler+0x136/0x150
> [58063.323459] ? report_bug+0x105/0x180
> [58063.323463] ? handle_bug+0x40/0x70
> [58063.323467] ? exc_invalid_op+0x19/0x70
> [58063.323469] ? asm_exc_invalid_op+0x1b/0x20
> [58063.323474] ? klp_ftrace_handler+0x136/0x150
> [58063.323476] ? kmem_cache_free+0x155/0x470
> [58063.323479] 0xffffffffc0876099
> [58063.323495] ? update_rq_clock+0x5/0x250
> [58063.323498] update_rq_clock+0x5/0x250
> [58063.323500] __schedule+0xed/0x8f0
> [58063.323504] ? update_rq_clock+0x5/0x250
> [58063.323506] ? __schedule+0xed/0x8f0
> [58063.323508] ? trace_hardirqs_off+0x36/0xf0
> [58063.323512] do_task_dead+0x44/0x50
> [58063.323515] do_exit+0x7cd/0xb90 [livepatch_61_release6]
> [58063.323525] ? xfs_inode_mark_reclaimable+0x320/0x320 [livepatch_61_release6]
> [58063.323533] do_group_exit+0x35/0x90
> [58063.323536] get_signal+0x909/0x950
> [58063.323539] ? get_futex_key+0xa4/0x4f0
> [58063.323543] arch_do_signal_or_restart+0x34/0x2a0
> [58063.323547] exit_to_user_mode_prepare+0x149/0x1b0
> [58063.323551] syscall_exit_to_user_mode+0x1e/0x50
> [58063.323555] do_syscall_64+0x48/0x90
> [58063.323557] entry_SYSCALL_64_after_hwframe+0x64/0xce
> [58063.323560] RIP: 0033:0x5601df8122f3
> [58063.323563] Code: Unable to access opcode bytes at 0x5601df8122c9.
> [58063.323564] RSP: 002b:00007f9ce6ffccc0 EFLAGS: 00000286 ORIG_RAX:
> 00000000000000ca
> [58063.323566] RAX: fffffffffffffe00 RBX: 000000c42053e000 RCX: 00005601df8122f3
> [58063.323567] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000c42053e148
> [58063.323568] RBP: 00007f9ce6ffcd08 R08: 0000000000000000 R09: 0000000000000000
> [58063.323569] R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000000000
> [58063.323570] R13: 0000000000801000 R14: 0000000000000000 R15: 00007f9ce6ffd700
> [58063.323573] </TASK>
> [58063.323574] ---[ end trace 0000000000000000 ]---
>
> >
> > This might become a real problem only when the code modifies
> > the semantics.
> >
> > 2. The livepatching core no longer checks the stack and
> > could finish the livepatch transition even when these
> > to-be-removed processes have not been transitioned yet.
> >
> > This might even cause an Oops when the to-be-removed processes
> > are running code from a disabled livepatch which might
> > be removed in the meantime.
> >
> > This patch tries to address the 2nd problem which most likely caused
> > the following crash:
> > [...]
> >
> > This patch tries to avoid the crash by tracking the number of
> > to-be-released processes. They block the current transition
> > until delayed_put_task_struct() is called.
> >
> > It is just a POC. There are many open questions:
> >
> > 1. Does it help at all?
> >
> > It looks to me that release_task() is always called from another
> > task. For example, exit_notify() seems to call it for dead
> > children. It is not clear to me whether the released task
> > is still running do_exit() at this point.
>
> If the task is a thread (but not the thread group leader), it should
> call release_task(), correct? Below is the trace from our production
> server:
>
> $ bpftrace -e 'k:release_task { $tsk = (struct task_struct *)arg0;
>     if ($tsk->pid == tid) { @stack[kstack()] = count(); } }'
> @stack[
> release_task+1
> kthread_exit+41
> kthread+200
> ret_from_fork+31
> ]: 1
> @stack[
> release_task+1
> do_group_exit+53
> __x64_sys_exit_group+24
> do_syscall_64+56
> entry_SYSCALL_64_after_hwframe+100
> ]: 2
> @stack[
> release_task+1
> do_group_exit+53
> get_signal+2313
> arch_do_signal_or_restart+52
> exit_to_user_mode_prepare+329
> syscall_exit_to_user_mode+30
> do_syscall_64+72
> entry_SYSCALL_64_after_hwframe+100
> ]: 20
> @stack[
> release_task+1
> __x64_sys_exit+27
> do_syscall_64+56
> entry_SYSCALL_64_after_hwframe+100
> ]: 26
>
> >
> > Well, for example, wait_task_zombie() calls release_task()
> > in some special cases.
> >
> > 2. Is it enough to block the transition until delayed_put_task_struct()?
> >
> > I do not fully understand the maze of code. It might still be too
> > early.
> >
> > It seems that put_task_struct() can delay the release even more.
>
> After delayed_put_task_struct(), there is still some code that needs
> to be executed. It appears that this just reduces the likelihood of
> the issue occurring, but does not completely prevent it.
The last function to be executed is do_task_dead(), and this function
will not return to do_exit() again. What if we define this function as
"__noinline __nopatchable" and perform the final check within it?
The "__noinline" attribute ensures it won’t be inlined into do_exit(),
while "__nopatchable" guarantees that it can't be livepatched.
Something as follows?
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6569,6 +6569,7 @@ void __noreturn do_task_dead(void)
/* Tell freezer to ignore us: */
current->flags |= PF_NOFREEZE;
+ klp_put_releasing_task();
__schedule(SM_NONE);
BUG();
--
Regards
Yafang
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-01-24 7:07 UTC | newest]
Thread overview: 12+ messages
2025-01-21 9:38 [BUG] Kernel Crash during replacement of livepatch patching do_exit() Yafang Shao
2025-01-22 6:36 ` Yafang Shao
2025-01-22 11:45 ` Petr Mladek
2025-01-22 13:30 ` Yafang Shao
2025-01-22 14:01 ` Yafang Shao
2025-01-22 15:56 ` Petr Mladek
2025-01-23 2:19 ` Yafang Shao
2025-01-23 15:55 ` Petr Mladek
2025-01-24 1:08 ` Josh Poimboeuf
2025-01-24 3:11 ` Yafang Shao
2025-01-24 7:07 ` Yafang Shao
2025-01-22 10:24 ` Petr Mladek