[PATCH] ftrace: Fix softlockup in ftrace_module

linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] ftrace: Fix softlockup in ftrace_module_enable
@ 2025-09-11 13:33 Vladimir Riabchun
  2025-09-11 13:38 ` Masami Hiramatsu
  2025-09-11 14:41 ` Steven Rostedt
  0 siblings, 2 replies; 6+ messages in thread
From: Vladimir Riabchun @ 2025-09-11 13:33 UTC (permalink / raw)
  To: rostedt; +Cc: mhiramat, mark.rutland, linux-kernel, linux-trace-kernel

A soft lockup was observed when loading amdgpu module,
this is the same issue that was fixed in
commit d0b24b4e91fc ("ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY
kernels") and commit 42ea22e754ba ("ftrace: Add cond_resched() to
ftrace_graph_set_hash()").

Fix it the same way by adding cond_resched() in ftrace_module_enable.

Signed-off-by: Vladimir Riabchun <ferr.lambarginio@gmail.com>
---
 kernel/trace/ftrace.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index a69067367c29..23c4d37c7bcd 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7526,6 +7526,9 @@ void ftrace_module_enable(struct module *mod)
 
 	do_for_each_ftrace_rec(pg, rec) {
 		int cnt;
+
+		cond_resched();
+
 		/*
 		 * do_for_each_ftrace_rec() is a double loop.
 		 * module text shares the pg. If a record is
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] ftrace: Fix softlockup in ftrace_module_enable
  2025-09-11 13:33 [PATCH] ftrace: Fix softlockup in ftrace_module_enable Vladimir Riabchun
@ 2025-09-11 13:38 ` Masami Hiramatsu
  2025-09-11 14:41 ` Steven Rostedt
  1 sibling, 0 replies; 6+ messages in thread
From: Masami Hiramatsu @ 2025-09-11 13:38 UTC (permalink / raw)
  To: Vladimir Riabchun
  Cc: rostedt, mhiramat, mark.rutland, linux-kernel, linux-trace-kernel

On Thu, 11 Sep 2025 15:33:15 +0200
Vladimir Riabchun <ferr.lambarginio@gmail.com> wrote:

> A soft lockup was observed when loading amdgpu module,
> this is the same issue that was fixed in
> commit d0b24b4e91fc ("ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY
> kernels") and commit 42ea22e754ba ("ftrace: Add cond_resched() to
> ftrace_graph_set_hash()").
> 
> Fix it the same way by adding cond_resched() in ftrace_module_enable.
> 

Looks good to me.

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thanks,

> Signed-off-by: Vladimir Riabchun <ferr.lambarginio@gmail.com>
> ---
>  kernel/trace/ftrace.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index a69067367c29..23c4d37c7bcd 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -7526,6 +7526,9 @@ void ftrace_module_enable(struct module *mod)
>  
>  	do_for_each_ftrace_rec(pg, rec) {
>  		int cnt;
> +
> +		cond_resched();
> +
>  		/*
>  		 * do_for_each_ftrace_rec() is a double loop.
>  		 * module text shares the pg. If a record is
> -- 
> 2.43.0
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] ftrace: Fix softlockup in ftrace_module_enable
  2025-09-11 13:33 [PATCH] ftrace: Fix softlockup in ftrace_module_enable Vladimir Riabchun
  2025-09-11 13:38 ` Masami Hiramatsu
@ 2025-09-11 14:41 ` Steven Rostedt
  2025-09-11 18:16   ` Vladimir Riabchun
  1 sibling, 1 reply; 6+ messages in thread
From: Steven Rostedt @ 2025-09-11 14:41 UTC (permalink / raw)
  To: Vladimir Riabchun
  Cc: mhiramat, mark.rutland, linux-kernel, linux-trace-kernel

On Thu, 11 Sep 2025 15:33:15 +0200
Vladimir Riabchun <ferr.lambarginio@gmail.com> wrote:

> A soft lockup was observed when loading amdgpu module,

I'd like to see more about that soft lockup.

> this is the same issue that was fixed in
> commit d0b24b4e91fc ("ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY
> kernels") and commit 42ea22e754ba ("ftrace: Add cond_resched() to
> ftrace_graph_set_hash()").

The above cond_resched() is in the loop of all records that actually look
at all records! And that can be pretty big. On my server, it shows on boot:


[    1.934175] ftrace: allocating 45706 entries in 180 pages
[    1.934177] ftrace: allocated 180 pages with 4 groups

That means the loop will go through 45,706 entries. That's quite a lot and
a cond_resched() makes perfect sense.

> 
> Fix it the same way by adding cond_resched() in ftrace_module_enable.
> 
> Signed-off-by: Vladimir Riabchun <ferr.lambarginio@gmail.com>
> ---
>  kernel/trace/ftrace.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index a69067367c29..23c4d37c7bcd 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -7526,6 +7526,9 @@ void ftrace_module_enable(struct module *mod)
>  
>  	do_for_each_ftrace_rec(pg, rec) {
>  		int cnt;
> +
> +		cond_resched();
> +
>  		/*
>  		 * do_for_each_ftrace_rec() is a double loop.
>  		 * module text shares the pg. If a record is

This loop is different. Let me show a bit more context:

	do_for_each_ftrace_rec(pg, rec) {
		int cnt;
		/*
		 * do_for_each_ftrace_rec() is a double loop.
		 * module text shares the pg. If a record is
		 * not part of this module, then skip this pg,
		 * which the "break" will do.
		 */
		if (!within_module(rec->ip, mod))
			break;

See this "if (!within_module(rec->ip, mod))" break?

Look at the dmesg output again, and you'll see "groups" mentioned.

[    1.934177] ftrace: allocated 180 pages with 4 groups

That "4 groups" means there are 4 "page groups". That's the "pg" in the
do_for_each_ftrace_recr() function.

This means in my scenario, it loops 4 times. And then it will loop through
each module.

How big is the amdgpu driver? How many functions does it have?

 # grep amdgpu /sys/kernel/tracing/available_filter_functions | wc -l

And I'm guessing that this is only an issue when ftrace is enabled:

		if (ftrace_start_up && cnt) {
			int failed = __ftrace_replace_code(rec, 1);
			if (failed) {
				ftrace_bug(failed, rec);
				goto out_loop;
			}
		}

As that could slow things down.

If this is all the case, then the cond_resched() should be with the
ftrace_start_up code and not in the open like you have it.

		if (ftrace_start_up && cnt) {
			int failed = __ftrace_replace_code(rec, 1);
			if (failed) {
				ftrace_bug(failed, rec);
				goto out_loop;
			}
+			cond_resched();
		}


-- Steve

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] ftrace: Fix softlockup in ftrace_module_enable
  2025-09-11 14:41 ` Steven Rostedt
@ 2025-09-11 18:16   ` Vladimir Riabchun
  2025-09-11 18:37     ` Steven Rostedt
  0 siblings, 1 reply; 6+ messages in thread
From: Vladimir Riabchun @ 2025-09-11 18:16 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: ferr.lambarginio, linux-kernel, linux-trace-kernel, mark.rutland,
	mhiramat

On Thu, Sep 11, 2025 at 10:41:38AM -0400, Steven Rostedt wrote:
> On Thu, 11 Sep 2025 15:33:15 +0200
> Vladimir Riabchun <ferr.lambarginio@gmail.com> wrote:
> 
> > A soft lockup was observed when loading amdgpu module,
> 
> I'd like to see more about that soft lockup.
Sure, here is a call trace:
[  361.918605] [   T2076] [drm] amdgpu kernel modesetting enabled.
[  361.921884] [   T2076] amdgpu: Virtual CRAT table created for CPU
[  361.923327] [   T2076] amdgpu: Topology: Add CPU node
[  397.857556] [      C1] watchdog: BUG: soft lockup - CPU#1 stuck for 16s! [modprobe:2096]
[  397.857707] [      C1] Modules linked in: amdgpu(+) amdxcp drm_exec gpu_sched drm_buddy drm_suballoc_helper drm_ttm_helper ttm drm_display_helper cec rc_core i2c_algo_bit video wmi cfg80211 isofs binfmt_misc nls_iso8859_1 i2c_piix4 input_leds joydev serio_raw mac_hid sch_fq_codel dm_multipath efi_pstore nfnetlink dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 psmouse pata_acpi cirrus floppy crypto_simd cryptd [last unloaded: amdgpu]
[  397.858531] [      C1] irq event stamp: 191866
[  397.858587] [      C1] hardirqs last  enabled at (191865): [<ffffffff82d0e47a>] irqentry_exit+0x3a/0xb0
[  397.858727] [      C1] hardirqs last disabled at (191866): [<ffffffff82d0c41f>] sysvec_apic_timer_interrupt+0xf/0xd0
[  397.858863] [      C1] softirqs last  enabled at (191816): [<ffffffff82d37d6c>] __do_softirq+0x44c/0x5e1
[  397.858986] [      C1] softirqs last disabled at (191811): [<ffffffff8117b95e>] __irq_exit_rcu+0x13e/0x160
[  397.859111] [      C1] CPU: 1 PID: 2096 Comm: modprobe Kdump: loaded Not tainted 6.8.0-31-generic #31
[  397.859229] [      C1] Hardware name: Red Hat KVM, BIOS 1.16.0-3.module_el8.7.0+3346+68867adb 04/01/2014
[  397.859349] [      C1] RIP: 0010:__asan_load4+0x35/0xd0
[  397.859419] [      C1] Code: fc 73 71 eb 0f cc cc cc 48 b8 00 00 00 00 00 00 00 ff eb 0a 48 b8 00 00 00 00 00 80 ff ff 48 39 c7 72 51 48 8d 47 03 48 89 c2 <83> e2 07 48 83 fa 02 76 29 48 b9 00 00 00 00 00 fc ff df 48 c1 e8
[  397.859681] [      C1] RSP: 0018:ffffc9000008f4c8 EFLAGS: 00000206
[  397.859762] [      C1] RAX: ffffffffa2bb5ddb RBX: ffffffffa2bb5de0 RCX: 0000000000000000
[  397.859865] [      C1] RDX: ffffffffa2bb5ddb RSI: ffffffff8130fed5 RDI: ffffffffa2bb5dd8
[  397.859967] [      C1] RBP: ffffc9000008f4c8 R08: 0000000000000000 R09: ffffc9000008f628
[  397.860070] [      C1] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa2c3bf50
[  397.860171] [      C1] R13: 0000000000006b99 R14: ffffffffa22382d8 R15: ffffffffa2cf49a8
[  397.860276] [      C1] FS:  00007fd742dea080(0000) GS:ffff888094080000(0000) knlGS:0000000000000000
[  397.860391] [      C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  397.860475] [      C1] CR2: 00005d011eb97dc0 CR3: 00000000127b8000 CR4: 00000000000006f0
[  397.860597] [      C1] Call Trace:
[  397.860639] [      C1]  <IRQ>
[  397.860678] [      C1]  ? show_regs+0x6d/0x80
[  397.860740] [      C1]  ? watchdog_timer_fn+0x29d/0x360
[  397.860811] [      C1]  ? __pfx_watchdog_timer_fn+0x10/0x10
[  397.860887] [      C1]  ? __hrtimer_run_queues+0x3b3/0x650
[  397.860975] [      C1]  ? __pfx___hrtimer_run_queues+0x10/0x10
[  397.861056] [      C1]  ? kvm_clock_get_cycles+0x18/0x40
[  397.861129] [      C1]  ? ktime_get_update_offsets_now+0xb5/0x180
[  397.861218] [      C1]  ? hrtimer_interrupt+0x1a1/0x360
[  397.861306] [      C1]  ? __sysvec_apic_timer_interrupt+0x91/0x250
[  397.861393] [      C1]  ? sysvec_apic_timer_interrupt+0x8a/0xd0
[  397.861471] [      C1]  </IRQ>
[  397.861521] [      C1]  <TASK>
[  397.861562] [      C1]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  397.861663] [      C1]  ? find_kallsyms_symbol+0x1c5/0x350
[  397.861739] [      C1]  ? __asan_load4+0x35/0xd0
[  397.861806] [      C1]  find_kallsyms_symbol+0x1c5/0x350
[  397.861879] [      C1]  ? mod_hdcp_hdcp2_dp_transition+0x1140/0x1140 [amdgpu]
[  397.864007] [      C1]  ? __pfx_CalculatePrefetchSchedule.isra.0+0x10/0x10 [amdgpu]
[  397.868240] [      C1]  ? __pfx_dc_edid_parser_recv_cea_ack+0x10/0x10 [amdgpu]
[  397.872439] [      C1]  ? __pfx_dc_edid_parser_recv_cea_ack+0x10/0x10 [amdgpu]
[  397.876605] [      C1]  module_address_lookup+0x7f/0xd0
[  397.878686] [      C1]  ? __pfx_dc_edid_parser_recv_cea_ack+0x10/0x10 [amdgpu]
[  397.882809] [      C1]  kallsyms_lookup_buildid+0xf8/0x190
[  397.884811] [      C1]  kallsyms_lookup+0x14/0x30
[  397.886734] [      C1]  test_for_valid_rec+0xcf/0x160
[  397.888597] [      C1]  ? __pfx_test_for_valid_rec+0x10/0x10
[  397.890523] [      C1]  ? debug_lockdep_rcu_enabled+0x23/0x30
[  397.892330] [      C1]  ? ops_references_ip+0xfe/0x1b0
[  397.894117] [      C1]  ? __pfx_dc_edid_parser_recv_cea_ack+0x10/0x10 [amdgpu]
[  397.897959] [      C1]  ftrace_module_enable+0x2f5/0x540
[  397.899745] [      C1]  ? 0xffffffffa0c1a000
[  397.901579] [      C1]  load_module+0xdd6/0x1220
[  397.903325] [      C1]  ? __pfx_load_module+0x10/0x10
[  397.903325] [      C1]  ? __pfx_load_module+0x10/0x10
[  397.905037] [      C1]  ? __kasan_slab_free+0x121/0x1c0
[  397.906719] [      C1]  ? vfree.part.0+0x210/0x680
[  397.908326] [      C1]  ? kfree+0x11a/0x360
[  397.909866] [      C1]  ? vfree.part.0+0x210/0x680
[  397.911365] [      C1]  ? vfree.part.0+0x65/0x680
[  397.912843] [      C1]  init_module_from_file+0xf6/0x180
[  397.914303] [      C1]  ? init_module_from_file+0xf6/0x180
[  397.915772] [      C1]  ? __pfx_init_module_from_file+0x10/0x10
[  397.917231] [      C1]  ? find_held_lock+0x8c/0xb0
[  397.918723] [      C1]  ? do_raw_spin_unlock+0xa0/0x110
[  397.920154] [      C1]  idempotent_init_module+0x206/0x470
[  397.921569] [      C1]  ? __pfx_idempotent_init_module+0x10/0x10
[  397.922961] [      C1]  ? ksys_read+0xf7/0x190
[  397.924336] [      C1]  ? __kasan_check_read+0x11/0x20
[  397.925735] [      C1]  ? __fget_light+0xa0/0xf0
[  397.927118] [      C1]  __x64_sys_finit_module+0x82/0xe0
[  397.928523] [      C1]  x64_sys_call+0x1d6e/0x25c0
[  397.929901] [      C1]  do_syscall_64+0x92/0x190
[  397.931280] [      C1]  ? syscall_exit_to_user_mode+0xcf/0x2c0
[  397.932692] [      C1]  ? do_syscall_64+0x9f/0x190
[  397.934073] [      C1]  ? do_syscall_64+0x9f/0x190
[  397.935431] [      C1]  ? do_syscall_64+0x9f/0x190
[  397.936780] [      C1]  entry_SYSCALL_64_after_hwframe+0x73/0x7b
[  397.938130] [      C1] RIP: 0033:0x7fd74252725d
[  397.939473] [      C1] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b bb 0d 00 f7 d8 64 89 01 48
[  397.942486] [      C1] RSP: 002b:00007ffc1de6bf18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  397.944045] [      C1] RAX: ffffffffffffffda RBX: 00005d53cf14fc50 RCX: 00007fd74252725d
[  397.945617] [      C1] RDX: 0000000000000004 RSI: 00005d53cec28e52 RDI: 0000000000000003
[  397.947206] [      C1] RBP: 00007ffc1de6bfd0 R08: 0000000000000040 R09: 00007ffc1de6bf60
[  397.948794] [      C1] R10: 00007fd742603b20 R11: 0000000000000246 R12: 00005d53cec28e52
[  397.950372] [      C1] R13: 0000000000040000 R14: 00005d53cf14fd80 R15: 0000000000000000
[  397.951975] [      C1]  </TASK>
[  397.953493] [      C1] Kernel panic - not syncing: softlockup: hung tasks
[  397.955095] [      C1] CPU: 1 PID: 2096 Comm: modprobe Kdump: loaded Tainted: G             L     6.8.0-31-generic #31
[  397.956772] [      C1] Hardware name: Red Hat KVM, BIOS 1.16.0-3.module_el8.7.0+3346+68867adb 04/01/2014
> 
> > this is the same issue that was fixed in
> > commit d0b24b4e91fc ("ftrace: Prevent RCU stall on PREEMPT_VOLUNTARY
> > kernels") and commit 42ea22e754ba ("ftrace: Add cond_resched() to
> > ftrace_graph_set_hash()").
> 
> The above cond_resched() is in the loop of all records that actually look
> at all records! And that can be pretty big. On my server, it shows on boot:
> 
> 
> [    1.934175] ftrace: allocating 45706 entries in 180 pages
> [    1.934177] ftrace: allocated 180 pages with 4 groups
> 
> That means the loop will go through 45,706 entries. That's quite a lot and
> a cond_resched() makes perfect sense.
> 
> > 
> > Fix it the same way by adding cond_resched() in ftrace_module_enable.
> > 
> > Signed-off-by: Vladimir Riabchun <ferr.lambarginio@gmail.com>
> > ---
> >  kernel/trace/ftrace.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index a69067367c29..23c4d37c7bcd 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -7526,6 +7526,9 @@ void ftrace_module_enable(struct module *mod)
> >  
> >  	do_for_each_ftrace_rec(pg, rec) {
> >  		int cnt;
> > +
> > +		cond_resched();
> > +
> >  		/*
> >  		 * do_for_each_ftrace_rec() is a double loop.
> >  		 * module text shares the pg. If a record is
> 
> This loop is different. Let me show a bit more context:
> 
> 	do_for_each_ftrace_rec(pg, rec) {
> 		int cnt;
> 		/*
> 		 * do_for_each_ftrace_rec() is a double loop.
> 		 * module text shares the pg. If a record is
> 		 * not part of this module, then skip this pg,
> 		 * which the "break" will do.
> 		 */
> 		if (!within_module(rec->ip, mod))
> 			break;
> 
> See this "if (!within_module(rec->ip, mod))" break?
> 
> Look at the dmesg output again, and you'll see "groups" mentioned.
> 
> [    1.934177] ftrace: allocated 180 pages with 4 groups
> 
> That "4 groups" means there are 4 "page groups". That's the "pg" in the
> do_for_each_ftrace_recr() function.
> 
> This means in my scenario, it loops 4 times. And then it will loop through
> each module.
> 
> How big is the amdgpu driver? How many functions does it have?
> 
>  # grep amdgpu /sys/kernel/tracing/available_filter_functions | wc -l
14334 functions, hefty one.
> 
> And I'm guessing that this is only an issue when ftrace is enabled:
> 
> 		if (ftrace_start_up && cnt) {
> 			int failed = __ftrace_replace_code(rec, 1);
> 			if (failed) {
> 				ftrace_bug(failed, rec);
> 				goto out_loop;
> 			}
> 		}
> 
> As that could slow things down.
Call trace shows that kernel got stuck in test_for_valid_rec. It calls
kallsyms_lookup, which then calls module_address_lookup,
bpf_address_lookup and ftrace_mod_address_lookup. All of these guys hold
RCU read lock or disable preemption and may consume some time (mostly
because of debug kernel checks, but still there shouldn't be a panic).

Probably it may worth moving cond_resched under within_module check, but
before test_for_valid_rec call.
> 
> If this is all the case, then the cond_resched() should be with the
> ftrace_start_up code and not in the open like you have it.
> 
> 		if (ftrace_start_up && cnt) {
> 			int failed = __ftrace_replace_code(rec, 1);
> 			if (failed) {
> 				ftrace_bug(failed, rec);
> 				goto out_loop;
> 			}
> +			cond_resched();
> 		}
> 
> 
> -- Steve
Thanks for the detailed explanation, I appreciate it!

Best regards,
Riabchun Vladimir

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] ftrace: Fix softlockup in ftrace_module_enable
  2025-09-11 18:16   ` Vladimir Riabchun
@ 2025-09-11 18:37     ` Steven Rostedt
  2025-09-11 18:42       ` Steven Rostedt
  0 siblings, 1 reply; 6+ messages in thread
From: Steven Rostedt @ 2025-09-11 18:37 UTC (permalink / raw)
  To: Vladimir Riabchun
  Cc: 20250911104138.2830f60b, linux-kernel, linux-trace-kernel,
	mark.rutland, mhiramat

On Thu, 11 Sep 2025 20:16:41 +0200
Vladimir Riabchun <ferr.lambarginio@gmail.com> wrote:

> On Thu, Sep 11, 2025 at 10:41:38AM -0400, Steven Rostedt wrote:
> > On Thu, 11 Sep 2025 15:33:15 +0200
> > Vladimir Riabchun <ferr.lambarginio@gmail.com> wrote:
> >   
> > > A soft lockup was observed when loading amdgpu module,  
> > 
> > I'd like to see more about that soft lockup.  
> Sure, here is a call trace:

Thanks, because this helps out a lot in understanding this!


> [  397.861521] [      C1]  <TASK>
> [  397.861562] [      C1]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
> [  397.861663] [      C1]  ? find_kallsyms_symbol+0x1c5/0x350
> [  397.861739] [      C1]  ? __asan_load4+0x35/0xd0
> [  397.861806] [      C1]  find_kallsyms_symbol+0x1c5/0x350
> [  397.861879] [      C1]  ? mod_hdcp_hdcp2_dp_transition+0x1140/0x1140 [amdgpu]
> [  397.864007] [      C1]  ? __pfx_CalculatePrefetchSchedule.isra.0+0x10/0x10 [amdgpu]
> [  397.868240] [      C1]  ? __pfx_dc_edid_parser_recv_cea_ack+0x10/0x10 [amdgpu]
> [  397.872439] [      C1]  ? __pfx_dc_edid_parser_recv_cea_ack+0x10/0x10 [amdgpu]
> [  397.876605] [      C1]  module_address_lookup+0x7f/0xd0
> [  397.878686] [      C1]  ? __pfx_dc_edid_parser_recv_cea_ack+0x10/0x10 [amdgpu]
> [  397.882809] [      C1]  kallsyms_lookup_buildid+0xf8/0x190
> [  397.884811] [      C1]  kallsyms_lookup+0x14/0x30
> [  397.886734] [      C1]  test_for_valid_rec+0xcf/0x160

As you pointed out below, the test_for_valid_rec() looks to be the culprit.
Most likely because of the kallsyms lookup it's doing. Which requires a
search. It's binary, but still quite expensive when executed over 10
thousand times.


> > How big is the amdgpu driver? How many functions does it have?
> > 
> >  # grep amdgpu /sys/kernel/tracing/available_filter_functions | wc -l  
> 14334 functions, hefty one.

And why this is a problem.

> > 
> > And I'm guessing that this is only an issue when ftrace is enabled:
> > 
> > 		if (ftrace_start_up && cnt) {
> > 			int failed = __ftrace_replace_code(rec, 1);
> > 			if (failed) {
> > 				ftrace_bug(failed, rec);
> > 				goto out_loop;
> > 			}
> > 		}
> > 
> > As that could slow things down.  
> Call trace shows that kernel got stuck in test_for_valid_rec. It calls
> kallsyms_lookup, which then calls module_address_lookup,
> bpf_address_lookup and ftrace_mod_address_lookup. All of these guys hold
> RCU read lock or disable preemption and may consume some time (mostly
> because of debug kernel checks, but still there shouldn't be a panic).

Agreed. The kallsyms lookup is the problem
> 
> Probably it may worth moving cond_resched under within_module check, but
> before test_for_valid_rec call.

Hmm, do modules even have weak functions? If they don't then that test is
useless.

-- Steve


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] ftrace: Fix softlockup in ftrace_module_enable
  2025-09-11 18:37     ` Steven Rostedt
@ 2025-09-11 18:42       ` Steven Rostedt
  0 siblings, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2025-09-11 18:42 UTC (permalink / raw)
  To: Vladimir Riabchun
  Cc: 20250911104138.2830f60b, linux-kernel, linux-trace-kernel,
	mark.rutland, mhiramat

On Thu, 11 Sep 2025 14:37:58 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > Probably it may worth moving cond_resched under within_module check, but
> > before test_for_valid_rec call.  
> 
> Hmm, do modules even have weak functions? If they don't then that test is
> useless.

Doing a quick search, I see that they do :-/

drivers/tty/tty_ioctl.c:__weak int kernel_termios_to_user_termios(struct termios2 __user *u,

Looks like sparc can overwrite it. Thus, I'm guessing a module could add a
function and an arch may make it different. That means we still need this
for now.

Yeah, just send a v2 with the cond_resched() after the check for module.

-- Steve

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-09-11 18:46 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-11 13:33 [PATCH] ftrace: Fix softlockup in ftrace_module_enable Vladimir Riabchun
2025-09-11 13:38 ` Masami Hiramatsu
2025-09-11 14:41 ` Steven Rostedt
2025-09-11 18:16   ` Vladimir Riabchun
2025-09-11 18:37     ` Steven Rostedt
2025-09-11 18:42       ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).