* [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section! @ 2025-02-25 8:05 Ian Kumlien 2025-02-25 10:13 ` Ian Kumlien 0 siblings, 1 reply; 16+ messages in thread From: Ian Kumlien @ 2025-02-25 8:05 UTC (permalink / raw) To: Linux Kernel Network Developers Just had this happen just before be2net initialization... FYI and all that ;) [ 5.220133] ------------[ cut here ]------------ [ 5.220137] Voluntary context switch within RCU read-side critical section! [ 5.220143] WARNING: CPU: 4 PID: 1045 at kernel/rcu/tree_plugin.h:331 rcu_note_context_switch+0x65a/0x6d0 [ 5.220150] Modules linked in: cfg80211 rfkill qrtr nft_masq nft_nat nft_numgen nft_chain_nat nf_nat nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables sunrpc vfat fat ocrdma ib_uverbs ib_core xfs intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt dell_pc intel_pmc_bxt mei_wdt at24 iTCO_vendor_support snd_hda_codec_hdmi snd_hda_scodec_component kvm platform_profile mei_hdcp mei_pxp snd_hda_intel snd_intel_dspcfg dell_wmi dell_smm_hwmon snd_intel_sdw_acpi dell_smbios snd_hda_codec rapl snd_hda_core intel_wmi_thunderbolt dcdbas intel_cstate intel_uncore sparse_keymap wmi_bmof dell_wmi_descriptor i2c_i801 snd_hwdep i2c_smbus snd_pcm snd_timer mei_me be2net e1000e mei snd lpc_ich soundcore sch_fq fuse loop dm_multipath nfnetlink zram lz4hc_compress lz4_compress i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel [ 5.220232] i2c_algo_bit drm_buddy sha512_ssse3 ttm sha256_ssse3 sha1_ssse3 drm_display_helper video cec wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua pkcs8_key_parser [ 5.220250] Hardware name: Dell Inc. 
Precision T1700/04JGCK, BIOS A28 05/30/2019 [ 5.220253] RIP: rcu_note_context_switch+0x65a/0x6d0 [ 5.220256] Code: a8 00 00 00 00 0f 85 64 fd ff ff 49 89 8d a8 00 00 00 e9 58 fd ff ff 48 c7 c7 d0 ab e5 87 c6 05 b6 26 a2 02 01 e8 16 1c f2 ff <0f> 0b e9 f1 f9 ff ff 49 83 bd a0 00 00 00 00 75 c2 e9 18 fd ff ff All code ======== 0: a8 00 test $0x0,%al 2: 00 00 add %al,(%rax) 4: 00 0f add %cl,(%rdi) 6: 85 64 fd ff test %esp,-0x1(%rbp,%rdi,8) a: ff 49 89 decl -0x77(%rcx) d: 8d a8 00 00 00 e9 lea -0x17000000(%rax),%ebp 13: 58 pop %rax 14: fd std 15: ff (bad) 16: ff 48 c7 decl -0x39(%rax) 19: c7 (bad) 1a: d0 ab e5 87 c6 05 shrb $1,0x5c687e5(%rbx) 20: b6 26 mov $0x26,%dh 22: a2 02 01 e8 16 1c f2 movabs %al,0xffff21c16e80102 29:* ff 0f <-- trapping instruction 2b: 0b e9 or %ecx,%ebp 2d: f1 int1 2e: f9 stc 2f: ff (bad) 30: ff 49 83 decl -0x7d(%rcx) 33: bd a0 00 00 00 mov $0xa0,%ebp 38: 00 75 c2 add %dh,-0x3e(%rbp) 3b: e9 18 fd ff ff jmp 0xfffffffffffffd58 Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: e9 f1 f9 ff ff jmp 0xfffffffffffff9f8 7: 49 83 bd a0 00 00 00 cmpq $0x0,0xa0(%r13) e: 00 f: 75 c2 jne 0xffffffffffffffd3 11: e9 18 fd ff ff jmp 0xfffffffffffffd2e [ 5.220259] RSP: 0018:ffffb28d80ae73c0 EFLAGS: 00010086 [ 5.220262] RAX: 0000000000000000 RBX: ffff8a2c1f3ad380 RCX: 0000000000000027 [ 5.220264] RDX: ffff8a2f0ea21908 RSI: 0000000000000001 RDI: ffff8a2f0ea21900 [ 5.220266] RBP: ffff8a2f0ea38040 R08: 0000000000000000 R09: 0000000000000000 [ 5.220268] R10: 6374697773207478 R11: 0000000000000000 R12: 0000000000000000 [ 5.220269] R13: ffff8a2c1f3ad380 R14: 0000000000000000 R15: ffff8a2c1a200af0 [ 5.220271] FS: 00007ff780c64bc0(0000) GS:ffff8a2f0ea00000(0000) knlGS:0000000000000000 [ 5.220274] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5.220276] CR2: 000055d7b4ddb208 CR3: 0000000110256001 CR4: 00000000001726f0 [ 5.220278] Call Trace: [ 5.220280] <TASK> [ 5.220282] ? 
rcu_note_context_switch+0x65a/0x6d0 [ 5.220285] ? __warn.cold+0x93/0xfa [ 5.220288] ? rcu_note_context_switch+0x65a/0x6d0 [ 5.220294] ? report_bug+0xff/0x140 [ 5.220297] ? handle_bug+0x58/0x90 [ 5.220300] ? exc_invalid_op+0x17/0x70 [ 5.220303] ? asm_exc_invalid_op+0x1a/0x20 [ 5.220308] ? rcu_note_context_switch+0x65a/0x6d0 [ 5.220312] __schedule+0xcc/0x14b0 [ 5.220316] ? get_nohz_timer_target+0x2d/0x180 [ 5.220322] ? timerqueue_add+0x71/0xc0 [ 5.220326] ? enqueue_hrtimer+0x42/0xa0 [ 5.220331] schedule+0x27/0xf0 [ 5.220334] schedule_hrtimeout_range_clock+0x100/0x1b0 [ 5.220338] ? __pfx_hrtimer_wakeup+0x10/0x10 [ 5.220342] usleep_range_state+0x65/0x90 WARNING! Cannot find .ko for module be2net, please pass a valid module path [ 5.220347] ? be_mcc_notify_wait+0x6c/0x150 be2net WARNING! Cannot find .ko for module be2net, please pass a valid module path [ 5.220360] be_mcc_notify_wait+0xbe/0x150 be2net WARNING! Cannot find .ko for module be2net, please pass a valid module path [ 5.220371] be_cmd_get_hsw_config+0x16c/0x190 be2net WARNING! Cannot find .ko for module be2net, please pass a valid module path [ 5.220382] be_ndo_bridge_getlink+0xe0/0x100 be2net [ 5.220393] rtnl_bridge_getlink+0x12b/0x1b0 [ 5.220398] ? __pfx_rtnl_bridge_getlink+0x10/0x10 [ 5.220401] rtnl_dumpit+0x80/0xa0 [ 5.220404] netlink_dump+0x13b/0x360 [ 5.220409] __netlink_dump_start+0x1eb/0x310 [ 5.220412] ? __pfx_rtnl_bridge_getlink+0x10/0x10 [ 5.220415] rtnetlink_rcv_msg+0x2da/0x460 [ 5.220418] ? __pfx_rtnl_dumpit+0x10/0x10 [ 5.220421] ? __pfx_rtnl_bridge_getlink+0x10/0x10 [ 5.220424] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 5.220427] netlink_rcv_skb+0x53/0x100 [ 5.220432] netlink_unicast+0x245/0x390 [ 5.220435] netlink_sendmsg+0x21b/0x470 [ 5.220438] __sys_sendto+0x1df/0x1f0 [ 5.220444] __x64_sys_sendto+0x24/0x30 [ 5.220446] do_syscall_64+0x82/0x160 [ 5.220449] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 5.220452] ? netlink_rcv_skb+0x82/0x100 [ 5.220455] ? netlink_unicast+0x24d/0x390 [ 5.220457] ? 
kmem_cache_free+0x3ee/0x440 [ 5.220461] ? skb_release_data+0x193/0x200 [ 5.220465] ? netlink_unicast+0x24d/0x390 [ 5.220468] ? netlink_sendmsg+0x228/0x470 [ 5.220471] ? __sys_sendto+0x1df/0x1f0 [ 5.220475] ? syscall_exit_to_user_mode+0x10/0x210 [ 5.220478] ? do_syscall_64+0x8e/0x160 [ 5.220480] ? iterate_dir+0x182/0x200 [ 5.220483] ? __x64_sys_getdents64+0xfa/0x130 [ 5.220486] ? __pfx_filldir64+0x10/0x10 [ 5.220489] ? syscall_exit_to_user_mode+0x10/0x210 [ 5.220491] ? do_syscall_64+0x8e/0x160 [ 5.220493] ? syscall_exit_to_user_mode+0x10/0x210 [ 5.220496] ? do_syscall_64+0x8e/0x160 [ 5.220498] ? exc_page_fault+0x7e/0x180 [ 5.220500] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 5.220504] RIP: 0033:0x7ff7807045b7 [ 5.220516] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 15 9b 0f 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d d0 All code ======== 0: c7 c0 ff ff ff ff mov $0xffffffff,%eax 6: eb be jmp 0xffffffffffffffc6 8: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) f: 00 00 00 12: 90 nop 13: f3 0f 1e fa endbr64 17: 80 3d 15 9b 0f 00 00 cmpb $0x0,0xf9b15(%rip) # 0xf9b33 1e: 41 89 ca mov %ecx,%r10d 21: 74 10 je 0x33 23: b8 2c 00 00 00 mov $0x2c,%eax 28: 0f 05 syscall 2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction 30: 77 69 ja 0x9b 32: c3 ret 33: 55 push %rbp 34: 48 89 e5 mov %rsp,%rbp 37: 53 push %rbx 38: 48 83 ec 38 sub $0x38,%rsp 3c: 44 89 4d d0 mov %r9d,-0x30(%rbp) Code starting with the faulting instruction =========================================== 0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax 6: 77 69 ja 0x71 8: c3 ret 9: 55 push %rbp a: 48 89 e5 mov %rsp,%rbp d: 53 push %rbx e: 48 83 ec 38 sub $0x38,%rsp 12: 44 89 4d d0 mov %r9d,-0x30(%rbp) [ 5.220518] RSP: 002b:00007ffc921b4ff8 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 5.220522] RAX: ffffffffffffffda RBX: 000055d7b4dacc80 RCX: 00007ff7807045b7 [ 5.220524] RDX: 0000000000000020 RSI: 
000055d7b4db7ff0 RDI: 0000000000000003 [ 5.220525] RBP: 00007ffc921b5090 R08: 00007ffc921b5000 R09: 0000000000000080 [ 5.220527] R10: 0000000000000000 R11: 0000000000000202 R12: 000055d7b4ddb350 [ 5.220529] R13: 00007ffc921b50d4 R14: 000055d7b4ddb350 R15: 000055d77d5f8a90 [ 5.220532] </TASK> [ 5.220533] ---[ end trace 0000000000000000 ]--- ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section! 2025-02-25 8:05 [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section! Ian Kumlien @ 2025-02-25 10:13 ` Ian Kumlien 2025-02-26 1:05 ` Jakub Kicinski 0 siblings, 1 reply; 16+ messages in thread From: Ian Kumlien @ 2025-02-25 10:13 UTC (permalink / raw) To: Linux Kernel Network Developers Same thing happens in 6.13.4, FYI [ 5.253286] ------------[ cut here ]------------ [ 5.253291] Voluntary context switch within RCU read-side critical section! [ 5.253296] WARNING: CPU: 7 PID: 1052 at kernel/rcu/tree_plugin.h:331 rcu_note_context_switch+0x66f/0x6d0 [ 5.253304] Modules linked in: cfg80211 rfkill qrtr nft_masq nft_nat sunrpc nft_numgen nft_chain_nat nf_nat nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables vfat fat ocrdma ib_uverbs ib_core xfs snd_hda_codec_realtek snd_hda_codec_generic intel_rapl_msr snd_hda_scodec_component snd_hda_codec_hdmi intel_rapl_common x86_pkg_temp_thermal snd_hda_intel intel_powerclamp coretemp snd_intel_dspcfg mei_pxp snd_intel_sdw_acpi dell_pc iTCO_wdt platform_profile snd_hda_codec mei_wdt at24 kvm_intel mei_hdcp intel_pmc_bxt iTCO_vendor_support dell_smm_hwmon snd_hda_core dell_wmi kvm snd_hwdep dell_smbios snd_pcm rapl dcdbas sparse_keymap intel_cstate dell_wmi_descriptor intel_uncore intel_wmi_thunderbolt wmi_bmof i2c_i801 i2c_smbus snd_timer mei_me snd e1000e lpc_ich mei be2net soundcore sch_fq fuse loop dm_multipath nfnetlink zram lz4hc_compress lz4_compress i915 crct10dif_pclmul i2c_algo_bit crc32_pclmul drm_buddy crc32c_intel polyval_clmulni ttm polyval_generic [ 5.253388] ghash_clmulni_intel drm_display_helper sha512_ssse3 sha256_ssse3 sha1_ssse3 cec video wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua pkcs8_key_parser [ 5.253405] Hardware name: Dell Inc. 
Precision T1700/04JGCK, BIOS A28 05/30/2019 [ 5.253407] RIP: rcu_note_context_switch+0x66f/0x6d0 [ 5.253411] Code: a8 00 00 00 00 0f 85 3c fd ff ff 49 89 8d a8 00 00 00 e9 30 fd ff ff 48 c7 c7 30 6f de b7 c6 05 7b 51 96 02 01 e8 61 0e f2 ff <0f> 0b e9 dc f9 ff ff c6 45 11 00 48 8b 75 20 ba 01 00 00 00 48 8b All code ======== 0: a8 00 test $0x0,%al 2: 00 00 add %al,(%rax) 4: 00 0f add %cl,(%rdi) 6: 85 3c fd ff ff 49 89 test %edi,-0x76b60001(,%rdi,8) d: 8d a8 00 00 00 e9 lea -0x17000000(%rax),%ebp 13: 30 fd xor %bh,%ch 15: ff (bad) 16: ff 48 c7 decl -0x39(%rax) 19: c7 (bad) 1a: 30 6f de xor %ch,-0x22(%rdi) 1d: b7 c6 mov $0xc6,%bh 1f: 05 7b 51 96 02 add $0x296517b,%eax 24: 01 e8 add %ebp,%eax 26: 61 (bad) 27: 0e (bad) 28:* f2 ff 0f repnz decl (%rdi) <-- trapping instruction 2b: 0b e9 or %ecx,%ebp 2d: dc f9 fdivr %st,%st(1) 2f: ff (bad) 30: ff c6 inc %esi 32: 45 11 00 adc %r8d,(%r8) 35: 48 8b 75 20 mov 0x20(%rbp),%rsi 39: ba 01 00 00 00 mov $0x1,%edx 3e: 48 rex.W 3f: 8b .byte 0x8b Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: e9 dc f9 ff ff jmp 0xfffffffffffff9e3 7: c6 45 11 00 movb $0x0,0x11(%rbp) b: 48 8b 75 20 mov 0x20(%rbp),%rsi f: ba 01 00 00 00 mov $0x1,%edx 14: 48 rex.W 15: 8b .byte 0x8b [ 5.253413] RSP: 0018:ffffadb040f4b688 EFLAGS: 00010082 [ 5.253416] RAX: 0000000000000000 RBX: ffff957a4d705380 RCX: 0000000000000027 [ 5.253418] RDX: ffff957d4eba1908 RSI: 0000000000000001 RDI: ffff957d4eba1900 [ 5.253420] RBP: ffff957d4ebb7d40 R08: 0000000000000000 R09: 0000000000000000 [ 5.253422] R10: 206c616369746972 R11: 0000000000000000 R12: 0000000000000000 [ 5.253423] R13: ffff957a4d705380 R14: 000000000007a100 R15: ffff957a47400b30 [ 5.253425] FS: 00007f6cc2c0dbc0(0000) GS:ffff957d4eb80000(0000) knlGS:0000000000000000 [ 5.253428] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5.253430] CR2: 0000556e7a98b188 CR3: 00000001210ce006 CR4: 00000000001726f0 [ 5.253432] Call Trace: [ 5.253434] <TASK> [ 5.253435] ? 
rcu_note_context_switch+0x66f/0x6d0 [ 5.253439] ? __warn.cold+0x93/0xfa [ 5.253443] ? rcu_note_context_switch+0x66f/0x6d0 [ 5.253447] ? report_bug+0xff/0x140 [ 5.253451] ? console_unlock+0x9d/0x140 [ 5.253455] ? handle_bug+0x58/0x90 [ 5.253458] ? exc_invalid_op+0x17/0x70 [ 5.253461] ? asm_exc_invalid_op+0x1a/0x20 [ 5.253466] ? rcu_note_context_switch+0x66f/0x6d0 [ 5.253469] ? rcu_note_context_switch+0x66f/0x6d0 [ 5.253472] ? valid_bridge_getlink_req.constprop.0+0xac/0x1c0 [ 5.253478] __schedule+0xcc/0x14b0 [ 5.253482] ? get_nohz_timer_target+0x2d/0x180 [ 5.253486] ? timerqueue_add+0x71/0xc0 [ 5.253489] ? enqueue_hrtimer+0x42/0xa0 [ 5.253492] schedule+0x27/0xf0 [ 5.253495] usleep_range_state+0xea/0x120 [ 5.253499] ? __pfx_hrtimer_wakeup+0x10/0x10 WARNING! Cannot find .ko for module be2net, please pass a valid module path [ 5.253503] ? be_mcc_notify_wait+0x6c/0x150 be2net WARNING! Cannot find .ko for module be2net, please pass a valid module path [ 5.253516] be_mcc_notify_wait+0xbe/0x150 be2net WARNING! Cannot find .ko for module be2net, please pass a valid module path [ 5.253526] be_cmd_get_hsw_config+0x16c/0x190 be2net WARNING! Cannot find .ko for module be2net, please pass a valid module path [ 5.253537] be_ndo_bridge_getlink+0xe0/0x100 be2net [ 5.253547] rtnl_bridge_getlink+0x12b/0x1b0 [ 5.253551] ? __pfx_rtnl_bridge_getlink+0x10/0x10 [ 5.253555] rtnl_dumpit+0x80/0xa0 [ 5.253558] netlink_dump+0x19c/0x410 [ 5.253561] ? skb_release_data+0x193/0x200 [ 5.253566] __netlink_dump_start+0x1eb/0x310 [ 5.253569] ? __pfx_rtnl_bridge_getlink+0x10/0x10 [ 5.253573] rtnetlink_rcv_msg+0x2da/0x460 [ 5.253576] ? __pfx_rtnl_dumpit+0x10/0x10 [ 5.253579] ? __pfx_rtnl_bridge_getlink+0x10/0x10 [ 5.253582] ? __pfx_rtnetlink_rcv_msg+0x10/0x10 [ 5.253586] netlink_rcv_skb+0x53/0x100 [ 5.253590] netlink_unicast+0x245/0x390 [ 5.253593] netlink_sendmsg+0x21b/0x470 [ 5.253597] __sys_sendto+0x1ef/0x200 [ 5.253602] __x64_sys_sendto+0x24/0x30 [ 5.253605] do_syscall_64+0x82/0x160 [ 5.253609] ? 
syscall_exit_to_user_mode+0x10/0x210 [ 5.253613] ? do_syscall_64+0x8e/0x160 [ 5.253616] ? atime_needs_update+0xa0/0x120 [ 5.253621] ? touch_atime+0x1e/0x120 [ 5.253624] ? iterate_dir+0x182/0x200 [ 5.253627] ? __x64_sys_getdents64+0xa7/0x120 [ 5.253629] ? __pfx_filldir64+0x10/0x10 [ 5.253632] ? syscall_exit_to_user_mode+0x10/0x210 [ 5.253635] ? do_syscall_64+0x8e/0x160 [ 5.253638] ? do_syscall_64+0x8e/0x160 [ 5.253642] ? do_syscall_64+0x8e/0x160 [ 5.253645] ? do_syscall_64+0x8e/0x160 [ 5.253648] ? do_syscall_64+0x8e/0x160 [ 5.253651] ? exc_page_fault+0x7e/0x180 [ 5.253654] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 5.253658] RIP: 0033:0x7f6cc34d55b7 [ 5.253669] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 15 9b 0f 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d d0 All code ======== 0: c7 c0 ff ff ff ff mov $0xffffffff,%eax 6: eb be jmp 0xffffffffffffffc6 8: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1) f: 00 00 00 12: 90 nop 13: f3 0f 1e fa endbr64 17: 80 3d 15 9b 0f 00 00 cmpb $0x0,0xf9b15(%rip) # 0xf9b33 1e: 41 89 ca mov %ecx,%r10d 21: 74 10 je 0x33 23: b8 2c 00 00 00 mov $0x2c,%eax 28: 0f 05 syscall 2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction 30: 77 69 ja 0x9b 32: c3 ret 33: 55 push %rbp 34: 48 89 e5 mov %rsp,%rbp 37: 53 push %rbx 38: 48 83 ec 38 sub $0x38,%rsp 3c: 44 89 4d d0 mov %r9d,-0x30(%rbp) Code starting with the faulting instruction =========================================== 0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax 6: 77 69 ja 0x71 8: c3 ret 9: 55 push %rbp a: 48 89 e5 mov %rsp,%rbp d: 53 push %rbx e: 48 83 ec 38 sub $0x38,%rsp 12: 44 89 4d d0 mov %r9d,-0x30(%rbp) [ 5.253671] RSP: 002b:00007ffc5839a338 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 5.253674] RAX: ffffffffffffffda RBX: 0000556e7a95cc80 RCX: 00007f6cc34d55b7 [ 5.253676] RDX: 0000000000000020 RSI: 0000556e7a9752d0 RDI: 0000000000000003 [ 5.253677] RBP: 
00007ffc5839a3d0 R08: 00007ffc5839a340 R09: 0000000000000080 [ 5.253679] R10: 0000000000000000 R11: 0000000000000202 R12: 0000556e7a98b2c0 [ 5.253681] R13: 00007ffc5839a414 R14: 0000556e7a98b2c0 R15: 0000556e448c7a90 [ 5.253684] </TASK> [ 5.253685] ---[ end trace 0000000000000000 ]--- On Tue, Feb 25, 2025 at 9:05 AM Ian Kumlien <ian.kumlien@gmail.com> wrote: > > Just had this happen just before be2net initialization... FYI and all that ;) > [--8<--] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
From: Jakub Kicinski @ 2025-02-26 1:05 UTC
To: Ian Kumlien; +Cc: Linux Kernel Network Developers

On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
> Same thing happens in 6.13.4, FYI

Could you do a minor bisection? Does it not happen with 6.11?
Nothing jumps out at quick look.
* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
From: Ian Kumlien @ 2025-02-26 9:24 UTC
To: Jakub Kicinski; +Cc: Linux Kernel Network Developers

On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
> > Same thing happens in 6.13.4, FYI
>
> Could you do a minor bisection? Does it not happen with 6.11?
> Nothing jumps out at quick look.

I have to admit that i haven't been tracking it too closely until it
turned out to be an issue
(makes network traffic over wireguard, through that node very slow)

But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
(it's a gw to reach an internal server network in the basement, so not
the best setup for this)
* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
From: Ian Kumlien @ 2025-02-26 9:55 UTC
To: Jakub Kicinski; +Cc: Linux Kernel Network Developers

On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
> > > Same thing happens in 6.13.4, FYI
> >
> > Could you do a minor bisection? Does it not happen with 6.11?
> > Nothing jumps out at quick look.
>
> I have to admit that i haven't been tracking it too closely until it
> turned out to be an issue
> (makes network traffic over wireguard, through that node very slow)
>
> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
> (it's a gw to reach an internal server network in the basement, so not
> the best setup for this)

Since i'm at work i decided to check if i could find all the boot
logs, which is actually done nicely by systemd

first known bad: 6.11.7-300.fc41.x86_64
last known ok: 6.11.6-200.fc40.x86_64

Narrows the field for a bisect at least, =)
* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
From: Nikolay Aleksandrov @ 2025-02-26 10:33 UTC
To: Ian Kumlien, Jakub Kicinski; +Cc: Linux Kernel Network Developers

On 2/26/25 11:55, Ian Kumlien wrote:
> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>>
>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>
>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
>>>> Same thing happens in 6.13.4, FYI
>>>
>>> Could you do a minor bisection? Does it not happen with 6.11?
>>> Nothing jumps out at quick look.
>>
>> I have to admit that i haven't been tracking it too closely until it
>> turned out to be an issue
>> (makes network traffic over wireguard, through that node very slow)
>>
>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
>> (it's a gw to reach an internal server network in the basement, so not
>> the best setup for this)
>
> Since i'm at work i decided to check if i could find all the boot
> logs, which is actually done nicely by systemd
> first known bad: 6.11.7-300.fc41.x86_64
> last known ok: 6.11.6-200.fc40.x86_64
>
> Narrows the field for a bisect at least, =)
>

Saw bridge, took a look. :)

I think there are multiple issues with benet's be_ndo_bridge_getlink()
because it calls be_cmd_get_hsw_config() which can sleep in multiple
places, e.g. the most obvious is the mutex_lock() in the beginning of
be_cmd_get_hsw_config(), then we have the call trace here which is:
be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()

Maybe you updated some tool that calls down that path along with the kernel and system
so you started seeing it in Fedora 41?

IMO this has been problematic for a very long time, but obviously it depends on the
chip type. Could you share your benet chip type to confirm the path?

For the blamed commit I'd go with:
commit b71724147e73
Author: Sathya Perla <sathya.perla@broadcom.com>
Date:   Wed Jul 27 05:26:18 2016 -0400

    be2net: replace polling with sleeping in the FW completion path

This one changed the udelay() (which is safe) to usleep_range() and the spinlock
to a mutex.

Cheers,
 Nik
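[Editorial sketch, not part of the thread.] The analysis above turns on one
distinction: usleep_range() gives up the CPU and so passes through the
scheduler, while udelay() busy-waits and never does. The userspace C model
below illustrates that under stated assumptions. Every model_* name is
hypothetical and is not a kernel API; the check is only loosely modeled on the
warning in the trace, and the model counts every hit where the kernel reports
it once. The thread does not say where on the dump path the read-side section
is entered, so the model simply takes the "lock" around the wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a task's RCU read-side nesting depth. */
static int model_rcu_depth;
/* How many times the model would have emitted the warning. */
static int model_warnings;

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);	/* unbalanced unlock is a model bug */
	model_rcu_depth--;
}

/*
 * Stand-in for the scheduler hook: a voluntary (non-preemption) switch
 * while a read-side section is open is what the warning complains about.
 */
static void model_context_switch(bool preempt)
{
	if (!preempt && model_rcu_depth > 0) {
		model_warnings++;
		fprintf(stderr,
			"Voluntary context switch within RCU read-side critical section!\n");
	}
}

/* usleep_range()-like wait: sleeps, so it reaches the scheduler. */
static void model_sleeping_wait(void)
{
	model_context_switch(false);
}

/* udelay()-like wait: spins in place and never reaches the scheduler. */
static void model_busy_wait(void)
{
}

/*
 * Hypothetical dump handler that waits for firmware inside the read-side
 * section. Returns the number of warnings this call triggered.
 */
int model_getlink(bool use_sleeping_wait)
{
	int before = model_warnings;

	model_rcu_read_lock();
	if (use_sleeping_wait)
		model_sleeping_wait();
	else
		model_busy_wait();
	model_rcu_read_unlock();

	return model_warnings - before;
}
```

Calling model_getlink(true) trips the check once and model_getlink(false) does
not, which is the behavioural difference a udelay()-based wait is meant to
restore. The model says nothing about the cost of busy-waiting, which is the
trade-off the 2016 commit was presumably addressing.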
* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
From: Ian Kumlien @ 2025-02-26 11:52 UTC
To: Nikolay Aleksandrov; +Cc: Jakub Kicinski, Linux Kernel Network Developers

On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov
<razor@blackwall.org> wrote:
>
> On 2/26/25 11:55, Ian Kumlien wrote:
> > On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >>
> >> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >>>
> >>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
> >>>> Same thing happens in 6.13.4, FYI
> >>>
> >>> Could you do a minor bisection? Does it not happen with 6.11?
> >>> Nothing jumps out at quick look.
> >>
> >> I have to admit that i haven't been tracking it too closely until it
> >> turned out to be an issue
> >> (makes network traffic over wireguard, through that node very slow)
> >>
> >> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
> >> (it's a gw to reach an internal server network in the basement, so not
> >> the best setup for this)
> >
> > Since i'm at work i decided to check if i could find all the boot
> > logs, which is actually done nicely by systemd
> > first known bad: 6.11.7-300.fc41.x86_64
> > last known ok: 6.11.6-200.fc40.x86_64
> >
> > Narrows the field for a bisect at least, =)
> >
>
> Saw bridge, took a look. :)
>
> I think there are multiple issues with benet's be_ndo_bridge_getlink()
> because it calls be_cmd_get_hsw_config() which can sleep in multiple
> places, e.g. the most obvious is the mutex_lock() in the beginning of
> be_cmd_get_hsw_config(), then we have the call trace here which is:
> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()
>
> Maybe you updated some tool that calls down that path along with the kernel and system
> so you started seeing it in Fedora 41?

Could be but it's pretty barebones

> IMO this has been problematic for a very long time, but obviously it depends on the
> chip type. Could you share your benet chip type to confirm the path?

I don't know how to find the actual chip information but it's identified as:
Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)

> For the blamed commit I'd go with:
> commit b71724147e73
> Author: Sathya Perla <sathya.perla@broadcom.com>
> Date:   Wed Jul 27 05:26:18 2016 -0400
>
>     be2net: replace polling with sleeping in the FW completion path
>
> This one changed the udelay() (which is safe) to usleep_range() and the spinlock
> to a mutex.

So, first try will be to try without that patch then, =)

> Cheers,
>  Nik
>
* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
From: Nikolay Aleksandrov @ 2025-02-26 12:00 UTC
To: Ian Kumlien; +Cc: Jakub Kicinski, Linux Kernel Network Developers

On 2/26/25 13:52, Ian Kumlien wrote:
> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov
> <razor@blackwall.org> wrote:
>>
>> On 2/26/25 11:55, Ian Kumlien wrote:
>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>>>>
>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>>
>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
>>>>>> Same thing happens in 6.13.4, FYI
>>>>>
>>>>> Could you do a minor bisection? Does it not happen with 6.11?
>>>>> Nothing jumps out at quick look.
>>>>
>>>> I have to admit that i haven't been tracking it too closely until it
>>>> turned out to be an issue
>>>> (makes network traffic over wireguard, through that node very slow)
>>>>
>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
>>>> (it's a gw to reach an internal server network in the basement, so not
>>>> the best setup for this)
>>>
>>> Since i'm at work i decided to check if i could find all the boot
>>> logs, which is actually done nicely by systemd
>>> first known bad: 6.11.7-300.fc41.x86_64
>>> last known ok: 6.11.6-200.fc40.x86_64
>>>
>>> Narrows the field for a bisect at least, =)
>>>
>>
>> Saw bridge, took a look. :)
>>
>> I think there are multiple issues with benet's be_ndo_bridge_getlink()
>> because it calls be_cmd_get_hsw_config() which can sleep in multiple
>> places, e.g. the most obvious is the mutex_lock() in the beginning of
>> be_cmd_get_hsw_config(), then we have the call trace here which is:
>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()
>>
>> Maybe you updated some tool that calls down that path along with the kernel and system
>> so you started seeing it in Fedora 41?
>
> Could be but it's pretty barebones
>
>> IMO this has been problematic for a very long time, but obviously it depends on the
>> chip type. Could you share your benet chip type to confirm the path?
>
> I don't know how to find the actual chip information but it's identified as:
> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
>

Good, that confirms it. The skyhawk chip falls in the "else" of the block in
be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config().

>> For the blamed commit I'd go with:
>> commit b71724147e73
>> Author: Sathya Perla <sathya.perla@broadcom.com>
>> Date:   Wed Jul 27 05:26:18 2016 -0400
>>
>>     be2net: replace polling with sleeping in the FW completion path
>>
>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock
>> to a mutex.
>
> So, first try will be to try without that patch then, =)
>

That would be a good try, yes. It is not a straight-forward revert though since a lot
of changes have happened since that commit. Let me know if you need help with that,
I can prepare the revert to test.

>> Cheers,
>>  Nik
>>
* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
From: Ian Kumlien @ 2025-02-26 12:26 UTC
To: Nikolay Aleksandrov; +Cc: Jakub Kicinski, Linux Kernel Network Developers

On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>
> On 2/26/25 13:52, Ian Kumlien wrote:
> > On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov
> > <razor@blackwall.org> wrote:
> >>
> >> On 2/26/25 11:55, Ian Kumlien wrote:
> >>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >>>>
> >>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >>>>>
> >>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
> >>>>>> Same thing happens in 6.13.4, FYI
> >>>>>
> >>>>> Could you do a minor bisection? Does it not happen with 6.11?
> >>>>> Nothing jumps out at quick look.
> >>>>
> >>>> I have to admit that i haven't been tracking it too closely until it
> >>>> turned out to be an issue
> >>>> (makes network traffic over wireguard, through that node very slow)
> >>>>
> >>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
> >>>> (it's a gw to reach an internal server network in the basement, so not
> >>>> the best setup for this)
> >>>
> >>> Since i'm at work i decided to check if i could find all the boot
> >>> logs, which is actually done nicely by systemd
> >>> first known bad: 6.11.7-300.fc41.x86_64
> >>> last known ok: 6.11.6-200.fc40.x86_64
> >>>
> >>> Narrows the field for a bisect at least, =)
> >>>
> >>
> >> Saw bridge, took a look. :)
> >>
> >> I think there are multiple issues with benet's be_ndo_bridge_getlink()
> >> because it calls be_cmd_get_hsw_config() which can sleep in multiple
> >> places, e.g. the most obvious is the mutex_lock() in the beginning of
> >> be_cmd_get_hsw_config(), then we have the call trace here which is:
> >> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()
> >>
> >> Maybe you updated some tool that calls down that path along with the kernel and system
> >> so you started seeing it in Fedora 41?
> >
> > Could be but it's pretty barebones
> >
> >> IMO this has been problematic for a very long time, but obviously it depends on the
> >> chip type. Could you share your benet chip type to confirm the path?
> >
> > I don't know how to find the actual chip information but it's identified as:
> > Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
> >
>
> Good, that confirms it. The skyhawk chip falls in the "else" of the block in
> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config().
>
> >> For the blamed commit I'd go with:
> >> commit b71724147e73
> >> Author: Sathya Perla <sathya.perla@broadcom.com>
> >> Date:   Wed Jul 27 05:26:18 2016 -0400
> >>
> >>     be2net: replace polling with sleeping in the FW completion path
> >>
> >> This one changed the udelay() (which is safe) to usleep_range() and the spinlock
> >> to a mutex.
> >
> > So, first try will be to try without that patch then, =)
> >
>
> That would be a good try, yes. It is not a straight-forward revert though since a lot
> of changes have happened since that commit. Let me know if you need help with that,
> I can prepare the revert to test.

Yeah, looked at the size of it and... well... I dunno if i'd have the time =)

> >> Cheers,
> >>  Nik
> >>
>
* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
From: Nikolay Aleksandrov @ 2025-02-26 13:11 UTC
To: Ian Kumlien; +Cc: Jakub Kicinski, Linux Kernel Network Developers, Sathya Perla

[-- Attachment #1: Type: text/plain, Size: 3551 bytes --]

On 2/26/25 14:26, Ian Kumlien wrote:
> On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>>
>> On 2/26/25 13:52, Ian Kumlien wrote:
>>> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov
>>> <razor@blackwall.org> wrote:
>>>>
>>>> On 2/26/25 11:55, Ian Kumlien wrote:
>>>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>>>>>>
>>>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>>>>
>>>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
>>>>>>>> Same thing happens in 6.13.4, FYI
>>>>>>>
>>>>>>> Could you do a minor bisection? Does it not happen with 6.11?
>>>>>>> Nothing jumps out at quick look.
>>>>>>
>>>>>> I have to admit that i haven't been tracking it too closely until it
>>>>>> turned out to be an issue
>>>>>> (makes network traffic over wireguard, through that node very slow)
>>>>>>
>>>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
>>>>>> (it's a gw to reach an internal server network in the basement, so not
>>>>>> the best setup for this)
>>>>>
>>>>> Since i'm at work i decided to check if i could find all the boot
>>>>> logs, which is actually done nicely by systemd
>>>>> first known bad: 6.11.7-300.fc41.x86_64
>>>>> last known ok: 6.11.6-200.fc40.x86_64
>>>>>
>>>>> Narrows the field for a bisect at least, =)
>>>>>
>>>>
>>>> Saw bridge, took a look. :)
>>>>
>>>> I think there are multiple issues with benet's be_ndo_bridge_getlink()
>>>> because it calls be_cmd_get_hsw_config() which can sleep in multiple
>>>> places, e.g. the most obvious is the mutex_lock() in the beginning of
>>>> be_cmd_get_hsw_config(), then we have the call trace here which is:
>>>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()
>>>>
>>>> Maybe you updated some tool that calls down that path along with the kernel and system
>>>> so you started seeing it in Fedora 41?
>>>
>>> Could be but it's pretty barebones
>>>
>>>> IMO this has been problematic for a very long time, but obviously it depends on the
>>>> chip type. Could you share your benet chip type to confirm the path?
>>>
>>> I don't know how to find the actual chip information but it's identified as:
>>> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
>>>
>>
>> Good, that confirms it. The skyhawk chip falls in the "else" of the block in
>> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config().
>>
>>>> For the blamed commit I'd go with:
>>>> commit b71724147e73
>>>> Author: Sathya Perla <sathya.perla@broadcom.com>
>>>> Date:   Wed Jul 27 05:26:18 2016 -0400
>>>>
>>>>     be2net: replace polling with sleeping in the FW completion path
>>>>
>>>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock
>>>> to a mutex.
>>>
>>> So, first try will be to try without that patch then, =)
>>>
>>
>> That would be a good try, yes. It is not a straight-forward revert though since a lot
>> of changes have happened since that commit. Let me know if you need help with that,
>> I can prepare the revert to test.
>
> Yeah, looked at the size of it and... well... I dunno if i'd have the time =)
>

Can you try the attached patch?
It is on top of net-next (but also applies to Linus' tree):
 git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git

It partially reverts the mentioned commit above (only mutex -> spinlock and
usleep -> udelay) because the commit does many more things.

Also +CC original patch author which I forgot to do.

Thanks,
 Nik

[-- Attachment #2: 0001-benet-fix.patch --]
[-- Type: text/x-patch, Size: 26486 bytes --]

From 03517db970bea41e625c84fcff9263bae8ab679b Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <razor@blackwall.org>
Date: Wed, 26 Feb 2025 15:05:48 +0200
Subject: [PATCH] benet fix

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
---
 drivers/net/ethernet/emulex/benet/be.h      |   2 +-
 drivers/net/ethernet/emulex/benet/be_cmds.c | 197 ++++++++++----------
 drivers/net/ethernet/emulex/benet/be_main.c |   2 +-
 3 files changed, 100 insertions(+), 101 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index e48b861e4ce1..270ff9aab335 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -562,7 +562,7 @@ struct be_adapter {
 	struct be_dma_mem mbox_mem_alloced;

 	struct be_mcc_obj mcc_obj;
-	struct mutex mcc_lock;	/* For serializing mcc cmds to BE card */
+	spinlock_t mcc_lock;	/* For serializing mcc cmds to BE card */
 	spinlock_t mcc_cq_lock;

 	u16 cfg_num_rx_irqs;		/* configured via set-channels */
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index 61adcebeef01..845320334f1d 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -575,7 +575,7 @@ int be_process_mcc(struct be_adapter *adapter)
 /* Wait till no more pending mcc requests are present */
 static int be_mcc_wait_compl(struct be_adapter *adapter)
 {
-#define mcc_timeout		12000 /* 12s timeout */
+#define mcc_timeout		120000 /* 12s timeout */
 	int i, status = 0;
 	struct be_mcc_obj *mcc_obj =
&adapter->mcc_obj; @@ -589,7 +589,7 @@ static int be_mcc_wait_compl(struct be_adapter *adapter) if (atomic_read(&mcc_obj->q.used) == 0) break; - usleep_range(500, 1000); + udelay(100); } if (i == mcc_timeout) { dev_err(&adapter->pdev->dev, "FW not responding\n"); @@ -866,7 +866,7 @@ static bool use_mcc(struct be_adapter *adapter) static int be_cmd_lock(struct be_adapter *adapter) { if (use_mcc(adapter)) { - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); return 0; } else { return mutex_lock_interruptible(&adapter->mbox_lock); @@ -877,7 +877,7 @@ static int be_cmd_lock(struct be_adapter *adapter) static void be_cmd_unlock(struct be_adapter *adapter) { if (use_mcc(adapter)) - return mutex_unlock(&adapter->mcc_lock); + return spin_unlock_bh(&adapter->mcc_lock); else return mutex_unlock(&adapter->mbox_lock); } @@ -1047,7 +1047,7 @@ int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 *mac_addr, struct be_cmd_req_mac_query *req; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -1076,7 +1076,7 @@ int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 *mac_addr, } err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -1088,7 +1088,7 @@ int be_cmd_pmac_add(struct be_adapter *adapter, const u8 *mac_addr, struct be_cmd_req_pmac_add *req; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -1113,7 +1113,7 @@ int be_cmd_pmac_add(struct be_adapter *adapter, const u8 *mac_addr, } err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); if (base_status(status) == MCC_STATUS_UNAUTHORIZED_REQUEST) status = -EPERM; @@ -1131,7 +1131,7 @@ int be_cmd_pmac_del(struct be_adapter *adapter, u32 if_id, int pmac_id, u32 dom) if (pmac_id == -1) return 0; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = 
wrb_from_mccq(adapter); if (!wrb) { @@ -1151,7 +1151,7 @@ int be_cmd_pmac_del(struct be_adapter *adapter, u32 if_id, int pmac_id, u32 dom) status = be_mcc_notify_wait(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -1414,7 +1414,7 @@ int be_cmd_rxq_create(struct be_adapter *adapter, struct be_dma_mem *q_mem = &rxq->dma_mem; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -1444,7 +1444,7 @@ int be_cmd_rxq_create(struct be_adapter *adapter, } err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -1508,7 +1508,7 @@ int be_cmd_rxq_destroy(struct be_adapter *adapter, struct be_queue_info *q) struct be_cmd_req_q_destroy *req; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -1525,7 +1525,7 @@ int be_cmd_rxq_destroy(struct be_adapter *adapter, struct be_queue_info *q) q->created = false; err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -1593,7 +1593,7 @@ int be_cmd_get_stats(struct be_adapter *adapter, struct be_dma_mem *nonemb_cmd) struct be_cmd_req_hdr *hdr; int status = 0; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -1621,7 +1621,7 @@ int be_cmd_get_stats(struct be_adapter *adapter, struct be_dma_mem *nonemb_cmd) adapter->stats_cmd_sent = true; err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -1637,7 +1637,7 @@ int lancer_cmd_get_pport_stats(struct be_adapter *adapter, CMD_SUBSYSTEM_ETH)) return -EPERM; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -1660,7 +1660,7 @@ int lancer_cmd_get_pport_stats(struct be_adapter *adapter, adapter->stats_cmd_sent = true; err: - 
mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -1697,7 +1697,7 @@ int be_cmd_link_status_query(struct be_adapter *adapter, u16 *link_speed, struct be_cmd_req_link_status *req; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); if (link_status) *link_status = LINK_DOWN; @@ -1736,7 +1736,7 @@ int be_cmd_link_status_query(struct be_adapter *adapter, u16 *link_speed, } err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -1747,7 +1747,7 @@ int be_cmd_get_die_temperature(struct be_adapter *adapter) struct be_cmd_req_get_cntl_addnl_attribs *req; int status = 0; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -1762,7 +1762,7 @@ int be_cmd_get_die_temperature(struct be_adapter *adapter) status = be_mcc_notify(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -1811,7 +1811,7 @@ int be_cmd_get_fat_dump(struct be_adapter *adapter, u32 buf_len, void *buf) if (!get_fat_cmd.va) return -ENOMEM; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); while (total_size) { buf_size = min(total_size, (u32)60 * 1024); @@ -1851,7 +1851,7 @@ int be_cmd_get_fat_dump(struct be_adapter *adapter, u32 buf_len, void *buf) err: dma_free_coherent(&adapter->pdev->dev, get_fat_cmd.size, get_fat_cmd.va, get_fat_cmd.dma); - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -1862,7 +1862,7 @@ int be_cmd_get_fw_ver(struct be_adapter *adapter) struct be_cmd_req_get_fw_version *req; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -1885,7 +1885,7 @@ int be_cmd_get_fw_ver(struct be_adapter *adapter) sizeof(adapter->fw_on_flash)); } err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ 
-1899,7 +1899,7 @@ static int __be_cmd_modify_eqd(struct be_adapter *adapter, struct be_cmd_req_modify_eq_delay *req; int status = 0, i; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -1922,7 +1922,7 @@ static int __be_cmd_modify_eqd(struct be_adapter *adapter, status = be_mcc_notify(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -1949,7 +1949,7 @@ int be_cmd_vlan_config(struct be_adapter *adapter, u32 if_id, u16 *vtag_array, struct be_cmd_req_vlan_config *req; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -1971,7 +1971,7 @@ int be_cmd_vlan_config(struct be_adapter *adapter, u32 if_id, u16 *vtag_array, status = be_mcc_notify_wait(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -1982,7 +1982,7 @@ static int __be_cmd_rx_filter(struct be_adapter *adapter, u32 flags, u32 value) struct be_cmd_req_rx_filter *req = mem->va; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -2015,7 +2015,7 @@ static int __be_cmd_rx_filter(struct be_adapter *adapter, u32 flags, u32 value) status = be_mcc_notify_wait(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -2046,7 +2046,7 @@ int be_cmd_set_flow_control(struct be_adapter *adapter, u32 tx_fc, u32 rx_fc) CMD_SUBSYSTEM_COMMON)) return -EPERM; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -2066,7 +2066,7 @@ int be_cmd_set_flow_control(struct be_adapter *adapter, u32 tx_fc, u32 rx_fc) status = be_mcc_notify_wait(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); if (base_status(status) == MCC_STATUS_FEATURE_NOT_SUPPORTED) return 
-EOPNOTSUPP; @@ -2085,7 +2085,7 @@ int be_cmd_get_flow_control(struct be_adapter *adapter, u32 *tx_fc, u32 *rx_fc) CMD_SUBSYSTEM_COMMON)) return -EPERM; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -2108,7 +2108,7 @@ int be_cmd_get_flow_control(struct be_adapter *adapter, u32 *tx_fc, u32 *rx_fc) } err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -2189,7 +2189,7 @@ int be_cmd_rss_config(struct be_adapter *adapter, u8 *rsstable, if (!(be_if_cap_flags(adapter) & BE_IF_FLAGS_RSS)) return 0; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -2214,7 +2214,7 @@ int be_cmd_rss_config(struct be_adapter *adapter, u8 *rsstable, status = be_mcc_notify_wait(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -2226,7 +2226,7 @@ int be_cmd_set_beacon_state(struct be_adapter *adapter, u8 port_num, struct be_cmd_req_enable_disable_beacon *req; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -2247,7 +2247,7 @@ int be_cmd_set_beacon_state(struct be_adapter *adapter, u8 port_num, status = be_mcc_notify_wait(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -2258,7 +2258,7 @@ int be_cmd_get_beacon_state(struct be_adapter *adapter, u8 port_num, u32 *state) struct be_cmd_req_get_beacon_state *req; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -2282,7 +2282,7 @@ int be_cmd_get_beacon_state(struct be_adapter *adapter, u8 port_num, u32 *state) } err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -2306,7 +2306,7 @@ int be_cmd_read_port_transceiver_data(struct be_adapter *adapter, return 
-ENOMEM; } - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -2328,7 +2328,7 @@ int be_cmd_read_port_transceiver_data(struct be_adapter *adapter, memcpy(data, resp->page_data + off, len); } err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); dma_free_coherent(&adapter->pdev->dev, cmd.size, cmd.va, cmd.dma); return status; } @@ -2345,7 +2345,7 @@ static int lancer_cmd_write_object(struct be_adapter *adapter, void *ctxt = NULL; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); adapter->flash_status = 0; wrb = wrb_from_mccq(adapter); @@ -2387,7 +2387,7 @@ static int lancer_cmd_write_object(struct be_adapter *adapter, if (status) goto err_unlock; - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); if (!wait_for_completion_timeout(&adapter->et_cmd_compl, msecs_to_jiffies(60000))) @@ -2406,7 +2406,7 @@ static int lancer_cmd_write_object(struct be_adapter *adapter, return status; err_unlock: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -2460,7 +2460,7 @@ static int lancer_cmd_delete_object(struct be_adapter *adapter, struct be_mcc_wrb *wrb; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -2478,7 +2478,7 @@ static int lancer_cmd_delete_object(struct be_adapter *adapter, status = be_mcc_notify_wait(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -2491,7 +2491,7 @@ int lancer_cmd_read_object(struct be_adapter *adapter, struct be_dma_mem *cmd, struct lancer_cmd_resp_read_object *resp; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -2525,7 +2525,7 @@ int lancer_cmd_read_object(struct be_adapter *adapter, struct be_dma_mem *cmd, } err_unlock: - 
mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -2537,7 +2537,7 @@ static int be_cmd_write_flashrom(struct be_adapter *adapter, struct be_cmd_write_flashrom *req; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); adapter->flash_status = 0; wrb = wrb_from_mccq(adapter); @@ -2562,7 +2562,7 @@ static int be_cmd_write_flashrom(struct be_adapter *adapter, if (status) goto err_unlock; - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); if (!wait_for_completion_timeout(&adapter->et_cmd_compl, msecs_to_jiffies(40000))) @@ -2573,7 +2573,7 @@ static int be_cmd_write_flashrom(struct be_adapter *adapter, return status; err_unlock: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -2584,7 +2584,7 @@ static int be_cmd_get_flash_crc(struct be_adapter *adapter, u8 *flashed_crc, struct be_mcc_wrb *wrb; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -2611,7 +2611,7 @@ static int be_cmd_get_flash_crc(struct be_adapter *adapter, u8 *flashed_crc, memcpy(flashed_crc, req->crc, 4); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -3217,7 +3217,7 @@ int be_cmd_enable_magic_wol(struct be_adapter *adapter, u8 *mac, struct be_cmd_req_acpi_wol_magic_config *req; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -3234,7 +3234,7 @@ int be_cmd_enable_magic_wol(struct be_adapter *adapter, u8 *mac, status = be_mcc_notify_wait(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -3249,7 +3249,7 @@ int be_cmd_set_loopback(struct be_adapter *adapter, u8 port_num, CMD_SUBSYSTEM_LOWLEVEL)) return -EPERM; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if 
(!wrb) { @@ -3272,7 +3272,7 @@ int be_cmd_set_loopback(struct be_adapter *adapter, u8 port_num, if (status) goto err_unlock; - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); if (!wait_for_completion_timeout(&adapter->et_cmd_compl, msecs_to_jiffies(SET_LB_MODE_TIMEOUT))) @@ -3281,7 +3281,7 @@ int be_cmd_set_loopback(struct be_adapter *adapter, u8 port_num, return status; err_unlock: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -3298,7 +3298,7 @@ int be_cmd_loopback_test(struct be_adapter *adapter, u32 port_num, CMD_SUBSYSTEM_LOWLEVEL)) return -EPERM; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -3324,7 +3324,7 @@ int be_cmd_loopback_test(struct be_adapter *adapter, u32 port_num, if (status) goto err; - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); wait_for_completion(&adapter->et_cmd_compl); resp = embedded_payload(wrb); @@ -3332,7 +3332,7 @@ int be_cmd_loopback_test(struct be_adapter *adapter, u32 port_num, return status; err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -3348,7 +3348,7 @@ int be_cmd_ddr_dma_test(struct be_adapter *adapter, u64 pattern, CMD_SUBSYSTEM_LOWLEVEL)) return -EPERM; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -3382,7 +3382,7 @@ int be_cmd_ddr_dma_test(struct be_adapter *adapter, u64 pattern, } err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -3393,7 +3393,7 @@ int be_cmd_get_seeprom_data(struct be_adapter *adapter, struct be_cmd_req_seeprom_read *req; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -3409,7 +3409,7 @@ int be_cmd_get_seeprom_data(struct be_adapter *adapter, status = be_mcc_notify_wait(adapter); err: - 
mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -3424,7 +3424,7 @@ int be_cmd_get_phy_info(struct be_adapter *adapter) CMD_SUBSYSTEM_COMMON)) return -EPERM; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -3469,7 +3469,7 @@ int be_cmd_get_phy_info(struct be_adapter *adapter) } dma_free_coherent(&adapter->pdev->dev, cmd.size, cmd.va, cmd.dma); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -3479,7 +3479,7 @@ static int be_cmd_set_qos(struct be_adapter *adapter, u32 bps, u32 domain) struct be_cmd_req_set_qos *req; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -3499,7 +3499,7 @@ static int be_cmd_set_qos(struct be_adapter *adapter, u32 bps, u32 domain) status = be_mcc_notify_wait(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -3611,7 +3611,7 @@ int be_cmd_get_fn_privileges(struct be_adapter *adapter, u32 *privilege, struct be_cmd_req_get_fn_privileges *req; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -3643,7 +3643,7 @@ int be_cmd_get_fn_privileges(struct be_adapter *adapter, u32 *privilege, } err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -3655,7 +3655,7 @@ int be_cmd_set_fn_privileges(struct be_adapter *adapter, u32 privileges, struct be_cmd_req_set_fn_privileges *req; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -3675,7 +3675,7 @@ int be_cmd_set_fn_privileges(struct be_adapter *adapter, u32 privileges, status = be_mcc_notify_wait(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -3707,7 +3707,7 
@@ int be_cmd_get_mac_from_list(struct be_adapter *adapter, u8 *mac, return -ENOMEM; } - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -3771,7 +3771,7 @@ int be_cmd_get_mac_from_list(struct be_adapter *adapter, u8 *mac, } out: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); dma_free_coherent(&adapter->pdev->dev, get_mac_list_cmd.size, get_mac_list_cmd.va, get_mac_list_cmd.dma); return status; @@ -3831,7 +3831,7 @@ int be_cmd_set_mac_list(struct be_adapter *adapter, u8 *mac_array, if (!cmd.va) return -ENOMEM; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -3853,7 +3853,7 @@ int be_cmd_set_mac_list(struct be_adapter *adapter, u8 *mac_array, err: dma_free_coherent(&adapter->pdev->dev, cmd.size, cmd.va, cmd.dma); - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -3889,7 +3889,7 @@ int be_cmd_set_hsw_config(struct be_adapter *adapter, u16 pvid, CMD_SUBSYSTEM_COMMON)) return -EPERM; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -3930,7 +3930,7 @@ int be_cmd_set_hsw_config(struct be_adapter *adapter, u16 pvid, status = be_mcc_notify_wait(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -3944,7 +3944,7 @@ int be_cmd_get_hsw_config(struct be_adapter *adapter, u16 *pvid, int status; u16 vid; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -3991,7 +3991,7 @@ int be_cmd_get_hsw_config(struct be_adapter *adapter, u16 *pvid, } err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -4190,7 +4190,7 @@ int be_cmd_set_ext_fat_capabilites(struct be_adapter *adapter, struct be_cmd_req_set_ext_fat_caps *req; int status; - 
mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -4206,7 +4206,7 @@ int be_cmd_set_ext_fat_capabilites(struct be_adapter *adapter, status = be_mcc_notify_wait(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -4684,7 +4684,7 @@ int be_cmd_manage_iface(struct be_adapter *adapter, u32 iface, u8 op) if (iface == 0xFFFFFFFF) return -1; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -4701,7 +4701,7 @@ int be_cmd_manage_iface(struct be_adapter *adapter, u32 iface, u8 op) status = be_mcc_notify_wait(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -4735,7 +4735,7 @@ int be_cmd_get_if_id(struct be_adapter *adapter, struct be_vf_cfg *vf_cfg, struct be_cmd_resp_get_iface_list *resp; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -4756,7 +4756,7 @@ int be_cmd_get_if_id(struct be_adapter *adapter, struct be_vf_cfg *vf_cfg, } err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -4850,7 +4850,7 @@ int be_cmd_enable_vf(struct be_adapter *adapter, u8 domain) if (BEx_chip(adapter)) return 0; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -4868,7 +4868,7 @@ int be_cmd_enable_vf(struct be_adapter *adapter, u8 domain) req->enable = 1; status = be_mcc_notify_wait(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -4941,7 +4941,7 @@ __be_cmd_set_logical_link_config(struct be_adapter *adapter, u32 link_config = 0; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -4969,7 +4969,7 @@ 
__be_cmd_set_logical_link_config(struct be_adapter *adapter, status = be_mcc_notify_wait(adapter); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -5000,8 +5000,7 @@ int be_cmd_set_features(struct be_adapter *adapter) struct be_mcc_wrb *wrb; int status; - if (mutex_lock_interruptible(&adapter->mcc_lock)) - return -1; + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -5039,7 +5038,7 @@ int be_cmd_set_features(struct be_adapter *adapter) dev_info(&adapter->pdev->dev, "Adapter does not support HW error recovery\n"); - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } @@ -5053,7 +5052,7 @@ int be_roce_mcc_cmd(void *netdev_handle, void *wrb_payload, struct be_cmd_resp_hdr *resp; int status; - mutex_lock(&adapter->mcc_lock); + spin_lock_bh(&adapter->mcc_lock); wrb = wrb_from_mccq(adapter); if (!wrb) { @@ -5076,7 +5075,7 @@ int be_roce_mcc_cmd(void *netdev_handle, void *wrb_payload, memcpy(wrb_payload, resp, sizeof(*resp) + resp->response_length); be_dws_le_to_cpu(wrb_payload, sizeof(*resp) + resp->response_length); err: - mutex_unlock(&adapter->mcc_lock); + spin_unlock_bh(&adapter->mcc_lock); return status; } EXPORT_SYMBOL(be_roce_mcc_cmd); diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c index 875fe379eea2..3d2e21592119 100644 --- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -5667,8 +5667,8 @@ static int be_drv_init(struct be_adapter *adapter) } mutex_init(&adapter->mbox_lock); - mutex_init(&adapter->mcc_lock); mutex_init(&adapter->rx_filter_lock); + spin_lock_init(&adapter->mcc_lock); spin_lock_init(&adapter->mcc_cq_lock); init_completion(&adapter->et_cmd_compl); -- 2.48.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
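[Editorial note] The patch above changes mcc_timeout from 12000 to 120000 while the per-iteration delay goes from usleep_range(500, 1000) to udelay(100). The arithmetic can be checked with a small userspace sketch: 120000 iterations of 100 us give 12 s, matching the "12s timeout" comment, whereas 12000 iterations of 500 to 1000 us give between 6 s and 12 s. The helper names poll_budget_us() and poll_until_drained() are hypothetical, and the loop only models the shape of be_mcc_wait_compl() as far as it is visible in the diff context; it performs no real delay.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: total delay, in microseconds, of a poll loop that
 * runs `iterations` times and delays `delay_us` on each iteration.
 */
static uint64_t poll_budget_us(uint64_t iterations, uint64_t delay_us)
{
    return iterations * delay_us;
}

/*
 * Hypothetical model of the poll loop: returns the iteration index at
 * which the queue is seen empty, or -1 if max_iters is reached first.
 * `drains_at` stands in for the moment atomic_read(&mcc_obj->q.used)
 * would return 0 in the driver.
 */
static long poll_until_drained(long max_iters, long drains_at)
{
    long i;

    for (i = 0; i < max_iters; i++) {
        if (i >= drains_at)
            return i;   /* queue empty: stop polling */
        /* the driver would udelay(100) here; omitted in this model */
    }
    return -1;          /* corresponds to the "FW not responding" case */
}
```

The design point is that a busy-wait keeps the same overall timeout only if the iteration count is scaled by the same factor by which the per-iteration delay shrinks.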
* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section! 2025-02-26 13:11 ` Nikolay Aleksandrov @ 2025-02-26 22:28 ` Ian Kumlien 2025-02-27 14:31 ` Ian Kumlien 0 siblings, 1 reply; 16+ messages in thread From: Ian Kumlien @ 2025-02-26 22:28 UTC (permalink / raw) To: Nikolay Aleksandrov Cc: Jakub Kicinski, Linux Kernel Network Developers, Sathya Perla On Wed, Feb 26, 2025 at 2:11 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: > > On 2/26/25 14:26, Ian Kumlien wrote: > > On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: > >> > >> On 2/26/25 13:52, Ian Kumlien wrote: > >>> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov > >>> <razor@blackwall.org> wrote: > >>>> > >>>> On 2/26/25 11:55, Ian Kumlien wrote: > >>>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote: > >>>>>> > >>>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote: > >>>>>>> > >>>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote: > >>>>>>>> Same thing happens in 6.13.4, FYI > >>>>>>> > >>>>>>> Could you do a minor bisection? Does it not happen with 6.11? > >>>>>>> Nothing jumps out at quick look. > >>>>>> > >>>>>> I have to admint that i haven't been tracking it too closely until it > >>>>>> turned out to be an issue > >>>>>> (makes network traffic over wireguard, through that node very slow) > >>>>>> > >>>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though > >>>>>> (it's a gw to reach a internal server network in the basement, so not > >>>>>> the best setup for this) > >>>>> > >>>>> Since i'm at work i decided to check if i could find all the boot > >>>>> logs, which is actually done nicely by systemd > >>>>> first known bad: 6.11.7-300.fc41.x86_64 > >>>>> last known ok: 6.11.6-200.fc40.x86_64 > >>>>> > >>>>> Narrows the field for a bisect at least, =) > >>>>> > >>>> > >>>> Saw bridge, took a look. 
:) > >>>> > >>>> I think there are multiple issues with benet's be_ndo_bridge_getlink() > >>>> because it calls be_cmd_get_hsw_config() which can sleep in multiple > >>>> places, e.g. the most obvious is the mutex_lock() in the beginning of > >>>> be_cmd_get_hsw_config(), then we have the call trace here which is: > >>>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range() > >>>> > >>>> Maybe you updated some tool that calls down that path along with the kernel and system > >>>> so you started seeing it in Fedora 41? > >>> > >>> Could be but it's pretty barebones > >>> > >>>> IMO this has been problematic for a very long time, but obviously it depends on the > >>>> chip type. Could you share your benet chip type to confirm the path? > >>> > >>> I don't know how to find the actual chip information but it's identified as: > >>> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10) > >>> > >> > >> Good, that confirms it. The skyhawk chip falls in the "else" of the block in > >> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config(). > >> > >>>> For the blamed commit I'd go with: > >>>> commit b71724147e73 > >>>> Author: Sathya Perla <sathya.perla@broadcom.com> > >>>> Date: Wed Jul 27 05:26:18 2016 -0400 > >>>> > >>>> be2net: replace polling with sleeping in the FW completion path > >>>> > >>>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock > >>>> to a mutex. > >>> > >>> So, first try will be to try without that patch then, =) > >>> > >> > >> That would be a good try, yes. It is not a straight-forward revert though since a lot > >> of changes have happened since that commit. Let me know if you need help with that, > >> I can prepare the revert to test. > > > > Yeah, looked at the size of it and... well... I dunno if i'd have the time =) > > > > Can you try the attached patch? 
> It is on top of net-next (but also applies to Linus' tree): > git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git > > It partially reverts the mentioned commit above (only mutex -> spinlock and usleep -> udelay) > because the commit does many more things. > > Also +CC original patch author which I forgot to do. Thanks, built and installed but it refuses to boot it - will have to check during the weekend... (boots the latest fedora version even if this one is the selected one according to grubby) > Thanks, > Nik > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section! 2025-02-26 22:28 ` Ian Kumlien @ 2025-02-27 14:31 ` Ian Kumlien 2025-02-27 14:33 ` Nikolay Aleksandrov 0 siblings, 1 reply; 16+ messages in thread From: Ian Kumlien @ 2025-02-27 14:31 UTC (permalink / raw) To: Nikolay Aleksandrov; +Cc: Jakub Kicinski, Linux Kernel Network Developers On Wed, Feb 26, 2025 at 11:28 PM Ian Kumlien <ian.kumlien@gmail.com> wrote: > > On Wed, Feb 26, 2025 at 2:11 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: > > > > On 2/26/25 14:26, Ian Kumlien wrote: > > > On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: > > >> > > >> On 2/26/25 13:52, Ian Kumlien wrote: > > >>> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov > > >>> <razor@blackwall.org> wrote: > > >>>> > > >>>> On 2/26/25 11:55, Ian Kumlien wrote: > > >>>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote: > > >>>>>> > > >>>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote: > > >>>>>>> > > >>>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote: > > >>>>>>>> Same thing happens in 6.13.4, FYI > > >>>>>>> > > >>>>>>> Could you do a minor bisection? Does it not happen with 6.11? > > >>>>>>> Nothing jumps out at quick look. 
> > >>>>>> > > >>>>>> I have to admint that i haven't been tracking it too closely until it > > >>>>>> turned out to be an issue > > >>>>>> (makes network traffic over wireguard, through that node very slow) > > >>>>>> > > >>>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though > > >>>>>> (it's a gw to reach a internal server network in the basement, so not > > >>>>>> the best setup for this) > > >>>>> > > >>>>> Since i'm at work i decided to check if i could find all the boot > > >>>>> logs, which is actually done nicely by systemd > > >>>>> first known bad: 6.11.7-300.fc41.x86_64 > > >>>>> last known ok: 6.11.6-200.fc40.x86_64 > > >>>>> > > >>>>> Narrows the field for a bisect at least, =) > > >>>>> > > >>>> > > >>>> Saw bridge, took a look. :) > > >>>> > > >>>> I think there are multiple issues with benet's be_ndo_bridge_getlink() > > >>>> because it calls be_cmd_get_hsw_config() which can sleep in multiple > > >>>> places, e.g. the most obvious is the mutex_lock() in the beginning of > > >>>> be_cmd_get_hsw_config(), then we have the call trace here which is: > > >>>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range() > > >>>> > > >>>> Maybe you updated some tool that calls down that path along with the kernel and system > > >>>> so you started seeing it in Fedora 41? > > >>> > > >>> Could be but it's pretty barebones > > >>> > > >>>> IMO this has been problematic for a very long time, but obviously it depends on the > > >>>> chip type. Could you share your benet chip type to confirm the path? > > >>> > > >>> I don't know how to find the actual chip information but it's identified as: > > >>> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10) > > >>> > > >> > > >> Good, that confirms it. The skyhawk chip falls in the "else" of the block in > > >> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config(). 
> > >> > > >>>> For the blamed commit I'd go with: > > >>>> commit b71724147e73 > > >>>> Author: Sathya Perla <sathya.perla@broadcom.com> > > >>>> Date: Wed Jul 27 05:26:18 2016 -0400 > > >>>> > > >>>> be2net: replace polling with sleeping in the FW completion path > > >>>> > > >>>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock > > >>>> to a mutex. > > >>> > > >>> So, first try will be to try without that patch then, =) > > >>> > > >> > > >> That would be a good try, yes. It is not a straight-forward revert though since a lot > > >> of changes have happened since that commit. Let me know if you need help with that, > > >> I can prepare the revert to test. > > > > > > Yeah, looked at the size of it and... well... I dunno if i'd have the time =) > > > > > > > Can you try the attached patch? > > It is on top of net-next (but also applies to Linus' tree): > > git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git > > > > It partially reverts the mentioned commit above (only mutex -> spinlock and usleep -> udelay) > > because the commit does many more things. > > > > Also +CC original patch author which I forgot to do. > > Thanks, built and installed but it refuses to boot it - will have to > check during the weekend... > (boots the latest fedora version even if this one is the selected one > according to grubby) So, saw that 6.13.5 was released so, fetched that, applied the patch and no more RCU issues in dmesg Will check more on the suspected performance bit as well when i get home later tonight I also understand Sathya Perla's motivation in saving power on this but things around it have been changed and it no longer works as intended.... > > Thanks, > > Nik > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section! 2025-02-27 14:31 ` Ian Kumlien @ 2025-02-27 14:33 ` Nikolay Aleksandrov 2025-02-27 14:36 ` Ian Kumlien 0 siblings, 1 reply; 16+ messages in thread From: Nikolay Aleksandrov @ 2025-02-27 14:33 UTC (permalink / raw) To: Ian Kumlien; +Cc: Jakub Kicinski, Linux Kernel Network Developers On 2/27/25 16:31, Ian Kumlien wrote: > On Wed, Feb 26, 2025 at 11:28 PM Ian Kumlien <ian.kumlien@gmail.com> wrote: >> >> On Wed, Feb 26, 2025 at 2:11 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: >>> >>> On 2/26/25 14:26, Ian Kumlien wrote: >>>> On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: >>>>> >>>>> On 2/26/25 13:52, Ian Kumlien wrote: >>>>>> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov >>>>>> <razor@blackwall.org> wrote: >>>>>>> >>>>>>> On 2/26/25 11:55, Ian Kumlien wrote: >>>>>>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote: >>>>>>>>> >>>>>>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote: >>>>>>>>>> >>>>>>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote: >>>>>>>>>>> Same thing happens in 6.13.4, FYI >>>>>>>>>> >>>>>>>>>> Could you do a minor bisection? Does it not happen with 6.11? >>>>>>>>>> Nothing jumps out at quick look. 
>>>>>>>>> >>>>>>>>> I have to admint that i haven't been tracking it too closely until it >>>>>>>>> turned out to be an issue >>>>>>>>> (makes network traffic over wireguard, through that node very slow) >>>>>>>>> >>>>>>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though >>>>>>>>> (it's a gw to reach a internal server network in the basement, so not >>>>>>>>> the best setup for this) >>>>>>>> >>>>>>>> Since i'm at work i decided to check if i could find all the boot >>>>>>>> logs, which is actually done nicely by systemd >>>>>>>> first known bad: 6.11.7-300.fc41.x86_64 >>>>>>>> last known ok: 6.11.6-200.fc40.x86_64 >>>>>>>> >>>>>>>> Narrows the field for a bisect at least, =) >>>>>>>> >>>>>>> >>>>>>> Saw bridge, took a look. :) >>>>>>> >>>>>>> I think there are multiple issues with benet's be_ndo_bridge_getlink() >>>>>>> because it calls be_cmd_get_hsw_config() which can sleep in multiple >>>>>>> places, e.g. the most obvious is the mutex_lock() in the beginning of >>>>>>> be_cmd_get_hsw_config(), then we have the call trace here which is: >>>>>>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range() >>>>>>> >>>>>>> Maybe you updated some tool that calls down that path along with the kernel and system >>>>>>> so you started seeing it in Fedora 41? >>>>>> >>>>>> Could be but it's pretty barebones >>>>>> >>>>>>> IMO this has been problematic for a very long time, but obviously it depends on the >>>>>>> chip type. Could you share your benet chip type to confirm the path? >>>>>> >>>>>> I don't know how to find the actual chip information but it's identified as: >>>>>> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10) >>>>>> >>>>> >>>>> Good, that confirms it. The skyhawk chip falls in the "else" of the block in >>>>> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config(). 
>>>>> >>>>>>> For the blamed commit I'd go with: >>>>>>> commit b71724147e73 >>>>>>> Author: Sathya Perla <sathya.perla@broadcom.com> >>>>>>> Date: Wed Jul 27 05:26:18 2016 -0400 >>>>>>> >>>>>>> be2net: replace polling with sleeping in the FW completion path >>>>>>> >>>>>>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock >>>>>>> to a mutex. >>>>>> >>>>>> So, first try will be to try without that patch then, =) >>>>>> >>>>> >>>>> That would be a good try, yes. It is not a straight-forward revert though since a lot >>>>> of changes have happened since that commit. Let me know if you need help with that, >>>>> I can prepare the revert to test. >>>> >>>> Yeah, looked at the size of it and... well... I dunno if i'd have the time =) >>>> >>> >>> Can you try the attached patch? >>> It is on top of net-next (but also applies to Linus' tree): >>> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git >>> >>> It partially reverts the mentioned commit above (only mutex -> spinlock and usleep -> udelay) >>> because the commit does many more things. >>> >>> Also +CC original patch author which I forgot to do. >> >> Thanks, built and installed but it refuses to boot it - will have to >> check during the weekend... >> (boots the latest fedora version even if this one is the selected one >> according to grubby) > > So, saw that 6.13.5 was released so, fetched that, applied the patch > and no more RCU issues in dmesg > > Will check more on the suspected performance bit as well when i get > home later tonight > > I also understand Sathya Perla's motivation in saving power on this > but things around it have been changed > and it no longer works as intended.... > Nice, that's good to hear. Wrt the motivation - sure it's ok, but the code was wrong if they still want to achieve it, they need to work on an alternative solution. We shouldn't keep broken code around. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section! 2025-02-27 14:33 ` Nikolay Aleksandrov @ 2025-02-27 14:36 ` Ian Kumlien 2025-02-27 14:45 ` Nikolay Aleksandrov 0 siblings, 1 reply; 16+ messages in thread From: Ian Kumlien @ 2025-02-27 14:36 UTC (permalink / raw) To: Nikolay Aleksandrov; +Cc: Jakub Kicinski, Linux Kernel Network Developers On Thu, Feb 27, 2025 at 3:33 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: > > On 2/27/25 16:31, Ian Kumlien wrote: > > On Wed, Feb 26, 2025 at 11:28 PM Ian Kumlien <ian.kumlien@gmail.com> wrote: > >> > >> On Wed, Feb 26, 2025 at 2:11 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: > >>> > >>> On 2/26/25 14:26, Ian Kumlien wrote: > >>>> On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: > >>>>> > >>>>> On 2/26/25 13:52, Ian Kumlien wrote: > >>>>>> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov > >>>>>> <razor@blackwall.org> wrote: > >>>>>>> > >>>>>>> On 2/26/25 11:55, Ian Kumlien wrote: > >>>>>>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote: > >>>>>>>>> > >>>>>>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote: > >>>>>>>>>> > >>>>>>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote: > >>>>>>>>>>> Same thing happens in 6.13.4, FYI > >>>>>>>>>> > >>>>>>>>>> Could you do a minor bisection? Does it not happen with 6.11? > >>>>>>>>>> Nothing jumps out at quick look. 
> >>>>>>>>> > >>>>>>>>> I have to admint that i haven't been tracking it too closely until it > >>>>>>>>> turned out to be an issue > >>>>>>>>> (makes network traffic over wireguard, through that node very slow) > >>>>>>>>> > >>>>>>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though > >>>>>>>>> (it's a gw to reach a internal server network in the basement, so not > >>>>>>>>> the best setup for this) > >>>>>>>> > >>>>>>>> Since i'm at work i decided to check if i could find all the boot > >>>>>>>> logs, which is actually done nicely by systemd > >>>>>>>> first known bad: 6.11.7-300.fc41.x86_64 > >>>>>>>> last known ok: 6.11.6-200.fc40.x86_64 > >>>>>>>> > >>>>>>>> Narrows the field for a bisect at least, =) > >>>>>>>> > >>>>>>> > >>>>>>> Saw bridge, took a look. :) > >>>>>>> > >>>>>>> I think there are multiple issues with benet's be_ndo_bridge_getlink() > >>>>>>> because it calls be_cmd_get_hsw_config() which can sleep in multiple > >>>>>>> places, e.g. the most obvious is the mutex_lock() in the beginning of > >>>>>>> be_cmd_get_hsw_config(), then we have the call trace here which is: > >>>>>>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range() > >>>>>>> > >>>>>>> Maybe you updated some tool that calls down that path along with the kernel and system > >>>>>>> so you started seeing it in Fedora 41? > >>>>>> > >>>>>> Could be but it's pretty barebones > >>>>>> > >>>>>>> IMO this has been problematic for a very long time, but obviously it depends on the > >>>>>>> chip type. Could you share your benet chip type to confirm the path? > >>>>>> > >>>>>> I don't know how to find the actual chip information but it's identified as: > >>>>>> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10) > >>>>>> > >>>>> > >>>>> Good, that confirms it. The skyhawk chip falls in the "else" of the block in > >>>>> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config(). 
> >>>>> > >>>>>>> For the blamed commit I'd go with: > >>>>>>> commit b71724147e73 > >>>>>>> Author: Sathya Perla <sathya.perla@broadcom.com> > >>>>>>> Date: Wed Jul 27 05:26:18 2016 -0400 > >>>>>>> > >>>>>>> be2net: replace polling with sleeping in the FW completion path > >>>>>>> > >>>>>>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock > >>>>>>> to a mutex. > >>>>>> > >>>>>> So, first try will be to try without that patch then, =) > >>>>>> > >>>>> > >>>>> That would be a good try, yes. It is not a straight-forward revert though since a lot > >>>>> of changes have happened since that commit. Let me know if you need help with that, > >>>>> I can prepare the revert to test. > >>>> > >>>> Yeah, looked at the size of it and... well... I dunno if i'd have the time =) > >>>> > >>> > >>> Can you try the attached patch? > >>> It is on top of net-next (but also applies to Linus' tree): > >>> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git > >>> > >>> It partially reverts the mentioned commit above (only mutex -> spinlock and usleep -> udelay) > >>> because the commit does many more things. > >>> > >>> Also +CC original patch author which I forgot to do. > >> > >> Thanks, built and installed but it refuses to boot it - will have to > >> check during the weekend... > >> (boots the latest fedora version even if this one is the selected one > >> according to grubby) > > > > So, saw that 6.13.5 was released so, fetched that, applied the patch > > and no more RCU issues in dmesg > > > > Will check more on the suspected performance bit as well when i get > > home later tonight > > > > I also understand Sathya Perla's motivation in saving power on this > > but things around it have been changed > > and it no longer works as intended.... > > > > Nice, that's good to hear. Wrt the motivation - sure it's ok, but the code was wrong > if they still want to achieve it, they need to work on an alternative solution. 
> We shouldn't keep broken code around.

Agreed, but also, was it broken in 4.7? ;)

Anyway, it seems faster from what I can test here, so:
Tested-by: Ian Kumlien <ian.kumlien@gmail.com>

etc etc
* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section! 2025-02-27 14:36 ` Ian Kumlien @ 2025-02-27 14:45 ` Nikolay Aleksandrov 2025-02-27 15:52 ` Ian Kumlien 0 siblings, 1 reply; 16+ messages in thread From: Nikolay Aleksandrov @ 2025-02-27 14:45 UTC (permalink / raw) To: Ian Kumlien; +Cc: Jakub Kicinski, Linux Kernel Network Developers On 2/27/25 16:36, Ian Kumlien wrote: > On Thu, Feb 27, 2025 at 3:33 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: >> >> On 2/27/25 16:31, Ian Kumlien wrote: >>> On Wed, Feb 26, 2025 at 11:28 PM Ian Kumlien <ian.kumlien@gmail.com> wrote: >>>> >>>> On Wed, Feb 26, 2025 at 2:11 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: >>>>> >>>>> On 2/26/25 14:26, Ian Kumlien wrote: >>>>>> On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: >>>>>>> >>>>>>> On 2/26/25 13:52, Ian Kumlien wrote: >>>>>>>> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov >>>>>>>> <razor@blackwall.org> wrote: >>>>>>>>> >>>>>>>>> On 2/26/25 11:55, Ian Kumlien wrote: >>>>>>>>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote: >>>>>>>>>>>> >>>>>>>>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote: >>>>>>>>>>>>> Same thing happens in 6.13.4, FYI >>>>>>>>>>>> >>>>>>>>>>>> Could you do a minor bisection? Does it not happen with 6.11? >>>>>>>>>>>> Nothing jumps out at quick look. 
>>>>>>>>>>> >>>>>>>>>>> I have to admint that i haven't been tracking it too closely until it >>>>>>>>>>> turned out to be an issue >>>>>>>>>>> (makes network traffic over wireguard, through that node very slow) >>>>>>>>>>> >>>>>>>>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though >>>>>>>>>>> (it's a gw to reach a internal server network in the basement, so not >>>>>>>>>>> the best setup for this) >>>>>>>>>> >>>>>>>>>> Since i'm at work i decided to check if i could find all the boot >>>>>>>>>> logs, which is actually done nicely by systemd >>>>>>>>>> first known bad: 6.11.7-300.fc41.x86_64 >>>>>>>>>> last known ok: 6.11.6-200.fc40.x86_64 >>>>>>>>>> >>>>>>>>>> Narrows the field for a bisect at least, =) >>>>>>>>>> >>>>>>>>> >>>>>>>>> Saw bridge, took a look. :) >>>>>>>>> >>>>>>>>> I think there are multiple issues with benet's be_ndo_bridge_getlink() >>>>>>>>> because it calls be_cmd_get_hsw_config() which can sleep in multiple >>>>>>>>> places, e.g. the most obvious is the mutex_lock() in the beginning of >>>>>>>>> be_cmd_get_hsw_config(), then we have the call trace here which is: >>>>>>>>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range() >>>>>>>>> >>>>>>>>> Maybe you updated some tool that calls down that path along with the kernel and system >>>>>>>>> so you started seeing it in Fedora 41? >>>>>>>> >>>>>>>> Could be but it's pretty barebones >>>>>>>> >>>>>>>>> IMO this has been problematic for a very long time, but obviously it depends on the >>>>>>>>> chip type. Could you share your benet chip type to confirm the path? >>>>>>>> >>>>>>>> I don't know how to find the actual chip information but it's identified as: >>>>>>>> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10) >>>>>>>> >>>>>>> >>>>>>> Good, that confirms it. The skyhawk chip falls in the "else" of the block in >>>>>>> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config(). 
>>>>>>> >>>>>>>>> For the blamed commit I'd go with: >>>>>>>>> commit b71724147e73 >>>>>>>>> Author: Sathya Perla <sathya.perla@broadcom.com> >>>>>>>>> Date: Wed Jul 27 05:26:18 2016 -0400 >>>>>>>>> >>>>>>>>> be2net: replace polling with sleeping in the FW completion path >>>>>>>>> >>>>>>>>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock >>>>>>>>> to a mutex. >>>>>>>> >>>>>>>> So, first try will be to try without that patch then, =) >>>>>>>> >>>>>>> >>>>>>> That would be a good try, yes. It is not a straight-forward revert though since a lot >>>>>>> of changes have happened since that commit. Let me know if you need help with that, >>>>>>> I can prepare the revert to test. >>>>>> >>>>>> Yeah, looked at the size of it and... well... I dunno if i'd have the time =) >>>>>> >>>>> >>>>> Can you try the attached patch? >>>>> It is on top of net-next (but also applies to Linus' tree): >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git >>>>> >>>>> It partially reverts the mentioned commit above (only mutex -> spinlock and usleep -> udelay) >>>>> because the commit does many more things. >>>>> >>>>> Also +CC original patch author which I forgot to do. >>>> >>>> Thanks, built and installed but it refuses to boot it - will have to >>>> check during the weekend... >>>> (boots the latest fedora version even if this one is the selected one >>>> according to grubby) >>> >>> So, saw that 6.13.5 was released so, fetched that, applied the patch >>> and no more RCU issues in dmesg >>> >>> Will check more on the suspected performance bit as well when i get >>> home later tonight >>> >>> I also understand Sathya Perla's motivation in saving power on this >>> but things around it have been changed >>> and it no longer works as intended.... >>> >> >> Nice, that's good to hear. Wrt the motivation - sure it's ok, but the code was wrong >> if they still want to achieve it, they need to work on an alternative solution. 
>> We shouldn't keep broken code around. > > Agreed, but also, was it broken in 4.7 ;) > Since 4.9, yes it has. I just checked out v4.9 and it has all these bugs present. If you boot 4.9 and issue PF_BRIDGE RTM_GETLINK you'll hit the same problems. > Anyway, seems faster from what i can test here so > Tested-by: Ian Kumlien <ian.kumlien@gmail.com> > > etc etc Thank you, I'll clean up the patch and submit it. ^ permalink raw reply [flat|nested] 16+ messages in thread
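For readers who want to see what "issue PF_BRIDGE RTM_GETLINK" from the message above means on the wire, here is a minimal sketch in Python. It is not taken from the thread or from the patch. The function names `build_bridge_getlink_dump` and `send_request` are hypothetical, and the numeric constants are copied by hand from the Linux UAPI headers. The sketch assumes a dump request (NLM_F_DUMP) is the variant that reaches the driver's ndo_bridge_getlink callback, which is what the call trace in this thread suggests but which the sketch itself does not prove.

```python
import struct

# Constants copied from <linux/netlink.h>, <linux/rtnetlink.h> and
# <linux/socket.h>; defined locally so the sketch does not depend on
# which names a given Python build exports.
NLM_F_REQUEST = 0x01
NLM_F_DUMP = 0x300      # NLM_F_ROOT | NLM_F_MATCH
RTM_GETLINK = 18
AF_BRIDGE = 7           # PF_BRIDGE has the same value

# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSGHDR = struct.Struct("=IHHII")
# struct ifinfomsg: ifi_family, (pad), ifi_type, ifi_index, ifi_flags, ifi_change
IFINFOMSG = struct.Struct("=BxHiII")


def build_bridge_getlink_dump(seq=1):
    """Return the bytes of a PF_BRIDGE RTM_GETLINK dump request (hypothetical helper)."""
    body = IFINFOMSG.pack(AF_BRIDGE, 0, 0, 0, 0)
    header = NLMSGHDR.pack(NLMSGHDR.size + len(body), RTM_GETLINK,
                           NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return header + body


def send_request(request):
    """Send the request on a NETLINK_ROUTE socket (Linux only, hypothetical helper)."""
    import socket
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                       socket.NETLINK_ROUTE) as sock:
        sock.bind((0, 0))            # (pid, groups): let the kernel assign the pid
        sock.send(request)
        return sock.recv(65536)      # only the first chunk of the reply
```

Calling `send_request(build_bridge_getlink_dump())` on a machine with an affected be2net device and an unpatched kernel would be expected to exercise the same path as in the report; on other machines it simply returns the first part of the link dump.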
* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section! 2025-02-27 14:45 ` Nikolay Aleksandrov @ 2025-02-27 15:52 ` Ian Kumlien 0 siblings, 0 replies; 16+ messages in thread From: Ian Kumlien @ 2025-02-27 15:52 UTC (permalink / raw) To: Nikolay Aleksandrov; +Cc: Jakub Kicinski, Linux Kernel Network Developers On Thu, Feb 27, 2025 at 3:45 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: > > On 2/27/25 16:36, Ian Kumlien wrote: > > On Thu, Feb 27, 2025 at 3:33 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: > >> > >> On 2/27/25 16:31, Ian Kumlien wrote: > >>> On Wed, Feb 26, 2025 at 11:28 PM Ian Kumlien <ian.kumlien@gmail.com> wrote: > >>>> > >>>> On Wed, Feb 26, 2025 at 2:11 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: > >>>>> > >>>>> On 2/26/25 14:26, Ian Kumlien wrote: > >>>>>> On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote: > >>>>>>> > >>>>>>> On 2/26/25 13:52, Ian Kumlien wrote: > >>>>>>>> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov > >>>>>>>> <razor@blackwall.org> wrote: > >>>>>>>>> > >>>>>>>>> On 2/26/25 11:55, Ian Kumlien wrote: > >>>>>>>>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote: > >>>>>>>>>>> > >>>>>>>>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote: > >>>>>>>>>>>>> Same thing happens in 6.13.4, FYI > >>>>>>>>>>>> > >>>>>>>>>>>> Could you do a minor bisection? Does it not happen with 6.11? > >>>>>>>>>>>> Nothing jumps out at quick look. 
> >>>>>>>>>>> > >>>>>>>>>>> I have to admint that i haven't been tracking it too closely until it > >>>>>>>>>>> turned out to be an issue > >>>>>>>>>>> (makes network traffic over wireguard, through that node very slow) > >>>>>>>>>>> > >>>>>>>>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though > >>>>>>>>>>> (it's a gw to reach a internal server network in the basement, so not > >>>>>>>>>>> the best setup for this) > >>>>>>>>>> > >>>>>>>>>> Since i'm at work i decided to check if i could find all the boot > >>>>>>>>>> logs, which is actually done nicely by systemd > >>>>>>>>>> first known bad: 6.11.7-300.fc41.x86_64 > >>>>>>>>>> last known ok: 6.11.6-200.fc40.x86_64 > >>>>>>>>>> > >>>>>>>>>> Narrows the field for a bisect at least, =) > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> Saw bridge, took a look. :) > >>>>>>>>> > >>>>>>>>> I think there are multiple issues with benet's be_ndo_bridge_getlink() > >>>>>>>>> because it calls be_cmd_get_hsw_config() which can sleep in multiple > >>>>>>>>> places, e.g. the most obvious is the mutex_lock() in the beginning of > >>>>>>>>> be_cmd_get_hsw_config(), then we have the call trace here which is: > >>>>>>>>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range() > >>>>>>>>> > >>>>>>>>> Maybe you updated some tool that calls down that path along with the kernel and system > >>>>>>>>> so you started seeing it in Fedora 41? > >>>>>>>> > >>>>>>>> Could be but it's pretty barebones > >>>>>>>> > >>>>>>>>> IMO this has been problematic for a very long time, but obviously it depends on the > >>>>>>>>> chip type. Could you share your benet chip type to confirm the path? > >>>>>>>> > >>>>>>>> I don't know how to find the actual chip information but it's identified as: > >>>>>>>> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10) > >>>>>>>> > >>>>>>> > >>>>>>> Good, that confirms it. 
The skyhawk chip falls in the "else" of the block in > >>>>>>> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config(). > >>>>>>> > >>>>>>>>> For the blamed commit I'd go with: > >>>>>>>>> commit b71724147e73 > >>>>>>>>> Author: Sathya Perla <sathya.perla@broadcom.com> > >>>>>>>>> Date: Wed Jul 27 05:26:18 2016 -0400 > >>>>>>>>> > >>>>>>>>> be2net: replace polling with sleeping in the FW completion path > >>>>>>>>> > >>>>>>>>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock > >>>>>>>>> to a mutex. > >>>>>>>> > >>>>>>>> So, first try will be to try without that patch then, =) > >>>>>>>> > >>>>>>> > >>>>>>> That would be a good try, yes. It is not a straight-forward revert though since a lot > >>>>>>> of changes have happened since that commit. Let me know if you need help with that, > >>>>>>> I can prepare the revert to test. > >>>>>> > >>>>>> Yeah, looked at the size of it and... well... I dunno if i'd have the time =) > >>>>>> > >>>>> > >>>>> Can you try the attached patch? > >>>>> It is on top of net-next (but also applies to Linus' tree): > >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git > >>>>> > >>>>> It partially reverts the mentioned commit above (only mutex -> spinlock and usleep -> udelay) > >>>>> because the commit does many more things. > >>>>> > >>>>> Also +CC original patch author which I forgot to do. > >>>> > >>>> Thanks, built and installed but it refuses to boot it - will have to > >>>> check during the weekend... 
> >>>> (boots the latest fedora version even if this one is the selected one > >>>> according to grubby) > >>> > >>> So, saw that 6.13.5 was released so, fetched that, applied the patch > >>> and no more RCU issues in dmesg > >>> > >>> Will check more on the suspected performance bit as well when i get > >>> home later tonight > >>> > >>> I also understand Sathya Perla's motivation in saving power on this > >>> but things around it have been changed > >>> and it no longer works as intended.... > >>> > >> > >> Nice, that's good to hear. Wrt the motivation - sure it's ok, but the code was wrong > >> if they still want to achieve it, they need to work on an alternative solution. > >> We shouldn't keep broken code around. > > > > Agreed, but also, was it broken in 4.7 ;) > > > > Since 4.9, yes it has. I just checked out v4.9 and it has all these bugs present. > If you boot 4.9 and issue PF_BRIDGE RTM_GETLINK you'll hit the same problems. Ah!, ok! > > Anyway, seems faster from what i can test here so > > Tested-by: Ian Kumlien <ian.kumlien@gmail.com> > > > > etc etc > > Thank you, I'll clean up the patch and submit it. Thank you, =) ^ permalink raw reply [flat|nested] 16+ messages in thread