[6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!

* [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
@ 2025-02-25  8:05 Ian Kumlien
  2025-02-25 10:13 ` Ian Kumlien
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kumlien @ 2025-02-25  8:05 UTC (permalink / raw)
  To: Linux Kernel Network Developers

Just had this happen just before be2net initialization... FYI and all that ;)

[    5.220133] ------------[ cut here ]------------
[    5.220137] Voluntary context switch within RCU read-side critical section!
[    5.220143] WARNING: CPU: 4 PID: 1045 at
kernel/rcu/tree_plugin.h:331 rcu_note_context_switch+0x65a/0x6d0
[    5.220150] Modules linked in: cfg80211 rfkill qrtr nft_masq
nft_nat nft_numgen nft_chain_nat nf_nat nft_ct nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nf_tables sunrpc vfat fat ocrdma ib_uverbs
ib_core xfs intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel snd_hda_codec_realtek
snd_hda_codec_generic iTCO_wdt dell_pc intel_pmc_bxt mei_wdt at24
iTCO_vendor_support snd_hda_codec_hdmi snd_hda_scodec_component kvm
platform_profile mei_hdcp mei_pxp snd_hda_intel snd_intel_dspcfg
dell_wmi dell_smm_hwmon snd_intel_sdw_acpi dell_smbios snd_hda_codec
rapl snd_hda_core intel_wmi_thunderbolt dcdbas intel_cstate
intel_uncore sparse_keymap wmi_bmof dell_wmi_descriptor i2c_i801
snd_hwdep i2c_smbus snd_pcm snd_timer mei_me be2net e1000e mei snd
lpc_ich soundcore sch_fq fuse loop dm_multipath nfnetlink zram
lz4hc_compress lz4_compress i915 crct10dif_pclmul crc32_pclmul
crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel
[    5.220232]  i2c_algo_bit drm_buddy sha512_ssse3 ttm sha256_ssse3
sha1_ssse3 drm_display_helper video cec wmi scsi_dh_rdac scsi_dh_emc
scsi_dh_alua pkcs8_key_parser
[    5.220250] Hardware name: Dell Inc. Precision T1700/04JGCK, BIOS
A28 05/30/2019
[    5.220253] RIP: rcu_note_context_switch+0x65a/0x6d0
[ 5.220256] Code: a8 00 00 00 00 0f 85 64 fd ff ff 49 89 8d a8 00 00
00 e9 58 fd ff ff 48 c7 c7 d0 ab e5 87 c6 05 b6 26 a2 02 01 e8 16 1c
f2 ff <0f> 0b e9 f1 f9 ff ff 49 83 bd a0 00 00 00 00 75 c2 e9 18 fd ff
ff
All code
========
   0: a8 00                test   $0x0,%al
   2: 00 00                add    %al,(%rax)
   4: 00 0f                add    %cl,(%rdi)
   6: 85 64 fd ff          test   %esp,-0x1(%rbp,%rdi,8)
   a: ff 49 89              decl   -0x77(%rcx)
   d: 8d a8 00 00 00 e9    lea    -0x17000000(%rax),%ebp
  13: 58                    pop    %rax
  14: fd                    std
  15: ff                    (bad)
  16: ff 48 c7              decl   -0x39(%rax)
  19: c7                    (bad)
  1a: d0 ab e5 87 c6 05    shrb   $1,0x5c687e5(%rbx)
  20: b6 26                mov    $0x26,%dh
  22: a2 02 01 e8 16 1c f2 movabs %al,0xffff21c16e80102
  29:* ff 0f <-- trapping instruction
  2b: 0b e9                or     %ecx,%ebp
  2d: f1                    int1
  2e: f9                    stc
  2f: ff                    (bad)
  30: ff 49 83              decl   -0x7d(%rcx)
  33: bd a0 00 00 00        mov    $0xa0,%ebp
  38: 00 75 c2              add    %dh,-0x3e(%rbp)
  3b: e9 18 fd ff ff        jmp    0xfffffffffffffd58

Code starting with the faulting instruction
===========================================
   0: 0f 0b                ud2
   2: e9 f1 f9 ff ff        jmp    0xfffffffffffff9f8
   7: 49 83 bd a0 00 00 00 cmpq   $0x0,0xa0(%r13)
   e: 00
   f: 75 c2                jne    0xffffffffffffffd3
  11: e9 18 fd ff ff        jmp    0xfffffffffffffd2e
[    5.220259] RSP: 0018:ffffb28d80ae73c0 EFLAGS: 00010086
[    5.220262] RAX: 0000000000000000 RBX: ffff8a2c1f3ad380 RCX: 0000000000000027
[    5.220264] RDX: ffff8a2f0ea21908 RSI: 0000000000000001 RDI: ffff8a2f0ea21900
[    5.220266] RBP: ffff8a2f0ea38040 R08: 0000000000000000 R09: 0000000000000000
[    5.220268] R10: 6374697773207478 R11: 0000000000000000 R12: 0000000000000000
[    5.220269] R13: ffff8a2c1f3ad380 R14: 0000000000000000 R15: ffff8a2c1a200af0
[    5.220271] FS:  00007ff780c64bc0(0000) GS:ffff8a2f0ea00000(0000)
knlGS:0000000000000000
[    5.220274] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.220276] CR2: 000055d7b4ddb208 CR3: 0000000110256001 CR4: 00000000001726f0
[    5.220278] Call Trace:
[    5.220280]  <TASK>
[    5.220282] ? rcu_note_context_switch+0x65a/0x6d0
[    5.220285] ? __warn.cold+0x93/0xfa
[    5.220288] ? rcu_note_context_switch+0x65a/0x6d0
[    5.220294] ? report_bug+0xff/0x140
[    5.220297] ? handle_bug+0x58/0x90
[    5.220300] ? exc_invalid_op+0x17/0x70
[    5.220303] ? asm_exc_invalid_op+0x1a/0x20
[    5.220308] ? rcu_note_context_switch+0x65a/0x6d0
[    5.220312] __schedule+0xcc/0x14b0
[    5.220316] ? get_nohz_timer_target+0x2d/0x180
[    5.220322] ? timerqueue_add+0x71/0xc0
[    5.220326] ? enqueue_hrtimer+0x42/0xa0
[    5.220331] schedule+0x27/0xf0
[    5.220334] schedule_hrtimeout_range_clock+0x100/0x1b0
[    5.220338] ? __pfx_hrtimer_wakeup+0x10/0x10
[    5.220342] usleep_range_state+0x65/0x90
WARNING! Cannot find .ko for module be2net, please pass a valid module path
[    5.220347] ? be_mcc_notify_wait+0x6c/0x150 be2net
WARNING! Cannot find .ko for module be2net, please pass a valid module path
[    5.220360] be_mcc_notify_wait+0xbe/0x150 be2net
WARNING! Cannot find .ko for module be2net, please pass a valid module path
[    5.220371] be_cmd_get_hsw_config+0x16c/0x190 be2net
WARNING! Cannot find .ko for module be2net, please pass a valid module path
[    5.220382] be_ndo_bridge_getlink+0xe0/0x100 be2net
[    5.220393] rtnl_bridge_getlink+0x12b/0x1b0
[    5.220398] ? __pfx_rtnl_bridge_getlink+0x10/0x10
[    5.220401] rtnl_dumpit+0x80/0xa0
[    5.220404] netlink_dump+0x13b/0x360
[    5.220409] __netlink_dump_start+0x1eb/0x310
[    5.220412] ? __pfx_rtnl_bridge_getlink+0x10/0x10
[    5.220415] rtnetlink_rcv_msg+0x2da/0x460
[    5.220418] ? __pfx_rtnl_dumpit+0x10/0x10
[    5.220421] ? __pfx_rtnl_bridge_getlink+0x10/0x10
[    5.220424] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[    5.220427] netlink_rcv_skb+0x53/0x100
[    5.220432] netlink_unicast+0x245/0x390
[    5.220435] netlink_sendmsg+0x21b/0x470
[    5.220438] __sys_sendto+0x1df/0x1f0
[    5.220444] __x64_sys_sendto+0x24/0x30
[    5.220446] do_syscall_64+0x82/0x160
[    5.220449] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[    5.220452] ? netlink_rcv_skb+0x82/0x100
[    5.220455] ? netlink_unicast+0x24d/0x390
[    5.220457] ? kmem_cache_free+0x3ee/0x440
[    5.220461] ? skb_release_data+0x193/0x200
[    5.220465] ? netlink_unicast+0x24d/0x390
[    5.220468] ? netlink_sendmsg+0x228/0x470
[    5.220471] ? __sys_sendto+0x1df/0x1f0
[    5.220475] ? syscall_exit_to_user_mode+0x10/0x210
[    5.220478] ? do_syscall_64+0x8e/0x160
[    5.220480] ? iterate_dir+0x182/0x200
[    5.220483] ? __x64_sys_getdents64+0xfa/0x130
[    5.220486] ? __pfx_filldir64+0x10/0x10
[    5.220489] ? syscall_exit_to_user_mode+0x10/0x210
[    5.220491] ? do_syscall_64+0x8e/0x160
[    5.220493] ? syscall_exit_to_user_mode+0x10/0x210
[    5.220496] ? do_syscall_64+0x8e/0x160
[    5.220498] ? exc_page_fault+0x7e/0x180
[    5.220500] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    5.220504] RIP: 0033:0x7ff7807045b7
[ 5.220516] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00
00 90 f3 0f 1e fa 80 3d 15 9b 0f 00 00 41 89 ca 74 10 b8 2c 00 00 00
0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d
d0
All code
========
   0: c7 c0 ff ff ff ff    mov    $0xffffffff,%eax
   6: eb be                jmp    0xffffffffffffffc6
   8: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
   f: 00 00 00
  12: 90                    nop
  13: f3 0f 1e fa          endbr64
  17: 80 3d 15 9b 0f 00 00 cmpb   $0x0,0xf9b15(%rip)        # 0xf9b33
  1e: 41 89 ca              mov    %ecx,%r10d
  21: 74 10                je     0x33
  23: b8 2c 00 00 00        mov    $0x2c,%eax
  28: 0f 05                syscall
  2a:* 48 3d 00 f0 ff ff    cmp    $0xfffffffffffff000,%rax <--
trapping instruction
  30: 77 69                ja     0x9b
  32: c3                    ret
  33: 55                    push   %rbp
  34: 48 89 e5              mov    %rsp,%rbp
  37: 53                    push   %rbx
  38: 48 83 ec 38          sub    $0x38,%rsp
  3c: 44 89 4d d0          mov    %r9d,-0x30(%rbp)

Code starting with the faulting instruction
===========================================
   0: 48 3d 00 f0 ff ff    cmp    $0xfffffffffffff000,%rax
   6: 77 69                ja     0x71
   8: c3                    ret
   9: 55                    push   %rbp
   a: 48 89 e5              mov    %rsp,%rbp
   d: 53                    push   %rbx
   e: 48 83 ec 38          sub    $0x38,%rsp
  12: 44 89 4d d0          mov    %r9d,-0x30(%rbp)
[    5.220518] RSP: 002b:00007ffc921b4ff8 EFLAGS: 00000202 ORIG_RAX:
000000000000002c
[    5.220522] RAX: ffffffffffffffda RBX: 000055d7b4dacc80 RCX: 00007ff7807045b7
[    5.220524] RDX: 0000000000000020 RSI: 000055d7b4db7ff0 RDI: 0000000000000003
[    5.220525] RBP: 00007ffc921b5090 R08: 00007ffc921b5000 R09: 0000000000000080
[    5.220527] R10: 0000000000000000 R11: 0000000000000202 R12: 000055d7b4ddb350
[    5.220529] R13: 00007ffc921b50d4 R14: 000055d7b4ddb350 R15: 000055d77d5f8a90
[    5.220532]  </TASK>
[    5.220533] ---[ end trace 0000000000000000 ]---


* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
  2025-02-25  8:05 [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section! Ian Kumlien
@ 2025-02-25 10:13 ` Ian Kumlien
  2025-02-26  1:05   ` Jakub Kicinski
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kumlien @ 2025-02-25 10:13 UTC (permalink / raw)
  To: Linux Kernel Network Developers

Same thing happens in 6.13.4, FYI

[    5.253286] ------------[ cut here ]------------
[    5.253291] Voluntary context switch within RCU read-side critical section!
[    5.253296] WARNING: CPU: 7 PID: 1052 at
kernel/rcu/tree_plugin.h:331 rcu_note_context_switch+0x66f/0x6d0
[    5.253304] Modules linked in: cfg80211 rfkill qrtr nft_masq
nft_nat sunrpc nft_numgen nft_chain_nat nf_nat nft_ct nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nf_tables vfat fat ocrdma ib_uverbs ib_core
xfs snd_hda_codec_realtek snd_hda_codec_generic intel_rapl_msr
snd_hda_scodec_component snd_hda_codec_hdmi intel_rapl_common
x86_pkg_temp_thermal snd_hda_intel intel_powerclamp coretemp
snd_intel_dspcfg mei_pxp snd_intel_sdw_acpi dell_pc iTCO_wdt
platform_profile snd_hda_codec mei_wdt at24 kvm_intel mei_hdcp
intel_pmc_bxt iTCO_vendor_support dell_smm_hwmon snd_hda_core dell_wmi
kvm snd_hwdep dell_smbios snd_pcm rapl dcdbas sparse_keymap
intel_cstate dell_wmi_descriptor intel_uncore intel_wmi_thunderbolt
wmi_bmof i2c_i801 i2c_smbus snd_timer mei_me snd e1000e lpc_ich mei
be2net soundcore sch_fq fuse loop dm_multipath nfnetlink zram
lz4hc_compress lz4_compress i915 crct10dif_pclmul i2c_algo_bit
crc32_pclmul drm_buddy crc32c_intel polyval_clmulni ttm
polyval_generic
[    5.253388]  ghash_clmulni_intel drm_display_helper sha512_ssse3
sha256_ssse3 sha1_ssse3 cec video wmi scsi_dh_rdac scsi_dh_emc
scsi_dh_alua pkcs8_key_parser
[    5.253405] Hardware name: Dell Inc. Precision T1700/04JGCK, BIOS
A28 05/30/2019
[    5.253407] RIP: rcu_note_context_switch+0x66f/0x6d0
[ 5.253411] Code: a8 00 00 00 00 0f 85 3c fd ff ff 49 89 8d a8 00 00
00 e9 30 fd ff ff 48 c7 c7 30 6f de b7 c6 05 7b 51 96 02 01 e8 61 0e
f2 ff <0f> 0b e9 dc f9 ff ff c6 45 11 00 48 8b 75 20 ba 01 00 00 00 48
8b
All code
========
   0: a8 00                test   $0x0,%al
   2: 00 00                add    %al,(%rax)
   4: 00 0f                add    %cl,(%rdi)
   6: 85 3c fd ff ff 49 89 test   %edi,-0x76b60001(,%rdi,8)
   d: 8d a8 00 00 00 e9    lea    -0x17000000(%rax),%ebp
  13: 30 fd                xor    %bh,%ch
  15: ff                    (bad)
  16: ff 48 c7              decl   -0x39(%rax)
  19: c7                    (bad)
  1a: 30 6f de              xor    %ch,-0x22(%rdi)
  1d: b7 c6                mov    $0xc6,%bh
  1f: 05 7b 51 96 02        add    $0x296517b,%eax
  24: 01 e8                add    %ebp,%eax
  26: 61                    (bad)
  27: 0e                    (bad)
  28:* f2 ff 0f              repnz decl (%rdi) <-- trapping instruction
  2b: 0b e9                or     %ecx,%ebp
  2d: dc f9                fdivr  %st,%st(1)
  2f: ff                    (bad)
  30: ff c6                inc    %esi
  32: 45 11 00              adc    %r8d,(%r8)
  35: 48 8b 75 20          mov    0x20(%rbp),%rsi
  39: ba 01 00 00 00        mov    $0x1,%edx
  3e: 48                    rex.W
  3f: 8b                    .byte 0x8b

Code starting with the faulting instruction
===========================================
   0: 0f 0b                ud2
   2: e9 dc f9 ff ff        jmp    0xfffffffffffff9e3
   7: c6 45 11 00          movb   $0x0,0x11(%rbp)
   b: 48 8b 75 20          mov    0x20(%rbp),%rsi
   f: ba 01 00 00 00        mov    $0x1,%edx
  14: 48                    rex.W
  15: 8b                    .byte 0x8b
[    5.253413] RSP: 0018:ffffadb040f4b688 EFLAGS: 00010082
[    5.253416] RAX: 0000000000000000 RBX: ffff957a4d705380 RCX: 0000000000000027
[    5.253418] RDX: ffff957d4eba1908 RSI: 0000000000000001 RDI: ffff957d4eba1900
[    5.253420] RBP: ffff957d4ebb7d40 R08: 0000000000000000 R09: 0000000000000000
[    5.253422] R10: 206c616369746972 R11: 0000000000000000 R12: 0000000000000000
[    5.253423] R13: ffff957a4d705380 R14: 000000000007a100 R15: ffff957a47400b30
[    5.253425] FS:  00007f6cc2c0dbc0(0000) GS:ffff957d4eb80000(0000)
knlGS:0000000000000000
[    5.253428] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.253430] CR2: 0000556e7a98b188 CR3: 00000001210ce006 CR4: 00000000001726f0
[    5.253432] Call Trace:
[    5.253434]  <TASK>
[    5.253435] ? rcu_note_context_switch+0x66f/0x6d0
[    5.253439] ? __warn.cold+0x93/0xfa
[    5.253443] ? rcu_note_context_switch+0x66f/0x6d0
[    5.253447] ? report_bug+0xff/0x140
[    5.253451] ? console_unlock+0x9d/0x140
[    5.253455] ? handle_bug+0x58/0x90
[    5.253458] ? exc_invalid_op+0x17/0x70
[    5.253461] ? asm_exc_invalid_op+0x1a/0x20
[    5.253466] ? rcu_note_context_switch+0x66f/0x6d0
[    5.253469] ? rcu_note_context_switch+0x66f/0x6d0
[    5.253472] ? valid_bridge_getlink_req.constprop.0+0xac/0x1c0
[    5.253478] __schedule+0xcc/0x14b0
[    5.253482] ? get_nohz_timer_target+0x2d/0x180
[    5.253486] ? timerqueue_add+0x71/0xc0
[    5.253489] ? enqueue_hrtimer+0x42/0xa0
[    5.253492] schedule+0x27/0xf0
[    5.253495] usleep_range_state+0xea/0x120
[    5.253499] ? __pfx_hrtimer_wakeup+0x10/0x10
WARNING! Cannot find .ko for module be2net, please pass a valid module path
[    5.253503] ? be_mcc_notify_wait+0x6c/0x150 be2net
WARNING! Cannot find .ko for module be2net, please pass a valid module path
[    5.253516] be_mcc_notify_wait+0xbe/0x150 be2net
WARNING! Cannot find .ko for module be2net, please pass a valid module path
[    5.253526] be_cmd_get_hsw_config+0x16c/0x190 be2net
WARNING! Cannot find .ko for module be2net, please pass a valid module path
[    5.253537] be_ndo_bridge_getlink+0xe0/0x100 be2net
[    5.253547] rtnl_bridge_getlink+0x12b/0x1b0
[    5.253551] ? __pfx_rtnl_bridge_getlink+0x10/0x10
[    5.253555] rtnl_dumpit+0x80/0xa0
[    5.253558] netlink_dump+0x19c/0x410
[    5.253561] ? skb_release_data+0x193/0x200
[    5.253566] __netlink_dump_start+0x1eb/0x310
[    5.253569] ? __pfx_rtnl_bridge_getlink+0x10/0x10
[    5.253573] rtnetlink_rcv_msg+0x2da/0x460
[    5.253576] ? __pfx_rtnl_dumpit+0x10/0x10
[    5.253579] ? __pfx_rtnl_bridge_getlink+0x10/0x10
[    5.253582] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[    5.253586] netlink_rcv_skb+0x53/0x100
[    5.253590] netlink_unicast+0x245/0x390
[    5.253593] netlink_sendmsg+0x21b/0x470
[    5.253597] __sys_sendto+0x1ef/0x200
[    5.253602] __x64_sys_sendto+0x24/0x30
[    5.253605] do_syscall_64+0x82/0x160
[    5.253609] ? syscall_exit_to_user_mode+0x10/0x210
[    5.253613] ? do_syscall_64+0x8e/0x160
[    5.253616] ? atime_needs_update+0xa0/0x120
[    5.253621] ? touch_atime+0x1e/0x120
[    5.253624] ? iterate_dir+0x182/0x200
[    5.253627] ? __x64_sys_getdents64+0xa7/0x120
[    5.253629] ? __pfx_filldir64+0x10/0x10
[    5.253632] ? syscall_exit_to_user_mode+0x10/0x210
[    5.253635] ? do_syscall_64+0x8e/0x160
[    5.253638] ? do_syscall_64+0x8e/0x160
[    5.253642] ? do_syscall_64+0x8e/0x160
[    5.253645] ? do_syscall_64+0x8e/0x160
[    5.253648] ? do_syscall_64+0x8e/0x160
[    5.253651] ? exc_page_fault+0x7e/0x180
[    5.253654] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    5.253658] RIP: 0033:0x7f6cc34d55b7
[ 5.253669] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00
00 90 f3 0f 1e fa 80 3d 15 9b 0f 00 00 41 89 ca 74 10 b8 2c 00 00 00
0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d
d0
All code
========
   0: c7 c0 ff ff ff ff    mov    $0xffffffff,%eax
   6: eb be                jmp    0xffffffffffffffc6
   8: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
   f: 00 00 00
  12: 90                    nop
  13: f3 0f 1e fa          endbr64
  17: 80 3d 15 9b 0f 00 00 cmpb   $0x0,0xf9b15(%rip)        # 0xf9b33
  1e: 41 89 ca              mov    %ecx,%r10d
  21: 74 10                je     0x33
  23: b8 2c 00 00 00        mov    $0x2c,%eax
  28: 0f 05                syscall
  2a:* 48 3d 00 f0 ff ff    cmp    $0xfffffffffffff000,%rax <--
trapping instruction
  30: 77 69                ja     0x9b
  32: c3                    ret
  33: 55                    push   %rbp
  34: 48 89 e5              mov    %rsp,%rbp
  37: 53                    push   %rbx
  38: 48 83 ec 38          sub    $0x38,%rsp
  3c: 44 89 4d d0          mov    %r9d,-0x30(%rbp)

Code starting with the faulting instruction
===========================================
   0: 48 3d 00 f0 ff ff    cmp    $0xfffffffffffff000,%rax
   6: 77 69                ja     0x71
   8: c3                    ret
   9: 55                    push   %rbp
   a: 48 89 e5              mov    %rsp,%rbp
   d: 53                    push   %rbx
   e: 48 83 ec 38          sub    $0x38,%rsp
  12: 44 89 4d d0          mov    %r9d,-0x30(%rbp)
[    5.253671] RSP: 002b:00007ffc5839a338 EFLAGS: 00000202 ORIG_RAX:
000000000000002c
[    5.253674] RAX: ffffffffffffffda RBX: 0000556e7a95cc80 RCX: 00007f6cc34d55b7
[    5.253676] RDX: 0000000000000020 RSI: 0000556e7a9752d0 RDI: 0000000000000003
[    5.253677] RBP: 00007ffc5839a3d0 R08: 00007ffc5839a340 R09: 0000000000000080
[    5.253679] R10: 0000000000000000 R11: 0000000000000202 R12: 0000556e7a98b2c0
[    5.253681] R13: 00007ffc5839a414 R14: 0000556e7a98b2c0 R15: 0000556e448c7a90
[    5.253684]  </TASK>
[    5.253685] ---[ end trace 0000000000000000 ]---

On Tue, Feb 25, 2025 at 9:05 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> Just had this happen just before be2net initialization... FYI and all that ;)
>

[--8<--]


* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
  2025-02-25 10:13 ` Ian Kumlien
@ 2025-02-26  1:05   ` Jakub Kicinski
  2025-02-26  9:24     ` Ian Kumlien
  0 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2025-02-26  1:05 UTC (permalink / raw)
  To: Ian Kumlien; +Cc: Linux Kernel Network Developers

On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
> Same thing happens in 6.13.4, FYI

Could you do a minor bisection? Does it not happen with 6.11?
Nothing jumps out at quick look.


* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
  2025-02-26  1:05   ` Jakub Kicinski
@ 2025-02-26  9:24     ` Ian Kumlien
  2025-02-26  9:55       ` Ian Kumlien
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kumlien @ 2025-02-26  9:24 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Linux Kernel Network Developers

On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
> > Same thing happens in 6.13.4, FYI
>
> Could you do a minor bisection? Does it not happen with 6.11?
> Nothing jumps out at quick look.

I have to admit that i haven't been tracking it too closely until it
turned out to be an issue
(makes network traffic over wireguard, through that node very slow)

But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
(it's a gw to reach an internal server network in the basement, so not
the best setup for this)


* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
  2025-02-26  9:24     ` Ian Kumlien
@ 2025-02-26  9:55       ` Ian Kumlien
  2025-02-26 10:33         ` Nikolay Aleksandrov
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kumlien @ 2025-02-26  9:55 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Linux Kernel Network Developers

On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
> > > Same thing happens in 6.13.4, FYI
> >
> > Could you do a minor bisection? Does it not happen with 6.11?
> > Nothing jumps out at quick look.
>
> I have to admit that i haven't been tracking it too closely until it
> turned out to be an issue
> (makes network traffic over wireguard, through that node very slow)
>
> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
> (it's a gw to reach an internal server network in the basement, so not
> the best setup for this)

Since i'm at work i decided to check if i could find all the boot
logs, which is actually done nicely by systemd
first known bad: 6.11.7-300.fc41.x86_64
last known ok: 6.11.6-200.fc40.x86_64

Narrows the field for a bisect at least, =)


* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
  2025-02-26  9:55       ` Ian Kumlien
@ 2025-02-26 10:33         ` Nikolay Aleksandrov
  2025-02-26 11:52           ` Ian Kumlien
  0 siblings, 1 reply; 16+ messages in thread
From: Nikolay Aleksandrov @ 2025-02-26 10:33 UTC (permalink / raw)
  To: Ian Kumlien, Jakub Kicinski; +Cc: Linux Kernel Network Developers

On 2/26/25 11:55, Ian Kumlien wrote:
> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>>
>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>
>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
>>>> Same thing happens in 6.13.4, FYI
>>>
>>> Could you do a minor bisection? Does it not happen with 6.11?
>>> Nothing jumps out at quick look.
>>
>> I have to admit that i haven't been tracking it too closely until it
>> turned out to be an issue
>> (makes network traffic over wireguard, through that node very slow)
>>
>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
>> (it's a gw to reach an internal server network in the basement, so not
>> the best setup for this)
> 
> Since i'm at work i decided to check if i could find all the boot
> logs, which is actually done nicely by systemd
> first known bad: 6.11.7-300.fc41.x86_64
> last known ok: 6.11.6-200.fc40.x86_64
> 
> Narrows the field for a bisect at least, =)
> 

Saw bridge, took a look. :)

I think there are multiple issues with benet's be_ndo_bridge_getlink()
because it calls be_cmd_get_hsw_config() which can sleep in multiple
places, e.g. the most obvious is the mutex_lock() in the beginning of
be_cmd_get_hsw_config(), then we have the call trace here which is:
be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()
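
To make the rule behind the warning concrete, here is a minimal user-space sketch in C. It is only a model: every model_* name is hypothetical and none of this is kernel or be2net code. It assumes, as the warning text indicates, that the driver callback runs inside an RCU read-side critical section, and it counts each attempt to sleep while that section is open.

```c
/* User-space model only: none of the model_* names exist in the kernel. */
#include <assert.h>
#include <stdio.h>

static int model_rcu_nesting;	/* depth of open read-side critical sections */
static int model_violations;	/* sleeps attempted while a section was open */

static void model_rcu_read_lock(void)
{
	model_rcu_nesting++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_nesting > 0);
	model_rcu_nesting--;
}

/* Stand-in for a sleeping primitive such as usleep_range() or mutex_lock(). */
static void model_sleep(const char *who)
{
	if (model_rcu_nesting > 0) {
		model_violations++;
		fprintf(stderr, "%s: sleep inside read-side critical section\n",
			who);
	}
}

/* Stand-in for the driver callback: waits for firmware by sleeping. */
static int model_driver_getlink(void)
{
	model_sleep("model_driver_getlink");
	return 0;
}

/* Stand-in for the dump path: invokes the callback under the read lock. */
static int model_dump(int (*getlink)(void))
{
	int ret;

	model_rcu_read_lock();
	ret = getlink();
	model_rcu_read_unlock();
	return ret;
}
```

Calling model_dump(model_driver_getlink) records one violation, while calling model_sleep() with no section open records none. In the model, as in the report, the sleep itself is harmless; the context it happens in is the problem.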

Maybe you updated some tool that calls down that path along with the kernel and system
so you started seeing it in Fedora 41?

IMO this has been problematic for a very long time, but obviously it depends on the
chip type. Could you share your benet chip type to confirm the path?

For the blamed commit I'd go with:
 commit b71724147e73
 Author: Sathya Perla <sathya.perla@broadcom.com>
 Date:   Wed Jul 27 05:26:18 2016 -0400

     be2net: replace polling with sleeping in the FW completion path

This one changed the udelay() (which is safe) to usleep_range() and the spinlock
to a mutex.
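
The difference between the two delay styles can be sketched in user space as well. This is an illustration under assumptions, not the driver's completion path: model_busy_delay() spins on the clock the way udelay() busy-waits, model_sleeping_delay() gives up the CPU the way usleep_range() does, and model_wait_compl() is a hypothetical polling loop that accepts either one.

```c
/* User-space model only: all model_* names are hypothetical. */
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

typedef void (*model_delay_fn)(long usec);

static int model_sleep_calls;	/* how often the caller gave up the CPU */

/* udelay()-like: burns CPU until the time has passed, never sleeps. */
static void model_busy_delay(long usec)
{
	struct timespec start, now;

	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) * 1000000L +
		 (now.tv_nsec - start.tv_nsec) / 1000L < usec);
}

/* usleep_range()-like: asks the scheduler to run something else meanwhile. */
static void model_sleeping_delay(long usec)
{
	struct timespec ts = {
		.tv_sec = usec / 1000000L,
		.tv_nsec = (usec % 1000000L) * 1000L,
	};

	model_sleep_calls++;
	nanosleep(&ts, NULL);
}

/*
 * Poll for a completion that shows up after polls_until_done polls.
 * Returns 0 on completion, -1 if max_polls is exhausted first.
 */
static int model_wait_compl(int polls_until_done, int max_polls,
			    model_delay_fn delay)
{
	for (int i = 0; i < max_polls; i++) {
		if (i >= polls_until_done)
			return 0;
		delay(100);
	}
	return -1;
}
```

Both variants reach the same result; only the sleeping one hands the CPU back between polls, and that hand-off is what an atomic or RCU read-side context cannot tolerate.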

Cheers,
 Nik


* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
  2025-02-26 10:33         ` Nikolay Aleksandrov
@ 2025-02-26 11:52           ` Ian Kumlien
  2025-02-26 12:00             ` Nikolay Aleksandrov
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kumlien @ 2025-02-26 11:52 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: Jakub Kicinski, Linux Kernel Network Developers

On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov
<razor@blackwall.org> wrote:
>
> On 2/26/25 11:55, Ian Kumlien wrote:
> > On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >>
> >> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >>>
> >>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
> >>>> Same thing happens in 6.13.4, FYI
> >>>
> >>> Could you do a minor bisection? Does it not happen with 6.11?
> >>> Nothing jumps out at quick look.
> >>
> >> I have to admit that i haven't been tracking it too closely until it
> >> turned out to be an issue
> >> (makes network traffic over wireguard, through that node very slow)
> >>
> >> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
> >> (it's a gw to reach an internal server network in the basement, so not
> >> the best setup for this)
> >
> > Since i'm at work i decided to check if i could find all the boot
> > logs, which is actually done nicely by systemd
> > first known bad: 6.11.7-300.fc41.x86_64
> > last known ok: 6.11.6-200.fc40.x86_64
> >
> > Narrows the field for a bisect at least, =)
> >
>
> Saw bridge, took a look. :)
>
> I think there are multiple issues with benet's be_ndo_bridge_getlink()
> because it calls be_cmd_get_hsw_config() which can sleep in multiple
> places, e.g. the most obvious is the mutex_lock() in the beginning of
> be_cmd_get_hsw_config(), then we have the call trace here which is:
> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()
>
> Maybe you updated some tool that calls down that path along with the kernel and system
> so you started seeing it in Fedora 41?

Could be but it's pretty barebones

> IMO this has been problematic for a very long time, but obviously it depends on the
> chip type. Could you share your benet chip type to confirm the path?

I don't know how to find the actual chip information but it's identified as:
Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)

> For the blamed commit I'd go with:
>  commit b71724147e73
>  Author: Sathya Perla <sathya.perla@broadcom.com>
>  Date:   Wed Jul 27 05:26:18 2016 -0400
>
>      be2net: replace polling with sleeping in the FW completion path
>
> This one changed the udelay() (which is safe) to usleep_range() and the spinlock
> to a mutex.

So, first try will be to try without that patch then, =)

> Cheers,
>  Nik
>


* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
  2025-02-26 11:52           ` Ian Kumlien
@ 2025-02-26 12:00             ` Nikolay Aleksandrov
  2025-02-26 12:26               ` Ian Kumlien
  0 siblings, 1 reply; 16+ messages in thread
From: Nikolay Aleksandrov @ 2025-02-26 12:00 UTC (permalink / raw)
  To: Ian Kumlien; +Cc: Jakub Kicinski, Linux Kernel Network Developers

On 2/26/25 13:52, Ian Kumlien wrote:
> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov
> <razor@blackwall.org> wrote:
>>
>> On 2/26/25 11:55, Ian Kumlien wrote:
>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>>>>
>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>>
>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
>>>>>> Same thing happens in 6.13.4, FYI
>>>>>
>>>>> Could you do a minor bisection? Does it not happen with 6.11?
>>>>> Nothing jumps out at quick look.
>>>>
>>>> I have to admit that i haven't been tracking it too closely until it
>>>> turned out to be an issue
>>>> (makes network traffic over wireguard, through that node very slow)
>>>>
>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
>>>> (it's a gw to reach an internal server network in the basement, so not
>>>> the best setup for this)
>>>
>>> Since i'm at work i decided to check if i could find all the boot
>>> logs, which is actually done nicely by systemd
>>> first known bad: 6.11.7-300.fc41.x86_64
>>> last known ok: 6.11.6-200.fc40.x86_64
>>>
>>> Narrows the field for a bisect at least, =)
>>>
>>
>> Saw bridge, took a look. :)
>>
>> I think there are multiple issues with benet's be_ndo_bridge_getlink()
>> because it calls be_cmd_get_hsw_config() which can sleep in multiple
>> places, e.g. the most obvious is the mutex_lock() in the beginning of
>> be_cmd_get_hsw_config(), then we have the call trace here which is:
>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()
>>
>> Maybe you updated some tool that calls down that path along with the kernel and system
>> so you started seeing it in Fedora 41?
> 
> Could be but it's pretty barebones
> 
>> IMO this has been problematic for a very long time, but obviously it depends on the
>> chip type. Could you share your benet chip type to confirm the path?
> 
> I don't know how to find the actual chip information but it's identified as:
> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
> 

Good, that confirms it. The skyhawk chip falls in the "else" of the block in
be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config().

>> For the blamed commit I'd go with:
>>  commit b71724147e73
>>  Author: Sathya Perla <sathya.perla@broadcom.com>
>>  Date:   Wed Jul 27 05:26:18 2016 -0400
>>
>>      be2net: replace polling with sleeping in the FW completion path
>>
>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock
>> to a mutex.
> 
> So, first try will be to try without that patch then, =)
> 

That would be a good try, yes. It is not a straightforward revert though since a lot
of changes have happened since that commit. Let me know if you need help with that,
I can prepare the revert to test.

>> Cheers,
>>  Nik
>>



* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
  2025-02-26 12:00             ` Nikolay Aleksandrov
@ 2025-02-26 12:26               ` Ian Kumlien
  2025-02-26 13:11                 ` Nikolay Aleksandrov
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kumlien @ 2025-02-26 12:26 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: Jakub Kicinski, Linux Kernel Network Developers

On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>
> On 2/26/25 13:52, Ian Kumlien wrote:
> > On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov
> > <razor@blackwall.org> wrote:
> >>
> >> On 2/26/25 11:55, Ian Kumlien wrote:
> >>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >>>>
> >>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >>>>>
> >>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
> >>>>>> Same thing happens in 6.13.4, FYI
> >>>>>
> >>>>> Could you do a minor bisection? Does it not happen with 6.11?
> >>>>> Nothing jumps out at quick look.
> >>>>
> >>>> I have to admit that i haven't been tracking it too closely until it
> >>>> turned out to be an issue
> >>>> (makes network traffic over wireguard, through that node very slow)
> >>>>
> >>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
> >>>> (it's a gw to reach an internal server network in the basement, so not
> >>>> the best setup for this)
> >>>
> >>> Since i'm at work i decided to check if i could find all the boot
> >>> logs, which is actually done nicely by systemd
> >>> first known bad: 6.11.7-300.fc41.x86_64
> >>> last known ok: 6.11.6-200.fc40.x86_64
> >>>
> >>> Narrows the field for a bisect at least, =)
> >>>
> >>
> >> Saw bridge, took a look. :)
> >>
> >> I think there are multiple issues with benet's be_ndo_bridge_getlink()
> >> because it calls be_cmd_get_hsw_config() which can sleep in multiple
> >> places, e.g. the most obvious is the mutex_lock() in the beginning of
> >> be_cmd_get_hsw_config(), then we have the call trace here which is:
> >> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()
> >>
> >> Maybe you updated some tool that calls down that path along with the kernel and system
> >> so you started seeing it in Fedora 41?
> >
> > Could be but it's pretty barebones
> >
> >> IMO this has been problematic for a very long time, but obviously it depends on the
> >> chip type. Could you share your benet chip type to confirm the path?
> >
> > I don't know how to find the actual chip information but it's identified as:
> > Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
> >
>
> Good, that confirms it. The skyhawk chip falls in the "else" of the block in
> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config().
>
> >> For the blamed commit I'd go with:
> >>  commit b71724147e73
> >>  Author: Sathya Perla <sathya.perla@broadcom.com>
> >>  Date:   Wed Jul 27 05:26:18 2016 -0400
> >>
> >>      be2net: replace polling with sleeping in the FW completion path
> >>
> >> This one changed the udelay() (which is safe) to usleep_range() and the spinlock
> >> to a mutex.
> >
> > So, first try will be to try without that patch then, =)
> >
>
> That would be a good try, yes. It is not a straightforward revert though since a lot
> of changes have happened since that commit. Let me know if you need help with that,
> I can prepare the revert to test.

Yeah, looked at the size of it and... well... I dunno if i'd have the time =)

> >> Cheers,
> >>  Nik
> >>
>


* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
  2025-02-26 12:26               ` Ian Kumlien
@ 2025-02-26 13:11                 ` Nikolay Aleksandrov
  2025-02-26 22:28                   ` Ian Kumlien
  0 siblings, 1 reply; 16+ messages in thread
From: Nikolay Aleksandrov @ 2025-02-26 13:11 UTC (permalink / raw)
  To: Ian Kumlien; +Cc: Jakub Kicinski, Linux Kernel Network Developers, Sathya Perla

[-- Attachment #1: Type: text/plain, Size: 3551 bytes --]

On 2/26/25 14:26, Ian Kumlien wrote:
> On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>>
>> On 2/26/25 13:52, Ian Kumlien wrote:
>>> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov
>>> <razor@blackwall.org> wrote:
>>>>
>>>> On 2/26/25 11:55, Ian Kumlien wrote:
>>>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>>>>>>
>>>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>>>>
>>>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
>>>>>>>> Same thing happens in 6.13.4, FYI
>>>>>>>
>>>>>>> Could you do a minor bisection? Does it not happen with 6.11?
>>>>>>> Nothing jumps out at quick look.
>>>>>>
>>>>>> I have to admit that i haven't been tracking it too closely until it
>>>>>> turned out to be an issue
>>>>>> (makes network traffic over wireguard, through that node very slow)
>>>>>>
>>>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
>>>>>> (it's a gw to reach an internal server network in the basement, so not
>>>>>> the best setup for this)
>>>>>
>>>>> Since i'm at work i decided to check if i could find all the boot
>>>>> logs, which is actually done nicely by systemd
>>>>> first known bad: 6.11.7-300.fc41.x86_64
>>>>> last known ok: 6.11.6-200.fc40.x86_64
>>>>>
>>>>> Narrows the field for a bisect at least, =)
>>>>>
>>>>
>>>> Saw bridge, took a look. :)
>>>>
>>>> I think there are multiple issues with benet's be_ndo_bridge_getlink()
>>>> because it calls be_cmd_get_hsw_config() which can sleep in multiple
>>>> places, e.g. the most obvious is the mutex_lock() in the beginning of
>>>> be_cmd_get_hsw_config(), then we have the call trace here which is:
>>>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()
>>>>
>>>> Maybe you updated some tool that calls down that path along with the kernel and system
>>>> so you started seeing it in Fedora 41?
>>>
>>> Could be but it's pretty barebones
>>>
>>>> IMO this has been problematic for a very long time, but obviously it depends on the
>>>> chip type. Could you share your benet chip type to confirm the path?
>>>
>>> I don't know how to find the actual chip information but it's identified as:
>>> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
>>>
>>
>> Good, that confirms it. The skyhawk chip falls in the "else" of the block in
>> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config().
>>
>>>> For the blamed commit I'd go with:
>>>>  commit b71724147e73
>>>>  Author: Sathya Perla <sathya.perla@broadcom.com>
>>>>  Date:   Wed Jul 27 05:26:18 2016 -0400
>>>>
>>>>      be2net: replace polling with sleeping in the FW completion path
>>>>
>>>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock
>>>> to a mutex.
>>>
>>> So, first try will be to try without that patch then, =)
>>>
>>
>> That would be a good try, yes. It is not a straightforward revert though since a lot
>> of changes have happened since that commit. Let me know if you need help with that,
>> I can prepare the revert to test.
> 
> Yeah, looked at the size of it and... well... I dunno if i'd have the time =)
> 

Can you try the attached patch?
It is on top of net-next (but also applies to Linus' tree):
 git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git

It is a partial revert of the commit mentioned above: it only restores the
mutex -> spinlock and usleep_range() -> udelay() parts, because that commit also
does many other things.
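
As a quick sanity check on the timeout change in the patch, here is a small
userspace sketch of the arithmetic (the function names are made up for
illustration and are not part of the driver). It assumes the worst case is simply
the iteration count times the longest per-iteration delay, and ignores the time
spent processing completions inside the loop.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case polling time in microseconds: iterations * longest delay. */
uint64_t worst_case_wait_us(uint64_t iterations, uint64_t max_delay_us)
{
	/* Guard against overflow in the multiplication below. */
	assert(max_delay_us == 0 || iterations <= UINT64_MAX / max_delay_us);
	return iterations * max_delay_us;
}

/* Before the patch: 12000 iterations of usleep_range(500, 1000). */
uint64_t wait_before_patch_us(void)
{
	return worst_case_wait_us(12000, 1000);
}

/* After the patch: 120000 iterations of udelay(100). */
uint64_t wait_after_patch_us(void)
{
	return worst_case_wait_us(120000, 100);
}
```

Both come out to 12,000,000 us, so the "12s timeout" comment still holds once the
loop count goes from 12000 to 120000.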

Also +CC original patch author which I forgot to do.

Thanks,
 Nik


[-- Attachment #2: 0001-benet-fix.patch --]
[-- Type: text/x-patch, Size: 26486 bytes --]

From 03517db970bea41e625c84fcff9263bae8ab679b Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <razor@blackwall.org>
Date: Wed, 26 Feb 2025 15:05:48 +0200
Subject: [PATCH] benet fix

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
---
 drivers/net/ethernet/emulex/benet/be.h      |   2 +-
 drivers/net/ethernet/emulex/benet/be_cmds.c | 197 ++++++++++----------
 drivers/net/ethernet/emulex/benet/be_main.c |   2 +-
 3 files changed, 100 insertions(+), 101 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index e48b861e4ce1..270ff9aab335 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -562,7 +562,7 @@ struct be_adapter {
 	struct be_dma_mem mbox_mem_alloced;
 
 	struct be_mcc_obj mcc_obj;
-	struct mutex mcc_lock;	/* For serializing mcc cmds to BE card */
+	spinlock_t mcc_lock;	/* For serializing mcc cmds to BE card */
 	spinlock_t mcc_cq_lock;
 
 	u16 cfg_num_rx_irqs;		/* configured via set-channels */
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index 61adcebeef01..845320334f1d 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -575,7 +575,7 @@ int be_process_mcc(struct be_adapter *adapter)
 /* Wait till no more pending mcc requests are present */
 static int be_mcc_wait_compl(struct be_adapter *adapter)
 {
-#define mcc_timeout		12000 /* 12s timeout */
+#define mcc_timeout		120000 /* 12s timeout */
 	int i, status = 0;
 	struct be_mcc_obj *mcc_obj = &adapter->mcc_obj;
 
@@ -589,7 +589,7 @@ static int be_mcc_wait_compl(struct be_adapter *adapter)
 
 		if (atomic_read(&mcc_obj->q.used) == 0)
 			break;
-		usleep_range(500, 1000);
+		udelay(100);
 	}
 	if (i == mcc_timeout) {
 		dev_err(&adapter->pdev->dev, "FW not responding\n");
@@ -866,7 +866,7 @@ static bool use_mcc(struct be_adapter *adapter)
 static int be_cmd_lock(struct be_adapter *adapter)
 {
 	if (use_mcc(adapter)) {
-		mutex_lock(&adapter->mcc_lock);
+		spin_lock_bh(&adapter->mcc_lock);
 		return 0;
 	} else {
 		return mutex_lock_interruptible(&adapter->mbox_lock);
@@ -877,7 +877,7 @@ static int be_cmd_lock(struct be_adapter *adapter)
 static void be_cmd_unlock(struct be_adapter *adapter)
 {
 	if (use_mcc(adapter))
-		return mutex_unlock(&adapter->mcc_lock);
+		return spin_unlock_bh(&adapter->mcc_lock);
 	else
 		return mutex_unlock(&adapter->mbox_lock);
 }
@@ -1047,7 +1047,7 @@ int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 *mac_addr,
 	struct be_cmd_req_mac_query *req;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -1076,7 +1076,7 @@ int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 *mac_addr,
 	}
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -1088,7 +1088,7 @@ int be_cmd_pmac_add(struct be_adapter *adapter, const u8 *mac_addr,
 	struct be_cmd_req_pmac_add *req;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -1113,7 +1113,7 @@ int be_cmd_pmac_add(struct be_adapter *adapter, const u8 *mac_addr,
 	}
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 
 	if (base_status(status) == MCC_STATUS_UNAUTHORIZED_REQUEST)
 		status = -EPERM;
@@ -1131,7 +1131,7 @@ int be_cmd_pmac_del(struct be_adapter *adapter, u32 if_id, int pmac_id, u32 dom)
 	if (pmac_id == -1)
 		return 0;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -1151,7 +1151,7 @@ int be_cmd_pmac_del(struct be_adapter *adapter, u32 if_id, int pmac_id, u32 dom)
 	status = be_mcc_notify_wait(adapter);
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -1414,7 +1414,7 @@ int be_cmd_rxq_create(struct be_adapter *adapter,
 	struct be_dma_mem *q_mem = &rxq->dma_mem;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -1444,7 +1444,7 @@ int be_cmd_rxq_create(struct be_adapter *adapter,
 	}
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -1508,7 +1508,7 @@ int be_cmd_rxq_destroy(struct be_adapter *adapter, struct be_queue_info *q)
 	struct be_cmd_req_q_destroy *req;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -1525,7 +1525,7 @@ int be_cmd_rxq_destroy(struct be_adapter *adapter, struct be_queue_info *q)
 	q->created = false;
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -1593,7 +1593,7 @@ int be_cmd_get_stats(struct be_adapter *adapter, struct be_dma_mem *nonemb_cmd)
 	struct be_cmd_req_hdr *hdr;
 	int status = 0;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -1621,7 +1621,7 @@ int be_cmd_get_stats(struct be_adapter *adapter, struct be_dma_mem *nonemb_cmd)
 	adapter->stats_cmd_sent = true;
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -1637,7 +1637,7 @@ int lancer_cmd_get_pport_stats(struct be_adapter *adapter,
 			    CMD_SUBSYSTEM_ETH))
 		return -EPERM;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -1660,7 +1660,7 @@ int lancer_cmd_get_pport_stats(struct be_adapter *adapter,
 	adapter->stats_cmd_sent = true;
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -1697,7 +1697,7 @@ int be_cmd_link_status_query(struct be_adapter *adapter, u16 *link_speed,
 	struct be_cmd_req_link_status *req;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	if (link_status)
 		*link_status = LINK_DOWN;
@@ -1736,7 +1736,7 @@ int be_cmd_link_status_query(struct be_adapter *adapter, u16 *link_speed,
 	}
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -1747,7 +1747,7 @@ int be_cmd_get_die_temperature(struct be_adapter *adapter)
 	struct be_cmd_req_get_cntl_addnl_attribs *req;
 	int status = 0;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -1762,7 +1762,7 @@ int be_cmd_get_die_temperature(struct be_adapter *adapter)
 
 	status = be_mcc_notify(adapter);
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -1811,7 +1811,7 @@ int be_cmd_get_fat_dump(struct be_adapter *adapter, u32 buf_len, void *buf)
 	if (!get_fat_cmd.va)
 		return -ENOMEM;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	while (total_size) {
 		buf_size = min(total_size, (u32)60 * 1024);
@@ -1851,7 +1851,7 @@ int be_cmd_get_fat_dump(struct be_adapter *adapter, u32 buf_len, void *buf)
 err:
 	dma_free_coherent(&adapter->pdev->dev, get_fat_cmd.size,
 			  get_fat_cmd.va, get_fat_cmd.dma);
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -1862,7 +1862,7 @@ int be_cmd_get_fw_ver(struct be_adapter *adapter)
 	struct be_cmd_req_get_fw_version *req;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -1885,7 +1885,7 @@ int be_cmd_get_fw_ver(struct be_adapter *adapter)
 			sizeof(adapter->fw_on_flash));
 	}
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -1899,7 +1899,7 @@ static int __be_cmd_modify_eqd(struct be_adapter *adapter,
 	struct be_cmd_req_modify_eq_delay *req;
 	int status = 0, i;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -1922,7 +1922,7 @@ static int __be_cmd_modify_eqd(struct be_adapter *adapter,
 
 	status = be_mcc_notify(adapter);
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -1949,7 +1949,7 @@ int be_cmd_vlan_config(struct be_adapter *adapter, u32 if_id, u16 *vtag_array,
 	struct be_cmd_req_vlan_config *req;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -1971,7 +1971,7 @@ int be_cmd_vlan_config(struct be_adapter *adapter, u32 if_id, u16 *vtag_array,
 
 	status = be_mcc_notify_wait(adapter);
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -1982,7 +1982,7 @@ static int __be_cmd_rx_filter(struct be_adapter *adapter, u32 flags, u32 value)
 	struct be_cmd_req_rx_filter *req = mem->va;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -2015,7 +2015,7 @@ static int __be_cmd_rx_filter(struct be_adapter *adapter, u32 flags, u32 value)
 
 	status = be_mcc_notify_wait(adapter);
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -2046,7 +2046,7 @@ int be_cmd_set_flow_control(struct be_adapter *adapter, u32 tx_fc, u32 rx_fc)
 			    CMD_SUBSYSTEM_COMMON))
 		return -EPERM;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -2066,7 +2066,7 @@ int be_cmd_set_flow_control(struct be_adapter *adapter, u32 tx_fc, u32 rx_fc)
 	status = be_mcc_notify_wait(adapter);
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 
 	if (base_status(status) == MCC_STATUS_FEATURE_NOT_SUPPORTED)
 		return  -EOPNOTSUPP;
@@ -2085,7 +2085,7 @@ int be_cmd_get_flow_control(struct be_adapter *adapter, u32 *tx_fc, u32 *rx_fc)
 			    CMD_SUBSYSTEM_COMMON))
 		return -EPERM;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -2108,7 +2108,7 @@ int be_cmd_get_flow_control(struct be_adapter *adapter, u32 *tx_fc, u32 *rx_fc)
 	}
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -2189,7 +2189,7 @@ int be_cmd_rss_config(struct be_adapter *adapter, u8 *rsstable,
 	if (!(be_if_cap_flags(adapter) & BE_IF_FLAGS_RSS))
 		return 0;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -2214,7 +2214,7 @@ int be_cmd_rss_config(struct be_adapter *adapter, u8 *rsstable,
 
 	status = be_mcc_notify_wait(adapter);
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -2226,7 +2226,7 @@ int be_cmd_set_beacon_state(struct be_adapter *adapter, u8 port_num,
 	struct be_cmd_req_enable_disable_beacon *req;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -2247,7 +2247,7 @@ int be_cmd_set_beacon_state(struct be_adapter *adapter, u8 port_num,
 	status = be_mcc_notify_wait(adapter);
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -2258,7 +2258,7 @@ int be_cmd_get_beacon_state(struct be_adapter *adapter, u8 port_num, u32 *state)
 	struct be_cmd_req_get_beacon_state *req;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -2282,7 +2282,7 @@ int be_cmd_get_beacon_state(struct be_adapter *adapter, u8 port_num, u32 *state)
 	}
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -2306,7 +2306,7 @@ int be_cmd_read_port_transceiver_data(struct be_adapter *adapter,
 		return -ENOMEM;
 	}
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -2328,7 +2328,7 @@ int be_cmd_read_port_transceiver_data(struct be_adapter *adapter,
 		memcpy(data, resp->page_data + off, len);
 	}
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	dma_free_coherent(&adapter->pdev->dev, cmd.size, cmd.va, cmd.dma);
 	return status;
 }
@@ -2345,7 +2345,7 @@ static int lancer_cmd_write_object(struct be_adapter *adapter,
 	void *ctxt = NULL;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 	adapter->flash_status = 0;
 
 	wrb = wrb_from_mccq(adapter);
@@ -2387,7 +2387,7 @@ static int lancer_cmd_write_object(struct be_adapter *adapter,
 	if (status)
 		goto err_unlock;
 
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 
 	if (!wait_for_completion_timeout(&adapter->et_cmd_compl,
 					 msecs_to_jiffies(60000)))
@@ -2406,7 +2406,7 @@ static int lancer_cmd_write_object(struct be_adapter *adapter,
 	return status;
 
 err_unlock:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -2460,7 +2460,7 @@ static int lancer_cmd_delete_object(struct be_adapter *adapter,
 	struct be_mcc_wrb *wrb;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -2478,7 +2478,7 @@ static int lancer_cmd_delete_object(struct be_adapter *adapter,
 
 	status = be_mcc_notify_wait(adapter);
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -2491,7 +2491,7 @@ int lancer_cmd_read_object(struct be_adapter *adapter, struct be_dma_mem *cmd,
 	struct lancer_cmd_resp_read_object *resp;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -2525,7 +2525,7 @@ int lancer_cmd_read_object(struct be_adapter *adapter, struct be_dma_mem *cmd,
 	}
 
 err_unlock:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -2537,7 +2537,7 @@ static int be_cmd_write_flashrom(struct be_adapter *adapter,
 	struct be_cmd_write_flashrom *req;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 	adapter->flash_status = 0;
 
 	wrb = wrb_from_mccq(adapter);
@@ -2562,7 +2562,7 @@ static int be_cmd_write_flashrom(struct be_adapter *adapter,
 	if (status)
 		goto err_unlock;
 
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 
 	if (!wait_for_completion_timeout(&adapter->et_cmd_compl,
 					 msecs_to_jiffies(40000)))
@@ -2573,7 +2573,7 @@ static int be_cmd_write_flashrom(struct be_adapter *adapter,
 	return status;
 
 err_unlock:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -2584,7 +2584,7 @@ static int be_cmd_get_flash_crc(struct be_adapter *adapter, u8 *flashed_crc,
 	struct be_mcc_wrb *wrb;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -2611,7 +2611,7 @@ static int be_cmd_get_flash_crc(struct be_adapter *adapter, u8 *flashed_crc,
 		memcpy(flashed_crc, req->crc, 4);
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -3217,7 +3217,7 @@ int be_cmd_enable_magic_wol(struct be_adapter *adapter, u8 *mac,
 	struct be_cmd_req_acpi_wol_magic_config *req;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -3234,7 +3234,7 @@ int be_cmd_enable_magic_wol(struct be_adapter *adapter, u8 *mac,
 	status = be_mcc_notify_wait(adapter);
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -3249,7 +3249,7 @@ int be_cmd_set_loopback(struct be_adapter *adapter, u8 port_num,
 			    CMD_SUBSYSTEM_LOWLEVEL))
 		return -EPERM;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -3272,7 +3272,7 @@ int be_cmd_set_loopback(struct be_adapter *adapter, u8 port_num,
 	if (status)
 		goto err_unlock;
 
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 
 	if (!wait_for_completion_timeout(&adapter->et_cmd_compl,
 					 msecs_to_jiffies(SET_LB_MODE_TIMEOUT)))
@@ -3281,7 +3281,7 @@ int be_cmd_set_loopback(struct be_adapter *adapter, u8 port_num,
 	return status;
 
 err_unlock:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -3298,7 +3298,7 @@ int be_cmd_loopback_test(struct be_adapter *adapter, u32 port_num,
 			    CMD_SUBSYSTEM_LOWLEVEL))
 		return -EPERM;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -3324,7 +3324,7 @@ int be_cmd_loopback_test(struct be_adapter *adapter, u32 port_num,
 	if (status)
 		goto err;
 
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 
 	wait_for_completion(&adapter->et_cmd_compl);
 	resp = embedded_payload(wrb);
@@ -3332,7 +3332,7 @@ int be_cmd_loopback_test(struct be_adapter *adapter, u32 port_num,
 
 	return status;
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -3348,7 +3348,7 @@ int be_cmd_ddr_dma_test(struct be_adapter *adapter, u64 pattern,
 			    CMD_SUBSYSTEM_LOWLEVEL))
 		return -EPERM;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -3382,7 +3382,7 @@ int be_cmd_ddr_dma_test(struct be_adapter *adapter, u64 pattern,
 	}
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -3393,7 +3393,7 @@ int be_cmd_get_seeprom_data(struct be_adapter *adapter,
 	struct be_cmd_req_seeprom_read *req;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -3409,7 +3409,7 @@ int be_cmd_get_seeprom_data(struct be_adapter *adapter,
 	status = be_mcc_notify_wait(adapter);
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -3424,7 +3424,7 @@ int be_cmd_get_phy_info(struct be_adapter *adapter)
 			    CMD_SUBSYSTEM_COMMON))
 		return -EPERM;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -3469,7 +3469,7 @@ int be_cmd_get_phy_info(struct be_adapter *adapter)
 	}
 	dma_free_coherent(&adapter->pdev->dev, cmd.size, cmd.va, cmd.dma);
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -3479,7 +3479,7 @@ static int be_cmd_set_qos(struct be_adapter *adapter, u32 bps, u32 domain)
 	struct be_cmd_req_set_qos *req;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -3499,7 +3499,7 @@ static int be_cmd_set_qos(struct be_adapter *adapter, u32 bps, u32 domain)
 	status = be_mcc_notify_wait(adapter);
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -3611,7 +3611,7 @@ int be_cmd_get_fn_privileges(struct be_adapter *adapter, u32 *privilege,
 	struct be_cmd_req_get_fn_privileges *req;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -3643,7 +3643,7 @@ int be_cmd_get_fn_privileges(struct be_adapter *adapter, u32 *privilege,
 	}
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -3655,7 +3655,7 @@ int be_cmd_set_fn_privileges(struct be_adapter *adapter, u32 privileges,
 	struct be_cmd_req_set_fn_privileges *req;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -3675,7 +3675,7 @@ int be_cmd_set_fn_privileges(struct be_adapter *adapter, u32 privileges,
 
 	status = be_mcc_notify_wait(adapter);
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -3707,7 +3707,7 @@ int be_cmd_get_mac_from_list(struct be_adapter *adapter, u8 *mac,
 		return -ENOMEM;
 	}
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -3771,7 +3771,7 @@ int be_cmd_get_mac_from_list(struct be_adapter *adapter, u8 *mac,
 	}
 
 out:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	dma_free_coherent(&adapter->pdev->dev, get_mac_list_cmd.size,
 			  get_mac_list_cmd.va, get_mac_list_cmd.dma);
 	return status;
@@ -3831,7 +3831,7 @@ int be_cmd_set_mac_list(struct be_adapter *adapter, u8 *mac_array,
 	if (!cmd.va)
 		return -ENOMEM;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -3853,7 +3853,7 @@ int be_cmd_set_mac_list(struct be_adapter *adapter, u8 *mac_array,
 
 err:
 	dma_free_coherent(&adapter->pdev->dev, cmd.size, cmd.va, cmd.dma);
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -3889,7 +3889,7 @@ int be_cmd_set_hsw_config(struct be_adapter *adapter, u16 pvid,
 			    CMD_SUBSYSTEM_COMMON))
 		return -EPERM;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -3930,7 +3930,7 @@ int be_cmd_set_hsw_config(struct be_adapter *adapter, u16 pvid,
 	status = be_mcc_notify_wait(adapter);
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -3944,7 +3944,7 @@ int be_cmd_get_hsw_config(struct be_adapter *adapter, u16 *pvid,
 	int status;
 	u16 vid;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -3991,7 +3991,7 @@ int be_cmd_get_hsw_config(struct be_adapter *adapter, u16 *pvid,
 	}
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -4190,7 +4190,7 @@ int be_cmd_set_ext_fat_capabilites(struct be_adapter *adapter,
 	struct be_cmd_req_set_ext_fat_caps *req;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -4206,7 +4206,7 @@ int be_cmd_set_ext_fat_capabilites(struct be_adapter *adapter,
 
 	status = be_mcc_notify_wait(adapter);
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -4684,7 +4684,7 @@ int be_cmd_manage_iface(struct be_adapter *adapter, u32 iface, u8 op)
 	if (iface == 0xFFFFFFFF)
 		return -1;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -4701,7 +4701,7 @@ int be_cmd_manage_iface(struct be_adapter *adapter, u32 iface, u8 op)
 
 	status = be_mcc_notify_wait(adapter);
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -4735,7 +4735,7 @@ int be_cmd_get_if_id(struct be_adapter *adapter, struct be_vf_cfg *vf_cfg,
 	struct be_cmd_resp_get_iface_list *resp;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -4756,7 +4756,7 @@ int be_cmd_get_if_id(struct be_adapter *adapter, struct be_vf_cfg *vf_cfg,
 	}
 
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -4850,7 +4850,7 @@ int be_cmd_enable_vf(struct be_adapter *adapter, u8 domain)
 	if (BEx_chip(adapter))
 		return 0;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -4868,7 +4868,7 @@ int be_cmd_enable_vf(struct be_adapter *adapter, u8 domain)
 	req->enable = 1;
 	status = be_mcc_notify_wait(adapter);
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -4941,7 +4941,7 @@ __be_cmd_set_logical_link_config(struct be_adapter *adapter,
 	u32 link_config = 0;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -4969,7 +4969,7 @@ __be_cmd_set_logical_link_config(struct be_adapter *adapter,
 
 	status = be_mcc_notify_wait(adapter);
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -5000,8 +5000,7 @@ int be_cmd_set_features(struct be_adapter *adapter)
 	struct be_mcc_wrb *wrb;
 	int status;
 
-	if (mutex_lock_interruptible(&adapter->mcc_lock))
-		return -1;
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -5039,7 +5038,7 @@ int be_cmd_set_features(struct be_adapter *adapter)
 		dev_info(&adapter->pdev->dev,
 			 "Adapter does not support HW error recovery\n");
 
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 
@@ -5053,7 +5052,7 @@ int be_roce_mcc_cmd(void *netdev_handle, void *wrb_payload,
 	struct be_cmd_resp_hdr *resp;
 	int status;
 
-	mutex_lock(&adapter->mcc_lock);
+	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
 	if (!wrb) {
@@ -5076,7 +5075,7 @@ int be_roce_mcc_cmd(void *netdev_handle, void *wrb_payload,
 	memcpy(wrb_payload, resp, sizeof(*resp) + resp->response_length);
 	be_dws_le_to_cpu(wrb_payload, sizeof(*resp) + resp->response_length);
 err:
-	mutex_unlock(&adapter->mcc_lock);
+	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
 EXPORT_SYMBOL(be_roce_mcc_cmd);
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 875fe379eea2..3d2e21592119 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -5667,8 +5667,8 @@ static int be_drv_init(struct be_adapter *adapter)
 	}
 
 	mutex_init(&adapter->mbox_lock);
-	mutex_init(&adapter->mcc_lock);
 	mutex_init(&adapter->rx_filter_lock);
+	spin_lock_init(&adapter->mcc_lock);
 	spin_lock_init(&adapter->mcc_cq_lock);
 	init_completion(&adapter->et_cmd_compl);
 
-- 
2.48.1



* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
  2025-02-26 13:11                 ` Nikolay Aleksandrov
@ 2025-02-26 22:28                   ` Ian Kumlien
  2025-02-27 14:31                     ` Ian Kumlien
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kumlien @ 2025-02-26 22:28 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Jakub Kicinski, Linux Kernel Network Developers, Sathya Perla

On Wed, Feb 26, 2025 at 2:11 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>
> On 2/26/25 14:26, Ian Kumlien wrote:
> > On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
> >>
> >> On 2/26/25 13:52, Ian Kumlien wrote:
> >>> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov
> >>> <razor@blackwall.org> wrote:
> >>>>
> >>>> On 2/26/25 11:55, Ian Kumlien wrote:
> >>>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >>>>>>
> >>>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >>>>>>>
> >>>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
> >>>>>>>> Same thing happens in 6.13.4, FYI
> >>>>>>>
> >>>>>>> Could you do a minor bisection? Does it not happen with 6.11?
> >>>>>>> Nothing jumps out at quick look.
> >>>>>>
> >>>>>> I have to admit that i haven't been tracking it too closely until it
> >>>>>> turned out to be an issue
> >>>>>> (makes network traffic over wireguard, through that node very slow)
> >>>>>>
> >>>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
> >>>>>> (it's a gw to reach an internal server network in the basement, so not
> >>>>>> the best setup for this)
> >>>>>
> >>>>> Since i'm at work i decided to check if i could find all the boot
> >>>>> logs, which is actually done nicely by systemd
> >>>>> first known bad: 6.11.7-300.fc41.x86_64
> >>>>> last known ok: 6.11.6-200.fc40.x86_64
> >>>>>
> >>>>> Narrows the field for a bisect at least, =)
> >>>>>
> >>>>
> >>>> Saw bridge, took a look. :)
> >>>>
> >>>> I think there are multiple issues with benet's be_ndo_bridge_getlink()
> >>>> because it calls be_cmd_get_hsw_config() which can sleep in multiple
> >>>> places, e.g. the most obvious is the mutex_lock() in the beginning of
> >>>> be_cmd_get_hsw_config(), then we have the call trace here which is:
> >>>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()
> >>>>
> >>>> Maybe you updated some tool that calls down that path along with the kernel and system
> >>>> so you started seeing it in Fedora 41?
> >>>
> >>> Could be but it's pretty barebones
> >>>
> >>>> IMO this has been problematic for a very long time, but obviously it depends on the
> >>>> chip type. Could you share your benet chip type to confirm the path?
> >>>
> >>> I don't know how to find the actual chip information but it's identified as:
> >>> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
> >>>
> >>
> >> Good, that confirms it. The skyhawk chip falls in the "else" of the block in
> >> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config().
> >>
> >>>> For the blamed commit I'd go with:
> >>>>  commit b71724147e73
> >>>>  Author: Sathya Perla <sathya.perla@broadcom.com>
> >>>>  Date:   Wed Jul 27 05:26:18 2016 -0400
> >>>>
> >>>>      be2net: replace polling with sleeping in the FW completion path
> >>>>
> >>>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock
> >>>> to a mutex.
> >>>
> >>> So, first try will be to try without that patch then, =)
> >>>
> >>
> >> That would be a good try, yes. It is not a straight-forward revert though since a lot
> >> of changes have happened since that commit. Let me know if you need help with that,
> >> I can prepare the revert to test.
> >
> > Yeah, looked at the size of it and... well... I dunno if i'd have the time =)
> >
>
> Can you try the attached patch?
> It is on top of net-next (but also applies to Linus' tree):
>  git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
>
> It partially reverts the mentioned commit above (only mutex -> spinlock and usleep -> udelay)
> because the commit does many more things.
>
> Also +CC original patch author which I forgot to do.

Thanks, built and installed but it refuses to boot it - will have to
check during the weekend...
(boots the latest fedora version even if this one is the selected one
according to grubby)

> Thanks,
>  Nik
>


* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
  2025-02-26 22:28                   ` Ian Kumlien
@ 2025-02-27 14:31                     ` Ian Kumlien
  2025-02-27 14:33                       ` Nikolay Aleksandrov
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kumlien @ 2025-02-27 14:31 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: Jakub Kicinski, Linux Kernel Network Developers

On Wed, Feb 26, 2025 at 11:28 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Wed, Feb 26, 2025 at 2:11 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
> >
> > On 2/26/25 14:26, Ian Kumlien wrote:
> > > On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
> > >>
> > >> On 2/26/25 13:52, Ian Kumlien wrote:
> > >>> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov
> > >>> <razor@blackwall.org> wrote:
> > >>>>
> > >>>> On 2/26/25 11:55, Ian Kumlien wrote:
> > >>>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > >>>>>>
> > >>>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > >>>>>>>
> > >>>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
> > >>>>>>>> Same thing happens in 6.13.4, FYI
> > >>>>>>>
> > >>>>>>> Could you do a minor bisection? Does it not happen with 6.11?
> > >>>>>>> Nothing jumps out at quick look.
> > >>>>>>
> > >>>>>> I have to admit that i haven't been tracking it too closely until it
> > >>>>>> turned out to be an issue
> > >>>>>> (makes network traffic over wireguard, through that node very slow)
> > >>>>>>
> > >>>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
> > >>>>>> (it's a gw to reach an internal server network in the basement, so not
> > >>>>>> the best setup for this)
> > >>>>>
> > >>>>> Since i'm at work i decided to check if i could find all the boot
> > >>>>> logs, which is actually done nicely by systemd
> > >>>>> first known bad: 6.11.7-300.fc41.x86_64
> > >>>>> last known ok: 6.11.6-200.fc40.x86_64
> > >>>>>
> > >>>>> Narrows the field for a bisect at least, =)
> > >>>>>
> > >>>>
> > >>>> Saw bridge, took a look. :)
> > >>>>
> > >>>> I think there are multiple issues with benet's be_ndo_bridge_getlink()
> > >>>> because it calls be_cmd_get_hsw_config() which can sleep in multiple
> > >>>> places, e.g. the most obvious is the mutex_lock() in the beginning of
> > >>>> be_cmd_get_hsw_config(), then we have the call trace here which is:
> > >>>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()
> > >>>>
> > >>>> Maybe you updated some tool that calls down that path along with the kernel and system
> > >>>> so you started seeing it in Fedora 41?
> > >>>
> > >>> Could be but it's pretty barebones
> > >>>
> > >>>> IMO this has been problematic for a very long time, but obviously it depends on the
> > >>>> chip type. Could you share your benet chip type to confirm the path?
> > >>>
> > >>> I don't know how to find the actual chip information but it's identified as:
> > >>> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
> > >>>
> > >>
> > >> Good, that confirms it. The skyhawk chip falls in the "else" of the block in
> > >> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config().
> > >>
> > >>>> For the blamed commit I'd go with:
> > >>>>  commit b71724147e73
> > >>>>  Author: Sathya Perla <sathya.perla@broadcom.com>
> > >>>>  Date:   Wed Jul 27 05:26:18 2016 -0400
> > >>>>
> > >>>>      be2net: replace polling with sleeping in the FW completion path
> > >>>>
> > >>>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock
> > >>>> to a mutex.
> > >>>
> > >>> So, first try will be to try without that patch then, =)
> > >>>
> > >>
> > >> That would be a good try, yes. It is not a straight-forward revert though since a lot
> > >> of changes have happened since that commit. Let me know if you need help with that,
> > >> I can prepare the revert to test.
> > >
> > > Yeah, looked at the size of it and... well... I dunno if i'd have the time =)
> > >
> >
> > Can you try the attached patch?
> > It is on top of net-next (but also applies to Linus' tree):
> >  git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
> >
> > It partially reverts the mentioned commit above (only mutex -> spinlock and usleep -> udelay)
> > because the commit does many more things.
> >
> > Also +CC original patch author which I forgot to do.
>
> Thanks, built and installed but it refuses to boot it - will have to
> check during the weekend...
> (boots the latest fedora version even if this one is the selected one
> according to grubby)

So, saw that 6.13.5 was released so, fetched that, applied the patch
and no more RCU issues in dmesg

Will check more on the suspected performance bit as well when i get
home later tonight

I also understand Sathya Perla's motivation in saving power on this
but things around it have been changed
and it no longer works as intended....

> > Thanks,
> >  Nik
> >


* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
  2025-02-27 14:31                     ` Ian Kumlien
@ 2025-02-27 14:33                       ` Nikolay Aleksandrov
  2025-02-27 14:36                         ` Ian Kumlien
  0 siblings, 1 reply; 16+ messages in thread
From: Nikolay Aleksandrov @ 2025-02-27 14:33 UTC (permalink / raw)
  To: Ian Kumlien; +Cc: Jakub Kicinski, Linux Kernel Network Developers

On 2/27/25 16:31, Ian Kumlien wrote:
> On Wed, Feb 26, 2025 at 11:28 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>>
>> On Wed, Feb 26, 2025 at 2:11 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>>>
>>> On 2/26/25 14:26, Ian Kumlien wrote:
>>>> On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>>>>>
>>>>> On 2/26/25 13:52, Ian Kumlien wrote:
>>>>>> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov
>>>>>> <razor@blackwall.org> wrote:
>>>>>>>
>>>>>>> On 2/26/25 11:55, Ian Kumlien wrote:
>>>>>>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
>>>>>>>>>>> Same thing happens in 6.13.4, FYI
>>>>>>>>>>
>>>>>>>>>> Could you do a minor bisection? Does it not happen with 6.11?
>>>>>>>>>> Nothing jumps out at quick look.
>>>>>>>>>
>>>>>>>>> I have to admit that i haven't been tracking it too closely until it
>>>>>>>>> turned out to be an issue
>>>>>>>>> (makes network traffic over wireguard, through that node very slow)
>>>>>>>>>
>>>>>>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
>>>>>>>>> (it's a gw to reach an internal server network in the basement, so not
>>>>>>>>> the best setup for this)
>>>>>>>>
>>>>>>>> Since i'm at work i decided to check if i could find all the boot
>>>>>>>> logs, which is actually done nicely by systemd
>>>>>>>> first known bad: 6.11.7-300.fc41.x86_64
>>>>>>>> last known ok: 6.11.6-200.fc40.x86_64
>>>>>>>>
>>>>>>>> Narrows the field for a bisect at least, =)
>>>>>>>>
>>>>>>>
>>>>>>> Saw bridge, took a look. :)
>>>>>>>
>>>>>>> I think there are multiple issues with benet's be_ndo_bridge_getlink()
>>>>>>> because it calls be_cmd_get_hsw_config() which can sleep in multiple
>>>>>>> places, e.g. the most obvious is the mutex_lock() in the beginning of
>>>>>>> be_cmd_get_hsw_config(), then we have the call trace here which is:
>>>>>>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()
>>>>>>>
>>>>>>> Maybe you updated some tool that calls down that path along with the kernel and system
>>>>>>> so you started seeing it in Fedora 41?
>>>>>>
>>>>>> Could be but it's pretty barebones
>>>>>>
>>>>>>> IMO this has been problematic for a very long time, but obviously it depends on the
>>>>>>> chip type. Could you share your benet chip type to confirm the path?
>>>>>>
>>>>>> I don't know how to find the actual chip information but it's identified as:
>>>>>> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
>>>>>>
>>>>>
>>>>> Good, that confirms it. The skyhawk chip falls in the "else" of the block in
>>>>> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config().
>>>>>
>>>>>>> For the blamed commit I'd go with:
>>>>>>>  commit b71724147e73
>>>>>>>  Author: Sathya Perla <sathya.perla@broadcom.com>
>>>>>>>  Date:   Wed Jul 27 05:26:18 2016 -0400
>>>>>>>
>>>>>>>      be2net: replace polling with sleeping in the FW completion path
>>>>>>>
>>>>>>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock
>>>>>>> to a mutex.
>>>>>>
>>>>>> So, first try will be to try without that patch then, =)
>>>>>>
>>>>>
>>>>> That would be a good try, yes. It is not a straight-forward revert though since a lot
>>>>> of changes have happened since that commit. Let me know if you need help with that,
>>>>> I can prepare the revert to test.
>>>>
>>>> Yeah, looked at the size of it and... well... I dunno if i'd have the time =)
>>>>
>>>
>>> Can you try the attached patch?
>>> It is on top of net-next (but also applies to Linus' tree):
>>>  git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
>>>
>>> It partially reverts the mentioned commit above (only mutex -> spinlock and usleep -> udelay)
>>> because the commit does many more things.
>>>
>>> Also +CC original patch author which I forgot to do.
>>
>> Thanks, built and installed but it refuses to boot it - will have to
>> check during the weekend...
>> (boots the latest fedora version even if this one is the selected one
>> according to grubby)
> 
> So, saw that 6.13.5 was released so, fetched that, applied the patch
> and no more RCU issues in dmesg
> 
> Will check more on the suspected performance bit as well when i get
> home later tonight
> 
> I also understand Sathya Perla's motivation in saving power on this
> but things around it have been changed
> and it no longer works as intended....
> 

Nice, that's good to hear. Wrt the motivation - sure it's ok, but the code was wrong;
if they still want to achieve it, they need to work on an alternative solution.
We shouldn't keep broken code around.
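
To make the failure mode concrete, here is a toy userspace model (not kernel
code, and not the driver's actual logic). All model_* names are hypothetical
and exist only for this illustration. It assumes the getlink callback runs
under rcu_read_lock(), as the warning in this thread indicates, and it counts
a violation whenever a sleeping-style wait happens while the modelled RCU
nesting depth is non-zero. A polling-style wait never schedules, so it
records nothing.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code: tracks RCU read-side nesting. */
static int model_rcu_depth;
static int model_violations;

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);
	model_rcu_depth--;
}

/* Stand-in for a sleeping primitive such as mutex_lock() or usleep_range(). */
static void model_sleeping_wait(void)
{
	/* The real kernel warns if the task actually schedules at this point. */
	if (model_rcu_depth > 0)
		model_violations++;
}

/* Stand-in for a busy-wait primitive such as spin_lock_bh() or udelay(). */
static void model_polling_wait(void)
{
	/* Spinning never gives up the CPU, so there is nothing to record. */
}

/* Model of a getlink-style callback that is invoked under rcu_read_lock(). */
static int model_getlink(bool use_sleeping_wait)
{
	model_violations = 0;
	model_rcu_read_lock();
	if (use_sleeping_wait)
		model_sleeping_wait();
	else
		model_polling_wait();
	model_rcu_read_unlock();
	return model_violations;
}
```

In this model, model_getlink(true) reports one violation and
model_getlink(false) reports none, which mirrors the effect of going back
from mutex/usleep_range() to spinlock/udelay() on this path.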



* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
  2025-02-27 14:33                       ` Nikolay Aleksandrov
@ 2025-02-27 14:36                         ` Ian Kumlien
  2025-02-27 14:45                           ` Nikolay Aleksandrov
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Kumlien @ 2025-02-27 14:36 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: Jakub Kicinski, Linux Kernel Network Developers

On Thu, Feb 27, 2025 at 3:33 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>
> On 2/27/25 16:31, Ian Kumlien wrote:
> > On Wed, Feb 26, 2025 at 11:28 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >>
> >> On Wed, Feb 26, 2025 at 2:11 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
> >>>
> >>> On 2/26/25 14:26, Ian Kumlien wrote:
> >>>> On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
> >>>>>
> >>>>> On 2/26/25 13:52, Ian Kumlien wrote:
> >>>>>> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov
> >>>>>> <razor@blackwall.org> wrote:
> >>>>>>>
> >>>>>>> On 2/26/25 11:55, Ian Kumlien wrote:
> >>>>>>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
> >>>>>>>>>>> Same thing happens in 6.13.4, FYI
> >>>>>>>>>>
> >>>>>>>>>> Could you do a minor bisection? Does it not happen with 6.11?
> >>>>>>>>>> Nothing jumps out at quick look.
> >>>>>>>>>
> >>>>>>>>> I have to admit that i haven't been tracking it too closely until it
> >>>>>>>>> turned out to be an issue
> >>>>>>>>> (makes network traffic over wireguard, through that node very slow)
> >>>>>>>>>
> >>>>>>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
> >>>>>>>>> (it's a gw to reach an internal server network in the basement, so not
> >>>>>>>>> the best setup for this)
> >>>>>>>>
> >>>>>>>> Since i'm at work i decided to check if i could find all the boot
> >>>>>>>> logs, which is actually done nicely by systemd
> >>>>>>>> first known bad: 6.11.7-300.fc41.x86_64
> >>>>>>>> last known ok: 6.11.6-200.fc40.x86_64
> >>>>>>>>
> >>>>>>>> Narrows the field for a bisect at least, =)
> >>>>>>>>
> >>>>>>>
> >>>>>>> Saw bridge, took a look. :)
> >>>>>>>
> >>>>>>> I think there are multiple issues with benet's be_ndo_bridge_getlink()
> >>>>>>> because it calls be_cmd_get_hsw_config() which can sleep in multiple
> >>>>>>> places, e.g. the most obvious is the mutex_lock() in the beginning of
> >>>>>>> be_cmd_get_hsw_config(), then we have the call trace here which is:
> >>>>>>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()
> >>>>>>>
> >>>>>>> Maybe you updated some tool that calls down that path along with the kernel and system
> >>>>>>> so you started seeing it in Fedora 41?
> >>>>>>
> >>>>>> Could be but it's pretty barebones
> >>>>>>
> >>>>>>> IMO this has been problematic for a very long time, but obviously it depends on the
> >>>>>>> chip type. Could you share your benet chip type to confirm the path?
> >>>>>>
> >>>>>> I don't know how to find the actual chip information but it's identified as:
> >>>>>> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
> >>>>>>
> >>>>>
> >>>>> Good, that confirms it. The skyhawk chip falls in the "else" of the block in
> >>>>> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config().
> >>>>>
> >>>>>>> For the blamed commit I'd go with:
> >>>>>>>  commit b71724147e73
> >>>>>>>  Author: Sathya Perla <sathya.perla@broadcom.com>
> >>>>>>>  Date:   Wed Jul 27 05:26:18 2016 -0400
> >>>>>>>
> >>>>>>>      be2net: replace polling with sleeping in the FW completion path
> >>>>>>>
> >>>>>>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock
> >>>>>>> to a mutex.
> >>>>>>
> >>>>>> So, first try will be to try without that patch then, =)
> >>>>>>
> >>>>>
> >>>>> That would be a good try, yes. It is not a straight-forward revert though since a lot
> >>>>> of changes have happened since that commit. Let me know if you need help with that,
> >>>>> I can prepare the revert to test.
> >>>>
> >>>> Yeah, looked at the size of it and... well... I dunno if i'd have the time =)
> >>>>
> >>>
> >>> Can you try the attached patch?
> >>> It is on top of net-next (but also applies to Linus' tree):
> >>>  git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
> >>>
> >>> It partially reverts the mentioned commit above (only mutex -> spinlock and usleep -> udelay)
> >>> because the commit does many more things.
> >>>
> >>> Also +CC original patch author which I forgot to do.
> >>
> >> Thanks, built and installed but it refuses to boot it - will have to
> >> check during the weekend...
> >> (boots the latest fedora version even if this one is the selected one
> >> according to grubby)
> >
> > So, saw that 6.13.5 was released so, fetched that, applied the patch
> > and no more RCU issues in dmesg
> >
> > Will check more on the suspected performance bit as well when i get
> > home later tonight
> >
> > I also understand Sathya Perla's motivation in saving power on this
> > but things around it have been changed
> > and it no longer works as intended....
> >
>
> Nice, that's good to hear. Wrt the motivation - sure it's ok, but the code was wrong;
> if they still want to achieve it, they need to work on an alternative solution.
> We shouldn't keep broken code around.

Agreed, but also, was it broken in 4.7 ;)

Anyway, seems faster from what i can test here so
Tested-by: Ian Kumlien <ian.kumlien@gmail.com>

etc etc


* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
  2025-02-27 14:36                         ` Ian Kumlien
@ 2025-02-27 14:45                           ` Nikolay Aleksandrov
  2025-02-27 15:52                             ` Ian Kumlien
  0 siblings, 1 reply; 16+ messages in thread
From: Nikolay Aleksandrov @ 2025-02-27 14:45 UTC (permalink / raw)
  To: Ian Kumlien; +Cc: Jakub Kicinski, Linux Kernel Network Developers

On 2/27/25 16:36, Ian Kumlien wrote:
> On Thu, Feb 27, 2025 at 3:33 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>>
>> On 2/27/25 16:31, Ian Kumlien wrote:
>>> On Wed, Feb 26, 2025 at 11:28 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>>>>
>>>> On Wed, Feb 26, 2025 at 2:11 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>>>>>
>>>>> On 2/26/25 14:26, Ian Kumlien wrote:
>>>>>> On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>>>>>>>
>>>>>>> On 2/26/25 13:52, Ian Kumlien wrote:
>>>>>>>> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov
>>>>>>>> <razor@blackwall.org> wrote:
>>>>>>>>>
>>>>>>>>> On 2/26/25 11:55, Ian Kumlien wrote:
>>>>>>>>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
>>>>>>>>>>>>> Same thing happens in 6.13.4, FYI
>>>>>>>>>>>>
>>>>>>>>>>>> Could you do a minor bisection? Does it not happen with 6.11?
>>>>>>>>>>>> Nothing jumps out at quick look.
>>>>>>>>>>>
>>>>>>>>>>> I have to admit that i haven't been tracking it too closely until it
>>>>>>>>>>> turned out to be an issue
>>>>>>>>>>> (makes network traffic over wireguard, through that node very slow)
>>>>>>>>>>>
>>>>>>>>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
>>>>>>>>>>> (it's a gw to reach an internal server network in the basement, so not
>>>>>>>>>>> the best setup for this)
>>>>>>>>>>
>>>>>>>>>> Since i'm at work i decided to check if i could find all the boot
>>>>>>>>>> logs, which is actually done nicely by systemd
>>>>>>>>>> first known bad: 6.11.7-300.fc41.x86_64
>>>>>>>>>> last known ok: 6.11.6-200.fc40.x86_64
>>>>>>>>>>
>>>>>>>>>> Narrows the field for a bisect at least, =)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Saw bridge, took a look. :)
>>>>>>>>>
>>>>>>>>> I think there are multiple issues with benet's be_ndo_bridge_getlink()
>>>>>>>>> because it calls be_cmd_get_hsw_config() which can sleep in multiple
>>>>>>>>> places, e.g. the most obvious is the mutex_lock() in the beginning of
>>>>>>>>> be_cmd_get_hsw_config(), then we have the call trace here which is:
>>>>>>>>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()
>>>>>>>>>
>>>>>>>>> Maybe you updated some tool that calls down that path along with the kernel and system
>>>>>>>>> so you started seeing it in Fedora 41?
>>>>>>>>
>>>>>>>> Could be but it's pretty barebones
>>>>>>>>
>>>>>>>>> IMO this has been problematic for a very long time, but obviously it depends on the
>>>>>>>>> chip type. Could you share your benet chip type to confirm the path?
>>>>>>>>
>>>>>>>> I don't know how to find the actual chip information but it's identified as:
>>>>>>>> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
>>>>>>>>
>>>>>>>
>>>>>>> Good, that confirms it. The skyhawk chip falls in the "else" of the block in
>>>>>>> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config().
>>>>>>>
>>>>>>>>> For the blamed commit I'd go with:
>>>>>>>>>  commit b71724147e73
>>>>>>>>>  Author: Sathya Perla <sathya.perla@broadcom.com>
>>>>>>>>>  Date:   Wed Jul 27 05:26:18 2016 -0400
>>>>>>>>>
>>>>>>>>>      be2net: replace polling with sleeping in the FW completion path
>>>>>>>>>
>>>>>>>>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock
>>>>>>>>> to a mutex.
>>>>>>>>
>>>>>>>> So, first try will be to try without that patch then, =)
>>>>>>>>
>>>>>>>
>>>>>>> That would be a good try, yes. It is not a straight-forward revert though since a lot
>>>>>>> of changes have happened since that commit. Let me know if you need help with that,
>>>>>>> I can prepare the revert to test.
>>>>>>
>>>>>> Yeah, looked at the size of it and... well... I dunno if i'd have the time =)
>>>>>>
>>>>>
>>>>> Can you try the attached patch?
>>>>> It is on top of net-next (but also applies to Linus' tree):
>>>>>  git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
>>>>>
>>>>> It partially reverts the mentioned commit above (only mutex -> spinlock and usleep -> udelay)
>>>>> because the commit does many more things.
>>>>>
>>>>> Also +CC original patch author which I forgot to do.
>>>>
>>>> Thanks, built and installed but it refuses to boot it - will have to
>>>> check during the weekend...
>>>> (boots the latest fedora version even if this one is the selected one
>>>> according to grubby)
>>>
>>> So, saw that 6.13.5 was released so, fetched that, applied the patch
>>> and no more RCU issues in dmesg
>>>
>>> Will check more on the suspected performance bit as well when i get
>>> home later tonight
>>>
>>> I also understand Sathya Perla's motivation in saving power on this
>>> but things around it have been changed
>>> and it no longer works as intended....
>>>
>>
>> Nice, that's good to hear. Wrt the motivation - sure it's ok, but the code was wrong;
>> if they still want to achieve it, they need to work on an alternative solution.
>> We shouldn't keep broken code around.
> 
> Agreed, but also, was it broken in 4.7 ;)
> 

Since 4.9, yes, it has been. I just checked out v4.9 and it has all these bugs present.
If you boot 4.9 and issue PF_BRIDGE RTM_GETLINK you'll hit the same problems.
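
For anyone who wants to poke at that path by hand, below is a minimal sketch
of a PF_BRIDGE RTM_GETLINK dump request sent over a NETLINK_ROUTE socket. The
struct and function names (bridge_getlink_req, build_bridge_getlink,
send_bridge_getlink) are made up for this example; the netlink constants and
structures come from the standard uapi headers. The sketch only sends the
request and does not read or parse the reply, and whether it reaches the
benet callback depends on the devices present on the system.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Request layout: netlink header followed by an ifinfomsg payload. */
struct bridge_getlink_req {
	struct nlmsghdr nlh;
	struct ifinfomsg ifm;
};

/* Fill in a PF_BRIDGE RTM_GETLINK dump request and return its length. */
static size_t build_bridge_getlink(struct bridge_getlink_req *req, uint32_t seq)
{
	assert(req != NULL);
	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req->nlh.nlmsg_type = RTM_GETLINK;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->nlh.nlmsg_seq = seq;
	req->ifm.ifi_family = AF_BRIDGE;	/* selects the bridge getlink handler */
	return req->nlh.nlmsg_len;
}

/* Send the request to the kernel; returns 0 on success, -1 on any error. */
static int send_bridge_getlink(uint32_t seq)
{
	struct bridge_getlink_req req;
	struct sockaddr_nl kernel;
	size_t len = build_bridge_getlink(&req, seq);
	ssize_t sent;
	int fd;

	memset(&kernel, 0, sizeof(kernel));
	kernel.nl_family = AF_NETLINK;	/* nl_pid 0 addresses the kernel */

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (fd < 0)
		return -1;

	sent = sendto(fd, &req, len, 0,
		      (struct sockaddr *)&kernel, sizeof(kernel));
	close(fd);
	return sent == (ssize_t)len ? 0 : -1;
}
```

Calling send_bridge_getlink(1) from a small main() should be roughly what
"bridge link show" from iproute2 issues, minus the reply handling.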

> Anyway, seems faster from what i can test here so
> Tested-by: Ian Kumlien <ian.kumlien@gmail.com>
> 
> etc etc

Thank you, I'll clean up the patch and submit it.





* Re: [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section!
  2025-02-27 14:45                           ` Nikolay Aleksandrov
@ 2025-02-27 15:52                             ` Ian Kumlien
  0 siblings, 0 replies; 16+ messages in thread
From: Ian Kumlien @ 2025-02-27 15:52 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: Jakub Kicinski, Linux Kernel Network Developers

On Thu, Feb 27, 2025 at 3:45 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
>
> On 2/27/25 16:36, Ian Kumlien wrote:
> > On Thu, Feb 27, 2025 at 3:33 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
> >>
> >> On 2/27/25 16:31, Ian Kumlien wrote:
> >>> On Wed, Feb 26, 2025 at 11:28 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >>>>
> >>>> On Wed, Feb 26, 2025 at 2:11 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
> >>>>>
> >>>>> On 2/26/25 14:26, Ian Kumlien wrote:
> >>>>>> On Wed, Feb 26, 2025 at 1:00 PM Nikolay Aleksandrov <razor@blackwall.org> wrote:
> >>>>>>>
> >>>>>>> On 2/26/25 13:52, Ian Kumlien wrote:
> >>>>>>>> On Wed, Feb 26, 2025 at 11:33 AM Nikolay Aleksandrov
> >>>>>>>> <razor@blackwall.org> wrote:
> >>>>>>>>>
> >>>>>>>>> On 2/26/25 11:55, Ian Kumlien wrote:
> >>>>>>>>>> On Wed, Feb 26, 2025 at 10:24 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Feb 26, 2025 at 2:05 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, 25 Feb 2025 11:13:47 +0100 Ian Kumlien wrote:
> >>>>>>>>>>>>> Same thing happens in 6.13.4, FYI
> >>>>>>>>>>>>
> >>>>>>>>>>>> Could you do a minor bisection? Does it not happen with 6.11?
> >>>>>>>>>>>> Nothing jumps out at quick look.
> >>>>>>>>>>>
> >>>>>>>>>>> I have to admit that i haven't been tracking it too closely until it
> >>>>>>>>>>> turned out to be an issue
> >>>>>>>>>>> (makes network traffic over wireguard, through that node very slow)
> >>>>>>>>>>>
> >>>>>>>>>>> But i'm pretty sure it was ok in early 6.12.x - I'll try to do a bisect though
> >>>>>>>>>>> (it's a gw to reach an internal server network in the basement, so not
> >>>>>>>>>>> the best setup for this)
> >>>>>>>>>>
> >>>>>>>>>> Since i'm at work i decided to check if i could find all the boot
> >>>>>>>>>> logs, which is actually done nicely by systemd
> >>>>>>>>>> first known bad: 6.11.7-300.fc41.x86_64
> >>>>>>>>>> last known ok: 6.11.6-200.fc40.x86_64
> >>>>>>>>>>
> >>>>>>>>>> Narrows the field for a bisect at least, =)
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Saw bridge, took a look. :)
> >>>>>>>>>
> >>>>>>>>> I think there are multiple issues with benet's be_ndo_bridge_getlink()
> >>>>>>>>> because it calls be_cmd_get_hsw_config() which can sleep in multiple
> >>>>>>>>> places, e.g. the most obvious is the mutex_lock() in the beginning of
> >>>>>>>>> be_cmd_get_hsw_config(), then we have the call trace here which is:
> >>>>>>>>> be_cmd_get_hsw_config -> be_mcc_notify_wait -> be_mcc_wait_compl -> usleep_range()
> >>>>>>>>>
> >>>>>>>>> Maybe you updated some tool that calls down that path along with the kernel and system
> >>>>>>>>> so you started seeing it in Fedora 41?
> >>>>>>>>
> >>>>>>>> Could be but it's pretty barebones
> >>>>>>>>
> >>>>>>>>> IMO this has been problematic for a very long time, but obviously it depends on the
> >>>>>>>>> chip type. Could you share your benet chip type to confirm the path?
> >>>>>>>>
> >>>>>>>> I don't know how to find the actual chip information but it's identified as:
> >>>>>>>> Emulex Corporation OneConnect NIC (Skyhawk) (rev 10)
> >>>>>>>>
> >>>>>>>
> >>>>>>> Good, that confirms it. The skyhawk chip falls in the "else" of the block in
> >>>>>>> be_ndo_bridge_getlink() which calls be_cmd_get_hsw_config().
> >>>>>>>
> >>>>>>>>> For the blamed commit I'd go with:
> >>>>>>>>>  commit b71724147e73
> >>>>>>>>>  Author: Sathya Perla <sathya.perla@broadcom.com>
> >>>>>>>>>  Date:   Wed Jul 27 05:26:18 2016 -0400
> >>>>>>>>>
> >>>>>>>>>      be2net: replace polling with sleeping in the FW completion path
> >>>>>>>>>
> >>>>>>>>> This one changed the udelay() (which is safe) to usleep_range() and the spinlock
> >>>>>>>>> to a mutex.
> >>>>>>>>
> >>>>>>>> So, first try will be to try without that patch then, =)
> >>>>>>>>
> >>>>>>>
> >>>>>>> That would be a good try, yes. It is not a straight-forward revert though since a lot
> >>>>>>> of changes have happened since that commit. Let me know if you need help with that,
> >>>>>>> I can prepare the revert to test.
> >>>>>>
> >>>>>> Yeah, looked at the size of it and... well... I dunno if i'd have the time =)
> >>>>>>
> >>>>>
> >>>>> Can you try the attached patch?
> >>>>> It is on top of net-next (but also applies to Linus' tree):
> >>>>>  git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
> >>>>>
> >>>>> It partially reverts the mentioned commit above (only mutex -> spinlock and usleep -> udelay)
> >>>>> because the commit does many more things.
> >>>>>
> >>>>> Also +CC original patch author which I forgot to do.
> >>>>
> >>>> Thanks, built and installed but it refuses to boot it - will have to
> >>>> check during the weekend...
> >>>> (boots the latest fedora version even if this one is the selected one
> >>>> according to grubby)
> >>>
> >>> So, I saw that 6.13.5 was released, fetched that, applied the patch,
> >>> and there are no more RCU issues in dmesg.
> >>>
> >>> Will check more on the suspected performance bit as well when i get
> >>> home later tonight
> >>>
> >>> I also understand Sathya Perla's motivation in saving power on this
> >>> but things around it have been changed
> >>> and it no longer works as intended....
> >>>
> >>
> >> Nice, that's good to hear. Wrt the motivation - sure, it's ok, but the code was wrong.
> >> If they still want to achieve it, they need to work on an alternative solution.
> >> We shouldn't keep broken code around.
> >
> > Agreed, but also, was it broken in 4.7? ;)
> >
>
> Yes, since 4.9 it has been. I just checked out v4.9 and it has all these bugs present.
> If you boot 4.9 and issue PF_BRIDGE RTM_GETLINK you'll hit the same problems.

Ah!, ok!

> > Anyway, seems faster from what i can test here so
> > Tested-by: Ian Kumlien <ian.kumlien@gmail.com>
> >
> > etc etc
>
> Thank you, I'll clean up the patch and submit it.

Thank you, =)
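
For anyone reading the archive later, here is a rough userspace sketch of the
problem discussed above. It is not kernel code and not the patch from this
thread. It assumes, as the warning in the subject line indicates, that the
getlink-style callback runs inside an RCU read-side critical section. Every
model_* name is hypothetical and exists only for this illustration; the strings
passed in merely label which kernel primitive each step stands in for.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace model only: none of this is the kernel's implementation. */
static int rcu_depth;        /* nesting depth of model_rcu_read_lock() */
static int sleep_violations; /* sleeps attempted inside a read-side section */

static void model_rcu_read_lock(void)
{
    rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
    assert(rcu_depth > 0);
    rcu_depth--;
}

/* Stand-in for anything that may schedule, e.g. mutex_lock(), usleep_range(). */
static void model_might_sleep(const char *what)
{
    if (rcu_depth > 0) {
        sleep_violations++;
        fprintf(stderr, "model: %s may sleep inside RCU read-side section\n",
                what);
    }
}

/* Stand-in for spin_lock()/udelay(): busy-waits, never schedules. */
static void model_busy_wait(const char *what)
{
    (void)what; /* nothing to record: spinning is allowed in atomic context */
}

/* Shape of the FW command path with a mutex and usleep_range(). */
void model_fw_cmd_sleeping(void)
{
    model_might_sleep("mutex_lock");
    model_might_sleep("usleep_range");
}

/* Shape of the FW command path with a spinlock and udelay(). */
void model_fw_cmd_polling(void)
{
    model_busy_wait("spin_lock");
    model_busy_wait("udelay");
}

/* Call a getlink-style callback under the read lock; return the violations. */
int model_getlink(void (*fw_cmd)(void))
{
    sleep_violations = 0;
    model_rcu_read_lock();
    fw_cmd();
    model_rcu_read_unlock();
    return sleep_violations;
}
```

Calling model_getlink(model_fw_cmd_sleeping) counts one violation per sleeping
primitive, while model_getlink(model_fw_cmd_polling) counts none. That mirrors
why the partial revert (mutex -> spinlock, usleep -> udelay) makes the RCU
warning go away, at the cost of busy-waiting while the firmware completes.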



Thread overview: 16+ messages
2025-02-25  8:05 [6.12.15][be2net?] Voluntary context switch within RCU read-side critical section! Ian Kumlien
2025-02-25 10:13 ` Ian Kumlien
2025-02-26  1:05   ` Jakub Kicinski
2025-02-26  9:24     ` Ian Kumlien
2025-02-26  9:55       ` Ian Kumlien
2025-02-26 10:33         ` Nikolay Aleksandrov
2025-02-26 11:52           ` Ian Kumlien
2025-02-26 12:00             ` Nikolay Aleksandrov
2025-02-26 12:26               ` Ian Kumlien
2025-02-26 13:11                 ` Nikolay Aleksandrov
2025-02-26 22:28                   ` Ian Kumlien
2025-02-27 14:31                     ` Ian Kumlien
2025-02-27 14:33                       ` Nikolay Aleksandrov
2025-02-27 14:36                         ` Ian Kumlien
2025-02-27 14:45                           ` Nikolay Aleksandrov
2025-02-27 15:52                             ` Ian Kumlien
