linuxppc-dev.lists.ozlabs.org archive mirror
* [mainline][ppc][bnx2x] watchdog: CPU 80 self-detected hard LOCKUP @ opal_interrupt+0x28/0x70 when module load/unload
@ 2018-09-24  8:56 Abdul Haleem
  2018-09-24  9:35 ` Oliver
  0 siblings, 1 reply; 8+ messages in thread
From: Abdul Haleem @ 2018-09-24  8:56 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-kernel, netdev, maurosr, mpe, manvanth, sim

Greetings,

A bnx2x module load/unload test results in a continuous hard LOCKUP
trace on my powerpc bare-metal machine running the mainline 4.19.0-rc4
kernel.

The instruction address points to:

0xc00000000009d048 is in opal_interrupt
(arch/powerpc/platforms/powernv/opal-irqchip.c:133).
128	
129	static irqreturn_t opal_interrupt(int irq, void *data)
130	{
131		__be64 events;
132	
133		opal_handle_interrupt(virq_to_hw(irq), &events);
134		last_outstanding_events = be64_to_cpu(events);
135		if (opal_have_pending_events())
136			opal_wake_poller();
137	

trace:
bnx2x 0008:01:00.3 enP8p1s0f3: renamed from eth0
bnx2x 0008:01:00.3 enP8p1s0f3: using MSI-X  IRQs: sp 297  fp[0] 299 ... fp[7] 306
bnx2x 0008:01:00.2 enP8p1s0f2: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
bnx2x 0008:01:00.3 enP8p1s0f3: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
bnx2x: QLogic 5771x/578xx 10/20-Gigabit Ethernet Driver bnx2x 1.712.30-0 (2014/02/10)
bnx2x 0008:01:00.0: msix capability found
bnx2x 0008:01:00.0: Using 64-bit DMA iommu bypass
bnx2x 0008:01:00.0: part number 0-0-0-0
bnx2x 0008:01:00.0: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
bnx2x 0008:01:00.0 enP8p1s0f0: renamed from eth0
bnx2x 0008:01:00.1: msix capability found
bnx2x 0008:01:00.1: Using 64-bit DMA iommu bypass
bnx2x 0008:01:00.1: part number 0-0-0-0
bnx2x 0008:01:00.0 enP8p1s0f0: using MSI-X  IRQs: sp 267  fp[0] 269 ... fp[7] 276
bnx2x 0008:01:00.0 enP8p1s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
bnx2x 0008:01:00.1: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
bnx2x 0008:01:00.1 enP8p1s0f1: renamed from eth0
bnx2x 0008:01:00.2: msix capability found
bnx2x 0008:01:00.2: Using 64-bit DMA iommu bypass
bnx2x 0008:01:00.2: part number 0-0-0-0
bnx2x 0008:01:00.1 enP8p1s0f1: using MSI-X  IRQs: sp 277  fp[0] 279 ... fp[7] 286
bnx2x 0008:01:00.1 enP8p1s0f1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
watchdog: CPU 80 self-detected hard LOCKUP @ opal_interrupt+0x28/0x70
watchdog: CPU 80 TB:980794111093, last heartbeat TB:973959617200 (13348ms ago)
Modules linked in: bnx2x(+) iptable_mangle ipt_MASQUERADE iptable_nat
nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT
nf_reject_ipv4 xt_tcpudp tun bridge stp llc iptable_filter dm_mirror
dm_region_hash dm_log dm_service_time vmx_crypto powernv_rng rng_core
dm_multipath kvm_hv kvm binfmt_misc nfsd ip_tables x_tables autofs4 xfs
lpfc crc_t10dif crct10dif_generic nvme_fc nvme_fabrics mdio libcrc32c
nvme_core crct10dif_common [last unloaded: bnx2x]
CPU: 80 PID: 0 Comm: swapper/80 Not tainted 4.19.0-rc4-autotest-autotest #1
NIP:  c00000000009d048 LR: c000000000092fd0 CTR: 0000000030032a00
REGS: c000003fff493d80 TRAP: 0900   Not tainted  (4.19.0-rc4-autotest-autotest)
MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 48004042  XER: 00000000
CFAR: c000000000092fbc IRQMASK: 1 
GPR00: 0000000030005128 c000003fff70f220 c0000000010ae500 0000000000000000 
GPR04: 0000000048004042 c00000000009d048 9000000000009033 0000000000000090 
GPR08: 0000000000000000 0000000000000000 c000000000092fe4 9000000000001003 
GPR12: c000000000092fbc c000003fff7ff300 c000003c96c80c00 0000000000010000 
GPR16: 0000000000000000 000000000000003c c000003c96c80800 c000003c96d00700 
GPR20: 0000000000000001 0000000000000001 0000000000000002 0000000000000014 
GPR24: c000001fe8741000 c000003fff70f330 0000000000000000 c000003ca947fb40 
GPR28: 00000000092f47d0 0000000000000014 c000001fe8741000 c000001fe9860200 
NIP [c00000000009d048] opal_interrupt+0x28/0x70
LR [c000000000092fd0] opal_return+0x14/0x48
Call Trace:
[c000003fff70f220] [c00000000009d048] opal_interrupt+0x28/0x70 (unreliable)
[c000003fff70f250] [c00000000016d890] __handle_irq_event_percpu+0x90/0x2d0
[c000003fff70f310] [c00000000016db00] handle_irq_event_percpu+0x30/0x90
[c000003fff70f350] [c00000000016dbc0] handle_irq_event+0x60/0xc0
[c000003fff70f380] [c000000000172d2c] handle_fasteoi_irq+0xbc/0x1f0
[c000003fff70f3b0] [c00000000016c084] generic_handle_irq+0x44/0x70
[c000003fff70f3d0] [c0000000000193cc] __do_irq+0x8c/0x200
[c000003fff70f440] [c000000000019640] do_IRQ+0x100/0x110
[c000003fff70f490] [c000000000008db8] hardware_interrupt_common+0x158/0x160
--- interrupt: 501 at fib_table_lookup+0xfc/0x600
    LR = fib_validate_source+0x148/0x370
[c000003fff70f780] [0000000000000000]           (null) (unreliable)
[c000003fff70f7e0] [c000000000959af8] fib_validate_source+0x148/0x370
[c000003fff70f8a0] [c0000000008fd664] ip_route_input_rcu+0x214/0x970
[c000003fff70f990] [c0000000008fdde0] ip_route_input_noref+0x20/0x30
[c000003fff70f9e0] [c000000000945e28] arp_process.constprop.14+0x3d8/0x8a0
[c000003fff70faf0] [c00000000089eb20] __netif_receive_skb_one_core+0x60/0x80
[c000003fff70fb30] [c0000000008a7d00] netif_receive_skb_internal+0x30/0x110
[c000003fff70fb70] [c0000000008a888c] napi_gro_receive+0x11c/0x1c0
[c000003fff70fbb0] [c000000000702afc] tg3_poll_work+0x5fc/0x1060
[c000003fff70fcb0] [c0000000007035b4] tg3_poll_msix+0x54/0x210
[c000003fff70fd00] [c0000000008a922c] net_rx_action+0x31c/0x470
[c000003fff70fe10] [c0000000009f5afc] __do_softirq+0x15c/0x3b4
[c000003fff70ff00] [c0000000000fddf0] irq_exit+0x100/0x120
[c000003fff70ff20] [c0000000000193d8] __do_irq+0x98/0x200
[c000003fff70ff90] [c00000000002af24] call_do_irq+0x14/0x24
[c000003ca947fa80] [c0000000000195d4] do_IRQ+0x94/0x110
[c000003ca947fad0] [c000000000008db8] hardware_interrupt_common+0x158/0x160
--- interrupt: 501 at replay_interrupt_return+0x0/0x4
    LR = arch_local_irq_restore+0x84/0x90
[c000003ca947fdc0] [0000000000080000] 0x80000 (unreliable)
[c000003ca947fde0] [c000000000181f60] rcu_idle_exit+0xa0/0xd0
[c000003ca947fe30] [c000000000136d08] do_idle+0x1c8/0x3a0
[c000003ca947fec0] [c0000000001370b4] cpu_startup_entry+0x34/0x40
[c000003ca947fef0] [c0000000000467f4] start_secondary+0x4d4/0x520
[c000003ca947ff90] [c00000000000b270] start_secondary_prolog+0x10/0x14
Instruction dump:
60000000 60420000 3c4c0101 384214e0 7c0802a6 78630020 f8010010 f821ffd1 
4bf7b901 60000000 38810020 4bff657d <60000000> 39010020 3d42ffed e94a5d28 
watchdog: CPU 80 became unstuck TB:980802789270
CPU: 80 PID: 412 Comm: ksoftirqd/80 Not tainted 4.19.0-rc4-autotest-autotest #1
Call Trace:
[c000003ca96f7910] [c0000000009d4cec] dump_stack+0xb0/0xf4 (unreliable)
[c000003ca96f7950] [c00000000002f278] wd_smp_clear_cpu_pending+0x368/0x3f0
[c000003ca96f7a10] [c00000000002fa48] wd_timer_fn+0x78/0x3a0
[c000003ca96f7ad0] [c00000000018a3c0] call_timer_fn+0x50/0x1b0
[c000003ca96f7b50] [c00000000018a658] expire_timers+0x138/0x1e0
[c000003ca96f7bc0] [c00000000018a7c8] run_timer_softirq+0xc8/0x220
[c000003ca96f7c50] [c0000000009f5afc] __do_softirq+0x15c/0x3b4
[c000003ca96f7d40] [c0000000000fdab4] run_ksoftirqd+0x54/0x80
[c000003ca96f7d60] [c000000000126f10] smpboot_thread_fn+0x290/0x2a0
[c000003ca96f7dc0] [c0000000001215ac] kthread+0x15c/0x1a0
[c000003ca96f7e30] [c00000000000bdd4] ret_from_kernel_thread+0x5c/0x68
bnx2x 0008:01:00.2: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
bnx2x 0008:01:00.2 enP8p1s0f2: renamed from eth0
bnx2x 0008:01:00.3: msix capability found
bnx2x 0008:01:00.3: Using 64-bit DMA iommu bypass
bnx2x 0008:01:00.3: part number 0-0-0-0
bnx2x 0008:01:00.3: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
bnx2x 0008:01:00.3 enP8p1s0f3: renamed from eth0
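[Editor's aside: the MSR value in the register dump above can be
sanity-checked against the flag list the oops prints. A minimal Python
sketch; the bit positions follow my reading of the Power ISA MSR layout
(as in the kernel's asm/reg.h) and cover only the flags shown in this
dump, so treat them as an assumption, not something this thread
confirms.]

```python
# Hypothetical helper: decode a 64-bit PowerPC MSR into flag names.
# Only the bits printed in the oops above are listed here.
MSR_BITS = {
    63: "SF",  # 64-bit mode
    60: "HV",  # hypervisor state
    15: "EE",  # external interrupts enabled
    12: "ME",  # machine check enabled
    5:  "IR",  # instruction relocation (MMU on for fetches)
    4:  "DR",  # data relocation (MMU on for data)
    1:  "RI",  # recoverable interrupt
    0:  "LE",  # little-endian mode
}

def decode_msr(msr: int) -> list[str]:
    """Return flag names set in msr, highest bit first."""
    return [name for bit, name in sorted(MSR_BITS.items(), reverse=True)
            if msr & (1 << bit)]

# MSR from the trace above:
print(decode_msr(0x9000000000009033))
# ['SF', 'HV', 'EE', 'ME', 'IR', 'DR', 'RI', 'LE'] -- matching the oops
```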

-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre


* Re: [mainline][ppc][bnx2x] watchdog: CPU 80 self-detected hard LOCKUP @ opal_interrupt+0x28/0x70 when module load/unload
  2018-09-24  8:56 [mainline][ppc][bnx2x] watchdog: CPU 80 self-detected hard LOCKUP @ opal_interrupt+0x28/0x70 when module load/unload Abdul Haleem
@ 2018-09-24  9:35 ` Oliver
  2018-09-24 10:19   ` Abdul Haleem
  0 siblings, 1 reply; 8+ messages in thread
From: Oliver @ 2018-09-24  9:35 UTC (permalink / raw)
  To: Abdul Haleem; +Cc: linuxppc-dev, maurosr, sim, manvanth

On Mon, Sep 24, 2018 at 6:56 PM, Abdul Haleem
<abdhalee@linux.vnet.ibm.com> wrote:
> Greeting's
>
> bnx2x module load/unload test results in continuous hard LOCKUP trace on
> my powerpc bare-metal running mainline 4.19.0-rc4 kernel
>
> the instruction address points to:
>
> 0xc00000000009d048 is in opal_interrupt
> (arch/powerpc/platforms/powernv/opal-irqchip.c:133).
> 128
> 129     static irqreturn_t opal_interrupt(int irq, void *data)
> 130     {
> 131             __be64 events;
> 132
> 133             opal_handle_interrupt(virq_to_hw(irq), &events);
> 134             last_outstanding_events = be64_to_cpu(events);
> 135             if (opal_have_pending_events())
> 136                     opal_wake_poller();
> 137
>
> trace:
> bnx2x 0008:01:00.3 enP8p1s0f3: renamed from eth0
> bnx2x 0008:01:00.3 enP8p1s0f3: using MSI-X  IRQs: sp 297  fp[0] 299 ... fp[7] 306
> bnx2x 0008:01:00.2 enP8p1s0f2: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
> bnx2x 0008:01:00.3 enP8p1s0f3: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
> bnx2x: QLogic 5771x/578xx 10/20-Gigabit Ethernet Driver bnx2x 1.712.30-0 (2014/02/10)
> bnx2x 0008:01:00.0: msix capability found
> bnx2x 0008:01:00.0: Using 64-bit DMA iommu bypass
> bnx2x 0008:01:00.0: part number 0-0-0-0
> bnx2x 0008:01:00.0: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
> bnx2x 0008:01:00.0 enP8p1s0f0: renamed from eth0
> bnx2x 0008:01:00.1: msix capability found
> bnx2x 0008:01:00.1: Using 64-bit DMA iommu bypass
> bnx2x 0008:01:00.1: part number 0-0-0-0
> bnx2x 0008:01:00.0 enP8p1s0f0: using MSI-X  IRQs: sp 267  fp[0] 269 ... fp[7] 276
> bnx2x 0008:01:00.0 enP8p1s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
> bnx2x 0008:01:00.1: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
> bnx2x 0008:01:00.1 enP8p1s0f1: renamed from eth0
> bnx2x 0008:01:00.2: msix capability found
> bnx2x 0008:01:00.2: Using 64-bit DMA iommu bypass
> bnx2x 0008:01:00.2: part number 0-0-0-0
> bnx2x 0008:01:00.1 enP8p1s0f1: using MSI-X  IRQs: sp 277  fp[0] 279 ... fp[7] 286
> bnx2x 0008:01:00.1 enP8p1s0f1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit


> watchdog: CPU 80 self-detected hard LOCKUP @ opal_interrupt+0x28/0x70
> watchdog: CPU 80 TB:980794111093, last heartbeat TB:973959617200 (13348ms ago)

Ouch, 13 seconds in OPAL. It looks like we trip the hard lockup
detector once the thread comes back into the kernel, so we're not
completely stuck. At a guess there's some contention on a lock in OPAL
due to the bind/unbind loop, but I'm not sure why that would be
happening.
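[Editor's aside: the 13348ms figure is consistent with the two timebase
(TB) values in the watchdog message, assuming the usual 512 MHz timebase
frequency on these machines — an assumption here; the real value is the
"timebase" line in /proc/cpuinfo. A quick check:]

```python
# Sanity-check the watchdog's heartbeat delta from the two TB samples.
TIMEBASE_HZ = 512_000_000  # assumed tb frequency; see /proc/cpuinfo

def tb_delta_ms(tb_now: int, tb_last: int) -> int:
    """Milliseconds elapsed between two timebase samples."""
    return (tb_now - tb_last) * 1000 // TIMEBASE_HZ

# Values from the watchdog message above:
print(tb_delta_ms(980794111093, 973959617200))  # -> 13348, matching the log
```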

Can you give us a copy of the OPAL log? (/sys/firmware/opal/msglog)

> Modules linked in: bnx2x(+) iptable_mangle ipt_MASQUERADE iptable_nat
> nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT
> nf_reject_ipv4 xt_tcpudp tun bridge stp llc iptable_filter dm_mirror
> dm_region_hash dm_log dm_service_time vmx_crypto powernv_rng rng_core
> dm_multipath kvm_hv kvm binfmt_misc nfsd ip_tables x_tables autofs4 xfs
> lpfc crc_t10dif crct10dif_generic nvme_fc nvme_fabrics mdio libcrc32c
> nvme_core crct10dif_common [last unloaded: bnx2x]
> CPU: 80 PID: 0 Comm: swapper/80 Not tainted 4.19.0-rc4-autotest-autotest #1
> NIP:  c00000000009d048 LR: c000000000092fd0 CTR: 0000000030032a00
> REGS: c000003fff493d80 TRAP: 0900   Not tainted  (4.19.0-rc4-autotest-autotest)
> MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 48004042  XER: 00000000
> CFAR: c000000000092fbc IRQMASK: 1
> GPR00: 0000000030005128 c000003fff70f220 c0000000010ae500 0000000000000000
> GPR04: 0000000048004042 c00000000009d048 9000000000009033 0000000000000090
> GPR08: 0000000000000000 0000000000000000 c000000000092fe4 9000000000001003
> GPR12: c000000000092fbc c000003fff7ff300 c000003c96c80c00 0000000000010000
> GPR16: 0000000000000000 000000000000003c c000003c96c80800 c000003c96d00700
> GPR20: 0000000000000001 0000000000000001 0000000000000002 0000000000000014
> GPR24: c000001fe8741000 c000003fff70f330 0000000000000000 c000003ca947fb40
> GPR28: 00000000092f47d0 0000000000000014 c000001fe8741000 c000001fe9860200
> NIP [c00000000009d048] opal_interrupt+0x28/0x70
> LR [c000000000092fd0] opal_return+0x14/0x48
> Call Trace:
> [c000003fff70f220] [c00000000009d048] opal_interrupt+0x28/0x70 (unreliable)
> [c000003fff70f250] [c00000000016d890] __handle_irq_event_percpu+0x90/0x2d0
> [c000003fff70f310] [c00000000016db00] handle_irq_event_percpu+0x30/0x90
> [c000003fff70f350] [c00000000016dbc0] handle_irq_event+0x60/0xc0
> [c000003fff70f380] [c000000000172d2c] handle_fasteoi_irq+0xbc/0x1f0
> [c000003fff70f3b0] [c00000000016c084] generic_handle_irq+0x44/0x70
> [c000003fff70f3d0] [c0000000000193cc] __do_irq+0x8c/0x200
> [c000003fff70f440] [c000000000019640] do_IRQ+0x100/0x110
> [c000003fff70f490] [c000000000008db8] hardware_interrupt_common+0x158/0x160
> --- interrupt: 501 at fib_table_lookup+0xfc/0x600
>     LR = fib_validate_source+0x148/0x370
> [c000003fff70f780] [0000000000000000]           (null) (unreliable)
> [c000003fff70f7e0] [c000000000959af8] fib_validate_source+0x148/0x370
> [c000003fff70f8a0] [c0000000008fd664] ip_route_input_rcu+0x214/0x970
> [c000003fff70f990] [c0000000008fdde0] ip_route_input_noref+0x20/0x30
> [c000003fff70f9e0] [c000000000945e28] arp_process.constprop.14+0x3d8/0x8a0
> [c000003fff70faf0] [c00000000089eb20] __netif_receive_skb_one_core+0x60/0x80
> [c000003fff70fb30] [c0000000008a7d00] netif_receive_skb_internal+0x30/0x110
> [c000003fff70fb70] [c0000000008a888c] napi_gro_receive+0x11c/0x1c0
> [c000003fff70fbb0] [c000000000702afc] tg3_poll_work+0x5fc/0x1060
> [c000003fff70fcb0] [c0000000007035b4] tg3_poll_msix+0x54/0x210
> [c000003fff70fd00] [c0000000008a922c] net_rx_action+0x31c/0x470
> [c000003fff70fe10] [c0000000009f5afc] __do_softirq+0x15c/0x3b4
> [c000003fff70ff00] [c0000000000fddf0] irq_exit+0x100/0x120
> [c000003fff70ff20] [c0000000000193d8] __do_irq+0x98/0x200
> [c000003fff70ff90] [c00000000002af24] call_do_irq+0x14/0x24
> [c000003ca947fa80] [c0000000000195d4] do_IRQ+0x94/0x110
> [c000003ca947fad0] [c000000000008db8] hardware_interrupt_common+0x158/0x160
> --- interrupt: 501 at replay_interrupt_return+0x0/0x4
>     LR = arch_local_irq_restore+0x84/0x90
> [c000003ca947fdc0] [0000000000080000] 0x80000 (unreliable)
> [c000003ca947fde0] [c000000000181f60] rcu_idle_exit+0xa0/0xd0
> [c000003ca947fe30] [c000000000136d08] do_idle+0x1c8/0x3a0
> [c000003ca947fec0] [c0000000001370b4] cpu_startup_entry+0x34/0x40
> [c000003ca947fef0] [c0000000000467f4] start_secondary+0x4d4/0x520
> [c000003ca947ff90] [c00000000000b270] start_secondary_prolog+0x10/0x14
> Instruction dump:
> 60000000 60420000 3c4c0101 384214e0 7c0802a6 78630020 f8010010 f821ffd1
> 4bf7b901 60000000 38810020 4bff657d <60000000> 39010020 3d42ffed e94a5d28
> watchdog: CPU 80 became unstuck TB:980802789270
> CPU: 80 PID: 412 Comm: ksoftirqd/80 Not tainted 4.19.0-rc4-autotest-autotest #1
> Call Trace:
> [c000003ca96f7910] [c0000000009d4cec] dump_stack+0xb0/0xf4 (unreliable)
> [c000003ca96f7950] [c00000000002f278] wd_smp_clear_cpu_pending+0x368/0x3f0
> [c000003ca96f7a10] [c00000000002fa48] wd_timer_fn+0x78/0x3a0
> [c000003ca96f7ad0] [c00000000018a3c0] call_timer_fn+0x50/0x1b0
> [c000003ca96f7b50] [c00000000018a658] expire_timers+0x138/0x1e0
> [c000003ca96f7bc0] [c00000000018a7c8] run_timer_softirq+0xc8/0x220
> [c000003ca96f7c50] [c0000000009f5afc] __do_softirq+0x15c/0x3b4
> [c000003ca96f7d40] [c0000000000fdab4] run_ksoftirqd+0x54/0x80
> [c000003ca96f7d60] [c000000000126f10] smpboot_thread_fn+0x290/0x2a0
> [c000003ca96f7dc0] [c0000000001215ac] kthread+0x15c/0x1a0
> [c000003ca96f7e30] [c00000000000bdd4] ret_from_kernel_thread+0x5c/0x68
> bnx2x 0008:01:00.2: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
> bnx2x 0008:01:00.2 enP8p1s0f2: renamed from eth0
> bnx2x 0008:01:00.3: msix capability found
> bnx2x 0008:01:00.3: Using 64-bit DMA iommu bypass
> bnx2x 0008:01:00.3: part number 0-0-0-0
> bnx2x 0008:01:00.3: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
> bnx2x 0008:01:00.3 enP8p1s0f3: renamed from eth0
>
> --
> Regard's
>
> Abdul Haleem
> IBM Linux Technology Centre
>
>
>


* Re: [mainline][ppc][bnx2x] watchdog: CPU 80 self-detected hard LOCKUP @ opal_interrupt+0x28/0x70 when module load/unload
  2018-09-24  9:35 ` Oliver
@ 2018-09-24 10:19   ` Abdul Haleem
  2018-11-15 11:10     ` Abdul Haleem
  0 siblings, 1 reply; 8+ messages in thread
From: Abdul Haleem @ 2018-09-24 10:19 UTC (permalink / raw)
  To: Oliver; +Cc: manvanth, sim, linuxppc-dev, maurosr

On Mon, 2018-09-24 at 19:35 +1000, Oliver wrote:
> On Mon, Sep 24, 2018 at 6:56 PM, Abdul Haleem
> <abdhalee@linux.vnet.ibm.com> wrote:
> > Greeting's
> >
> > bnx2x module load/unload test results in continuous hard LOCKUP trace on
> > my powerpc bare-metal running mainline 4.19.0-rc4 kernel
> >
> > the instruction address points to:
> >
> > 0xc00000000009d048 is in opal_interrupt
> > (arch/powerpc/platforms/powernv/opal-irqchip.c:133).
> > 128
> > 129     static irqreturn_t opal_interrupt(int irq, void *data)
> > 130     {
> > 131             __be64 events;
> > 132
> > 133             opal_handle_interrupt(virq_to_hw(irq), &events);
> > 134             last_outstanding_events = be64_to_cpu(events);
> > 135             if (opal_have_pending_events())
> > 136                     opal_wake_poller();
> > 137
> >
> > trace:
> > bnx2x 0008:01:00.3 enP8p1s0f3: renamed from eth0
> > bnx2x 0008:01:00.3 enP8p1s0f3: using MSI-X  IRQs: sp 297  fp[0] 299 ... fp[7] 306
> > bnx2x 0008:01:00.2 enP8p1s0f2: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
> > bnx2x 0008:01:00.3 enP8p1s0f3: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
> > bnx2x: QLogic 5771x/578xx 10/20-Gigabit Ethernet Driver bnx2x 1.712.30-0 (2014/02/10)
> > bnx2x 0008:01:00.0: msix capability found
> > bnx2x 0008:01:00.0: Using 64-bit DMA iommu bypass
> > bnx2x 0008:01:00.0: part number 0-0-0-0
> > bnx2x 0008:01:00.0: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
> > bnx2x 0008:01:00.0 enP8p1s0f0: renamed from eth0
> > bnx2x 0008:01:00.1: msix capability found
> > bnx2x 0008:01:00.1: Using 64-bit DMA iommu bypass
> > bnx2x 0008:01:00.1: part number 0-0-0-0
> > bnx2x 0008:01:00.0 enP8p1s0f0: using MSI-X  IRQs: sp 267  fp[0] 269 ... fp[7] 276
> > bnx2x 0008:01:00.0 enP8p1s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
> > bnx2x 0008:01:00.1: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
> > bnx2x 0008:01:00.1 enP8p1s0f1: renamed from eth0
> > bnx2x 0008:01:00.2: msix capability found
> > bnx2x 0008:01:00.2: Using 64-bit DMA iommu bypass
> > bnx2x 0008:01:00.2: part number 0-0-0-0
> > bnx2x 0008:01:00.1 enP8p1s0f1: using MSI-X  IRQs: sp 277  fp[0] 279 ... fp[7] 286
> > bnx2x 0008:01:00.1 enP8p1s0f1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
> 
> 
> > watchdog: CPU 80 self-detected hard LOCKUP @ opal_interrupt+0x28/0x70
> > watchdog: CPU 80 TB:980794111093, last heartbeat TB:973959617200 (13348ms ago)
> 
> Ouch, 13 seconds in OPAL. Looks like we trip the hard lockup detector
> once the thread comes back into the kernel so we're not completely
> stuck. At a guess there's some contention on a lock in OPAL due to the
> bind/unbind loop, but i'm not sure why that would be happening.
> 
> Can you give us a copy of the OPAL log? /sys/firmware/opal/msglog)

Oliver, thanks for looking into this. I have sent you a private mail
(the file was 1MB) with the logs attached.

-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre


* Re: [mainline][ppc][bnx2x] watchdog: CPU 80 self-detected hard LOCKUP @ opal_interrupt+0x28/0x70 when module load/unload
  2018-09-24 10:19   ` Abdul Haleem
@ 2018-11-15 11:10     ` Abdul Haleem
  2018-11-15 14:16       ` Abdul Haleem
  0 siblings, 1 reply; 8+ messages in thread
From: Abdul Haleem @ 2018-11-15 11:10 UTC (permalink / raw)
  To: Oliver; +Cc: manvanth, sim, linuxppc-dev, maurosr

On Mon, 2018-09-24 at 15:49 +0530, Abdul Haleem wrote:
> On Mon, 2018-09-24 at 19:35 +1000, Oliver wrote:
> > On Mon, Sep 24, 2018 at 6:56 PM, Abdul Haleem
> > <abdhalee@linux.vnet.ibm.com> wrote:
> > > Greeting's
> > >
> > > bnx2x module load/unload test results in continuous hard LOCKUP trace on
> > > my powerpc bare-metal running mainline 4.19.0-rc4 kernel
> > >
> > > the instruction address points to:
> > >
> > > 0xc00000000009d048 is in opal_interrupt
> > > (arch/powerpc/platforms/powernv/opal-irqchip.c:133).
> > > 128
> > > 129     static irqreturn_t opal_interrupt(int irq, void *data)
> > > 130     {
> > > 131             __be64 events;
> > > 132
> > > 133             opal_handle_interrupt(virq_to_hw(irq), &events);
> > > 134             last_outstanding_events = be64_to_cpu(events);
> > > 135             if (opal_have_pending_events())
> > > 136                     opal_wake_poller();
> > > 137
> > >
> > > trace:
> > > bnx2x 0008:01:00.3 enP8p1s0f3: renamed from eth0
> > > bnx2x 0008:01:00.3 enP8p1s0f3: using MSI-X  IRQs: sp 297  fp[0] 299 ... fp[7] 306
> > > bnx2x 0008:01:00.2 enP8p1s0f2: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
> > > bnx2x 0008:01:00.3 enP8p1s0f3: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
> > > bnx2x: QLogic 5771x/578xx 10/20-Gigabit Ethernet Driver bnx2x 1.712.30-0 (2014/02/10)
> > > bnx2x 0008:01:00.0: msix capability found
> > > bnx2x 0008:01:00.0: Using 64-bit DMA iommu bypass
> > > bnx2x 0008:01:00.0: part number 0-0-0-0
> > > bnx2x 0008:01:00.0: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
> > > bnx2x 0008:01:00.0 enP8p1s0f0: renamed from eth0
> > > bnx2x 0008:01:00.1: msix capability found
> > > bnx2x 0008:01:00.1: Using 64-bit DMA iommu bypass
> > > bnx2x 0008:01:00.1: part number 0-0-0-0
> > > bnx2x 0008:01:00.0 enP8p1s0f0: using MSI-X  IRQs: sp 267  fp[0] 269 ... fp[7] 276
> > > bnx2x 0008:01:00.0 enP8p1s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
> > > bnx2x 0008:01:00.1: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
> > > bnx2x 0008:01:00.1 enP8p1s0f1: renamed from eth0
> > > bnx2x 0008:01:00.2: msix capability found
> > > bnx2x 0008:01:00.2: Using 64-bit DMA iommu bypass
> > > bnx2x 0008:01:00.2: part number 0-0-0-0
> > > bnx2x 0008:01:00.1 enP8p1s0f1: using MSI-X  IRQs: sp 277  fp[0] 279 ... fp[7] 286
> > > bnx2x 0008:01:00.1 enP8p1s0f1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
> > 
> > 
> > > watchdog: CPU 80 self-detected hard LOCKUP @ opal_interrupt+0x28/0x70
> > > watchdog: CPU 80 TB:980794111093, last heartbeat TB:973959617200 (13348ms ago)
> > 
> > Ouch, 13 seconds in OPAL. Looks like we trip the hard lockup detector
> > once the thread comes back into the kernel so we're not completely
> > stuck. At a guess there's some contention on a lock in OPAL due to the
> > bind/unbind loop, but i'm not sure why that would be happening.
> > 
> > Can you give us a copy of the OPAL log? /sys/firmware/opal/msglog)
> 
> Oliver, thanks for looking into this, I have sent a private mail (file
> was 1MB) with logs attached.
> 

Oliver, any luck with the logs I sent?

The warnings also show up on 4.20.0-rc2-next-20181114.

-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre





* Re: [mainline][ppc][bnx2x] watchdog: CPU 80 self-detected hard LOCKUP @ opal_interrupt+0x28/0x70 when module load/unload
  2018-11-15 11:10     ` Abdul Haleem
@ 2018-11-15 14:16       ` Abdul Haleem
  2018-11-16  4:44         ` Michael Ellerman
  0 siblings, 1 reply; 8+ messages in thread
From: Abdul Haleem @ 2018-11-15 14:16 UTC (permalink / raw)
  To: Oliver; +Cc: linuxppc-dev, sim, manvanth, maurosr

On Thu, 2018-11-15 at 16:40 +0530, Abdul Haleem wrote:
> On Mon, 2018-09-24 at 15:49 +0530, Abdul Haleem wrote:
> > On Mon, 2018-09-24 at 19:35 +1000, Oliver wrote:
> > > On Mon, Sep 24, 2018 at 6:56 PM, Abdul Haleem
> > > <abdhalee@linux.vnet.ibm.com> wrote:
> > > > Greeting's
> > > >
> > > > bnx2x module load/unload test results in continuous hard LOCKUP trace on
> > > > my powerpc bare-metal running mainline 4.19.0-rc4 kernel
> > > >
> > > > the instruction address points to:
> > > >
> > > > 0xc00000000009d048 is in opal_interrupt
> > > > (arch/powerpc/platforms/powernv/opal-irqchip.c:133).
> > > > 128
> > > > 129     static irqreturn_t opal_interrupt(int irq, void *data)
> > > > 130     {
> > > > 131             __be64 events;
> > > > 132
> > > > 133             opal_handle_interrupt(virq_to_hw(irq), &events);
> > > > 134             last_outstanding_events = be64_to_cpu(events);
> > > > 135             if (opal_have_pending_events())
> > > > 136                     opal_wake_poller();
> > > > 137
> > > >
> > > > trace:
> > > > bnx2x 0008:01:00.3 enP8p1s0f3: renamed from eth0
> > > > bnx2x 0008:01:00.3 enP8p1s0f3: using MSI-X  IRQs: sp 297  fp[0] 299 ... fp[7] 306
> > > > bnx2x 0008:01:00.2 enP8p1s0f2: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
> > > > bnx2x 0008:01:00.3 enP8p1s0f3: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
> > > > bnx2x: QLogic 5771x/578xx 10/20-Gigabit Ethernet Driver bnx2x 1.712.30-0 (2014/02/10)
> > > > bnx2x 0008:01:00.0: msix capability found
> > > > bnx2x 0008:01:00.0: Using 64-bit DMA iommu bypass
> > > > bnx2x 0008:01:00.0: part number 0-0-0-0
> > > > bnx2x 0008:01:00.0: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
> > > > bnx2x 0008:01:00.0 enP8p1s0f0: renamed from eth0
> > > > bnx2x 0008:01:00.1: msix capability found
> > > > bnx2x 0008:01:00.1: Using 64-bit DMA iommu bypass
> > > > bnx2x 0008:01:00.1: part number 0-0-0-0
> > > > bnx2x 0008:01:00.0 enP8p1s0f0: using MSI-X  IRQs: sp 267  fp[0] 269 ... fp[7] 276
> > > > bnx2x 0008:01:00.0 enP8p1s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
> > > > bnx2x 0008:01:00.1: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
> > > > bnx2x 0008:01:00.1 enP8p1s0f1: renamed from eth0
> > > > bnx2x 0008:01:00.2: msix capability found
> > > > bnx2x 0008:01:00.2: Using 64-bit DMA iommu bypass
> > > > bnx2x 0008:01:00.2: part number 0-0-0-0
> > > > bnx2x 0008:01:00.1 enP8p1s0f1: using MSI-X  IRQs: sp 277  fp[0] 279 ... fp[7] 286
> > > > bnx2x 0008:01:00.1 enP8p1s0f1: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
> > > 
> > > 
> > > > watchdog: CPU 80 self-detected hard LOCKUP @ opal_interrupt+0x28/0x70
> > > > watchdog: CPU 80 TB:980794111093, last heartbeat TB:973959617200 (13348ms ago)
> > > 
> > > Ouch, 13 seconds in OPAL. Looks like we trip the hard lockup detector
> > > once the thread comes back into the kernel so we're not completely
> > > stuck. At a guess there's some contention on a lock in OPAL due to the
> > > bind/unbind loop, but i'm not sure why that would be happening.
> > > 
> > > Can you give us a copy of the OPAL log? /sys/firmware/opal/msglog)
> > 
> > Oliver, thanks for looking into this, I have sent a private mail (file
> > was 1MB) with logs attached.
> > 
> 
> Oliver, any luck on the logs given.
> 
> Warnings also show up on 4.20.0-rc2-next-20181114

We have a fix available: https://patchwork.ozlabs.org/patch/998054/

It fixed the problem.

-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre





* Re: [mainline][ppc][bnx2x] watchdog: CPU 80 self-detected hard LOCKUP @ opal_interrupt+0x28/0x70 when module load/unload
  2018-11-15 14:16       ` Abdul Haleem
@ 2018-11-16  4:44         ` Michael Ellerman
  2018-11-16  5:01           ` Abdul Haleem
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Ellerman @ 2018-11-16  4:44 UTC (permalink / raw)
  To: Abdul Haleem, Oliver; +Cc: manvanth, sim, linuxppc-dev, maurosr

Abdul Haleem <abdhalee@linux.vnet.ibm.com> writes:
> On Thu, 2018-11-15 at 16:40 +0530, Abdul Haleem wrote:
>> On Mon, 2018-09-24 at 15:49 +0530, Abdul Haleem wrote:
>> > On Mon, 2018-09-24 at 19:35 +1000, Oliver wrote:
>> > > On Mon, Sep 24, 2018 at 6:56 PM, Abdul Haleem
>> > > <abdhalee@linux.vnet.ibm.com> wrote:
>> > > > Greeting's
>> > > >
>> > > > bnx2x module load/unload test results in continuous hard LOCKUP trace on
>> > > > my powerpc bare-metal running mainline 4.19.0-rc4 kernel
                                                ^^^^^^^^^^
>> 
>> Warnings also show up on 4.20.0-rc2-next-20181114
>
> We have a patch fix available https://patchwork.ozlabs.org/patch/998054/
>
> It fixed the problem.

But the bug it fixes wasn't present in 4.19.0-rc4, which is the version
you originally reported against. Or is that version string not accurate?

cheers


* Re: [mainline][ppc][bnx2x] watchdog: CPU 80 self-detected hard LOCKUP @ opal_interrupt+0x28/0x70 when module load/unload
  2018-11-16  4:44         ` Michael Ellerman
@ 2018-11-16  5:01           ` Abdul Haleem
  2018-11-16 10:02             ` Michael Ellerman
  0 siblings, 1 reply; 8+ messages in thread
From: Abdul Haleem @ 2018-11-16  5:01 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, sim, Oliver, manvanth, maurosr

On Fri, 2018-11-16 at 15:44 +1100, Michael Ellerman wrote:
> Abdul Haleem <abdhalee@linux.vnet.ibm.com> writes:
> > On Thu, 2018-11-15 at 16:40 +0530, Abdul Haleem wrote:
> >> On Mon, 2018-09-24 at 15:49 +0530, Abdul Haleem wrote:
> >> > On Mon, 2018-09-24 at 19:35 +1000, Oliver wrote:
> >> > > On Mon, Sep 24, 2018 at 6:56 PM, Abdul Haleem
> >> > > <abdhalee@linux.vnet.ibm.com> wrote:
> >> > > > Greeting's
> >> > > >
> >> > > > bnx2x module load/unload test results in continuous hard LOCKUP trace on
> >> > > > my powerpc bare-metal running mainline 4.19.0-rc4 kernel
>                                                 ^^^^^^^^^^
> >> 
> >> Warnings also show up on 4.20.0-rc2-next-20181114
> >
> > We have a patch fix available https://patchwork.ozlabs.org/patch/998054/
> >
> > It fixed the problem.
> 
> But the bug it fixes wasn't present in 4.19.0-rc4, which is the version
> you originally reported against. Or is that version string not accurate?

Yes, the version string was wrong. The bug was first seen on linux-next
4.19.0-rc3-next-20180913 and on mainline 4.19.0-rc8.

-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre





* Re: [mainline][ppc][bnx2x] watchdog: CPU 80 self-detected hard LOCKUP @ opal_interrupt+0x28/0x70 when module load/unload
  2018-11-16  5:01           ` Abdul Haleem
@ 2018-11-16 10:02             ` Michael Ellerman
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2018-11-16 10:02 UTC (permalink / raw)
  To: Abdul Haleem; +Cc: linuxppc-dev, sim, Oliver, manvanth, maurosr

Abdul Haleem <abdhalee@linux.vnet.ibm.com> writes:
> On Fri, 2018-11-16 at 15:44 +1100, Michael Ellerman wrote:
>> Abdul Haleem <abdhalee@linux.vnet.ibm.com> writes:
>> > On Thu, 2018-11-15 at 16:40 +0530, Abdul Haleem wrote:
>> >> On Mon, 2018-09-24 at 15:49 +0530, Abdul Haleem wrote:
>> >> > On Mon, 2018-09-24 at 19:35 +1000, Oliver wrote:
>> >> > > On Mon, Sep 24, 2018 at 6:56 PM, Abdul Haleem
>> >> > > <abdhalee@linux.vnet.ibm.com> wrote:
>> >> > > > Greeting's
>> >> > > >
>> >> > > > bnx2x module load/unload test results in continuous hard LOCKUP trace on
>> >> > > > my powerpc bare-metal running mainline 4.19.0-rc4 kernel
>>                                                 ^^^^^^^^^^
>> >> 
>> >> Warnings also show up on 4.20.0-rc2-next-20181114
>> >
>> > We have a patch fix available https://patchwork.ozlabs.org/patch/998054/
>> >
>> > It fixed the problem.
>> 
>> But the bug it fixes wasn't present in 4.19.0-rc4, which is the version
>> you originally reported against. Or is that version string not accurate?
>
> Yes, version string wrong. the bug was first seen on linux-next
> 4.19.0-rc3-next-20180913 and on mainline version 4.19.0-rc8

OK. The commit that the above patch fixes was first in next-20181015, or
mainline as of v4.20-rc1~24.

cheers

