Race between ipoib_cm_handle_tx_wc and ipoib_cm_tx

public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed

* Race between ipoib_cm_handle_tx_wc and ipoib_cm_tx_destroy crashing.
@ 2016-08-11 14:11 Nikolay Borisov
       [not found] ` <57AC8799.8060003-6AxghH7DbtA@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Nikolay Borisov @ 2016-08-11 14:11 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
  Cc: Doug Ledford, yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA

Hello list, 

I've come across yet another ipoib issue. I got the following crash, 
while executing "ifdown ib0". At this time the infiniband network was working
as expected (e.g. no issue like the ones I reported before with tx queue
time outs, etc) : 

[721677.936044] ib0: timing out; 1 sends not completed
[721678.760114] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
[721678.760337] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[721679.081771] BUG: unable to handle kernel paging request at ffffc900358e24d8
[721679.081776] IP: [<ffffffffa01edf20>] ipoib_dma_unmap_tx+0x20/0x170 [ib_ipoib]
[721679.081782] PGD 1fff432067 PUD 3ffec01067 PMD 1e8f064067 PTE 0
[721679.081785] Oops: 0000 [#1] SMP 
[721679.081819] CPU: 0 PID: 3451 Comm: qib_cq0 Tainted: G           O    4.4.14-clouder3 #26
[721679.081821] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.2 01/16/2015
[721679.081822] task: ffff881fee8bb700 ti: ffff881ff0144000 task.ti: ffff881ff0144000
[721679.081823] RIP: 0010:[<ffffffffa01edf20>]  [<ffffffffa01edf20>] ipoib_dma_unmap_tx+0x20/0x170 [ib_ipoib]
[721679.081826] RSP: 0000:ffff881fff803e00  EFLAGS: 00010286
[721679.081827] RAX: ffffc900358e24e0 RBX: ffffc900358e24d8 RCX: 0000000000000100
[721679.081828] RDX: 000000000000069b RSI: ffffc900358e24d8 RDI: ffff881ff25c8700
[721679.081829] RBP: ffff881fff803e30 R08: 0000000000000002 R09: ffffc9001b7df000
[721679.081830] R10: 00000000000aebb9 R11: 0000000000000000 R12: 0000000000000000
[721679.081831] R13: 0000000000000000 R14: ffff881ff25c8700 R15: ffff881ff25c8000
[721679.081832] FS:  0000000000000000(0000) GS:ffff881fff800000(0000) knlGS:0000000000000000
[721679.081833] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[721679.081834] CR2: ffffc900358e24d8 CR3: 0000001fa3b4f000 CR4: 00000000001406f0
[721679.081834] Stack:
[721679.081835]  0000000000000000 ffffc900358e24d8 0000000000000000 ffff881fdf4b2c00
[721679.081837]  0000000000000000 ffff881ff25c8000 ffff881fff803e80 ffffffffa01f5660
[721679.081839]  ffff881ff25c9068 0000000040000059 ffff881ff25c8700 ffff881ff25c8710
[721679.081840] Call Trace:
[721679.081842]  <IRQ> 
[721679.081845]  [<ffffffffa01f5660>] ipoib_cm_handle_tx_wc+0x70/0x280 [ib_ipoib]
[721679.081848]  [<ffffffffa01ee487>] ipoib_poll+0xd7/0x160 [ib_ipoib]
[721679.081852]  [<ffffffff8154d36c>] net_rx_action+0x1ec/0x330
[721679.081855]  [<ffffffff81057337>] __do_softirq+0x147/0x310
[721679.081866]  [<ffffffffa00b601f>] ? send_complete+0x1f/0x60 [ib_qib]
[721679.081869]  [<ffffffff8161618c>] do_softirq_own_stack+0x1c/0x30
[721679.081869]  <EOI> 
[721679.081871]  [<ffffffff810567bb>] do_softirq.part.17+0x3b/0x40
[721679.081873]  [<ffffffff81056876>] __local_bh_enable_ip+0xb6/0xc0
[721679.081877]  [<ffffffffa00b604b>] send_complete+0x4b/0x60 [ib_qib]
[721679.081881]  [<ffffffff81071bdb>] kthread_worker_fn+0xbb/0x1e0
[721679.081883]  [<ffffffff81071b20>] ? kthread_create_on_node+0x180/0x180
[721679.081885]  [<ffffffff8107161f>] kthread+0xef/0x110
[721679.081887]  [<ffffffff81071530>] ? kthread_park+0x60/0x60
[721679.081889]  [<ffffffff816149ff>] ret_from_fork+0x3f/0x70
[721679.081891]  [<ffffffff81071530>] ? kthread_park+0x60/0x60
[721679.081892] Code: 48 8b 05 b4 63 a2 e1 eb a3 66 90 0f 1f 44 00 00 55 48 8d 46 08 48 89 e5 41 57 41 56 49 89 fe 41 55 45 31 ed 41 54 53 48 83 ec 08 <4c> 8b 3e 48 89 45 d0 41 8b 97 80 00 00 00 41 8b 87 84 00 00 00 
[721679.081909] RIP  [<ffffffffa01edf20>] ipoib_dma_unmap_tx+0x20/0x170 [ib_ipoib]
[721679.081911]  RSP <ffff881fff803e00>
[721679.081912] CR2: ffffc900358e24d8

ipoib_dma_unmap_tx+0x20 is : struct sk_buff *skb = tx_req->skb;
The address of tx_req is ffffc900358e24d8, which apparently
is unmapped address. 

The warning message "ib0: timing out; 1 sends not completed", 
came from  ipoib_cm_tx_destroy. This means the "goto timeout" 
statement has executed, triggering the freeing of the ipoib_cm_tx
state, eventually unmapping the vmalloced. At the time of the crash ipoib_cm_tx looks like: 

struct ipoib_cm_tx {
  id = 0xffff881fdf4b2c60, 
  qp = 0xffff8801c0bc8000, 
  list = {
    next = 0xdead000000000100, 
    prev = 0xdead000000000200
  }, 
  dev = 0xffff881ff25c8000, 
  neigh = 0x0, 
  path = 0xffff881f3f7a7500, 
  tx_ring = 0xffffc900358df000, 
  tx_head = 101722, 
  tx_tail = 101722, 
  flags = 1, 
  mtu = 65524
}

The list is dead which corresponds to the list_del(&p->list);
in cm_tx_reap and the tx_head/tx_tail are equal, corresponding
to the loop in ipoib_cm_tx_destroy. 

Reading the code in ipoib_cm_handle_tx_wc I wasn't able to figure out
how is this function synchronized against parallel destruction of 
underlying ipoib_cm_tx, which seems to be the case here. 

Regards, 
Nikolay



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 3+ messages in thread

[parent not found: <57AC8799.8060003-6AxghH7DbtA@public.gmane.org>]

* Re: Race between ipoib_cm_handle_tx_wc and ipoib_cm_tx_destroy crashing.
       [not found] ` <57AC8799.8060003-6AxghH7DbtA@public.gmane.org>
@ 2016-08-11 15:10   ` Erez Shitrit
       [not found]     ` <CAAk-MO8ftavVagQSXte_7yRf=iwHRXk-123wWRmmJELW+RVC7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Erez Shitrit @ 2016-08-11 15:10 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Doug Ledford,
	Yuval Shaia

On Thu, Aug 11, 2016 at 5:11 PM, Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> wrote:
> Hello list,
>
> I've come across yet another ipoib issue. I got the following crash,
> while executing "ifdown ib0". At this time the infiniband network was working
> as expected (e.g. no issue like the ones I reported before with tx queue
> time outs, etc) :
>

I think, it is all in the same area (as i already told you), somehow
there are missing events from the FW.

The scenario here can be explained by hanged/delayed FW/HW, please see below:

> [721677.936044] ib0: timing out; 1 sends not completed

The driver waits for 5 sec for completions from the HW/FW, they didn't
come, so the driver cleans all the skb including the dma mapping.

> [721678.760114] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
> [721678.760337] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
> [721679.081771] BUG: unable to handle kernel paging request at ffffc900358e24d8
> [721679.081776] IP: [<ffffffffa01edf20>] ipoib_dma_unmap_tx+0x20/0x170 [ib_ipoib]
> [721679.081782] PGD 1fff432067 PUD 3ffec01067 PMD 1e8f064067 PTE 0
> [721679.081785] Oops: 0000 [#1] SMP
> [721679.081819] CPU: 0 PID: 3451 Comm: qib_cq0 Tainted: G           O    4.4.14-clouder3 #26
> [721679.081821] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.2 01/16/2015
> [721679.081822] task: ffff881fee8bb700 ti: ffff881ff0144000 task.ti: ffff881ff0144000
> [721679.081823] RIP: 0010:[<ffffffffa01edf20>]  [<ffffffffa01edf20>] ipoib_dma_unmap_tx+0x20/0x170 [ib_ipoib]
> [721679.081826] RSP: 0000:ffff881fff803e00  EFLAGS: 00010286
> [721679.081827] RAX: ffffc900358e24e0 RBX: ffffc900358e24d8 RCX: 0000000000000100
> [721679.081828] RDX: 000000000000069b RSI: ffffc900358e24d8 RDI: ffff881ff25c8700
> [721679.081829] RBP: ffff881fff803e30 R08: 0000000000000002 R09: ffffc9001b7df000
> [721679.081830] R10: 00000000000aebb9 R11: 0000000000000000 R12: 0000000000000000
> [721679.081831] R13: 0000000000000000 R14: ffff881ff25c8700 R15: ffff881ff25c8000
> [721679.081832] FS:  0000000000000000(0000) GS:ffff881fff800000(0000) knlGS:0000000000000000
> [721679.081833] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [721679.081834] CR2: ffffc900358e24d8 CR3: 0000001fa3b4f000 CR4: 00000000001406f0
> [721679.081834] Stack:
> [721679.081835]  0000000000000000 ffffc900358e24d8 0000000000000000 ffff881fdf4b2c00
> [721679.081837]  0000000000000000 ffff881ff25c8000 ffff881fff803e80 ffffffffa01f5660
> [721679.081839]  ffff881ff25c9068 0000000040000059 ffff881ff25c8700 ffff881ff25c8710
> [721679.081840] Call Trace:
> [721679.081842]  <IRQ>
> [721679.081845]  [<ffffffffa01f5660>] ipoib_cm_handle_tx_wc+0x70/0x280 [ib_ipoib]

meanwhile the HW/FW returned the completion event of the send (after
the driver was waiting for it during 5 sec, and deleted that skb)
napi was called and the driver tries to unmap memory area that was
already was freed ..

-->>panic

> [721679.081848]  [<ffffffffa01ee487>] ipoib_poll+0xd7/0x160 [ib_ipoib]
> [721679.081852]  [<ffffffff8154d36c>] net_rx_action+0x1ec/0x330
> [721679.081855]  [<ffffffff81057337>] __do_softirq+0x147/0x310
> [721679.081866]  [<ffffffffa00b601f>] ? send_complete+0x1f/0x60 [ib_qib]
> [721679.081869]  [<ffffffff8161618c>] do_softirq_own_stack+0x1c/0x30
> [721679.081869]  <EOI>
> [721679.081871]  [<ffffffff810567bb>] do_softirq.part.17+0x3b/0x40
> [721679.081873]  [<ffffffff81056876>] __local_bh_enable_ip+0xb6/0xc0
> [721679.081877]  [<ffffffffa00b604b>] send_complete+0x4b/0x60 [ib_qib]
> [721679.081881]  [<ffffffff81071bdb>] kthread_worker_fn+0xbb/0x1e0
> [721679.081883]  [<ffffffff81071b20>] ? kthread_create_on_node+0x180/0x180
> [721679.081885]  [<ffffffff8107161f>] kthread+0xef/0x110
> [721679.081887]  [<ffffffff81071530>] ? kthread_park+0x60/0x60
> [721679.081889]  [<ffffffff816149ff>] ret_from_fork+0x3f/0x70
> [721679.081891]  [<ffffffff81071530>] ? kthread_park+0x60/0x60
> [721679.081892] Code: 48 8b 05 b4 63 a2 e1 eb a3 66 90 0f 1f 44 00 00 55 48 8d 46 08 48 89 e5 41 57 41 56 49 89 fe 41 55 45 31 ed 41 54 53 48 83 ec 08 <4c> 8b 3e 48 89 45 d0 41 8b 97 80 00 00 00 41 8b 87 84 00 00 00
> [721679.081909] RIP  [<ffffffffa01edf20>] ipoib_dma_unmap_tx+0x20/0x170 [ib_ipoib]
> [721679.081911]  RSP <ffff881fff803e00>
> [721679.081912] CR2: ffffc900358e24d8
>
> ipoib_dma_unmap_tx+0x20 is : struct sk_buff *skb = tx_req->skb;
> The address of tx_req is ffffc900358e24d8, which apparently
> is unmapped address.
>
> The warning message "ib0: timing out; 1 sends not completed",
> came from  ipoib_cm_tx_destroy. This means the "goto timeout"
> statement has executed, triggering the freeing of the ipoib_cm_tx
> state, eventually unmapping the vmalloced. At the time of the crash ipoib_cm_tx looks like:
>
> struct ipoib_cm_tx {
>   id = 0xffff881fdf4b2c60,
>   qp = 0xffff8801c0bc8000,
>   list = {
>     next = 0xdead000000000100,
>     prev = 0xdead000000000200
>   },
>   dev = 0xffff881ff25c8000,
>   neigh = 0x0,
>   path = 0xffff881f3f7a7500,
>   tx_ring = 0xffffc900358df000,
>   tx_head = 101722,
>   tx_tail = 101722,
>   flags = 1,
>   mtu = 65524
> }
>
> The list is dead which corresponds to the list_del(&p->list);
> in cm_tx_reap and the tx_head/tx_tail are equal, corresponding
> to the loop in ipoib_cm_tx_destroy.
>
> Reading the code in ipoib_cm_handle_tx_wc I wasn't able to figure out
> how is this function synchronized against parallel destruction of
> underlying ipoib_cm_tx, which seems to be the case here.

I think you need some debug from the HW/FW here.

>
> Regards,
> Nikolay
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 3+ messages in thread

[parent not found: <CAAk-MO8ftavVagQSXte_7yRf=iwHRXk-123wWRmmJELW+RVC7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: Race between ipoib_cm_handle_tx_wc and ipoib_cm_tx_destroy crashing.
       [not found]     ` <CAAk-MO8ftavVagQSXte_7yRf=iwHRXk-123wWRmmJELW+RVC7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-08-11 15:39       ` Nikolay Borisov
  0 siblings, 0 replies; 3+ messages in thread
From: Nikolay Borisov @ 2016-08-11 15:39 UTC (permalink / raw)
  To: Erez Shitrit
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Doug Ledford,
	Yuval Shaia



On 08/11/2016 06:10 PM, Erez Shitrit wrote:
> On Thu, Aug 11, 2016 at 5:11 PM, Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org> wrote:
>> Hello list,
>>
>> I've come across yet another ipoib issue. I got the following crash,
>> while executing "ifdown ib0". At this time the infiniband network was working
>> as expected (e.g. no issue like the ones I reported before with tx queue
>> time outs, etc) :
>>
> 
> I think, it is all in the same area (as i already told you), somehow
> there are missing events from the FW.
> 
> The scenario here can be explained by hanged/delayed FW/HW, please see below:
> 
>> [721677.936044] ib0: timing out; 1 sends not completed
> 
> The driver waits for 5 sec for completions from the HW/FW, they didn't
> come, so the driver cleans all the skb including the dma mapping.
> 
>> [721678.760114] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
>> [721678.760337] IPv6: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
>> [721679.081771] BUG: unable to handle kernel paging request at ffffc900358e24d8
>> [721679.081776] IP: [<ffffffffa01edf20>] ipoib_dma_unmap_tx+0x20/0x170 [ib_ipoib]
>> [721679.081782] PGD 1fff432067 PUD 3ffec01067 PMD 1e8f064067 PTE 0
>> [721679.081785] Oops: 0000 [#1] SMP
>> [721679.081819] CPU: 0 PID: 3451 Comm: qib_cq0 Tainted: G           O    4.4.14-clouder3 #26
>> [721679.081821] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.2 01/16/2015
>> [721679.081822] task: ffff881fee8bb700 ti: ffff881ff0144000 task.ti: ffff881ff0144000
>> [721679.081823] RIP: 0010:[<ffffffffa01edf20>]  [<ffffffffa01edf20>] ipoib_dma_unmap_tx+0x20/0x170 [ib_ipoib]
>> [721679.081826] RSP: 0000:ffff881fff803e00  EFLAGS: 00010286
>> [721679.081827] RAX: ffffc900358e24e0 RBX: ffffc900358e24d8 RCX: 0000000000000100
>> [721679.081828] RDX: 000000000000069b RSI: ffffc900358e24d8 RDI: ffff881ff25c8700
>> [721679.081829] RBP: ffff881fff803e30 R08: 0000000000000002 R09: ffffc9001b7df000
>> [721679.081830] R10: 00000000000aebb9 R11: 0000000000000000 R12: 0000000000000000
>> [721679.081831] R13: 0000000000000000 R14: ffff881ff25c8700 R15: ffff881ff25c8000
>> [721679.081832] FS:  0000000000000000(0000) GS:ffff881fff800000(0000) knlGS:0000000000000000
>> [721679.081833] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [721679.081834] CR2: ffffc900358e24d8 CR3: 0000001fa3b4f000 CR4: 00000000001406f0
>> [721679.081834] Stack:
>> [721679.081835]  0000000000000000 ffffc900358e24d8 0000000000000000 ffff881fdf4b2c00
>> [721679.081837]  0000000000000000 ffff881ff25c8000 ffff881fff803e80 ffffffffa01f5660
>> [721679.081839]  ffff881ff25c9068 0000000040000059 ffff881ff25c8700 ffff881ff25c8710
>> [721679.081840] Call Trace:
>> [721679.081842]  <IRQ>
>> [721679.081845]  [<ffffffffa01f5660>] ipoib_cm_handle_tx_wc+0x70/0x280 [ib_ipoib]
> 
> meanwhile the HW/FW returned the completion event of the send (after
> the driver was waiting for it during 5 sec, and deleted that skb)
> napi was called and the driver tries to unmap memory area that was
> already was freed ..
> 
> -->>panic

But even then I think the current design is problematic, because when
napi is called on an-already freed skb/state it should be able to detect
that and just do nothing. Otherwise we can get such a situation and kill
the whole machine. Even if the ib is not operational for whatever reason
it shouldn't be killing the whole machine.


> 
>> [721679.081848]  [<ffffffffa01ee487>] ipoib_poll+0xd7/0x160 [ib_ipoib]
>> [721679.081852]  [<ffffffff8154d36c>] net_rx_action+0x1ec/0x330
>> [721679.081855]  [<ffffffff81057337>] __do_softirq+0x147/0x310
>> [721679.081866]  [<ffffffffa00b601f>] ? send_complete+0x1f/0x60 [ib_qib]
>> [721679.081869]  [<ffffffff8161618c>] do_softirq_own_stack+0x1c/0x30
>> [721679.081869]  <EOI>
>> [721679.081871]  [<ffffffff810567bb>] do_softirq.part.17+0x3b/0x40
>> [721679.081873]  [<ffffffff81056876>] __local_bh_enable_ip+0xb6/0xc0
>> [721679.081877]  [<ffffffffa00b604b>] send_complete+0x4b/0x60 [ib_qib]
>> [721679.081881]  [<ffffffff81071bdb>] kthread_worker_fn+0xbb/0x1e0
>> [721679.081883]  [<ffffffff81071b20>] ? kthread_create_on_node+0x180/0x180
>> [721679.081885]  [<ffffffff8107161f>] kthread+0xef/0x110
>> [721679.081887]  [<ffffffff81071530>] ? kthread_park+0x60/0x60
>> [721679.081889]  [<ffffffff816149ff>] ret_from_fork+0x3f/0x70
>> [721679.081891]  [<ffffffff81071530>] ? kthread_park+0x60/0x60
>> [721679.081892] Code: 48 8b 05 b4 63 a2 e1 eb a3 66 90 0f 1f 44 00 00 55 48 8d 46 08 48 89 e5 41 57 41 56 49 89 fe 41 55 45 31 ed 41 54 53 48 83 ec 08 <4c> 8b 3e 48 89 45 d0 41 8b 97 80 00 00 00 41 8b 87 84 00 00 00
>> [721679.081909] RIP  [<ffffffffa01edf20>] ipoib_dma_unmap_tx+0x20/0x170 [ib_ipoib]
>> [721679.081911]  RSP <ffff881fff803e00>
>> [721679.081912] CR2: ffffc900358e24d8
>>
>> ipoib_dma_unmap_tx+0x20 is : struct sk_buff *skb = tx_req->skb;
>> The address of tx_req is ffffc900358e24d8, which apparently
>> is unmapped address.
>>
>> The warning message "ib0: timing out; 1 sends not completed",
>> came from  ipoib_cm_tx_destroy. This means the "goto timeout"
>> statement has executed, triggering the freeing of the ipoib_cm_tx
>> state, eventually unmapping the vmalloced. At the time of the crash ipoib_cm_tx looks like:
>>
>> struct ipoib_cm_tx {
>>   id = 0xffff881fdf4b2c60,
>>   qp = 0xffff8801c0bc8000,
>>   list = {
>>     next = 0xdead000000000100,
>>     prev = 0xdead000000000200
>>   },
>>   dev = 0xffff881ff25c8000,
>>   neigh = 0x0,
>>   path = 0xffff881f3f7a7500,
>>   tx_ring = 0xffffc900358df000,
>>   tx_head = 101722,
>>   tx_tail = 101722,
>>   flags = 1,
>>   mtu = 65524
>> }
>>
>> The list is dead which corresponds to the list_del(&p->list);
>> in cm_tx_reap and the tx_head/tx_tail are equal, corresponding
>> to the loop in ipoib_cm_tx_destroy.
>>
>> Reading the code in ipoib_cm_handle_tx_wc I wasn't able to figure out
>> how is this function synchronized against parallel destruction of
>> underlying ipoib_cm_tx, which seems to be the case here.
> 
> I think you need some debug from the HW/FW here
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-08-11 15:39 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-11 14:11 Race between ipoib_cm_handle_tx_wc and ipoib_cm_tx_destroy crashing Nikolay Borisov
     [not found] ` <57AC8799.8060003-6AxghH7DbtA@public.gmane.org>
2016-08-11 15:10   ` Erez Shitrit
     [not found]     ` <CAAk-MO8ftavVagQSXte_7yRf=iwHRXk-123wWRmmJELW+RVC7Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-08-11 15:39       ` Nikolay Borisov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox