[PATCH] IB/mlx4: Fail post send command on error recovery

public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] IB/mlx4: Fail post send command on error recovery
@ 2013-03-28 18:45 Kleber Sacilotto de Souza
       [not found] ` <1364496315-7588-1-git-send-email-klebers-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Kleber Sacilotto de Souza @ 2013-03-28 18:45 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Roland Dreier, Sean Hefty, Hal Rosenstock, Or Gerlitz,
	Jack Morgenstein, Kleber Sacilotto de Souza

When the PCI adapter is going through error recovery, a call to
mlx4_ib_post_send() will return success without the command actually
arriving to the hardware. Adding a call to pci_channel_offline() to
check the state of the PCI slot and returning an error will allow the
upper layers to be aware that the command didn't succeed.

Signed-off-by: Kleber Sacilotto de Souza <klebers-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/qp.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 35cced2..0fa4f72 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -2216,6 +2216,9 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	__be32 blh;
 	int i;
 
+	if (pci_channel_offline(to_mdev(ibqp->device)->dev->pdev))
+		return -EIO;
+
 	spin_lock_irqsave(&qp->sq.lock, flags);
 
 	ind = qp->sq_next_wqe;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] IB/mlx4: Fail post send command on error recovery
       [not found] ` <1364496315-7588-1-git-send-email-klebers-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2013-04-02  9:15   ` Or Gerlitz
       [not found]     ` <515AA1C6.7070804-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2013-04-02 17:00   ` Roland Dreier
  1 sibling, 1 reply; 12+ messages in thread
From: Or Gerlitz @ 2013-04-02  9:15 UTC (permalink / raw)
  To: Kleber Sacilotto de Souza
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Roland Dreier, Sean Hefty,
	Hal Rosenstock, Jack Morgenstein

On 28/03/2013 20:45, Kleber Sacilotto de Souza wrote:
> When the PCI adapter is going through error recovery, a call to
> mlx4_ib_post_send() will return success without the command actually
> arriving to the hardware. Adding a call to pci_channel_offline() to
> check the state of the PCI slot and returning an error will allow the
> upper layers to be aware that the command didn't succeed.

Is putting this call in fast path common practice in other (e.g 
Ethernet) drivers?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] IB/mlx4: Fail post send command on error recovery
       [not found]     ` <515AA1C6.7070804-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-04-02 11:24       ` Jack Morgenstein
  0 siblings, 0 replies; 12+ messages in thread
From: Jack Morgenstein @ 2013-04-02 11:24 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Kleber Sacilotto de Souza, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Roland Dreier, Sean Hefty, Hal Rosenstock

On Tuesday 02 April 2013 12:15, Or Gerlitz wrote:
> On 28/03/2013 20:45, Kleber Sacilotto de Souza wrote:
> > When the PCI adapter is going through error recovery, a call to
> > mlx4_ib_post_send() will return success without the command actually
> > arriving to the hardware. Adding a call to pci_channel_offline() to
> > check the state of the PCI slot and returning an error will allow the
> > upper layers to be aware that the command didn't succeed.
> 
> Is putting this call in fast path common practice in other (e.g 
> Ethernet) drivers?
> 
> Or.
> 
In addition, have you done any timing tests to see how this affects the
data path?  I am very concerned here that you penalize normal performance
for the sake of a very rare corner case.

-Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] IB/mlx4: Fail post send command on error recovery
       [not found] ` <1364496315-7588-1-git-send-email-klebers-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2013-04-02  9:15   ` Or Gerlitz
@ 2013-04-02 17:00   ` Roland Dreier
       [not found]     ` <CAL1RGDW7wMVmyFhCv-Ei8Mbca-Y9yv+nygzfREU2_TozNSZ60A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Roland Dreier @ 2013-04-02 17:00 UTC (permalink / raw)
  To: Kleber Sacilotto de Souza
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Sean Hefty,
	Hal Rosenstock, Or Gerlitz, Jack Morgenstein

> diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
> index 35cced2..0fa4f72 100644
> --- a/drivers/infiniband/hw/mlx4/qp.c
> +++ b/drivers/infiniband/hw/mlx4/qp.c
> @@ -2216,6 +2216,9 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
>         __be32 blh;
>         int i;
>
> +       if (pci_channel_offline(to_mdev(ibqp->device)->dev->pdev))
> +               return -EIO;
> +
>         spin_lock_irqsave(&qp->sq.lock, flags);
>
>         ind = qp->sq_next_wqe;

To pile on to what Or and Jack asked, why here?  Why not in post_recv?
 Why not in mlx4_en?  What about userspace consumers?  What if the
error condition triggers just after the pci_channel_offline() check?
What if a command is queued but a PCI error occurs before the
completion can be returned?

Is there some practical scenario where this change makes a difference?

I would assume that in case of a PCI error, the driver would notice a
catastrophic error and send that asynchronous event to consumers, who
would know that commands might have been lost.

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] IB/mlx4: Fail post send command on error recovery
       [not found]     ` <CAL1RGDW7wMVmyFhCv-Ei8Mbca-Y9yv+nygzfREU2_TozNSZ60A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-04 13:01       ` Kleber Sacilotto de Souza
       [not found]         ` <515D79B3.4090808-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Kleber Sacilotto de Souza @ 2013-04-04 13:01 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Sean Hefty,
	Hal Rosenstock, Or Gerlitz, Jack Morgenstein

On 04/02/2013 02:00 PM, Roland Dreier wrote:
>> diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
>> index 35cced2..0fa4f72 100644
>> --- a/drivers/infiniband/hw/mlx4/qp.c
>> +++ b/drivers/infiniband/hw/mlx4/qp.c
>> @@ -2216,6 +2216,9 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
>>         __be32 blh;
>>         int i;
>>
>> +       if (pci_channel_offline(to_mdev(ibqp->device)->dev->pdev))
>> +               return -EIO;
>> +
>>         spin_lock_irqsave(&qp->sq.lock, flags);
>>
>>         ind = qp->sq_next_wqe;
> 
> To pile on to what Or and Jack asked, why here?  Why not in post_recv?
>  Why not in mlx4_en?  What about userspace consumers?  What if the
> error condition triggers just after the pci_channel_offline() check?
> What if a command is queued but a PCI error occurs before the
> completion can be returned?
> 
> Is there some practical scenario where this change makes a difference?
> 
> I would assume that in case of a PCI error, the driver would notice a
> catastrophic error and send that asynchronous event to consumers, who
> would know that commands might have been lost.
> 

The problem that I'm trying to solve is that some IB core modules are
hanging waiting on completion queues on their remove path during error
recovery. I've added the pci offline check in post_send, which seemed to
have to solved the problem, but while running other tests I was able to
hit the bug again. Adding the check in post_recv also only hid the
problem for a few testcases.

Adding any check in mlx4_en doesn't make sense in this case, because the
problem is only with IB adapters. The ethernet/RoCE adapters are
recovering fine, the check has been added already on the relevant places
in mlx4_core.

What async event should be sent to consumers before calling the remove
functions? IB_EVENT_DEVICE_FATAL, which is currently sent by mlx4_core
in case of catastrophic error (but not in PCI error recovery), doesn't
seem to be handled by most of the event handlers registered. Sending
IB_EVENT_PORT_ERR seems to solve the problem for most modules, but
rdma_cm, which doesn't have an event handler, is still hanging. Should
we implement an event handler for rdma_cm?

Thanks!

-- 
Kleber Sacilotto de Souza
IBM Linux Technology Center

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] IB/mlx4: Fail post send command on error recovery
       [not found]         ` <515D79B3.4090808-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2013-04-04 14:00           ` Jack Morgenstein
       [not found]             ` <201304041700.40349.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2013-04-04 21:45           ` Or Gerlitz
  1 sibling, 1 reply; 12+ messages in thread
From: Jack Morgenstein @ 2013-04-04 14:00 UTC (permalink / raw)
  To: Kleber Sacilotto de Souza
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Sean Hefty, Hal Rosenstock, Or Gerlitz

On Thursday 04 April 2013 16:01, Kleber Sacilotto de Souza wrote:
> On 04/02/2013 02:00 PM, Roland Dreier wrote:
> >> diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
> >> index 35cced2..0fa4f72 100644
> >> --- a/drivers/infiniband/hw/mlx4/qp.c
> >> +++ b/drivers/infiniband/hw/mlx4/qp.c
> >> @@ -2216,6 +2216,9 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
> >>         __be32 blh;
> >>         int i;
> >>
> >> +       if (pci_channel_offline(to_mdev(ibqp->device)->dev->pdev))
> >> +               return -EIO;
> >> +
> >>         spin_lock_irqsave(&qp->sq.lock, flags);
> >>
> >>         ind = qp->sq_next_wqe;
> > 
> > To pile on to what Or and Jack asked, why here?  Why not in post_recv?
> >  Why not in mlx4_en?  What about userspace consumers?  What if the
> > error condition triggers just after the pci_channel_offline() check?
> > What if a command is queued but a PCI error occurs before the
> > completion can be returned?
> > 
> > Is there some practical scenario where this change makes a difference?
> > 
> > I would assume that in case of a PCI error, the driver would notice a
> > catastrophic error and send that asynchronous event to consumers, who
> > would know that commands might have been lost.
> > 
> 
> The problem that I'm trying to solve is that some IB core modules are
> hanging waiting on completion queues on their remove path during error
> recovery. I've added the pci offline check in post_send, which seemed to
> have to solved the problem, but while running other tests I was able to
> hit the bug again. Adding the check in post_recv also only hid the
> problem for a few testcases.
> 
> Adding any check in mlx4_en doesn't make sense in this case, because the
> problem is only with IB adapters. The ethernet/RoCE adapters are
> recovering fine, the check has been added already on the relevant places
> in mlx4_core.
> 
> What async event should be sent to consumers before calling the remove
> functions? IB_EVENT_DEVICE_FATAL, which is currently sent by mlx4_core
> in case of catastrophic error (but not in PCI error recovery), doesn't
> seem to be handled by most of the event handlers registered. Sending
> IB_EVENT_PORT_ERR seems to solve the problem for most modules, but
> rdma_cm, which doesn't have an event handler, is still hanging. Should
> we implement an event handler for rdma_cm?
>

This won't really help unless ALL userspace apps respond by calling ibv_close_device.
You can check this by running ibv_asyncwatch  (in libibverbs/examples). Until ibv_asyncwatch
is exited the low-level device restart won't work.

-Jack
> 
> Thanks!
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] IB/mlx4: Fail post send command on error recovery
       [not found]             ` <201304041700.40349.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2013-04-04 21:45               ` Or Gerlitz
       [not found]                 ` <CAJZOPZ+dgtQRX_sfcDc=aSOW553Twi0oqNjREeminQ2tnZeEmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Or Gerlitz @ 2013-04-04 21:45 UTC (permalink / raw)
  To: Jack Morgenstein, Kleber Sacilotto de Souza
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Sean Hefty, Hal Rosenstock, Or Gerlitz

On Thu, Apr 4, 2013 at 5:00 PM, Jack Morgenstein <jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>

> This won't really help unless ALL userspace apps respond by calling
> ibv_close_device. You can check this by running ibv_asyncwatch  (in
> libibverbs/examples). Until ibv_asyncwatch
> is exited the low-level device restart won't work.

Jack, so we have two different problems here, the kernel ULPs trying
to reap completions from their CQs which will never happen, and what
you pointed on under which currently the device will not be destroyed.
I think its OK to handle them one @ a time...

Kleber , as for the 1st problem, which kernel consumers are hanging
for ever on their CQs? IPoIB is giving up after sometime e.g see in
ipoib_ib.c "assume the HW is wedged and just free up all our pending
work requests"

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] IB/mlx4: Fail post send command on error recovery
       [not found]         ` <515D79B3.4090808-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2013-04-04 14:00           ` Jack Morgenstein
@ 2013-04-04 21:45           ` Or Gerlitz
       [not found]             ` <CAJZOPZLgCMDmTO-qqZXm9Y9xv+xCh5bezz3A_nBn6BEtB-G+0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Or Gerlitz @ 2013-04-04 21:45 UTC (permalink / raw)
  To: Kleber Sacilotto de Souza
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Sean Hefty, Hal Rosenstock, Or Gerlitz, Jack Morgenstein

On Thu, Apr 4, 2013 at 4:01 PM, Kleber Sacilotto de Souza

> The problem that I'm trying to solve is that some IB core modules are
> hanging waiting on completion queues on their remove path during error recovery.

So maybe patch them to give up after some time?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] IB/mlx4: Fail post send command on error recovery
       [not found]             ` <CAJZOPZLgCMDmTO-qqZXm9Y9xv+xCh5bezz3A_nBn6BEtB-G+0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-04 21:49               ` Roland Dreier
       [not found]                 ` <CAL1RGDU=hn7hMy0ECQ7AOQqmuB8R6+BT6JUudNS_6rPBKr2UtQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Roland Dreier @ 2013-04-04 21:49 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Kleber Sacilotto de Souza,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Sean Hefty,
	Hal Rosenstock, Or Gerlitz, Jack Morgenstein

On Thu, Apr 4, 2013 at 2:45 PM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> The problem that I'm trying to solve is that some IB core modules are
>> hanging waiting on completion queues on their remove path during error recovery.

> So maybe patch them to give up after some time?

I don't know so much about this PCI error recovery stuff but it does
seem sensible to trigger a catastrophic error async event when it
happens (I'm assuming the recovery mechanism resets the adapter).

Then we should fix at least kernel ULPs behave appropriately when they
get such an async event.  And similarly if someone wants to harden
some subset of userspace apps to handle PCI error recovery too, that
would be another step forward.

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] IB/mlx4: Fail post send command on error recovery
       [not found]                 ` <CAJZOPZ+dgtQRX_sfcDc=aSOW553Twi0oqNjREeminQ2tnZeEmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-08 13:51                   ` Kleber Sacilotto de Souza
       [not found]                     ` <5162CB51.3080600-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Kleber Sacilotto de Souza @ 2013-04-08 13:51 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jack Morgenstein, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Sean Hefty,
	Hal Rosenstock, Or Gerlitz

On 04/04/2013 06:45 PM, Or Gerlitz wrote:
> 
> Kleber , as for the 1st problem, which kernel consumers are hanging
> for ever on their CQs? IPoIB is giving up after sometime e.g see in
> ipoib_ib.c "assume the HW is wedged and just free up all our pending
> work requests"
> 

Or, I don't have a very comprehensive testcase to stress most part of
the IB stack during error recovery, but during my tests the kernel
consumer that are still hanging is the ib_sa module, mcast_remove_one()
is waiting for the port completion queue:

INFO: task eehd:4689 blocked for more than 30 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
eehd            D 0000000000000000     0  4689      2 0x00010000
Call Trace:
[c0000000fba83190] [0000000000000001] 0x1 (unreliable)
[c0000000fba83360] [c000000000016188] .__switch_to+0x140/0x268
[c0000000fba83410] [c000000000674f28] .__schedule+0x570/0x8f0
[c0000000fba836b0] [c000000000675bc4] .schedule_timeout+0x334/0x3c8
[c0000000fba837c0] [c000000000674738] .wait_for_common+0x1c0/0x238
[c0000000fba838a0] [d000000002ca230c] .mcast_remove_one+0xfc/0x168 [ib_sa]
[c0000000fba83940] [d000000002bc4f60] .ib_unregister_device+0x78/0x170
[ib_core]
...

Or rdma_cm waiting for the cma_dev completion:

Call Trace:
[c0000000f8fc70f0] [0000000000000001] 0x1 (unreliable)
[c0000000f8fc72c0] [c000000000016188] .__switch_to+0x140/0x268
[c0000000f8fc7370] [c000000000674f28] .__schedule+0x570/0x8f0
[c0000000f8fc7610] [c000000000675bc4] .schedule_timeout+0x334/0x3c8
[c0000000f8fc7720] [c000000000674738] .wait_for_common+0x1c0/0x238
[c0000000f8fc7800] [d000000002f835b0] .cma_process_remove+0x170/0x1a8
[rdma_cm]
[c0000000f8fc78b0] [d000000002f8366c] .cma_remove_one+0x84/0xb0 [rdma_cm]
[c0000000f8fc7940] [d000000002c34f60] .ib_unregister_device+0x78/0x170
[ib_core]
...


Thanks,
kleber

-- 
Kleber Sacilotto de Souza
IBM Linux Technology Center

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] IB/mlx4: Fail post send command on error recovery
       [not found]                 ` <CAL1RGDU=hn7hMy0ECQ7AOQqmuB8R6+BT6JUudNS_6rPBKr2UtQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-08 14:07                   ` Kleber Sacilotto de Souza
  0 siblings, 0 replies; 12+ messages in thread
From: Kleber Sacilotto de Souza @ 2013-04-08 14:07 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Sean Hefty, Hal Rosenstock, Or Gerlitz, Jack Morgenstein

On 04/04/2013 06:49 PM, Roland Dreier wrote:
> 
> I don't know so much about this PCI error recovery stuff but it does
> seem sensible to trigger a catastrophic error async event when it
> happens (I'm assuming the recovery mechanism resets the adapter).

The PCI error recovery in the powerpc architecture, which is where I'm
focusing, works by identifying a misbehaving adapter and freezing its
slot, so that all MMIO writes to that device will be ignored and reads
will return all 1's. When that happens the Linux implementation will
invoke some callbacks on the driver (in this case mlx4_core) to recover
from the error, and reset the slot. The most common procedure is the
driver to remove the adapter and add it back, which is what the mlx4_ib
is trying to do.

> 
> Then we should fix at least kernel ULPs behave appropriately when they
> get such an async event.  And similarly if someone wants to harden
> some subset of userspace apps to handle PCI error recovery too, that
> would be another step forward.
> 

I agree, this seems to be what is missing to have the error recovery
fully functional.

Thanks,

-- 
Kleber Sacilotto de Souza
IBM Linux Technology Center

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] IB/mlx4: Fail post send command on error recovery
       [not found]                     ` <5162CB51.3080600-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2013-04-08 15:47                       ` Or Gerlitz
  0 siblings, 0 replies; 12+ messages in thread
From: Or Gerlitz @ 2013-04-08 15:47 UTC (permalink / raw)
  To: Kleber Sacilotto de Souza
  Cc: Jack Morgenstein, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Sean Hefty,
	Hal Rosenstock, Or Gerlitz

On Mon, Apr 8, 2013 at 4:51 PM, Kleber Sacilotto de Souza
<klebers-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> wrote:

> Or, I don't have a very comprehensive testcase to stress most part of
> the IB stack during error recovery, but during my tests the kernel
> consumer that are still hanging is the ib_sa module, mcast_remove_one()
> is waiting for the port completion queue:

best if you can send the whole relevant picture, e.g use the sysrq "t"
dump when things are hanged and send the stack trace of all (or all
involved) kernel-threads/processes

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-04-08 15:47 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-28 18:45 [PATCH] IB/mlx4: Fail post send command on error recovery Kleber Sacilotto de Souza
     [not found] ` <1364496315-7588-1-git-send-email-klebers-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2013-04-02  9:15   ` Or Gerlitz
     [not found]     ` <515AA1C6.7070804-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-04-02 11:24       ` Jack Morgenstein
2013-04-02 17:00   ` Roland Dreier
     [not found]     ` <CAL1RGDW7wMVmyFhCv-Ei8Mbca-Y9yv+nygzfREU2_TozNSZ60A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-04 13:01       ` Kleber Sacilotto de Souza
     [not found]         ` <515D79B3.4090808-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2013-04-04 14:00           ` Jack Morgenstein
     [not found]             ` <201304041700.40349.jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2013-04-04 21:45               ` Or Gerlitz
     [not found]                 ` <CAJZOPZ+dgtQRX_sfcDc=aSOW553Twi0oqNjREeminQ2tnZeEmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-08 13:51                   ` Kleber Sacilotto de Souza
     [not found]                     ` <5162CB51.3080600-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2013-04-08 15:47                       ` Or Gerlitz
2013-04-04 21:45           ` Or Gerlitz
     [not found]             ` <CAJZOPZLgCMDmTO-qqZXm9Y9xv+xCh5bezz3A_nBn6BEtB-G+0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-04 21:49               ` Roland Dreier
     [not found]                 ` <CAL1RGDU=hn7hMy0ECQ7AOQqmuB8R6+BT6JUudNS_6rPBKr2UtQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-08 14:07                   ` Kleber Sacilotto de Souza

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox