* how to re-use a QP for a new connection
@ 2014-06-20 18:06 Chuck Lever
[not found] ` <36E48CE3-3FB6-4985-9CA5-4D6B800EE3DC-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever @ 2014-06-20 18:06 UTC (permalink / raw)
To: linux-rdma
Hi-
I’m considering a change to xprtrdma that would re-use the QP and
rdma_cm_id after a transport disconnect.
I use rdma_disconnect() and then wait for the TIMEWAIT_EXIT upcall.
But after that, rdma_resolve_addr() always fails (-EINVAL).
What does xprtrdma need to do to get the rdma_cm_id back to the
RDMA_CM_IDLE state so I can reset the QP?
Feel free to tell me this doesn’t make sense.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: how to re-use a QP for a new connection
[not found] ` <36E48CE3-3FB6-4985-9CA5-4D6B800EE3DC-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-06-20 19:41 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A82373993132A8-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Hefty, Sean @ 2014-06-20 19:41 UTC (permalink / raw)
To: Chuck Lever, linux-rdma
> I'm considering a change to xprtrdma that would re-use the QP and
> rdma_cm_id after a transport disconnect.
>
> I use rdma_disconnect() and then wait for the TIMEWAIT_EXIT upcall.
> But after that, rdma_resolve_addr() always fails (-EINVAL).
>
> What does xprtrdma need to do to get the rdma_cm_id back to the
> RDMA_CM_IDLE state so I can reset the QP?
I don't know that the kernel rdma cm code supports this. It's likely that the id would need to have its state reset and data structures cleaned up. That said, I doubt re-use of the rdma_cm_id would provide any real advantage.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: how to re-use a QP for a new connection
[not found] ` <1828884A29C6694DAF28B7E6B8A82373993132A8-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2014-06-20 20:32 ` Chuck Lever
[not found] ` <5F77D836-4EE1-458D-B256-3C0EF4B1F2C2-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever @ 2014-06-20 20:32 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma
Hi Sean-
On Jun 20, 2014, at 3:41 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>> I'm considering a change to xprtrdma that would re-use the QP and
>> rdma_cm_id after a transport disconnect.
>>
>> I use rdma_disconnect() and then wait for the TIMEWAIT_EXIT upcall.
>> But after that, rdma_resolve_addr() always fails (-EINVAL).
>>
>> What does xprtrdma need to do to get the rdma_cm_id back to the
>> RDMA_CM_IDLE state so I can reset the QP?
>
> I don't know that the kernel rdma cm code supports this. It's likely that the id would need to have its state reset and data structures cleaned up. That said, I doubt re-use of the rdma_cm_id would provide any real advantage.
During a remote transport disconnect, the QP leaves RTS.
xprtrdma deals with this in a separate transport connect worker process,
where it creates a new id and qp, and replaces the existing id and qp.
Unfortunately there are parts of xprtrdma (namely FRMR deregistration)
that are not easy to serialize with this reconnect logic.
Re-using the QP would mean no serialization would be needed between
transport reconnect and FRMR deregistration.
If QP re-use is not supported, though, it’s not worth considering any
further.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: how to re-use a QP for a new connection
[not found] ` <5F77D836-4EE1-458D-B256-3C0EF4B1F2C2-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-06-20 21:17 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A8237399313467-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Hefty, Sean @ 2014-06-20 21:17 UTC (permalink / raw)
To: Chuck Lever; +Cc: linux-rdma
> During a remote transport disconnect, the QP leaves RTS.
>
> xprtrdma deals with this in a separate transport connect worker process,
> where it creates a new id and qp, and replaces the existing id and qp.
>
> Unfortunately there are parts of xprtrdma (namely FRMR deregistration)
> that are not easy to serialize with this reconnect logic.
>
> Re-using the QP would mean no serialization would be needed between
> transport reconnect and FRMR deregistration.
>
> If QP re-use is not supported, though, it's not worth considering any
> further.
It may be possible to reuse the QP, just not the rdma_cm_id without additional code changes. Reuse of the rdma_cm_id may also require changes in the underlying IB/iWarp CMs.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: how to re-use a QP for a new connection
[not found] ` <1828884A29C6694DAF28B7E6B8A8237399313467-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2014-06-20 22:24 ` Shirley Ma
[not found] ` <53A4B4A1.50301-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-06-23 15:20 ` Chuck Lever
1 sibling, 1 reply; 14+ messages in thread
From: Shirley Ma @ 2014-06-20 22:24 UTC (permalink / raw)
To: Hefty, Sean, Chuck Lever; +Cc: linux-rdma
The QP can be reused. The rdma_id_private has a field reuseaddr. What additional change is needed besides rdma_set_reuseaddr?
Shirley
On 06/20/2014 02:17 PM, Hefty, Sean wrote:
>> During a remote transport disconnect, the QP leaves RTS.
>>
>> xprtrdma deals with this in a separate transport connect worker process,
>> where it creates a new id and qp, and replaces the existing id and qp.
>>
>> Unfortunately there are parts of xprtrdma (namely FRMR deregistration)
>> that are not easy to serialize with this reconnect logic.
>>
>> Re-using the QP would mean no serialization would be needed between
>> transport reconnect and FRMR deregistration.
>>
>> If QP re-use is not supported, though, it's not worth considering any
>> further.
>
> It may be possible to reuse the QP, just not the rdma_cm_id without additional code changes. Reuse of the rdma_cm_id may also require changes in the underlying IB/iWarp CMs.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: how to re-use a QP for a new connection
[not found] ` <53A4B4A1.50301-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-06-20 22:30 ` Chuck Lever
[not found] ` <905C8760-5964-47F8-8DF2-0C018CBDF695-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever @ 2014-06-20 22:30 UTC (permalink / raw)
To: Shirley Ma; +Cc: Hefty, Sean, linux-rdma
Hi Shirley-
I’ve found that to move the QP back to the IB_QPS_INIT state, I need to
call ib_modify_qp() with a specific set of attributes, including the
pkey_index and port_num.
rdma_init_qp_attr() extracts those attributes. But, when I try to call it
after rdma_disconnect(), the rdma_cm_id is not in the RDMA_CM_IDLE state,
and the call fails.
So I can’t get the QP back to the INIT state unless the rdma_cm_id has
somehow been reset.
I suppose I could call rdma_init_qp_attr() while the transport is still
connected, and save the returned attributes.
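To make that last idea concrete, here is a minimal userspace sketch, not kernel code. The enum and struct are stand-ins for ib_qp_state and the few ib_qp_attr fields mentioned above; conn_cache, cache_init_attr() and qp_step() are hypothetical names. It assumes the usual verbs rule that a QP in the error state has to pass through RESET before it can be moved to INIT again, and it leaves out the SQD and SQE states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for enum ib_qp_state (SQD and SQE omitted). */
enum qp_state { QPS_RESET, QPS_INIT, QPS_RTR, QPS_RTS, QPS_ERR };

/* Stand-in for the ib_qp_attr fields needed for RESET -> INIT. */
struct init_attr {
	uint16_t pkey_index;
	uint8_t  port_num;
	int      access_flags;
};

/* Hypothetical per-transport cache, filled while still connected. */
struct conn_cache {
	struct init_attr attr;
	bool             valid;
};

/* Call while the connection is up, i.e. while the cm_id can still
 * supply these values. */
static void cache_init_attr(struct conn_cache *c, const struct init_attr *a)
{
	assert(c && a);
	c->attr = *a;
	c->valid = true;
}

/* Model of one modify-QP step. RESET and ERR are reachable from any
 * state; otherwise only the forward chain RESET -> INIT -> RTR -> RTS
 * is allowed. Moving to INIT also needs the cached attributes. */
static int qp_step(enum qp_state *cur, enum qp_state next,
		   const struct conn_cache *c)
{
	bool ok;

	switch (next) {
	case QPS_RESET:
	case QPS_ERR:
		ok = true;
		break;
	case QPS_INIT:
		ok = (*cur == QPS_RESET) && c && c->valid;
		break;
	case QPS_RTR:
		ok = (*cur == QPS_INIT);
		break;
	case QPS_RTS:
		ok = (*cur == QPS_RTR);
		break;
	default:
		ok = false;
		break;
	}
	if (!ok)
		return -EINVAL;
	*cur = next;
	return 0;
}
```

In this model a disconnected QP goes ERR -> RESET -> INIT, and the INIT step only succeeds if cache_init_attr() ran before the disconnect. Whether a real provider accepts the saved values unchanged is something I have not verified.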
On Jun 20, 2014, at 6:24 PM, Shirley Ma <shirley.ma-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> The QP can be reused. The rdma_id_private has a field reuseaddr. What additional change is needed besides rdma_set_reuseaddr?
>
> Shirley
>
> On 06/20/2014 02:17 PM, Hefty, Sean wrote:
>>> During a remote transport disconnect, the QP leaves RTS.
>>>
>>> xprtrdma deals with this in a separate transport connect worker process,
>>> where it creates a new id and qp, and replaces the existing id and qp.
>>>
>>> Unfortunately there are parts of xprtrdma (namely FRMR deregistration)
>>> that are not easy to serialize with this reconnect logic.
>>>
>>> Re-using the QP would mean no serialization would be needed between
>>> transport reconnect and FRMR deregistration.
>>>
>>> If QP re-use is not supported, though, it's not worth considering any
>>> further.
>>
>> It may be possible to reuse the QP, just not the rdma_cm_id without additional code changes. Reuse of the rdma_cm_id may also require changes in the underlying IB/iWarp CMs.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: how to re-use a QP for a new connection
[not found] ` <905C8760-5964-47F8-8DF2-0C018CBDF695-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-06-20 22:56 ` Shirley Ma
0 siblings, 0 replies; 14+ messages in thread
From: Shirley Ma @ 2014-06-20 22:56 UTC (permalink / raw)
To: Chuck Lever; +Cc: Hefty, Sean, linux-rdma
On 06/20/2014 03:30 PM, Chuck Lever wrote:
> Hi Shirley-
>
> I’ve found that to move the QP back to the IB_QPS_INIT state, I need to
> call ib_modify_qp() with a specific set of attributes, including the
> pkey_index and port_num.
>
> rdma_init_qp_attr() extracts those attributes. But, when I try to call it
> after rdma_disconnect(), the rdma_cm_id is not in the RDMA_CM_IDLE state,
> and the call fails.
>
> So I can’t get the QP back to the INIT state unless the rdma_cm_id has
> somehow been reset.
I see. We would need something like an rdma_reset_id() to change the cm_id state to RDMA_CM_IDLE.
> I suppose I could call rdma_init_qp_attr() while the transport is still
> connected, and save the returned attributes.
Maybe we can save ib_qp_attr in xprtrdma rpcrdma_ia?
> On Jun 20, 2014, at 6:24 PM, Shirley Ma <shirley.ma-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>
>> The QP can be reused. The rdma_id_private has a field reuseaddr. What additional change is needed besides rdma_set_reuseaddr?
>>
>> Shirley
>>
>> On 06/20/2014 02:17 PM, Hefty, Sean wrote:
>>>> During a remote transport disconnect, the QP leaves RTS.
>>>>
>>>> xprtrdma deals with this in a separate transport connect worker process,
>>>> where it creates a new id and qp, and replaces the existing id and qp.
>>>>
>>>> Unfortunately there are parts of xprtrdma (namely FRMR deregistration)
>>>> that are not easy to serialize with this reconnect logic.
>>>>
>>>> Re-using the QP would mean no serialization would be needed between
>>>> transport reconnect and FRMR deregistration.
>>>>
>>>> If QP re-use is not supported, though, it's not worth considering any
>>>> further.
>>>
>>> It may be possible to reuse the QP, just not the rdma_cm_id without additional code changes. Reuse of the rdma_cm_id may also require changes in the underlying IB/iWarp CMs.
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: how to re-use a QP for a new connection
[not found] ` <1828884A29C6694DAF28B7E6B8A8237399313467-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2014-06-20 22:24 ` Shirley Ma
@ 2014-06-23 15:20 ` Chuck Lever
[not found] ` <8E9844F1-AFDC-4F28-B646-596BCBC3FAA8-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
1 sibling, 1 reply; 14+ messages in thread
From: Chuck Lever @ 2014-06-23 15:20 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma
Hi Sean-
On Jun 20, 2014, at 5:17 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>> During a remote transport disconnect, the QP leaves RTS.
>>
>> xprtrdma deals with this in a separate transport connect worker process,
>> where it creates a new id and qp, and replaces the existing id and qp.
>>
>> Unfortunately there are parts of xprtrdma (namely FRMR deregistration)
>> that are not easy to serialize with this reconnect logic.
>>
>> Re-using the QP would mean no serialization would be needed between
>> transport reconnect and FRMR deregistration.
>>
>> If QP re-use is not supported, though, it's not worth considering any
>> further.
>
> It may be possible to reuse the QP, just not the rdma_cm_id without additional code changes. Reuse of the rdma_cm_id may also require changes in the underlying IB/iWarp CMs.
Steve Wise is helping me with a particular issue where QP re-use might
be helpful.
When an RPC/RDMA transport connection is dropped (for example, the NFS
server crashes), xprtrdma destroys the transport's QP and creates a
new one for the next connection.
We’re not quite sure what IB_WC_WR_FLUSH_ERR means in that instance. Our
theory is there is a gap when the old QP is destroyed:
1. If the HW reports a successful WR completion but the QP no longer
exists, the provider substitutes an IB_WC_WR_FLUSH_ERR completion
2. If the WR is dropped before the HW even saw it, the provider inserts
an IB_WC_WR_FLUSH_ERR completion
So if xprtrdma is trying to submit a FAST_REG_MR WR and the completion
gets flushed, xprtrdma has no way to know whether the rkey was bumped in
the adapter. Thus it has no certainty which rkey to use to invalidate
that FRMR.
I was idly wondering whether re-using the QP during connection loss
would provide a guarantee that xprtrdma would never see case 1 above.
Then IB_WC_WR_FLUSH_ERR on a FAST_REG_MR WR would be a more certain
indication that the HW still has the old rkey.
I suppose that xprtrdma can “hang onto” the QP without re-using it by
simply not destroying it until all WRs scheduled on the old QP are
completed. Is reference counting the QP the usual design pattern to deal
with this case?
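For illustration, the "hang onto the QP" idea could look roughly like the userspace sketch below. It is single-threaded and uses a plain counter; real kernel code would need an atomic count (for example a kref) and would call the actual QP destroy routine where the flag is set. struct qp_holder and its helpers are hypothetical names, not existing xprtrdma code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical wrapper: one reference for the transport, plus one per
 * posted work request whose completion has not been reaped yet. */
struct qp_holder {
	int  refs;
	bool destroyed;	/* set when the last reference is dropped */
};

static void qp_holder_init(struct qp_holder *h)
{
	h->refs = 1;		/* the transport's own reference */
	h->destroyed = false;
}

/* Take a reference before posting a WR on the QP. */
static void qp_holder_get(struct qp_holder *h)
{
	assert(h->refs > 0 && !h->destroyed);
	h->refs++;
}

/* Drop a reference, either from the completion handler or from the
 * connect worker. Returns true if this call dropped the last one;
 * that is where a real implementation would destroy the QP. */
static bool qp_holder_put(struct qp_holder *h)
{
	assert(h->refs > 0);
	if (--h->refs == 0) {
		h->destroyed = true;
		return true;
	}
	return false;
}
```

With this arrangement the connect worker only drops the transport's reference, and the old QP goes away when the last outstanding completion, flushed or not, has been handled.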
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: how to re-use a QP for a new connection
[not found] ` <8E9844F1-AFDC-4F28-B646-596BCBC3FAA8-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-06-23 16:17 ` Devesh Sharma
2014-06-23 16:22 ` Hefty, Sean
1 sibling, 0 replies; 14+ messages in thread
From: Devesh Sharma @ 2014-06-23 16:17 UTC (permalink / raw)
To: Chuck Lever, Hefty, Sean; +Cc: linux-rdma
Hi Chuck,
> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Chuck Lever
> Sent: Monday, June 23, 2014 8:51 PM
> To: Hefty, Sean
> Cc: linux-rdma
> Subject: Re: how to re-use a QP for a new connection
>
> Hi Sean-
>
> On Jun 20, 2014, at 5:17 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>
> >> During a remote transport disconnect, the QP leaves RTS.
> >>
> >> xprtrdma deals with this in a separate transport connect worker
> >> process, where it creates a new id and qp, and replaces the existing id and
> qp.
> >>
> >> Unfortunately there are parts of xprtrdma (namely FRMR
> >> deregistration) that are not easy to serialize with this reconnect logic.
> >>
> >> Re-using the QP would mean no serialization would be needed between
> >> transport reconnect and FRMR deregistration.
> >>
> >> If QP re-use is not supported, though, it's not worth considering any
> >> further.
> >
> > It may be possible to reuse the QP, just not the rdma_cm_id without
> additional code changes. Reuse of the rdma_cm_id may also require
> changes in the underlying IB/iWarp CMs.
>
> Steve Wise is helping me with a particular issue where QP re-use might be
> helpful.
>
> When an RPC/RDMA transport connection is dropped (for example, the NFS
> server crashes), xprtrdma destroys the transport's QP and creates a new one
> for the next connection.
>
> We're not quite sure what IB_WC_WR_FLUSH_ERR means in that instance.
> Our theory is there is a gap when the old QP is destroyed:
>
> 1. If the HW reports a successful WR completion but the QP no longer
> exists, the provider substitutes an IB_WC_WR_FLUSH_ERR completion
The QP still exists, but its state is ERROR. This state change could have multiple causes. The WQE/RQE that
caused the state transition is reported by the h/w in the corresponding CQE. The rest of the CQEs after that are completed
with FLUSH-ERROR. This means data flow cannot happen anymore, and the QP needs a reconnection OR a fresh QP needs to be
created and reconnected.
>
> 2. If the WR is dropped before the HW even saw it, the provider inserts
> an IB_WC_WR_FLUSH_ERR completion
>
> So if xprtrdma is trying to submit a FAST_REG_MR WR and the completion
> gets flushed, xprtrdma has no way to know whether the rkey was bumped in
> the adapter. Thus it has no certainty which rkey to use to invalidate that
> FRMR.
If an FRMR WQE is completed in flush, it must be assumed that the request is _incomplete_.
>
> I was idly wondering whether re-using the QP during connection loss would
> provide a guarantee that xprtrdma would never see case 1 above.
> Then IB_WC_WR_FLUSH_ERR on a FAST_REG_MR WR would be a more
> certain indication that the HW still has the old rkey.
>
> I suppose that xprtrdma can "hang onto" the QP without re-using it by simply
> not destroying it until all WRs scheduled on the old QP are completed. Is
> reference counting the QP the usual design pattern to deal with this case?
Why can we assume that all those FRMRs for which a flush completion is reported are invalid, while the rest are still valid
even if a new connection is in place after some time?
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: how to re-use a QP for a new connection
[not found] ` <8E9844F1-AFDC-4F28-B646-596BCBC3FAA8-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-06-23 16:17 ` Devesh Sharma
@ 2014-06-23 16:22 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A823739931EDD5-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
1 sibling, 1 reply; 14+ messages in thread
From: Hefty, Sean @ 2014-06-23 16:22 UTC (permalink / raw)
To: Chuck Lever; +Cc: linux-rdma
> Steve Wise is helping me with a particular issue where QP re-use might
> be helpful.
>
> When an RPC/RDMA transport connection is dropped (for example, the NFS
> server crashes), xprtrdma destroys the transport's QP and creates a
> new one for the next connection.
If the remote side crashes, the local QP can transition into the error state, which would flush all posted receives. I believe that a WR that has completed in error only has the wr_id field valid.
Note that calling rdma_disconnect() will also transition the QP into the error state.
> We're not quite sure what IB_WC_WR_FLUSH_ERR means in that instance. Our
> theory is there is a gap when the old QP is destroyed:
>
> 1. If the HW reports a successful WR completion but the QP no longer
> exists, the provider substitutes an IB_WC_WR_FLUSH_ERR completion
>
> 2. If the WR is dropped before the HW even saw it, the provider inserts
> an IB_WC_WR_FLUSH_ERR completion
>
> So if xprtrdma is trying to submit a FAST_REG_MR WR and the completion
> gets flushed, xprtrdma has no way to know whether the rkey was bumped in
> the adapter. Thus it has no certainty which rkey to use to invalidate
> that FRMR.
I'm not familiar with the behavior of fast reg mr.
> I was idly wondering whether re-using the QP during connection loss
> would provide a guarantee that xprtrdma would never see case 1 above.
> Then IB_WC_WR_FLUSH_ERR on a FAST_REG_MR WR would be a more certain
> indication that the HW still has the old rkey.
>
> I suppose that xprtrdma can "hang onto" the QP without re-using it by
> simply not destroying it until all WRs scheduled on the old QP are
> completed. Is reference counting the QP the usual design pattern to deal
> with this case?
I _thought_ that destroying the QP would clean up any completion entries in the CQ, but I'm not sure of this. Reference counting should work though.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: how to re-use a QP for a new connection
[not found] ` <1828884A29C6694DAF28B7E6B8A823739931EDD5-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2014-06-23 17:22 ` Chuck Lever
[not found] ` <1F02274F-B3FC-40EE-A46D-FB178EA3781B-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever @ 2014-06-23 17:22 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma
On Jun 23, 2014, at 12:22 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>> Steve Wise is helping me with a particular issue where QP re-use might
>> be helpful.
>>
>> When an RPC/RDMA transport connection is dropped (for example, the NFS
>> server crashes), xprtrdma destroys the transport's QP and creates a
>> new one for the next connection.
>
> If the remote side crashes, the local QP can transition into the error state, which would flush all posted receives. I believe that a WR that has completed in error only has the wr_id field valid.
>
> Note that calling rdma_disconnect() will also transition the QP into the error state.
So on remote disconnect there are two steps:
1. The QP is transitioned to the error state
2. Later, when xprtrdma attempts to reconnect, its transport connect
worker destroys the old QP
I think you and Devesh are suggesting that when the QP is transitioned
to error state in step 1, the provider immediately flushes the send and
completion queues appropriately, leaving no possibility of a completed
WR with a dropped completion.
>
>> We're not quite sure what IB_WC_WR_FLUSH_ERR means in that instance. Our
>> theory is there is a gap when the old QP is destroyed:
>>
>> 1. If the HW reports a successful WR completion but the QP no longer
>> exists, the provider substitutes an IB_WC_WR_FLUSH_ERR completion
>>
>> 2. If the WR is dropped before the HW even saw it, the provider inserts
>> an IB_WC_WR_FLUSH_ERR completion
>>
>> So if xprtrdma is trying to submit a FAST_REG_MR WR and the completion
>> gets flushed, xprtrdma has no way to know whether the rkey was bumped in
>> the adapter. Thus it has no certainty which rkey to use to invalidate
>> that FRMR.
>
> I'm not familiar with the behavior of fast reg mr.
For the record, with both mlx4 and cxgb4, we see FRMRs left valid
after a FAST_REG_MR is flushed during a connection loss. More study
needed, obviously.
>> I was idly wondering whether re-using the QP during connection loss
>> would provide a guarantee that xprtrdma would never see case 1 above.
>> Then IB_WC_WR_FLUSH_ERR on a FAST_REG_MR WR would be a more certain
>> indication that the HW still has the old rkey.
>>
>> I suppose that xprtrdma can "hang onto" the QP without re-using it by
>> simply not destroying it until all WRs scheduled on the old QP are
>> completed. Is reference counting the QP the usual design pattern to deal
>> with this case?
>
> I _thought_ that destroying the QP would clean up any completion entries in the CQ, but I'm not sure of this. Reference counting should work though.
As a workaround, I can comment out the rdma_destroy_qp() call in xprtrdma's
connect worker to see if there’s any change in behavior when the old QP
stays around.
Given that the queues are flushed on RTS->Error, I probably won’t see any
difference at all.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: how to re-use a QP for a new connection
[not found] ` <1F02274F-B3FC-40EE-A46D-FB178EA3781B-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-06-23 17:25 ` Hefty, Sean
[not found] ` <1828884A29C6694DAF28B7E6B8A823739931EE90-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Hefty, Sean @ 2014-06-23 17:25 UTC (permalink / raw)
To: Chuck Lever; +Cc: linux-rdma
> For the record, with both mlx4 and cxgb4, we see FRMRs left valid
> after a FAST_REG_MR is flushed during a connection loss. More study
> needed, obviously.
Is the bug that this type of WR completes in error, but actually exposed the memory region?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: how to re-use a QP for a new connection
[not found] ` <1828884A29C6694DAF28B7E6B8A823739931EE90-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2014-06-23 17:31 ` Chuck Lever
[not found] ` <98556348-B33A-4C2C-9D4E-AEA57FB472CE-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever @ 2014-06-23 17:31 UTC (permalink / raw)
To: Hefty, Sean; +Cc: linux-rdma, Steve Wise
On Jun 23, 2014, at 1:25 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>> For the record, with both mlx4 and cxgb4, we see FRMRs left valid
>> after a FAST_REG_MR is flushed during a connection loss. More study
>> needed, obviously.
>
> Is the bug that this type of WR completes in error, but actually exposed the memory region?
We haven’t checked if the MR is exposed; hadn’t thought of that!
What we do know is that a subsequent LOCAL_INVALIDATE using the rkey that
should work (if FAST_REG_MR had indeed never been done) fails in some cases.
With mlx4, the LINV completes with IB_WC_MW_BIND_ERR. Steve can provide
more detail about the exact failure mode with cxgb4.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: how to re-use a QP for a new connection
[not found] ` <98556348-B33A-4C2C-9D4E-AEA57FB472CE-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-06-23 21:12 ` Steve Wise
0 siblings, 0 replies; 14+ messages in thread
From: Steve Wise @ 2014-06-23 21:12 UTC (permalink / raw)
To: Chuck Lever, Hefty, Sean; +Cc: linux-rdma
On 6/23/2014 12:31 PM, Chuck Lever wrote:
> On Jun 23, 2014, at 1:25 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>
>>> For the record, with both mlx4 and cxgb4, we see FRMRs left valid
>>> after a FAST_REG_MR is flushed during a connection loss. More study
>>> needed, obviously.
>> Is the bug that this type of WR completes in error, but actually exposed the memory region?
> We haven’t checked if the MR is exposed; hadn’t thought of that!
I don't think this is a bug. It is a race where HW is in the process of
fast-registering the memory at the time the QP is moved out of RTS,
causing all pending work requests to get FLUSHED. I looked at both the
IBTA IB and IETF iWARP Verbs specs, and neither states explicitly what
FLUSHED status means. They both say "at the time the QP was moved
to ERROR the work request was not complete". That doesn't indicate
that the work request was canceled or didn't actually complete. At
least that's how I read it. Regardless, the Chelsio hardware behaves
this way. And apparently the mlx hardware does too.
Anyway, for cxgb4 at least, the FRMR can be left in the valid state.
The correct procedure, in the case of a fast-reg WR completing as
FLUSHED, is to dereg the MR if you want to ensure the region is invalidated.
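As a rough illustration of that procedure, here is a small userspace sketch of the bookkeeping a consumer might do. All names are hypothetical stand-ins (they are not the kernel's ib_wc_status values or xprtrdma structures), and treating every non-success status as "unknown" is my own conservative assumption, not something the specs spell out.

```c
#include <assert.h>

/* Stand-ins for the completion statuses discussed in this thread. */
enum wc_status   { WC_SUCCESS, WC_FLUSHED, WC_OTHER_ERROR };

/* What the consumer believes about the FRMR after a fast-reg WR. */
enum frmr_state  { FRMR_IS_INVALID, FRMR_IS_VALID, FRMR_IS_UNKNOWN };

/* What to do with the FRMR before it is used again. */
enum frmr_action { ACT_NOTHING, ACT_LOCAL_INV, ACT_DEREG_AND_REALLOC };

/* A flushed (or otherwise failed) fast-reg WR may or may not have
 * taken effect in the adapter, so the state is recorded as unknown
 * rather than assumed invalid. */
static enum frmr_state frmr_after_fastreg(enum wc_status status)
{
	return status == WC_SUCCESS ? FRMR_IS_VALID : FRMR_IS_UNKNOWN;
}

/* Only a known-valid FRMR is invalidated by rkey. An FRMR in unknown
 * state is deregistered and replaced, since it is not certain which
 * rkey the adapter currently holds. */
static enum frmr_action frmr_cleanup_action(enum frmr_state state)
{
	switch (state) {
	case FRMR_IS_VALID:
		return ACT_LOCAL_INV;
	case FRMR_IS_UNKNOWN:
		return ACT_DEREG_AND_REALLOC;
	default:
		assert(state == FRMR_IS_INVALID);
		return ACT_NOTHING;
	}
}
```

The point of the third state is that the consumer never has to guess which rkey is current after a flush; it pays for a deregistration instead.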
> What we do know is that a subsequent LOCAL_INVALIDATE using the rkey that
> should work (if FAST_REG_MR had indeed never been done) fails in some cases.
> With mlx4, the LINV completes with IB_WC_MW_BIND_ERR. Steve can provide
> more detail about the exact failure mode with cxgb4.
cxgb4 completes with IB_WC_LOC_ACCESS_ERR.
Steve.
^ permalink raw reply [flat|nested] 14+ messages in thread