public inbox for linux-rdma@vger.kernel.org
* question on the timewait event of the rdma-cm
@ 2011-11-14 16:16 Or Gerlitz
       [not found] ` <4EC13EE9.1070103-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Or Gerlitz @ 2011-11-14 16:16 UTC (permalink / raw)
  To: Hefty, Sean, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Sean,

I'm debugging a disconnect-related race in iser and wanted to check
something with you regarding the CM/RDMA-CM state machine. When a
disconnect is initiated by the passive side (iser target) of a
connection, the active side (iser initiator) gets
RDMA_CM_EVENT_DISCONNECTED, and later we call rdma_disconnect (e.g. to
move the QP into the error state, send the DREP, etc.). SOMETIMES the
active side also gets an RDMA_CM_EVENT_TIMEWAIT_EXIT event. AFAIK this
should happen if the rdma-cm ID isn't destroyed within X time, correct?
If so, what is that X? If this description isn't accurate, what would be
the correct phrasing for the condition under which the timewait event is
delivered?

Or.

The call to rdma_disconnect was made at jiffies count 4295608818.

> Nov 14 17:57:50 iser: iser_cma_handler:event 10 status 0 conn ffff88060c8fb2b0 id ffff88061054dc00 jiffies 4295608742
> Nov 14 17:57:50 connection399:0: detected conn error (1011)
> Nov 14 17:57:50 iser: iser_disconnected_handler:more 32 rx 1 tx completions to reap
> Nov 14 17:57:50 iser: iscsi_iser_ep_disconnect:ib conn ffff88060c8fb2b0 state 3 jiffies 4295608818
> Nov 14 17:57:51 Kernel reported iSCSI connection 399:0 error (1011) state (3)
> Nov 14 17:57:52 iser: iser_cma_handler:event 15 status 0 conn ffff88060c8fb2b0 id ffff88061054dc00 jiffies 4295609333
>
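The jiffies stamps in the log above give the spacing between the three events directly. What follows is a minimal sketch of that arithmetic, not part of the original report. Reading event 10 as RDMA_CM_EVENT_DISCONNECTED and event 15 as RDMA_CM_EVENT_TIMEWAIT_EXIT is inferred from the description in this message, and the HZ values are assumptions, since the log does not state the kernel's tick rate.

```python
# Jiffies values copied from the log above.
DISCONNECTED_EVENT = 4295608742    # iser_cma_handler, event 10
RDMA_DISCONNECT_CALL = 4295608818  # iscsi_iser_ep_disconnect
TIMEWAIT_EXIT_EVENT = 4295609333   # iser_cma_handler, event 15


def jiffies_to_seconds(delta_jiffies: int, hz: int) -> float:
    """Convert a jiffies delta to seconds for an assumed tick rate."""
    if hz <= 0:
        raise ValueError("hz must be positive")
    return delta_jiffies / hz


# Spacing between the logged events, in jiffies.
event_to_call = RDMA_DISCONNECT_CALL - DISCONNECTED_EVENT
call_to_timewait = TIMEWAIT_EXIT_EVENT - RDMA_DISCONNECT_CALL

if __name__ == "__main__":
    print(f"DISCONNECTED event -> rdma_disconnect call: {event_to_call} jiffies")
    print(f"rdma_disconnect call -> TIMEWAIT_EXIT event: {call_to_timewait} jiffies")
    # HZ is not given in the log; these tick rates are assumed for illustration.
    for hz in (100, 250, 1000):
        seconds = jiffies_to_seconds(call_to_timewait, hz)
        print(f"  assuming HZ={hz}: {seconds:.3f} s")
```

The deltas themselves come straight from the log; only the conversion to seconds depends on the assumed HZ.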
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* RE: question on the timewait event of the rdma-cm
       [not found] ` <4EC13EE9.1070103-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2011-11-14 19:16   ` Hefty, Sean
       [not found]     ` <1828884A29C6694DAF28B7E6B8A8237316E9AA8E-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Hefty, Sean @ 2011-11-14 19:16 UTC (permalink / raw)
  To: Or Gerlitz, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> I'm debugging a disconnect-related race in iser and wanted to check
> something with you regarding the CM/RDMA-CM state machine. When a
> disconnect is initiated by the passive side (iser target) of a
> connection, the active side (iser initiator) gets
> RDMA_CM_EVENT_DISCONNECTED, and later we call rdma_disconnect (e.g. to
> move the QP into the error state, send the DREP, etc.). SOMETIMES the
> active side also gets an RDMA_CM_EVENT_TIMEWAIT_EXIT event. AFAIK this
> should happen if the rdma-cm ID isn't destroyed within X time, correct?
> If so, what is that X? If this description isn't accurate, what would be
> the correct phrasing for the condition under which the timewait event is
> delivered?

After disconnecting, the QP should enter the timewait state for twice the packet lifetime.
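To put a number on "twice the packet lifetime", here is a minimal sketch of the arithmetic. It assumes the packet lifetime is given in the usual InfiniBand exponent encoding, where a value n stands for 4.096 microseconds times 2^n; the thread does not say which encoding or which value applies to the connection in question, and the function names below are hypothetical, not ib_cm symbols.

```python
# Base unit of the InfiniBand exponent-encoded timeouts, in microseconds.
IB_TIME_UNIT_US = 4.096


def packet_lifetime_us(exponent: int) -> float:
    """Packet lifetime in microseconds for an exponent-encoded value."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    return IB_TIME_UNIT_US * (2 ** exponent)


def timewait_us(exponent: int) -> float:
    """Timewait duration modeled as twice the packet lifetime."""
    return 2 * packet_lifetime_us(exponent)


if __name__ == "__main__":
    # The exponents below are arbitrary sample values, not taken from the thread.
    for exponent in (10, 14, 18):
        print(f"exponent {exponent}: timewait ~ {timewait_us(exponent) / 1e6:.6f} s")
```

Because the encoding is exponential, raising the exponent by one doubles the timewait duration, so the observed delay before RDMA_CM_EVENT_TIMEWAIT_EXIT can vary widely between fabrics.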

* Re: question on the timewait event of the rdma-cm
       [not found]     ` <1828884A29C6694DAF28B7E6B8A8237316E9AA8E-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2011-11-14 20:27       ` Or Gerlitz
       [not found]         ` <CAJZOPZJVs=NSky4hQ1sdoF+XiU8_Pg6FzS2AUMjgqVAzs7Oi6Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Or Gerlitz @ 2011-11-14 20:27 UTC (permalink / raw)
  To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Mon, Nov 14, 2011 at 9:16 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> After disconnecting, the QP should enter the timewait state for twice the packet lifetime.

Does going through timewait always hold, e.g. no matter what the
return status of rdma_disconnect and/or the status of the rdma_cm
disconnected event is?

Or.

* RE: question on the timewait event of the rdma-cm
       [not found]         ` <CAJZOPZJVs=NSky4hQ1sdoF+XiU8_Pg6FzS2AUMjgqVAzs7Oi6Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-11-14 20:50           ` Hefty, Sean
       [not found]             ` <1828884A29C6694DAF28B7E6B8A8237316E9AAC4-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Hefty, Sean @ 2011-11-14 20:50 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> Does going through timewait always hold, e.g. no matter what the
> return status of rdma_disconnect and/or the status of the rdma_cm
> disconnected event is?

It usually holds.  It will fail if rdma_disconnect() is called from a bogus state.  But otherwise, I believe that it will enter timewait on failure to send or receive a disconnect message.

* Re: question on the timewait event of the rdma-cm
       [not found]             ` <1828884A29C6694DAF28B7E6B8A8237316E9AAC4-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2011-11-14 21:10               ` Or Gerlitz
       [not found]                 ` <CAJZOPZKBaQ2HR4kbidW0Mec2xWhZ5H7-ezG20X-J_aYENd9ibw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Or Gerlitz @ 2011-11-14 21:10 UTC (permalink / raw)
  To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> It usually holds.  It will fail if rdma_disconnect() is called from a bogus state.  But
> otherwise, I believe that it will enter timewait on failure to send or receive a disconnect
> message

Hmm, so can the bogus states for calling rdma_disconnect be better
defined? Basically, for the case where the rdma_cm manages the
consumer QP, this call is the only way to move an RC QP into the error
state when the QP is okay and the consumer wants to flush, etc.

Or.

* RE: question on the timewait event of the rdma-cm
       [not found]                 ` <CAJZOPZKBaQ2HR4kbidW0Mec2xWhZ5H7-ezG20X-J_aYENd9ibw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-11-14 21:28                   ` Hefty, Sean
       [not found]                     ` <1828884A29C6694DAF28B7E6B8A8237316E9AAF4-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Hefty, Sean @ 2011-11-14 21:28 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> Hmm, so can the bogus states for calling rdma_disconnect be better
> defined? Basically, for the case where the rdma_cm manages the
> consumer QP, this call is the only way to move an RC QP into the error
> state when the QP is okay and the consumer wants to flush, etc.

By bogus I mean calling disconnect when the QP has never been connected, or calling disconnect twice.  The ib_cm will check the state of the connection when disconnect is called.  If disconnect is called from the wrong state, it simply fails.
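The two bogus cases named above (disconnect before ever being connected, and disconnect called twice) can be illustrated with a toy state machine. This is a simplified model for illustration only, not the ib_cm implementation: the class, the three states, and the use of -EINVAL as the failure code are all assumptions.

```python
import errno
from enum import Enum, auto


class CmState(Enum):
    """Heavily reduced set of connection states (assumed for this sketch)."""
    IDLE = auto()
    ESTABLISHED = auto()
    TIMEWAIT = auto()


class ToyConnection:
    """Toy model: disconnect only succeeds from the ESTABLISHED state."""

    def __init__(self) -> None:
        self.state = CmState.IDLE

    def establish(self) -> None:
        self.state = CmState.ESTABLISHED

    def disconnect(self) -> int:
        # Check the connection state first; from the wrong state, simply fail.
        if self.state is not CmState.ESTABLISHED:
            return -errno.EINVAL  # assumed error code, not confirmed by the thread
        self.state = CmState.TIMEWAIT
        return 0


if __name__ == "__main__":
    conn = ToyConnection()
    print("disconnect before connect:", conn.disconnect())
    conn.establish()
    print("first disconnect:", conn.disconnect())
    print("second disconnect:", conn.disconnect())
    print("final state:", conn.state.name)
```

In this model the second disconnect fails, but the connection is already in timewait because of the first, valid call.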

* Re: question on the timewait event of the rdma-cm
       [not found]                     ` <1828884A29C6694DAF28B7E6B8A8237316E9AAF4-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2011-11-14 21:37                       ` Or Gerlitz
       [not found]                         ` <CAJZOPZLn3hEYp_kHRs22ttF3ca0iq4s5NtFi4iUYgEie81osBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Or Gerlitz @ 2011-11-14 21:37 UTC (permalink / raw)
  To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> By bogus I mean calling disconnect when the QP has never been connected, or calling
> disconnect twice

What return value can serve as an indication of a bogus call for the
application? Is that -EINVAL? Also, a QP could have buffers posted to
it before being connected (e.g. after RTR, or is there no point in time
at which an rdma-cm consumer whose QP state is managed by the cma is
exposed to a QP in RTR and not RTS?). As for the twice case, the IB CM
will still go through the timewait state in response to the first,
non-bogus call, correct?

Or.

* RE: question on the timewait event of the rdma-cm
       [not found]                         ` <CAJZOPZLn3hEYp_kHRs22ttF3ca0iq4s5NtFi4iUYgEie81osBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-11-14 21:55                           ` Hefty, Sean
       [not found]                             ` <CAJZOPZLB-WvMQwhkx_o5kMD-A33ff8chA1AvQSj+npg4HiOAKQ@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Hefty, Sean @ 2011-11-14 21:55 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> What return value can serve as an indication of a bogus call for the
> application? Is that -EINVAL? Also, a QP could have buffers posted to
> it before being connected (e.g. after RTR, or is there no point in time
> at which an rdma-cm consumer whose QP state is managed by the cma is
> exposed to a QP in RTR and not RTS?). As for the twice case, the IB CM
> will still go through the timewait state in response to the first,
> non-bogus call, correct?

rdma_disconnect() returns the value from modifying the QP state into error.  It masks the return value of calling the ib_cm to send the DREQ or DREP.  If rdma_disconnect() succeeds, a timewait event should occur, unless the app calls rdma_disconnect() without being connected.

Calling disconnect is one way that a QP may be transitioned into timewait.  The ib_cm will also transition to timewait in certain connection failure cases: REP times out or is rejected, or remote side detects a stale connection.  Without reading through the code, I don't know off the top of my head if a timewait event would follow an asynchronous rdma_connect() failure, even if rdma_disconnect() were called after the error.
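The return-value behavior described above can be sketched as a small model: the caller sees only the result of moving the QP to the error state, while the result of sending the disconnect message is masked. The function and parameter names are hypothetical, and trying a DREQ first and falling back to a DREP is an assumption about the ordering, not something stated in this message.

```python
from typing import Callable


def rdma_disconnect_model(
    modify_qp_to_error: Callable[[], int],
    send_dreq: Callable[[], int],
    send_drep: Callable[[], int],
) -> int:
    """Model of the described semantics; not the kernel function itself."""
    ret = modify_qp_to_error()
    if ret != 0:
        # Only a failure to modify the QP is reported to the caller.
        return ret
    # Initiate a disconnect, or respond to one; either result is masked.
    if send_dreq() != 0:
        send_drep()
    return 0


if __name__ == "__main__":
    # The QP modify succeeds while both CM sends fail: the caller still sees 0.
    print(rdma_disconnect_model(lambda: 0, lambda: -22, lambda: -22))
    # The QP modify fails: the caller sees that error and no message is sent.
    print(rdma_disconnect_model(lambda: -22, lambda: 0, lambda: 0))
```

One consequence of this masking is that a zero return does not tell the application whether a DREQ or DREP actually went out, which is why the timewait event is the more reliable signal.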

* Re: question on the timewait event of the rdma-cm
       [not found]                               ` <CAJZOPZLB-WvMQwhkx_o5kMD-A33ff8chA1AvQSj+npg4HiOAKQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2011-11-15  5:26                                 ` Or Gerlitz
  0 siblings, 0 replies; 9+ messages in thread
From: Or Gerlitz @ 2011-11-15  5:26 UTC (permalink / raw)
  To: Hefty, Sean; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

 On Mon, Nov 14, 2011 at 11:55 PM, Hefty, Sean <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> [...] calling disconnect is one way that a QP may be transitioned into timewait [...]

I was talking about the QP "physical" state (e.g. the error state that
causes flushes), not the state w.r.t. the IB CM.

Or.