Linux NFS development
From: Peter Leckie <pleckie@sgi.com>
To: "Talpey, Thomas" <Thomas.Talpey@netapp.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>,
	talpey@netapp.com, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 01/04]  NFS/RDMA client stall patches
Date: Thu, 12 Jun 2008 18:45:36 +1000
Message-ID: <4850E230.3010101@sgi.com>
In-Reply-To: <RTPCLUEXC1-PRDaogxL000001eb-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>

Talpey, Thomas wrote:
> At 04:03 AM 6/11/2008, Peter Leckie wrote:
>   
>> That's a good point you raise there. I was looking too closely at the TCP
>> equivalent. The correct fix for this issue would be to implement a timer
>> function for NFS/RDMA pretty much identical to xs_udp_timer(), as follows:
>>     
>
> Hmm, in fact that runs into a different issue - retransmitting over RDMA
> isn't allowed, since it consumes server credits and therefore will eventually
> overrun the connection's receive queue. I have a patch in my queue to
> force a disconnect in fact, which is the appropriate action. I will send them
> out soon, it's in with some other post-Connectathon work.
>
> I think with your earlier patch to avoid the 5-second pause, the disconnect
> action will be prompt and accurate. However, I would still be concerned why
> the RPC was timing out in the first place. Was there an issue in the server?
>   
So this is not a typical runtime issue; it was just another reason the
NFS/RDMA client failed to reconnect after the server disconnected us.
Under this circumstance the client is stalled on congestion and has not
received the disconnection event. What both these patches do is allow
the client to try to resend another RPC. This causes
rpcrdma_conn_upcall() to be called with an event of
RDMA_CM_EVENT_CONNECT_ERROR, which then allows the xprt to be
disconnected and reconnected. Without this change, the client sits there
forever waiting for the congestion to drop.
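
To make the stall easier to picture, here is a rough userspace sketch in C.
It is only a toy model under stated assumptions, not the xprtrdma code:
struct toy_xprt, toy_send(), toy_conn_upcall() and the allow_resend flag are
all hypothetical names invented for illustration, and the reconnect is
assumed to succeed immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not a kernel symbol. */
enum toy_event { TOY_EVENT_NONE, TOY_EVENT_CONNECT_ERROR };

struct toy_xprt {
    bool link_up;            /* real state of the connection */
    bool connected;          /* what the client believes */
    unsigned int cong;       /* requests counted against the window */
    unsigned int cwnd;       /* congestion window (credit limit) */
    unsigned int reconnects; /* number of disconnect/reconnect cycles */
};

/* Stand-in for the connection upcall: an error event tears the
 * transport down and (by assumption) reconnects straight away. */
static void toy_conn_upcall(struct toy_xprt *x, enum toy_event ev)
{
    if (ev != TOY_EVENT_CONNECT_ERROR)
        return;
    x->connected = false;
    x->cong = 0;             /* outstanding slots are released */
    x->link_up = true;       /* assumption: reconnect succeeds */
    x->connected = true;
    x->reconnects++;
}

/* Try to send one request. Returns true if it went on the wire. */
static bool toy_send(struct toy_xprt *x, bool allow_resend)
{
    assert(x != NULL);
    /* Without the change: a full window blocks the send, so the
     * dead link is never touched and never noticed. */
    if (!allow_resend && x->cong >= x->cwnd)
        return false;
    /* With the change: the send attempt hits the dead link and the
     * error event drives the disconnect/reconnect. */
    if (!x->link_up) {
        toy_conn_upcall(x, TOY_EVENT_CONNECT_ERROR);
        return false;
    }
    x->cong++;
    return true;
}
```

In this model, a transport with link_up false and cong equal to cwnd returns
false from toy_send(&x, false) forever and reconnects stays at zero. A single
toy_send(&x, true) triggers the error event, clears the congestion count, and
later ordinary sends go through again.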

This does raise another question: why didn't the DREQ from the server
cause the xprt on the client to disconnect? I might try to reproduce
this error and see what's happening.

Thanks,
Pete





Thread overview: 5+ messages
2008-05-19  3:50 [PATCH 01/04] NFS/RDMA client stall patches Peter Leckie
2008-06-10 19:24 ` Trond Myklebust
2008-06-11  8:03   ` Peter Leckie
2008-06-11 13:53     ` Talpey, Thomas
     [not found]       ` <RTPCLUEXC1-PRDaogxL000001eb-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2008-06-12  8:45         ` Peter Leckie [this message]
