Re: [RFC,PATCH 4/14] knfsd: has_wspace per transport

From: Greg Banks <gnb@sgi.com>
To: Tom Tucker <tom@opengridcomputing.com>
Cc: NeilBrown <neilb@suse.de>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	"Talpey, Thomas" <Thomas.Talpey@netapp.com>,
	Linux NFS Mailing List <nfs@lists.sourceforge.net>,
	Peter Leckie <pleckie@melbourne.sgi.com>
Subject: Re: [RFC,PATCH 4/14] knfsd: has_wspace per transport
Date: Wed, 23 May 2007 16:41:57 +1000
Message-ID: <20070523064157.GE14076@sgi.com>
In-Reply-To: <C27939D4.30F92%tom@opengridcomputing.com>

On Wed, May 23, 2007 at 12:22:44AM -0500, Tom Tucker wrote:
> 
> 
> 
> On 5/22/07 9:32 PM, "Greg Banks" <gnb@sgi.com> wrote:
> 
> > On Tue, May 22, 2007 at 12:34:23PM -0500, Tom Tucker wrote:
> 
> I need to do a little clean up (e.g. atomic_inc()) and then I'll add them as
> suggested. If there are dissenters, please speak now...

You should be able to get away with just ++.  They're only stats,
it's not like you have logic depending on them being exact.
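
To illustrate, here is a minimal userspace sketch of what I mean. The
struct and function names are made up for the example and are not taken
from the patch; the only point is that a racy ++ can at worst lose the
odd count, which is harmless for a statistic.

```c
/* Hypothetical per-transport statistics; names are illustrative only. */
struct xprt_stats {
    unsigned long reads_issued;
    unsigned long writes_issued;
};

/*
 * Plain, non-atomic increments.  If two threads race here, one update
 * may occasionally be lost.  That is acceptable because nothing makes
 * a decision based on the exact value.
 */
static void stat_read_issued(struct xprt_stats *s)
{
    s->reads_issued++;
}

static void stat_write_issued(struct xprt_stats *s)
{
    s->writes_issued++;
}
```

If a counter ever did feed into a decision, it would want atomic_inc()
or a lock, but that is not the case for these.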

> >> The current thread will wait, but the remaining
> >> threads (127 in your case) will be free to do other work for other mount
> >> points. 
> >> 
> >> This only falls over in the limit of 128 mounts all pounding the server
> >> with writes. Comments?
> > 
> > A common deployment scenario I would expect to see would have at
> > least twice as many clients as threads, and SGI have successfully
> > tested scenarios with a 16:1 client:thread ratio.  So I don't think
> > we can solve the problem in any way which ties down a thread for
> > up to 6 seconds per client, because there won't be enough threads
> > to go around.
> 
> Agreed. But isn't that the 100 dead adapter scenario?

Umm?

> Someone has to wait.
> Who? 

> > In other words, invert the rdma_read_xdr() logic so that nfsd returns
> > to the main loop instead of blocking.
> > 
> > Unfortunately it's kind of a major change.  Thoughts?
> > 
> 
> Ok, I love it and I hate it :-)  This is the consolidated waiter strategy
> that solves all the problems...except ...does this expose any read/write
> ordering issues at the client? Couldn't the client issue a write followed by
> a read and get the original data?

The server doesn't guarantee that the order of completion of calls
in the filesystem is the same as the order of emission of those calls
by the client.  On UDP, the calls might not even arrive at the server
in the same order they left the client.

If the client needs to ensure that a READ happens after a WRITE,
it needs to wait for the WRITE reply before emitting the READ; that
should still be true with this change.
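
A toy model of that rule, as a sketch only: the "server" below is
hypothetical and deliberately completes pending calls newest-first to
stand in for any reordering. A client that sends WRITE and READ back to
back can read the old data; a client that waits for the WRITE reply
cannot.

```c
#include <stddef.h>

enum op { OP_READ, OP_WRITE };

struct call {
    enum op op;
    int value;      /* WRITE: data to store; READ: filled with data seen */
};

/* Hypothetical toy server: one int of "file" data and a pending queue. */
struct toy_server {
    int data;
    struct call *pending[8];
    size_t npending;
};

static void submit(struct toy_server *srv, struct call *c)
{
    srv->pending[srv->npending++] = c;
}

/* Complete pending calls newest-first, modelling a server that reorders. */
static void complete_all(struct toy_server *srv)
{
    while (srv->npending > 0) {
        struct call *c = srv->pending[--srv->npending];

        if (c->op == OP_WRITE)
            srv->data = c->value;
        else
            c->value = srv->data;
    }
}

/* Client emits WRITE then READ without waiting for the WRITE reply. */
static int read_without_waiting(struct toy_server *srv, int newval)
{
    struct call w = { OP_WRITE, newval };
    struct call r = { OP_READ, 0 };

    submit(srv, &w);
    submit(srv, &r);
    complete_all(srv);
    return r.value;
}

/* Client waits for the WRITE to complete before emitting the READ. */
static int read_after_write_reply(struct toy_server *srv, int newval)
{
    struct call w = { OP_WRITE, newval };
    struct call r = { OP_READ, 0 };

    submit(srv, &w);
    complete_all(srv);      /* the WRITE reply has arrived */
    submit(srv, &r);
    complete_all(srv);
    return r.value;
}
```

With the file initially holding 1 and a WRITE of 2, the first client
reads 1 and the second reads 2.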

> That's the hate it part. If we need to decide which requests are allowed to
> proceed on a QP with outstanding READ WR, things get messy quickly.
> 
> Is there no such requirement?

I don't think so.

You might want to limit the number of RDMA READ streams in flight.
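
Something along these lines, as a rough sketch. The names and the idea
of a per-transport cap are my assumptions for illustration, not existing
code. A request that cannot get a slot is deferred and retried later
rather than blocking the thread.

```c
#include <stdbool.h>

/* Hypothetical limiter on concurrent RDMA READ streams for a transport. */
struct read_limiter {
    unsigned int in_flight;
    unsigned int max_in_flight;
};

/*
 * Try to claim a slot.  Returns false when the cap is reached, in which
 * case the caller should queue the request and go back to the main loop.
 * Real code would need a lock or atomics around these updates.
 */
static bool read_start(struct read_limiter *l)
{
    if (l->in_flight >= l->max_in_flight)
        return false;
    l->in_flight++;
    return true;
}

/* Release a slot when the RDMA READ completes or fails. */
static void read_done(struct read_limiter *l)
{
    if (l->in_flight > 0)
        l->in_flight--;
}
```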

> >>> And every now and again something goes awry in
> >>> IB land and each thread ends up waiting for 6 seconds or more.
> >> 
> >> Yes. Note that when this happens, the connection gets terminated.
> > 
> > Indeed.  Meanwhile, for the last 6 seconds all the nfsds are
> > blocked waiting for the one client, and none of the other clients
> > are getting any service.
> 
> We need to think about this. A dead adapter causes havoc.

A dead adapter on the server should cause havoc, at least for as long
as it takes HA to kick in.  A dead adapter on a client should affect
only that client and no other clients.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere.  Which MPHG character are you?
I don't speak for SGI.


