From: Greg Banks <gnb@sgi.com>
To: "Talpey, Thomas" <Thomas.Talpey@netapp.com>
Cc: Neil Brown <neilb@suse.de>,
Peter Leckie <pleckie@melbourne.sgi.com>,
Trond Myklebust <trond.myklebust@fys.uio.no>,
"J. Bruce Fields" <bfields@fieldses.org>,
Linux NFS Mailing List <nfs@lists.sourceforge.net>
Subject: Re: [RFC,PATCH 11/15] knfsd: RDMA transport core
Date: Thu, 24 May 2007 18:35:08 +1000
Message-ID: <20070524083508.GD31072@sgi.com>
In-Reply-To: <EXNANE01XvpFVjCRGry00000a64@exnane01.hq.netapp.com>
On Wed, May 23, 2007 at 05:00:03PM -0400, Talpey, Thomas wrote:
> At 04:01 PM 5/23/2007, Trond Myklebust wrote:
> >On Wed, 2007-05-23 at 14:59 -0400, Talpey, Thomas wrote:
> >> Personally, I'm not completely sure I see the problem here. If an RDMA
> >> adapter is going out to lunch and hanging what should be a very fast
> >> operation (the RDMA Read data pull), then that's an adapter problem
> >> which we should address in the adapter layer, or via some sort of interface
> >> hardening between it and RPC. Trying to push the issue back down the RPC
> >> pipe to the sending peer seems to me a very unworkable solution.
> >
> >AFAIK, the most common reason for wanting to defer a request is if the
> >server needs to make an upcall in order to talk to mountd,
This is the original and AFAICT only reason svc_defer() is called.
> > or to resolve
> >an NFSv4 name using idmapd.
It seems the idmap code deliberately circumvents the asynchronous
defer/revisit behaviour, and has code which blocks the calling thread
for up to 1 second in the case of a cache miss and subsequent upcall
to userspace. After 1 second it gives up.
So with NFSv4, if the LDAP server goes AWOL, some portion of NFS
calls will experience multiple-second delays, 1 second for each user
and group name in the call. Wonderful.
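To put rough numbers on that, here is a back-of-the-envelope sketch. It is
not kernel code: the function and constant names are made up for
illustration, and it assumes every uncached name waits out the full
1 second before the lookup gives up.

```python
# Hypothetical model of the per-call delay described above.
# Assumption: each name that misses the cache blocks the nfsd thread
# for the full wait before the lookup gives up.
IDMAP_WAIT_SECONDS = 1.0


def worst_case_idmap_delay(names, cached, wait_seconds=IDMAP_WAIT_SECONDS):
    """Return the worst-case blocking time for one NFS call.

    names  -- user and group names the call needs to resolve
    cached -- names already present in the cache (no upcall needed)
    """
    misses = [name for name in names if name not in cached]
    return len(misses) * wait_seconds


# A call naming one user and one group, with nothing cached,
# can hold its thread for up to two seconds.
delay = worst_case_idmap_delay(["alice", "staff"], cached=set())
```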
> > I don't think you really want to treat
> >hardware failures by deferring requests...
Agreed, the right way to handle hardware issues is disconnect.
> Well, the most common occurrence would be a lost connection, this
> would prevent sending even nfserr_jukebox. I'm suggesting that if
> we're concerned about using nfsd thread context to pull data, then
> we should also be concerned about calling into filesystems, which might
> hang on their storage adapters, or whatever just as easily.
Two comments.
Firstly, some of us *are* concerned about those issues:
http://marc.info/?l=linux-nfs&m=114683005119982&w=2
http://oss.sgi.com/archives/xfs/2007-04/msg00114.html
Secondly, there's a fundamental difference between blocking
for storage-side reasons and blocking for network-side reasons.
The former is effectively internal(*) to the NAS server and reflects
its inherent capability to provide service. If the disks are broken,
then mechanisms internal to the server host (RAID, failover, whatever)
take care of this. So blocking (for short periods) in the filesystem
because the disks are fully loaded is fine; in fact this is the
fundamental purpose of the nfsd threads.
The latter is external to the server and is subject to the vagaries
of client machines, which can have hardware faults, software flaws,
or even be malicious and attempting to crash the server or lock it up.
Here we have a service boundary which the knfsd code needs to enforce.
We need firstly to protect the server from the effects of bad clients
and secondly to protect other clients from the effects of bad clients.
(*) Here I am ignoring the case of NFS exporting a clustered fs.
> Basically, I'm saying there shouldn't be any special handling for the
> RDMA Reads used to pull write data. In the success case, they happen
> quite rapidly (at wire speed), and in the failure case, there isn't any
> peer to talk to anyway. So what are we protecting?
All the *other* clients who can't get any service, or get slower
service, because many nfsd threads are blocked. The problem here
is fairness between multiple clients in the face of a few greedy,
broken or malicious ones.
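A toy model of the effect, as a sketch only: the names are hypothetical and
it assumes each stalled request pins exactly one nfsd thread for as long as
the stall lasts, which is a simplification of the real server.

```python
# Hypothetical model: how much of the nfsd thread pool is left for
# well-behaved clients while some requests are stalled on bad ones.
def available_threads(total_threads, stalled_requests):
    """Threads still free to serve other clients."""
    if total_threads <= 0:
        raise ValueError("total_threads must be positive")
    return max(0, total_threads - stalled_requests)


def remaining_capacity(total_threads, stalled_requests):
    """Fraction of the pool still usable by everyone else."""
    return available_threads(total_threads, stalled_requests) / total_threads


# With 8 threads and 6 of them blocked on one client's stalled
# requests, the other clients share a quarter of the server.
share = remaining_capacity(8, 6)
```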
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.