Re: [patch 10/14] sunrpc: Reorganise the queuing of cache upcalls.

public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed

From: "J. Bruce Fields" <bfields@fieldses.org>
To: Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
Cc: Linux NFS ML <linux-nfs@vger.kernel.org>
Subject: Re: [patch 10/14] sunrpc: Reorganise the queuing of cache upcalls.
Date: Fri, 9 Jan 2009 16:29:21 -0500	[thread overview]
Message-ID: <20090109212921.GC5466@fieldses.org> (raw)
In-Reply-To: <4966B92F.8060008-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>

On Fri, Jan 09, 2009 at 01:40:47PM +1100, Greg Banks wrote:
> J. Bruce Fields wrote:
> >   Appended below if it's of
> > any use to you to compare.  (Happy to apply either your patch or mine
> > depending which looks better; I haven't tried to compare carefully.)
> >   
> 
> Ok, I'll take a look at yours.

Thanks for doing this, by the way.

> >  static ssize_t
> >  cache_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
> >  {
> > -	struct cache_reader *rp = filp->private_data;
> > -	struct cache_request *rq;
> > +	struct cache_request *rq = filp->private_data;
> >  	struct cache_detail *cd = PDE(filp->f_path.dentry->d_inode)->data;
> > +	struct list_head *queue = &cd->queue;
> >  	int err;
> >  
> >  	if (count == 0)
> > @@ -711,60 +703,45 @@ cache_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
> >  	mutex_lock(&queue_io_mutex); /* protect against multiple concurrent
> >  			      * readers on this file */
> >   
> 
> Ah, so you still have a single global lock which is serialising all
> reads and writes to all caches.

Yes, making this per-cd seems sensible (though if the problem is
typically a single cache (auth_unix) then I don't know how significant a
help it is).

> Also, I think you'd want to sample filp->private_data after taking
> queue_io_mutex.

Whoops, yes.

> > +	if (rq == NULL) {
> >  		mutex_unlock(&queue_io_mutex);
> > -		BUG_ON(rp->offset);
> >   
> 
> Good riddance to that BUG_ON().
> > -		return 0;
> > +		return -EAGAIN;
> >  	}
> > -	rq = container_of(rp->q.list.next, struct cache_request, q.list);
> > -	BUG_ON(rq->q.reader);
> > -	if (rp->offset == 0)
> > -		rq->readers++;
> > -	spin_unlock(&queue_lock);
> >  
> > -	if (rp->offset == 0 && !test_bit(CACHE_PENDING, &rq->item->flags)) {
> > +	if (!test_bit(CACHE_PENDING, &rq->item->flags))
> >  		err = -EAGAIN;
> > -		spin_lock(&queue_lock);
> > -		list_move(&rp->q.list, &rq->q.list);
> > -		spin_unlock(&queue_lock);
> > -	} else {
> > -		if (rp->offset + count > rq->len)
> > -			count = rq->len - rp->offset;
> > +	else {
> > +		if (rq->offset + count > rq->len)
> > +			count = rq->len - rq->offset;
> >  		err = -EFAULT;
> > -		if (copy_to_user(buf, rq->buf + rp->offset, count))
> > +		if (copy_to_user(buf, rq->buf + rq->offset, count))
> >  			goto out;
> >   
> 
> Ok, so you try to keep partial read support but move the offset into the
> upcall request itself.  Interesting idea.
> 
> I think partial reads are Just Too Hard to do properly, i.e. without
> risk of racy message corruption under all combinations of message size
> and userspace behaviour .  In particular I think your code will corrupt
> upcall data if multiple userspace threads race to do partial reads on
> the same struct file (as rpc.mountd is doing at SGI's customer sites).

Yes, but what mountd's doing is just dumb, as far as I can tell; is
there any real reason not to just keep a separate open for each thread?

If we just tell userland to keep a number of opens equal to the number
of concurrent upcalls it wants to handle, and then all of this becomes
very easy.

Forget sharing a single struct file between tasks that do too-small
reads: we should make sure that we don't oops, but that's all.

> For the same reasons I think the FIONREAD support is utterly pointless
> (even though I preserved it).
> 
> But I still don't understand how this 100K message size number for gssd
> can happen?  If that's really necessary then the design equation changes
> considerably.  This seems to be the most significant difference between
> our patches.

So, the somewhat depressing situation with spkm3, which was to be the
public-key-based gss mechanism for nfs: we (citi) implemented it (modulo
this problem and maybe one or two others), but found some problems along
the way that required revising the spec.  I think there might have been
an implementation for one other platform too.  But the ietf seems to
have given up on spkm3, and instead is likely to pursue a new mechanism,
pku2u.  Nobody's implementing that yet.  Consulting my local expert, it
sounds like pku2u will have some more reasonable bound on init-sec-ctx
token sizes.  (Not sure if it'll be under a page, without checking, but
it shouldn't be much more than that, anyway).

So: the immediate pressure for larger upcalls is probably gone.  And
anyway more changes would be required (the ascii-hex encoding and
upcall-downcall matching are very wasteful in this case, etc.), so we'd
likely end up using a different mechanism.

That said, I think it's easy enough to handle just the case of multiple
reads on unshared struct files that it might make sense to keep that
piece.

> A smaller issue is that you keep a single list and use the value of the
> CACHE_PENDING bit to tell the difference between states.  I think this
> could be made to work; however my technique of using two lists makes
> most of the code even simpler at the small cost of having to do two list
> searches in queue_loose().

OK.  When exactly do they get moved between lists?  I'll take a look at
the code.

--b.

next prev parent reply	other threads:[~2009-01-09 21:29 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-08  8:25 [patch 00/14] sunrpc: Sunrpc cache cleanups and upcall rework Greg Banks
2009-01-08  8:25 ` [patch 01/14] sunrpc: Use consistent naming for variables of type struct cache_detail* Greg Banks
2009-01-08  8:25 ` [patch 02/14] sunrpc: Use consistent naming for variables of type struct cache_head* Greg Banks
2009-01-08  8:25 ` [patch 03/14] sunrpc: Use consistent naming for variables of type struct cache_request* Greg Banks
2009-01-08  8:25 ` [patch 04/14] sunrpc: Minor indentation cleanup in cache.c Greg Banks
2009-01-08  8:25 ` [patch 05/14] sunrpc: Rename queue_loose() to cache_remove_queued() Greg Banks
2009-01-08  8:25 ` [patch 06/14] sunrpc: Gather forward declarations of static functions in cache.c Greg Banks
2009-01-08  8:25 ` [patch 07/14] sunrpc: Make the global queue_lock per-cache-detail Greg Banks
2009-01-08  8:25 ` [patch 08/14] sunrpc: Make the global queue_wait per-cache-detail Greg Banks
2009-01-08  8:25 ` [patch 09/14] sunrpc: Remove the global lock queue_io_mutex Greg Banks
2009-01-08  8:25 ` [patch 10/14] sunrpc: Reorganise the queuing of cache upcalls Greg Banks
2009-01-08 19:57   ` J. Bruce Fields
2009-01-09  2:40     ` Greg Banks
     [not found]       ` <4966B92F.8060008-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2009-01-09  2:57         ` J. Bruce Fields
2009-01-09  3:12           ` Greg Banks
     [not found]             ` <4966C0AB.7000604-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2009-01-09 16:53               ` Chuck Lever
2009-01-10  1:28                 ` Greg Banks
2009-01-09 21:29         ` J. Bruce Fields [this message]
2009-01-09 21:41           ` J. Bruce Fields
2009-01-09 23:40             ` Greg Banks
2009-01-09 23:29           ` Greg Banks
2009-01-08  8:25 ` [patch 11/14] sunrpc: Allocate cache_requests in a single allocation Greg Banks
2009-01-08  8:25 ` [patch 12/14] sunrpc: Centralise memory management of cache_requests Greg Banks
2009-01-08  8:25 ` [patch 13/14] sunrpc: Move struct cache_request to linux/sunrpc/cache.h Greg Banks
2009-01-08  8:25 ` [patch 14/14] sunrpc: Improve the usefulness of debug printks in the sunrpc cache code Greg Banks
2009-01-08 19:52 ` [patch 00/14] sunrpc: Sunrpc cache cleanups and upcall rework J. Bruce Fields
2009-01-09  1:42   ` Greg Banks

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090109212921.GC5466@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox