Re: [patch 10/14] sunrpc: Reorganise the queuing of cache upcalls.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "J. Bruce Fields" <bfields@fieldses.org>
To: Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
Cc: Linux NFS ML <linux-nfs@vger.kernel.org>
Subject: Re: [patch 10/14] sunrpc: Reorganise the queuing of cache upcalls.
Date: Fri, 9 Jan 2009 16:29:21 -0500	[thread overview]
Message-ID: <20090109212921.GC5466@fieldses.org> (raw)
In-Reply-To: <4966B92F.8060008-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>

On Fri, Jan 09, 2009 at 01:40:47PM +1100, Greg Banks wrote:
> J. Bruce Fields wrote:
> >   Appended below if it's of
> > any use to you to compare.  (Happy to apply either your patch or mine
> > depending which looks better; I haven't tried to compare carefully.)
> >   
> 
> Ok, I'll take a look at yours.

Thanks for doing this, by the way.

> >  static ssize_t
> >  cache_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
> >  {
> > -	struct cache_reader *rp = filp->private_data;
> > -	struct cache_request *rq;
> > +	struct cache_request *rq = filp->private_data;
> >  	struct cache_detail *cd = PDE(filp->f_path.dentry->d_inode)->data;
> > +	struct list_head *queue = &cd->queue;
> >  	int err;
> >  
> >  	if (count == 0)
> > @@ -711,60 +703,45 @@ cache_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
> >  	mutex_lock(&queue_io_mutex); /* protect against multiple concurrent
> >  			      * readers on this file */
> >   
> 
> Ah, so you still have a single global lock which is serialising all
> reads and writes to all caches.

Yes, making this per-cd seems sensible (though if the problem is
typically a single cache (auth_unix) then I don't know how significant a
help it is).

> Also, I think you'd want to sample filp->private_data after taking
> queue_io_mutex.

Whoops, yes.

> > +	if (rq == NULL) {
> >  		mutex_unlock(&queue_io_mutex);
> > -		BUG_ON(rp->offset);
> >   
> 
> Good riddance to that BUG_ON().
> > -		return 0;
> > +		return -EAGAIN;
> >  	}
> > -	rq = container_of(rp->q.list.next, struct cache_request, q.list);
> > -	BUG_ON(rq->q.reader);
> > -	if (rp->offset == 0)
> > -		rq->readers++;
> > -	spin_unlock(&queue_lock);
> >  
> > -	if (rp->offset == 0 && !test_bit(CACHE_PENDING, &rq->item->flags)) {
> > +	if (!test_bit(CACHE_PENDING, &rq->item->flags))
> >  		err = -EAGAIN;
> > -		spin_lock(&queue_lock);
> > -		list_move(&rp->q.list, &rq->q.list);
> > -		spin_unlock(&queue_lock);
> > -	} else {
> > -		if (rp->offset + count > rq->len)
> > -			count = rq->len - rp->offset;
> > +	else {
> > +		if (rq->offset + count > rq->len)
> > +			count = rq->len - rq->offset;
> >  		err = -EFAULT;
> > -		if (copy_to_user(buf, rq->buf + rp->offset, count))
> > +		if (copy_to_user(buf, rq->buf + rq->offset, count))
> >  			goto out;
> >   
> 
> Ok, so you try to keep partial read support but move the offset into the
> upcall request itself.  Interesting idea.
> 
> I think partial reads are Just Too Hard to do properly, i.e. without
> risk of racy message corruption under all combinations of message size
> and userspace behaviour .  In particular I think your code will corrupt
> upcall data if multiple userspace threads race to do partial reads on
> the same struct file (as rpc.mountd is doing at SGI's customer sites).

Yes, but what mountd's doing is just dumb, as far as I can tell; is
there any real reason not to just keep a separate open for each thread?

If we just tell userland to keep a number of opens equal to the number
of concurrent upcalls it wants to handle, and then all of this becomes
very easy.

Forget sharing a single struct file between tasks that do too-small
reads: we should make sure that we don't oops, but that's all.

> For the same reasons I think the FIONREAD support is utterly pointless
> (even though I preserved it).
> 
> But I still don't understand how this 100K message size number for gssd
> can happen?  If that's really necessary then the design equation changes
> considerably.  This seems to be the most significant difference between
> our patches.

So, the somewhat depressing situation with spkm3, which was to be the
public-key-based gss mechanism for nfs: we (citi) implemented it (modulo
this problem and maybe one or two others), but found some problems along
the way that required revising the spec.  I think there might have been
an implementation for one other platform too.  But the ietf seems to
have given up on spkm3, and instead is likely to pursue a new mechanism,
pku2u.  Nobody's implementing that yet.  Consulting my local expert, it
sounds like pku2u will have some more reasonable bound on init-sec-ctx
token sizes.  (Not sure if it'll be under a page, without checking, but
it shouldn't be much more than that, anyway).

So: the immediate pressure for larger upcalls is probably gone.  And
anyway more changes would be required (the ascii-hex encoding and
upcall-downcall matching are very wasteful in this case, etc.), so we'd
likely end up using a different mechanism.

That said, I think it's easy enough to handle just the case of multiple
reads on unshared struct files that it might make sense to keep that
piece.

> A smaller issue is that you keep a single list and use the value of the
> CACHE_PENDING bit to tell the difference between states.  I think this
> could be made to work; however my technique of using two lists makes
> most of the code even simpler at the small cost of having to do two list
> searches in queue_loose().

OK.  When exactly do they get moved between lists?  I'll take a look at
the code.

--b.

next prev parent reply	other threads:[~2009-01-09 21:29 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-08  8:25 [patch 00/14] sunrpc: Sunrpc cache cleanups and upcall rework Greg Banks
2009-01-08  8:25 ` [patch 01/14] sunrpc: Use consistent naming for variables of type struct cache_detail* Greg Banks
2009-01-08  8:25 ` [patch 02/14] sunrpc: Use consistent naming for variables of type struct cache_head* Greg Banks
2009-01-08  8:25 ` [patch 03/14] sunrpc: Use consistent naming for variables of type struct cache_request* Greg Banks
2009-01-08  8:25 ` [patch 04/14] sunrpc: Minor indentation cleanup in cache.c Greg Banks
2009-01-08  8:25 ` [patch 05/14] sunrpc: Rename queue_loose() to cache_remove_queued() Greg Banks
2009-01-08  8:25 ` [patch 06/14] sunrpc: Gather forward declarations of static functions in cache.c Greg Banks
2009-01-08  8:25 ` [patch 07/14] sunrpc: Make the global queue_lock per-cache-detail Greg Banks
2009-01-08  8:25 ` [patch 08/14] sunrpc: Make the global queue_wait per-cache-detail Greg Banks
2009-01-08  8:25 ` [patch 09/14] sunrpc: Remove the global lock queue_io_mutex Greg Banks
2009-01-08  8:25 ` [patch 10/14] sunrpc: Reorganise the queuing of cache upcalls Greg Banks
2009-01-08 19:57   ` J. Bruce Fields
2009-01-09  2:40     ` Greg Banks
     [not found]       ` <4966B92F.8060008-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2009-01-09  2:57         ` J. Bruce Fields
2009-01-09  3:12           ` Greg Banks
     [not found]             ` <4966C0AB.7000604-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2009-01-09 16:53               ` Chuck Lever
2009-01-10  1:28                 ` Greg Banks
2009-01-09 21:29         ` J. Bruce Fields [this message]
2009-01-09 21:41           ` J. Bruce Fields
2009-01-09 23:40             ` Greg Banks
2009-01-09 23:29           ` Greg Banks
2009-01-08  8:25 ` [patch 11/14] sunrpc: Allocate cache_requests in a single allocation Greg Banks
2009-01-08  8:25 ` [patch 12/14] sunrpc: Centralise memory management of cache_requests Greg Banks
2009-01-08  8:25 ` [patch 13/14] sunrpc: Move struct cache_request to linux/sunrpc/cache.h Greg Banks
2009-01-08  8:25 ` [patch 14/14] sunrpc: Improve the usefulness of debug printks in the sunrpc cache code Greg Banks
2009-01-08 19:52 ` [patch 00/14] sunrpc: Sunrpc cache cleanups and upcall rework J. Bruce Fields
2009-01-09  1:42   ` Greg Banks

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090109212921.GC5466@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.