From: "J. Bruce Fields" <bfields@fieldses.org>
To: Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
Cc: Linux NFS ML <linux-nfs@vger.kernel.org>
Subject: Re: [patch 10/14] sunrpc: Reorganise the queuing of cache upcalls.
Date: Fri, 9 Jan 2009 16:29:21 -0500 [thread overview]
Message-ID: <20090109212921.GC5466@fieldses.org> (raw)
In-Reply-To: <4966B92F.8060008-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
On Fri, Jan 09, 2009 at 01:40:47PM +1100, Greg Banks wrote:
> J. Bruce Fields wrote:
> > Appended below if it's of
> > any use to you to compare. (Happy to apply either your patch or mine
> > depending which looks better; I haven't tried to compare carefully.)
> >
>
> Ok, I'll take a look at yours.
Thanks for doing this, by the way.
> > static ssize_t
> > cache_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
> > {
> > - struct cache_reader *rp = filp->private_data;
> > - struct cache_request *rq;
> > + struct cache_request *rq = filp->private_data;
> > struct cache_detail *cd = PDE(filp->f_path.dentry->d_inode)->data;
> > + struct list_head *queue = &cd->queue;
> > int err;
> >
> > if (count == 0)
> > @@ -711,60 +703,45 @@ cache_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
> > mutex_lock(&queue_io_mutex); /* protect against multiple concurrent
> > * readers on this file */
> >
>
> Ah, so you still have a single global lock which is serialising all
> reads and writes to all caches.
Yes, making this per-cd seems sensible (though if the problem is
typically a single cache (auth_unix) then I don't know how significant a
help it is).
> Also, I think you'd want to sample filp->private_data after taking
> queue_io_mutex.
Whoops, yes.
> > + if (rq == NULL) {
> > mutex_unlock(&queue_io_mutex);
> > - BUG_ON(rp->offset);
> >
>
> Good riddance to that BUG_ON().
> > - return 0;
> > + return -EAGAIN;
> > }
> > - rq = container_of(rp->q.list.next, struct cache_request, q.list);
> > - BUG_ON(rq->q.reader);
> > - if (rp->offset == 0)
> > - rq->readers++;
> > - spin_unlock(&queue_lock);
> >
> > - if (rp->offset == 0 && !test_bit(CACHE_PENDING, &rq->item->flags)) {
> > + if (!test_bit(CACHE_PENDING, &rq->item->flags))
> > err = -EAGAIN;
> > - spin_lock(&queue_lock);
> > - list_move(&rp->q.list, &rq->q.list);
> > - spin_unlock(&queue_lock);
> > - } else {
> > - if (rp->offset + count > rq->len)
> > - count = rq->len - rp->offset;
> > + else {
> > + if (rq->offset + count > rq->len)
> > + count = rq->len - rq->offset;
> > err = -EFAULT;
> > - if (copy_to_user(buf, rq->buf + rp->offset, count))
> > + if (copy_to_user(buf, rq->buf + rq->offset, count))
> > goto out;
> >
>
> Ok, so you try to keep partial read support but move the offset into the
> upcall request itself. Interesting idea.
>
> I think partial reads are Just Too Hard to do properly, i.e. without
> risk of racy message corruption under all combinations of message size
> and userspace behaviour . In particular I think your code will corrupt
> upcall data if multiple userspace threads race to do partial reads on
> the same struct file (as rpc.mountd is doing at SGI's customer sites).
Yes, but what mountd's doing is just dumb, as far as I can tell; is
there any real reason not to just keep a separate open for each thread?
If we just tell userland to keep a number of opens equal to the number
of concurrent upcalls it wants to handle, and then all of this becomes
very easy.
Forget sharing a single struct file between tasks that do too-small
reads: we should make sure that we don't oops, but that's all.
> For the same reasons I think the FIONREAD support is utterly pointless
> (even though I preserved it).
>
> But I still don't understand how this 100K message size number for gssd
> can happen? If that's really necessary then the design equation changes
> considerably. This seems to be the most significant difference between
> our patches.
So, the somewhat depressing situation with spkm3, which was to be the
public-key-based gss mechanism for nfs: we (citi) implemented it (modulo
this problem and maybe one or two others), but found some problems along
the way that required revising the spec. I think there might have been
an implementation for one other platform too. But the ietf seems to
have given up on spkm3, and instead is likely to pursue a new mechanism,
pku2u. Nobody's implementing that yet. Consulting my local expert, it
sounds like pku2u will have some more reasonable bound on init-sec-ctx
token sizes. (Not sure if it'll be under a page, without checking, but
it shouldn't be much more than that, anyway).
So: the immediate pressure for larger upcalls is probably gone. And
anyway more changes would be required (the ascii-hex encoding and
upcall-downcall matching are very wasteful in this case, etc.), so we'd
likely end up using a different mechanism.
That said, I think it's easy enough to handle just the case of multiple
reads on unshared struct files that it might make sense to keep that
piece.
> A smaller issue is that you keep a single list and use the value of the
> CACHE_PENDING bit to tell the difference between states. I think this
> could be made to work; however my technique of using two lists makes
> most of the code even simpler at the small cost of having to do two list
> searches in queue_loose().
OK. When exactly do they get moved between lists? I'll take a look at
the code.
--b.
next prev parent reply other threads:[~2009-01-09 21:29 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-01-08 8:25 [patch 00/14] sunrpc: Sunrpc cache cleanups and upcall rework Greg Banks
2009-01-08 8:25 ` [patch 01/14] sunrpc: Use consistent naming for variables of type struct cache_detail* Greg Banks
2009-01-08 8:25 ` [patch 02/14] sunrpc: Use consistent naming for variables of type struct cache_head* Greg Banks
2009-01-08 8:25 ` [patch 03/14] sunrpc: Use consistent naming for variables of type struct cache_request* Greg Banks
2009-01-08 8:25 ` [patch 04/14] sunrpc: Minor indentation cleanup in cache.c Greg Banks
2009-01-08 8:25 ` [patch 05/14] sunrpc: Rename queue_loose() to cache_remove_queued() Greg Banks
2009-01-08 8:25 ` [patch 06/14] sunrpc: Gather forward declarations of static functions in cache.c Greg Banks
2009-01-08 8:25 ` [patch 07/14] sunrpc: Make the global queue_lock per-cache-detail Greg Banks
2009-01-08 8:25 ` [patch 08/14] sunrpc: Make the global queue_wait per-cache-detail Greg Banks
2009-01-08 8:25 ` [patch 09/14] sunrpc: Remove the global lock queue_io_mutex Greg Banks
2009-01-08 8:25 ` [patch 10/14] sunrpc: Reorganise the queuing of cache upcalls Greg Banks
2009-01-08 19:57 ` J. Bruce Fields
2009-01-09 2:40 ` Greg Banks
[not found] ` <4966B92F.8060008-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2009-01-09 2:57 ` J. Bruce Fields
2009-01-09 3:12 ` Greg Banks
[not found] ` <4966C0AB.7000604-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2009-01-09 16:53 ` Chuck Lever
2009-01-10 1:28 ` Greg Banks
2009-01-09 21:29 ` J. Bruce Fields [this message]
2009-01-09 21:41 ` J. Bruce Fields
2009-01-09 23:40 ` Greg Banks
2009-01-09 23:29 ` Greg Banks
2009-01-08 8:25 ` [patch 11/14] sunrpc: Allocate cache_requests in a single allocation Greg Banks
2009-01-08 8:25 ` [patch 12/14] sunrpc: Centralise memory management of cache_requests Greg Banks
2009-01-08 8:25 ` [patch 13/14] sunrpc: Move struct cache_request to linux/sunrpc/cache.h Greg Banks
2009-01-08 8:25 ` [patch 14/14] sunrpc: Improve the usefulness of debug printks in the sunrpc cache code Greg Banks
2009-01-08 19:52 ` [patch 00/14] sunrpc: Sunrpc cache cleanups and upcall rework J. Bruce Fields
2009-01-09 1:42 ` Greg Banks
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090109212921.GC5466@fieldses.org \
--to=bfields@fieldses.org \
--cc=gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox