All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Linux NFS ML <linux-nfs@vger.kernel.org>
Subject: Re: [patch 10/14] sunrpc: Reorganise the queuing of cache upcalls.
Date: Sat, 10 Jan 2009 10:29:43 +1100	[thread overview]
Message-ID: <4967DDE7.40106@melbourne.sgi.com> (raw)
In-Reply-To: <20090109212921.GC5466@fieldses.org>

J. Bruce Fields wrote:
> On Fri, Jan 09, 2009 at 01:40:47PM +1100, Greg Banks wrote:
>   
>> J. Bruce Fields wrote:
>>     
>>
>   
>>>  static ssize_t
>>>  cache_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
>>>  {
>>> -	struct cache_reader *rp = filp->private_data;
>>> -	struct cache_request *rq;
>>> +	struct cache_request *rq = filp->private_data;
>>>  	struct cache_detail *cd = PDE(filp->f_path.dentry->d_inode)->data;
>>> +	struct list_head *queue = &cd->queue;
>>>  	int err;
>>>  
>>>  	if (count == 0)
>>> @@ -711,60 +703,45 @@ cache_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
>>>  	mutex_lock(&queue_io_mutex); /* protect against multiple concurrent
>>>  			      * readers on this file */
>>>   
>>>       
>> Ah, so you still have a single global lock which is serialising all
>> reads and writes to all caches.
>>     
>
> Yes, making this per-cd seems sensible (though if the problem is
> typically a single cache (auth_unix) then I don't know how significant a
> help it is).
>   
The usual pattern of traffic I see (on a SLES10 platform with the older
set of export caches) when the first NFS packet arrives during a client
mounts is:

 - upcall to mountd via auth.unix.ip cache
 - mountd writes pre-emptively to nfsd.export and nfsd.expkey caches
 - mountd write reply to auth.unix.ip cache

So it's not just a single cache.

However, I have no measurements for any performance improvement.  Based
on earlier experience I believe the elapsed mounting time to be
dominated by the latency of the forward and reverse DNS lookup that
mountd does, so the improvement is probably small.

>   
>>
>> I think partial reads are Just Too Hard to do properly, i.e. without
>> risk of racy message corruption under all combinations of message size
>> and userspace behaviour .  In particular I think your code will corrupt
>> upcall data if multiple userspace threads race to do partial reads on
>> the same struct file (as rpc.mountd is doing at SGI's customer sites).
>>     
>
> Yes, but what mountd's doing is just dumb, as far as I can tell; is
> there any real reason not to just keep a separate open for each thread?
>   
None at all.  The current rpc.mountd behaviour is a historical accident
of the "look, we can put a fork() here and everything will Just Work"
variety.  I was hoping to avoid changes to the current userspace
behaviour to limit deployment hassles with shipping a fix, but
ultimately it can be changed.

> If we just tell userland to keep a number of opens equal to the number
> of concurrent upcalls it wants to handle, and then all of this becomes
> very easy.
>   
If we put that requirement on userspace, and partial reads are still
necessary, then your approach of using filp->private_data as a parking
spot for requests currently being read would be the right way to go.

> Forget sharing a single struct file between tasks that do too-small
> reads: we should make sure that we don't oops, but that's all.
>   
We should definitely not oops :-)  Consistently delivering correct
messages to userspace would be nice too.

> So, the somewhat depressing situation with spkm3, which was to be the
> public-key-based gss mechanism for nfs: we (citi) implemented it (modulo
> this problem and maybe one or two others), but found some problems along
> the way that required revising the spec.
This is frequently the way with specs :-/

> [...]
> So: the immediate pressure for larger upcalls is probably gone.
Sweet.

> That said, I think it's easy enough to handle just the case of multiple
> reads on unshared struct files that it might make sense to keep that
> piece.
>
>   
>> A smaller issue is that you keep a single list and use the value of the
>> CACHE_PENDING bit to tell the difference between states. [...]
>>     
>
> OK.  When exactly do they get moved between lists?
Immediately after being copied out to userspace in cache_read().

-- 
Greg Banks, P.Engineer, SGI Australian Software Group.
the brightly coloured sporks of revolution.
I don't speak for SGI.


  parent reply	other threads:[~2009-01-09 23:37 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-08  8:25 [patch 00/14] sunrpc: Sunrpc cache cleanups and upcall rework Greg Banks
2009-01-08  8:25 ` [patch 01/14] sunrpc: Use consistent naming for variables of type struct cache_detail* Greg Banks
2009-01-08  8:25 ` [patch 02/14] sunrpc: Use consistent naming for variables of type struct cache_head* Greg Banks
2009-01-08  8:25 ` [patch 03/14] sunrpc: Use consistent naming for variables of type struct cache_request* Greg Banks
2009-01-08  8:25 ` [patch 04/14] sunrpc: Minor indentation cleanup in cache.c Greg Banks
2009-01-08  8:25 ` [patch 05/14] sunrpc: Rename queue_loose() to cache_remove_queued() Greg Banks
2009-01-08  8:25 ` [patch 06/14] sunrpc: Gather forward declarations of static functions in cache.c Greg Banks
2009-01-08  8:25 ` [patch 07/14] sunrpc: Make the global queue_lock per-cache-detail Greg Banks
2009-01-08  8:25 ` [patch 08/14] sunrpc: Make the global queue_wait per-cache-detail Greg Banks
2009-01-08  8:25 ` [patch 09/14] sunrpc: Remove the global lock queue_io_mutex Greg Banks
2009-01-08  8:25 ` [patch 10/14] sunrpc: Reorganise the queuing of cache upcalls Greg Banks
2009-01-08 19:57   ` J. Bruce Fields
2009-01-09  2:40     ` Greg Banks
     [not found]       ` <4966B92F.8060008-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2009-01-09  2:57         ` J. Bruce Fields
2009-01-09  3:12           ` Greg Banks
     [not found]             ` <4966C0AB.7000604-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2009-01-09 16:53               ` Chuck Lever
2009-01-10  1:28                 ` Greg Banks
2009-01-09 21:29         ` J. Bruce Fields
2009-01-09 21:41           ` J. Bruce Fields
2009-01-09 23:40             ` Greg Banks
2009-01-09 23:29           ` Greg Banks [this message]
2009-01-08  8:25 ` [patch 11/14] sunrpc: Allocate cache_requests in a single allocation Greg Banks
2009-01-08  8:25 ` [patch 12/14] sunrpc: Centralise memory management of cache_requests Greg Banks
2009-01-08  8:25 ` [patch 13/14] sunrpc: Move struct cache_request to linux/sunrpc/cache.h Greg Banks
2009-01-08  8:25 ` [patch 14/14] sunrpc: Improve the usefulness of debug printks in the sunrpc cache code Greg Banks
2009-01-08 19:52 ` [patch 00/14] sunrpc: Sunrpc cache cleanups and upcall rework J. Bruce Fields
2009-01-09  1:42   ` Greg Banks

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4967DDE7.40106@melbourne.sgi.com \
    --to=gnb-cp1dwlodopni96+mszhfpqc/g2k4zdhf@public.gmane.org \
    --cc=bfields@fieldses.org \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.