[PATCH] nfsd: fix possible read-ahead cache and export table corruption

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] nfsd: fix possible read-ahead cache and export table corruption
@ 2007-07-13  2:44 J. Bruce Fields
  2007-07-13  4:10 ` Greg Banks
  0 siblings, 1 reply; 4+ messages in thread
From: J. Bruce Fields @ 2007-07-13  2:44 UTC (permalink / raw)
  To: Neil Brown; +Cc: nfs, Greg Banks

From: J. Bruce Fields <bfields@citi.umich.edu>

The value of nperbucket calculated here is too small--we should be
rounding up instead of down--with the result that the index j in the
following loop can overflow the raparm_hash array.  At least in my case,
the next thing in memory turns out to be export_table, so the symptoms I
see are crashes in cache_put caused by the appearance of four zeroed-out
export entries in the first bucket of the hash table of exports (which
were actually entries in the readahead cache, a pointer to which had
been written to the export table in this initialization code).

It looks like the bug was probably introduced with commit
fce1456a19f5c08b688c29f00ef90fdfa074c79b ("knfsd: make the readahead
params cache SMP-friendly").

Cc: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
---
 fs/nfsd/vfs.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

OK, that was painful.  I need a less dumb debugging method than "test,
stare at the code for a long time, insert 20 more printk's, test
again...".

Anyway, it might explain a few other reports of export table corruption
that I hadn't previously been able to reproduce.

The code could be a little more straightforward here.   Could we just
round up cache_size to a multiple of RAPARM_HASH_SIZE, make all the hash
chains the same length, and just use them as arrays?  The p_next
pointers seem like a unnecessary complication.

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 7e50da0..dd3604e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1885,7 +1885,7 @@ nfsd_racache_init(int cache_size)
 		raparm_hash[i].pb_head = NULL;
 		spin_lock_init(&raparm_hash[i].pb_lock);
 	}
-	nperbucket = cache_size >> RAPARM_HASH_BITS;
+	nperbucket = DIV_ROUND_UP(cache_size, RAPARM_HASH_SIZE);
 	for (i = 0; i < cache_size - 1; i++) {
 		if (i % nperbucket == 0)
 			raparm_hash[j++].pb_head = raparml + i;
-- 
1.5.3.rc0.63.gc956

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] nfsd: fix possible read-ahead cache and export table corruption
  2007-07-13  2:44 [PATCH] nfsd: fix possible read-ahead cache and export table corruption J. Bruce Fields
@ 2007-07-13  4:10 ` Greg Banks
  2007-07-13 17:46   ` J. Bruce Fields
  0 siblings, 1 reply; 4+ messages in thread
From: Greg Banks @ 2007-07-13  4:10 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Neil Brown, nfs

On Thu, Jul 12, 2007 at 10:44:20PM -0400, J. Bruce Fields wrote:
> From: J. Bruce Fields <bfields@citi.umich.edu>
> 
> The value of nperbucket calculated here is too small--we should be
> [...]
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 7e50da0..dd3604e 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1885,7 +1885,7 @@ nfsd_racache_init(int cache_size)
>  		raparm_hash[i].pb_head = NULL;
>  		spin_lock_init(&raparm_hash[i].pb_lock);
>  	}
> -	nperbucket = cache_size >> RAPARM_HASH_BITS;
> +	nperbucket = DIV_ROUND_UP(cache_size, RAPARM_HASH_SIZE);
>  	for (i = 0; i < cache_size - 1; i++) {
>  		if (i % nperbucket == 0)
>  			raparm_hash[j++].pb_head = raparml + i;
> -- 
> 1.5.3.rc0.63.gc956

My bad.  The code I added really only works properly if the number
of nfsd threads at startup is a multiple of 8, so that cache_size
is a multiple of 16 and the shift doesn't lose any nonzero bits.
As we use 128 threads in our product, I'd never noticed the bug :-(
Sorry for the trouble, Bruce.

There's a couple of other problems with that code, for which I'll send
patches shortly.  But I think the raparams cache needs a more thorough
renovation than that.  I have a mostly-working patch which rips it out
entirely and replaces it with a cache of open struct files, an idea
of Neil's.  This approach fixes a whole bunch of separate problems
at once, and will be really useful...if I can ever find enough time
between super-urgent dramas to work on it ;-)

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere.  Which MPHG character are you?
I don't speak for SGI.

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] nfsd: fix possible read-ahead cache and export table corruption
  2007-07-13  4:10 ` Greg Banks
@ 2007-07-13 17:46   ` J. Bruce Fields
  2007-07-24 14:40     ` J. Bruce Fields
  0 siblings, 1 reply; 4+ messages in thread
From: J. Bruce Fields @ 2007-07-13 17:46 UTC (permalink / raw)
  To: Greg Banks; +Cc: Neil Brown, nfs

On Fri, Jul 13, 2007 at 02:10:19PM +1000, Greg Banks wrote:
> My bad.  The code I added really only works properly if the number
> of nfsd threads at startup is a multiple of 8, so that cache_size
> is a multiple of 16 and the shift doesn't lose any nonzero bits.
> As we use 128 threads in our product, I'd never noticed the bug :-(

It seems likely the distros all default to powers or 2, so that may make
it a little less likely to explain other bug reports I've seen.  Oh
well.  I'm not sure why I had this one test machine set to start 50
threads.

> There's a couple of other problems with that code, for which I'll send
> patches shortly.  But I think the raparams cache needs a more thorough
> renovation than that.  I have a mostly-working patch which rips it out
> entirely and replaces it with a cache of open struct files, an idea
> of Neil's.  This approach fixes a whole bunch of separate problems
> at once, and will be really useful...

OK!  What are the other problems you were thinking of?

--b.

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] nfsd: fix possible read-ahead cache and export table corruption
  2007-07-13 17:46   ` J. Bruce Fields
@ 2007-07-24 14:40     ` J. Bruce Fields
  0 siblings, 0 replies; 4+ messages in thread
From: J. Bruce Fields @ 2007-07-24 14:40 UTC (permalink / raw)
  To: Greg Banks; +Cc: Neil Brown, nfs

On Fri, Jul 13, 2007 at 01:46:25PM -0400, J. Bruce Fields wrote:
> On Fri, Jul 13, 2007 at 02:10:19PM +1000, Greg Banks wrote:
> > There's a couple of other problems with that code, for which I'll send
> > patches shortly.  But I think the raparams cache needs a more thorough
> > renovation than that.  I have a mostly-working patch which rips it out
> > entirely and replaces it with a cache of open struct files, an idea
> > of Neil's.  This approach fixes a whole bunch of separate problems
> > at once, and will be really useful...
> 
> OK!  What are the other problems you were thinking of?

By the way, one other problem with keeping open files that I was
reminded of recently--currently you should be able to do something like

	killall rpc.mountd
	# flush export table:
	exportfs -f
	umount /exports
	... do stuff
	mount /exports
	rpc.mountd

Also useful if you want to unmount a filesystem one one machine so you
can move it to another and mount it there.  But that's only possible
because noone holds a file open for longer than the time required to
service a single request.

Well, no-one except lockd--hence Wendy's patches to drop all locks on a
filesystem.  Or the nfsv4 server, which is holding files open as long as
clients have them open.  Oops, my fault.  I'm not sure quite what to do
about that.

--b.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2007-07-24 14:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-13  2:44 [PATCH] nfsd: fix possible read-ahead cache and export table corruption J. Bruce Fields
2007-07-13  4:10 ` Greg Banks
2007-07-13 17:46   ` J. Bruce Fields
2007-07-24 14:40     ` J. Bruce Fields

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.