linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [git pull] more vfs fixes for final
@ 2011-03-10 11:09 Al Viro
  2011-03-10 11:44 ` Sedat Dilek
  0 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2011-03-10 11:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

	Fix for misplaced include in sysctl.h, fixes for ->d_revalidate()
(a bunch of filesystems are unusable with nfsd at the moment; the real fix
is to make ->d_revalidate() take flags directly and forget about dereferencing
nameidata in most of the instances, but that's obviously for the next window
and oopsen really need to be fixed for -final), missing checks in fallocate()
and Bruce's patch for inode (temporary) leaks with nfsd.  Please, pull from

git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git/ for-linus

Shortlog:

Al Viro (8):
      /proc/self is never going to be invalidated...
      reiserfs xattr ->d_revalidate() shouldn't care about RCU
      ceph: fix d_revalidate oopsen on NFS exports
      fuse: fix d_revalidate oopsen on NFS exports
      gfs2: fix d_revalidate oopsen on NFS exports
      ocfs2: fix d_revalidate oopsen on NFS exports
      jfs: fix d_revalidate oopsen on NFS exports
      fat: fix d_revalidate oopsen on NFS exports

J. Bruce Fields (1):
      fs/dcache: allow d_obtain_alias() to return unhashed dentries

Marco Stornelli (1):
      Check for immutable/append flag in fallocate path

Stephen Rothwell (1):
      sysctl: the include of rcupdate.h is only needed in the kernel

Diffstat:
 fs/ceph/dir.c          |    2 +-
 fs/dcache.c            |   26 ++++++++++++++++++++++++--
 fs/fat/namei_vfat.c    |    4 ++--
 fs/fuse/dir.c          |    2 +-
 fs/gfs2/dentry.c       |    2 +-
 fs/jfs/namei.c         |    2 +-
 fs/ocfs2/dcache.c      |    2 +-
 fs/open.c              |    8 ++++++++
 fs/proc/base.c         |   30 ------------------------------
 fs/reiserfs/xattr.c    |    2 --
 include/linux/sysctl.h |    2 +-
 11 files changed, 40 insertions(+), 42 deletions(-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [git pull] more vfs fixes for final
  2011-03-10 11:09 [git pull] more vfs fixes for final Al Viro
@ 2011-03-10 11:44 ` Sedat Dilek
  2011-03-10 11:58   ` Al Viro
  0 siblings, 1 reply; 9+ messages in thread
From: Sedat Dilek @ 2011-03-10 11:44 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Thu, Mar 10, 2011 at 12:09 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>        Fix for misplaced include in sysctl.h, fixes for ->d_revalidate()
> (a bunch of filesystems are unusable with nfsd at the moment; the real fix
> is to make ->d_revalidate() take flags directly and forget about dereferencing
> nameidata in most of the instances, but that's obviously for the next window
> and oopsen really need to be fixed for -final), missing checks in fallocate()
> and Bruce's patch for inode (temporary) leaks with nfsd.  Please, pull from
>
> git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git/ for-linus
>

Did you really push the new commits?

Last commit here is commit b306419ae08d9def53f2142a37cc0a58622307a8:
"nd->inode is not set on the second attempt in path_walk()"

- Sedat -

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [git pull] more vfs fixes for final
  2011-03-10 11:44 ` Sedat Dilek
@ 2011-03-10 11:58   ` Al Viro
  2011-03-10 12:43     ` Sedat Dilek
       [not found]     ` <20110310115856.GG22723-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
  0 siblings, 2 replies; 9+ messages in thread
From: Al Viro @ 2011-03-10 11:58 UTC (permalink / raw)
  To: sedat.dilek; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Thu, Mar 10, 2011 at 12:44:37PM +0100, Sedat Dilek wrote:
> On Thu, Mar 10, 2011 at 12:09 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > ?? ?? ?? ??Fix for misplaced include in sysctl.h, fixes for ->d_revalidate()
> > (a bunch of filesystems are unusable with nfsd at the moment; the real fix
> > is to make ->d_revalidate() take flags directly and forget about dereferencing
> > nameidata in most of the instances, but that's obviously for the next window
> > and oopsen really need to be fixed for -final), missing checks in fallocate()
> > and Bruce's patch for inode (temporary) leaks with nfsd. ??Please, pull from
> >
> > git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git/ for-linus
> >
> 
> Did you really push the new commits?
> 
> Last commit here is commit b306419ae08d9def53f2142a37cc0a58622307a8:
> "nd->inode is not set on the second attempt in path_walk()"

Looks like propagation from hera to git.kernel.org is slow tonight...

GIT_DIR=/pub/scm/linux/kernel/git/viro/vfs-2.6.git/ git log for-linus|head -5
commit d891eedbc3b1b0fade8a9ce60cc0eba1cccb59e5
Author: J. Bruce Fields <bfields@fieldses.org>
Date:   Tue Jan 18 15:45:09 2011 -0500

    fs/dcache: allow d_obtain_alias() to return unhashed dentries

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [git pull] more vfs fixes for final
  2011-03-10 11:58   ` Al Viro
@ 2011-03-10 12:43     ` Sedat Dilek
       [not found]     ` <20110310115856.GG22723-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
  1 sibling, 0 replies; 9+ messages in thread
From: Sedat Dilek @ 2011-03-10 12:43 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-fsdevel

On Thu, Mar 10, 2011 at 12:58 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Mar 10, 2011 at 12:44:37PM +0100, Sedat Dilek wrote:
>> On Thu, Mar 10, 2011 at 12:09 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> > ?? ?? ?? ??Fix for misplaced include in sysctl.h, fixes for ->d_revalidate()
>> > (a bunch of filesystems are unusable with nfsd at the moment; the real fix
>> > is to make ->d_revalidate() take flags directly and forget about dereferencing
>> > nameidata in most of the instances, but that's obviously for the next window
>> > and oopsen really need to be fixed for -final), missing checks in fallocate()
>> > and Bruce's patch for inode (temporary) leaks with nfsd. ??Please, pull from
>> >
>> > git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git/ for-linus
>> >
>>
>> Did you really push the new commits?
>>
>> Last commit here is commit b306419ae08d9def53f2142a37cc0a58622307a8:
>> "nd->inode is not set on the second attempt in path_walk()"
>
> Looks like propagation from hera to git.kernel.org is slow tonight...
>
> GIT_DIR=/pub/scm/linux/kernel/git/viro/vfs-2.6.git/ git log for-linus|head -5
> commit d891eedbc3b1b0fade8a9ce60cc0eba1cccb59e5
> Author: J. Bruce Fields <bfields@fieldses.org>
> Date:   Tue Jan 18 15:45:09 2011 -0500
>
>    fs/dcache: allow d_obtain_alias() to return unhashed dentries
>

Thanks, it's pull-able now (commit-id fits).

- Sedat -
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [git pull] more vfs fixes for final
       [not found]     ` <20110310115856.GG22723-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
@ 2011-03-11 21:09       ` Simon Kirby
       [not found]         ` <20110311210938.GB8211-hrtb9jm4Du+w5LPnMra/2Q@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Simon Kirby @ 2011-03-11 21:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Mark Moseley, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Thu, Mar 10, 2011 at 11:58:56AM +0000, Al Viro wrote:

> commit d891eedbc3b1b0fade8a9ce60cc0eba1cccb59e5
> Author: J. Bruce Fields <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
> Date:   Tue Jan 18 15:45:09 2011 -0500
> 
>     fs/dcache: allow d_obtain_alias() to return unhashed dentries

Hmm, I was hoping this or something recently would fix nfs_inode_cache
growing forever and flush processes taking lots of system time since
2.6.36. For example:

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
3457486 3454365  99%    0.95K 105601       33   3379232K nfs_inode_cache
469638 248761  52%    0.10K  12042       39     48168K buffer_head
243712 216348  88%    0.02K    952      256      3808K kmalloc-16
232785 202185  86%    0.19K  11085       21     44340K dentry
149696  54633  36%    0.06K   2339       64      9356K kmalloc-64
115976 106806  92%    0.55K   4142       28     66272K radix_tree_node
 76064  45680  60%    0.12K   2377       32      9508K kmalloc-128
 62336  53427  85%    0.03K    487      128      1948K kmalloc-32
 41958  41250  98%    0.75K   1998       21     31968K ext3_inode_cache

This clears them all, similar to what you posted:

echo 2 > /proc/sys/vm/drop_caches
sync
echo 2 > /proc/sys/vm/drop_caches

...but 2.6.38-rc8 still doesn't seem to fix it.

http://0x.ca/sim/ref/2.6.37/cpu3_nfs.png
http://www.spinics.net/lists/linux-nfs/msg18212.html

Any ideas?  This started with 2.6.36.

Simon-
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [git pull] more vfs fixes for final
       [not found]         ` <20110311210938.GB8211-hrtb9jm4Du+w5LPnMra/2Q@public.gmane.org>
@ 2011-03-11 21:35           ` J. Bruce Fields
       [not found]             ` <20110311213519.GA9404-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: J. Bruce Fields @ 2011-03-11 21:35 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Mark Moseley, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Fri, Mar 11, 2011 at 01:09:38PM -0800, Simon Kirby wrote:
> On Thu, Mar 10, 2011 at 11:58:56AM +0000, Al Viro wrote:
> 
> > commit d891eedbc3b1b0fade8a9ce60cc0eba1cccb59e5
> > Author: J. Bruce Fields <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
> > Date:   Tue Jan 18 15:45:09 2011 -0500
> > 
> >     fs/dcache: allow d_obtain_alias() to return unhashed dentries
> 
> Hmm, I was hoping this or something recently would fix nfs_inode_cache
> growing forever and flush processes taking lots of system time since
> 2.6.36. For example:
> 
>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> 3457486 3454365  99%    0.95K 105601       33   3379232K nfs_inode_cache
> 469638 248761  52%    0.10K  12042       39     48168K buffer_head
> 243712 216348  88%    0.02K    952      256      3808K kmalloc-16
> 232785 202185  86%    0.19K  11085       21     44340K dentry
> 149696  54633  36%    0.06K   2339       64      9356K kmalloc-64
> 115976 106806  92%    0.55K   4142       28     66272K radix_tree_node
>  76064  45680  60%    0.12K   2377       32      9508K kmalloc-128
>  62336  53427  85%    0.03K    487      128      1948K kmalloc-32
>  41958  41250  98%    0.75K   1998       21     31968K ext3_inode_cache
> 
> This clears them all, similar to what you posted:
> 
> echo 2 > /proc/sys/vm/drop_caches
> sync
> echo 2 > /proc/sys/vm/drop_caches
> 
> ...but 2.6.38-rc8 still doesn't seem to fix it.
> 
> http://0x.ca/sim/ref/2.6.37/cpu3_nfs.png
> http://www.spinics.net/lists/linux-nfs/msg18212.html
> 
> Any ideas?  This started with 2.6.36.

Do you have NFSv4 clients that are doing locking?  Then it's probably
0997b17360 and 529d7b2a7f on the for-2.6.39 branch at:

	git://linux-nfs.org/~bfields/linux.git for-2.6.39

Let me know if not.

Those should go into 2.6.39 and stable when the merge window opens.  I
would have tried to slip them into 2.6.38 but they just didn't seem
quite trivial enough given where we are in the release process.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [git pull] more vfs fixes for final
       [not found]             ` <20110311213519.GA9404-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
@ 2011-03-12  1:09               ` Simon Kirby
       [not found]                 ` <20110312010930.GD8211-hrtb9jm4Du+w5LPnMra/2Q@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Simon Kirby @ 2011-03-12  1:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Mark Moseley, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Fri, Mar 11, 2011 at 04:35:19PM -0500, J. Bruce Fields wrote:

> On Fri, Mar 11, 2011 at 01:09:38PM -0800, Simon Kirby wrote:
> > On Thu, Mar 10, 2011 at 11:58:56AM +0000, Al Viro wrote:
> > 
> > > commit d891eedbc3b1b0fade8a9ce60cc0eba1cccb59e5
> > > Author: J. Bruce Fields <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
> > > Date:   Tue Jan 18 15:45:09 2011 -0500
> > > 
> > >     fs/dcache: allow d_obtain_alias() to return unhashed dentries
> > 
> > Hmm, I was hoping this or something recently would fix nfs_inode_cache
> > growing forever and flush processes taking lots of system time since
> > 2.6.36. For example:
> > 
> >   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> > 3457486 3454365  99%    0.95K 105601       33   3379232K nfs_inode_cache
> > 469638 248761  52%    0.10K  12042       39     48168K buffer_head
> > 243712 216348  88%    0.02K    952      256      3808K kmalloc-16
> > 232785 202185  86%    0.19K  11085       21     44340K dentry
> > 149696  54633  36%    0.06K   2339       64      9356K kmalloc-64
> > 115976 106806  92%    0.55K   4142       28     66272K radix_tree_node
> >  76064  45680  60%    0.12K   2377       32      9508K kmalloc-128
> >  62336  53427  85%    0.03K    487      128      1948K kmalloc-32
> >  41958  41250  98%    0.75K   1998       21     31968K ext3_inode_cache
> > 
> > This clears them all, similar to what you posted:
> > 
> > echo 2 > /proc/sys/vm/drop_caches
> > sync
> > echo 2 > /proc/sys/vm/drop_caches
> > 
> > ...but 2.6.38-rc8 still doesn't seem to fix it.
> > 
> > http://0x.ca/sim/ref/2.6.37/cpu3_nfs.png
> > http://www.spinics.net/lists/linux-nfs/msg18212.html
> > 
> > Any ideas?  This started with 2.6.36.
> 
> Do you have NFSv4 clients that are doing locking?  Then it's probably
> 0997b17360 and 529d7b2a7f on the for-2.6.39 branch at:
> 
> 	git://linux-nfs.org/~bfields/linux.git for-2.6.39
> 
> Let me know if not.

Yes, but when this started, it was all NFSv3. I tried NFSv4 to see if
it made any different (no, other than more I/O wait due to idmapd and
flock(LOCK_EX) breaking on a file opened O_RDONLY, which works locally
and on NFSv3 -- fixed by changing to open(O_RDWR) in my code), but
otherwise NFSv3 and NFSv4 look pretty much the same and "leak" at the
same rate.

Actually, I only changed the mount where it writes the log results, and
not where it reads them from (a bunch of NFSv3 mounts that will take a
lot more work to make NFSv4 since they currently run without libnss-mysql
set up), so I haven't actually tested to see if it happens with purely
NFSv4 mounts.

Anyway, it's locking on one file only, and it was purely for reducing
writeback thrashing, so I could try without it if you think it might be
related, but it all started without NFSv4. All this thing does is read a
bunch of log files, crunch them into reports, and write them to another
mount point, all run from xargs -P. Pretty basic readdir, read, write,
unlink...not anything exotic.

Simon-
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [git pull] more vfs fixes for final
       [not found]                 ` <20110312010930.GD8211-hrtb9jm4Du+w5LPnMra/2Q@public.gmane.org>
@ 2011-03-15  0:46                   ` J. Bruce Fields
  2011-03-16  5:19                     ` Simon Kirby
  0 siblings, 1 reply; 9+ messages in thread
From: J. Bruce Fields @ 2011-03-15  0:46 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Mark Moseley, linux-nfs-u79uwXL29TY76Z2rM5mHXA, Al Viro,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Fri, Mar 11, 2011 at 05:09:30PM -0800, Simon Kirby wrote:
> On Fri, Mar 11, 2011 at 04:35:19PM -0500, J. Bruce Fields wrote:
> 
> > On Fri, Mar 11, 2011 at 01:09:38PM -0800, Simon Kirby wrote:
> > > On Thu, Mar 10, 2011 at 11:58:56AM +0000, Al Viro wrote:
> > > 
> > > > commit d891eedbc3b1b0fade8a9ce60cc0eba1cccb59e5
> > > > Author: J. Bruce Fields <bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
> > > > Date:   Tue Jan 18 15:45:09 2011 -0500
> > > > 
> > > >     fs/dcache: allow d_obtain_alias() to return unhashed dentries
> > > 
> > > Hmm, I was hoping this or something recently would fix nfs_inode_cache
> > > growing forever and flush processes taking lots of system time since
> > > 2.6.36. For example:
> > > 
> > >   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> > > 3457486 3454365  99%    0.95K 105601       33   3379232K nfs_inode_cache
> > > 469638 248761  52%    0.10K  12042       39     48168K buffer_head
> > > 243712 216348  88%    0.02K    952      256      3808K kmalloc-16
> > > 232785 202185  86%    0.19K  11085       21     44340K dentry
> > > 149696  54633  36%    0.06K   2339       64      9356K kmalloc-64
> > > 115976 106806  92%    0.55K   4142       28     66272K radix_tree_node
> > >  76064  45680  60%    0.12K   2377       32      9508K kmalloc-128
> > >  62336  53427  85%    0.03K    487      128      1948K kmalloc-32
> > >  41958  41250  98%    0.75K   1998       21     31968K ext3_inode_cache
> > > 
> > > This clears them all, similar to what you posted:
> > > 
> > > echo 2 > /proc/sys/vm/drop_caches
> > > sync
> > > echo 2 > /proc/sys/vm/drop_caches
> > > 
> > > ...but 2.6.38-rc8 still doesn't seem to fix it.
> > > 
> > > http://0x.ca/sim/ref/2.6.37/cpu3_nfs.png
> > > http://www.spinics.net/lists/linux-nfs/msg18212.html
> > > 
> > > Any ideas?  This started with 2.6.36.
> > 
> > Do you have NFSv4 clients that are doing locking?  Then it's probably
> > 0997b17360 and 529d7b2a7f on the for-2.6.39 branch at:
> > 
> > 	git://linux-nfs.org/~bfields/linux.git for-2.6.39
> > 
> > Let me know if not.

Pfft, I'm blind, sorry, I didn't pay attention to the slab name;
nfs_inode_cache is client-side, not server-side, so nothing I've done 
should affect it one way or the other.

Of course it's not necessarily a bug for the client to cache a lot of
inodes, but millions sounds like a lot for your case?

Are you actually seeing this cause you problems?

--b.

> Yes, but when this started, it was all NFSv3. I tried NFSv4 to see if
> it made any different (no, other than more I/O wait due to idmapd and
> flock(LOCK_EX) breaking on a file opened O_RDONLY, which works locally
> and on NFSv3 -- fixed by changing to open(O_RDWR) in my code), but
> otherwise NFSv3 and NFSv4 look pretty much the same and "leak" at the
> same rate.
> 
> Actually, I only changed the mount where it writes the log results, and
> not where it reads them from (a bunch of NFSv3 mounts that will take a
> lot more work to make NFSv4 since they currently run without libnss-mysql
> set up), so I haven't actually tested to see if it happens with purely
> NFSv4 mounts.
> 
> Anyway, it's locking on one file only, and it was purely for reducing
> writeback thrashing, so I could try without it if you think it might be
> related, but it all started without NFSv4. All this thing does is read a
> bunch of log files, crunch them into reports, and write them to another
> mount point, all run from xargs -P. Pretty basic readdir, read, write,
> unlink...not anything exotic.
> 
> Simon-
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [git pull] more vfs fixes for final
  2011-03-15  0:46                   ` J. Bruce Fields
@ 2011-03-16  5:19                     ` Simon Kirby
  0 siblings, 0 replies; 9+ messages in thread
From: Simon Kirby @ 2011-03-16  5:19 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Mark Moseley, linux-nfs, Al Viro, linux-kernel, linux-fsdevel

On Mon, Mar 14, 2011 at 08:46:56PM -0400, J. Bruce Fields wrote:

> On Fri, Mar 11, 2011 at 05:09:30PM -0800, Simon Kirby wrote:
> > On Fri, Mar 11, 2011 at 04:35:19PM -0500, J. Bruce Fields wrote:
> > 
> > > On Fri, Mar 11, 2011 at 01:09:38PM -0800, Simon Kirby wrote:
> > > > 
> > > > Hmm, I was hoping this or something recently would fix nfs_inode_cache
> > > > growing forever and flush processes taking lots of system time since
> > > > 2.6.36. For example:
> > > > 
> > > >   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> > > > 3457486 3454365  99%    0.95K 105601       33   3379232K nfs_inode_cache
> > > > 469638 248761  52%    0.10K  12042       39     48168K buffer_head
> > > > 243712 216348  88%    0.02K    952      256      3808K kmalloc-16
> > > > 232785 202185  86%    0.19K  11085       21     44340K dentry
> > > > 149696  54633  36%    0.06K   2339       64      9356K kmalloc-64
> > > > 115976 106806  92%    0.55K   4142       28     66272K radix_tree_node
> > > >  76064  45680  60%    0.12K   2377       32      9508K kmalloc-128
> > > >  62336  53427  85%    0.03K    487      128      1948K kmalloc-32
> > > >  41958  41250  98%    0.75K   1998       21     31968K ext3_inode_cache
> > > > 
> > > > This clears them all, similar to what you posted:
> > > > 
> > > > echo 2 > /proc/sys/vm/drop_caches
> > > > sync
> > > > echo 2 > /proc/sys/vm/drop_caches
> > > > 
> > > > ...but 2.6.38-rc8 still doesn't seem to fix it.
> > > > 
> > > > http://0x.ca/sim/ref/2.6.37/cpu3_nfs.png
> > > > http://www.spinics.net/lists/linux-nfs/msg18212.html
> > > > 
> > > > Any ideas?  This started with 2.6.36.
> > > 
> > > Do you have NFSv4 clients that are doing locking?  Then it's probably
> > > 0997b17360 and 529d7b2a7f on the for-2.6.39 branch at:
> > > 
> > > 	git://linux-nfs.org/~bfields/linux.git for-2.6.39
> > > 
> > > Let me know if not.
> 
> Pfft, I'm blind, sorry, I didn't pay attention to the slab name;
> nfs_inode_cache is client-side, not server-side, so nothing I've done 
> should affect it one way or the other.
> 
> Of course it's not necessarily a bug for the client to cache a lot of
> inodes, but millions sounds like a lot for your case?
> 
> Are you actually seeing this cause you problems?

Yes, manifesting in two ways..

On the log-crunching box I was referencing in my reports, system time
grows proportionally with nfs_inode_cache and eventually swamps the
entire system. (See http://0x.ca/sim/ref/2.6.37/cpu3_nfs.png for an
example of it using half of the CPU after a few weeks of uptime.) The
CPU seems to come from "flush" processes, and can be seen with "top"
and "perf top" (which shows mostly spinlock contention).

The other case is much less controlled, and is a large cluster of boxes
running Apache, etc. Some event seems to case reclaim of these all at
once, and the machine effectively goes out to lunch for a few minutes
while this happens, with 30 or more "flush" processes taking 100% CPU.
We were able to set up a script to detect this immediately and pull them
from the cluster pool, and after a few minutes, everything returns to
normal.

So, the problem seems to be two problems: the massive burst of spinlock
contention from the "flush" processes for every NFS mount (and we have
about 70 of them on average), and even without the burst, the flush
processes seem to take CPU proportionally to the number of them, even
though I would expect them to just be cache at that point.

None of this was an issue in 2.6.35. I can boot it on the log crunching
box and system time is 0% after every run. On 2.6.36-38, system time
increases by about 2% every day.

Simon-

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-03-16  5:19 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-10 11:09 [git pull] more vfs fixes for final Al Viro
2011-03-10 11:44 ` Sedat Dilek
2011-03-10 11:58   ` Al Viro
2011-03-10 12:43     ` Sedat Dilek
     [not found]     ` <20110310115856.GG22723-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
2011-03-11 21:09       ` Simon Kirby
     [not found]         ` <20110311210938.GB8211-hrtb9jm4Du+w5LPnMra/2Q@public.gmane.org>
2011-03-11 21:35           ` J. Bruce Fields
     [not found]             ` <20110311213519.GA9404-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2011-03-12  1:09               ` Simon Kirby
     [not found]                 ` <20110312010930.GD8211-hrtb9jm4Du+w5LPnMra/2Q@public.gmane.org>
2011-03-15  0:46                   ` J. Bruce Fields
2011-03-16  5:19                     ` Simon Kirby

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).