From: "Hans-Peter Jansen" <hpj@urpla.net>
To: Dave Chinner <david@fromorbit.com>
Cc: opensuse-kernel@opensuse.org, linux-kernel@vger.kernel.org,
	xfs@oss.sgi.com
Subject: Re: 2.6.34-rc3: simple du (on a big xfs tree) triggers oom killer [bisected: 57817c68229984818fea9e614d6f95249c3fb098]
Date: Thu, 8 Apr 2010 00:02:20 +0200
Message-ID: <201004080002.21137.hpj@urpla.net>
In-Reply-To: <20100407014533.GI11036@dastard>

On Wednesday 07 April 2010, 03:45:33 Dave Chinner wrote:
>
> However, if the memory pressure is purely inode cache (creating zero
> length files or read-only traversal), then the OOM killer kicks a
> while after the slab cache fills memory.  This doesn't need highmem;
> I used a x86_64 kernel on a VM w/ 1GB RAM to reliably reproduce
> this.  I'll add zero length file tests and traversals to my low
> memory testing.

I'm glad that you're able to reproduce it. My initial failure occurred
during a disk-to-disk backup (a simple cp -al & rsync combination).

> The best way to fix this, I think, is to trigger a shrinker callback
> when memory is low to run the background inode reclaim. The problem
> is that these inode caches and the reclaim state are per-filesystem,
> not global state, and the current shrinker interface only works with
> global state.
>
> Hence there are two patches to this fix - the first adds a context
> to the shrinker callout, and the second adds the XFS infrastructure
> to track the number of reclaimable inodes per filesystem and
> register/unregister shrinkers for each filesystem.

I see. The first one will be interesting to get into mainline, given the
number of projects that are involved.

> With these patches, my reproducable test case which locked the
> machine up with a OOM panic in a couple of minutes has been running
> for over half an hour. I have much more confidence in this change
> with limited testing than the reverting of the background inode
> reclaim as the revert introduces
>
> The patches below apply to the xfs-dev tree, which is currently at
> 34-rc1. If they don't apply, let me know and I'll redo them against
> a vanilla kernel tree. Can you test them to see if the problem goes
> away? If the problem is fixed, I'll push them for a proper review
> cycle...

Of course, you did the original patch for a reason... Therefore I would love
to test your patches. I've tried to apply them to 2.6.33.2, but after
fixing the same reject as noted below, I'm stuck here:

/usr/src/packages/BUILD/kernel-default-2.6.33.2/linux-2.6.33/fs/xfs/linux-2.6/xfs_sync.c: In function 'xfs_reclaim_inode_shrink':
/usr/src/packages/BUILD/kernel-default-2.6.33.2/linux-2.6.33/fs/xfs/linux-2.6/xfs_sync.c:805: error: implicit declaration of function 'xfs_perag_get'
/usr/src/packages/BUILD/kernel-default-2.6.33.2/linux-2.6.33/fs/xfs/linux-2.6/xfs_sync.c:805: warning: assignment makes pointer from integer without a cast
/usr/src/packages/BUILD/kernel-default-2.6.33.2/linux-2.6.33/fs/xfs/linux-2.6/xfs_sync.c:807: error: implicit declaration of function 'xfs_perag_put'

Now I see that the offending functions were renamed, but they have also
grown a radix_tree structure and locking. How do I handle that?

BTW, your patches do not apply to Linus' current git tree either:
patching file fs/xfs/quota/xfs_qm.c
Hunk #1 succeeded at 72 (offset 3 lines).
Hunk #2 FAILED at 2120.
1 out of 2 hunks FAILED -- saving rejects to file fs/xfs/quota/xfs_qm.c.rej
I'm able to resolve this, but 2.6.34-current gives me some other trouble
that I need to get past first (the PS/2 keyboard eventually stops working).

Anyway, thanks for your great support, Dave. This is much appreciated.

Cheers,
Pete
