* OOM on quotacheck (again?)
@ 2012-09-19 14:12 blafoo
2012-09-19 20:59 ` Dave Chinner
0 siblings, 1 reply; 10+ messages in thread
From: blafoo @ 2012-09-19 14:12 UTC (permalink / raw)
To: xfs
Hi all,
For the last couple of days I've been trying to compile a new kernel for
our webserver platform, which is based on Debian Squeeze.
Hardware: a mix of Dell PE2850, 2950, R710
- raid-10 with 4 disks (old setup, PE2850)
- raid-1 system, raid-10 content (current setup)
- currently running linux-2.6.37 custom built, vmalloc set to default
(128MB)
All systems have an XFS filesystem as their content partition and have
group quota enabled (no other XFS settings active). The content
partition varies in size between 250GB and 1TB and contains
between 3 and 10 million files.
Every time I try to mount the XFS filesystem and a quota-check is
needed, the server runs out of memory (OOM). I can easily reproduce this
by rebooting the server, resetting the quota flags with
xfs_db -x -c 'sb 0' -c 'write qflags 0'
and rerunning the quota-check.
This is true for various kernels, but not all. What I've tried so far:
2.6.37.x - fails with OOM
2.6.39.4 - surprisingly works (see below why)
3.2.29 - fails with OOM
3.4.10 - fails with OOM
3.6.0rc5 - fails with a vmalloc error (XFS (sda7): xfs_buf_get_map: failed
to map pages); with vmalloc=256 the system hangs indefinitely on mount.
Some more info from my test system is available here:
http://pastebin.com/2DkDyH4R
I found a couple of references regarding this problem but no final
solution so far.
Please correct the following if I misunderstood anything:
1. There was an OOM problem with quota-checks, fixed in 2.6.39.4,
which is mentioned here:
a) http://permalink.gmane.org/gmane.comp.file-systems.xfs.general/43565
and fixed here:
b) http://patchwork.xfs.org/patch/3337/
That is why 2.6.39.4 works for me.
2. That fix was later replaced (not extended) with a nicer patch which
is mentioned/published here:
c) http://oss.sgi.com/archives/xfs/2011-03/msg00240.html
I checked all the kernel versions above for the patch mentioned in 2. and
can confirm its presence in each kernel tree. Still, our servers fail to
complete the quota-check successfully.
Am I missing something here?
PS: As a side note: we've been running XFS for years without any
problems. But after we activated the gquota feature, we've been having
problems in a couple of places. One is the OOM on quota-check; another
is XFS errors on high-I/O volumes with gquota enabled. But since the
high-I/O problem might be connected to the OOM problem, we'll try
to fix the latter first :-)
best regards
Volker
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: OOM on quotacheck (again?)
2012-09-19 14:12 OOM on quotacheck (again?) blafoo
@ 2012-09-19 20:59 ` Dave Chinner
2012-09-20 9:32 ` Volker
0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2012-09-19 20:59 UTC (permalink / raw)
To: blafoo; +Cc: xfs
On Wed, Sep 19, 2012 at 04:12:04PM +0200, blafoo wrote:
> Hi all,
>
> for the last couple of days i've been trying to compile a new kernel for
> our webserver-platform which is based on debian-squeeze.
>
> Hardware: a mix of Dell PE2850, 2950, R710
> - raid-10 with 4 disks (old setup, PE2850)
> - raid-1 system, raid-10 content (current setup)
> - currently running linux-2.6.37 custom built, vmalloc set to default
> (128MB)
Which implies you are running a 32 bit kernel even on 64 bit CPUs
(e.g. R710).
>
> All systems have an xfs-filesystem as their content-partition and have
> group-quota enabled (no other xfs-settings active). the
> content-partition varies in size between 250GB and 1TB and contains
> between 3 and 10 million files.
>
> Every time i try to mount the xfs-file-system and a quota-check is
> needed, the server goes out of memory (oom). I can easily reproduce this
> by rebooting the server, resetting the quota-flags with
No surprise if you are running an i686 kernel (32 bit). You've got
way more inodes than can fit in the kernel memory segment.
> xfs_db -x -c 'sb 0' -c 'write qflags 0'
>
> and rerun the quota-check.
>
> This is true for various kernels but not all. What i've tried so far:
>
> 2.6.37.x - fails with OOM
> 2.6.39.4 - surprisingly works (see below why)
> 3.2.29 - fails with OOM
> 3.4.10 - fails with OOM
8a00ebe xfs: Ensure inode reclaim can run during quotacheck
$ git describe --contains 8a00ebe
v3.5-rc1~91^2~54
So the OOM problem was fixed in 3.5.
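For anyone following along, here is a self-contained sketch of what `git describe --contains` does. The throwaway repository, commit messages, and tag name below are invented purely for illustration; against a real kernel clone you would pass the actual commit id (e.g. 8a00ebe) and get the real release tag back.

```shell
# Build a tiny stand-in for a kernel tree: one "fix" commit, one later
# commit, and a release tag that contains both.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "xfs: the fix we care about"
fix=$(git rev-parse HEAD)
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "later work"
git tag v3.5-rc1

# Name the fix relative to the first tag that contains it.
git describe --contains "$fix"
```

If the fix commit is an ancestor of the tag, the output names that tag (here `v3.5-rc1~1`), which is how you can confirm whether a given fix landed in the kernel you are running.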
> 3.6.0rc5 - fails with a vmalloc error (XFS (sda7): xfs_buf_get_map: failed
> to map pages); with vmalloc=256 the system hangs indefinitely on mount.
Running on an x86-64 kernel will make the vmalloc problem go away.
There's very little we can do about the limited vmalloc address
space on i686 kernels. As it is, the known recent regression in this
space:
bcf62ab xfs: Fix overallocation in xfs_buf_allocate_memory()
$ git describe --contains bcf62ab
v3.6-rc1~42^2~35
was fixed in 3.6-rc1, so I'm not really sure why you'd be
running out of vmalloc space, as there shouldn't be any metadata that
is vmalloc'd in your given filesystem configuration...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: OOM on quotacheck (again?)
2012-09-19 20:59 ` Dave Chinner
@ 2012-09-20 9:32 ` Volker
2012-09-24 13:21 ` Dave Chinner
0 siblings, 1 reply; 10+ messages in thread
From: Volker @ 2012-09-20 9:32 UTC (permalink / raw)
Cc: xfs
Hi,
> Which implies you are running a 32 bit kernel even on 64 bit CPUs
> (e.g. R710).
My mistake. That is not yet the case, but it is the plan for the future.
Thanks for pointing that out.
> No surprise if you are running an i686 kernel (32 bit). You've got
> way more inodes than can fit in the kernel memory segment.
Could you elaborate slightly on that, or give me a link or two that
explain the matter?
If a 32-bit kernel is not supposed to work because of the number of
inodes, why does the 2.6.39.4 kernel work flawlessly on quota-checks on
the same filesystem that a 32-bit 3.6.0-rc5 (which is supposed to work) fails on?
Doesn't that imply that the fix submitted for 2.6.39.4 fixed a problem
which was "reinvented" by the later patch, and is now being worked
around by using a 64-bit kernel for more memory?
> Running on a x86-64 kernel will make the vmalloc problem go away.
> There's very little we can do about the limited vmalloc address
> space on i686 kernels. As it is, the known recent regression in this
> space:
>
> bcf62ab xfs: Fix overallocation in xfs_buf_allocate_memory()
>
> $ git describe --contains bcf62ab
> v3.6-rc1~42^2~35
>
> was fixed in 3.6-rc1,
Confirmed. The current 3.6.0-rc5 in 64-bit completes the quota-check.
I'll do some more testing with xfs_fsr etc. and report back.
best regards
volker
* Re: OOM on quotacheck (again?)
2012-09-20 9:32 ` Volker
@ 2012-09-24 13:21 ` Dave Chinner
2012-09-24 14:47 ` Volker
0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2012-09-24 13:21 UTC (permalink / raw)
To: Volker; +Cc: xfs
On Thu, Sep 20, 2012 at 11:32:17AM +0200, Volker wrote:
> Hi,
>
> > Which implies you are running a 32 bit kernel even on 64 bit CPUs
> > (e.g. R710).
>
> My mistake. That is not yet the case, but the plan for the future.
> Thanks for pointing that out.
>
> > No surprise if you are running an i686 kernel (32 bit). You've got
> > way more inodes than can fit in the kernel memory segment.
>
> Could you slightly elaborate on that or give me a link or two which
> explain the matter?
The kernel segment is limited to 960MB of RAM on ia32 machines
unless you build with special config options that allow for up to
3GB of kernel memory. The trade-off is that you've only got 4GB of
address space per process, so by default you have 3GB of RAM
for each process (i.e. a 960MB/3GB kernel/user split). If you change
that to a 3GB/1GB split, you'll have problems with applications
that are memory hogs running out of memory.
As to the memory used by the inode cache, inodes tend to use between
1-1.5KB of RAM each. Hence for a 960MB kernel segment, you *might* be
able to cache 500,000 inodes if you don't cache anything else.
Typically it will be 25-30% of that number (200-300MB of RAM used for
caching inodes during filesystem traversal). Seeing as you have
millions of inodes, that's way more than you can cache in the
available kernel memory...
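Spelled out as back-of-envelope shell arithmetic (a rough sketch: the 2KB-per-inode figure simply rounds up the 1-1.5KB estimate above, it is not a hard limit):

```shell
# How many inodes could fit in the default i686 kernel segment?
kernel_segment_kb=$((960 * 1024))   # ~960MB lowmem on the default split
per_inode_kb=2                      # 1-1.5KB per cached inode, rounded up
max_cached_inodes=$((kernel_segment_kb / per_inode_kb))
echo "$max_cached_inodes"
```

That ceiling is roughly the "500,000 inodes" figure, an order of magnitude below the 3-10 million inodes on these filesystems, which is why the quotacheck exhausts kernel memory.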
> If a 32bit kernel is not supposed to work because of the number of
> inodes, why does the 2.6.39.4-kernel work flawlessly on quota-checks on
> the same filesystem a 3.6.0-rc5 32bit (which is supposed to work) fails on?
Because inode reclaim on 2.6.39 is running during the quotacheck.
> Doesn't that imply, that the fix submitted for 2.6.39.4 fixed a problem
> which was "reinvented" by the later patch, which is now being worked
> around by using a 64bit kernel for more memory?
It's called a regression, and we do try to avoid them. However, ia32
gets relatively little attention due to its limitations, and hence
changes that work fine on x86-64 but cause regressions on ia32 may
go unnoticed for some time because relatively few people run
ia32 servers anymore.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: OOM on quotacheck (again?)
2012-09-24 13:21 ` Dave Chinner
@ 2012-09-24 14:47 ` Volker
2012-10-02 16:29 ` Volker
0 siblings, 1 reply; 10+ messages in thread
From: Volker @ 2012-09-24 14:47 UTC (permalink / raw)
Cc: xfs
Hi,
>> Could you slightly elaborate on that or give me a link or two which
>> explain the matter?
>
> The kernel segment is limited to 960MB of RAM on ia32 bit machines
> unless you build with special config options that allow for up to
> 3GB of kernel memory. The trade off is that you've only got 4GB of
> RAM in the process address space, so by default you have 3GB of RAM
> for each process (i.e. 960MB/3GB kernel/user split). If you change
> that to a 3GB/1GB split, you'll have problems with applications
> that are memory hogs running out of memory.
Interesting!
> As to the memory used by the inode cache, inodes tend to use between
> 1-1.5k of RAM each. Hence for a 960MB kernel segment, you *might* be
> able to cache 500,000 inodes if you don't cache anything else.
> Typically it will be 25-30% of that number (200-300MB of RAM in
> caching inodes during filesystem traversal). Seeing as you have
> millions of inodes, that's way more than you can cache in available
> kernel memory...
Great! That answered all my questions! Thanks a lot!
3.6.0-rc6-x64 is currently running fine on 6 machines.
-volker
* Re: OOM on quotacheck (again?)
2012-09-24 14:47 ` Volker
@ 2012-10-02 16:29 ` Volker
2012-10-02 20:09 ` Dave Chinner
0 siblings, 1 reply; 10+ messages in thread
From: Volker @ 2012-10-02 16:29 UTC (permalink / raw)
To: xfs
Hi again,
> Great! That answered all my questions! Thanks a lot!
>
> 3.6.0-rc6-x64 is currently running fine on 6 machines.
just as a follow up i would like to share some info.
The six machines mentioned above are still running fine. So are a few
more we tested with the new kernel. All of the servers tested so far
were rebooted immediately after the new 3.6 kernel was installed.
Because of that, we decided to roll out the new kernel to all our
servers (approximately 330) and let it "sink in" over the next
few days as the machines get rebooted.
This morning we experienced some problems with the superblock being
corrupted on 6 machines that had been rebooted during the night. For all
of them, the following was true:
a) the server was still running the old buggy 2.6.37 and had
filesystem troubles on heavy I/O (that was our problem to begin with,
besides the OOM)
b) because of the filesystem troubles, the server had been rebooted by
our hardware support team (sadly not necessarily using SysRq)
because the XFS partition was unresponsive
c) after being rebooted into the new 3.6 kernel, the server complained
about the superblock of the XFS partition being corrupted and was not
able to mount the partition
d) by running xfs_repair -L -P <device> we were able to fix the problem
e) trying to remount the fixed partition triggered a quota-check which
always ended in a stack trace; after a reboot, the quota-check was fine
and the partition mounted successfully
Has anyone ever experienced problems like this updating from an older
kernel to the current 3.6?
Any idea what could have caused the bad superblock the 3.6 kernel
complained about?
Is it possible that the 2.6.37 kernel left a superblock behind that
could not be recognized by the 3.6 kernel?
If it's of any interest, I can supply the stack traces.
- volker
* Re: OOM on quotacheck (again?)
2012-10-02 16:29 ` Volker
@ 2012-10-02 20:09 ` Dave Chinner
2012-10-02 20:49 ` Volker
0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2012-10-02 20:09 UTC (permalink / raw)
To: Volker; +Cc: xfs
On Tue, Oct 02, 2012 at 06:29:27PM +0200, Volker wrote:
> Hi again,
>
> > Great! That answered all my questions! Thanks a lot!
> >
> > 3.6.0-rc6-x64 ist currently running fine on 6 machines.
>
> just as a follow up i would like to share some info.
>
> The six machines mentioned above are still running fine. So are few more
> we tested with the new kernel. All of the servers tested so far, were
> rebooted immediately after the new 3.6 kernel was installed.
>
> Because of that, we decided to roll out the new kernel to all our
> servers (approximately 330) and have the kernel "sink in" over the next
> few days if the machines get rebooted.
>
> This morning we experienced some problems with the superblock being
> corrupted on 6 machines that had been rebooted during the night. For all
> of them, the following was true:
>
> a) the server was still running the old buggy 2.6.37 and had
> filesystem-troubles on heavy i/o (that was our problem to begin with
> besides the OOM)
>
> b) because of the filesystem-troubles the server had been rebooted by
> our hardware-support-team (sadly not necessarily using sys-requests)
> because the xfs-partition was unresponsive
>
> c) after being rebooted with the new 3.6 kernel, the server complained
> about the super-block of the xfs-partition being corrupted and was not
> able to mount the partition
>
> d) by running xfs_repair -L -P <device> we were able to fix the problem
>
> e) trying a remount of the fixed partition caused a quota-check which
> always ended in a stack-trace, after a reboot, the quota-check was fine
> and the partition successfully mounted
>
> Has anyone ever experienced problems like this updating from an older
> kernel to the current 3.6?
>
> Any Idea what could have caused the bad superblock the 3.6 kernel
> complained about?
>
> Is it possible that the 2.6.37 kernel left a superblock behind that
> could not be recognized by the 3.6 kernel?
>
> If it's of any interest, I can supply the stack-traces.
Yes, it is of interest, can you post everything you found out about
the problem? (dmesg, stack traces, repair output, etc).
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: OOM on quotacheck (again?)
2012-10-02 20:09 ` Dave Chinner
@ 2012-10-02 20:49 ` Volker
2012-10-02 22:15 ` Dave Chinner
0 siblings, 1 reply; 10+ messages in thread
From: Volker @ 2012-10-02 20:49 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Hi,
>> If it's of any interest, I can supply the stack-traces.
>
> Yes, it is of interest, can you post everything you found out about
> the problem? (dmesg, stack traces, repair output, etc).
Everything posted here is from a single server, in chronological order
from top to bottom. Without having checked each and every stack trace,
it looked quite similar on the other servers.
http://pastebin.com/PXquE4sM
Sidenote:
xfs_repair would not finish without supplying -P; otherwise the
repair hung in phase 6 (might be related to this bug:
http://oss.sgi.com/archives/xfs-masters/2011-01/msg00009.html)
Hope it helps! Since we have about 300 servers left to go from 2.6.37 to
3.6, I'd be happy to do some testing, as long as we are not gambling
with our customers' data :-)
- volker
* Re: OOM on quotacheck (again?)
2012-10-02 20:49 ` Volker
@ 2012-10-02 22:15 ` Dave Chinner
2012-10-04 14:19 ` Volker
0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2012-10-02 22:15 UTC (permalink / raw)
To: Volker; +Cc: xfs
On Tue, Oct 02, 2012 at 10:49:27PM +0200, Volker wrote:
> Hi,
>
> >> If it's of any interest, I can supply the stack-traces.
> >
> > Yes, it is of interest, can you post everything you found out about
> > the problem? (dmesg, stack traces, repair output, etc).
>
> Everything posted here is from a single server and its chronologically
> top to bottom. Without having checked each and every stacktrace, it
> looked quite similar on the other servers.
>
> http://pastebin.com/PXquE4sM
So you had a hang on 2.6.37 to do with dquot reclaim, and you rebooted
the server into what I think is a 3.6 kernel.
Log recovery failed with "bad clientid 0x0", so no superblock
problem. It does tend to indicate that 2.6.37 wrote bad data to the
log, though. If you reboot into 2.6.37, does log recovery run
successfully? i.e. does the failure only occur on 2.6.37 -> 3.6
with a dirty log?
You then ran xfs_repair -P -L, which threw lots of metadata
away and moved lots of stuff to lost+found.
You then mounted the filesystem on the same kernel (the trace has
xfs_trans_read_buf_map() in it, hence the 3.6 version), and
it appears to be hung waiting for IO to complete on a dquot buffer.
That tends to indicate that maybe there's a problem with IO
completion somewhere below the XFS layer.
And if there's a problem below XFS w.r.t. IO completion, that also
makes me wonder if the log recovery problem isn't also caused by
something below XFS...
What mount options are you using on the 2.6.37 kernel?
> Sidenote:
> The xfs_repair would not finish without supplying -P, otherwise the
> repair hung in phase 6 (might be related to this bug:
> http://oss.sgi.com/archives/xfs-masters/2011-01/msg00009.html)
If you are upgrading your kernel, you should also upgrade your
xfsprogs installation as well.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: OOM on quotacheck (again?)
2012-10-02 22:15 ` Dave Chinner
@ 2012-10-04 14:19 ` Volker
0 siblings, 0 replies; 10+ messages in thread
From: Volker @ 2012-10-04 14:19 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Hi
> So you had a hang on 2.6.37 to do with dquot reclaim, you rebooted
> the server into what I think is a 3.6 kernel.
Correct.
> Log recovery failed with "bad clientid 0x0", so no superblock
> problem.
I was told by 'mount' that it's a superblock problem :-)
###
server044:~# mount -a
mount: /dev/sdb1: can't read superblock
###
What does the bad client ID in syslog indicate?
> It does tend to indicate that 2.6.37 wrote bad data to the
> log, though. If you reboot into 2.6.37, does log recovery run
> successfully?
Yes. A server which was rebooted on Oct 3rd at 07:18am, running 2.6.37
with a stack trace involving xfs_qm_dqreclaim_one, came back up fine a
couple of minutes later on 2.6.37.
If this had not been working, we would have had way more trouble
with crashed XFS partitions in the past, since the
xfs_qm_dqreclaim_one stack trace has been a very common error for us.
> i.e. does the failure only occur on 2.6.37 -> 3.6
> with a dirty log?
Yes. All 6 servers failed to mount the XFS partition after they had
XFS troubles on 2.6.37 and came back up on the new 3.6 kernel. I did
not try to reboot them into 2.6.37, though.
> You then mounted the filesystem on the same kernel (has
> xfs_trans_read_buf_map() in the trace, hence the 3.6 version)
Correct. A quota-check was performed on all servers, and it ended in
the shown stack trace on all of them (see pastebin). After a reboot
the partition mounted just fine.
> What mount options are you using on the 2.6.37 kernel?
2.6.37 and 3.6 use the same options:
noatime,nosuid,nodev,gquota
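For reference, the corresponding /etc/fstab line would look something like the following (the device and mount point here are made up for illustration; only the option string is from the thread):

```
/dev/sdb1  /content  xfs  noatime,nosuid,nodev,gquota  0  0
```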
> If you are upgrading your kernel, you should also upgrade your
> xfsprogs installation as well.
Will do.
- volker