public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Cc: xfs@oss.sgi.com
Subject: Re: XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
Date: Thu, 22 Aug 2013 12:25:44 +1000	[thread overview]
Message-ID: <20130822022544.GS6023@dastard> (raw)
In-Reply-To: <20130821152458.GD986@poseidon.cudanet.local>

On Wed, Aug 21, 2013 at 11:24:58AM -0400, Josef 'Jeff' Sipek wrote:
> We've started experimenting with larger directory block sizes to avoid
> directory fragmentation.  Everything seems to work fine, except that the log
> is spammed with these lovely debug messages:
> 
> 	XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
> 
> From looking at the code, it looks like that each of those messages (there
> are thousands) equates to 100 trips through the loop.  My guess is that the
> larger blocks require multi-page allocations which are harder to satisfy.
> This is with 3.10 kernel.

No, larger directory blocks simply require more single pages; the
buffer cache does not require multi-page allocations at all. As for
mode 0x250: that is __GFP_NOWARN | __GFP_IO | __GFP_WAIT, which is
also known as a GFP_NOFS allocation context.

So it's entirely possible that your memory is full of cached
filesystem data and metadata, and the allocation that needs more
memory can't reclaim any of it.

> The hardware is something like (I can find out the exact config if you want):
> 
> 	32 cores
> 	128 GB RAM
> 	LSI 9271-8i RAID (one big RAID-60 with 36 disks, partitioned)
> 
> As I hinted at earlier, we end up with pretty big directories.  We can
> semi-reliably trigger this when we run rsync on the data between two
> (identical) hosts over 10GbitE.
> 
> # xfs_info /dev/sda9 
> meta-data=/dev/sda9              isize=256    agcount=6, agsize=268435455 blks 
>          =                       sectsz=512   attr=2 
> data     =                       bsize=4096   blocks=1454213211, imaxpct=5 
>          =                       sunit=0      swidth=0 blks 
> naming   =version 2              bsize=65536  ascii-ci=0 
> log      =internal               bsize=4096   blocks=521728, version=2 
>          =                       sectsz=512   sunit=0 blks, lazy-count=1 
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> /proc/slabinfo: https://www.copy.com/s/1x1yZFjYO2EI/slab.txt

Hmmm. You're using filestreams. That's unusual.

The only major slab cache is the buffer_head slab, with ~12 million
active bufferheads. That means you've got at least 47-48GB of
data in the page cache.....

And there are only ~35000 xfs_buf items in the slab, so the metadata
cache isn't very big and reclaim from it isn't a problem; nor are
the inode caches, as there are only 130,000 cached inodes.

> sysrq m output: https://www.copy.com/s/mYfMYfJJl2EB/sysrq-m.txt

27764401 total pagecache pages

which indicates that you've got close to 110GB of pages in the page
cache. Hmmm, and 24-25GB of dirty pages in memory.

You know, I'd be suspecting a memory reclaim problem here to do with
having large amounts of dirty memory in the page cache. I don't
think the underlying cause is going to be the filesystem code, as
the warning should never be emitted if memory reclaim is making
progress. Perhaps you could try lowering all the dirty memory
thresholds to see if that allows memory reclaim to make more
progress because there are fewer dirty pages in memory...
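One way to experiment with that suggestion (standard vm sysctls; the
values below are only an illustrative starting point, not a tested
recommendation for this workload):

```shell
# Shrink the dirty page thresholds so writeback starts earlier and
# reclaim sees fewer dirty pages (typical defaults are 10 and 20).
sysctl vm.dirty_background_ratio=2
sysctl vm.dirty_ratio=5

# Verify the new settings:
sysctl vm.dirty_background_ratio vm.dirty_ratio
```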

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

