From: Ben Myers <bpm@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Emmanuel Lacour <elacour@easter-eggs.com>, xfs@oss.sgi.com
Subject: Re: XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
Date: Wed, 11 Dec 2013 14:22:26 -0600
Message-ID: <20131211202226.GH1935@sgi.com>
In-Reply-To: <20131203125057.GU10988@dastard>

On Tue, Dec 03, 2013 at 11:50:57PM +1100, Dave Chinner wrote:
> On Tue, Dec 03, 2013 at 10:53:58AM +0100, Emmanuel Lacour wrote:
> > On Thu, Nov 28, 2013 at 09:05:21PM +1100, Dave Chinner wrote:
> > > On Thu, Nov 28, 2013 at 10:13:22AM +0100, Emmanuel Lacour wrote:
> > > > 
> > > > Dear XFS users,
> > > > 
> > > > 
> > > > I run a Ceph cluster using XFS on Debian wheezy servers and Linux 3.10
> > > > (debian backports). I see the following line in our logs:
> > > > 
> > > > XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
> > > > 
> > > > does this reveal a problem in my setup or may I ignore it? If it's a
> > > > problem, can someone give me any hint on solving this?
> > > 
> > > It might be, but you need to provide more information for us to be
> > > able to make any intelligent comment on the message. Start here:
> > > 
> > > http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> > > 
> > 
> > 
> > The problem continues and crashed my Ceph cluster again, so here is all
> > the information requested in the FAQ:
> > 
> > http://people.easter-eggs.org/~manu/xfs.log
> 
> OK, 32GB RAM, no obvious shortage, no dirty or writeback data.
> 2TB SATA drives, 32AGs, only unusual setting is 64k directory block
> size.
> 
> Yup, there's your problem:
> 
> [4583991.478469] ceph-osd        D ffff88047fc93f40     0 22951 1 0x00000004
> [4583991.478471]  ffff88046d241140 0000000000000082 ffffffff81047e75 ffff88046f949800
> [4583991.478475]  0000000000013f40 ffff88039eb0bfd8 ffff88039eb0bfd8 ffff88046d241140
> [4583991.478479]  0000000000000000 00000001444d68bd ffff88046d241140 0000000000000005
> [4583991.478483] Call Trace:
> [4583991.478487]  [<ffffffff81047e75>] ? internal_add_timer+0xd/0x28
> [4583991.478491]  [<ffffffff8138e34a>] ? schedule_timeout+0xeb/0x123
> [4583991.478494]  [<ffffffff81047e63>] ? ftrace_raw_event_timer_class+0x9d/0x9d
> [4583991.478498]  [<ffffffff8138edb6>] ? io_schedule_timeout+0x60/0x86
> [4583991.478502]  [<ffffffff810d85ad>] ? congestion_wait+0x70/0xdb
> [4583991.478505]  [<ffffffff8105858f>] ? abort_exclusive_wait+0x79/0x79
> [4583991.478518]  [<ffffffffa056e3f9>] ? kmem_alloc+0x65/0x6f [xfs]
> [4583991.478535]  [<ffffffffa0592c4a>] ? xfs_dir2_block_to_sf+0x5b/0x1fb [xfs]
> [4583991.478550]  [<ffffffffa0592be0>] ? xfs_dir2_block_sfsize+0x15b/0x16a [xfs]
> [4583991.478566]  [<ffffffffa058bf9a>] ? xfs_dir2_block_removename+0x1c7/0x208 [xfs]
> [4583991.478581]  [<ffffffffa058ab4a>] ? xfs_dir_removename+0xda/0x114 [xfs]
> [4583991.478594]  [<ffffffffa056a55c>] ? xfs_rename+0x428/0x554 [xfs]
> [4583991.478606]  [<ffffffffa0567321>] ? xfs_vn_rename+0x5e/0x65 [xfs]
> [4583991.478610]  [<ffffffff8111677b>] ? vfs_rename+0x224/0x35f
> [4583991.478614]  [<ffffffff81113d0b>] ? lookup_dcache+0x22/0x95
> [4583991.478618]  [<ffffffff81116a7e>] ? SYSC_renameat+0x1c8/0x257
> [4583991.478622]  [<ffffffff810fb0fd>] ? __cache_free.isra.45+0x178/0x187
> [4583991.478625]  [<ffffffff81117eb1>] ? SyS_mkdirat+0x2e/0xce
> [4583991.478629]  [<ffffffff8100d56a>] ? do_notify_resume+0x53/0x68
> [4583991.478633]  [<ffffffff81395429>] ? system_call_fastpath+0x16/0x1b
> 
> It'll be stuck on this:
> 
> 	hdr = kmem_alloc(mp->m_dirblksize, KM_SLEEP);
> 
> which is trying to allocate a contiguous 64k buffer to copy the
> directory contents into before freeing the block and then formatting
> them into the inode. The failure will be caused by memory
> fragmentation, and the only way around it is to avoid the contiguous
> allocation of that size.
> 
> Which, I think, is pretty easy to do. Yup, barely smoke tested patch
> below that demonstrates the fix. Beware - patch may eat babies and
> ask for more. Use it at your own risk!
> 
> I'll post it for review once it's had some testing and I know it
> doesn't corrupt directories all over the place.
> 
> > This may be related to a friend problem here:
> > 
> > http://tracker.ceph.com/issues/6386
> 
> Doesn't look related, unless the OOM killer is being triggered
> somehow...
> 
> Hmmmm - there's also a good chance that the transaction commit code
> has this same contiguous allocation problem, given that it
> has to allocate enough space to log an entire directory buffer. Good
> guess - there's another thread stuck on exactly that:
> 
> [4583991.476833] ceph-osd        D ffff88047fc33f40     0 11072 1 0x00000004
> [4583991.476836]  ffff88038b32a040 0000000000000082 ffffffff81047e75 ffff88046f946040
> [4583991.476840]  0000000000013f40 ffff88048ea11fd8 ffff88048ea11fd8 ffff88038b32a040
> [4583991.476844]  0000000000000000 00000001444d68be ffff88038b32a040 0000000000000005
> [4583991.476848] Call Trace:
> [4583991.476852]  [<ffffffff81047e75>] ? internal_add_timer+0xd/0x28
> [4583991.476855]  [<ffffffff8138e34a>] ? schedule_timeout+0xeb/0x123
> [4583991.476859]  [<ffffffff81047e63>] ? ftrace_raw_event_timer_class+0x9d/0x9d
> [4583991.476862]  [<ffffffff8138edb6>] ? io_schedule_timeout+0x60/0x86
> [4583991.476867]  [<ffffffff810d85ad>] ? congestion_wait+0x70/0xdb
> [4583991.476870]  [<ffffffff8105858f>] ? abort_exclusive_wait+0x79/0x79
> [4583991.476883]  [<ffffffffa056e3f9>] ? kmem_alloc+0x65/0x6f [xfs]
> [4583991.476899]  [<ffffffffa05a77a6>] ? xfs_log_commit_cil+0xe8/0x3d1 [xfs]
> [4583991.476904]  [<ffffffff810748ab>] ? current_kernel_time+0x9/0x30
> [4583991.476909]  [<ffffffff81041942>] ? current_fs_time+0x27/0x2d
> [4583991.476925]  [<ffffffffa05a3b7b>] ? xfs_trans_commit+0x62/0x1cf [xfs]
> [4583991.476939]  [<ffffffffa056d3ad>] ? xfs_create+0x41e/0x54f [xfs]
> [4583991.476943]  [<ffffffff81114574>] ? lookup_fast+0x3d/0x215
> [4583991.476954]  [<ffffffffa056cf29>] ? xfs_lookup+0x88/0xee [xfs]
> [4583991.476966]  [<ffffffffa0567428>] ? xfs_vn_mknod+0xb7/0x162 [xfs]
> [4583991.476970]  [<ffffffff81115fda>] ? vfs_create+0x62/0x8b
> [4583991.476974]  [<ffffffff81113d0b>] ? lookup_dcache+0x22/0x95
> [4583991.476978]  [<ffffffff81117179>] ? do_last+0x595/0xa16
> [4583991.476982]  [<ffffffff811176be>] ? path_openat+0xc4/0x335
> [4583991.476985]  [<ffffffff81117bda>] ? do_filp_open+0x2a/0x6e
> [4583991.476989]  [<ffffffff81120b62>] ? __alloc_fd+0xd0/0xe1
> [4583991.476993]  [<ffffffff8110b684>] ? do_sys_open+0x5c/0xe0
> [4583991.476996]  [<ffffffff81395429>] ? system_call_fastpath+0x16/0x1b
> 
> That one isn't so easy to fix, unfortunately.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> xfs: xfs_dir2_block_to_sf temp buffer allocation fails 
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> If we are using a large directory block size, and memory becomes
> fragmented, we can get memory allocation failures trying to
> kmem_alloc(64k) for a temporary buffer. However, there is no need
> for a directory buffer sized allocation, as the end result ends up
> in the inode literal area. This is, at most, slightly less than 2k
> of space, and hence we don't need an allocation larger than that
> for a temporary buffer.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

D'oh, I missed this one too.  If you stick 'patch' in the subject they'll have
additional visibility.

Looks good to me.

Reviewed-by: Ben Myers <bpm@sgi.com>
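
For readers following the thread, the reasoning quoted above (plenty of free
memory, yet a contiguous 64k allocation cannot be satisfied, while a buffer
of at most about 2k can) may be easier to see with a small userspace sketch.
This is not the patch and not kernel code. The 4 KiB page size, the toy
free-page map, and the helper names pages_needed(), longest_free_run() and
can_alloc_contig() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_ASSUMED 4096u   /* assumption: 4 KiB pages */

/* Contiguous pages needed to back one allocation of `size` bytes. */
static size_t pages_needed(size_t size)
{
    return (size + PAGE_SIZE_ASSUMED - 1) / PAGE_SIZE_ASSUMED;
}

/*
 * Longest run of consecutive free pages in a toy free-page map
 * (1 = free, 0 = in use). A stand-in for "how fragmented is memory".
 */
static size_t longest_free_run(const unsigned char *free_map, size_t npages)
{
    size_t best = 0, run = 0;

    assert(free_map != NULL);
    for (size_t i = 0; i < npages; i++) {
        run = free_map[i] ? run + 1 : 0;
        if (run > best)
            best = run;
    }
    return best;
}

/*
 * Hypothetical helper: could a contiguous allocation of `size` bytes be
 * satisfied from this map? Returns 1 if yes, 0 if no.
 */
static int can_alloc_contig(const unsigned char *free_map, size_t npages,
                            size_t size)
{
    return longest_free_run(free_map, npages) >= pages_needed(size);
}
```

With a map in which every other page is free, half of the memory is
available, yet can_alloc_contig() rejects a 64k request (16 contiguous pages
under the 4 KiB assumption) and accepts a 2k request (one page). That is the
gap the patch description relies on when it shrinks the temporary buffer.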
