From: Brian Foster <bfoster@redhat.com>
To: Michael Meier <michael.meier@fau.de>
Cc: xfs@oss.sgi.com
Subject: Re: XFS hangs with XFS: possible memory allocation deadlock in kmem_alloc
Date: Sat, 7 Mar 2015 09:07:21 -0500
Message-ID: <20150307140721.GA9098@bfoster.bfoster>
In-Reply-To: <54FAAE16.6090505@fau.de>
On Sat, Mar 07, 2015 at 08:51:50AM +0100, Michael Meier wrote:
> We've recently upgraded the OS on one of our servers, and since then
> have been experiencing frequent stalls of the XFS filesystem on it.
> Other filesystems on the machine seem to still respond fine while XFS
> hangs. The stalls sometimes last for around 30 minutes, during which all
> attempts to access that filesystem hang completely - after that, the
> filesystem suddenly responds instantly again, as if there had never been
> any problem. The dmesg is full of these messages while it stalls:
> XFS: possible memory allocation deadlock in kmem_alloc (mode:0x8250)
> These also occur from time to time without the filesystem stalling (or
> at least without a noticeable stall) - the messages appear about once
> every two hours, the stalls about once a day.
>
> Google did point me to some reports of these messages occurring at the
> end of 2013, but the kernels in question should all have had the fixes
> proposed back then - although one message back then suggested there were
> more places where this problem could occur that were not fixed yet.
>
> Kernels used were:
> - Ubuntu 3.13.0-44 - shows stalls, according to
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1382333 has the fix
> - Ubuntu 3.16.0-31 - shows stalls
> - Ubuntu 3.2.0-various - no stalls in more than 1 year
> We can actually still boot the machine with the 3.2.0 kernel, and it
> will run absolutely fine, but as that kernel will not be supported
> forever, I do not consider that a permanent solution.
>
> The machine should not be low on memory, the disk array should be far
> from its limits, and the I/O load is mostly reads with very few writes,
> as this is a public FTP server.
>
> I have tried to collect some information, available at
> https://grid.rrze.uni-erlangen.de/~unrz191/syslog-with-xfs-hangs.log
>
Thanks for the data. Some notes from the backtraces in the first
instance:
- xfsaild is down in xlog_cil_force_lsn()->flush_work(). So it's trying
to push the log, but the workqueue worker is already running.
- The workqueue worker is here:
[298163.482697] Workqueue: xfs-cil/dm-0 xlog_cil_push_work [xfs]
... and it appears to be blocked on the ctx lock. This means either a
transaction is completing or somebody else is pushing the CIL.
- Writeback and one or two other transactions are backed up waiting on
the ctx lock.
- rsync is running a transaction completion (i.e., holding the ctx lock)
and is blocked on a memory allocation:
rsync D ffff88103f893440 0 44446 43197 0x00000000
ffff8809e4f7b9f0 0000000000000086 ffff880801a15bb0 ffff8809e4f7bfd8
0000000000013440 0000000000013440 ffff8810146428c0 ffff881013dd8000
ffff8809e4f7ba20 00000001046f17ea ffff881013dd8000 000000000000d158
Call Trace:
[<ffffffff817675c9>] schedule+0x29/0x70
[<ffffffff817668e5>] schedule_timeout+0x165/0x2a0
[<ffffffff8107a420>] ? ftrace_raw_event_tick_stop+0xc0/0xc0
[<ffffffff81767c9b>] io_schedule_timeout+0x9b/0xf0
[<ffffffff81180403>] congestion_wait+0x73/0x100
[<ffffffff810b4d10>] ? prepare_to_wait_event+0x100/0x100
[<ffffffffc01deaac>] kmem_alloc+0x6c/0xf0 [xfs]
[<ffffffffc022399f>] xfs_log_commit_cil+0x34f/0x470 [xfs]
[<ffffffffc01de37c>] xfs_trans_commit+0x11c/0x230 [xfs]
[<ffffffffc0212c81>] xfs_rename+0x601/0x670 [xfs]
[<ffffffffc01d41c2>] xfs_vn_rename+0x82/0x90 [xfs]
[<ffffffff811e34de>] vfs_rename+0x56e/0x740
[<ffffffff811e4383>] SYSC_renameat2+0x483/0x530
[<ffffffff811d6451>] ? __sb_end_write+0x31/0x60
[<ffffffff811f208f>] ? mnt_drop_write+0x1f/0x30
[<ffffffff811f34b4>] ? mntput+0x24/0x40
[<ffffffff811ea6ac>] ? dput+0x4c/0x180
[<ffffffff811f34b4>] ? mntput+0x24/0x40
[<ffffffff811dd39e>] ? path_put+0x1e/0x30
[<ffffffff811e561e>] SyS_rename+0x1e/0x20
[<ffffffff8176b66d>] system_call_fastpath+0x1a/0x1f
... so that appears to hold everything else up.
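The dependency chain described in the notes above can be written down as a small wait-for graph and walked to its end. The Python sketch below is only an illustration: the node names are descriptive labels chosen for this sketch, not kernel symbols, and the edges are transcribed by hand from the notes rather than extracted by any tool. Following the edges from any stuck task ends at rsync's memory allocation instead of looping back on itself, which is consistent with the reported stalls eventually clearing on their own once that allocation succeeds.

```python
# Wait-for edges transcribed from the notes above: waiter -> what it waits on.
# The labels are descriptive names for this sketch, not kernel symbols.
WAITS_ON = {
    "xfsaild": "cil push worker",    # flush_work() in xlog_cil_force_lsn()
    "cil push worker": "ctx lock",   # xlog_cil_push_work blocked on ctx lock
    "writeback": "ctx lock",         # backed up behind the same lock
    "ctx lock": "rsync",             # held during rsync's transaction commit
    "rsync": "memory allocation",    # retrying in kmem_alloc()
}


def root_blocker(start: str, waits_on: dict[str, str]) -> str:
    """Follow wait-for edges from `start` to a node that waits on nothing.

    Raises ValueError if the walk revisits a node, which would indicate a
    true cycle (a classic deadlock) rather than a chain with an end.
    """
    seen = set()
    node = start
    while node in waits_on:
        if node in seen:
            raise ValueError(f"wait-for cycle through {node!r}")
        seen.add(node)
        node = waits_on[node]
    return node


for task in ("xfsaild", "writeback", "rsync"):
    print(task, "->", root_blocker(task, WAITS_ON))
```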
This looks potentially related to the ongoing transaction context memory
allocation discussion, as this code implements a tight retry loop with
time-based task waits and "no fail" allocations. This is also the source
of the "possible memory allocation deadlock" warning.
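A minimal Python model of the retry pattern described above, assuming a userspace stand-in for the kernel allocator: the caller cannot be told "no", so the loop keeps retrying, sleeps briefly between attempts (the congestion_wait() frame in the rsync trace), and only emits a warning periodically. The function and parameter names (kmem_alloc_nofail, flaky_allocator, warn_every, backoff_s) and their defaults are hypothetical and not taken from the XFS source.

```python
import time


def kmem_alloc_nofail(try_alloc, warn, *, warn_every=100, backoff_s=0.02):
    """Hypothetical model of a "no fail" allocation loop.

    try_alloc() returns an object, or None when the allocation fails.
    warn(retries) is called every `warn_every` failed attempts; the
    defaults here are assumptions, not the values XFS uses.
    """
    retries = 0
    while True:
        obj = try_alloc()
        if obj is not None:
            return obj, retries
        retries += 1
        if retries % warn_every == 0:
            # Stands in for "possible memory allocation deadlock in kmem_alloc".
            warn(retries)
        # Time-based wait before retrying; the caller never sees a failure.
        time.sleep(backoff_s)


def flaky_allocator(failures):
    """Hypothetical allocator that fails `failures` times, then succeeds."""
    left = [failures]

    def try_alloc():
        if left[0] > 0:
            left[0] -= 1
            return None
        return bytearray(64)

    return try_alloc


warnings = []
buf, retries = kmem_alloc_nofail(flaky_allocator(250), warnings.append, backoff_s=0)
print(retries, warnings)  # 250 [100, 200]
```

The point the sketch illustrates: while the loop spins, whatever the caller already holds stays held. In the trace above that caller is rsync inside xfs_log_commit_cil(), so everything queued behind the ctx lock waits as long as the allocation keeps failing.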
Dave might be able to comment a bit further on the allocation side. I'm
not totally clear on the mm interaction here, or whether a workaround
exists and what it might be. It might be a good idea to grab the
/proc/meminfo data while the stall is actually in effect.
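Capturing /proc/meminfo during a stall can be as simple as appending timestamped copies of the file to a log at a fixed interval. The Python sketch below assumes that approach; the helper names, output path, interval and snapshot count are hypothetical choices for whoever runs it, and the sample input in the usage lines is made-up text in the /proc/meminfo format, not data from this system.

```python
import time


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}.

    Most values carry a 'kB' suffix and a few (e.g. HugePages_Total) are
    bare counts; both are returned as plain integers.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def log_meminfo(out_path: str, count: int, interval_s: float,
                src: str = "/proc/meminfo") -> None:
    """Append `count` timestamped copies of `src` to `out_path`."""
    with open(out_path, "a") as out:
        for i in range(count):
            with open(src) as f:
                snapshot = f.read()
            out.write(f"=== {time.strftime('%Y-%m-%d %H:%M:%S')}\n{snapshot}\n")
            out.flush()  # keep what was captured even if the script is killed
            if i + 1 < count:
                time.sleep(interval_s)


# Made-up sample in the /proc/meminfo format (not data from this report).
sample = ("MemTotal:       67108864 kB\nMemFree:          131072 kB\n"
          "Dirty:             20480 kB\nHugePages_Total:       0\n")
info = parse_meminfo(sample)
print(info["MemFree"], info["Dirty"])  # 131072 20480
```

On the affected machine, a call such as log_meminfo("/var/tmp/meminfo-stall.log", count=60, interval_s=10), started when the warnings appear, would cover about ten minutes of a stall; those arguments are only illustrative.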
Considering this is a large-memory box (64GB), I wonder if some VM tuning
might help mitigate this behavior? For example, increase
/proc/sys/vm/min_free_kbytes in the hope of keeping more memory free for
these allocations under pressure, or tune down the
dirty_ratio/dirty_background_ratio thresholds to get data onto disk
more aggressively?
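To get a feel for what the dirty thresholds mean on a 64GB machine, the ratios can be turned into rough sizes. The Python sketch below reads current sysctl values and does that arithmetic; the helper names are hypothetical, and the example ratios of 10 (background) and 20 are assumed for illustration and should be checked against the running system. Because the kernel applies these ratios to free plus reclaimable memory rather than to total RAM, scaling by total RAM only gives an upper bound.

```python
def read_vm_sysctl(name: str, root: str = "/proc/sys/vm") -> int:
    """Read an integer VM sysctl such as 'dirty_ratio' or 'min_free_kbytes'."""
    with open(f"{root}/{name}") as f:
        return int(f.read().split()[0])


def dirty_thresholds_gib(mem_total_gib: float, dirty_ratio: int,
                         dirty_background_ratio: int) -> tuple[float, float]:
    """Upper-bound estimate of (background, throttling) thresholds in GiB.

    The kernel applies these ratios to free plus reclaimable memory, not
    to total RAM, so scaling by total RAM overestimates the real limits.
    """
    return (mem_total_gib * dirty_background_ratio / 100,
            mem_total_gib * dirty_ratio / 100)


# Assumed example ratios; on the affected box use read_vm_sysctl("dirty_ratio")
# and read_vm_sysctl("dirty_background_ratio") instead.
bg, hard = dirty_thresholds_gib(64, dirty_ratio=20, dirty_background_ratio=10)
print(f"background writeback from ~{bg:.1f} GiB, writers throttled at ~{hard:.1f} GiB")
```

Lowering either ratio shrinks the corresponding number, so background writeback starts earlier and less dirty data can pile up; whether that actually relieves the allocation stalls here is the open question above.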
Brian
> Regards,
> --
> Michael Meier, Zentrale Systeme
> Friedrich-Alexander-Universitaet Erlangen-Nuernberg
> Regionales Rechenzentrum Erlangen
> Martensstrasse 1, 91058 Erlangen, Germany
> Tel.: +49 9131 85-28973, Fax: +49 9131 302941
> michael.meier@fau.de
> www.rrze.fau.de
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs