From: Fengguang Wu <fengguang.wu@intel.com>
To: Dave Jones <davej@redhat.com>,
linux-mm@kvack.org, Linux Kernel <linux-kernel@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org
Subject: Re: livelock in __writeback_inodes_wb ?
Date: Tue, 11 Dec 2012 16:23:27 +0800
Message-ID: <20121211082327.GA15706@localhost>
In-Reply-To: <20121128145515.GA26564@redhat.com>
On Wed, Nov 28, 2012 at 09:55:15AM -0500, Dave Jones wrote:
> We had a user report the soft lockup detector kicked after 22
> seconds of no progress, with this trace..
Where is the original report? The reporter may be able to provide some
clues about the workload that triggered the bug.
> :BUG: soft lockup - CPU#1 stuck for 22s! [flush-8:16:3137]
> :Pid: 3137, comm: flush-8:16 Not tainted 3.6.7-4.fc17.x86_64 #1
> :RIP: 0010:[<ffffffff812eeb8c>] [<ffffffff812eeb8c>] __list_del_entry+0x2c/0xd0
> :Call Trace:
> : [<ffffffff811b783e>] redirty_tail+0x5e/0x80
> : [<ffffffff811b8212>] __writeback_inodes_wb+0x72/0xd0
> : [<ffffffff811b980b>] wb_writeback+0x23b/0x2d0
> : [<ffffffff811b9b5c>] wb_do_writeback+0xac/0x1f0
> : [<ffffffff8106c0e0>] ? __internal_add_timer+0x130/0x130
> : [<ffffffff811b9d2b>] bdi_writeback_thread+0x8b/0x230
> : [<ffffffff811b9ca0>] ? wb_do_writeback+0x1f0/0x1f0
> : [<ffffffff8107fde3>] kthread+0x93/0xa0
> : [<ffffffff81627e04>] kernel_thread_helper+0x4/0x10
> : [<ffffffff8107fd50>] ? kthread_freezable_should_stop+0x70/0x70
> : [<ffffffff81627e00>] ? gs_change+0x13/0x13
>
> Looking over the code, is it possible that something could be
> dirtying pages faster than writeback can get them written out,
> keeping us in this loop indefinitely ?
The bug reporter should know best whether there is heavy IO.
However, I suspect it's not directly caused by heavy IO: we
release &wb->list_lock before each __writeback_single_inode() call,
which starts writeback IO for each inode.
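To illustrate the locking pattern described above, here is a minimal
user-space sketch. It is not the kernel code: struct toy_wb, struct
toy_inode, the toy lock and all toy_* functions are hypothetical
stand-ins, and only the names list_lock and b_io are borrowed from the
discussion. It shows nothing more than the idea that the list is touched
under the lock while the slow per-inode write runs with the lock dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock: it only records whether it is held, so misuse trips an assert. */
struct toy_lock {
	int held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Hypothetical stand-ins, not the kernel's struct inode / bdi_writeback. */
struct toy_inode {
	int dirty_pages;
	struct toy_inode *next;
};

struct toy_wb {
	struct toy_lock list_lock;	/* plays the role of wb->list_lock */
	struct toy_inode *b_io;		/* inodes queued for writeback */
	int lock_drops;			/* times the lock was dropped in the loop */
};

/* Stand-in for the slow per-inode write; must run without list_lock. */
static int toy_write_inode(struct toy_wb *wb, struct toy_inode *inode)
{
	int written = inode->dirty_pages;

	assert(!wb->list_lock.held);
	inode->dirty_pages = 0;
	return written;
}

/* Dequeue under the lock, drop it for the write, retake it afterwards. */
static int toy_writeback_inodes(struct toy_wb *wb)
{
	int total = 0;

	toy_lock_acquire(&wb->list_lock);
	while (wb->b_io != NULL) {
		struct toy_inode *inode = wb->b_io;

		wb->b_io = inode->next;		/* list update is protected */
		inode->next = NULL;
		wb->lock_drops++;

		toy_lock_release(&wb->list_lock);
		total += toy_write_inode(wb, inode);
		toy_lock_acquire(&wb->list_lock);
	}
	toy_lock_release(&wb->list_lock);
	return total;
}
```

In this sketch the lock is dropped once per written inode, so a long
lock hold would have to come from iterations that never reach the write
step.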
> Should there be something in this loop periodically poking
> the watchdog perhaps ?
It seems we failed to release &wb->list_lock in wb_writeback() for a
long time (dozens of seconds). That is, inode_sleep_on_writeback()
is somehow not called. However, it's not obvious to me how this
can happen.
Thanks,
Fengguang