From: Jens Axboe <jens.axboe@oracle.com>
To: Damien Wyart <damien.wyart@free.fr>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
chris.mason@oracle.com, david@fromorbit.com, hch@infradead.org,
akpm@linux-foundation.org, jack@suse.cz,
yanmin_zhang@linux.intel.com, richard@rsk.demon.co.uk
Subject: Re: [PATCH 0/12] Per-bdi writeback flusher threads v7
Date: Tue, 26 May 2009 23:11:19 +0200
Message-ID: <20090526211119.GO11363@kernel.dk>
In-Reply-To: <20090526204703.GM11363@kernel.dk>
On Tue, May 26 2009, Jens Axboe wrote:
> On Tue, May 26 2009, Damien Wyart wrote:
> > > > I have been playing with v7 since you sent it and after a while
> > > > (short on laptop, longer on desktop, a few hours), writeback doesn't
> > > > seem to work anymore. A manual call to sync hangs (process in D state)
> > > > and the Dirty value in meminfo keeps growing. As previous versions had
> > > > been heavily tested, I guess there is some regression in v7.
> >
> > > Not good, the prime suspect is the sync notification stuff. I'll take
> > > a look and get that fixed. You didn't happen to catch any sysrq-t back
> > > traces or anything like that? Would be interesting to see where
> > > bdi-default and the bdi-* threads are stuck.
> >
> > No, as I was doing many things at the same time and not exclusively
> > debugging, I just rebooted hard and went back to an unpatched kernel when
> > the problems occurred. But I noticed only bdi-default was alive, the
> > other bdi-* threads had disappeared and the sync commands I had tried
> > were all in D state. Also I tried to reinstall a kernel .deb (these
> > systems are Debian) and this got stuck during installation, when probing
> > grub config (do not know if there is some sync syscall in there).
> >
> > Can try to go further tomorrow but will not have a lot of time...
>
> OK, I spotted the problem. If we fall back to the on-stack allocation in
> bdi_writeback_all(), then we do the wait for the work completion with
> the bdi_lock mutex held. This can deadlock with bdi_forker_task(), so if
> we require that to be invoked to make progress (happens if a thread
> needs to be restarted), then we have a deadlock on that mutex.
>
> I'll cook up a fix for this, but probably not before the morning.
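
To make the lock ordering problem above concrete, here is a minimal
userspace sketch using pthreads. It is not the kernel code: registry_lock
merely plays the role of bdi_lock, and struct work, worker(),
wait_for_work() and submit_and_wait() are hypothetical names made up for
illustration. It shows the safe ordering, where the work is queued under
the lock and the wait only happens after the lock is dropped.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not kernel APIs. */
struct work {
	pthread_mutex_t lock;
	pthread_cond_t done_cv;
	bool done;
};

/* Plays the role of bdi_lock in this sketch. */
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * The worker must take registry_lock before it can make progress,
 * similar to a thread that first has to be (re)started by a forker
 * task that needs the same mutex.
 */
static void *worker(void *arg)
{
	struct work *w = arg;

	pthread_mutex_lock(&registry_lock);
	pthread_mutex_unlock(&registry_lock);

	pthread_mutex_lock(&w->lock);
	w->done = true;
	pthread_cond_signal(&w->done_cv);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Block until the worker has marked the work item as done. */
static void wait_for_work(struct work *w)
{
	pthread_mutex_lock(&w->lock);
	while (!w->done)
		pthread_cond_wait(&w->done_cv, &w->lock);
	pthread_mutex_unlock(&w->lock);
}

/* Queue the work under the lock, wait only after dropping it. */
static int submit_and_wait(void)
{
	struct work w = { .done = false };
	pthread_t tid;

	pthread_mutex_init(&w.lock, NULL);
	pthread_cond_init(&w.done_cv, NULL);

	pthread_mutex_lock(&registry_lock);
	if (pthread_create(&tid, NULL, worker, &w) != 0) {
		pthread_mutex_unlock(&registry_lock);
		return -1;
	}
	/*
	 * Calling wait_for_work(&w) here, before the unlock, would hang:
	 * the worker cannot take registry_lock, so it never sets done.
	 */
	pthread_mutex_unlock(&registry_lock);

	wait_for_work(&w);
	pthread_join(tid, NULL);

	pthread_cond_destroy(&w.done_cv);
	pthread_mutex_destroy(&w.lock);
	return 0;
}
```

Calling submit_and_wait() returns 0 once the worker has finished. Moving
the wait above the unlock reproduces the hang in this toy setup.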
Untested fix. I think it should work, but I haven't run it here yet.
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index a185a16..1662ede 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -122,12 +122,11 @@ static void bdi_work_free(struct rcu_head *head)
 static void wb_work_complete(struct bdi_work *work)
 {
-	if (!bdi_work_on_stack(work)) {
-		bdi_work_clear(work);
+	const enum writeback_sync_modes sync_mode = work->sync_mode;
-		if (work->sync_mode == WB_SYNC_NONE)
-			call_rcu(&work->rcu_head, bdi_work_free);
-	} else
+	if (!bdi_work_on_stack(work))
+		bdi_work_clear(work);
+	if (sync_mode == WB_SYNC_NONE || bdi_work_on_stack(work))
 		call_rcu(&work->rcu_head, bdi_work_free);
 }
@@ -272,7 +271,7 @@ void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
 	 */
 	if (work == &work_stack || must_wait) {
 		bdi_wait_on_work_clear(work);
-		if (must_wait)
+		if (must_wait && work != &work_stack)
 			call_rcu(&work->rcu_head, bdi_work_free);
 	}
 }
@@ -511,10 +510,9 @@ int bdi_writeback_task(struct bdi_writeback *wb)
  * we are simply called for WB_SYNC_NONE, then writeback will merely be
  * scheduled to run.
  */
-void bdi_writeback_all(struct super_block *sb, long nr_pages,
-		       enum writeback_sync_modes sync_mode)
+void bdi_writeback_all(struct super_block *sb, struct writeback_control *wbc)
 {
-	const bool must_wait = sync_mode == WB_SYNC_ALL;
+	const bool must_wait = wbc->sync_mode == WB_SYNC_ALL;
 	struct backing_dev_info *bdi, *tmp;
 	struct bdi_work *work;
 	LIST_HEAD(list);
@@ -522,31 +520,21 @@ void bdi_writeback_all(struct super_block *sb, long nr_pages,
 	mutex_lock(&bdi_lock);
 	list_for_each_entry_safe(bdi, tmp, &bdi_list, bdi_list) {
-		struct bdi_work *work, work_stack;
+		struct bdi_work *work;
 		if (!bdi_has_dirty_io(bdi))
 			continue;
-		work = bdi_alloc_work(sb, nr_pages, sync_mode);
+		work = bdi_alloc_work(sb, wbc->nr_to_write, wbc->sync_mode);
 		if (!work) {
-			work = &work_stack;
-			bdi_work_init_on_stack(work, sb, nr_pages, sync_mode);
-		} else if (must_wait)
+			generic_sync_bdi_inodes(sb, wbc);
+			continue;
+		}
+		if (must_wait)
 			list_add_tail(&work->wait_list, &list);
 		bdi_queue_work(bdi, work);
 		__bdi_start_work(bdi, work);
-
-		/*
-		 * Do the wait inline if this came from the stack. This
-		 * only happens if we ran out of memory, so should very
-		 * rarely trigger.
-		 */
-		if (work == &work_stack) {
-			bdi_wait_on_work_clear(work);
-			if (must_wait)
-				call_rcu(&work->rcu_head, bdi_work_free);
-		}
 	}
 	mutex_unlock(&bdi_lock);
@@ -1082,7 +1070,7 @@ void generic_sync_sb_inodes(struct super_block *sb,
 	if (wbc->bdi)
 		bdi_start_writeback(wbc->bdi, sb, wbc->nr_to_write, wbc->sync_mode);
 	else
-		bdi_writeback_all(sb, wbc->nr_to_write, wbc->sync_mode);
+		bdi_writeback_all(sb, wbc);
 	if (wbc->sync_mode == WB_SYNC_ALL) {
 		struct inode *inode, *old_inode = NULL;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index dee38ec..679cfb8 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -107,8 +107,7 @@ void bdi_unregister(struct backing_dev_info *bdi);
 void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
 			 long nr_pages, enum writeback_sync_modes sync_mode);
 int bdi_writeback_task(struct bdi_writeback *wb);
-void bdi_writeback_all(struct super_block *sb, long nr_pages,
-		       enum writeback_sync_modes sync_mode);
+void bdi_writeback_all(struct super_block *sb, struct writeback_control *wbc);
 void bdi_add_default_flusher_task(struct backing_dev_info *bdi);
 void bdi_add_flusher_task(struct backing_dev_info *bdi);
 int bdi_has_dirty_io(struct backing_dev_info *bdi);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7dd7de7..dd403cf 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -669,10 +669,18 @@ void throttle_vm_writeout(gfp_t gfp_mask)
  */
 void wakeup_flusher_threads(long nr_pages)
 {
+	struct writeback_control wbc = {
+		.sync_mode = WB_SYNC_NONE,
+		.older_than_this = NULL,
+		.range_cyclic = 1,
+	};
+
 	if (nr_pages == 0)
 		nr_pages = global_page_state(NR_FILE_DIRTY) +
 				global_page_state(NR_UNSTABLE_NFS);
-	bdi_writeback_all(NULL, nr_pages, WB_SYNC_NONE);
+
+	wbc.nr_to_write = nr_pages;
+	bdi_writeback_all(NULL, &wbc);
 }
 static void laptop_timer_fn(unsigned long unused);
--
Jens Axboe