linux-fsdevel.vger.kernel.org archive mirror
From: Jens Axboe <jens.axboe@oracle.com>
To: Damien Wyart <damien.wyart@free.fr>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	chris.mason@oracle.com, david@fromorbit.com, hch@infradead.org,
	akpm@linux-foundation.org, jack@suse.cz,
	yanmin_zhang@linux.intel.com, richard@rsk.demon.co.uk
Subject: Re: [PATCH 0/12] Per-bdi writeback flusher threads v7
Date: Tue, 26 May 2009 23:11:19 +0200
Message-ID: <20090526211119.GO11363@kernel.dk>
In-Reply-To: <20090526204703.GM11363@kernel.dk>

On Tue, May 26 2009, Jens Axboe wrote:
> On Tue, May 26 2009, Damien Wyart wrote:
> > > > I have been playing with v7 since you sent it, and after a while
> > > > (short on laptop, longer on desktop, a few hours), writeback doesn't
> > > > seem to work anymore. A manual call to sync hangs (process in D state)
> > > > and the Dirty value in meminfo keeps growing. As previous versions had
> > > > been heavily tested, I guess there is some regression in v7.
> > 
> > > Not good, the prime suspect is the sync notification stuff. I'll take
> > > a look and get that fixed. You didn't happen to catch any sysrq-t back
> > > traces or anything like that? Would be interesting to see where
> > > bdi-default and the bdi-* threads are stuck.
> > 
> > No, as I was doing many things at the same time and not exclusively
> > debugging, I just rebooted hard and went back to an unpatched kernel when
> > the problems occurred. But I noticed only bdi-default was alive, the
> > other bdi-* threads had disappeared and the sync commands I had tried
> > were all in D state. Also I tried to reinstall a kernel .deb (these
> > systems are Debian) and this got stuck during installation, when probing
> > grub config (do not know if there is some sync syscall in there).
> > 
> > Can try to go further tomorrow but will not have a lot of time...
> 
> OK, I spotted the problem. If we fall back to the on-stack allocation in
> bdi_writeback_all(), then we wait for the work completion with
> the bdi_lock mutex held. This can deadlock with bdi_forker_task(): if
> that task must run for the work to make progress (which happens if a
> thread needs to be restarted), we end up deadlocked on that mutex.
> 
> I'll cook up a fix for this, but probably not before the morning.

Untested fix. I think it should work, but I haven't run it here yet.

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index a185a16..1662ede 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -122,12 +122,11 @@ static void bdi_work_free(struct rcu_head *head)
 
 static void wb_work_complete(struct bdi_work *work)
 {
-	if (!bdi_work_on_stack(work)) {
-		bdi_work_clear(work);
+	const enum writeback_sync_modes sync_mode = work->sync_mode;
 
-		if (work->sync_mode == WB_SYNC_NONE)
-			call_rcu(&work->rcu_head, bdi_work_free);
-	} else
+	if (!bdi_work_on_stack(work))
+		bdi_work_clear(work);
+	if (sync_mode == WB_SYNC_NONE || bdi_work_on_stack(work))
 		call_rcu(&work->rcu_head, bdi_work_free);
 }
 
@@ -272,7 +271,7 @@ void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
 	 */
 	if (work == &work_stack || must_wait) {
 		bdi_wait_on_work_clear(work);
-		if (must_wait)
+		if (must_wait && work != &work_stack)
 			call_rcu(&work->rcu_head, bdi_work_free);
 	}
 }
@@ -511,10 +510,9 @@ int bdi_writeback_task(struct bdi_writeback *wb)
  * we are simply called for WB_SYNC_NONE, then writeback will merely be
  * scheduled to run.
  */
-void bdi_writeback_all(struct super_block *sb, long nr_pages,
-		       enum writeback_sync_modes sync_mode)
+void bdi_writeback_all(struct super_block *sb, struct writeback_control *wbc)
 {
-	const bool must_wait = sync_mode == WB_SYNC_ALL;
+	const bool must_wait = wbc->sync_mode == WB_SYNC_ALL;
 	struct backing_dev_info *bdi, *tmp;
 	struct bdi_work *work;
 	LIST_HEAD(list);
@@ -522,31 +520,21 @@ void bdi_writeback_all(struct super_block *sb, long nr_pages,
 	mutex_lock(&bdi_lock);
 
 	list_for_each_entry_safe(bdi, tmp, &bdi_list, bdi_list) {
-		struct bdi_work *work, work_stack;
+		struct bdi_work *work;
 
 		if (!bdi_has_dirty_io(bdi))
 			continue;
 
-		work = bdi_alloc_work(sb, nr_pages, sync_mode);
+		work = bdi_alloc_work(sb, wbc->nr_to_write, wbc->sync_mode);
 		if (!work) {
-			work = &work_stack;
-			bdi_work_init_on_stack(work, sb, nr_pages, sync_mode);
-		} else if (must_wait)
+			generic_sync_bdi_inodes(sb, wbc);
+			continue;
+		}
+		if (must_wait)
 			list_add_tail(&work->wait_list, &list);
 
 		bdi_queue_work(bdi, work);
 		__bdi_start_work(bdi, work);
-
-		/*
-		 * Do the wait inline if this came from the stack. This
-		 * only happens if we ran out of memory, so should very
-		 * rarely trigger.
-		 */
-		if (work == &work_stack) {
-			bdi_wait_on_work_clear(work);
-			if (must_wait)
-				call_rcu(&work->rcu_head, bdi_work_free);
-		}
 	}
 
 	mutex_unlock(&bdi_lock);
@@ -1082,7 +1070,7 @@ void generic_sync_sb_inodes(struct super_block *sb,
 	if (wbc->bdi)
 		bdi_start_writeback(wbc->bdi, sb, wbc->nr_to_write, wbc->sync_mode);
 	else
-		bdi_writeback_all(sb, wbc->nr_to_write, wbc->sync_mode);
+		bdi_writeback_all(sb, wbc);
 
 	if (wbc->sync_mode == WB_SYNC_ALL) {
 		struct inode *inode, *old_inode = NULL;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index dee38ec..679cfb8 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -107,8 +107,7 @@ void bdi_unregister(struct backing_dev_info *bdi);
 void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
 			 long nr_pages, enum writeback_sync_modes sync_mode);
 int bdi_writeback_task(struct bdi_writeback *wb);
-void bdi_writeback_all(struct super_block *sb, long nr_pages,
-			enum writeback_sync_modes sync_mode);
+void bdi_writeback_all(struct super_block *sb, struct writeback_control *wbc);
 void bdi_add_default_flusher_task(struct backing_dev_info *bdi);
 void bdi_add_flusher_task(struct backing_dev_info *bdi);
 int bdi_has_dirty_io(struct backing_dev_info *bdi);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7dd7de7..dd403cf 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -669,10 +669,18 @@ void throttle_vm_writeout(gfp_t gfp_mask)
  */
 void wakeup_flusher_threads(long nr_pages)
 {
+	struct writeback_control wbc = {
+		.sync_mode	= WB_SYNC_NONE,
+		.older_than_this = NULL,
+		.range_cyclic	= 1,
+	};
+
 	if (nr_pages == 0)
 		nr_pages = global_page_state(NR_FILE_DIRTY) +
 				global_page_state(NR_UNSTABLE_NFS);
-	bdi_writeback_all(NULL, nr_pages, WB_SYNC_NONE);
+
+	wbc.nr_to_write = nr_pages;
+	bdi_writeback_all(NULL, &wbc);
 }
 
 static void laptop_timer_fn(unsigned long unused);

-- 
Jens Axboe


Thread overview: 33+ messages
2009-05-26  9:33 [PATCH 0/12] Per-bdi writeback flusher threads v7 Jens Axboe
2009-05-26  9:33 ` [PATCH 01/12] ntfs: remove old debug check for dirty data in ntfs_put_super() Jens Axboe
2009-05-26  9:33 ` [PATCH 02/12] btrfs: properly register fs backing device Jens Axboe
2009-05-26  9:33 ` [PATCH 03/12] writeback: move dirty inodes from super_block to backing_dev_info Jens Axboe
2009-05-26  9:33 ` [PATCH 04/12] writeback: switch to per-bdi threads for flushing data Jens Axboe
2009-05-26  9:33 ` [PATCH 05/12] writeback: get rid of pdflush completely Jens Axboe
2009-05-26  9:33 ` [PATCH 06/12] writeback: separate the flushing state/task from the bdi Jens Axboe
2009-05-26  9:33 ` [PATCH 07/12] writeback: support > 1 flusher thread per bdi Jens Axboe
2009-06-04 17:44   ` Paul E. McKenney
2009-06-04 19:48     ` Jens Axboe
2009-05-26  9:33 ` [PATCH 08/12] writeback: include default_backing_dev_info in writeback Jens Axboe
2009-05-26  9:33 ` [PATCH 09/12] writeback: allow sleepy exit of default writeback task Jens Axboe
2009-05-26  9:33 ` [PATCH 10/12] writeback: add some debug inode list counters to bdi stats Jens Axboe
2009-05-26  9:33 ` [PATCH 11/12] writeback: add name to backing_dev_info Jens Axboe
2009-05-26  9:33 ` [PATCH 12/12] writeback: check for registered bdi in flusher add and inode dirty Jens Axboe
2009-05-26 15:25 ` [PATCH 0/12] Per-bdi writeback flusher threads v7 Damien Wyart
2009-05-26 16:41   ` Jens Axboe
2009-05-26 17:08     ` Damien Wyart
2009-05-26 17:10       ` Damien Wyart
2009-05-26 20:47       ` Jens Axboe
2009-05-26 21:11         ` Jens Axboe [this message]
2009-05-27  5:21           ` Damien Wyart
2009-05-27  5:49             ` Damien Wyart
2009-05-27  9:20               ` Jens Axboe
2009-05-27 13:15                 ` Damien Wyart
2009-05-27 15:05                   ` Jens Axboe
2009-05-27 21:06                     ` Andrew Morton
2009-05-28 10:20                       ` Jens Axboe
2009-05-27  6:17             ` Jens Axboe
2009-05-27  5:27 ` Zhang, Yanmin
2009-05-27  6:17   ` Jens Axboe
2009-06-02  2:07     ` Zhang, Yanmin
2009-06-02 11:53       ` Jens Axboe
