All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <jens.axboe@oracle.com>
To: Damien Wyart <damien.wyart@free.fr>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	chris.mason@oracle.com, david@fromorbit.com, hch@infradead.org,
	akpm@linux-foundation.org, jack@suse.cz,
	yanmin_zhang@linux.intel.com, richard@rsk.demon.co.uk
Subject: Re: [PATCH 0/12] Per-bdi writeback flusher threads v7
Date: Tue, 26 May 2009 23:11:19 +0200	[thread overview]
Message-ID: <20090526211119.GO11363@kernel.dk> (raw)
In-Reply-To: <20090526204703.GM11363@kernel.dk>

On Tue, May 26 2009, Jens Axboe wrote:
> On Tue, May 26 2009, Damien Wyart wrote:
> > > > I have been playing with v7 since your sending and after a while
> > > > (short on laptop, longer on desktop, a few hours), writeback doesn't
> > > > seem to work anymore. Manual call to sync hangs (process in D state)
> > > > and Dirty value in meminfo gets growing. As previous versions had
> > > > been heavily tested, I guess there is some regression in v7.
> > 
> > > Not good, the prime suspect is the sync notification stuff. I'll take
> > > a look and get that fixed. You didn't happen to catch any sysrq-t back
> > > traces or anything like that? Would be interesting to see where
> > > bdi-default and the bdi-* threads are stuck.
> > 
> > No, as I was doing many things at the same time and not exclusively
> > debugging, I just rebooted hard and went back to an upatched kernel when
> > the problems occured. But I noticed only bdi-default was alive, the
> > other bdi-* threads had disappeared and the sync commands I had tried
> > were all in D state. Also I tried to reinstall a kernel .deb (these
> > systems are Debian) and this got stuck guring installation, when probing
> > grub config (do not know if there is some sync syscall inthere).
> > 
> > Can try to go further tomorrow but will not have a lot of time...
> 
> OK, I spotted the problem. If we fallback to the on-stack allocation in
> bdi_writeback_all(), then we do the wait for the work completion with
> the bdi_lock mutex held. This can deadlock with bdi_forker_task(), so if
> we require that to be invoked to make progress (happens if a thread
> needs to be restarted), then we have a deadlock on that mutex.
> 
> I'll cook up a fix for this, but probably not before the morning.

Untested fix. I think it should work, but I haven't run it here yet.

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index a185a16..1662ede 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -122,12 +122,11 @@ static void bdi_work_free(struct rcu_head *head)
 
 static void wb_work_complete(struct bdi_work *work)
 {
-	if (!bdi_work_on_stack(work)) {
-		bdi_work_clear(work);
+	const enum writeback_sync_modes sync_mode = work->sync_mode;
 
-		if (work->sync_mode == WB_SYNC_NONE)
-			call_rcu(&work->rcu_head, bdi_work_free);
-	} else
+	if (!bdi_work_on_stack(work))
+		bdi_work_clear(work);
+	if (sync_mode == WB_SYNC_NONE || bdi_work_on_stack(work))
 		call_rcu(&work->rcu_head, bdi_work_free);
 }
 
@@ -272,7 +271,7 @@ void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
 	 */
 	if (work == &work_stack || must_wait) {
 		bdi_wait_on_work_clear(work);
-		if (must_wait)
+		if (must_wait && work != &work_stack)
 			call_rcu(&work->rcu_head, bdi_work_free);
 	}
 }
@@ -511,10 +510,9 @@ int bdi_writeback_task(struct bdi_writeback *wb)
  * we are simply called for WB_SYNC_NONE, then writeback will merely be
  * scheduled to run.
  */
-void bdi_writeback_all(struct super_block *sb, long nr_pages,
-		       enum writeback_sync_modes sync_mode)
+void bdi_writeback_all(struct super_block *sb, struct writeback_control *wbc)
 {
-	const bool must_wait = sync_mode == WB_SYNC_ALL;
+	const bool must_wait = wbc->sync_mode == WB_SYNC_ALL;
 	struct backing_dev_info *bdi, *tmp;
 	struct bdi_work *work;
 	LIST_HEAD(list);
@@ -522,31 +520,21 @@ void bdi_writeback_all(struct super_block *sb, long nr_pages,
 	mutex_lock(&bdi_lock);
 
 	list_for_each_entry_safe(bdi, tmp, &bdi_list, bdi_list) {
-		struct bdi_work *work, work_stack;
+		struct bdi_work *work;
 
 		if (!bdi_has_dirty_io(bdi))
 			continue;
 
-		work = bdi_alloc_work(sb, nr_pages, sync_mode);
+		work = bdi_alloc_work(sb, wbc->nr_to_write, wbc->sync_mode);
 		if (!work) {
-			work = &work_stack;
-			bdi_work_init_on_stack(work, sb, nr_pages, sync_mode);
-		} else if (must_wait)
+			generic_sync_bdi_inodes(sb, wbc);
+			continue;
+		}
+		if (must_wait)
 			list_add_tail(&work->wait_list, &list);
 
 		bdi_queue_work(bdi, work);
 		__bdi_start_work(bdi, work);
-
-		/*
-		 * Do the wait inline if this came from the stack. This
-		 * only happens if we ran out of memory, so should very
-		 * rarely trigger.
-		 */
-		if (work == &work_stack) {
-			bdi_wait_on_work_clear(work);
-			if (must_wait)
-				call_rcu(&work->rcu_head, bdi_work_free);
-		}
 	}
 
 	mutex_unlock(&bdi_lock);
@@ -1082,7 +1070,7 @@ void generic_sync_sb_inodes(struct super_block *sb,
 	if (wbc->bdi)
 		bdi_start_writeback(wbc->bdi, sb, wbc->nr_to_write, wbc->sync_mode);
 	else
-		bdi_writeback_all(sb, wbc->nr_to_write, wbc->sync_mode);
+		bdi_writeback_all(sb, wbc);
 
 	if (wbc->sync_mode == WB_SYNC_ALL) {
 		struct inode *inode, *old_inode = NULL;
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index dee38ec..679cfb8 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -107,8 +107,7 @@ void bdi_unregister(struct backing_dev_info *bdi);
 void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
 			 long nr_pages, enum writeback_sync_modes sync_mode);
 int bdi_writeback_task(struct bdi_writeback *wb);
-void bdi_writeback_all(struct super_block *sb, long nr_pages,
-			enum writeback_sync_modes sync_mode);
+void bdi_writeback_all(struct super_block *sb, struct writeback_control *wbc);
 void bdi_add_default_flusher_task(struct backing_dev_info *bdi);
 void bdi_add_flusher_task(struct backing_dev_info *bdi);
 int bdi_has_dirty_io(struct backing_dev_info *bdi);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7dd7de7..dd403cf 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -669,10 +669,18 @@ void throttle_vm_writeout(gfp_t gfp_mask)
  */
 void wakeup_flusher_threads(long nr_pages)
 {
+	struct writeback_control wbc = {
+		.sync_mode	= WB_SYNC_NONE,
+		.older_than_this = NULL,
+		.range_cyclic	= 1,
+	};
+
 	if (nr_pages == 0)
 		nr_pages = global_page_state(NR_FILE_DIRTY) +
 				global_page_state(NR_UNSTABLE_NFS);
-	bdi_writeback_all(NULL, nr_pages, WB_SYNC_NONE);
+
+	wbc.nr_to_write = nr_pages;
+	bdi_writeback_all(NULL, &wbc);
 }
 
 static void laptop_timer_fn(unsigned long unused);

-- 
Jens Axboe


  reply	other threads:[~2009-05-26 21:11 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-26  9:33 [PATCH 0/12] Per-bdi writeback flusher threads v7 Jens Axboe
2009-05-26  9:33 ` [PATCH 01/12] ntfs: remove old debug check for dirty data in ntfs_put_super() Jens Axboe
2009-05-26  9:33 ` [PATCH 02/12] btrfs: properly register fs backing device Jens Axboe
2009-05-26  9:33 ` [PATCH 03/12] writeback: move dirty inodes from super_block to backing_dev_info Jens Axboe
2009-05-26  9:33 ` [PATCH 04/12] writeback: switch to per-bdi threads for flushing data Jens Axboe
2009-05-26  9:33 ` [PATCH 05/12] writeback: get rid of pdflush completely Jens Axboe
2009-05-26  9:33 ` [PATCH 06/12] writeback: separate the flushing state/task from the bdi Jens Axboe
2009-05-26  9:33 ` [PATCH 07/12] writeback: support > 1 flusher thread per bdi Jens Axboe
2009-06-04 17:44   ` Paul E. McKenney
2009-06-04 19:48     ` Jens Axboe
2009-05-26  9:33 ` [PATCH 08/12] writeback: include default_backing_dev_info in writeback Jens Axboe
2009-05-26  9:33 ` [PATCH 09/12] writeback: allow sleepy exit of default writeback task Jens Axboe
2009-05-26  9:33 ` [PATCH 10/12] writeback: add some debug inode list counters to bdi stats Jens Axboe
2009-05-26  9:33 ` [PATCH 11/12] writeback: add name to backing_dev_info Jens Axboe
2009-05-26  9:33 ` [PATCH 12/12] writeback: check for registered bdi in flusher add and inode dirty Jens Axboe
2009-05-26 15:25 ` [PATCH 0/12] Per-bdi writeback flusher threads v7 Damien Wyart
2009-05-26 16:41   ` Jens Axboe
2009-05-26 17:08     ` Damien Wyart
2009-05-26 17:10       ` Damien Wyart
2009-05-26 20:47       ` Jens Axboe
2009-05-26 21:11         ` Jens Axboe [this message]
2009-05-27  5:21           ` Damien Wyart
2009-05-27  5:49             ` Damien Wyart
2009-05-27  9:20               ` Jens Axboe
2009-05-27 13:15                 ` Damien Wyart
2009-05-27 15:05                   ` Jens Axboe
2009-05-27 21:06                     ` Andrew Morton
2009-05-28 10:20                       ` Jens Axboe
2009-05-27  6:17             ` Jens Axboe
2009-05-27  5:27 ` Zhang, Yanmin
2009-05-27  6:17   ` Jens Axboe
2009-06-02  2:07     ` Zhang, Yanmin
2009-06-02 11:53       ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090526211119.GO11363@kernel.dk \
    --to=jens.axboe@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=chris.mason@oracle.com \
    --cc=damien.wyart@free.fr \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=richard@rsk.demon.co.uk \
    --cc=yanmin_zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.