[PATCH 0/3] Small writeback code fixes and improvements

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/3] Small writeback code fixes and improvements
@ 2010-08-03 16:20 Jan Kara
  2010-08-03 16:20 ` [PATCH 1/3] mm: Stop background writeback if there is other work queued for the thread Jan Kara
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jan Kara @ 2010-08-03 16:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Wu Fengguang, linux-fsdevel, Jens Axboe

  Hi Andrew,

  could you please queue these three fixes after a few Fengguang's writeback
fixes you already carry? They are mostly simple improvements and with them
I'm not able to livelock sync(1) anymore doing normal things like copying
trees.

								Honza

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] mm: Stop background writeback if there is other work queued for the thread
  2010-08-03 16:20 [PATCH 0/3] Small writeback code fixes and improvements Jan Kara
@ 2010-08-03 16:20 ` Jan Kara
  2010-08-03 16:33   ` Christoph Hellwig
  2010-08-03 16:20 ` [PATCH 2/3] mm: Fix writeback_in_progress() Jan Kara
  2010-08-03 16:20 ` [PATCH 3/3] mm: Avoid resetting wb_start after each writeback round Jan Kara
  2 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2010-08-03 16:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Wu Fengguang, linux-fsdevel, Jens Axboe, Jan Kara

Background writeback and kupdate-style writeback are easily livelockable
(from a definition of their target). This is inconvenient because it can
make sync(1) stall forever waiting on its queued work to be finished.
Fix the problem by interrupting background and kupdate writeback if there
is some other work to do. We can return to them after completing all the
queued work.

Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index d5be169..542471e 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -633,6 +633,14 @@ static long wb_writeback(struct bdi_writeback *wb,
 			break;
 
 		/*
+		 * Background writeout and kupdate-style writeback are
+		 * easily livelockable. Stop them if there is other work
+		 * to do so that e.g. sync can proceed.
+		 */
+		if ((work->for_background || work->for_kupdate) &&
+		    !list_empty(&wb->bdi->work_list))
+			break;
+		/*
 		 * For background writeout, stop when we are below the
 		 * background dirty threshold
 		 */
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] mm: Fix writeback_in_progress()
  2010-08-03 16:20 [PATCH 0/3] Small writeback code fixes and improvements Jan Kara
  2010-08-03 16:20 ` [PATCH 1/3] mm: Stop background writeback if there is other work queued for the thread Jan Kara
@ 2010-08-03 16:20 ` Jan Kara
  2010-08-03 16:36   ` Christoph Hellwig
  2010-08-03 16:20 ` [PATCH 3/3] mm: Avoid resetting wb_start after each writeback round Jan Kara
  2 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2010-08-03 16:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Wu Fengguang, linux-fsdevel, Jens Axboe, Jan Kara, hch

Commit 83ba7b071f30f7c01f72518ad72d5cd203c27502 broke writeback_in_progress()
as in that commit we started to remove work items from the list at the
moment we start working on them and not at the moment they are finished.
Thus if the flusher thread was doing some work but there was no other
work queued, writeback_in_progress() returned false. This could in
particular cause unnecessary queueing of background writeback from
balance_dirty_pages() or writeout work from writeback_sb_if_idle().

This patch fixes the problem by introducing a bit in the bdi state which
indicates that the flusher thread is processing some work and uses this
bit for writeback_in_progress() test.

CC: hch@infradead.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c           |    4 +++-
 include/linux/backing-dev.h |    1 +
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 542471e..6bdc924 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -59,7 +59,7 @@ struct wb_writeback_work {
  */
 int writeback_in_progress(struct backing_dev_info *bdi)
 {
-	return !list_empty(&bdi->work_list);
+	return test_bit(BDI_writeback_running, &bdi->state);
 }
 
 static void bdi_queue_work(struct backing_dev_info *bdi,
@@ -751,6 +751,7 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
 	struct wb_writeback_work *work;
 	long wrote = 0;
 
+	set_bit(BDI_writeback_running, &wb->bdi->state);
 	while ((work = get_next_work_item(bdi, wb)) != NULL) {
 		/*
 		 * Override sync mode, in case we must wait for completion
@@ -775,6 +776,7 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
 	 * Check for periodic writeback, kupdated() style
 	 */
 	wrote += wb_check_old_data_flush(wb);
+	clear_bit(BDI_writeback_running, &wb->bdi->state);
 
 	return wrote;
 }
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index e9aec0d..12ba2cd 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -31,6 +31,7 @@ enum bdi_state {
 	BDI_async_congested,	/* The async (write) queue is getting full */
 	BDI_sync_congested,	/* The sync queue is getting full */
 	BDI_registered,		/* bdi_register() was done */
+	BDI_writeback_running,	/* Writeback is in progress */
 	BDI_unused,		/* Available bits start here */
 };
 
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] mm: Avoid resetting wb_start after each writeback round
  2010-08-03 16:20 [PATCH 0/3] Small writeback code fixes and improvements Jan Kara
  2010-08-03 16:20 ` [PATCH 1/3] mm: Stop background writeback if there is other work queued for the thread Jan Kara
  2010-08-03 16:20 ` [PATCH 2/3] mm: Fix writeback_in_progress() Jan Kara
@ 2010-08-03 16:20 ` Jan Kara
  2 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2010-08-03 16:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Wu Fengguang, linux-fsdevel, Jens Axboe, Jan Kara

WB_SYNC_NONE writeback is done in rounds of 1024 pages so that we don't write
out some huge inode for too long while starving writeout of other inodes. To
avoid livelocks, we record time we started writeback in wbc->wb_start and do
not write out inodes which were dirtied after this time. But currently,
writeback_inodes_wb() resets wb_start each time it is called thus effectively
invalidating this logic and making any WB_SYNC_NONE writeback prone to
livelocks.

This patch makes sure wb_start is set only once when we start writeback.

Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6bdc924..aa59394 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -530,7 +530,8 @@ void writeback_inodes_wb(struct bdi_writeback *wb,
 {
 	int ret = 0;
 
-	wbc->wb_start = jiffies; /* livelock avoidance */
+	if (!wbc->wb_start)
+		wbc->wb_start = jiffies; /* livelock avoidance */
 	spin_lock(&inode_lock);
 	if (!wbc->for_kupdate || list_empty(&wb->b_io))
 		queue_io(wb, wbc->older_than_this);
@@ -559,7 +560,6 @@ static void __writeback_inodes_sb(struct super_block *sb,
 {
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 
-	wbc->wb_start = jiffies; /* livelock avoidance */
 	spin_lock(&inode_lock);
 	if (!wbc->for_kupdate || list_empty(&wb->b_io))
 		queue_io(wb, wbc->older_than_this);
@@ -625,6 +625,7 @@ static long wb_writeback(struct bdi_writeback *wb,
 		wbc.range_end = LLONG_MAX;
 	}
 
+	wbc.wb_start = jiffies; /* livelock avoidance */
 	for (;;) {
 		/*
 		 * Stop writeback when nr_pages has been consumed
-- 
1.6.4.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] mm: Stop background writeback if there is other work queued for the thread
  2010-08-03 16:20 ` [PATCH 1/3] mm: Stop background writeback if there is other work queued for the thread Jan Kara
@ 2010-08-03 16:33   ` Christoph Hellwig
  2010-08-03 20:41     ` Jan Kara
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2010-08-03 16:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, Wu Fengguang, linux-fsdevel, Jens Axboe

On Tue, Aug 03, 2010 at 06:20:38PM +0200, Jan Kara wrote:
> Background writeback and kupdate-style writeback are easily livelockable
> (from a definition of their target). This is inconvenient because it can
> make sync(1) stall forever waiting on its queued work to be finished.
> Fix the problem by interrupting background and kupdate writeback if there
> is some other work to do. We can return to them after completing all the
> queued work.

I don't think this is quite correct.  If the only other callers was
fsync_filesystems that would be true, but we also have the odd
writeback_if_idle hack in ext4, and the bdi_start_writeback hack
in laptop mode. 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] mm: Fix writeback_in_progress()
  2010-08-03 16:20 ` [PATCH 2/3] mm: Fix writeback_in_progress() Jan Kara
@ 2010-08-03 16:36   ` Christoph Hellwig
  2010-08-03 20:55     ` Jan Kara
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2010-08-03 16:36 UTC (permalink / raw)
  To: Jan Kara; +Cc: Andrew Morton, Wu Fengguang, linux-fsdevel, Jens Axboe, hch

The patch looks correct to me, but I wonder if we shouldn't try to
get rid of writeback_in_progress instead.  There's just two users,
one is the ext4 writeback_if_idle hack, and the other one is
balance_dirty_pages.  The latter really should only care about about
other background reclaim beeing pending, and I have no idea why the
former cares - if e.g. any kind of kupdate in the background is pending
which just writes out a few inodes on another fs on the same bdi
it's exiting.  The XFS variant of this does unconditional writeback
for all inodes on the sb.

Long term I wonder if we shouldn't get rid of the work list entirely,
and just have a few targets for the flusher thread - we just set a
target and wake it up, and it keeps running until all targets are met.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] mm: Stop background writeback if there is other work queued for the thread
  2010-08-03 16:33   ` Christoph Hellwig
@ 2010-08-03 20:41     ` Jan Kara
  2010-08-06 12:28       ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2010-08-03 20:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Andrew Morton, Wu Fengguang, linux-fsdevel, Jens Axboe

On Tue 03-08-10 12:33:21, Christoph Hellwig wrote:
> On Tue, Aug 03, 2010 at 06:20:38PM +0200, Jan Kara wrote:
> > Background writeback and kupdate-style writeback are easily livelockable
> > (from a definition of their target). This is inconvenient because it can
> > make sync(1) stall forever waiting on its queued work to be finished.
> > Fix the problem by interrupting background and kupdate writeback if there
> > is some other work to do. We can return to them after completing all the
> > queued work.
> 
> I don't think this is quite correct.  If the only other callers was
> fsync_filesystems that would be true, but we also have the odd
> writeback_if_idle hack in ext4, and the bdi_start_writeback hack
> in laptop mode. 
  Yes, I'm aware of these. But do they really matter? bdi_start_writeback()
hack is almost like background writeback (it's write-it-all thing) and so
yielding to it does not change much. Similarly the writeback_if_idle thing
(although that works only for a single sb which makes difference if there
are several active filesystems on the bdi).
  But regardless of what other work items actually do, I view background /
kupdate writeback as the type of thing we do when noone else sends bdi
dirty data to the backing device.  So if someone has actually idea what to
write out, we give it priority and then return to doing our background /
kupdate writeback. What do you think?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] mm: Fix writeback_in_progress()
  2010-08-03 16:36   ` Christoph Hellwig
@ 2010-08-03 20:55     ` Jan Kara
  2010-08-06 12:34       ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2010-08-03 20:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Andrew Morton, Wu Fengguang, linux-fsdevel, Jens Axboe

On Tue 03-08-10 12:36:10, Christoph Hellwig wrote:
> The patch looks correct to me, but I wonder if we shouldn't try to
> get rid of writeback_in_progress instead.  There's just two users,
  Interesting. Yes, it might make sense.

> one is the ext4 writeback_if_idle hack, and the other one is
> balance_dirty_pages.  The latter really should only care about about
> other background reclaim beeing pending, and I have no idea why the
> former cares - if e.g. any kind of kupdate in the background is pending
> which just writes out a few inodes on another fs on the same bdi
> it's exiting.  The XFS variant of this does unconditional writeback
> for all inodes on the sb.
  You are right that writeback_if_idle probably isn't the right thing
ext4 needs. You are also right about the call in balance_dirty_pages but
that's harder to solve (as we definitely do not want to file a work item
in each balance_dirty_pages call).
 
> Long term I wonder if we shouldn't get rid of the work list entirely,
> and just have a few targets for the flusher thread - we just set a
> target and wake it up, and it keeps running until all targets are met.
  I cannot quite see how that would work when several threads want to
meet different targets. But I can imagine that background writeback
(and kupdate style which I belive should be basically wrapped into the
background) wouldn't have a work item in a list but whenever the flusher
thread has nothing else to do, it just writes things out until there are no
old inodes and we are below dirty_background_ratio.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] mm: Stop background writeback if there is other work queued for the thread
  2010-08-03 20:41     ` Jan Kara
@ 2010-08-06 12:28       ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2010-08-06 12:28 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Andrew Morton, Wu Fengguang, linux-fsdevel,
	Jens Axboe

> > I don't think this is quite correct.  If the only other callers was
> > fsync_filesystems that would be true, but we also have the odd
> > writeback_if_idle hack in ext4, and the bdi_start_writeback hack
> > in laptop mode. 
>   Yes, I'm aware of these. But do they really matter? bdi_start_writeback()
> hack is almost like background writeback (it's write-it-all thing) and so
> yielding to it does not change much. Similarly the writeback_if_idle thing
> (although that works only for a single sb which makes difference if there
> are several active filesystems on the bdi).
>   But regardless of what other work items actually do, I view background /
> kupdate writeback as the type of thing we do when noone else sends bdi
> dirty data to the backing device.  So if someone has actually idea what to
> write out, we give it priority and then return to doing our background /
> kupdate writeback. What do you think?

Makes sense as long as we keep our current queue with work items indeed.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] mm: Fix writeback_in_progress()
  2010-08-03 20:55     ` Jan Kara
@ 2010-08-06 12:34       ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2010-08-06 12:34 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Andrew Morton, Wu Fengguang, linux-fsdevel,
	Jens Axboe

On Tue, Aug 03, 2010 at 10:55:34PM +0200, Jan Kara wrote:
> > Long term I wonder if we shouldn't get rid of the work list entirely,
> > and just have a few targets for the flusher thread - we just set a
> > target and wake it up, and it keeps running until all targets are met.
>   I cannot quite see how that would work when several threads want to
> meet different targets. But I can imagine that background writeback
> (and kupdate style which I belive should be basically wrapped into the
> background) wouldn't have a work item in a list but whenever the flusher
> thread has nothing else to do, it just writes things out until there are no
> old inodes and we are below dirty_background_ratio.

The targets would be per-bdi_writeback and we have exactly one thread
for each of those.  If we'll eventually need more than one thread per
filesystem we'll need to add multiple bdi_writeback structure.  E.g.
for XFS we could align them easily to allocation group boundaries.

Anyway, for kupdate we do in fact already have this target with
the older than this marker, and Wu's patches make it even more
seamlessly integrated.  For the background style writeback we already
have the over_bground_thresh in wb_writeback, so with a little
restructuring we can totally get rid of a work item for thes, too and
just wake up the flusher thread.  As you mention in the last mail the
laptop mode flush everthing code is just a special case of background
writeout with a dirty limit of zero, we could also converted that to
a target instead of submitting a work item.

The only special cases in this model are the per-sb calls.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-08-06 12:34 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-03 16:20 [PATCH 0/3] Small writeback code fixes and improvements Jan Kara
2010-08-03 16:20 ` [PATCH 1/3] mm: Stop background writeback if there is other work queued for the thread Jan Kara
2010-08-03 16:33   ` Christoph Hellwig
2010-08-03 20:41     ` Jan Kara
2010-08-06 12:28       ` Christoph Hellwig
2010-08-03 16:20 ` [PATCH 2/3] mm: Fix writeback_in_progress() Jan Kara
2010-08-03 16:36   ` Christoph Hellwig
2010-08-03 20:55     ` Jan Kara
2010-08-06 12:34       ` Christoph Hellwig
2010-08-03 16:20 ` [PATCH 3/3] mm: Avoid resetting wb_start after each writeback round Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).