linux-fsdevel.vger.kernel.org archive mirror
* [PATCH] writeback: enabling-gate for light dirtied bdi
@ 2010-12-05  6:44 Wu Fengguang
From: Wu Fengguang @ 2010-12-05  6:44 UTC
  To: Peter Zijlstra, Andrew Morton
  Cc: Theodore Ts'o, Chris Mason, Dave Chinner, Jan Kara,
	Jens Axboe, Mel Gorman, Rik van Riel, KOSAKI Motohiro,
	Christoph Hellwig, linux-mm, linux-fsdevel@vger.kernel.org, LKML

I noticed that my NFSROOT test system becomes slow to respond when there
is a heavy dd to a local disk. Traces show that the NFSROOT's bdi_limit
is near 0 and many tasks in the system are repeatedly stuck in
balance_dirty_pages().

There are two related problems:

- light dirtiers on one device (more often than not the rootfs) get
  heavily impacted by heavy dirtiers on another, independent device

- the lightly dirtied device is throttled heavily because bdi_limit=0,
  and the heavy throttling may in turn hold its bdi_limit at 0, because
  it cannot dirty fast enough to grow the bdi's proportional weight.

Fix it by introducing a "low pass" gate: a small (<=8MB) value that is
reserved by the other bdis and can be safely "stolen" from the current
global dirty margin.  It does not need to be big to help the bdi gain
its initial weight.

CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---

Peter, I suspect this would be good for 2.6.37. Please help review, thanks!
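
For illustration only (not part of the patch): a minimal userspace sketch
of the gate arithmetic described in the changelog above. The names
gate_bdi_limit() and min_u64() and the page_shift parameter are
hypothetical, all values are in pages, and the proportion calculation and
the min_ratio/max_ratio handling of bdi_dirty_limit() are left out. The
two halves of the condition are swapped relative to the patch so that the
sketch does not rely on unsigned wrap-around; the outcome is the same.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for this sketch; the kernel code uses min(). */
static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Userspace model of the gate added to bdi_dirty_limit().
 *
 * share:       the bdi's proportional share of the dirty limit, in pages
 * dirty:       global dirty threshold, in pages
 * dirty_pages: pages currently dirty or under writeback globally
 * page_shift:  log2 of the page size (12 for 4KB pages)
 */
static uint64_t gate_bdi_limit(uint64_t share, uint64_t dirty,
			       uint64_t dirty_pages, unsigned int page_shift)
{
	/* 8192KB expressed in pages only makes sense for pages >= 1KB. */
	assert(page_shift >= 10 && page_shift - 10 < 64);

	/* A bdi that runs low gets half of the remaining global margin. */
	if (dirty > dirty_pages && share < (dirty - dirty_pages) / 2)
		return (dirty - dirty_pages) / 2;

	/* Everyone else gives up 1/128 of its share, capped at 8MB. */
	return share - min_u64(share / 128, 8192ULL >> (page_shift - 10));
}
```

With 4KB pages, a global limit of 100000 pages and 90000 pages dirty, a
bdi whose share is 0 is lifted to 5000 pages, while a bdi with an 80000
page share gives up 80000/128 = 625 pages. When dirty_pages equals dirty,
as in the bdi_debug_stats_show() caller, the margin is zero and only the
reserve is subtracted.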

 include/linux/writeback.h |    3 ++-
 mm/backing-dev.c          |    2 +-
 mm/page-writeback.c       |   23 +++++++++++++++++++++--
 3 files changed, 24 insertions(+), 4 deletions(-)

--- linux-next.orig/mm/page-writeback.c	2010-12-05 14:29:24.000000000 +0800
+++ linux-next/mm/page-writeback.c	2010-12-05 14:31:39.000000000 +0800
@@ -444,7 +444,9 @@ void global_dirty_limits(unsigned long *
  * The bdi's share of dirty limit will be adapting to its throughput and
  * bounded by the bdi->min_ratio and/or bdi->max_ratio parameters, if set.
  */
-unsigned long bdi_dirty_limit(struct backing_dev_info *bdi, unsigned long dirty)
+unsigned long bdi_dirty_limit(struct backing_dev_info *bdi,
+			      unsigned long dirty,
+			      unsigned long dirty_pages)
 {
 	u64 bdi_dirty;
 	long numerator, denominator;
@@ -459,6 +461,22 @@ unsigned long bdi_dirty_limit(struct bac
 	do_div(bdi_dirty, denominator);
 
 	bdi_dirty += (dirty * bdi->min_ratio) / 100;
+
+	/*
+	 * There is a chicken and egg problem: when bdi A (eg. /pub) is heavy
+	 * dirtied and bdi B (eg. /) is light dirtied hence has 0 dirty limit,
+	 * tasks writing to B always get heavily throttled and bdi B's dirty
+	 * limit may never be able to grow up from 0.
+	 *
+	 * So if we can dirty N more pages globally, honour N/2 to the bdi that
+	 * runs low. To provide such a global margin, we slightly decrease all
+	 * heavy dirtied bdi's limit.
+	 */
+	if (bdi_dirty < (dirty - dirty_pages) / 2 && dirty > dirty_pages)
+		bdi_dirty = (dirty - dirty_pages) / 2;
+	else
+		bdi_dirty -= min(bdi_dirty / 128, 8192ULL >> (PAGE_SHIFT-10));
+
 	if (bdi_dirty > (dirty * bdi->max_ratio) / 100)
 		bdi_dirty = dirty * bdi->max_ratio / 100;
 
@@ -508,7 +526,8 @@ static void balance_dirty_pages(struct a
 				(background_thresh + dirty_thresh) / 2)
 			break;
 
-		bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh);
+		bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh,
+					     nr_reclaimable + nr_writeback);
 		bdi_thresh = task_dirty_limit(current, bdi_thresh);
 
 		/*
--- linux-next.orig/mm/backing-dev.c	2010-12-05 14:29:23.000000000 +0800
+++ linux-next/mm/backing-dev.c	2010-12-05 14:30:00.000000000 +0800
@@ -83,7 +83,7 @@ static int bdi_debug_stats_show(struct s
 	spin_unlock(&inode_lock);
 
 	global_dirty_limits(&background_thresh, &dirty_thresh);
-	bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh);
+	bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh, dirty_thresh);
 
 #define K(x) ((x) << (PAGE_SHIFT - 10))
 	seq_printf(m,
--- linux-next.orig/include/linux/writeback.h	2010-12-05 14:29:24.000000000 +0800
+++ linux-next/include/linux/writeback.h	2010-12-05 14:30:00.000000000 +0800
@@ -126,7 +126,8 @@ int dirty_writeback_centisecs_handler(st
 
 void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty);
 unsigned long bdi_dirty_limit(struct backing_dev_info *bdi,
-			       unsigned long dirty);
+			       unsigned long dirty,
+			       unsigned long dirty_pages);
 
 void page_writeback_init(void);
 void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,

