linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Cc: Jan Kara <jack@suse.cz>, Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Theodore Tso <tytso@mit.edu>,
	Chris Mason <chris.mason@oracle.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Wu Fengguang <fengguang.wu@intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH 6/7] writeback: scale IO chunk size up to half device bandwidth
Date: Sun, 19 Jun 2011 23:01:14 +0800	[thread overview]
Message-ID: <20110619150510.600942700@intel.com> (raw)
In-Reply-To: 20110619150108.691351746@intel.com

[-- Attachment #1: writeback-128M-MAX_WRITEBACK_PAGES.patch --]
[-- Type: text/plain, Size: 4158 bytes --]

Originally, MAX_WRITEBACK_PAGES was hard-coded to 1024 because of a
concern of not holding I_SYNC for too long.  (At least, that was the
comment previously.)  This doesn't make sense now because the only
time we wait for I_SYNC is if we are calling sync or fsync, and in
that case we need to write out all of the data anyway.  Previously
there may have been other code paths that waited on I_SYNC, but not
any more.					    -- Theodore Ts'o

So remove the MAX_WRITEBACK_PAGES constraint. The writeback pages
will adapt to as large as the storage device can write within 500ms.

XFS is observed to do IO completions in a batch, and the batch size is
equal to the write chunk size. To avoid dirty pages to suddenly drop
out of balance_dirty_pages()'s dirty control scope and create large
fluctuations, the chunk size is also limited to half the control scope.

The balance_dirty_pages() control scrope is

	[(background_thresh + dirty_thresh) / 2, dirty_thresh]

which is by default [15%, 20%] * dirty_pages, with the size
dirty_thresh / DIRTY_FULL_SCOPE.

The adpative write chunk size will be rounded to the nearest 4MB
boundary.

http://bugzilla.kernel.org/show_bug.cgi?id=13930

CC: Theodore Ts'o <tytso@mit.edu>
CC: Dave Chinner <david@fromorbit.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c         |   23 ++++++++++-------------
 include/linux/writeback.h |   11 +++++++++++
 2 files changed, 21 insertions(+), 13 deletions(-)

--- linux-next.orig/include/linux/writeback.h	2011-06-16 22:17:10.000000000 +0800
+++ linux-next/include/linux/writeback.h	2011-06-16 22:17:27.000000000 +0800
@@ -8,6 +8,10 @@
 #include <linux/fs.h>
 
 /*
+ * The 1/4 region under the global dirty thresh is for smooth dirty throttling:
+ *
+ *		(thresh - thresh/DIRTY_FULL_SCOPE, thresh)
+ *
  * The 1/16 region above the global dirty limit will be put to maximum pauses:
  *
  *		(limit, limit + limit/DIRTY_MAXPAUSE)
@@ -25,9 +29,16 @@
  * knocks down the global dirty threshold quickly, in which case the global
  * dirty limit will follow down slowly to prevent livelocking all dirtier tasks.
  */
+#define DIRTY_SCOPE		8
+#define DIRTY_FULL_SCOPE	(DIRTY_SCOPE / 2)
 #define DIRTY_MAXPAUSE		16
 #define DIRTY_PASSGOOD		8
 
+/*
+ * 4MB minimal write chunk size
+ */
+#define MIN_WRITEBACK_PAGES	(4096UL >> (PAGE_CACHE_SHIFT - 10))
+
 struct backing_dev_info;
 
 /*
--- linux-next.orig/fs/fs-writeback.c	2011-06-16 22:17:10.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2011-06-16 22:17:11.000000000 +0800
@@ -30,15 +30,6 @@
 #include "internal.h"
 
 /*
- * The maximum number of pages to writeout in a single bdi flush/kupdate
- * operation.  We do this so we don't hold I_SYNC against an inode for
- * enormous amounts of time, which would block a userspace task which has
- * been forced to throttle against that inode.  Also, the code reevaluates
- * the dirty each time it has written this many pages.
- */
-#define MAX_WRITEBACK_PAGES     1024L
-
-/*
  * Passed into wb_writeback(), essentially a subset of writeback_control
  */
 struct wb_writeback_work {
@@ -515,7 +506,8 @@ static bool pin_sb_for_writeback(struct 
 	return false;
 }
 
-static long writeback_chunk_size(struct wb_writeback_work *work)
+static long writeback_chunk_size(struct backing_dev_info *bdi,
+				 struct wb_writeback_work *work)
 {
 	long pages;
 
@@ -534,8 +526,13 @@ static long writeback_chunk_size(struct 
 	 */
 	if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
 		pages = LONG_MAX;
-	else
-		pages = min(MAX_WRITEBACK_PAGES, work->nr_pages);
+	else {
+		pages = min(bdi->avg_write_bandwidth / 2,
+			    global_dirty_limit / DIRTY_SCOPE);
+		pages = min(pages, work->nr_pages);
+		pages = round_down(pages + MIN_WRITEBACK_PAGES,
+				   MIN_WRITEBACK_PAGES);
+	}
 
 	return pages;
 }
@@ -600,7 +597,7 @@ static long writeback_sb_inodes(struct s
 			continue;
 		}
 		__iget(inode);
-		write_chunk = writeback_chunk_size(work);
+		write_chunk = writeback_chunk_size(wb->bdi, work);
 		wbc.nr_to_write = write_chunk;
 		wbc.pages_skipped = 0;
 



  parent reply	other threads:[~2011-06-19 15:27 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-19 15:01 [PATCH 0/7] more writeback patches for 3.1 Wu Fengguang
2011-06-19 15:01 ` [PATCH 1/7] writeback: consolidate variable names in balance_dirty_pages() Wu Fengguang
2011-06-20  7:45   ` Christoph Hellwig
2011-06-19 15:01 ` [PATCH 2/7] writeback: add parameters to __bdi_update_bandwidth() Wu Fengguang
2011-06-19 15:31   ` Christoph Hellwig
2011-06-19 15:35     ` Wu Fengguang
2011-06-19 15:01 ` [PATCH 3/7] writeback: introduce smoothed global dirty limit Wu Fengguang
2011-06-19 15:36   ` Christoph Hellwig
2011-06-19 15:55     ` Wu Fengguang
2011-06-21 23:59       ` Andrew Morton
2011-06-22 14:11         ` Wu Fengguang
2011-06-20 21:18   ` Jan Kara
2011-06-21 14:24     ` Wu Fengguang
2011-06-22  0:04   ` Andrew Morton
2011-06-22 14:24     ` Wu Fengguang
2011-06-19 15:01 ` [PATCH 4/7] writeback: introduce max-pause and pass-good dirty limits Wu Fengguang
2011-06-22  0:20   ` Andrew Morton
2011-06-23 13:18     ` Wu Fengguang
2011-06-19 15:01 ` [PATCH 5/7] writeback: make writeback_control.nr_to_write straight Wu Fengguang
2011-06-19 15:35   ` Christoph Hellwig
2011-06-19 16:14     ` Wu Fengguang
2011-06-19 15:01 ` Wu Fengguang [this message]
2011-06-19 15:01 ` [PATCH 7/7] writeback: timestamp based bdi dirty_exceeded state Wu Fengguang
2011-06-20 20:09   ` Christoph Hellwig
2011-06-21 10:00     ` Steven Whitehouse
2011-06-20 21:38   ` Jan Kara
2011-06-21 15:07     ` Wu Fengguang
2011-06-21 21:14       ` Jan Kara
2011-06-22 14:37         ` Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110619150510.600942700@intel.com \
    --to=fengguang.wu@intel.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=chris.mason@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).