From: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
To: Andrew Morton <akpm@linux-foundation.org>, linux-kernel@vger.kernel.org
Cc: Bill Davidsen <davidsen@tmr.com>,
yumiko.sugita.yf@hitachi.com, masami.hiramatsu.pt@hitachi.com,
hidehiro.kawai.ez@hitachi.com, yuji.kakutani.uw@hitachi.com,
soshima@redhat.com, haoki@redhat.com
Subject: [PATCH 1/2] VM throttling: Start writeback at dirty_writeback_start_ratio
Date: Tue, 03 Apr 2007 19:46:04 +0900 [thread overview]
Message-ID: <4612306C.2040600@hitachi.com> (raw)
In-Reply-To: <4607FECD.1090101@tmr.com>
This patchset is to avoid the problem that write(2) can be blocked for a
long time if a system has several disks with different speed and is
under heavy I/O pressure.
-Description of the problem:
While Dirty+Writeback pages get more than 40%(`dirty_ratio') of memory,
generators of dirty pages are blocked in balance_dirty_pages() until
they start writeback of a specific number (`write_chunk', typically=1536)
of dirty pages on the disks they write to.
Under this rule, if a process writes to the disk which has only a few
(less than 1536) dirty pages, that process will be blocked until
writeback of the other disks is completed and % of Dirty+Writeback goes
below 40%.
Thus, if a slow device (such as a USB disk) has many dirty pages, the
processes which write small data to the other disks can be blocked for
quite a long time.
-Solution:
This patch introduces high/low-watermark algorithm in
balance_dirty_pages() in order to throttle only the processes which
write to disks with heavy load.
This patch adds `dirty_start_writeback_ratio' for the low-watermark,
and modifies get_dirty_limits() to calculate and return the writeback
starting level of dirty pages based on `dirty_start_writeback_ratio'.
If % of Dirty+Writeback > `dirty_writeback_start_ratio', generators of
dirty pages start writeback of dirty pages by themselves. At that time,
these processes are not blocked in balance_dirty_pages(), but they may
be blocked if the write-requests-queue of the written disk is full
(that is, the length of the queue > `nr_requests'). By this behavior,
we can throttle only processes which write to the disks with heavy load,
and can allow processes to write to the other disks without blocking.
If % of Dirty+Writeback > `dirty_ratio', generators of dirty pages
are throttled as current Linux does, not to fill up memory with dirty
pages.
Thanks,
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
---
include/linux/writeback.h | 1
mm/page-writeback.c | 52 ++++++++++++++++++++++++++++++++++++----------
2 files changed, 42 insertions(+), 11 deletions(-)
Index: linux-2.6.21-rc5-mm3-writeback/include/linux/writeback.h
===================================================================
--- linux-2.6.21-rc5-mm3-writeback.orig/include/linux/writeback.h
+++ linux-2.6.21-rc5-mm3-writeback/include/linux/writeback.h
@@ -94,6 +94,7 @@ static inline int laptop_spinned_down(vo
/* These are exported to sysctl. */
extern int dirty_background_ratio;
+extern int dirty_start_writeback_ratio;
extern int vm_dirty_ratio;
extern int dirty_writeback_interval;
extern int dirty_expire_interval;
Index: linux-2.6.21-rc5-mm3-writeback/mm/page-writeback.c
===================================================================
--- linux-2.6.21-rc5-mm3-writeback.orig/mm/page-writeback.c
+++ linux-2.6.21-rc5-mm3-writeback/mm/page-writeback.c
@@ -72,6 +72,11 @@ int dirty_background_ratio = 10;
/*
* The generator of dirty data starts writeback at this percentage
*/
+int dirty_start_writeback_ratio = 35;
+
+/*
+ * The generator of dirty data is blocked at this percentage
+ */
int vm_dirty_ratio = 40;
/*
@@ -112,12 +117,16 @@ static void background_writeout(unsigned
* performing lots of scanning.
*
* We only allow 1/2 of the currently-unmapped memory to be dirtied.
+ * `vm.dirty_ratio' is ignored if it is larger than that.
+ * In this case, `vm.dirty_start_writeback_ratio' is also decreased to keep
+ * writeback independently among disks.
*
* We don't permit the clamping level to fall below 5% - that is getting rather
* excessive.
*
- * We make sure that the background writeout level is below the adjusted
- * clamping level.
+ * We make sure that the active writeout level is below the adjusted clamping
+ * leve, and that the background writeout level is below the active writeout
+ * level.
*/
static unsigned long highmem_dirtyable_memory(unsigned long total)
@@ -158,13 +167,15 @@ static unsigned long determine_dirtyable
}
static void
-get_dirty_limits(long *pbackground, long *pdirty,
+get_dirty_limits(long *pbackground, long *pstart_writeback, long *pdirty,
struct address_space *mapping)
{
int background_ratio; /* Percentages */
+ int start_writeback_ratio;
int dirty_ratio;
int unmapped_ratio;
long background;
+ long start_writeback;
long dirty;
unsigned long available_memory = determine_dirtyable_memory();
struct task_struct *tsk;
@@ -177,28 +188,40 @@ get_dirty_limits(long *pbackground, long
if (dirty_ratio > unmapped_ratio / 2)
dirty_ratio = unmapped_ratio / 2;
+ start_writeback_ratio = dirty_start_writeback_ratio;
+ if (start_writeback_ratio > dirty_ratio)
+ start_writeback_ratio = dirty_ratio;
+ start_writeback_ratio -= vm_dirty_ratio - dirty_ratio;
+
if (dirty_ratio < 5)
dirty_ratio = 5;
+ if (start_writeback_ratio < 2)
+ start_writeback_ratio = 2;
background_ratio = dirty_background_ratio;
- if (background_ratio >= dirty_ratio)
- background_ratio = dirty_ratio / 2;
+ if (background_ratio >= start_writeback_ratio)
+ background_ratio = start_writeback_ratio / 2;
background = (background_ratio * available_memory) / 100;
+ start_writeback = (start_writeback_ratio * available_memory) / 100;
dirty = (dirty_ratio * available_memory) / 100;
tsk = current;
if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
background += background / 4;
+ start_writeback += start_writeback / 4;
dirty += dirty / 4;
}
*pbackground = background;
+ *pstart_writeback = start_writeback;
*pdirty = dirty;
}
/*
* balance_dirty_pages() must be called by processes which are generating dirty
* data. It looks at the number of dirty pages in the machine and will force
- * the caller to perform writeback if the system is over `vm_dirty_ratio'.
+ * the caller to perform writeback if the system is over
+ * `start_writeback_thresh'. If the system is over `dirty_thresh' then the
+ * caller will be blocked unless it cannot writeback enough pages.
* If we're over `background_thresh' then pdflush is woken to perform some
* writeout.
*/
@@ -206,6 +229,7 @@ static void balance_dirty_pages(struct a
{
long nr_reclaimable;
long background_thresh;
+ long start_writeback_thresh;
long dirty_thresh;
unsigned long pages_written = 0;
unsigned long write_chunk = sync_writeback_pages();
@@ -221,11 +245,12 @@ static void balance_dirty_pages(struct a
.range_cyclic = 1,
};
- get_dirty_limits(&background_thresh, &dirty_thresh, mapping);
+ get_dirty_limits(&background_thresh, &start_writeback_thresh,
+ &dirty_thresh, mapping);
nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
global_page_state(NR_UNSTABLE_NFS);
if (nr_reclaimable + global_page_state(NR_WRITEBACK) <=
- dirty_thresh)
+ start_writeback_thresh)
break;
if (!dirty_exceeded)
@@ -240,7 +265,8 @@ static void balance_dirty_pages(struct a
if (nr_reclaimable) {
writeback_inodes(&wbc);
get_dirty_limits(&background_thresh,
- &dirty_thresh, mapping);
+ &start_writeback_thresh,
+ &dirty_thresh, mapping);
nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
global_page_state(NR_UNSTABLE_NFS);
if (nr_reclaimable +
@@ -329,6 +355,7 @@ EXPORT_SYMBOL(balance_dirty_pages_rateli
void throttle_vm_writeout(gfp_t gfp_mask)
{
long background_thresh;
+ long writeback_thresh;
long dirty_thresh;
if ((gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO)) {
@@ -342,7 +369,8 @@ void throttle_vm_writeout(gfp_t gfp_mask
}
for ( ; ; ) {
- get_dirty_limits(&background_thresh, &dirty_thresh, NULL);
+ get_dirty_limits(&background_thresh, &writeback_thresh,
+ &dirty_thresh, NULL);
/*
* Boost the allowable dirty threshold a bit for page
@@ -375,9 +403,11 @@ static void background_writeout(unsigned
for ( ; ; ) {
long background_thresh;
+ long writeback_thresh;
long dirty_thresh;
- get_dirty_limits(&background_thresh, &dirty_thresh, NULL);
+ get_dirty_limits(&background_thresh, &writeback_thresh,
+ &dirty_thresh, NULL);
if (global_page_state(NR_FILE_DIRTY) +
global_page_state(NR_UNSTABLE_NFS) < background_thresh
&& min_pages <= 0)
next prev parent reply other threads:[~2007-04-03 10:50 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-03-14 12:42 [PATCH 0/3] VM throttling: avoid blocking occasional writers Tomoki Sekiyama
2007-03-14 13:18 ` Peter Zijlstra
2007-03-15 19:07 ` Andrew Morton
2007-03-18 14:59 ` Bill Davidsen
2007-03-22 5:49 ` Tomoki Sekiyama
2007-03-22 11:41 ` Bill Davidsen
2007-03-26 10:27 ` Tomoki Sekiyama
2007-03-26 17:11 ` Bill Davidsen
2007-04-03 10:42 ` Tomoki Sekiyama
2007-04-03 10:46 ` Tomoki Sekiyama [this message]
2007-04-06 0:31 ` [PATCH 1/2] VM throttling: Start writeback at dirty_writeback_start_ratio Andrew Morton
2007-04-10 3:04 ` Tomoki Sekiyama
2007-04-10 3:46 ` Andrew Morton
2007-04-03 10:47 ` [PATCH 2/2] VM throttling: Add vm.dirty_start_writeback_ratio to sysctl Tomoki Sekiyama
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4612306C.2040600@hitachi.com \
--to=tomoki.sekiyama.qu@hitachi.com \
--cc=akpm@linux-foundation.org \
--cc=davidsen@tmr.com \
--cc=haoki@redhat.com \
--cc=hidehiro.kawai.ez@hitachi.com \
--cc=linux-kernel@vger.kernel.org \
--cc=masami.hiramatsu.pt@hitachi.com \
--cc=soshima@redhat.com \
--cc=yuji.kakutani.uw@hitachi.com \
--cc=yumiko.sugita.yf@hitachi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox