* [PATCH 1/5] writeback: Fix issue on make htmldocs
2011-12-05 6:22 writeback fixes for 3.2-rc5 Wu Fengguang
@ 2011-12-05 6:22 ` Wu Fengguang
2011-12-05 6:22 ` [PATCH 2/5] fs: Make write(2) interruptible by a fatal signal Wu Fengguang
` (4 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Wu Fengguang @ 2011-12-05 6:22 UTC
To: linux-fsdevel; +Cc: Marcos Paulo de Souza, Wu Fengguang
From: Marcos Paulo de Souza <marcos.mage@gmail.com>
Document the @reason parameter to make "make htmldocs" happy.
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Marcos Paulo de Souza <marcos.mage@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
fs/fs-writeback.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 73c3992..ac86f8b 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -156,6 +156,7 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
* bdi_start_writeback - start writeback
* @bdi: the backing device to write from
* @nr_pages: the number of pages to write
+ * @reason: reason why some writeback work was initiated
*
* Description:
* This does WB_SYNC_NONE opportunistic writeback. The IO is only
@@ -1223,6 +1224,7 @@ static void wait_sb_inodes(struct super_block *sb)
* writeback_inodes_sb_nr - writeback dirty inodes from given super_block
* @sb: the superblock
* @nr: the number of pages to write
+ * @reason: reason why some writeback work was initiated
*
* Start writeback on some inodes on this super_block. No guarantees are made
* on how many (if any) will be written, and this function does not wait
@@ -1251,6 +1253,7 @@ EXPORT_SYMBOL(writeback_inodes_sb_nr);
/**
* writeback_inodes_sb - writeback dirty inodes from given super_block
* @sb: the superblock
+ * @reason: reason why some writeback work was initiated
*
* Start writeback on some inodes on this super_block. No guarantees are made
* on how many (if any) will be written, and this function does not wait
@@ -1265,6 +1268,7 @@ EXPORT_SYMBOL(writeback_inodes_sb);
/**
* writeback_inodes_sb_if_idle - start writeback if none underway
* @sb: the superblock
+ * @reason: reason why some writeback work was initiated
*
* Invoke writeback_inodes_sb if no writeback is currently underway.
* Returns 1 if writeback was started, 0 if not.
@@ -1285,6 +1289,7 @@ EXPORT_SYMBOL(writeback_inodes_sb_if_idle);
* writeback_inodes_sb_if_idle - start writeback if none underway
* @sb: the superblock
* @nr: the number of pages to write
+ * @reason: reason why some writeback work was initiated
*
* Invoke writeback_inodes_sb if no writeback is currently underway.
* Returns 1 if writeback was started, 0 if not.
--
1.7.7.1
* [PATCH 2/5] fs: Make write(2) interruptible by a fatal signal
2011-12-05 6:22 writeback fixes for 3.2-rc5 Wu Fengguang
2011-12-05 6:22 ` [PATCH 1/5] writeback: Fix issue on make htmldocs Wu Fengguang
@ 2011-12-05 6:22 ` Wu Fengguang
2011-12-05 6:22 ` [PATCH 3/5] writeback: comment on the bdi dirty threshold Wu Fengguang
` (3 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Wu Fengguang @ 2011-12-05 6:22 UTC
To: linux-fsdevel; +Cc: Jan Kara, Wu Fengguang
From: Jan Kara <jack@suse.cz>
Currently write(2) to a file is not interruptible by any signal.
Sometimes interrupting it is desirable, e.g. when you want to quickly
kill a process hogging your disk. Also, with commit 499d05ecf990 ("mm: Make
task in balance_dirty_pages() killable"), it's necessary to abort the
current write accordingly, to keep it from quickly dirtying lots more
pages at an unthrottled rate.
This patch makes write interruptible by SIGKILL. We do not allow write
to be interrupted by any other signal because that has a larger
potential of breaking some badly written applications.
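
A rough user-space illustration of the resulting behaviour (not the kernel
code: sim_task, sim_fatal_signal_pending() and sim_perform_write() are
made-up names, and the "signal" is just a counter). Because the check sits
after a chunk has been accounted, a killed writer in this sketch reports the
bytes it already copied rather than an error:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the task state: the simulated SIGKILL becomes
 * pending once kill_after_chunks chunks have been copied. Use SIZE_MAX
 * for "never killed". */
struct sim_task {
	size_t chunks_done;
	size_t kill_after_chunks;
};

/* Hypothetical stand-in for fatal_signal_pending(current). */
static bool sim_fatal_signal_pending(const struct sim_task *t)
{
	return t->chunks_done >= t->kill_after_chunks;
}

/* Copy count bytes in chunks of at most chunk bytes. After each chunk,
 * stop if a fatal signal is pending, as the patched loop does. */
static long sim_perform_write(struct sim_task *t, size_t count, size_t chunk)
{
	long written = 0;
	long status = 0;

	assert(chunk > 0);

	while (count > 0) {
		size_t copied = count < chunk ? count : chunk;

		written += (long)copied;
		count -= copied;
		t->chunks_done++;

		if (sim_fatal_signal_pending(t)) {
			status = -EINTR;
			break;
		}
	}

	/* Same convention as the patched function: report progress if any
	 * was made, otherwise the status. */
	return written ? written : status;
}
```

With a 4096-byte chunk, a 10000-byte request that is "killed" after two
chunks returns 8192 in this model; without the kill it returns 10000.
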
Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Acked-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/filemap.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index c0018f2..c106d3b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2407,7 +2407,6 @@ static ssize_t generic_perform_write(struct file *file,
iov_iter_count(i));
again:
-
/*
* Bring in the user page that we will copy from _first_.
* Otherwise there's a nasty deadlock on copying from the
@@ -2463,7 +2462,10 @@ again:
written += copied;
balance_dirty_pages_ratelimited(mapping);
-
+ if (fatal_signal_pending(current)) {
+ status = -EINTR;
+ break;
+ }
} while (iov_iter_count(i));
return written ? written : status;
--
1.7.7.1
* [PATCH 3/5] writeback: comment on the bdi dirty threshold
2011-12-05 6:22 writeback fixes for 3.2-rc5 Wu Fengguang
2011-12-05 6:22 ` [PATCH 1/5] writeback: Fix issue on make htmldocs Wu Fengguang
2011-12-05 6:22 ` [PATCH 2/5] fs: Make write(2) interruptible by a fatal signal Wu Fengguang
@ 2011-12-05 6:22 ` Wu Fengguang
2011-12-05 6:22 ` [PATCH 4/5] writeback: permit through good bdi even when global dirty exceeded Wu Fengguang
` (2 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Wu Fengguang @ 2011-12-05 6:22 UTC
To: linux-fsdevel; +Cc: Wu Fengguang
We do "floating proportions" to let active devices grow their target
share of dirty pages and stalled/inactive devices shrink their target
share over time.
It works well except in the case of "an inactive disk suddenly goes
busy", where the initial target share may be too small. To mitigate
this, bdi_position_ratio() has the line below to raise a small
bdi_thresh when it's safe to do so, so that the disk is fed enough
dirty pages for efficient IO and, in turn, a fast ramp-up of bdi_thresh:
bdi_thresh = max(bdi_thresh, (limit - dirty) / 8);
balance_dirty_pages() normally does negative feedback control which
adjusts ratelimit to balance the bdi dirty pages around the target.
In some extreme cases when that is not enough, it will have to block
the tasks completely until the bdi dirty pages drop below bdi_thresh.
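
To put numbers on the max() line quoted above, here is a small stand-alone
sketch. raise_small_bdi_thresh() is a hypothetical helper, all values are
page counts, and the sketch assumes dirty <= limit so that the unsigned
subtraction cannot wrap:

```c
#include <assert.h>

/* Hypothetical helper mirroring the quoted line:
 *   bdi_thresh = max(bdi_thresh, (limit - dirty) / 8);
 */
static unsigned long raise_small_bdi_thresh(unsigned long bdi_thresh,
					    unsigned long limit,
					    unsigned long dirty)
{
	unsigned long min_thresh;

	/* Assumption of this sketch: dirty never exceeds limit here. */
	assert(dirty <= limit);

	/* One eighth of the remaining global headroom. */
	min_thresh = (limit - dirty) / 8;

	return bdi_thresh > min_thresh ? bdi_thresh : min_thresh;
}
```

With limit = 100000 and dirty = 20000, a long-idle device whose bdi_thresh
has decayed to 50 pages is raised to 10000 pages, while a device already at
30000 pages is left alone. When the headroom is nearly gone (dirty = 99992)
the floor is only 1 page, so the raise has no effect.
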
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/page-writeback.c | 16 ++++++++++++++--
1 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7125248..155efca 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -411,8 +411,13 @@ void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty)
*
* Returns @bdi's dirty limit in pages. The term "dirty" in the context of
* dirty balancing includes all PG_dirty, PG_writeback and NFS unstable pages.
- * And the "limit" in the name is not seriously taken as hard limit in
- * balance_dirty_pages().
+ *
+ * Note that balance_dirty_pages() will only seriously take it as a hard limit
+ * when sleeping max_pause per page is not enough to keep the dirty pages under
+ * control. For example, when the device is completely stalled due to some error
+ * conditions, or when there are 1000 dd tasks writing to a slow 10MB/s USB key.
+ * In the other normal situations, it acts more gently by throttling the tasks
+ * more (rather than completely block them) when the bdi dirty pages go high.
*
* It allocates high/low dirty limits to fast/slow devices, in order to prevent
* - starving fast devices
@@ -594,6 +599,13 @@ static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
*/
if (unlikely(bdi_thresh > thresh))
bdi_thresh = thresh;
+ /*
+ * It's very possible that bdi_thresh is close to 0 not because the
+ * device is slow, but that it has remained inactive for long time.
+ * Honour such devices a reasonable good (hopefully IO efficient)
+ * threshold, so that the occasional writes won't be blocked and active
+ * writes can rampup the threshold quickly.
+ */
bdi_thresh = max(bdi_thresh, (limit - dirty) / 8);
/*
* scale global setpoint to bdi's:
--
1.7.7.1
* [PATCH 4/5] writeback: permit through good bdi even when global dirty exceeded
2011-12-05 6:22 writeback fixes for 3.2-rc5 Wu Fengguang
` (2 preceding siblings ...)
2011-12-05 6:22 ` [PATCH 3/5] writeback: comment on the bdi dirty threshold Wu Fengguang
@ 2011-12-05 6:22 ` Wu Fengguang
2011-12-05 6:22 ` [PATCH 5/5] writeback: set max_pause to lowest value on zero bdi_dirty Wu Fengguang
[not found] ` <20111212102947.GA6731@localhost>
5 siblings, 0 replies; 9+ messages in thread
From: Wu Fengguang @ 2011-12-05 6:22 UTC
To: linux-fsdevel; +Cc: Wu Fengguang
On a system with 1 local mount and 1 NFS mount, if the NFS server
stops responding while dd is writing to the NFS mount, the NFS dirty
pages may exceed the global dirty limit and _every_ task involved in
writing will be blocked. The whole system appears unresponsive.
The workaround is to let through the bdi's that have only a small
number of dirty pages. The number chosen (bdi_stat_error pages) is not
enough to let the local disk run at optimal throughput, but it is
enough to keep the system responsive on a broken NFS mount. The user can
then kill the dirtiers on the NFS mount and increase the global dirty
limit to bring up the local disk's throughput.
This risks allowing dirty pages to grow much larger than the global dirty
limit when there are 1000+ mounts; however, that's very unlikely to happen,
especially on low-memory systems.
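
The exit conditions of the throttling loop after this change can be modelled
in user space as follows. This is only a sketch: struct sim_bdi and
sim_may_leave_throttle() are invented for illustration, and stat_error
stands in for whatever bdi_stat_error() returns on a given machine:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one backing device. */
struct sim_bdi {
	unsigned long dirty;		/* dirty pages accounted to the bdi */
	unsigned long stat_error;	/* assumed accounting error bound */
};

/* Returns true if a throttled task may leave the loop, checking in the
 * same order as the patched code. */
static bool sim_may_leave_throttle(const struct sim_bdi *bdi,
				   unsigned long task_ratelimit,
				   bool fatal_signal)
{
	assert(bdi);

	if (task_ratelimit)
		return true;

	/* New check: a bdi with few dirty pages is let through even
	 * though the global limit is exceeded. */
	if (bdi->dirty <= bdi->stat_error)
		return true;

	return fatal_signal;
}
```

Taking an error bound of 64 pages purely as an example, a stuck NFS bdi with
50000 dirty pages keeps its writers blocked, while a local disk with 3 dirty
pages lets its writer continue.
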
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/page-writeback.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 155efca..17403e3 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1148,6 +1148,19 @@ pause:
if (task_ratelimit)
break;
+ /*
+ * In the case of an unresponding NFS server and the NFS dirty
+ * pages exceeds dirty_thresh, give the other good bdi's a pipe
+ * to go through, so that tasks on them still remain responsive.
+ *
+ * In theory 1 page is enough to keep the consumer-producer
+ * pipe going: the flusher cleans 1 page => the task dirties 1
+ * more page. However bdi_dirty has accounting errors. So use
+ * the larger and more IO friendly bdi_stat_error.
+ */
+ if (bdi_dirty <= bdi_stat_error(bdi))
+ break;
+
if (fatal_signal_pending(current))
break;
}
--
1.7.7.1
* [PATCH 5/5] writeback: set max_pause to lowest value on zero bdi_dirty
2011-12-05 6:22 writeback fixes for 3.2-rc5 Wu Fengguang
` (3 preceding siblings ...)
2011-12-05 6:22 ` [PATCH 4/5] writeback: permit through good bdi even when global dirty exceeded Wu Fengguang
@ 2011-12-05 6:22 ` Wu Fengguang
[not found] ` <20111212102947.GA6731@localhost>
5 siblings, 0 replies; 9+ messages in thread
From: Wu Fengguang @ 2011-12-05 6:22 UTC
To: linux-fsdevel; +Cc: Wu Fengguang
Some traces show lots of bdi_dirty=0 lines where the real value would be
small but non-zero if not for the accounting errors in the per-cpu bdi stats.
In this case the max pause time should really be set to the smallest
(non-zero) value to avoid IO queue underrun and improve throughput.
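
The effect of dropping the "if (bdi_dirty)" guard can be seen in a small
user-space model. The names below are hypothetical; HZ = 1000 and a final
floor of 4 jiffies are assumptions of this sketch, not values taken from
this patch:

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_HZ		1000UL	/* assumed tick rate */
#define SIM_MIN_PAUSE	4UL	/* assumed floor applied to the result */

/* t is the pause (in jiffies) computed so far, bdi_dirty is in pages and
 * bw is the write bandwidth in pages per second. With skip_zero set, the
 * sketch behaves like the old code and ignores the cap when bdi_dirty
 * is 0. */
static unsigned long sim_max_pause(unsigned long t, unsigned long bdi_dirty,
				   unsigned long bw, bool skip_zero)
{
	unsigned long cap = bdi_dirty * SIM_HZ / (8 * bw + 1);

	assert(t > 0);

	if (!skip_zero || bdi_dirty)
		t = t < cap ? t : cap;

	return t < SIM_MIN_PAUSE ? SIM_MIN_PAUSE : t;
}
```

With t = 20 and bw = 2500 pages/s, the old behaviour leaves the pause at 20
jiffies when bdi_dirty reads 0, whereas the new behaviour caps it at 0 and
the floor then yields 4. A bdi with 200 dirty pages gets 9 jiffies either
way.
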
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/page-writeback.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 17403e3..50f0824 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -989,8 +989,7 @@ static unsigned long bdi_max_pause(struct backing_dev_info *bdi,
*
* 8 serves as the safety ratio.
*/
- if (bdi_dirty)
- t = min(t, bdi_dirty * HZ / (8 * bw + 1));
+ t = min(t, bdi_dirty * HZ / (8 * bw + 1));
/*
* The pause time will be settled within range (max_pause/4, max_pause).
--
1.7.7.1