From: Wu Fengguang <fengguang.wu@intel.com>
To: Martin Bligh <mbligh@google.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Michael Rubin <mrubin@google.com>,
"sandeen@redhat.com" <sandeen@redhat.com>,
Michael Davidson <md@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: Bug in kernel 2.6.31, Slow wb_kupdate writeout
Date: Sat, 1 Aug 2009 10:02:28 +0800
Message-ID: <20090801020228.GA6542@localhost>
In-Reply-To: <33307c790907301255j136e003dtac0e4ba2032e890e@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1560 bytes --]
On Fri, Jul 31, 2009 at 03:55:44AM +0800, Martin Bligh wrote:
> > Note that this is a simple fix that may have suboptimal write performance.
> > Here is an old reasoning:
> >
> > http://lkml.org/lkml/2009/3/28/235
>
> The other thing I've been experimenting with is to disable the per-page
> check in write_cache_pages, ie:
>
> if (wbc->nonblocking && bdi_write_congested(bdi)) {
> wb_stats_inc(WB_STATS_WCP_SECTION_CONG);
> wbc->encountered_congestion = 1;
> /* done = 1; */
>
> This treats the congestion limits as soft, but encourages us to write
> back in larger, more efficient chunks. If that's not going to scare
> people unduly, I can submit that as well.
This risks hitting the hard limit (nr_requests) and blocking everyone,
including higher-priority writers such as kswapd.
On the other hand, the simple fix from my previous mails won't necessarily
be that suboptimal; the risk is only a potential one. There is a window of
(1/16)*(nr_requests)*request_size (= 128*256KB/16 = 2MB) between the
congestion-on and congestion-off states, so at best we can inject a 2MB
chunk into the async write queue each time it becomes uncongested.
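The hysteresis window can be checked with a small userspace model. This sketch is hypothetical and rests on an assumption: that the 2.6.31-era blk_queue_congestion_threshold() sets nr_congestion_on to nr_requests - nr_requests/8 + 1 (capped at nr_requests) and nr_congestion_off to nr_requests - nr_requests/8 - nr_requests/16 - 1 (floored at 1), and that every queued request carries request_kb of data. The off_div parameter stands in for the 16 in that formula; check block/blk-core.c in your own tree.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the request queue congestion hysteresis.
 * ASSUMPTION: thresholds follow the 2.6.31-era
 * blk_queue_congestion_threshold(); verify against your kernel tree.
 */

/* Number of allocated requests at which the queue is marked congested. */
static int congestion_on_threshold(int nr_requests)
{
	int nr = nr_requests - nr_requests / 8 + 1;

	return nr > nr_requests ? nr_requests : nr;
}

/* Number of allocated requests below which congestion is cleared. */
static int congestion_off_threshold(int nr_requests, int off_div)
{
	int nr = nr_requests - nr_requests / 8 - nr_requests / off_div - 1;

	return nr < 1 ? 1 : nr;
}

/*
 * KB of async writes that can be queued after congestion clears and
 * before it is set again, assuming every request carries request_kb.
 */
static long congestion_window_kb(int nr_requests, int off_div, long request_kb)
{
	assert(nr_requests > 0 && off_div > 0 && request_kb > 0);
	return (long)(congestion_on_threshold(nr_requests) -
		      congestion_off_threshold(nr_requests, off_div)) * request_kb;
}
```

For nr_requests = 128 and 256KB requests, congestion_window_kb(128, 16, 256) returns (113 - 103) * 256 = 2560KB; the +1/-1 terms make the modelled window two requests wider than the plain nr_requests/16 estimate. Passing off_div = 8 models the doubled 1/8 ratio and returns (113 - 95) * 256 = 4608KB.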
I have a writeback debug patch (attached) that can help find out how
that works out in your real-world workloads by monitoring nr_to_write.
You can also try doubling the 1/16 ratio (to 1/8) in
blk_queue_congestion_threshold() to see whether a wider
congestion on/off window helps.
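One way to follow nr_to_write is to post-process the KERN_DEBUG lines that print_writeback_control() in the attached patch emits. The userspace sketch below assumes the line format is exactly the one in that printk; parse_wb_report() and pages_written() are hypothetical helpers, not kernel APIs. The chunk argument must match what the caller put into wbc.nr_to_write before the pass, e.g. MAX_WRITEBACK_PAGES (assumed here to be 1024) for wb_kupdate() and background_writeout(), or write_chunk for balance_dirty_pages().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields printed by print_writeback_control() in the debug patch. */
struct wb_report {
	unsigned long dirty;        /* NR_FILE_DIRTY */
	unsigned long writeback;    /* NR_WRITEBACK */
	unsigned long unstable_nfs; /* NR_UNSTABLE_NFS */
	int congested;              /* 'C' flag: wbc->encountered_congestion */
	int more_io;                /* 'M' flag: wbc->more_io */
	long to_write;              /* wbc->nr_to_write left after the pass */
	long skipped;               /* wbc->pages_skipped */
};

/*
 * Parse one report line. Any syslog or timestamp prefix before
 * "global dirty" is skipped. Returns 0 on success, -1 otherwise.
 */
static int parse_wb_report(const char *line, struct wb_report *r)
{
	const char *p = strstr(line, "global dirty ");
	char c, m;

	if (!p)
		return -1;
	if (sscanf(p, "global dirty %lu writeback %lu nfs %lu flags %c%c "
		      "towrite %ld skipped %ld",
		   &r->dirty, &r->writeback, &r->unstable_nfs, &c, &m,
		   &r->to_write, &r->skipped) != 7)
		return -1;
	r->congested = (c == 'C');
	r->more_io = (m == 'M');
	return 0;
}

/* Pages written in one pass: the budget handed in minus what was left. */
static long pages_written(const struct wb_report *r, long chunk)
{
	assert(chunk > 0);
	return chunk - r->to_write;
}
```

As an illustration, a line such as "global dirty 12345 writeback 678 nfs 0 flags C_ towrite 512 skipped 0" parses to congested = 1 and more_io = 0, and with a 1024-page chunk pages_written() returns 512. Passes that repeatedly come back well under budget with the C flag set would suggest the congestion checks are cutting writeback short.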
Thanks,
Fengguang
[-- Attachment #2: writeback-debug-2.6.31.patch --]
[-- Type: text/x-diff, Size: 2713 bytes --]
mm/page-writeback.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
--- sound-2.6.orig/mm/page-writeback.c
+++ sound-2.6/mm/page-writeback.c
@@ -116,6 +116,33 @@ EXPORT_SYMBOL(laptop_mode);
/* End of sysctl-exported parameters */
+#define writeback_debug_report(n, wbc) do { \
+ __writeback_debug_report(n, wbc, __FILE__, __LINE__, __func__); \
+} while (0)
+
+void print_writeback_control(struct writeback_control *wbc)
+{
+ printk(KERN_DEBUG
+ "global dirty %lu writeback %lu nfs %lu "
+ "flags %c%c towrite %ld skipped %ld\n",
+ global_page_state(NR_FILE_DIRTY),
+ global_page_state(NR_WRITEBACK),
+ global_page_state(NR_UNSTABLE_NFS),
+ wbc->encountered_congestion ? 'C':'_',
+ wbc->more_io ? 'M':'_',
+ wbc->nr_to_write,
+ wbc->pages_skipped);
+}
+
+void __writeback_debug_report(long n, struct writeback_control *wbc,
+ const char *file, int line, const char *func)
+{
+ printk(KERN_DEBUG "%s %d %s: %s(%d) %ld\n",
+ file, line, func,
+ current->comm, current->pid,
+ n);
+ print_writeback_control(wbc);
+}
static void background_writeout(unsigned long _min_pages);
@@ -550,6 +577,7 @@ static void balance_dirty_pages(struct a
pages_written += write_chunk - wbc.nr_to_write;
get_dirty_limits(&background_thresh, &dirty_thresh,
&bdi_thresh, bdi);
+ writeback_debug_report(pages_written, &wbc);
}
/*
@@ -576,6 +604,7 @@ static void balance_dirty_pages(struct a
break; /* We've done our duty */
congestion_wait(BLK_RW_ASYNC, HZ/10);
+ writeback_debug_report(-pages_written, &wbc);
}
if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
@@ -670,6 +699,11 @@ void throttle_vm_writeout(gfp_t gfp_mask
global_page_state(NR_WRITEBACK) <= dirty_thresh)
break;
congestion_wait(BLK_RW_ASYNC, HZ/10);
+ printk(KERN_DEBUG "throttle_vm_writeout: "
+ "congestion_wait on %lu+%lu > %lu\n",
+ global_page_state(NR_UNSTABLE_NFS),
+ global_page_state(NR_WRITEBACK),
+ dirty_thresh);
/*
* The caller might hold locks which can prevent IO completion
@@ -719,7 +753,9 @@ static void background_writeout(unsigned
else
break;
}
+ writeback_debug_report(min_pages, &wbc);
}
+ writeback_debug_report(min_pages, &wbc);
}
/*
@@ -792,7 +828,9 @@ static void wb_kupdate(unsigned long arg
break; /* All the old data is written */
}
nr_to_write -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
+ writeback_debug_report(nr_to_write, &wbc);
}
+ writeback_debug_report(nr_to_write, &wbc);
if (time_before(next_jif, jiffies + HZ))
next_jif = jiffies + HZ;
if (dirty_writeback_interval)