From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Richard Kennedy <richard@rsk.demon.co.uk>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Jens Axboe <jens.axboe@oracle.com>
Subject: Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
Date: Thu, 25 Jun 2009 10:00:33 +0200 [thread overview]
Message-ID: <1245916833.31755.78.camel@twins> (raw)
In-Reply-To: <20090624152732.d6352f4f.akpm@linux-foundation.org>
On Wed, 2009-06-24 at 15:27 -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2009 11:38:24 +0100
> Richard Kennedy <richard@rsk.demon.co.uk> wrote:
>
> > When writing to 2 (or more) devices at the same time, stop
> > balance_dirty_pages moving dirty pages to writeback when it has reached
> > the bdi threshold. This prevents balance_dirty_pages overshooting its
> > limits and moving all dirty pages to writeback.
> >
> >
> > Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> > ---
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[ moved explanation below ]
> >
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 7b0dcea..7687879 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -541,8 +541,11 @@ static void balance_dirty_pages(struct address_space *mapping)
> > * filesystems (i.e. NFS) in which data may have been
> > * written to the server's write cache, but has not yet
> > * been flushed to permanent storage.
> > + * Only move pages to writeback if this bdi is over its
> > + * threshold otherwise wait until the disk writes catch
> > + * up.
> > */
> > - if (bdi_nr_reclaimable) {
> > + if (bdi_nr_reclaimable > bdi_thresh) {
> > writeback_inodes(&wbc);
> > pages_written += write_chunk - wbc.nr_to_write;
> > get_dirty_limits(&background_thresh, &dirty_thresh,
>
> yup, we need to think about the effect with zillions of disks. Peter,
> could you please take a look?
Looks to have been in that form forever (immediate git history).
When reading the code I read it like:
if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
break;
if (nr_reclaimable + nr_writeback <
(background_thresh + dirty_thresh) / 2)
break;
if (bdi_nr_reclaimable) {
writeback_inodes(&wbc);
Which to me reads:
- if there's not enough to do, drop out
- see if background write-out can catch up, drop out
- is there anything to do, yay! work.
/me goes read the changelog, maybe there's a clue in there :-)
> > balance_dirty_pages can overreact and move all of the dirty pages to
> > writeback unnecessarily.
> >
> > balance_dirty_pages makes its decision to throttle based on the number
> > of dirty plus writeback pages that are over the calculated limit,so it
> > will continue to move pages even when there are plenty of pages in
> > writeback and less than the threshold still dirty.
> >
> > This allows it to overshoot its limits and move all the dirty pages to
> > writeback while waiting for the drives to catch up and empty the
> > writeback list.
Ahhh, indeed, how silly of me not to notice that before!
> > This is the simplest fix I could find, but I'm not entirely sure that it
> > alone will be enough for all cases. But it certainly is an improvement
> > on my desktop machine writing to 2 disks.
Seems good to me.
> > Do we need something more for machines with large arrays where
> > bdi_threshold * number_of_drives is greater than the dirty_ratio ?
[ I assumed s/dirty_ratio/dirty_thresh/, since dirty_ratio is a ratio
and bdi_threshold is an actual value, therefore the inequality above
doesn't make sense ]
That cannot actually happen (aside from small numerical glitches).
bdi_threshold = P_i * dirty_thresh, where \Sum P_i = 1
The proportion is relative to the recent writeout speed of the device.
On Wed, 2009-06-24 at 15:27 -0700, Andrew Morton wrote:
> Also... get_dirty_limits() is rather hard to grok. The callers of
> get_dirty_limits() treat its three return values as "thresholds", but
> they're not named as thresholds within get_dirty_limits() itself, which
> is a bit confusing. And the meaning of each of those return values is
> pretty obscure from the code - could we document them please?
Does something like this help?
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7b0dcea..dc2cee1 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -426,6 +426,13 @@ unsigned long determine_dirtyable_memory(void)
return x + 1; /* Ensure that we never return 0 */
}
+/*
+ * get_dirty_limits() - compute the various dirty limits
+ *
+ * @pbackground - dirty limit at which we want to start background write-out
+ * @pdirty - total dirty limit, we should not have more dirty than this
+ * @pdbi_dirty - the share of @pdirty available to @bdi
+ */
void
get_dirty_limits(unsigned long *pbackground, unsigned long *pdirty,
unsigned long *pbdi_dirty, struct backing_dev_info *bdi)
WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Richard Kennedy <richard@rsk.demon.co.uk>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Jens Axboe <jens.axboe@oracle.com>
Subject: Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
Date: Thu, 25 Jun 2009 10:00:33 +0200 [thread overview]
Message-ID: <1245916833.31755.78.camel@twins> (raw)
In-Reply-To: <20090624152732.d6352f4f.akpm@linux-foundation.org>
On Wed, 2009-06-24 at 15:27 -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2009 11:38:24 +0100
> Richard Kennedy <richard@rsk.demon.co.uk> wrote:
>
> > When writing to 2 (or more) devices at the same time, stop
> > balance_dirty_pages moving dirty pages to writeback when it has reached
> > the bdi threshold. This prevents balance_dirty_pages overshooting its
> > limits and moving all dirty pages to writeback.
> >
> >
> > Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> > ---
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[ moved explanation below ]
> >
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 7b0dcea..7687879 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -541,8 +541,11 @@ static void balance_dirty_pages(struct address_space *mapping)
> > * filesystems (i.e. NFS) in which data may have been
> > * written to the server's write cache, but has not yet
> > * been flushed to permanent storage.
> > + * Only move pages to writeback if this bdi is over its
> > + * threshold otherwise wait until the disk writes catch
> > + * up.
> > */
> > - if (bdi_nr_reclaimable) {
> > + if (bdi_nr_reclaimable > bdi_thresh) {
> > writeback_inodes(&wbc);
> > pages_written += write_chunk - wbc.nr_to_write;
> > get_dirty_limits(&background_thresh, &dirty_thresh,
>
> yup, we need to think about the effect with zillions of disks. Peter,
> could you please take a look?
Looks to have been in that form forever (immediate git history).
When reading the code I read it like:
if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
break;
if (nr_reclaimable + nr_writeback <
(background_thresh + dirty_thresh) / 2)
break;
if (bdi_nr_reclaimable) {
writeback_inodes(&wbc);
Which to me reads:
- if there's not enough to do, drop out
- see if background write-out can catch up, drop out
- is there anything to do, yay! work.
/me goes read the changelog, maybe there's a clue in there :-)
> > balance_dirty_pages can overreact and move all of the dirty pages to
> > writeback unnecessarily.
> >
> > balance_dirty_pages makes its decision to throttle based on the number
> > of dirty plus writeback pages that are over the calculated limit,so it
> > will continue to move pages even when there are plenty of pages in
> > writeback and less than the threshold still dirty.
> >
> > This allows it to overshoot its limits and move all the dirty pages to
> > writeback while waiting for the drives to catch up and empty the
> > writeback list.
Ahhh, indeed, how silly of me not to notice that before!
> > This is the simplest fix I could find, but I'm not entirely sure that it
> > alone will be enough for all cases. But it certainly is an improvement
> > on my desktop machine writing to 2 disks.
Seems good to me.
> > Do we need something more for machines with large arrays where
> > bdi_threshold * number_of_drives is greater than the dirty_ratio ?
[ I assumed s/dirty_ratio/dirty_thresh/, since dirty_ratio is a ratio
and bdi_threshold is an actual value, therefore the inequality above
doesn't make sense ]
That cannot actually happen (aside from small numerical glitches).
bdi_threshold = P_i * dirty_thresh, where \Sum P_i = 1
The proportion is relative to the recent writeout speed of the device.
On Wed, 2009-06-24 at 15:27 -0700, Andrew Morton wrote:
> Also... get_dirty_limits() is rather hard to grok. The callers of
> get_dirty_limits() treat its three return values as "thresholds", but
> they're not named as thresholds within get_dirty_limits() itself, which
> is a bit confusing. And the meaning of each of those return values is
> pretty obscure from the code - could we document them please?
Does something like this help?
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7b0dcea..dc2cee1 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -426,6 +426,13 @@ unsigned long determine_dirtyable_memory(void)
return x + 1; /* Ensure that we never return 0 */
}
+/*
+ * get_dirty_limits() - compute the various dirty limits
+ *
+ * @pbackground - dirty limit at which we want to start background write-out
+ * @pdirty - total dirty limit, we should not have more dirty than this
+ * @pdbi_dirty - the share of @pdirty available to @bdi
+ */
void
get_dirty_limits(unsigned long *pbackground, unsigned long *pdirty,
unsigned long *pbdi_dirty, struct backing_dev_info *bdi)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-06-25 8:00 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-06-24 10:38 [RFC][PATCH] mm: stop balance_dirty_pages doing too much work Richard Kennedy
2009-06-24 22:27 ` Andrew Morton
2009-06-24 22:27 ` Andrew Morton
2009-06-25 5:13 ` Jens Axboe
2009-06-25 5:13 ` Jens Axboe
2009-06-25 8:00 ` Peter Zijlstra [this message]
2009-06-25 8:00 ` Peter Zijlstra
2009-06-25 9:10 ` Jens Axboe
2009-06-25 9:10 ` Jens Axboe
2009-06-25 9:26 ` Jens Axboe
2009-06-25 9:26 ` Jens Axboe
2009-06-25 12:33 ` Al Boldi
2009-06-25 12:33 ` Al Boldi
2009-06-25 12:43 ` Jens Axboe
2009-06-25 12:43 ` Jens Axboe
2009-06-25 13:46 ` Al Boldi
2009-06-25 13:46 ` Al Boldi
2009-06-25 14:44 ` Jens Axboe
2009-06-25 14:44 ` Jens Axboe
2009-06-25 17:10 ` Al Boldi
2009-06-25 17:10 ` Al Boldi
2009-06-26 5:02 ` Jens Axboe
2009-06-26 5:02 ` Jens Axboe
2009-06-26 11:37 ` Al Boldi
2009-06-26 11:37 ` Al Boldi
2009-06-26 12:35 ` Jens Axboe
2009-06-26 12:35 ` Jens Axboe
2009-06-26 9:15 ` Richard Kennedy
2009-06-26 9:15 ` Richard Kennedy
2009-06-26 9:20 ` Jens Axboe
2009-06-26 9:20 ` Jens Axboe
2009-08-07 12:20 ` Peter Zijlstra
2009-08-07 12:20 ` Peter Zijlstra
2009-08-07 14:36 ` Richard Kennedy
2009-08-07 14:36 ` Richard Kennedy
2009-08-07 14:38 ` Peter Zijlstra
2009-08-07 14:38 ` Peter Zijlstra
2009-08-07 15:22 ` Chris Mason
2009-08-07 15:22 ` Chris Mason
2009-08-07 16:09 ` Richard Kennedy
2009-08-07 16:09 ` Richard Kennedy
2009-08-07 21:02 ` Chris Mason
2009-08-07 21:02 ` Chris Mason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1245916833.31755.78.camel@twins \
--to=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=jens.axboe@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=richard@rsk.demon.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.