From: Andrew Morton <akpm@linux-foundation.org>
To: Richard Kennedy <richard@rsk.demon.co.uk>
Cc: a.p.zijlstra@chello.nl, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Jens Axboe <jens.axboe@oracle.com>
Subject: Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
Date: Wed, 24 Jun 2009 15:27:32 -0700 [thread overview]
Message-ID: <20090624152732.d6352f4f.akpm@linux-foundation.org> (raw)
In-Reply-To: <1245839904.3210.85.camel@localhost.localdomain>
On Wed, 24 Jun 2009 11:38:24 +0100
Richard Kennedy <richard@rsk.demon.co.uk> wrote:
> When writing to 2 (or more) devices at the same time, stop
> balance_dirty_pages moving dirty pages to writeback when it has reached
> the bdi threshold. This prevents balance_dirty_pages overshooting its
> limits and moving all dirty pages to writeback.
>
>
> Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> ---
> balance_dirty_pages can overreact and move all of the dirty pages to
> writeback unnecessarily.
>
> balance_dirty_pages makes its decision to throttle based on the number
> of dirty plus writeback pages that are over the calculated limit,so it
> will continue to move pages even when there are plenty of pages in
> writeback and less than the threshold still dirty.
>
> This allows it to overshoot its limits and move all the dirty pages to
> writeback while waiting for the drives to catch up and empty the
> writeback list.
>
> A simple fio test easily demonstrates this problem.
>
> fio --name=f1 --directory=/disk1 --size=2G -rw=write
> --name=f2 --directory=/disk2 --size=1G --rw=write --startdelay=10
>
> The attached graph before.png shows how all pages are moved to writeback
> as the second write starts and the throttling kicks in.
>
> after.png is the same test with the patch applied, which clearly shows
> that it keeps dirty_background_ratio dirty pages in the buffer.
> The values and timings of the graphs are only approximate but are good
> enough to show the behaviour.
>
> This is the simplest fix I could find, but I'm not entirely sure that it
> alone will be enough for all cases. But it certainly is an improvement
> on my desktop machine writing to 2 disks.
>
> Do we need something more for machines with large arrays where
> bdi_threshold * number_of_drives is greater than the dirty_ratio ?
>
um. Interesting find. Jens, was any of your performance testing using
multiple devices? If so, it looks like the results just got invalidated :)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 7b0dcea..7687879 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -541,8 +541,11 @@ static void balance_dirty_pages(struct address_space *mapping)
> * filesystems (i.e. NFS) in which data may have been
> * written to the server's write cache, but has not yet
> * been flushed to permanent storage.
> + * Only move pages to writeback if this bdi is over its
> + * threshold otherwise wait until the disk writes catch
> + * up.
> */
> - if (bdi_nr_reclaimable) {
> + if (bdi_nr_reclaimable > bdi_thresh) {
> writeback_inodes(&wbc);
> pages_written += write_chunk - wbc.nr_to_write;
> get_dirty_limits(&background_thresh, &dirty_thresh,
yup, we need to think about the effect with zillions of disks. Peter,
could you please take a look?
Also... get_dirty_limits() is rather hard to grok. The callers of
get_dirty_limits() treat its three return values as "thresholds", but
they're not named as thresholds within get_dirty_limits() itself, which
is a bit confusing. And the meaning of each of those return values is
pretty obscure from the code - could we document them please?
WARNING: multiple messages have this Message-ID (diff)
From: Andrew Morton <akpm@linux-foundation.org>
To: Richard Kennedy <richard@rsk.demon.co.uk>
Cc: a.p.zijlstra@chello.nl, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Jens Axboe <jens.axboe@oracle.com>
Subject: Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
Date: Wed, 24 Jun 2009 15:27:32 -0700 [thread overview]
Message-ID: <20090624152732.d6352f4f.akpm@linux-foundation.org> (raw)
In-Reply-To: <1245839904.3210.85.camel@localhost.localdomain>
On Wed, 24 Jun 2009 11:38:24 +0100
Richard Kennedy <richard@rsk.demon.co.uk> wrote:
> When writing to 2 (or more) devices at the same time, stop
> balance_dirty_pages moving dirty pages to writeback when it has reached
> the bdi threshold. This prevents balance_dirty_pages overshooting its
> limits and moving all dirty pages to writeback.
>
>
> Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> ---
> balance_dirty_pages can overreact and move all of the dirty pages to
> writeback unnecessarily.
>
> balance_dirty_pages makes its decision to throttle based on the number
> of dirty plus writeback pages that are over the calculated limit,so it
> will continue to move pages even when there are plenty of pages in
> writeback and less than the threshold still dirty.
>
> This allows it to overshoot its limits and move all the dirty pages to
> writeback while waiting for the drives to catch up and empty the
> writeback list.
>
> A simple fio test easily demonstrates this problem.
>
> fio --name=f1 --directory=/disk1 --size=2G -rw=write
> --name=f2 --directory=/disk2 --size=1G --rw=write --startdelay=10
>
> The attached graph before.png shows how all pages are moved to writeback
> as the second write starts and the throttling kicks in.
>
> after.png is the same test with the patch applied, which clearly shows
> that it keeps dirty_background_ratio dirty pages in the buffer.
> The values and timings of the graphs are only approximate but are good
> enough to show the behaviour.
>
> This is the simplest fix I could find, but I'm not entirely sure that it
> alone will be enough for all cases. But it certainly is an improvement
> on my desktop machine writing to 2 disks.
>
> Do we need something more for machines with large arrays where
> bdi_threshold * number_of_drives is greater than the dirty_ratio ?
>
um. Interesting find. Jens, was any of your performance testing using
multiple devices? If so, it looks like the results just got invalidated :)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 7b0dcea..7687879 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -541,8 +541,11 @@ static void balance_dirty_pages(struct address_space *mapping)
> * filesystems (i.e. NFS) in which data may have been
> * written to the server's write cache, but has not yet
> * been flushed to permanent storage.
> + * Only move pages to writeback if this bdi is over its
> + * threshold otherwise wait until the disk writes catch
> + * up.
> */
> - if (bdi_nr_reclaimable) {
> + if (bdi_nr_reclaimable > bdi_thresh) {
> writeback_inodes(&wbc);
> pages_written += write_chunk - wbc.nr_to_write;
> get_dirty_limits(&background_thresh, &dirty_thresh,
yup, we need to think about the effect with zillions of disks. Peter,
could you please take a look?
Also... get_dirty_limits() is rather hard to grok. The callers of
get_dirty_limits() treat its three return values as "thresholds", but
they're not named as thresholds within get_dirty_limits() itself, which
is a bit confusing. And the meaning of each of those return values is
pretty obscure from the code - could we document them please?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-06-24 22:28 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-06-24 10:38 [RFC][PATCH] mm: stop balance_dirty_pages doing too much work Richard Kennedy
2009-06-24 22:27 ` Andrew Morton [this message]
2009-06-24 22:27 ` Andrew Morton
2009-06-25 5:13 ` Jens Axboe
2009-06-25 5:13 ` Jens Axboe
2009-06-25 8:00 ` Peter Zijlstra
2009-06-25 8:00 ` Peter Zijlstra
2009-06-25 9:10 ` Jens Axboe
2009-06-25 9:10 ` Jens Axboe
2009-06-25 9:26 ` Jens Axboe
2009-06-25 9:26 ` Jens Axboe
2009-06-25 12:33 ` Al Boldi
2009-06-25 12:33 ` Al Boldi
2009-06-25 12:43 ` Jens Axboe
2009-06-25 12:43 ` Jens Axboe
2009-06-25 13:46 ` Al Boldi
2009-06-25 13:46 ` Al Boldi
2009-06-25 14:44 ` Jens Axboe
2009-06-25 14:44 ` Jens Axboe
2009-06-25 17:10 ` Al Boldi
2009-06-25 17:10 ` Al Boldi
2009-06-26 5:02 ` Jens Axboe
2009-06-26 5:02 ` Jens Axboe
2009-06-26 11:37 ` Al Boldi
2009-06-26 11:37 ` Al Boldi
2009-06-26 12:35 ` Jens Axboe
2009-06-26 12:35 ` Jens Axboe
2009-06-26 9:15 ` Richard Kennedy
2009-06-26 9:15 ` Richard Kennedy
2009-06-26 9:20 ` Jens Axboe
2009-06-26 9:20 ` Jens Axboe
2009-08-07 12:20 ` Peter Zijlstra
2009-08-07 12:20 ` Peter Zijlstra
2009-08-07 14:36 ` Richard Kennedy
2009-08-07 14:36 ` Richard Kennedy
2009-08-07 14:38 ` Peter Zijlstra
2009-08-07 14:38 ` Peter Zijlstra
2009-08-07 15:22 ` Chris Mason
2009-08-07 15:22 ` Chris Mason
2009-08-07 16:09 ` Richard Kennedy
2009-08-07 16:09 ` Richard Kennedy
2009-08-07 21:02 ` Chris Mason
2009-08-07 21:02 ` Chris Mason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090624152732.d6352f4f.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=a.p.zijlstra@chello.nl \
--cc=jens.axboe@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=richard@rsk.demon.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.