From: Andrew Morton <akpm@linux-foundation.org>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: dgc@sgi.com, linux-kernel@vger.kernel.org
Subject: Re: [patch 1/3] fix illogical behavior in balance_dirty_pages()
Date: Sun, 25 Mar 2007 15:35:08 -0800 [thread overview]
Message-ID: <20070325153508.10922ebd.akpm@linux-foundation.org> (raw)
In-Reply-To: <E1HVECv-0006bU-00@dorka.pomaz.szeredi.hu>
On Sat, 24 Mar 2007 22:55:29 +0100 Miklos Szeredi <miklos@szeredi.hu> wrote:
> This is a slightly different take on the fix for the deadlock in fuse
> with dirty balancing. David Chinner convinced me, that per-bdi
> counters are too expensive, and that it's not worth trying to account
> the number of pages under writeback, as they will be limited by the
> queue anyway.
>
> ----
> From: Miklos Szeredi <mszeredi@suse.cz>
>
> Current behavior of balance_dirty_pages() is to try to start writeout
> into the specified queue for at least "write_chunk" number of pages.
> If "write_chunk" pages have been submitted, then return.
>
> However if there are less than "write_chunk" dirty pages for this
> queue, then it doesn't return, waiting for the global dirty+writeback
> counters to subside, but without doing any actual work.
>
> This is illogical behavior: it allows more dirtyings while there are
> dirty pages, but stops further dirtying completely if there are no
> more dirty pages.
That behaviour is perfectly logical. It prevents the number of
dirty+writeback pages from exceeding dirty_ratio.
> It also makes a deadlock possible when one filesystem is writing data
> through another, and the balance_dirty_pages() for the lower
> filesystem is stalling the writeback for the upper filesystem's
> data (*).
I still don't understand this one. I got lost when belatedly told that
i_mutex had something to do with it.
> So the exit condition should instead be:
>
> submitted at least "write_chunk" number of pages
> OR
> submitted ALL the dirty pages destined for this backing dev
> AND
> backing dev is not congested
>
> To do this, introduce a new counter in writeback_control, which counts
> the number of dirty pages encountered during writeback. This includes
> all dirty pages, even those which are already under writeback but have
> been dirtied again, and those which have been skipped due to having
> locked buffers.
>
> If this counter is zero after trying to submit some pages for
> writeback, and the backing dev is uncongested, then don't wait any
> more. After this, newly dirtied pages can quickly be written back to
> this backing dev.
>
> If there are globally no more pages to submit for writeback
> (nr_reclaimable == 0), then also don't wait for ever, only while this
> backing dev is congested.
>
> (*) For more info on this deadlock, see the following discussions:
>
> http://lkml.org/lkml/2007/3/1/9
> http://lkml.org/lkml/2007/3/12/16
>
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
> ---
>
> Index: linux/include/linux/writeback.h
> ===================================================================
> --- linux.orig/include/linux/writeback.h 2007-03-24 22:06:56.000000000 +0100
> +++ linux/include/linux/writeback.h 2007-03-24 22:29:02.000000000 +0100
> @@ -44,6 +44,7 @@ struct writeback_control {
> long nr_to_write; /* Write this many pages, and decrement
> this for each page written */
> long pages_skipped; /* Pages which were not written */
> + long nr_dirty; /* Number of dirty pages encountered */
>
> /*
> * For a_ops->writepages(): is start or end are non-zero then this is
> Index: linux/mm/page-writeback.c
> ===================================================================
> --- linux.orig/mm/page-writeback.c 2007-03-24 22:06:56.000000000 +0100
> +++ linux/mm/page-writeback.c 2007-03-24 22:29:02.000000000 +0100
> @@ -207,7 +207,15 @@ static void balance_dirty_pages(struct a
> * written to the server's write cache, but has not yet
> * been flushed to permanent storage.
> */
> - if (nr_reclaimable) {
> + if (!nr_reclaimable) {
> + /*
> + * If there's nothing more to write back and this queue
> + * is uncongested, then it is possible to quickly
> + * write out some more data, so let's not wait
> + */
> + if (!bdi_write_congested(bdi))
> + break;
> + } else {
This says "if there are no dirty pages in the machine at all, then go back
and dirty some more pages, regardless of the present number of
under-writeback pages".
> writeback_inodes(&wbc);
> get_dirty_limits(&background_thresh,
> &dirty_thresh, mapping);
> @@ -220,6 +228,14 @@ static void balance_dirty_pages(struct a
> pages_written += write_chunk - wbc.nr_to_write;
> if (pages_written >= write_chunk)
> break; /* We've done our duty */
> +
> + /*
> + * If there are no more dirty pages for this backing
> + * backing dev, and the queue is not congested, then
> + * it is possible to quickly write out some more data
> + */
> + if (!wbc.nr_dirty && !bdi_write_congested(bdi))
> + break;
This says "if there are no pages dirty againt this device then go back and
dirty some more pages, regardless of the present number of under-writeback
pages".
IOW: this change will allow us to 100% fill all memory with under-writeback
pages.
The VM _should_ cope with that - it kinda used to. But it is an untested
region of operation and the chances of bogus oom-killings are excellent.
next prev parent reply other threads:[~2007-03-25 23:35 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-03-24 21:55 [patch 1/3] fix illogical behavior in balance_dirty_pages() Miklos Szeredi
2007-03-24 21:57 ` [patch 2/3] remove throttle_vm_writeout() Miklos Szeredi
2007-03-25 23:41 ` Andrew Morton
2007-03-26 8:35 ` Miklos Szeredi
2007-03-24 21:58 ` [patch 3/3] balance dirty pages from loop device Miklos Szeredi
2007-03-25 10:03 ` [patch 1/3] fix illogical behavior in balance_dirty_pages() Peter Zijlstra
2007-03-25 11:12 ` Miklos Szeredi
2007-03-25 11:34 ` Miklos Szeredi
2007-03-25 11:50 ` Peter Zijlstra
2007-03-25 20:41 ` Miklos Szeredi
2007-03-25 23:35 ` Andrew Morton [this message]
2007-03-26 8:26 ` Miklos Szeredi
2007-03-26 9:01 ` Andrew Morton
2007-03-26 9:20 ` Miklos Szeredi
2007-03-26 9:32 ` Andrew Morton
2007-03-26 9:48 ` Miklos Szeredi
2007-03-26 9:32 ` Miklos Szeredi
2007-03-26 10:08 ` Andrew Morton
2007-03-26 13:30 ` Peter Zijlstra
2007-03-27 0:30 ` David Chinner
2007-03-27 0:23 ` David Chinner
-- strict thread matches above, loose matches on Subject: below --
2007-04-03 18:40 Kris Corwin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070325153508.10922ebd.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=dgc@sgi.com \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.