From: Dave Chinner <david@fromorbit.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, xfs@oss.sgi.com,
linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
tytso@mit.edu, jens.axboe@oracle.com
Subject: Re: [PATCH 6/6] writeback: limit write_cache_pages integrity scanning to current EOF
Date: Fri, 28 May 2010 11:23:38 +1000
Message-ID: <20100528012338.GX12087@dastard>
In-Reply-To: <20100527143341.d4258798.akpm@linux-foundation.org>
On Thu, May 27, 2010 at 02:33:41PM -0700, Andrew Morton wrote:
> On Tue, 25 May 2010 20:54:12 +1000
> Dave Chinner <david@fromorbit.com> wrote:
>
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > sync can currently take a really long time if a concurrent writer is
> > extending a file. The problem is that the dirty pages on the address
> > space grow in the same direction as write_cache_pages scans, so if
> > the writer keeps ahead of writeback, the writeback will not
> > terminate until the writer stops adding dirty pages.
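A minimal C sketch of the livelock described above, under a toy model that is purely an assumption (it is not kernel code): the mapping is a bool array of dirty flags, and `model_mapping`, `model_append()` and `scan_unbounded()` are hypothetical names. Each time the scan writes a page, the writer appends `writer_rate` new dirty pages at EOF, and the loop bound is re-read on every iteration, like a scan whose end is LLONG_MAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not kernel code: one bool per page slot. */
#define MODEL_PAGES 4096

struct model_mapping {
	bool dirty[MODEL_PAGES];
	size_t eof;		/* current file size, in pages */
};

/* Start with `pages` pages, all of them dirty. */
static void model_init(struct model_mapping *m, size_t pages)
{
	assert(pages <= MODEL_PAGES);
	memset(m, 0, sizeof(*m));
	for (size_t i = 0; i < pages; i++)
		m->dirty[i] = true;
	m->eof = pages;
}

/* The concurrent writer: append up to `n` dirty pages at EOF. */
static void model_append(struct model_mapping *m, size_t n)
{
	for (; n > 0 && m->eof < MODEL_PAGES; n--)
		m->dirty[m->eof++] = true;
}

/*
 * Scan with no upper bound: `m->eof` is re-read on every iteration, so
 * pages appended during the scan are picked up too. Returns the number
 * of pages written before no dirty page remained ahead of the cursor.
 */
static size_t scan_unbounded(struct model_mapping *m, size_t writer_rate)
{
	size_t written = 0;

	for (size_t index = 0; index < m->eof; index++) {
		if (!m->dirty[index])
			continue;
		m->dirty[index] = false;	/* "write back" this page */
		written++;
		model_append(m, writer_rate);	/* writer stays ahead */
	}
	return written;
}
```

With a `writer_rate` of 0 the scan writes the 10 initially dirty pages and stops. With a rate of 1 it only stops when the model's fixed capacity runs out, which here stands in for the writer finally stopping.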
>
> <looks at Jens>
>
> That really was a pretty basic bug. It's writeback 101 to test that case :(
>
> > For a data integrity sync, we only need to write the pages dirty at
> > the time we start the writeback, so we can stop scanning once we get
> > to the page that was at the end of the file at the time the scan
> > started.
> >
> > This will stop operations like copying a large file from preventing
> > sync from completing, as sync will not write back pages that were
> > dirtied after it was started. This does not impact the
> > existing integrity guarantees, as any dirty page (old or new)
> > within the EOF range at the start of the scan will still be
> > captured.
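The fix can be sketched in the same kind of toy model (hypothetical names, not kernel code): sample EOF once when the scan starts and never look past it, so pages appended afterwards stay dirty for a later pass. The model uses an exclusive bound, whereas the kernel's `end` is an inclusive page index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not kernel code. */
#define MODEL_PAGES 4096

struct model_mapping {
	bool dirty[MODEL_PAGES];
	bool written[MODEL_PAGES];	/* pages this scan wrote */
	size_t eof;			/* current file size, in pages */
};

/* Start with `pages` pages, all of them dirty. */
static void model_init(struct model_mapping *m, size_t pages)
{
	assert(pages <= MODEL_PAGES);
	memset(m, 0, sizeof(*m));
	for (size_t i = 0; i < pages; i++)
		m->dirty[i] = true;
	m->eof = pages;
}

/* The concurrent writer: append up to `n` dirty pages at EOF. */
static void model_append(struct model_mapping *m, size_t n)
{
	for (; n > 0 && m->eof < MODEL_PAGES; n--)
		m->dirty[m->eof++] = true;
}

/*
 * Integrity scan capped at the EOF sampled when the scan starts, in the
 * spirit of the i_size_read() snapshot in the patch.
 */
static size_t scan_capped(struct model_mapping *m, size_t writer_rate)
{
	const size_t end = m->eof;	/* snapshot: never re-read */
	size_t written = 0;

	for (size_t index = 0; index < end; index++) {
		if (!m->dirty[index])
			continue;
		m->dirty[index] = false;
		m->written[index] = true;
		written++;
		model_append(m, writer_rate);	/* extends EOF past `end` */
	}
	return written;
}
```

Starting with 10 dirty pages and a writer appending 4 pages per write, the scan writes exactly the original 10 and terminates; the 40 appended pages are left dirty.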
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> > mm/page-writeback.c | 15 +++++++++++++++
> > 1 files changed, 15 insertions(+), 0 deletions(-)
> >
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 0fe713d..c97e973 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -855,7 +855,22 @@ int write_cache_pages(struct address_space *mapping,
> >  		if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
> >  			range_whole = 1;
> >  		cycled = 1; /* ignore range_cyclic tests */
> > +
> > +		/*
> > +		 * If this is a data integrity sync, cap the writeback to the
> > +		 * current end of file. Any extension to the file that occurs
> > +		 * after this is a new write and we don't need to write those
> > +		 * pages out to fulfil our data integrity requirements. If we
> > +		 * try to write them out, we can get stuck in this scan until
> > +		 * the concurrent writer stops adding dirty pages and extending
> > +		 * EOF.
> > +		 */
> > +		if (wbc->sync_mode == WB_SYNC_ALL &&
> > +		    wbc->range_end == LLONG_MAX) {
> > +			end = i_size_read(mapping->host) >> PAGE_CACHE_SHIFT;
> > +		}
> >  	}
> > +
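The capping arithmetic can be checked in isolation. This sketch assumes 4 KiB pages (a page shift of 12; PAGE_CACHE_SHIFT is architecture dependent) and uses hypothetical helper names. In write_cache_pages() of this era `end` is an inclusive index (the main loop runs while `index <= end`), so `i_size >> shift` always covers the partial last page; for a page-aligned size it also includes one index just past EOF, which is harmless for integrity since it can only add pages, never drop one.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. PAGE_CACHE_SHIFT is architecture dependent. */
#define MODEL_PAGE_SHIFT 12

/* Same arithmetic as `end = i_size_read(mapping->host) >> PAGE_CACHE_SHIFT`. */
static uint64_t eof_end_index(int64_t i_size)
{
	assert(i_size >= 0);
	return (uint64_t)i_size >> MODEL_PAGE_SHIFT;
}

/* Number of page indices visited by an inclusive scan over [0, end]. */
static uint64_t indices_scanned(int64_t i_size)
{
	return eof_end_index(i_size) + 1;
}

/* Page index holding byte `pos` of the file. */
static uint64_t page_of_byte(int64_t pos)
{
	assert(pos >= 0);
	return (uint64_t)pos >> MODEL_PAGE_SHIFT;
}
```

For an 8192-byte file the scan covers indices 0 to 2: pages 0 and 1 hold data and index 2 lies just past EOF.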
>
> This is somewhat inefficient. It's really trivial and fast to find the
> highest-index dirty page by walking straight down the
> PAGECACHE_TAG_DIRTY-tagged nodes.
>
> However pagevec_lookup_tag(..., PAGECACHE_TAG_DIRTY) should do a pretty
> good job of skipping over the (millions of) pages between the (last
> dirty page before `end') and (`end'). So it _should_ be OK. Some thought
> and runtime testing would be good.
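Andrew's point about tag walks can be illustrated with a loose stand-in for the radix tree: a hypothetical two-level bitmap (`tag_index`, `tag_find_next()` and `tag_find_last()` are made-up names), where one summary bit per 64-page leaf records whether that leaf holds any tagged page. Lookups skip whole clean leaves, and the highest tagged page is found by descending from the top, the same idea as walking down PAGECACHE_TAG_DIRTY-tagged nodes. The real radix tree has more levels and a different layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-level tag index over 4096 pages, not kernel code. */
#define LEAF_BITS 64
#define NR_LEAVES 64
#define NR_PAGES (LEAF_BITS * NR_LEAVES)
#define NO_PAGE ((long)-1)

struct tag_index {
	uint64_t summary;		/* bit l set: leaf l has a tagged page */
	uint64_t leaf[NR_LEAVES];	/* one bit per page */
};

static void tag_set(struct tag_index *t, unsigned page)
{
	assert(page < NR_PAGES);
	t->leaf[page / LEAF_BITS] |= UINT64_C(1) << (page % LEAF_BITS);
	t->summary |= UINT64_C(1) << (page / LEAF_BITS);
}

static void tag_clear(struct tag_index *t, unsigned page)
{
	unsigned l = page / LEAF_BITS;

	assert(page < NR_PAGES);
	t->leaf[l] &= ~(UINT64_C(1) << (page % LEAF_BITS));
	if (t->leaf[l] == 0)
		t->summary &= ~(UINT64_C(1) << l);	/* leaf now clean */
}

/* First tagged page >= start, or NO_PAGE. Clean leaves cost one bit test. */
static long tag_find_next(const struct tag_index *t, unsigned start)
{
	for (unsigned l = start / LEAF_BITS; l < NR_LEAVES; l++) {
		if (!(t->summary & (UINT64_C(1) << l)))
			continue;	/* skip 64 clean pages at once */
		unsigned b = (l == start / LEAF_BITS) ? start % LEAF_BITS : 0;
		for (; b < LEAF_BITS; b++)
			if (t->leaf[l] & (UINT64_C(1) << b))
				return (long)(l * LEAF_BITS + b);
	}
	return NO_PAGE;
}

/* Highest tagged page: walk the summary from the top, then the leaf. */
static long tag_find_last(const struct tag_index *t)
{
	for (int l = NR_LEAVES - 1; l >= 0; l--) {
		if (!(t->summary & (UINT64_C(1) << l)))
			continue;
		for (int b = LEAF_BITS - 1; b >= 0; b--)
			if (t->leaf[l] & (UINT64_C(1) << b))
				return (long)l * LEAF_BITS + b;
	}
	return NO_PAGE;
}
```

With pages 3 and 4000 tagged, a lookup starting at 4 tests only the summary bits of leaves 1 to 61 before landing on 4000.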
I've done plenty of testing here - the patch is part of my usual QA
stack so it's probably run through XFSQA a few hundred times now.
The behaviour w.r.t. writes into holes appears to be identical to
the current behaviour (i.e. sync hangs until the write stops). I
can't _see_ any difference in behaviour or increase in overhead, but
we all know that this doesn't mean there isn't one.

The test I ran to determine its effectiveness against extending
writes was running sync during an 8GB sequential write. Without this
patch, sync would hang every time until the write completed, typically
around 50-60s. With this patch, sync returns within 3-4s every time.
> That being said, I think the patch is insufficient. If I create an
> enormous (possibly sparse) file with a 16TB hole (or a run of clean
> pages) in the middle and then start busily writing into that hole (run
> of clean pages), the problem will still occur.
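A sketch of why the EOF cap does not help in this case, again in a hypothetical toy model (not kernel code): only the first and last pages are dirty when the capped scan starts, and a writer fills the hole sequentially, dirtying the page just ahead of the scan each time a page is written. The scan never goes past the sampled EOF, but it still writes every page in the hole.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not kernel code. */
#define MODEL_PAGES 1024

struct model_mapping {
	bool dirty[MODEL_PAGES];
	size_t eof;		/* file size in pages; EOF does not move here */
};

/* A file of `pages` pages where only the first and last are dirty:
 * everything in between is a hole (or a run of clean pages). */
static void model_init_sparse(struct model_mapping *m, size_t pages)
{
	assert(pages >= 2 && pages <= MODEL_PAGES);
	memset(m, 0, sizeof(*m));
	m->dirty[0] = true;
	m->dirty[pages - 1] = true;
	m->eof = pages;
}

/* EOF-capped scan racing a writer that fills the hole one page ahead. */
static size_t scan_capped_hole_writer(struct model_mapping *m)
{
	const size_t end = m->eof;	/* capped at EOF, as in the patch */
	size_t writer = 1;		/* next hole page the writer dirties */
	size_t written = 0;

	for (size_t index = 0; index < end; index++) {
		if (!m->dirty[index])
			continue;
		m->dirty[index] = false;
		written++;
		if (writer < end - 1)
			m->dirty[writer++] = true;
	}
	return written;
}
```

Only two pages were dirty when the scan started, yet it writes all 1000; with a 16TB hole the same loop runs for as long as the writer keeps pace.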
Yes, that's already been pointed out. This doesn't attempt to
address that problem because....
> One obvious fix for that (a) would be to add another radix-tree tag and
> do two passes across the radix-tree.
.... we're still waiting for Jan's mark-and-sweep patch, which does
this, to make progress. This patch is just a simple fix for the most
common cause of the problem and might get into the tree sooner.
Personally I see no problems with using a mark+sweep algorithm for
this - IMO ensuring that data integrity requirements are met is more
important than ultimate performance.
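A minimal sketch of option a) in a toy model (hypothetical names, not the kernel's radix tree and not any posted patch): pass 1 copies the dirty flag into a second `towrite` flag over the range, and pass 2 writes only pages carrying `towrite`. The concurrent hole writer sets only the dirty flag, so the sweep terminates after writing exactly the pages that were dirty when marking ran.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not kernel code. */
#define MODEL_PAGES 1024

struct model_mapping {
	bool dirty[MODEL_PAGES];
	bool towrite[MODEL_PAGES];	/* hypothetical second tag */
	bool written[MODEL_PAGES];
};

/* Pass 1 (mark): tag every page that is dirty right now in [start, end). */
static void mark_pages(struct model_mapping *m, size_t start, size_t end)
{
	for (size_t i = start; i < end; i++)
		if (m->dirty[i])
			m->towrite[i] = true;
}

/*
 * Pass 2 (sweep): write only pages carrying the towrite tag. After each
 * write, a concurrent writer dirties the next hole page, but it sets
 * only the dirty flag, so the sweep never chases it.
 */
static size_t sweep_pages(struct model_mapping *m, size_t start, size_t end)
{
	size_t writer = start + 1;
	size_t written = 0;

	for (size_t i = start; i < end; i++) {
		if (!m->towrite[i])
			continue;
		m->towrite[i] = false;
		m->dirty[i] = false;
		m->written[i] = true;
		written++;
		if (writer < end)
			m->dirty[writer++] = true;
	}
	return written;
}
```

The cost is a second walk over the range, which is the trade-off of integrity against raw performance discussed above.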
> Another fix (b) would be to track the number of dirty pages per
> address_space, and only write that number of pages.
>
> Another fix would be to work out how the code handled this situation
> before we broke it, and restore that in some fashion. I guess fix (b)
> above kinda does that.
Yes, b) is similar to the old code, but like the old code it doesn't
work for the data integrity case because it doesn't guarantee that the
correct pages are written out. This patch does guarantee that for the
append case, and a) guarantees it in all cases....
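To see why b) alone is not enough for integrity, consider a hypothetical toy model (not kernel code): snapshot the number of dirty pages, then write that many dirty pages in index order. If a writer dirties pages just ahead of the scan, the budget is spent on new pages and an old dirty page further along is never written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not kernel code. */
#define MODEL_PAGES 1024

struct model_mapping {
	bool dirty[MODEL_PAGES];
	bool written[MODEL_PAGES];
};

/* The per-mapping dirty count that option b) would track. */
static size_t count_dirty(const struct model_mapping *m)
{
	size_t n = 0;

	for (size_t i = 0; i < MODEL_PAGES; i++)
		n += m->dirty[i];
	return n;
}

/*
 * Option b) as modelled here: write `budget` dirty pages in index order.
 * After each write a concurrent writer dirties the next page just ahead
 * of the scan.
 */
static size_t scan_counted(struct model_mapping *m, size_t budget)
{
	size_t writer = 1;
	size_t written = 0;

	for (size_t i = 0; i < MODEL_PAGES && written < budget; i++) {
		if (!m->dirty[i])
			continue;
		m->dirty[i] = false;
		m->written[i] = true;
		written++;
		if (writer < MODEL_PAGES)
			m->dirty[writer++] = true;
	}
	return written;
}
```

With pages 0 and 500 dirty, the budget of 2 is consumed by pages 0 and 1, and page 500, which was dirty when the sync started, is left unwritten.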
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com