From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Chris Mason <chris.mason@oracle.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	ext4 <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH] Improve buffered streaming write ordering
Date: Fri, 3 Oct 2008 12:43:51 +1000
Message-ID: <200810031243.51277.nickpiggin@yahoo.com.au>
In-Reply-To: <1222996262.12099.42.camel@think.oraclecorp.com>

On Friday 03 October 2008 11:11, Chris Mason wrote:
> On Thu, 2008-10-02 at 23:48 +0530, Aneesh Kumar K.V wrote:
> > On Thu, Oct 02, 2008 at 08:20:54AM -0400, Chris Mason wrote:
> > > On Wed, 2008-10-01 at 21:52 -0700, Andrew Morton wrote:
> > > > On Wed, 01 Oct 2008 14:40:51 -0400 Chris Mason <chris.mason@oracle.com> wrote:
> > > > > The patch below changes write_cache_pages to only use
> > > > > writeback_index when current_is_pdflush().  The basic idea is that
> > > > > pdflush is the only one who has concurrency control against the
> > > > > bdi, so it is the only one who can safely use and update
> > > > > writeback_index.
> > > >
> > > > Another approach would be to only update mapping->writeback_index if
> > > > nobody else altered it meanwhile.
> > >
> > > Ok, I can give that a shot.
> > >
> > > > That being said, I don't really see why we get lots of seekiness when
> > > > two threads start their writing the file from the same offset.
> > >
> > > For metadata, it makes sense.  Pages get dirtied in strange order, and
> > > if writeback_index is jumping around, we'll get the seeky metadata
> > > writeback.
> > >
> > > Data makes less sense, especially the very high extent count from ext4.
> > > An extra printk shows that ext4 is calling redirty_page_for_writepage
> > > quite a bit in ext4_da_writepage.  This should be enough to make us
> > > jump around in the file.
> >
> > With jbd2 we need to start the journal before locking the page.
> > That prevents us from doing any block allocation in the writepage()
> > callback. So with ext4/jbd2 we do block allocation only in the
> > writepages() callback, where we start the journal with the credits
> > needed to write a single extent. Then we look for contiguous
> > unallocated logical blocks and ask the block allocator for 'x' blocks.
> > If we get fewer than that, the rest of the pages we iterated over in
> > writepages() are redirtied so that we try to allocate them again.
> > We loop inside ext4_da_writepages itself, looking at wbc->pages_skipped:
> >
> > 	if (wbc->range_cont && (pages_skipped != wbc->pages_skipped)) {
> > 		/* We skipped pages in this loop */
> > 		wbc->range_start = range_start;
> > 		wbc->nr_to_write = to_write +
> >
> > > For a 4.5GB streaming buffered write, this printk inside
> > > ext4_da_writepage shows up 37,2429 times in /var/log/messages.
> >
> > Part of that can happen due to the shrink_page_list -> pageout -> writepage
> > callback being called with lots of unallocated buffer_heads (blocks). Also,
> > a journal commit with jbd2 looks at the inode and all the dirty pages, rather
> > than the buffer_heads (journal_submit_data_buffers). With ext4 we don't
> > force commit of pages that don't have blocks allocated. The consistency
> > is only with i_size and data.
>
> In general, I don't think pdflush or the VM expect
> redirty_page_for_writepage to be used this aggressively.

BTW. redirty_page_for_writepage and the whole model of cleaning the page's
dirty bit *before* calling into the filesystem is really nasty IMO. For
one thing it opens races that mean a filesystem can't keep metadata about
the pagecache properly in synch with the page's dirty bit.

I have a patch in my fsblock series that fixes this and has the writepage()
function itself clear the page's dirty bit. This basically makes
redirty_page_for_writepage go away completely (at least the uses I looked
at, I didn't look at ext4 though).
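
To make the two orderings concrete, here is a toy userspace model in C. It
is only a sketch and not the fsblock patch: struct toy_page and every toy_*
function are made up for illustration, and locking and real I/O are left
out. In the first model the caller clears the dirty bit before the
filesystem has decided whether it can write, so the filesystem has to set
it again. In the second model the bit changes only once writepage commits
to the write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the state needed here. */
struct toy_page {
	bool dirty;         /* models the page's dirty bit */
	bool has_blocks;    /* filesystem metadata: blocks allocated? */
	int redirty_calls;  /* times the filesystem had to re-set dirty */
};

/* Filesystem side, caller-clears model: dirty is already clear on entry. */
static int toy_writepage_caller_clears(struct toy_page *page)
{
	assert(page != NULL && !page->dirty);
	if (!page->has_blocks) {
		/* Cannot write yet, so undo what the caller did. */
		page->dirty = true;
		page->redirty_calls++;
		return 0;
	}
	return 1; /* would be submitted for I/O */
}

/* VM side, caller-clears model: clear the bit, then call the filesystem. */
static int toy_flush_caller_clears(struct toy_page *page)
{
	if (!page->dirty)
		return 0;
	/* From here until writepage returns, the bit is clear but the
	 * page has not been written: this is the window in question. */
	page->dirty = false;
	return toy_writepage_caller_clears(page);
}

/* Alternative model: writepage clears the bit itself, and only when it
 * has decided to write the page. */
static int toy_writepage_fs_clears(struct toy_page *page)
{
	assert(page != NULL);
	if (!page->dirty || !page->has_blocks)
		return 0; /* bit never changed, nothing to undo */
	page->dirty = false;
	return 1;
}
```

With a dirty page that has no blocks allocated, toy_flush_caller_clears()
leaves the page dirty only by going through the redirty path, while
toy_writepage_fs_clears() never touches the bit at all.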

Shall I break it out and submit it?
