All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	David Howells <dhowells@redhat.com>
Subject: Re: [PATCH 4/6] writeback: dont redirty tail an inode with dirty pages
Date: Thu, 15 Jul 2010 23:35:30 +0800	[thread overview]
Message-ID: <20100715153530.GC6511@localhost> (raw)
In-Reply-To: <20100712151317.bd9d656c.akpm@linux-foundation.org>

On Tue, Jul 13, 2010 at 06:13:17AM +0800, Andrew Morton wrote:
> On Mon, 12 Jul 2010 23:31:27 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > > > +		} else if (inode->i_state & I_DIRTY) {
> > > > +			/*
> > > > +			 * At least XFS will redirty the inode during the
> > > > +			 * writeback (delalloc) and on io completion (isize).
> > > > +			 */
> > > > +			redirty_tail(inode);
> > > 
> > > I'd drop the mention of XFS here - any filesystem that does delayed
> > > allocation or unwritten extent conversion after Io completion will
> > > cause this. Perhaps make the comment:
> > > 
> > > 	/*
> > > 	 * Filesystems can dirty the inode during writeback
> > > 	 * operations, such as delayed allocation during submission
> > > 	 * or metadata updates after data IO completion.
> > > 	 */
> > 
> > Thanks, comments updated accordingly.
> > 
> > ---
> > writeback: don't redirty tail an inode with dirty pages
> > 
> > This avoids delaying writeback for an expired (XFS) inode with lots of
> > dirty pages, but no active dirtier at the moment. Previously we only do
> > that for the kupdate case.
> > 
> 
> You didn't actually explain the _reason_ for making this change. 
> Please always do that.

OK. It's actually extending commit b3af9468ae from the kupdate-only case to
both kupdate and !kupdate cases.

The commit documented the reason:

    Debug traces show that in per-bdi writeback, the inode under writeback
    almost always get redirtied by a busy dirtier.  We used to call
    redirty_tail() in this case, which could delay inode for up to 30s.
    
    This is unacceptable because it now happens so frequently for plain cp/dd,
    that the accumulated delays could make writeback of big files very slow.

    So let's distinguish between data redirty and metadata only redirty.
    The first one is caused by a busy dirtier, while the latter one could
    happen in XFS, NFS, etc. when they are doing delalloc or updating isize.

Commit b3af9468ae only does that for kupdate case because requeue_io() was
only called in the kupdate case. Now we are merging the kupdate and !kupdate
cases in patch 6/6 (why not?), so is this patch.

> The patch is...  surprisingly complicated, although the end result
> looks OK.  This is not aided by the partial duplication between
> mapping_tagged(PAGECACHE_TAG_DIRTY) and I_DIRTY_PAGES.  I don't think
> we can easily remove I_DIRTY_PAGES because it's used for the
> did-someone-just-dirty-a-page test here.

I double checked I_DIRTY_PAGES. The main difference to PAGECACHE_TAG_DIRTY is:
I_DIRTY_PAGES (at the line removed by this patch) means there are _new_ pages
get dirtied during writeback, while PAGECACHE_TAG_DIRTY means there are dirty
pages. In this sense, if the I_DIRTY_PAGES handling is the same as
PAGECACHE_TAG_DIRTY, the code can be merged into PAGECACHE_TAG_DIRTY, as this
patch does.

The other minor differences are

- in *_set_page_dirty*(), PAGECACHE_TAG_DIRTY is set racelessly, while
  I_DIRTY_PAGES might be set on the inode for a page just truncated.
  The difference has no real impact on this patch (it's actually
  slightly better now).

- afs_fsync() always set I_DIRTY_PAGES after calling afs_writepages().
  The call was there in the first day (introduce by David Howells).
  What was the intention, hmm..?

> This code is way too complex and fragile and I fear that anything we do
> to it will break something :(

Agreed. Let's try to simplify it :)

Thanks,
Fengguang

WARNING: multiple messages have this Message-ID (diff)
From: Wu Fengguang <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	David Howells <dhowells@redhat.com>
Subject: Re: [PATCH 4/6] writeback: dont redirty tail an inode with dirty pages
Date: Thu, 15 Jul 2010 23:35:30 +0800	[thread overview]
Message-ID: <20100715153530.GC6511@localhost> (raw)
In-Reply-To: <20100712151317.bd9d656c.akpm@linux-foundation.org>

On Tue, Jul 13, 2010 at 06:13:17AM +0800, Andrew Morton wrote:
> On Mon, 12 Jul 2010 23:31:27 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > > > +		} else if (inode->i_state & I_DIRTY) {
> > > > +			/*
> > > > +			 * At least XFS will redirty the inode during the
> > > > +			 * writeback (delalloc) and on io completion (isize).
> > > > +			 */
> > > > +			redirty_tail(inode);
> > > 
> > > I'd drop the mention of XFS here - any filesystem that does delayed
> > > allocation or unwritten extent conversion after Io completion will
> > > cause this. Perhaps make the comment:
> > > 
> > > 	/*
> > > 	 * Filesystems can dirty the inode during writeback
> > > 	 * operations, such as delayed allocation during submission
> > > 	 * or metadata updates after data IO completion.
> > > 	 */
> > 
> > Thanks, comments updated accordingly.
> > 
> > ---
> > writeback: don't redirty tail an inode with dirty pages
> > 
> > This avoids delaying writeback for an expired (XFS) inode with lots of
> > dirty pages, but no active dirtier at the moment. Previously we only do
> > that for the kupdate case.
> > 
> 
> You didn't actually explain the _reason_ for making this change. 
> Please always do that.

OK. It's actually extending commit b3af9468ae from the kupdate-only case to
both kupdate and !kupdate cases.

The commit documented the reason:

    Debug traces show that in per-bdi writeback, the inode under writeback
    almost always get redirtied by a busy dirtier.  We used to call
    redirty_tail() in this case, which could delay inode for up to 30s.
    
    This is unacceptable because it now happens so frequently for plain cp/dd,
    that the accumulated delays could make writeback of big files very slow.

    So let's distinguish between data redirty and metadata only redirty.
    The first one is caused by a busy dirtier, while the latter one could
    happen in XFS, NFS, etc. when they are doing delalloc or updating isize.

Commit b3af9468ae only does that for kupdate case because requeue_io() was
only called in the kupdate case. Now we are merging the kupdate and !kupdate
cases in patch 6/6 (why not?), so is this patch.

> The patch is...  surprisingly complicated, although the end result
> looks OK.  This is not aided by the partial duplication between
> mapping_tagged(PAGECACHE_TAG_DIRTY) and I_DIRTY_PAGES.  I don't think
> we can easily remove I_DIRTY_PAGES because it's used for the
> did-someone-just-dirty-a-page test here.

I double checked I_DIRTY_PAGES. The main difference to PAGECACHE_TAG_DIRTY is:
I_DIRTY_PAGES (at the line removed by this patch) means there are _new_ pages
get dirtied during writeback, while PAGECACHE_TAG_DIRTY means there are dirty
pages. In this sense, if the I_DIRTY_PAGES handling is the same as
PAGECACHE_TAG_DIRTY, the code can be merged into PAGECACHE_TAG_DIRTY, as this
patch does.

The other minor differences are

- in *_set_page_dirty*(), PAGECACHE_TAG_DIRTY is set racelessly, while
  I_DIRTY_PAGES might be set on the inode for a page just truncated.
  The difference has no real impact on this patch (it's actually
  slightly better now).

- afs_fsync() always set I_DIRTY_PAGES after calling afs_writepages().
  The call was there in the first day (introduce by David Howells).
  What was the intention, hmm..?

> This code is way too complex and fragile and I fear that anything we do
> to it will break something :(

Agreed. Let's try to simplify it :)

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2010-07-15 15:35 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-11  2:06 [PATCH 0/6] writeback cleanups and trivial fixes Wu Fengguang
2010-07-11  2:06 ` Wu Fengguang
2010-07-11  2:06 ` Wu Fengguang
2010-07-11  2:06 ` [PATCH 1/6] writeback: take account of NR_WRITEBACK_TEMP in balance_dirty_pages() Wu Fengguang
2010-07-11  2:06   ` Wu Fengguang
2010-07-12 21:52   ` Andrew Morton
2010-07-12 21:52     ` Andrew Morton
2010-07-12 21:52     ` Andrew Morton
2010-07-13  8:58     ` Miklos Szeredi
2010-07-13  8:58       ` Miklos Szeredi
2010-07-15 14:50       ` Wu Fengguang
2010-07-15 14:50         ` Wu Fengguang
2010-07-11  2:06 ` [PATCH 2/6] writeback: reduce calls to global_page_state " Wu Fengguang
2010-07-11  2:06   ` Wu Fengguang
2010-07-11  2:06   ` Wu Fengguang
2010-07-26 15:19   ` Jan Kara
2010-07-26 15:19     ` Jan Kara
2010-07-27  3:59     ` Wu Fengguang
2010-07-27  3:59       ` Wu Fengguang
2010-07-27  9:12       ` Jan Kara
2010-07-27  9:12         ` Jan Kara
2010-07-28  2:04         ` Wu Fengguang
2010-07-28  2:04           ` Wu Fengguang
2010-08-03 14:55   ` Peter Zijlstra
2010-08-03 14:55     ` Peter Zijlstra
2010-07-11  2:06 ` [PATCH 3/6] writeback: avoid unnecessary calculation of bdi dirty thresholds Wu Fengguang
2010-07-11  2:06   ` Wu Fengguang
2010-07-11  2:06   ` Wu Fengguang
2010-07-12 21:56   ` Andrew Morton
2010-07-12 21:56     ` Andrew Morton
2010-07-12 21:56     ` Andrew Morton
2010-07-15 14:55     ` Wu Fengguang
2010-07-15 14:55       ` Wu Fengguang
2010-07-19 21:35   ` Andrew Morton
2010-07-19 21:35     ` Andrew Morton
2010-07-19 21:35     ` Andrew Morton
2010-07-20  3:34     ` Wu Fengguang
2010-07-20  3:34       ` Wu Fengguang
2010-07-20  3:34       ` Wu Fengguang
2010-07-20  4:14       ` Andrew Morton
2010-07-20  4:14         ` Andrew Morton
2010-08-03 15:03   ` Peter Zijlstra
2010-08-03 15:03     ` Peter Zijlstra
2010-08-03 15:10     ` Wu Fengguang
2010-08-03 15:10       ` Wu Fengguang
2010-08-04 16:41     ` Wu Fengguang
2010-08-04 16:41       ` Wu Fengguang
2010-08-04 17:10       ` Peter Zijlstra
2010-08-04 17:10         ` Peter Zijlstra
2010-07-11  2:07 ` [PATCH 4/6] writeback: dont redirty tail an inode with dirty pages Wu Fengguang
2010-07-11  2:07   ` Wu Fengguang
2010-07-12  2:01   ` Dave Chinner
2010-07-12  2:01     ` Dave Chinner
2010-07-12 15:31     ` Wu Fengguang
2010-07-12 15:31       ` Wu Fengguang
2010-07-12 22:13       ` Andrew Morton
2010-07-12 22:13         ` Andrew Morton
2010-07-15 15:35         ` Wu Fengguang [this message]
2010-07-15 15:35           ` Wu Fengguang
2010-07-11  2:07 ` [PATCH 5/6] writeback: fix queue_io() ordering Wu Fengguang
2010-07-11  2:07   ` Wu Fengguang
2010-07-11  2:07   ` Wu Fengguang
2010-07-12 22:15   ` Andrew Morton
2010-07-12 22:15     ` Andrew Morton
2010-07-12 22:15     ` Andrew Morton
2010-07-11  2:07 ` [PATCH 6/6] writeback: merge for_kupdate and !for_kupdate cases Wu Fengguang
2010-07-11  2:07   ` Wu Fengguang
2010-07-12  2:08   ` Dave Chinner
2010-07-12  2:08     ` Dave Chinner
2010-07-12 15:52     ` Wu Fengguang
2010-07-12 15:52       ` Wu Fengguang
2010-07-12 22:06       ` Dave Chinner
2010-07-12 22:06         ` Dave Chinner
2010-07-12 22:22       ` Andrew Morton
2010-07-12 22:22         ` Andrew Morton
2010-08-05 16:01         ` Wu Fengguang
2010-08-05 16:01           ` Wu Fengguang
2010-07-11  2:44 ` [PATCH 0/6] writeback cleanups and trivial fixes Christoph Hellwig
2010-07-11  2:44   ` Christoph Hellwig
2010-07-11  2:50   ` Wu Fengguang
2010-07-11  2:50     ` Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100715153530.GC6511@localhost \
    --to=fengguang.wu@intel.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=dhowells@redhat.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.