public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering
Date: Thu, 30 Jun 2011 12:48:45 +1000
Message-ID: <20110630024845.GC561@dastard>
In-Reply-To: <20110630020013.GX561@dastard>

On Thu, Jun 30, 2011 at 12:00:13PM +1000, Dave Chinner wrote:
> On Wed, Jun 29, 2011 at 10:01:12AM -0400, Christoph Hellwig wrote:
> > Instead of implementing our own writeback clustering use write_cache_pages
> > to do it for us.  This means the guts of the current writepage implementation
> > become a new helper used both for implementing ->writepage and as a callback
> > to write_cache_pages for ->writepages.  A new struct xfs_writeback_ctx
> > is used to track block mapping state and the ioend chain over multiple
> > invocations of it.
> > 
> > The advantages over the old code are that we avoid a double pagevec lookup
> > and get more efficient handling of extent boundaries inside a page for
> > small blocksize filesystems, as well as having less XFS specific code.
> 
> Yes, it should be, but I can't actually measure any noticeable CPU
> usage difference @800MB/s writeback. The profiles change shape
> around the changed code, but overall cpu usage does not change. I
> think this is because the second pagevec lookup is pretty much free
> because the radix tree is already hot in cache when we do the second
> lookup...
> 
> > The downside is that we don't do writeback clustering when called from
> > kswapd anymore, but that is a case that should be avoided anyway.  Note
> > that we still convert the whole delalloc range from ->writepage, so
> > the on-disk allocation pattern is not affected.
> 
> All the more reason to ensure the mm subsystem doesn't do this....
> 
> .....
> >  error:
> > -	if (iohead)
> > -		xfs_cancel_ioend(iohead);
> > -
> > -	if (err == -EAGAIN)
> > -		goto redirty;
> > -
> 
> Should this EAGAIN handling be dealt with in the removing-the-non-
> blocking-mode patch?
> 
> > +STATIC int
> >  xfs_vm_writepages(
> >  	struct address_space	*mapping,
> >  	struct writeback_control *wbc)
> >  {
> > +	struct xfs_writeback_ctx ctx = { };
> > +	int ret;
> > +
> >  	xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
> > -	return generic_writepages(mapping, wbc);
> > +
> > +	ret = write_cache_pages(mapping, wbc, __xfs_vm_writepage, &ctx);
> > +
> > +	if (ctx.iohead) {
> > +		if (ret)
> > +			xfs_cancel_ioend(ctx.iohead);
> > +		else
> > +			xfs_submit_ioend(wbc, ctx.iohead);
> > +	}
> 
> I think this error handling does not work. If we have put pages into
> the ioend (i.e. successful ->writepage calls) and then have a
> ->writepage call fail, all the pages under writeback (i.e. those on
> the ioend) will remain in that state, and never get written back
> (and so move into the clean state) or redirtied (and so get written
> again later).
> 
> xfs_cancel_ioend() was only ever called for the first page sent down
> to ->writepage, and on error that page was redirtied separately.
> Hence it doesn't handle this case at all as it never occurs in the
> existing code.
> 
> I'd suggest that regardless of whether an error is returned here,
> the existence of ctx.iohead indicates a valid ioend that needs to be
> submitted....

I think I just tripped this. I'm running a 1k block size filesystem,
and test 224 has hung waiting on IO completion after ->writepage
errors:

[ 2850.300979] XFS (vdb): Mounting Filesystem
[ 2850.310069] XFS (vdb): Ending clean mount
[ 2867.246341] Filesystem "vdb": reserve blocks depleted! Consider increasing reserve pool size.
[ 2867.247652] XFS (vdb): page discard on page ffffea0000257b40, inode 0x1c6, offset 1187840.
[ 2867.254135] XFS (vdb): page discard on page ffffea0000025f40, inode 0x423, offset 1839104.
[ 2867.256289] XFS (vdb): page discard on page ffffea0000a21aa0, inode 0x34e, offset 28672.
[ 2867.258845] XFS (vdb): page discard on page ffffea00001830d0, inode 0xe5, offset 3637248.
[ 2867.260637] XFS (vdb): page discard on page ffffea0000776af8, inode 0x132, offset 6283264.
[ 2867.269380] XFS (vdb): page discard on page ffffea00009d5d38, inode 0xf1, offset 5632000.
[ 2867.277851] XFS (vdb): page discard on page ffffea0000017e60, inode 0x27a, offset 32768.
[ 2867.281165] XFS (vdb): page discard on page ffffea0000258278, inode 0x274, offset 32768.
[ 2867.282802] XFS (vdb): page discard on page ffffea00009a3c60, inode 0x48a, offset 32768.
[ 2867.284166] XFS (vdb): page discard on page ffffea0000cc7808, inode 0x42e, offset 32768.
[ 2867.287138] XFS (vdb): page discard on page ffffea00004d4440, inode 0x4e0, offset 32768.
[ 2867.288500] XFS (vdb): page discard on page ffffea0000b34978, inode 0x4cd, offset 32768.
[ 2867.289381] XFS (vdb): page discard on page ffffea00003f40f8, inode 0x4c4, offset 155648.
[ 2867.291536] XFS (vdb): page discard on page ffffea0000023578, inode 0x4c7, offset 32768.
[ 2867.300880] XFS (vdb): page discard on page ffffea00005276e8, inode 0x4cc, offset 32768.
[ 2867.318819] XFS (vdb): page discard on page ffffea0000777230, inode 0x449, offset 8581120.
[ 4701.141666] SysRq : Show Blocked State
[ 4701.142093]   task                        PC stack   pid father
[ 4701.142707] dd              D ffff8800076edbe8     0 14211   8946 0x00000000
[ 4701.143509]  ffff88002b03fa58 0000000000000086 ffffea00002db598 ffffea0000000000
[ 4701.144009]  ffff88002b03f9d8 ffffffff81113a35 ffff8800076ed860 0000000000010f80
[ 4701.144009]  ffff88002b03ffd8 ffff88002b03e010 ffff88002b03ffd8 0000000000010f80
[ 4701.144009] Call Trace:
[ 4701.144009]  [<ffffffff81113a35>] ? __free_pages+0x35/0x40
[ 4701.144009]  [<ffffffff81062f69>] ? default_spin_lock_flags+0x9/0x10
[ 4701.144009]  [<ffffffff8110b520>] ? __lock_page+0x70/0x70
[ 4701.144009]  [<ffffffff81afe2d0>] io_schedule+0x60/0x80
[ 4701.144009]  [<ffffffff8110b52e>] sleep_on_page+0xe/0x20
[ 4701.144009]  [<ffffffff81afec2f>] __wait_on_bit+0x5f/0x90
[ 4701.144009]  [<ffffffff8110b773>] wait_on_page_bit+0x73/0x80
[ 4701.144009]  [<ffffffff810a4110>] ? autoremove_wake_function+0x40/0x40
[ 4701.144009]  [<ffffffff81116365>] ? pagevec_lookup_tag+0x25/0x40
[ 4701.144009]  [<ffffffff8110bbc2>] filemap_fdatawait_range+0x112/0x1a0
[ 4701.144009]  [<ffffffff8145f469>] xfs_wait_on_pages+0x59/0x80
[ 4701.144009]  [<ffffffff8145f51d>] xfs_flush_pages+0x8d/0xb0
[ 4701.144009]  [<ffffffff8145f084>] xfs_file_buffered_aio_write+0x104/0x190
[ 4701.144009]  [<ffffffff81b03a98>] ? do_page_fault+0x1e8/0x450
[ 4701.144009]  [<ffffffff8145f2cf>] xfs_file_aio_write+0x1bf/0x300
[ 4701.144009]  [<ffffffff81160844>] ? path_openat+0x104/0x3f0
[ 4701.144009]  [<ffffffff8115251a>] do_sync_write+0xda/0x120
[ 4701.144009]  [<ffffffff816488b3>] ? security_file_permission+0x23/0x90
[ 4701.144009]  [<ffffffff81152a88>] vfs_write+0xc8/0x180
[ 4701.144009]  [<ffffffff81152c31>] sys_write+0x51/0x90
[ 4701.144009]  [<ffffffff81b07ec2>] system_call_fastpath+0x16/0x1b
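
To illustrate the failure mode, here is a userspace toy model. It is only
a sketch, not kernel code: every toy_* name is hypothetical, and the
assumption that cancelling the chain leaves the pages' writeback state
untouched is taken from the review comment quoted above rather than from
a reading of xfs_cancel_ioend() itself.

```c
#include <assert.h>

/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */

#define TOY_MAX	16

enum toy_state { TOY_DIRTY, TOY_WRITEBACK, TOY_CLEAN };

struct toy_page {
	enum toy_state	state;
	int		fail;	/* make the writepage callback fail here */
};

/* Stand-in for the ioend chain carried in the writeback context. */
struct toy_ctx {
	struct toy_page	*ioend[TOY_MAX];
	int		count;
};

/* Models a ->writepage callback: a good page goes under writeback. */
static int toy_writepage(struct toy_page *page, struct toy_ctx *ctx)
{
	if (page->fail)
		return -1;		/* failing page stays dirty */
	assert(ctx->count < TOY_MAX);
	page->state = TOY_WRITEBACK;
	ctx->ioend[ctx->count++] = page;
	return 0;
}

/* Models submission plus IO completion: chained pages end writeback. */
static void toy_submit(struct toy_ctx *ctx)
{
	for (int i = 0; i < ctx->count; i++)
		ctx->ioend[i]->state = TOY_CLEAN;
	ctx->count = 0;
}

/* ASSUMED model of cancel: drop the chain, leave page state alone. */
static void toy_cancel(struct toy_ctx *ctx)
{
	ctx->count = 0;
}

/* Walk pages until the first error, then deal with the pending chain. */
static int toy_writepages(struct toy_page *pages, int n, int always_submit)
{
	struct toy_ctx ctx = { .count = 0 };
	int ret = 0;

	for (int i = 0; i < n && !ret; i++)
		ret = toy_writepage(&pages[i], &ctx);

	if (ctx.count) {
		if (ret && !always_submit)
			toy_cancel(&ctx);	/* policy in the patch */
		else
			toy_submit(&ctx);	/* policy suggested in review */
	}
	return ret;
}

/* Count pages left under writeback that nothing will ever complete. */
static int toy_stuck(const struct toy_page *pages, int n)
{
	int stuck = 0;

	for (int i = 0; i < n; i++)
		stuck += (pages[i].state == TOY_WRITEBACK);
	return stuck;
}
```

With three pages where only the third fails, calling toy_writepages()
with always_submit == 0 leaves the first two pages under writeback with
nothing left to complete them, which is the sort of state a waiter like
the filemap_fdatawait_range() call in the trace above would sit on.
With always_submit == 1 the same run leaves no page under writeback and
the failing page still dirty.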

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

