From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering
Date: Thu, 30 Jun 2011 12:48:45 +1000 [thread overview]
Message-ID: <20110630024845.GC561@dastard> (raw)
In-Reply-To: <20110630020013.GX561@dastard>
On Thu, Jun 30, 2011 at 12:00:13PM +1000, Dave Chinner wrote:
> On Wed, Jun 29, 2011 at 10:01:12AM -0400, Christoph Hellwig wrote:
> > Instead of implementing our own writeback clustering use write_cache_pages
> > to do it for us. This means the guts of the current writepage implementation
> > become a new helper used both for implementing ->writepage and as a callback
> > to write_cache_pages for ->writepages. A new struct xfs_writeback_ctx
> > is used to track block mapping state and the ioend chain over multiple
> > invocations of it.
> >
> > The advantages over the old code are that we avoid a double pagevec
> > lookup, handle extent boundaries inside a page more efficiently for
> > small blocksize filesystems, and have less XFS-specific code.
>
> Yes, it should be, but I can't actually measure any noticeable CPU
> usage difference @800MB/s writeback. The profiles change shape
> around the changed code, but overall cpu usage does not change. I
> think this is because the second pagevec lookup is pretty much free
> because the radix tree is already hot in cache when we do the second
> lookup...
>
> > The downside is that we don't do writeback clustering when called from
> > kswapd anymore, but that is a case that should be avoided anyway. Note
> > that we still convert the whole delalloc range from ->writepage, so
> > the on-disk allocation pattern is not affected.
>
> All the more reason to ensure the mm subsystem doesn't do this....
>
> .....
> > error:
> > - if (iohead)
> > - xfs_cancel_ioend(iohead);
> > -
> > - if (err == -EAGAIN)
> > - goto redirty;
> > -
>
> Should this EAGAIN handling be dealt with in the removing-the-non-
> blocking-mode patch?
>
> > +STATIC int
> > xfs_vm_writepages(
> > struct address_space *mapping,
> > struct writeback_control *wbc)
> > {
> > + struct xfs_writeback_ctx ctx = { };
> > + int ret;
> > +
> > xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
> > - return generic_writepages(mapping, wbc);
> > +
> > + ret = write_cache_pages(mapping, wbc, __xfs_vm_writepage, &ctx);
> > +
> > + if (ctx.iohead) {
> > + if (ret)
> > + xfs_cancel_ioend(ctx.iohead);
> > + else
> > + xfs_submit_ioend(wbc, ctx.iohead);
> > + }
>
> I think this error handling does not work. If we have put pages into
> the ioend (i.e. successful ->writepage calls) and then have a
> ->writepage call fail, all the pages already under writeback (i.e.
> those on the ioend) remain in that state and never get written back
> (and so moved to the clean state) or redirtied (and so written again
> later).
>
> xfs_cancel_ioend() was only ever called for the first page sent down
> to ->writepage, and on error that page was redirtied separately.
> Hence it doesn't handle this case at all as it never occurs in the
> existing code.
>
> I'd suggest that regardless of whether an error is returned here,
> the existence of ctx.iohead indicates a valid ioend that needs to be
> submitted....
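
To illustrate, the submit-on-error behaviour I'm suggesting can be
modelled with a small self-contained sketch. The model_* types and
functions here are hypothetical stand-ins, not the real kernel
structures; they capture only the submit-vs-cancel decision:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the kernel types, modelling only the
 * submit-vs-cancel decision at the end of ->writepages.
 */
struct model_ioend {
	int submitted;
	int cancelled;
};

struct model_wb_ctx {
	struct model_ioend *iohead;
};

static void model_submit_ioend(struct model_ioend *io)
{
	io->submitted = 1;
}

/*
 * Tail of the proposed xfs_vm_writepages(): if an ioend was built from
 * earlier successful ->writepage calls, submit it even when
 * write_cache_pages() returned an error, so pages already marked
 * writeback make progress instead of hanging forever in
 * filemap_fdatawait_range().  The error is still propagated.
 */
static int model_writepages_tail(struct model_wb_ctx *ctx, int ret)
{
	if (ctx->iohead)
		model_submit_ioend(ctx->iohead);
	return ret;
}
```

i.e. an error return from write_cache_pages() should change what we
report to the caller, not whether the already-built ioend gets
submitted.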
I think I just tripped over this. I'm running a 1k block size filesystem,
and test 224 has hung waiting on IO completion after ->writepage
errors:
[ 2850.300979] XFS (vdb): Mounting Filesystem
[ 2850.310069] XFS (vdb): Ending clean mount
[ 2867.246341] Filesystem "vdb": reserve blocks depleted! Consider increasing reserve pool size.
[ 2867.247652] XFS (vdb): page discard on page ffffea0000257b40, inode 0x1c6, offset 1187840.
[ 2867.254135] XFS (vdb): page discard on page ffffea0000025f40, inode 0x423, offset 1839104.
[ 2867.256289] XFS (vdb): page discard on page ffffea0000a21aa0, inode 0x34e, offset 28672.
[ 2867.258845] XFS (vdb): page discard on page ffffea00001830d0, inode 0xe5, offset 3637248.
[ 2867.260637] XFS (vdb): page discard on page ffffea0000776af8, inode 0x132, offset 6283264.
[ 2867.269380] XFS (vdb): page discard on page ffffea00009d5d38, inode 0xf1, offset 5632000.
[ 2867.277851] XFS (vdb): page discard on page ffffea0000017e60, inode 0x27a, offset 32768.
[ 2867.281165] XFS (vdb): page discard on page ffffea0000258278, inode 0x274, offset 32768.
[ 2867.282802] XFS (vdb): page discard on page ffffea00009a3c60, inode 0x48a, offset 32768.
[ 2867.284166] XFS (vdb): page discard on page ffffea0000cc7808, inode 0x42e, offset 32768.
[ 2867.287138] XFS (vdb): page discard on page ffffea00004d4440, inode 0x4e0, offset 32768.
[ 2867.288500] XFS (vdb): page discard on page ffffea0000b34978, inode 0x4cd, offset 32768.
[ 2867.289381] XFS (vdb): page discard on page ffffea00003f40f8, inode 0x4c4, offset 155648.
[ 2867.291536] XFS (vdb): page discard on page ffffea0000023578, inode 0x4c7, offset 32768.
[ 2867.300880] XFS (vdb): page discard on page ffffea00005276e8, inode 0x4cc, offset 32768.
[ 2867.318819] XFS (vdb): page discard on page ffffea0000777230, inode 0x449, offset 8581120.
[ 4701.141666] SysRq : Show Blocked State
[ 4701.142093] task PC stack pid father
[ 4701.142707] dd D ffff8800076edbe8 0 14211 8946 0x00000000
[ 4701.143509] ffff88002b03fa58 0000000000000086 ffffea00002db598 ffffea0000000000
[ 4701.144009] ffff88002b03f9d8 ffffffff81113a35 ffff8800076ed860 0000000000010f80
[ 4701.144009] ffff88002b03ffd8 ffff88002b03e010 ffff88002b03ffd8 0000000000010f80
[ 4701.144009] Call Trace:
[ 4701.144009] [<ffffffff81113a35>] ? __free_pages+0x35/0x40
[ 4701.144009] [<ffffffff81062f69>] ? default_spin_lock_flags+0x9/0x10
[ 4701.144009] [<ffffffff8110b520>] ? __lock_page+0x70/0x70
[ 4701.144009] [<ffffffff81afe2d0>] io_schedule+0x60/0x80
[ 4701.144009] [<ffffffff8110b52e>] sleep_on_page+0xe/0x20
[ 4701.144009] [<ffffffff81afec2f>] __wait_on_bit+0x5f/0x90
[ 4701.144009] [<ffffffff8110b773>] wait_on_page_bit+0x73/0x80
[ 4701.144009] [<ffffffff810a4110>] ? autoremove_wake_function+0x40/0x40
[ 4701.144009] [<ffffffff81116365>] ? pagevec_lookup_tag+0x25/0x40
[ 4701.144009] [<ffffffff8110bbc2>] filemap_fdatawait_range+0x112/0x1a0
[ 4701.144009] [<ffffffff8145f469>] xfs_wait_on_pages+0x59/0x80
[ 4701.144009] [<ffffffff8145f51d>] xfs_flush_pages+0x8d/0xb0
[ 4701.144009] [<ffffffff8145f084>] xfs_file_buffered_aio_write+0x104/0x190
[ 4701.144009] [<ffffffff81b03a98>] ? do_page_fault+0x1e8/0x450
[ 4701.144009] [<ffffffff8145f2cf>] xfs_file_aio_write+0x1bf/0x300
[ 4701.144009] [<ffffffff81160844>] ? path_openat+0x104/0x3f0
[ 4701.144009] [<ffffffff8115251a>] do_sync_write+0xda/0x120
[ 4701.144009] [<ffffffff816488b3>] ? security_file_permission+0x23/0x90
[ 4701.144009] [<ffffffff81152a88>] vfs_write+0xc8/0x180
[ 4701.144009] [<ffffffff81152c31>] sys_write+0x51/0x90
[ 4701.144009] [<ffffffff81b07ec2>] system_call_fastpath+0x16/0x1b
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com