From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@lst.de>
Cc: xfs@oss.sgi.com, rpeterso@redhat.com, linux-fsdevel@vger.kernel.org
Subject: Re: iomap infrastructure and multipage writes V5
Date: Tue, 28 Jun 2016 10:26:49 +1000 [thread overview]
Message-ID: <20160628002649.GI12670@dastard> (raw)
In-Reply-To: <1464792297-13185-1-git-send-email-hch@lst.de>
On Wed, Jun 01, 2016 at 04:44:43PM +0200, Christoph Hellwig wrote:
> This series add a new file system I/O path that uses the iomap structure
> introduced for the pNFS support and support multi-page buffered writes.
>
> This was first started by Dave Chinner a long time ago, then I did beat
> it into shape for production runs in a very constrained ARM NAS
> enviroment for Tuxera almost as long ago, and now half a dozen rewrites
> later it's back.
>
> The basic idea is to avoid the per-block get_blocks overhead
> and make use of extents in the buffered write path by iterating over
> them instead.
Christoph, it look slike there's an ENOSPC+ENOMEM behavioural regression here.
generic/224 on my 1p/1GB RAM VM using a 1k lock size filesystem has
significantly different behaviour once ENOSPC is hit withi this patchset.
It ends up with an endless stream of errors like this:
[ 687.530641] XFS (sdc): page discard on page ffffea0000197700, inode 0xb52c, offset 400338944.
[ 687.539828] XFS (sdc): page discard on page ffffea0000197740, inode 0xb52c, offset 400343040.
[ 687.549035] XFS (sdc): page discard on page ffffea0000197780, inode 0xb52c, offset 400347136.
[ 687.558222] XFS (sdc): page discard on page ffffea00001977c0, inode 0xb52c, offset 400351232.
[ 687.567391] XFS (sdc): page discard on page ffffea0000846000, inode 0xb52c, offset 400355328.
[ 687.576602] XFS (sdc): page discard on page ffffea0000846040, inode 0xb52c, offset 400359424.
[ 687.585794] XFS (sdc): page discard on page ffffea0000846080, inode 0xb52c, offset 400363520.
[ 687.595005] XFS (sdc): page discard on page ffffea00008460c0, inode 0xb52c, offset 400367616.
Yeah, it's been going for ten minutes already, and it's reporting
this for every page on inode 0xb52c. df reports:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdc 1038336 1038334 2 100% /mnt/scratch
So it's looking very much like the iomap write is allowing pages
through the write side rather than giving ENOSPC. i.e. the initial
alloc fails, it triggers a flush, which triggers a page discard,
which then allows the next write to proceed because it's free up
blocks. The trace looks like this
dd-12284 [000] 983.936583: xfs_file_buffered_write: dev 8:32 ino 0xb52c size 0xa00000 offset 0x1f37a000 count 0x1000 ioflags
dd-12284 [000] 983.936584: xfs_ilock: dev 8:32 ino 0xb52c flags ILOCK_EXCL caller 0xffffffff8150bb47s
dd-12284 [000] 983.936585: xfs_iomap_prealloc_size: dev 8:32 ino 0xb52c prealloc blocks 64 shift 6 m_writeio_blocks 64
dd-12284 [000] 983.936585: xfs_delalloc_enospc: dev 8:32 ino 0xb52c isize 0x1f37a000 disize 0xa00000 offset 0x1f37a000 count 4096
dd-12284 [000] 983.936586: xfs_delalloc_enospc: dev 8:32 ino 0xb52c isize 0x1f37a000 disize 0xa00000 offset 0x1f37a000 count 4096
dd-12284 [000] 983.936586: xfs_iunlock: dev 8:32 ino 0xb52c flags ILOCK_EXCL caller 0xffffffff8150bc61s
kworker/u2:0-6 [000] 983.946649: xfs_writepage: dev 8:32 ino 0xb52c pgoff 0x1f379000 size 0x1f37a000 offset 0 length 0 delalloc 1 unwritten 0
kworker/u2:0-6 [000] 983.946650: xfs_ilock: dev 8:32 ino 0xb52c flags ILOCK_SHARED caller 0xffffffff814f4b49s
kworker/u2:0-6 [000] 983.946651: xfs_iunlock: dev 8:32 ino 0xb52c flags ILOCK_SHARED caller 0xffffffff814f4bf5s
kworker/u2:0-6 [000] 983.948093: xfs_ilock: dev 8:32 ino 0xb52c flags ILOCK_EXCL caller 0xffffffff814f555fs
kworker/u2:0-6 [000] 983.948095: xfs_bunmap: dev 8:32 ino 0xb52c size 0xa00000 bno 0x7cde4 len 0x1flags caller 0xffffffff814f9123s
kworker/u2:0-6 [000] 983.948096: xfs_bmap_pre_update: dev 8:32 ino 0xb52c state idx 0 offset 511460 block 4503599627239431 count 4 flag 0 caller 0xffffffff814c6ea3s
kworker/u2:0-6 [000] 983.948097: xfs_bmap_post_update: dev 8:32 ino 0xb52c state idx 0 offset 511461 block 4503599627239431 count 3 flag 0 caller 0xffffffff814c6f1es
kworker/u2:0-6 [000] 983.948097: xfs_bunmap: dev 8:32 ino 0xb52c size 0xa00000 bno 0x7cde5 len 0x1flags caller 0xffffffff814f9123s
kworker/u2:0-6 [000] 983.948098: xfs_bmap_pre_update: dev 8:32 ino 0xb52c state idx 0 offset 511461 block 4503599627239431 count 3 flag 0 caller 0xffffffff814c6ea3s
kworker/u2:0-6 [000] 983.948098: xfs_bmap_post_update: dev 8:32 ino 0xb52c state idx 0 offset 511462 block 4503599627239431 count 2 flag 0 caller 0xffffffff814c6f1es
kworker/u2:0-6 [000] 983.948099: xfs_bunmap: dev 8:32 ino 0xb52c size 0xa00000 bno 0x7cde6 len 0x1flags caller 0xffffffff814f9123s
kworker/u2:0-6 [000] 983.948099: xfs_bmap_pre_update: dev 8:32 ino 0xb52c state idx 0 offset 511462 block 4503599627239431 count 2 flag 0 caller 0xffffffff814c6ea3s
kworker/u2:0-6 [000] 983.948100: xfs_bmap_post_update: dev 8:32 ino 0xb52c state idx 0 offset 511463 block 4503599627239431 count 1 flag 0 caller 0xffffffff814c6f1es
kworker/u2:0-6 [000] 983.948100: xfs_bunmap: dev 8:32 ino 0xb52c size 0xa00000 bno 0x7cde7 len 0x1flags caller 0xffffffff814f9123s
kworker/u2:0-6 [000] 983.948101: xfs_iext_remove: dev 8:32 ino 0xb52c state idx 0 offset 511463 block 4503599627239431 count 1 flag 0 caller 0xffffffff814c6d2as
kworker/u2:0-6 [000] 983.948101: xfs_iunlock: dev 8:32 ino 0xb52c flags ILOCK_EXCL caller 0xffffffff814f55e8s
kworker/u2:0-6 [000] 983.948102: xfs_invalidatepage: dev 8:32 ino 0xb52c pgoff 0x1f379000 size 0x1f37a000 offset 0 length 1000 delalloc 1 unwritten 0
kworker/u2:0-6 [000] 983.948102: xfs_releasepage: dev 8:32 ino 0xb52c pgoff 0x1f379000 size 0x1f37a000 offset 0 length 0 delalloc 0 unwritten 0
[snip eof block scan locking]
dd-12284 [000] 983.948239: xfs_file_buffered_write: dev 8:32 ino 0xb52c size 0xa00000 offset 0x1f37a000 count 0x1000 ioflags
dd-12284 [000] 983.948239: xfs_ilock: dev 8:32 ino 0xb52c flags ILOCK_EXCL caller 0xffffffff8150bb47s
dd-12284 [000] 983.948240: xfs_iomap_prealloc_size: dev 8:32 ino 0xb52c prealloc blocks 64 shift 0 m_writeio_blocks 64
dd-12284 [000] 983.948242: xfs_delalloc_enospc: dev 8:32 ino 0xb52c isize 0x1f37a000 disize 0xa00000 offset 0x1f37a000 count 4096
dd-12284 [000] 983.948243: xfs_iext_insert: dev 8:32 ino 0xb52c state idx 0 offset 511464 block 4503599627239431 count 4 flag 0 caller 0xffffffff814c265as
dd-12284 [000] 983.948243: xfs_iunlock: dev 8:32 ino 0xb52c flags ILOCK_EXCL caller 0xffffffff8150bc61s
dd-12284 [000] 983.948244: xfs_iomap_alloc: dev 8:32 ino 0xb52c size 0xa00000 offset 0x1f37a000 count 4096 type invalid startoff 0x7cde8 startblock -1 blockcount 0x4
dd-12284 [000] 983.948250: xfs_iunlock: dev 8:32 ino 0xb52c flags IOLOCK_EXCL caller 0xffffffff81502378s
dd-12284 [000] 983.948254: xfs_ilock: dev 8:32 ino 0xb52c flags IOLOCK_EXCL caller 0xffffffff81502352s
dd-12284 [000] 983.948256: xfs_update_time: dev 8:32 ino 0xb52c
dd-12284 [000] 983.948257: xfs_log_reserve: dev 8:32 t_ocnt 0 t_cnt 0 t_curr_res 2860 t_unit_res 2860 t_flags XLOG_TIC_INITED reserveq empty writeq empty grant_reserve_cycle 1 grant_reserve_bytes 956212 grant_write_cycle 1 grant_write_bytes 956212 curr_cycle 1 curr_block 1863 tail_cycle 1 tail_block 1861
dd-12284 [000] 983.948257: xfs_log_reserve_exit: dev 8:32 t_ocnt 0 t_cnt 0 t_curr_res 2860 t_unit_res 2860 t_flags XLOG_TIC_INITED reserveq empty writeq empty grant_reserve_cycle 1 grant_reserve_bytes 959072 grant_write_cycle 1 grant_write_bytes 959072 curr_cycle 1 curr_block 1863 tail_cycle 1 tail_block 1861
dd-12284 [000] 983.948258: xfs_ilock: dev 8:32 ino 0xb52c flags ILOCK_EXCL caller 0xffffffff8150c4dfs
dd-12284 [000] 983.948260: xfs_log_done_nonperm: dev 8:32 t_ocnt 0 t_cnt 0 t_curr_res 2860 t_unit_res 2860 t_flags XLOG_TIC_INITED reserveq empty writeq empty grant_reserve_cycle 1 grant_reserve_bytes 959072 grant_write_cycle 1 grant_write_bytes 959072 curr_cycle 1 curr_block 1863 tail_cycle 1 tail_block 1861
dd-12284 [000] 983.948260: xfs_log_ungrant_enter: dev 8:32 t_ocnt 0 t_cnt 0 t_curr_res 2860 t_unit_res 2860 t_flags XLOG_TIC_INITED reserveq empty writeq empty grant_reserve_cycle 1 grant_reserve_bytes 959072 grant_write_cycle 1 grant_write_bytes 959072 curr_cycle 1 curr_block 1863 tail_cycle 1 tail_block 1861
dd-12284 [000] 983.948260: xfs_log_ungrant_sub: dev 8:32 t_ocnt 0 t_cnt 0 t_curr_res 2860 t_unit_res 2860 t_flags XLOG_TIC_INITED reserveq empty writeq empty grant_reserve_cycle 1 grant_reserve_bytes 959072 grant_write_cycle 1 grant_write_bytes 959072 curr_cycle 1 curr_block 1863 tail_cycle 1 tail_block 1861
dd-12284 [000] 983.948261: xfs_log_ungrant_exit: dev 8:32 t_ocnt 0 t_cnt 0 t_curr_res 2860 t_unit_res 2860 t_flags XLOG_TIC_INITED reserveq empty writeq empty grant_reserve_cycle 1 grant_reserve_bytes 956212 grant_write_cycle 1 grant_write_bytes 956212 curr_cycle 1 curr_block 1863 tail_cycle 1 tail_block 1861
dd-12284 [000] 983.948261: xfs_iunlock: dev 8:32 ino 0xb52c flags ILOCK_EXCL caller 0xffffffff81526d6cs
Ad so the cycle goes. The next page at offset 0x1f37b000 fails
allocation, triggers a flush, which results in the write at
0x1f37a000 failing and freeing it's delalloc blocks, which then
allows the retry of the write at 0x1f37b000 to succeed....
I can see that mapping errors (i.e. the AS_ENOSPC error) ar enot
propagated into the write() path, but that's the same as the old
code. What I don't quite understand is how delalloc has blown
through the XFS_ALLOC_SET_ASIDE() limits which are supposed to
trigger ENOSPC on the write() side much earlier than just one or two
blocks free....
Something doesn't quite add up here, and I haven't been able to put
my finger on it yet.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2016-06-28 0:32 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-01 14:44 iomap infrastructure and multipage writes V5 Christoph Hellwig
2016-06-01 14:44 ` [PATCH 01/14] fs: move struct iomap from exportfs.h to a separate header Christoph Hellwig
2016-06-01 14:44 ` [PATCH 02/14] fs: introduce iomap infrastructure Christoph Hellwig
2016-06-16 16:12 ` Jan Kara
2016-06-17 12:01 ` Christoph Hellwig
2016-06-20 2:29 ` Dave Chinner
2016-06-20 12:22 ` Christoph Hellwig
2016-06-01 14:44 ` [PATCH 03/14] fs: support DAX based iomap zeroing Christoph Hellwig
2016-06-01 14:44 ` [PATCH 04/14] xfs: make xfs_bmbt_to_iomap available outside of xfs_pnfs.c Christoph Hellwig
2016-06-01 14:44 ` [PATCH 05/14] xfs: reorder zeroing and flushing sequence in truncate Christoph Hellwig
2016-06-01 14:44 ` [PATCH 06/14] xfs: implement iomap based buffered write path Christoph Hellwig
2016-06-01 14:44 ` [PATCH 07/14] xfs: remove buffered write support from __xfs_get_blocks Christoph Hellwig
2016-06-01 14:44 ` [PATCH 08/14] fs: iomap based fiemap implementation Christoph Hellwig
2016-08-18 13:18 ` Andreas Grünbacher
2016-06-01 14:44 ` [PATCH 09/14] xfs: use iomap " Christoph Hellwig
2016-06-01 14:44 ` [PATCH 10/14] xfs: use iomap infrastructure for DAX zeroing Christoph Hellwig
2016-06-01 14:44 ` [PATCH 11/14] xfs: handle 64-bit length in xfs_iozero Christoph Hellwig
2016-06-01 14:44 ` [PATCH 12/14] xfs: use xfs_zero_range in xfs_zero_eof Christoph Hellwig
2016-06-01 14:44 ` [PATCH 13/14] xfs: split xfs_free_file_space in manageable pieces Christoph Hellwig
2016-06-01 14:44 ` [PATCH 14/14] xfs: kill xfs_zero_remaining_bytes Christoph Hellwig
2016-06-01 14:46 ` iomap infrastructure and multipage writes V5 Christoph Hellwig
2016-06-28 0:26 ` Dave Chinner [this message]
2016-06-28 13:28 ` Christoph Hellwig
2016-06-28 13:38 ` Christoph Hellwig
2016-06-30 17:22 ` Christoph Hellwig
2016-06-30 23:16 ` Dave Chinner
2016-07-18 11:14 ` Dave Chinner
2016-07-18 11:18 ` Dave Chinner
2016-07-31 19:19 ` Christoph Hellwig
2016-08-01 0:16 ` Dave Chinner
2016-08-02 23:42 ` Dave Chinner
2017-02-13 22:31 ` Eric Sandeen
2017-02-14 6:10 ` Christoph Hellwig
2016-07-19 3:50 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160628002649.GI12670@dastard \
--to=david@fromorbit.com \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=rpeterso@redhat.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).