From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH v4] xfs: rework zero range to prevent invalid i_size updates
Date: Tue, 14 Oct 2014 11:18:06 +1100 [thread overview]
Message-ID: <20141014001806.GG5267@dastard> (raw)
In-Reply-To: <1413220285-24886-1-git-send-email-bfoster@redhat.com>
On Mon, Oct 13, 2014 at 01:11:25PM -0400, Brian Foster wrote:
> The zero range operation is analogous to fallocate with the exception of
> converting the range to zeroes. E.g., it attempts to allocate zeroed
> blocks over the range specified by the caller. The XFS implementation
> kills all delalloc blocks currently over the aligned range, converts the
> range to allocated zero blocks (unwritten extents) and handles the
> partial pages at the ends of the range by sending writes through the
> pagecache.
>
> The current implementation suffers from several problems associated with
> inode size. If the aligned range covers an extending I/O, said I/O is
> discarded and an inode size update from a previous write never makes it
> to disk. Further, if an unaligned zero range extends beyond eof, the
> page write induced for the partial end page can itself increase the
> inode size, even if the zero range request is not supposed to update
> i_size (via KEEP_SIZE, similar to an fallocate beyond EOF).
>
> The latter behavior not only incorrectly increases the inode size, but
> can lead to stray delalloc blocks on the inode. Typically, post-eof
> preallocation blocks are either truncated on release or inode eviction
> or explicitly written to by xfs_zero_eof() on natural file size
> extension. If the inode size increases due to zero range, however,
> associated blocks leak into the address space having never been
> converted or mapped to pagecache pages. A direct I/O to such an
> uncovered range cannot convert the extent via writeback and will BUG().
> For example:
>
> $ xfs_io -fc "pwrite 0 128k" -c "fzero -k 1m 54321" <file>
> ...
> $ xfs_io -d -c "pread 128k 128k" <file>
> <BUG>
>
> If the entire delalloc extent happens to not have page coverage
> whatsoever (e.g., delalloc conversion couldn't find a large enough free
> space extent), even a full file writeback won't convert what's left of
> the extent and we'll assert on inode eviction.
>
> Rework xfs_zero_file_space() to avoid buffered I/O for partial pages.
> Use the existing hole punch and prealloc mechanisms as primitives for
> zero range. We punch out the pagecache beforehand to eliminate
> unnecessary writeback. The hole punch mechanism handles partial block
> zeroing for us and facilitates the use of a single prealloc call over
> the entire range, which increases the odds of contiguous allocation.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
This patch triggers the same bug pretty much straight away on
generic/033 on all my test systems:
[ 306.378041] XFS: Assertion failed: startblockval(del.br_startblock) > 0, file: fs/xfs/libxfs/xfs_bmap.c, line: 5279
[ 306.380694] ------------[ cut here ]------------
[ 306.381655] kernel BUG at fs/xfs/xfs_message.c:107!
[ 306.382535] invalid opcode: 0000 [#1] SMP
[ 306.383310] Modules linked in:
[ 306.383889] CPU: 0 PID: 12151 Comm: xfs_io Not tainted 3.17.0-dgc+ #537
[ 306.384665] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 306.384665] task: ffff88007597e2c0 ti: ffff880077644000 task.ti: ffff880077644000
[ 306.384665] RIP: 0010:[<ffffffff814dadb2>] [<ffffffff814dadb2>] assfail+0x22/0x30
[ 306.384665] RSP: 0018:ffff880077647c98 EFLAGS: 00010296
[ 306.384665] RAX: 0000000000000067 RBX: 0000000000000007 RCX: 0000000000000000
[ 306.384665] RDX: 0000000000000001 RSI: ffff88007fc0d258 RDI: ffff88007fc0d258
[ 306.384665] RBP: ffff880077647c98 R08: 000000000000000a R09: 0000000000000000
[ 306.384665] R10: 000000000000026b R11: ffff880077647946 R12: 0000000000000007
[ 306.384665] R13: ffff88006a35d300 R14: ffff880076669370 R15: ffff880077647db0
[ 306.384665] FS: 00007fc0b858e700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 306.384665] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 306.384665] CR2: 000000000061dc40 CR3: 000000007592d000 CR4: 00000000000006f0
[ 306.384665] Stack:
[ 306.384665] ffff880077647d88 ffffffff81491341 ffff880077647d40 ffff880077647db4
[ 306.384665] ffff880075997000 0000000100000007 ffff88006a35d340 0000000000000000
[ 306.384665] 0000000000000000 ffffffff00000000 0000000700000000 0000000000000000
[ 306.384665] Call Trace:
[ 306.384665] [<ffffffff81491341>] xfs_bunmapi+0x781/0x1000
[ 306.384665] [<ffffffff814c0ad6>] xfs_bmap_punch_delalloc_range+0xf6/0x1a0
[ 306.384665] [<ffffffff814c1b13>] xfs_zero_file_space+0xf3/0x1d0
[ 306.384665] [<ffffffff814c8538>] xfs_file_fallocate+0xe8/0x2f0
[ 306.384665] [<ffffffff811aeb48>] ? __sb_start_write+0x58/0xf0
[ 306.384665] [<ffffffff811aa8b7>] do_fallocate+0x127/0x1c0
[ 306.384665] [<ffffffff811aa994>] SyS_fallocate+0x44/0x70
[ 306.384665] [<ffffffff81d01a29>] system_call_fastpath+0x16/0x1b
[ 306.384665] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 f1 41 89 d0 48 89 e5 48 89 fa 48 c7 c6 f0 67 15 82 31 ff 31 c0 e8 ce fb ff ff <0f> 0b 66 66 66
[ 306.384665] RIP [<ffffffff814dadb2>] assfail+0x22/0x30
[ 306.384665] RSP <ffff880077647c98>
[ 306.418027] ---[ end trace 18ffcc2e14a50ab1 ]---
I'm running 3.17 + for-next + a handful of local patches, but this
is the only patch that modifies anything in this area. I'll remove
all the other patches I have just to check....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2014-10-14 0:18 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-13 17:11 [PATCH v4] xfs: rework zero range to prevent invalid i_size updates Brian Foster
2014-10-14 0:18 ` Dave Chinner [this message]
2014-10-14 0:50 ` Dave Chinner
2014-10-14 11:37 ` Brian Foster
2014-10-17 20:20 ` Brian Foster
2014-10-20 1:25 ` Dave Chinner
2014-10-20 12:42 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141014001806.GG5267@dastard \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.