From: Chandan Rajendra <chandan@linux.vnet.ibm.com>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-xfs@vger.kernel.org
Subject: Re: COW improvements and always_cow support V2
Date: Mon, 26 Nov 2018 17:12:16 +0530
Message-ID: <1984636.oWepG2GJO6@localhost.localdomain>
In-Reply-To: <4455584.hbaYzlDTmr@localhost.localdomain>
On Sunday, November 25, 2018 7:09:33 PM IST Chandan Rajendra wrote:
> On Monday, November 19, 2018 7:16:10 PM IST Christoph Hellwig wrote:
> > Hi all,
> >
> > this series adds the always_cow mode support after improving our COW
> > write support a little bit first.
> >
> > The always_cow mode stresses the COW path a lot, but with a few xfstests
> > fixups it generally looks good, except for:
> >
> > - a few tests that complain about fragmentation, which is rather inherent
> >   in this mode
> > - generic/208 crashing a lot (and generic/095 with 1k block similarly)
> >   because a COW fork extent has changed under writeback. As far as I can
> >   tell this is because nothing prevents another thread from moving a COW
> >   fork extent to the data fork while we are under writeback. I'm currently
> >   fully root causing this and looking into a potential fix
> > - xfs/017 crashes occasionally in log recovery because we can't find
> >   a refcount tree record that we try to free.
> >   I haven't really fully understood this one yet.
> >
> > Changes since v1:
> > - make delalloc and unwritten extent conversions simpler and more robust
> > - add a few additional cleanups
> > - support all fallocate modes but actual preallocation
> > - rebase on top of a fix from Brian (which is included as first patch
> >   to make the patch set more usable)
> >
>
> Hi Christoph,
>
> On ppc64le (with 4k block size), xfs/017 causes the following call trace,
>
> WARNING: CPU: 2 PID: 7865 at /root/repos/linux/fs/xfs/xfs_aops.c:352 xfs_map_blocks+0x154/0xc44
> Modules linked in:
> CPU: 2 PID: 7865 Comm: fsstress Not tainted 4.20.0-rc3-next-20181123-00008-gbb319d33d6c5-dirty #7
> NIP: c00000000069f924 LR: c00000000069f88c CTR: 0000000000000000
> REGS: c000000630dc3520 TRAP: 0700 Not tainted (4.20.0-rc3-next-20181123-00008-gbb319d33d6c5-dirty)
> MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 44004484 XER: 00000000
> CFAR: c00000000000e6d4 IRQMASK: 0
> GPR00: c00000000069f88c c000000630dc37b0 c0000000017a3100 c000000630dc3b80
> GPR04: 0000000000000000 000000000000008f 0000000000000008 fffffffffffffffe
> GPR08: 000000000000002b 0000000000000001 0000000000000009 c000000623040080
> GPR12: c0000000003113a0 c00000003fffdf00 0000000000000000 0000000000000000
> GPR16: 0000000100030780 00007fffffffb2c0 00000001000136a8 0000000000000001
> GPR20: 0000000000000000 0000000000010000 c000000630dc3c20 0000000000001000
> GPR24: 0000000000000008 f000000000d79a00 c000000630dc3b80 c000000624320300
> GPR28: c000000100733f48 0000000000000000 0000000000009000 c000000630dc37b0
> NIP [c00000000069f924] xfs_map_blocks+0x154/0xc44
> LR [c00000000069f88c] xfs_map_blocks+0xbc/0xc44
> Call Trace:
> [c000000630dc37b0] [c00000000069f88c] xfs_map_blocks+0xbc/0xc44 (unreliable)
> [c000000630dc3960] [c0000000006a0e88] xfs_writepage_map+0x174/0x428
> [c000000630dc39f0] [c0000000006a12ac] xfs_do_writepage+0x170/0x2c0
> [c000000630dc3a40] [c0000000003257f0] write_cache_pages+0x350/0x540
> [c000000630dc3b60] [c0000000006a1508] xfs_vm_writepages+0x98/0xd4
> [c000000630dc3bd0] [c000000000326c2c] do_writepages+0x5c/0xcc
> [c000000630dc3c00] [c00000000030e338] __filemap_fdatawrite_range+0xc4/0xd4
> [c000000630dc3c60] [c00000000030e460] filemap_flush+0x30/0x40
> [c000000630dc3c80] [c0000000006d0538] xfs_release+0x12c/0x298
> [c000000630dc3cc0] [c0000000006b4ca4] xfs_file_release+0x24/0x38
> [c000000630dc3ce0] [c0000000003f68f8] __fput+0xc8/0x280
> [c000000630dc3d40] [c0000000003f6b40] ____fput+0x20/0x30
> [c000000630dc3d60] [c00000000014bef4] task_work_run+0xb8/0x100
> [c000000630dc3da0] [c0000000000231cc] do_notify_resume+0x184/0x18c
> [c000000630dc3e20] [c00000000000e544] ret_from_except_lite+0x70/0x74
> Instruction dump:
> 39290002 7d290074 7929d182 5529063e 913f005c 813f005c 7d290034 5529d97e
> 69290001 5529063e 2fa90000 419e0008 <0fe00000> 813f005c 7d290034 5529d97e
>
>
> This corresponds to the following code inside xfs_map_blocks(),
>
>         imap_valid = offset_fsb >= wpc->imap.br_startoff &&
>                      offset_fsb < wpc->imap.br_startoff + wpc->imap.br_blockcount;
>         if (imap_valid &&
>             !WARN_ON_ONCE(wpc->imap.br_startblock == HOLESTARTBLOCK) &&
>             (!xfs_inode_has_cow_data(ip) ||
>              wpc->io_type == XFS_IO_COW ||
>              wpc->cow_seq == READ_ONCE(ip->i_cowfp->if_seq)))
>                 return 0;
>
>
> I did some debugging and the following provides an explanation as to why
> the condition passed to WARN_ON_ONCE() evaluated to true:
>
> 148854:fsstress 19867 [001] 178758.940803: probe:xfs_file_buffered_aio_write: (c0000000006b71f0) ki_pos=1301686 i_ino=8388710
> 149023:fsstress 19867 [001] 178758.940968: probe:__iomap_write_end: (c00000000049720c) i_ino=8388710 len_u32=9034 copied_u32=9034 pos_u64=1301686
> 149191:fsstress 19867 [001] 178758.941167: probe:__iomap_write_end: (c00000000049720c) i_ino=8388710 len_u32=65536 copied_u32=65536 pos_u64=1310720
> 150133:fsstress 19867 [001] 178758.941566: probe:__iomap_write_end: (c00000000049720c) i_ino=8388710 len_u32=8475 copied_u32=8475 pos_u64=1376256
>
> An extending write is performed starting in the 4k block which maps file
> offset 1298432 (ki_pos=1301686). Hence the file range between the previous
> EOF and offset 1298431 maps to a hole.
>
>
>
> 123854:fsstress 19867 [001] 178758.896306: probe:xfs_file_buffered_aio_write: (c0000000006b71f0) ki_pos=434109 i_ino=8388710
> 124223:fsstress 19867 [001] 178758.896549: probe:__iomap_write_end: (c00000000049720c) i_ino=8388710 len_u32=24643 copied_u32=24643 pos_u64=434109
> 124583:fsstress 19867 [001] 178758.896719: probe:__iomap_write_end: (c00000000049720c) i_ino=8388710 len_u32=65536 copied_u32=65536 pos_u64=458752
> 125875:fsstress 19867 [001] 178758.897266: probe:__iomap_write_end: (c00000000049720c) i_ino=8388710 len_u32=38283 copied_u32=38283 pos_u64=524288
>
> The 8th 64k page now has data written to it in the file offset range [524288,
> 565247]. So there is a hole in the range [565248, 589823] mapped by the page.
>
>
> 154474:fsstress 19867 [001] 178758.947596: probe:iomap_set_range_uptodate: (c000000000495a40) index=8 i_ino=8388710 last=0xf first=0xa
>
> The above event occurs when fsstress invokes a buffered read operation at
> block 10 of the 8th page, i.e. file offset 565248. Hence the blocks mapping
> the file offset range [565248, 589823] are marked uptodate in iop->uptodate.
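For anyone wanting to double check the offsets quoted above, the arithmetic
can be sketched in userspace as below. This is only an illustration assuming
the 4k block size and 64k page size of this setup; the helper names
(offset_to_fsb() and friends) are made up for this mail and are not kernel
functions.

```c
#include <stdint.h>

#define FS_BLOCK_SIZE   4096u   /* 4k filesystem block, as in this report */
#define PAGE_SIZE_64K   65536u  /* 64k page size on this ppc64le setup */

/* Index of the filesystem block containing byte offset pos. */
static uint64_t offset_to_fsb(uint64_t pos)
{
	return pos / FS_BLOCK_SIZE;
}

/* Index of the page cache page containing byte offset pos. */
static uint64_t offset_to_page(uint64_t pos)
{
	return pos / PAGE_SIZE_64K;
}

/* Index of the block within its page (0..15 for 4k blocks in a 64k page). */
static unsigned int block_in_page(uint64_t pos)
{
	return (unsigned int)((pos % PAGE_SIZE_64K) / FS_BLOCK_SIZE);
}

/* First byte offset covered by block blk of page number page. */
static uint64_t block_start(uint64_t page, unsigned int blk)
{
	return page * PAGE_SIZE_64K + (uint64_t)blk * FS_BLOCK_SIZE;
}
```

With these, block_start(8, 0xa) gives 565248 and the last byte of block 0xf
of page 8 is 589823, which matches the first/last values reported by the
iomap_set_range_uptodate probe.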
>
>
> 238888:fsstress 19867 [001] 178759.134993: probe:xfs_map_blocks_found_new_map: (c00000000069fefc) blockcount=27 startblock=18446744073709551614 startoff=138 i_ino=8388710 offset_fsb_u64=138 imap_valid_u8=0
> 238913:fsstress 19867 [001] 178759.135003: probe:xfs_map_blocks: (c00000000069f87c) blockcount=27 startblock=18446744073709551614 startoff=138 i_ino=8388710 offset_fsb_u64=139 offset_u64=569344
> 238938:fsstress 19867 [001] 178759.135012: probe:xfs_map_blocks_imap_valid: (c00000000069f8e0) startoff=138 startblock=18446744073709551614 blockcount=27 i_ino=8388710 offset_fsb_u64=139 offset_u64=569344 imap_valid_u8=1
> 238963:fsstress 19867 [001] 178759.135054: probe:xfs_map_blocks_imap_valid_1: (c00000000069f93c) startoff=138 startblock=18446744073709551614 blockcount=27 i_ino=8388710 offset_fsb_u64=139 offset_u64=569344 imap_valid_u8=1
> 238988:fsstress 19867 [001] 178759.135065: probe:xfs_map_blocks: (c00000000069f87c) blockcount=27 startblock=18446744073709551614 startoff=138 i_ino=8388710 offset_fsb_u64=140 offset_u64=573440
>
> When writing the dirty page which maps the 4k block starting at file offset
> 565248 (138th block), we encounter a hole with HOLESTARTBLOCK (i.e. -2)
> as the value of startblock. xfs_map_blocks() assigns this imap to
> xfs_writepage_ctx->imap and returns. xfs_writepage_map() then loops once again
> since iop->uptodate has the corresponding bit set, and the next call to
> xfs_map_blocks() (offset_fsb=139) finds the cached hole mapping still valid,
> which trips the WARN_ON_ONCE().
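The sequence described above can be reproduced with a small userspace model.
It is a deliberately simplified sketch and not the kernel code: struct
imap_model, walk_page() and struct walk_result are hypothetical names, the
only extent in the model is the hole reported by the probes (startoff=138,
blockcount=27), and the COW checks of the fast path are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define HOLESTARTBLOCK  ((uint64_t)-2)  /* startblock=18446744073709551614 */
#define BLOCKS_PER_PAGE 16u             /* 4k blocks in a 64k page */

struct imap_model {
	uint64_t startoff;
	uint64_t startblock;
	uint64_t blockcount;
};

struct walk_result {
	unsigned int lookups;   /* times the slow path (extent lookup) ran */
	unsigned int warnings;  /* times the WARN_ON_ONCE() condition was true */
};

/* Same range check as imap_valid in xfs_map_blocks(). */
static bool imap_covers(const struct imap_model *m, uint64_t fsb)
{
	return fsb >= m->startoff && fsb < m->startoff + m->blockcount;
}

/*
 * Walk every uptodate block of one page, as the writeback loop is
 * described to do, keeping the last mapping cached between blocks.
 */
static struct walk_result walk_page(uint64_t page_first_fsb, uint16_t uptodate,
				    const struct imap_model *hole)
{
	struct imap_model cached = { 0, 0, 0 };  /* covers nothing yet */
	struct walk_result r = { 0, 0 };

	for (unsigned int i = 0; i < BLOCKS_PER_PAGE; i++) {
		uint64_t fsb = page_first_fsb + i;

		if (!(uptodate & (1u << i)))
			continue;

		if (imap_covers(&cached, fsb)) {
			if (cached.startblock != HOLESTARTBLOCK)
				continue;       /* fast path: reuse mapping */
			r.warnings++;           /* cached mapping is a hole */
		}

		/* Slow path: look the block up again and cache the result. */
		r.lookups++;
		if (imap_covers(hole, fsb))
			cached = *hole;
	}
	return r;
}
```

Feeding it page 8 (first block 128) with only bits 0xa..0xf set in the
uptodate mask, block 138 takes the slow path and caches the hole, and each of
the blocks 139..143 then hits the warning condition.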
>
>
IMHO, we should track the dirty blocks within the iomap_page structure and
invoke xfs_map_blocks() only on those dirty blocks.
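
To illustrate the idea, here is a rough userspace sketch. It assumes a
per-page dirty bitmap kept next to the uptodate bitmap; struct
page_state_model and the model_*() helpers are hypothetical names used only
for this example and do not reflect the actual iomap_page layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCKS_PER_PAGE 16u  /* 4k blocks in a 64k page */

/* Hypothetical per-page state: uptodate bits plus the proposed dirty bits. */
struct page_state_model {
	uint16_t uptodate;
	uint16_t dirty;
};

/* A buffered write makes blocks first..last both uptodate and dirty. */
static void model_write(struct page_state_model *ps,
			unsigned int first, unsigned int last)
{
	for (unsigned int i = first; i <= last && i < BLOCKS_PER_PAGE; i++) {
		ps->uptodate |= (uint16_t)(1u << i);
		ps->dirty |= (uint16_t)(1u << i);
	}
}

/* A buffered read only makes blocks first..last uptodate. */
static void model_read(struct page_state_model *ps,
		       unsigned int first, unsigned int last)
{
	for (unsigned int i = first; i <= last && i < BLOCKS_PER_PAGE; i++)
		ps->uptodate |= (uint16_t)(1u << i);
}

/* Writeback would ask for a mapping only if the block is dirty. */
static bool model_should_map(const struct page_state_model *ps,
			     unsigned int blk)
{
	return blk < BLOCKS_PER_PAGE && (ps->dirty & (1u << blk));
}

/* Number of blocks in the page that writeback would have to map. */
static unsigned int model_blocks_to_map(const struct page_state_model *ps)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < BLOCKS_PER_PAGE; i++)
		if (ps->dirty & (1u << i))
			n++;
	return n;
}
```

In the scenario from the trace (write to blocks 0..9 of the page, read of
blocks 0xa..0xf), all sixteen blocks end up uptodate but only the first ten
are dirty, so the blocks over the hole would never reach xfs_map_blocks().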
--
chandan