From: Chris Mason <clm@fb.com>
To: Filipe Manana <fdmanana@suse.com>
Cc: <linux-btrfs@vger.kernel.org>, Filipe Manana <fdmanana@suse.com>
Subject: Re: [PATCH v4] Btrfs: fix snapshot inconsistency after a file write followed by truncate
Date: Wed, 29 Oct 2014 08:19:58 -0400 [thread overview]
Message-ID: <1414585198.2166.6@mail.thefacebook.com> (raw)
In-Reply-To: <1414583879-10913-1-git-send-email-fdmanana@suse.com>
On Wed, Oct 29, 2014 at 7:57 AM, Filipe Manana <fdmanana@suse.com>
wrote:
> If right after starting the snapshot creation ioctl we perform a
> write against a
> file followed by a truncate, with both operations increasing the
> file's size, we
> can get a snapshot tree that reflects a state of the source
> subvolume's tree where
> the file truncation happened but the write operation didn't. This
> leaves a gap
> between 2 file extent items of the inode, which makes btrfs' fsck
> complain about it.
>
> For example, if we perform the following file operations:
>
> $ mkfs.btrfs -f /dev/vdd
> $ mount /dev/vdd /mnt
> $ xfs_io -f \
> -c "pwrite -S 0xaa -b 32K 0 32K" \
> -c "fsync" \
> -c "pwrite -S 0xbb -b 32770 16K 32770" \
> -c "truncate 90123" \
> /mnt/foobar
>
> and the snapshot creation ioctl was just called before the second
> write, we often
> can get the following inode items in the snapshot's btree:
>
> item 120 key (257 INODE_ITEM 0) itemoff 7987 itemsize 160
> inode generation 146 transid 7 size 90123 block group
> 0 mode 100600 links 1 uid 0 gid 0 rdev 0 flags 0x0
> item 121 key (257 INODE_REF 256) itemoff 7967 itemsize 20
> inode ref index 282 namelen 10 name: foobar
> item 122 key (257 EXTENT_DATA 0) itemoff 7914 itemsize 53
> extent data disk byte 1104855040 nr 32768
> extent data offset 0 nr 32768 ram 32768
> extent compression 0
> item 123 key (257 EXTENT_DATA 53248) itemoff 7861 itemsize 53
> extent data disk byte 0 nr 0
> extent data offset 0 nr 40960 ram 40960
> extent compression 0
>
> There's a file range, corresponding to the interval [32K; ALIGN(16K +
> 32770, 4096)[
> for which there's no file extent item covering it. This is because
> the file write
> and file truncate operations happened both right after the snapshot
> creation ioctl
> called btrfs_start_delalloc_inodes(), which means we didn't start and
> wait for the
> ordered extent that matches the write and, in btrfs_setsize(), we
> were able to call
> btrfs_cont_expand() before being able to commit the current
> transaction in the
> snapshot creation ioctl. So this made it possibe to insert the hole
> file extent
> item in the source subvolume (which represents the region added by
> the truncate)
> right before the transaction commit from the snapshot creation ioctl.
>
> Btrfs' fsck tool complains about such cases with a message like the
> following:
>
> "root 331 inode 257 errors 100, file extent discount"
>
> From a user perspective, the expectation when a snapshot is created
> while those
> file operations are being performed is that the snapshot will have a
> file that
> either:
>
> 1) is empty
> 2) only the first write was captured
> 3) only the 2 writes were captured
> 4) both writes and the truncation were captured
>
> But never capture a state where only the first write and the
> truncation were
> captured (since the second write was performed before the truncation).
>
> A test case for xfstests follows.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>
> V2: Use different approach to solve the problem. Don't start and wait
> for all
> dellaloc to finish after every expanding truncate, instead add an
> additional
> flush at transaction commit time if we're doing a transaction
> commit that
> creates snapshots.
>
> V3: Removed useless test condition in
> +wait_pending_snapshot_roots_delalloc().
>
> V4: Use another approach that doesn't imply starting delalloc work
> and wait
> for it to finish at transaction commit time.
I like this one better ;) Taking it for a spin here.
-chris
prev parent reply other threads:[~2014-10-29 12:20 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-21 10:12 [PATCH] Btrfs: fix snapshot inconsistency after a file write followed by truncate Filipe Manana
2014-10-27 19:57 ` Chris Mason
2014-10-28 10:10 ` Filipe David Manana
2014-10-29 0:35 ` [PATCH v2] " Filipe Manana
2014-10-29 8:21 ` [PATCH v3] " Filipe Manana
2014-10-29 9:13 ` Miao Xie
2014-10-29 11:57 ` [PATCH v4] " Filipe Manana
2014-10-29 12:19 ` Chris Mason [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1414585198.2166.6@mail.thefacebook.com \
--to=clm@fb.com \
--cc=fdmanana@suse.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.