From: David Sterba <dsterba@suse.cz>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v3] Btrfs: fix data loss after inode eviction, renaming it, and fsync it
Date: Tue, 11 Jun 2019 19:02:15 +0200 [thread overview]
Message-ID: <20190611170215.GF3563@twin.jikos.cz> (raw)
In-Reply-To: <20190607102524.32655-1-fdmanana@kernel.org>
On Fri, Jun 07, 2019 at 11:25:24AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When we log an inode, regardless of logging it completely or only that it
> exists, we always update it as logged (logged_trans and last_log_commit
> fields of the inode are updated). This is generally fine and avoids future
> attempts to log it from having to do repeated work that brings no value.
>
> However, if we write data to a file, then evict its inode after all the
> dealloc was flushed (and ordered extents completed), rename the file and
> fsync it, we end up not logging the new extents, since the rename may
> result in logging that the inode exists in case the parent directory was
> logged before. The following reproducer shows and explains how this can
> happen:
>
> $ mkfs.btrfs -f /dev/sdb
> $ mount /dev/sdb /mnt
>
> $ mkdir /mnt/dir
> $ touch /mnt/dir/foo
> $ touch /mnt/dir/bar
>
> # Do a direct IO write instead of a buffered write because with a
> # buffered write we would need to make sure dealloc gets flushed and
> # complete before we do the inode eviction later, and we can not do that
> # from user space with call to things such as sync(2) since that results
> # in a transaction commit as well.
> $ xfs_io -d -c "pwrite -S 0xd3 0 4K" /mnt/dir/bar
>
> # Keep the directory dir in use while we evict inodes. We want our file
> # bar's inode to be evicted but we don't want our directory's inode to
> # be evicted (if it were evicted too, we would not be able to reproduce
> # the issue since the first fsync below, of file foo, would result in a
> # transaction commit.
> $ ( cd /mnt/dir; while true; do :; done ) &
> $ pid=$!
>
> # Wait a bit to give time for the background process to chdir.
> $ sleep 0.1
>
> # Evict all inodes, except the inode for the directory dir because it is
> # currently in use by our background process.
> $ echo 2 > /proc/sys/vm/drop_caches
>
> # fsync file foo, which ends up persisting information about the parent
> # directory because it is a new inode.
> $ xfs_io -c fsync /mnt/dir/foo
>
> # Rename bar, this results in logging that this inode exists (inode item,
> # names, xattrs) because the parent directory is in the log.
> $ mv /mnt/dir/bar /mnt/dir/baz
>
> # Now fsync baz, which ends up doing absolutely nothing because of the
> # rename operation which logged that the inode exists only.
> $ xfs_io -c fsync /mnt/dir/baz
>
> <power failure>
>
> $ mount /dev/sdb /mnt
> $ od -t x1 -A d /mnt/dir/baz
> 0000000
>
> --> Empty file, data we wrote is missing.
>
> Fix this by not updating last_sub_trans of an inode when we are logging
> only that it exists and the inode was not yet logged since it was loaded
> from disk (full_sync bit set), this is enough to make btrfs_inode_in_log()
> return false for this scenario and make us log the inode. The logged_trans
> of the inode is still always setsince that alone is used to track if names
> need to be deleted as part of unlink operations.
>
> Fixes: 257c62e1bce03e ("Btrfs: avoid tree log commit when there are no changes")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Added to misc-next, thanks.
prev parent reply other threads:[~2019-06-11 17:01 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-06 11:07 [PATCH] Btrfs: fix data loss after inode eviction, renaming it, and fsync it fdmanana
2019-06-06 13:19 ` [PATCH v2] " fdmanana
2019-06-07 10:25 ` [PATCH v3] " fdmanana
2019-06-11 17:02 ` David Sterba [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190611170215.GF3563@twin.jikos.cz \
--to=dsterba@suse.cz \
--cc=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox