From: Filipe Manana <fdmanana@suse.com>
To: linux-btrfs@vger.kernel.org
Cc: Filipe Manana <fdmanana@suse.com>
Subject: [PATCH] Btrfs: fix hole detection during file fsync
Date: Wed, 6 Aug 2014 15:33:12 +0100 [thread overview]
Message-ID: <1407335592-27897-1-git-send-email-fdmanana@suse.com> (raw)
The file hole detection logic during a file fsync wasn't correct,
because it didn't look back (in a previous leaf) for the last file
extent item that can be in a leaf to the left of our leaf and that
has a generation lower than the current transaction id. This made it
assume that a hole exists when it really doesn't exist in the file.
Such false positive hole detection happens in the following scenario:
* We have a file that has many file extent items, covering 3 or more
btree leafs (the first leaf must contain non file extent items too).
* Two ranges of the file are modified, with their extent items being
located at 2 different leafs and those leafs aren't consecutive.
* When processing the second leaf, we weren't checking if some file
extent item exists that is located in some leaf that is between
our 2 leafs, and therefore assumed the range defined between the
last file extent item in first leaf and the first file extent item
in the second leaf matched a hole.
Fortunately this didn't result in overriding the log with wrong data,
instead it made the last loop in copy_items() attempt to insert a
duplicated key (for a hole file extent item), which makes the file
fsync code return with -EEXIST to file.c:btrfs_sync_file() which in
turn ends up doing a full transaction commit.
I could trigger this issue with the following test for xfstests (which
never fails, either without or with this patch). The last fsync call
results in a full transaction commit, due to the -EEXIST error mentioned
above. I could also observe this behaviour happening frequently when
running xfstests/generic/075 in a loop.
Test:
_cleanup()
{
_cleanup_flakey
rm -fr $tmp
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
. ./common/dmflakey
# real QA test starts here
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_dm_flakey
_need_to_be_root
rm -f $seqres.full
# Create a file with many file extent items, each representing a 4Kb extent.
# These items span 3 btree leaves, of 16Kb each (default mkfs.btrfs leaf size
# as of btrfs-progs 3.12).
_scratch_mkfs -l 16384 >/dev/null 2>&1
_init_flakey
SAVE_MOUNT_OPTIONS="$MOUNT_OPTIONS"
MOUNT_OPTIONS="$MOUNT_OPTIONS -o commit=999"
_mount_flakey
# First fsync, inode has BTRFS_INODE_NEEDS_FULL_SYNC flag set.
$XFS_IO_PROG -f -c "pwrite -S 0x01 -b 4096 0 4096" -c "fsync" \
$SCRATCH_MNT/foo | _filter_xfs_io
# For any of the following fsync calls, inode doesn't have the flag
# BTRFS_INODE_NEEDS_FULL_SYNC set.
for ((i = 1; i <= 500; i++)); do
OFFSET=$((4096 * i))
LEN=4096
$XFS_IO_PROG -c "pwrite -S 0x01 $OFFSET $LEN" -c "fsync" \
$SCRATCH_MNT/foo | _filter_xfs_io
done
# Commit transaction and bump next transaction's id (to 7).
sync
# Truncate will set the BTRFS_INODE_NEEDS_FULL_SYNC flag in the btrfs's
# inode runtime flags.
$XFS_IO_PROG -c "truncate 2048000" $SCRATCH_MNT/foo
# Commit transaction and bump next transaction's id (to 8).
sync
# Touch 1 extent item from the first leaf and 1 from the last leaf. The leaf
# in the middle, containing only file extent items, isn't touched. So the
# next fsync, when calling btrfs_search_forward(), won't visit that middle
# leaf. First and 3rd leaf have generation 6, while middle one has generation 8.
$XFS_IO_PROG \
-c "pwrite -S 0xee -b 4096 0 4096" \
-c "pwrite -S 0xff -b 4096 2043904 4096" \
-c "fsync" \
$SCRATCH_MNT/foo | _filter_xfs_io
_load_flakey_table $FLAKEY_DROP_WRITES
md5sum $SCRATCH_MNT/foo | _filter_scratch
_unmount_flakey
_load_flakey_table $FLAKEY_ALLOW_WRITES
# During mount, we'll replay the log created by the fsync above, and the file's
# md5 digest should be the same we got before the unmount.
_mount_flakey
md5sum $SCRATCH_MNT/foo | _filter_scratch
_unmount_flakey
MOUNT_OPTIONS="$SAVE_MOUNT_OPTIONS"
status=0
exit
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/tree-log.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index df332dd..5a917a6 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3296,7 +3296,7 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
struct list_head ordered_sums;
int skip_csum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
bool has_extents = false;
- bool need_find_last_extent = (*last_extent == 0);
+ bool need_find_last_extent = true;
bool done = false;
INIT_LIST_HEAD(&ordered_sums);
@@ -3350,8 +3350,7 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
*/
if (ins_keys[i].type == BTRFS_EXTENT_DATA_KEY) {
has_extents = true;
- if (need_find_last_extent &&
- first_key.objectid == (u64)-1)
+ if (first_key.objectid == (u64)-1)
first_key = ins_keys[i];
} else {
need_find_last_extent = false;
@@ -3425,6 +3424,16 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
if (!has_extents)
return ret;
+ if (need_find_last_extent && *last_extent == first_key.offset) {
+ /*
+ * We don't have any leafs between our current one and the one
+ * we processed before that can have file extent items for our
+ * inode (and have a generation number smaller than our current
+ * transaction id).
+ */
+ need_find_last_extent = false;
+ }
+
/*
* Because we use btrfs_search_forward we could skip leaves that were
* not modified and then assume *last_extent is valid when it really
@@ -3535,7 +3544,7 @@ fill_holes:
0, 0);
if (ret)
break;
- *last_extent = offset + len;
+ *last_extent = extent_end;
}
/*
* Need to let the callers know we dropped the path so they should
--
1.9.1
next reply other threads:[~2014-08-06 13:34 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-08-06 14:33 Filipe Manana [this message]
2014-08-07 11:00 ` [PATCH v2] Btrfs: fix hole detection during file fsync Filipe Manana
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1407335592-27897-1-git-send-email-fdmanana@suse.com \
--to=fdmanana@suse.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).