From: Josef Bacik <josef@toxicpanda.com>
To: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: fix race between shrinking truncate and fiemap
Date: Fri, 7 Feb 2020 11:19:38 -0500 [thread overview]
Message-ID: <749c0bca-b740-e163-e937-94277efd3d4a@toxicpanda.com> (raw)
In-Reply-To: <20200207122309.17209-1-fdmanana@kernel.org>
On 2/7/20 7:23 AM, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When there is a fiemap executing in parallel with a shrinking truncate we
> can end up in a situation where we have extent maps for which we no longer
> have corresponding file extent items. This is generally harmless and at
> the moment the only consequences are missing file extent items representing
> holes after we expand the file size again after the truncate operation
> removed the prealloc extent items, and stale information for future fiemap
> calls (reporting extents that no longer exist or may have been reallocated
> to other files for example).
>
> Consider the following example:
>
> 1) Our inode has a size of 128Kb, one 128Kb extent at file offset 0 and
> a 1Mb prealloc extent at file offset 128Kb;
>
> 2) Task A starts doing a shrinking truncate of our inode to reduce it to
> a size of 64Kb. Before it searches the subvolume tree for file extent
> items to delete, it drops all the extent maps in the range from 64Kb
> to (u64)-1 by calling btrfs_drop_extent_cache();
>
> 3) Task B starts doing a fiemap against our inode. When looking up for the
> inode's extent maps in the range from 128Kb to (u64)-1, it doesn't find
> any in the inode's extent map tree, since they were removed by task A.
> Because it didn't find any in the extent map tree, it scans the inode's
> subvolume tree for file extent items, and it finds the 1Mb prealloc
> extent at file offset 128Kb, then it creates an extent map based on
> that file extent item and adds it to inode's extent map tree (this ends
> up being done by btrfs_get_extent() <- btrfs_get_extent_fiemap() <-
> get_extent_skip_holes());
>
> 4) Task A then drops the prealloc extent at file offset 128Kb and shrinks
> the 128Kb extent file offset 0 to a length of 64Kb. The truncation
> operation finishes and we end up with an extent map representing a 1Mb
> prealloc extent at file offset 128Kb, despite we don't have any more
> that extent;
>
> After this the two types of problems we have are:
>
> 1) Future calls to fiemap always report that a 1Mb prealloc extent exists
> at file offset 128Kb. This is stale information, no longer correct;
>
> 2) If the size of the file is increased, by a truncate operation that
> increases the file size or by a write into a file offset > 64Kb for
> example, we end up not inserting file extent items to represent
> holes for any range between 128Kb and 128Kb + 1Mb, since the hole
> expansion function, btrfs_cont_expand() will skip hole insertion
> for any range for which an extent map exists that represents a
> prealloc extent. This causes fsck to complain about missing file
> extent items when not using the NO_HOLES feature.
>
> The second issue could be often triggered by test case generic/561 from
> fstests, which runs fsstress and duperemove in parallel, and duperemove
> does frequent fiemap calls.
>
> Essentially the problems happens because fiemap does not acquire the
> inode's lock while truncate does, and fiemap locks the file range in the
> inode's iotree while truncate does not. So fix the issue by making
> btrfs_truncate_inode_items() lock the file range from the new file size
> to (u64)-1, so that it serializes with fiemap.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
I've been wanting to do this for years anyway. We should probably go and see if
there's anywhere else we're modifying file extents without the extent lock in place.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
next prev parent reply other threads:[~2020-02-07 16:19 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-07 12:23 [PATCH] Btrfs: fix race between shrinking truncate and fiemap fdmanana
2020-02-07 16:19 ` Josef Bacik [this message]
2020-02-11 19:45 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=749c0bca-b740-e163-e937-94277efd3d4a@toxicpanda.com \
--to=josef@toxicpanda.com \
--cc=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox