From: Liu Bo <bo.li.liu@oracle.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH v2 02/10] Btrfs: fix unexpected EEXIST from btrfs_get_extent
Date: Fri, 5 Jan 2018 12:51:09 -0700
Message-ID: <20180105195117.5131-3-bo.li.liu@oracle.com>
In-Reply-To: <20180105195117.5131-1-bo.li.liu@oracle.com>
This fixes a corner case caused by a race between a dio write and a
concurrent dio read/write.

Here is how the race can happen.

Suppose that no extent map has been loaded into memory yet. There is a
file extent [0, 32K) and two jobs are running concurrently against it:
t1 is doing a dio write to [8K, 32K) and t2 is doing a dio read from
[0, 4K) or [4K, 8K).

t1 goes ahead of t2 and splits em [0, 32K) into em [0, 8K) and [8K, 32K).

------------------------------------------------------
     t1                                t2
      btrfs_get_blocks_direct()         btrfs_get_blocks_direct()
       -> btrfs_get_extent()              -> btrfs_get_extent()
           -> lookup_extent_mapping()
           -> add_extent_mapping()            -> lookup_extent_mapping()
              # load [0, 32K)
       -> btrfs_new_extent_direct()
           -> btrfs_drop_extent_cache()
              # split [0, 32K) and
              # drop [8K, 32K)
           -> add_extent_mapping()
              # add [8K, 32K)
                                              -> add_extent_mapping()
                                                 # handle -EEXIST when adding
                                                 # [0, 32K)
------------------------------------------------------

How t2 (dio read/write) runs into -EEXIST:

a) add_extent_mapping() returns -EEXIST when adding em [0, 32K),

b) search_extent_mapping() then returns [0, 8K) as the existing em.
   Even though start == existing->start, em is [0, 32K), so
   extent_map_end(em) > extent_map_end(existing), i.e. 32K > 8K,

c) it then goes through merge_extent_mapping(), which tries to add
   [8K, 8K) (an em of length 0) and returns -EEXIST because [8K, 32K)
   is already in the tree,

d) so btrfs_get_extent() ends up returning -EEXIST to the dio
   read/write, which confuses applications.
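
To make steps a) to c) concrete, here is a rough userspace sketch of the
pre-patch branch selection, transcribed from the lines this patch removes.
The struct em_sketch type, the enum and the function names are hypothetical
stand-ins for illustration only, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct extent_map: just the fields compared. */
struct em_sketch {
	uint64_t start;
	uint64_t len;
	uint64_t block_start;
};

/* Which branch the -EEXIST handling takes. */
enum em_path {
	EM_PATH_REUSE_EXISTING,	/* free em, return existing, err = 0 */
	EM_PATH_MERGE,		/* call merge_extent_mapping() */
};

/* Exclusive end offset, analogous to extent_map_end(). */
static uint64_t em_sketch_end(const struct em_sketch *em)
{
	assert(em->len > 0);
	return em->start + em->len;
}

/* Pre-patch branch selection, mirroring the removed conditions. */
static enum em_path old_eexist_path(const struct em_sketch *existing,
				    const struct em_sketch *em,
				    uint64_t start)
{
	/* existing must cover the whole candidate em to be reused here */
	if (existing->start == em->start &&
	    em_sketch_end(existing) >= em_sketch_end(em) &&
	    em->block_start == existing->block_start)
		return EM_PATH_REUSE_EXISTING;

	/* start == existing->start also lands here because of "<=" */
	if (start >= em_sketch_end(existing) || start <= existing->start)
		return EM_PATH_MERGE;

	return EM_PATH_REUSE_EXISTING;
}
```

With existing = [0, 8K), em = [0, 32K) and start = 0, the first test fails
because 8K < 32K, and the second one sends the request down the merge path
even though start is covered by the existing em.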

Here is a summary of all the possible situations:

1) start < existing->start

            +-----------+em+-----------+
+--prev---+ |     +-------------+      |
|         | |     |             |      |
+---------+ +     +---+existing++      ++
                +
                |
                +
             start

2) start == existing->start

      +------------em------------+
      |     +-------------+      |
      |     |             |      |
      +     +----existing-+      +
            |
            |
            +
         start

3) start > existing->start && start < (existing->start + existing->len)

      +------------em------------+
      |     +-------------+      |
      |     |             |      |
      +     +----existing-+      +
               |
               |
               +
             start

4) start >= (existing->start + existing->len)

+-----------+em+-----------+
|     +-------------+      | +--next---+
|     |             |      | |         |
+     +---+existing++      + +---------+
                      +
                      |
                      +
                   start

As the diagrams show, if start falls within the existing em (start
inclusive, end exclusive), the existing em should be returned as is.
Otherwise, we try our best to merge the candidate em with its sibling
ems to form a larger em, in order to reduce the total number of ems.
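
The rule can be sketched as a single half-open range check. This is only an
illustration under simplified assumptions: struct em_range and
start_within_existing() are hypothetical names, and only the start/len
fields of an extent map are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct extent_map: only the range fields. */
struct em_range {
	uint64_t start;
	uint64_t len;
};

/* Exclusive end offset of a range, analogous to extent_map_end(). */
static uint64_t em_range_end(const struct em_range *r)
{
	assert(r->len > 0);
	return r->start + r->len;
}

/*
 * True when start lies in [existing->start, end of existing), i.e.
 * cases 2) and 3): the existing em can be returned as is.
 * False for cases 1) and 4), where the candidate em has to be merged
 * with its sibling ems instead.
 */
static bool start_within_existing(const struct em_range *existing,
				  uint64_t start)
{
	return start >= existing->start && start < em_range_end(existing);
}
```

For an existing em of [8K, 16K), start = 4K is case 1), start = 8K is
case 2), start = 12K is case 3) and start = 16K is case 4); only 8K and 12K
satisfy the check.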
Reported-by: David Vallender <david.vallender@landmark.co.uk>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
v2: Improve commit log to provide more details about the bug.
fs/btrfs/inode.c | 17 +++--------------
1 file changed, 3 insertions(+), 14 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2784bb3..a270fe2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7162,19 +7162,12 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 		 * existing will always be non-NULL, since there must be
 		 * extent causing the -EEXIST.
 		 */
-		if (existing->start == em->start &&
-		    extent_map_end(existing) >= extent_map_end(em) &&
-		    em->block_start == existing->block_start) {
-			/*
-			 * The existing extent map already encompasses the
-			 * entire extent map we tried to add.
-			 */
+		if (start >= existing->start &&
+		    start < extent_map_end(existing)) {
 			free_extent_map(em);
 			em = existing;
 			err = 0;
-
-		} else if (start >= extent_map_end(existing) ||
-		    start <= existing->start) {
+		} else {
 			/*
 			 * The existing extent map is the one nearest to
 			 * the [start, start + len) range which overlaps
@@ -7186,10 +7179,6 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode,
 				free_extent_map(em);
 				em = NULL;
 			}
-		} else {
-			free_extent_map(em);
-			em = existing;
-			err = 0;
 		}
 	}
 	write_unlock(&em_tree->lock);
--
2.9.4