Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Liu Bo <bo.li.liu@oracle.com>
To: Nikolay Borisov <nborisov@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 07/10] Btrfs: fix unexpected EEXIST from btrfs_get_extent
Date: Fri, 22 Dec 2017 11:19:26 -0800	[thread overview]
Message-ID: <20171222191926.GE21643@lim.localdomain> (raw)
In-Reply-To: <a75a5e50-c65a-93e5-5914-b9442051f463@suse.com>

On Fri, Dec 22, 2017 at 02:10:45PM +0200, Nikolay Borisov wrote:
> 
> 
> On 22.12.2017 00:42, Liu Bo wrote:
> > This fixes a corner case that is caused by a race of dio write vs dio
> > read/write.
> > 
> > dio write:
> > [0, 32k) -> [0, 8k) + [8k, 32k)
> > 
> > dio read/write:
> > 
> > While get_extent() with [0, 4k), [0, 8k) is found as existing em, even
> > though start == existing->start, em is [0, 32k),
> > extent_map_end(em) > extent_map_end(existing),
> > then it goes thru merge_extent_mapping() which tries to add a [8k, 8k)
> > (with a length 0), and get_extent ends up returning -EEXIST, and dio
> > read/write will get -EEXIST which is confusing applications.
> 
> I think this paragraph should be expanded a bit since you've crammed a
> lot of information in few words.
> 

OK, it's not easy to explain the problem, either.

Probably I should also attach the following as a extra explanation,

+ * Suppose that no extent map has been loaded into memory yet.
+ * There is a file extent [0, 32K), two jobs are running concurrently
+ * against it, t1 is doing dio write to [8K, 32K) and t2 is doing dio
+ * read from [0, 4K) or [4K, 8K).
+ *
+ * t1 goes ahead of t2 and splits em [0, 32K) to em [0K, 8K) and [8K 32K).
+ *
+ *         t1                                t2
+ *  btrfs_get_blocks_direct()         btrfs_get_blocks_direct()
+ *   -> btrfs_get_extent()              -> btrfs_get_extent()
+ *       -> lookup_extent_mapping()
+ *       -> add_extent_mapping()            -> lookup_extent_mapping()
+ *          # load [0, 32K)
+ *   -> btrfs_new_extent_direct()
+ *       -> btrfs_drop_extent_cache()
+ *          # split [0, 32K)
+ *       -> add_extent_mapping()
+ *          # add [8K, 32K)
+ *                                          -> add_extent_mapping()
+ *                                             # handle -EEXIST when adding
+ *                                             # [0, 32K)


> In the below graphs em should be extent we'd like to insert, no?
> So given that it always encompasses the existing (0, 8k given the
> above description) then em should be 0, 32k ?

Yes and yes.

thanks,
-liubo

> > 
> > Here I concluded all the possible situations,
> > 1) start < existing->start
> > 
> >             +-----------+em+-----------+
> > +--prev---+ |     +-------------+      |
> > |         | |     |             |      |
> > +---------+ +     +---+existing++      ++
> >                 +
> >                 |
> >                 +
> >              start
> > 
> > 2) start == existing->start
> > 
> >       +------------em------------+
> >       |     +-------------+      |
> >       |     |             |      |
> >       +     +----existing-+      +
> >             |
> >             |
> >             +
> >          start
> > 
> > 3) start > existing->start && start < (existing->start + existing->len)
> > 
> >       +------------em------------+
> >       |     +-------------+      |
> >       |     |             |      |
> >       +     +----existing-+      +
> >                |
> >                |
> >                +
> >              start
> > 
> > 4) start >= (existing->start + existing->len)
> > 
> > +-----------+em+-----------+
> > |     +-------------+      | +--next---+
> > |     |             |      | |         |
> > +     +---+existing++      + +---------+
> >                       +
> >                       |
> >                       +
> >                    start
> > 
> > After going thru the above case by case, it turns out that if start is
> > within existing em (front inclusive), then the existing em should be
> > returned, otherwise, we try our best to merge candidate em with
> > sibling ems to form a larger em.
> > 
> > Reported-by: David Vallender <david.vallender@landmark.co.uk>
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/extent_map.c | 25 ++++++++++---------------
> >  1 file changed, 10 insertions(+), 15 deletions(-)
> > 
> > diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> > index 6653b08..d386cfb 100644
> > --- a/fs/btrfs/extent_map.c
> > +++ b/fs/btrfs/extent_map.c
> > @@ -483,7 +483,7 @@ static struct extent_map *prev_extent_map(struct extent_map *em)
> >  static int merge_extent_mapping(struct extent_map_tree *em_tree,
> >  				struct extent_map *existing,
> >  				struct extent_map *em,
> > -				u64 map_start)
> > +				u64 map_start, u64 map_len)
> >  {
> >  	struct extent_map *prev;
> >  	struct extent_map *next;
> > @@ -496,9 +496,13 @@ static int merge_extent_mapping(struct extent_map_tree *em_tree,
> >  	if (existing->start > map_start) {
> >  		next = existing;
> >  		prev = prev_extent_map(next);
> > +		if (prev)
> > +			ASSERT(extent_map_end(prev) <= map_start);
> >  	} else {
> >  		prev = existing;
> >  		next = next_extent_map(prev);
> > +		if (next)
> > +			ASSERT(map_start + map_len <= next->start);
> >  	}
> >  
> >  	start = prev ? extent_map_end(prev) : em->start;
> > @@ -540,35 +544,26 @@ int btrfs_add_extent_mapping(struct extent_map_tree *em_tree,
> >  		 * existing will always be non-NULL, since there must be
> >  		 * extent causing the -EEXIST.
> >  		 */
> > -		if (existing->start == em->start &&
> > -		    extent_map_end(existing) >= extent_map_end(em) &&
> > -		    em->block_start == existing->block_start) {
> > -			/*
> > -			 * The existing extent map already encompasses the
> > -			 * entire extent map we tried to add.
> > -			 */
> > +		if (start >= existing->start &&
> > +		    start < extent_map_end(existing)) {
> >  			free_extent_map(em);
> >  			*em_in = existing;
> >  			ret = 0;
> > -		} else if (start >= extent_map_end(existing) ||
> > -		    start <= existing->start) {
> > +		} else {
> >  			/*
> >  			 * The existing extent map is the one nearest to
> >  			 * the [start, start + len) range which overlaps
> >  			 */
> >  			ret = merge_extent_mapping(em_tree, existing,
> > -						   em, start);
> > +						   em, start, len);
> >  			free_extent_map(existing);
> >  			if (ret) {
> >  				free_extent_map(em);
> >  				*em_in = NULL;
> >  			}
> > -		} else {
> > -			free_extent_map(em);
> > -			*em_in = existing;
> > -			ret = 0;
> >  		}
> >  	}
> > +
> >  	ASSERT(ret == 0 || ret == -EEXIST);
> >  	return ret;
> >  }
> > 

  reply	other threads:[~2017-12-22 19:25 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-21 22:42 [PATCH 00/10] bugfixes and regression tests of btrfs_get_extent Liu Bo
2017-12-21 22:42 ` [PATCH 01/10] Btrfs: add helper for em merge logic Liu Bo
2017-12-22  7:23   ` Nikolay Borisov
2017-12-22 18:45     ` Liu Bo
2017-12-21 22:42 ` [PATCH 02/10] Btrfs: move extent map specific code to extent_map.c Liu Bo
2017-12-21 22:42 ` [PATCH 03/10] Btrfs: add extent map selftests Liu Bo
2017-12-22  7:42   ` Nikolay Borisov
2017-12-22 18:53     ` Liu Bo
2017-12-21 22:42 ` [PATCH 04/10] Btrfs: extent map selftest: buffered write vs dio read Liu Bo
2017-12-22  7:51   ` Nikolay Borisov
2017-12-22 18:58     ` Liu Bo
2017-12-21 22:42 ` [PATCH 05/10] Btrfs: extent map selftest: dio " Liu Bo
2017-12-21 22:42 ` [PATCH 06/10] Btrfs: fix incorrect block_len in merge_extent_mapping Liu Bo
2017-12-21 22:42 ` [PATCH 07/10] Btrfs: fix unexpected EEXIST from btrfs_get_extent Liu Bo
2017-12-22 12:10   ` Nikolay Borisov
2017-12-22 19:19     ` Liu Bo [this message]
2017-12-21 22:42 ` [PATCH 08/10] Btrfs: add WARN_ONCE to detect unexpected error from merge_extent_mapping Liu Bo
2017-12-22  8:52   ` Nikolay Borisov
2017-12-21 22:42 ` [PATCH 09/10] Btrfs: add tracepoint for em's EEXIST case Liu Bo
2017-12-22  8:56   ` Nikolay Borisov
2017-12-22 19:07     ` Liu Bo
2017-12-23  7:39       ` Nikolay Borisov
2018-01-02 18:02         ` Liu Bo
2017-12-21 22:42 ` [PATCH 10/10] Btrfs: noinline merge_extent_mapping Liu Bo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171222191926.GE21643@lim.localdomain \
    --to=bo.li.liu@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=nborisov@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox