From: Liu Bo <bo.li.liu@oracle.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] Btrfs: deal with existing encompassing extent map in btrfs_get_extent()
Date: Thu, 10 Nov 2016 12:01:20 -0800
Message-ID: <20161110195718.GA22740@localhost.localdomain>
In-Reply-To: <262a1e171d091626edbd23c637cb138ba9d84ed8.1478733376.git.osandov@fb.com>

On Wed, Nov 09, 2016 at 03:26:50PM -0800, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> My QEMU VM was seeing inexplicable I/O errors that I tracked down to
> errors coming from the qcow2 virtual drive in the host system. The qcow2
> file is a nocow file on my Btrfs drive, which QEMU opens with O_DIRECT.
> Every once in a while, pread() or pwrite() would return EEXIST, which
> makes no sense. This turned out to be a bug in btrfs_get_extent().
> 
> Commit 8dff9c853410 ("Btrfs: deal with duplciates during extent_map
> insertion in btrfs_get_extent") fixed a case in btrfs_get_extent() where
> two threads race on adding the same extent map to an inode's extent map
> tree. However, if the added em is merged with an adjacent em in the
> extent tree, then we'll end up with an existing extent that is not
> identical to but instead encompasses the extent we tried to add. When we
> call merge_extent_mapping() to find the nonoverlapping part of the new
> em, the arithmetic overflows because there is no such thing. We then end
> up trying to add a bogus em to the em_tree, which results in a EEXIST
> that can bubble all the way up to userspace.
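
The overflow described above can be illustrated with a small userspace
sketch. It is not the kernel code: struct em_model, em_model_end() and
trim_after_prev() are hypothetical names, and the clamping is only
assumed to resemble what merge_extent_mapping() does when the existing
em sits at or before the new one and nothing follows it.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct extent_map. */
struct em_model {
	uint64_t start;
	uint64_t len;
};

/* Exclusive end offset of a range. */
static uint64_t em_model_end(const struct em_model *em)
{
	return em->start + em->len;
}

/*
 * Trim new_em down to the part that lies after prev, assuming no other
 * extent map follows. If prev already covers all of new_em, there is no
 * such part: start ends up past end and the unsigned subtraction wraps.
 */
static void trim_after_prev(const struct em_model *prev,
			    struct em_model *new_em)
{
	uint64_t start = em_model_end(prev);
	uint64_t end = em_model_end(new_em);

	if (start < new_em->start)
		start = new_em->start;

	new_em->start = start;
	new_em->len = end - start;	/* wraps when start > end */
}
```

With existing = { 0, 8192 } and a new em of { 0, 4096 }, this leaves the
new em with start 8192 and a len of 2^64 - 4096, i.e. a bogus range of
the kind the commit message describes.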

I don't get how this could happen (even after reading commit
8dff9c853410). btrfs_get_extent() in the direct I/O path is protected
by lock_extent_direct(), so the assumption is that a racing thread is
blocked in lock_extent_direct() and, once it gets the lock, finds the
just-inserted em when it goes into btrfs_get_extent(), provided its
offset is within [em->start, extent_map_end(em)].

Besides fixing another special case, I think we also need to figure
out why the above doesn't work as expected.

Thanks,

-liubo

> 
> Fix it by extending the identical extent map special case.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
> Applies to 4.9-rc4.
> 
> Here [1] is a reproducer for this bug that doesn't involve firing up a
> QEMU VM. Also, a big shoutout to BCC [2] and BPF for making it possible
> to debug this on my laptop without compiling a custom kernel and
> rebooting just to add printks [3].
> 
> 1: https://gist.github.com/osandov/d08aabe5d4dec15517e9fde17012fd3b
> 2: https://github.com/iovisor/bcc
> 3: https://gist.github.com/osandov/eb1db868ce10c3af9e00b90f3a65bf9f
> 
>  fs/btrfs/inode.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 2b790bd..e5cf589 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7049,11 +7049,11 @@ struct extent_map *btrfs_get_extent(struct inode *inode, struct page *page,
>  		 * extent causing the -EEXIST.
>  		 */
>  		if (existing->start == em->start &&
> -		    extent_map_end(existing) == extent_map_end(em) &&
> +		    extent_map_end(existing) >= extent_map_end(em) &&
>  		    em->block_start == existing->block_start) {
>  			/*
> -			 * these two extents are the same, it happens
> -			 * with inlines especially
> +			 * The existing extent map already encompasses the
> +			 * entire extent map we tried to add.
>  			 */
>  			free_extent_map(em);
>  			em = existing;
> -- 
> 2.10.2
> 
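
As a rough illustration of what the one-character change in the hunk
above does, the two conditions can be compared in a standalone sketch.
struct em_sketch and the helper names are hypothetical and carry only
the three fields the check reads; the real struct extent_map has more.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct extent_map. */
struct em_sketch {
	uint64_t start;
	uint64_t len;
	uint64_t block_start;
};

/* Exclusive end offset of a range. */
static uint64_t em_sketch_end(const struct em_sketch *em)
{
	return em->start + em->len;
}

/* Condition before the patch: existing and em must be identical. */
static bool existing_is_identical(const struct em_sketch *existing,
				  const struct em_sketch *em)
{
	return existing->start == em->start &&
	       em_sketch_end(existing) == em_sketch_end(em) &&
	       em->block_start == existing->block_start;
}

/* Condition after the patch: existing may also extend past em. */
static bool existing_covers(const struct em_sketch *existing,
			    const struct em_sketch *em)
{
	return existing->start == em->start &&
	       em_sketch_end(existing) >= em_sketch_end(em) &&
	       em->block_start == existing->block_start;
}
```

For an existing em of { 0, 8192, X } and a new em of { 0, 4096, X }, the
first check fails and the second succeeds, so with the patch the new em
would be dropped in favor of the existing one instead of being trimmed.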
