From: Omar Sandoval <osandov@osandov.com>
To: "Holger Hoffstätte" <holger@applied-asynchrony.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] Btrfs: deal with existing encompassing extent map in btrfs_get_extent()
Date: Thu, 10 Nov 2016 08:20:34 -0800
Message-ID: <20161110162034.GA2847@vader>
In-Reply-To: <582499E8.5090504@applied-asynchrony.com>
On Thu, Nov 10, 2016 at 05:01:44PM +0100, Holger Hoffstätte wrote:
> On 11/10/16 16:37, Omar Sandoval wrote:
> > On Thu, Nov 10, 2016 at 04:11:35PM +0100, Holger Hoffstätte wrote:
> >> On 11/10/16 00:26, Omar Sandoval wrote:
> >>> From: Omar Sandoval <osandov@fb.com>
> >>>
> >>> My QEMU VM was seeing inexplicable I/O errors that I tracked down to
> >>> errors coming from the qcow2 virtual drive in the host system. The qcow2
> >>> file is a nocow file on my Btrfs drive, which QEMU opens with O_DIRECT.
> >>> Every once in a while, pread() or pwrite() would return EEXIST, which
> >>> makes no sense. This turned out to be a bug in btrfs_get_extent().
> >>>
> >>> Commit 8dff9c853410 ("Btrfs: deal with duplciates during extent_map
> >>> insertion in btrfs_get_extent") fixed a case in btrfs_get_extent() where
> >>> two threads race on adding the same extent map to an inode's extent map
> >>> tree. However, if the added em is merged with an adjacent em in the
> >>> extent tree, then we'll end up with an existing extent that is not
> >>> identical to but instead encompasses the extent we tried to add. When we
> >>> call merge_extent_mapping() to find the nonoverlapping part of the new
> >>> em, the arithmetic overflows because there is no such thing. We then end
> >>> up trying to add a bogus em to the em_tree, which results in an EEXIST
> >>> that can bubble all the way up to userspace.
> >>>
> >>> Fix it by extending the identical extent map special case.
> >>>
> >>> Signed-off-by: Omar Sandoval <osandov@fb.com>
> >>> ---
> >>> Applies to 4.9-rc4.
> >>>
> >>> Here [1] is a reproducer for this bug that doesn't involve firing up a
> >>> QEMU VM. Also, a big shoutout to BCC [2] and BPF for making it possible
> >>> to debug this on my laptop without compiling a custom kernel and
> >>> rebooting just to add printks [3].
> >>>
> >>> 1: https://gist.github.com/osandov/d08aabe5d4dec15517e9fde17012fd3b
> >>
> >> I can't really make this reproducer fail. It builds and runs fine, but just
> >> exits with no messages (other than the one about drop_caches in dmesg).
> >> It creates the 1MB file and always returns 0. Ideas?
> >>
> >> -h
> >
> > It's a race condition, so it doesn't happen 100% of the time. I imagine
> > it depends on the storage speed, as well. On my laptop, which is
> > dm-crypt on top of an SSD, it works about 50% of the time. Could you
> > just try running it 100 times or something and see if it fails?
>
> $ for i ($(seq 1 1000)) ./pread_eexist_repro /mnt/test/$i || echo "fail"
>
> ..couple of thousand runs without problem, only lots of fallocating and
> cache dropping.
>
> Oh well, I tried. :)
>
> -h
Just out of curiosity, what kind of disk were you trying this on? I've
only been able to trigger it on my laptop and a VM running on my laptop.
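
Since the race only fires some of the time, counting failures over many
runs gives a clearer picture than watching for a single "fail" line. Here
is a minimal sketch of such a driver. It assumes the reproducer is built
as ./pread_eexist_repro (the name used in the command quoted above), that
it takes the target path as its only argument, and that it exits nonzero
when it hits the error. run_repro() and count_failures() are made-up
helper names, not part of the reproducer.

```python
import subprocess


def run_repro(target, binary="./pread_eexist_repro"):
    """Run the reproducer once on `target` and return its exit status.

    The binary name and its single-argument usage are assumptions taken
    from the shell loop quoted earlier in this thread.
    """
    return subprocess.run([binary, str(target)]).returncode


def count_failures(run_once, targets):
    """Call run_once(target) for each target and count nonzero statuses."""
    return sum(1 for target in targets if run_once(target) != 0)
```

Something like count_failures(run_repro, [f"/mnt/test/{i}" for i in
range(1000)]) would then report how many of the 1000 runs failed, which
makes it easier to compare hit rates across different disks.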
--
Omar
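
For reference, the arithmetic problem described in the patch description
can be modeled outside the kernel. The following is a toy sketch, not the
kernel code: it assumes the existing mapping starts at or before the new
one and that no other mapping follows it, and it masks to 64 bits because
Python integers do not wrap the way a C u64 does. clip_new_em() and
encompasses() are hypothetical names, and the check in encompasses() is
only my reading of "extending the identical extent map special case".

```python
U64_MASK = (1 << 64) - 1  # emulate C u64 wraparound


def clip_new_em(em_start, em_len, existing_start, existing_len):
    """Trim the new range so it begins where the existing range ends.

    Returns (start, length) of the supposedly nonoverlapping part.
    Simplified model: `existing` is treated as the mapping just before
    the new one, and there is no following mapping.
    """
    em_end = em_start + em_len
    existing_end = existing_start + existing_len
    start = max(existing_end, em_start)
    # If existing_end > em_end there is no nonoverlapping part, and the
    # unsigned subtraction wraps around to a huge bogus length.
    length = (em_end - start) & U64_MASK
    return start, length


def encompasses(existing_start, existing_len, em_start, em_len):
    """True if the existing range fully covers the new range."""
    return (existing_start <= em_start and
            existing_start + existing_len >= em_start + em_len)
```

With a partial overlap, such as an existing range [0, 8192) and a new
range [4096, 12288), clip_new_em() yields a sane (8192, 4096). With an
existing range [0, 16384) that covers a new range [4096, 8192), the length
wraps to 2**64 - 8192, which corresponds to the bogus em described in the
patch. Checking encompasses() first and reusing the existing mapping
avoids the subtraction altogether.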