public inbox for linux-btrfs@vger.kernel.org
From: Boris Burkov <boris@bur.io>
To: Filipe Manana <fdmanana@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>,
	Chris Murphy <chris@colorremedies.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: LMDB mdb_copy produces a corrupt database on btrfs, but not on ext4
Date: Thu, 16 Feb 2023 14:45:39 -0800	[thread overview]
Message-ID: <Y+6yEwymCdyOQ/4V@zen> (raw)
In-Reply-To: <CAL3q7H7gWmJhJ-xMcDifQ2hK=wMWJTmQ0tQWd8KRsaQM6fwiDg@mail.gmail.com>

On Thu, Feb 16, 2023 at 09:43:03PM +0000, Filipe Manana wrote:
> On Thu, Feb 16, 2023 at 6:49 PM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Thu, Feb 16, 2023 at 06:00:08PM +0000, Filipe Manana wrote:
> > > Ok, so the problem is btrfs_dio_iomap_end() detects the submitted
> > > amount is less than expected, so it marks the ordered extents as not
> > > up to date, setting the BTRFS_ORDERED_IOERR bit on it.
> > > That results in having an unexpected hole for the range [8192, 65535],
> > > and no error returned to btrfs_direct_write().
> > >
> > > My initial thought was to truncate the ordered extent at
> > > btrfs_dio_iomap_end(), similar to what we do at
> > > btrfs_invalidate_folio().
> > > I think that should work, however we would end up with a bookend
> > > extent (but so does your proposed fix), but I don't see an easy way to
> > > get around that.
> >
> > Wouldn't a better way to handle this be to cache the ordered_extent in
> > the btrfs_dio_data, and just reuse it on the next iteration if present
> > and covering the range?
> 
> That may work too, yes.

Quick update: I just got a preliminary version of this proposal working:
- reuse btrfs_dio_data across calls to __iomap_dio_rw
- store the dio ordered_extent when we create it in btrfs_dio_iomap_begin
- modify btrfs_dio_iomap_end to not mark the unfinished I/Os done in the
  incomplete case (and to drop the ordered extent once it is done or on error)
- modify btrfs_dio_iomap_begin to short-circuit when it has a cached
  ordered_extent

The resulting behavior on this workload is:
- write 8192
- finish OE, write file extent
- write 57344 (no extent, cached OE)
- re-enter __iomap_dio_rw with a live OE
- skip locking extent, reserving space, etc.
- write 1769472
- finish OE, write file extent

and the file looks as if there were no partial write. I think this is a
good structure for a fix to this bug, and plan to polish it up and send
it soon, unless someone objects and thinks we should go a different way.


Thread overview: 26+ messages
2023-02-15 20:04 LMDB mdb_copy produces a corrupt database on btrfs, but not on ext4 Chris Murphy
2023-02-15 20:16 ` Chris Murphy
2023-02-15 21:41   ` Filipe Manana
2023-02-15 23:21   ` Boris Burkov
2023-02-16  0:34     ` Boris Burkov
2023-02-16  1:46       ` Boris Burkov
2023-02-16  5:58         ` Christoph Hellwig
2023-02-16  9:30           ` Christoph Hellwig
2023-02-16 11:57       ` Filipe Manana
2023-02-16 17:14         ` Boris Burkov
2023-02-16 18:00           ` Filipe Manana
2023-02-16 18:49             ` Christoph Hellwig
2023-02-16 21:43               ` Filipe Manana
2023-02-16 22:45                 ` Boris Burkov [this message]
2023-02-17 11:19                   ` Filipe Manana
2023-02-16 10:05     ` Qu Wenruo
2023-02-16 12:01       ` Filipe Manana
2023-02-17  0:15         ` Qu Wenruo
2023-02-17 11:38           ` Filipe Manana
2023-04-05 13:07 ` Linux regression tracking #adding (Thorsten Leemhuis)
2023-04-06 15:47   ` David Sterba
2023-04-06 22:40     ` Neal Gompa
2023-04-07  6:10     ` Linux regression tracking (Thorsten Leemhuis)
2023-04-08  0:08       ` Boris Burkov
2023-04-11 19:27       ` David Sterba
2023-04-12  9:57         ` Linux regression tracking (Thorsten Leemhuis)
