From: Christoph Anton Mitterer <calestyo@scientia.net>
To: kreijack@inwind.it
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: Ongoing Btrfs stability issues
Date: Tue, 13 Mar 2018 21:10:42 +0100
Message-ID: <1520971842.4242.9.camel@scientia.net>
In-Reply-To: <d6e007af-7980-3d9b-a497-acb3be90dac9@inwind.it>
On Tue, 2018-03-13 at 20:36 +0100, Goffredo Baroncelli wrote:
> A checksum mismatch is returned as -EIO by a read() syscall. This is
> an event that most programs handle badly.
Then these programs must simply be fixed... otherwise they'll also fail
under normal circumstances with btrfs, if there is any corruption.
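
As a rough illustration of what "fixed" could mean for such a program, here is a minimal Python sketch of a reader that treats EIO as a reportable event instead of crashing or silently returning short data. The function name read_checked and its read_fn parameter are hypothetical; read_fn exists only so that the error path can be exercised without a corrupted file. From userspace, EIO cannot be attributed to a checksum mismatch as opposed to a failing device, so the sketch only reports the offset.

```python
import errno
import os


def read_checked(path, chunk_size=65536, read_fn=os.read):
    """Read a whole file, reporting EIO instead of silently truncating.

    Returns (data, error_offset). error_offset is None on success, or the
    byte offset at which the kernel reported an I/O error.
    """
    chunks = []
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            try:
                chunk = read_fn(fd, chunk_size)
            except OSError as exc:
                if exc.errno == errno.EIO:
                    # Could be a checksum mismatch or a failing disk; either
                    # way the data at this offset must not be trusted.
                    return b"".join(chunks), offset
                raise  # other errors are left to the caller
            if not chunk:
                return b"".join(chunks), None
            chunks.append(chunk)
            offset += len(chunk)
    finally:
        os.close(fd)
```

A caller would check error_offset and then decide whether to abort, retry, or restore from a backup, rather than continuing with partial data.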
> The problem is the following: there is a time window between the
> checksum computation and the writing of the data to the disk (which is
> done at the lower level via a DMA channel), where if the data is
> updated the checksum would mismatch. This happens if we have two
> threads, where the first commits the data to the disk, and the second
> one updates the data (I think that both VMs and databases could behave
> so).
Well, that's clear... but isn't that time window also there if the extent
is just written without CoW (regardless of checksumming)?
Obviously there would need to be some protection here anyway, so that
such data is served e.g. from RAM until the write has completed, and
a read cannot take place while the write is only half finished?!
So I'd naively assume one could just extend that protection until the
checksum has been written as well...
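
To make the window concrete, here is a small single-threaded Python simulation. It is only a sketch under stated assumptions: zlib.crc32 stands in for the filesystem's checksum, the mutate callback plays the role of the second thread, and taking a private snapshot is just one possible form of the "protection" I mean, not a description of what the kernel actually does. All function names are made up for illustration.

```python
import zlib


def racy_write(buf, mutate):
    """Checksum the live buffer, let it change, then 'write' it."""
    csum = zlib.crc32(buf)      # checksum computed from the live buffer
    mutate(buf)                 # second thread updates the data in the window
    on_disk = bytes(buf)        # the device picks up the modified contents
    return on_disk, csum


def stable_write(buf, mutate):
    """Snapshot the buffer first, so checksum and write see the same bytes."""
    snapshot = bytes(buf)       # private copy, immune to later updates
    csum = zlib.crc32(snapshot)
    mutate(buf)                 # the update only reaches the live buffer
    on_disk = snapshot
    return on_disk, csum


def flip_first_byte(buf):
    buf[0] ^= 0xFF


data, csum = racy_write(bytearray(b"database page"), flip_first_byte)
print(zlib.crc32(data) == csum)   # False: data changed after checksumming

data, csum = stable_write(bytearray(b"database page"), flip_first_byte)
print(zlib.crc32(data) == csum)   # True: both were taken from the snapshot
```

In the second variant the update is not lost; it stays in the live buffer and would go out with the next write.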
> In btrfs, a checksum mismatch causes an -EIO error on read. In a
> conventional filesystem (or a btrfs filesystem w/o datasum) there is
> no checksum, so this problem doesn't exist.
If ext writes an extent (can't that be up to 128MiB there?), then I'm
sure it cannot write that atomically (in terms of hardware)... so there
is likely some protection around this operation, so that there are no
concurrent reads of that particular extent from the disk while the
write hasn't finished yet.
> > Even if not... it should only be a problem in case of a crash during
> > that... and then I'd still prefer to get the false positive than
> > bad data.
>
> How can you know if it is "bad data" or a "bad checksum"?
Well, as I've said, in my naive thinking this should only be a problem
in case of a crash... and then, yes, one cannot say whether it's bad
data or a bad checksum (that's exactly what I'm saying)... but I'd rather
know that something might be fishy than not know anything
and perhaps even get good data "RAID-repaired" with bad data...
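
A toy Python sketch of that argument, with two mirrored copies of a block. Again zlib.crc32 is only a stand-in for the real checksum and pick_copy is a hypothetical helper, not anything from btrfs. With a stored checksum, a matching copy can be selected; if no copy matches, one at least learns that something is fishy. Without a checksum, disagreeing mirrors give no hint which copy is the good one. The sketch returns None in that case, whereas a repair that has nothing to go on may simply pick one and overwrite the other.

```python
import zlib


def pick_copy(copies, stored_csum=None):
    """Return the copy to trust, or None if that cannot be decided."""
    if stored_csum is not None:
        for copy in copies:
            if zlib.crc32(copy) == stored_csum:
                return copy      # this copy matches the stored checksum
        return None              # data or checksum is bad; cannot tell which
    # Without a checksum, disagreeing mirrors give no hint which one is good.
    return copies[0] if all(c == copies[0] for c in copies) else None


good = b"important data"
bad = b"important dat\x00"
csum = zlib.crc32(good)

print(pick_copy([bad, good], csum))   # b'important data'
print(pick_copy([bad, good]))         # None
print(pick_copy([bad, bad], csum))    # None
```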
Cheers,
Chris.