Re: Is it safe to use btrfs on top of different types of devices?

From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Linux fs Btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Is it safe to use btrfs on top of different types of devices?
Date: Thu, 19 Oct 2017 08:32:09 -0400
Message-ID: <4d61bf1d-3c22-9b64-9710-0616fe4bb54a@gmail.com>
In-Reply-To: <23016.34313.523398.532675@tree.ty.sabi.co.uk>

On 2017-10-19 07:01, Peter Grandi wrote:
> [ ... ]
> 
>>> Oh please, please a bit less silliness would be welcome here.
>>> In a previous comment on this tedious thread I had written:
> 
>>>> If the block device abstraction layer and lower layers work
>>>> correctly, Btrfs does not have problems of that sort when
>>>> adding new devices; conversely if the block device layer and
>>>> lower layers do not work correctly, no mainline Linux
>>>> filesystem I know can cope with that.
>>>
>>>> Note: "work correctly" does not mean "work error-free".
>>>
>>> The last line is very important and I added it advisedly.
> 
>> Even looking at things that way though, Zoltan's assessment
>> that reliability is essentially a measure of error rate is
>> correct.
> 
> It is instead based on a grave confusion between two very
> different kinds of "error rate", confusion also partially based
> on the ridiculous misunderstanding, which I have already pointed
> out, that UNIX filesystems run on top of SATA or USB devices:
> 
>> Internal SATA devices absolutely can randomly drop off the bus
>> just like many USB storage devices do,
> 
> Filesystems run on top of *block devices* with a definite
> interface and a definite state machine, and filesystems in
> general assume that the block device works *correctly*.
They do run on top of USB or SATA devices; otherwise a significant 
majority of systems running Linux and/or BSD would not be operating 
right now.  Yes, they don't access them directly, but the block layer 
isn't much more than command translation, scheduling, and accounting, so 
the distinction is largely irrelevant here.  It's also pretty standard 
practice among most sane sysadmins who aren't trying to be jerks, as 
well as most kernel developers I've met, to refer to a block device 
connected via interface 'X' as an 'X device' or an 'X storage device'.
> 
>> but it almost never happens (it's a statistical impossibility
>> if there are no hardware or firmware issues), so they are more
>> reliable in that respect.
> 
> What the OP was doing was using "unreliable" both for the case
> where the device "lies" and the case where the device does not
> "lie" but reports a failure. Both of these are malfunctions in a
> wide sense:
> 
>    * The [block] device "lies" as to its status or what it has done.
>    * The [block] device reports truthfully that an action has failed.
> 
> But they are of very different nature and need completely
> different handling. Hint: one is an extensional property and the
> other is a modal one, there is a huge difference between "this
> data is wrong" and "I know that this data is wrong".
> 
> The really important "detail" is that filesystems are, as a rule
> with very few exceptions, designed to work only if the block
> device layer (and those below it) does not "lie" (see "Byzantine
> failures" below), that is "works correctly": reports the failure
> of every operation that fails and the success of every operation
> that succeeds and never gets into an unexpected state.
> 
> In particular filesystem designs are nearly always based on the
> assumption that there are no undetected errors at the block
> device level or below. Then the expected *frequency* of detected
> errors influences how much redundancy and what kind of recovery
> are desirable, but the frequency of "lies" is assumed to be
> zero.
> 
> The one case where Btrfs does not assume that the storage layer
> works *correctly* is checksumming: it is quite expensive and
> makes sense only if the block device is expected to (sometimes)
> "lie" about having written the data correctly or having read it
> correctly. The role of the checksum is to spot when a block
> device "lies" and turn an undetected read error into a detected
> one (they could be used also to detect correct writes that are
> misreported as having failed).
> 
> The crucial difference that exists between SATA and USB is not
> that USB chips have higher rates of detected failures (even if
> they often do), but that in my experience SATA interfaces from
> reputable suppliers don't "lie" (more realistically have
> negligible "lie" rates), and USB interfaces (both host bus
> adapters and IO bus bridges) "lie" both systematically and
> statistically with non-negligible rates, and anyhow the USB mass
> storage protocol is not very good at error reporting and
> handling.
You do realize you just said exactly what I was saying, just in a more 
general and much more verbose manner which involved explaining things 
that are either well known and documented or aren't even entirely 
relevant to the thread in question?

For an end user, it generally doesn't matter whether a given layer 
reported the error, passed it on, or generated it; what matters is 
whether it was corrected.  If the part of the storage stack below 
whatever layer is being discussed (in this case the filesystem) produces 
errors that it does not itself correct, at a rate deemed unacceptable 
for the given application, it's unreliable, regardless of whether those 
errors get corrected at this layer or a higher one.  Even if you're 
running BTRFS on top of it, a SATA-connected hard drive that returns 
bogus data on 1% of reads is, from a user's perspective, just as 
unreliable as one that returns read errors 1% of the time, even though 
BTRFS can handle both (provided the user configures it correctly).
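
To make the "can handle both" point concrete, here is a minimal toy 
sketch in Python.  It is not how BTRFS is implemented: FakeDevice, 
ReadError, and ChecksummedMirror are hypothetical names made up for this 
illustration, zlib's CRC32 stands in for a real filesystem checksum, and 
the faults are injected by hand rather than occurring at some rate.  It 
only shows the idea that a per-block checksum turns a silently wrong 
read into a detected one, after which a second copy can be used exactly 
as it would be for a reported read error.

```python
import zlib


class ReadError(Exception):
    """Detected failure: the device truthfully reports that a read failed."""


class FakeDevice:
    """Hypothetical in-memory block device with hand-injected faults."""

    def __init__(self, lie_on=(), fail_on=()):
        self.blocks = {}
        self.lie_on = set(lie_on)    # blocks that silently return bogus data
        self.fail_on = set(fail_on)  # blocks that report a read error

    def write(self, n, data):
        self.blocks[n] = data

    def read(self, n):
        if n in self.fail_on:
            raise ReadError(f"block {n}: I/O error")
        data = self.blocks[n]
        if n in self.lie_on:
            # Wrong data, but reported as a successful read.
            return bytes([data[0] ^ 0xFF]) + data[1:]
        return data


class ChecksummedMirror:
    """Toy two-copy store keeping one CRC32 per block, apart from the data."""

    def __init__(self, devices):
        self.devices = devices
        self.sums = {}

    def write(self, n, data):
        self.sums[n] = zlib.crc32(data)
        for dev in self.devices:
            dev.write(n, data)

    def read(self, n):
        for dev in self.devices:
            try:
                data = dev.read(n)
            except ReadError:
                continue  # detected error: fall back to the next copy
            if zlib.crc32(data) == self.sums[n]:
                return data
            # Checksum mismatch: the "lie" is now a detected error too,
            # so fall back to the next copy in the same way.
        raise ReadError(f"block {n}: no good copy left")


# Block 1 is silently corrupted and block 2 fails loudly on one device.
flaky = FakeDevice(lie_on={1}, fail_on={2})
good = FakeDevice()
store = ChecksummedMirror([flaky, good])
for n in range(3):
    store.write(n, b"payload-%d" % n)

for n in range(3):
    print(n, store.read(n))
```

Reading block 1 straight from the flaky device returns wrong bytes with 
no error at all, while reading it through the store returns the good 
copy.  If the store has only the flaky device, both block 1 and block 2 
end in a ReadError: the checksum detects the bad data but, without a 
second copy, cannot correct it, which is the "corrected or not" 
distinction above.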
