Re: good documentation on btrfs internals and on disk layout

From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Yauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: good documentation on btrfs internals and on disk layout
Date: Tue, 5 Apr 2016 14:56:17 -0400
Message-ID: <57040A51.5060800@gmail.com>
In-Reply-To: <CAKWEGV4TTbC4PW=VoxbRxFL0L4DrgrCVNaEca-xroHeQDQe4Rg@mail.gmail.com>

On 2016-04-05 14:36, Yauhen Kharuzhy wrote:
> 2016-04-05 11:15 GMT-07:00 Austin S. Hemmelgarn <ahferroin7@gmail.com>:
>> On 2016-04-05 13:53, Yauhen Kharuzhy wrote:
>>>
>>> Hello,
>>>
>>> I am trying to understand the btrfs logic for mounting a multi-device
>>> filesystem when the device generations differ. All my questions relate
>>> to the case of RAID5/6 for system, metadata, and data.
>>>
>>> The kernel can currently mount an FS whose devices have different
>>> generations (for example, if a drive was physically removed before the
>>> last unmount and reattached afterwards), but scrub will then report
>>> uncorrectable errors (although a second run doesn't show any errors).
>>> Is there any documentation of the algorithm for handling multiple
>>> devices in such a case? Is the case of differing device generations
>>> allowed in general, and what are the worst cases here?
>>
>> In general, it isn't allowed, but we don't explicitly disallow it either.
>> The worst case here is that the devices both get written to separately, and
>> you end up with data not matching for correlated generation IDs.  The
>> second scrub in this case shows no errors because the first one corrects
>> them (even though they are reported as uncorrectable, which is a bug as far
>> as I can tell), and from what I can tell from reading the code, it does this
>> by just picking the highest generation ID and dropping the data from the
>> lower generation.
>
> Hmm... Sounds reasonable, but how can one detect whether a filesystem
> should be scrubbed after mounting? As I understand it, the only way is
> to check the kernel logs after mount for btrfs errors, which is not a
> good approach for any kind of automated management.
There really isn't any way that I know of.  Personally, I just scrub all 
my filesystems shortly after mount, but I also have pretty small 
filesystems (the biggest are 64G) on relatively fast storage.  In 
theory, it might be possible to parse the filesystems before mounting to 
check the device generation numbers, but that may be just as expensive 
as just scrubbing the filesystem (and you really should be scrubbing 
somewhat regularly anyway).
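
As a rough illustration of the "check the device generation numbers 
before mounting" idea, here is a minimal Python sketch.  It assumes the 
primary superblock sits at 64KiB on each device, with the magic string 
at offset 0x40 and the little-endian 64-bit generation at offset 0x48 
within it.  The function names are made up for this example, and it 
does not verify the superblock checksum, look at the backup 
superblocks, or check that the devices share an fsid, so treat it as a 
starting point only.

```python
import struct

SUPERBLOCK_OFFSET = 0x10000   # primary superblock lives at 64 KiB
MAGIC_OFFSET = 0x40           # offset of the magic within the superblock
GENERATION_OFFSET = 0x48      # offset of the 64-bit generation field
BTRFS_MAGIC = b"_BHRfS_M"


def superblock_generation(sb: bytes) -> int:
    """Return the generation stored in a raw superblock buffer."""
    if sb[MAGIC_OFFSET:MAGIC_OFFSET + 8] != BTRFS_MAGIC:
        raise ValueError("not a btrfs superblock")
    (generation,) = struct.unpack_from("<Q", sb, GENERATION_OFFSET)
    return generation


def read_generation(device_path: str) -> int:
    """Read the primary superblock from a device (usually needs root)."""
    with open(device_path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        return superblock_generation(dev.read(4096))


def stale_devices(generations: dict) -> list:
    """Given {device: generation}, list devices behind the newest one."""
    if not generations:
        return []
    newest = max(generations.values())
    return sorted(dev for dev, gen in generations.items() if gen < newest)
```

Calling read_generation() on each member device and passing the 
results to stale_devices() gives the list of devices that lag behind. 
A non-empty list would be a hint to scrub right after mounting.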
>
>>> What should happen if a device is removed and then returned some time
>>> later while the filesystem is online? Should some kind of device
>>> reopening be possible, or is the only way to guarantee FS consistency
>>> to mark such a device as missing and replace it?
>>
>> In this case, the device being removed (or some component between the device
>> and the processor failing, or the device itself erroneously reporting
>> failure) will force the FS read-only.  If the device reappears while the FS
>> is still online, it may just start working again (this is _really_ rare, and
>> requires that the device appear with the same device node as it had
>> previously, and this usually only happens when the device disappears for
>> only a very short period of time), or it may not work until the FS gets
>> remounted (this is usually the case), or the system may crash (thankfully
>> this almost never happens, and it's usually not because of BTRFS when it
>> does).  Regardless of what happens, you may still have to run a scrub to
>> make sure everything is consistent.
>
> So, if we see the device reconnect as a new block device, the right
> thing is to reject it and not include it in the device list again, am
> I right? The existing code tries to 'reconnect' it under the new device
> name, but this works completely wrong for a mounted FS (the btrfs device
> is only renamed; no real device reopening is performed), and I intend to
> propose a patch based on Anand's 'global spare' patch series to handle
> this properly.
In an ideal situation, you have nothing using the FS and can unmount, 
run a device scan, and then remount.  In most cases this won't work, and 
being able to re-add the device via a hot-spare type setup (or even just 
use device replace on it, which I've done before myself when dealing 
with filesystems on USB devices, and it works well) would be useful. 
Ideally, we should have the option to auto-detect such a situation and 
handle it, but that _really_ needs to be optional (there are just too 
many things that could go wrong).
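
For the ideal case (nothing using the FS), the unmount, device scan, 
and remount sequence could be scripted roughly like this.  This is 
only a sketch: the function names, the mount point, and the device 
path are placeholders, and it defaults to a dry run that just prints 
the commands instead of running them.

```python
import subprocess


def rescan_plan(mountpoint: str, device: str) -> list:
    """Build the unmount / device scan / remount command sequence."""
    return [
        ["umount", mountpoint],
        ["btrfs", "device", "scan"],
        ["mount", device, mountpoint],
    ]


def run_plan(plan: list, dry_run: bool = True) -> None:
    """Print each command, or run it and stop at the first failure."""
    for argv in plan:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

For example, run_plan(rescan_plan("/mnt/data", "/dev/sdb1")) prints 
the three commands in order.  Passing dry_run=False runs them for 
real, which needs root and will fail at the umount step if anything 
still has the FS open.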
