Linux Btrfs filesystem development
From: Oliver Freyermuth <o.freyermuth@googlemail.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Safe unmounting of external btrfs disk
Date: Fri, 27 Nov 2020 03:17:06 +0100
Message-ID: <1c60c4a1-cff1-84a8-6acc-8f752aa5e265@googlemail.com>
In-Reply-To: <CAJCQCtS5_oUiTi0u0Twjwea-92-tzj6HNsbwy37e=8iSVky2CQ@mail.gmail.com>


Thanks for the reply!

Am 27.11.20 um 00:38 schrieb Chris Murphy:
> On Thu, Nov 26, 2020 at 11:35 AM Oliver Freyermuth
> <o.freyermuth@googlemail.com> wrote:
>>
>> Dear BTRFS experts,
>>
>> I've had a rather strange occurrence with my external BTRFS backup disk last night,
>> which makes me question what is the correct way to safely remove a USB drive with BTRFS on it.
>>
>> Here's the timeline:
>> - 02:00:00 am: btrbk starts running.
>> - 02:01:17 am: btrbk deletes the last old subvolume from the disk
>>                  (I have btrfs_commit_delete = no, so the delayed deletion basically starts some time after).
>> - ~02:18 am: I perform an "umount" of the disk.
> 
> How was this done? If it's "umount" on cli, what was the exact command
> and did it complete?

It was "umount" on the only mountpoint the device was mounted to.

Of course, I have to add at this point that everything is in my logs, apart from the "umount" command,
which is only in non-timestamped shell history.

Since the shell in which I ran "umount" was already closed when I checked the kernel logs,
it's only my memory we rely on here. While I am 98 % sure I entered the command, and only then unplugged the disk,
the same way I do it every day (except on the days I shut down the system and unplug after that),
I give 2 % to my memory playing tricks on me and will bump that by +1 % each time I think about it
due to the unreliable way human brains work.
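
To take memory out of the equation next time, shell history can carry timestamps. A minimal sketch, assuming bash is the shell in use (the thread does not say which shell it was); the format string is only one possible choice:

```shell
# Assumption: bash. Goes into ~/.bashrc.
# With HISTTIMEFORMAT set, bash records a timestamp for each history entry
# and "history" prints it in the given strftime format (%F = date, %T = time).
export HISTTIMEFORMAT='%F %T '
```

After that, `history | grep umount` shows when the command was entered, provided the shell's history was saved.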

> 
>> - ~02:18:43 am: I unplug the USB drive.
>> - 02:25:05 am: My kernel tells me this:
>> [19268.944902] BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
> 
> This suggests the file system was not actually unmounted as far as the
> kernel was concerned, at the time the device was removed.
> 
>> Is this behaviour expected?
> 
> No, btrfs should block umount until it's done writing and flushing. It
> is still possible those writes are in the drive's write cache, and
> could get lost by disconnecting the drive before those writes are on
> stable media, but Btrfs wouldn't know about that and wouldn't
> complain. At next mount, it might know something went wrong only if
> the lost writes were out of order, like the super block was updated on
> disk but the new trees it points to were lost. That'd be a drive bug,
> not a Btrfs bug.

That's what I had hoped and would have expected.
So indeed the only valid explanation up to this point would be that my memory played a trick on me.
The good news is: I usually do this test about once per day. So if it was *not* a human memory fault,
it will happen again.
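
Since umount is expected to block until everything is written, a cheap cross-check before pulling the cable is to confirm that the mountpoint is really gone from the kernel's mount table. A minimal sketch; the function name `is_mounted` and the path `/mnt/backup` used below are made up for illustration, and mountpoints containing spaces are not handled (the mount table escapes them as \040):

```shell
# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Returns 0 if MOUNTPOINT is listed as a mount target in MOUNTS_FILE
# (default: /proc/self/mounts), 1 otherwise.
is_mounted() {
    mountpoint_path=$1
    mounts_file=${2:-/proc/self/mounts}
    # The second field of each line in the mount table is the mount target.
    awk -v mp="$mountpoint_path" \
        '$2 == mp { found = 1 } END { exit !found }' "$mounts_file"
}
```

Usage would be along the lines of `is_mounted /mnt/backup && echo "still mounted, do not unplug"`. On systems with util-linux, `mountpoint -q /mnt/backup` answers the same question.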

>> If yes, how to "unmount" correctly (btrfs filesystem sync only seems to work on mounted filesystems)?
>> I believe udisks unmounts and then quickly removes power, so this would basically be similar to what I did manually here.
> 
> Enable scsi event tracing:
> 
> # echo "scsi:*" > /sys/kernel/debug/tracing/set_event
> 
> On an HDD (i.e. nossd mount option), for 'umount' I get a bunch of
> WRITE_10 commands that look like tree updates. Followed by
> SYNCHRONIZE_CACHE. And then I see two WRITE_10 commands that
> correspond to superblock 1 and 2. Followed by SYNCHRONIZE_CACHE. And
> then one WRITE_10 for the 3rd superblock. That's it. And that should
> be sufficient and nothing else happens after that - all of this is
> blocking until it's done, as in, the drive itself claims the command
> is done.

I can confirm the very same observations here, thanks for the nice explanation
of what these translate to!
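
To make the command sequence easier to follow while the trace is running, the output can be reduced to the three opcodes named above. A minimal sketch, assuming the opcode names appear literally in the trace lines (the exact line layout may vary between kernel versions) and that grep supports -o, as GNU and BSD grep do; the function name `scsi_opcodes` is made up:

```shell
# scsi_opcodes: read trace lines on stdin and print, one per line and in
# order of appearance, only the SCSI opcodes mentioned in this thread.
scsi_opcodes() {
    grep -oE 'WRITE_10|SYNCHRONIZE_CACHE|START_STOP'
}
```

With the scsi events enabled as shown above, running `cat /sys/kernel/debug/tracing/trace_pipe | scsi_opcodes` as root in a second terminal during the umount should then print the WRITE_10 / SYNCHRONIZE_CACHE sequence as it happens.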

> 
> For this command:
> 
> # echo 1 > /sys/block/sdb/device/delete
> 
> I see two commands, SYNCHRONIZE_CACHE and START_STOP. That's it.
> 
> From this I conclude umount should be sufficient. I'm not certain
> deleting the device is really necessary but it doesn't hurt.
> 

Ok!
Then I will go this way from now on. If I manage to reproduce this with my daily "detach the backup disk"
routine, it should show up again soon. If not, I have to run a memtest86 on my brain;
too bad these are not CoW and cannot be scrubbed...
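
The "umount, then delete the device" routine discussed above can be sketched as a small shell function. This is only an illustration under assumptions: the function name `safe_detach`, the DRY_RUN switch, and the example arguments are made up, and the second argument must be the whole-disk name (e.g. sdc), not the partition (sdc1):

```shell
# safe_detach MOUNTPOINT DISK
# Unmounts MOUNTPOINT, then asks the kernel to detach DISK via sysfs.
# With DRY_RUN=1 the two steps are only printed, nothing is executed.
safe_detach() {
    mnt=$1
    disk=$2
    delete_node=/sys/block/$disk/device/delete
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'umount %s\n' "$mnt"
        printf 'echo 1 > %s\n' "$delete_node"
        return 0
    fi
    # Stop here if umount fails, so the device is never deleted while
    # the filesystem is still mounted.
    umount "$mnt" || {
        echo "umount $mnt failed; leaving $disk attached" >&2
        return 1
    }
    echo 1 > "$delete_node"
}
```

Run as root, e.g. `safe_detach /mnt/backup sdc`; with `DRY_RUN=1` set beforehand it only shows what it would do.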

Cheers and thanks,
	Oliver

Thread overview: 3+ messages
2020-11-26 18:33 Safe unmounting of external btrfs disk Oliver Freyermuth
2020-11-26 23:38 ` Chris Murphy
2020-11-27  2:17   ` Oliver Freyermuth [this message]
