Re: fatal database corruption with btrfs "out of space" with ~50 GB left

linux-btrfs.vger.kernel.org archive mirror

From: Tomasz Chmielewski <tch@virtall.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: fatal database corruption with btrfs "out of space" with ~50 GB left
Date: Thu, 15 Feb 2018 13:19:58 +0900
Message-ID: <f77a1a28c2cbf08efd24577088a29de0@virtall.com>
In-Reply-To: <72c2c665-97f1-3356-6d65-66ba162736de@gmx.com>

On 2018-02-15 10:47, Qu Wenruo wrote:
> On 2018年02月14日 22:19, Tomasz Chmielewski wrote:
>> Just FYI, how dangerous running btrfs can be - we had a fatal,
>> unrecoverable MySQL corruption when btrfs decided to do one of these
>> "I have ~50 GB left, so let's do out of space (and corrupt some files
>> at the same time, ha ha!)".
> 
> I've recently been looking into unexpected corruption problems in btrfs.
> 
> Would you please provide some extra info about how the corruption 
> happened?
> 
> 1) Is there any power reset?
>    Btrfs should be bullet proof, but in fact it's not, so I'm here to
>    get some clue.

No power reset.


> 2) Are MySQL files set with nodatacow?
>    If so, data corruption is more or less expected, but should be
>    handled by checkpoint of MySQL.

Yes, MySQL files were using "nodatacow".
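
For anyone who wants to confirm the same setting on their own files: "nodatacow" on individual files shows up as the 'C' (No_COW) flag in lsattr output. Below is a minimal sketch that checks one lsattr line for that flag. The function name and the sample lines are made up for illustration and are not taken from the affected server.

```python
def has_nocow(lsattr_line: str) -> bool:
    """Return True if an lsattr output line carries the 'C' (No_COW) flag.

    Assumes the usual "<flags> <path>" layout; the check is case-sensitive
    because lowercase 'c' means "compressed", not No_COW.
    """
    flags, _, _path = lsattr_line.strip().partition(" ")
    return "C" in flags


# Illustrative lines only (hypothetical path).
print(has_nocow("---------------C---- /var/lib/mysql/ibdata1"))
print(has_nocow("-------------------- /var/lib/mysql/ibdata1"))
```

The first call reports the flag as set, the second as unset.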

I've seen many cases of "filesystem full" with ext4, but none led to
database corruption (i.e. the database would always recover after
some space was released).

On the other hand, I've seen a handful of "out of space" errors with
btrfs while gigabytes of free space remained, and they led to light,
heavy or unrecoverable MySQL or mongo corruption.


Can it be because of how "predictable" out-of-space situations are
with btrfs compared to other filesystems?

- in short, ext4 will report out of space when there are 0 bytes left
(perhaps slightly earlier for non-root users) - the application trying
to write data will see "out of space" at some point, and it can stay
like this for hours (i.e. until some data is removed manually)

- on the other hand, btrfs can report out of space when there are still
10, 50 or 100 GB left, which makes capacity planning close to
impossible; also, could the application trying to write data be seeing
the fs transition between "out of space" and "data written
successfully" many times per minute/second?
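
To make the two patterns above concrete, here is a small sketch that summarizes a sequence of write results: how many writes failed with ENOSPC, and how often the result flipped between success and ENOSPC. The function name and both sequences are invented for illustration; they are not measurements from the affected system.

```python
import errno


def summarize_write_results(errnos):
    """Count ENOSPC failures and ok<->ENOSPC flips in a sequence of writes.

    Each element is 0 for a successful write, or the errno of a failed one.
    """
    states = [e == errno.ENOSPC for e in errnos]
    flips = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return {"writes": len(errnos), "enospc": sum(states), "flips": flips}


# Hypothetical sequences: "full and stays full" vs. "flapping".
steady = [0, 0, errno.ENOSPC, errno.ENOSPC, errno.ENOSPC]
flapping = [0, errno.ENOSPC, 0, errno.ENOSPC, 0, 0, errno.ENOSPC]

print(summarize_write_results(steady))
print(summarize_write_results(flapping))
```

Both sequences contain three ENOSPC failures, but the first flips once and the second five times; the second is the pattern I suspect an application copes with much worse.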


> 3) Is the filesystem metadata corrupted? (i.e. does btrfs check
> report errors?)
>    If so, that should be the problem I'm looking into.

I don't think so, there are no scary things in dmesg. However, I didn't 
unmount the filesystem to run btrfs check.


> 4) Metadata/data ratio?
>    "btrfs fi usage" should give a good picture of it.
>    And "btrfs fi df" also helps.

Here it is - however, that's after removing some 80 GB of data, so it
most likely doesn't reflect the state when the failure happened.

# btrfs fi usage /var/lib/lxd
Overall:
     Device size:                 846.25GiB
     Device allocated:            840.05GiB
     Device unallocated:            6.20GiB
     Device missing:                  0.00B
     Used:                        498.26GiB
     Free (estimated):            167.96GiB      (min: 167.96GiB)
     Data ratio:                       2.00
     Metadata ratio:                   2.00
     Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID1: Size:411.00GiB, Used:246.14GiB
    /dev/sda3     411.00GiB
    /dev/sdb3     411.00GiB

Metadata,RAID1: Size:9.00GiB, Used:2.99GiB
    /dev/sda3       9.00GiB
    /dev/sdb3       9.00GiB

System,RAID1: Size:32.00MiB, Used:80.00KiB
    /dev/sda3      32.00MiB
    /dev/sdb3      32.00MiB

Unallocated:
    /dev/sda3       3.10GiB
    /dev/sdb3       3.10GiB
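
As a sanity check on the numbers above, the "Free (estimated)" figure can be reproduced as the free space inside already-allocated data chunks plus the unallocated raw space divided by the data ratio. This formula is my assumption (it matches the output here, but I have not checked it against the btrfs-progs source), and the function name is made up.

```python
def estimated_free_gib(data_size, data_used, device_unallocated, data_ratio):
    """Free space in allocated data chunks, plus unallocated raw space
    divided by the data profile ratio (2.00 for RAID1). All values in GiB.
    """
    return (data_size - data_used) + device_unallocated / data_ratio


# Values copied from the "btrfs fi usage" output above.
free = estimated_free_gib(411.00, 246.14, 6.20, 2.00)
print(f"{free:.2f} GiB")
```

Under this assumption, almost all of the estimate sits inside data chunks that are already allocated; only 3.10 GiB per device is still unallocated and available for new data or metadata chunks.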



# btrfs fi df /var/lib/lxd
Data, RAID1: total=411.00GiB, used=246.15GiB
System, RAID1: total=32.00MiB, used=80.00KiB
Metadata, RAID1: total=9.00GiB, used=2.99GiB
GlobalReserve, single: total=512.00MiB, used=0.00B



# btrfs fi show /var/lib/lxd
Label: 'btrfs'  uuid: f5f30428-ec5b-4497-82de-6e20065e6f61
         Total devices 2 FS bytes used 249.15GiB
         devid    1 size 423.13GiB used 420.03GiB path /dev/sda3
         devid    2 size 423.13GiB used 420.03GiB path /dev/sdb3



Tomasz Chmielewski
https://lxadm.com

Thread overview: 12+ messages
2018-02-14 14:19 fatal database corruption with btrfs "out of space" with ~50 GB left Tomasz Chmielewski
2018-02-15  1:25 ` Duncan
2018-02-15  1:47 ` Qu Wenruo
2018-02-15  4:19   ` Tomasz Chmielewski [this message]
2018-02-15  4:32     ` Qu Wenruo
2018-02-15  7:02       ` Tomasz Chmielewski
2018-02-15  7:17         ` Tomasz Chmielewski
2018-02-15  9:06           ` Nikolay Borisov
2018-02-15  7:38         ` Qu Wenruo
2018-02-15  7:50         ` Duncan
2018-02-19  4:29 ` Anand Jain
2018-02-19  8:30   ` Tomasz Chmielewski
