linux-btrfs.vger.kernel.org archive mirror
From: ein <ein.net@gmail.com>
To: Nikolay Borisov <nborisov@suse.com>,
	Andrei Borzenkov <arvidjaar@gmail.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: csum failed root raveled during balance
Date: Mon, 28 May 2018 18:51:50 +0200	[thread overview]
Message-ID: <0b9ecc90-b12c-f2a9-fe29-37a3ba75f411@gmail.com> (raw)
In-Reply-To: <c39be3aa-e45e-daad-9342-f35959def35c@suse.com>

On 05/27/2018 11:41 AM, Nikolay Borisov wrote:
> 
> 
> On 27.05.2018 08:50, Andrei Borzenkov wrote:
>> 23.05.2018 09:32, Nikolay Borisov wrote:
>>>
>>>
>>> On 22.05.2018 23:05, ein wrote:
>>>> Hello devs,
>>>>
>>>> I tested BTRFS in production for about a month:
>>>>
>>>> 21:08:17 up 34 days,  2:21,  3 users,  load average: 0.06, 0.02, 0.00
>>>>
>>>> Without power blackout, hardware failure, SSD's SMART is flawless etc.
>>>> The tests ended with:
>>>>
>>>> root@node0:~# dmesg | grep BTRFS | grep warn
>>>> 185:980:[2927472.393557] BTRFS warning (device dm-0): csum failed root
>>>> -9 ino 312 off 608284672 csum 0x7d03a376 expected csum 0x3163a9b7 mirror 1
>>>> 186:981:[2927472.394158] BTRFS warning (device dm-0): csum failed root
>>>> -9 ino 312 off 608284672 csum 0x7da1b152 expected csum 0x3163a9b7 mirror 1
>>>> 191:986:[2928224.169814] BTRFS warning (device dm-0): csum failed root
>>>> -9 ino 314 off 608284672 csum 0x7d03a376 expected csum 0x3163a9b7 mirror 1
>>>> 192:987:[2928224.171433] BTRFS warning (device dm-0): csum failed root
>>>> -9 ino 314 off 608284672 csum 0x7da1b152 expected csum 0x3163a9b7 mirror 1
>>>> 206:1001:[2928298.039516] BTRFS warning (device dm-0): csum failed root
>>>> -9 ino 319 off 608284672 csum 0x7d03a376 expected csum 0x3163a9b7 mirror 1
>>>> 207:1002:[2928298.043103] BTRFS warning (device dm-0): csum failed root
>>>> -9 ino 319 off 608284672 csum 0x7d03a376 expected csum 0x3163a9b7 mirror 1
>>>> 208:1004:[2932213.513424] BTRFS warning (device dm-0): csum failed root
>>>> 5 ino 219962 off 4564959232 csum 0xc616afb4 expected csum 0x5425e489
>>>> mirror 1
>>>> 209:1005:[2932235.666368] BTRFS warning (device dm-0): csum failed root
>>>> 5 ino 219962 off 16989835264 csum 0xd63ed5da expected csum 0x7429caa1
>>>> mirror 1
>>>> 210:1072:[2936767.229277] BTRFS warning (device dm-0): csum failed root
>>>> 5 ino 219915 off 82318458880 csum 0x83614341 expected csum 0x0b8706f8
>>>> mirror 1
>>>> 211:1073:[2936767.276229] BTRFS warning (device dm-0): csum failed root
>>>> 5 ino 219915 off 82318458880 csum 0x83614341 expected csum 0x0b8706f8
>>>> mirror 1
>>>>
>>>> The above was revealed during the command below, with quite high IO
>>>> usage by a few VMs (Linux on top of Ext4 with a firebird database,
>>>> lots of random reads/writes, plus two others with Windows 2016
>>>> running Windows Update in the background):
>>>
>>> I believe you are hitting the issue described here:
>>>
>>> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg25656.html
>>>
>>> Essentially the way qemu operates on vm images atop btrfs is prone to
>>> producing such errors. As a matter of fact, other filesystems also
>>> suffer from this (i.e. pages modified while being written); however,
>>> due to the lack of a CRC on the data they don't detect it. Can you
>>> confirm that those inodes (312/314/319/219962/219915) belong to vm
>>> image files?
>>>
>>> IMHO the best course of action would be to disable checksumming for
>>> your vm files.
>>>
>>>
>>> For some background I suggest you read the following LWN articles:
>>>
>>> https://lwn.net/Articles/486311/
>>> https://lwn.net/Articles/442355/
>>>
>>
>> Hmm ... according to these articles, "pages under writeback are marked
>> as not being writable; any process attempting to write to such a page
>> will block until the writeback completes". And it says this feature is
>> available since 3.0 and btrfs has it. So how come it still happens?
>> Were stable patches removed since then?
> 
> If you are using buffered writes, then yes you won't have the problem.
> However qemu by default bypasses host's page cache and instead uses DIO:
> 
> https://btrfs.wiki.kernel.org/index.php/Gotchas#Direct_IO_and_CRCs
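
The race described above can be sketched with a toy model (hypothetical
Python, not btrfs code): the filesystem checksums the buffer at submit
time, but with direct I/O the application still owns that buffer and can
modify it before the device captures the data, so the stored csum no
longer matches what landed on disk:

```python
import zlib

class ToyFS:
    """Toy model of checksummed data writes (illustration only)."""
    def __init__(self):
        self.data = {}
        self.csum = {}

    def writeback(self, blockno, buf):
        # The fs computes the checksum over the caller's buffer first...
        self.csum[blockno] = zlib.crc32(bytes(buf))
        yield  # ...with O_DIRECT the app may modify buf at this point...
        # ...and only then does the "device" capture the data.
        self.data[blockno] = bytes(buf)

    def read(self, blockno):
        if zlib.crc32(self.data[blockno]) != self.csum[blockno]:
            raise IOError("csum failed: stored checksum does not match data")
        return self.data[blockno]

fs = ToyFS()
buf = bytearray(b"A" * 4096)
wb = fs.writeback(312, buf)   # block number purely illustrative
next(wb)                      # checksum computed over the "A" version
buf[:4] = b"BBBB"             # guest rewrites the page mid-flight
for _ in wb:                  # write completes with the "B" version
    pass
try:
    fs.read(312)
except IOError as e:
    print(e)                  # same failure mode as the dmesg warnings
```

With buffered writes the kernel copies the page and write-protects it
during writeback, so this window does not exist; with DIO it does.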

I can confirm that, with the config below, data written to the
filesystem on the guest side is not buffered at the host:

<disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source file='/var/lib/libvirt/images/db.raw'/>
      <target dev='vda' bus='virtio'/>
      [...]
</disk>

Because buff/cache memory usage stays unchanged at the host during heavy
sequential writing and there is no kworker/flush process committing the
data. How can qemu avoid dirty page buffering? There is nothing in
strace other than ppoll, read, io_submit and write:

read(52, "\1\0\0\0\0\0\0\0", 512)       = 8
io_submit(0x7f35367f7000, 2, [{pwritev, fildes=19,
  iovec=[{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
  iov_len=368640},
  {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
  iov_len=679936},
  {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
  iov_len=1048576}, [...]]}, [...]])
read(38, "\3\0\0\0\0\0\0\0", 512)       = 8
ppoll([{fd=52, events=POLLIN|POLLERR|POLLHUP}, {fd=38,
  events=POLLIN|POLLERR|POLLHUP}, {fd=10, events=POLLIN|POLLERR|POLLHUP}],
  3, NULL, NULL, 8) = 1 ([{fd=52, revents=POLLIN}])
read(52, "\1\0\0\0\0\0\0\0", 512)       = 8
io_submit(0x7f35367f7000, 1, [{pwritev, fildes=19,
  iovec=[{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
  iov_len=368640},
  {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
  iov_len=1048576}, [...]]}])
ppoll([{fd=52, events=POLLIN|POLLERR|POLLHUP}, {fd=38,
  events=POLLIN|POLLERR|POLLHUP}, {fd=10, events=POLLIN|POLLERR|POLLHUP}],
  3, {tv_sec=0, tv_nsec=0}, NULL, 8) = 2 ([{fd=52, revents=POLLIN}, [...]])
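
For completeness: cache='none' makes qemu open the image with O_DIRECT,
which is why no dirty pages show up at the host. One way to confirm it
without strace is /proc/<pid>/fdinfo for the image fd: the flags field
is octal and includes O_DIRECT (040000 on x86 Linux) when direct I/O is
in effect. A small parser for that field (hypothetical helper, not from
this thread; the flag value differs on some architectures, so use
os.O_DIRECT for the local machine):

```python
# O_DIRECT on x86 Linux; on other architectures consult os.O_DIRECT.
O_DIRECT_X86 = 0o40000

def has_o_direct(fdinfo_text, o_direct=O_DIRECT_X86):
    """Return True if a /proc/<pid>/fdinfo/<fd> dump shows O_DIRECT set."""
    for line in fdinfo_text.splitlines():
        if line.startswith("flags:"):
            # the flags field is printed in octal by the kernel
            return bool(int(line.split()[1], 8) & o_direct)
    return False

# 02140002 = O_RDWR|O_DIRECT|O_LARGEFILE|O_CLOEXEC on x86
print(has_o_direct("pos:\t0\nflags:\t02140002\nmnt_id:\t29\n"))  # True
print(has_o_direct("pos:\t0\nflags:\t02100002\nmnt_id:\t29\n"))  # False
```

Pointing this at the fd qemu holds on db.raw should show the flag set
whenever the libvirt driver element carries cache='none'.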

-- 
PGP Public Key (RSA/4096b):
ID: 0xF2C6EA10
SHA-1: 51DA 40EE 832A 0572 5AD8 B3C0 7AFF 69E1 F2C6 EA10


Thread overview: 14+ messages
2018-05-22 20:05 csum failed root raveled during balance ein
2018-05-23  6:32 ` Nikolay Borisov
2018-05-23  8:03   ` ein
2018-05-23  9:09     ` Duncan
2018-05-23 10:09       ` ein
2018-05-23 11:03         ` Austin S. Hemmelgarn
2018-05-28 17:10           ` ein
2018-05-29 12:12             ` Austin S. Hemmelgarn
2018-05-29 14:02               ` ein
2018-05-29 14:35                 ` Austin S. Hemmelgarn
2018-05-23 11:12     ` Nikolay Borisov
2018-05-27  5:50   ` Andrei Borzenkov
2018-05-27  9:41     ` Nikolay Borisov
2018-05-28 16:51       ` ein [this message]
