From: Nikolay Borisov <nborisov@suse.com>
To: ein <ein.net@gmail.com>, linux-btrfs@vger.kernel.org
Subject: Re: csum failed root raveled during balance
Date: Wed, 23 May 2018 14:12:10 +0300
Message-ID: <b7cd5626-bfa7-73c0-0810-a41f1abc4480@suse.com>
In-Reply-To: <5B052068.5000608@gmail.com>



On 23.05.2018 11:03, ein wrote:
> On 05/23/2018 08:32 AM, Nikolay Borisov wrote:
> 
> Nikolay, thank you for the answer.
> 
>>> [...]
>>> root@node0:~# dmesg | grep BTRFS | grep warn
>>> 185:980:[2927472.393557] BTRFS warning (device dm-0): csum failed root
>>> -9 ino 312 off 608284672 csum 0x7d03a376 expected csum 0x3163a9b7 mirror 1
>>> 186:981:[2927472.394158] BTRFS warning (device dm-0): csum failed root
>>> -9 ino 312 off 608284672 csum 0x7da1b152 expected csum 0x3163a9b7 mirror 1
>>> 191:986:[2928224.169814] BTRFS warning (device dm-0): csum failed root
>>> -9 ino 314 off 608284672 csum 0x7d03a376 expected csum 0x3163a9b7 mirror 1
>>> 192:987:[2928224.171433] BTRFS warning (device dm-0): csum failed root
>>> -9 ino 314 off 608284672 csum 0x7da1b152 expected csum 0x3163a9b7 mirror 1
>>> 206:1001:[2928298.039516] BTRFS warning (device dm-0): csum failed root
>>> -9 ino 319 off 608284672 csum 0x7d03a376 expected csum 0x3163a9b7 mirror 1
>>> 207:1002:[2928298.043103] BTRFS warning (device dm-0): csum failed root
>>> -9 ino 319 off 608284672 csum 0x7d03a376 expected csum 0x3163a9b7 mirror 1
>>> 208:1004:[2932213.513424] BTRFS warning (device dm-0): csum failed root
>>> 5 ino 219962 off 4564959232 csum 0xc616afb4 expected csum 0x5425e489
>>> mirror 1
>>> 209:1005:[2932235.666368] BTRFS warning (device dm-0): csum failed root
>>> 5 ino 219962 off 16989835264 csum 0xd63ed5da expected csum 0x7429caa1
>>> mirror 1
>>> 210:1072:[2936767.229277] BTRFS warning (device dm-0): csum failed root
>>> 5 ino 219915 off 82318458880 csum 0x83614341 expected csum 0x0b8706f8
>>> mirror 1
>>> 211:1073:[2936767.276229] BTRFS warning (device dm-0): csum failed root
>>> 5 ino 219915 off 82318458880 csum 0x83614341 expected csum 0x0b8706f8
>>> mirror 1
>>>
>>> Above has been revealed during below command and quite high IO usage by
>>> few VMs (Linux on top Ext4 with firebird database, lots of random
>>> read/writes, two others with Windows 2016 and Windows Update in the
>>> background):
>>
>> I believe you are hitting the issue described here:
>>
>> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg25656.html
> 
> It makes sense: fsck.ext4, gbak (the Firebird integrity checking tool),
> chkdsk and sfc /scannow don't show any errors from within the VMs. As
> far as I can tell, the data inside the VMs is not corrupted.
> 
>> Essentially the way qemu operates on vm images atop btrfs is prone to
>> producing such errors. As a matter of fact, other filesystems also
>> suffer from this (i.e. pages modified while being written); however, due
>> to the lack of CRC on the data they don't detect it. Can you confirm that
>> those inodes (312/314/319/219962/219915) belong to vm image files?
> 
> root@node0:/var/lib/libvirt# find  ./ -inum 312
> root@node0:/var/lib/libvirt# find  ./ -inum 314
> root@node0:/var/lib/libvirt# find  ./ -inum 319
> root@node0:/var/lib/libvirt# find  ./ -inum 219962
> ./images/rds.raw
> root@node0:/var/lib/libvirt# find  ./ -inum 219915
> ./images/database.raw
> 
> It seems so (219962, 219915):
> - rds.raw - Windows 2016 server, Remote Desktop Server, raw preallocated
> image, NTFS
> - database.raw - Linux 3.8, Firebird DB server, raw preallocated image, Ext4
> 
>> IMHO the best course of action would be to disable checksumming for your
>> vm files.
>>
> 
> Do you mean the '-o nodatasum' mount flag? Is it possible to disable
> checksumming for a single file by setting some magical chattr? Google
> thinks it's not possible to disable csums for a single file.

You can't disable checksumming for a single file directly. However, what
you could do is set the "No CoW" flag via chattr +C /path/to/file, since it
also disables checksumming. Bear in mind that you can't set this flag on a
file which already has allocated blocks, so you'd have to create an
empty file, set the +C flag and then copy the data with dd, for example.
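
Roughly, the create/flag/copy sequence could look like the sketch below. This
is only an illustration, not a tested procedure: the function name
make_nocow_copy and the file names in the usage note are made up, it assumes
GNU dd and a destination on btrfs, and the VM should be shut down while the
image is copied.

```shell
#!/bin/sh
# Sketch: copy SRC into a NEW file DST that carries the No-CoW (+C) attribute.
# make_nocow_copy is a hypothetical helper name, not an existing tool.
make_nocow_copy() {
    src=$1
    dst=$2

    [ -f "$src" ] || { echo "source not found: $src" >&2; return 1; }

    # +C cannot be applied to a file that already has allocated blocks,
    # so refuse to reuse an existing destination.
    [ ! -e "$dst" ] || { echo "destination already exists: $dst" >&2; return 1; }

    # Create the empty file, then set the flag before any data is written.
    : > "$dst" || return 1
    chattr +C "$dst" || { rm -f "$dst"; return 1; }

    # Copy the data into the flagged file (bs=1M and status=none are GNU dd).
    dd if="$src" of="$dst" bs=1M conv=notrunc status=none || return 1

    # Print the attributes so the 'C' flag can be checked by eye.
    lsattr "$dst"
}
```

For example: make_nocow_copy images/database.raw images/database.nocow.raw,
then rename the new file over the old one once the copy has been verified.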

On a different note: for database workloads, and random-write workloads in
general, it makes no sense to have CoW since you'd see very spiky I/O
performance.

> 
>> For some background I suggest you read the following LWN articles:
>>
>> https://lwn.net/Articles/486311/
>> https://lwn.net/Articles/442355/
>>
>>>
>>> when I changed BTRFS compress parameters. Or during umount (can't recall
>>> now):
>>>
>>> May  2 07:15:39 node0 kernel: [1168145.677431] WARNING: CPU: 6 PID: 3763
>>> at /build/linux-8B5M4n/linux-4.15.11/fs/direct-io.c:293
>>> dio_complete+0x1d6/0x220
>>> May  2 07:15:39 node0 kernel: [1168145.678811] Modules linked in: fuse
>>> ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs vhost_net vhost
>>> tap tun ebtable_filter ebtables ip6tab
>>> le_filter ip6_tables iptable_filter binfmt_misc bridge 8021q garp mrp
>>> stp llc snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal
>>> intel_powerclamp coretemp snd_hda_codec_realtek kvm
>>> _intel snd_hda_codec_generic kvm i915 irqbypass crct10dif_pclmul
>>> snd_hda_intel crc32_pclmul ghash_clmulni_intel snd_hda_codec
>>> intel_cstate snd_hda_core iTCO_wdt iTCO_vendor_support
>>>  intel_uncore drm_kms_helper snd_hwdep wmi_bmof intel_rapl_perf joydev
>>> evdev pcspkr snd_pcm snd_timer drm snd soundcore i2c_algo_bit sg mei_me
>>> lpc_ich shpchp mfd_core mei ie31200_e
>>> dac wmi video button ib_iser rdma_cm iw_cm ib_cm ib_core configfs
>>> iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables
>>> May  2 07:15:39 node0 kernel: [1168145.685202]  x_tables autofs4 ext4
>>> crc16 mbcache jbd2 fscrypto ecb btrfs zstd_decompress zstd_compress
>>> xxhash raid456 async_raid6_recov async_mem
>>> cpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic
>>> raid0 multipath linear hid_generic usbhid hid dm_mod raid10 raid1 md_mod
>>> sd_mod crc32c_intel ahci i2c_i801 lib
>>> ahci aesni_intel xhci_pci aes_x86_64 ehci_pci libata crypto_simd
>>> xhci_hcd ehci_hcd cryptd glue_helper e1000e scsi_mod ptp usbcore
>>> pps_core usb_common fan thermal
>>> May  2 07:15:39 node0 kernel: [1168145.689057] CPU: 6 PID: 3763 Comm:
>>> kworker/6:2 Not tainted 4.15.0-0.bpo.2-amd64 #1 Debian 4.15.11-1~bpo9+1
>>> May  2 07:15:39 node0 kernel: [1168145.690347] Hardware name: LENOVO
>>> ThinkServer TS140/ThinkServer TS140, BIOS FBKTB3AUS 06/16/2015
>>> May  2 07:15:39 node0 kernel: [1168145.691659] Workqueue: dio/dm-0
>>> dio_aio_complete_work
>>> May  2 07:15:39 node0 kernel: [1168145.692935] RIP:
>>> 0010:dio_complete+0x1d6/0x220
>>> May  2 07:15:39 node0 kernel: [1168145.694275] RSP:
>>> 0018:ffff9abc68447e50 EFLAGS: 00010286
>>> May  2 07:15:39 node0 kernel: [1168145.695605] RAX: 00000000fffffff0
>>> RBX: ffff8e33712e3480 RCX: ffff9abc68447c88
>>> May  2 07:15:39 node0 kernel: [1168145.697024] RDX: fffff1dcc92e4c1f
>>> RSI: 0000000000000000 RDI: 0000000000000246
>>> May  2 07:15:39 node0 kernel: [1168145.698389] RBP: 0000000000005000
>>> R08: 0000000000000000 R09: ffffffffb7a075c0
>>> May  2 07:15:39 node0 kernel: [1168145.699703] R10: ffff8e33bb4223c0
>>> R11: 0000000000000195 R12: 0000000000005000
>>> May  2 07:15:39 node0 kernel: [1168145.701044] R13: 0000000000000003
>>> R14: 0000000403060000 R15: ffff8e33712e3500
>>> May  2 07:15:39 node0 kernel: [1168145.702238] FS: 
>>> 0000000000000000(0000) GS:ffff8e349eb80000(0000) knlGS:0000000000000000
>>> May  2 07:15:39 node0 kernel: [1168145.703475] CS:  0010 DS: 0000 ES:
>>> 0000 CR0: 0000000080050033
>>> May  2 07:15:39 node0 kernel: [1168145.704733] CR2: 00007ff89915b08e
>>> CR3: 00000005b2e0a005 CR4: 00000000001626e0
>>> May  2 07:15:39 node0 kernel: [1168145.705955] Call Trace:
>>> May  2 07:15:39 node0 kernel: [1168145.707151]  process_one_work+0x177/0x360
>>> May  2 07:15:39 node0 kernel: [1168145.708373]  worker_thread+0x4d/0x3c0
>>> May  2 07:15:39 node0 kernel: [1168145.709501]  kthread+0xf8/0x130
>>> May  2 07:15:39 node0 kernel: [1168145.710603]  ?
>>> process_one_work+0x360/0x360
>>> May  2 07:15:39 node0 kernel: [1168145.711701]  ?
>>> kthread_create_worker_on_cpu+0x70/0x70
>>> May  2 07:15:39 node0 kernel: [1168145.712845]  ? SyS_exit_group+0x10/0x10
>>> May  2 07:15:39 node0 kernel: [1168145.713973]  ret_from_fork+0x35/0x40
>>> May  2 07:15:39 node0 kernel: [1168145.715072] Code: 8b 78 30 48 83 7f
>>> 58 00 0f 84 e5 fe ff ff 49 8d 54 2e ff 4c 89 f6 48 c1 fe 0c 48 c1 fa 0c
>>> e8 c2 e0 f3 ff 85 c0 0f 84 c8 fe ff f
>>> f <0f> 0b e9 c1 fe ff ff 8b 47 20 a8 10 0f 84 e2 fe ff ff 48 8b 77
>>> May  2 07:15:39 node0 kernel: [1168145.717349] ---[ end trace
>>> cfa707d6465e13d2 ]---
>>>
>>> If someone is interested in investigating then please let me know. The
>>> data is not important. The lack of incrementing read_io_errs is
>>> particularly critical IMHO.
>>
>> This warning is due to mixing buffered/dio. For more info check the
>> commit log of :
>>
>> 332391a9935d ("fs: Fix page cache inconsistency when mixing buffered and
>> AIO DIO")
> 
> Reading the BTRFS code is beyond my understanding. Have you thought
> about read_io_errs counter?

I didn't mean you should read the btrfs code, only the commit message.

> 
> Balance reveals IO read error, copying VM file ends with IO read error,
> read_io_errors is unchanged - still shows "0".

I will have to investigate whether the current behavior is intentional
or not.

> 
> 
