File system corruption, btrfsck abort

linux-btrfs.vger.kernel.org archive mirror

* File system corruption, btrfsck abort
@ 2017-04-25 17:50 Christophe de Dinechin
  2017-04-27 14:58 ` Christophe de Dinechin
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Christophe de Dinechin @ 2017-04-25 17:50 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I’ve been trying to run btrfs as my primary work filesystem for about 3-4 months now on Fedora 25 systems. I ran into filesystem corruption a few times. At least one case I attributed to a damaged disk, but the latest one is on a brand new 3T disk that reports no SMART errors. Worse yet, in at least three cases, the filesystem corruption caused btrfsck to crash.

The last filesystem corruption is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=1444821. The dmesg log is in there.

The btrfsck crash is here: https://bugzilla.redhat.com/show_bug.cgi?id=1435567. I have two crash modes: either an abort or a SIGSEGV. I checked that both still happen on master as of today.

The cause of the abort is that we call set_extent_dirty from check_extent_refs with rec->max_size == 0. I’ve instrumented the code to try to see where we set this to 0 (see https://github.com/c3d/btrfs-progs/tree/rhbz1435567), and indeed, we do sometimes see max_size set to 0 in a few locations. My instrumentation shows this:

78655 [1.792241:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139eb80 max_size 16384 tmpl 0x7fffffffd120
78657 [1.792242:0x451cb8] MAX_SIZE_ZERO: Set max size 0 for rec 0x139ec50 from tmpl 0x7fffffffcf80
78660 [1.792244:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139ed50 max_size 16384 tmpl 0x7fffffffd120

I don’t really know what to make of it.
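
One possible reading, offered as an assumption rather than something established in this thread: if the dirty range is passed as an inclusive [start, end] pair with end computed as start + max_size - 1, then a zero max_size gives an end that lies before the start. The sketch uses hypothetical helper names (inclusive_end, range_is_valid) that are not taken from btrfs-progs.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not from btrfs-progs: inclusive end offset of a
 * range, ASSUMING the convention end = start + size - 1.
 */
static uint64_t inclusive_end(uint64_t start, uint64_t size)
{
	return start + size - 1;
}

/*
 * A range is usable only if it is non-empty and its end does not wrap
 * around the 64-bit address space.
 */
static int range_is_valid(uint64_t start, uint64_t size)
{
	return size != 0 && start <= UINT64_MAX - (size - 1);
}
```

Under that assumption, a record at offset 52428800 with max_size 0 would produce an end of 52428799, one byte before its own start, which is the kind of inverted range a sanity check could reject.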

The cause of the SIGSEGV is that we try to free a list entry that has its next set to NULL.

#0  list_del (entry=0x555555db0420) at /usr/src/debug/btrfs-progs-v4.10.1/kernel-lib/list.h:125
#1  free_all_extent_backrefs (rec=0x555555db0350) at cmds-check.c:5386
#2  maybe_free_extent_rec (extent_cache=0x7fffffffd990, rec=0x555555db0350) at cmds-check.c:5417
#3  0x00005555555b308f in check_block (flags=<optimized out>, buf=0x55557b87cdf0, extent_cache=0x7fffffffd990, root=0x55555587d570) at cmds-check.c:5851
#4  run_next_block (root=root@entry=0x55555587d570, bits=bits@entry=0x5555558841
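
Frame #0 is consistent with how a kernel-style doubly linked list unlinks an entry: it writes through both neighbour pointers, so a NULL next is dereferenced. The following is a minimal sketch modelled on that idiom, not the actual list.h from btrfs-progs; list_node, node_is_linked and safe_list_del are hypothetical names.

```c
#include <stddef.h>

/* Minimal circular doubly linked list node, in the style of kernel lists. */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

static void list_node_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void list_node_add(struct list_node *entry, struct list_node *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* An entry with a NULL neighbour cannot be unlinked safely. */
static int node_is_linked(const struct list_node *entry)
{
	return entry->next != NULL && entry->prev != NULL;
}

/*
 * Unlink entry. A plain unlink would execute
 *     entry->next->prev = entry->prev;
 * which dereferences NULL when entry->next is NULL. This variant refuses
 * instead and returns -1 so the caller can report the inconsistency.
 */
static int safe_list_del(struct list_node *entry)
{
	if (!node_is_linked(entry))
		return -1;
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = NULL;
	entry->prev = NULL;
	return 0;
}
```

A guard like this would only turn the crash into a reported error; it would not explain why the entry was already unlinked or never linked in the first place.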

I don’t know if the two problems are related, but they seem to be pretty consistent on this specific disk, so I think that we have a good opportunity to make btrfsck more robust against this specific form of corruption. But I don’t want to haphazardly modify code I don’t really understand. So I would welcome suggestions on what the right strategy should be when we have max_size == 0, or on how to avoid it in the first place.

I don’t know if this is relevant at all, but all the machines that failed that way were used to run VMs with KVM/QEMU. Disk activity tends to be somewhat intense on occasion, since the VMs running there are part of a personal Jenkins ring that automatically builds various projects. Nominally, there are between three and five guests running (Windows XP, Windows 10, macOS, Fedora 25, Ubuntu 16.04).

Thanks
Christophe de Dinechin


* Re: File system corruption, btrfsck abort
  2017-04-25 17:50 File system corruption, btrfsck abort Christophe de Dinechin
@ 2017-04-27 14:58 ` Christophe de Dinechin
  2017-04-27 15:12   ` Christophe de Dinechin
  2017-04-28  0:45 ` Qu Wenruo
  2017-04-28  3:58 ` Chris Murphy
  2 siblings, 1 reply; 17+ messages in thread
From: Christophe de Dinechin @ 2017-04-27 14:58 UTC (permalink / raw)
  To: linux-btrfs

> On 25 Apr 2017, at 19:50, Christophe de Dinechin <dinechin@redhat.com> wrote:

> The cause of the abort is that we call set_extent_dirty from check_extent_refs with rec->max_size == 0. I’ve instrumented to try to see where we set this to 0 (see https://github.com/c3d/btrfs-progs/tree/rhbz1435567), and indeed, we do sometimes see max_size set to 0 in a few locations. My instrumentation shows this:
> 
> 78655 [1.792241:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139eb80 max_size 16384 tmpl 0x7fffffffd120
> 78657 [1.792242:0x451cb8] MAX_SIZE_ZERO: Set max size 0 for rec 0x139ec50 from tmpl 0x7fffffffcf80
> 78660 [1.792244:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139ed50 max_size 16384 tmpl 0x7fffffffd120
> 
> I don’t really know what to make of it.

I dug a bit deeper. We set rec->max_size = 0 in add_extent_rec_nolookup called from add_tree_backref, where we cleared the extent_record tmpl with a memset, so indeed, max_size is 0. However, immediately after that, we do a lookup_cache_extent with a size of 1. So I wonder if at that stage, we should not set max_size to 1 for the newly created extent record.
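
To make the observation concrete, here is a small sketch of how a memset-cleared template ends up with max_size == 0. The struct and function names are hypothetical, reduced stand-ins, not the real extent_record or add_tree_backref from cmds-check.c, and the second function only illustrates the idea of using 1 as a floor.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-in for the extent record template. */
struct extent_tmpl_sketch {
	uint64_t start;
	uint64_t nr;
	uint64_t max_size;
};

/* Mirrors the pattern described above: clear everything, set the start. */
static void init_backref_tmpl(struct extent_tmpl_sketch *tmpl, uint64_t bytenr)
{
	memset(tmpl, 0, sizeof(*tmpl));
	tmpl->start = bytenr;
	/* max_size is left at 0 by the memset. */
}

/* Variant under discussion: match the lookup size of 1 instead of 0. */
static void init_backref_tmpl_min1(struct extent_tmpl_sketch *tmpl,
				   uint64_t bytenr)
{
	init_backref_tmpl(tmpl, bytenr);
	tmpl->max_size = 1;
}
```

Whether 1 is a meaningful size for such a record, as opposed to the real extent length, is a separate question that this sketch does not answer.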

Opinions?

Christophe



* Re: File system corruption, btrfsck abort
  2017-04-27 14:58 ` Christophe de Dinechin
@ 2017-04-27 15:12   ` Christophe de Dinechin
  0 siblings, 0 replies; 17+ messages in thread
From: Christophe de Dinechin @ 2017-04-27 15:12 UTC (permalink / raw)
  To: linux-btrfs


> On 27 Apr 2017, at 16:58, Christophe de Dinechin <dinechin@redhat.com> wrote:
> 
>> On 25 Apr 2017, at 19:50, Christophe de Dinechin <dinechin@redhat.com> wrote:
> 
>> The cause of the abort is that we call set_extent_dirty from check_extent_refs with rec->max_size == 0. I’ve instrumented to try to see where we set this to 0 (see https://github.com/c3d/btrfs-progs/tree/rhbz1435567), and indeed, we do sometimes see max_size set to 0 in a few locations. My instrumentation shows this:
>> 
>> 78655 [1.792241:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139eb80 max_size 16384 tmpl 0x7fffffffd120
>> 78657 [1.792242:0x451cb8] MAX_SIZE_ZERO: Set max size 0 for rec 0x139ec50 from tmpl 0x7fffffffcf80
>> 78660 [1.792244:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139ed50 max_size 16384 tmpl 0x7fffffffd120
>> 
>> I don’t really know what to make of it.
> 
> I dug a bit deeper. We set rec->max_size = 0 in add_extent_rec_nolookup called from add_tree_backref, where we cleared the extent_record tmpl with a memset, so indeed, max_size is 0. However, immediately after that, we do a lookup_cache_extent with a size of 1. So I wonder if at that stage, we should not set max_size to 1 for the newly created extent record.

Well, for what it’s worth, it does not seem to help much:

*** Error in `btrfs check': double free or corruption (!prev): 0x0000000007d9c430 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7925b)[0x7ffff6feb25b]
/lib64/libc.so.6(+0x828ea)[0x7ffff6ff48ea]
/lib64/libc.so.6(cfree+0x4c)[0x7ffff6ff831c]
btrfs check[0x44d784]
btrfs check[0x4531ac]
btrfs check[0x45b24b]
btrfs check[0x45b743]
btrfs check[0x45c2b1]
btrfs check(cmd_check+0xcad)[0x4602d4]
btrfs check(main+0x8b)[0x40b7fb]
/lib64/libc.so.6(__libc_start_main+0xf1)[0x7ffff6f92401]
btrfs check(_start+0x2a)[0x40b4ba]
======= Memory map: ========
00400000-004a4000 r-xp 00000000 08:35 25167142                           /home/ddd/Work/btrfs-progs/btrfs
006a4000-006a8000 r--p 000a4000 08:35 25167142                           /home/ddd/Work/btrfs-progs/btrfs
006a8000-006fb000 rw-p 000a8000 08:35 25167142                           /home/ddd/Work/btrfs-progs/btrfs
006fb000-316a6000 rw-p 00000000 00:00 0                                  [heap]
7ffff0000000-7ffff0021000 rw-p 00000000 00:00 0 
7ffff0021000-7ffff4000000 ---p 00000000 00:00 0 
7ffff6d5b000-7ffff6d71000 r-xp 00000000 08:33 3156890                    /usr/lib64/libgcc_s-6.3.1-20161221.so.1
7ffff6d71000-7ffff6f70000 ---p 00016000 08:33 3156890                    /usr/lib64/libgcc_s-6.3.1-20161221.so.1
7ffff6f70000-7ffff6f71000 r--p 00015000 08:33 3156890                    /usr/lib64/libgcc_s-6.3.1-20161221.so.1
7ffff6f71000-7ffff6f72000 rw-p 00016000 08:33 3156890                    /usr/lib64/libgcc_s-6.3.1-20161221.so.1
7ffff6f72000-7ffff712f000 r-xp 00000000 08:33 3154711                    /usr/lib64/libc-2.24.so
7ffff712f000-7ffff732e000 ---p 001bd000 08:33 3154711                    /usr/lib64/libc-2.24.so
7ffff732e000-7ffff7332000 r--p 001bc000 08:33 3154711                    /usr/lib64/libc-2.24.so
7ffff7332000-7ffff7334000 rw-p 001c0000 08:33 3154711                    /usr/lib64/libc-2.24.so
7ffff7334000-7ffff7338000 rw-p 00000000 00:00 0 
7ffff7338000-7ffff7350000 r-xp 00000000 08:33 3155302                    /usr/lib64/libpthread-2.24.so
7ffff7350000-7ffff7550000 ---p 00018000 08:33 3155302                    /usr/lib64/libpthread-2.24.so
7ffff7550000-7ffff7551000 r--p 00018000 08:33 3155302                    /usr/lib64/libpthread-2.24.so
7ffff7551000-7ffff7552000 rw-p 00019000 08:33 3155302                    /usr/lib64/libpthread-2.24.so
7ffff7552000-7ffff7556000 rw-p 00000000 00:00 0 
7ffff7556000-7ffff7578000 r-xp 00000000 08:33 3155132                    /usr/lib64/liblzo2.so.2.0.0
7ffff7578000-7ffff7777000 ---p 00022000 08:33 3155132                    /usr/lib64/liblzo2.so.2.0.0
7ffff7777000-7ffff7778000 r--p 00021000 08:33 3155132                    /usr/lib64/liblzo2.so.2.0.0
7ffff7778000-7ffff7779000 rw-p 00000000 00:00 0 
7ffff7779000-7ffff778e000 r-xp 00000000 08:33 3155608                    /usr/lib64/libz.so.1.2.8
7ffff778e000-7ffff798d000 ---p 00015000 08:33 3155608                    /usr/lib64/libz.so.1.2.8
7ffff798d000-7ffff798e000 r--p 00014000 08:33 3155608                    /usr/lib64/libz.so.1.2.8
7ffff798e000-7ffff798f000 rw-p 00015000 08:33 3155608                    /usr/lib64/libz.so.1.2.8
7ffff798f000-7ffff79cc000 r-xp 00000000 08:33 3153511                    /usr/lib64/libblkid.so.1.1.0
7ffff79cc000-7ffff7bcc000 ---p 0003d000 08:33 3153511                    /usr/lib64/libblkid.so.1.1.0
7ffff7bcc000-7ffff7bd0000 r--p 0003d000 08:33 3153511                    /usr/lib64/libblkid.so.1.1.0
7ffff7bd0000-7ffff7bd1000 rw-p 00041000 08:33 3153511                    /usr/lib64/libblkid.so.1.1.0
7ffff7bd1000-7ffff7bd2000 rw-p 00000000 00:00 0 
7ffff7bd2000-7ffff7bd6000 r-xp 00000000 08:33 3154270                    /usr/lib64/libuuid.so.1.3.0
7ffff7bd6000-7ffff7dd5000 ---p 00004000 08:33 3154270                    /usr/lib64/libuuid.so.1.3.0
7ffff7dd5000-7ffff7dd6000 r--p 00003000 08:33 3154270                    /usr/lib64/libuuid.so.1.3.0
7ffff7dd6000-7ffff7dd7000 rw-p 00000000 00:00 0 
7ffff7dd7000-7ffff7dfc000 r-xp 00000000 08:33 3154536                    /usr/lib64/ld-2.24.so
7ffff7fdb000-7ffff7fe0000 rw-p 00000000 00:00 0 
7ffff7ff5000-7ffff7ff8000 rw-p 00000000 00:00 0 
7ffff7ff8000-7ffff7ffa000 r--p 00000000 00:00 0                          [vvar]
7ffff7ffa000-7ffff7ffc000 r-xp 00000000 00:00 0                          [vdso]
7ffff7ffc000-7ffff7ffd000 r--p 00025000 08:33 3154536                    /usr/lib64/ld-2.24.so
7ffff7ffd000-7ffff7ffe000 rw-p 00026000 08:33 3154536                    /usr/lib64/ld-2.24.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0 
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

This seems to match the other scenario I was referring to, with an inconsistent list.

> 
> Opinions?
> 
> Christophe
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



* Re: File system corruption, btrfsck abort
  2017-04-25 17:50 File system corruption, btrfsck abort Christophe de Dinechin
  2017-04-27 14:58 ` Christophe de Dinechin
@ 2017-04-28  0:45 ` Qu Wenruo
  2017-04-28  8:47   ` Christophe de Dinechin
  2017-04-28  3:58 ` Chris Murphy
  2 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2017-04-28  0:45 UTC (permalink / raw)
  To: Christophe de Dinechin, linux-btrfs



At 04/26/2017 01:50 AM, Christophe de Dinechin wrote:
> Hi,
> 
> 
> I’ve been trying to run btrfs as my primary work filesystem for about 3-4 months now on Fedora 25 systems. I ran into filesystem corruption a few times. At least one case I attributed to a damaged disk, but the latest one is on a brand new 3T disk that reports no SMART errors. Worse yet, in at least three cases, the filesystem corruption caused btrfsck to crash.
> 
> The last filesystem corruption is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=1444821. The dmesg log is in there.

According to the bugzilla, the btrfs-progs version seems to be too old by
btrfs standards.

What about using the latest btrfs-progs v4.10.2?

Furthermore, with v4.10.2, btrfs check provides a new mode called lowmem.
You could try "btrfs check --mode=lowmem" to see if the problem can be
avoided.

For the kernel bug, it seems to be related to a wrongly inserted delayed
ref, but I could be totally wrong.

Thanks,
Qu




* Re: File system corruption, btrfsck abort
  2017-04-28  0:45 ` Qu Wenruo
@ 2017-04-28  8:47   ` Christophe de Dinechin
  2017-05-02  0:17     ` Qu Wenruo
  0 siblings, 1 reply; 17+ messages in thread
From: Christophe de Dinechin @ 2017-04-28  8:47 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs


> On 28 Apr 2017, at 02:45, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> 
> 
> 
> At 04/26/2017 01:50 AM, Christophe de Dinechin wrote:
>> Hi,
>> I’ve been trying to run btrfs as my primary work filesystem for about 3-4 months now on Fedora 25 systems. I ran into filesystem corruption a few times. At least one case I attributed to a damaged disk, but the latest one is on a brand new 3T disk that reports no SMART errors. Worse yet, in at least three cases, the filesystem corruption caused btrfsck to crash.
>> The last filesystem corruption is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=1444821. The dmesg log is in there.
> 
> According to the bugzilla, the btrfs-progs version seems to be too old by btrfs standards.

> What about using the latest btrfs-progs v4.10.2?

I tried 4.10.1-1 (see https://bugzilla.redhat.com/show_bug.cgi?id=1435567#c4).

I am currently debugging with a build from the master branch as of Tuesday (commit bd0ab27afbf14370f9f0da1f5f5ecbb0adc654c1), which is 4.10.2.

There was no change in behavior. Runs are split about evenly between the list crash and the abort.

I added instrumentation and tried a fix, which gets me a tiny bit further, until I hit a message from delete_duplicate_records:

Ok we have overlapping extents that aren't completely covered by each
other, this is going to require more careful thought.  The extents are
[52428800-16384] and [52432896-16384]
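
Reading each pair as [start-length] (an assumption about the message format), the two extents start 4096 bytes apart and are each 16384 bytes long, so they overlap without either one containing the other. A small sketch of that check, with hypothetical helper names:

```c
#include <stdint.h>

/* Half-open byte range [start, start + len). */
struct byte_range {
	uint64_t start;
	uint64_t len;
};

static uint64_t range_end(struct byte_range r)
{
	return r.start + r.len;
}

/* True if the two ranges share at least one byte. */
static int ranges_overlap(struct byte_range a, struct byte_range b)
{
	return a.start < range_end(b) && b.start < range_end(a);
}

/* True if outer fully contains inner. */
static int range_covers(struct byte_range outer, struct byte_range inner)
{
	return outer.start <= inner.start &&
	       range_end(inner) <= range_end(outer);
}

/* Number of bytes shared by the two ranges, 0 if they are disjoint. */
static uint64_t overlap_bytes(struct byte_range a, struct byte_range b)
{
	uint64_t lo = a.start > b.start ? a.start : b.start;
	uint64_t hi = range_end(a) < range_end(b) ? range_end(a) : range_end(b);

	return hi > lo ? hi - lo : 0;
}
```

For the two extents quoted above, the ranges overlap by 12288 bytes and neither covers the other, which matches the "aren't completely covered by each other" wording of the message.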

> Furthermore, with v4.10.2, btrfs check provides a new mode called lowmem.
> You could try "btrfs check --mode=lowmem" to see if the problem can be avoided.

I will try that, but what makes you think this is a memory-related condition? The machine has 16G of RAM, isn’t that enough for an fsck?

> 
> For the kernel bug, it seems to be related to a wrongly inserted delayed ref, but I could be totally wrong.

For now, I’m focusing on the “repair” part as much as I can, because I assume the kernel bug is there anyway, so someone else is bound to hit this problem.


Thanks
Christophe




* Re: File system corruption, btrfsck abort
  2017-04-28  8:47   ` Christophe de Dinechin
@ 2017-05-02  0:17     ` Qu Wenruo
  2017-05-03 14:21       ` Christophe de Dinechin
  0 siblings, 1 reply; 17+ messages in thread
From: Qu Wenruo @ 2017-05-02  0:17 UTC (permalink / raw)
  To: Christophe de Dinechin; +Cc: linux-btrfs



At 04/28/2017 04:47 PM, Christophe de Dinechin wrote:
> 
>> On 28 Apr 2017, at 02:45, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>>
>>
>> At 04/26/2017 01:50 AM, Christophe de Dinechin wrote:
>>> Hi,
>>> I’ve been trying to run btrfs as my primary work filesystem for about 3-4 months now on Fedora 25 systems. I ran into filesystem corruption a few times. At least one case I attributed to a damaged disk, but the latest one is on a brand new 3T disk that reports no SMART errors. Worse yet, in at least three cases, the filesystem corruption caused btrfsck to crash.
>>> The last filesystem corruption is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=1444821. The dmesg log is in there.
>>
>> According to the bugzilla, the btrfs-progs version seems to be too old by btrfs standards.
> 
>> What about using the latest btrfs-progs v4.10.2?
> 
> I tried 4.10.1-1 (see https://bugzilla.redhat.com/show_bug.cgi?id=1435567#c4).
> 
> I am currently debugging with a build from the master branch as of Tuesday (commit bd0ab27afbf14370f9f0da1f5f5ecbb0adc654c1), which is 4.10.2.
> 
> There was no change in behavior. Runs are split about evenly between the list crash and the abort.
> 
> I added instrumentation and tried a fix, which gets me a tiny bit further, until I hit a message from delete_duplicate_records:
> 
> Ok we have overlapping extents that aren't completely covered by each
> other, this is going to require more careful thought.  The extents are
> [52428800-16384] and [52432896-16384]

Then I think lowmem mode may have a better chance of handling it without crashing.

> 
>> Furthermore, with v4.10.2, btrfs check provides a new mode called lowmem.
>> You could try "btrfs check --mode=lowmem" to see if the problem can be avoided.
> 
> I will try that, but what makes you think this is a memory-related condition? The machine has 16G of RAM, isn’t that enough for an fsck?

It is not about memory usage. Lowmem mode is in fact a complete rework, so
I just want to see how well or badly the new lowmem mode handles it.

Thanks,
Qu





* Re: File system corruption, btrfsck abort
  2017-05-02  0:17     ` Qu Wenruo
@ 2017-05-03 14:21       ` Christophe de Dinechin
  2017-05-04 12:33         ` Christophe de Dinechin
  2017-05-05  0:18         ` Qu Wenruo
  0 siblings, 2 replies; 17+ messages in thread
From: Christophe de Dinechin @ 2017-05-03 14:21 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs


> On 2 May 2017, at 02:17, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
> 
> 
> 
> At 04/28/2017 04:47 PM, Christophe de Dinechin wrote:
>>> On 28 Apr 2017, at 02:45, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>> 
>>> 
>>> 
>>> At 04/26/2017 01:50 AM, Christophe de Dinechin wrote:
>>>> Hi,
>>>> I’ve been trying to run btrfs as my primary work filesystem for about 3-4 months now on Fedora 25 systems. I ran into filesystem corruption a few times. At least one case I attributed to a damaged disk, but the latest one is on a brand new 3T disk that reports no SMART errors. Worse yet, in at least three cases, the filesystem corruption caused btrfsck to crash.
>>>> The last filesystem corruption is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=1444821. The dmesg log is in there.
>>> 
>>> According to the bugzilla, the btrfs-progs version seems to be too old by btrfs standards.
>>> What about using the latest btrfs-progs v4.10.2?
>> I tried 4.10.1-1 (see https://bugzilla.redhat.com/show_bug.cgi?id=1435567#c4).
>> I am currently debugging with a build from the master branch as of Tuesday (commit bd0ab27afbf14370f9f0da1f5f5ecbb0adc654c1), which is 4.10.2.
>> There was no change in behavior. Runs are split about evenly between the list crash and the abort.
>> I added instrumentation and tried a fix, which gets me a tiny bit further, until I hit a message from delete_duplicate_records:
>> Ok we have overlapping extents that aren't completely covered by each
>> other, this is going to require more careful thought.  The extents are
>> [52428800-16384] and [52432896-16384]
> 
> Then I think lowmem mode may have a better chance of handling it without crashing.

I tried it and got:

[root@rescue ~]# /usr/local/bin/btrfsck --mode=lowmem --repair /dev/sda4
enabling repair mode
ERROR: low memory mode doesn't support repair yet

The problem only occurred in --repair mode anyway.


> 
>>> Furthermore, with v4.10.2, btrfs check provides a new mode called lowmem.
>>> You could try "btrfs check --mode=lowmem" to see if the problem can be avoided.
>> I will try that, but what makes you think this is a memory-related condition? The machine has 16G of RAM, isn’t that enough for an fsck?
> 
> It is not about memory usage. Lowmem mode is in fact a complete rework, so I just want to see how well or badly the new lowmem mode handles it.

Is there a prototype with lowmem and repair?


Thanks
Christophe

> 
> Thanks,
> Qu
> 
>>> 
>>> For the kernel bug, it seems to be related to wrongly inserted delayed ref, but I can totally be wrong.
>> For now, I’m focusing on the “repair” part as much as I can, because I assume the kernel bug is there anyway, so someone else is bound to hit this problem.
>> Thanks
>> Christophe
>>> 
>>> Thanks,
>>> Qu
>>>> The btrfsck crash is here: https://bugzilla.redhat.com/show_bug.cgi?id=1435567. I have two crash modes: either an abort or a SIGSEGV. I checked that both still happens on master as of today.
>>>> The cause of the abort is that we call set_extent_dirty from check_extent_refs with rec->max_size == 0. I’ve instrumented to try to see where we set this to 0 (see https://github.com/c3d/btrfs-progs/tree/rhbz1435567), and indeed, we do sometimes see max_size set to 0 in a few locations. My instrumentation shows this:
>>>> 78655 [1.792241:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139eb80 max_size 16384 tmpl 0x7fffffffd120
>>>> 78657 [1.792242:0x451cb8] MAX_SIZE_ZERO: Set max size 0 for rec 0x139ec50 from tmpl 0x7fffffffcf80
>>>> 78660 [1.792244:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139ed50 max_size 16384 tmpl 0x7fffffffd120
>>>> I don’t really know what to make of it.
>>>> The cause of the SIGSEGV is that we try to free a list entry that has its next set to NULL.
>>>> #0  list_del (entry=0x555555db0420) at /usr/src/debug/btrfs-progs-v4.10.1/kernel-lib/list.h:125
>>>> #1  free_all_extent_backrefs (rec=0x555555db0350) at cmds-check.c:5386
>>>> #2  maybe_free_extent_rec (extent_cache=0x7fffffffd990, rec=0x555555db0350) at cmds-check.c:5417
>>>> #3  0x00005555555b308f in check_block (flags=<optimized out>, buf=0x55557b87cdf0, extent_cache=0x7fffffffd990, root=0x55555587d570) at cmds-check.c:5851
>>>> #4  run_next_block (root=root@entry=0x55555587d570, bits=bits@entry=0x5555558841
>>>> I don’t know if the two problems are related, but they seem to be pretty consistent on this specific disk, so I think that we have a good opportunity to improve btrfsck to make it more robust to this specific form of corruption. But I don’t want to hapazardly modify a code I don’t really understand. So if anybody could make a suggestion on what the right strategy should be when we have max_size == 0, or how to avoid it in the first place.
>>>> I don’t know if this is relevant at all, but all the machines that failed that way were used to run VMs with KVM/QEMU. DIsk activity tends to be somewhat intense on occasions, since the VMs running there are part of a personal Jenkins ring that automatically builds various projects. Nominally, there are between three and five guests running (Windows XP, WIndows 10, macOS, Fedora25, Ubuntu 16.04).
>>>> Thanks
>>>> Christophe de Dinechin
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: File system corruption, btrfsck abort
  2017-05-03 14:21       ` Christophe de Dinechin
@ 2017-05-04 12:33         ` Christophe de Dinechin
  2017-05-05  0:18         ` Qu Wenruo
  1 sibling, 0 replies; 17+ messages in thread
From: Christophe de Dinechin @ 2017-05-04 12:33 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs


> On 3 May 2017, at 16:21, Christophe de Dinechin <dinechin@redhat.com> wrote:
> 
>> 
>> On 2 May 2017, at 02:17, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>> 
>> 
>> 
>> At 04/28/2017 04:47 PM, Christophe de Dinechin wrote:
>>>> On 28 Apr 2017, at 02:45, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>>> 
>>>> 
>>>> 
>>>> At 04/26/2017 01:50 AM, Christophe de Dinechin wrote:
>>>>> Hi,
>>>>> I’ve been trying to run btrfs as my primary work filesystem for about 3-4 months now on Fedora 25 systems. I ran a few times into filesystem corruptions. At least one I attributed to a damaged disk, but the last one is with a brand new 3T disk that reports no SMART errors. Worse yet, in at least three cases, the filesystem corruption caused btrfsck to crash.
>>>>> The last filesystem corruption is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=1444821. The dmesg log is in there.
>>>> 
>>>> According to the bugzilla, the btrfs-progs version seems to be too old by btrfs standards.
>>>> What about using the latest btrfs-progs v4.10.2?
>>> I tried 4.10.1-1 https://bugzilla.redhat.com/show_bug.cgi?id=1435567#c4.
>>> I am currently debugging with a build from the master branch as of Tuesday (commit bd0ab27afbf14370f9f0da1f5f5ecbb0adc654c1), which is 4.10.2
>>> There was no change in behavior. Runs are split about evenly between list crash and abort.
>>> I added instrumentation and tried a fix, which brings me a tiny bit further, until I hit a message from delete_duplicate_records:
>>> Ok we have overlapping extents that aren't completely covered by each
>>> other, this is going to require more careful thought.  The extents are
>>> [52428800-16384] and [52432896-16384]
>> 
>> Then I think lowmem mode may have a better chance of handling it without crashing.
> 
> I tried it and got:
> 
> [root@rescue ~]# /usr/local/bin/btrfsck --mode=lowmem --repair /dev/sda4
> enabling repair mode
> ERROR: low memory mode doesn't support repair yet
> 
> The problem only occurred in --repair mode anyway.

For what it’s worth, without the --repair option, it gets stuck. I stopped it after 24 hours; by then it had printed:

[root@rescue ~]# /usr/local/bin/btrfsck --mode=lowmem  /dev/sda4
Checking filesystem on /dev/sda4
UUID: 26a0c84c-d2ac-4da8-b880-684f2ea48a22
checking extents
checksum verify failed on 52428800 found E3ADA767 wanted 7C506C03
checksum verify failed on 52428800 found E3ADA767 wanted 7C506C03
checksum verify failed on 52428800 found E3ADA767 wanted 7C506C03
checksum verify failed on 52428800 found E3ADA767 wanted 7C506C03
Csum didn't match
ERROR: extent [52428800 16384] lost referencer (owner: 7, level: 0)
checksum verify failed on 52445184 found 8D1BE62F wanted 00000000
checksum verify failed on 52445184 found 8D1BE62F wanted 00000000
checksum verify failed on 52445184 found 8D1BE62F wanted 00000000
checksum verify failed on 52445184 found 8D1BE62F wanted 00000000
bytenr mismatch, want=52445184, have=2199023255552
ERROR: extent [52445184 16384] lost referencer (owner: 2, level: 0)
ERROR: extent[52432896 16384] backref lost (owner: 2, level: 0)
ERROR: check leaf failed root 2 bytenr 52432896 level 0, force continue check
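
For reference, a quick arithmetic check of how the two extents from the delete_duplicate_records message relate to the bytenrs in this output. This is only a sketch: the variable names are illustrative, and the numbers are copied from the messages quoted in this thread.

```shell
# Extents reported by delete_duplicate_records, as start and length in bytes.
a_start=52428800; a_len=16384
b_start=52432896; b_len=16384

a_end=$((a_start + a_len))
b_end=$((b_start + b_len))

# The overlap is [max(starts), min(ends)); clamp to 0 if the ranges are disjoint.
lo=$(( a_start > b_start ? a_start : b_start ))
hi=$(( a_end < b_end ? a_end : b_end ))
overlap=$(( hi > lo ? hi - lo : 0 ))

echo "second extent starts $((b_start - a_start)) bytes into the first"
echo "overlap: $overlap bytes; first extent ends at $a_end"
```

The second extent starts 4096 bytes into the first 16384-byte extent, so the two share 12288 bytes. The first extent ends at 52445184, which is also the next bytenr that fails checksum verification above.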

Any tips for further debugging this?


Christophe

> 
> 
>> 
>>>> Furthermore for v4.10.2, btrfs check provides a new mode called lowmem.
>>>> You could try "btrfs check --mode=lowmem" to see if such problem can be avoided.
>>> I will try that, but what makes you think this is a memory-related condition? The machine has 16G of RAM, isn’t that enough for an fsck?
>> 
>> Not for memory usage; lowmem mode is in fact a complete rework, so I just want to see how well or badly the new lowmem mode handles it.
> 
> Is there a prototype with lowmem and repair?
> 
> 
> Thanks
> Christophe
> 
>> 
>> Thanks,
>> Qu
>> 
>>>> 
>>>> For the kernel bug, it seems to be related to a wrongly inserted delayed ref, but I could be totally wrong.
>>> For now, I’m focusing on the “repair” part as much as I can, because I assume the kernel bug is there anyway, so someone else is bound to hit this problem.
>>> Thanks
>>> Christophe
>>>> 
>>>> Thanks,
>>>> Qu
>>>>> The btrfsck crash is here: https://bugzilla.redhat.com/show_bug.cgi?id=1435567. I have two crash modes: either an abort or a SIGSEGV. I checked that both still happen on master as of today.
>>>>> The cause of the abort is that we call set_extent_dirty from check_extent_refs with rec->max_size == 0. I’ve instrumented the code to try to see where we set this to 0 (see https://github.com/c3d/btrfs-progs/tree/rhbz1435567), and indeed, we do sometimes see max_size set to 0 in a few locations. My instrumentation shows this:
>>>>> 78655 [1.792241:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139eb80 max_size 16384 tmpl 0x7fffffffd120
>>>>> 78657 [1.792242:0x451cb8] MAX_SIZE_ZERO: Set max size 0 for rec 0x139ec50 from tmpl 0x7fffffffcf80
>>>>> 78660 [1.792244:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139ed50 max_size 16384 tmpl 0x7fffffffd120
>>>>> I don’t really know what to make of it.
>>>>> The cause of the SIGSEGV is that we try to free a list entry that has its next set to NULL.
>>>>> #0  list_del (entry=0x555555db0420) at /usr/src/debug/btrfs-progs-v4.10.1/kernel-lib/list.h:125
>>>>> #1  free_all_extent_backrefs (rec=0x555555db0350) at cmds-check.c:5386
>>>>> #2  maybe_free_extent_rec (extent_cache=0x7fffffffd990, rec=0x555555db0350) at cmds-check.c:5417
>>>>> #3  0x00005555555b308f in check_block (flags=<optimized out>, buf=0x55557b87cdf0, extent_cache=0x7fffffffd990, root=0x55555587d570) at cmds-check.c:5851
>>>>> #4  run_next_block (root=root@entry=0x55555587d570, bits=bits@entry=0x5555558841
>>>>> I don’t know if the two problems are related, but they seem to be pretty consistent on this specific disk, so I think that we have a good opportunity to improve btrfsck to make it more robust to this specific form of corruption. But I don’t want to haphazardly modify code I don’t really understand. So I would welcome any suggestion on what the right strategy should be when we have max_size == 0, or on how to avoid it in the first place.
>>>>> I don’t know if this is relevant at all, but all the machines that failed that way were used to run VMs with KVM/QEMU. Disk activity tends to be somewhat intense on occasions, since the VMs running there are part of a personal Jenkins ring that automatically builds various projects. Nominally, there are between three and five guests running (Windows XP, Windows 10, macOS, Fedora 25, Ubuntu 16.04).
>>>>> Thanks
>>>>> Christophe de Dinechin


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: File system corruption, btrfsck abort
  2017-05-03 14:21       ` Christophe de Dinechin
  2017-05-04 12:33         ` Christophe de Dinechin
@ 2017-05-05  0:18         ` Qu Wenruo
  1 sibling, 0 replies; 17+ messages in thread
From: Qu Wenruo @ 2017-05-05  0:18 UTC (permalink / raw)
  To: Christophe de Dinechin; +Cc: linux-btrfs



At 05/03/2017 10:21 PM, Christophe de Dinechin wrote:
> 
>> On 2 May 2017, at 02:17, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>>
>>
>> At 04/28/2017 04:47 PM, Christophe de Dinechin wrote:
>>>> On 28 Apr 2017, at 02:45, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>>>
>>>>
>>>>
>>>> At 04/26/2017 01:50 AM, Christophe de Dinechin wrote:
>>>>> Hi,
>>>>> I’ve been trying to run btrfs as my primary work filesystem for about 3-4 months now on Fedora 25 systems. I ran a few times into filesystem corruptions. At least one I attributed to a damaged disk, but the last one is with a brand new 3T disk that reports no SMART errors. Worse yet, in at least three cases, the filesystem corruption caused btrfsck to crash.
>>>>> The last filesystem corruption is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=1444821. The dmesg log is in there.
>>>>
>>>> According to the bugzilla, the btrfs-progs version seems to be too old by btrfs standards.
>>>> What about using the latest btrfs-progs v4.10.2?
>>> I tried 4.10.1-1 https://bugzilla.redhat.com/show_bug.cgi?id=1435567#c4.
>>> I am currently debugging with a build from the master branch as of Tuesday (commit bd0ab27afbf14370f9f0da1f5f5ecbb0adc654c1), which is 4.10.2
>>> There was no change in behavior. Runs are split about evenly between list crash and abort.
>>> I added instrumentation and tried a fix, which brings me a tiny bit further, until I hit a message from delete_duplicate_records:
>>> Ok we have overlapping extents that aren't completely covered by each
>>> other, this is going to require more careful thought.  The extents are
>>> [52428800-16384] and [52432896-16384]
>>
>> Then I think lowmem mode may have a better chance of handling it without crashing.
> 
> I tried it and got:
> 
> [root@rescue ~]# /usr/local/bin/btrfsck --mode=lowmem --repair /dev/sda4
> enabling repair mode
> ERROR: low memory mode doesn't support repair yet
> 
> The problem only occurred in --repair mode anyway.
> 
> 
>>
>>>> Furthermore for v4.10.2, btrfs check provides a new mode called lowmem.
>>>> You could try "btrfs check --mode=lowmem" to see if such problem can be avoided.
>>> I will try that, but what makes you think this is a memory-related condition? The machine has 16G of RAM, isn’t that enough for an fsck?
>>
>> Not for memory usage; lowmem mode is in fact a complete rework, so I just want to see how well or badly the new lowmem mode handles it.
> 
> Is there a prototype with lowmem and repair?

Yes, Su Yue submitted a patchset for it, but still repair is only 
supported for fs tree contents.
https://www.spinics.net/lists/linux-btrfs/msg63316.html

Repairing other trees, especially extent tree, is not supported yet.

Thanks,
Qu
> 
> 
> Thanks
> Christophe
> 
>>
>> Thanks,
>> Qu
>>
>>>>
>>>> For the kernel bug, it seems to be related to a wrongly inserted delayed ref, but I could be totally wrong.
>>> For now, I’m focusing on the “repair” part as much as I can, because I assume the kernel bug is there anyway, so someone else is bound to hit this problem.
>>> Thanks
>>> Christophe
>>>>
>>>> Thanks,
>>>> Qu
>>>>> The btrfsck crash is here: https://bugzilla.redhat.com/show_bug.cgi?id=1435567. I have two crash modes: either an abort or a SIGSEGV. I checked that both still happen on master as of today.
>>>>> The cause of the abort is that we call set_extent_dirty from check_extent_refs with rec->max_size == 0. I’ve instrumented the code to try to see where we set this to 0 (see https://github.com/c3d/btrfs-progs/tree/rhbz1435567), and indeed, we do sometimes see max_size set to 0 in a few locations. My instrumentation shows this:
>>>>> 78655 [1.792241:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139eb80 max_size 16384 tmpl 0x7fffffffd120
>>>>> 78657 [1.792242:0x451cb8] MAX_SIZE_ZERO: Set max size 0 for rec 0x139ec50 from tmpl 0x7fffffffcf80
>>>>> 78660 [1.792244:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139ed50 max_size 16384 tmpl 0x7fffffffd120
>>>>> I don’t really know what to make of it.
>>>>> The cause of the SIGSEGV is that we try to free a list entry that has its next set to NULL.
>>>>> #0  list_del (entry=0x555555db0420) at /usr/src/debug/btrfs-progs-v4.10.1/kernel-lib/list.h:125
>>>>> #1  free_all_extent_backrefs (rec=0x555555db0350) at cmds-check.c:5386
>>>>> #2  maybe_free_extent_rec (extent_cache=0x7fffffffd990, rec=0x555555db0350) at cmds-check.c:5417
>>>>> #3  0x00005555555b308f in check_block (flags=<optimized out>, buf=0x55557b87cdf0, extent_cache=0x7fffffffd990, root=0x55555587d570) at cmds-check.c:5851
>>>>> #4  run_next_block (root=root@entry=0x55555587d570, bits=bits@entry=0x5555558841
>>>>> I don’t know if the two problems are related, but they seem to be pretty consistent on this specific disk, so I think that we have a good opportunity to improve btrfsck to make it more robust to this specific form of corruption. But I don’t want to haphazardly modify code I don’t really understand. So I would welcome any suggestion on what the right strategy should be when we have max_size == 0, or on how to avoid it in the first place.
>>>>> I don’t know if this is relevant at all, but all the machines that failed that way were used to run VMs with KVM/QEMU. Disk activity tends to be somewhat intense on occasions, since the VMs running there are part of a personal Jenkins ring that automatically builds various projects. Nominally, there are between three and five guests running (Windows XP, Windows 10, macOS, Fedora 25, Ubuntu 16.04).
>>>>> Thanks
>>>>> Christophe de Dinechin



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: File system corruption, btrfsck abort
  2017-04-25 17:50 File system corruption, btrfsck abort Christophe de Dinechin
  2017-04-27 14:58 ` Christophe de Dinechin
  2017-04-28  0:45 ` Qu Wenruo
@ 2017-04-28  3:58 ` Chris Murphy
       [not found]   ` <2CE52079-1B96-4FB3-8CEF-05FC6D3CB183@redhat.com>
  2 siblings, 1 reply; 17+ messages in thread
From: Chris Murphy @ 2017-04-28  3:58 UTC (permalink / raw)
  To: Christophe de Dinechin; +Cc: Btrfs BTRFS

On Tue, Apr 25, 2017 at 11:50 AM, Christophe de Dinechin
<dinechin@redhat.com> wrote:

>
> The last filesystem corruption is documented here: https://bugzilla.redhat.com/show_bug.cgi?id=1444821. The dmesg log is in there.

And also from the bug:
>How reproducible: Seen at least 4 times on 3 different disks and 2 different systems.

I've been using Fedora and Btrfs on 1/2 dozen different kinds of
hardware since around Fedora 13. The oldest file systems are about 2
years old. I've not seen file system corruption. So I'd say there's
some kind of workload that's helping to trigger it or it's hardware
related; that it's happening on multiple systems makes me wonder if
it's power related.

>
> The btrfsck crash is here: https://bugzilla.redhat.com/show_bug.cgi?id=1435567. I have two crash modes: either an abort or a SIGSEGV. I checked that both still happen on master as of today.

The btrfs check crash is another matter. I've seen it crash many
times, but the more recent versions are more reliable and I haven't
seen a crash lately.

> I don’t know if this is relevant at all, but all the machines that failed that way were used to run VMs with KVM/QEMU. Disk activity tends to be somewhat intense on occasions, since the VMs running there are part of a personal Jenkins ring that automatically builds various projects. Nominally, there are between three and five guests running (Windows XP, Windows 10, macOS, Fedora 25, Ubuntu 16.04).

I do run VM's quite often with all of my setups but rarely two
concurrently and never three or more. So, hmmm. And are the VM's
backed by a qemu image on Btrfs? Or LVM?

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 17+ messages in thread

[parent not found: <2CE52079-1B96-4FB3-8CEF-05FC6D3CB183@redhat.com>]

* Re: File system corruption, btrfsck abort
       [not found]   ` <2CE52079-1B96-4FB3-8CEF-05FC6D3CB183@redhat.com>
@ 2017-04-28 20:09     ` Chris Murphy
  2017-04-29  8:46       ` Christophe de Dinechin
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Murphy @ 2017-04-28 20:09 UTC (permalink / raw)
  To: Christophe de Dinechin; +Cc: Btrfs BTRFS

On Fri, Apr 28, 2017 at 3:10 AM, Christophe de Dinechin
<dinechin@redhat.com> wrote:

>
> QEMU qcow2. Host is BTRFS. Guests are BTRFS, LVM, Ext4, NTFS (winXP and
> win10) and HFS+ (macOS Sierra). I think I had 7 VMs installed, planned to
> restore another 8 from backups before my previous disk crash. I usually have
> at least 2 running, often as many as 5 (fedora, ubuntu, winXP, win10, macOS)
> to cover my software testing needs.

That is quite a torture test for any file system but more so Btrfs.
How are the qcow2 files being created? What's the qemu-img create
command? In particular I'm wondering if these qcow2 files are cow or
nocow; if they're compressed by Btrfs; and how many fragments they
have with filefrag.

When I was using qcow2 for backing I used

qemu-img create -f qcow2 -o preallocation=falloc,nocow=on,lazy_refcounts=on

But then later I started using fallocated raw files with chattr +C
applied. And these days I'm just using LVM thin volumes. The journaled
file systems in a guest cause a ton of backing file fragmentation
unless nocow is used on Btrfs. I've seen hundreds of thousands of
extents for a single backing file for a Windows guest.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: File system corruption, btrfsck abort
  2017-04-28 20:09     ` Chris Murphy
@ 2017-04-29  8:46       ` Christophe de Dinechin
  2017-04-29 19:13         ` Chris Murphy
  2017-04-29 19:18         ` Chris Murphy
  0 siblings, 2 replies; 17+ messages in thread
From: Christophe de Dinechin @ 2017-04-29  8:46 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS


> On 28 Apr 2017, at 22:09, Chris Murphy <lists@colorremedies.com> wrote:
> 
> On Fri, Apr 28, 2017 at 3:10 AM, Christophe de Dinechin
> <dinechin@redhat.com> wrote:
> 
>> 
>> QEMU qcow2. Host is BTRFS. Guests are BTRFS, LVM, Ext4, NTFS (winXP and
>> win10) and HFS+ (macOS Sierra). I think I had 7 VMs installed, planned to
>> restore another 8 from backups before my previous disk crash. I usually have
>> at least 2 running, often as many as 5 (fedora, ubuntu, winXP, win10, macOS)
>> to cover my software testing needs.
> 
> That is quite a torture test for any file system but more so Btrfs.

Sorry, but could you elaborate why it’s worse for btrfs?

> How are the qcow2 files being created?

In most cases, default qcow2 configuration as given by virt-manager.

> What's the qemu-img create
> command? In particular i'm wondering if these qcow2 files are cow or
> nocow; if they're compressed by Btrfs; and how many fragments they
> have with filefrag.

I suspect they are cow. I’ll check (on the other machine with a similar setup) when I’m back home.


> 
> When I was using qcow2 for backing I used
> 
> qemu-img create -f qcow2 -o preallocation=falloc,nocow=on,lazy_refcounts=on
> 
> But then later I started using fallocated raw files with chattr +C
> applied. And these days I'm just using LVM thin volumes. The journaled
> file systems in a guest cause a ton of backing file fragmentation
> unless nocow is used on Btrfs. I've seen hundreds of thousands of
> extents for a single backing file for a Windows guest.

Are there btrfs commands I could run on a read-only filesystem that would give me this information?

Thanks
Christophe

> 
> -- 
> Chris Murphy


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: File system corruption, btrfsck abort
  2017-04-29  8:46       ` Christophe de Dinechin
@ 2017-04-29 19:13         ` Chris Murphy
  2017-05-03 14:17           ` Christophe de Dinechin
  2017-04-29 19:18         ` Chris Murphy
  1 sibling, 1 reply; 17+ messages in thread
From: Chris Murphy @ 2017-04-29 19:13 UTC (permalink / raw)
  To: Christophe de Dinechin; +Cc: Btrfs BTRFS

On Sat, Apr 29, 2017 at 2:46 AM, Christophe de Dinechin
<dinechin@redhat.com> wrote:
>
>> On 28 Apr 2017, at 22:09, Chris Murphy <lists@colorremedies.com> wrote:
>>
>> On Fri, Apr 28, 2017 at 3:10 AM, Christophe de Dinechin
>> <dinechin@redhat.com> wrote:
>>
>>>
>>> QEMU qcow2. Host is BTRFS. Guests are BTRFS, LVM, Ext4, NTFS (winXP and
>>> win10) and HFS+ (macOS Sierra). I think I had 7 VMs installed, planned to
>>> restore another 8 from backups before my previous disk crash. I usually have
>>> at least 2 running, often as many as 5 (fedora, ubuntu, winXP, win10, macOS)
>>> to cover my software testing needs.
>>
>> That is quite a torture test for any file system but more so Btrfs.
>
> Sorry, but could you elaborate why it’s worse for btrfs?


Copy on write. Four of your five guests use non-cow filesystems, so
every overwrite (think journal writes) becomes a new extent write in Btrfs.
Nothing is overwritten in Btrfs. Only after the write completes are
the stale extents released. So you get a lot of fragmentation, and all
of these tasks you're doing become very metadata heavy workloads.

However, what you're doing should work. The consequence should only be
one of performance, not file system integrity. So your configuration
is useful for testing and making Btrfs better.



>
>> How are the qcow2 files being created?
>
> In most cases, default qcow2 configuration as given by virt-manager.
>
>> What's the qemu-img create
>> command? In particular i'm wondering if these qcow2 files are cow or
>> nocow; if they're compressed by Btrfs; and how many fragments they
>> have with filefrag.
>
> I suspect they are cow. I’ll check (on the other machine with a similar setup) when I’m back home.

Check the qcow2 files with filefrag and see how many extents they
have. I'll bet they're massively fragmented.
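
A minimal sketch of ranking the images by extent count, assuming filefrag's default one-line summary per file ("NAME: N extents found"). The helper name rank_by_extents and the image path in the usage comment are hypothetical.

```shell
# Rank files by extent count, most fragmented first.
# Reads filefrag summary lines ("NAME: N extents found") on stdin and
# prints "N NAME" lines sorted by N in descending order.
rank_by_extents() {
    sed -E 's/^(.*): ([0-9]+) extents? found$/\2 \1/' | sort -rn
}

# Usage (the path is an assumption, adjust to where the images live):
#   filefrag /var/lib/libvirt/images/*.qcow2 | rank_by_extents
```

filefrag only reads extent maps, so this should also be usable on a filesystem mounted read-only.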


>> When I was using qcow2 for backing I used
>>
>> qemu-img create -f qcow2 -o preallocation=falloc,nocow=on,lazy_refcounts=on
>>
>> But then later I started using fallocated raw files with chattr +C
>> applied. And these days I'm just using LVM thin volumes. The journaled
>> file systems in a guest cause a ton of backing file fragmentation
>> unless nocow is used on Btrfs. I've seen hundreds of thousands of
>> extents for a single backing file for a Windows guest.
>
> Are there btrfs commands I could run on a read-only filesystem that would give me this information?

lsattr
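
A minimal sketch of reading that output, assuming the usual lsattr layout (flags field first, then the file name). An uppercase C in the flags field means No_COW; a lowercase c is the separate per-file compression attribute. The helper name has_nocow_flag is hypothetical.

```shell
# Succeeds (exit status 0) if the lsattr line read from stdin has an
# uppercase 'C' (No_COW) in its flags field, and fails otherwise.
has_nocow_flag() {
    awk '{ found = ($1 ~ /C/) } END { exit (found ? 0 : 1) }'
}

# Usage (the file name is only an example):
#   lsattr -d -- win10.qcow2 | has_nocow_flag && echo "nocow"
```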


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: File system corruption, btrfsck abort
  2017-04-29 19:13         ` Chris Murphy
@ 2017-05-03 14:17           ` Christophe de Dinechin
  2017-05-03 14:49             ` Austin S. Hemmelgarn
  2017-05-03 17:43             ` Chris Murphy
  0 siblings, 2 replies; 17+ messages in thread
From: Christophe de Dinechin @ 2017-05-03 14:17 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS


> On 29 Apr 2017, at 21:13, Chris Murphy <lists@colorremedies.com> wrote:
> 
> On Sat, Apr 29, 2017 at 2:46 AM, Christophe de Dinechin
> <dinechin@redhat.com> wrote:
>> 
>>> On 28 Apr 2017, at 22:09, Chris Murphy <lists@colorremedies.com> wrote:
>>> 
>>> On Fri, Apr 28, 2017 at 3:10 AM, Christophe de Dinechin
>>> <dinechin@redhat.com> wrote:
>>> 
>>>> 
>>>> QEMU qcow2. Host is BTRFS. Guests are BTRFS, LVM, Ext4, NTFS (winXP and
>>>> win10) and HFS+ (macOS Sierra). I think I had 7 VMs installed, planned to
>>>> restore another 8 from backups before my previous disk crash. I usually have
>>>> at least 2 running, often as many as 5 (fedora, ubuntu, winXP, win10, macOS)
>>>> to cover my software testing needs.
>>> 
>>> That is quite a torture test for any file system but more so Btrfs.
>> 
>> Sorry, but could you elaborate why it’s worse for btrfs?
> 
> 
> Copy on write. Four of your five guests use non-cow filesystems, so
> any overwrite, think journal writes, are new extent writes in Btrfs.
> Nothing is overwritten in Btrfs. Only after the write completes are
> the stale extents released. So you get a lot of fragmentation, and all
> of these tasks you're doing become very metadata heavy workloads.

Makes sense. Thanks for explaining.


> However, what you're doing should work. The consequence should only be
> one of performance, not file system integrity. So your configuration
> is useful for testing and making Btrfs better.

Yes. I just received a new machine, which is intended to become my primary host. That one I installed with ext4, so that I can keep pushing btrfs on my other two Linux hosts. Since I don’t care much about performance of the VMs either (they are build bots for a Jenkins setup), I can leave them in the current sub-optimal configuration.

> 
>>> How are the qcow2 files being created?
>> 
>> In most cases, default qcow2 configuration as given by virt-manager.
>> 
>>> What's the qemu-img create
>>> command? In particular i'm wondering if these qcow2 files are cow or
>>> nocow; if they're compressed by Btrfs; and how many fragments they
>>> have with filefrag.
>> 
>> I suspect they are cow. I’ll check (on the other machine with a similar setup) when I’m back home.
> 
> Check the qcow2 files with filefrag and see how many extents they
> have. I'll bet they're massively fragmented.

Indeed:

fedora25.qcow2: 28358 extents found
mac_hdd.qcow2: 79493 extents found
ubuntu14.04-64.qcow2: 35069 extents found
ubuntu14.04.qcow2: 240 extents found
ubuntu16.04-32.qcow2: 81 extents found
ubuntu16.04-64.qcow2: 15060 extents found
ubuntu16.10-64.qcow2: 228 extents found
win10.qcow2: 3438997 extents found
winxp.qcow2: 66657 extents found

I have no idea why my Win10 guest is so much worse than the others. It’s currently one of the least used, at least it’s not yet operating regularly in my build ring… But I had noticed that the installation of Visual Studio had taken quite a bit of time.

> 
>>> When I was using qcow2 for backing I used
>>> 
>>> qemu-img create -f qcow2 -o preallocation=falloc,nocow=on,lazy_refcounts=on
>>> 
>>> But then later I started using fallocated raw files with chattr +C
>>> applied. And these days I'm just using LVM thin volumes. The journaled
>>> file systems in a guest cause a ton of backing file fragmentation
>>> unless nocow is used on Btrfs. I've seen hundreds of thousands of
>>> extents for a single backing file for a Windows guest.
>> 
>> Are there btrfs commands I could run on a read-only filesystem that would give me this information?
> 
> lsattr

Hmmm. Does that even work on BTRFS? I get this, even after doing a chattr +C on one of the files.

------------------- fedora25.qcow2
------------------- mac_hdd.qcow2
------------------- ubuntu14.04-64.qcow2
------------------- ubuntu14.04.qcow2
------------------- ubuntu16.04-32.qcow2
------------------- ubuntu16.04-64.qcow2
------------------- ubuntu16.10-64.qcow2
------------------- win10.qcow2
------------------- winxp.qcow2

Thanks
Christophe

> 
> 
> -- 
> Chris Murphy


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: File system corruption, btrfsck abort
  2017-05-03 14:17           ` Christophe de Dinechin
@ 2017-05-03 14:49             ` Austin S. Hemmelgarn
  2017-05-03 17:43             ` Chris Murphy
  1 sibling, 0 replies; 17+ messages in thread
From: Austin S. Hemmelgarn @ 2017-05-03 14:49 UTC (permalink / raw)
  To: Christophe de Dinechin, Chris Murphy; +Cc: Btrfs BTRFS

On 2017-05-03 10:17, Christophe de Dinechin wrote:
>
>> On 29 Apr 2017, at 21:13, Chris Murphy <lists@colorremedies.com> wrote:
>>
>> On Sat, Apr 29, 2017 at 2:46 AM, Christophe de Dinechin
>> <dinechin@redhat.com> wrote:
>>>
>>>> On 28 Apr 2017, at 22:09, Chris Murphy <lists@colorremedies.com> wrote:
>>>>
>>>> On Fri, Apr 28, 2017 at 3:10 AM, Christophe de Dinechin
>>>> <dinechin@redhat.com> wrote:
>>>>
>>>>>
>>>>> QEMU qcow2. Host is BTRFS. Guests are BTRFS, LVM, Ext4, NTFS (winXP and
>>>>> win10) and HFS+ (macOS Sierra). I think I had 7 VMs installed, planned to
>>>>> restore another 8 from backups before my previous disk crash. I usually have
>>>>> at least 2 running, often as many as 5 (fedora, ubuntu, winXP, win10, macOS)
>>>>> to cover my software testing needs.
>>>>
>>>> That is quite a torture test for any file system but more so Btrfs.
>>>
>>> Sorry, but could you elaborate why it’s worse for btrfs?
>>
>>
>> Copy on write. Four of your five guests use non-cow filesystems, so
>> any overwrite, think journal writes, are new extent writes in Btrfs.
>> Nothing is overwritten in Btrfs. Only after the write completes are
>> the stale extents released. So you get a lot of fragmentation, and all
>> of these tasks you're doing become very metadata heavy workloads.
>
> Makes sense. Thanks for explaining.
>
>
>> However, what you're doing should work. The consequence should only be
>> one of performance, not file system integrity. So your configuration
>> is useful for testing and making Btrfs better.
>
> Yes. I just received a new machine, which is intended to become my primary host. That one I installed with ext4, so that I can keep pushing btrfs on my other two Linux hosts. Since I don’t care much about performance of the VMs either (they are build bots for a Jenkins setup), I can leave them in the current sub-optimal configuration.
On the note of performance, you can make things slightly better by 
defragmenting on a regular (weekly is what I would suggest) basis.  Make 
sure to defrag inside the guest first, then defrag the disk image file 
itself on the host if you do this though, as that will help ensure an 
optimal layout.  FWIW, tools like Ansible or Puppet are great for 
coordinating this.
>
>>
>>>> How are the qcow2 files being created?
>>>
>>> In most cases, default qcow2 configuration as given by virt-manager.
>>>
>>>> What's the qemu-img create
>>>> command? In particular i'm wondering if these qcow2 files are cow or
>>>> nocow; if they're compressed by Btrfs; and how many fragments they
>>>> have with filefrag.
>>>
>>> I suspect they are cow. I’ll check (on the other machine with a similar setup) when I’m back home.
>>
>> Check the qcow2 files with filefrag and see how many extents they
>> have. I'll bet they're massively fragmented.
>
> Indeed:
>
> fedora25.qcow2: 28358 extents found
> mac_hdd.qcow2: 79493 extents found
> ubuntu14.04-64.qcow2: 35069 extents found
> ubuntu14.04.qcow2: 240 extents found
> ubuntu16.04-32.qcow2: 81 extents found
> ubuntu16.04-64.qcow2: 15060 extents found
> ubuntu16.10-64.qcow2: 228 extents found
> win10.qcow2: 3438997 extents found
> winxp.qcow2: 66657 extents found
>
> I have no idea why my Win10 guest is so much worse than the others. It’s currently one of the least used, at least it’s not yet operating regularly in my build ring… But I had noticed that the installation of Visual Studio had taken quite a bit of time.
Windows 10 does a lot more background processing than XP, and a lot of 
it hits the disk (although most of what you are seeing is probably side 
effects from the automatically scheduled defrag job that Windows 10 
seems to have).  It also appears to have a different allocator in the 
NTFS driver which prefers to spread data under certain circumstances, 
and VMs appear to be one such situation.
>
>>
>>>> When I was using qcow2 for backing I used
>>>>
>>>> qemu-img create -f qcow2 -o preallocation=falloc,nocow=on,lazy_refcounts=on
>>>>
>>>> But then later I started using fallocated raw files with chattr +C
>>>> applied. And these days I'm just using LVM thin volumes. The journaled
>>>> file systems in a guest cause a ton of backing file fragmentation
>>>> unless nocow is used on Btrfs. I've seen hundreds of thousands of
>>>> extents for a single backing file for a Windows guest.
>>>
>>> Are there btrfs commands I could run on a read-only filesystem that would give me this information?
>>
>> lsattr
>
> Hmmm. Does that even work on BTRFS? I get this, even after doing a chattr +C on one of the files.
>
> ------------------- fedora25.qcow2
> ------------------- mac_hdd.qcow2
> ------------------- ubuntu14.04-64.qcow2
> ------------------- ubuntu14.04.qcow2
> ------------------- ubuntu16.04-32.qcow2
> ------------------- ubuntu16.04-64.qcow2
> ------------------- ubuntu16.10-64.qcow2
> ------------------- win10.qcow2
> ------------------- winxp.qcow2
These files wouldn't have been created with the NOCOW attribute by 
default, as QEMU doesn't set it unless asked to (nocow=on).  To convert 
them, you would have to create a new empty file, set that attribute, 
then use something like cp or dd to copy the data into the new file, 
then rename it over the top of the old one.  Setting these NOCOW may not 
help as much as it does for pre-allocated raw image files though.
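
The conversion steps described above (new empty file, set the attribute,
copy the data, rename) could be sketched roughly as the shell function
below. This is only an illustration, not something tested in this thread:
the function name convert_to_nocow, the ".nocow.tmp" suffix and the CHATTR
override variable are all made up for the example, and cat is used for the
copy simply because it always writes a full copy of the data.

```shell
#!/bin/sh
# Sketch: replace a CoW image file with a NOCOW copy of itself.
# CHATTR is a hypothetical override so the steps can be dry-run
# on a filesystem that does not support the C attribute.
CHATTR=${CHATTR:-chattr}

convert_to_nocow() {
    old=$1
    new="$old.nocow.tmp"    # hypothetical temporary name

    # 1. Create a new, empty file.
    : > "$new" || return 1
    # 2. Set NOCOW while the file is still zero length.
    "$CHATTR" +C "$new" || { rm -f "$new"; return 1; }
    # 3. Copy the data into the new file.
    cat "$old" > "$new" || { rm -f "$new"; return 1; }
    # 4. Rename the new file over the old one.
    mv "$new" "$old"
}

# Usage (with the guest shut down):
#   convert_to_nocow win10.qcow2 && lsattr win10.qcow2
```

Note that this needs enough free space for a second copy of the image, and
that the new file does not inherit the ownership or mode of the old one.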



* Re: File system corruption, btrfsck abort
  2017-05-03 14:17           ` Christophe de Dinechin
  2017-05-03 14:49             ` Austin S. Hemmelgarn
@ 2017-05-03 17:43             ` Chris Murphy
  1 sibling, 0 replies; 17+ messages in thread
From: Chris Murphy @ 2017-05-03 17:43 UTC (permalink / raw)
  To: Christophe de Dinechin; +Cc: Chris Murphy, Btrfs BTRFS

On Wed, May 3, 2017 at 8:17 AM, Christophe de Dinechin
<dinechin@redhat.com> wrote:

>> Check the qcow2 files with filefrag and see how many extents they
>> have. I'll bet they're massively fragmented.
>
> Indeed:
>
> fedora25.qcow2: 28358 extents found
> mac_hdd.qcow2: 79493 extents found
> ubuntu14.04-64.qcow2: 35069 extents found
> ubuntu14.04.qcow2: 240 extents found
> ubuntu16.04-32.qcow2: 81 extents found
> ubuntu16.04-64.qcow2: 15060 extents found
> ubuntu16.10-64.qcow2: 228 extents found
> win10.qcow2: 3438997 extents found
> winxp.qcow2: 66657 extents found
>
> I have no idea why my Win10 guest is so much worse than the others. It’s currently one of the least used, at least it’s not yet operating regularly in my build ring… But I had noticed that the installation of Visual Studio had taken quite a bit of time.

I see the same pathological behavior. I don't know if it's Windows,
NTFS, or some combination of how Windows+NTFS flushes, and how that
flush gets treated as it passes from guest to host. And I don't know
if qcow2 itself exacerbates it. But it seems fairly clear most every
Windows guest write becomes an extent on Btrfs.
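
To see at a glance which images are worst, the filefrag summary lines
quoted above can be ranked by extent count. A minimal sketch, assuming the
"name: N extents found" format shown in the quote and file names that do
not themselves contain ": " (rank_by_extents is just a name picked for
this example):

```shell
# Reads filefrag summary lines on stdin and prints "N name",
# most fragmented file first.
rank_by_extents() {
    awk -F': ' '/extents? found$/ { print $2 + 0, $1 }' | sort -rn
}

# Usage:
#   filefrag *.qcow2 | rank_by_extents
```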

If we consider 3438997 extents, and each extent has an extent data
item, with ~200 items per 16KiB leaf, that's 17194 leaves, and about
270MiB of metadata. For one file; and that's just the extent data
metadata. It doesn't include csums. But anyway there's a lot of churn
not just in the extent data getting written out but how much metadata
is affected by each write and the obsoleting of extents. Pretty much
everything on Btrfs is a write. Even a delete is first a write and
only later is space released.
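
The estimate above can be reproduced with shell integer arithmetic. The
three inputs are the figures from this message; the variable names are
only for illustration. Integer division truncates, so the result comes out
slightly under the rounded "about 270MiB".

```shell
extents=3438997       # win10.qcow2, from the filefrag output quoted above
items_per_leaf=200    # rough number of extent data items per leaf
leaf_kib=16           # leaf size in KiB

leaves=$(( extents / items_per_leaf ))
metadata_mib=$(( leaves * leaf_kib / 1024 ))

echo "$leaves leaves, about $metadata_mib MiB of extent data metadata"
# prints: 17194 leaves, about 268 MiB of extent data metadata
```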

> Hmmm. Does that even work on BTRFS? I get this, even after doing a chattr +C on one of the files.
>
> ------------------- fedora25.qcow2
> ------------------- mac_hdd.qcow2
> ------------------- ubuntu14.04-64.qcow2
> ------------------- ubuntu14.04.qcow2
> ------------------- ubuntu16.04-32.qcow2
> ------------------- ubuntu16.04-64.qcow2
> ------------------- ubuntu16.10-64.qcow2
> ------------------- win10.qcow2
> ------------------- winxp.qcow2

chattr +C only takes effect on zero-length files. It has to be set at
the time the file is created, which is what -o nocow=on does with
qemu-img. If you wanted to do this with raw files and make it behave on
Btrfs pretty much like it does on any other file system:

touch windows.raw
chattr +C windows.raw
fallocate -l 50g windows.raw

It's not possible to retroactively make a cow file nocow, or a nocow
file cow. You can copy it to a new location such that it inherits +C
(like a new directory). And you can also create a new nocow file, and
then cat the old one into the new one. I haven't tried it but
presumably you can use either 'qemu-img convert' or 'qemu-img dd' to
migrate the data inside a cow qcow2 into a nocow qcow2. I don't know
if you'd do the touch > chattr > qemu-img; or if you'd have qemu-img
create a new one with -o nocow=on and then use the dd command.

-- 
Chris Murphy


* Re: File system corruption, btrfsck abort
  2017-04-29  8:46       ` Christophe de Dinechin
  2017-04-29 19:13         ` Chris Murphy
@ 2017-04-29 19:18         ` Chris Murphy
  1 sibling, 0 replies; 17+ messages in thread
From: Chris Murphy @ 2017-04-29 19:18 UTC (permalink / raw)
  To: Christophe de Dinechin; +Cc: Chris Murphy, Btrfs BTRFS

On Sat, Apr 29, 2017 at 2:46 AM, Christophe de Dinechin
<dinechin@redhat.com> wrote:

> Are there btrfs commands I could run on a read-only filesystem that would give me this information?

qemu-img info <file> will give you the status of lazy refcounts.
lsattr will show a capital C in the 3rd to last position if it's nocow.
filefrag -v will show many extents with the "unwritten" flag if the
file is fallocated.

$ lsattr
------------------- ./Desktop
------------------- ./Downloads
------------------- ./Templates
------------------- ./Public
------------------- ./Documents
------------------- ./Music
------------------- ./Pictures
------------------- ./Videos
--------c---------- ./tmp                      ##enable compression
------------------- ./Applications
----------------C-- ./hello.qcow2           ##this is nocow
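
For scripting the same check, one option is to test whether the flags
field of an lsattr line contains a capital C, rather than counting
positions. A small sketch, assuming the "<flags> <path>" line format shown
above (has_nocow_flag is a hypothetical helper name, not an existing tool):

```shell
# Succeeds if an lsattr output line carries the C (nocow) flag.
has_nocow_flag() {
    flags=${1%% *}    # flags field: everything before the first space
    case $flags in
        *C*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Usage:
#   has_nocow_flag "$(lsattr -d hello.qcow2)" && echo "nocow"
```

Only the flags field is examined, so a capital C in the file name does not
cause a false match, and the lowercase c (compression) is not confused
with it.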



-- 
Chris Murphy


Thread overview: 17+ messages
2017-04-25 17:50 File system corruption, btrfsck abort Christophe de Dinechin
2017-04-27 14:58 ` Christophe de Dinechin
2017-04-27 15:12   ` Christophe de Dinechin
2017-04-28  0:45 ` Qu Wenruo
2017-04-28  8:47   ` Christophe de Dinechin
2017-05-02  0:17     ` Qu Wenruo
2017-05-03 14:21       ` Christophe de Dinechin
2017-05-04 12:33         ` Christophe de Dinechin
2017-05-05  0:18         ` Qu Wenruo
2017-04-28  3:58 ` Chris Murphy
     [not found]   ` <2CE52079-1B96-4FB3-8CEF-05FC6D3CB183@redhat.com>
2017-04-28 20:09     ` Chris Murphy
2017-04-29  8:46       ` Christophe de Dinechin
2017-04-29 19:13         ` Chris Murphy
2017-05-03 14:17           ` Christophe de Dinechin
2017-05-03 14:49             ` Austin S. Hemmelgarn
2017-05-03 17:43             ` Chris Murphy
2017-04-29 19:18         ` Chris Murphy
