btrfs check --repair: ERROR: cannot read chunk root

linux-btrfs.vger.kernel.org archive mirror

* btrfs check --repair: ERROR: cannot read chunk root
@ 2016-10-30 18:34 Marc MERLIN
  2016-10-31  1:02 ` Qu Wenruo
  0 siblings, 1 reply; 40+ messages in thread
From: Marc MERLIN @ 2016-10-30 18:34 UTC
  To: linux-btrfs

I have a filesystem on top of md raid5 that developed a few problems due to the
underlying block layer (bad data cable).
The filesystem mounts fine, but has a few issues.
Scrub runs (I didn't let it finish, it takes a _long_ time),
but check --repair won't even run at all:

myth:~# btrfs --version
btrfs-progs v4.7.3
myth:~# uname -r
4.8.5-ia32-20161028

myth:~# btrfs check -p --repair  /dev/mapper/crypt_bcache0  2>&1 | tee /var/spool/repair
bytenr mismatch, want=13835462344704, have=0
ERROR: cannot read chunk root
Couldn't open file system
enabling repair mode
myth:~#

myth:~# btrfs rescue super-recover -v /dev//mapper/crypt_bcache0 
All Devices:
        Device: id = 1, name = /dev//mapper/crypt_bcache0

Before Recovering:
        [All good supers]:
                device name = /dev//mapper/crypt_bcache0
                superblock bytenr = 65536

                device name = /dev//mapper/crypt_bcache0
                superblock bytenr = 67108864

                device name = /dev//mapper/crypt_bcache0
                superblock bytenr = 274877906944

        [All bad supers]:

All supers are valid, no need to recover
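
The three superblock offsets reported above follow a fixed layout. A minimal Python sketch, assuming the layout used by the kernel's `btrfs_sb_offset()` (primary copy at 64 KiB, mirror i at 16 KiB shifted left by 12*i bits, at most three copies), reproduces them for this device size. The constant and function names are illustrative, not taken from btrfs-progs, and `total_bytes` is copied from the dump-super output later in the thread.

```python
# Sketch of the btrfs superblock copy locations (assumed layout, see lead-in).
SUPER_INFO_OFFSET = 64 * 1024  # primary superblock at 64 KiB
SUPER_INFO_SIZE = 4096         # size of one superblock copy
MIRROR_BASE = 16 * 1024        # mirrors are 16 KiB shifted left ...
MIRROR_SHIFT = 12              # ... by 12 bits per mirror index
MAX_MIRRORS = 3                # primary + two mirrors


def sb_offset(mirror: int) -> int:
    """Byte offset of superblock copy `mirror` (0 = primary)."""
    if mirror == 0:
        return SUPER_INFO_OFFSET
    return MIRROR_BASE << (MIRROR_SHIFT * mirror)


def sb_offsets_on_device(total_bytes: int) -> list[int]:
    """Offsets of the superblock copies that fit on a device of `total_bytes`."""
    offsets = []
    for mirror in range(MAX_MIRRORS):
        off = sb_offset(mirror)
        # skip a copy that would not fit entirely on the device
        if off + SUPER_INFO_SIZE <= total_bytes:
            offsets.append(off)
    return offsets


print(sb_offsets_on_device(16002599346176))  # [65536, 67108864, 274877906944]
```

On a device smaller than 256 GiB only the first two copies would exist, which is why super-recover can list fewer than three supers on small filesystems.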


I don't care about the data, it's a backup array, but I'd still like to know
if I can recover from this state and run a repair to see how much data got
damaged.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-10-30 18:34 btrfs check --repair: ERROR: cannot read chunk root Marc MERLIN
@ 2016-10-31  1:02 ` Qu Wenruo
  2016-10-31  2:06   ` Marc MERLIN
  0 siblings, 1 reply; 40+ messages in thread
From: Qu Wenruo @ 2016-10-31  1:02 UTC
  To: Marc MERLIN, linux-btrfs



At 10/31/2016 02:34 AM, Marc MERLIN wrote:
> I have a filesystem on top of md raid5 that got a few problems due to the
> underlying block layer (bad data cable).
> The filesystem mounts fine, but had a few issues
> Scrub runs (I didn't let it finish, it takes a _long_ time)
> But check --repair won't even run at all:
>
> myth:~# btrfs --version
> btrfs-progs v4.7.3
> myth:~# uname -r
> 4.8.5-ia32-20161028
>
> myth:~# btrfs check -p --repair  /dev/mapper/crypt_bcache0  2>&1 | tee /var/spool/repair
> bytenr mismatch, want=13835462344704, have=0
> ERROR: cannot read chunk root

Your chunk root is corrupted. Since the chunk tree provides the
underlying disk layout (even for a single device), if we fail to read
it, the filesystem can never be mounted.

You could try using a backup chunk root.

Use "btrfs inspect-internal dump-super -f" to find the backup chunk root,
then "btrfs check --chunk-root <backup chunk root bytenr>" to try
again.
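
Picking the chunk root candidates out of `dump-super -f` output can be sketched in Python. This assumes the field layout shown in the dump later in this thread (`chunk_root` on its own line, one `backup_chunk_root:` line per backup slot); `chunk_root_candidates` is a hypothetical helper, not part of btrfs-progs. A backup is only worth passing to `--chunk-root` if it differs from the current `chunk_root`.

```python
import re


def chunk_root_candidates(dump_super_text: str) -> tuple[int, list[int]]:
    """Return (chunk_root, [backup_chunk_root, ...]) parsed from dump-super text."""
    current = None
    backups = []
    for line in dump_super_text.splitlines():
        # "chunk_root_generation" and "chunk_root_level" do not match:
        # they continue with "_" instead of whitespace after "chunk_root"
        m = re.match(r"\s*chunk_root\s+(\d+)", line)
        if m:
            current = int(m.group(1))
            continue
        m = re.match(r"\s*backup_chunk_root:\s+(\d+)", line)
        if m:
            backups.append(int(m.group(1)))
    if current is None:
        raise ValueError("no chunk_root line found")
    return current, backups


# Excerpt in the same format as the dump-super output in this thread.
sample = """\
chunk_root_generation   51135
chunk_root              13835462344704
chunk_root_level        1
                backup_chunk_root:      13835462344704  gen: 51135      level: 1
"""
current, backups = chunk_root_candidates(sample)
older = [b for b in backups if b != current]
print(current, older)  # an empty list means there is no distinct backup to try
```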

Thanks,
Qu




* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-10-31  1:02 ` Qu Wenruo
@ 2016-10-31  2:06   ` Marc MERLIN
  2016-10-31  4:21     ` Marc MERLIN
  2016-10-31  5:27     ` Qu Wenruo
  0 siblings, 2 replies; 40+ messages in thread
From: Marc MERLIN @ 2016-10-31  2:06 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Oct 31, 2016 at 09:02:50AM +0800, Qu Wenruo wrote:
> Your chunk root is corrupted, and since chunk tree provides the 
> underlying disk layout, even for single device, so if we failed to read 
> it, then it will never be able to be mounted.
 
That's the thing though, I can mount the filesystem just fine :)

> You could try to use backup chunk root.
> 
> "btrfs inspect-internal dump-super -f" to find the backup chunk root, 
> and use "btrfs check --chunk-root <backup chunk root bytenr>" to have 
> another try.

Am I doing this right? It doesn't seem to work.

myth:~# btrfs check -p --repair --chunk-root 13835462344704 /dev/mapper/crypt_bcache0  2>&1 | tee /var/spool/repair2
bytenr mismatch, want=13835462344704, have=0
ERROR: cannot read chunk root
Couldn't open file system
enabling repair mode


myth:~# btrfs inspect-internal dump-super -f /dev/mapper/crypt_bcache0 | less
superblock: bytenr=65536, device=/dev/mapper/crypt_bcache0
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0x3814e4a0 [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    6692cf4c-93d9-438c-ac30-5db6381dc4f2
label                   DS5
generation              51176
root                    13845513109504
sys_array_size          129
chunk_root_generation   51135
root_level              1
chunk_root              13835462344704
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             16002599346176
bytes_used              14584560160768
sectorsize              4096
nodesize                16384
leafsize                16384
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x169
                        ( MIXED_BACKREF |
                          COMPRESS_LZO |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          SKINNY_METADATA )
cache_generation        51176
uuid_tree_generation    51176
dev_item.uuid           0cf779be-8e16-4982-b7d7-f8241deea0d1
dev_item.fsid           6692cf4c-93d9-438c-ac30-5db6381dc4f2 [match]
dev_item.type           0
dev_item.total_bytes    16002599346176
dev_item.bytes_used     14691011133440
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
        item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 13835461197824)
                chunk length 33554432 owner 2 stripe_len 65536
                type SYSTEM|DUP num_stripes 2
                        stripe 0 devid 1 offset 13500327919616
                        dev uuid: 0cf779be-8e16-4982-b7d7-f8241deea0d1
                        stripe 1 devid 1 offset 13500361474048
                        dev uuid: 0cf779be-8e16-4982-b7d7-f8241deea0d1
backup_roots[4]:
        backup 0:
                backup_tree_root:       12801101791232  gen: 51174      level: 1
                backup_chunk_root:      13835462344704  gen: 51135      level: 1
                backup_extent_root:     12801124352000  gen: 51174      level: 3
                backup_fs_root:         10548133724160  gen: 51172      level: 0
                backup_dev_root:        11125467824128  gen: 51172      level: 1
                backup_csum_root:       12801133953024  gen: 51174      level: 3
                backup_total_bytes:     16002599346176
                backup_bytes_used:      14584560160768
                backup_num_devices:     1

        backup 1:
                backup_tree_root:       13842532810752  gen: 51175      level: 1
                backup_chunk_root:      13835462344704  gen: 51135      level: 1
                backup_extent_root:     13843784695808  gen: 51175      level: 3
                backup_fs_root:         10548133724160  gen: 51172      level: 0
                backup_dev_root:        11125467824128  gen: 51172      level: 1
                backup_csum_root:       13842542362624  gen: 51175      level: 3
                backup_total_bytes:     16002599346176
                backup_bytes_used:      14584560160768
                backup_num_devices:     1

        backup 2:
                backup_tree_root:       13845513109504  gen: 51176      level: 1
                backup_chunk_root:      13835462344704  gen: 51135      level: 1
                backup_extent_root:     13845513191424  gen: 51176      level: 3
                backup_fs_root:         10548133724160  gen: 51172      level: 0
                backup_dev_root:        11125467824128  gen: 51172      level: 1
                backup_csum_root:       13852180938752  gen: 51176      level: 3
                backup_total_bytes:     16002599346176
                backup_bytes_used:      14584560160768
                backup_num_devices:     1

        backup 3:
                backup_tree_root:       12750807580672  gen: 51173      level: 1
                backup_chunk_root:      13835462344704  gen: 51135      level: 1
                backup_extent_root:     12750810447872  gen: 51173      level: 3
                backup_fs_root:         10548133724160  gen: 51172      level: 0
                backup_dev_root:        11125467824128  gen: 51172      level: 1
                backup_csum_root:       12684302712832  gen: 51173      level: 3
                backup_total_bytes:     16002599346176
                backup_bytes_used:      14584560177152
                backup_num_devices:     1
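
The `incompat_flags` value 0x169 is the bitwise OR of the feature names listed under it. A small decoder, assuming the `BTRFS_FEATURE_INCOMPAT_*` bit values from the btrfs on-disk format headers of this era (bits not in the table are reported as unknown), shows the arithmetic: 0x1 + 0x8 + 0x20 + 0x40 + 0x100 = 0x169.

```python
# Assumed BTRFS_FEATURE_INCOMPAT_* bit values (subset, see lead-in).
INCOMPAT_FLAGS = {
    0x001: "MIXED_BACKREF",
    0x002: "DEFAULT_SUBVOL",
    0x004: "MIXED_GROUPS",
    0x008: "COMPRESS_LZO",
    0x020: "BIG_METADATA",
    0x040: "EXTENDED_IREF",
    0x080: "RAID56",
    0x100: "SKINNY_METADATA",
    0x200: "NO_HOLES",
}


def decode_incompat(flags: int) -> tuple[list[str], int]:
    """Return the known feature names set in `flags` and any leftover unknown bits."""
    names = []
    remaining = flags
    for bit, name in INCOMPAT_FLAGS.items():
        if flags & bit:
            names.append(name)
            remaining &= ~bit  # clear bits we could name
    return names, remaining


names, unknown = decode_incompat(0x169)
print(" | ".join(names), f"(unknown bits: {unknown:#x})")
```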





* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-10-31  2:06   ` Marc MERLIN
@ 2016-10-31  4:21     ` Marc MERLIN
  2016-10-31  5:27     ` Qu Wenruo
  1 sibling, 0 replies; 40+ messages in thread
From: Marc MERLIN @ 2016-10-31  4:21 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Sun, Oct 30, 2016 at 07:06:16PM -0700, Marc MERLIN wrote:
> On Mon, Oct 31, 2016 at 09:02:50AM +0800, Qu Wenruo wrote:
> > Your chunk root is corrupted, and since chunk tree provides the 
> > underlying disk layout, even for single device, so if we failed to read 
> > it, then it will never be able to be mounted.
>  
> That's the thing though, I can mount the filesystem just fine :)

Actually, has anyone seen a configuration where the kernel can mount a
filesystem read/write, without ro or recovery, yet btrfs check --repair
can't open it?

This kind of sounds like a bug in check --repair IMO.

Marc


* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-10-31  2:06   ` Marc MERLIN
  2016-10-31  4:21     ` Marc MERLIN
@ 2016-10-31  5:27     ` Qu Wenruo
  2016-10-31  5:47       ` Marc MERLIN
  1 sibling, 1 reply; 40+ messages in thread
From: Qu Wenruo @ 2016-10-31  5:27 UTC
  To: Marc MERLIN; +Cc: linux-btrfs



At 10/31/2016 10:06 AM, Marc MERLIN wrote:
> On Mon, Oct 31, 2016 at 09:02:50AM +0800, Qu Wenruo wrote:
>> Your chunk root is corrupted, and since chunk tree provides the
>> underlying disk layout, even for single device, so if we failed to read
>> it, then it will never be able to be mounted.
>
> That's the thing though, I can mount the filesystem just fine :)

That's strange, very strange.

And according to your super dump, I don't see anything btrfs-progs
can't handle.

Your chunk tree lies in a DUP chunk, which btrfs-progs should be able to
handle. (That's unlike RAID5/6, which btrfs-progs can't recover at read
time.)



>
>> You could try to use backup chunk root.
>>
>> "btrfs inspect-internal dump-super -f" to find the backup chunk root,
>> and use "btrfs check --chunk-root <backup chunk root bytenr>" to have
>> another try.
>
> Am I doing this right? It doesn't seem to work
>
> myth:~# btrfs check -p --repair --chunk-root 13835462344704 /dev/mapper/crypt_bcache0  2>&1 | tee /var/spool/repair2
> bytenr mismatch, want=13835462344704, have=0
> ERROR: cannot read chunk root
> Couldn't open file system
> enabling repair mode

You're doing it right, but the superblock doesn't contain any older
chunk root bytenr: every backup_chunk_root is the same as chunk_root.

So this method can't work at all. :(


Would you please dump the following bytes?
They hold the chunk root tree block on your disk.

offset: 13500329066496 length: 16384
offset: 13500330213376 length: 16384
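
These physical offsets come from the SYSTEM|DUP entry in sys_chunk_array. For a non-striped profile such as DUP, each copy sits at stripe offset + (logical bytenr - chunk start). A sketch of that mapping in Python, with the chunk values copied from the dump-super output earlier in the thread (the class and function names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class Stripe:
    devid: int
    offset: int  # physical byte offset of this stripe on the device


@dataclass
class Chunk:
    logical_start: int
    length: int
    stripes: list[Stripe]


def dup_physical_copies(chunk: Chunk, logical: int) -> list[tuple[int, int]]:
    """Map a logical bytenr inside a DUP (or SINGLE) chunk to (devid, physical) per copy."""
    if not chunk.logical_start <= logical < chunk.logical_start + chunk.length:
        raise ValueError("logical address is outside this chunk")
    delta = logical - chunk.logical_start
    # DUP/SINGLE are not striped: every stripe holds a full copy
    # at the same relative offset inside the chunk
    return [(s.devid, s.offset + delta) for s in chunk.stripes]


# item 0 of sys_chunk_array: SYSTEM|DUP, length 33554432, two stripes on devid 1
system_chunk = Chunk(
    logical_start=13835461197824,
    length=33554432,
    stripes=[Stripe(1, 13500327919616), Stripe(1, 13500361474048)],
)
for devid, phys in dup_physical_copies(system_chunk, 13835462344704):
    print(devid, phys)
```

Under these assumptions stripe 0 gives 13500329066496, matching the first offset listed. Stripe 1 gives 13500362620928, which differs from the second offset listed, so that second value may be worth re-checking before dumping it.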

According to your fsck error output, I assume btrfs-progs fails to read
the first copy of the chunk root and, due to a bug, doesn't go on to
read the 2nd copy.

The kernel, on the other hand, does read the 2nd copy and everything
works.

IIRC btrfs-progs can handle a csum error and keep trying, so maybe some
logic goes wrong here.
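
The fallback described above (move on to the next copy when one copy can't be read or looks wrong) can be sketched as below. This is not btrfs-progs code: it assumes the `struct btrfs_header` layout (32-byte csum, 16-byte fsid, then the little-endian `bytenr` at byte 48), skips checksum verification, and takes a hypothetical `read_copy` callback. A zero-filled block yields `have=0`, the same shape as the `bytenr mismatch` message in the check output.

```python
import struct

HEADER_BYTENR_OFFSET = 48  # after csum[32] + fsid[16] in struct btrfs_header


def header_bytenr(block: bytes) -> int:
    """Little-endian u64 bytenr stored in a tree block header."""
    return struct.unpack_from("<Q", block, HEADER_BYTENR_OFFSET)[0]


def read_tree_block(logical: int, copies, read_copy) -> bytes:
    """Try each physical copy in turn; return the first whose header bytenr matches.

    `copies` is a list of physical offsets; `read_copy(offset)` is a
    caller-supplied reader that returns bytes or raises OSError.
    """
    errors = []
    for phys in copies:
        try:
            block = read_copy(phys)
        except OSError as exc:
            errors.append(f"read error at {phys}: {exc}")
            continue  # I/O error on this copy: try the next one
        have = header_bytenr(block)
        if have != logical:
            errors.append(f"bytenr mismatch, want={logical}, have={have}")
            continue  # wrong or zeroed block: try the next one
        return block
    raise RuntimeError("; ".join(errors) or "no copies to read")
```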

Thanks,
Qu




* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-10-31  5:27     ` Qu Wenruo
@ 2016-10-31  5:47       ` Marc MERLIN
  2016-10-31  6:04         ` Qu Wenruo
  0 siblings, 1 reply; 40+ messages in thread
From: Marc MERLIN @ 2016-10-31  5:47 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Oct 31, 2016 at 01:27:56PM +0800, Qu Wenruo wrote:
> Would you please dump the following bytes?
> That's the chunk root tree block on your disk.
> 
> offset: 13500329066496 length: 16384
> offset: 13500330213376 length: 16384

Sorry for asking, am I doing this wrong?
myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830208
dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000401393 s, 0.0 kB/s
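
`dd` counts `skip` and `count` in units of `bs`, so each byte offset has to be divided by the block size. A short Python sketch of that arithmetic (the `dd_args` helper and the output file names are hypothetical) confirms that `bs=512 count=32 skip=26367830208` covers the first 16 KiB range exactly, and prints the equivalent command for the second range:

```python
SECTOR = 512


def dd_args(offset: int, length: int, bs: int = SECTOR) -> tuple[int, int]:
    """Return (skip, count) so that `dd bs=<bs> skip=<skip> count=<count>` reads [offset, offset+length)."""
    if offset % bs or length % bs:
        raise ValueError("offset and length must be multiples of bs")
    return offset // bs, length // bs


for off in (13500329066496, 13500330213376):
    skip, count = dd_args(off, 16384)
    print(f"dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump_{off} bs={SECTOR} count={count} skip={skip}")
```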

> According to your fsck error output, I assume btrfs-progs fails to read 
> the first copy of chunk root, and due to a bug, it doesn't continue to 
> read 2nd copy.
> 
> While kernel continues to read the 2nd copy and everything goes on.

Ah, that would make sense.
But from what you're saying, I should be able to do recovery by pointing
to the 2nd copy of the chunk root, but somehow I haven't typed the right
command to do so yet, correct?

Should I try another command offset than 
btrfs check -p --repair --chunk-root 13835462344704 /dev/mapper/crypt_bcache0 
?

Or are you saying the btrfs progs bug causes it to fail to even try to read
the 2nd copy of the chunk root even though it was given on the command line?

Thanks,
Marc


* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-10-31  5:47       ` Marc MERLIN
@ 2016-10-31  6:04         ` Qu Wenruo
  2016-10-31  6:25           ` Marc MERLIN
  0 siblings, 1 reply; 40+ messages in thread
From: Qu Wenruo @ 2016-10-31  6:04 UTC
  To: Marc MERLIN; +Cc: linux-btrfs



At 10/31/2016 01:47 PM, Marc MERLIN wrote:
> On Mon, Oct 31, 2016 at 01:27:56PM +0800, Qu Wenruo wrote:
>> Would you please dump the following bytes?
>> That's the chunk root tree block on your disk.
>>
>> offset: 13500329066496 length: 16384
>> offset: 13500330213376 length: 16384
>
> Sorry for asking, am I doing this wrong?
> myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830208
> dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
> 0+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 0.000401393 s, 0.0 kB/s

So the underlying MD RAID5 is complaining about bad data and refuses
to return it.

It seems btrfs-progs can't handle a read failure?
Maybe dm-error could be used to emulate it.

And what about the 2nd range?

>
>> According to your fsck error output, I assume btrfs-progs fails to read
>> the first copy of chunk root, and due to a bug, it doesn't continue to
>> read 2nd copy.
>>
>> While kernel continues to read the 2nd copy and everything goes on.
>
> Ah, that would make sense.
> But from what you're saying, I should be able to do recovery by pointing
> to the 2nd copy of the chunk root, but somehow I haven't typed the right
> command to do so yet, correct?

Unfortunately, that's not the case.

The --chunk-root option takes a *logical* bytenr.

We can only tell btrfs-progs (the kernel is the same) to find the tree
root/chunk root at a given *logical* bytenr.

But we can't specify which *physical* copy to read.

Normally, btrfs-progs/kernel should find the correct physical copy
without problems, but this time btrfs-progs doesn't.

Furthermore, all the backup chunk roots in fact point to the current
chunk root, so --chunk-root doesn't help at all.

>
> Should I try another command offset than
> btrfs check -p --repair --chunk-root 13835462344704 /dev/mapper/crypt_bcache0
> ?
Nope. The offsets I asked you to dump are *physical* bytenrs, while
--chunk-root only accepts a *logical* bytenr.

But the read error on the first tree block already gives a hint.
I'll try to emulate it.

Thanks,
Qu





* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-10-31  6:04         ` Qu Wenruo
@ 2016-10-31  6:25           ` Marc MERLIN
  2016-10-31  6:32             ` Qu Wenruo
  0 siblings, 1 reply; 40+ messages in thread
From: Marc MERLIN @ 2016-10-31  6:25 UTC
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Oct 31, 2016 at 02:04:10PM +0800, Qu Wenruo wrote:
> >Sorry for asking, am I doing this wrong?
> >myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830208
> >dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
> >0+0 records in
> >0+0 records out
> >0 bytes (0 B) copied, 0.000401393 s, 0.0 kB/s
> 
> So, the underlying MD RAID5 are complaining about some wrong data, and 
> refuse to read out.
> 
> It seems that btrfs-progs can't handle read failure?
> Maybe dm-error could emulate it.
> 
> And what about the 2nd range?

They both fail the same way, but I wasn't sure if I typed the wrong dd command
or not.

myth:~# btrfs fi df /mnt/mnt
Data, single: total=13.22TiB, used=13.19TiB
System, DUP: total=32.00MiB, used=1.42MiB
Metadata, DUP: total=74.00GiB, used=72.82GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
myth:~# btrfs fi show
Label: 'DS5'  uuid: 6692cf4c-93d9-438c-ac30-5db6381dc4f2
        Total devices 1 FS bytes used 13.26TiB
        devid    1 size 14.55TiB used 13.36TiB path /dev/mapper/crypt_bcache0

For now, I mounted the filesystem and I'm running scrub on it to see how
much damage there is. It will take all night:
BTRFS warning (device dm-0): checksum error at logical 27886878720 on dev /dev/mapper/crypt_bcache0, sector 56580096, root 9461, inode 45837, offset 15460089856, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 27887009792 on dev /dev/mapper/crypt_bcache0
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 27886878720 on dev /dev/mapper/crypt_bcache0
BTRFS warning (device dm-0): checksum error at logical 27885961216 on dev /dev/mapper/crypt_bcache0, sector 56578304, root 9461, inode 45837, offset 15459172352, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS warning (device dm-0): checksum error at logical 27885830144 on dev /dev/mapper/crypt_bcache0, sector 56578048, root 9461, inode 45837, offset 15459041280, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 27885830144 on dev /dev/mapper/crypt_bcache0
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 27885961216 on dev /dev/mapper/crypt_bcache0
BTRFS warning (device dm-0): checksum error at logical 27887013888 on dev /dev/mapper/crypt_bcache0, sector 56580360, root 9461, inode 45837, offset 15460225024, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 27887013888 on dev /dev/mapper/crypt_bcache0
BTRFS warning (device dm-0): checksum error at logical 27885834240 on dev /dev/mapper/crypt_bcache0, sector 56578056, root 9461, inode 45837, offset 15459045376, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 27885834240 on dev /dev/mapper/crypt_bcache0
BTRFS warning (device dm-0): checksum error at logical 27887017984 on dev /dev/mapper/crypt_bcache0, sector 56580368, root 9461, inode 45837, offset 15460229120, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
BTRFS error (device dm-0): unable to fixup (regular) error at logical 27887017984 on dev /dev/mapper/crypt_bcache0

So far, it looks like minor damage limited to one file; I'll see tomorrow morning after it's done reading the whole array.
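
To confirm that the scrub errors really are confined to one file, the `(path: ...)` field of the checksum warnings can be tallied. A Python sketch, assuming log lines formatted like the ones above (for example, saved from `dmesg`); lines without a path, such as the `unable to fixup` errors, are ignored, and `damaged_paths` is a hypothetical helper name.

```python
import re
from collections import Counter

CSUM_WARNING = re.compile(r"checksum error at logical \d+ .*\(path: (?P<path>[^)]+)\)")


def damaged_paths(log_text: str) -> Counter:
    """Count scrub checksum warnings per file path in kernel log text."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = CSUM_WARNING.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts


sample_log = """\
BTRFS warning (device dm-0): checksum error at logical 27886878720 on dev /dev/mapper/crypt_bcache0, sector 56580096, root 9461, inode 45837, offset 15460089856, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS error (device dm-0): unable to fixup (regular) error at logical 27886878720 on dev /dev/mapper/crypt_bcache0
BTRFS warning (device dm-0): checksum error at logical 27885961216 on dev /dev/mapper/crypt_bcache0, sector 56578304, root 9461, inode 45837, offset 15459172352, length 4096, links 1 (path: system/mlocate/mlocate.db)
"""
for path, n in damaged_paths(sample_log).most_common():
    print(f"{n:4d}  {path}")
```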

> And further more, all backup chunk root are in facts pointing to current 
> chunk root, so --chunk-root doesn't work at all.

Ah, ok, so there is nothing I can do at the moment until I get a new btrfs-progs, correct?

Thanks for your answers
Marc


* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-10-31  6:25           ` Marc MERLIN
@ 2016-10-31  6:32             ` Qu Wenruo
  2016-10-31  6:37               ` Marc MERLIN
  0 siblings, 1 reply; 40+ messages in thread
From: Qu Wenruo @ 2016-10-31  6:32 UTC
  To: Marc MERLIN; +Cc: linux-btrfs



At 10/31/2016 02:25 PM, Marc MERLIN wrote:
> On Mon, Oct 31, 2016 at 02:04:10PM +0800, Qu Wenruo wrote:
>>> Sorry for asking, am I doing this wrong?
>>> myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830208
>>> dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
>>> 0+0 records in
>>> 0+0 records out
>>> 0 bytes (0 B) copied, 0.000401393 s, 0.0 kB/s
>>
>> So, the underlying MD RAID5 are complaining about some wrong data, and
>> refuse to read out.
>>
>> It seems that btrfs-progs can't handle read failure?
>> Maybe dm-error could emulate it.
>>
>> And what about the 2nd range?
>
> they both fail the same, but I wasn't sure if I typed the wrong dd command
> or not.

Strange, your command seems OK to me.

Does it have anything to do with your security setup or something like that?
Or is it related to dm-crypt or bcache?


But this reminds me: if dd can't read it, maybe btrfs-progs can't either.

Maybe only the kernel can read the dm-crypt device, while userspace tools
can't access it directly?
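
One way to test the idea that userspace can't read the device is a single `pread()` at the same offset, outside of `dd`. A minimal Python sketch (`probe_read` is a hypothetical helper); the errno it reports can be compared with the `Invalid argument` (EINVAL) that `dd` printed. It only reads, but it still needs read access to the device, e.g. `probe_read("/dev/mapper/crypt_bcache0", 13500329066496)` run as root.

```python
import errno
import os


def probe_read(path: str, offset: int, length: int = 16384) -> str:
    """Try one pread() of `length` bytes at `offset` and describe the result."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, length, offset)
    except OSError as exc:
        # report the symbolic errno (e.g. EINVAL) plus its message
        return f"error {errno.errorcode.get(exc.errno, exc.errno)}: {exc.strerror}"
    finally:
        os.close(fd)
    if len(data) < length:
        return f"short read: {len(data)} of {length} bytes"
    return "ok"
```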

Thanks,
Qu

>
> myth:~# btrfs fi df /mnt/mnt
> Data, single: total=13.22TiB, used=13.19TiB
> System, DUP: total=32.00MiB, used=1.42MiB
> Metadata, DUP: total=74.00GiB, used=72.82GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> myth:~# btrfs fi show
> Label: 'DS5'  uuid: 6692cf4c-93d9-438c-ac30-5db6381dc4f2
>         Total devices 1 FS bytes used 13.26TiB
>         devid    1 size 14.55TiB used 13.36TiB path /dev/mapper/crypt_bcache0
>
> For now, I mounted the filesystem and I'm running scrub on it to see how
> much damage there is. It will take all night:
> BTRFS warning (device dm-0): checksum error at logical 27886878720 on dev /dev/mapper/crypt_bcache0, sector 56580096, root 9461, inode 45837, offset 15460089856, length 4096, links 1 (path: system/mlocate/mlocate.db)
> BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
> BTRFS error (device dm-0): unable to fixup (regular) error at logical 27887009792 on dev /dev/mapper/crypt_bcache0
> BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
> BTRFS error (device dm-0): unable to fixup (regular) error at logical 27886878720 on dev /dev/mapper/crypt_bcache0
> BTRFS warning (device dm-0): checksum error at logical 27885961216 on dev /dev/mapper/crypt_bcache0, sector 56578304, root 9461, inode 45837, offset 15459172352, length 4096, links 1 (path: system/mlocate/mlocate.db)
> BTRFS warning (device dm-0): checksum error at logical 27885830144 on dev /dev/mapper/crypt_bcache0, sector 56578048, root 9461, inode 45837, offset 15459041280, length 4096, links 1 (path: system/mlocate/mlocate.db)
> BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
> BTRFS error (device dm-0): unable to fixup (regular) error at logical 27885830144 on dev /dev/mapper/crypt_bcache0
> BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
> BTRFS error (device dm-0): unable to fixup (regular) error at logical 27885961216 on dev /dev/mapper/crypt_bcache0
> BTRFS warning (device dm-0): checksum error at logical 27887013888 on dev /dev/mapper/crypt_bcache0, sector 56580360, root 9461, inode 45837, offset 15460225024, length 4096, links 1 (path: system/mlocate/mlocate.db)
> BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
> BTRFS error (device dm-0): unable to fixup (regular) error at logical 27887013888 on dev /dev/mapper/crypt_bcache0
> BTRFS warning (device dm-0): checksum error at logical 27885834240 on dev /dev/mapper/crypt_bcache0, sector 56578056, root 9461, inode 45837, offset 15459045376, length 4096, links 1 (path: system/mlocate/mlocate.db)
> BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
> BTRFS error (device dm-0): unable to fixup (regular) error at logical 27885834240 on dev /dev/mapper/crypt_bcache0
> BTRFS warning (device dm-0): checksum error at logical 27887017984 on dev /dev/mapper/crypt_bcache0, sector 56580368, root 9461, inode 45837, offset 15460229120, length 4096, links 1 (path: system/mlocate/mlocate.db)
> BTRFS error (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
> BTRFS error (device dm-0): unable to fixup (regular) error at logical 27887017984 on dev /dev/mapper/crypt_bcache0
>
> So far, it looks like minor damage limited to one file. I'll see tomorrow morning after it's done reading the whole array.
>
>> And furthermore, all backup chunk roots in fact point to the current
>> chunk root, so --chunk-root doesn't work at all.
>
> Ah, ok, so there is nothing I can do at the moment until I get a new btrfs-progs, correct?
>
> Thanks for your answers
> Marc
>




* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-10-31  6:32             ` Qu Wenruo
@ 2016-10-31  6:37               ` Marc MERLIN
  2016-10-31  7:04                 ` Qu Wenruo
  0 siblings, 1 reply; 40+ messages in thread
From: Marc MERLIN @ 2016-10-31  6:37 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Oct 31, 2016 at 02:32:53PM +0800, Qu Wenruo wrote:
> 
> 
> At 10/31/2016 02:25 PM, Marc MERLIN wrote:
> >On Mon, Oct 31, 2016 at 02:04:10PM +0800, Qu Wenruo wrote:
> >>>Sorry for asking, am I doing this wrong?
> >>>myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830208
> >>>dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
> >>>0+0 records in
> >>>0+0 records out
> >>>0 bytes (0 B) copied, 0.000401393 s, 0.0 kB/s
> >>
> >>So the underlying MD RAID5 is complaining about some bad data and
> >>refuses to read it out.
> >>
> >>It seems that btrfs-progs can't handle read failures?
> >>Maybe dm-error could emulate it.
> >>
> >>And what about the 2nd range?
> >
> >they both fail the same way, but I wasn't sure whether I typed the wrong dd command
> >or not.
> 
> Strange, your command seems OK to me.
> 
> Does it have anything to do with your security setup or something like that?
> Or is it related to dm-crypt or bcache?
> 
> 
> But this reminds me, if dd can't read it, maybe btrfs-progs is the same.
> 
> Maybe only the kernel can read the dm-crypt device, while user-space tools can't
> access dm-crypt devices directly?

It can; it's just that the offset seems wrong:

myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830208
dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000421662 s, 0.0 kB/s

If I divide by 1000, it works:
myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830
32+0 records in
32+0 records out
16384 bytes (16 kB) copied, 0.139005 s, 118 kB/s

So that's why I was asking whether I computed the offset wrong. I took the
value you asked for and divided it by 512, but the result seems too big:

13500329066496 / 512 = 26367830208
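
For reference, a small Python sketch of that conversion: dd's skip= counts blocks of bs bytes, so a byte offset is divided by bs (512 here). It also shows that the "divide by 1000" command reads a completely different spot, about 13.5 GB into the device instead of about 12.28 TiB, so its success says nothing about the original offset. dd_skip_for is a hypothetical helper name, not part of any tool.

```python
SECTOR = 512          # matches bs=512 in the dd commands above
TIB = 1 << 40


def dd_skip_for(byte_offset: int, bs: int = SECTOR) -> int:
    """Hypothetical helper: dd skip= value (in bs-sized blocks) for a byte offset."""
    if byte_offset % bs:
        raise ValueError("offset is not a multiple of the block size")
    return byte_offset // bs


offset = 13500329066496                 # chunk root offset asked about in the thread
print(dd_skip_for(offset))              # 26367830208, same as computed above
print(f"{offset / TIB:.2f} TiB")        # 12.28 TiB into the device
print(26367830 * SECTOR)                # 13500328960 bytes: what skip=26367830 reads
```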

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-10-31  6:37               ` Marc MERLIN
@ 2016-10-31  7:04                 ` Qu Wenruo
  2016-10-31  8:44                   ` Hugo Mills
  0 siblings, 1 reply; 40+ messages in thread
From: Qu Wenruo @ 2016-10-31  7:04 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs



At 10/31/2016 02:37 PM, Marc MERLIN wrote:
> On Mon, Oct 31, 2016 at 02:32:53PM +0800, Qu Wenruo wrote:
>>
>>
>> At 10/31/2016 02:25 PM, Marc MERLIN wrote:
>>> On Mon, Oct 31, 2016 at 02:04:10PM +0800, Qu Wenruo wrote:
>>>>> Sorry for asking, am I doing this wrong?
>>>>> myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830208
>>>>> dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
>>>>> 0+0 records in
>>>>> 0+0 records out
>>>>> 0 bytes (0 B) copied, 0.000401393 s, 0.0 kB/s
>>>>
>>>> So the underlying MD RAID5 is complaining about some bad data and
>>>> refuses to read it out.
>>>>
>>>> It seems that btrfs-progs can't handle read failures?
>>>> Maybe dm-error could emulate it.
>>>>
>>>> And what about the 2nd range?
>>>
>>> they both fail the same way, but I wasn't sure whether I typed the wrong dd command
>>> or not.
>>
>> Strange, your command seems OK to me.
>>
>> Does it have anything to do with your security setup or something like that?
>> Or is it related to dm-crypt or bcache?
>>
>>
>> But this reminds me, if dd can't read it, maybe btrfs-progs is the same.
>>
>> Maybe only the kernel can read the dm-crypt device, while user-space tools can't
>> access dm-crypt devices directly?
>
> It can; it's just that the offset seems wrong:
>
> myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830208
> dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
> 0+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 0.000421662 s, 0.0 kB/s
>
> If I divide by 1000, it works:
> myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830
> 32+0 records in
> 32+0 records out
> 16384 bytes (16 kB) copied, 0.139005 s, 118 kB/s
>
> So that's why I was asking whether I computed the offset wrong. I took the
> value you asked for and divided it by 512, but the result seems too big:
>
> 13500329066496 / 512 = 26367830208
>
> Marc
>
But according to your dump-super output, that's strange.
------
chunk_root              13835462344704 (CR)
         item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 13835461197824) (CS)
                 chunk length 33554432 owner 2 stripe_len 65536
                 type SYSTEM|DUP num_stripes 2
                         stripe 0 devid 1 offset 13500327919616 (ST1)
                         dev uuid: 0cf779be-8e16-4982-b7d7-f8241deea0d1
                         stripe 1 devid 1 offset 13500361474048 (ST2)
                         dev uuid: 0cf779be-8e16-4982-b7d7-f8241deea0d1
------

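Here, your chunk's logical bytenr is 13835461197824, and its physical
bytenrs (one per DUP stripe) are 13500327919616 and 13500361474048.

My calculation is quite simple: take the chunk root's offset inside the
chunk (CR - CS) and add it to each stripe's physical start: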
Start1 = CR - CS + ST1
Start2 = CR - CS + ST2

Unless the superblock is incorrect, these offsets can't be wrong.

And the physical offset is about 12.3 TiB, which is smaller than the
14.55 TiB size of your device.

So it's quite strange that dd can't read the data.
And if dd can't read it, then I see no reason btrfs-progs could read
it either.

Any idea what special dm setup could make us fail to read some
data range?
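
A minimal Python sketch of the calculation above, using the values from the dump-super excerpt (CR, CS, ST1, ST2 as labelled there). It assumes a SINGLE or DUP chunk, where every stripe holds a full copy, so the physical address is simply the stripe start plus the offset inside the chunk; striped profiles would need more work. The device size is approximated from the 14.55 TiB that btrfs fi show reported earlier in the thread, and logical_to_physical is a hypothetical helper name.

```python
TIB = 1 << 40

# Values from the dump-super output above.
CR = 13835462344704         # chunk_root (logical bytenr)
CS = 13835461197824         # logical start of the SYSTEM|DUP chunk holding it
CHUNK_LEN = 33554432        # chunk length
ST1 = 13500327919616        # stripe 0 physical offset
ST2 = 13500361474048        # stripe 1 physical offset


def logical_to_physical(logical, chunk_start, chunk_len, stripe_offset):
    """Map a logical bytenr to a physical offset for one SINGLE/DUP stripe."""
    delta = logical - chunk_start
    if not 0 <= delta < chunk_len:
        raise ValueError("logical address is not inside this chunk")
    return stripe_offset + delta


start1 = logical_to_physical(CR, CS, CHUNK_LEN, ST1)   # Start1 = CR - CS + ST1
start2 = logical_to_physical(CR, CS, CHUNK_LEN, ST2)   # Start2 = CR - CS + ST2
device_bytes = int(14.55 * TIB)   # approximate size from btrfs fi show

print(start1, start2)             # 13500329066496 13500362620928
print(f"{start1 / TIB:.2f} TiB")  # 12.28 TiB
print(start2 < device_bytes)      # True: both copies fit inside the device
```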

Thanks,
Qu





* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-10-31  7:04                 ` Qu Wenruo
@ 2016-10-31  8:44                   ` Hugo Mills
  2016-10-31 15:04                     ` Marc MERLIN
  0 siblings, 1 reply; 40+ messages in thread
From: Hugo Mills @ 2016-10-31  8:44 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Marc MERLIN, linux-btrfs


On Mon, Oct 31, 2016 at 03:04:27PM +0800, Qu Wenruo wrote:
> 
> 
> At 10/31/2016 02:37 PM, Marc MERLIN wrote:
> >On Mon, Oct 31, 2016 at 02:32:53PM +0800, Qu Wenruo wrote:
> >>
> >>
> >>At 10/31/2016 02:25 PM, Marc MERLIN wrote:
> >>>On Mon, Oct 31, 2016 at 02:04:10PM +0800, Qu Wenruo wrote:
> >>>>>Sorry for asking, am I doing this wrong?
> >>>>>myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830208
> >>>>>dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
> >>>>>0+0 records in
> >>>>>0+0 records out
> >>>>>0 bytes (0 B) copied, 0.000401393 s, 0.0 kB/s
> >>>>
> >>>>So the underlying MD RAID5 is complaining about some bad data and
> >>>>refuses to read it out.
> >>>>
> >>>>It seems that btrfs-progs can't handle read failures?
> >>>>Maybe dm-error could emulate it.
> >>>>
> >>>>And what about the 2nd range?
> >>>
> >>>they both fail the same way, but I wasn't sure whether I typed the wrong dd command
> >>>or not.
> >>
> >>Strange, your command seems OK to me.
> >>
> >>Does it have anything to do with your security setup or something like that?
> >>Or is it related to dm-crypt or bcache?
> >>
> >>
> >>But this reminds me, if dd can't read it, maybe btrfs-progs is the same.
> >>
> >>Maybe only the kernel can read the dm-crypt device, while user-space tools can't
> >>access dm-crypt devices directly?
> >
> >It can; it's just that the offset seems wrong:
> >
> >myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830208
> >dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
> >0+0 records in
> >0+0 records out
> >0 bytes (0 B) copied, 0.000421662 s, 0.0 kB/s
> >
> >If I divide by 1000, it works:
> >myth:~# dd if=/dev/mapper/crypt_bcache0 of=/tmp/dump1 bs=512 count=32 skip=26367830
> >32+0 records in
> >32+0 records out
> >16384 bytes (16 kB) copied, 0.139005 s, 118 kB/s
> >
> >So that's why I was asking whether I computed the offset wrong. I took the
> >value you asked for and divided it by 512, but the result seems too big:
> >
> >13500329066496 / 512 = 26367830208
> >
> >Marc
> >
> But according to your dump-super output, that's strange.
> ------
> chunk_root              13835462344704 (CR)
>         item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 13835461197824) (CS)
>                 chunk length 33554432 owner 2 stripe_len 65536
>                 type SYSTEM|DUP num_stripes 2
>                         stripe 0 devid 1 offset 13500327919616 (ST1)
>                         dev uuid: 0cf779be-8e16-4982-b7d7-f8241deea0d1
>                         stripe 1 devid 1 offset 13500361474048 (ST2)
>                         dev uuid: 0cf779be-8e16-4982-b7d7-f8241deea0d1
> ------
> 
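> Here, your chunk's logical bytenr is 13835461197824, and its physical
> bytenrs (one per DUP stripe) are 13500327919616 and 13500361474048.
> 
> My calculation is quite simple: take the chunk root's offset inside the
> chunk (CR - CS) and add it to each stripe's physical start: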
> Start1 = CR - CS + ST1
> Start2 = CR - CS + ST2
> 
> Unless the superblock is incorrect, these offsets can't be wrong.
> 
> And the physical offset is about 12.3 TiB, which is smaller than the
> 14.55 TiB size of your device.
> 
> So it's quite strange that dd can't read the data.
> And if dd can't read it, then I see no reason btrfs-progs could read
> it either.
> 
> Any idea what special dm setup could make us fail to read some
> data range?

   I've seen both btrfs check and btrfs dump-super give wrong answers
(particularly, some addresses end up larger than the device, for some
reason) when run on a mounted filesystem. Worth ruling that one out.
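
One quick way to rule that out before running btrfs check or dump-super is to look the device up in /proc/self/mounts. The Python below is only a sketch of how that could be scripted (is_mounted and mount_sources are hypothetical names). It resolves symlinks so that /dev/mapper/crypt_bcache0 and /dev/dm-0 compare equal on a live system, but it won't catch every alias, for example a multi-device filesystem mounted via another member.

```python
import os


def mount_sources(mounts_text):
    """Return the resolved source devices listed in /proc/mounts-format text."""
    sources = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Only path-like sources; skip pseudo filesystems such as "proc".
        if fields and fields[0].startswith("/"):
            sources.add(os.path.realpath(fields[0]))
    return sources


def is_mounted(device, mounts_text=None):
    """True if `device` (or what it symlinks to) appears as a mount source."""
    if mounts_text is None:
        with open("/proc/self/mounts") as f:
            mounts_text = f.read()
    return os.path.realpath(device) in mount_sources(mounts_text)
```

On the machine in question, is_mounted("/dev/mapper/crypt_bcache0") should return True while /mnt/mnt is mounted, which would be the moment to unmount before trusting check or dump-super output.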

   Hugo.

-- 
Hugo Mills             | Great films about cricket: Silly Point Break
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |


* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-10-31  8:44                   ` Hugo Mills
@ 2016-10-31 15:04                     ` Marc MERLIN
  2016-11-01  3:48                       ` Marc MERLIN
  2016-11-01  4:13                       ` Qu Wenruo
  0 siblings, 2 replies; 40+ messages in thread
From: Marc MERLIN @ 2016-10-31 15:04 UTC (permalink / raw)
  To: Hugo Mills, Qu Wenruo, linux-btrfs


On Mon, Oct 31, 2016 at 08:44:12AM +0000, Hugo Mills wrote:
> > Any idea what special dm setup could make us fail to read some
> > data range?
> 
>    I've seen both btrfs check and btrfs dump-super give wrong answers
> (particularly, some addresses end up larger than the device, for some
> reason) when run on a mounted filesystem. Worth ruling that one out.

I just finished running my scrub overnight, and it failed around 10%:
[115500.316921] BTRFS error (device dm-0): bad tree block start 8461247125784585065 17619396231168
[115500.332354] BTRFS error (device dm-0): bad tree block start 8461247125784585065 17619396231168
[115500.332626] BTRFS: error (device dm-0) in __btrfs_free_extent:6954: errno=-5 IO failure
[115500.332629] BTRFS info (device dm-0): forced readonly
[115500.332632] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2960: errno=-5 IO failure
[115500.436002] btrfs_printk: 550 callbacks suppressed
[115500.436024] BTRFS warning (device dm-0): Skipping commit of aborted transaction.
[115500.436029] BTRFS: error (device dm-0) in cleanup_transaction:1854: errno=-5 IO failure


myth:~# ionice -c 3 nice -10 btrfs scrub start -Bd /mnt/mnt
(...)
scrub device /dev/mapper/crypt_bcache0 (id 1) canceled
        scrub started at Sun Oct 30 22:52:59 2016 and was aborted after 09:03:11
        total bytes scrubbed: 1.15TiB with 512 errors
        error details: csum=512
        corrected errors: 0, uncorrectable errors: 512, unverified errors: 0

Am I correct that if I see "__btrfs_free_extent:6954: errno=-5 IO failure" it means
that btrfs had physical read errors from the underlying block layer?

Do I have some weird mismatch between the size of my md array and the size of my filesystem
(since dd apparently thinks parts of it are out of bounds)?
Yet the sizes seem to match:


myth:~#  mdadm --query --detail /dev/md5
/dev/md5:
        Version : 1.2
  Creation Time : Tue Jan 21 10:35:52 2014
     Raid Level : raid5
     Array Size : 15627542528 (14903.59 GiB 16002.60 GB)
  Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Oct 31 07:56:07 2016
          State : clean 
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : gargamel.svh.merlins.org:5
           UUID : ec672af7:a66d9557:2f00d76c:38c9f705
         Events : 147992

    Number   Major   Minor   RaidDevice State
       0       8       97        0      active sync   /dev/sdg1
       6       8      113        1      active sync   /dev/sdh1
       2       8       81        2      active sync   /dev/sdf1
       3       8       65        3      active sync   /dev/sde1
       5       8       49        4      active sync   /dev/sdd1

myth:~# btrfs fi df /mnt/mnt
Data, single: total=13.22TiB, used=13.19TiB
System, DUP: total=32.00MiB, used=1.42MiB
Metadata, DUP: total=75.00GiB, used=72.82GiB
GlobalReserve, single: total=512.00MiB, used=6.73MiB
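
As a sanity check on those numbers: mdadm --detail reports Array Size in 1 KiB units (consistent with the 14903.59 GiB it prints alongside), and for a 5-device RAID5 it should equal 4 times Used Dev Size. Converting to TiB gives the same 14.55 TiB that btrfs fi show reported, so on paper the sizes agree. The bcache and dm-crypt layers on top may take a little off, which this sketch ignores.

```python
KIB = 1024
TIB = 1 << 40

array_size_kib = 15627542528     # "Array Size" from mdadm --detail
used_dev_kib = 3906885632        # "Used Dev Size" per member
raid_devices = 5

# RAID5 spends one device's worth of space on parity.
assert array_size_kib == used_dev_kib * (raid_devices - 1)

array_bytes = array_size_kib * KIB
print(array_bytes)                      # 16002603548672 (the "16002.60 GB")
print(f"{array_bytes / TIB:.2f} TiB")   # 14.55 TiB, matching btrfs fi show
```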

Thanks,
Marc

* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-10-31 15:04                     ` Marc MERLIN
@ 2016-11-01  3:48                       ` Marc MERLIN
  2016-11-01  4:13                       ` Qu Wenruo
  1 sibling, 0 replies; 40+ messages in thread
From: Marc MERLIN @ 2016-11-01  3:48 UTC (permalink / raw)
  To: Hugo Mills, Qu Wenruo, linux-btrfs

So, I'm willing to wait 2 more days before I wipe this filesystem and
start over if I can't get check --repair to work on it.
If you need longer, please let me know that you have an upcoming patch for me
to try, and I'll wait.

Thanks,
Marc

On Mon, Oct 31, 2016 at 08:04:22AM -0700, Marc MERLIN wrote:
> On Mon, Oct 31, 2016 at 08:44:12AM +0000, Hugo Mills wrote:
> > > Any idea what special dm setup could make us fail to read some
> > > data range?
> > 
> >    I've seen both btrfs check and btrfs dump-super give wrong answers
> > (particularly, some addresses end up larger than the device, for some
> > reason) when run on a mounted filesystem. Worth ruling that one out.
> 
> I just finished running my scrub overnight, and it failed around 10%:
> [115500.316921] BTRFS error (device dm-0): bad tree block start 8461247125784585065 17619396231168
> [115500.332354] BTRFS error (device dm-0): bad tree block start 8461247125784585065 17619396231168
> [115500.332626] BTRFS: error (device dm-0) in __btrfs_free_extent:6954: errno=-5 IO failure
> [115500.332629] BTRFS info (device dm-0): forced readonly
> [115500.332632] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2960: errno=-5 IO failure
> [115500.436002] btrfs_printk: 550 callbacks suppressed
> [115500.436024] BTRFS warning (device dm-0): Skipping commit of aborted transaction.
> [115500.436029] BTRFS: error (device dm-0) in cleanup_transaction:1854: errno=-5 IO failure
> 
> 
> myth:~# ionice -c 3 nice -10 btrfs scrub start -Bd /mnt/mnt
> (...)
> scrub device /dev/mapper/crypt_bcache0 (id 1) canceled
>         scrub started at Sun Oct 30 22:52:59 2016 and was aborted after 09:03:11
>         total bytes scrubbed: 1.15TiB with 512 errors
>         error details: csum=512
>         corrected errors: 0, uncorrectable errors: 512, unverified errors: 0
> 
> Am I correct that if I see "__btrfs_free_extent:6954: errno=-5 IO failure" it means
> that btrfs had physical read errors from the underlying block layer?
> 
> Do I have some weird mismatch between the size of my md array and the size of my filesystem
> (since dd apparently thinks parts of it are out of bounds)?
> Yet the sizes seem to match:
> 
> 
> myth:~#  mdadm --query --detail /dev/md5
> /dev/md5:
>         Version : 1.2
>   Creation Time : Tue Jan 21 10:35:52 2014
>      Raid Level : raid5
>      Array Size : 15627542528 (14903.59 GiB 16002.60 GB)
>   Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
>    Raid Devices : 5
>   Total Devices : 5
>     Persistence : Superblock is persistent
> 
>   Intent Bitmap : Internal
> 
>     Update Time : Mon Oct 31 07:56:07 2016
>           State : clean 
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>            Name : gargamel.svh.merlins.org:5
>            UUID : ec672af7:a66d9557:2f00d76c:38c9f705
>          Events : 147992
> 
>     Number   Major   Minor   RaidDevice State
>        0       8       97        0      active sync   /dev/sdg1
>        6       8      113        1      active sync   /dev/sdh1
>        2       8       81        2      active sync   /dev/sdf1
>        3       8       65        3      active sync   /dev/sde1
>        5       8       49        4      active sync   /dev/sdd1
> 
> myth:~# btrfs fi df /mnt/mnt
> Data, single: total=13.22TiB, used=13.19TiB
> System, DUP: total=32.00MiB, used=1.42MiB
> Metadata, DUP: total=75.00GiB, used=72.82GiB
> GlobalReserve, single: total=512.00MiB, used=6.73MiB
> 
> Thanks,
> Marc




* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-10-31 15:04                     ` Marc MERLIN
  2016-11-01  3:48                       ` Marc MERLIN
@ 2016-11-01  4:13                       ` Qu Wenruo
  2016-11-01  4:21                         ` Marc MERLIN
  1 sibling, 1 reply; 40+ messages in thread
From: Qu Wenruo @ 2016-11-01  4:13 UTC (permalink / raw)
  To: Marc MERLIN, Hugo Mills, linux-btrfs



At 10/31/2016 11:04 PM, Marc MERLIN wrote:
> On Mon, Oct 31, 2016 at 08:44:12AM +0000, Hugo Mills wrote:
>>> Any idea what special dm setup could make us fail to read some
>>> data range?
>>
>>    I've seen both btrfs check and btrfs dump-super give wrong answers
>> (particularly, some addresses end up larger than the device, for some
>> reason) when run on a mounted filesystem. Worth ruling that one out.
>
> I just finished running my scrub overnight, and it failed around 10%:
> [115500.316921] BTRFS error (device dm-0): bad tree block start 8461247125784585065 17619396231168
> [115500.332354] BTRFS error (device dm-0): bad tree block start 8461247125784585065 17619396231168
> [115500.332626] BTRFS: error (device dm-0) in __btrfs_free_extent:6954: errno=-5 IO failure
> [115500.332629] BTRFS info (device dm-0): forced readonly
> [115500.332632] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2960: errno=-5 IO failure
> [115500.436002] btrfs_printk: 550 callbacks suppressed
> [115500.436024] BTRFS warning (device dm-0): Skipping commit of aborted transaction.
> [115500.436029] BTRFS: error (device dm-0) in cleanup_transaction:1854: errno=-5 IO failure
>
>
> myth:~# ionice -c 3 nice -10 btrfs scrub start -Bd /mnt/mnt
> (...)
> scrub device /dev/mapper/crypt_bcache0 (id 1) canceled
>         scrub started at Sun Oct 30 22:52:59 2016 and was aborted after 09:03:11
>         total bytes scrubbed: 1.15TiB with 512 errors
>         error details: csum=512
>         corrected errors: 0, uncorrectable errors: 512, unverified errors: 0
>
> Am I correct that if I see "__btrfs_free_extent:6954: errno=-5 IO failure" it means
> that btrfs had physical read errors from the underlying block layer?

I'm not really sure they're physical read errors, as we return -EIO almost
everywhere.

But it's possible that your extent tree got corrupted, so
__btrfs_free_extent() failed to modify the extent tree.

And in that case, we do throw -EIO.
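
For reference, the errno=-5 in those kernel messages is a negated errno value. Decoding it with the Python standard library (sketch below; describe_kernel_errno is a hypothetical name) gives EIO, and because -EIO is returned from so many places, the name alone doesn't say whether the disk or the metadata was at fault. The same helper decodes dd's "Invalid argument" as EINVAL (22), which is a different error from EIO.

```python
import errno
import os


def describe_kernel_errno(value: int):
    """Decode a kernel-style negative errno such as the -5 in 'errno=-5'."""
    code = abs(value)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


print(describe_kernel_errno(-5))   # ('EIO', 'Input/output error') on Linux
print(describe_kernel_errno(-22))  # EINVAL, i.e. dd's "Invalid argument"
```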

>
> Do I have some weird mismatch between the size of my md array and the size of my filesystem
> (since dd apparently thinks parts of it are out of bounds)?
> Yet the sizes seem to match:

Would you try to locate the range where reads start to fail?

I still think the root problem is that we fail to read the device from user
space.
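
One way to locate that boundary without reading the whole device is a binary search over small read probes. The Python sketch below assumes reads succeed below some offset and fail everywhere above it; with scattered bad sectors a bisection can land on the wrong spot and a full scan is needed instead. make_probe and first_unreadable are hypothetical helper names.

```python
import os


def make_probe(fd, length=4096):
    """Return probe(offset) -> True if `length` bytes at `offset` can be read."""
    def probe(offset):
        try:
            return len(os.pread(fd, length, offset)) > 0
        except OSError:          # e.g. EIO, or EINVAL as dd reported here
            return False
    return probe


def first_unreadable(probe, size, step=1 << 20):
    """Binary-search the first step-aligned offset whose read fails.

    Assumes a single boundary: everything below it reads, nothing above it does.
    Returns None if the last block is readable, 0 if even offset 0 fails.
    """
    last = (size - 1) // step            # index of the last block
    if probe(last * step):
        return None
    if not probe(0):
        return 0
    lo, hi = 0, last                     # invariant: lo readable, hi unreadable
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(mid * step):
            lo = mid
        else:
            hi = mid
    return hi * step
```

On the real device this would be roughly fd = os.open("/dev/mapper/crypt_bcache0", os.O_RDONLY), size = os.lseek(fd, 0, os.SEEK_END), then first_unreadable(make_probe(fd), size): a couple of dozen reads rather than a multi-day dd.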

Thanks,
Qu
>
>
> myth:~#  mdadm --query --detail /dev/md5
> /dev/md5:
>         Version : 1.2
>   Creation Time : Tue Jan 21 10:35:52 2014
>      Raid Level : raid5
>      Array Size : 15627542528 (14903.59 GiB 16002.60 GB)
>   Used Dev Size : 3906885632 (3725.90 GiB 4000.65 GB)
>    Raid Devices : 5
>   Total Devices : 5
>     Persistence : Superblock is persistent
>
>   Intent Bitmap : Internal
>
>     Update Time : Mon Oct 31 07:56:07 2016
>           State : clean
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>            Name : gargamel.svh.merlins.org:5
>            UUID : ec672af7:a66d9557:2f00d76c:38c9f705
>          Events : 147992
>
>     Number   Major   Minor   RaidDevice State
>        0       8       97        0      active sync   /dev/sdg1
>        6       8      113        1      active sync   /dev/sdh1
>        2       8       81        2      active sync   /dev/sdf1
>        3       8       65        3      active sync   /dev/sde1
>        5       8       49        4      active sync   /dev/sdd1
>
> myth:~# btrfs fi df /mnt/mnt
> Data, single: total=13.22TiB, used=13.19TiB
> System, DUP: total=32.00MiB, used=1.42MiB
> Metadata, DUP: total=75.00GiB, used=72.82GiB
> GlobalReserve, single: total=512.00MiB, used=6.73MiB
>
> Thanks,
> Marc
>




* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-11-01  4:13                       ` Qu Wenruo
@ 2016-11-01  4:21                         ` Marc MERLIN
  2016-11-04  8:01                           ` Marc MERLIN
  0 siblings, 1 reply; 40+ messages in thread
From: Marc MERLIN @ 2016-11-01  4:21 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Hugo Mills, linux-btrfs

On Tue, Nov 01, 2016 at 12:13:38PM +0800, Qu Wenruo wrote:
> Would you try to locate the range where reads start to fail?
> 
> I still think the root problem is that we fail to read the device from user
> space.
 
Understood.

I'll run this then:
myth:~# dd if=/dev/mapper/crypt_bcache0 of=/dev/null bs=1M &
[2] 21108
myth:~# while :; do killall -USR1 dd; sleep 1200; done
275+0 records in
274+0 records out
287309824 bytes (287 MB) copied, 7.20248 s, 39.9 MB/s

This will take a while to run, I'll report back on how far it goes.

Thanks,
Marc

* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-11-01  4:21                         ` Marc MERLIN
@ 2016-11-04  8:01                           ` Marc MERLIN
  2016-11-04  9:00                             ` Roman Mamedov
  2016-11-07  1:11                             ` Qu Wenruo
  0 siblings, 2 replies; 40+ messages in thread
From: Marc MERLIN @ 2016-11-04  8:01 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Hugo Mills, linux-btrfs

On Mon, Oct 31, 2016 at 09:21:40PM -0700, Marc MERLIN wrote:
> On Tue, Nov 01, 2016 at 12:13:38PM +0800, Qu Wenruo wrote:
> > Would you try to locate the range where reads start to fail?
> > 
> > I still think the root problem is that we fail to read the device from user
> > space.
>  
> Understood.
> 
> I'll run this then:
> myth:~# dd if=/dev/mapper/crypt_bcache0 of=/dev/null bs=1M &
> [2] 21108
> myth:~# while :; do killall -USR1 dd; sleep 1200; done
> 275+0 records in
> 274+0 records out
> 287309824 bytes (287 MB) copied, 7.20248 s, 39.9 MB/s
> 
> This will take a while to run, I'll report back on how far it goes.

Well, it turns out you were right. My array is 14.6TiB and dd was only able to
copy 8.8TB (exactly 8TiB) of it.

I wonder if it's a bug with bcache and source devices that are too big?

8782434271232 bytes (8.8 TB) copied, 214809 s, 40.9 MB/s
dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
8388608+0 records in
8388608+0 records out
8796093022208 bytes (8.8 TB) copied, 215197 s, 40.9 MB/s
[2]+  Exit 1                  dd if=/dev/mapper/crypt_bcache0 of=/dev/null bs=1M
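
Worth noting about those numbers: dd's "8.8 TB" is in decimal units. 8388608 records of 1 MiB is exactly 2^43 bytes, i.e. 8 TiB, and the chunk root offset computed earlier in the thread (about 12.28 TiB) lies beyond that point, which fits the skip=26367830208 read failing while skip=26367830 (about 13.5 GB) works. An exact power-of-two cut-off looks more like a size or addressing limit somewhere in the stack than random bad sectors, but that is only an inference from the arithmetic below.

```python
MIB = 1 << 20
TIB = 1 << 40

records = 8388608                     # "8388608+0 records in" with bs=1M
stopped_at = records * MIB
print(stopped_at)                     # 8796093022208, as dd reports
print(stopped_at == 2**43, stopped_at / TIB)   # True 8.0

chunk_root = 13500329066496           # physical chunk root offset from earlier
print(chunk_root > stopped_at)        # True: past the point where reads fail
print(26367830 * 512 < stopped_at)    # True: the "divide by 1000" read is below it
```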

What's vexing is that absolutely nothing has been logged in the kernel dmesg
buffer about this read error.

Basically I have this:
sde                            8:64   0   3.7T  0 
└─sde1                         8:65   0   3.7T  0 
  └─md5                        9:5    0  14.6T  0 
    └─bcache0                252:0    0  14.6T  0 
      └─crypt_bcache0 (dm-0) 253:0    0  14.6T  0 

I'll try dd'ing the md5 directly now, but that's going to take another 2 days :(

That said, given that almost half the device is not readable from user space
for some reason, that would explain why btrfs check is failing. Obviously it
can't do its job if it can't read blocks.

I'll report back on what I find out with this problem but if you have
suggestions on what to look for, let me know :)

Thanks.
Marc

* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-11-04  8:01                           ` Marc MERLIN
@ 2016-11-04  9:00                             ` Roman Mamedov
  2016-11-04 17:59                               ` Marc MERLIN
  2016-11-07  1:11                             ` Qu Wenruo
  1 sibling, 1 reply; 40+ messages in thread
From: Roman Mamedov @ 2016-11-04  9:00 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Qu Wenruo, Hugo Mills, linux-btrfs

On Fri, 4 Nov 2016 01:01:13 -0700
Marc MERLIN <marc@merlins.org> wrote:

> Basically I have this:
> sde                            8:64   0   3.7T  0 
> └─sde1                         8:65   0   3.7T  0 
>   └─md5                        9:5    0  14.6T  0 
>     └─bcache0                252:0    0  14.6T  0 
>       └─crypt_bcache0 (dm-0) 253:0    0  14.6T  0 
> 
> I'll try dd'ing the md5 directly now, but that's going to take another 2 days :(
> 
> That said, given that almost half the device is not readable from user space
> for some reason, that would explain why btrfs check is failing. Obviously it
> can't do its job if it can't read blocks.

I don't see anything to support the notion that "half is unreadable"; maybe
just a single 512-byte sector is unreadable, but that would be enough to make
regular dd bail out. That's why you should be using dd_rescue for this, not
regular dd, assuming you want to copy over as much data as possible rather
than simply test whether dd fails (and in any case, dd_rescue would at least
not fail instantly, and would tell you precisely how much is unreadable).

There are "GNU ddrescue" and "dd_rescue"; I liked the first one better, but
they both work on a similar principle.
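
The principle described here, keep going past errors and record where they are instead of stopping at the first one like dd does, can be sketched in a few lines of Python. This is not how GNU ddrescue or dd_rescue are implemented; the real tools do considerably more (for example, GNU ddrescue keeps a mapfile so an interrupted run can be resumed). map_unreadable and fd_check are hypothetical names.

```python
import os


def map_unreadable(check, size, step=1 << 20):
    """Scan [0, size) in `step`-byte blocks and return failing ranges.

    check(offset, length) -> bool says whether that block could be read.
    Adjacent failing blocks are merged into one (start, end) range.
    """
    bad = []
    for off in range(0, size, step):
        length = min(step, size - off)
        if check(off, length):
            continue
        if bad and bad[-1][1] == off:          # extend the previous bad range
            bad[-1] = (bad[-1][0], off + length)
        else:
            bad.append((off, off + length))
    return bad


def fd_check(fd):
    """Build a check() that reads the block with os.pread and catches errors."""
    def check(offset, length):
        try:
            return len(os.pread(fd, length, offset)) == length
        except OSError:
            return False
    return check
```

Against the whole array this still reads every byte, so at the ~40 MB/s seen above it is a multi-day run either way; the difference is that one bad sector no longer ends the scan.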

Also, didn't you recently have issues with bad block lists in mdadm? This
mysterious "unreadable and nothing in dmesg" does appear to be a continuation
of that.

-- 
With respect,
Roman


* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-11-04  9:00                             ` Roman Mamedov
@ 2016-11-04 17:59                               ` Marc MERLIN
  0 siblings, 0 replies; 40+ messages in thread
From: Marc MERLIN @ 2016-11-04 17:59 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Qu Wenruo, Hugo Mills, linux-btrfs

On Fri, Nov 04, 2016 at 02:00:43PM +0500, Roman Mamedov wrote:
> On Fri, 4 Nov 2016 01:01:13 -0700
> Marc MERLIN <marc@merlins.org> wrote:
> 
> > Basically I have this:
> > sde                            8:64   0   3.7T  0 
> > └─sde1                         8:65   0   3.7T  0 
> >   └─md5                        9:5    0  14.6T  0 
> >     └─bcache0                252:0    0  14.6T  0 
> >       └─crypt_bcache0 (dm-0) 253:0    0  14.6T  0 
> > 
> > I'll try dd'ing the md5 directly now, but that's going to take another 2 days :(
> > 
> > That said, given that almost half the device is not readable from user space
> > for some reason, that would explain why btrfs check is failing. Obviously it
> > can't do its job if it can't read blocks.
> 
> I don't see anything to support the notion that "half is unreadable"; maybe
> just a single 512-byte sector is unreadable, but that would be enough to make
> regular dd bail out. That's why you should be using dd_rescue for this, not
> regular dd, assuming you want to copy over as much data as possible rather
> than simply test whether dd fails (and in any case, dd_rescue would at least
> not fail instantly, and would tell you precisely how much is unreadable).

Thanks for the plug on ddrescue; I have used it to rescue drives in the
past.
Here, however, everything after the 8.8TB mark is unreadable, so there
is nothing to skip.

Because the underlying drives are fine, I'm not entirely sure where the
issue is, although it has to be on the mdadm side and not related to
btrfs.

And of course the mdadm array shows clean, and I have already disabled
the mdadm per-drive bad-block (mis-)feature, which is probably
responsible for all the problems I've had here.
myth:~# mdadm --examine-badblocks /dev/sd[defgh]1
No bad-blocks list configured on /dev/sdd1
No bad-blocks list configured on /dev/sde1
No bad-blocks list configured on /dev/sdf1
No bad-blocks list configured on /dev/sdg1
No bad-blocks list configured on /dev/sdh1

I'm also still perplexed as to why, despite the read error I'm getting,
absolutely nothing is logged by the kernel :-/

I'll pursue that further and post a summary on the thread here if I find
something interesting.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: btrfs check --repair: ERROR: cannot read chunk root
  2016-11-04  8:01                           ` Marc MERLIN
  2016-11-04  9:00                             ` Roman Mamedov
@ 2016-11-07  1:11                             ` Qu Wenruo
       [not found]                               ` <87lgwwnnyf.fsf@notabene.neil.brown.name>
  1 sibling, 1 reply; 40+ messages in thread
From: Qu Wenruo @ 2016-11-07  1:11 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Hugo Mills, linux-btrfs



At 11/04/2016 04:01 PM, Marc MERLIN wrote:
> On Mon, Oct 31, 2016 at 09:21:40PM -0700, Marc MERLIN wrote:
>> On Tue, Nov 01, 2016 at 12:13:38PM +0800, Qu Wenruo wrote:
>>> Would you try to locate the range where we starts to fail to read?
>>>
>>> I still think the root problem is we failed to read the device in user
>>> space.
>>
>> Understood.
>>
>> I'll run this then:
>> myth:~# dd if=/dev/mapper/crypt_bcache0 of=/dev/null bs=1M &
>> [2] 21108
>> myth:~# while :; do killall -USR1 dd; sleep 1200; done
>> 275+0 records in
>> 274+0 records out
>> 287309824 bytes (287 MB) copied, 7.20248 s, 39.9 MB/s
>>
>> This will take a while to run, I'll report back on how far it goes.
>
> Well, turns out you were right. My array is 14TB and dd was only able to
> copy 8.8TB out of it.
>
> I wonder if it's a bug with bcache and source devices that are too big?

At least we know it's not a problem with btrfs-progs.

As for bcache/soft raid/encryption, unfortunately I'm not familiar with
any of them.

I would recommend reporting it to the bcache/mdadm/encryption mailing lists
after locating the layer that returns EINVAL.

>
> 8782434271232 bytes (8.8 TB) copied, 214809 s, 40.9 MB/s
> dd: reading `/dev/mapper/crypt_bcache0': Invalid argument
> 8388608+0 records in
> 8388608+0 records out
> 8796093022208 bytes (8.8 TB) copied, 215197 s, 40.9 MB/s
> [2]+  Exit 1                  dd if=/dev/mapper/crypt_bcache0 of=/dev/null bs=1M
>
> What's vexing is that absolutely nothing has been logged in the kernel dmesg
> buffer about this read error.
>
> Basically I have this:
> sde                            8:64   0   3.7T  0
> └─sde1                         8:65   0   3.7T  0
>   └─md5                        9:5    0  14.6T  0
>     └─bcache0                252:0    0  14.6T  0
>       └─crypt_bcache0 (dm-0) 253:0    0  14.6T  0
>
> I'll try dd'ing the md5 directly now, but that's going to take another 2 days :(

No need to read everything out; just reading from the 8T mark onward would
be good enough for me.

BTW, that's a really complicated layout, with soft raid, bcache, and
encryption; it will take a long time to find the real cause.

But at least we know the 8.8T position, so we can save some time by not
reading the whole disk.

Thanks,
Qu

>
> That said, given that almost half the device is not readable from user space
> for some reason, that would explain why btrfs check is failing. Obviously it
> can't do its job if it can't read blocks.
>
> I'll report back on what I find out with this problem but if you have
> suggestions on what to look for, let me know :)
>
> Thanks.
> Marc
>




* Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
       [not found]                               ` <87lgwwnnyf.fsf@notabene.neil.brown.name>
@ 2016-11-07  1:20                                 ` Marc MERLIN
  2016-11-07  1:39                                   ` Qu Wenruo
  0 siblings, 1 reply; 40+ messages in thread
From: Marc MERLIN @ 2016-11-07  1:20 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Hugo Mills, linux-btrfs

On Mon, Nov 07, 2016 at 09:11:54AM +0800, Qu Wenruo wrote:
> > Well, turns out you were right. My array is 14TB and dd was only able to
> > copy 8.8TB out of it.
> > 
> > I wonder if it's a bug with bcache and source devices that are too big?
> 
> At least we know it's not a problem of btrfs-progs.
> 
> And for bcache/soft raid/encryption, unfortunately I'm not familiar with any
> of them.
> 
> I would recommend to report it to bcache/mdadm/encryption ML after locating
> the layer which returns EINVAL.

So, Neil Brown found the problem.

myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
dd: reading `/dev/md5': Invalid argument
2+0 records in
2+0 records out
2147483648 bytes (2.1 GB) copied, 37.0785 s, 57.9 MB/s
myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190 count=3 iflag=direct
3+0 records in
3+0 records out


On Mon, Nov 07, 2016 at 11:16:56AM +1100, NeilBrown wrote:
> EINVAL from a read() system call is surprising in this context.....
> 
> do_generic_file_read can return it:
> 	if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
> 		return -EINVAL;
> 
> s_maxbytes will be MAX_LFS_FILESIZE which, on a 32bit system, is
> 
> #define MAX_LFS_FILESIZE        (((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
> 
> That is 2^(12+31) or 2^43 or 8TB.
> 
> Is this a 32bit system you are using?  Such systems can only support
> buffered IO up to 8TB.  If you use iflags=direct to avoid buffering, you
> should get access to the whole device.

I am indeed using a 32bit system, and now we know why the kernel can
mount and use my filesystem just fine while btrfs check --repair fails to
deal with it.
The filesystem is more than 8TB on a 32bit kernel with 32bit userland.

Since iflag=direct fixes the issue with dd, it sounds like something
similar could be done for btrfs progs, to support filesystems bigger
than 8TB on 32bit systems.
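
Neil's arithmetic can be checked against the dd runs above. The sketch
below reproduces the 32-bit MAX_LFS_FILESIZE value and the s_maxbytes test
from do_generic_file_read() as quoted; the function names are illustrative,
not kernel APIs, and the 64-bit branch assumes the kernel's full signed
64-bit limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the MAX_LFS_FILESIZE definition quoted above for 32-bit kernels;
 * 64-bit kernels use the full signed 64-bit range instead. */
static int64_t max_lfs_filesize(int64_t page_size, int bits_per_long)
{
    if (bits_per_long == 64)
        return INT64_MAX;
    return (page_size << (bits_per_long - 1)) - 1;   /* 4096 << 31 = 2^43 */
}

/* The quoted check: a buffered read starting at pos is refused with
 * -EINVAL once pos >= s_maxbytes. */
static bool buffered_read_allowed(int64_t pos, int64_t s_maxbytes)
{
    return pos < s_maxbytes;
}
```

With bs=1GiB skip=8190, the reads starting at 8190 GiB and 8191 GiB pass
the check and the third one, at exactly 2^43 bytes (8192 GiB), is refused,
which matches the "2+0 records in" output and the 8796093022208 bytes
(2^43) the full dd managed to copy.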

However, could you confirm that filesystems larger than 8TB are supported
by the kernel code itself on 32bit systems? (I think so, but I just
want to make sure.)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
  2016-11-07  1:20                                 ` clearing blocks wrongfully marked as bad if --update=no-bbl can't be used? Marc MERLIN
@ 2016-11-07  1:39                                   ` Qu Wenruo
  2016-11-07  4:18                                     ` Qu Wenruo
  0 siblings, 1 reply; 40+ messages in thread
From: Qu Wenruo @ 2016-11-07  1:39 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Hugo Mills, linux-btrfs

At 11/07/2016 09:20 AM, Marc MERLIN wrote:
> On Mon, Nov 07, 2016 at 09:11:54AM +0800, Qu Wenruo wrote:
>>> Well, turns out you were right. My array is 14TB and dd was only able to
>>> copy 8.8TB out of it.
>>>
>>> I wonder if it's a bug with bcache and source devices that are too big?
>>
>> At least we know it's not a problem of btrfs-progs.
>>
>> And for bcache/soft raid/encryption, unfortunately I'm not familiar with any
>> of them.
>>
>> I would recommend to report it to bcache/mdadm/encryption ML after locating
>> the layer which returns EINVAL.
>
> So, Neil Brown found the problem.
>
> myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
> dd: reading `/dev/md5': Invalid argument
> 2+0 records in
> 2+0 records out
> 2147483648 bytes (2.1 GB) copied, 37.0785 s, 57.9 MB/s
> myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190 count=3 iflag=direct
> 3+0 records in
> 3+0 records out

That's interesting.

>
>
> On Mon, Nov 07, 2016 at 11:16:56AM +1100, NeilBrown wrote:
>> EINVAL from a read() system call is surprising in this context.....
>>
>> do_generic_file_read can return it:
>> 	if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
>> 		return -EINVAL;

At the very least, the return value is a bug.
Normally we should return -EFBIG instead of -EINVAL.

>>
>> s_maxbytes will be MAX_LFS_FILESIZE which, on a 32bit system, is
>>
>> #define MAX_LFS_FILESIZE        (((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
>>
>> That is 2^(12+31) or 2^43 or 8TB.
>>
>> Is this a 32bit system you are using?  Such systems can only support
>> buffered IO up to 8TB.  If you use iflags=direct to avoid buffering, you
>> should get access to the whole device.
>
> I am indeed using a 32bit system, and now we know why the kernel can
> mount and use my filesystem just fine while btrfs check repair fails to
> deal with it.
> The filesystem is more than 8TB on a 32bit kernel with 32bit userland.
>
> Since iflag=direct fixes the issue with dd, it sounds like something
> similar could be done for btrfs progs, to support filesystems bigger
> than 8TB on 32bit systems.
>
> However, could you confirm that filesystems more than 8TB are supported
> by the kernel code itself on 32bit systems? (I think so, but just
> wanting to make sure)

Yes, a filesystem can support sizes up to the u64 maximum (though I'd
assume a u63 maximum, as some filesystems may use the highest bit for
special purposes). It's just the VFS/mm layer that is blocking things.

Direct IO can handle it because it bypasses the page cache, while for
buffered IO it's the page cache (whose page index is an unsigned long,
i.e. 32 bits on these systems) that limits the offset.

It's good that we've located the root cause.

It doesn't look hard to add such a workaround to btrfs-progs.
I'll send one soon.

Thanks,
Qu

>
> Thanks,
> Marc
>


* Re: clearing blocks wrongfully marked as bad if --update=no-bbl can't be used?
  2016-11-07  1:39                                   ` Qu Wenruo
@ 2016-11-07  4:18                                     ` Qu Wenruo
  2016-11-07  5:36                                       ` btrfs support for filesystems >8TB on 32bit architectures Marc MERLIN
  0 siblings, 1 reply; 40+ messages in thread
From: Qu Wenruo @ 2016-11-07  4:18 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Hugo Mills, linux-btrfs



At 11/07/2016 09:39 AM, Qu Wenruo wrote:
>
>
> At 11/07/2016 09:20 AM, Marc MERLIN wrote:
>> On Mon, Nov 07, 2016 at 09:11:54AM +0800, Qu Wenruo wrote:
>>>> Well, turns out you were right. My array is 14TB and dd was only
>>>> able to
>>>> copy 8.8TB out of it.
>>>>
>>>> I wonder if it's a bug with bcache and source devices that are too big?
>>>
>>> At least we know it's not a problem of btrfs-progs.
>>>
>>> And for bcache/soft raid/encryption, unfortunately I'm not familiar
>>> with any
>>> of them.
>>>
>>> I would recommend to report it to bcache/mdadm/encryption ML after
>>> locating
>>> the layer which returns EINVAL.
>>
>> So, Neil Brown found the problem.
>>
>> myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
>> dd: reading `/dev/md5': Invalid argument
>> 2+0 records in
>> 2+0 records out
>> 2147483648 bytes (2.1 GB) copied, 37.0785 s, 57.9 MB/s
>> myth:/sys/block/md5/md# dd if=/dev/md5 of=/dev/null bs=1GiB skip=8190
>> count=3 iflag=direct
>> 3+0 records in
>> 3+0 records out
>
> That's interesting.
>
>>
>>
>> On Mon, Nov 07, 2016 at 11:16:56AM +1100, NeilBrown wrote:
>>> EINVAL from a read() system call is surprising in this context.....
>>>
>>> do_generic_file_read can return it:
>>>     if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
>>>         return -EINVAL;
>
> At least the return value is a bug.
> Normally we should return -EFBIG instead of -EINVAL.
>
>>>
>>> s_maxbytes will be MAX_LFS_FILESIZE which, on a 32bit system, is
>>>
>>> #define MAX_LFS_FILESIZE        (((loff_t)PAGE_SIZE <<
>>> (BITS_PER_LONG-1))-1)
>>>
>>> That is 2^(12+31) or 2^43 or 8TB.
>>>
>>> Is this a 32bit system you are using?  Such systems can only support
>>> buffered IO up to 8TB.  If you use iflags=direct to avoid buffering, you
>>> should get access to the whole device.
>>
>> I am indeed using a 32bit system, and now we know why the kernel can
>> mount and use my filesystem just fine while btrfs check repair fails to
>> deal with it.
>> The filesystem is more than 8TB on a 32bit kernel with 32bit userland.
>>
>> Since iflag=direct fixes the issue with dd, it sounds like something
>> similar could be done for btrfs progs, to support filesystems bigger
>> than 8TB on 32bit systems.
>>
>> However, could you confirm that filesystems more than 8TB are supported
>> by the kernel code itself on 32bit systems? (I think so, but just
>> wanting to make sure)
>
> Yep, fs can support to u64 max size fs. (But I'd assume u63 max as some
> fs may use the highest bit for special purpose)
> Just VFS/mm layer is blocking things.
>
> Direct IO can handle it because it avoids cache, while for buffered IO,
> it's cache(memory) size limiting the offsize.
>
> It's good to locate the root cause.
>
> It doesn't look hard to add such workaround for btrfs-progs.
> I'll send such workaround soon.

I'm totally wrong here.

DirectIO needs the 'buf' parameter of read()/pread() to be 512-byte
aligned.

But we use a lot of stack memory and normal malloc()/calloc()-allocated
memory, which is seldom aligned to 512 bytes.

So to *work around* the problem in btrfs-progs, we may need to change every
pread() caller to use aligned memory allocation.

I really don't think David will accept such a huge change for a workaround...
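
One way to keep such a change out of every caller would be a single
wrapper with pread()'s signature that reads through an aligned bounce
buffer. The sketch below is hypothetical (aligned_pread() and SECTOR are
not btrfs-progs names, and the 512-byte alignment is an assumption). Note
that O_DIRECT also requires the file offset and the transfer length, not
just the buffer, to be suitably aligned, so the wrapper rounds all three.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment; the real O_DIRECT requirement is the device's
 * logical block size, which is commonly but not always 512 bytes. */
#define SECTOR 512

/* Hypothetical wrapper with pread()'s signature: buf and offset may be
 * unaligned, but the actual read uses a sector-aligned bounce buffer,
 * offset and length, as O_DIRECT requires. */
static ssize_t aligned_pread(int fd, void *buf, size_t count, off_t offset)
{
    if (count == 0)
        return 0;

    off_t start = offset - offset % SECTOR;              /* round offset down */
    size_t head = (size_t)(offset - start);              /* bytes before the request */
    size_t len = (head + count + SECTOR - 1) / SECTOR * SECTOR;  /* round length up */

    void *bounce;
    int err = posix_memalign(&bounce, SECTOR, len);
    if (err != 0) {
        errno = err;
        return -1;
    }

    ssize_t got = pread(fd, bounce, len, start);
    if (got < 0) {
        int saved = errno;
        free(bounce);
        errno = saved;
        return -1;
    }

    /* Copy only the part the caller asked for (less near end of device). */
    size_t avail = (size_t)got > head ? (size_t)got - head : 0;
    size_t n = avail < count ? avail : count;
    memcpy(buf, (char *)bounce + head, n);
    free(bounce);
    return (ssize_t)n;
}
```

In real use the fd would be opened with O_DIRECT (which needs _GNU_SOURCE
with glibc); the sketch itself works on any fd, so it can be tried on an
ordinary file. The price is an extra allocation and memcpy per read, in
exchange for leaving the callers' stack and malloc() buffers untouched.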

Thanks,
Qu
>
> Thanks,
> Qu
>
>>
>> Thanks,
>> Marc
>>




* Re: btrfs support for filesystems >8TB on 32bit architectures
  2016-11-07  4:18                                     ` Qu Wenruo
@ 2016-11-07  5:36                                       ` Marc MERLIN
  2016-11-07  6:16                                         ` Qu Wenruo
  0 siblings, 1 reply; 40+ messages in thread
From: Marc MERLIN @ 2016-11-07  5:36 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Hugo Mills, linux-btrfs

(sorry for the bad subject line from the mdadm list on the previous mail) 

On Mon, Nov 07, 2016 at 12:18:10PM +0800, Qu Wenruo wrote:
> I'm totally wrong here.
> 
> DirectIO needs the 'buf' parameter of read()/pread() to be 512 bytes
> aligned.
> 
> While we are using a lot of stack memory() and normal malloc()/calloc()
> allocated memory, which are seldom aligned to 512 bytes.
> 
> So to *workaround* the problem in btrfs-progs, we may need to change any
> pread() caller to use aligned memory allocation.
> 
> I really don't think David will accept such huge change for a workdaround...

Thanks for looking into it.
So basically, should we just document that btrfs filesystems larger than
8TB are not supported on 32bit architectures?
(As in, I believe you can mount and use them, but you cannot create
or repair them.)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: btrfs support for filesystems >8TB on 32bit architectures
  2016-11-07  5:36                                       ` btrfs support for filesystems >8TB on 32bit architectures Marc MERLIN
@ 2016-11-07  6:16                                         ` Qu Wenruo
  2016-11-07 14:55                                           ` Marc MERLIN
  0 siblings, 1 reply; 40+ messages in thread
From: Qu Wenruo @ 2016-11-07  6:16 UTC (permalink / raw)
  To: Marc MERLIN, David Sterba; +Cc: Hugo Mills, linux-btrfs



At 11/07/2016 01:36 PM, Marc MERLIN wrote:
> (sorry for the bad subject line from the mdadm list on the previous mail)
>
> On Mon, Nov 07, 2016 at 12:18:10PM +0800, Qu Wenruo wrote:
>> I'm totally wrong here.
>>
>> DirectIO needs the 'buf' parameter of read()/pread() to be 512 bytes
>> aligned.
>>
>> While we are using a lot of stack memory() and normal malloc()/calloc()
>> allocated memory, which are seldom aligned to 512 bytes.
>>
>> So to *workaround* the problem in btrfs-progs, we may need to change any
>> pread() caller to use aligned memory allocation.
>>
>> I really don't think David will accept such huge change for a workdaround...
>
> Thanks for looking into it.
> So basically should we just document that btrfs filesystems past 8TB in
> size are not supported on 32bit architectures?
> (as in you can mount them and use them I believe, but you cannot create,
> or repair them)
>
> Marc
>
Adding David to this thread.

Creating the filesystem should be OK: at mkfs time we hardly write beyond
3G, so it won't be a big problem.

For repair, it is possible that btrfsck can't handle it.

Anyway, I'd like to hear what David thinks we should do to handle
the problem.

Thanks,
Qu




* Re: btrfs support for filesystems >8TB on 32bit architectures
  2016-11-07  6:16                                         ` Qu Wenruo
@ 2016-11-07 14:55                                           ` Marc MERLIN
  2016-11-08  0:35                                             ` Qu Wenruo
  0 siblings, 1 reply; 40+ messages in thread
From: Marc MERLIN @ 2016-11-07 14:55 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: David Sterba, Hugo Mills, linux-btrfs

On Mon, Nov 07, 2016 at 02:16:37PM +0800, Qu Wenruo wrote:
> 
> 
> At 11/07/2016 01:36 PM, Marc MERLIN wrote:
> > (sorry for the bad subject line from the mdadm list on the previous mail)
> > 
> > On Mon, Nov 07, 2016 at 12:18:10PM +0800, Qu Wenruo wrote:
> > > I'm totally wrong here.
> > > 
> > > DirectIO needs the 'buf' parameter of read()/pread() to be 512 bytes
> > > aligned.
> > > 
> > > While we are using a lot of stack memory() and normal malloc()/calloc()
> > > allocated memory, which are seldom aligned to 512 bytes.
> > > 
> > > So to *workaround* the problem in btrfs-progs, we may need to change any
> > > pread() caller to use aligned memory allocation.
> > > 
> > > I really don't think David will accept such huge change for a workdaround...
> > 
> > Thanks for looking into it.
> > So basically should we just document that btrfs filesystems past 8TB in
> > size are not supported on 32bit architectures?
> > (as in you can mount them and use them I believe, but you cannot create,
> > or repair them)
> > 
> > Marc
> > 
> Add David to this thread.
> 
> For create, it should be OK. As at create time, we hardly write beyond 3G.
> So it won't be a big problem.
> 
> For repair, we do have a possibility that btrfsck can't handle it.
> 
> Anyway, I'd like to see how David thinks what we should do the handle the
> problem.

Understood. One big thing (for me) I forgot to confirm:
1) btrfs receive
2) btrfs scrub
should both be able to work because the IO operations are done directly
inside the kernel and not from user space, correct?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: btrfs support for filesystems >8TB on 32bit architectures
  2016-11-07 14:55                                           ` Marc MERLIN
@ 2016-11-08  0:35                                             ` Qu Wenruo
  2016-11-08  0:39                                               ` Marc MERLIN
  0 siblings, 1 reply; 40+ messages in thread
From: Qu Wenruo @ 2016-11-08  0:35 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: David Sterba, Hugo Mills, linux-btrfs



At 11/07/2016 10:55 PM, Marc MERLIN wrote:
> On Mon, Nov 07, 2016 at 02:16:37PM +0800, Qu Wenruo wrote:
>>
>>
>> At 11/07/2016 01:36 PM, Marc MERLIN wrote:
>>> (sorry for the bad subject line from the mdadm list on the previous mail)
>>>
>>> On Mon, Nov 07, 2016 at 12:18:10PM +0800, Qu Wenruo wrote:
>>>> I'm totally wrong here.
>>>>
>>>> DirectIO needs the 'buf' parameter of read()/pread() to be 512 bytes
>>>> aligned.
>>>>
>>>> While we are using a lot of stack memory() and normal malloc()/calloc()
>>>> allocated memory, which are seldom aligned to 512 bytes.
>>>>
>>>> So to *workaround* the problem in btrfs-progs, we may need to change any
>>>> pread() caller to use aligned memory allocation.
>>>>
>>>> I really don't think David will accept such huge change for a workdaround...
>>>
>>> Thanks for looking into it.
>>> So basically should we just document that btrfs filesystems past 8TB in
>>> size are not supported on 32bit architectures?
>>> (as in you can mount them and use them I believe, but you cannot create,
>>> or repair them)
>>>
>>> Marc
>>>
>> Add David to this thread.
>>
>> For create, it should be OK. As at create time, we hardly write beyond 3G.
>> So it won't be a big problem.
>>
>> For repair, we do have a possibility that btrfsck can't handle it.
>>
>> Anyway, I'd like to see how David thinks what we should do the handle the
>> problem.
>
> Understood. One big thing (for me) I forgot to confirm:
> 1) btrfs receive

Unfortunately, receive is done completely in userspace.
Only send runs inside the kernel.

So receive will fail to reconstruct any file larger than 8T.
Apart from that, normal files smaller than 8T are not affected.

> 2) btrfs scrub

Scrub runs in the kernel, so it's unaffected.

Thanks,
Qu

> should both be able to work because the IO operations are done directly
> inside the kernel and not from user space, correct?
>
> Thanks,
> Marc
>




* Re: btrfs support for filesystems >8TB on 32bit architectures
  2016-11-08  0:35                                             ` Qu Wenruo
@ 2016-11-08  0:39                                               ` Marc MERLIN
  2016-11-08  0:43                                                 ` Qu Wenruo
  0 siblings, 1 reply; 40+ messages in thread
From: Marc MERLIN @ 2016-11-08  0:39 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: David Sterba, Hugo Mills, linux-btrfs

On Tue, Nov 08, 2016 at 08:35:54AM +0800, Qu Wenruo wrote:
> >Understood. One big thing (for me) I forgot to confirm:
> >1) btrfs receive
> 
> Unfortunately, receive is completely done in userspace.
> Only send works inside kernel.
 
Right, I've confirmed that btrfs receive fails.
It looks like btrfs balance is also failing, which is more surprising.
Isn't that one in the kernel?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: btrfs support for filesystems >8TB on 32bit architectures
  2016-11-08  0:39                                               ` Marc MERLIN
@ 2016-11-08  0:43                                                 ` Qu Wenruo
  2016-11-08  1:06                                                   ` Marc MERLIN
  0 siblings, 1 reply; 40+ messages in thread
From: Qu Wenruo @ 2016-11-08  0:43 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: David Sterba, Hugo Mills, linux-btrfs



At 11/08/2016 08:39 AM, Marc MERLIN wrote:
> On Tue, Nov 08, 2016 at 08:35:54AM +0800, Qu Wenruo wrote:
>>> Understood. One big thing (for me) I forgot to confirm:
>>> 1) btrfs receive
>>
>> Unfortunately, receive is completely done in userspace.
>> Only send works inside kernel.
>
> right, I've confirmed that btrfs receive fails.
> It looks like btrfs balance is also failing, which is more surprising.
> Isn't that one in the kernel?

That's strange, balance is done completely in kernel space.

Unless we're calling a vfs_* function, we won't go through the extra check.

What's the error reported?

Thanks,
Qu
>
> Marc
>




* Re: btrfs support for filesystems >8TB on 32bit architectures
  2016-11-08  0:43                                                 ` Qu Wenruo
@ 2016-11-08  1:06                                                   ` Marc MERLIN
  2016-11-08  1:17                                                     ` Qu Wenruo
  0 siblings, 1 reply; 40+ messages in thread
From: Marc MERLIN @ 2016-11-08  1:06 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: David Sterba, Hugo Mills, linux-btrfs

On Tue, Nov 08, 2016 at 08:43:34AM +0800, Qu Wenruo wrote:
> That's strange, balance is done completely in kernel space.
> 
> Unless we're calling vfs_* function we won't go through the extra check.
> 
> What's the error reported?

See below. Note, however, that this may be because btrfs receive messed up
the filesystem first.

BTRFS info (device dm-0): use zlib compression
BTRFS info (device dm-0): disk space caching is enabled
BTRFS info (device dm-0): has skinny extents
BTRFS info (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 512, gen 0
BTRFS info (device dm-0): detected SSD devices, enabling SSD mode
BTRFS info (device dm-0): continuing balance
BTRFS info (device dm-0): The free space cache file (1593999097856) is invalid. skip it

BTRFS info (device dm-0): The free space cache file (1671308509184) is invalid. skip it

BTRFS info (device dm-0): relocating block group 13835461197824 flags 34
------------[ cut here ]------------
WARNING: CPU: 0 PID: 22825 at fs/btrfs/disk-io.c:520 btree_csum_one_bio.isra.39+0xf7/0x100
Modules linked in: bcache configs rc_hauppauge ir_kbd_i2c cpufreq_userspace cpufreq_powersave cpufreq_conservative autofs4 snd_hda_codec_hdmi joydev snd_hda_codec_realtek snd_hda_codec_generic tuner_simple tuner_types tda9887 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep tda8290 coretemp snd_pcm_oss snd_mixer_oss tuner snd_pcm msp3400 snd_seq_midi snd_seq_midi_event firewire_sbp2 saa7127 snd_rawmidi hwmon_vid dm_crypt dm_mod saa7115 snd_seq bttv hid_generic snd_seq_device snd_timer ehci_pci ivtv tea575x videobuf_dma_sg rc_core videobuf_core input_leds tveeprom cx2341x v4l2_common ehci_hcd videodev media acpi_cpufreq tpm_tis tpm_tis_core gpio_ich snd soundcore tpm psmouse lpc_ich evdev asus_atk0110 serio_raw lp parport raid456 async_raid6_recov async_pq async_xor async_memcpy async_tx multipath usbhid hid sr_mod cdrom sg firewire_ohci firewire_core floppy crc_itu_t i915 atl1 fjes mii uhci_hcd usbcore usb_common
CPU: 0 PID: 22825 Comm: kworker/u9:2 Tainted: G        W       4.8.5-ia32-20161028 #2
Hardware name: System manufacturer P5E-VM HDMI/P5E-VM HDMI, BIOS 0604    07/16/2008
Workqueue: btrfs-worker-high btrfs_worker_helper
 00200286 00200286 d3d81e48 df414827 00000000 dfa12da5 d3d81e78 df05677a
 df9ed884 00000000 00005929 dfa12da5 00000208 df2cf067 00000208 f7463fa0
 f401a080 00000000 d3d81e8c df05684a 00000009 00000000 00000000 d3d81eb4
Call Trace:
 [<df414827>] dump_stack+0x58/0x81
 [<df05677a>] __warn+0xea/0x110
 [<df2cf067>] ? btree_csum_one_bio.isra.39+0xf7/0x100
 [<df05684a>] warn_slowpath_null+0x2a/0x30
 [<df2cf067>] btree_csum_one_bio.isra.39+0xf7/0x100
 [<df2cf085>] __btree_submit_bio_start+0x15/0x20
 [<df2cdd10>] run_one_async_start+0x30/0x40
 [<df31286d>] btrfs_scrubparity_helper+0xcd/0x2d0
 [<df2cde70>] ? run_one_async_free+0x20/0x20
 [<df312bbd>] btrfs_worker_helper+0xd/0x10
 [<df06d05b>] process_one_work+0x10b/0x400
 [<df06d387>] worker_thread+0x37/0x4b0
 [<df06d350>] ? process_one_work+0x400/0x400
 [<df0722db>] kthread+0x9b/0xb0
 [<df799922>] ret_from_kernel_thread+0xe/0x24
 [<df072240>] ? kthread_stop+0x100/0x100
---[ end trace f461faff989bf258 ]---
BTRFS: error (device dm-0) in btrfs_commit_transaction:2232: errno=-5 IO failure (Error while writing out transaction)
BTRFS info (device dm-0): forced readonly
BTRFS warning (device dm-0): Skipping commit of aborted transaction.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 22318 at fs/btrfs/transaction.c:1854 btrfs_commit_transaction+0x2f5/0xcc0
BTRFS: Transaction aborted (error -5)
Modules linked in: bcache configs rc_hauppauge ir_kbd_i2c cpufreq_userspace cpufreq_powersave cpufreq_conservative autofs4 snd_hda_codec_hdmi joydev snd_hda_codec_realtek snd_hda_codec_generic tuner_simple tuner_types tda9887 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep tda8290 coretemp snd_pcm_oss snd_mixer_oss tuner snd_pcm msp3400 snd_seq_midi snd_seq_midi_event firewire_sbp2 saa7127 snd_rawmidi hwmon_vid dm_crypt dm_mod saa7115 snd_seq bttv hid_generic snd_seq_device snd_timer ehci_pci ivtv tea575x videobuf_dma_sg rc_core videobuf_core input_leds tveeprom cx2341x v4l2_common ehci_hcd videodev media acpi_cpufreq tpm_tis tpm_tis_core gpio_ich snd soundcore tpm psmouse lpc_ich evdev asus_atk0110 serio_raw lp parport raid456 async_raid6_recov async_pq async_xor async_memcpy async_tx multipath usbhid hid sr_mod cdrom sg firewire_ohci firewire_core floppy crc_itu_t i915 atl1 fjes mii uhci_hcd usbcore usb_common
CPU: 0 PID: 22318 Comm: btrfs-balance Tainted: G        W       4.8.5-ia32-20161028 #2
Hardware name: System manufacturer P5E-VM HDMI/P5E-VM HDMI, BIOS 0604    07/16/2008
 00000286 00000286 d74a3ca4 df414827 d74a3ce8 dfa132ab d74a3cd4 df05677a
 dfa075cc d74a3d04 0000572e dfa132ab 0000073e df2d7de5 0000073e f698dc00
 e9173e70 fffffffb d74a3cf0 df0567db 00000009 00000000 d74a3ce8 dfa075cc
Call Trace:
 [<df414827>] dump_stack+0x58/0x81
 [<df05677a>] __warn+0xea/0x110
 [<df2d7de5>] ? btrfs_commit_transaction+0x2f5/0xcc0
 [<df0567db>] warn_slowpath_fmt+0x3b/0x40
 [<df2d7de5>] btrfs_commit_transaction+0x2f5/0xcc0
 [<df096800>] ? prepare_to_wait_event+0xd0/0xd0
 [<df33334f>] prepare_to_relocate+0x12f/0x180
 [<df339a41>] relocate_block_group+0x31/0x790
 [<df0b1427>] ? vprintk_default+0x37/0x40
 [<df796ca0>] ? mutex_lock+0x10/0x30
 [<df2f8f45>] ? btrfs_wait_ordered_roots+0x1d5/0x1f0
 [<df14eed6>] ? printk+0x17/0x19
 [<df2a47b2>] ? btrfs_printk+0x102/0x110
 [<df33a388>] btrfs_relocate_block_group+0x1e8/0x2e0
 [<df308a9f>] btrfs_relocate_chunk.isra.29+0x3f/0xf0
 [<df30221f>] ? free_extent_buffer+0x4f/0xa0
 [<df30a555>] btrfs_balance+0xb05/0x1820
 [<df0b0afa>] ? console_unlock+0x40a/0x630
 [<df30b2c1>] balance_kthread+0x51/0x80
 [<df30b270>] ? btrfs_balance+0x1820/0x1820
 [<df0722db>] kthread+0x9b/0xb0
 [<df799922>] ret_from_kernel_thread+0xe/0x24
 [<df072240>] ? kthread_stop+0x100/0x100
---[ end trace f461faff989bf259 ]---
BTRFS: error (device dm-0) in cleanup_transaction:1854: errno=-5 IO failure
BTRFS info (device dm-0): delayed_refs has NO entry

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: btrfs support for filesystems >8TB on 32bit architectures
  2016-11-08  1:06                                                   ` Marc MERLIN
@ 2016-11-08  1:17                                                     ` Qu Wenruo
  2016-11-08 15:24                                                       ` Marc MERLIN
  0 siblings, 1 reply; 40+ messages in thread
From: Qu Wenruo @ 2016-11-08  1:17 UTC
  To: Marc MERLIN; +Cc: David Sterba, Hugo Mills, linux-btrfs



At 11/08/2016 09:06 AM, Marc MERLIN wrote:
> On Tue, Nov 08, 2016 at 08:43:34AM +0800, Qu Wenruo wrote:
>> That's strange; balance is done completely in kernel space.
>>
>> Unless we're calling a vfs_* function, we won't go through the extra check.
>>
>> What's the error reported?
>
> See below. Note, however, that this may be because btrfs receive messed up
> the filesystem first.

If receive could easily screw up the fs, then fsstress would also screw up
btrfs easily.

So I don't think that's the case. (Several years ago it would have been possible.)

>
> BTRFS info (device dm-0): use zlib compression
> BTRFS info (device dm-0): disk space caching is enabled
> BTRFS info (device dm-0): has skinny extents
> BTRFS info (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 512, gen 0
> BTRFS info (device dm-0): detected SSD devices, enabling SSD mode
> BTRFS info (device dm-0): continuing balance
> BTRFS info (device dm-0): The free space cache file (1593999097856) is invalid. skip it
>
> BTRFS info (device dm-0): The free space cache file (1671308509184) is invalid. skip it
>
> BTRFS info (device dm-0): relocating block group 13835461197824 flags 34
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 22825 at fs/btrfs/disk-io.c:520 btree_csum_one_bio.isra.39+0xf7/0x100

The dirty tree block's bytenr doesn't match the page's logical address.
It seems the tree block is not up to date, or maybe corrupted.

This seems unrelated to the 8T limit.

Could you please add a pr_info() to print out 'found_start' and 'start'?
I'm not familiar with this code, but the numbers may give a clue as to
what's going wrong.
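To make the comparison described above concrete, here is a toy Python model. It is a sketch under assumptions, not the code in fs/btrfs/disk-io.c: `DirtyTreeBlock` and `sanity_check_before_csum` are hypothetical names, and the extra `uptodate` flag only reflects the remark that the block may not be up to date. Before a dirty tree block is checksummed for writeout, the bytenr stored in its header (`found_start`) is expected to equal the logical address of the page holding it (`start`); the returned string shows the kind of information the requested pr_info() would expose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirtyTreeBlock:
    """Hypothetical user-space stand-in for a dirty btree page (not kernel code)."""
    start: int          # logical address the page cache holds the block at
    found_start: int    # bytenr recorded in the tree block's own header
    uptodate: bool = True


def sanity_check_before_csum(blk: DirtyTreeBlock) -> Optional[str]:
    """Return a diagnostic like the requested pr_info(), or None if consistent."""
    if blk.found_start != blk.start or not blk.uptodate:
        return (f"dirty tree block mismatch: found_start={blk.found_start} "
                f"start={blk.start} uptodate={blk.uptodate}")
    return None


# Illustrative values only; they are not taken from Marc's system.
print(sanity_check_before_csum(DirtyTreeBlock(start=16384, found_start=16384)))
print(sanity_check_before_csum(DirtyTreeBlock(start=16384, found_start=0)))
```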

Thanks,
Qu

> Modules linked in: bcache configs rc_hauppauge ir_kbd_i2c cpufreq_userspace cpufreq_powersave cpufreq_conservative autofs4 snd_hda_codec_hdmi joydev snd_hda_codec_realtek snd_hda_codec_generic tuner_simple tuner_types tda9887 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep tda8290 coretemp snd_pcm_oss snd_mixer_oss tuner snd_pcm msp3400 snd_seq_midi snd_seq_midi_event firewire_sbp2 saa7127 snd_rawmidi hwmon_vid dm_crypt dm_mod saa7115 snd_seq bttv hid_generic snd_seq_device snd_timer ehci_pci ivtv tea575x videobuf_dma_sg rc_core videobuf_core input_leds tveeprom cx2341x v4l2_common ehci_hcd videodev media acpi_cpufreq tpm_tis tpm_tis_core gpio_ich snd soundcore tpm psmouse lpc_ich evdev asus_atk0110 serio_raw lp parport raid456 async_raid6_recov async_pq async_xor async_memcpy async_tx multipath usbhid hid sr_mod cdrom sg firewire_ohci firewire_core floppy crc_itu_t i915 atl1 fjes mii uhci_hcd usbcore usb_common
> CPU: 0 PID: 22825 Comm: kworker/u9:2 Tainted: G        W       4.8.5-ia32-20161028 #2
> Hardware name: System manufacturer P5E-VM HDMI/P5E-VM HDMI, BIOS 0604    07/16/2008
> Workqueue: btrfs-worker-high btrfs_worker_helper
>  00200286 00200286 d3d81e48 df414827 00000000 dfa12da5 d3d81e78 df05677a
>  df9ed884 00000000 00005929 dfa12da5 00000208 df2cf067 00000208 f7463fa0
>  f401a080 00000000 d3d81e8c df05684a 00000009 00000000 00000000 d3d81eb4
> Call Trace:
>  [<df414827>] dump_stack+0x58/0x81
>  [<df05677a>] __warn+0xea/0x110
>  [<df2cf067>] ? btree_csum_one_bio.isra.39+0xf7/0x100
>  [<df05684a>] warn_slowpath_null+0x2a/0x30
>  [<df2cf067>] btree_csum_one_bio.isra.39+0xf7/0x100
>  [<df2cf085>] __btree_submit_bio_start+0x15/0x20
>  [<df2cdd10>] run_one_async_start+0x30/0x40
>  [<df31286d>] btrfs_scrubparity_helper+0xcd/0x2d0
>  [<df2cde70>] ? run_one_async_free+0x20/0x20
>  [<df312bbd>] btrfs_worker_helper+0xd/0x10
>  [<df06d05b>] process_one_work+0x10b/0x400
>  [<df06d387>] worker_thread+0x37/0x4b0
>  [<df06d350>] ? process_one_work+0x400/0x400
>  [<df0722db>] kthread+0x9b/0xb0
>  [<df799922>] ret_from_kernel_thread+0xe/0x24
>  [<df072240>] ? kthread_stop+0x100/0x100
> ---[ end trace f461faff989bf258 ]---
> BTRFS: error (device dm-0) in btrfs_commit_transaction:2232: errno=-5 IO failure (Error while writing out transaction)
> BTRFS info (device dm-0): forced readonly
> BTRFS warning (device dm-0): Skipping commit of aborted transaction.
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 22318 at fs/btrfs/transaction.c:1854 btrfs_commit_transaction+0x2f5/0xcc0
> BTRFS: Transaction aborted (error -5)
> Modules linked in: bcache configs rc_hauppauge ir_kbd_i2c cpufreq_userspace cpufreq_powersave cpufreq_conservative autofs4 snd_hda_codec_hdmi joydev snd_hda_codec_realtek snd_hda_codec_generic tuner_simple tuner_types tda9887 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep tda8290 coretemp snd_pcm_oss snd_mixer_oss tuner snd_pcm msp3400 snd_seq_midi snd_seq_midi_event firewire_sbp2 saa7127 snd_rawmidi hwmon_vid dm_crypt dm_mod saa7115 snd_seq bttv hid_generic snd_seq_device snd_timer ehci_pci ivtv tea575x videobuf_dma_sg rc_core videobuf_core input_leds tveeprom cx2341x v4l2_common ehci_hcd videodev media acpi_cpufreq tpm_tis tpm_tis_core gpio_ich snd soundcore tpm psmouse lpc_ich evdev asus_atk0110 serio_raw lp parport raid456 async_raid6_recov async_pq async_xor async_memcpy async_tx multipath usbhid hid sr_mod cdrom sg firewire_ohci firewire_core floppy crc_itu_t i915 atl1 fjes mii uhci_hcd usbcore usb_common
> CPU: 0 PID: 22318 Comm: btrfs-balance Tainted: G        W       4.8.5-ia32-20161028 #2
> Hardware name: System manufacturer P5E-VM HDMI/P5E-VM HDMI, BIOS 0604    07/16/2008
>  00000286 00000286 d74a3ca4 df414827 d74a3ce8 dfa132ab d74a3cd4 df05677a
>  dfa075cc d74a3d04 0000572e dfa132ab 0000073e df2d7de5 0000073e f698dc00
>  e9173e70 fffffffb d74a3cf0 df0567db 00000009 00000000 d74a3ce8 dfa075cc
> Call Trace:
>  [<df414827>] dump_stack+0x58/0x81
>  [<df05677a>] __warn+0xea/0x110
>  [<df2d7de5>] ? btrfs_commit_transaction+0x2f5/0xcc0
>  [<df0567db>] warn_slowpath_fmt+0x3b/0x40
>  [<df2d7de5>] btrfs_commit_transaction+0x2f5/0xcc0
>  [<df096800>] ? prepare_to_wait_event+0xd0/0xd0
>  [<df33334f>] prepare_to_relocate+0x12f/0x180
>  [<df339a41>] relocate_block_group+0x31/0x790
>  [<df0b1427>] ? vprintk_default+0x37/0x40
>  [<df796ca0>] ? mutex_lock+0x10/0x30
>  [<df2f8f45>] ? btrfs_wait_ordered_roots+0x1d5/0x1f0
>  [<df14eed6>] ? printk+0x17/0x19
>  [<df2a47b2>] ? btrfs_printk+0x102/0x110
>  [<df33a388>] btrfs_relocate_block_group+0x1e8/0x2e0
>  [<df308a9f>] btrfs_relocate_chunk.isra.29+0x3f/0xf0
>  [<df30221f>] ? free_extent_buffer+0x4f/0xa0
>  [<df30a555>] btrfs_balance+0xb05/0x1820
>  [<df0b0afa>] ? console_unlock+0x40a/0x630
>  [<df30b2c1>] balance_kthread+0x51/0x80
>  [<df30b270>] ? btrfs_balance+0x1820/0x1820
>  [<df0722db>] kthread+0x9b/0xb0
>  [<df799922>] ret_from_kernel_thread+0xe/0x24
>  [<df072240>] ? kthread_stop+0x100/0x100
> ---[ end trace f461faff989bf259 ]---
> BTRFS: error (device dm-0) in cleanup_transaction:1854: errno=-5 IO failure
> BTRFS info (device dm-0): delayed_refs has NO entry
>




* Re: btrfs support for filesystems >8TB on 32bit architectures
  2016-11-08  1:17                                                     ` Qu Wenruo
@ 2016-11-08 15:24                                                       ` Marc MERLIN
  2016-11-09  1:50                                                         ` Qu Wenruo
  0 siblings, 1 reply; 40+ messages in thread
From: Marc MERLIN @ 2016-11-08 15:24 UTC
  To: Qu Wenruo; +Cc: David Sterba, Hugo Mills, linux-btrfs

On Tue, Nov 08, 2016 at 09:17:43AM +0800, Qu Wenruo wrote:
> 
> 
> At 11/08/2016 09:06 AM, Marc MERLIN wrote:
> >On Tue, Nov 08, 2016 at 08:43:34AM +0800, Qu Wenruo wrote:
> >>That's strange; balance is done completely in kernel space.
> >>
> >>Unless we're calling a vfs_* function, we won't go through the extra check.
> >>
> >>What's the error reported?
> >
> >See below. Note, however, that this may be because btrfs receive messed up
> >the filesystem first.
> 
> If receive could easily screw up the fs, then fsstress would also screw up
> btrfs easily.
> 
> So I don't think that's the case. (Several years ago it would have been possible.)
 
So now I'm even more confused. I put the array back in my 64-bit system, and
check --repair comes back clean, but scrub does not. Is that supposed to be possible?

gargamel:~# btrfs check -p --repair /dev/mapper/crypt_bcache2 2>&1 | tee /mnt/dshelf1/other/btrfs2
enabling repair mode
Checking filesystem on /dev/mapper/crypt_bcache2
UUID: 6692cf4c-93d9-438c-ac30-5db6381dc4f2
checking extents [.]
Fixed 0 roots.
cache and super generation don't match, space cache will be invalidated
checking fs roots [o]
checking csums
checking root refs
found 14622791987200 bytes used err is 0
total csum bytes: 14200176492
total tree bytes: 78239416320
total fs tree bytes: 59524497408
total extent tree bytes: 3236872192
btree space waste bytes: 10068589919
file data blocks allocated: 18101311373312
 referenced 18038641020928

Nov  8 06:55:40 gargamel kernel: [35631.988882] BTRFS warning (device dm-6): checksum error at logical 27887403008 on dev /dev/mapper/crypt_bcache2, sector 56581120, root 9461, inode 45837, offset 15460614144, length 4096, links 1 (path: system/mlocate/mlocate.db)
Nov  8 06:55:40 gargamel kernel: [35631.988885] BTRFS warning (device dm-6): checksum error at logical 27887009792 on dev /dev/mapper/crypt_bcache2, sector 56580352, root 9461, inode 45837, offset 15460220928, length 4096, links 1 (path: system/mlocate/mlocate.db)
Nov  8 06:55:40 gargamel kernel: [35631.988887] BTRFS warning (device dm-6): checksum error at logical 27886878720 on dev /dev/mapper/crypt_bcache2, sector 56580096, root 9461, inode 45837, offset 15460089856, length 4096, links 1 (path: system/mlocate/mlocate.db)
Nov  8 06:55:40 gargamel kernel: [35631.988890] BTRFS warning (device dm-6): checksum error at logical 27887837184 on dev /dev/mapper/crypt_bcache2, sector 56581968, root 9461, inode 45837, offset 15461048320, length 4096, links 1 (path: system/mlocate/mlocate.db)
Nov  8 06:55:40 gargamel kernel: [35631.988895] BTRFS warning (device dm-6): checksum error at logical 27885830144 on dev /dev/mapper/crypt_bcache2, sector 56578048, root 9461, inode 45837, offset 15459041280, length 4096, links 1 (path: system/mlocate/mlocate.db)
Nov  8 06:55:40 gargamel kernel: [35631.988896] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 513, gen 0
Nov  8 06:55:40 gargamel kernel: [35631.988897] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 514, gen 0
Nov  8 06:55:40 gargamel kernel: [35631.988899] BTRFS warning (device dm-6): checksum error at logical 27885961216 on dev /dev/mapper/crypt_bcache2, sector 56578304, root 9461, inode 45837, offset 15459172352, length 4096, links 1 (path: system/mlocate/mlocate.db)
Nov  8 06:55:40 gargamel kernel: [35631.988900] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 515, gen 0
Nov  8 06:55:40 gargamel kernel: [35631.988903] BTRFS warning (device dm-6): checksum error at logical 27887534080 on dev /dev/mapper/crypt_bcache2, sector 56581376, root 9461, inode 45837, offset 15460745216, length 4096, links 1 (path: system/mlocate/mlocate.db)
Nov  8 06:55:40 gargamel kernel: [35631.988904] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27887009792 on dev /dev/mapper/crypt_bcache2
Nov  8 06:55:40 gargamel kernel: [35631.988905] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27886878720 on dev /dev/mapper/crypt_bcache2
Nov  8 06:55:40 gargamel kernel: [35631.988906] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 516, gen 0
Nov  8 06:55:40 gargamel kernel: [35631.988907] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27887837184 on dev /dev/mapper/crypt_bcache2
Nov  8 06:55:40 gargamel kernel: [35631.988908] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 517, gen 0
Nov  8 06:55:40 gargamel kernel: [35631.988909] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 518, gen 0
Nov  8 06:55:40 gargamel kernel: [35631.988910] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27885830144 on dev /dev/mapper/crypt_bcache2
Nov  8 06:55:40 gargamel kernel: [35631.988911] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27885961216 on dev /dev/mapper/crypt_bcache2
Nov  8 06:55:40 gargamel kernel: [35631.988912] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27887534080 on dev /dev/mapper/crypt_bcache2
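The scrub output above interleaves error counters, checksum warnings, and fixup failures. A small Python helper can collapse the warnings by logical address. This is a sketch: `summarize`, `CSUM_RE`, and `SAMPLE` are made-up names, the regex only matches the line format shown in this log, and treating `sector` as the physical byte offset in 512-byte units is an assumption. The sample deliberately includes one line twice to show deduplication.

```python
import re

# Field layout copied from the scrub warnings in this log; other kernel
# versions may word these lines differently.
CSUM_RE = re.compile(
    r"checksum error at logical (\d+) on dev (\S+), sector (\d+), "
    r"root (\d+), inode (\d+), offset (\d+), length (\d+), links \d+ "
    r"\(path: ([^)]+)\)"
)

SECTOR_BYTES = 512  # assumption: "sector" counts 512-byte units of the physical offset


def summarize(log_text):
    """Collapse duplicate warnings and key them by logical address."""
    errors = {}
    for m in CSUM_RE.finditer(log_text):
        logical, dev, sector, root, inode, offset, length, path = m.groups()
        errors[int(logical)] = {
            "dev": dev,
            "physical": int(sector) * SECTOR_BYTES,
            "root": int(root),
            "inode": int(inode),
            "file_offset": int(offset),
            "length": int(length),
            "path": path,
        }
    return errors


SAMPLE = """\
BTRFS warning (device dm-6): checksum error at logical 27885961216 on dev /dev/mapper/crypt_bcache2, sector 56578304, root 9461, inode 45837, offset 15459172352, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS warning (device dm-6): checksum error at logical 27887534080 on dev /dev/mapper/crypt_bcache2, sector 56581376, root 9461, inode 45837, offset 15460745216, length 4096, links 1 (path: system/mlocate/mlocate.db)
BTRFS warning (device dm-6): checksum error at logical 27885961216 on dev /dev/mapper/crypt_bcache2, sector 56578304, root 9461, inode 45837, offset 15459172352, length 4096, links 1 (path: system/mlocate/mlocate.db)
"""

for logical, e in sorted(summarize(SAMPLE).items()):
    print(logical, e["physical"], e["path"])
```

Every checksum warning in this log points at the same inode (45837 in root 9461, path system/mlocate/mlocate.db), which is consistent with data checksum failures rather than metadata damage.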



> >
> >BTRFS info (device dm-0): use zlib compression
> >BTRFS info (device dm-0): disk space caching is enabled
> >BTRFS info (device dm-0): has skinny extents
> >BTRFS info (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 512, gen 0
> >BTRFS info (device dm-0): detected SSD devices, enabling SSD mode
> >BTRFS info (device dm-0): continuing balance
> >BTRFS info (device dm-0): The free space cache file (1593999097856) is invalid. skip it
> >
> >BTRFS info (device dm-0): The free space cache file (1671308509184) is invalid. skip it
> >
> >BTRFS info (device dm-0): relocating block group 13835461197824 flags 34
> >------------[ cut here ]------------
> >WARNING: CPU: 0 PID: 22825 at fs/btrfs/disk-io.c:520 btree_csum_one_bio.isra.39+0xf7/0x100
> 
> The dirty tree block's bytenr doesn't match the page's logical address.
> It seems the tree block is not up to date, or maybe corrupted.
> 
> This seems unrelated to the 8T limit.
> 
> Could you please add a pr_info() to print out 'found_start' and 'start'?
> I'm not familiar with this code, but the numbers may give a clue as to
> what's going wrong.
> 
> Thanks,
> Qu
> 
> >Modules linked in: bcache configs rc_hauppauge ir_kbd_i2c 
> >cpufreq_userspace cpufreq_powersave cpufreq_conservative autofs4 
> >snd_hda_codec_hdmi joydev snd_hda_codec_realtek snd_hda_codec_generic 
> >tuner_simple tuner_types tda9887 snd_hda_intel snd_hda_codec snd_hda_core 
> >snd_hwdep tda8290 coretemp snd_pcm_oss snd_mixer_oss tuner snd_pcm msp3400 
> >snd_seq_midi snd_seq_midi_event firewire_sbp2 saa7127 snd_rawmidi 
> >hwmon_vid dm_crypt dm_mod saa7115 snd_seq bttv hid_generic snd_seq_device 
> >snd_timer ehci_pci ivtv tea575x videobuf_dma_sg rc_core videobuf_core 
> >input_leds tveeprom cx2341x v4l2_common ehci_hcd videodev media 
> >acpi_cpufreq tpm_tis tpm_tis_core gpio_ich snd soundcore tpm psmouse 
> >lpc_ich evdev asus_atk0110 serio_raw lp parport raid456 async_raid6_recov 
> >async_pq async_xor async_memcpy async_tx multipath usbhid hid sr_mod cdrom 
> >sg firewire_ohci firewire_core floppy crc_itu_t i915 atl1 fjes mii 
> >uhci_hcd usbcore usb_common
> >CPU: 0 PID: 22825 Comm: kworker/u9:2 Tainted: G        W       4.8.5-ia32-20161028 #2
> >Hardware name: System manufacturer P5E-VM HDMI/P5E-VM HDMI, BIOS 0604    07/16/2008
> >Workqueue: btrfs-worker-high btrfs_worker_helper
> > 00200286 00200286 d3d81e48 df414827 00000000 dfa12da5 d3d81e78 df05677a
> > df9ed884 00000000 00005929 dfa12da5 00000208 df2cf067 00000208 f7463fa0
> > f401a080 00000000 d3d81e8c df05684a 00000009 00000000 00000000 d3d81eb4
> >Call Trace:
> > [<df414827>] dump_stack+0x58/0x81
> > [<df05677a>] __warn+0xea/0x110
> > [<df2cf067>] ? btree_csum_one_bio.isra.39+0xf7/0x100
> > [<df05684a>] warn_slowpath_null+0x2a/0x30
> > [<df2cf067>] btree_csum_one_bio.isra.39+0xf7/0x100
> > [<df2cf085>] __btree_submit_bio_start+0x15/0x20
> > [<df2cdd10>] run_one_async_start+0x30/0x40
> > [<df31286d>] btrfs_scrubparity_helper+0xcd/0x2d0
> > [<df2cde70>] ? run_one_async_free+0x20/0x20
> > [<df312bbd>] btrfs_worker_helper+0xd/0x10
> > [<df06d05b>] process_one_work+0x10b/0x400
> > [<df06d387>] worker_thread+0x37/0x4b0
> > [<df06d350>] ? process_one_work+0x400/0x400
> > [<df0722db>] kthread+0x9b/0xb0
> > [<df799922>] ret_from_kernel_thread+0xe/0x24
> > [<df072240>] ? kthread_stop+0x100/0x100
> >---[ end trace f461faff989bf258 ]---
> >BTRFS: error (device dm-0) in btrfs_commit_transaction:2232: errno=-5 IO failure (Error while writing out transaction)
> >BTRFS info (device dm-0): forced readonly
> >BTRFS warning (device dm-0): Skipping commit of aborted transaction.
> >------------[ cut here ]------------
> >WARNING: CPU: 0 PID: 22318 at fs/btrfs/transaction.c:1854 btrfs_commit_transaction+0x2f5/0xcc0
> >BTRFS: Transaction aborted (error -5)
> >Modules linked in: bcache configs rc_hauppauge ir_kbd_i2c 
> >cpufreq_userspace cpufreq_powersave cpufreq_conservative autofs4 
> >snd_hda_codec_hdmi joydev snd_hda_codec_realtek snd_hda_codec_generic 
> >tuner_simple tuner_types tda9887 snd_hda_intel snd_hda_codec snd_hda_core 
> >snd_hwdep tda8290 coretemp snd_pcm_oss snd_mixer_oss tuner snd_pcm msp3400 
> >snd_seq_midi snd_seq_midi_event firewire_sbp2 saa7127 snd_rawmidi 
> >hwmon_vid dm_crypt dm_mod saa7115 snd_seq bttv hid_generic snd_seq_device 
> >snd_timer ehci_pci ivtv tea575x videobuf_dma_sg rc_core videobuf_core 
> >input_leds tveeprom cx2341x v4l2_common ehci_hcd videodev media 
> >acpi_cpufreq tpm_tis tpm_tis_core gpio_ich snd soundcore tpm psmouse 
> >lpc_ich evdev asus_atk0110 serio_raw lp parport raid456 async_raid6_recov 
> >async_pq async_xor async_memcpy async_tx multipath usbhid hid sr_mod cdrom 
> >sg firewire_ohci firewire_core floppy crc_itu_t i915 atl1 fjes mii 
> >uhci_hcd usbcore usb_common
> >CPU: 0 PID: 22318 Comm: btrfs-balance Tainted: G        W       4.8.5-ia32-20161028 #2
> >Hardware name: System manufacturer P5E-VM HDMI/P5E-VM HDMI, BIOS 0604    07/16/2008
> > 00000286 00000286 d74a3ca4 df414827 d74a3ce8 dfa132ab d74a3cd4 df05677a
> > dfa075cc d74a3d04 0000572e dfa132ab 0000073e df2d7de5 0000073e f698dc00
> > e9173e70 fffffffb d74a3cf0 df0567db 00000009 00000000 d74a3ce8 dfa075cc
> >Call Trace:
> > [<df414827>] dump_stack+0x58/0x81
> > [<df05677a>] __warn+0xea/0x110
> > [<df2d7de5>] ? btrfs_commit_transaction+0x2f5/0xcc0
> > [<df0567db>] warn_slowpath_fmt+0x3b/0x40
> > [<df2d7de5>] btrfs_commit_transaction+0x2f5/0xcc0
> > [<df096800>] ? prepare_to_wait_event+0xd0/0xd0
> > [<df33334f>] prepare_to_relocate+0x12f/0x180
> > [<df339a41>] relocate_block_group+0x31/0x790
> > [<df0b1427>] ? vprintk_default+0x37/0x40
> > [<df796ca0>] ? mutex_lock+0x10/0x30
> > [<df2f8f45>] ? btrfs_wait_ordered_roots+0x1d5/0x1f0
> > [<df14eed6>] ? printk+0x17/0x19
> > [<df2a47b2>] ? btrfs_printk+0x102/0x110
> > [<df33a388>] btrfs_relocate_block_group+0x1e8/0x2e0
> > [<df308a9f>] btrfs_relocate_chunk.isra.29+0x3f/0xf0
> > [<df30221f>] ? free_extent_buffer+0x4f/0xa0
> > [<df30a555>] btrfs_balance+0xb05/0x1820
> > [<df0b0afa>] ? console_unlock+0x40a/0x630
> > [<df30b2c1>] balance_kthread+0x51/0x80
> > [<df30b270>] ? btrfs_balance+0x1820/0x1820
> > [<df0722db>] kthread+0x9b/0xb0
> > [<df799922>] ret_from_kernel_thread+0xe/0x24
> > [<df072240>] ? kthread_stop+0x100/0x100
> >---[ end trace f461faff989bf259 ]---
> >BTRFS: error (device dm-0) in cleanup_transaction:1854: errno=-5 IO failure
> >BTRFS info (device dm-0): delayed_refs has NO entry
> >
> 
> 
> 


* Re: btrfs support for filesystems >8TB on 32bit architectures
  2016-11-08 15:24                                                       ` Marc MERLIN
@ 2016-11-09  1:50                                                         ` Qu Wenruo
  2016-11-09  2:05                                                           ` Marc MERLIN
  0 siblings, 1 reply; 40+ messages in thread
From: Qu Wenruo @ 2016-11-09  1:50 UTC
  To: Marc MERLIN; +Cc: David Sterba, Hugo Mills, linux-btrfs



At 11/08/2016 11:24 PM, Marc MERLIN wrote:
> On Tue, Nov 08, 2016 at 09:17:43AM +0800, Qu Wenruo wrote:
>>
>>
>> At 11/08/2016 09:06 AM, Marc MERLIN wrote:
>>> On Tue, Nov 08, 2016 at 08:43:34AM +0800, Qu Wenruo wrote:
>>>> That's strange; balance is done completely in kernel space.
>>>>
>>>> Unless we're calling a vfs_* function, we won't go through the extra check.
>>>>
>>>> What's the error reported?
>>>
>>> See below. Note, however, that this may be because btrfs receive messed up
>>> the filesystem first.
>>
>> If receive could easily screw up the fs, then fsstress would also screw up
>> btrfs easily.
>>
>> So I don't think that's the case. (Several years ago it would have been possible.)
>
> So now I'm even more confused. I put the array back in my 64-bit system, and
> check --repair comes back clean, but scrub does not. Is that supposed to be possible?

Yes, that's quite possible!

The truth is, the current btrfs check only checks:
1) Metadata
    The --check-data-csum option will also check data, but it is still
    subject to restriction 3.
2) Cross-references within metadata (the contents of metadata)
3) The first good mirror/backup only

So quite a lot of problems can't be detected by btrfs check:
1) Data corruption (csum mismatch)
2) Corruption of a second mirror (DUP/RAID1/RAID10) or parity errors (RAID5/6)

For btrfsck to check all mirrors and data, you could try the out-of-tree
offline scrub patchset:
https://github.com/adam900710/btrfs-progs/tree/fsck_scrub

which implements the equivalent of kernel scrub in btrfs-progs.
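The gap between check and scrub can be illustrated with a toy Python model. Everything here is hypothetical (`Extent`, `offline_check_finds`, `scrub_finds`), RAID5/6 parity is not modelled, and only `--check-data-csum` is a real btrfs check option. The model follows the restrictions listed above: check skips data unless asked, and it only fails when no good copy is left to read, whereas a scrub-style pass verifies every copy of both metadata and data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Extent:
    kind: str              # "metadata" or "data"
    copies_ok: List[bool]  # one entry per copy (e.g. DUP/RAID1 mirror); True = checksum matches


def offline_check_finds(extents, check_data_csum=False):
    """Toy model of the btrfs check restrictions (not btrfs-progs code)."""
    found = []
    for i, e in enumerate(extents):
        if e.kind == "data" and not check_data_csum:
            continue  # data is skipped unless --check-data-csum is given
        if not any(e.copies_ok):
            found.append(i)  # flagged only when no good copy exists to read
    return found


def scrub_finds(extents):
    """Toy model of scrub: every copy of both metadata and data is verified."""
    return [i for i, e in enumerate(extents) if not all(e.copies_ok)]


extents = [
    Extent("metadata", [True, True]),   # healthy DUP metadata
    Extent("metadata", [True, False]),  # second mirror corrupted
    Extent("data", [False]),            # single-copy data with a csum mismatch
]
print(offline_check_finds(extents))                        # []
print(offline_check_finds(extents, check_data_csum=True))  # [2]
print(scrub_finds(extents))                                # [1, 2]
```

With these three extents, a default check reports nothing, a data-checksum check catches only the bad data block, and scrub catches both the bad mirror and the bad data block, which resembles a clean check followed by a failing scrub.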

Thanks,
Qu

>
> gargamel:~# btrfs check -p --repair /dev/mapper/crypt_bcache2 2>&1 | tee /mnt/dshelf1/other/btrfs2
> enabling repair mode
> Checking filesystem on /dev/mapper/crypt_bcache2
> UUID: 6692cf4c-93d9-438c-ac30-5db6381dc4f2
> checking extents [.]
> Fixed 0 roots.
> cache and super generation don't match, space cache will be invalidated
> checking fs roots [o]
> checking csums
> checking root refs
> found 14622791987200 bytes used err is 0
> total csum bytes: 14200176492
> total tree bytes: 78239416320
> total fs tree bytes: 59524497408
> total extent tree bytes: 3236872192
> btree space waste bytes: 10068589919
> file data blocks allocated: 18101311373312
>  referenced 18038641020928
>
> Nov  8 06:55:40 gargamel kernel: [35631.988882] BTRFS warning (device dm-6): checksum error at logical 27887403008 on dev /dev/mapper/crypt_bcache2, sector 56581120, root 9461, inode 45837, offset 15460614144, length 4096, links 1 (path: system/mlocate/mlocate.db)
> Nov  8 06:55:40 gargamel kernel: [35631.988885] BTRFS warning (device dm-6): checksum error at logical 27887009792 on dev /dev/mapper/crypt_bcache2, sector 56580352, root 9461, inode 45837, offset 15460220928, length 4096, links 1 (path: system/mlocate/mlocate.db)
> Nov  8 06:55:40 gargamel kernel: [35631.988887] BTRFS warning (device dm-6): checksum error at logical 27886878720 on dev /dev/mapper/crypt_bcache2, sector 56580096, root 9461, inode 45837, offset 15460089856, length 4096, links 1 (path: system/mlocate/mlocate.db)
> Nov  8 06:55:40 gargamel kernel: [35631.988890] BTRFS warning (device dm-6): checksum error at logical 27887837184 on dev /dev/mapper/crypt_bcache2, sector 56581968, root 9461, inode 45837, offset 15461048320, length 4096, links 1 (path: system/mlocate/mlocate.db)
> Nov  8 06:55:40 gargamel kernel: [35631.988895] BTRFS warning (device dm-6): checksum error at logical 27885830144 on dev /dev/mapper/crypt_bcache2, sector 56578048, root 9461, inode 45837, offset 15459041280, length 4096, links 1 (path: system/mlocate/mlocate.db)
> Nov  8 06:55:40 gargamel kernel: [35631.988896] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 513, gen 0
> Nov  8 06:55:40 gargamel kernel: [35631.988897] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 514, gen 0
> Nov  8 06:55:40 gargamel kernel: [35631.988899] BTRFS warning (device dm-6): checksum error at logical 27885961216 on dev /dev/mapper/crypt_bcache2, sector 56578304, root 9461, inode 45837, offset 15459172352, length 4096, links 1 (path: system/mlocate/mlocate.db)
> Nov  8 06:55:40 gargamel kernel: [35631.988900] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 515, gen 0
> Nov  8 06:55:40 gargamel kernel: [35631.988903] BTRFS warning (device dm-6): checksum error at logical 27887534080 on dev /dev/mapper/crypt_bcache2, sector 56581376, root 9461, inode 45837, offset 15460745216, length 4096, links 1 (path: system/mlocate/mlocate.db)
> Nov  8 06:55:40 gargamel kernel: [35631.988904] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27887009792 on dev /dev/mapper/crypt_bcache2
> Nov  8 06:55:40 gargamel kernel: [35631.988905] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27886878720 on dev /dev/mapper/crypt_bcache2
> Nov  8 06:55:40 gargamel kernel: [35631.988906] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 516, gen 0
> Nov  8 06:55:40 gargamel kernel: [35631.988907] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27887837184 on dev /dev/mapper/crypt_bcache2
> Nov  8 06:55:40 gargamel kernel: [35631.988908] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 517, gen 0
> Nov  8 06:55:40 gargamel kernel: [35631.988909] BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 518, gen 0
> Nov  8 06:55:40 gargamel kernel: [35631.988910] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27885830144 on dev /dev/mapper/crypt_bcache2
> Nov  8 06:55:40 gargamel kernel: [35631.988911] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27885961216 on dev /dev/mapper/crypt_bcache2
> Nov  8 06:55:40 gargamel kernel: [35631.988912] BTRFS error (device dm-6): unable to fixup (regular) error at logical 27887534080 on dev /dev/mapper/crypt_bcache2
>
>
>
>>>
>>> BTRFS info (device dm-0): use zlib compression
>>> BTRFS info (device dm-0): disk space caching is enabled
>>> BTRFS info (device dm-0): has skinny extents
>>> BTRFS info (device dm-0): bdev /dev/mapper/crypt_bcache0 errs: wr 0, rd 0, flush 0, corrupt 512, gen 0
>>> BTRFS info (device dm-0): detected SSD devices, enabling SSD mode
>>> BTRFS info (device dm-0): continuing balance
>>> BTRFS info (device dm-0): The free space cache file (1593999097856) is invalid. skip it
>>>
>>> BTRFS info (device dm-0): The free space cache file (1671308509184) is invalid. skip it
>>>
>>> BTRFS info (device dm-0): relocating block group 13835461197824 flags 34
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 0 PID: 22825 at fs/btrfs/disk-io.c:520 btree_csum_one_bio.isra.39+0xf7/0x100
>>
>> The dirty tree block's bytenr doesn't match the page's logical address.
>> It seems that the tree block is not up to date, maybe corrupted.
>>
>> This seems unrelated to the 8T limit.
>>
>> Could you please add a pr_info() to print out 'found_start' and 'start'?
>> I'm not familiar with this code, but the numbers may give a clue to
>> what's going wrong.
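As a rough illustration of the comparison Qu describes: every btrfs tree block records its own logical address (bytenr) in its header, and per Qu's reading of this warning, that stored value (`found_start`) did not match the logical address of the page being written (`start`). The Python sketch below models that check on a raw block. It assumes the standard header layout (32-byte checksum, 16-byte fsid, then a little-endian 64-bit bytenr at offset 48); the function names are hypothetical and this is not kernel or btrfs-progs code.

```python
import struct

# Assumed btrfs tree block header layout: csum[32], fsid[16], then bytenr as a
# little-endian u64, so bytenr starts at byte offset 48.
CSUM_SIZE = 32
FSID_SIZE = 16
BYTENR_OFFSET = CSUM_SIZE + FSID_SIZE


def header_bytenr(block: bytes) -> int:
    """Return the logical address the tree block claims for itself."""
    (found_start,) = struct.unpack_from("<Q", block, BYTENR_OFFSET)
    return found_start


def check_tree_block(block: bytes, start: int) -> str:
    """Compare the header's bytenr (found_start) with the expected address (start).

    Hypothetical helper for illustration; the mismatch text mirrors the
    "bytenr mismatch, want=..., have=..." wording btrfs check uses.
    """
    found_start = header_bytenr(block)
    if found_start != start:
        return f"bytenr mismatch, want={start}, have={found_start}"
    return "ok"


if __name__ == "__main__":
    start = 13835462344704
    good = bytearray(16384)
    struct.pack_into("<Q", good, BYTENR_OFFSET, start)
    print(check_tree_block(bytes(good), start))   # ok
    print(check_tree_block(bytes(16384), start))  # bytenr mismatch, want=13835462344704, have=0
```

An all-zero block is the simplest way to get `have=0`: the header was never written, or was overwritten with zeroes.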
>>
>> Thanks,
>> Qu
>>
>>> Modules linked in: bcache configs rc_hauppauge ir_kbd_i2c
>>> cpufreq_userspace cpufreq_powersave cpufreq_conservative autofs4
>>> snd_hda_codec_hdmi joydev snd_hda_codec_realtek snd_hda_codec_generic
>>> tuner_simple tuner_types tda9887 snd_hda_intel snd_hda_codec snd_hda_core
>>> snd_hwdep tda8290 coretemp snd_pcm_oss snd_mixer_oss tuner snd_pcm msp3400
>>> snd_seq_midi snd_seq_midi_event firewire_sbp2 saa7127 snd_rawmidi
>>> hwmon_vid dm_crypt dm_mod saa7115 snd_seq bttv hid_generic snd_seq_device
>>> snd_timer ehci_pci ivtv tea575x videobuf_dma_sg rc_core videobuf_core
>>> input_leds tveeprom cx2341x v4l2_common ehci_hcd videodev media
>>> acpi_cpufreq tpm_tis tpm_tis_core gpio_ich snd soundcore tpm psmouse
>>> lpc_ich evdev asus_atk0110 serio_raw lp parport raid456 async_raid6_recov
>>> async_pq async_xor async_memcpy async_tx multipath usbhid hid sr_mod cdrom
>>> sg firewire_ohci firewire_core floppy crc_itu_t i915 atl1 fjes mii
>>> uhci_hcd usbcore usb_common
>>> CPU: 0 PID: 22825 Comm: kworker/u9:2 Tainted: G        W
>>> 4.8.5-ia32-20161028 #2
>>> Hardware name: System manufacturer P5E-VM HDMI/P5E-VM HDMI, BIOS 0604
>>> 07/16/2008
>>> Workqueue: btrfs-worker-high btrfs_worker_helper
>>> 00200286 00200286 d3d81e48 df414827 00000000 dfa12da5 d3d81e78 df05677a
>>> df9ed884 00000000 00005929 dfa12da5 00000208 df2cf067 00000208 f7463fa0
>>> f401a080 00000000 d3d81e8c df05684a 00000009 00000000 00000000 d3d81eb4
>>> Call Trace:
>>> [<df414827>] dump_stack+0x58/0x81
>>> [<df05677a>] __warn+0xea/0x110
>>> [<df2cf067>] ? btree_csum_one_bio.isra.39+0xf7/0x100
>>> [<df05684a>] warn_slowpath_null+0x2a/0x30
>>> [<df2cf067>] btree_csum_one_bio.isra.39+0xf7/0x100
>>> [<df2cf085>] __btree_submit_bio_start+0x15/0x20
>>> [<df2cdd10>] run_one_async_start+0x30/0x40
>>> [<df31286d>] btrfs_scrubparity_helper+0xcd/0x2d0
>>> [<df2cde70>] ? run_one_async_free+0x20/0x20
>>> [<df312bbd>] btrfs_worker_helper+0xd/0x10
>>> [<df06d05b>] process_one_work+0x10b/0x400
>>> [<df06d387>] worker_thread+0x37/0x4b0
>>> [<df06d350>] ? process_one_work+0x400/0x400
>>> [<df0722db>] kthread+0x9b/0xb0
>>> [<df799922>] ret_from_kernel_thread+0xe/0x24
>>> [<df072240>] ? kthread_stop+0x100/0x100
>>> ---[ end trace f461faff989bf258 ]---
>>> BTRFS: error (device dm-0) in btrfs_commit_transaction:2232: errno=-5 IO failure (Error while writing out transaction)
>>> BTRFS info (device dm-0): forced readonly
>>> BTRFS warning (device dm-0): Skipping commit of aborted transaction.
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 0 PID: 22318 at fs/btrfs/transaction.c:1854 btrfs_commit_transaction+0x2f5/0xcc0
>>> BTRFS: Transaction aborted (error -5)
>>> Modules linked in: bcache configs rc_hauppauge ir_kbd_i2c
>>> cpufreq_userspace cpufreq_powersave cpufreq_conservative autofs4
>>> snd_hda_codec_hdmi joydev snd_hda_codec_realtek snd_hda_codec_generic
>>> tuner_simple tuner_types tda9887 snd_hda_intel snd_hda_codec snd_hda_core
>>> snd_hwdep tda8290 coretemp snd_pcm_oss snd_mixer_oss tuner snd_pcm msp3400
>>> snd_seq_midi snd_seq_midi_event firewire_sbp2 saa7127 snd_rawmidi
>>> hwmon_vid dm_crypt dm_mod saa7115 snd_seq bttv hid_generic snd_seq_device
>>> snd_timer ehci_pci ivtv tea575x videobuf_dma_sg rc_core videobuf_core
>>> input_leds tveeprom cx2341x v4l2_common ehci_hcd videodev media
>>> acpi_cpufreq tpm_tis tpm_tis_core gpio_ich snd soundcore tpm psmouse
>>> lpc_ich evdev asus_atk0110 serio_raw lp parport raid456 async_raid6_recov
>>> async_pq async_xor async_memcpy async_tx multipath usbhid hid sr_mod cdrom
>>> sg firewire_ohci firewire_core floppy crc_itu_t i915 atl1 fjes mii
>>> uhci_hcd usbcore usb_common
>>> CPU: 0 PID: 22318 Comm: btrfs-balance Tainted: G        W
>>> 4.8.5-ia32-20161028 #2
>>> Hardware name: System manufacturer P5E-VM HDMI/P5E-VM HDMI, BIOS 0604
>>> 07/16/2008
>>> 00000286 00000286 d74a3ca4 df414827 d74a3ce8 dfa132ab d74a3cd4 df05677a
>>> dfa075cc d74a3d04 0000572e dfa132ab 0000073e df2d7de5 0000073e f698dc00
>>> e9173e70 fffffffb d74a3cf0 df0567db 00000009 00000000 d74a3ce8 dfa075cc
>>> Call Trace:
>>> [<df414827>] dump_stack+0x58/0x81
>>> [<df05677a>] __warn+0xea/0x110
>>> [<df2d7de5>] ? btrfs_commit_transaction+0x2f5/0xcc0
>>> [<df0567db>] warn_slowpath_fmt+0x3b/0x40
>>> [<df2d7de5>] btrfs_commit_transaction+0x2f5/0xcc0
>>> [<df096800>] ? prepare_to_wait_event+0xd0/0xd0
>>> [<df33334f>] prepare_to_relocate+0x12f/0x180
>>> [<df339a41>] relocate_block_group+0x31/0x790
>>> [<df0b1427>] ? vprintk_default+0x37/0x40
>>> [<df796ca0>] ? mutex_lock+0x10/0x30
>>> [<df2f8f45>] ? btrfs_wait_ordered_roots+0x1d5/0x1f0
>>> [<df14eed6>] ? printk+0x17/0x19
>>> [<df2a47b2>] ? btrfs_printk+0x102/0x110
>>> [<df33a388>] btrfs_relocate_block_group+0x1e8/0x2e0
>>> [<df308a9f>] btrfs_relocate_chunk.isra.29+0x3f/0xf0
>>> [<df30221f>] ? free_extent_buffer+0x4f/0xa0
>>> [<df30a555>] btrfs_balance+0xb05/0x1820
>>> [<df0b0afa>] ? console_unlock+0x40a/0x630
>>> [<df30b2c1>] balance_kthread+0x51/0x80
>>> [<df30b270>] ? btrfs_balance+0x1820/0x1820
>>> [<df0722db>] kthread+0x9b/0xb0
>>> [<df799922>] ret_from_kernel_thread+0xe/0x24
>>> [<df072240>] ? kthread_stop+0x100/0x100
>>> ---[ end trace f461faff989bf259 ]---
>>> BTRFS: error (device dm-0) in cleanup_transaction:1854: errno=-5 IO failure
>>> BTRFS info (device dm-0): delayed_refs has NO entry
>>>
>>
>>
>>
>




* Re: btrfs support for filesystems >8TB on 32bit architectures
  2016-11-09  1:50                                                         ` Qu Wenruo
@ 2016-11-09  2:05                                                           ` Marc MERLIN
  2016-11-11  3:48                                                             ` Marc MERLIN
  0 siblings, 1 reply; 40+ messages in thread
From: Marc MERLIN @ 2016-11-09  2:05 UTC
  To: Qu Wenruo; +Cc: David Sterba, Hugo Mills, linux-btrfs

On Wed, Nov 09, 2016 at 09:50:08AM +0800, Qu Wenruo wrote:
> Yeah, quite possible!
> 
> The truth is, current btrfs check only checks:
> 1) Metadata
>    The --check-data-csum option also checks data, but is still
>    subject to restriction 3).
> 2) Cross-references within metadata (contents of metadata)
> 3) The first good mirror/backup only
> 
> So quite a lot of problems can't be detected by btrfs check:
> 1) Data corruption (csum mismatch)
> 2) 2nd mirror corruption (DUP/RAID0/10) or parity errors (RAID5/6)
> 
> For btrfsck to check all mirrors and data, you could try the out-of-tree
> offline scrub patchset:
> https://github.com/adam900710/btrfs-progs/tree/fsck_scrub
> 
> which implements the kernel scrub equivalent in btrfs-progs.
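Restriction 3) is the main difference from scrub. The following is a toy Python model, not btrfs code (the `Copy` type and both functions are hypothetical), of why a read that stops at the first copy that verifies never notices damage on another copy of the same block, while a scrub-style pass that verifies every copy does. Parity checking for RAID5/6 is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Copy:
    """One on-disk copy of a block (e.g. a DUP or RAID1 mirror)."""
    device: str
    csum_ok: bool  # does this copy match its stored checksum?


def read_first_good(copies: List[Copy]) -> Optional[int]:
    """check-style read: return the index of the first copy that verifies.

    Copies after it are never examined, so their damage goes unnoticed.
    """
    for i, copy in enumerate(copies):
        if copy.csum_ok:
            return i
    return None  # no good copy at all: the only case that gets reported


def scrub_all(copies: List[Copy]) -> List[int]:
    """scrub-style pass: verify every copy and return all bad indices."""
    return [i for i, copy in enumerate(copies) if not copy.csum_ok]


if __name__ == "__main__":
    block = [Copy("devA", True), Copy("devB", False)]
    print(read_first_good(block))  # 0 -> looks healthy
    print(scrub_all(block))        # [1] -> second mirror is bad
```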

I see, thanks for the answer.
Note that this is very confusing to the end user.
If check --repair returns success, the filesystem should be clean.
Hopefully that patchset can be included in btrfs-progs.

But sure enough, I'm seeing a lot of these:
BTRFS warning (device dm-6): checksum error at logical 269783986176 on dev /dev/mapper/crypt_bcache2, sector 529035384, root 16755, inode 1225897, offset 77824, length 4096, links 5 (path: magic/20150624/home/merlin/public_html/rig3/img/thumb800_302_1-Wire.jpg)

This is bad because I would expect check --repair to find them all and offer
to remove all the corrupted files after giving me a list of what I've lost,
or else just recompute the checksums, so that the files are known to be
corrupted but "clean" and I have the option of keeping them as is (OK-ish
for a video file) or restoring them from backup.

The worst part with scrub is that I have to find all these files, then
find all the snapshots they're in (maybe 10 or 20) and delete them all.
Some of those snapshots are read-only because they are btrfs send
sources, so I need to destroy those snapshots, lose my btrfs send
relationship, and recreate it (maybe 2 to 6 days of syncing over a
slow-ish link).

When data is corrupted, no solution is perfect, but hopefully check --repair
will indeed be able to restore the entire filesystem to a clean state, even
if some data must be lost in the process.

Thanks for considering.

Marc

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: btrfs support for filesystems >8TB on 32bit architectures
  2016-11-09  2:05                                                           ` Marc MERLIN
@ 2016-11-11  3:48                                                             ` Marc MERLIN
  2016-11-11  3:55                                                               ` Qu Wenruo
  0 siblings, 1 reply; 40+ messages in thread
From: Marc MERLIN @ 2016-11-11  3:48 UTC
  To: Qu Wenruo; +Cc: David Sterba, Hugo Mills, linux-btrfs

On Tue, Nov 08, 2016 at 06:05:19PM -0800, Marc MERLIN wrote:
> On Wed, Nov 09, 2016 at 09:50:08AM +0800, Qu Wenruo wrote:
> > Yeah, quite possible!
> > 
> > The truth is, current btrfs check only checks:
> > 1) Metadata
> >    The --check-data-csum option also checks data, but is still
> >    subject to restriction 3).
> > 2) Cross-references within metadata (contents of metadata)
> > 3) The first good mirror/backup only
> > 
> > So quite a lot of problems can't be detected by btrfs check:
> > 1) Data corruption (csum mismatch)
> > 2) 2nd mirror corruption (DUP/RAID0/10) or parity errors (RAID5/6)
> > 
> > For btrfsck to check all mirrors and data, you could try the out-of-tree
> > offline scrub patchset:
> > https://github.com/adam900710/btrfs-progs/tree/fsck_scrub
> > 
> > which implements the kernel scrub equivalent in btrfs-progs.
> 
> I see, thanks for the answer.
> Note that this is very confusing to the end user.
> If check --repair returns success, the filesystem should be clean.
> Hopefully that patchset can be included in btrfs-progs.
> 
> But sure enough, I'm seeing a lot of these:
> BTRFS warning (device dm-6): checksum error at logical 269783986176 on dev /dev/mapper/crypt_bcache2, sector 529035384, root 16755, inode 1225897, offset 77824, length 4096, links 5 (path: magic/20150624/home/merlin/public_html/rig3/img/thumb800_302_1-Wire.jpg)

So, I ran check --repair, then I ran scrub and deleted all the files
that were referenced by pathname and failed scrub.
Now I have this:
BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785128960 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1545, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785133056 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1546, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785137152 on dev /dev/mapper/crypt_bcache2
BTRFS warning (device dm-6): checksum error at logical 269784580096 on dev /dev/mapper/crypt_bcache2, sector 529036544, root 17564, inode 1225903, offset 16384: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784584192 on dev /dev/mapper/crypt_bcache2, sector 529036552, root 17564, inode 1225903, offset 20480: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784588288 on dev /dev/mapper/crypt_bcache2, sector 529036560, root 17564, inode 1225903, offset 24576: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784592384 on dev /dev/mapper/crypt_bcache2, sector 529036568, root 17564, inode 1225903, offset 28672: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784596480 on dev /dev/mapper/crypt_bcache2, sector 529036576, root 17564, inode 1225903, offset 32768: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784600576 on dev /dev/mapper/crypt_bcache2, sector 529036584, root 17564, inode 1225903, offset 36864: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784604672 on dev /dev/mapper/crypt_bcache2, sector 529036592, root 17564, inode 1225903, offset 40960: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784608768 on dev /dev/mapper/crypt_bcache2, sector 529036600, root 17564, inode 1225903, offset 45056: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269784612864 on dev /dev/mapper/crypt_bcache2, sector 529036608, root 17564, inode 1225903, offset 49152: path resolving failed with ret=-2

How am I supposed to deal with those?
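Before looking anything up, it helps to collapse warnings like these. In the nine lines above, each step adds 4096 to both `logical` and `offset` and 8 to `sector` (512-byte units, so also 4096 bytes): every line is one damaged 4 KiB block of the same inode, and `ret=-2` is -ENOENT, i.e. no path could be found for it. A minimal Python sketch, assuming the message format shown above (other variants are ignored); the function names are hypothetical. It keys on (root, inode) because btrfs inode numbers are only unique within one subvolume.

```python
import errno
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Matches the scrub checksum warnings quoted in this thread (an assumption,
# not a full grammar of kernel log messages).
CSUM_RE = re.compile(
    r"checksum error at logical (?P<logical>\d+) on dev [^,]+, "
    r"sector (?P<sector>\d+), root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset (?P<offset>\d+)"
)
RET_RE = re.compile(r"path resolving failed with ret=-(\d+)")


def group_csum_errors(lines: Iterable[str]) -> Dict[Tuple[int, int], List[int]]:
    """Map (root, inode) -> sorted file offsets that failed their checksum.

    The key includes the root (subvolume) id because btrfs inode numbers
    are only unique within a single subvolume.
    """
    groups: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for line in lines:
        m = CSUM_RE.search(line)
        if m:
            groups[(int(m["root"]), int(m["inode"]))].append(int(m["offset"]))
    return {key: sorted(offsets) for key, offsets in groups.items()}


def path_errno(line: str) -> str:
    """Name the errno behind 'path resolving failed with ret=-N', if present."""
    m = RET_RE.search(line)
    if not m:
        return ""
    return errno.errorcode.get(int(m.group(1)), "unknown")
```

Fed the nine warnings above, this should yield a single key, (17564, 1225903), with offsets 16384 through 49152, and `path_errno` names ENOENT for each of them.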

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: btrfs support for filesystems >8TB on 32bit architectures
  2016-11-11  3:48                                                             ` Marc MERLIN
@ 2016-11-11  3:55                                                               ` Qu Wenruo
  2016-11-12  3:17                                                                 ` when btrfs scrub reports errors and btrfs check --repair does not Marc MERLIN
  0 siblings, 1 reply; 40+ messages in thread
From: Qu Wenruo @ 2016-11-11  3:55 UTC
  To: Marc MERLIN; +Cc: David Sterba, Hugo Mills, linux-btrfs



At 11/11/2016 11:48 AM, Marc MERLIN wrote:
> On Tue, Nov 08, 2016 at 06:05:19PM -0800, Marc MERLIN wrote:
>> On Wed, Nov 09, 2016 at 09:50:08AM +0800, Qu Wenruo wrote:
>>> Yeah, quite possible!
>>>
>>> The truth is, current btrfs check only checks:
>>> 1) Metadata
>>>    The --check-data-csum option also checks data, but is still
>>>    subject to restriction 3).
>>> 2) Cross-references within metadata (contents of metadata)
>>> 3) The first good mirror/backup only
>>>
>>> So quite a lot of problems can't be detected by btrfs check:
>>> 1) Data corruption (csum mismatch)
>>> 2) 2nd mirror corruption (DUP/RAID0/10) or parity errors (RAID5/6)
>>>
>>> For btrfsck to check all mirrors and data, you could try the out-of-tree
>>> offline scrub patchset:
>>> https://github.com/adam900710/btrfs-progs/tree/fsck_scrub
>>>
>>> which implements the kernel scrub equivalent in btrfs-progs.
>>
>> I see, thanks for the answer.
>> Note that this is very confusing to the end user.
>> If check --repair returns success, the filesystem should be clean.
>> Hopefully that patchset can be included in btrfs-progs.
>>
>> But sure enough, I'm seeing a lot of these:
>> BTRFS warning (device dm-6): checksum error at logical 269783986176 on dev /dev/mapper/crypt_bcache2, sector 529035384, root 16755, inode 1225897, offset 77824, length 4096, links 5 (path: magic/20150624/home/merlin/public_html/rig3/img/thumb800_302_1-Wire.jpg)
>
> So, I ran check --repair, then I ran scrub and deleted all the files
> that were referenced by pathname and failed scrub.
> Now I have this:
> BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785128960 on dev /dev/mapper/crypt_bcache2
> BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1545, gen 0
> BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785133056 on dev /dev/mapper/crypt_bcache2
> BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1546, gen 0
> BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785137152 on dev /dev/mapper/crypt_bcache2
> BTRFS warning (device dm-6): checksum error at logical 269784580096 on dev /dev/mapper/crypt_bcache2, sector 529036544, root 17564, inode 1225903, offset 16384: path resolving failed with ret=-2
> BTRFS warning (device dm-6): checksum error at logical 269784584192 on dev /dev/mapper/crypt_bcache2, sector 529036552, root 17564, inode 1225903, offset 20480: path resolving failed with ret=-2
> BTRFS warning (device dm-6): checksum error at logical 269784588288 on dev /dev/mapper/crypt_bcache2, sector 529036560, root 17564, inode 1225903, offset 24576: path resolving failed with ret=-2
> BTRFS warning (device dm-6): checksum error at logical 269784592384 on dev /dev/mapper/crypt_bcache2, sector 529036568, root 17564, inode 1225903, offset 28672: path resolving failed with ret=-2
> BTRFS warning (device dm-6): checksum error at logical 269784596480 on dev /dev/mapper/crypt_bcache2, sector 529036576, root 17564, inode 1225903, offset 32768: path resolving failed with ret=-2
> BTRFS warning (device dm-6): checksum error at logical 269784600576 on dev /dev/mapper/crypt_bcache2, sector 529036584, root 17564, inode 1225903, offset 36864: path resolving failed with ret=-2
> BTRFS warning (device dm-6): checksum error at logical 269784604672 on dev /dev/mapper/crypt_bcache2, sector 529036592, root 17564, inode 1225903, offset 40960: path resolving failed with ret=-2
> BTRFS warning (device dm-6): checksum error at logical 269784608768 on dev /dev/mapper/crypt_bcache2, sector 529036600, root 17564, inode 1225903, offset 45056: path resolving failed with ret=-2
> BTRFS warning (device dm-6): checksum error at logical 269784612864 on dev /dev/mapper/crypt_bcache2, sector 529036608, root 17564, inode 1225903, offset 49152: path resolving failed with ret=-2
>
> How am I supposed to deal with those?

These seem to be orphan inodes.
Btrfs doesn't remove all the contents of an inode at rm time.
It just unlinks the inode and puts it into a state called an orphan
inode (one that can't be reached from any directory).

Their data extents are then freed over the next several transactions.

Try to find these inodes by inode number in the specified subvolume.
If they are not found, then they are orphan inodes, and there is nothing
to worry about. These bad data extents will disappear sooner or later.

Or you can use "btrfs fi sync" to make sure orphan inodes are really
removed from the tree.
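The lookup suggested above can be scripted with two btrfs-progs commands: `btrfs inspect-internal subvolid-resolve` turns the `root` number from the warning into a subvolume path, and `btrfs inspect-internal inode-resolve` then looks the inode up inside that subvolume only. A Python sketch under stated assumptions: the filesystem's top level is mounted at `mnt`, and any non-zero exit from inode-resolve is treated as "not found", which could also hide unrelated errors. The helper names are hypothetical.

```python
import os
import subprocess
from typing import List


def subvolid_resolve_cmd(root_id: int, mnt: str) -> List[str]:
    """Command that prints the path of subvolume `root_id`, relative to the top level."""
    return ["btrfs", "inspect-internal", "subvolid-resolve", str(root_id), mnt]


def inode_resolve_cmd(inode: int, subvol_path: str) -> List[str]:
    """Command that looks `inode` up inside one subvolume only."""
    return ["btrfs", "inspect-internal", "inode-resolve", str(inode), subvol_path]


def subvol_abspath(mnt: str, relpath: str) -> str:
    """Assumes `mnt` is the top-level subvolume (id 5), so the relative path
    printed by subvolid-resolve can simply be joined onto it."""
    return os.path.join(mnt, relpath.strip().lstrip("/"))


def lookup(root_id: int, inode: int, mnt: str) -> str:
    """Run both steps (needs root); a failed inode lookup counts as 'not found'."""
    rel = subprocess.run(subvolid_resolve_cmd(root_id, mnt),
                         capture_output=True, text=True, check=True).stdout
    res = subprocess.run(inode_resolve_cmd(inode, subvol_abspath(mnt, rel)),
                         capture_output=True, text=True)
    if res.returncode != 0:
        return "not found (possibly an orphan inode)"
    return res.stdout.strip()
```

For the warnings in this thread that would be, for example, `lookup(17564, 1225903, "/mnt/mnt")`.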

Thanks,
Qu
>
> Thanks,
> Marc
>




* Re: when btrfs scrub reports errors and btrfs check --repair does not
  2016-11-11  3:55                                                               ` Qu Wenruo
@ 2016-11-12  3:17                                                                 ` Marc MERLIN
  2016-11-13 15:06                                                                   ` Marc MERLIN
  0 siblings, 1 reply; 40+ messages in thread
From: Marc MERLIN @ 2016-11-12  3:17 UTC
  To: Qu Wenruo; +Cc: David Sterba, Hugo Mills, linux-btrfs

On Fri, Nov 11, 2016 at 11:55:21AM +0800, Qu Wenruo wrote:
> These seem to be orphan inodes.
> Btrfs doesn't remove all the contents of an inode at rm time.
> It just unlinks the inode and puts it into a state called an orphan
> inode (one that can't be reached from any directory).

BTRFS warning (device dm-6): checksum error at logical 269783928832 on dev /dev/mapper/crypt_bcache2, sector 529035272, root 17564, inode 1225897, offset 20480: path resolving failed with ret=-2
BTRFS warning (device dm-6): checksum error at logical 269783932928 on dev /dev/mapper/crypt_bcache2, sector 529035280, root 17564, inode 1225897, offset 24576: path resolving failed with ret=-2
 
Do you mean I should be using find /mnt/mnt -inum?
Well, how about that, you're right:
gargamel:/mnt/mnt/DS2/backup# find /mnt/mnt -inum 1225897
/mnt/mnt/DS2/backup/debian64_rw.20160713_03:21:57/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg
So basically the breakage in my filesystem is enough that the backlink
from the inode to the pathname is gone? That's not good :-/

> Their data extents are then freed over the next several transactions.
> 
> Try to find these inodes by inode number in the specified subvolume.
> If they are not found, then they are orphan inodes, and there is nothing
> to worry about. These bad data extents will disappear sooner or later.
> 
> Or you can use "btrfs fi sync" to make sure orphan inodes are really removed
> from the tree.
 
So, I ran btrfs fi sync /mnt/mnt, but it returned instantly.

Scrub after that still returns:
btrfs scrub start -Bd /mnt/mnt
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1793, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785628672 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1794, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 269784580096 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1795, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785632768 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1796, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785104384 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1797, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 269784584192 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1798, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785636864 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1799, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785108480 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1800, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 269784588288 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1801, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 269784055808 on dev /dev/mapper/crypt_bcache2
BTRFS error (device dm-6): bdev /dev/mapper/crypt_bcache2 errs: wr 0, rd 0, flush 0, corrupt 1802, gen 0
BTRFS error (device dm-6): unable to fixup (regular) error at logical 269785640960 on dev /dev/mapper/crypt_bcache2

What am I supposed to do about these? I'm not even clear where this
corruption is located or how to clear it.
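The "unable to fixup" lines carry only a logical address, no inode. A minimal Python sketch (helper names hypothetical) that collects those addresses and merges adjacent blocks into ranges, assuming a 4096-byte block size as the 4096-byte steps in this log suggest. The start of each range could then be passed to `btrfs inspect-internal logical-resolve <logical> <path>` to ask which files, if any, still reference it; for extents that belong only to orphan inodes that lookup may well come back empty.

```python
import re
from typing import Iterable, List, Tuple

# Matches the scrub messages that only carry a logical address.
FIXUP_RE = re.compile(r"unable to fixup \(regular\) error at logical (\d+)")

# Assumption: 4 KiB data blocks, as the 4096-byte steps in this log suggest.
BLOCK_SIZE = 4096


def unfixable_ranges(lines: Iterable[str], block: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Collect 'unable to fixup' addresses and merge adjacent blocks.

    Returns sorted, non-overlapping [start, end) logical byte ranges.
    """
    addrs = set()
    for line in lines:
        m = FIXUP_RE.search(line)
        if m:
            addrs.add(int(m.group(1)))
    ranges: List[Tuple[int, int]] = []
    for addr in sorted(addrs):
        if ranges and ranges[-1][1] == addr:
            ranges[-1] = (ranges[-1][0], addr + block)  # contiguous: extend the run
        else:
            ranges.append((addr, addr + block))  # gap: start a new run
    return ranges


def logical_resolve_cmd(logical: int, mnt: str) -> List[str]:
    """Command asking which files (if any) reference a logical address."""
    return ["btrfs", "inspect-internal", "logical-resolve", str(logical), mnt]
```

For the ten addresses above this gives four ranges: a single block at 269784055808, and runs of three, two and four blocks starting at 269784580096, 269785104384 and 269785628672.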

I understand you're saying that this does not seem to affect any
remaining data, but if scrub is not clean, it can't even see which file
an inode is linked to, and that inode still doesn't get cleaned up 2 days
later, then my filesystem is in a bad state that check --repair should
fix, is it not?

Yes, I can wipe it and start over, but I'm trying to use this as a
learning experience as well as seeing if the tools are working as they
should.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: when btrfs scrub reports errors and btrfs check --repair does not
  2016-11-12  3:17                                                                 ` when btrfs scrub reports errors and btrfs check --repair does not Marc MERLIN
@ 2016-11-13 15:06                                                                   ` Marc MERLIN
  2016-11-13 15:13                                                                     ` Roman Mamedov
  0 siblings, 1 reply; 40+ messages in thread
From: Marc MERLIN @ 2016-11-13 15:06 UTC
  To: Qu Wenruo; +Cc: David Sterba, Hugo Mills, linux-btrfs

On Fri, Nov 11, 2016 at 07:17:08PM -0800, Marc MERLIN wrote:
> On Fri, Nov 11, 2016 at 11:55:21AM +0800, Qu Wenruo wrote:
> > These seem to be orphan inodes.
> > Btrfs doesn't remove all the contents of an inode at rm time.
> > It just unlinks the inode and puts it into a state called an orphan
> > inode (one that can't be reached from any directory).
> 
> BTRFS warning (device dm-6): checksum error at logical 269783928832 on dev /dev/mapper/crypt_bcache2, sector 529035272, root 17564, inode 1225897, offset 20480: path resolving failed with ret=-2
> BTRFS warning (device dm-6): checksum error at logical 269783932928 on dev /dev/mapper/crypt_bcache2, sector 529035280, root 17564, inode 1225897, offset 24576: path resolving failed with ret=-2
>  
> Do you mean I should be using find /mnt/mnt -inum?
> Well, how about that, you're right:
> gargamel:/mnt/mnt/DS2/backup# find /mnt/mnt -inum 1225897
> /mnt/mnt/DS2/backup/debian64_rw.20160713_03:21:57/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg
> So basically the breakage in my filesystem is enough that the backlink
> from the inode to the pathname is gone? That's not good :-/

Mmmn, I've been doing find -inum, deleting the hits, and running scrub,
but scrub still fails with more, and now I'm seeing this:

gargamel:~# find /mnt/mnt -inum 1225897

/mnt/mnt/DS2/backup/ubuntu_rw.20160713_03:25:42/gandalfthegrey/20100718/var/local/www/Pix/albums/Trips/200509_Malaysia/500_KapalaiIsland/BestOf/33_Diving-Dive5-2_139.jpg
/mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg
/mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg
(...)
/mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg
/mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg
/mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y81z9.jpg

And then I see this:
gargamel:~# ls -li /mnt/mnt/DS2/backup/ubuntu_rw.20160713_03:25:42/gandalfthegrey/20100718/var/local/www/Pix/albums/Trips/200509_Malaysia/500_KapalaiIsland/BestOf/33_Diving-Dive5-2_139.jpg /mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg /mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg /mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg /mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg /mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y81z9.jpg
1225897 -rw-r--r-- 5 merlin merlin 13794 Jan  7  2012 /mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg
1225898 -rw-r--r-- 5 merlin merlin 13048 Jan  7  2012 /mnt/mnt/DS2/backup/debian64_ro.20160720_02:58:38/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg
1225897 -rw-r--r-- 5 merlin merlin 13794 Jan  7  2012 /mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y678z6.jpg
1225898 -rw-r--r-- 5 merlin merlin 13048 Jan  7  2012 /mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y679z6.jpg
1225913 -rw-r--r-- 5 merlin merlin 15247 Jan  7  2012 /mnt/mnt/DS2/backup/debian64_rw.20160727_02:59:03/gandalfthegreat/20120409/home/merlin/public_html/mirrors/rwf/vfrcharts/x9y81z9.jpg
1225897 lrwxrwxrwx 1 merlin merlin    35 Aug  1  2010 /mnt/mnt/DS2/backup/ubuntu_rw.20160713_03:25:42/gandalfthegrey/20100718/var/local/www/Pix/albums/Trips/200509_Malaysia/500_KapalaiIsland/BestOf/33_Diving-Dive5-2_139.jpg -> ../33_Diving/BestOf/Dive5-2_139.jpg

So first:
a) find -inum returns some inodes that don't match
b) but argh, multiple files (very different) have the same inode number, so finding
files by inode number after scrub flagged an inode bad, isn't going to work :(
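The duplicate hits follow from how btrfs numbers inodes: inode numbers are allocated per subvolume, a snapshot keeps the inode numbers of its source, and each subvolume reports its own `st_dev`. So `find /mnt/mnt -inum 1225897` matches that number in every subvolume under the mount, including unrelated ones such as the `ubuntu_rw` symlink. Limiting the search to the subvolume named by the warning's `root` (for instance `find <subvol> -xdev -inum N`) avoids that. A Python sketch of the same idea; the function name is hypothetical:

```python
import os
from typing import List


def find_inode_in_subvol(subvol_path: str, inode: int) -> List[str]:
    """Like `find SUBVOL -xdev -inum INODE`: match st_ino, but never descend
    into a directory whose st_dev differs from the starting point (on btrfs,
    that is another subvolume or snapshot)."""
    top_dev = os.lstat(subvol_path).st_dev
    hits: List[str] = []
    for dirpath, dirnames, filenames in os.walk(subvol_path):
        same_dev_dirs = []
        for name in dirnames:
            st = os.lstat(os.path.join(dirpath, name))
            if st.st_dev != top_dev:
                continue  # nested subvolume: prune it from the walk
            same_dev_dirs.append(name)
            if st.st_ino == inode:
                hits.append(os.path.join(dirpath, name))
        dirnames[:] = same_dev_dirs  # os.walk only descends into what is left
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if st.st_dev == top_dev and st.st_ino == inode:
                hits.append(os.path.join(dirpath, name))
    return hits
```

Symlinks are matched by their own inode (via `lstat`), which is how `find -inum` treats them too.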

At this point, I'm starting to lose patience (and running out of time),
so I'm going to wipe this filesystem after I hear back from you, but
basically scrub and repair are still not up to what they should be IMO
(as per my previous comment):
One should be able to fully repair an unclean filesystem with check --repair, and scrub should
give me things I can either fix by hand (delete the corrupt file) or
that check --repair would fix, and neither is true here.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: when btrfs scrub reports errors and btrfs check --repair does not
  2016-11-13 15:06                                                                   ` Marc MERLIN
@ 2016-11-13 15:13                                                                     ` Roman Mamedov
  2016-11-13 15:52                                                                       ` Marc MERLIN
  0 siblings, 1 reply; 40+ messages in thread
From: Roman Mamedov @ 2016-11-13 15:13 UTC
  To: Marc MERLIN; +Cc: linux-btrfs

On Sun, 13 Nov 2016 07:06:30 -0800
Marc MERLIN <marc@merlins.org> wrote:

> So first:
> a) find -inum returns some inodes that don't match
> b) but argh, multiple files (very different) have the same inode number, so finding
> files by inode number after scrub flagged an inode bad, isn't going to work :(

I wonder why you even need scrub to verify file readability. Just try
reading all files using e.g. "cfv -Crr"; the read errors produced will
point you directly to the files which are unreadable, without the need to
look them up in a backward way via inum. Then just restore those from backups.
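This approach needs no inode lookups because btrfs returns EIO when a data block fails its checksum and no good copy exists. `cfv` is the tool suggested here; the following is only an equivalent minimal Python sketch (function name hypothetical) that reads every regular file and lists the ones whose reads fail.

```python
import os
from typing import List, Tuple


def unreadable_files(top: str, chunk: int = 1 << 20) -> List[Tuple[str, str]]:
    """Read every regular file under `top`; return (path, error) for failures.

    On btrfs, a data checksum mismatch with no good copy surfaces as EIO.
    """
    bad: List[Tuple[str, str]] = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, fifos, sockets and device nodes
            try:
                with open(path, "rb") as f:
                    while f.read(chunk):
                        pass  # only the read matters, not the data
            except OSError as e:
                bad.append((path, e.strerror or str(e)))
    return bad
```

Because it works path by path, data shared between snapshots or reflinked copies is read once for every path that references it, so on a snapshot-heavy backup volume it can do far more I/O than scrub.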

-- 
With respect,
Roman


* Re: when btrfs scrub reports errors and btrfs check --repair does not
  2016-11-13 15:13                                                                     ` Roman Mamedov
@ 2016-11-13 15:52                                                                       ` Marc MERLIN
  0 siblings, 0 replies; 40+ messages in thread
From: Marc MERLIN @ 2016-11-13 15:52 UTC
  To: Roman Mamedov; +Cc: linux-btrfs

On Sun, Nov 13, 2016 at 08:13:29PM +0500, Roman Mamedov wrote:
> On Sun, 13 Nov 2016 07:06:30 -0800
> Marc MERLIN <marc@merlins.org> wrote:
> 
> > So first:
> > a) find -inum returns some inodes that don't match
> > b) but argh, multiple files (very different) have the same inode number, so finding
> > files by inode number after scrub flagged an inode bad, isn't going to work :(
> 
> I wonder why you even need scrub to verify file readability. Just try
> reading all files using e.g. "cfv -Crr"; the read errors produced will
> point you directly to the files which are unreadable, without the need to
> look them up in a backward way via inum. Then just restore those from backups.

I could read the files, but we're talking about maybe 100 million files,
which would take a while. Most of them are also COW copies of the same
physical data, so scrub is _much_ faster.
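
The COW argument can be put as a rough cost model. This is a sketch that ignores caching, metadata reads, per-file open overhead and seek behaviour. Let B be read throughput, F the set of files, E the set of distinct allocated data extents, r_e the number of file references to extent e, and c the number of stored copies of the data (1 for a single data profile, as on one md device). A per-file read pass reads each extent once per reference; scrub reads each allocated extent once per copy:

```latex
T_{\text{read}} \approx \frac{1}{B}\sum_{f \in F} \operatorname{size}(f)
              = \frac{1}{B}\sum_{e \in E} r_e \operatorname{size}(e),
\qquad
T_{\text{scrub}} \approx \frac{c}{B}\sum_{e \in E} \operatorname{size}(e),
\qquad
\frac{T_{\text{read}}}{T_{\text{scrub}}} \approx
\frac{\sum_{e} r_e \operatorname{size}(e)}{c \sum_{e} \operatorname{size}(e)}
= \frac{\bar{r}}{c}
```

Here \bar{r} is the size-weighted average number of references per extent. As a hypothetical illustration, if each extent is shared by 10 snapshots and c = 1, reading every file moves roughly ten times as many bytes as a scrub.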

Scrub also seems to be reporting issues that are not related to files but to
data structures, while repair is not finding them.

As for the data, it's a backup device, so I can just wipe it, but again,
I'm using this as an example of how I would simply bring a drive back to
a clean state, and that's not pretty right now.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901



Thread overview: 40+ messages
2016-10-30 18:34 btrfs check --repair: ERROR: cannot read chunk root Marc MERLIN
2016-10-31  1:02 ` Qu Wenruo
2016-10-31  2:06   ` Marc MERLIN
2016-10-31  4:21     ` Marc MERLIN
2016-10-31  5:27     ` Qu Wenruo
2016-10-31  5:47       ` Marc MERLIN
2016-10-31  6:04         ` Qu Wenruo
2016-10-31  6:25           ` Marc MERLIN
2016-10-31  6:32             ` Qu Wenruo
2016-10-31  6:37               ` Marc MERLIN
2016-10-31  7:04                 ` Qu Wenruo
2016-10-31  8:44                   ` Hugo Mills
2016-10-31 15:04                     ` Marc MERLIN
2016-11-01  3:48                       ` Marc MERLIN
2016-11-01  4:13                       ` Qu Wenruo
2016-11-01  4:21                         ` Marc MERLIN
2016-11-04  8:01                           ` Marc MERLIN
2016-11-04  9:00                             ` Roman Mamedov
2016-11-04 17:59                               ` Marc MERLIN
2016-11-07  1:11                             ` Qu Wenruo
     [not found]                               ` <87lgwwnnyf.fsf@notabene.neil.brown.name>
2016-11-07  1:20                                 ` clearing blocks wrongfully marked as bad if --update=no-bbl can't be used? Marc MERLIN
2016-11-07  1:39                                   ` Qu Wenruo
2016-11-07  4:18                                     ` Qu Wenruo
2016-11-07  5:36                                       ` btrfs support for filesystems >8TB on 32bit architectures Marc MERLIN
2016-11-07  6:16                                         ` Qu Wenruo
2016-11-07 14:55                                           ` Marc MERLIN
2016-11-08  0:35                                             ` Qu Wenruo
2016-11-08  0:39                                               ` Marc MERLIN
2016-11-08  0:43                                                 ` Qu Wenruo
2016-11-08  1:06                                                   ` Marc MERLIN
2016-11-08  1:17                                                     ` Qu Wenruo
2016-11-08 15:24                                                       ` Marc MERLIN
2016-11-09  1:50                                                         ` Qu Wenruo
2016-11-09  2:05                                                           ` Marc MERLIN
2016-11-11  3:48                                                             ` Marc MERLIN
2016-11-11  3:55                                                               ` Qu Wenruo
2016-11-12  3:17                                                                 ` when btrfs scrub reports errors and btrfs check --repair does not Marc MERLIN
2016-11-13 15:06                                                                   ` Marc MERLIN
2016-11-13 15:13                                                                     ` Roman Mamedov
2016-11-13 15:52                                                                       ` Marc MERLIN

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).