Linux Btrfs filesystem development
* RAID1 fails to recover chunk tree
@ 2014-10-15 15:42 Zack Coffey
  0 siblings, 0 replies; 23+ messages in thread
From: Zack Coffey @ 2014-10-15 15:42 UTC (permalink / raw)
  To: linux-btrfs

Revisiting a previous issue. Summary: a single-drive btrfs filesystem
held lots of data. I made a RAID1 with another drive for just the
metadata. It was in that state for less than 12 hours or so; I then
removed the second drive and now cannot get to any data on the
original drive.

The single-drive btrfs was made on Ubuntu with kernel 3.13.0 and tools
3.12. It was running fine as a single drive for a while. I then added
another drive to the system and wanted to see what RAID1 for metadata
would look like. I turned it on and it was doing fine. I forgot I had
done that, shut down the PC and removed the extra drive. Now nothing
I've tried can access the original single drive.

$ sudo mount -o degraded /dev/sdc1 /media/Data/
mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

$ dmesg | tail
[45353.869448] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45353.901511] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45353.901666] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45354.148488] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45354.148573] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[46241.155350] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[46241.155923] btrfs: allowing degraded mounts
[46241.155927] btrfs: disk space caching is enabled
[46241.159436] btrfs: failed to read chunk root on sdc1
[46241.177815] btrfs: open_ctree failed

$ btrfs-show-super /dev/sdc1
superblock: bytenr=65536, device=/dev/sdc1
---------------------------------------------------------
csum                    0x93bcb1b5 [match]
bytenr                  65536
flags                   0x1
magic                   _BHRfS_M [match]
fsid                    bd78815a-802b-43e2-8387-fc6ab4237d67
label
generation              60944
root                    909586694144
sys_array_size          97
chunk_root_generation   60938
root_level              1
chunk_root              911673917440
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             1115871535104
bytes_used              321833435136
sectorsize              4096
nodesize                4096
leafsize                4096
stripesize              4096
root_dir                6
num_devices             2
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x9
csum_type               0
csum_size               4
cache_generation        60944
uuid_tree_generation    60944
dev_item.uuid           d82b2027-17b6-4513-a86d-9227a42d7ed1
dev_item.fsid           bd78815a-802b-43e2-8387-fc6ab4237d67 [match]
dev_item.type           0
dev_item.total_bytes    615763673088
dev_item.bytes_used     324270030848
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
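
A few of the superblock fields above can be read off directly. The sketch below is only an illustration: the bit names follow my understanding of the btrfs on-disk format, the table is deliberately partial (only the four lowest bits), and the helper names are hypothetical. It decodes incompat_flags and works out how many bytes of the filesystem belong to devices other than devid 1.

```python
# Values copied from the btrfs-show-super output above.
INCOMPAT_FLAGS = 0x9
TOTAL_BYTES = 1115871535104       # total_bytes: whole filesystem, all devices
DEV1_TOTAL_BYTES = 615763673088   # dev_item.total_bytes for devid 1

# Partial table of incompat feature bits (assumption: btrfs on-disk format).
INCOMPAT_BITS = {
    0x1: "MIXED_BACKREF",
    0x2: "DEFAULT_SUBVOL",
    0x4: "MIXED_GROUPS",
    0x8: "COMPRESS_LZO",
}


def decode_incompat(flags):
    """Return the names of the known feature bits set in flags."""
    return [name for bit, name in sorted(INCOMPAT_BITS.items()) if flags & bit]


def other_devices_bytes(total, this_device):
    """Bytes of the filesystem that sit on devices other than this one."""
    return total - this_device


print(decode_incompat(INCOMPAT_FLAGS))
print(other_devices_bytes(TOTAL_BYTES, DEV1_TOTAL_BYTES))
```

The LZO bit matches the compress=lzo mount option used below, and the size difference is consistent with num_devices being 2: the superblock still counts the removed drive as part of the filesystem.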


$ sudo btrfs device add -f /dev/sdh1 /dev/sdc1
ERROR: error adding the device '/dev/sdh1' - Inappropriate ioctl for device

$ sudo btrfs device delete missing /dev/sdc1
ERROR: error removing the device 'missing' - Inappropriate ioctl for device

$ sudo mount -o degraded,defaults,compress=lzo /dev/sdc1 /media/Data/
mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

$ dmesg | tail
[106991.655384] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[106991.665066] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107019.954397] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107019.962009] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107070.124927] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107070.126475] btrfs: allowing degraded mounts
[107070.126479] btrfs: use lzo compression
[107070.126480] btrfs: disk space caching is enabled
[107070.127254] btrfs: failed to read chunk root on sdc1
[107070.142983] btrfs: open_ctree failed

$ sudo btrfs rescue super-recover -v /dev/sdc1
All Devices:
        Device: id = 1, name = /dev/sdc1

Before Recovering:
        [All good supers]:
                device name = /dev/sdc1
                superblock bytenr = 65536

                device name = /dev/sdc1
                superblock bytenr = 67108864

                device name = /dev/sdc1
                superblock bytenr = 274877906944

        [All bad supers]:

All supers are valid, no need to recover
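
The three superblock offsets reported above are the standard btrfs locations at 64 KiB, 64 MiB and 256 GiB. The sketch below assumes the usual layout rule (primary copy at 64 KiB, mirror i at 16 KiB shifted left by 12*i bits); the function name is hypothetical and the rule reflects my understanding of the on-disk format rather than anything stated in this thread.

```python
def superblock_offset(mirror):
    """Byte offset of superblock copy `mirror` (0 is the primary copy)."""
    if mirror == 0:
        return 64 * 1024
    # Assumed layout rule: 16 KiB shifted left by 12 bits per mirror index.
    return (16 * 1024) << (12 * mirror)


# Offsets printed by 'btrfs rescue super-recover -v' above.
reported = [65536, 67108864, 274877906944]
computed = [superblock_offset(i) for i in range(3)]

print(computed == reported)
```

So the superblocks themselves are where they should be; the failure is further down, at the chunk root they point to.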

$ sudo btrfs check /dev/sdc1
warning, device 2 is missing
Check tree block failed, want=911673917440, have=0
read block failed check_tree_block
Couldn't read chunk root
Couldn't open file system

$ sudo btrfs check --repair /dev/sdc1
enabling repair mode
warning, device 2 is missing
Check tree block failed, want=911673917440, have=0
read block failed check_tree_block
Couldn't read chunk root
Couldn't open file system

$ btrfs rescue chunk-recover -v /dev/sdc1
<<snipped>>
  Chunk: start = 860100755456, len = 1073741824, type = 1, num_stripes = 1
      Stripes list:
      [ 0] Stripe: devid = 1, offset = 26877100032
      No block group.
      No device extent.
  Chunk: start = 861174497280, len = 1073741824, type = 1, num_stripes = 1
      Stripes list:
      [ 0] Stripe: devid = 1, offset = 27950841856
      No block group.
      No device extent.

Total Chunks:   333
  Heathy:       305
  Bad:  28

Orphan Block Groups:
  Block Group: start = 872985657344, len = 1073741824, flag = 4
  Block Group: start = 911673917440, len = 33554432, flag = 2
  Block Group: start = 911707471872, len = 1073741824, flag = 4

Orphan Device Extents:
  Device extent: devid = 2, start = 2182086656, len = 33554432, chunk offset = 911673917440
  Device extent: devid = 2, start = 2215641088, len = 1073741824, chunk offset = 911707471872
Fail to recover the chunk tree.
<</snipped>>

Here's the full snipped paste: http://pastebin.com/fEm3Gup7
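
One way to read the excerpt above is to ask where the chunk root address from the superblock lands. The sketch below uses only numbers from this message (chunk_root, the orphan block groups and the orphan device extents); the class and function names are hypothetical, and it covers only the snipped tail, not the full paste.

```python
from dataclasses import dataclass

# chunk_root from the superblock; also the "want=" value in btrfs check.
CHUNK_ROOT = 911673917440


@dataclass
class BlockGroup:
    start: int
    length: int
    flag: int


@dataclass
class DeviceExtent:
    devid: int
    start: int
    length: int
    chunk_offset: int


# Copied from the "Orphan Block Groups" and "Orphan Device Extents" lists.
ORPHAN_BLOCK_GROUPS = [
    BlockGroup(872985657344, 1073741824, 4),
    BlockGroup(911673917440, 33554432, 2),
    BlockGroup(911707471872, 1073741824, 4),
]
ORPHAN_DEVICE_EXTENTS = [
    DeviceExtent(2, 2182086656, 33554432, 911673917440),
    DeviceExtent(2, 2215641088, 1073741824, 911707471872),
]


def containing_group(addr, groups):
    """Return the block group whose logical range contains addr, if any."""
    for group in groups:
        if group.start <= addr < group.start + group.length:
            return group
    return None


def backing_devids(group, extents):
    """Device ids of the listed extents that back this block group."""
    return sorted({e.devid for e in extents if e.chunk_offset == group.start})


group = containing_group(CHUNK_ROOT, ORPHAN_BLOCK_GROUPS)
print(group, backing_devids(group, ORPHAN_DEVICE_EXTENTS))
```

In this excerpt the chunk root falls at the very start of the orphan block group with flag = 2, and the only device extent listed for that group is on devid 2, the drive that was removed. That would be consistent with "Couldn't read chunk root", though the full paste may show more.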

Now I'm on openSUSE Tumbleweed (kernel 3.17) and still get the same
result from 'chunk-recover'. There are 305 healthy chunks; is there
any way to recover that data and forget about the bad ones?

A good portion of the data on that drive was backed up, but some
wasn't. My fault, I've learned. Can I get anything back from that
drive?

Thanks
Zack


* RAID1 fails to recover chunk tree
@ 2014-10-28 20:32 Zack Coffey
  2014-10-29  3:55 ` Anand Jain
  2014-10-29 22:26 ` Robert White
  0 siblings, 2 replies; 23+ messages in thread
From: Zack Coffey @ 2014-10-28 20:32 UTC (permalink / raw)
  To: linux-btrfs

Revisiting a previous issue. I set up a single 640GB drive with BTRFS
and compression. This was not a system drive, just a place to put
random junk.

I made a RAID1 with another drive for just the metadata. It was in
that state for less than 12 hours or so; I then removed the second
drive and now cannot get to any data on the original drive. Data
remained single while only metadata was RAID1.

The single-drive btrfs was made on Ubuntu with kernel 3.13.0 and tools
3.12.

$ sudo mount -o degraded /dev/sdc1 /media/Data/
mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
        missing codepage or helper program, or other error
        In some cases useful info is found in syslog - try
        dmesg | tail  or so

$ dmesg | tail
[45353.869448] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45353.901511] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45353.901666] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45354.148488] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[45354.148573] KBD BUG in ../../../../../../../../drivers/2d/lnx/fgl/drm/kernel/gal.c at line: 304!
[46241.155350] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[46241.155923] btrfs: allowing degraded mounts
[46241.155927] btrfs: disk space caching is enabled
[46241.159436] btrfs: failed to read chunk root on sdc1
[46241.177815] btrfs: open_ctree failed

$ btrfs-show-super /dev/sdc1
superblock: bytenr=65536, device=/dev/sdc1
---------------------------------------------------------
csum                    0x93bcb1b5 [match]
bytenr                  65536
flags                   0x1
magic                   _BHRfS_M [match]
fsid                    bd78815a-802b-43e2-8387-fc6ab4237d67
label
generation              60944
root                    909586694144
sys_array_size          97
chunk_root_generation   60938
root_level              1
chunk_root              911673917440
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             1115871535104
bytes_used              321833435136
sectorsize              4096
nodesize                4096
leafsize                4096
stripesize              4096
root_dir                6
num_devices             2
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x9
csum_type               0
csum_size               4
cache_generation        60944
uuid_tree_generation    60944
dev_item.uuid           d82b2027-17b6-4513-a86d-9227a42d7ed1
dev_item.fsid           bd78815a-802b-43e2-8387-fc6ab4237d67 [match]
dev_item.type           0
dev_item.total_bytes    615763673088
dev_item.bytes_used     324270030848
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0


$ sudo btrfs device add -f /dev/sdh1 /dev/sdc1
ERROR: error adding the device '/dev/sdh1' - Inappropriate ioctl for device

$ sudo btrfs device delete missing /dev/sdc1
ERROR: error removing the device 'missing' - Inappropriate ioctl for device

$ sudo mount -o degraded,defaults,compress=lzo /dev/sdc1 /media/Data/
mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
        missing codepage or helper program, or other error
        In some cases useful info is found in syslog - try
        dmesg | tail  or so

$ dmesg | tail
[106991.655384] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[106991.665066] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107019.954397] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107019.962009] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107070.124927] btrfs: device fsid bd78815a-802b-43e2-8387-fc6ab4237d67 devid 1 transid 60944 /dev/sdc1
[107070.126475] btrfs: allowing degraded mounts
[107070.126479] btrfs: use lzo compression
[107070.126480] btrfs: disk space caching is enabled
[107070.127254] btrfs: failed to read chunk root on sdc1
[107070.142983] btrfs: open_ctree failed

$ sudo btrfs rescue super-recover -v /dev/sdc1
All Devices:
         Device: id = 1, name = /dev/sdc1

Before Recovering:
         [All good supers]:
                 device name = /dev/sdc1
                 superblock bytenr = 65536

                 device name = /dev/sdc1
                 superblock bytenr = 67108864

                 device name = /dev/sdc1
                 superblock bytenr = 274877906944

         [All bad supers]:

All supers are valid, no need to recover

$ btrfs rescue chunk-recover -v /dev/sdc1
<<snipped>>
Chunk: start = 860100755456, len = 1073741824, type = 1, num_stripes = 1
       Stripes list:
       [ 0] Stripe: devid = 1, offset = 26877100032
       No block group.
       No device extent.
   Chunk: start = 861174497280, len = 1073741824, type = 1, num_stripes = 1
       Stripes list:
       [ 0] Stripe: devid = 1, offset = 27950841856
       No block group.
       No device extent.

Total Chunks:   333
   Heathy:       305
   Bad:  28

Orphan Block Groups:
   Block Group: start = 872985657344, len = 1073741824, flag = 4
   Block Group: start = 911673917440, len = 33554432, flag = 2
   Block Group: start = 911707471872, len = 1073741824, flag = 4

Orphan Device Extents:
   Device extent: devid = 2, start = 2182086656, len = 33554432, chunk offset = 911673917440
   Device extent: devid = 2, start = 2215641088, len = 1073741824, chunk offset = 911707471872
Fail to recover the chunk tree.
<</snipped>>

Here's the full snipped paste: http://pastebin.com/fEm3Gup7
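
A quick cross-check of the numbers above, offered as a sketch rather than a diagnosis: the superblock's chunk_root (911673917440) appears to fall inside the orphan block group that starts at the same address with flag = 2 (the SYSTEM block group type), and the only device extent chunk-recover reports for that chunk offset is on devid 2, the drive that is no longer attached. The helper name in_range is made up for illustration; the values are copied from the btrfs-show-super and chunk-recover output.

```shell
#!/bin/sh
# Values copied from the btrfs-show-super and chunk-recover output.
chunk_root=911673917440
bg_start=911673917440     # orphan block group, flag = 2 (SYSTEM)
bg_len=33554432
extent_devid=2            # device extent reported for this chunk offset

# in_range ADDR START LEN: succeeds if START <= ADDR < START + LEN.
# (Hypothetical helper, not part of btrfs-progs.)
in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -lt $(( $2 + $3 )) ]
}

if in_range "$chunk_root" "$bg_start" "$bg_len"; then
    echo "chunk_root lies in the orphan SYSTEM block group (device extent on devid $extent_devid)"
else
    echo "chunk_root lies outside that block group"
fi
```

If that reading is right, it would be consistent with the "failed to read chunk root" message, but the output alone does not prove that no copy of the chunk root exists on devid 1.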

Now I'm on openSUSE Tumbleweed (kernel 3.17). Still get the same
result from 'chunk-recover'. There are 305 healthy chunks; is there
any way to recover that data and forget about the bad ones?

A good portion of the data on that drive was backed up, but some
wasn't. My fault, I've learned. Can I get anything back from that
drive?

Thanks


* Re: RAID1 fails to recover chunk tree
  2014-10-28 20:32 RAID1 fails to recover chunk tree Zack Coffey
@ 2014-10-29  3:55 ` Anand Jain
  2014-10-29 19:32   ` Zack Coffey
  2014-10-29 22:26 ` Robert White
  1 sibling, 1 reply; 23+ messages in thread
From: Anand Jain @ 2014-10-29  3:55 UTC (permalink / raw)
  To: Zack Coffey; +Cc: linux-btrfs



  Try 'mount -o degraded,ro', then
   see if there is any non-zero non-raid1 group profile.




* Re: RAID1 fails to recover chunk tree
  2014-10-29  3:55 ` Anand Jain
@ 2014-10-29 19:32   ` Zack Coffey
  2014-10-30  3:33     ` Anand Jain
  0 siblings, 1 reply; 23+ messages in thread
From: Zack Coffey @ 2014-10-29 19:32 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs


$ sudo mount -o degraded,ro /dev/sdd1 /asdf
mount: wrong fs type, bad option, bad superblock on /dev/sdd1,
        missing codepage or helper program, or other error

        In some cases useful info is found in syslog - try
        dmesg | tail or so.
$ dmesg | tail
[524718.760792] BTRFS info (device sdd1): allowing degraded mounts
[524718.760800] BTRFS info (device sdd1): disk space caching is enabled
[524718.762087] BTRFS: failed to read chunk root on sdd1
[524718.776524] BTRFS: open_ctree failed

$ uname -a
Linux mach 3.17.1-52.g5c4d099-desktop #1 SMP PREEMPT Sat Oct 18 23:36:23 UTC 2014 (5c4d099) x86_64 x86_64 x86_64 GNU/Linux
$ btrfs --version
Btrfs v3.16.2+20141003



* Re: RAID1 fails to recover chunk tree
  2014-10-28 20:32 RAID1 fails to recover chunk tree Zack Coffey
  2014-10-29  3:55 ` Anand Jain
@ 2014-10-29 22:26 ` Robert White
  2014-10-29 23:07   ` Robert White
  1 sibling, 1 reply; 23+ messages in thread
From: Robert White @ 2014-10-29 22:26 UTC (permalink / raw)
  To: Zack Coffey, linux-btrfs

On 10/28/2014 01:32 PM, Zack Coffey wrote:
> Made a RAID1 with another drive of just the metadata. Was in
> that state for less than 12 hours-ish, removed the second drive and
> now cannot get to any data on the original drive. Data remained single
> while only metadata was RAID1.

I don't know all the details but I would _never_ suspect the action you 
described to _not_ hose up the file system.

The "single" mode is not "restrict to one drive"; it's concatenation, as 
in "treat the entire space as if it were a single drive".

In that twelve hour window data migrated. I _think_ directories may 
count as data in this sense. If a key element (say the root directory) 
migrated onto the disk you eventually removed then there is no root 
directory to read. And if not root, then any secondary directory you choose.

So sure, your checksum trees and your extent maps were all duplicated in 
the mirror, but your actual data -- you know, all those files that were 
copied on write -- may well be only on that second drive you pulled out.

RAID1 metadata with non-RAID1 data would not safely allow for failure 
(or removal) of one drive.

I'm not sure what you expected to happen but what you did is full of fail.

You need to put the second drive back in and then coerce all the data 
back to the first drive. "btrfs device delete" is what you want. You 
_may_ need to switch the metadata back to "single" before the delete.

--Rob.


* Re: RAID1 fails to recover chunk tree
  2014-10-29 22:26 ` Robert White
@ 2014-10-29 23:07   ` Robert White
  2014-10-30 13:30     ` Zack Coffey
  0 siblings, 1 reply; 23+ messages in thread
From: Robert White @ 2014-10-29 23:07 UTC (permalink / raw)
  To: Zack Coffey, linux-btrfs

On 10/29/2014 03:26 PM, Robert White wrote:
> On 10/28/2014 01:32 PM, Zack Coffey wrote:
>> Made a RAID1 with another drive of just the metadata. Was in
>> that state for less than 12 hours-ish, removed the second drive and
>> now cannot get to any data on the original drive. Data remained single
>> while only metadata was RAID1.
>
> I don't know all the details but I would _never_ suspect the action you
> described to _not_ hose up the file system.
> You need to put the second drive back in and then coerce all the data
> back to the first drive. "btrfs device delete" is what you want. You
> _may_ need to switch the metadata back to "single" before the delete.
>
> --Rob.
>

P.S. I am/was assuming you said "removed the second drive" in the normal 
sense of disconnecting and removing, as opposed to the semantic action 
of deleting the device element.

If you did do the btrfs delete, you might have needed to do a "btrfs 
filesystem sync" to make sure that all the transactions involved in the 
delete were finished and flushed to disk.

Either way, physically reattaching "the second drive" is your first 
step; presuming again that you haven't destroyed the partition or 
re-used the drive etc. If the partition will mount once the second drive 
is in place, do the delete operation (if you didn't) and then the sync 
(to make sure that everything has finished migrating etc). Then you 
should be able to re-remove the physical drive.
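
The sequence described above could look roughly like this. It is a dry-run sketch that only prints the commands; the device names and mount point are placeholders, not values from this thread, and whether the metadata conversion is needed (and whether it needs -f) depends on the kernel and btrfs-progs version in use.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# DEV1, DEV2 and MNT are hypothetical placeholders.
DEV1=${DEV1:-/dev/sdX1}   # original drive
DEV2=${DEV2:-/dev/sdY1}   # reattached second drive
MNT=${MNT:-/mnt/data}     # mount point

plan() {
    # Normal mount with both devices present.
    echo "mount $DEV1 $MNT"
    # Convert metadata away from raid1 before shrinking to one device;
    # -f acknowledges the reduced metadata redundancy where required.
    echo "btrfs balance start -f -mconvert=single $MNT"
    # Migrate everything off the second drive and drop it from the filesystem.
    echo "btrfs device delete $DEV2 $MNT"
    # Flush outstanding transactions before unplugging anything.
    echo "btrfs filesystem sync $MNT"
}

plan
```

Reviewing the printed commands before running any of them by hand keeps the sketch harmless on a filesystem that is already in trouble.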

If you already did the delete and sync as part of what you meant by 
"remove" then sorry for the interruption of your misery. 8-)



* Re: RAID1 fails to recover chunk tree
  2014-10-29 19:32   ` Zack Coffey
@ 2014-10-30  3:33     ` Anand Jain
  0 siblings, 0 replies; 23+ messages in thread
From: Anand Jain @ 2014-10-30  3:33 UTC (permalink / raw)
  To: zcoffey; +Cc: linux-btrfs



  I just noticed your case is different from the others I have seen or
  am working on. In your case the layout has an issue; it's not about
  the RAID. Sorry.

  try: mount -o recovery,ro




* Re: RAID1 fails to recover chunk tree
  2014-10-29 23:07   ` Robert White
@ 2014-10-30 13:30     ` Zack Coffey
  2014-10-30 15:23       ` Zygo Blaxell
                         ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Zack Coffey @ 2014-10-30 13:30 UTC (permalink / raw)
  To: linux-btrfs

Rob, That second drive was immediately put to use elsewhere. I figured 
having only the metadata on that drive, it wouldn't matter. The data 
stayed single and wasn't part of the second drive, only the metadata 
was. I must not be capable of understanding why that wouldn't work.

I thought all I was doing was removing a duplication of metadata and the 
worst I would see is a message complaining about a drive missing. Never 
thought the data or access to it could be compromised in what seemed to 
be a simple situation.

Anand, I get the same output with mount -o recovery,ro.


* Re: RAID1 fails to recover chunk tree
  2014-10-30 13:30     ` Zack Coffey
@ 2014-10-30 15:23       ` Zygo Blaxell
  2014-10-30 18:04       ` Chris Murphy
  2014-10-31  8:35       ` Robert White
  2 siblings, 0 replies; 23+ messages in thread
From: Zygo Blaxell @ 2014-10-30 15:23 UTC (permalink / raw)
  To: Zack Coffey; +Cc: linux-btrfs

On Thu, Oct 30, 2014 at 09:30:46AM -0400, Zack Coffey wrote:
> Rob, That second drive was immediately put to use elsewhere. I
> figured having only the metadata on that drive, it wouldn't matter.
> The data stayed single and wasn't part of the second drive, only the
> metadata was. I must not be capable of understanding why that
> wouldn't work.
> 
> I thought all I was doing was removing a duplication of metadata and
> the worst I would see is a message complaining about a drive
> missing. 

There is a check at mount time that counts the number of disks and the
worst-case maximum number of missing disks for each profile.  That count
says "you have data in single profile, and single profile cannot lose
any disks, and you are missing one disk."  It doesn't check _where_
the data is on the disks.
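
A rough model of that check, as a Python sketch. The tolerance table
and the function name are illustration only, written from the
description above; this is not the kernel's actual code or data
structures.

```python
# Hypothetical model of the mount-time check described above.
# Assumed maximum number of missing devices each profile tolerates.
TOLERATED_MISSING = {
    "single": 0,
    "dup": 0,
    "raid0": 0,
    "raid1": 1,
    "raid10": 1,
}


def can_mount_read_write(profiles_in_use, missing_devices):
    """Return True if every profile in use tolerates the missing devices.

    Only the set of profiles is examined, never the location of the
    chunks on the individual devices.
    """
    worst_case = min(TOLERATED_MISSING[p] for p in profiles_in_use)
    return missing_devices <= worst_case


# Data in single, metadata in raid1, one device missing.
print(can_mount_read_write({"single", "raid1"}, missing_devices=1))  # False
# Everything in raid1, one device missing.
print(can_mount_read_write({"raid1"}, missing_devices=1))  # True
```

The least tolerant profile decides the outcome, which is why one
single-profile chunk type is enough to block the read-write mount.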

The check prevents read-write mounting (and therefore balance, adding or
removing drives, resizing the filesystem so you can move the data to a
new LV on the same disk, btrfs send/receive, and any other way you could
fix the filesystem in-place).  As far as I know there is currently no
way to recover from this without writing some new code.  Your filesystem
is now permanently read-only.

If you wrote anything to the filesystem while the second disk was present,
it could be written to the second disk.  Since your data profile was
single, there would be only one copy of that new data on the disk that
is now missing.

You should be able to retrieve most of the data.  Mount the filesystem
read-only (options ro,degraded) and rsync the surviving data to a
new filesystem.  If you have default options with checksums enabled,
rsync will report I/O errors on the missing blocks, so you can make a
note of which files are affected and must be replaced from backups.
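
If rsync is not handy, the same salvage idea can be sketched in Python.
This is only an illustration: the function name is made up, and it
assumes the damaged filesystem is already mounted read-only and that
unreadable files raise an ordinary OSError when read.

```python
import os
import shutil


def salvage(src_root, dst_root):
    """Copy every readable file from src_root to dst_root.

    Returns the list of source paths that could not be copied, so they
    can be restored from backups later.
    """
    failed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                shutil.copy2(src, os.path.join(target_dir, name))
            except OSError as err:
                # Keep going; record the file and the reason it failed.
                failed.append(src)
                print(f"could not copy {src}: {err}")
    return failed
```

Calling salvage() with the degraded mount point and a directory on the
new filesystem returns the files to fetch from backups. Directories
that cannot be listed at all are skipped silently by os.walk, so this
is less thorough than rsync's own error report.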

> Never thought the data or access to it could be compromised
> in what seemed to be a simple situation.

The simple situation is when *all* your chunks are RAID1, not just the
metadata.  RAID1 does work--I've had to RMA two disks in two btrfs RAID1
arrays _this week alone_ and btrfs is fine with them.

You have a filesystem with a mixture of chunk profiles with different
redundancy levels.  That situation is not simple and will not tolerate
a missing disk.  Your filesystem is now probably genuinely broken, and
you have probably lost some data forever.  Redundant metadata will allow
you to determine with certainty what data you have lost.

> Anand, I get the same output with mount -o recovery,ro.
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID1 fails to recover chunk tree
  2014-10-30 13:30     ` Zack Coffey
  2014-10-30 15:23       ` Zygo Blaxell
@ 2014-10-30 18:04       ` Chris Murphy
  2014-10-31  1:27         ` Duncan
  2014-10-31  8:35       ` Robert White
  2 siblings, 1 reply; 23+ messages in thread
From: Chris Murphy @ 2014-10-30 18:04 UTC (permalink / raw)
  To: Zack Coffey; +Cc: linux-btrfs


On Oct 30, 2014, at 7:30 AM, Zack Coffey <tech42.clickwir@gmail.com> wrote:

> Rob, That second drive was immediately put to use elsewhere. I figured having only the metadata on that drive, it wouldn't matter. The data stayed single and wasn't part of the second drive, only the metadata was. I must not be capable of understanding why that wouldn't work.

single profile means all devices get btrfs chunks. If you do something like:

# mkfs.btrfs /dev/sda

By default you get data profile = single, and metadata profile = DUP. If you then

# btrfs device add /dev/sdb /mnt/btrfs
# btrfs balance start -mconvert=raid1 /mnt/btrfs

you now have a volume that is data = single and metadata = raid1. The way the allocator works is that it will first allocate 1GiB data chunks to the device with the most free space remaining, so if that's /dev/sdb it will allocate 1GiB data chunks there until free space is the same for both sda and sdb. And once they have equal space btrfs allocates 1GiB data chunks alternating sda and sdb.

Single profile doesn't mean only one device gets data chunks. Is that the misunderstanding?
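
A toy model of the allocation order described above, in Python. It
assumes 1GiB chunks and a pure "most free space wins" rule, with ties
going to the device listed first; the function name and the sizes are
made up, and the real allocator has more to it than this.

```python
def allocate_data_chunks(free_gib, count):
    """Place up to `count` 1 GiB single-profile data chunks.

    free_gib maps device name -> unallocated GiB. Each chunk goes to the
    device with the most unallocated space; on a tie the device listed
    first wins. Returns the device chosen for each chunk.
    """
    free = dict(free_gib)
    placement = []
    for _ in range(count):
        device = max(free, key=free.get)
        if free[device] < 1:
            break  # no device has room for another chunk
        free[device] -= 1
        placement.append(device)
    return placement


# sda is mostly full already, sdb was just added and is empty.
print(allocate_data_chunks({"sda": 2, "sdb": 5}, 6))
# ['sdb', 'sdb', 'sdb', 'sda', 'sdb', 'sda']
```

The first new data chunks all land on the freshly added sdb, and only
once free space evens out do the two devices alternate.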


Chris Murphy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID1 fails to recover chunk tree
  2014-10-30 18:04       ` Chris Murphy
@ 2014-10-31  1:27         ` Duncan
  2014-10-31  2:09           ` Chris Murphy
  0 siblings, 1 reply; 23+ messages in thread
From: Duncan @ 2014-10-31  1:27 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Thu, 30 Oct 2014 12:04:48 -0600 as excerpted:

>> Rob, That second drive was immediately put to use elsewhere. I figured
>> having only the metadata on that drive, it wouldn't matter. The data
>> stayed single and wasn't part of the second drive, only the metadata
>> was. I must not be capable of understanding why that wouldn't work.
> 
> single profile means all devices get btrfs chunks. If you do something
> like:
> 
> # mkfs.btrfs /dev/sda
> 
> By default you get data profile = single, and metadata profile = DUP. If
> you then
> 
> # btrfs device add /dev/sdb /mnt/btrfs
> # btrfs balance start -mconvert=raid1 /mnt/btrfs
> 
> you now have a volume that is data = single and metadata = raid1. The way
> the allocator works is that it will first allocate 1GiB data chunks to the
> device with the most free space remaining, so if that's /dev/sdb it will
> allocate 1GiB data chunks there until free space is the same for both sda
> and sdb. And once they have equal space btrfs allocates 1GiB data chunks
> alternating sda and sdb.
> 
> Single profile doesn't mean only one device gets data chunks. Is that the
> misunderstanding?

Just what I was going to say.

Single profile means there's just one copy, and the normal chunk 
allocation algorithm allocates new chunks from the device with the most 
space left.  The freshly added device almost certainly had the most space 
left, so as soon as the existing data chunk was full, probably all new 
data chunks were allocated from the freshly added device.

Then you (OP) go and kill that device as far as btrfs is concerned, and 
all that new data written to it in single mode is ONLY on it because it 
was written in single mode, so now it has suddenly disappeared!

No WONDER that filesystem's having problems!

Even old files, due to copy on write, would have likely been partially or 
entirely moved to the new chunks on the new device if they were updated 
during the time the filesystem was mounted writable with the new device 
there.  And some desktop environments, for instance, have a habit of 
rewriting what might well be the exact same thing back to their config 
files, every time they shutdown, or sometimes as soon as they read it in 
at startup, as well.  So much of your desktop config may well have been 
on that second device, even if you didn't actively change any config 
during that time.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID1 fails to recover chunk tree
  2014-10-31  1:27         ` Duncan
@ 2014-10-31  2:09           ` Chris Murphy
  2014-11-02  4:26             ` Robert White
  0 siblings, 1 reply; 23+ messages in thread
From: Chris Murphy @ 2014-10-31  2:09 UTC (permalink / raw)
  To: Btrfs BTRFS


On Oct 30, 2014, at 7:27 PM, Duncan <1i5t5.duncan@cox.net> wrote:

> Chris Murphy posted on Thu, 30 Oct 2014 12:04:48 -0600 as excerpted:
> 
>>> Rob, That second drive was immediately put to use elsewhere. I figured
>>> having only the metadata on that drive, it wouldn't matter. The data
>>> stayed single and wasn't part of the second drive, only the metadata
>>> was. I must not be capable of understanding why that wouldn't work.
>> 
>> single profile means all devices get btrfs chunks. If you do something
>> like:
>> 
>> # mkfs.btrfs /dev/sda
>> 
>> By default you get data profile = single, and metadata profile = DUP. If
>> you then
>> 
>> # btrfs add /dev/sdb /mnt/btrfs # btrfs -mconvert=raid1 /mnt/btrfs

          ^device


> 
> Then you (OP) go and kill that device as far as btrfs is concerned, and 
> all that new data written to it in single mode is ONLY on it because it 
> was written in single mode, so now it has suddenly disappeared!
> 
> No WONDER that filesystem's having problems!

So what *is* possible in this case is still mount -o degraded,ro and try to extract what he can from the remaining device. 
> 
> Even old files, due to copy on write, would have likely been partially or 
> entirely moved to the new chunks on the new device if they were updated 
> during the time the filesystem was mounted writable with the new device 
> there.  And some desktop environments, for instance, have a habit of 
> rewriting what might well be the exact same thing back to their config 
> files, every time they shutdown, or sometimes as soon as they read it in 
> at startup, as well.  So much of your desktop config may well have been 
> on that second device, even if you didn't actively change any config 
> during that time.

It's hard to say. If a balance hasn't recently been done, the original device may have a good amount of free space in allocated chunks. I'm pretty sure Btrfs will write first to already allocated chunks with free space before allocating new chunks? So a bunch of stuff could actually still be on the original device - it just needs -o degraded,ro to get access to it. And if the current version of the file isn't retrievable because all or part of it's on the other drive, it'll just cause an error to occur. I think rsync and the like can be set to not fail on such errors, so anything that can be retrieved, is.


Chris Murphy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID1 fails to recover chunk tree
  2014-10-30 13:30     ` Zack Coffey
  2014-10-30 15:23       ` Zygo Blaxell
  2014-10-30 18:04       ` Chris Murphy
@ 2014-10-31  8:35       ` Robert White
  2014-10-31 12:15         ` Zack Coffey
  2 siblings, 1 reply; 23+ messages in thread
From: Robert White @ 2014-10-31  8:35 UTC (permalink / raw)
  To: Zack Coffey, linux-btrfs

On 10/30/2014 06:30 AM, Zack Coffey wrote:
> Rob, That second drive was immediately put to use elsewhere. I figured
> having only the metadata on that drive, it wouldn't matter. The data
> stayed single and wasn't part of the second drive, only the metadata
> was. I must not be capable of understanding why that wouldn't work.
>
> I thought all I was doing was removing a duplication of metadata and the
> worst I would see is a message complaining about a drive missing. Never
> thought the data or access to it could be compromised in what seemed to
> be a simple situation.
>
> Anand, I get the same output with mount -o recovery,ro.

Your data is gone if your other drive is gone.

Single doesn't mean what you think it means. Single means "one single 
copy of your data", but it has _nothing_ to do with "one single drive". 
That would mean that after a "btrfs device add" the default would be to 
never, ever, use that added drive.

So RAID0 means "striped": there are chunks, and with N drives chunk=0 
is on drive=0 at offset zero, chunk=1 is on drive=1 at offset zero, and 
so on up to chunk=N-1 on drive=N-1 at offset zero. Chunk=N is back on 
drive=0 at offset Chunk_Size. And so on.

Concatenation is that drive=N follows drive=N-1 at offset 
sum(sizeofeach(all drives less than N)). So Byte=0 is on drive=0 at 
offset0; and Byte=(sizeof drive0) is on drive=1 at byte=0.
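
The two address mappings can be written out as a small Python sketch.
It assumes drives numbered 0 to N-1 and equal-sized stripe chunks. The
function names are made up for illustration, and this is the textbook
arithmetic, not how btrfs itself places chunks.

```python
def raid0_location(chunk_index, n_drives, chunk_size):
    """Return (drive, byte offset on that drive) for a striped chunk."""
    drive = chunk_index % n_drives
    offset = (chunk_index // n_drives) * chunk_size
    return drive, offset


def concat_location(byte_index, drive_sizes):
    """Return (drive, byte offset on that drive) for concatenated drives."""
    for drive, size in enumerate(drive_sizes):
        if byte_index < size:
            return drive, byte_index
        byte_index -= size
    raise ValueError("address is beyond the end of the last drive")


# Two drives, 64-byte stripe chunks: chunk 2 wraps back to drive 0.
print(raid0_location(2, n_drives=2, chunk_size=64))   # (0, 64)
# Two 100-byte drives: byte 100 is the first byte of drive 1.
print(concat_location(100, [100, 100]))               # (1, 0)
```

Either way, every address that maps to a drive that is no longer there
is simply unreachable.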

The RAID standard never addressed bulk concatenation, so there is no 
"raid-number" for the one whole drive after another. BTRFS uses 
"single", others use other words.

So if you had a 100G drive, and you added a second 100G drive, you'd 
have a logically 200G drive, where the first 100G is on drive one, and 
the second is on drive two.

You've basically obliterated the second half of the filesystem storage 
when you physically removed the drive without semantically removing it 
first. Might as well have erased it with a magnet, and all the data with 
it. Worse still, if you did any sort of balance or defrag you likely 
moved huge numbers of "the _single_ copy of your data" clusters onto 
that other device.

So the layout option isn't about limiting storage, that wouldn't make 
sense, that's what device add/delete is about. It's about how the data is 
laid out across all the drives.

All those unreachable addresses are on that now-defunct drive. No mount 
option will ever get you that data back.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID1 fails to recover chunk tree
  2014-10-31  8:35       ` Robert White
@ 2014-10-31 12:15         ` Zack Coffey
  2014-11-02  4:19           ` Robert White
  0 siblings, 1 reply; 23+ messages in thread
From: Zack Coffey @ 2014-10-31 12:15 UTC (permalink / raw)
  To: Robert White, linux-btrfs

Sadly I think I understand now.

So by adding the second drive, BTRFS saw it as an extension of data (ala 
JBOD-ish?). Even though I thought I was only adding RAID1 for metadata, 
was also adding to the data storage.

I assume that even though chunk-recover reports healthy chunks, there's 
little to no way to actually get them?




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID1 fails to recover chunk tree
  2014-10-31 12:15         ` Zack Coffey
@ 2014-11-02  4:19           ` Robert White
  0 siblings, 0 replies; 23+ messages in thread
From: Robert White @ 2014-11-02  4:19 UTC (permalink / raw)
  To: zcoffey, linux-btrfs

On 10/31/2014 05:15 AM, Zack Coffey wrote:
> Sadly I think I understand now.
>
> So by adding the second drive, BTRFS saw it as an extension of data (ala
> JBOD-ish?). Even though I thought I was only adding RAID1 for metadata,
> was also adding to the data storage.
>
> I assume that even though chunk-recover reports healthy chunks, there's
> little to no way to actually get them?

Yes.

The chunks are "good" in that they are well defined, but in your case 
they point to a place that no-longer exists. Sort of like if you took 
the card catalog out of a library and then burned down the library. The 
catalog is still correct, it just no longer has any books to back it up. 
Or more correctly, you bought a second building, moved half of your 
books over there, made a complete copy of the card catalog, put that in 
the second building... and then burned that second building down. So the 
copy of the card catalog is still valid, but half of the books have been 
burned.

You are making a couple of problematic assumptions about what terms 
mean, and what level of abstractions they involve, that may mess you up 
going forward. Here's a "quick" re-primer.

JBOD == Just a Bunch Of Disks. This is just a designation for putting 
disks in a computer without any special hardware. That is, when you put 
disks in your computer it's JBOD. It only stops being JBOD when you add 
_dedicated_ hardware controllers for things like RAID operation. This 
designation puts it in contrast to dedicated storage systems of much 
higher complexity that are available from specialty manufacturers, such 
as IBM DASD (Direct Access Storage Device), a NAS (network attached 
storage) server, or a hardware RAID solution from someone like SUN.

RAID == Redundant Array of Inexpensive Disks. The reason "striping" is 
"RAID-0" is that there is no redundancy in that layout. The zero 
definition was created after the original RAID-1 through 3 (or 4?) and 
before 5 and 6.

Pure concatenation was already well known before the whole attempt to 
standardize how to think about and implement the more complex layouts. 
Pure concatenation is how, for instance, one would zip a bunch of stuff 
onto successive floppies. It's also how adding banks of ram worked 
before memory controllers and line-fetch interleaving and all that. It's 
the "longer tape is more storage, second tape is even more storage" model.

(They didn't make a "RAID minus 1" designation for concatenation as that 
was getting absurd).

So every linux system you will ever build that has more than zero disks 
(or equivalent slow storage like SSDs) that doesn't have special 
dedicated storage processors is a JBOD.

A Hardware RAID is typically a dedicated appliance with storage 
elements (usually disks, often pricey) that are often matched by size 
and transfer dynamics, and often backed by a substantial block of 
non-volatile or battery-backup-powered RAM that will survive 
reboots/crashes in such a way as to be considered "nonvolatile" over a 
reasonable period of time etc. E.g. it's not _Just_ a box of inexpensive 
disks.

(Disclaimer: arguable statements follow...)

BTRFS is _not_ a RAID at all. Nor is it a storage management system. 
BTRFS is a file system that _can_ selectively implement various RAID 
layout modes and can operate without a separate storage management system.

So a "real storage management system", such as Logical Volume Manager 
does things in layers. In LVM, for instance, to make a RAID Volume, I 
have to adopt the physical storage (lvm pv* commands) associate it with 
its peers (lvm vg* commands) and then create logical volumes (lvm lv* 
commands).

In a "real RAID management system", such as with mdadm, I have to match 
the partitioning or media sizes and then join them into the semantic 
array layouts. That is, I have to design the layout, and pre-match the 
storage "with deliberate intent" before bringing the storage into the 
mix. For instance if I "make a RAID-5 device" the RAID-ness exists 
"before" the storage, at least in concept.

For Example:

mdadm --create /dev/md23 --level=raid5 --raid-devices=4 \
    /dev/sda /dev/sdb /dev/sdc /dev/sdd

The raid "comes to exist" as /dev/md23
It is given a personality of type raid5
It is given a geometry of four devices
Then that entity is _imposed_ on each of four drives.

Now in practical terms this happens all at once, but in terms of intent 
and design it is in a strict order of declaration. And because I did it 
all at once I didn't have to specify the size of the array or the sizes 
of the chunks of the array. The program got to "peek ahead" at the media 
and back-figure the size and such.

Compare this to what you did with BTRFS.

You made a file system on a storage device.
Then you said "here's some more space".
Then you said "hey file system, rearrange yourself to use this space, 
and while you are at it, go ahead and spread the metadata around as if 
it were a raid."

So the expansion of storage happened first, and separately, in the btrfs 
device add activity. The "balance" operation was a declaration of "don't 
just own the new space, figure out how best to use it."

You just also applied the metadata filter to say, by the way, I want a 
full copy of the metadata on both the old and the new spaces.

A non-trivial storage layout might have a number of disks, with a volume 
manager, an encryption manager, and an array manager, all layered to 
create an expanse of storage that a file system could _then_ be placed 
atop.

BTRFS is _way_ more flexible than mdadm. And it is way less into fixed 
boundaries. It can, for instance, change its mind about how things are 
laid out without having to go offline for a protracted period of time.

BTRFS' design philosophy seems built around the idea of being able to 
add non-volatile storage into a filesystem "naked" (unpartitioned), or 
add partitions of same at will, and have one layer of logic deal with 
the whole mess.

So BTRFS' ideas of RAID/single layout for metadata and data are not "disk 
centric"; they are pure semantics that are _aware_ of storage boundaries. 
That's why you can have your metadata at a different RAID level than 
your data.

The idea is that you can take the dedicated layers that exist (such as 
dm-crypt or LVM) as you need them to manage space, but then not need to 
have the hard boundaries that complicate the semantic layout of the 
space if you don't want/need them.

The other systems are still important, for instance (absent hardware 
encryption) it's _way_ more efficient to impose a RAID3, 4, 5, or 6 on a 
raw disk, then encrypt that raid, then put a filesystem on top of the 
encryption than it is to encrypt the multiple drives and then build 
those RAIDs above the encryption.

The TL;DR is that you have to be really careful about the semantic 
structures. A lot of the terms and ideas overlap at different layers. 
That means that the terms have a lot of slack in their meanings. Like 
when people talk about "the network", a lot hinges on what different 
people mean by words like "local".

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID1 fails to recover chunk tree
  2014-10-31  2:09           ` Chris Murphy
@ 2014-11-02  4:26             ` Robert White
  2014-11-02  8:48               ` Roman Mamedov
  0 siblings, 1 reply; 23+ messages in thread
From: Robert White @ 2014-11-02  4:26 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS

On 10/30/2014 07:09 PM, Chris Murphy wrote:
> Is hard to say. If a balance hasn't recently been done, the original device may have a good amount of free space in allocated chunks. I'm pretty sure Btrfs will write first to already allocated chunks with free space before allocating new chunks? So a bunch of stuff could actually still be on the original device - it just needs -o degraded,ro to get access to it. And if the current version of the file isn't retrievable because all or part of it's on the other drive, it'll just cause an error to occur. I think rsync and the like can be set to not fail on such errors, so anything that can be retrieved, is.

Way back when, he said that he "set the metadata to raid1" (or the 
equivalent). That is, he didn't just rely on the default duplication 
onto the second drive. The only way I know to explicitly do that is with 
the mfilter option to balance. So I'm pretty sure that, statistically, 
about half his data is gone.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID1 fails to recover chunk tree
  2014-11-02  4:26             ` Robert White
@ 2014-11-02  8:48               ` Roman Mamedov
  2014-11-02 11:08                 ` Robert White
  0 siblings, 1 reply; 23+ messages in thread
From: Roman Mamedov @ 2014-11-02  8:48 UTC (permalink / raw)
  To: Robert White; +Cc: Chris Murphy, Btrfs BTRFS

On Sat, 01 Nov 2014 21:26:06 -0700
Robert White <rwhite@pobox.com> wrote:

> On 10/30/2014 07:09 PM, Chris Murphy wrote:
> > Is hard to say. If a balance hasn't recently been done, the original device may have a good amount of free space in allocated chunks. I'm pretty sure Btrfs will write first to already allocated chunks with free space before allocating new chunks? So a bunch of stuff could actually still be on the original device - it just needs -o degraded,ro to get access to it. And if the current version of the file isn't retrievable because all or part of it's on the other drive, it'll just cause an error to occur. I think rsync and the like can be set to not fail on such errors, so anything that can be retrieved, is.
> 
> Way back when, he said that he "set the metadata to raid1" (or the 
> equivalent). That is, he didn't just rely on the default duplication 
> onto the second drive. The only way I know to explicitly do that is with 
> the mfilter option to balance. So I'm pretty sure that, statistically, 
> about half his data is gone.

But afaik balance with -mconvert=raid1 (and no other filters specified)
shouldn't touch data at all, no?

-- 
With respect,
Roman

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID1 fails to recover chunk tree
  2014-11-02  8:48               ` Roman Mamedov
@ 2014-11-02 11:08                 ` Robert White
  2014-11-03  6:52                   ` Duncan
  2014-11-03  8:00                   ` Duncan
  0 siblings, 2 replies; 23+ messages in thread
From: Robert White @ 2014-11-02 11:08 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Chris Murphy, Btrfs BTRFS

On 11/02/2014 01:48 AM, Roman Mamedov wrote:
> But afaik balance with -mconvert=raid1 (and no other filters specified)
> shouldn't touch data at all, no?

Ah... you seem to be correct. 8-)

My reading was flawed. I thought that the absence of a -d filter implied 
an "unfiltered activity" (e.g. do it all) not no activity at all. That 
is I was reading those as "filter expressions" not "enable expressions 
with optional filters".

The manual is less than clear.

The usage output is more clear.

By the usage reading (and after checking the code):

Specifying no flags is asking to do everything. So "btrfs balance /path" 
looks at every chunk.

Specifying none of -d -m and -s is the same as specifying all of them, 
so "btrfs balance /path" and "btrfs balance -d -m -s /path" are the same 
operation.

Specifying any of -d -m and/or -s highlights the absence of those not 
specified.

Confusing bit, for example, from wiki

[QUOTE]
If you are getting out of space errors due to metadata being full, try

btrfs balance start -v -dusage=0 /mnt/btrfs
[/QUOTE]

Combined with "Balances only block groups with usage under the given 
percentage. "

I was reading -dusage=0 as "don't bother with data chunks (and so just 
fix the metadata)"; otherwise the mention of using a -d filter to affect 
metadata is perverse.

Blarg... I mean just... blarg...

But now I know. 8-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID1 fails to recover chunk tree
  2014-11-02 11:08                 ` Robert White
@ 2014-11-03  6:52                   ` Duncan
  2014-11-03  8:00                   ` Duncan
  1 sibling, 0 replies; 23+ messages in thread
From: Duncan @ 2014-11-03  6:52 UTC (permalink / raw)
  To: linux-btrfs

Robert White posted on Sun, 02 Nov 2014 03:08:46 -0800 as excerpted:

> Specifying none of -d -m and -s is the same as specifying all of them,
> so "btrfs balance /path" and "btrfs balance -d -m -s /path" are the same
> operation.

It's actually a bit more complicated than that.

The system type is a subset of the metadata type, so -m includes -s by 
implication.  Therefore, specifying -d -m is the same as specifying -d -s 
-m, is the same as not specifying any of them.

The normal way to balance system chunks is to balance metadata, which 
includes but is not limited to system chunks.  The -s makes it possible 
to do system chunks only, but to actually make it do it, you will need to 
pass -f, to force the balance, since normally you'd just balance 
metadata, and let system chunks be balanced as part of the metadata 
balance.
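
Put as a small Python model, going only by the descriptions in this
thread (not checked against the btrfs-progs source, and the function
name is made up). It covers which chunk types get selected and leaves
out the -f requirement for a system-only balance.

```python
def chunk_types_to_balance(data=False, metadata=False, system=False):
    """Return the set of chunk types a balance would process.

    Models the behaviour described in this thread: no type flags means
    all types, and -m pulls in system chunks as well.
    """
    if not (data or metadata or system):
        return {"data", "metadata", "system"}
    selected = set()
    if data:
        selected.add("data")
    if metadata:
        selected.update({"metadata", "system"})
    if system:
        selected.add("system")
    return selected


# -mconvert=raid1 alone: data chunks are not touched.
print(sorted(chunk_types_to_balance(metadata=True)))  # ['metadata', 'system']
# No type flags at all: everything is balanced.
print(sorted(chunk_types_to_balance()))  # ['data', 'metadata', 'system']
```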

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: RAID1 fails to recover chunk tree
  2014-11-02 11:08                 ` Robert White
  2014-11-03  6:52                   ` Duncan
@ 2014-11-03  8:00                   ` Duncan
  1 sibling, 0 replies; 23+ messages in thread
From: Duncan @ 2014-11-03  8:00 UTC (permalink / raw)
  To: linux-btrfs

Robert White posted on Sun, 02 Nov 2014 03:08:46 -0800 as excerpted:

> Confusing bit, for example, from wiki
> 
> [QUOTE]
> If you are getting out of space errors due to metadata being full, try
> 
> btrfs balance start -v -dusage=0 /mnt/btrfs [/QUOTE]
> 
> Combined with "Balances only block groups with usage under the given
> percentage. "
> 
> Which I was reading as: -dusage=0 means don't bother with data chunks
> (and so just fix the metadata), otherwise the mention of using a -d
> filter to affect metadata is perverse.
> 
> Blarg... I mean just... blarg...
> 
> But now I know. 8-)

If metadata is full and there's no unallocated space left from which to 
create new metadata chunks, then balancing metadata wouldn't do any good 
anyway.

Which is why you balance data chunks in that case.

The typical scenario is this.  Someone creates a btrfs and starts using 
it, creating files, deleting files, but over time, tending to create more 
files than they delete, so the space starts to fill up.

As they do so, btrfs allocates new data and metadata chunks on demand 
from the unallocated space.  Btrfs allocation and usage happens in two 
steps: first, unallocated space gets allocated to chunkspace, either data 
or metadata, and then that allocated chunkspace gets actually used for 
file data or metadata, depending on the chunk type.  Data chunks are 
1 GiB each by default, while metadata chunks default to a quarter GiB each.

The critical bit to understand here is that while btrfs can automatically 
allocate both chunks and actual usage on demand, when it frees space, it 
can only automatically free actual usage, not the allocated chunks.  And 
it can't switch chunks from one type to the other.  To free the chunks 
back to unallocated so they can again be allocated on-demand to data or 
metadata as necessary, one must run a balance, which rewrites the chunks, 
consolidating as it goes, thereby freeing the excess allocated chunks if 
actual usage fits into fewer chunks than were previously allocated.

Picking up our typical scenario...  Then they delete a bunch of files, 
often the bigger ones, but the data tends to be much bigger than the 
metadata, so deleting these files frees up a lot of data chunk space but 
only a relatively little metadata chunk space.

Then they go writing files again, but on average smaller ones.  These 
smaller files take up less data space but the same amount of metadata 
space, so without a manual balance to reclaim allocated but mostly empty 
chunks, the limited metadata space freed by that big deletion gets filled 
up faster than the data space, and suddenly, people are getting ENOSPC 
errors when df says there's LOTS of space, because all that space is 
taken up by mostly empty data chunks, leaving no room to write new 
metadata chunks.
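
To put some numbers on that, here's a toy calculation.  All the figures 
are invented for illustration (a 100 GiB device, nearly all of it 
allocated to data chunks that are less than half full); nothing here is 
read from a real filesystem.

```shell
#!/bin/sh
# Toy numbers, in GiB, invented purely for illustration.
DEVICE=100        # total device size
DATA_ALLOC=98     # space allocated to data chunks
DATA_USED=40      # file data actually stored in those chunks
META_ALLOC=2      # space allocated to metadata chunks
META_USED=2       # metadata actually stored (chunks are full)

unallocated=$((DEVICE - DATA_ALLOC - META_ALLOC))
data_slack=$((DATA_ALLOC - DATA_USED))
meta_slack=$((META_ALLOC - META_USED))

echo "unallocated:              ${unallocated} GiB"
echo "free inside data chunks:  ${data_slack} GiB"
echo "free inside metadata:     ${meta_slack} GiB"

# New metadata needs either slack in a metadata chunk or unallocated space
# for a fresh chunk; slack inside data chunks does not count.
if [ "$meta_slack" -eq 0 ] && [ "$unallocated" -eq 0 ]; then
    echo "metadata write: ENOSPC despite ${data_slack} GiB of apparently free space"
fi
```

With these made-up figures there's plenty of room inside the data chunks, 
but none of it is available to metadata until a balance hands some of 
those chunks back to unallocated.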

The scenario is similar to that of ext* running out of inodes (a type of 
metadata, after all) since it preallocates them at mkfs time, except that 
over time, the default number of inodes at a particular ext* filesystem 
size has been bumped up so that this seldom happens in practice any 
more.  But btrfs stores quite a bit more metadata per file, including 
checksums, and for small files, perhaps the entire file including the 
data, in which case it won't actually have a data extent, so oversizing 
btrfs metadata by a similar amount would mean wasting MUCH more space for 
the typical user.  And btrfs can automatically allocate data and metadata 
chunks on demand -- the catch is that it can't automatically unallocate 
chunks on demand[1], a balance is required for that, nor can it switch 
usage types on chunks once allocated.

In that scenario, it's metadata that's out, but to fix it you have to 
balance data, returning unused but allocated data chunks back to the 
unallocated space pool, so they can be allocated as metadata.

Which is why/how the -d (data) filter affects -m (metadata) -- by freeing 
mostly (or with the suggested -dusage=0, entirely[2]) empty data chunks 
back to unallocated so they can be reallocated as metadata chunks.
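
In command form, that's the wiki's suggestion with a before-and-after 
check around it.  This is only a sketch: the /mnt/btrfs mount point and 
the run helper are assumptions for illustration, and the commands are 
printed rather than executed unless RUN=1 is set.

```shell
#!/bin/sh
# Assumed mount point for illustration; override with MNT=/your/mountpoint.
MNT=${MNT:-/mnt/btrfs}

# Hypothetical helper: print the command by default, execute it only when
# RUN=1 is set (the real commands need root and a mounted btrfs).
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

# Allocated ("total") vs. used space, per chunk type, before the balance.
run btrfs filesystem df "$MNT"

# Touch data chunks only, and only those with no usage at all.
run btrfs balance start -v -dusage=0 "$MNT"

# If any empty data chunks were freed, the data "total" should have shrunk.
run btrfs filesystem df "$MNT"
```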

So call it perverse if you want to, but it's an entirely logical 
perversion![3] =:^)

Meanwhile, it's also possible, altho less common, to run into the 
opposite situation, out of data space, with metadata space left.  That's 
actually rather interesting, as you can create files and sometimes even 
write just a small bit of content into them, since small files are 
entirely stored within the metadata leaf and don't require a data 
allocation.  But as soon as you try to write anything of any significant 
size (a few KiB) to the new file, it'll ENOSPC when it tries to allocate 
a data extent and can't.

---
[1] Yet.  There are patches circulating that, once through discussion and 
merged, should let btrfs automatically handle at least the normal cases 
of data/metadata chunk imbalance.

[2] If there's actual data in a chunk, a balance must have at least 
enough space left in order to create at least one more chunk, so as to 
be able to do the rewrite.  But with a bit of luck, there's at least one 
chunk that's entirely empty, in which case usage=0 will free it without 
actually requiring space to create a new chunk to rewrite into, since 
there's nothing to rewrite.  That's why the usage=0.  If you're unlucky 
and there's no entirely empty chunks available for the balance to simply 
delete, then the usage=0 won't help.  That's where the suggestion to 
temporarily add another device of at least a few gigs comes in, the idea 
being to give balance enough room to rewrite a few chunks on the new 
device, thereby freeing the space they would have used on the original 
device(s).  Assuming an over-allocation, the balance should correct the 
problem, leaving enough space on the original device(s) so there's room 
to transfer the chunks back to the original device(s) using btrfs device 
delete <tmp-device>, and hopefully still leave some unallocated space 
left after that.
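
A sketch of that temporary-device sequence follows.  The device name 
/dev/sdX is a placeholder, the mount point and the run helper are 
assumptions for illustration, and the usage threshold of 10 is an assumed 
value, not something the wiki or this thread specifies.  Commands are 
printed rather than executed unless RUN=1 is set.

```shell
#!/bin/sh
# Assumptions for illustration only; adjust all three before real use.
MNT=${MNT:-/mnt/btrfs}          # mount point of the full filesystem
TMPDEV=${TMPDEV:-/dev/sdX}      # placeholder for the temporary device
USAGE=${USAGE:-10}              # assumed usage threshold, in percent

# Hypothetical helper: print the command by default, execute it only when
# RUN=1 is set (the real commands need root and a mounted btrfs).
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

# 1. Give balance some room to write new chunks into.
run btrfs device add "$TMPDEV" "$MNT"

# 2. Rewrite the mostly empty data chunks, consolidating them.
run btrfs balance start -dusage="$USAGE" "$MNT"

# 3. Move everything back to the original device(s) and drop the temporary one.
run btrfs device delete "$TMPDEV" "$MNT"
```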

[3] Sort of like the (in)famous MS Windows perversion of having to hit 
the start button to stop...



Thread overview: 23+ messages
2014-10-28 20:32 RAID1 fails to recover chunk tree Zack Coffey
2014-10-29  3:55 ` Anand Jain
2014-10-29 19:32   ` Zack Coffey
2014-10-30  3:33     ` Anand Jain
2014-10-29 22:26 ` Robert White
2014-10-29 23:07   ` Robert White
2014-10-30 13:30     ` Zack Coffey
2014-10-30 15:23       ` Zygo Blaxell
2014-10-30 18:04       ` Chris Murphy
2014-10-31  1:27         ` Duncan
2014-10-31  2:09           ` Chris Murphy
2014-11-02  4:26             ` Robert White
2014-11-02  8:48               ` Roman Mamedov
2014-11-02 11:08                 ` Robert White
2014-11-03  6:52                   ` Duncan
2014-11-03  8:00                   ` Duncan
2014-10-31  8:35       ` Robert White
2014-10-31 12:15         ` Zack Coffey
2014-11-02  4:19           ` Robert White
  -- strict thread matches above, loose matches on Subject: below --
2014-10-28 20:18 Zack Coffey
2014-10-27 19:01 Zack Coffey
2014-10-15 21:09 Zack Coffey
2014-10-15 15:42 Zack Coffey
