BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops

linux-btrfs.vger.kernel.org archive mirror

* BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
@ 2014-07-23  1:34 Chris Murphy
  2014-07-23  1:52 ` Chris Murphy
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Chris Murphy @ 2014-07-23  1:34 UTC
  To: Btrfs BTRFS

3.16.0-0.rc6.git0.1.fc21.1.x86_64
btrfs-progs 3.14.2

Fortunately this is a test system so it is dispensable. But in just an hour I ran into 5 bugs, and managed to apparently completely destroy a btrfs file system beyond repair, and it wasn't intentional. 

1. mkfs.btrfs /dev/sda6  ## volume's life starts as single device, on an SSD
2. btrfs device add /dev/sdb1 /  ## added an HDD partition
3. btrfs balance start -dconvert=raid1 -mconvert=raid1 /
4. clean shutdown, remove device 1 (leaving device 0)
5. poweron, mount degraded
6. gdm/gnome comes up very slowly, then I see a sad face graphic, with a message that there's only 60MB of space left.

# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda6        26G   13G   20M 100% /
/dev/sda6        26G   13G   20M 100% /home
/dev/sda6        26G   13G   20M 100% /var
/dev/sda6        26G   13G   20M 100% /boot

# btrfs fi df
Data, RAID1: total=6.00GiB, used=5.99GiB
System, RAID1: total=32.00MiB, used=32.00KiB
Metadata, RAID1: total=768.00MiB, used=412.41MiB
unknown, single: total=160.00MiB, used=0.00

# btrfs fi show
Label: 'Rawhide2'  uuid: f857c336-b8f5-4f5d-9500-a705ee1b6977
	Total devices 2 FS bytes used 6.39GiB
	devid    1 size 12.58GiB used 6.78GiB path /dev/sda6
	*** Some devices missing

Btrfs v3.14.2

BUG 1: The df output is clearly bogus six ways to Sunday. It's a 12.58GiB partition with only 6.78GiB used, thus 5.8GiB free, yet df and apparently gvfs think it's full. Maybe systemd does too, because the journal wigged out: it stopped logging events and kept stopping and starting. So whatever changes were made to clean up the df reporting are very problematic at best when mounting degraded.
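
A minimal sketch of the arithmetic behind that claim, using only the numbers printed above. The helper name gib_diff is made up for illustration; it just subtracts two GiB figures. The second call shows that the allocated data chunks are nearly full even though the device has plenty of unallocated space; whether that is what df is keying on is an assumption the output above does not settle.

```shell
#!/bin/sh
# gib_diff A B: print A - B with two decimals (hypothetical helper,
# not part of btrfs-progs). Inputs are plain GiB figures.
gib_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a - b }'
}

# Device size minus space allocated to chunks (from btrfs fi show).
gib_diff 12.58 6.78   # prints 5.80

# Data chunk total minus data used (from btrfs fi df).
gib_diff 6.00 5.99    # prints 0.01
```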

============so then he gets curious about replacing the missing disk==============

7. btrfs replace start 2 /dev/sdb1 /   ## this is a ~13GB partition that matches the size of the missing device

This completes, no disk activity for a little over a minute, and then I see a call trace with btrfs_replace implicated. Unfortunately the system becomes so unstable at this point, I can't even capture a dmesg to a separate volume. After 30 minutes of unresponsive local shells, I force a poweroff.

8. Power on. Dropped to a dracut shell, as the btrfs volume will not mount:
[   53.890761] rawhide kernel: BTRFS: failed to read the system array on sda6
[   53.905058] rawhide kernel: BTRFS: open_ctree failed

9. mount with -o recovery, same message

10. Reboot using vbox pointed to these partitions as raw devices so I can better capture data, and not use a degraded fs as root; the devices are sdb and sdc.

# mount -o ro /dev/sdb /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

[  216.819927] BTRFS: failed to read the system array on sdc
[  216.835570] BTRFS: open_ctree failed

So it's the same message as in dracut shell. Same message with ro,recovery.

11.  mount -o degraded,ro /dev/sdb /mnt

This works. Somehow the replace hasn't completed on some level. Very weird. And not intuitive.

[root@localhost ~]# btrfs fi show
Label: 'Rawhide2'  uuid: f857c336-b8f5-4f5d-9500-a705ee1b6977
	Total devices 2 FS bytes used 6.39GiB
	devid    0 size 12.58GiB used 6.78GiB path /dev/sdc
	devid    1 size 12.58GiB used 6.78GiB path /dev/sdb

Btrfs v3.14.2

Does not show any missing devices.  I vaguely recall in the dracut shell when booted bare metal that btrfs fi show did still show a missing device along with the original and replacement devices, i.e. the replace didn't complete. I suspect that my 'btrfs replace start 2' is wrong, that devid 2 did not exist, it was actually devid 0 and 1 like above; but the problem is that btrfs fi show does not show devid for missing devices. I only saw the devid 1 for the remaining device, and assumed the missing one was 2. So that's why I did 'btrfs replace start 2' yet I didn't get an error message. The replace started, but apparently didn't complete.

BUG 2: btrfs fi show needs to show the devid of the missing device.
BUG 3: btrfs replace start should fail when specifying a non-existent devid.
BUG 4: btrfs replace start can fail to complete (possibly related to bug 2 and 3). 
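
Until fi show prints the devid of a missing device, a script can only report which devids are present and whether the "Some devices missing" marker appears. A rough sketch, assuming the v3.14.2 output format shown earlier; list_devids, has_missing and sample_show are hypothetical helper names, not btrfs-progs commands:

```shell
#!/bin/sh
# list_devids: read `btrfs fi show` output on stdin and print the devid
# of every device line. Missing devices have no line, so they are not listed.
list_devids() {
    awk '$1 == "devid" { print $2 }'
}

# has_missing: succeed if the output carries the missing-device marker.
has_missing() {
    grep -q 'Some devices missing'
}

# sample_show: the degraded output from this report, used as test input.
sample_show() {
    cat <<'EOF'
Label: 'Rawhide2'  uuid: f857c336-b8f5-4f5d-9500-a705ee1b6977
    Total devices 2 FS bytes used 6.39GiB
    devid    1 size 12.58GiB used 6.78GiB path /dev/sda6
    *** Some devices missing
EOF
}

sample_show | list_devids          # prints the devids that are present
if sample_show | has_missing; then
    echo "a device is missing and its devid is not shown; do not guess it"
fi
```

On a live system the input would come from `btrfs fi show` instead of sample_show.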

BUG 4: When mounting with -o degraded (rw), I get a major oops resulting in a completely unresponsive system.

# mount -o degraded /dev/sdb /mnt

[   16.466995] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[   55.081687] BTRFS info (device sdb): allowing degraded mounts
[   55.082107] BTRFS info (device sdb): disk space caching is enabled
[   55.117702] SELinux: initialized (dev sdb, type btrfs), uses xattr
[   55.117717] BTRFS: continuing dev_replace from <missing disk> (devid 2) to /dev/sdc @72%
[   55.530810] BTRFS: dev_replace from <missing disk> (devid 2) to /dev/sdc) finished
[   55.532149] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
[   55.533087] IP: [<ffffffffa0268551>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
[   55.533087] PGD 0 
[   55.533087] Oops: 0000 [#1] SMP 
[   55.533087] Modules linked in: cfg80211 rfkill btrfs snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device ppdev xor raid6_pq snd_pcm microcode snd_timer serio_raw parport_pc snd i2c_piix4 parport soundcore i2c_core xfs libcrc32c virtio_net virtio_pci virtio_ring ata_generic virtio pata_acpi
[   55.533087] CPU: 2 PID: 821 Comm: btrfs-devrepl Not tainted 3.16.0-0.rc6.git0.1.fc21.1.x86_64 #1
[   55.533087] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   55.533087] task: ffff880099b5eca0 ti: ffff88009983c000 task.ti: ffff88009983c000
[   55.533087] RIP: 0010:[<ffffffffa0268551>]  [<ffffffffa0268551>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
[   55.533087] RSP: 0018:ffff88009983fe08  EFLAGS: 00010286
[   55.533087] RAX: 0000000000000000 RBX: 0000000000000000 RCX: bbb3527a6b299586
[   55.533087] RDX: ffff880036b6e410 RSI: ffff88009b4a2800 RDI: ffff880035f6cac0
[   55.533087] RBP: ffff88009983fe10 R08: ffff880036b6e410 R09: 0000000000000234
[   55.533087] R10: ffffe8ffffd01090 R11: ffffffff818675c0 R12: ffff880099a2cdc8
[   55.533087] R13: ffff88009b4a2800 R14: ffff880099eaa000 R15: ffff880036acf200
[   55.533087] FS:  0000000000000000(0000) GS:ffff88009fb00000(0000) knlGS:0000000000000000
[   55.533087] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   55.533087] CR2: 0000000000000088 CR3: 000000009aefe000 CR4: 00000000000006e0
[   55.533087] Stack:
[   55.533087]  ffff880099a2c000 ffff88009983fe90 ffffffffa02bf93d ffff880099a2c100
[   55.533087]  ffff880099a2ce38 00000006baa50000 ffffffff00000028 ffff88009983fea0
[   55.533087]  ffff88009983fe58 000000002909d417 ffff880099a2c000 000000002909d417
[   55.533087] Call Trace:
[   55.533087]  [<ffffffffa02bf93d>] btrfs_dev_replace_finishing+0x32d/0x5c0 [btrfs]
[   55.533087]  [<ffffffffa02c0130>] ? btrfs_dev_replace_status+0x110/0x110 [btrfs]
[   55.533087]  [<ffffffffa02c019d>] btrfs_dev_replace_kthread+0x6d/0x130 [btrfs]
[   55.533087]  [<ffffffff810b311a>] kthread+0xea/0x100
[   55.533087]  [<ffffffff810b3030>] ? insert_kthread_work+0x40/0x40
[   55.533087]  [<ffffffff8172253c>] ret_from_fork+0x7c/0xb0
[   55.533087]  [<ffffffff810b3030>] ? insert_kthread_work+0x40/0x40
[   55.533087] Code: 5f 5d c3 0f 1f 80 00 00 00 00 66 66 66 66 90 55 48 89 e5 53 48 8b bf f0 09 00 00 48 85 ff 74 20 31 db 48 85 f6 74 14 48 8b 46 78 <48> 8b 80 88 00 00 00 48 8b 70 38 e8 2f 23 01 e1 89 d8 5b 5d c3 
[   55.533087] RIP  [<ffffffffa0268551>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
[   55.533087]  RSP <ffff88009983fe08>
[   55.533087] CR2: 0000000000000088
[   55.533087] ---[ end trace a34670f31a1db59e ]---

[root@localhost ~]# btrfs check /dev/sdb
warning, device 2 is missing
warning devid 2 not found already
Checking filesystem on /dev/sdb
UUID: f857c336-b8f5-4f5d-9500-a705ee1b6977
checking extents
checking free space cache
Error reading 22597402624, -1
failed to load free space cache for block group 21619867648
Error reading 25839001600, -1
failed to load free space cache for block group 22693609472
free space inode generation (0) did not match free space cache generation (858)
Error reading 22597664768, -1
failed to load free space cache for block group 24841093120
Error reading 28045934592, -1
failed to load free space cache for block group 25914834944
Error reading 25849696256, -1
failed to load free space cache for block group 26988576768
Error reading 22595305472, -1
failed to load free space cache for block group 28095873024
Error reading 25688473600, -1
failed to load free space cache for block group 28364308480
checking fs roots
checking csums
checking root refs
found 1449851186 bytes used err is 0
total csum bytes: 6233932
total tree bytes: 432472064
total fs tree bytes: 415531008
total extent tree bytes: 9240576
btree space waste bytes: 68632283
file data blocks allocated: 10542505984
 referenced 8114642944
Btrfs v3.14.2

BUG 5:

# btrfs-image -c9 -t3 /dev/sdb image.bin
warning, device 2 is missing
warning devid 2 not found already
btrfs-image: disk-io.c:155: readahead_tree_block: Assertion `!(ret)' failed.
Aborted (core dumped)

Chris Murphy


* Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
  2014-07-23  1:34 BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops Chris Murphy
@ 2014-07-23  1:52 ` Chris Murphy
  2014-07-23  2:36 ` [BUG] bogus out of space reported when mounted raid1 degraded Chris Murphy
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Chris Murphy @ 2014-07-23  1:52 UTC
  To: Btrfs BTRFS

Interesting, if I remove /dev/sdc (the hdd), then this command works:

[root@localhost ~]# mount -o degraded,recovery /dev/sdb /mnt
[root@localhost ~]# btrfs replace status /mnt
72.1% done, 0 write errs, 0 uncorr. read errs
72.1% done, 0 write errs, 0 uncorr. read errs^C

## above command hangs, but cancels with control-c

[root@localhost ~]# btrfs replace cancel /mnt
[root@localhost ~]# btrfs replace status /mnt
Started on 22.Jul 16:10:37, canceled on 22.Jul 19:41:23 at 0.0%, 0 write errs, 0 uncorr. read errs
[root@localhost ~]# btrfs balance start -dconvert=single -mconvert=single /mnt -f
Done, had to relocate 10 out of 10 chunks
[root@localhost ~]# btrfs fi show
Label: 'Rawhide2'  uuid: f857c336-b8f5-4f5d-9500-a705ee1b6977
	Total devices 2 FS bytes used 6.39GiB
	devid    1 size 12.58GiB used 7.03GiB path /dev/sdb
	*** Some devices missing

Btrfs v3.14.2
[root@localhost ~]# btrfs fi df /mnt
Data, single: total=6.00GiB, used=5.99GiB
System, single: total=32.00MiB, used=32.00KiB
Metadata, single: total=1.00GiB, used=412.00MiB
unknown, single: total=160.00MiB, used=0.00
[root@localhost ~]# btrfs device delete missing /mnt
[root@localhost ~]# btrfs fi show
Label: 'Rawhide2'  uuid: f857c336-b8f5-4f5d-9500-a705ee1b6977
	Total devices 1 FS bytes used 6.39GiB
	devid    1 size 12.58GiB used 7.03GiB path /dev/sdb

Btrfs v3.14.2

So it's recovered and back to normal.
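
The sequence that worked, collected into one place as a dry-run sketch. recovery_plan is a hypothetical helper that only prints the commands from this message with the device and mount point substituted; it runs nothing. It assumes the missing device really is gone for good, since converting to single drops the raid1 redundancy.

```shell
#!/bin/sh
# recovery_plan DEV MNT: print (do not execute) the commands used above,
# in the order they were run in this message.
recovery_plan() {
    dev=$1
    mnt=$2
    cat <<EOF
mount -o degraded,recovery $dev $mnt
btrfs replace cancel $mnt
btrfs balance start -dconvert=single -mconvert=single $mnt -f
btrfs device delete missing $mnt
EOF
}

recovery_plan /dev/sdb /mnt
```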


Chris Murphy


* [BUG] bogus out of space reported when mounted raid1 degraded
  2014-07-23  1:34 BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops Chris Murphy
  2014-07-23  1:52 ` Chris Murphy
@ 2014-07-23  2:36 ` Chris Murphy
  2014-07-23  5:24   ` Duncan
  2014-07-23  2:52 ` BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops Eric Sandeen
  2014-07-23  3:01 ` Liu Bo
  3 siblings, 1 reply; 13+ messages in thread
From: Chris Murphy @ 2014-07-23  2:36 UTC
  To: Btrfs BTRFS


On Jul 22, 2014, at 7:34 PM, Chris Murphy <lists@colorremedies.com> wrote:

> BUG 1: The df command is clearly bogus six ways to Sunday. It's a 12.58 GiB partition, only 6.78GiB used, thus 5.8GiB free, yet df and apparently gvfs think it's full, maybe systemd too because the journal wigged out and stopped logging events while also kept stopping and starting. So whatever changes occurred to clean up the df reporting, are very problematic at best when mounting degraded.

Used strace on df, think I found the problem so I put it all into a bug.

https://bugzilla.kernel.org/show_bug.cgi?id=80951
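
For anyone who wants to look at the same numbers without strace: df works from the statfs fields, and GNU stat can print them directly. A minimal sketch, assuming GNU coreutils stat; avail_bytes is a name made up here, and the strace line in the comment is just one way to watch the call, not necessarily the invocation used for the bug report.

```shell
#!/bin/sh
# avail_bytes PATH: bytes available to unprivileged users according to
# statfs, i.e. fundamental block size (%S) times available blocks (%a).
avail_bytes() {
    stat -f -c '%S %a' "$1" | awk '{ printf "%.0f\n", $1 * $2 }'
}

# To watch the syscall itself, something like:
#   strace -e trace=statfs df -h /
avail_bytes /
```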


Chris Murphy


* Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
  2014-07-23  1:34 BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops Chris Murphy
  2014-07-23  1:52 ` Chris Murphy
  2014-07-23  2:36 ` [BUG] bogus out of space reported when mounted raid1 degraded Chris Murphy
@ 2014-07-23  2:52 ` Eric Sandeen
  2014-07-23  3:28   ` Chris Murphy
  2014-07-23  3:01 ` Liu Bo
  3 siblings, 1 reply; 13+ messages in thread
From: Eric Sandeen @ 2014-07-23  2:52 UTC
  To: Chris Murphy, Btrfs BTRFS

This one (your bug #4) was likely caused by:

commit 99994cde9c59c2b8bb67d46d531b26cc73e39747
Author: Anand Jain <Anand.Jain@oracle.com>
Date:   Tue Jun 3 11:36:00 2014 +0800

    btrfs: dev delete should remove sysfs entry

and hopefully fixed by:

commit 0bfaa9c5cb479cebc24979b384374fe47500b4c9
Author: Eric Sandeen <sandeen@redhat.com>
Date:   Mon Jul 7 12:34:49 2014 -0500

    btrfs: test for valid bdev before kobj removal in btrfs_rm_device

-Eric

On 7/22/14, 8:34 PM, Chris Murphy wrote:
> BUG 4: When mounting -degraded (rw), I get a major oops resulting in a completely unresponsive system.
> 
> # mount -o degraded /dev/sdb /mnt
> 
> [   16.466995] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
> [   55.081687] BTRFS info (device sdb): allowing degraded mounts
> [   55.082107] BTRFS info (device sdb): disk space caching is enabled
> [   55.117702] SELinux: initialized (dev sdb, type btrfs), uses xattr
> [   55.117717] BTRFS: continuing dev_replace from <missing disk> (devid 2) to /dev/sdc @72%
> [   55.530810] BTRFS: dev_replace from <missing disk> (devid 2) to /dev/sdc) finished
> [   55.532149] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
> [   55.533087] IP: [<ffffffffa0268551>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
> [   55.533087] PGD 0 
> [   55.533087] Oops: 0000 [#1] SMP 
> [   55.533087] Modules linked in: cfg80211 rfkill btrfs snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device ppdev xor raid6_pq snd_pcm microcode snd_timer serio_raw parport_pc snd i2c_piix4 parport soundcore i2c_core xfs libcrc32c virtio_net virtio_pci virtio_ring ata_generic virtio pata_acpi
> [   55.533087] CPU: 2 PID: 821 Comm: btrfs-devrepl Not tainted 3.16.0-0.rc6.git0.1.fc21.1.x86_64 #1
> [   55.533087] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> [   55.533087] task: ffff880099b5eca0 ti: ffff88009983c000 task.ti: ffff88009983c000
> [   55.533087] RIP: 0010:[<ffffffffa0268551>]  [<ffffffffa0268551>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
> [   55.533087] RSP: 0018:ffff88009983fe08  EFLAGS: 00010286
> [   55.533087] RAX: 0000000000000000 RBX: 0000000000000000 RCX: bbb3527a6b299586
> [   55.533087] RDX: ffff880036b6e410 RSI: ffff88009b4a2800 RDI: ffff880035f6cac0
> [   55.533087] RBP: ffff88009983fe10 R08: ffff880036b6e410 R09: 0000000000000234
> [   55.533087] R10: ffffe8ffffd01090 R11: ffffffff818675c0 R12: ffff880099a2cdc8
> [   55.533087] R13: ffff88009b4a2800 R14: ffff880099eaa000 R15: ffff880036acf200
> [   55.533087] FS:  0000000000000000(0000) GS:ffff88009fb00000(0000) knlGS:0000000000000000
> [   55.533087] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   55.533087] CR2: 0000000000000088 CR3: 000000009aefe000 CR4: 00000000000006e0
> [   55.533087] Stack:
> [   55.533087]  ffff880099a2c000 ffff88009983fe90 ffffffffa02bf93d ffff880099a2c100
> [   55.533087]  ffff880099a2ce38 00000006baa50000 ffffffff00000028 ffff88009983fea0
> [   55.533087]  ffff88009983fe58 000000002909d417 ffff880099a2c000 000000002909d417
> [   55.533087] Call Trace:
> [   55.533087]  [<ffffffffa02bf93d>] btrfs_dev_replace_finishing+0x32d/0x5c0 [btrfs]
> [   55.533087]  [<ffffffffa02c0130>] ? btrfs_dev_replace_status+0x110/0x110 [btrfs]
> [   55.533087]  [<ffffffffa02c019d>] btrfs_dev_replace_kthread+0x6d/0x130 [btrfs]
> [   55.533087]  [<ffffffff810b311a>] kthread+0xea/0x100
> [   55.533087]  [<ffffffff810b3030>] ? insert_kthread_work+0x40/0x40
> [   55.533087]  [<ffffffff8172253c>] ret_from_fork+0x7c/0xb0
> [   55.533087]  [<ffffffff810b3030>] ? insert_kthread_work+0x40/0x40
> [   55.533087] Code: 5f 5d c3 0f 1f 80 00 00 00 00 66 66 66 66 90 55 48 89 e5 53 48 8b bf f0 09 00 00 48 85 ff 74 20 31 db 48 85 f6 74 14 48 8b 46 78 <48> 8b 80 88 00 00 00 48 8b 70 38 e8 2f 23 01 e1 89 d8 5b 5d c3 
> [   55.533087] RIP  [<ffffffffa0268551>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
> [   55.533087]  RSP <ffff88009983fe08>
> [   55.533087] CR2: 0000000000000088
> [   55.533087] ---[ end trace a34670f31a1db59e ]---



* Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
  2014-07-23  1:34 BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops Chris Murphy
                   ` (2 preceding siblings ...)
  2014-07-23  2:52 ` BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops Eric Sandeen
@ 2014-07-23  3:01 ` Liu Bo
  2014-07-23  3:21   ` Chris Murphy
  3 siblings, 1 reply; 13+ messages in thread
From: Liu Bo @ 2014-07-23  3:01 UTC
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Tue, Jul 22, 2014 at 07:34:58PM -0600, Chris Murphy wrote:
> BUG 4: When mounting -degraded (rw), I get a major oops resulting in a completely unresponsive system.
> 
> # mount -o degraded /dev/sdb /mnt
> 
> [   16.466995] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
> [   55.081687] BTRFS info (device sdb): allowing degraded mounts
> [   55.082107] BTRFS info (device sdb): disk space caching is enabled
> [   55.117702] SELinux: initialized (dev sdb, type btrfs), uses xattr
> [   55.117717] BTRFS: continuing dev_replace from <missing disk> (devid 2) to /dev/sdc @72%
> [   55.530810] BTRFS: dev_replace from <missing disk> (devid 2) to /dev/sdc) finished
> [   55.532149] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
> [   55.533087] IP: [<ffffffffa0268551>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
> [   55.533087] PGD 0 
> [   55.533087] Oops: 0000 [#1] SMP 
> [   55.533087] Modules linked in: cfg80211 rfkill btrfs snd_intel8x0 snd_ac97_codec ac97_bus snd_seq snd_seq_device ppdev xor raid6_pq snd_pcm microcode snd_timer serio_raw parport_pc snd i2c_piix4 parport soundcore i2c_core xfs libcrc32c virtio_net virtio_pci virtio_ring ata_generic virtio pata_acpi
> [   55.533087] CPU: 2 PID: 821 Comm: btrfs-devrepl Not tainted 3.16.0-0.rc6.git0.1.fc21.1.x86_64 #1
> [   55.533087] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> [   55.533087] task: ffff880099b5eca0 ti: ffff88009983c000 task.ti: ffff88009983c000
> [   55.533087] RIP: 0010:[<ffffffffa0268551>]  [<ffffffffa0268551>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
> [   55.533087] RSP: 0018:ffff88009983fe08  EFLAGS: 00010286
> [   55.533087] RAX: 0000000000000000 RBX: 0000000000000000 RCX: bbb3527a6b299586
> [   55.533087] RDX: ffff880036b6e410 RSI: ffff88009b4a2800 RDI: ffff880035f6cac0
> [   55.533087] RBP: ffff88009983fe10 R08: ffff880036b6e410 R09: 0000000000000234
> [   55.533087] R10: ffffe8ffffd01090 R11: ffffffff818675c0 R12: ffff880099a2cdc8
> [   55.533087] R13: ffff88009b4a2800 R14: ffff880099eaa000 R15: ffff880036acf200
> [   55.533087] FS:  0000000000000000(0000) GS:ffff88009fb00000(0000) knlGS:0000000000000000
> [   55.533087] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   55.533087] CR2: 0000000000000088 CR3: 000000009aefe000 CR4: 00000000000006e0
> [   55.533087] Stack:
> [   55.533087]  ffff880099a2c000 ffff88009983fe90 ffffffffa02bf93d ffff880099a2c100
> [   55.533087]  ffff880099a2ce38 00000006baa50000 ffffffff00000028 ffff88009983fea0
> [   55.533087]  ffff88009983fe58 000000002909d417 ffff880099a2c000 000000002909d417
> [   55.533087] Call Trace:
> [   55.533087]  [<ffffffffa02bf93d>] btrfs_dev_replace_finishing+0x32d/0x5c0 [btrfs]
> [   55.533087]  [<ffffffffa02c0130>] ? btrfs_dev_replace_status+0x110/0x110 [btrfs]
> [   55.533087]  [<ffffffffa02c019d>] btrfs_dev_replace_kthread+0x6d/0x130 [btrfs]
> [   55.533087]  [<ffffffff810b311a>] kthread+0xea/0x100
> [   55.533087]  [<ffffffff810b3030>] ? insert_kthread_work+0x40/0x40
> [   55.533087]  [<ffffffff8172253c>] ret_from_fork+0x7c/0xb0
> [   55.533087]  [<ffffffff810b3030>] ? insert_kthread_work+0x40/0x40
> [   55.533087] Code: 5f 5d c3 0f 1f 80 00 00 00 00 66 66 66 66 90 55 48 89 e5 53 48 8b bf f0 09 00 00 48 85 ff 74 20 31 db 48 85 f6 74 14 48 8b 46 78 <48> 8b 80 88 00 00 00 48 8b 70 38 e8 2f 23 01 e1 89 d8 5b 5d c3 
> [   55.533087] RIP  [<ffffffffa0268551>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
> [   55.533087]  RSP <ffff88009983fe08>
> [   55.533087] CR2: 0000000000000088
> [   55.533087] ---[ end trace a34670f31a1db59e ]---

Looks like src_dev->bdev is NULL, and btrfs_kobj_rm_device() gets there.

thanks,
-liubo



* Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
  2014-07-23  3:01 ` Liu Bo
@ 2014-07-23  3:21   ` Chris Murphy
  0 siblings, 0 replies; 13+ messages in thread
From: Chris Murphy @ 2014-07-23  3:21 UTC
  To: bo.li.liu, Eric Sandeen; +Cc: Btrfs BTRFS


On Jul 22, 2014, at 9:01 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
>> 
>> ============so then he gets curious about replacing the missing disk==============
>> 
>> 
>> 7. btrfs replace start 2 /dev/sdb1 /   ## this is a ~13GB partition that matches the size of the missing device
>> 
>> This completes, no disk activity for a little over a minute, and then I see a call trace with btrfs_replace implicated. Unfortunately the system becomes so unstable at this point, I can't even capture a dmesg to a separate volume. After 30 minutes of unresponsive local shells, I force a poweroff.

OK, I've reproduced the original oops, the one that causes the problem during device replace. The command above is correct; it is devid 2. Here's the trace that happens during rebuild. It's only slightly different from the -o rw,degraded trace. What I note is that it reports the device replace as finished, yet it barfs at that same moment, probably before it finishes writing whatever is needed so that subsequent mounts can be done normally rather than with -o degraded.



[  423.512988] BTRFS: dev_replace from <missing disk> (devid 2) to /dev/sdb1 started
[  651.671835] BTRFS: dev_replace from <missing disk> (devid 2) to /dev/sdb1) finished
[  651.672485] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
[  651.673144] IP: [<ffffffffa03da551>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
[  651.673834] PGD 8723b067 PUD 8723c067 PMD 0 
[  651.674512] Oops: 0000 [#1] SMP 
[  651.675184] Modules linked in: ccm xt_CHECKSUM ipt_MASQUERADE ip6t_rpfilter ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep nls_utf8 hfsplus arc4 b43 mac80211 x86_pkg_temp_thermal coretemp kvm_intel cfg80211 uvcvideo kvm ssb videobuf2_vmalloc iTCO_wdt crct10dif_pclmul videobuf2_memops videobuf2_core iTCO_vendor_support crc32_pclmul v4l2_common crc32c_intel videodev btusb ghash_clmulni_intel applesmc sdhci_pci input_polldev bluetooth media sdhci hid_appleir microcode bcm5974 rfkill mmc_core i2c_i801 bcma
[  651.677785]  snd_hda_codec_cirrus lpc_ich snd_hda_codec_generic mfd_core snd_hda_codec_hdmi sbs sbshc snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm mei_me snd_timer apple_gmux snd mei apple_bl shpchp soundcore firewire_sbp2 btrfs xor raid6_pq i915 ttm i2c_algo_bit drm_kms_helper tg3 drm firewire_ohci ptp firewire_core pps_core i2c_core crc_itu_t video
[  651.680756] CPU: 0 PID: 1443 Comm: btrfs Not tainted 3.16.0-0.rc6.git0.1.fc21.1.x86_64 #1
[  651.681816] Hardware name: Apple Inc. MacBookPro8,2/Mac-94245A3940C91C80, BIOS    MBP81.88Z.0047.B27.1201241646 01/24/12
[  651.682913] task: ffff8802546b62c0 ti: ffff880087254000 task.ti: ffff880087254000
[  651.684030] RIP: 0010:[<ffffffffa03da551>]  [<ffffffffa03da551>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
[  651.685190] RSP: 0018:ffff880087257c80  EFLAGS: 00010286
[  651.686346] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dfc8a37487c2b3b9
[  651.687517] RDX: ffff88026061f810 RSI: ffff88026061ce00 RDI: ffff88026130d0c0
[  651.688705] RBP: ffff880087257c88 R08: ffff88026061f810 R09: 000000000000052e
[  651.689881] R10: ffff88026fa1cdc0 R11: 0000000000000001 R12: ffff88025f981dc8
[  651.691059] R13: ffff88026061ce00 R14: ffff88026174d800 R15: ffff880262b31800
[  651.692239] FS:  00007f5b0225f880(0000) GS:ffff88026fa00000(0000) knlGS:0000000000000000
[  651.693439] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  651.694638] CR2: 0000000000000088 CR3: 000000003f2b5000 CR4: 00000000000407f0
[  651.695850] Stack:
[  651.697053]  ffff88025f981000 ffff880087257d08 ffffffffa043193d ffff88025f981100
[  651.698301]  ffff88025f981e38 0000000a3ea50000 00ff880200000000 ffff8802546b62c0
[  651.699556]  ffffffff810d7fa0 ffff880087257cc8 ffff880087257cc8 00000000547e2838
[  651.700824] Call Trace:
[  651.702099]  [<ffffffffa043193d>] btrfs_dev_replace_finishing+0x32d/0x5c0 [btrfs]
[  651.703397]  [<ffffffff810d7fa0>] ? abort_exclusive_wait+0xb0/0xb0
[  651.704714]  [<ffffffffa0431f52>] btrfs_dev_replace_start+0x382/0x450 [btrfs]
[  651.706048]  [<ffffffffa03faa8a>] btrfs_ioctl+0x1caa/0x28f0 [btrfs]
[  651.707379]  [<ffffffff811b4be6>] ? handle_mm_fault+0x8d6/0xfd0
[  651.708711]  [<ffffffff8105be2c>] ? __do_page_fault+0x29c/0x580
[  651.710038]  [<ffffffff81203187>] ? cp_new_stat+0x157/0x190
[  651.711361]  [<ffffffff81212100>] do_vfs_ioctl+0x2d0/0x4b0
[  651.712683]  [<ffffffff81212361>] SyS_ioctl+0x81/0xa0
[  651.714007]  [<ffffffff817225e9>] system_call_fastpath+0x16/0x1b
[  651.715332] Code: 5f 5d c3 0f 1f 80 00 00 00 00 66 66 66 66 90 55 48 89 e5 53 48 8b bf f0 09 00 00 48 85 ff 74 20 31 db 48 85 f6 74 14 48 8b 46 78 <48> 8b 80 88 00 00 00 48 8b 70 38 e8 2f 03 ea e0 89 d8 5b 5d c3 
[  651.718262] RIP  [<ffffffffa03da551>] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
[  651.719725]  RSP <ffff880087257c80>
[  651.721180] CR2: 0000000000000088
[  651.722708] ---[ end trace 70672604d3ea5888 ]---


Chris Murphy



* Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
  2014-07-23  2:52 ` BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops Eric Sandeen
@ 2014-07-23  3:28   ` Chris Murphy
  2014-07-23  3:36     ` Liu Bo
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Murphy @ 2014-07-23  3:28 UTC (permalink / raw)
  To: Btrfs BTRFS; +Cc: bo.li.liu@oracle.com, Eric Sandeen


On Jul 22, 2014, at 8:52 PM, Eric Sandeen <sandeen@redhat.com> wrote:

> This one (your bug #4) was likely caused by:
> 
> commit 99994cde9c59c2b8bb67d46d531b26cc73e39747
> Author: Anand Jain <Anand.Jain@oracle.com>
> Date:   Tue Jun 3 11:36:00 2014 +0800
> 
>    btrfs: dev delete should remove sysfs entry
> 
> and hopefully fixed by:
> 
> commit 0bfaa9c5cb479cebc24979b384374fe47500b4c9
> Author: Eric Sandeen <sandeen@redhat.com>
> Date:   Mon Jul 7 12:34:49 2014 -0500
> 
>    btrfs: test for valid bdev before kobj removal in btrfs_rm_device

OK, good. Hopefully the first one is reverted or the second one is accepted before 3.16 is released; replace appears to be broken at the moment.


Chris Murphy


* Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
  2014-07-23  3:28   ` Chris Murphy
@ 2014-07-23  3:36     ` Liu Bo
  2014-07-23  4:03       ` Chris Murphy
  2014-07-23  4:16       ` Eric Sandeen
  0 siblings, 2 replies; 13+ messages in thread
From: Liu Bo @ 2014-07-23  3:36 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS, Eric Sandeen

On Tue, Jul 22, 2014 at 09:28:52PM -0600, Chris Murphy wrote:
> 
> On Jul 22, 2014, at 8:52 PM, Eric Sandeen <sandeen@redhat.com> wrote:
> 
> > This one (your bug #4) was likely caused by:
> > 
> > commit 99994cde9c59c2b8bb67d46d531b26cc73e39747
> > Author: Anand Jain <Anand.Jain@oracle.com>
> > Date:   Tue Jun 3 11:36:00 2014 +0800
> > 
> >    btrfs: dev delete should remove sysfs entry
> > 
> > and hopefully fixed by:
> > 
> > commit 0bfaa9c5cb479cebc24979b384374fe47500b4c9
> > Author: Eric Sandeen <sandeen@redhat.com>
> > Date:   Mon Jul 7 12:34:49 2014 -0500
> > 
> >    btrfs: test for valid bdev before kobj removal in btrfs_rm_device
> 
> OK good. Hopefully the first one is reverted or the second one is accepted before 3.16 is released, replace appears to be broken at the moment.
> 

It looks like they are not the same bug, since you didn't go through btrfs_rm_device.

Since we skip adding a sysfs entry for a missing device (dev->bdev is NULL), we
can do the same when removing a sysfs entry. Could you please try this?

-liubo

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 7869936..12e5355 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -614,7 +614,7 @@ int btrfs_kobj_rm_device(struct btrfs_fs_info *fs_info,
 	if (!fs_info->device_dir_kobj)
 		return -EINVAL;
 
-	if (one_device) {
+	if (one_device && one_device->bdev) {
 		disk = one_device->bdev->bd_part;
 		disk_kobj = &part_to_dev(disk)->kobj;



* Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
  2014-07-23  3:36     ` Liu Bo
@ 2014-07-23  4:03       ` Chris Murphy
  2014-07-23  4:16       ` Eric Sandeen
  1 sibling, 0 replies; 13+ messages in thread
From: Chris Murphy @ 2014-07-23  4:03 UTC (permalink / raw)
  To: Btrfs BTRFS; +Cc: Eric Sandeen, bo.li.liu@oracle.com


On Jul 22, 2014, at 9:36 PM, Liu Bo <bo.li.liu@oracle.com> wrote:

> On Tue, Jul 22, 2014 at 09:28:52PM -0600, Chris Murphy wrote:
>> 
>> On Jul 22, 2014, at 8:52 PM, Eric Sandeen <sandeen@redhat.com> wrote:
>> 
>>> This one (your bug #4) was likely caused by:
>>> 
>>> commit 99994cde9c59c2b8bb67d46d531b26cc73e39747
>>> Author: Anand Jain <Anand.Jain@oracle.com>
>>> Date:   Tue Jun 3 11:36:00 2014 +0800
>>> 
>>>   btrfs: dev delete should remove sysfs entry
>>> 
>>> and hopefully fixed by:
>>> 
>>> commit 0bfaa9c5cb479cebc24979b384374fe47500b4c9
>>> Author: Eric Sandeen <sandeen@redhat.com>
>>> Date:   Mon Jul 7 12:34:49 2014 -0500
>>> 
>>>   btrfs: test for valid bdev before kobj removal in btrfs_rm_device
>> 
>> OK good. Hopefully the first one is reverted or the second one is accepted before 3.16 is released, replace appears to be broken at the moment.
>> 
> 
> Looks that they are not the same one, since you didn't use a btrfs_rm_device,
> 
> As we just skip adding a sysfs entry for a missing device(dev->bdev is NULL), we
> can do the same thing in removing a sysfs entry, could you please try this?

Normally yes, but not for a couple weeks this time. 

While replace cancel worked, and balance conversion back to single profile worked, I forgot to run btrfs device delete missing right away, and rebooted instead. Now I can't mount degraded, and I run into this old bug:

[   71.064352] BTRFS info (device sdb): allowing degraded mounts
[   71.064812] BTRFS info (device sdb): enabling auto recovery
[   71.065210] BTRFS info (device sdb): disk space caching is enabled
[   71.072068] BTRFS warning (device sdb): devid 2 missing
[   71.097320] BTRFS: too many missing devices, writeable mount is not allowed
[   71.116616] BTRFS: open_ctree failed

Since I can't mount degraded rw, I can't make read-only snapshots and can't btrfs send/receive the subvolumes, so this setup will need to be replaced. Not a big deal, just time, but maybe someone else can test it sooner than me.


Chris Murphy


* Re: BUGS: bogus out of space reported when mounted raid1 degraded, btrfs replace failure, then oops
  2014-07-23  3:36     ` Liu Bo
  2014-07-23  4:03       ` Chris Murphy
@ 2014-07-23  4:16       ` Eric Sandeen
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Sandeen @ 2014-07-23  4:16 UTC (permalink / raw)
  To: bo.li.liu, Chris Murphy; +Cc: Btrfs BTRFS

On 7/22/14, 10:36 PM, Liu Bo wrote:
> On Tue, Jul 22, 2014 at 09:28:52PM -0600, Chris Murphy wrote:
>>
>> On Jul 22, 2014, at 8:52 PM, Eric Sandeen <sandeen@redhat.com> wrote:
>>
>>> This one (your bug #4) was likely caused by:
>>>
>>> commit 99994cde9c59c2b8bb67d46d531b26cc73e39747
>>> Author: Anand Jain <Anand.Jain@oracle.com>
>>> Date:   Tue Jun 3 11:36:00 2014 +0800
>>>
>>>    btrfs: dev delete should remove sysfs entry
>>>
>>> and hopefully fixed by:
>>>
>>> commit 0bfaa9c5cb479cebc24979b384374fe47500b4c9
>>> Author: Eric Sandeen <sandeen@redhat.com>
>>> Date:   Mon Jul 7 12:34:49 2014 -0500
>>>
>>>    btrfs: test for valid bdev before kobj removal in btrfs_rm_device
>>
>> OK good. Hopefully the first one is reverted or the second one is accepted before 3.16 is released, replace appears to be broken at the moment.
>>
> 
> Looks that they are not the same one, since you didn't use a btrfs_rm_device,

Oh, you're right - I'm sorry, I didn't look closely enough.

-Eric



* Re: [BUG] bogus out of space reported when mounted raid1 degraded
  2014-07-23  2:36 ` [BUG] bogus out of space reported when mounted raid1 degraded Chris Murphy
@ 2014-07-23  5:24   ` Duncan
  2014-07-24  1:13     ` Chris Murphy
  0 siblings, 1 reply; 13+ messages in thread
From: Duncan @ 2014-07-23  5:24 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Tue, 22 Jul 2014 20:36:55 -0600 as excerpted:

> On Jul 22, 2014, at 7:34 PM, Chris Murphy <lists@colorremedies.com>
> wrote:
> 
>> BUG 1: The df command is clearly bogus six ways to Sunday. It's a 12.58
>> GiB partition, only 6.78GiB used, thus 5.8GiB free, yet df and
>> apparently gvfs think it's full, maybe systemd too because the journal
>> wigged out and stopped logging events while also kept stopping and
>> starting. So whatever changes occurred to clean up the df reporting,
>> are very problematic at best when mounting degraded.
> 
> Used strace on df, think I found the problem so I put it all into a bug.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=80951

Suggestion for improved bug summary/title:

Current:  df reports bogus filesystem usage when mounted degraded

Problems: While it's filed under product File System and component btrfs...
1) the summary doesn't mention btrfs, and 
2) there's some ambiguity as to whether it's normal or btrfs df.

Proposed improved summaries/titles:

#1: (Non-btrfs) df reports bogus btrfs usage with degraded mount.

#2: With degraded btrfs mount, (non-btrfs) df reports bogus usage.

Meanwhile, to the problem at hand...

There are two root issues here:

The first is a variant of something already discussed in the FAQ and 
reasonably well known on the list: (non-btrfs) df is simply not accurate 
in many cases on a multi-device btrfs, because a multi-device btrfs 
breaks all the old rules and assumptions upon which it bases its 
reporting.  There has been some debate about how it should work, but the 
basic problem is that there's no way to present all the information 
necessary to get a proper picture of the situation while continuing to 
keep output format backward compatibility in order to prevent breaking 
the various scripts etc that depend on the existing format.

The best way forward seems to be some sort of at best half-broken 
compromise regarding legacy df output, maintaining backward output format 
compatibility and at least not breaking too badly in the legacy-
assumption single-device filesystem case, but not really working so well 
in all the various multi-device btrfs cases, because the output format is 
simply too constrained to present the necessary information properly.  
With some work, it should be possible to make at least the most common 
multi-device btrfs cases not /entirely/ broken as well, altho the old 
assumptions constrain output format such that there will always be corner-
cases that don't present well -- for these legacy df is just that, 
legacy, and a more appropriate tool is needed.

And a two-device btrfs raid1 mounted degraded with one device missing is 
just such a corner-case, at least presently.  Given the second root issue 
below, however, IMO the existing presentation was as accurate as could be 
expected under the circumstances.

The second half of the solution (still to root issue #1), then, is 
providing a more appropriate btrfs specific tool free of these legacy 
assumptions and output format constraints.  Currently, the solution there 
actually ships as two different reports which must be taken together to 
get a proper picture of the situation, currently with some additional 
interpretation required as well.  Of course I'm talking about btrfs 
filesystem show along with btrfs filesystem df.  

The biggest catch here is that "additional interpretation required" bit.  
There's a bit of it required in normal operation, but for the degraded-
mount case knowledge of root-issue #2 below is required for proper 
interpretation as well.

Which brings us to root-issue #2:

With btrfs raid1 the chunk-allocator policy forces allocation in pairs, 
with each chunk of the pair forced to a different device.

Since the btrfs in question is raid1 (both data and metadata) with two 
devices when undegraded, loss of a single device and degraded-mount means 
the above chunk allocation policy cannot succeed as there's no second 
device available to write the mirror-chunk to.

Note that the situation with a two-device raid1 but with one missing is 
rather different than with a three-device raid1 with one missing, as in 
the latter case and assuming there's still unallocated space left on all 
devices, a pair-chunk-allocation could still succeed, since it could 
still allocate one chunk-mirror on each of the two remaining devices.

The critical bit to understand here is that (AFAIK), degraded-mount does 
*NOT* trigger a chunk-allocation-policy waiver, which means that with a 
two-device btrfs raid1 with a device-missing, no additional chunks can be 
allocated as the pair-chunks-at-a-time-allocated-on-different-devices 
policy cannot be filled.

(Pardon my yelling, but this is the critical bit...)

** ON BTRFS RAID1, TWO DEVICES MUST BE PRESENT IN ORDER TO ALLOCATE NEW 
CHUNKS.  MOUNTING DEGRADED WITH A SINGLE DEVICE MEANS NO NEW CHUNK 
ALLOCATION, WHICH MEANS YOU'RE LIMITED TO FILLING UP EXISTING CHUNKS **
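
To make the policy concrete, here is a toy model in Python. It is only a sketch of the rule as described above, not the kernel's allocator: the function name, the 1 GiB chunk size, and the per-device unallocated figures (12.58 GiB size minus 6.78 GiB used, from btrfs fi show) are assumptions for illustration.

```python
GIB = 1024 ** 3


def can_allocate_chunk(unallocated_per_device, copies, chunk_size=GIB):
    """Toy rule: each copy of a chunk must land on a different present device."""
    # Only devices that are present and have room for one chunk qualify.
    candidates = [free for free in unallocated_per_device if free >= chunk_size]
    return len(candidates) >= copies


# Unallocated space on each *present* device (assumed roughly 5.8 GiB).
two_device_raid1_degraded = [int(5.8 * GIB)]
three_device_raid1_degraded = [int(5.8 * GIB), int(5.8 * GIB)]

# raid1 needs two copies on two different devices.
print(can_allocate_chunk(two_device_raid1_degraded, copies=2))
print(can_allocate_chunk(three_device_raid1_degraded, copies=2))

# After converting to the single profile, one device is enough.
print(can_allocate_chunk(two_device_raid1_degraded, copies=1))
```

In this model the two-device raid1 with one device missing cannot allocate, the three-device raid1 with one missing still can, and the single profile can allocate from the one remaining device.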

Conclusions in light of the above, particularly root-issue #2:

Let's take another look at your btrfs fi df and btrfs fi show output from 
earlier in the thread:

>> # btrfs fi df
>> Data, RAID1: total=6.00GiB, used=5.99GiB
>> System, RAID1: total=32.00MiB, used=32.00KiB
>> Metadata, RAID1: total=768.00MiB, used=412.41MiB
>> unknown, single: total=160.00MiB, used=0.00
>> 
>> # btrfs fi show
>> Label: 'Rawhide2'  uuid: f857c336-b8f5-4f5d-9500-a705ee1b6977
>> 	Total devices 2 FS bytes used 6.39GiB
>> 	devid    1 size 12.58GiB used 6.78GiB path /dev/sda6
>> 	*** Some devices missing

Facts:

1) btrfs fi show says only a single device, tho it does have nearly 6 GiB 
of unallocated space left.

2) btrfs fi df says data is raid1, 5.99 GiB used, 6.00 GiB allocated.

3) 0.01 GiB free in the existing data allocation.

4) Can't allocate more since there's only a single device and the
raid1 data allocation policy requires two devices.

See the problem?

Under those circumstances, your (non-btrfs) df output...

>> # df -h
>> Filesystem      Size  Used Avail Use% Mounted on
>> /dev/sda6        26G   13G   20M 100% /

... is as accurate as could be expected under the definitely NOT routine, 
definitely rather corner-case due to degraded-mount, no further chunk 
allocation possible with current raid1 policy, circumstances.

Indeed, 20 M available is perhaps a bit more than the 0.01 GiB btrfs fi df 
is indicating, altho rounded to hundredths of a GiB, that's within 
acceptable rounding error.

Of course as reported in a different followup and as might be expected, 
with a rebalance -dconvert=single -mconvert=single, that pair-allocation-
policy gets converted to a single allocation policy, and you can again 
allocate additional chunks from that available nearly 6 GiB unallocated 
according to btrfs fi show.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: [BUG] bogus out of space reported when mounted raid1 degraded
  2014-07-23  5:24   ` Duncan
@ 2014-07-24  1:13     ` Chris Murphy
  2014-07-24  8:15       ` Duncan
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Murphy @ 2014-07-24  1:13 UTC (permalink / raw)
  To: Btrfs BTRFS


On Jul 22, 2014, at 11:24 PM, Duncan <1i5t5.duncan@cox.net> wrote:

> Chris Murphy posted on Tue, 22 Jul 2014 20:36:55 -0600 as excerpted:
>> 
>> Used strace on df, think I found the problem so I put it all into a bug.
>> 
>> https://bugzilla.kernel.org/show_bug.cgi?id=80951
> 
> Suggestion for improved bug summary/title:

So as it turns out this is not a bug, but ultimately a bigger problem.

When a btrfs volume is mounted degraded, df's "Available" appears to be showing free blocks in already allocated chunks, multiplied by 2 in the case of raid1.

> ** ON BTRFS RAID1, TWO DEVICES MUST BE PRESENT IN ORDER TO ALLOCATE NEW 
> CHUNKS.  MOUNTING DEGRADED WITH A SINGLE DEVICE MEANS NO NEW CHUNK 
> ALLOCATION, WHICH MEANS YOU'RE LIMITED TO FILLING UP EXISTING CHUNKS **

I can confirm this behavior (see below).

## mounted degraded, a 13GB btrfs raid1 volume

# df -h
/dev/sdb                  26G   11G  6.1M 100% /mnt/braid
# btrfs fi df /mnt/braid
Data, RAID1: total=5.00GiB, used=5.00GiB
System, RAID1: total=8.00MiB, used=64.00KiB
Metadata, RAID1: total=1.00GiB, used=7.00MiB
unknown, single: total=64.00MiB, used=0.00

cp a 3.0M file to /mnt/braid
cp: error writing ‘/mnt/braid/four/cmurp_120916_124629.jpg’: No space left on device
cp: failed to extend ‘/mnt/braid/four/cmurp_120916_124629.jpg’: No space left on device

Next steps are:
- reconnect missing device and reboot
- mount btrfs volume normally
- retry the 3MB file copy, which works
- umount, disconnect device and reboot
- mount -o degraded

# df
/dev/sdb                 26382984 10519488   2077888  84% /mnt/braid

# df -h
/dev/sdb                  26G   11G  2.0G  84% /mnt/braid

# btrfs fi df /mnt/braid
Data, RAID1: total=6.00GiB, used=5.01GiB
System, RAID1: total=8.00MiB, used=64.00KiB
Metadata, RAID1: total=1.00GiB, used=7.00MiB
unknown, single: total=64.00MiB, used=0.00

Another 1GiB chunk has been allocated, and it is now possible to write while degraded again, up to 1 GiB of data.
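
As a rough check of the "free space in allocated data chunks, multiplied by 2" reading of df's Available column, here is a small Python sketch using only figures quoted in this thread. The helper name is made up for illustration, and btrfs fi df rounds to two decimals, so agreement can only be approximate.

```python
def estimated_avail_gib(data_total_gib, data_used_gib, copies=2):
    """Free space inside already-allocated data chunks, times the raid1 factor."""
    return (data_total_gib - data_used_gib) * copies


# Original report: Data, RAID1: total=6.00GiB, used=5.99GiB; df -h showed 20M.
first_estimate_mib = estimated_avail_gib(6.00, 5.99) * 1024

# After the extra chunk: Data, RAID1: total=6.00GiB, used=5.01GiB;
# plain df showed 2077888 available 1K blocks.
second_estimate_gib = estimated_avail_gib(6.00, 5.01)
second_df_gib = 2077888 / 1024 ** 2

print(round(first_estimate_mib, 1), "MiB estimated vs 20M reported")
print(round(second_estimate_gib, 2), "GiB estimated vs",
      round(second_df_gib, 2), "GiB reported")
```

Both samples land within rounding of what df printed. The intermediate sample (total=5.00GiB, used=5.00GiB, df showing 6.1M) cannot be checked this way because the difference is lost in the two-decimal rounding.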



Chris Murphy


* Re: [BUG] bogus out of space reported when mounted raid1 degraded
  2014-07-24  1:13     ` Chris Murphy
@ 2014-07-24  8:15       ` Duncan
  0 siblings, 0 replies; 13+ messages in thread
From: Duncan @ 2014-07-24  8:15 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Wed, 23 Jul 2014 19:13:10 -0600 as excerpted:

> On Jul 22, 2014, at 11:24 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> 
>> ** ON BTRFS RAID1, TWO DEVICES MUST BE PRESENT IN ORDER TO ALLOCATE
>> NEW CHUNKS.  MOUNTING DEGRADED WITH A SINGLE DEVICE MEANS NO NEW CHUNK
>> ALLOCATION, WHICH MEANS YOU'RE LIMITED TO FILLING UP EXISTING CHUNKS **
> 
> I can confirm this behavior (see below).

And...

If I'm reading it correctly, patch 9/10 in Miao Xie's new 10-part patch 
series should fix that.

From: Miao Xie <miaox@cn.fujitsu.com>
Subject: [PATCH 09/10] Btrfs: don't consider the missing device when
 allocating new chunks
Date: Thu, 24 Jul 2014 11:37:14 +0800
Message-ID: <1406173035-29478-9-git-send-email-miaox@cn.fujitsu.com>

While I don't claim to be a dev, based on the comments and my reading of 
the patch (and assuming there's no other location blocking it that needs 
patching as well), that should allow new, effectively single-mode chunks 
to be allocated when a btrfs multi-device raid1 mode filesystem is 
mounted degraded with just a single device.  That should allow normal 
writes to continue altho in single mode, and a balance with 
-dconvert=raid1 -mconvert=raid1 can be used later to upgrade back to 
raid1 after a second device is added back in.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
