* degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
@ 2016-11-16 10:25 Martin Steigerwald
2016-11-16 10:43 ` Roman Mamedov
0 siblings, 1 reply; 17+ messages in thread
From: Martin Steigerwald @ 2016-11-16 10:25 UTC (permalink / raw)
To: linux-btrfs; +Cc: Martin
Hello!
A degraded BTRFS RAID 1 on one 3TB SATA HDD from my former workstation is not mountable.
This is with a Debian 4.8 kernel and btrfs-progs 4.7.3.
A btrfs restore seems to work well enough, so on the one hand there is no
urgency. On the other hand I want to repurpose the hard disk, and I plan
to do that next weekend. So if you want me to gather some debug data,
please speak up quickly. Thank you.
AFAIR I have been able to mount the filesystems in degraded mode, but
that may have been on the other SATA HDD, which I already wiped with the
shred command.
I have this:
merkaba:~> btrfs fi sh
[…]
warning, device 2 is missing
warning, device 2 is missing
warning, device 2 is missing
Label: 'debian' uuid: […]
Total devices 2 FS bytes used 20.10GiB
devid 1 size 50.00GiB used 29.03GiB path /dev/mapper/satafp1-debian
*** Some devices missing
Label: 'daten' uuid: […]
Total devices 2 FS bytes used 135.02GiB
devid 1 size 1.00TiB used 142.06GiB path /dev/mapper/satafp1-daten
*** Some devices missing
Label: 'backup' uuid: […]
Total devices 2 FS bytes used 88.38GiB
devid 1 size 1.00TiB used 93.06GiB path /dev/mapper/satafp1-backup
*** Some devices missing
But none of these filesystems seem to be mountable. Here some attempts:
merkaba:~#130> LANG=C mount -o degraded /dev/satafp1/backup /mnt/zeit
mount: wrong fs type, bad option, bad superblock on /dev/mapper/satafp1-backup,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
merkaba:~> dmesg | tail -5
[ 2945.155943] BTRFS info (device dm-13): allowing degraded mounts
[ 2945.155953] BTRFS info (device dm-13): disk space caching is enabled
[ 2945.155957] BTRFS info (device dm-13): has skinny extents
[ 2945.611236] BTRFS warning (device dm-13): missing devices (1) exceeds the limit (0), writeable mount is not allowed
[ 2945.646719] BTRFS: open_ctree failed
merkaba:~> LANG=C mount -o usebackuproot /dev/satafp1/daten /mnt/zeit
mount: wrong fs type, bad option, bad superblock on /dev/mapper/satafp1-daten,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
merkaba:~#32> dmesg | tail -5
[ 5739.051433] BTRFS info (device dm-12): trying to use backup root at mount time
[ 5739.051441] BTRFS info (device dm-12): disk space caching is enabled
[ 5739.051444] BTRFS info (device dm-12): has skinny extents
[ 5739.103153] BTRFS error (device dm-12): failed to read chunk tree: -5
[ 5739.130304] BTRFS: open_ctree failed
merkaba:~> LANG=C mount -o degraded,usebackuproot /dev/satafp1/daten /mnt/zeit
mount: wrong fs type, bad option, bad superblock on /dev/mapper/satafp1-daten,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
merkaba:~#32> dmesg | tail -5
[ 5801.704202] BTRFS info (device dm-12): trying to use backup root at mount time
[ 5801.704206] BTRFS info (device dm-12): disk space caching is enabled
[ 5801.704208] BTRFS info (device dm-12): has skinny extents
[ 5803.928059] BTRFS warning (device dm-12): missing devices (1) exceeds the limit (0), writeable mount is not allowed
[ 5804.064638] BTRFS: open_ctree failed
`btrfs check` reports:
merkaba:~#32> btrfs check /dev/satafp1/backup
warning, device 2 is missing
Checking filesystem on /dev/satafp1/backup
UUID: 01cf0493-476f-42e8-8905-61ef205313db
checking extents
checking free space cache
failed to load free space cache for block group 58003030016
failed to load free space cache for block group 60150513664
failed to load free space cache for block group 62297997312
[…]
checking fs roots
^C
I aborted it at this point, as I wanted to try the clear_cache mount option
after seeing this. I can redo this once btrfs restore has completed.
merkaba:~> mount -o degraded,clear_cache /dev/satafp1/backup /mnt/zeit
mount: wrong fs type, bad option, bad superblock on /dev/mapper/satafp1-backup,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
merkaba:~#32> dmesg | tail -6
[ 3080.120687] BTRFS info (device dm-13): allowing degraded mounts
[ 3080.120699] BTRFS info (device dm-13): force clearing of disk cache
[ 3080.120703] BTRFS info (device dm-13): disk space caching is enabled
[ 3080.120706] BTRFS info (device dm-13): has skinny extents
[ 3080.150957] BTRFS warning (device dm-13): missing devices (1) exceeds the limit (0), writeable mount is not allowed
[ 3080.195941] BTRFS: open_ctree failed
merkaba:~> mount -o degraded,clear_cache,usebackuproot /dev/satafp1/backup /mnt/zeit
mount: wrong fs type, bad option, bad superblock on /dev/mapper/satafp1-backup,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
merkaba:~> dmesg | tail -7
[ 3173.784713] BTRFS info (device dm-13): allowing degraded mounts
[ 3173.784728] BTRFS info (device dm-13): force clearing of disk cache
[ 3173.784737] BTRFS info (device dm-13): trying to use backup root at mount time
[ 3173.784742] BTRFS info (device dm-13): disk space caching is enabled
[ 3173.784746] BTRFS info (device dm-13): has skinny extents
[ 3173.816983] BTRFS warning (device dm-13): missing devices (1) exceeds the limit (0), writeable mount is not allowed
[ 3173.865199] BTRFS: open_ctree failed
I aborted repairing after this assert:
merkaba:~#130> btrfs check --repair /dev/satafp1/backup &| stdbuf -oL tee btrfs-check-repair-satafp1-backup.log
enabling repair mode
warning, device 2 is missing
Checking filesystem on /dev/satafp1/backup
UUID: 01cf0493-476f-42e8-8905-61ef205313db
checking extents
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.
btrfs[0x43e418]
btrfs(btrfs_reserve_extent+0x5c9)[0x4425df]
btrfs(btrfs_alloc_free_block+0x63)[0x44297c]
btrfs(__btrfs_cow_block+0xfc)[0x436636]
btrfs(btrfs_cow_block+0x8b)[0x436bd8]
btrfs[0x43ad82]
btrfs(btrfs_commit_transaction+0xb8)[0x43c5dc]
btrfs[0x4268b4]
btrfs(cmd_check+0x1111)[0x427d6d]
btrfs(main+0x12f)[0x40a341]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fb2e6bec2b1]
btrfs(_start+0x2a)[0x40a37a]
merkaba:~#130> btrfs --version
btrfs-progs v4.7.3
(Honestly, I think asserts like this need to be gone from btrfs-progs for good.)
About this I only found this unanswered mailing list post:
btrfs-convert: Unable to find block group for 0
Date: Fri, 24 Jun 2016 11:09:27 +0200
https://www.spinics.net/lists/linux-btrfs/msg56478.html
Out of curiosity I tried:
merkaba:~#1> btrfs rescue zero-log //dev/satafp1/daten
warning, device 2 is missing
Clearing log on //dev/satafp1/daten, previous log_root 0, level 0
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.
btrfs[0x43e418]
btrfs(btrfs_reserve_extent+0x5c9)[0x4425df]
btrfs(btrfs_alloc_free_block+0x63)[0x44297c]
btrfs(__btrfs_cow_block+0xfc)[0x436636]
btrfs(btrfs_cow_block+0x8b)[0x436bd8]
btrfs[0x43ad82]
btrfs(btrfs_commit_transaction+0xb8)[0x43c5dc]
btrfs[0x42c0d4]
btrfs(main+0x12f)[0x40a341]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fb2f16a82b1]
btrfs(_start+0x2a)[0x40a37a]
(I didn't expect much, as this is an issue that AFAIK does not happen
easily anymore, but I also thought it could not do much harm.)
Superblocks themselves seem to be sane:
merkaba:~#1> btrfs rescue super-recover //dev/satafp1/daten
All supers are valid, no need to recover
So "btrfs restore" it is:
merkaba:[…]> btrfs restore -mxs /dev/satafp1/daten daten-restore
This prints out a ton of:
Trying another mirror
Trying another mirror
But it actually works, somewhat. I now just got
Trying another mirror
We seem to be looping a lot on daten-restore/[…]/virtualbox-4.1.18-dfsg/out/lib/vboxsoap.a, do you want to keep going on ? (y/N/a):
after about 35 GiB of data restored. I answered no to this one, and now it
is at about 53 GiB already. I just got another one of these, but again not
concerning a file I actually need.
Thanks,
--
Martin Steigerwald | Trainer
teamix GmbH
Südwestpark 43
90449 Nürnberg
Tel.: +49 911 30999 55 | Fax: +49 911 30999 99
mail: martin.steigerwald@teamix.de | web: http://www.teamix.de | blog: http://blog.teamix.de
Amtsgericht Nürnberg, HRB 18320 | Geschäftsführer: Oliver Kügow, Richard Müller
teamix Support Hotline: +49 911 30999-112
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
2016-11-16 10:25 degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0 Martin Steigerwald
@ 2016-11-16 10:43 ` Roman Mamedov
2016-11-16 10:55 ` Martin Steigerwald
0 siblings, 1 reply; 17+ messages in thread
From: Roman Mamedov @ 2016-11-16 10:43 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-btrfs, Martin
On Wed, 16 Nov 2016 11:25:00 +0100
Martin Steigerwald <martin.steigerwald@teamix.de> wrote:
> merkaba:~> mount -o degraded,clear_cache /dev/satafp1/backup /mnt/zeit
> mount: wrong fs type, bad option, bad superblock on /dev/mapper/satafp1-backup,
> missing codepage or helper program, or other error
>
> In some cases useful info is found in syslog - try
> dmesg | tail or so.
> merkaba:~#32> dmesg | tail -6
> [ 3080.120687] BTRFS info (device dm-13): allowing degraded mounts
> [ 3080.120699] BTRFS info (device dm-13): force clearing of disk cache
> [ 3080.120703] BTRFS info (device dm-13): disk space caching is enabled
> [ 3080.120706] BTRFS info (device dm-13): has skinny extents
> [ 3080.150957] BTRFS warning (device dm-13): missing devices (1) exceeds the limit (0), writeable mount is not allowed
> [ 3080.195941] BTRFS: open_ctree failed
I have to wonder, did you read the above message? What you need at this point
is simply "-o degraded,ro", but I don't see that tried anywhere down the line.
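For example (a sketch, reusing the mount point from your own attempts):

mount -o degraded,ro /dev/satafp1/backup /mnt/zeit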
See also (or try): https://patchwork.kernel.org/patch/9419189/
--
With respect,
Roman
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
2016-11-16 10:43 ` Roman Mamedov
@ 2016-11-16 10:55 ` Martin Steigerwald
2016-11-16 11:00 ` Roman Mamedov
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Martin Steigerwald @ 2016-11-16 10:55 UTC (permalink / raw)
To: Roman Mamedov; +Cc: linux-btrfs, Martin
On Wednesday, 16 November 2016, 15:43:36 CET Roman Mamedov wrote:
> On Wed, 16 Nov 2016 11:25:00 +0100
>
> Martin Steigerwald <martin.steigerwald@teamix.de> wrote:
> > merkaba:~> mount -o degraded,clear_cache /dev/satafp1/backup /mnt/zeit
> > mount: wrong fs type, bad option, bad superblock on /dev/mapper/satafp1-backup,
> > missing codepage or helper program, or other error
> >
> > In some cases useful info is found in syslog - try
> > dmesg | tail or so.
> >
> > merkaba:~#32> dmesg | tail -6
> > [ 3080.120687] BTRFS info (device dm-13): allowing degraded mounts
> > [ 3080.120699] BTRFS info (device dm-13): force clearing of disk cache
> > [ 3080.120703] BTRFS info (device dm-13): disk space caching is
> > enabled
> > [ 3080.120706] BTRFS info (device dm-13): has skinny extents
> > [ 3080.150957] BTRFS warning (device dm-13): missing devices (1)
> > exceeds the limit (0), writeable mount is not allowed
> > [ 3080.195941] BTRFS: open_ctree failed
>
> I have to wonder, did you read the above message? What you need at this point
> is simply "-o degraded,ro", but I don't see that tried anywhere down the
> line.
>
> See also (or try): https://patchwork.kernel.org/patch/9419189/
Actually I read that one, but I read more into it than what it was saying:
I read into it that BTRFS would automatically fall back to a read-only mount.
merkaba:~> mount -o degraded,ro /dev/satafp1/daten /mnt/zeit
actually really works. *Thank you*, Roman.
I do think that the above kernel messages invite that kind of interpretation,
though. I took the "BTRFS: open_ctree failed" message as indicative of some
structural issue with the filesystem.
So mounting works, although for some reason scrubbing is aborted (I had
this issue a long time ago on my laptop as well). This is after removing
the scrub status file for the filesystem from /var/lib/btrfs:
merkaba:~> btrfs scrub start /mnt/zeit
scrub started on /mnt/zeit, fsid […] (pid=9054)
merkaba:~> btrfs scrub status /mnt/zeit
scrub status for […]
scrub started at Wed Nov 16 11:52:56 2016 and was aborted after
00:00:00
total bytes scrubbed: 0.00B with 0 errors
Anyway, I will now just rsync off the files.
Interestingly enough btrfs restore complained about looping over certain
files… let's see whether rsync or btrfs send/receive proceeds through.
Ciao,
--
Martin Steigerwald | Trainer, teamix GmbH
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
2016-11-16 10:55 ` Martin Steigerwald
@ 2016-11-16 11:00 ` Roman Mamedov
2016-11-16 11:04 ` Martin Steigerwald
2016-11-16 11:18 ` Martin Steigerwald
2016-11-16 12:48 ` Austin S. Hemmelgarn
2 siblings, 1 reply; 17+ messages in thread
From: Roman Mamedov @ 2016-11-16 11:00 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: linux-btrfs, Martin
On Wed, 16 Nov 2016 11:55:32 +0100
Martin Steigerwald <martin.steigerwald@teamix.de> wrote:
> I do think that the above kernel messages invite that kind of interpretation,
> though. I took the "BTRFS: open_ctree failed" message as indicative of some
> structural issue with the filesystem.
As for why the writable mount didn't work, check "btrfs fi df" on the
filesystem to see whether you have any "single" profile chunks: quite likely
you already mounted it "degraded,rw" *once* in the past, after which those
"single" chunks got created, and consequently it won't mount read-write
anymore (without lifting the restriction on the number of missing devices,
as proposed).
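As an illustration, "btrfs fi df" on a filesystem in that state would show a
"single" line next to the RAID1 ones, along these lines (hypothetical output,
the numbers are made up):

Data, RAID1: total=142.00GiB, used=135.02GiB
Data, single: total=1.00GiB, used=384.00MiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=2.00GiB, used=1.21GiB
GlobalReserve, single: total=384.00MiB, used=0.00B

Any "single" line for Data, System or Metadata (GlobalReserve is always
single) marks chunks that exist on only one device.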
--
With respect,
Roman
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
2016-11-16 11:00 ` Roman Mamedov
@ 2016-11-16 11:04 ` Martin Steigerwald
2016-11-16 12:57 ` Austin S. Hemmelgarn
0 siblings, 1 reply; 17+ messages in thread
From: Martin Steigerwald @ 2016-11-16 11:04 UTC (permalink / raw)
To: Roman Mamedov; +Cc: linux-btrfs, Martin
On Wednesday, 16 November 2016, 16:00:31 CET Roman Mamedov wrote:
> On Wed, 16 Nov 2016 11:55:32 +0100
>
> Martin Steigerwald <martin.steigerwald@teamix.de> wrote:
> > I do think that the above kernel messages invite that kind of interpretation,
> > though. I took the "BTRFS: open_ctree failed" message as indicative of some
> > structural issue with the filesystem.
>
> As for why the writable mount didn't work, check "btrfs fi df" on the
> filesystem to see whether you have any "single" profile chunks: quite
> likely you already mounted it "degraded,rw" *once* in the past, after which
> those "single" chunks got created, and consequently it won't mount
> read-write anymore (without lifting the restriction on the number of
> missing devices, as proposed).
That explains it exactly. I very likely already did a degraded mount without
ro on this disk.
Funnily enough this creates another complication:
merkaba:/mnt/zeit#1> btrfs send somesubvolume | btrfs receive /mnt/someotherbtrfs
ERROR: subvolume /mnt/zeit/somesubvolume is not read-only
Yet:
merkaba:/mnt/zeit> btrfs property get somesubvolume
ro=false
merkaba:/mnt/zeit> btrfs property set somesubvolume ro true
ERROR: failed to set flags for somesubvolume: Read-only file system
To me it seems the right logic would be to allow the send to proceed when
the whole filesystem is mounted read-only.
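(On a writable filesystem the usual way to get a send source would be a
read-only snapshot, e.g. something like

btrfs subvolume snapshot -r somesubvolume somesubvolume-ro

but on this read-only mount that fails just like the property set above,
since creating the snapshot is itself a write. The snapshot name here is
hypothetical.)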
As there seems to be no force option to override the limitation and I
do not feel like compiling my own btrfs-tools right now, I will use rsync
instead.
Thanks,
--
Martin Steigerwald | Trainer, teamix GmbH
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
2016-11-16 10:55 ` Martin Steigerwald
2016-11-16 11:00 ` Roman Mamedov
@ 2016-11-16 11:18 ` Martin Steigerwald
2016-11-16 12:48 ` Austin S. Hemmelgarn
2 siblings, 0 replies; 17+ messages in thread
From: Martin Steigerwald @ 2016-11-16 11:18 UTC (permalink / raw)
To: Roman Mamedov; +Cc: linux-btrfs, Martin
On Wednesday, 16 November 2016, 11:55:32 CET, you wrote:
> So mounting works, although for some reason scrubbing is aborted (I had
> this issue a long time ago on my laptop as well). This is after removing
> the scrub status file for the filesystem from /var/lib/btrfs:
>
> merkaba:~> btrfs scrub start /mnt/zeit
> scrub started on /mnt/zeit, fsid […] (pid=9054)
> merkaba:~> btrfs scrub status /mnt/zeit
> scrub status for […]
> scrub started at Wed Nov 16 11:52:56 2016 and was aborted after
> 00:00:00
> total bytes scrubbed: 0.00B with 0 errors
>
> Anyway, I will now just rsync off the files.
>
> Interestingly enough btrfs restore complained about looping over certain
> files… let's see whether rsync or btrfs send/receive proceeds through.
I have an idea why scrubbing may not work:
The filesystem is mounted read-only, and on a checksum error on one disk,
scrub would try to repair it with the good copy from the other disk, which
requires writing.
Yes, this is it:
merkaba:~> btrfs scrub start -r /dev/satafp1/daten
scrub started on /dev/satafp1/daten, fsid […] (pid=9375)
merkaba:~> btrfs scrub status /dev/satafp1/daten
scrub status for […]
scrub started at Wed Nov 16 12:13:27 2016, running for 00:00:10
total bytes scrubbed: 45.53MiB with 0 errors
It would be helpful to get a proper error message for this case.
Okay, it seems today I learned quite a bit about BTRFS.
Thanks,
--
Martin Steigerwald | Trainer, teamix GmbH
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
2016-11-16 10:55 ` Martin Steigerwald
2016-11-16 11:00 ` Roman Mamedov
2016-11-16 11:18 ` Martin Steigerwald
@ 2016-11-16 12:48 ` Austin S. Hemmelgarn
2 siblings, 0 replies; 17+ messages in thread
From: Austin S. Hemmelgarn @ 2016-11-16 12:48 UTC (permalink / raw)
To: Martin Steigerwald, Roman Mamedov; +Cc: linux-btrfs, Martin
On 2016-11-16 05:55, Martin Steigerwald wrote:
> On Wednesday, 16 November 2016, 15:43:36 CET Roman Mamedov wrote:
>> On Wed, 16 Nov 2016 11:25:00 +0100
>>
>> Martin Steigerwald <martin.steigerwald@teamix.de> wrote:
>>> merkaba:~> mount -o degraded,clear_cache /dev/satafp1/backup /mnt/zeit
>>> mount: wrong fs type, bad option, bad superblock on /dev/mapper/satafp1-backup,
>>> missing codepage or helper program, or other error
>>>
>>> In some cases useful info is found in syslog - try
>>> dmesg | tail or so.
>>>
>>> merkaba:~#32> dmesg | tail -6
>>> [ 3080.120687] BTRFS info (device dm-13): allowing degraded mounts
>>> [ 3080.120699] BTRFS info (device dm-13): force clearing of disk cache
>>> [ 3080.120703] BTRFS info (device dm-13): disk space caching is
>>> enabled
>>> [ 3080.120706] BTRFS info (device dm-13): has skinny extents
>>> [ 3080.150957] BTRFS warning (device dm-13): missing devices (1)
>>> exceeds the limit (0), writeable mount is not allowed
>>> [ 3080.195941] BTRFS: open_ctree failed
>>
>> I have to wonder, did you read the above message? What you need at this point
>> is simply "-o degraded,ro", but I don't see that tried anywhere down the
>> line.
>>
>> See also (or try): https://patchwork.kernel.org/patch/9419189/
>
> Actually I read that one, but I read more into it than what it was saying:
>
> I read into it that BTRFS would automatically fall back to a read-only mount.
>
>
> merkaba:~> mount -o degraded,ro /dev/satafp1/daten /mnt/zeit
>
> actually really works. *Thank you*, Roman.
>
>
> I do think that the above kernel messages invite that kind of interpretation,
> though. I took the "BTRFS: open_ctree failed" message as indicative of some
> structural issue with the filesystem.
Technically, the fact that a device is missing is a structural issue with
the FS. Whether that falls under what any given person considers a
structural issue is a different story.
General background though:
open_ctree is one of the core functions in the BTRFS code used when
mounting the filesystem. Everything that calls it checks the return
code and spits out 'BTRFS: open_ctree failed' if it failed. The problem
is, just about everything internal to the BTRFS code (and many external
things as well) that could prevent the FS from mounting happens either in
open_ctree or in a function it calls, so all that line tells us is
that the mount failed, which is less than useful in most cases. Given
both the confusion you've experienced regarding this (which has happened
to other people too) and the amount of effort I've had to put in to get
the rest of the SysOps people where I work to understand that the message
just means 'mount failed', I would really love to see it replaced with
'mount failed' in non-debug builds, preferably with better info about
_why_ things failed (the case of a degraded filesystem is pretty well
covered, but most other cases besides incompatible feature bits are not).
>
> So mounting works, although for some reason scrubbing is aborted (I had
> this issue a long time ago on my laptop as well). This is after removing
> the scrub status file for the filesystem from /var/lib/btrfs:
Last I knew, scrub doesn't work on degraded filesystems (in fact, by
definition, it _can't_ work on a degraded array). It absolutely won't
work on filesystems which are mounted read-only unless the read-only
scrub flag (-r) is used.
>
> merkaba:~> btrfs scrub start /mnt/zeit
> scrub started on /mnt/zeit, fsid […] (pid=9054)
> merkaba:~> btrfs scrub status /mnt/zeit
> scrub status for […]
> scrub started at Wed Nov 16 11:52:56 2016 and was aborted after
> 00:00:00
> total bytes scrubbed: 0.00B with 0 errors
>
> Anyway, I will now just rsync off the files.
>
> Interestingly enough btrfs restore complained about looping over certain
> files… let's see whether rsync or btrfs send/receive proceeds through.
I'd expect rsync to be more likely to work than send/receive. In
general, if you can read the files, rsync will work, whereas
send/receive needs to read some low-level data from the FS which may not
be touched when just reading files, so there are cases where rsync will
work but send/receive won't.
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
2016-11-16 11:04 ` Martin Steigerwald
@ 2016-11-16 12:57 ` Austin S. Hemmelgarn
2016-11-16 17:06 ` Martin Steigerwald
0 siblings, 1 reply; 17+ messages in thread
From: Austin S. Hemmelgarn @ 2016-11-16 12:57 UTC (permalink / raw)
To: Martin Steigerwald, Roman Mamedov; +Cc: linux-btrfs, Martin
On 2016-11-16 06:04, Martin Steigerwald wrote:
> On Wednesday, 16 November 2016, 16:00:31 CET Roman Mamedov wrote:
>> On Wed, 16 Nov 2016 11:55:32 +0100
>>
>> Martin Steigerwald <martin.steigerwald@teamix.de> wrote:
>>> I do think that the above kernel messages invite that kind of interpretation,
>>> though. I took the "BTRFS: open_ctree failed" message as indicative of some
>>> structural issue with the filesystem.
>>
>> As for why the writable mount didn't work, check "btrfs fi df" on the
>> filesystem to see whether you have any "single" profile chunks: quite
>> likely you already mounted it "degraded,rw" *once* in the past, after
>> which those "single" chunks got created, and consequently it won't mount
>> read-write anymore (without lifting the restriction on the number of
>> missing devices, as proposed).
>
> That explains it exactly. I very likely already did a degraded mount
> without ro on this disk.
>
> Funnily enough this creates another complication:
>
> merkaba:/mnt/zeit#1> btrfs send somesubvolume | btrfs receive /mnt/someotherbtrfs
> ERROR: subvolume /mnt/zeit/somesubvolume is not read-only
>
> Yet:
>
> merkaba:/mnt/zeit> btrfs property get somesubvolume
> ro=false
> merkaba:/mnt/zeit> btrfs property set somesubvolume ro true
> ERROR: failed to set flags for somesubvolume: Read-only file system
>
> To me it seems the right logic would be to allow the send to proceed when
> the whole filesystem is mounted read-only.
It should, but doesn't currently. There was a thread about this a while
back, but I don't think it ever resulted in anything changing.
>
> As there seems to be no force option to override the limitation and I
> do not feel like compiling my own btrfs-tools right now, I will use rsync
> instead.
In a case like this, I'd trust rsync more than send/receive. The
following rsync switches might also be of interest:
-a: This turns on a bunch of things almost everyone wants when using
rsync, similar to the same switch for cp, just with even more added in.
-H: This recreates hardlinks on the receiving end.
-S: This recreates sparse files.
-A: This copies POSIX ACLs.
-X: This copies extended attributes (most of them at least, there are a
few that can't be arbitrarily written to).
Pre-creating the subvolumes by hand combined with using all of those
will get you almost everything covered by send/receive except for
sharing of extents and ctime.
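Putting those together, a sketch of the workflow (paths are hypothetical):

# re-create the subvolume on the receiving filesystem by hand
btrfs subvolume create /mnt/target/somesubvolume
# copy with permissions, hardlinks, sparse files, ACLs and xattrs preserved
rsync -aHSAX /mnt/zeit/somesubvolume/ /mnt/target/somesubvolume/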
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
2016-11-16 12:57 ` Austin S. Hemmelgarn
@ 2016-11-16 17:06 ` Martin Steigerwald
2016-11-17 20:05 ` Chris Murphy
0 siblings, 1 reply; 17+ messages in thread
From: Martin Steigerwald @ 2016-11-16 17:06 UTC (permalink / raw)
To: Austin S. Hemmelgarn; +Cc: Martin Steigerwald, Roman Mamedov, linux-btrfs
On Wednesday, 16 November 2016, 07:57:08 CET Austin S. Hemmelgarn wrote:
> On 2016-11-16 06:04, Martin Steigerwald wrote:
> > On Wednesday, 16 November 2016, 16:00:31 CET Roman Mamedov wrote:
> >> On Wed, 16 Nov 2016 11:55:32 +0100
> >>
> >> Martin Steigerwald <martin.steigerwald@teamix.de> wrote:
[…]
> > As there seems to be no force option to override the limitation and I
> > do not feel like compiling my own btrfs-tools right now, I will use rsync
> > instead.
>
> In a case like this, I'd trust rsync more than send/receive. The
> following rsync switches might also be of interest:
> -a: This turns on a bunch of things almost everyone wants when using
> rsync, similar to the same switch for cp, just with even more added in.
> -H: This recreates hardlinks on the receiving end.
> -S: This recreates sparse files.
> -A: This copies POSIX ACLs.
> -X: This copies extended attributes (most of them at least, there are a
> few that can't be arbitrarily written to).
> Pre-creating the subvolumes by hand combined with using all of those
> will get you almost everything covered by send/receive except for
> sharing of extents and ctime.
I usually use rsync -aAHXSP already :).
I was able to rsync all relevant data off the disk, which is now being
erased with the shred command.
Thank you,
--
Martin
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
2016-11-16 17:06 ` Martin Steigerwald
@ 2016-11-17 20:05 ` Chris Murphy
2016-11-17 20:20 ` Austin S. Hemmelgarn
2016-11-17 20:46 ` Martin Steigerwald
0 siblings, 2 replies; 17+ messages in thread
From: Chris Murphy @ 2016-11-17 20:05 UTC (permalink / raw)
To: Martin Steigerwald
Cc: Austin S. Hemmelgarn, Martin Steigerwald, Roman Mamedov,
Btrfs BTRFS
I think the wiki should be updated to reflect that raid1 and raid10
are only mostly OK. I think it's grossly misleading to consider either as
green/OK when a single degraded read-write mount creates single chunks
that will then prevent a subsequent degraded read-write mount. The lack
of various notifications of device faultiness also makes them less than
OK. They're not in the "do not use" category, but they should be in the
middle-ground status so users can make informed decisions.
Chris Murphy
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
2016-11-17 20:05 ` Chris Murphy
@ 2016-11-17 20:20 ` Austin S. Hemmelgarn
2016-11-19 20:27 ` Chris Murphy
2016-11-20 11:58 ` Niccolò Belli
2016-11-17 20:46 ` Martin Steigerwald
1 sibling, 2 replies; 17+ messages in thread
From: Austin S. Hemmelgarn @ 2016-11-17 20:20 UTC (permalink / raw)
To: Chris Murphy, Martin Steigerwald
Cc: Martin Steigerwald, Roman Mamedov, Btrfs BTRFS
On 2016-11-17 15:05, Chris Murphy wrote:
> I think the wiki should be updated to reflect that raid1 and raid10
> are only mostly OK. I think it's grossly misleading to consider either as
> green/OK when a single degraded read-write mount creates single chunks
> that will then prevent a subsequent degraded read-write mount. The lack
> of various notifications of device faultiness also makes them less than
> OK. They're not in the "do not use" category, but they should be in the
> middle-ground status so users can make informed decisions.
>
It's worth pointing out also regarding this:
* This is handled sanely in recent kernels (the check got changed from
per-fs to per-chunk, so you still have a usable FS if all the single
chunks are only on devices you still have).
* This is only an issue with filesystems with exactly two disks. If a
3+ disk raid1 FS goes degraded, you still generate raid1 chunks.
* There are a couple of other cases where raid1 mode falls flat on its
face (lots of I/O errors in a short span of time with compression
enabled can cause a kernel panic, for example).
* raid10 has some other issues of its own (if you lose two devices, your
filesystem is dead, which shouldn't be the case 100% of the time: if you
lose different parts of each mirror, BTRFS _should_ be able to recover,
it just doesn't do so right now).
As far as the failed device handling issues, those are a problem with
BTRFS in general, not just raid1 and raid10, so I wouldn't count those
against raid1 and raid10.
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
2016-11-17 20:05 ` Chris Murphy
2016-11-17 20:20 ` Austin S. Hemmelgarn
@ 2016-11-17 20:46 ` Martin Steigerwald
1 sibling, 0 replies; 17+ messages in thread
From: Martin Steigerwald @ 2016-11-17 20:46 UTC (permalink / raw)
To: Chris Murphy
Cc: Austin S. Hemmelgarn, Martin Steigerwald, Roman Mamedov,
Btrfs BTRFS
On Thursday, 17 November 2016, 12:05:31 CET Chris Murphy wrote:
> I think the wiki should be updated to reflect that raid1 and raid10
> are only mostly OK. I think it's grossly misleading to consider either as
> green/OK when a single degraded read-write mount creates single chunks
> that will then prevent a subsequent degraded read-write mount. The lack
> of various notifications of device faultiness also makes them less than
> OK. They're not in the "do not use" category, but they should be in the
> middle-ground status so users can make informed decisions.
I agree, as the error reporting I think is indeed misleading. Feel free to
edit it.
Ciao,
--
Martin
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
2016-11-17 20:20 ` Austin S. Hemmelgarn
@ 2016-11-19 20:27 ` Chris Murphy
2016-11-20 11:58 ` Niccolò Belli
1 sibling, 0 replies; 17+ messages in thread
From: Chris Murphy @ 2016-11-19 20:27 UTC (permalink / raw)
To: Austin S. Hemmelgarn
Cc: Chris Murphy, Martin Steigerwald, Martin Steigerwald,
Roman Mamedov, Btrfs BTRFS
On Thu, Nov 17, 2016 at 12:20 PM, Austin S. Hemmelgarn
<ahferroin7@gmail.com> wrote:
> On 2016-11-17 15:05, Chris Murphy wrote:
>>
>> I think the wiki should be updated to reflect that raid1 and raid10
>> are only mostly OK. I think it's grossly misleading to consider either as
>> green/OK when a single degraded read-write mount creates single chunks
>> that will then prevent a subsequent degraded read-write mount. The lack
>> of various notifications of device faultiness also makes them less than
>> OK. They're not in the "do not use" category, but they should be in the
>> middle-ground status so users can make informed decisions.
>>
> It's worth pointing out also regarding this:
> * This is handled sanely in recent kernels (the check got changed from
> per-fs to per-chunk, so you still have a usable FS if all the single chunks
> are only on devices you still have).
The status page should state which versions have the sane behavior,
relative to the versions without it. Otherwise people hit the old
behavior unexpectedly despite the status page.
But still, the multiple device stuff really is not "OK" until there's
some kind of faulty device concept and also notification for state
changes from normal to faulty, faulty to normal, or even failed if
that's reliably distinguishable from faulty.
> As far as the failed device handling issues, those are a problem with BTRFS
> in general, not just raid1 and raid10, so I wouldn't count those against
> raid1 and raid10.
Sure, but raid56 is already flagged as not ready for prime time.
--
Chris Murphy
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
2016-11-17 20:20 ` Austin S. Hemmelgarn
2016-11-19 20:27 ` Chris Murphy
@ 2016-11-20 11:58 ` Niccolò Belli
1 sibling, 0 replies; 17+ messages in thread
From: Niccolò Belli @ 2016-11-20 11:58 UTC (permalink / raw)
To: Austin S. Hemmelgarn
Cc: Chris Murphy, Martin Steigerwald, Martin Steigerwald,
Roman Mamedov, Btrfs BTRFS
On Thursday, 17 November 2016 21:20:56 CET, Austin S. Hemmelgarn wrote:
> On 2016-11-17 15:05, Chris Murphy wrote:
>> I think the wiki should be updated to reflect that raid1 and raid10
>> are only mostly OK. I think it's grossly misleading to consider either as
>> green/OK when a single degraded read-write mount creates single chunks
>> that will then prevent a subsequent degraded read-write mount. The lack
>> of various notifications of device faultiness also makes them less than
>> OK. They're not in the "do not use" category, but they should be in the
>> middle-ground status so users can make informed decisions.
>>
> It's worth pointing out also regarding this:
> * This is handled sanely in recent kernels (the check got
> changed from per-fs to per-chunk, so you still have a usable FS
> if all the single chunks are only on devices you still have).
> * This is only an issue with filesystems with exactly two
> disks. If a 3+ disk raid1 FS goes degraded, you still generate
> raid1 chunks.
> * There are a couple of other cases where raid1 mode falls flat
> on its face (lots of I/O errors in a short span of time with
> compression enabled can cause a kernel panic, for example).
> * raid10 has some other issues of its own (if you lose two
> devices, your filesystem is dead, which shouldn't be the case
> 100% of the time: if you lose different parts of each mirror,
> BTRFS _should_ be able to recover, it just doesn't do so right
> now).
>
> As far as the failed device handling issues, those are a
> problem with BTRFS in general, not just raid1 and raid10, so I
> wouldn't count those against raid1 and raid10.
Everything you mentioned should be in the wiki IMHO. Knowledge is power.
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
@ 2017-08-22 9:31 g6094199
2017-08-22 10:28 ` Dmitrii Tcvetkov
0 siblings, 1 reply; 17+ messages in thread
From: g6094199 @ 2017-08-22 9:31 UTC (permalink / raw)
To: linux-btrfs; +Cc: rm
Hey guys,
picking up this old topic because I'm running into a similar problem.
I'm running an Ubuntu 16.04 (HWE, kernel 4.8) server with 2 NVMe SSDs as
RAID 1 for /. One NVMe died and I had to replace it, which is where the
trouble began. I replaced the NVMe, booted degraded, added the new disk
to the raid (btrfs dev add) and removed the missing/dead device (btrfs
dev del). Everything worked well. BUT when I rebooted I ran into the
"BTRFS RAID 1 not mountable: open_ctree failed, unable to find block
group for 0" because of a MISSING disk?! I checked the btrfs list and
found that there was a patch that enabled stricter behavior in handling
missing devices (at the moment I can't find the related patch anymore),
which was merged some kernels before 4.8 but was NOT in 4.4. So I
installed the 4.4 Ubuntu kernel and the system started booting and
working again. My pity is that I can't update this production machine to
anything after kernel 4.4. :-(
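For reference, the replacement sequence I ran, sketched from memory with
hypothetical device names and mount point:

mount -o degraded /dev/nvme0n1p1 /mnt
btrfs device add /dev/nvme1n1p1 /mnt
btrfs device delete missing /mnt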
So, 1st: we should investigate why the disk did not get removed
correctly. btrfs dev del should remove the device correctly, right? Is
there a bug?
2nd: Was the restriction on handling missing devices too strict? Is there
a bug?
3rd: I saw https://patchwork.kernel.org/patch/9419189/ from Roman. Did he
receive any comments on his patch? That one could help with this problem,
too.
Regards
Sash
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
2017-08-22 9:31 g6094199
@ 2017-08-22 10:28 ` Dmitrii Tcvetkov
0 siblings, 0 replies; 17+ messages in thread
From: Dmitrii Tcvetkov @ 2017-08-22 10:28 UTC (permalink / raw)
To: g6094199; +Cc: linux-btrfs
On Tue, 22 Aug 2017 11:31:23 +0200
g6094199@freenet.de wrote:
> So, 1st: we should investigate why the disk did not get removed
> correctly. btrfs dev del should remove the device correctly, right? Is
> there a bug?
It should have, and probably did. To check that we need to see the output of
btrfs filesystem show
and the output of
btrfs filesystem usage <mountpoint>
If there are non-raid1 chunks then you need to do a soft balance:
btrfs balance start -mconvert=raid1,soft -dconvert=raid1,soft <mountpoint>
The balance should finish very quickly, as you probably have only one
single chunk each for data and metadata. They appeared during writes while
the filesystem was mounted read-write in degraded mode.
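To verify afterwards (a suggested check), run
btrfs filesystem df <mountpoint>
and confirm that Data, Metadata and System show only RAID1 lines; a
"single" line for GlobalReserve is normal and always present.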
* Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
@ 2017-08-23 13:12 g6094199
0 siblings, 0 replies; 17+ messages in thread
From: g6094199 @ 2017-08-23 13:12 UTC (permalink / raw)
To: Dmitrii Tcvetkov; +Cc: linux-btrfs
> -----Original Message-----
> From: Dmitrii Tcvetkov
> Sent: Tue, 22.08.2017 12:28
> To: g6094199@freenet.de
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: degraded BTRFS RAID 1 not mountable: open_ctree failed, unable to find block group for 0
>
> On Tue, 22 Aug 2017 11:31:23 +0200
> g6094199@freenet.de wrote:
>> So, 1st: we should investigate why the disk did not get removed
>> correctly. btrfs dev del should remove the device correctly, right? Is
>> there a bug?
>
> It should have, and probably did. To check that we need to see the output of
> btrfs filesystem show
> and the output of
> btrfs filesystem usage
Hey Dmitrii!
Thanks for your suggestions.
root@vHost1:~# btrfs fi show /
Label: 'System' uuid: 35fdbfd4-5809-4a30-94c1-5e3ca206ca4d
Total devices 2 FS bytes used 17.00GiB
devid 6 size 50.00GiB used 22.03GiB path /dev/nvme0n1p1
devid 7 size 48.83GiB used 22.03GiB path /dev/sda1
root@vHost1:~# btrfs fi df /
Data, RAID1: total=19.00GiB, used=15.68GiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=3.00GiB, used=1.32GiB
GlobalReserve, single: total=384.00MiB, used=0.00B
root@vHost1:/var/log# btrfs filesystem usage /
Overall:
Device size: 98.83GiB
Device allocated: 44.06GiB
Device unallocated: 54.76GiB
Device missing: 0.00B
Used: 34.11GiB
Free (estimated): 30.68GiB (min: 30.68GiB)
Data ratio: 2.00
Metadata ratio: 2.00
Global reserve: 384.00MiB (used: 0.00B)
Data,RAID1: Size:19.00GiB, Used:15.70GiB
/dev/nvme0n1p1 19.00GiB
/dev/sda1 19.00GiB
Metadata,RAID1: Size:3.00GiB, Used:1.36GiB
/dev/nvme0n1p1 3.00GiB
/dev/sda1 3.00GiB
System,RAID1: Size:32.00MiB, Used:16.00KiB
/dev/nvme0n1p1 32.00MiB
/dev/sda1 32.00MiB
Unallocated:
/dev/nvme0n1p1 27.97GiB
/dev/sda1 26.80GiB
> If there are non-raid1 chunks then you need to do a soft balance:
> btrfs balance start -mconvert=raid1,soft -dconvert=raid1,soft
Yes, of course I did a balance after replacing the disk (see above). I'm
aware of the problems occurring with single chunks, etc. I ran another soft
balance as you suggested, which finished within seconds.
> The balance should finish very quickly, as you probably have only one
> single chunk each for data and metadata. They appeared during writes while
> the filesystem was mounted read-write in degraded mode.
I guess the typical errors are now sorted out. I will reboot the machine
with a current HWE kernel and send the logs. Anything else I can do?
regards
sash