bad btree header at bucket xxx.

public inbox for linux-bcache@vger.kernel.org

* bad btree header at bucket xxx.
@ 2013-11-03 23:39 Thierry
       [not found] ` <5276DE9D.6000401-gY9mk9TZ+INg9hUCZPvPmw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Thierry @ 2013-11-03 23:39 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi,

After a reboot, the bcache devices didn't appear. Looking at dmesg, I
found the following entries:
[ 12.182288] bcache: register_bdev() registered backing device dm-0
[ 12.217379] bcache: register_bdev() registered backing device dm-1
[ 12.253040] bcache: error on d8e791cc-47c2-449c-909a-1e4433a5de11: bad btree header at bucket 32018, block 0, 0 keys, disabling caching
[ 12.253050] bcache: register_cache() registered cache device sdb1
[ 12.253488] bcache: cache_set_free() Cache set d8e791cc-47c2-449c-909a-1e4433a5de11 unregistered

Here is my device map:
NAME MAJ:MIN RM SIZE TYPE FSTYPE MOUNTPOINT UUID PARTUUID
sda 8:0 0 931.5G disk
├─sda1 8:1 0 150M part ext2 0ced49ec-a9ec-4711-95b6-8ce01153ee3c 2a20f8be-8162-48db-9e51-d97e8fde21ae
├─sda2 8:2 0 49.9G part ext4 / 79f36847-8365-46da-85a1-cbab5ea9e30a 59d37a51-5f05-4356-857c-1937b793e9b1
├─sda3 8:3 0 881.5G part LVM2_member Vzf2b6-NmgQ-5Zff-QKRS-puR8-Dlhz-XIkY3t e337497b-b1e3-479e-ba29-e1b203434e97
│ ├─vgDataNew-lvHome (dm-0) 254:0 0 50G lvm bcache ff11059e-0edf-428d-9657-37f255bac07e
│ │ └─bcache0 252:0 0 50G disk ext4 /home 66bf41ef-174a-4770-b9ef-d37ab06d3f23
│ ├─vgDataNew-lvVM (dm-1) 254:1 0 100G lvm bcache 9497600e-839f-4f59-862a-a7512cb59f0a
│ │ └─bcache1 252:1 0 100G disk ext4 /data/vm 5ed8126a-2bda-465e-bedb-61272d878fde
│ ├─vgDataNew-lvDownloads (dm-2) 254:2 0 50G lvm /home/xxx/xxx
│ └─vgDataNew-lvSwap (dm-3) 254:3 0 12G lvm [SWAP]
└─sda4 8:4 0 1007K part f6826e28-5dc0-4122-ba84-8ccaffd900fd
sdb 8:16 0 223.6G disk
└─sdb1 8:17 0 223.6G part bcache dff8e23e-7f9a-4a14-94a0-ec5b618a35c4 ba9624c2-7fb7-4cfb-8d8a-1da6ca429820
sr0

Here is the output of bcache-super-show:

bcache-super-show /dev/sdb1
sb.magic ok
sb.first_sector 8 [match]
sb.csum C161466F71949DD0 [match]
sb.version 3 [cache device]

dev.uuid dff8e23e-7f9a-4a14-94a0-ec5b618a35c4
dev.sectors_per_block 8
dev.sectors_per_bucket 1024
dev.cache.first_sector 1024
dev.cache.cache_sectors 468858880
dev.cache.total_sectors 468859904
dev.cache.discard yes
dev.cache.pos 0

cset.uuid d8e791cc-47c2-449c-909a-1e4433a5de11


bcache-super-show /dev/vgDataNew/lvHome
sb.magic ok
sb.first_sector 8 [match]
sb.csum ADBDC19FD7011CAB [match]
sb.version 1 [backing device]

dev.uuid ff11059e-0edf-428d-9657-37f255bac07e
dev.sectors_per_block 8
dev.sectors_per_bucket 1024
dev.data.first_sector 16
dev.data.cache_mode 3 [no caching]
dev.data.cache_state 3 [inconsistent]

cset.uuid d8e791cc-47c2-449c-909a-1e4433a5de11


bcache-super-show /dev/vgDataNew/lvVM
sb.magic ok
sb.first_sector 8 [match]
sb.csum BFD4613F155DD0D7 [match]
sb.version 1 [backing device]

dev.uuid 9497600e-839f-4f59-862a-a7512cb59f0a
dev.sectors_per_block 8
dev.sectors_per_bucket 1024
dev.data.first_sector 16
dev.data.cache_mode 3 [no caching]
dev.data.cache_state 3 [inconsistent]

cset.uuid d8e791cc-47c2-449c-909a-1e4433a5de11

As you can see, I restarted the devices after changing the cache mode
to "none" and then running echo 1 > running (both devices were in the
dirty state).
e2fsck does not report any problem on either of the bcache devices.
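
The forced start described above can be sketched roughly as follows. This is a dry run that only prints the sysfs writes. It assumes the backing devices are registered under their kernel names (dm-0 and dm-1, as in the dmesg output) and that the controls sit under /sys/block/<dev>/bcache/. The function name is hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# Assumption: bcache exposes cache_mode and running under
# /sys/block/<dev>/bcache/ for a registered backing device.
force_start_cmds() {
    dev="$1"   # kernel name of the backing device, e.g. dm-0
    printf 'echo none > /sys/block/%s/bcache/cache_mode\n' "$dev"
    printf 'echo 1 > /sys/block/%s/bcache/running\n' "$dev"
}

for dev in dm-0 dm-1; do
    force_start_cmds "$dev"
done
```

Starting a dirty backing device without its cache means that data held only in the cache is not written back, which is why a filesystem check afterwards (e2fsck here) is a sensible precaution.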

I am using gentoo kernel 3.11.6.

Not sure whether it is related, but the last operations before the
reboot were:
* extending lvDownloads (without a filesystem resize)
* erroneously running resize2fs /dev/bcache1, which reported that the
device was already at its maximum size.

Are there other "emergency actions" I should take (the cache is still
reported as inconsistent)?

Any idea what could have caused this corruption? Do you need more
information to investigate the issue?

How can I fix it now?

Thanks!

Thierry


* Re: bad btree header at bucket xxx.
       [not found] ` <5276DE9D.6000401-gY9mk9TZ+INg9hUCZPvPmw@public.gmane.org>
@ 2013-11-09  8:47   ` Thierry De Leeuw
  2013-11-09 10:37   ` Rolf Fokkens
  1 sibling, 0 replies; 6+ messages in thread
From: Thierry De Leeuw @ 2013-11-09  8:47 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi,

Any ideas? I would like to get back to a sane state as soon as possible.

Thanks!

Thierry

* Re: bad btree header at bucket xxx.
       [not found] ` <5276DE9D.6000401-gY9mk9TZ+INg9hUCZPvPmw@public.gmane.org>
  2013-11-09  8:47   ` Thierry De Leeuw
@ 2013-11-09 10:37   ` Rolf Fokkens
       [not found]     ` <527E1063.4090209-6w2rdlBuEQTpMFipWq+H6g@public.gmane.org>
  1 sibling, 1 reply; 6+ messages in thread
From: Rolf Fokkens @ 2013-11-09 10:37 UTC (permalink / raw)
  To: Thierry, linux-bcache-u79uwXL29TY76Z2rM5mHXA

On 11/04/2013 12:39 AM, Thierry wrote:
> After a reboot, the bcache devices didn't appear. Looking at dmesg, I
> found the following entries:
> [ 12.182288] bcache: register_bdev() registered backing device dm-0
> [ 12.217379] bcache: register_bdev() registered backing device dm-1
> [ 12.253040] bcache: error on d8e791cc-47c2-449c-909a-1e4433a5de11: bad
> btree header at bucket 32018, block 0, 0 keys, disabling caching
> [ 12.253050] bcache: register_cache() registered cache device sdb1
> [ 12.253488] bcache: cache_set_free() Cache set
> d8e791cc-47c2-449c-909a-1e4433a5de11 unregistered
>
> As you can see, I restarted the devices after changing the cache mode
> to "none" and then running echo 1 > running (both devices were in the
> dirty state).
> e2fsck does not report any problem on either of the bcache devices.
>
> Are there other "emergency actions" I should take (the cache is still
> reported as inconsistent)?
I've seen the same problems in earlier kernels, but not in kernels > 3.11.5.

After forcing the bcache device to run (echo 1 > running), I also
recreated the caching device (make-bcache -C ...) and reattached it
to the bcache device. Have you tried that?
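
A rough sketch of that recovery sequence, printed as a dry run rather than executed. It assumes the standard bcache sysfs interface (/sys/fs/bcache/register and the per-device attach file) and that make-bcache -C takes just the cache device as its argument. The function name and the placeholder UUID are hypothetical. Reformatting creates an empty cache, so anything still held only in the old cache is gone afterwards.

```shell
#!/bin/sh
# Dry-run sketch: prints the steps instead of running them.
# Usage: recreate_cache_cmds <cache-dev> <new-cset-uuid> <bcache-dev>...
recreate_cache_cmds() {
    cache_dev="$1"; cset_uuid="$2"; shift 2
    # Format the cache device (destroys the old cache contents).
    printf 'make-bcache -C %s\n' "$cache_dev"
    # Register it with the kernel (udev may already do this).
    printf 'echo %s > /sys/fs/bcache/register\n' "$cache_dev"
    # Attach each running bcache device to the new cache set.
    for bdev in "$@"; do
        printf 'echo %s > /sys/block/%s/bcache/attach\n' "$cset_uuid" "$bdev"
    done
}

# NEW-CSET-UUID stands for the cset.uuid reported by
# "bcache-super-show /dev/sdb1" after the format step.
recreate_cache_cmds /dev/sdb1 NEW-CSET-UUID bcache0 bcache1
```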

Rolf


* Re: bad btree header at bucket xxx.
       [not found]     ` <527E1063.4090209-6w2rdlBuEQTpMFipWq+H6g@public.gmane.org>
@ 2013-11-09 13:00       ` Thierry De Leeuw
       [not found]         ` <527E31DF.5080204-gY9mk9TZ+INg9hUCZPvPmw@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Thierry De Leeuw @ 2013-11-09 13:00 UTC (permalink / raw)
  To: Rolf Fokkens, linux-bcache-u79uwXL29TY76Z2rM5mHXA

Hi Rolf,

Thanks for your answer. Do you know of a specific commit that fixed the
issue, or did it simply stop appearing? How frequently did you
encounter it?

I also thought of just reformatting the cache, but that would remove
any evidence that might be useful for investigating the issue. So if
no one is interested in a post-mortem of the cache device in the
coming days, I will proceed with that approach.
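
Before reformatting, the bucket named in the error could be copied out for a later post-mortem. A minimal sketch of the arithmetic, assuming that bucket numbers count from the start of the cache device and that sectors are 512 bytes (both are assumptions, not confirmed in this thread). The output file name is hypothetical.

```shell
#!/bin/sh
# Values from the dmesg line and bcache-super-show output above.
bucket=32018
sectors_per_bucket=1024

# Assumption: bucket N starts at sector N * sectors_per_bucket,
# counted from the beginning of /dev/sdb1.
bucket_start_sector() {
    echo $(( $1 * $2 ))
}

start=$(bucket_start_sector "$bucket" "$sectors_per_bucket")
echo "start sector: $start"
echo "start byte:   $(( start * 512 ))"

# Print (do not run) a dd command that would copy this one bucket.
echo "dd if=/dev/sdb1 of=bucket-$bucket.img bs=512 skip=$start count=$sectors_per_bucket"
```

With the values from this thread, the start sector works out to 32018 * 1024 = 32786432. Imaging the whole 223.6G partition instead would preserve more evidence, at the cost of the disk space.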

There also appears to be little tooling to trace, investigate, or fix
such problems. Is any planned?

What also seems odd to me is that the GUID listed in dmesg does not
seem to match any device (even though it is clear that the message is
about the cache device sdb1).
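
For reference, the UUID in the dmesg error is identical to the cset.uuid value in the bcache-super-show output of all three devices, so it appears to identify the cache set as a whole rather than any single device (dev.uuid). A small sketch for checking this, assuming the "field value" layout shown in this thread. The function names are hypothetical.

```shell
#!/bin/sh
# Extract the UUID from a "bcache: error on <uuid>:" dmesg line.
uuid_from_dmesg() {
    sed -n 's/.*bcache: error on \([0-9a-f-]\{36\}\):.*/\1/p'
}

# Extract the cset.uuid field from bcache-super-show output.
cset_from_super() {
    awk '$1 == "cset.uuid" { print $2 }'
}

# Sample inputs copied from this thread.
log='[ 12.253040] bcache: error on d8e791cc-47c2-449c-909a-1e4433a5de11: bad btree header at bucket 32018, block 0, 0 keys, disabling caching'
super='dev.uuid dff8e23e-7f9a-4a14-94a0-ec5b618a35c4
cset.uuid d8e791cc-47c2-449c-909a-1e4433a5de11'

a=$(printf '%s\n' "$log" | uuid_from_dmesg)
b=$(printf '%s\n' "$super" | cset_from_super)
if [ "$a" = "$b" ]; then
    echo "dmesg UUID is the cache set UUID: $a"
else
    echo "no match: dmesg=$a cset=$b"
fi
```

On a live system the same functions could be fed from `dmesg` and from `bcache-super-show /dev/sdb1` instead of the sample strings.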

Thanks!

Thierry




* Re: bad btree header at bucket xxx.
       [not found]         ` <527E31DF.5080204-gY9mk9TZ+INg9hUCZPvPmw@public.gmane.org>
@ 2013-11-10 10:09           ` Rolf Fokkens
       [not found]             ` <527F5B57.1020400-6w2rdlBuEQTpMFipWq+H6g@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Rolf Fokkens @ 2013-11-10 10:09 UTC (permalink / raw)
  To: Thierry De Leeuw, linux-bcache-u79uwXL29TY76Z2rM5mHXA

On 11/09/2013 02:00 PM, Thierry De Leeuw wrote:
> Hi Rolf,
>
> Thanks for your answer. Do you know of a specific commit that fixed the
> issue, or did it simply stop appearing? How frequently did you
> encounter it?
I ran into this once, but haven't had this issue since. I can't recall a 
specific commit, but I think it was kernel 3.11.4.
> I also thought of just reformatting the cache, but that would remove
> any evidence that might be useful for investigating the issue. So if
> no one is interested in a post-mortem of the cache device in the
> coming days, I will proceed with that approach.
>
> There also appears to be little tooling to trace, investigate, or fix
> such problems. Is any planned?
>
> What also seems odd to me is that the GUID listed in dmesg does not
> seem to match any device (even though it is clear that the message is
> about the cache device sdb1).
When I ran into this issue I was mostly interested in recovering from a
"broken caching device", which turned out not to be that hard. I assumed
I could gather more evidence if the issue showed up again. I can
imagine Kent or Gabriel could be interested in some post-mortem
analysis, but that's up to them.

Rolf


* Re: bad btree header at bucket xxx.
       [not found]             ` <527F5B57.1020400-6w2rdlBuEQTpMFipWq+H6g@public.gmane.org>
@ 2013-11-10 15:16               ` Thierry De Leeuw
  0 siblings, 0 replies; 6+ messages in thread
From: Thierry De Leeuw @ 2013-11-10 15:16 UTC (permalink / raw)
  To: Rolf Fokkens
  Cc: Thierry De Leeuw,
	linux-bcache-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hi Rolf

Thanks for the info. I will still wait a bit for Kent or Gabriel to say
whether they want any information from the device, and which (or to say
that I can drop and recreate it).

Is there any chance that some data is still in the cache that could be recovered?

Maybe an emergency tool that tries to read the data and apply it before restarting the devices could be useful.

Thanks again for your reply, much appreciated!

Thierry
