* bcache and hibernation
@ 2014-11-13 13:52 Mathijs Kwik
  2014-11-13 15:52 ` Mathijs Kwik
  2014-11-13 22:11 ` Kent Overstreet
  0 siblings, 2 replies; 13+ messages in thread
From: Mathijs Kwik @ 2014-11-13 13:52 UTC (permalink / raw)
  To: linux-bcache

Hi all,

Today, I lost most of my data (don't worry, got backups) after the cache
got corrupted somehow. I suspected a recent suspend-to-disk to be the
cause. I checked how my distribution (NixOS) handles suspend/resume and
I have some concerns about how bcache fits into this.

Normally, the kernel and initrd get loaded. The initrd loads required
kernel modules, waits for udev to settle, activates luks&lvm, then
finally asks the kernel to resume from the resume device.

The kernel documentation on suspend is VERY clear you should NOT touch
anything on disk between suspend and resume. So activating luks and LVM
is probably risky already, but it appears both luks and LVM do not make
any on-disk changes when activated and any in-memory state (within the
resumed image) is still valid. The benefit of activating luks and LVM
before resume seems to be that it allows resuming from encrypted/lvm
volumes. 

Now, with bcache added, things probably get a bit hairy. NixOS supports
bcache inside the initrd and uses udev rules to activate/attach. I
suspect this is probably unsafe. Probably bcache starts to see if any
dirty pages exist, to write them to the backing store. Even without
writeback caching, the activation of lvm will read some sectors, which
might trigger the cache to update. Then after resuming the image, the
in-memory state is corrupted and further damage occurs. 

- Does this sound plausible? 
- Is there any way to tell bcache to make absolutely no changes to
  either the backing device or the cache?
  Basically like a readaround+writearound which can be triggered on
  hibernate and switched off on resume.
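
For reference, the cache mode mentioned above is exposed through sysfs. The
following is only a minimal sketch, assuming the device is named bcache0 and
that the cache_mode attribute lists all modes with the active one in square
brackets; the helper names are hypothetical. It only reports the current
mode. Whether any mode keeps bcache from writing to the cache device during
registration is exactly the open question above.

```python
import re
from pathlib import Path


def active_cache_mode(attr_text: str) -> str:
    """Return the bracketed (active) entry from a cache_mode attribute string.

    Assumes the format "writethrough [writeback] writearound none".
    """
    match = re.search(r"\[(\w+)\]", attr_text)
    if match is None:
        raise ValueError(f"no active mode found in {attr_text!r}")
    return match.group(1)


def read_cache_mode(device: str = "bcache0") -> str:
    """Read the active cache mode of a bcache device from sysfs.

    The device name is an assumption; adjust it to the local setup.
    """
    path = Path("/sys/block") / device / "bcache" / "cache_mode"
    return active_cache_mode(path.read_text())
```

Calling read_cache_mode() from a hibernate hook would at least record which
mode was active when the image was written.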

Thanks,
Mathijs

* bcache: bad block header
@ 2018-04-03 19:01 Nikolaus Rath
  2018-04-03 22:38 ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Nikolaus Rath @ 2018-04-03 19:01 UTC (permalink / raw)
  To: linux-bcache, linux-block

[ Re-send to both linux-block and linux-bcache ]

Hi,

A few days ago, my system refused to boot because it couldn't find the
root filesystem anymore. The root filesystem is ext4 on LVM on dm-crypt
on bcache, using kernel 4.9.92 (from Debian stretch). Booting from a
recovery medium with Kernel 4.16, I got:

[   84.551715] bcache: register_bcache() error /dev/sda4: device already registered
[   84.553188] bcache: register_bcache() error /dev/sdc2: device already registered
[   84.616438] bcache: error on 1330b5f6-0c13-43ec-b925-2ee2734b135f:
[   84.616440] bad btree header at bucket 85065, block 0, 0 keys
[   84.616442] , disabling caching
[   84.616445] bcache: register_cache() registered cache device sdb2
[   84.616597] bcache: cache_set_free() Cache set 1330b5f6-0c13-43ec-b925-2ee2734b135f unregistered
[   85.375933]  sdb: sdb1 sdb2 sdb4 < sdb5 >
[   85.416610] bcache: error on 1330b5f6-0c13-43ec-b925-2ee2734b135f:
[   85.416612] bad btree header at bucket 85065, block 0, 0 keys
[   85.416614] , disabling caching
[   85.416618] bcache: register_cache() registered cache device sdb2
[   85.416624] bcache: register_bcache() error /dev/sdc2: device already registered
[   85.416626] bcache: register_bcache() error /dev/sda4: device already registered
[   85.416796] bcache: cache_set_free() Cache set 1330b5f6-0c13-43ec-b925-2ee2734b135f unregistered
[   85.488246] bcache: error on 1330b5f6-0c13-43ec-b925-2ee2734b135f:
[   85.488249] bad btree header at bucket 85065, block 0, 0 keys
[   85.488251] , disabling caching
[   85.488254] bcache: register_cache() registered cache device sdb2
[   85.488429] bcache: cache_set_free() Cache set 1330b5f6-0c13-43ec-b925-2ee2734b135f unregistered
[   85.560003] bcache: error on 1330b5f6-0c13-43ec-b925-2ee2734b135f:
[   85.560006] bad btree header at bucket 85065, block 0, 0 keys
[   85.560008] , disabling caching
[   85.560013] bcache: register_cache() registered cache device sdb2
[   85.560017] bcache: register_bcache() error /dev/sda4: device already registered
[   85.560217] bcache: cache_set_free() Cache set 1330b5f6-0c13-43ec-b925-2ee2734b135f unregistered
[   85.571950] bcache: register_bcache() error /dev/sdc2: device already registered
[   85.580628] bcache: register_bcache() error /dev/sdc2: device already registered
[   85.761969] bcache: register_bcache() error /dev/sda4: device already registered
[   85.792749] bcache: register_bcache() error /dev/sda4: device already registered
[   85.952931] bcache: register_bcache() error /dev/sda4: device already registered
[   85.955640] bcache: register_bcache() error /dev/sda4: device already registered
[...]

These are the first messages that mention bcache. Note that the first
message is that the device is already registered - is that normal?

smartctl does not report any errors on backing or caching disks, and the
system was shut down cleanly.

The only possibly related thing that comes to mind is that a few days ago
I hibernated and resumed the system (this is something I normally don't
do). Resume worked fine as far as I could tell though, and there have been
no unclean shutdowns.

Is there a way to narrow down what may have caused this corruption?

And, is there a way to gracefully recover from this situation without
wiping everything? Since the message mentions only problems with one
block, can I maybe tell bcache to just ignore/drop this specific block?

Thanks!
-Nikolaus
--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«


end of thread, other threads:[~2018-04-06  0:21 UTC | newest]

Thread overview: 13+ messages
2014-11-13 13:52 bcache and hibernation Mathijs Kwik
2014-11-13 15:52 ` Mathijs Kwik
     [not found]   ` <CAPBO7TZF5qUV64UZJVE+WQkKa2aCJSTjkQxh6eVktH7nA41Vqw@mail.gmail.com>
2014-11-13 16:52     ` Mathijs Kwik
     [not found]       ` <CAPBO7TbQA2MbFS43racKOwZ+=U2jC4OcLF413-MvvNKML5=QZQ@mail.gmail.com>
2014-11-13 17:23         ` Mathijs Kwik
2015-02-10 22:36           ` Kai Krakow
2014-11-13 22:11 ` Kent Overstreet
2014-11-30 18:25   ` Mathijs Kwik
2014-11-30 23:24     ` Kent Overstreet
2014-11-30 23:29     ` Kent Overstreet
2014-12-01  8:48       ` Mathijs Kwik
  -- strict thread matches above, loose matches on Subject: below --
2018-04-03 19:01 bcache: bad block header Nikolaus Rath
2018-04-03 22:38 ` Jens Axboe
2018-04-05  8:51   ` bcache and hibernation (was: bcache: bad block header) Nikolaus Rath
2018-04-05 18:13     ` bcache and hibernation Michael Lyle
2018-04-05 19:51       ` Nikolaus Rath
2018-04-06  0:21         ` Michael Lyle
