linux-bcache.vger.kernel.org archive mirror
* bcache error -> btrfs unmountable
@ 2020-11-12 23:25 Jean-Denis Girard
  2020-11-13  7:59 ` Pavel Goran
  0 siblings, 1 reply; 6+ messages in thread
From: Jean-Denis Girard @ 2020-11-12 23:25 UTC (permalink / raw)
  To: linux-bcache

Hi list,

I have a RAID1 Btrfs (on sdb and sdc) behind bcache (on nvme0n1p4):

[jdg@tiare ~]$  lsblk -o NAME,UUID,SIZE,MOUNTPOINT
NAME           UUID                                   SIZE MOUNTPOINT
sdb            8ae3c26b-6932-4dad-89bc-569ae2c74366   3,7T
└─bcache1      c5b8386b-b81d-4473-9340-7b8a74fc3a3c   3,7T
sdc            7ccac426-dc8c-4cb3-9e64-13b1cf48d4bf   3,7T
└─bcache0      c5b8386b-b81d-4473-9340-7b8a74fc3a3c   3,7T
nvme0n1                                             119,2G
├─nvme0n1p1    1725-D2D0                              512M /boot/efi
├─nvme0n1p2    d3cc080c-0c3f-4191-a25d-7c419e00316a    40G /
├─nvme0n1p3    572b43a3-7690-4daa-beeb-d1c030f194e8    16G [SWAP]
└─nvme0n1p4    a3ed0098-36b4-46a6-8e38-efe9b9a94e52  62,8G <- bcache

The Btrfs filesystem is used for /home (one subvolume per user).

An error occurred on nvme0 during the nightly backup (see below), and
Btrfs went read-only. After a reboot, it refused to mount.

I'm on Fedora 32 with kernel 5.9.7, and I compiled the latest btrfs-progs:

[root@tiare btrfs-progs-5.9]# ./btrfs -v check  /dev/bcache0
Opening filesystem to check...
parent transid verify failed on 3010317451264 wanted 29647859 found 29647852
parent transid verify failed on 3010317451264 wanted 29647859 found 29647852
parent transid verify failed on 3010317451264 wanted 29647859 found 29647852
Ignoring transid failure
ERROR: could not setup extent tree
ERROR: cannot open file system

I have restored from backups on a different disk, but still, I would be 
interested in trying to restore the broken filesystem.

I have already posted this message on the Btrfs mailing list. The advice was
to seek help here: what should I try? Detach both HDDs from bcache?
Create a loop device on each HDD with losetup -o 8192, then try to mount?
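
The loop-device idea above can be sketched as a dry run. This is only an
illustration: the helper name loop_plan is hypothetical, the 16-sector
(8192-byte) offset is the value quoted in the message and should be checked
against the actual devices, and the commands are printed rather than
executed. The loop devices are requested read-only so that nothing on the
disks would be modified.

```shell
# Print the losetup commands that would expose the data area of each
# bcache backing device as a read-only loop device.
# Usage: loop_plan SECTORS DISK...   (hypothetical helper, dry run only)
loop_plan() {
    sectors=$1
    shift
    # The offset is given in 512-byte sectors; 16 sectors = 8192 bytes.
    offset=$((sectors * 512))
    for disk in "$@"; do
        echo "losetup --read-only --find --show --offset $offset $disk"
    done
}
```

For example, "loop_plan 16 /dev/sdb /dev/sdc" prints one losetup command per
disk, which can be reviewed before running it as root.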


/var/log/messages :
Nov 11 00:24:28 tiare kernel: nvme nvme0: I/O 0 QID 5 timeout, aborting
Nov 11 00:24:28 tiare kernel: nvme nvme0: I/O 1 QID 5 timeout, aborting
Nov 11 00:24:28 tiare kernel: nvme nvme0: I/O 2 QID 5 timeout, aborting
Nov 11 00:24:28 tiare kernel: nvme nvme0: Abort status: 0x0
Nov 11 00:24:28 tiare kernel: nvme nvme0: I/O 3 QID 5 timeout, aborting
Nov 11 00:24:28 tiare kernel: nvme nvme0: I/O 4 QID 5 timeout, aborting
  ...
Nov 11 00:24:58 tiare kernel: nvme nvme0: I/O 0 QID 5 timeout, reset 
controller
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 153333328 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
IO error on reading from cache, recovering.
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 153333344 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
IO error on reading from cache, recovering.
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 153333384 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
IO error on reading from cache, recovering.
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 153333424 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
IO error on reading from cache, recovering.
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 153333464 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
IO error on reading from cache, recovering.
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 153333520 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
IO error on reading from cache, recovering.
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 142766872 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
IO error on reading from cache, recovering.
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 142766888 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_cache_set_error() error on 
db563a68-d350-4eaf-978b-eee7095543c5: nvme0n1p4: too many IO errors 
reading from cache#012, disabling caching
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 142766912 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 142766936 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: bcache: conditional_stop_bcache_device() 
stop_when_cache_set_failed of bcache1 is "auto" and cache is dirty, stop 
it to avoid potential data corruption.
Nov 11 00:24:58 tiare kernel: bcache: conditional_stop_bcache_device() 
stop_when_cache_set_failed of bcache0 is "auto" and cache is dirty, stop 
it to avoid potential data corruption.
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 3, rd 1, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 3, rd 2, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 3, rd 3, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 3, rd 4, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 3, rd 5, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: nvme nvme0: 8/0/0 default/read/poll queues


Thanks,
-- 
Jean-Denis Girard

SysNux                   Systèmes   Linux   en   Polynésie  française
https://www.sysnux.pf/   Tél: +689 40.50.10.40 / GSM: +689 87.797.527



* Re: bcache error -> btrfs unmountable
  2020-11-12 23:25 bcache error -> btrfs unmountable Jean-Denis Girard
@ 2020-11-13  7:59 ` Pavel Goran
  2020-11-13 15:58   ` Jean-Denis Girard
  0 siblings, 1 reply; 6+ messages in thread
From: Pavel Goran @ 2020-11-13  7:59 UTC (permalink / raw)
  To: Jean-Denis Girard; +Cc: linux-bcache

Hello Jean-Denis,

See comments inline.

Friday, November 13, 2020, 6:25:15 AM, you wrote:

> Hi list,

> I have a RAID1 Btrfs (on sdb and sdc) behind bcache (on nvme0n1p4):

What's the cache mode? Writeback, writethrough, writearound?

You can execute 'cat /sys/block/bcache0/bcache/cache_mode' and
'cat /sys/block/bcache1/bcache/cache_mode' to find out.
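
The cache_mode file lists every available mode and puts the active one in
square brackets, for example "writethrough [writeback] writearound none".
A minimal sketch for extracting just the active mode follows; the function
name and the SYSFS_ROOT override are hypothetical and exist only to make
the snippet easy to try outside a real system.

```shell
# Print the active bcache cache mode of one device, e.g. "writeback".
# SYSFS_ROOT is a hypothetical override; it defaults to /sys/block.
active_cache_mode() {
    dev=$1
    file="${SYSFS_ROOT:-/sys/block}/$dev/bcache/cache_mode"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    # Keep only the word between the square brackets.
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$file"
}
```

Calling "active_cache_mode bcache0" and "active_cache_mode bcache1" then
prints one mode name per device.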

> [jdg@tiare ~]$  lsblk -o NAME,UUID,SIZE,MOUNTPOINT
> NAME           UUID                                   SIZE MOUNTPOINT
> sdb            8ae3c26b-6932-4dad-89bc-569ae2c74366   3,7T
> └─bcache1      c5b8386b-b81d-4473-9340-7b8a74fc3a3c   3,7T
> sdc            7ccac426-dc8c-4cb3-9e64-13b1cf48d4bf   3,7T
> └─bcache0      c5b8386b-b81d-4473-9340-7b8a74fc3a3c   3,7T
> nvme0n1                                             119,2G
> ├─nvme0n1p1    1725-D2D0                              512M /boot/efi
> ├─nvme0n1p2    d3cc080c-0c3f-4191-a25d-7c419e00316a    40G /
> ├─nvme0n1p3    572b43a3-7690-4daa-beeb-d1c030f194e8    16G [SWAP]
> └─nvme0n1p4    a3ed0098-36b4-46a6-8e38-efe9b9a94e52  62,8G <- bcache

> The Btrfs filesystem is used for /home (one subvolume per user).

> An error happened during the nightly backup on nvme0 (see below) and 
> Btrfs went readonly. After reboot, it refused to mount.

> I'm on Fedora-32 with kernel-5.9.7, and I compiled latest btrfs-progs:

> [root@tiare btrfs-progs-5.9]# ./btrfs -v check  /dev/bcache0
> Opening filesystem to check...
> parent transid verify failed on 3010317451264 wanted 29647859 found 29647852
> parent transid verify failed on 3010317451264 wanted 29647859 found 29647852
> parent transid verify failed on 3010317451264 wanted 29647859 found 29647852
> Ignoring transid failure
> ERROR: could not setup extent tree
> ERROR: cannot open file system

> I have restored from backups on a different disk, but still, I would be 
> interested in trying to restore the broken filesystem.

> I have posted this message on Btrfs mailing list already. The advice was 
> to seek for help here: what should I try? Detach both HDD from bcache? 
> Create loopdev on both HDD with losetup -o 8192, then try to mount?

You could try to detach the *cache* from the bcache devices, and then try to
use the bcache devices as before; it should be possible and harmless,
*unless* the cache mode is "writeback". If it's "writeback", things are more
complicated, and I'll leave them to more experienced people around.

For instructions on how to detach the cache, see, for example,
https://unix.stackexchange.com/a/115808/82477 (it's just the first thing
I found by googling, and it seems to match what I did when I detached a
cache myself).
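
As a rough sketch of the "detach unless writeback" rule above: bcache
exposes a "detach" attribute next to cache_mode in sysfs, and writing to it
requests the detach. The wrapper below is hypothetical (function name and
SYSFS_ROOT override are made up for illustration) and refuses to act when
the active mode is writeback or cannot be determined.

```shell
# Detach the cache from one bcache device, but only when the active
# cache mode is known and is not writeback.
# SYSFS_ROOT is a hypothetical override; it defaults to /sys/block.
detach_if_safe() {
    dev=$1
    dir="${SYSFS_ROOT:-/sys/block}/$dev/bcache"
    # The active mode is the bracketed word in cache_mode.
    mode=$(sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$dir/cache_mode" 2>/dev/null)
    if [ -z "$mode" ]; then
        echo "$dev: cannot determine cache mode, not detaching" >&2
        return 1
    fi
    if [ "$mode" = "writeback" ]; then
        echo "$dev: cache mode is writeback, not detaching" >&2
        return 1
    fi
    # Writing to the detach attribute asks bcache to detach the cache.
    echo 1 > "$dir/detach"
}
```

On a real system this needs root, e.g. "detach_if_safe bcache0".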

IMPORTANT: The kernel logs below indicate that bcache failed to do I/O on the
cache device. It could be a hardware problem with your NVMe device, so I
suggest you check its SMART data ASAP.

> /var/log/messages :
> Nov 11 00:24:28 tiare kernel: nvme nvme0: I/O 0 QID 5 timeout, aborting
> Nov 11 00:24:28 tiare kernel: nvme nvme0: I/O 1 QID 5 timeout, aborting
> Nov 11 00:24:28 tiare kernel: nvme nvme0: I/O 2 QID 5 timeout, aborting
> Nov 11 00:24:28 tiare kernel: nvme nvme0: Abort status: 0x0
> Nov 11 00:24:28 tiare kernel: nvme nvme0: I/O 3 QID 5 timeout, aborting
> Nov 11 00:24:28 tiare kernel: nvme nvme0: I/O 4 QID 5 timeout, aborting
>   ...
> Nov 11 00:24:58 tiare kernel: nvme nvme0: I/O 0 QID 5 timeout, reset 
> controller
> Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
> nvme0n1, sector 153333328 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
> class 0
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
> IO error on reading from cache, recovering.
> Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
> nvme0n1, sector 153333344 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
> class 0
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
> IO error on reading from cache, recovering.
> Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
> nvme0n1, sector 153333384 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
> class 0
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
> IO error on reading from cache, recovering.
> Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
> nvme0n1, sector 153333424 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
> class 0
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
> IO error on reading from cache, recovering.
> Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
> nvme0n1, sector 153333464 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
> class 0
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
> IO error on reading from cache, recovering.
> Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
> nvme0n1, sector 153333520 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
> class 0
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
> IO error on reading from cache, recovering.
> Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
> nvme0n1, sector 142766872 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
> class 0
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
> IO error on reading from cache, recovering.
> Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
> nvme0n1, sector 142766888 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
> class 0
> Nov 11 00:24:58 tiare kernel: bcache: bch_cache_set_error() error on 
> db563a68-d350-4eaf-978b-eee7095543c5: nvme0n1p4: too many IO errors 
> reading from cache#012, disabling caching
> Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
> nvme0n1, sector 142766912 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
> class 0
> Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
> /dev/bcache0 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
> Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
> nvme0n1, sector 142766936 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
> class 0
> Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
> /dev/bcache1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
> Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
> /dev/bcache0 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
> Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
> /dev/bcache1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
> Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
> /dev/bcache0 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
> Nov 11 00:24:58 tiare kernel: bcache: conditional_stop_bcache_device() 
> stop_when_cache_set_failed of bcache1 is "auto" and cache is dirty, stop 
> it to avoid potential data corruption.
> Nov 11 00:24:58 tiare kernel: bcache: conditional_stop_bcache_device() 
> stop_when_cache_set_failed of bcache0 is "auto" and cache is dirty, stop 
> it to avoid potential data corruption.
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
> Read-ahead I/O failed on backing device, ignore
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
> Read-ahead I/O failed on backing device, ignore
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
> Read-ahead I/O failed on backing device, ignore
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
> Read-ahead I/O failed on backing device, ignore
> Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
> /dev/bcache0 errs: wr 3, rd 1, flush 0, corrupt 0, gen 0
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
> Read-ahead I/O failed on backing device, ignore
> Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
> /dev/bcache0 errs: wr 3, rd 2, flush 0, corrupt 0, gen 0
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
> Read-ahead I/O failed on backing device, ignore
> Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
> /dev/bcache0 errs: wr 3, rd 3, flush 0, corrupt 0, gen 0
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
> Read-ahead I/O failed on backing device, ignore
> Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
> /dev/bcache0 errs: wr 3, rd 4, flush 0, corrupt 0, gen 0
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
> Read-ahead I/O failed on backing device, ignore
> Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
> /dev/bcache0 errs: wr 3, rd 5, flush 0, corrupt 0, gen 0
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
> Read-ahead I/O failed on backing device, ignore
> Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
> Read-ahead I/O failed on backing device, ignore
> Nov 11 00:24:58 tiare kernel: nvme nvme0: 8/0/0 default/read/poll queues


> Thanks,



Pavel Goran
  



* Re: bcache error -> btrfs unmountable
  2020-11-13  7:59 ` Pavel Goran
@ 2020-11-13 15:58   ` Jean-Denis Girard
  2020-11-13 16:19     ` Re[2]: " Pavel Goran
  0 siblings, 1 reply; 6+ messages in thread
From: Jean-Denis Girard @ 2020-11-13 15:58 UTC (permalink / raw)
  To: linux-bcache

Hello Pavel,

On 12/11/2020 at 21:59, Pavel Goran wrote:
> Hello Jean-Denis,
> 
> See comments inline.
> 
> Friday, November 13, 2020, 6:25:15 AM, you wrote:
> 
>> Hi list,
> 
>> I have a RAID1 Btrfs (on sdb and sdc) behind bcache (on nvme0n1p4):
> 
> What's the cache mode? Writeback, writethrough, writearound?

Sorry, I forgot that important information: the cache mode was writeback.


> You could try to detach the *cache* from the bcache devices, and then try to
> use the bcache devices as before; it should be possible and harmless,
> *unless* the cache mode is "writeback". If it's "writeback", things are more
> complicated, and I'll leave them to more experienced people around.

OK, since I have writeback, I'll wait for further instructions.

> IMPORTANT: The kernel logs below indicate that bcache failed to do IO on the
> cache device. It could be a hardware problem with your NVMe device, so I
> suggest you look at its SMART, ASAP.

Yes, the NVMe is having problems... I'll replace it ASAP.


Thanks for your reply Pavel,
Best regards,
-- 
Jean-Denis Girard




* Re[2]: bcache error -> btrfs unmountable
  2020-11-13 15:58   ` Jean-Denis Girard
@ 2020-11-13 16:19     ` Pavel Goran
  2020-11-13 16:54       ` Jean-Denis Girard
  0 siblings, 1 reply; 6+ messages in thread
From: Pavel Goran @ 2020-11-13 16:19 UTC (permalink / raw)
  To: Jean-Denis Girard; +Cc: linux-bcache

Hello Jean-Denis,

Friday, November 13, 2020, 10:58:56 PM, you wrote:

> Hello Pavel,

> On 12/11/2020 at 21:59, Pavel Goran wrote:
>> Hello Jean-Denis,
>> 
>> See comments inline.
>> 
>> Friday, November 13, 2020, 6:25:15 AM, you wrote:
>> 
>>> Hi list,
>> 
>>> I have a RAID1 Btrfs (on sdb and sdc) behind bcache (on nvme0n1p4):
>> 
>> What's the cache mode? Writeback, writethrough, writearound?

> Sorry I forgot that important information: mode was Writeback.

>> You could try to detach the *cache* from the bcache devices, and then try to
>> use the bcache devices as before; it should be possible and harmless,
>> *unless* the cache mode is "writeback". If it's "writeback", things are more
>> complicated, and I'll leave them to more experienced people around.

> ok, as I have Writeback, I'll wait for further instructions.

First, you may want to check if there is any dirty data in the cache, by
executing:
cat /sys/block/bcache0/bcache/state
cat /sys/block/bcache1/bcache/state

If these return "clean", then you should be good to detach the cache.
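
The state check can be wrapped in a small report. The bcache documentation
lists four values for this file: "no cache", "clean", "dirty" and
"inconsistent". The sketch below only labels what it reads; the function
name and the SYSFS_ROOT override are hypothetical, and the one-line
descriptions are a paraphrase rather than kernel output.

```shell
# Print the bcache state of each named device with a short description.
# SYSFS_ROOT is a hypothetical override; it defaults to /sys/block.
bcache_state_report() {
    root=${SYSFS_ROOT:-/sys/block}
    for dev in "$@"; do
        state=$(cat "$root/$dev/bcache/state" 2>/dev/null) || state="unreadable"
        case $state in
            clean)      note="no dirty data in the cache" ;;
            dirty)      note="cache holds data not yet written to the backing device" ;;
            "no cache") note="no cache set attached" ;;
            *)          note="needs manual inspection" ;;
        esac
        echo "$dev: $state ($note)"
    done
}
```

For example, "bcache_state_report bcache0 bcache1" prints one line per
device.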

You will want to try it *before* removing the faulty NVMe storage (which
would obviously make the cache device inaccessible).

>> IMPORTANT: The kernel logs below indicate that bcache failed to do IO on the
>> cache device. It could be a hardware problem with your NVMe device, so I
>> suggest you look at its SMART, ASAP.

> Yes, the nvme is having problem... I'll replace it ASAP.


> Thanks for your reply Pavel,
> Best regards,

Pavel Goran
  



* Re: bcache error -> btrfs unmountable
  2020-11-13 16:19     ` Re[2]: " Pavel Goran
@ 2020-11-13 16:54       ` Jean-Denis Girard
  2020-11-13 17:31         ` Re[2]: " Pavel Goran
  0 siblings, 1 reply; 6+ messages in thread
From: Jean-Denis Girard @ 2020-11-13 16:54 UTC (permalink / raw)
  To: linux-bcache

On 13/11/2020 at 06:19, Pavel Goran wrote:
> First, you may want to check if there is any dirty data in the cache, by
> executing:
> cat /sys/block/bcache0/bcache/state
> cat /sys/block/bcache1/bcache/state
> 
> If these return "clean", then you should be good to detach the cache.

I get "no cache", not sure why:
[jdg@tiare ~]$ cat /sys/block/bcache{0,1}/bcache/state
no cache
no cache

Here are the kernel logs concerning bcache:
[jdg@tiare ~]$ dmesg | grep bcache
[    9.217610] bcache: bch_journal_replay() journal replay done, 0 keys 
in 1 entries, seq 254637130
[    9.219671] bcache: register_cache() registered cache device nvme0n1p4
[    9.223512] bcache: register_bdev() registered backing device sdc
[    9.226015] bcache: register_bdev() registered backing device sdb
[    9.312796] BTRFS: device fsid c5b8386b-b81d-4473-9340-7b8a74fc3a3c 
devid 2 transid 29647859 /dev/bcache1 scanned by systemd-udevd (314)
[    9.314219] BTRFS: device fsid c5b8386b-b81d-4473-9340-7b8a74fc3a3c 
devid 1 transid 29647859 /dev/bcache0 scanned by systemd-udevd (290)

That was after rebooting, and without trying to mount the broken Btrfs RAID1.

So, should I detach the cache?


Thanks for your assistance Pavel,
Best regards,
-- 
Jean-Denis Girard




* Re[2]: bcache error -> btrfs unmountable
  2020-11-13 16:54       ` Jean-Denis Girard
@ 2020-11-13 17:31         ` Pavel Goran
  0 siblings, 0 replies; 6+ messages in thread
From: Pavel Goran @ 2020-11-13 17:31 UTC (permalink / raw)
  To: Jean-Denis Girard; +Cc: linux-bcache

Hello Jean-Denis,

Friday, November 13, 2020, 11:54:50 PM, you wrote:

> On 13/11/2020 at 06:19, Pavel Goran wrote:
>> First, you may want to check if there is any dirty data in the cache, by
>> executing:
>> cat /sys/block/bcache0/bcache/state
>> cat /sys/block/bcache1/bcache/state
>> 
>> If these return "clean", then you should be good to detach the cache.

> I get "no cache", not sure why:
> [jdg@tiare ~]$ cat /sys/block/bcache{0,1}/bcache/state
> no cache
> no cache

This means that bcache already detached the faulty cache device. The data
that was cached but not written to the backing devices is now lost. (I don't
expect it to be recoverable, but don't take my word for it.) The logs from
your initial message say that the cache was dirty, so there *was* some
non-written data in the cache.

You can now try to check the BTRFS devices, preferably in read-only mode.
(This matters in case you later try to recover the lost cached data.)
Since you mentioned you have a backup, there probably isn't much sense in
trying to recover the data, so you could just try to mount the BTRFS
filesystem instead. Maybe compare the filesystem contents with what was
restored from the backup, if you are curious (and if the filesystem can be
mounted). Maybe run btrfs scrub, too.
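
The read-only sequence suggested above could look like the following dry
run. It is only a sketch: the helper name and the mount point are
hypothetical, and the commands are printed instead of executed so they can
be reviewed first. Each step avoids writing to the filesystem: a check in
read-only mode, a read-only mount, and a scrub in read-only mode that runs
in the foreground with per-device statistics.

```shell
# Print a read-only inspection sequence for a Btrfs device.
# Usage: readonly_check_plan DEVICE MOUNTPOINT   (hypothetical helper)
readonly_check_plan() {
    dev=$1
    mnt=$2
    # Check without modifying the filesystem.
    echo "btrfs check --readonly $dev"
    # Mount read-only so nothing is written to the backing devices.
    echo "mount -o ro $dev $mnt"
    # -B: stay in the foreground, -d: per-device stats, -r: read-only scrub.
    echo "btrfs scrub start -B -d -r $mnt"
}
```

For example, "readonly_check_plan /dev/bcache0 /mnt/broken" prints the three
commands in order; a failure at the check or mount step means the later
steps are not applicable.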

> Here are the kernel logs concerning bcache:
> [jdg@tiare ~]$ dmesg | grep bcache
> [    9.217610] bcache: bch_journal_replay() journal replay done, 0 keys 
> in 1 entries, seq 254637130
> [    9.219671] bcache: register_cache() registered cache device nvme0n1p4
> [    9.223512] bcache: register_bdev() registered backing device sdc
> [    9.226015] bcache: register_bdev() registered backing device sdb
> [    9.312796] BTRFS: device fsid c5b8386b-b81d-4473-9340-7b8a74fc3a3c 
> devid 2 transid 29647859 /dev/bcache1 scanned by systemd-udevd (314)
> [    9.314219] BTRFS: device fsid c5b8386b-b81d-4473-9340-7b8a74fc3a3c 
> devid 1 transid 29647859 /dev/bcache0 scanned by systemd-udevd (290)

> That was after rebooting, and without trying to mount the broken Btrfs RAID1.

> So, should I detach the cache?


> Thanks for your assistance Pavel,
> Best regards,



Pavel Goran
  


