linux-btrfs.vger.kernel.org archive mirror
* ERROR: could not setup extent tree
@ 2020-11-12  3:53 Jean-Denis Girard
  2020-11-12 22:55 ` Chris Murphy
From: Jean-Denis Girard @ 2020-11-12  3:53 UTC
  To: linux-btrfs

Hi list,

I have a RAID1 Btrfs (on sdb and sdc) behind bcache (on nvme0n1p4):

[jdg@tiare ~]$  lsblk -o NAME,UUID,SIZE,MOUNTPOINT
NAME           UUID                                   SIZE MOUNTPOINT
sdb            8ae3c26b-6932-4dad-89bc-569ae2c74366   3,7T
└─bcache1      c5b8386b-b81d-4473-9340-7b8a74fc3a3c   3,7T
sdc            7ccac426-dc8c-4cb3-9e64-13b1cf48d4bf   3,7T
└─bcache0      c5b8386b-b81d-4473-9340-7b8a74fc3a3c   3,7T
nvme0n1                                             119,2G
├─nvme0n1p1    1725-D2D0                              512M /boot/efi
├─nvme0n1p2    d3cc080c-0c3f-4191-a25d-7c419e00316a    40G /
├─nvme0n1p3    572b43a3-7690-4daa-beeb-d1c030f194e8    16G [SWAP]
└─nvme0n1p4    a3ed0098-36b4-46a6-8e38-efe9b9a94e52  62,8G <- bcache

The Btrfs filesystem is used for /home (one subvolume per user).

An error occurred on nvme0 during the nightly backup (see the log
below) and Btrfs went read-only. After a reboot, the filesystem refused
to mount.

I'm on Fedora 32 with kernel 5.9.7, and I compiled the latest btrfs-progs (5.9):

[root@tiare btrfs-progs-5.9]# ./btrfs -v check  /dev/bcache0
Opening filesystem to check...
parent transid verify failed on 3010317451264 wanted 29647859 found 29647852
parent transid verify failed on 3010317451264 wanted 29647859 found 29647852
parent transid verify failed on 3010317451264 wanted 29647859 found 29647852
Ignoring transid failure
ERROR: could not setup extent tree
ERROR: cannot open file system
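
In a "parent transid verify failed" message, "wanted" is the generation
(transaction id) the parent node expects and "found" is the generation
stored in the block that was actually read. The following is a minimal
sketch that extracts the gap between the two from one of the lines
above; the helper name transid_gap is hypothetical and is not part of
btrfs-progs.

```python
import re

# Line copied verbatim from the `btrfs check` output above.
LINE = "parent transid verify failed on 3010317451264 wanted 29647859 found 29647852"

PATTERN = re.compile(
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def transid_gap(line):
    """Return (bytenr, wanted, found, gap) for a transid failure line, else None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    bytenr = int(m.group("bytenr"))
    wanted = int(m.group("wanted"))
    found = int(m.group("found"))
    # A positive gap means the block on disk is older than the parent expects.
    return bytenr, wanted, found, wanted - found


print(transid_gap(LINE))
```

For the line above the gap is 7, meaning the block read is seven
generations older than its parent expects. That would be consistent
with recent writes never having reached the HDDs, although the check
output alone does not prove it.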

I have restored from backups on a different disk, but still, I would be 
interested in trying to restore the broken filesystem: what should I try?

/var/log/messages :
Nov 11 00:24:28 tiare kernel: nvme nvme0: I/O 0 QID 5 timeout, aborting
Nov 11 00:24:28 tiare kernel: nvme nvme0: I/O 1 QID 5 timeout, aborting
Nov 11 00:24:28 tiare kernel: nvme nvme0: I/O 2 QID 5 timeout, aborting
Nov 11 00:24:28 tiare kernel: nvme nvme0: Abort status: 0x0
Nov 11 00:24:28 tiare kernel: nvme nvme0: I/O 3 QID 5 timeout, aborting
Nov 11 00:24:28 tiare kernel: nvme nvme0: I/O 4 QID 5 timeout, aborting
  ...
Nov 11 00:24:58 tiare kernel: nvme nvme0: I/O 0 QID 5 timeout, reset 
controller
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 153333328 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
IO error on reading from cache, recovering.
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 153333344 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
IO error on reading from cache, recovering.
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 153333384 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
IO error on reading from cache, recovering.
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 153333424 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
IO error on reading from cache, recovering.
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 153333464 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
IO error on reading from cache, recovering.
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 153333520 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
IO error on reading from cache, recovering.
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 142766872 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_io_errors() nvme0n1p4: 
IO error on reading from cache, recovering.
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 142766888 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: bcache: bch_cache_set_error() error on 
db563a68-d350-4eaf-978b-eee7095543c5: nvme0n1p4: too many IO errors 
reading from cache#012, disabling caching
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 142766912 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: blk_update_request: I/O error, dev 
nvme0n1, sector 142766936 op 0x0:(READ) flags 0x80700 phys_seg 1 prio 
class 0
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: bcache: conditional_stop_bcache_device() 
stop_when_cache_set_failed of bcache1 is "auto" and cache is dirty, stop 
it to avoid potential data corruption.
Nov 11 00:24:58 tiare kernel: bcache: conditional_stop_bcache_device() 
stop_when_cache_set_failed of bcache0 is "auto" and cache is dirty, stop 
it to avoid potential data corruption.
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 3, rd 1, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 3, rd 2, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 3, rd 3, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 3, rd 4, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev 
/dev/bcache0 errs: wr 3, rd 5, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: bcache: bch_count_backing_io_errors() sdc: 
Read-ahead I/O failed on backing device, ignore
Nov 11 00:24:58 tiare kernel: nvme nvme0: 8/0/0 default/read/poll queues


Thanks,
-- 
Jean-Denis Girard

SysNux                   Systèmes   Linux   en   Polynésie  française
https://www.sysnux.pf/   Tél: +689 40.50.10.40 / GSM: +689 87.797.527



* Re: ERROR: could not setup extent tree
  2020-11-12  3:53 ERROR: could not setup extent tree Jean-Denis Girard
@ 2020-11-12 22:55 ` Chris Murphy
  2020-11-12 23:08   ` Jean-Denis Girard
From: Chris Murphy @ 2020-11-12 22:55 UTC
  To: Jean-Denis Girard; +Cc: Btrfs BTRFS

Hypothesis: The NVMe drive has had some kind of failure, and since
this single NVMe is used as the cache for both HDDs, in effect this
defeats the raid1 protection of Btrfs. That is, you don't have
complete hardware isolation, which would require a dedicated SSD as
cache for each HDD. Something went wrong, and it has adversely
affected the writes to both drives: Btrfs is reporting write errors
for both bcache0 and bcache1 at the same time.
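
The observation that both devices log write errors together can be
checked mechanically. The following is a minimal sketch, assuming the
log lines are first re-joined (the mail client wrapped them); the
function names are hypothetical, and the sample is only an excerpt of
the kernel log posted in this thread.

```python
import re
from collections import defaultdict

# Excerpt of the kernel log from this thread, with mail line wraps re-joined.
LOG = """\
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev /dev/bcache0 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev /dev/bcache1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev /dev/bcache0 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev /dev/bcache1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev /dev/bcache0 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Nov 11 00:24:58 tiare kernel: BTRFS error (device bcache0): bdev /dev/bcache0 errs: wr 3, rd 5, flush 0, corrupt 0, gen 0
"""

ERRS = re.compile(
    r"^(?P<stamp>\w+ +\d+ [\d:]+) .*BTRFS error .*bdev (?P<dev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def write_errors_by_device(log):
    """Map each device to the timestamps at which its write counter was non-zero."""
    seen = defaultdict(set)
    for line in log.splitlines():
        m = ERRS.match(line)
        if m and int(m.group("wr")) > 0:
            seen[m.group("dev")].add(m.group("stamp"))
    return dict(seen)


def shared_failure_times(log):
    """Timestamps at which every device with write errors logged one."""
    per_dev = write_errors_by_device(log)
    if len(per_dev) < 2:
        return set()
    return set.intersection(*per_dev.values())


print(sorted(write_errors_by_device(LOG)))
print(shared_failure_times(LOG))
```

On this excerpt both /dev/bcache0 and /dev/bcache1 appear, sharing the
timestamp "Nov 11 00:24:58". Syslog timestamps have one-second
resolution, so "at the same time" here means within the same second.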

I don't know for sure what the next step is, so my strong advice is to
make no changes until the problem and the path forward are well
understood. The more things are changed at this point, the greater the
likelihood of non-recovery. Importantly, I'd say whatever you do should
be reversible, until you get better advice.

You might consider reposting or cross-posting on the bcache list to
see if they have some advice for recovery. Or maybe it's safer to just
decouple bcache and, once the HDDs are freed, see whether Btrfs can
recover on its own.


-- 
Chris Murphy


* Re: ERROR: could not setup extent tree
  2020-11-12 22:55 ` Chris Murphy
@ 2020-11-12 23:08   ` Jean-Denis Girard
From: Jean-Denis Girard @ 2020-11-12 23:08 UTC
  To: linux-btrfs

On 12/11/2020 at 12:55, Chris Murphy wrote:
> Hypothesis: The NVMe drive has had some kind of failure, and since
> this single NVMe is used as the cache for both HDDs, in effect this
> defeats the raid1 protection of Btrfs. That is, you don't have
> complete hardware isolation, which would require a dedicated SSD as
> cache for each HDD. Something went wrong, and it has adversely
> affected the writes to both drives: Btrfs is reporting write errors
> for both bcache0 and bcache1 at the same time.

OK, that makes sense, so I made a mistake with this setup...

> I don't know for sure what the next step is, so my strong advice is to
> make no changes until the problem and the path forward are well
> understood. The more things are changed at this point, the greater the
> likelihood of non-recovery. Importantly, I'd say whatever you do should
> be reversible, until you get better advice.

I restored from backups on a different HDD, so the original setup has 
not been touched.

> You might consider reposting or cross-posting on the bcache list to
> see if they have some advice for recovery. Or maybe it's safer to just
> decouple bcache and, once the HDDs are freed, see whether Btrfs can
> recover on its own.

Good idea, I'll also post on the bcache list.


Thanks for your reply Chris!

Best regards,
-- 
Jean-Denis Girard

SysNux                   Systèmes   Linux   en   Polynésie  française
https://www.sysnux.pf/   Tél: +689 40.50.10.40 / GSM: +689 87.797.527

