linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Balance / Remove Device Errors and Loops
@ 2020-02-19  0:05 Steven Fosdick
  0 siblings, 0 replies; only message in thread
From: Steven Fosdick @ 2020-02-19  0:05 UTC (permalink / raw)
  To: linux-btrfs

Guys,

The recent patch to allow a balance operation to be cancelled more
quickly is very welcome.  I have applied that to 5.5.4 and it has
already avoided me having to do a hard reset.

On a filesystem which has suffered one disc failure I am trying to get
the data properly redundant across the remaining discs.  According to
the documentation 'device replace' needs to be able to read the
superblock on the outgoing device.  As this one has failed and that is
not possible so I mounted the filesystem in degraded mode, added a new
device, which worked fine, and then tried removing the missing device
which did not work.  After some checsum warnings:

Feb 10 19:49:44 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 272 off 1494749184 csum 0x8941f998 expected csum
0x4c946d24 mirror 2
Feb 10 19:49:44 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 272 off 1494753280 csum 0x8941f998 expected csum
0x3cacfa54 mirror 2
Feb 10 19:49:44 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 272 off 1494757376 csum 0x8941f998 expected csum
0x453f4f60 mirror 2
Feb 10 19:49:44 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 272 off 1494761472 csum 0x8941f998 expected csum
0x5630f6fa mirror 2
Feb 10 19:49:44 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 272 off 1494765568 csum 0x8941f998 expected csum
0xbf215c7a mirror 2
Feb 10 19:49:44 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 272 off 1494769664 csum 0x8941f998 expected csum
0x242df5b3 mirror 2
Feb 10 19:49:44 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 272 off 1494773760 csum 0x8941f998 expected csum
0x84d8643c mirror 2
Feb 10 19:49:44 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 272 off 1494777856 csum 0x8941f998 expected csum
0xcd4799e3 mirror 2
Feb 10 19:49:44 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 272 off 1494781952 csum 0x8941f998 expected csum
0x84e72065 mirror 2
Feb 10 19:49:44 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 272 off 1494786048 csum 0x8941f998 expected csum
0xa1a55d97 mirror 2

The device remove failed with I/O error.  As another way to move the
data onto the three remaining discs I tried using a balance filter.  I
was able, I think, to move all the metadata to the remaining discs
with:

btrfs balance start -mdevid=3

and then start on the data.  Trying to do the whole device at once
fails after a checksum warning so I tried working in units of 100Gb,
for example:

btrfs balance start -ddevid=3,drange=0..107374182400 /data

and then:

btrfs balance start -ddevid=3,drange=107374182400..214748364800 /data

etc.  The first ten of these succeeded, though three found nothing to
move.  The next, 1073741824000..1181116006400 failed after a checksum
warning:

Feb 18 23:50:23 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 601 off 2070544384 csum 0x8941f998 expected csum
0x963bbe29 mirror 2
Feb 18 23:50:23 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 601 off 2070552576 csum 0x8941f998 expected csum
0xb1aa5076 mirror 2
Feb 18 23:50:23 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 601 off 2070556672 csum 0x8941f998 expected csum
0x8eefa0c0 mirror 2
Feb 18 23:50:23 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 601 off 2070548480 csum 0x8941f998 expected csum
0xf865c292 mirror 2
Feb 18 23:50:23 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 601 off 2070560768 csum 0x8941f998 expected csum
0x8e37b369 mirror 2
Feb 18 23:50:23 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 601 off 2070564864 csum 0x8941f998 expected csum
0xcf28e045 mirror 2
Feb 18 23:50:23 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 601 off 2070568960 csum 0x8941f998 expected csum
0xe37e32c0 mirror 2
Feb 18 23:50:23 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 601 off 2070573056 csum 0x8941f998 expected csum
0xed9dd8b2 mirror 2
Feb 18 23:50:23 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 601 off 2070577152 csum 0x8941f998 expected csum
0x2f48ca31 mirror 2
Feb 18 23:50:23 meije kernel: BTRFS warning (device sda): csum failed
root -9 ino 601 off 2070581248 csum 0x8941f998 expected csum
0xfb166087 mirror 2

Thereafter, all higher offsets seem to get stuck in a loop.  For example:

Feb 18 23:55:02 meije kernel: BTRFS info (device sda): balance: start
-ddevid=3,drange=1181116006400..1288490188800
Feb 18 23:55:02 meije kernel: BTRFS info (device sda): relocating
block group 2573789560832 flags data|raid5
Feb 18 23:55:27 meije kernel: BTRFS info (device sda): found 865 extents
Feb 18 23:55:30 meije kernel: BTRFS info (device sda): found 862 extents
Feb 18 23:55:31 meije kernel: BTRFS info (device sda): found 855 extents
Feb 18 23:55:32 meije kernel: BTRFS info (device sda): found 855 extents
Feb 18 23:55:33 meije kernel: BTRFS info (device sda): found 855 extents
Feb 18 23:55:34 meije kernel: BTRFS info (device sda): found 855 extents
...

Any idea what is going on here?  There are no errors logged for the
case of being stuck in a loop.  What would cause that loop to go round
again?  I don't even understand why it normally goes round twice.

This doesn't seem close to balancing the majority of the data onto the
working devices:

Data,RAID5: Size:8.95TiB, Used:8.93TiB (99.73%)
   /dev/sda    4.48TiB
   /dev/sdc    4.48TiB
   missing    3.52TiB
   /dev/sdb 976.00GiB # the new device.

This is kernel version 5.5.4-arch1-1 and btrfs --version v5.4.  I
cannot see any changes to btrfs, i.e. the fs/btrfs directory, between
the arch kernel and the vanilla 5.4 (though I have installed the
balance cancel patch from this list as mentioned previously).

Regards,
Steve.

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2020-02-19  0:05 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-02-19  0:05 Balance / Remove Device Errors and Loops Steven Fosdick

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).