read errors with md RAID5 array

Linux RAID subsystem development

* read errors with md RAID5 array
@ 2016-08-15 13:12 Tim Small
  2016-08-15 13:57 ` Chris Murphy
  2016-08-15 14:59 ` Andreas Klauer
  0 siblings, 2 replies; 9+ messages in thread
From: Tim Small @ 2016-08-15 13:12 UTC
  To: linux-raid@vger.kernel.org

I'm seeing some strange read errors whilst reading from an md RAID5
array (3x 2TB SATA Drives, Intel AHCI controller).

One of the underlying devices is reporting some "pending sectors" via
SMART, so I triggered a check (via the sync_action pseudo-file), but
when this didn't decrease the unreadable sector count, I just ran:

dd if=/dev/md2 of=/dev/null conv=noerror

This results in:

[ 1466.586612] buffer_io_error: 85 callbacks suppressed
[ 1466.586617] Buffer I/O error on dev md2, logical block 7057384, async page read
[ 1466.824085] Buffer I/O error on dev md2, logical block 7057384, async page read
[ 1466.986397] Buffer I/O error on dev md2, logical block 7057384, async page read
[ 1467.143073] Buffer I/O error on dev md2, logical block 7057384, async page read
[ 1467.305265] Buffer I/O error on dev md2, logical block 7057384, async page read
[ 1467.465493] Buffer I/O error on dev md2, logical block 7057384, async page read
[ 1467.623860] Buffer I/O error on dev md2, logical block 7057384, async page read
[ 1467.774287] Buffer I/O error on dev md2, logical block 7057384, async page read
[ 1467.934768] Buffer I/O error on dev md2, logical block 7057385, async page read
[ 1468.097099] Buffer I/O error on dev md2, logical block 7057385, async page read
[ 1569.197498] buffer_io_error: 198 callbacks suppressed
[ 1569.197503] Buffer I/O error on dev md2, logical block 8124804, async page read
[ 1569.443257] Buffer I/O error on dev md2, logical block 8124804, async page read
[ 1569.597697] Buffer I/O error on dev md2, logical block 8124804, async page read
[ 1569.760507] Buffer I/O error on dev md2, logical block 8124804, async page read
[ 1569.924565] Buffer I/O error on dev md2, logical block 8124804, async page read
[ 1570.087074] Buffer I/O error on dev md2, logical block 8124804, async page read
[ 1570.241459] Buffer I/O error on dev md2, logical block 8124804, async page read
[ 1570.407910] Buffer I/O error on dev md2, logical block 8124804, async page read
[ 1570.570488] Buffer I/O error on dev md2, logical block 8124805, async page read
[ 1570.732574] Buffer I/O error on dev md2, logical block 8124805, async page read


I'm not getting any accompanying reports of underlying SATA read errors,
nor apparently any attempt to correct unreadable sectors.

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md2 : active raid5 sda2[0] sdd2[3] sdc2[1]
      3885793280 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 0/15 pages [0KB], 65536KB chunk

unused devices: <none>

I thought perhaps the array was aware of a RAID5 write hole and was
failing reads, but this would seem to disagree?

# cat /sys/block/md2/md/mismatch_cnt
0

... unless that's not the way to detect such errors?
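The check mentioned above is driven through md's sysfs files. A minimal sketch of how those two files are used, assuming the /sys/block/<md>/md/ layout referred to in this thread; the helper names are hypothetical and sysfs_root is a parameter only so the code can be tried against a scratch directory. It does not answer whether mismatch_cnt would reveal this particular problem; it only shows how the files are read and written.

```python
from pathlib import Path

# Hypothetical helpers (not from the thread) around md's sysfs files:
# /sys/block/<md>/md/sync_action and /sys/block/<md>/md/mismatch_cnt.
# sysfs_root is a parameter only so the code can be tried on a scratch dir.


def md_attr(md: str, name: str, sysfs_root: str = "/sys") -> Path:
    """Path of an md attribute such as sync_action or mismatch_cnt."""
    return Path(sysfs_root) / "block" / md / "md" / name


def start_scrub(md: str, action: str = "check", sysfs_root: str = "/sys") -> None:
    """Request a scrub. 'check' reads every stripe and counts inconsistencies;
    'repair' also rewrites the stripes it finds inconsistent."""
    if action not in ("check", "repair"):
        raise ValueError(f"unsupported sync_action: {action!r}")
    md_attr(md, "sync_action", sysfs_root).write_text(action + "\n")


def current_action(md: str, sysfs_root: str = "/sys") -> str:
    """What md reports it is doing, e.g. 'idle' or 'check'."""
    return md_attr(md, "sync_action", sysfs_root).read_text().strip()


def mismatch_count(md: str, sysfs_root: str = "/sys") -> int:
    """Mismatches counted by the most recent check or repair."""
    return int(md_attr(md, "mismatch_cnt", sysfs_root).read_text())
```

On a live system this needs root, e.g. start_scrub("md2"), then poll current_action("md2") until it returns "idle" before reading mismatch_count("md2").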


# uname -a
Linux magic 4.4.0-34-generic #53-Ubuntu SMP Wed Jul 27 16:06:39 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

This is the current Ubuntu LTS kernel.  Were there any known md, or
block layer problems with the 4.4 kernel?  Should I try with the latest
mainline kernel, or am I missing something else entirely?

Tim.


* Re: read errors with md RAID5 array
  2016-08-15 13:12 read errors with md RAID5 array Tim Small
@ 2016-08-15 13:57 ` Chris Murphy
  2016-08-15 14:42   ` Tim Small
  2016-08-15 14:59 ` Andreas Klauer
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Murphy @ 2016-08-15 13:57 UTC
  To: Tim Small; +Cc: linux-raid@vger.kernel.org

$ sudo smartctl -l scterc <dev>   ## for each device used in the array
$ sudo cat /sys/block/<dev>/device/timeout   ## for each device used in the array


Chris Murphy


* Re: read errors with md RAID5 array
  2016-08-15 13:57 ` Chris Murphy
@ 2016-08-15 14:42   ` Tim Small
  2016-08-15 16:23     ` Chris Murphy
  0 siblings, 1 reply; 9+ messages in thread
From: Tim Small @ 2016-08-15 14:42 UTC
  To: Chris Murphy; +Cc: linux-raid@vger.kernel.org

On 15/08/16 14:57, Chris Murphy wrote:
> $ sudo smartctl -l scterc <dev>   ## for each device used in the array
> $ sudo cat /sys/block/<dev>/device/timeout   ## for each device used in the array

These were all reporting:

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

However I'm not sure how this would cause a read error from the md
device itself?  There are no timeout/reset messages in the kernel logs
for the underlying SATA devices?

To check, I've set the ERC on all drives to 6.5 seconds for both reads
and writes, and restarted the "dd if=/dev/md2 of=/dev/null
conv=noerror", and it's just produced read failures at exactly the same
places, with no further kernel messages.

Some scenarios:

1. These are write-hole locations, and the md driver has recorded this
and is failing I/O here (didn't know it did this, and a quick read
through the raid5 code couldn't see this, BICBW as I was just skimming it).

2. Two underlying drives have I/O problems at these locations (but then
why no errors in kernel logs?).

3. Something's bad in the block or ATA layer.

... or something else.

Cheers,

Tim.


* Re: read errors with md RAID5 array
  2016-08-15 13:12 read errors with md RAID5 array Tim Small
  2016-08-15 13:57 ` Chris Murphy
@ 2016-08-15 14:59 ` Andreas Klauer
  2016-08-16 11:40   ` Tim Small
  1 sibling, 1 reply; 9+ messages in thread
From: Andreas Klauer @ 2016-08-15 14:59 UTC
  To: Tim Small; +Cc: linux-raid@vger.kernel.org

On Mon, Aug 15, 2016 at 02:12:23PM +0100, Tim Small wrote:
> I'm seeing some strange read errors whilst reading from an md RAID5
> array (3x 2TB SATA Drives, Intel AHCI controller).

mdadm --examine and --examine-badblocks for all disks/partitions?

> One of the underlying devices is reporting some "pending sectors"

smartctl -a for all disks?

Regards
Andreas Klauer


* Re: read errors with md RAID5 array
  2016-08-15 14:42   ` Tim Small
@ 2016-08-15 16:23     ` Chris Murphy
  2016-08-16 12:22       ` Tim Small
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Murphy @ 2016-08-15 16:23 UTC
  To: Tim Small; +Cc: Chris Murphy, linux-raid@vger.kernel.org

On Mon, Aug 15, 2016 at 8:42 AM, Tim Small <tim@buttersideup.com> wrote:
> On 15/08/16 14:57, Chris Murphy wrote:
>> $ sudo smartctl -l scterc <dev>   ## for each device used in the array
>> $ sudo cat /sys/block/<dev>/device/timeout   ## for each device used in the array
>
> These were all reporting:
>
> SCT Error Recovery Control:
>            Read: Disabled
>           Write: Disabled

You failed to provide the value for the 2nd command. Is it something
other than 30 for each device?

>
> However I'm not sure how this would cause a read error from the md
> device itself?  There are no timeout/reset messages in the kernel logs
> for the underlying SATA devices?

Nevertheless it's a misconfiguration that inhibits proper read error
reporting by the drive, thereby preventing the md driver from fixing
bad sectors via writing good data over them and causing the drive
firmware to sort it out. So you should issue 'smartctl -l scterc,70,70
<dev>' for all devices and make sure this is made persistent at boot
time.
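The reasoning here is about two timers: the drive's own error recovery (SCT ERC, which smartctl expresses in units of 100 ms, so 70 means 7.0 seconds) and the kernel's per-command timeout in /sys/block/<dev>/device/timeout. A minimal sketch of that comparison, assuming "Disabled" means the drive may keep retrying internally for an unknown, possibly long time; the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def erc_seconds(deciseconds: Optional[int]) -> Optional[float]:
    """smartctl -l scterc values are in units of 100 ms (70 -> 7.0 s).
    None stands for 'Disabled'."""
    return None if deciseconds is None else deciseconds / 10


def kernel_timeout(dev: str, sysfs_root: str = "/sys") -> int:
    """SCSI command timeout in seconds, e.g. /sys/block/sda/device/timeout."""
    return int((Path(sysfs_root) / "block" / dev / "device" / "timeout").read_text())


def timeouts_mismatched(read_erc: Optional[int], timeout_s: int) -> bool:
    """True if the drive may still be retrying a bad sector when the kernel
    gives up on the command, so md never sees a clean read error to fix."""
    erc = erc_seconds(read_erc)
    return erc is None or erc >= timeout_s
```

With the values reported in this thread, ERC "Disabled" against a 30-second timeout is flagged, while 65 (6.5 s) or 70 (7.0 s) is not.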

>
> To check, I've set the ERC on all drives to 6.5 seconds for both reads
> and writes, and restarted the "dd if=/dev/md2 of=/dev/null
> conv=noerror", and it's just produced read failures at exactly the same
> places, with no further kernel messages.

Well it isn't really a read error, it's a buffer I/O error that happens
to be triggered when reading, so it's a little more specific than a
read error. It sounds to me like you've run into a bug or there's some
kind of hardware problem somewhere. It might be helpful if you provide
the entire dmesg from boot until the first error message, as well as
the stuff Andreas asked for.

-- 
Chris Murphy


* Re: read errors with md RAID5 array
  2016-08-15 14:59 ` Andreas Klauer
@ 2016-08-16 11:40   ` Tim Small
  2016-08-16 12:27     ` Andreas Klauer
  2016-08-16 18:25     ` Chris Murphy
  0 siblings, 2 replies; 9+ messages in thread
From: Tim Small @ 2016-08-16 11:40 UTC
  To: Andreas Klauer; +Cc: linux-raid@vger.kernel.org

On 15/08/16 15:59, Andreas Klauer wrote:
> On Mon, Aug 15, 2016 at 02:12:23PM +0100, Tim Small wrote:
>> > I'm seeing some strange read errors whilst reading from an md RAID5
>> > array (3x 2TB SATA Drives, Intel AHCI controller).
> mdadm --examine and --examine-badblocks for all disks/partitions?
> 

Hi,

Thanks very much for your suggestions...

# for i in a c d ; do mdadm --examine-badblocks  /dev/sd${i}2 ; done
Bad-blocks on /dev/sda2:
          2321554488 for 512 sectors
          2321555000 for 512 sectors
          2321555512 for 152 sectors
Bad-blocks on /dev/sdc2:
             1656848 for 128 sectors
            28490768 for 512 sectors
            28491280 for 392 sectors
            28572344 for 120 sectors
            32760864 for 128 sectors
          2321554488 for 512 sectors
          2321555000 for 512 sectors
          2321555512 for 152 sectors
Bad-blocks on /dev/sdd2:
             1656848 for 128 sectors
            28490768 for 512 sectors
            28491280 for 392 sectors
            28572344 for 120 sectors
            32760864 for 128 sectors
          2321554488 for 512 sectors
          2321555000 for 512 sectors
          2321555512 for 152 sectors

I didn't know about the bad block functionality in md.  The mdadm manual
page doesn't say much, so is this the canonical document?

http://neil.brown.name/blog/20100519043730

Until recently, two of the drives (sda, sdc) were running a firmware
version which (as far as I can work out) made them occasionally lock up
and disappear from the OS (requiring a power cycle). This firmware has
now been updated, so hopefully they'll now behave.

Degraded array reporting was also broken on this machine for a couple of
weeks due to an email misconfiguration (now fixed), so last week I found
it with sda (ML0220F30ZE35D) apparently missing from the machine, and
also with pending sectors on sdc (ML0220F31085KD).  The array rebuilt
quite quickly from the bitmap, and then I turned to trying to resolve
the pending sectors...

When the 'check' action didn't force the reallocations, I ran a 'repair'
action instead (thinking that perhaps the check wasn't attempting the
read+reconstruct+write for some reason; however, I now assume that this
was the wrong thing to do in the light of the bad block list entries).

I'm not really sure from the blog post, under what circumstances a bad
block entry would end up being written to multiple devices in the array,
and under what circumstances it might be written to all devices in an
array?  There are no entries on these array members which appear on only
one array member, and some are present on all three drives - which seems
strange to me.

I suppose a combination of the "Firstly" and "Secondly" paragraphs would
result in the same block being marked as bad on two devices.

Will the detection of an inconsistency (e.g. via a check) mark the
stripe which was impacted as bad on all active array members?

FWIW, what I'd like to do in the future with this array, is to reshape
it into a 4 drive RAID6, and then grow it to a 5 drive RAID6, and
possibly replace one or both of sda (ML0220F30ZE35D) and sdc
(ML0220F31085KD).  However I'd like to try and do this without losing
any data which is currently on the array but marked as inaccessible.
I'd also like to avoid losing the entire array, if the reshape fails
when the array is in this state with unreadable portions.

In the meantime I'm trying to work out what data (if any) is now
inaccessible.  This is made slightly more interesting because this array
has 'bcache' sitting in front of it, so I might have good data in the
cache on the SSD which is marked bad/inaccessible on the raid5 md device.

Tim.

# for i in a c d ; do mdadm --examine  /dev/sd${i}2 ; done

/dev/sda2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x9
     Array UUID : ad7ef7fa:e78344ea:a8778f06:abf07bf5
           Name : magic:2  (local to host magic)
  Creation Time : Wed Jul 15 14:43:06 2015
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3885793456 (1852.89 GiB 1989.53 GB)
     Array Size : 3885793280 (3705.78 GiB 3979.05 GB)
  Used Dev Size : 3885793280 (1852.89 GiB 1989.53 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=176 sectors
          State : clean
    Device UUID : fcc77733:e7e3582c:e8bff1ce:dd8d5232

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Aug 16 09:05:35 2016
  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
       Checksum : d18d7379 - correct
         Events : 520706

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x9
     Array UUID : ad7ef7fa:e78344ea:a8778f06:abf07bf5
           Name : magic:2  (local to host magic)
  Creation Time : Wed Jul 15 14:43:06 2015
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3885793456 (1852.89 GiB 1989.53 GB)
     Array Size : 3885793280 (3705.78 GiB 3979.05 GB)
  Used Dev Size : 3885793280 (1852.89 GiB 1989.53 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=176 sectors
          State : clean
    Device UUID : 55004cc7:b2e691de:c612612a:675ea2f3

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Aug 16 09:05:35 2016
  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
       Checksum : 345a1f90 - correct
         Events : 520706

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x9
     Array UUID : ad7ef7fa:e78344ea:a8778f06:abf07bf5
           Name : magic:2  (local to host magic)
  Creation Time : Wed Jul 15 14:43:06 2015
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3885793456 (1852.89 GiB 1989.53 GB)
     Array Size : 3885793280 (3705.78 GiB 3979.05 GB)
  Used Dev Size : 3885793280 (1852.89 GiB 1989.53 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=176 sectors
          State : clean
    Device UUID : 9abd8f30:29cb5ff5:2742646f:df56aa87

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Aug 16 09:05:35 2016
  Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
       Checksum : 8d769b9e - correct
         Events : 520706

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)


* Re: read errors with md RAID5 array
  2016-08-15 16:23     ` Chris Murphy
@ 2016-08-16 12:22       ` Tim Small
  0 siblings, 0 replies; 9+ messages in thread
From: Tim Small @ 2016-08-16 12:22 UTC
  To: Chris Murphy; +Cc: linux-raid@vger.kernel.org



On 15/08/16 17:23, Chris Murphy wrote:
>> > These were all reporting:
>> >
>> > SCT Error Recovery Control:
>> >            Read: Disabled
>> >           Write: Disabled
> 
> You failed to provide the value for the 2nd command. Is it something
> other than 30 for each device?

Sorry about that - it's 30 seconds for all array members.

# for i in a c d ; do cat /sys/block/sd${i}/device/timeout ; done
30
30
30

Thanks,

Tim.


* Re: read errors with md RAID5 array
  2016-08-16 11:40   ` Tim Small
@ 2016-08-16 12:27     ` Andreas Klauer
  2016-08-16 18:25     ` Chris Murphy
  1 sibling, 0 replies; 9+ messages in thread
From: Andreas Klauer @ 2016-08-16 12:27 UTC
  To: Tim Small; +Cc: linux-raid@vger.kernel.org

On Tue, Aug 16, 2016 at 12:40:45PM +0100, Tim Small wrote:
> I didn't know about the bad block functionality in md.

I don't know how it's supposed to work either. I disable it everywhere.
(the option was --update=no-bbl but if I remember correctly it will 
accept that only if the bbl is empty)

I don't want arrays to have bad blocks. I don't want disks with bad blocks 
to be left in the array. I don't trust disks that develop defects or lose 
data, so the only choice for me is to replace such a disk with a new one.

Silently ignoring disk errors, silently fixing errors in the background, 
keeping bad disks around, in my point of view this will only cause much 
more trouble later on.

I want to be notified about any and all problems md encounters so I can 
decide what to do... unfortunately not many people seem to share this 
view and the "read errors are normal" faction seems to be growing...

Identical bad blocks on multiple devices should be the reason why your 
md is reporting I/O errors; those blocks are already marked bad by md, 
so it does not even try to read them from the disks.
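One way to test that explanation against the numbers in this thread is to map the logical blocks from the dmesg output onto member devices and look them up in the --examine-badblocks lists. A minimal sketch, under assumptions the thread does not confirm: the logical block size in the Buffer I/O messages is 4096 bytes, the bad block entries are absolute sectors on each partition (so they already include the 262144-sector data offset), and the layout is md's RAID5 left-symmetric (mdstat's "algorithm 2") with 512K chunks. Device Roles 0, 1 and 2 are sda2, sdc2 and sdd2 as in the --examine output. The function names are hypothetical.

```python
SECTOR_BYTES = 512


def md_block_to_member(block, block_bytes=4096, chunk_sectors=1024,
                       raid_disks=3, data_offset=262144):
    """Map an md logical block to (data_role, parity_role, member_sector)
    for RAID5 left-symmetric."""
    md_sector = block * (block_bytes // SECTOR_BYTES)
    chunk, offset_in_chunk = divmod(md_sector, chunk_sectors)
    data_disks = raid_disks - 1
    stripe, data_index = divmod(chunk, data_disks)
    # Left-symmetric: parity starts on the last device and moves left each
    # stripe; data chunks follow the parity device, wrapping around.
    parity_role = data_disks - stripe % raid_disks
    data_role = (parity_role + 1 + data_index) % raid_disks
    member_sector = data_offset + stripe * chunk_sectors + offset_in_chunk
    return data_role, parity_role, member_sector


def is_bad(sector, ranges):
    return any(start <= sector < start + count for start, count in ranges)


def recoverable(sector, data_role, bad_lists):
    """RAID5 can rebuild a bad data chunk only if no other member of the
    same stripe is marked bad at that offset."""
    if not is_bad(sector, bad_lists[data_role]):
        return True
    return not any(is_bad(sector, ranges)
                   for role, ranges in bad_lists.items() if role != data_role)


# --examine-badblocks output from this thread, keyed by Device Role.
SDC2_SDD2 = [(1656848, 128), (28490768, 512), (28491280, 392),
             (28572344, 120), (32760864, 128), (2321554488, 512),
             (2321555000, 512), (2321555512, 152)]
BAD = {0: [(2321554488, 512), (2321555000, 512), (2321555512, 152)],  # sda2
       1: SDC2_SDD2,  # sdc2
       2: SDC2_SDD2}  # sdd2

for block in (7057384, 8124804):
    data_role, parity_role, sector = md_block_to_member(block)
    print(block, data_role, parity_role, sector,
          recoverable(sector, data_role, BAD))
```

Under those assumptions both blocks map to device role 1 (sdc2) with parity on role 2 (sdd2), at sectors 28491584 and 32760864, and both fall inside ranges listed for sdc2 and sdd2. That would be consistent with md refusing the read because neither the data nor the parity needed to rebuild it is considered usable.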

The last time I encountered these I ended up editing metadata 
or doing a (dangerous) re-create since I found no other way to 
get rid of them.

> In the meantime I'm trying to work out what data (if any) is now
> inaccessible.  This is made slightly more interesting because this array
> has 'bcache' sitting in front of it, so I might have good data in the
> cache on the SSD which is marked bad/inaccessible on the raid5 md device.

md won't be able to use that to repair by itself. Does bcache have some 
recovery mode that makes it dump back everything that is cached to disk? 
This comes with its own dangers, if the cache is wrong or other bugs...

Usually for such dangerous experiments you would use an overlay 
https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
but I'm not sure how well that plays together with bcache either.

If you want to go with re-create in your case it would be something like

mdadm --create /dev/md42 --assume-clean \
    --metadata=1.2 --data-offset=128M --level=5 --chunk=512 --layout=ls \
    --raid-devices=3 /dev/overlay/sd{a,c,d}2

You have to specify all variables because mdadm defaults change over time.
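Since every parameter must be spelled out, one option is to derive them from the --examine output instead of typing them. A minimal sketch, assuming the "Key : value" format shown earlier in the thread; it only builds the argument list (it runs nothing), orders members by "Device Role", converts "Data Offset" from 512-byte sectors to MiB (262144 sectors is 128M), and maps only the left-symmetric layout seen here. The /dev/overlay paths are the ones from the example above; the helper names are hypothetical.

```python
# Hypothetical helper: rebuild the --create line from mdadm --examine
# output instead of typing the values by hand. It only returns arguments.

KEYS = ("Version", "Raid Level", "Raid Devices", "Data Offset",
        "Layout", "Chunk Size")
LAYOUTS = {"left-symmetric": "ls"}  # only the layout seen in this thread


def parse_examine(text):
    """Collect 'Key : value' lines from one member's --examine output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def create_args(members, md="/dev/md42"):
    """members maps the device to use (e.g. an overlay) to its parsed fields."""
    first = next(iter(members.values()))
    for key in KEYS:  # every member must agree, or something is badly wrong
        if len({m[key] for m in members.values()}) != 1:
            raise ValueError(f"members disagree on {key}")
    if int(first["Raid Devices"]) != len(members):
        raise ValueError("need every member of the array")
    offset = int(first["Data Offset"].split()[0])  # in 512-byte sectors
    if offset % 2048:
        raise ValueError("data offset is not a whole number of MiB")
    ordered = sorted(members,
                     key=lambda dev: int(members[dev]["Device Role"].split()[-1]))
    return ["mdadm", "--create", md, "--assume-clean",
            f"--metadata={first['Version']}",
            f"--data-offset={offset // 2048}M",
            f"--level={first['Raid Level'].replace('raid', '')}",
            f"--chunk={first['Chunk Size'].rstrip('K')}",
            f"--layout={LAYOUTS[first['Layout']]}",
            f"--raid-devices={first['Raid Devices']}",
            *ordered]
```

The device order is the detail most easily got wrong by hand, which is why the sketch sorts by role rather than by device name; the result should still be compared against the command above before use on overlays.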

Then --stop and --assemble with --update=no-bbl before the horrors repeat...

Mount and verify files for correctness (files larger than disks*chunksize).
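A likely reason for that size cutoff (this is a reading, not something stated in the thread): a file larger than raid devices × chunk size (3 × 512K = 1.5 MiB here) must have chunks on every member, so a wrong device order, chunk size or data offset in the re-create would almost certainly corrupt it, while a small file may sit inside a single chunk and look fine either way. A minimal sketch for listing the candidates; the mount point is whatever you used, and the actual verification (checksums against a backup, opening the files) is up to you. The helper names are hypothetical.

```python
import os
import stat


def min_check_size(raid_devices=3, chunk_kib=512):
    """Andreas's rule of thumb: raid devices x chunk size, in bytes."""
    return raid_devices * chunk_kib * 1024


def files_to_verify(root, min_size):
    """Yield (path, size) for regular files larger than min_size under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: don't follow symlinks
            except OSError:
                continue  # vanished or unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size > min_size:
                yield path, st.st_size
```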

Then --add a fourth drive and --replace the one you said has bad sectors 
according to SMART. Book a flight to the Olympics in Rio and win a gold 
medal in hard disk long-cast throwing.

Once your RAID is running with three drives that are fully operational 
you can do your RAID6 or whatever.

If you don't have a backup, make one before doing anything else, 
while you still have some access to your stuff.

Regards
Andreas Klauer


* Re: read errors with md RAID5 array
  2016-08-16 11:40   ` Tim Small
  2016-08-16 12:27     ` Andreas Klauer
@ 2016-08-16 18:25     ` Chris Murphy
  1 sibling, 0 replies; 9+ messages in thread
From: Chris Murphy @ 2016-08-16 18:25 UTC
  To: Tim Small; +Cc: Andreas Klauer, linux-raid@vger.kernel.org

On Tue, Aug 16, 2016 at 5:40 AM, Tim Small <tim@buttersideup.com> wrote:

> # for i in a c d ; do mdadm --examine-badblocks  /dev/sd${i}2 ; done
> Bad-blocks on /dev/sda2:
>           2321554488 for 512 sectors
>           2321555000 for 512 sectors
>           2321555512 for 152 sectors
> Bad-blocks on /dev/sdc2:
>              1656848 for 128 sectors
>             28490768 for 512 sectors
>             28491280 for 392 sectors
>             28572344 for 120 sectors
>             32760864 for 128 sectors
>           2321554488 for 512 sectors
>           2321555000 for 512 sectors
>           2321555512 for 152 sectors
> Bad-blocks on /dev/sdd2:
>              1656848 for 128 sectors
>             28490768 for 512 sectors
>             28491280 for 392 sectors
>             28572344 for 120 sectors
>             32760864 for 128 sectors
>           2321554488 for 512 sectors
>           2321555000 for 512 sectors
>           2321555512 for 152 sectors

Does this actually jibe with what the drive is reporting? I would only
expect bad blocks to get populated if there's a write error, and the
user would only opt in to using a bad blocks list if they're basically
saying that (mainly on economic grounds) they will not or cannot
replace a drive that has no reserve sectors remaining for remapping.

I'm with Andreas on this aspect that silently accumulating a list of
bad sectors is specious. But I also can't tell if that's a factor in
this. But this is suggesting f'n ass tons of bad sectors. The sdd2
partition alone has over 2000 bad sectors? What? If that were true
it's a disqualified drive.
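The "over 2000" figure can be checked directly from the --examine-badblocks output quoted above. A minimal sketch that parses that output format, totals the sectors per device, and counts how many sectors are listed on every member; the regexes assume exactly the "Bad-blocks on <dev>:" and "<start> for <count> sectors" lines shown in this thread.

```python
import re

HEADER = re.compile(r"^Bad-blocks on (\S+):$")
ENTRY = re.compile(r"^\s*(\d+) for (\d+) sectors$")

# The --examine-badblocks output quoted above, pasted verbatim.
THREAD_OUTPUT = """\
Bad-blocks on /dev/sda2:
          2321554488 for 512 sectors
          2321555000 for 512 sectors
          2321555512 for 152 sectors
Bad-blocks on /dev/sdc2:
             1656848 for 128 sectors
            28490768 for 512 sectors
            28491280 for 392 sectors
            28572344 for 120 sectors
            32760864 for 128 sectors
          2321554488 for 512 sectors
          2321555000 for 512 sectors
          2321555512 for 152 sectors
Bad-blocks on /dev/sdd2:
             1656848 for 128 sectors
            28490768 for 512 sectors
            28491280 for 392 sectors
            28572344 for 120 sectors
            32760864 for 128 sectors
          2321554488 for 512 sectors
          2321555000 for 512 sectors
          2321555512 for 152 sectors
"""


def parse_badblocks(text):
    """Map device -> [(start_sector, count)] from --examine-badblocks output."""
    lists, current = {}, None
    for line in text.splitlines():
        line = line.rstrip()
        header, entry = HEADER.match(line), ENTRY.match(line)
        if header:
            current = header.group(1)
            lists[current] = []
        elif entry and current is not None:
            lists[current].append((int(entry.group(1)), int(entry.group(2))))
    return lists


def sector_set(ranges):
    """Expand ranges into individual sectors (fine for lists this small)."""
    return {s for start, count in ranges for s in range(start, start + count)}


bb = parse_badblocks(THREAD_OUTPUT)
totals = {dev: sum(count for _, count in ranges) for dev, ranges in bb.items()}
common = set.intersection(*(sector_set(r) for r in bb.values()))
print(totals)
print("marked bad on every member:", len(common), "sectors")
```

For the lists in this thread that works out to 1176 sectors on sda2 and 2456 on each of sdc2 and sdd2, and every one of sda2's 1176 sectors is also listed on the other two members.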

>
> I didn't know about the bad block functionality in md.  The mdadm manual
> page doesn't say much, so is this the canonical document?
>
> http://neil.brown.name/blog/20100519043730
>
> Until recently, two of the drives (sda, sdc) were running a firmware
> version which (as far as I can work out) made them occasionally lock up
> and disappear from the OS (requiring a power cycle), this firmware has
> now been updated, so hopefully they'll now behave.

There are user reports of firmware updates causing latent problems
that persist until data is overwritten. I personally always do ATA
Secure Erase, or Enhanced Secure Erase, using hdparm, anytime a drive
gets a firmware update. Unless I simply don't care about the data on
the drive.

> Degraded array reporting was also broken on this machine for a couple of
> weeks due to an email misconfiguration (now fixed), so last week I found
> it with sda (ML0220F30ZE35D) apparently missing from the machine, and
> also with pending sectors on sdc (ML0220F31085KD).  The array rebuilt
> quite quickly from the bitmap, and then I turned to trying to resolve
> the pending sectors...

It's worth checking alignment to make sure writes are for sure
happening on 4KiB boundaries. It's sorta old news, but sometimes old
tools were not using aligned values (parted and fdisk frequently
started the first partition at LBA 34 for example, which is not
aligned). This causes internal RMW in the drive, where any write is
actually treated by the drive first as a read, and can produce a
persistent read error that never gets fixed. To avoid the write being
treated as an RMW, it must be a complete 4KiB write starting at a
4KiB-aligned LBA (with 512-byte LBAs, a multiple of 8). Kinda annoying...
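The alignment rule above comes down to integer arithmetic on 512-byte LBAs. A minimal sketch, assuming the drives are 512-byte-logical, 4096-byte-physical ("512e") models, which the thread does not confirm (the kernel reports the physical size in /sys/block/<dev>/queue/physical_block_size). The partition start is read from /sys/block/<disk>/<partition>/start, which is in 512-byte sectors; md's data offset of 262144 sectors is itself a multiple of 8, so whether md's data is aligned depends on the partition start. The helper names are hypothetical.

```python
from pathlib import Path

LOGICAL = 512     # bytes per LBA as seen by the OS
PHYSICAL = 4096   # assumed physical sector size (512e drive)
PER_PHYS = PHYSICAL // LOGICAL  # 8 LBAs per physical sector


def aligned(lba):
    return lba % PER_PHYS == 0


def write_avoids_rmw(lba, count):
    """A write covers whole physical sectors only if it starts and ends on
    a 4KiB boundary; otherwise the drive must read-modify-write."""
    return aligned(lba) and count % PER_PHYS == 0


def partition_start(disk, part, sysfs_root="/sys"):
    """Partition start in 512-byte sectors, e.g. /sys/block/sda/sda2/start."""
    return int((Path(sysfs_root) / "block" / disk / part / "start").read_text())


def md_data_aligned(part_start, data_offset=262144):
    """md data begins data_offset sectors into the member partition."""
    return aligned(part_start + data_offset)
```

For example, a partition starting at LBA 34 is misaligned (34 is not a multiple of 8) while one at LBA 2048 is aligned, and an 8-sector write at LBA 2048 is a whole physical sector while a 4-sector one is not.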

> I'm not really sure from the blog post, under what circumstances a bad
> block entry would end up being written to multiple devices in the array,
> and under what circumstances it might be written to all devices in an
> array?  There are no entries on these array members which appear on only
> one array member, and some are present on all three drives - which seems
> strange to me.

Yes, strange. But even if these are the same bad sectors cross-referenced
across all drives, 2000+ bad sectors across 3 drives is too many.

> FWIW, what I'd like to do in the future with this array, is to reshape
> it into a 4 drive RAID6, and then grow it to a 5 drive RAID6, and
> possibly replace one or both of sda (ML0220F30ZE35D) and sdc
> (ML0220F31085KD).

Well I would defer to most anyone on this list, but given the state of
things, I have serious doubts about the array. I personally would
qualify five new drives with badblocks -w, using its default destructive
write-read-verify with four test patterns, making sure to change -b to
4096, and then make a new raid6 array and migrate the data over.

The current state of the array I consider sufficiently fragile that a
reshape risks all the data. So only if you have a current backup that
you're prepared to need to use would I do a reshape then grow (that's
true anyway even if healthy, but in particular with an array that's in
a weak state).

> In the meantime I'm trying to work out what data (if any) is now
> inaccessible.  This is made slightly more interesting because this array
> has 'bcache' sitting in front of it, so I might have good data in the
> cache on the SSD which is marked bad/inaccessible on the raid5 md device.

OK that's a whole separate ball of wax now.  Do you realize that
bcache is orphaned? For all we know you're running into bcache related
bugs at this point.

If your use case really benefits from an SSD cache, you should look at
lvmcache. And in that case you probably ought to evaluate if it makes
more sense to manage the RAID using LVM entirely rather than mdadm.
It's the same md kernel code, but it's created, managed, monitored all
by LVM tools and metadata. So you get things like per logical volume
RAID levels. The feature set of LVM is really incredibly vast and
often overwhelming, and yet on the RAID front mdadm still has more to
offer and I think is easier to use. But for your use case it might be
easier in the long run to consolidate on one set of tools.

-- 
Chris Murphy



Thread overview: 9+ messages
2016-08-15 13:12 read errors with md RAID5 array Tim Small
2016-08-15 13:57 ` Chris Murphy
2016-08-15 14:42   ` Tim Small
2016-08-15 16:23     ` Chris Murphy
2016-08-16 12:22       ` Tim Small
2016-08-15 14:59 ` Andreas Klauer
2016-08-16 11:40   ` Tim Small
2016-08-16 12:27     ` Andreas Klauer
2016-08-16 18:25     ` Chris Murphy
