Re: RAID6 reshape, 2 disk failures


From: Stan Hoeppner <stan@hardwarefreak.com>
To: Linux-RAID <linux-raid@vger.kernel.org>
Subject: Re: RAID6 reshape, 2 disk failures
Date: Tue, 16 Oct 2012 21:29:08 -0500
Message-ID: <507E17F4.9020406@hardwarefreak.com>
In-Reply-To: <CADNH=7FunKmUSREJJz2Lonqjb42gB_WHMQJ7-jmip-X8-1yZnw@mail.gmail.com>

On 10/16/2012 5:57 PM, Mathias Burén wrote:
> Hi list,
> 
> I started a reshape from 64K chunk size to 512K (now default IIRC).
> During this time 2 disks failed with some time in between. The first
> one was removed by MD, so I shut down and removed the HDD, continued
> the reshape. After a while the second HDD failed. This is what it
> looks like right now, with the second failed HDD still in, as you can see:

Apparently you don't realize you're going through all of this for the
sake of a senseless change that will gain you nothing, and cost you
performance.  Large chunk sizes are murder for parity RAID due to the
increased IO bandwidth required during RMW cycles.  The new 512KB
default is way too big.  And with many random IO workloads even 64KB is
a bit large.  This was discussed on this list in detail not long ago.
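
To put rough numbers on the chunk size point, here is a minimal sketch of
the stripe geometry, assuming the 7-device RAID6 shown in the mdadm -D
output quoted below (5 data chunks plus P and Q per stripe). The function
name is made up for illustration. It only computes how much data a write
must cover to fill a whole stripe; it does not model how much extra IO md
actually issues for a partial-stripe write.

```python
def raid6_stripe_geometry(raid_devices: int, chunk_kib: int) -> dict:
    """Return the data-chunk count and full-stripe data width for RAID6."""
    if raid_devices < 4:
        raise ValueError("RAID6 needs at least 4 devices")
    data_chunks = raid_devices - 2  # two chunks per stripe hold P and Q parity
    return {
        "data_chunks": data_chunks,
        "stripe_data_kib": data_chunks * chunk_kib,
    }


old = raid6_stripe_geometry(7, 64)    # chunk size before the reshape
new = raid6_stripe_geometry(7, 512)   # chunk size after the reshape

# Cross-check against the quoted mdadm -D output:
# Used Dev Size * data chunks should equal Array Size (both in KiB).
used_dev_size_kib = 1950351360
array_size_kib = 9751756800
assert used_dev_size_kib * old["data_chunks"] == array_size_kib

print(old["stripe_data_kib"], new["stripe_data_kib"])  # 320 2560
```

So a write has to cover 320 KiB of aligned data to fill a stripe at 64K
chunks, and 2560 KiB at 512K chunks. Anything smaller is a partial-stripe
write that needs parity updated from data already on disk.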

One positive aspect is that you've discovered problems with a couple
of drives.  Better now than later.

-- 
Stan


>  $ iostat -m
> Linux 3.5.5-1-ck (ion)  10/16/2012      _x86_64_        (4 CPU)
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            8.93    7.81    5.40   15.57    0.00   62.28
> 
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sda              38.93         0.00        13.09        939    8134936
> sdb              59.37         5.19         2.60    3224158    1613418
> sdf              59.37         5.19         2.60    3224136    1613418
> sdc              59.37         5.19         2.60    3224134    1613418
> sdd              59.37         5.19         2.60    3224151    1613418
> sde              42.17         3.68         1.84    2289332    1145595
> sdg              59.37         5.19         2.60    3224061    1613418
> sdh               0.00         0.00         0.00          9          0
> md0               0.06         0.00         0.00       2023          0
> dm-0              0.06         0.00         0.00       2022          0
> 
>  $ cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid6 sde1[0](F) sdg1[8] sdc1[5] sdd1[3] sdb1[4] sdf1[9]
>       9751756800 blocks super 1.2 level 6, 64k chunk, algorithm 2 [7/5] [_UUUUU_]
>       [================>....]  reshape = 84.6% (1650786304/1950351360) finish=2089.2min speed=2389K/sec
> 
> unused devices: <none>
> 
>  $ sudo mdadm -D /dev/md0
> [sudo] password for x:
> /dev/md0:
>         Version : 1.2
>   Creation Time : Tue Oct 19 08:58:41 2010
>      Raid Level : raid6
>      Array Size : 9751756800 (9300.00 GiB 9985.80 GB)
>   Used Dev Size : 1950351360 (1860.00 GiB 1997.16 GB)
>    Raid Devices : 7
>   Total Devices : 6
>     Persistence : Superblock is persistent
> 
>     Update Time : Tue Oct 16 23:55:28 2012
>           State : clean, degraded, reshaping
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 1
>   Spare Devices : 0
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>  Reshape Status : 84% complete
>   New Chunksize : 512K
> 
>            Name : ion:0  (local to host ion)
>            UUID : e6595c64:b3ae90b3:f01133ac:3f402d20
>          Events : 8386010
> 
>     Number   Major   Minor   RaidDevice State
>        0       8       65        0      faulty spare rebuilding   /dev/sde1
>        9       8       81        1      active sync   /dev/sdf1
>        4       8       17        2      active sync   /dev/sdb1
>        3       8       49        3      active sync   /dev/sdd1
>        5       8       33        4      active sync   /dev/sdc1
>        8       8       97        5      active sync   /dev/sdg1
>        6       0        0        6      removed
> 
> 
> What is confusing to me is that /dev/sde1 (which is failing) is
> currently marked as rebuilding. But when I check iostat, it's far
> behind the other drives in total I/O since the reshape started, and
> the I/O hasn't actually changed for a few hours. This together with _
> instead of U leads me to believe that it's not actually being used. So
> why does it say rebuilding?
> 
> I guess my question is if it's possible for me to remove the drive, or
> would I mess the array up? I am not going to do anything until the
> reshape finishes though.
> 
> Thanks,
> Mathias
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
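
As a sanity check on the reshape figures in the quoted /proc/mdstat
output, the sketch below recomputes the progress and remaining time. It
assumes the two reshape counters are in 1 KiB blocks and that the reported
speed stays constant; the function name is hypothetical.

```python
def reshape_eta_minutes(done_kib: int, total_kib: int, speed_kib_per_s: float) -> float:
    """Estimate minutes remaining for a reshape at a constant speed."""
    remaining_kib = total_kib - done_kib
    return remaining_kib / speed_kib_per_s / 60


# Values copied from the quoted mdstat line.
done_kib = 1650786304
total_kib = 1950351360
speed_kib_per_s = 2389

percent = 100 * done_kib / total_kib
eta_min = reshape_eta_minutes(done_kib, total_kib, speed_kib_per_s)

print(f"{percent:.1f}% done, about {eta_min:.0f} min ({eta_min / 60:.1f} h) remaining")
```

The result agrees with the 84.6% and roughly 2090 minutes that mdstat
reports; the small difference from finish=2089.2min comes from the speed
being rounded to whole K/sec in the output.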
