Re: disk order problem in a raid 10 array


From: NeilBrown <neilb@suse.de>
To: Xavier Brochard <xavier@alternatif.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: disk order problem in a raid 10 array
Date: Sat, 19 Mar 2011 12:42:47 +1100
Message-ID: <20110319124247.41bac36a@notabene.brown>
In-Reply-To: <201103190059.08093.xavier@alternatif.org>

On Sat, 19 Mar 2011 00:59:07 +0100 Xavier Brochard <xavier@alternatif.org>
wrote:

> On Saturday 19 March 2011 00:20:39, NeilBrown wrote:
> > On Fri, 18 Mar 2011 23:50:18 +0100 Xavier Brochard <xavier@alternatif.org>
> > > On Friday 18 March 2011 23:22:51, NeilBrown wrote:
> > > > On Fri, 18 Mar 2011 21:12:49 +0100 Xavier Brochard
> > > > > On Friday 18 March 2011 18:22:34, hansbkk@gmail.com wrote:
> > > > > > On Fri, Mar 18, 2011 at 9:49 PM, Xavier Brochard
> > > > > > <xavier@alternatif.org> wrote:
> > > > > > > disk order is mixed between each boot - even with live-cd.
> > > > > > > is that normal?
> > > > > > 
> > > > > > If nothing is changing and the order is swapping really every boot,
> > > > > > then IMO that is odd.
> > > > > 
> > > > > nothing has changed, except kernel minor version
> > > > 
> > > > Yet you don't tell us what the kernel minor version changed from or to.
> > > 
> > > Previously it was Ubuntu 2.6.32-27-server or 2.6.32-28-server and now it
> > > is Ubuntu 2.6.32-29.58-server 2.6.32.28+drm33.13
> > > 
> > > > That may not be important, but it might and you obviously don't know
> > > > which. It is always better to give too much information rather than
> > > > not enough.
> > > 
> > > Here's full output of mdadm --examine /dev/sd[cdefg]1
> > > As you can see, disks sdc, sdd and sde claim to be different devices. Is
> > > that a problem?
> > 
> > Were all of these outputs collected at the same time?
> 
> yes
> 
> > They seem
> > inconsistent.
> 
> > In particular, sdc1 has a higher 'events' number than the others (154 vs
> > 102) yet an earlier Update Time.  It also thinks that the array is
> > completely failed.
> 
> When I removed that disk (sdc is number 2) and another one (I tried with 
> different disks), all other disks display (with mdadm -E):
> 0	Active
> 1	Active
> 2	Active
> 3	Active
> 4	Spare
> 
> But when I removed that disk (#2) and #0, it started to recover and all other 
> disks display (with mdadm -E):
> 0	Removed
> 1	Active
> 2	Faulty removed
> 3	Active
> 4	Spare
> That looks coherent to me now.
> 
> > So I suspect that device is badly confused and you probably want to zero
> > its metadata ... but don't do that too hastily.
> > 
> > All the other devices think the array is working correctly with a full
> > complement of devices.  However there is no device which claims to
> > be "RaidDevice 2" - except sdc1 and it is obviously confused.
> > 
> > The device name listed in the table at the end of the --examine output
> > is the name that the device had when the metadata was last written, and
> > device names can change on reboot.
> > The fact that the names don't line up suggests that the metadata hasn't been
> > written since the last reboot - so presumably you aren't really using the
> > array (???).
> > 
> > (The newer 1.x metadata format doesn't try to record the names of devices
> > in the superblock, so it avoids some of this confusion.)
> > 
> > 
> > Based on your earlier email, it would appear that the device discovery for
> > some of your devices is happening in parallel at boot time, so the ordering
> > could be random - each time you boot you get a different order.  This will
> > not confuse md or mdadm - they look at the content of the devices rather
> > than the name.
> > If you want a definitive name for each device, it might be a good idea to
> > look in /dev/disk/by-path or /dev/disk/by-id and use names from there.
> > 
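For the /dev/disk/by-id suggestion above, here is a minimal sketch of looking
up the persistent names that point at a given kernel device.  The function
name stable_names and its optional second argument (the directory to search)
are hypothetical, introduced only for illustration, and it assumes a readlink
that supports -f (as GNU coreutils does).

```shell
# Print every symlink in a directory (default: /dev/disk/by-id) that
# resolves to the same node as the given device.
# "stable_names" is a hypothetical helper name, not part of mdadm or udev.
stable_names() {
    # Resolve the device itself first, in case it is also a symlink.
    dev=$(readlink -f "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# Example (run by hand):
#   stable_names /dev/sdb1
#   stable_names /dev/sdb1 /dev/disk/by-path
```

The names it prints stay attached to the same physical disk across reboots,
whatever sdX letter the kernel happens to assign.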
> > Could you please send a complete output of:
> > 
> >    cat /proc/mdstat
> >    mdadm -D /dev/md0
> >    mdadm -E /dev/sd?1
> > 
> > all collected at the same time.  Then I can suggest whether there is any
> > action you should take to repair anything.
> 
> Here it is, thank you for your help
> 

I suggest you:

  mdadm --zero-superblock /dev/sdb1

having first double-checked that sdb1 is the device with Events of 154,

then

  mdadm -S /dev/md0
  mdadm -As /dev/md0


and let the array rebuild onto the spare.
Then check the data and make sure it is all good.
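
While the rebuild runs, its progress shows up in /proc/mdstat.  As a rough
sketch, a small filter can pull out just the percentage.  The function name
rebuild_progress is hypothetical, and the parsing assumes the usual
"recovery = NN.N%" (or "resync = NN.N%") progress line, so treat it as an
illustration rather than a robust monitor.

```shell
# Read /proc/mdstat-style text on stdin and print the progress percentage
# of any recovery or resync line, or "none" if no such line is present.
# "rebuild_progress" is a hypothetical helper name.
rebuild_progress() {
    awk '
        # Progress lines look roughly like:
        #   [==>......]  recovery = 12.6% (61543936/488383936) finish=...
        /recovery|resync/ && match($0, /[0-9]+\.[0-9]+%/) {
            print substr($0, RSTART, RLENGTH)
            found = 1
        }
        END { if (!found) print "none" }
    '
}

# Example (run by hand):
#   rebuild_progress < /proc/mdstat
```

Once it reports "none" and /proc/mdstat shows the array active with all four
raid devices in place, the rebuild has finished.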
Then add /dev/sdb1 back in as the spare:

  mdadm /dev/md0 --add /dev/sdb1

and everything should be fine - providing you don't hit any hardware errors
etc.
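
The double-check mentioned above, before zeroing anything, can be done by eye
in the mdadm -E output, but a small filter makes the Events counts easier to
compare side by side.  This is only a sketch: the function name
events_by_device is hypothetical, and it assumes the 0.90-style layout shown
in the quoted output below (a "/dev/...:" header line followed by an
"Events : N" line for each device).

```shell
# Read "mdadm -E" output for one or more devices on stdin and print one
# "device events" pair per line.
# "events_by_device" is a hypothetical helper name, not an mdadm feature.
events_by_device() {
    awk '
        # A header such as "/dev/sdb1:" starts the block for one device.
        /^\/dev\/[^ ]*:$/ { dev = substr($1, 1, length($1) - 1) }
        # The event counter line looks like "         Events : 154".
        $1 == "Events" && $2 == ":" { print dev, $3 }
    '
}

# Example (run by hand, as root):
#   mdadm -E /dev/sd?1 | events_by_device
```

The device to zero is the one whose count is out of line with the rest (154
in the output quoted below); if the names have shifted again since that
output was collected, go by the Events value rather than the sdX name.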


NeilBrown




> mdstat:
> =====
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md0 : inactive sdb1[2](S) sdf1[4](S) sdd1[3](S) sdc1[1](S) sde1[0](S)
>       2441919680 blocks
>        
> unused devices: <none>
> ====
> Obviously, mdadm -D /dev/md0 outputs nothing.
> 
> mdadm -E /dev/sd?1
> ====
> /dev/sdb1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>   Creation Time : Sun Jan  2 16:41:45 2011
>      Raid Level : raid10
>   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>      Array Size : 976767872 (931.52 GiB 1000.21 GB)
>    Raid Devices : 4
>   Total Devices : 5
> Preferred Minor : 0
> 
>     Update Time : Wed Mar 16 09:50:03 2011
>           State : clean
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 2
>   Spare Devices : 0
>        Checksum : ec151590 - correct
>          Events : 154
> 
>          Layout : near=2
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     2       8       65        2      active sync   /dev/sde1
> 
>    0     0       0        0        0      removed
>    1     1       0        0        1      faulty removed
>    2     2       8       65        2      active sync   /dev/sde1
>    3     3       0        0        3      faulty removed
> /dev/sdc1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>   Creation Time : Sun Jan  2 16:41:45 2011
>      Raid Level : raid10
>   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>      Array Size : 976767872 (931.52 GiB 1000.21 GB)
>    Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
> 
>     Update Time : Fri Mar 18 16:37:45 2011
>           State : clean
>  Active Devices : 2
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 1
>        Checksum : ec181672 - correct
>          Events : 107
> 
>          Layout : near=2
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     1       8       17        1      active sync   /dev/sdb1
> 
>    0     0       0        0        0      removed
>    1     1       8       17        1      active sync   /dev/sdb1
>    2     2       0        0        2      faulty removed
>    3     3       8       49        3      active sync   /dev/sdd1
>    4     4       8       33        4      spare   /dev/sdc1
> /dev/sdd1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>   Creation Time : Sun Jan  2 16:41:45 2011
>      Raid Level : raid10
>   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>      Array Size : 976767872 (931.52 GiB 1000.21 GB)
>    Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
> 
>     Update Time : Fri Mar 18 16:37:45 2011
>           State : clean
>  Active Devices : 2
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 1
>        Checksum : ec181696 - correct
>          Events : 107
> 
>          Layout : near=2
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     3       8       49        3      active sync   /dev/sdd1
> 
>    0     0       0        0        0      removed
>    1     1       8       17        1      active sync   /dev/sdb1
>    2     2       0        0        2      faulty removed
>    3     3       8       49        3      active sync   /dev/sdd1
>    4     4       8       33        4      spare   /dev/sdc1
> /dev/sde1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>   Creation Time : Sun Jan  2 16:41:45 2011
>      Raid Level : raid10
>   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>      Array Size : 976767872 (931.52 GiB 1000.21 GB)
>    Raid Devices : 4
>   Total Devices : 5
> Preferred Minor : 0
> 
>     Update Time : Wed Mar 16 07:43:45 2011
>           State : clean
>  Active Devices : 4
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 1
>        Checksum : ec14f740 - correct
>          Events : 102
> 
>          Layout : near=2
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     0       8       33        0      active sync   /dev/sdc1
> 
>    0     0       8       33        0      active sync   /dev/sdc1
>    1     1       8       49        1      active sync   /dev/sdd1
>    2     2       8       65        2      active sync   /dev/sde1
>    3     3       8       81        3      active sync   /dev/sdf1
>    4     4       8       97        4      spare
> /dev/sdf1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : b784237b:5a021f4d:4cf004e3:2cb521cf
>   Creation Time : Sun Jan  2 16:41:45 2011
>      Raid Level : raid10
>   Used Dev Size : 488383936 (465.76 GiB 500.11 GB)
>      Array Size : 976767872 (931.52 GiB 1000.21 GB)
>    Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
> 
>     Update Time : Fri Mar 18 16:37:45 2011
>           State : clean
>  Active Devices : 2
> Working Devices : 3
>  Failed Devices : 1
>   Spare Devices : 1
>        Checksum : ec181682 - correct
>          Events : 107
> 
>          Layout : near=2
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     4       8       33        4      spare   /dev/sdc1
> 
>    0     0       0        0        0      removed
>    1     1       8       17        1      active sync   /dev/sdb1
>    2     2       0        0        2      faulty removed
>    3     3       8       49        3      active sync   /dev/sdd1
>    4     4       8       33        4      spare   /dev/sdc1
> ====
> 
> 
> 
> Xavier
> xavier@alternatif.org - 09 54 06 16 26

