* 2 Disks Jumped Out While Reshaping RAID5
From: Majed B. @ 2009-09-05 20:22 UTC
To: linux-raid

Hello all,

I have posted my problem already here:
http://ubuntuforums.org/showthread.php?p=7900571#post7900571
It also has file attachments of the output of mdadm -E /dev/sd[a-h]1

I appreciate any help on this.

--
      Majed B.
* Re: 2 Disks Jumped Out While Reshaping RAID5
From: NeilBrown @ 2009-09-05 21:32 UTC
To: Majed B.; +Cc: linux-raid

On Sun, September 6, 2009 6:22 am, Majed B. wrote:
> Hello all,
>
> I have posted my problem already here:
> http://ubuntuforums.org/showthread.php?p=7900571#post7900571
> It also has file attachments of the output of mdadm -E /dev/sd[a-h]1

It seems that you need to log in to read the attachments... so I haven't.

>
> I appreciate any help on this.

Hopefully you just need to add "--force" to the assemble command
and it would all just work.  However, I haven't tested that on an array
that is in the process of a reshape, so I cannot promise.
I might try to reproduce your situation with some scratch drives
and check that mdadm -Af does the right thing, but it won't be for a day
or so, and as I cannot see the --examine output I might get the
situation a bit wrong... (hint hint: it is always best to post
full information rather than pointers to it, unless said information is
really, really big).

NeilBrown

> --
>       Majed B.
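A minimal sketch of the forced assembly being suggested here — the array name
and member partitions are taken from this thread, not verified on any
particular machine:

  mdadm --stop /dev/md0                        # release any half-assembled array first
  mdadm --assemble --force /dev/md0 /dev/sd[a-h]1
  cat /proc/mdstat                             # check whether the array started and the reshape resumed

--force lets mdadm bring in members whose event counts are slightly behind
the rest, which is the usual situation after disks drop out part-way through
a reshape.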
* Re: 2 Disks Jumped Out While Reshaping RAID5
From: Majed B. @ 2009-09-06 10:00 UTC
To: linux-raid

I forgot to mention that I'm running mdadm 2.6.7.1 (the latest in the
Ubuntu Server repositories).

I tried forcing the assembly, but as mentioned, I just got an error:

root@Adam:/var/www# mdadm -Af /dev/md0
mdadm: superblock on /dev/sdh1 doesn't match others - assembly aborted

I know I could've pasted here whatever I wrote there, but it seemed
redundant. I'll keep your hint in mind for the next time, if any
(hopefully not).

This may be of interest to you:

root@Adam:/var/www# mdadm -E /dev/sd[a-h]1 | grep Reshap
sda1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
sdb1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
sdc1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
sdd1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
sde1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
sdf1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
sdg1  Reshape pos'n : 2554257664 (2435.93 GiB 2615.56 GB)
sdh1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)

Note that sdd1 was the spare.

The UUIDs are all the same and the superblocks are all similar except
for the reshape position of sdg1.

I didn't try to recreate the array, as I've never faced this issue
before, so I don't know what kind of repercussions it may have.

What I do know is that, in the worst case, I can recreate the array
out of 7 disks (all but sdg1), but lose about 2.3TB worth of data :(

On Sun, Sep 6, 2009 at 12:32 AM, NeilBrown<neilb@suse.de> wrote:
> On Sun, September 6, 2009 6:22 am, Majed B. wrote:
>> Hello all,
>>
>> I have posted my problem already here:
>> http://ubuntuforums.org/showthread.php?p=7900571#post7900571
>> It also has file attachments of the output of mdadm -E /dev/sd[a-h]1
>
> It seems that you need to log in to read the attachments... so I haven't.
>
>> I appreciate any help on this.
>
> Hopefully you just need to add "--force" to the assemble command
> and it would all just work.  However I haven't tested that on an array
> that is in the process of a reshape so I cannot promise.
> I might try to reproduce your situation with some scratch drives
> and check that mdadm -Af does the right thing, but it won't be for a day
> or so, and as I cannot see the --examine output I might get the
> situation a bit wrong... (hint hint: it is always best to post
> full information rather than pointers to it, unless said information is
> really really big).
>
> NeilBrown

--
      Majed B.
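When judging whether a forced assembly is safe, the fields worth comparing
across members are the event counts, update times, and reshape positions.
One possible way to pull just those out of the superblock dumps (same device
names as above, shown only as an illustration):

  mdadm -E /dev/sd[a-h]1 | egrep 'Update Time|Events|Reshape'

A member whose Events value is far behind the rest — as sdg1 turns out to be
later in this thread — is the one mdadm will leave out when it forces the
remaining members together.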
* Re: 2 Disks Jumped Out While Reshaping RAID5
From: Neil Brown @ 2009-09-06 23:52 UTC
To: Majed B.; +Cc: linux-raid

On Sunday September 6, majedb@gmail.com wrote:
> I forgot to mention that I'm running mdadm 2.6.7.1 (latest in Ubuntu
> Server repositories).

You will need at least 2.6.8 to be able to assemble arrays which are
in the middle of a reshape.  I would suggest 2.6.9 or 3.0.

> I tried forcing the assembly, but as mentioned, I just got an error:
> root@Adam:/var/www# mdadm -Af /dev/md0
> mdadm: superblock on /dev/sdh1 doesn't match others - assembly aborted

This incorrect message is fixed in 2.6.8 and later.

> I know I could've pasted whatever I wrote here, but it seemed
> redundant. I'll keep your hint in mind for the next time, if any
> (hopefully not).

Redundant?  How is that relevant?
If you want help, your goal should be to make it as easy as possible
for people to help you.  Having all the information in one email message
is easy.  Having to use a browser to get some of it makes it hard.
Having to register on the website to download an attachment makes it
nearly impossible.

> This may be of interest to you:
> root@Adam:/var/www# mdadm -E /dev/sd[a-h]1 | grep Reshap
> sda1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
> sdb1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
> sdc1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
> sdd1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
> sde1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
> sdf1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
> sdg1  Reshape pos'n : 2554257664 (2435.93 GiB 2615.56 GB)
> sdh1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)

If you can post that again but without the "grep" I might be able to
be more helpful (i.e. the complete output of "mdadm -E /dev/sd[a-h]1").

NeilBrown

> Note that sdd1 was the spare.
>
> The UUIDs are all the same and the superblocks are all similar except
> for the reshape position of sdg1.
>
> I didn't try to recreate the array, as I've never faced this issue
> before, so I don't know what kind of repercussions it may have.
>
> What I do know is that, in the worst case, I can recreate the array
> out of 7 disks (all but sdg1), but lose about 2.3TB worth of data :(
>
> On Sun, Sep 6, 2009 at 12:32 AM, NeilBrown<neilb@suse.de> wrote:
> > On Sun, September 6, 2009 6:22 am, Majed B. wrote:
> >> Hello all,
> >>
> >> I have posted my problem already here:
> >> http://ubuntuforums.org/showthread.php?p=7900571#post7900571
> >> It also has file attachments of the output of mdadm -E /dev/sd[a-h]1
> >
> > It seems that you need to log in to read the attachments... so I haven't.
> >
> >> I appreciate any help on this.
> >
> > Hopefully you just need to add "--force" to the assemble command
> > and it would all just work.  However I haven't tested that on an array
> > that is in the process of a reshape so I cannot promise.
> > I might try to reproduce your situation with some scratch drives
> > and check that mdadm -Af does the right thing, but it won't be for a day
> > or so, and as I cannot see the --examine output I might get the
> > situation a bit wrong... (hint hint: it is always best to post
> > full information rather than pointers to it, unless said information is
> > really really big).
> >
> > NeilBrown
>
> --
>       Majed B.
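For anyone whose distribution only packages an older mdadm, building a newer
release from source is straightforward. A rough sketch — the tarball name is
only illustrative; use whichever 2.6.9 or 3.0 release you actually downloaded:

  tar xzf mdadm-3.0.tar.gz
  cd mdadm-3.0
  make
  sudo make install        # overrides the distro binary on $PATH
  mdadm --version          # confirm the new version is the one being run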
* Re: 2 Disks Jumped Out While Reshaping RAID5 2009-09-06 23:52 ` Neil Brown @ 2009-09-06 23:55 ` Majed B. 2009-09-07 0:01 ` Majed B. 0 siblings, 1 reply; 9+ messages in thread From: Majed B. @ 2009-09-06 23:55 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid Thanks a lot Neil! I didn't know that it requires you to register to download. My bad. Here's the output of examine: /dev/sda1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 13:28:39 2009 State : clean Active Devices : 6 Working Devices : 6 Failed Devices : 1 Spare Devices : 0 Checksum : 5b5a1163 - correct Events : 949214 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 4 8 1 4 active sync /dev/sda1 0 0 0 0 0 removed 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 0 0 3 faulty removed 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 /dev/sdb1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 13:28:39 2009 State : clean Active Devices : 6 Working Devices : 6 Failed Devices : 1 Spare Devices : 0 Checksum : 5b5a1177 - correct Events : 949214 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 6 8 17 6 active sync /dev/sdb1 0 0 0 0 0 removed 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 0 0 3 faulty removed 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 /dev/sdc1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 13:28:39 2009 State : clean Active Devices : 6 Working Devices : 6 Failed Devices : 1 Spare Devices : 0 Checksum : 5b5a117f - correct Events : 949214 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 2 8 33 2 active sync /dev/sdc1 0 0 0 0 0 removed 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 0 0 3 faulty removed 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 /dev/sdd1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 13:28:39 2009 State : clean Active 
Devices : 6 Working Devices : 6 Failed Devices : 1 Spare Devices : 0 Checksum : 5b5a1199 - correct Events : 949214 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 7 8 49 7 active sync /dev/sdd1 0 0 0 0 0 removed 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 0 0 3 faulty removed 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 /dev/sde1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 13:28:39 2009 State : clean Active Devices : 6 Working Devices : 6 Failed Devices : 1 Spare Devices : 0 Checksum : 5b5a119d - correct Events : 949214 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 1 8 65 1 active sync /dev/sde1 0 0 0 0 0 removed 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 0 0 3 faulty removed 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 /dev/sdf1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 06:40:04 2009 State : clean Active Devices : 7 Working Devices : 7 Failed Devices : 1 Spare Devices : 0 Checksum : 5b59b1c1 - correct Events : 949204 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 0 8 81 0 active sync /dev/sdf1 0 0 8 81 0 active sync /dev/sdf1 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 0 0 3 faulty removed 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 /dev/sdg1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape pos'n : 2554257664 (2435.93 GiB 2615.56 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 00:10:39 2009 State : clean Active Devices : 8 Working Devices : 8 Failed Devices : 0 Spare Devices : 0 Checksum : fba3471a - correct Events : 874530 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 3 8 97 3 active sync /dev/sdg1 0 0 8 81 0 active sync /dev/sdf1 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 8 97 3 active sync /dev/sdg1 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 /dev/sdh1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape 
pos'n : 3357066496 (3201.55 GiB 3437.64 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 13:28:39 2009 State : clean Active Devices : 6 Working Devices : 6 Failed Devices : 1 Spare Devices : 0 Checksum : 5b5a11d5 - correct Events : 949214 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 5 8 113 5 active sync /dev/sdh1 0 0 0 0 0 removed 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 0 0 3 faulty removed 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 I have already downloaded and compiled mdadm 3.0, but didn't install it, awaiting further instructions from you. I'll install it now and run -Af and report back what happens. Thank you again! On Mon, Sep 7, 2009 at 2:52 AM, Neil Brown<neilb@suse.de> wrote: > On Sunday September 6, majedb@gmail.com wrote: >> I forgot to mention that I'm running mdadm 2.6.7.1 (latest in Ubuntu >> Server repositories). > > You will need at least 2.6.8 to be able to assemble arrays which are > in the middle of a reshape. I would suggest 2.6.9 or 3.0. > >> >> I tried forcing the assembly, but as mentioned, I just got an error: >> root@Adam:/var/www# mdadm -Af /dev/md0 >> mdadm: superblock on /dev/sdh1 doesn't match others - assembly aborted > > This incorrect message is fixed in 2.6.8 an later. > >> >> I know I could've pasted whatever I wrote here, but it seemed >> redundant. I'll keep your hint in mind for the next time, if any >> (hopefully not). > > Redundant? How is that relevant? > If you want help, your goal should be to make it as easy as possible > for people to help you. Having all information in one email message > is easy. Having to use a browser to get some of it makes it hard. > Having to register on the website to down load an attachment makes it > nearly impossible. > >> >> This may be of an interest to you: >> root@Adam:/var/www# mdadm -E /dev/sd[a-h]1 | grep Reshap >> sda1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >> sdb1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >> sdc1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >> sdd1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >> sde1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >> sdf1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >> sdg1 Reshape pos'n : 2554257664 (2435.93 GiB 2615.56 GB) >> sdh1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > > If you can post that again but without the "grep" I might be able to > be more helpful. (i.e. the complete output of "mdadm -E /dev/sd[a-h]1"). > > NeilBrown > > > >> >> Note that sdd1 was the spare. >> >> The UUIDs are all the same and the superblock is all similar except >> for the reshaping position of sdg1. >> >> I didn't try to recreate the array as I've never faced this issue >> before, so I don't know what kind of repercussions it may have. >> >> What I do know, that at the worst case scenario, I can recreate the >> array out of 7 disks (all but sdg1), but lose about 2.3TB worth of >> data :( >> >> On Sun, Sep 6, 2009 at 12:32 AM, NeilBrown<neilb@suse.de> wrote: >> > On Sun, September 6, 2009 6:22 am, Majed B. wrote: >> >> Hello all, >> >> >> >> I have posted my problem already here: >> >> http://ubuntuforums.org/showthread.php?p=7900571#post7900571 >> >> It also has file attachments of the output of mdadm -E /dev/sd[a-h]1 >> > >> > It seems that you need to log in to read the attachements... so I haven't. >> > >> >> >> >> I appreciate any help on this. 
>> > >> > Hopefully you just need to add "--force" to the assemble command >> > and it would all just work. However I haven't tested that on an array >> > that is in the process of a reshape so I cannot promise. >> > I might try to reproduce your situation and with some scratch drives >> > and check that mdadm -Af does the right thing, but it won't be a day >> > or so, and as I cannot see the --examine output I might get the >> > situation a bit wrong ... (hint hint: it is always best to post >> > full information rather than pointers to it, unless said information is >> > really really big). >> > >> > NeilBrown >> >> -- >> Majed B. >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Majed B. -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
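One shortcut for spotting the odd member out in dumps like the ones above is
to compare just the event counters (an illustrative command, not one from the
original poster):

  mdadm -E /dev/sd[a-h]1 | grep Events

In the output above, sdg1 sits at 874530 events and sdf1 at 949204, while the
remaining members are all at 949214 — which matches the failure sequence Neil
reconstructs below.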
* Re: 2 Disks Jumped Out While Reshaping RAID5 2009-09-06 23:55 ` Majed B. @ 2009-09-07 0:01 ` Majed B. 2009-09-07 0:31 ` NeilBrown 0 siblings, 1 reply; 9+ messages in thread From: Majed B. @ 2009-09-07 0:01 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid I have installed mdadm 3.0 and ran -Af and now it's continuing reshaping!!! root@Adam:~# mdadm --version mdadm - v3.0 - 2nd June 2009 root@Adam:~# mdadm -Af --verbose /dev/md0 mdadm: looking for devices for /dev/md0 mdadm: cannot open device /dev/sdi5: Device or resource busy mdadm: /dev/sdi5 has wrong uuid. mdadm: no recogniseable superblock on /dev/sdi2 mdadm: /dev/sdi2 has wrong uuid. mdadm: cannot open device /dev/sdi1: Device or resource busy mdadm: /dev/sdi1 has wrong uuid. mdadm: cannot open device /dev/sdi: Device or resource busy mdadm: /dev/sdi has wrong uuid. mdadm: no RAID superblock on /dev/sdh mdadm: /dev/sdh has wrong uuid. mdadm: no RAID superblock on /dev/sdg mdadm: /dev/sdg has wrong uuid. mdadm: no RAID superblock on /dev/sdf mdadm: /dev/sdf has wrong uuid. mdadm: no RAID superblock on /dev/sde mdadm: /dev/sde has wrong uuid. mdadm: no RAID superblock on /dev/sdd mdadm: /dev/sdd has wrong uuid. mdadm: no RAID superblock on /dev/sdc mdadm: /dev/sdc has wrong uuid. mdadm: no RAID superblock on /dev/sdb mdadm: /dev/sdb has wrong uuid. mdadm: no RAID superblock on /dev/sda mdadm: /dev/sda has wrong uuid. mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 5. mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 3. mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 0. mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 1. mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 7. mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 2. mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 6. mdadm: /dev/sda1 is identified as a member of /dev/md0, slot 4. mdadm: forcing event count in /dev/sdf1(0) from 949204 upto 949214 mdadm: added /dev/sde1 to /dev/md0 as 1 mdadm: added /dev/sdc1 to /dev/md0 as 2 mdadm: added /dev/sdg1 to /dev/md0 as 3 mdadm: added /dev/sda1 to /dev/md0 as 4 mdadm: added /dev/sdh1 to /dev/md0 as 5 mdadm: added /dev/sdb1 to /dev/md0 as 6 mdadm: added /dev/sdd1 to /dev/md0 as 7 mdadm: added /dev/sdf1 to /dev/md0 as 0 mdadm: /dev/md0 has been started with 7 drives (out of 8). root@Adam:~# cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active raid5 sdf1[0] sdd1[7] sdb1[6] sdh1[5] sda1[4] sdc1[2] sde1[1] 5860558848 blocks super 0.91 level 5, 256k chunk, algorithm 2 [8/7] [UUU_UUUU] [=========>...........] reshape = 49.1% (479633152/976759808) finish=950.5min speed=8704K/sec unused devices: <none> sdg1 is not in the list. Is that correct?! sdg1 was one of the array's disks before expanding. So I guess now the array is degraded yet is reshaping as if it had 8 disks, correct? So after the reshaping process is over, I can add sdg1 again and it will resync properly, right? On Mon, Sep 7, 2009 at 2:55 AM, Majed B.<majedb@gmail.com> wrote: > Thanks a lot Neil! > > I didn't know that it requires you to register to download. My bad. 
> Here's the output of examine: > > /dev/sda1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 13:28:39 2009 > State : clean > Active Devices : 6 > Working Devices : 6 > Failed Devices : 1 > Spare Devices : 0 > Checksum : 5b5a1163 - correct > Events : 949214 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 4 8 1 4 active sync /dev/sda1 > > 0 0 0 0 0 removed > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 0 0 3 faulty removed > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > /dev/sdb1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 13:28:39 2009 > State : clean > Active Devices : 6 > Working Devices : 6 > Failed Devices : 1 > Spare Devices : 0 > Checksum : 5b5a1177 - correct > Events : 949214 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 6 8 17 6 active sync /dev/sdb1 > > 0 0 0 0 0 removed > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 0 0 3 faulty removed > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > /dev/sdc1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 13:28:39 2009 > State : clean > Active Devices : 6 > Working Devices : 6 > Failed Devices : 1 > Spare Devices : 0 > Checksum : 5b5a117f - correct > Events : 949214 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 2 8 33 2 active sync /dev/sdc1 > > 0 0 0 0 0 removed > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 0 0 3 faulty removed > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > /dev/sdd1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 13:28:39 2009 > State : clean > Active Devices : 6 > Working Devices : 6 > Failed Devices : 1 > Spare Devices : 0 > Checksum : 
5b5a1199 - correct > Events : 949214 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 7 8 49 7 active sync /dev/sdd1 > > 0 0 0 0 0 removed > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 0 0 3 faulty removed > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > /dev/sde1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 13:28:39 2009 > State : clean > Active Devices : 6 > Working Devices : 6 > Failed Devices : 1 > Spare Devices : 0 > Checksum : 5b5a119d - correct > Events : 949214 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 1 8 65 1 active sync /dev/sde1 > > 0 0 0 0 0 removed > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 0 0 3 faulty removed > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > /dev/sdf1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 06:40:04 2009 > State : clean > Active Devices : 7 > Working Devices : 7 > Failed Devices : 1 > Spare Devices : 0 > Checksum : 5b59b1c1 - correct > Events : 949204 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 0 8 81 0 active sync /dev/sdf1 > > 0 0 8 81 0 active sync /dev/sdf1 > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 0 0 3 faulty removed > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > /dev/sdg1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 2554257664 (2435.93 GiB 2615.56 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 00:10:39 2009 > State : clean > Active Devices : 8 > Working Devices : 8 > Failed Devices : 0 > Spare Devices : 0 > Checksum : fba3471a - correct > Events : 874530 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 3 8 97 3 active sync /dev/sdg1 > > 0 0 8 81 0 active sync /dev/sdf1 > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 8 97 3 active sync /dev/sdg1 > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > /dev/sdh1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 
2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 13:28:39 2009 > State : clean > Active Devices : 6 > Working Devices : 6 > Failed Devices : 1 > Spare Devices : 0 > Checksum : 5b5a11d5 - correct > Events : 949214 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 5 8 113 5 active sync /dev/sdh1 > > 0 0 0 0 0 removed > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 0 0 3 faulty removed > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > > > I have already downloaded and compiled mdadm 3.0, but didn't install > it, awaiting further instructions from you. I'll install it now and > run -Af and report back what happens. > > Thank you again! > > On Mon, Sep 7, 2009 at 2:52 AM, Neil Brown<neilb@suse.de> wrote: >> On Sunday September 6, majedb@gmail.com wrote: >>> I forgot to mention that I'm running mdadm 2.6.7.1 (latest in Ubuntu >>> Server repositories). >> >> You will need at least 2.6.8 to be able to assemble arrays which are >> in the middle of a reshape. I would suggest 2.6.9 or 3.0. >> >>> >>> I tried forcing the assembly, but as mentioned, I just got an error: >>> root@Adam:/var/www# mdadm -Af /dev/md0 >>> mdadm: superblock on /dev/sdh1 doesn't match others - assembly aborted >> >> This incorrect message is fixed in 2.6.8 an later. >> >>> >>> I know I could've pasted whatever I wrote here, but it seemed >>> redundant. I'll keep your hint in mind for the next time, if any >>> (hopefully not). >> >> Redundant? How is that relevant? >> If you want help, your goal should be to make it as easy as possible >> for people to help you. Having all information in one email message >> is easy. Having to use a browser to get some of it makes it hard. >> Having to register on the website to down load an attachment makes it >> nearly impossible. >> >>> >>> This may be of an interest to you: >>> root@Adam:/var/www# mdadm -E /dev/sd[a-h]1 | grep Reshap >>> sda1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >>> sdb1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >>> sdc1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >>> sdd1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >>> sde1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >>> sdf1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >>> sdg1 Reshape pos'n : 2554257664 (2435.93 GiB 2615.56 GB) >>> sdh1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >> >> If you can post that again but without the "grep" I might be able to >> be more helpful. (i.e. the complete output of "mdadm -E /dev/sd[a-h]1"). >> >> NeilBrown >> >> >> >>> >>> Note that sdd1 was the spare. >>> >>> The UUIDs are all the same and the superblock is all similar except >>> for the reshaping position of sdg1. >>> >>> I didn't try to recreate the array as I've never faced this issue >>> before, so I don't know what kind of repercussions it may have. >>> >>> What I do know, that at the worst case scenario, I can recreate the >>> array out of 7 disks (all but sdg1), but lose about 2.3TB worth of >>> data :( >>> >>> On Sun, Sep 6, 2009 at 12:32 AM, NeilBrown<neilb@suse.de> wrote: >>> > On Sun, September 6, 2009 6:22 am, Majed B. 
wrote: >>> >> Hello all, >>> >> >>> >> I have posted my problem already here: >>> >> http://ubuntuforums.org/showthread.php?p=7900571#post7900571 >>> >> It also has file attachments of the output of mdadm -E /dev/sd[a-h]1 >>> > >>> > It seems that you need to log in to read the attachements... so I haven't. >>> > >>> >> >>> >> I appreciate any help on this. >>> > >>> > Hopefully you just need to add "--force" to the assemble command >>> > and it would all just work. However I haven't tested that on an array >>> > that is in the process of a reshape so I cannot promise. >>> > I might try to reproduce your situation and with some scratch drives >>> > and check that mdadm -Af does the right thing, but it won't be a day >>> > or so, and as I cannot see the --examine output I might get the >>> > situation a bit wrong ... (hint hint: it is always best to post >>> > full information rather than pointers to it, unless said information is >>> > really really big). >>> > >>> > NeilBrown >>> >>> -- >>> Majed B. >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > > -- > Majed B. > -- Majed B. -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
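A sketch of the follow-up steps being asked about here — watching the reshape
finish and then re-adding the dropped member — assuming the device names from
this thread and a disk that has checked out as healthy:

  watch -n 60 cat /proc/mdstat         # reshape progress, speed and ETA
  mdadm --add /dev/md0 /dev/sdg1       # only after the reshape has completed
  cat /proc/mdstat                     # recovery onto sdg1 should start automatically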
* Re: 2 Disks Jumped Out While Reshaping RAID5
From: NeilBrown @ 2009-09-07 0:31 UTC
To: Majed B.; +Cc: linux-raid

On Mon, September 7, 2009 10:01 am, Majed B. wrote:
> I have installed mdadm 3.0 and ran -Af and now it's continuing
> reshaping!!!

Excellent.

Based on the --examine info you provided, it appears that
/dev/sdg1 reported an error at about 00:10:39 on Wednesday morning
and was evicted from the array.  Reshape was up to 2435GB (37%) at
that point.
Reshape continued until 06:40:04 that morning, at which point it
had reached 3201GB (49%).  At that point /dev/sdf1 seems to have
reported an error, so the whole array went offline.

When you reassembled with mdadm-3.0 and --force, it excluded sdg1,
as that was the oldest, marked sdf1 as up-to-date, and continued.

The reshape process will have redone the last few chunks, so all
the data will have been properly relocated.

As all the superblocks report that the array was "State : clean",
you can be quite sure that all your data is safe (if they were
"State : active" there would be a small chance that a block or two
was corrupted, and a fsck etc. would be advised).

It wouldn't hurt to examine your kernel logs to see what sort of
error was triggered at those two times, in case there might be a need
to replace a device.

> sdg1 is not in the list. Is that correct?! sdg1 was one of the
> array's disks before expanding. So I guess now the array is degraded
> yet is reshaping as if it had 8 disks, correct?

Yes, that is correct.
It may be that sdg had a transient error, or it may have a serious
media or other error.  You should convince yourself that it is working
reliably before adding it back into the array.

> So after the reshaping process is over, I can add sdg1 again and it
> will resync properly, right?

Yes it will, provided no write errors occur while writing data to it.

NeilBrown
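A few examples of the kind of checks Neil describes — log paths vary by
distribution, and smartmontools is assumed to be installed:

  grep -i sdg /var/log/kern.log        # look for SATA/ATA errors around the two eviction times
  smartctl -a /dev/sdg                 # overall SMART health, error log, attribute table
  smartctl -t long /dev/sdg            # start an offline surface scan; read the result later with -a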
* Re: 2 Disks Jumped Out While Reshaping RAID5 2009-09-07 0:31 ` NeilBrown @ 2009-09-07 0:44 ` Majed B. 2009-09-07 16:34 ` Majed B. 0 siblings, 1 reply; 9+ messages in thread From: Majed B. @ 2009-09-07 0:44 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid Thanks a lot Neil for your help :) kernel logs showed a SATA link error for sdg. I double checked the cables and they were more than fine and the array was running for weeks before I did the reshaping and no errors were reported before the reshaping process. I'm using an MSI motherboard (MS-7514) and been having random issues with it since reaching 6 disks. I've recently ordered an EVGA motherboard and if things turn to be stable on it, I'll ditch MSI for good. Throughout searching for the past 6 days, I noticed people complaining from acpi and apic causing issues, so I turned them off and will see how things turn out. These are the hard disks I'm using: root@Adam:~# hddtemp /dev/sd[a-h] /dev/sda: WDC WD10EACS-00D6B1: 26°C /dev/sdb: WDC WD10EACS-00D6B1: 28°C /dev/sdc: WDC WD10EACS-00ZJB0: 29°C /dev/sdd: WDC WD10EADS-65L5B1: 27°C /dev/sde: WDC WD10EADS-65L5B1: 28°C /dev/sdf: MAXTOR STM31000340AS: 28°C /dev/sdg: WDC WD10EACS-00ZJB0: 26°C /dev/sdh: WDC WD10EADS-00L5B1: 25°C /dev/sdi: Hitachi HDS721680PLAT80: 32°C (sdi is the OS disk) Neil, do you suggest any certain test/stress-tests to put sdg through? I'll force a couple of short and long smartd tests on it, and have dd read the whole disk a couple of times to make sure all sectors are read properly. Is that sufficient? Thank you again. On Mon, Sep 7, 2009 at 3:31 AM, NeilBrown<neilb@suse.de> wrote: > On Mon, September 7, 2009 10:01 am, Majed B. wrote: >> I have installed mdadm 3.0 and ran -Af and now it's continuing >> reshaping!!! > > Excellent. > > Based on the --examine info you provided it appears that > /dev/sdg1 reported an error at about 00:10:39 on Wednesday morning > and was evicted from the array. Reshape was up to 2435GB (37%) at > that point. > Reshape continued until 06:40:04 that morning at which point it > had reached 3201GB (49%). At that point /dev/sdf1 seems to have > reported an error so the whole array went off line. > > When you reassembled with mdadm-3.0 and --force, it excluded sdg1 > as that was the oldest, and marked sdf1 as up-to-date, and continued. > > The reshape processes will have redone the last few chunks so all > the data will have been properly relocated. > > As all the superblocks report that the array was "State : clean", > you can be quite sure that all your data is safe (if they were > "State : active" there would be a small chance some a block or two > was corrupted and a fsck etc would be advised). > > It wouldn't hurt to examine your kernel logs to see what sort of > error was tiggered at those two times in case there might be a need > to replace a device. > > > > >> sdg1 is not in the list. Is that correct?! sdg1 was one of the >> array's disks before expanding. So I guess now the array is degraded >> yet is reshaping as if it had 8 disks, correct? > > Yes, that is correct. > It may be that sdg has a transient error, or it may have a serious > media or other error. You should convince yourself that it is working > reliably before adding it back in to the array. > > > >> >> So after the reshaping process is over, I can add sdg1 again and it >> will resync properly, right? > > Yes it will, providing no write-errors occur while writing data to it. > > NeilBrown > > -- Majed B. 
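For what it's worth, a full sequential read with dd plus a look at the
reallocated/pending sector counters afterwards is a common way to exercise a
suspect disk. A read-only sketch:

  dd if=/dev/sdg of=/dev/null bs=1M    # reads the whole disk; a media error aborts the run and shows how far it got
  smartctl -A /dev/sdg | egrep -i 'Reallocated|Pending|Uncorrect'

Note that a flaky cable or controller port — the suspicion in this thread —
will show up as link errors in the kernel log rather than in SMART.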
* Re: 2 Disks Jumped Out While Reshaping RAID5 2009-09-07 0:44 ` Majed B. @ 2009-09-07 16:34 ` Majed B. 0 siblings, 0 replies; 9+ messages in thread From: Majed B. @ 2009-09-07 16:34 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid A little update on the situation: After uninstalling mdadm 2.6.7.1 which ships with Ubuntu 9.04, and installing mdadm 3.0, I got this: root@Adam:~# cat /proc/mdstat Personalities : unused devices: <none> I'm guessing that happened because initram tools was removed when uninstalling the old mdadm. No problem, I'll just assemble the array on boot (through a line in /etc/rc.local). I then proceeded to assemble the array, but it refused: root@Adam:~# mdadm -Af --verbose /dev/md0 mdadm: looking for devices for /dev/md0 mdadm: cannot open device /dev/sdi5: Device or resource busy mdadm: /dev/sdi5 has wrong uuid. mdadm: no recogniseable superblock on /dev/sdi2 mdadm: /dev/sdi2 has wrong uuid. mdadm: cannot open device /dev/sdi1: Device or resource busy mdadm: /dev/sdi1 has wrong uuid. mdadm: cannot open device /dev/sdi: Device or resource busy mdadm: /dev/sdi has wrong uuid. mdadm: no RAID superblock on /dev/sdh mdadm: /dev/sdh has wrong uuid. mdadm: superblock on /dev/sdg1 doesn't match others - assembly aborted Since sdg1 has flunked out before, I just zeroed its superblock to add it later, if it wasn't dead: root@Adam:~# mdadm --zero-superblock /dev/sdg mdadm: Unrecognised md component device - /dev/sdg root@Adam:~# mdadm --zero-superblock /dev/sdg1 root@Adam:~# mdadm --zero-superblock /dev/sdg1 mdadm: Unrecognised md component device - /dev/sdg1 The array assembled properly after that (with 7 out 8 disks -- running degraded): root@Adam:~# mdadm -Af --verbose /dev/md0 mdadm: looking for devices for /dev/md0 mdadm: cannot open device /dev/sdi5: Device or resource busy mdadm: /dev/sdi5 has wrong uuid. mdadm: no recogniseable superblock on /dev/sdi2 mdadm: /dev/sdi2 has wrong uuid. mdadm: cannot open device /dev/sdi1: Device or resource busy mdadm: /dev/sdi1 has wrong uuid. mdadm: cannot open device /dev/sdi: Device or resource busy mdadm: /dev/sdi has wrong uuid. mdadm: no RAID superblock on /dev/sdh mdadm: /dev/sdh has wrong uuid. mdadm: no RAID superblock on /dev/sdg1 mdadm: /dev/sdg1 has wrong uuid. mdadm: no RAID superblock on /dev/sdg mdadm: /dev/sdg has wrong uuid. mdadm: no RAID superblock on /dev/sdf mdadm: /dev/sdf has wrong uuid. mdadm: no RAID superblock on /dev/sde mdadm: /dev/sde has wrong uuid. mdadm: no RAID superblock on /dev/sdd mdadm: /dev/sdd has wrong uuid. mdadm: no RAID superblock on /dev/sdc mdadm: /dev/sdc has wrong uuid. mdadm: no RAID superblock on /dev/sdb mdadm: /dev/sdb has wrong uuid. mdadm: no RAID superblock on /dev/sda mdadm: /dev/sda has wrong uuid. mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 5. mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 0. mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 1. mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 7. mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 2. mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 6. mdadm: /dev/sda1 is identified as a member of /dev/md0, slot 4. 
mdadm: added /dev/sde1 to /dev/md0 as 1 mdadm: added /dev/sdc1 to /dev/md0 as 2 mdadm: no uptodate device for slot 3 of /dev/md0 mdadm: added /dev/sda1 to /dev/md0 as 4 mdadm: added /dev/sdh1 to /dev/md0 as 5 mdadm: added /dev/sdb1 to /dev/md0 as 6 mdadm: added /dev/sdd1 to /dev/md0 as 7 mdadm: added /dev/sdf1 to /dev/md0 as 0 mdadm: /dev/md0 has been started with 7 drives (out of 8). root@Adam:~# cat /proc/mdstat Personalities : [raid6] [raid5] [raid4] md0 : active raid5 sdf1[0] sdd1[7] sdb1[6] sdh1[5] sda1[4] sdc1[2] sde1[1] 6837318656 blocks level 5, 256k chunk, algorithm 2 [8/7] [UUU_UUUU] unused devices: <none> After some poking, I'm suspecting the MSI motherboard itself, since the problems happens to disks that are on ports 7 and 8 on the motherboard, and those two ports have their own controller and they share a single bus. I've ordered an EVGA motherboard that should arrive in a week or so. I'll update later when I move the hard disks to it and add that sdg disk. Thanks again Neil for your help :) On Mon, Sep 7, 2009 at 3:44 AM, Majed B.<majedb@gmail.com> wrote: > Thanks a lot Neil for your help :) > > kernel logs showed a SATA link error for sdg. I double checked the > cables and they were more than fine and the array was running for > weeks before I did the reshaping and no errors were reported before > the reshaping process. > > I'm using an MSI motherboard (MS-7514) and been having random issues > with it since reaching 6 disks. I've recently ordered an EVGA > motherboard and if things turn to be stable on it, I'll ditch MSI for > good. > > Throughout searching for the past 6 days, I noticed people complaining > from acpi and apic causing issues, so I turned them off and will see > how things turn out. > > These are the hard disks I'm using: > > root@Adam:~# hddtemp /dev/sd[a-h] > /dev/sda: WDC WD10EACS-00D6B1: 26°C > /dev/sdb: WDC WD10EACS-00D6B1: 28°C > /dev/sdc: WDC WD10EACS-00ZJB0: 29°C > /dev/sdd: WDC WD10EADS-65L5B1: 27°C > /dev/sde: WDC WD10EADS-65L5B1: 28°C > /dev/sdf: MAXTOR STM31000340AS: 28°C > /dev/sdg: WDC WD10EACS-00ZJB0: 26°C > /dev/sdh: WDC WD10EADS-00L5B1: 25°C > /dev/sdi: Hitachi HDS721680PLAT80: 32°C > > (sdi is the OS disk) > > Neil, do you suggest any certain test/stress-tests to put sdg through? > > I'll force a couple of short and long smartd tests on it, and have dd > read the whole disk a couple of times to make sure all sectors are > read properly. Is that sufficient? > > Thank you again. > > On Mon, Sep 7, 2009 at 3:31 AM, NeilBrown<neilb@suse.de> wrote: >> On Mon, September 7, 2009 10:01 am, Majed B. wrote: >>> I have installed mdadm 3.0 and ran -Af and now it's continuing >>> reshaping!!! >> >> Excellent. >> >> Based on the --examine info you provided it appears that >> /dev/sdg1 reported an error at about 00:10:39 on Wednesday morning >> and was evicted from the array. Reshape was up to 2435GB (37%) at >> that point. >> Reshape continued until 06:40:04 that morning at which point it >> had reached 3201GB (49%). At that point /dev/sdf1 seems to have >> reported an error so the whole array went off line. >> >> When you reassembled with mdadm-3.0 and --force, it excluded sdg1 >> as that was the oldest, and marked sdf1 as up-to-date, and continued. >> >> The reshape processes will have redone the last few chunks so all >> the data will have been properly relocated. 
>> >> As all the superblocks report that the array was "State : clean", >> you can be quite sure that all your data is safe (if they were >> "State : active" there would be a small chance some a block or two >> was corrupted and a fsck etc would be advised). >> >> It wouldn't hurt to examine your kernel logs to see what sort of >> error was tiggered at those two times in case there might be a need >> to replace a device. >> >> >> >> >>> sdg1 is not in the list. Is that correct?! sdg1 was one of the >>> array's disks before expanding. So I guess now the array is degraded >>> yet is reshaping as if it had 8 disks, correct? >> >> Yes, that is correct. >> It may be that sdg has a transient error, or it may have a serious >> media or other error. You should convince yourself that it is working >> reliably before adding it back in to the array. >> >> >> >>> >>> So after the reshaping process is over, I can add sdg1 again and it >>> will resync properly, right? >> >> Yes it will, providing no write-errors occur while writing data to it. >> >> NeilBrown >> >> > > > > -- > Majed B. > -- Majed B. -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
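As a side note on the boot-time assembly workaround mentioned above: rather
than assembling from /etc/rc.local, the usual Debian/Ubuntu route is to record
the array in mdadm.conf and rebuild the initramfs — a sketch, assuming the
Ubuntu paths and that the initramfs hook scripts are still installed:

  mdadm --detail --scan >> /etc/mdadm/mdadm.conf   # appends an ARRAY line with the array UUID
  update-initramfs -u                              # so the array is assembled early at boot again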