linux-raid.vger.kernel.org archive mirror
* Re: md: bug in file md.c, line 1440 (2.4.22)
From: Neil Brown @ 2003-09-03  0:47 UTC (permalink / raw)
  To: Andre Tomt; +Cc: linux-kernel, linux-raid, mingo

On Saturday August 30, andre@tomt.net wrote:
> Heya :-)
> 
> Having a funny showstopper problem here with md, the autostart fails 
> miserably with "md: bug in file md.c, line 1440"
> 
> Here's the story;
and a sad one it is...

> md:	**********************************
> md:	* <COMPLETE RAID STATE PRINTOUT> *
> md:	**********************************
> md5: <hda9> array superblock:
> md:  SB: (V:0.90.0) ID:<80012e77.b449af86.6bccffae.ddda9474> CT:3e4caa02
> md:     L1 S24394560 ND:-22 RD:2 md5 LO:0 CS:32768
> md:     UT:3f50117d ST:1 AD:2 WD:2 FD:-24 SD:0 CSUM:a62f39ca E:0000006d
>      D  0:  DISK<N:0,hdc9(22,9),R:0,S:6>
>      D  1:  DISK<N:1,hda9(3,9),R:1,S:6>
>      D  2:  DISK<N:2,[dev 00:00](0,0),R:2,S:9>
>      D  3:  DISK<N:0,[dev 00:00](0,0),R:0,S:9>
>      D  4:  DISK<N:0,[dev 00:00](0,0),R:0,S:9>

Your problem is that these extra slots (N:0) are flagged as failed
(S:9) and this confuses md.c.
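
For readers decoding the printout: the S: value is the per-disk state
word of the 0.90 superblock. The sketch below assumes the bit positions
defined in the kernel's md_p.h (faulty=0, active=1, sync=2, removed=3)
and that the printout shows the state word as a plain integer; the
helper name decode_state is made up for illustration.

```python
# Bit positions of the per-disk state word in a 0.90 md superblock
# (assumed to match MD_DISK_FAULTY/ACTIVE/SYNC/REMOVED in md_p.h).
MD_DISK_FLAGS = {
    0: "faulty",
    1: "active",
    2: "sync",
    3: "removed",
}


def decode_state(state):
    """Return the names of the flags set in a descriptor's S: value."""
    return [
        name
        for bit, name in sorted(MD_DISK_FLAGS.items())
        if state & (1 << bit)
    ]


# The two values that appear in the printout above.
for value in (6, 9):
    print(value, decode_state(value))
```

Under those assumptions S:6 decodes to active+sync (the two working
members) and S:9 to faulty+removed (the unused slots).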

If you get mdadm 1.3.0 and apply the three patches that can be found
in
   http://cgi.cse.unsw.edu.au/~neilb/source/mdadm/patch/applied/

and then stop the array and use:
   mdadm --assemble --update=summaries /dev/md5 /dev/hda9 /dev/hdc9

then it should fix things up for you.
You will need to do a similar thing for all of the arrays.
This will be difficult for md2 as it is 'root'.  You will need to boot
a rescue disc to fix this one.
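
Since the same two steps have to be repeated for every array, a small
dry-run helper may save typing. This is only a sketch: it builds the
command lines and prints them, it does not run anything. The function
name repair_commands is hypothetical, and the member names are the ones
shown in the superblock printout (hda9, hdc9); substitute the real
members of each array.

```python
def repair_commands(array, members):
    """Build (but do not run) the stop and re-assemble commands for one array."""
    return [
        ["mdadm", "--stop", array],
        ["mdadm", "--assemble", "--update=summaries", array] + list(members),
    ]


# Dry run for md5, using the member names from the printout.
for cmd in repair_commands("/dev/md5", ["/dev/hda9", "/dev/hdc9"]):
    print(" ".join(cmd))
```

Review the printed lines before pasting them into a root shell, and run
the md2 pair from the rescue environment rather than the installed
system.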

I have no idea how it got the failed flag.

NeilBrown


* Re: md: bug in file md.c, line 1440 (2.4.22)
From: Mike Fedyk @ 2003-09-03  1:38 UTC (permalink / raw)
  To: Neil Brown; +Cc: Andre Tomt, linux-kernel, linux-raid, mingo

On Wed, Sep 03, 2003 at 10:47:41AM +1000, Neil Brown wrote:
> I have no idea how it got the failed flag.
> 
> NeilBrown

Andre, is there any chance you ran a 2.6 kernel on that raid array?


* Re: md: bug in file md.c, line 1440 (2.4.22)
From: Lars Marowsky-Bree @ 2003-09-03  8:57 UTC (permalink / raw)
  To: Neil Brown, Andre Tomt; +Cc: linux-kernel, linux-raid, mingo

On 2003-09-03T10:47:41,
   Neil Brown <neilb@cse.unsw.edu.au> said:

> I have no idea how it got the failed flag.

What has proven very helpful for figuring out these things is to run a
test script against md that simply tries all the various possible actions
via mdadm or raidtools at random.

I've done that for multipath (m-p), and while it's not pretty, it is _really_
helpful and has found all sorts of weird accounting bugs in the md code
(2.4 has many): ftp://ftp.suse.com/pub/people/lmb/md-mp/mp-test.sh - if
anyone feels like extending it to include raid5/raid1 etc, that would be
cool ;-)
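
A rough illustration of the idea, not the mp-test.sh script itself: the
sketch below only generates a reproducible random sequence of mdadm
manage-mode commands (--fail, --remove, --add) and prints them. The
array and member names (/dev/md9, /dev/loop0, /dev/loop1) and the
function name random_actions are hypothetical, and many of the
generated commands will be invalid in the array's current state, which
is part of what exercises the error paths.

```python
import random

# Manage-mode options of mdadm used as the pool of random actions.
ACTIONS = ("--fail", "--remove", "--add")


def random_actions(array, members, count, seed=0):
    """Return `count` random mdadm command lines for a scratch array."""
    rng = random.Random(seed)  # seeded so a failing sequence can be replayed
    return [
        ["mdadm", array, rng.choice(ACTIONS), rng.choice(members)]
        for _ in range(count)
    ]


# Print a short sequence; only ever point this at a throwaway test array.
for cmd in random_actions("/dev/md9", ["/dev/loop0", "/dev/loop1"], 5):
    print(" ".join(cmd))
```

Keeping the seed fixed means a sequence that trips an accounting bug
can be regenerated exactly for a bug report.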



Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering		ever tried. ever failed. no matter.
SuSE Labs				try again. fail again. fail better.
Research & Development, SuSE Linux AG		-- Samuel Beckett



* Re: md: bug in file md.c, line 1440 (2.4.22)
From: Andre Tomt @ 2003-09-07  0:06 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-kernel, linux-raid, mingo

Neil Brown wrote:
> Your problem is that these extra slots (N:0) are flagged as failed
> (S:9) and this confuses md.c.
> 
> If you get mdadm 1.3.0 and apply the three patches that can be found
> in
>    http://cgi.cse.unsw.edu.au/~neilb/source/mdadm/patch/applied/
> 
> and then stop the array and use:
>    mdadm --assemble --update=summaries /dev/md5 /dev/hda9 /dev/hdc9
> 
> then it should fix things up for you.
> You will need to do a similar thing for all of the arrays.
> This will be difficult for md2 as it is 'root'.  You will need to boot
> a rescue disc to fix this one.
> 
> I have no idea how it got the failed flag.

I didn't read this in time to test it - I had already done a full backup
and restore, zeroing all sectors on both drives. All is fine now, but I
still have no idea how this happened. Later I'll set one partition
faulty, re-add it and reboot, just to make sure it really works.

2.6 has never been booted on that machine or on those disks.

Thanks, I'll keep mdadm in mind next time something like this happens ;-)

-- 
Cheers,
André Tomt
andre@tomt.net

