* Which (physical) disk is broken?
@ 2004-12-21 11:08 Turbo Fredriksson
2004-12-22 16:39 ` Guy
0 siblings, 1 reply; 12+ messages in thread
From: Turbo Fredriksson @ 2004-12-21 11:08 UTC (permalink / raw)
To: linux-raid
I've forgotten, and I can't seem to find out, which physical disk
is broken/removed from an array...
----- s n i p -----
aurora:~# mdadm -D /dev/md/1
/dev/md/1:
Version : 00.90.01
Creation Time : Wed Oct 27 08:12:44 2004
Raid Level : raid5
Array Size : 141483520 (134.93 GiB 144.88 GB)
Device Size : 17685440 (16.87 GiB 18.11 GB)
Raid Devices : 9
Total Devices : 8
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Tue Dec 21 12:04:43 2004
State : dirty, degraded
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 32K
    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/scsi/host3/bus0/target4/lun0/part1
       1       8       81        1      active sync   /dev/scsi/host3/bus0/target8/lun0/part1
       2       8       97        2      active sync   /dev/scsi/host3/bus0/target9/lun0/part1
       3       8      241        3      active sync   /dev/scsi/host4/bus0/target4/lun0/part1
       4      65        1        4      active sync   /dev/scsi/host4/bus0/target5/lun0/part1
       5      65       17        5      active sync   /dev/scsi/host4/bus0/target8/lun0/part1
       6      65       33        6      active sync   /dev/scsi/host4/bus0/target9/lun0/part1
       7      65      113        7      active sync   /dev/scsi/host4/bus0/target14/lun0/part1
       8       0        0       -1      removed
UUID : d0b4e775:290f5785:2e6ad7d1:5a66178b
Events : 0.974012
----- s n i p -----
I _THINK_ (!!) it's '/dev/scsi/host4/bus0/target15/lun0/part', but
I'm not sure.... Don't want to yank a perfectly working disk :)
----- s n i p -----
aurora:~# mdadm -E /dev/scsi/host4/bus0/target15/lun0/part1
mdadm: No super block found on /dev/scsi/host4/bus0/target15/lun0/part1 (Expected magic a92b4efc, got 00000000)
----- s n i p -----
Is there any way I can find out exactly which disk 'number 8' is?
----- s n i p -----
aurora:~# mdadm --version
mdadm - v1.6.0 - 4 June 2004
aurora:~# uname -a
Linux aurora 2.6.8.1 #1 SMP Sun Dec 12 21:04:58 CET 2004 sparc64 unknown
----- s n i p -----
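A minimal sketch, in Python, of pulling the empty slots out of an `mdadm -D` device table like the one above. The sample rows are copied (trimmed) from that output; the function names `parse_device_table` and `removed_slots` are made up for illustration and are not part of mdadm. It only confirms which slot has no device attached; it cannot say which physical disk used to sit there.

```python
import re

# Three rows copied from the `mdadm -D /dev/md/1` output above.
SAMPLE = """\
   0     8    49    0   active sync   /dev/scsi/host3/bus0/target4/lun0/part1
   7    65   113    7   active sync   /dev/scsi/host4/bus0/target14/lun0/part1
   8     0     0   -1   removed
"""

# Number, Major, Minor, RaidDevice, then free-form state (and maybe a path).
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(.*?)\s*$")


def parse_device_table(text):
    """Return one dict per device row of an `mdadm -D` listing."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # headers and "Key : value" lines do not start with digits
        number, major, minor, raid_device, rest = m.groups()
        parts = rest.split()
        # The device path, when present, is the last token and starts with "/".
        path = parts[-1] if parts and parts[-1].startswith("/") else None
        state = " ".join(parts[:-1] if path else parts)
        rows.append({
            "number": int(number),
            "major": int(major),
            "minor": int(minor),
            "raid_device": int(raid_device),
            "state": state,
            "path": path,
        })
    return rows


def removed_slots(rows):
    """Slot numbers whose row carries no device path."""
    return [r["number"] for r in rows if r["path"] is None]


rows = parse_device_table(SAMPLE)
print(removed_slots(rows))
```

The row pattern is written for the `-D` layout only; the `-E` listing has an extra leading column and would need its own pattern.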
--
Ortega assassination explosion FBI president PLO Iran domestic
disruption genetic Ft. Bragg Rule Psix $400 million in gold bullion
Clinton Cocaine Treasury
[See http://www.aclu.org/echelonwatch/index.html for more about this]
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: Which (physical) disk is broken?
2004-12-21 11:08 Which (physical) disk is broken? Turbo Fredriksson
@ 2004-12-22 16:39 ` Guy
2004-12-22 20:15 ` Michael
2004-12-22 22:33 ` Turbo Fredriksson
0 siblings, 2 replies; 12+ messages in thread
From: Guy @ 2004-12-22 16:39 UTC (permalink / raw)
To: 'Turbo Fredriksson', linux-raid
Since it was removed, device 8 no longer refers to any disk.
The question is: which disk was it?
You can determine which disks it is not!
If you access your array, every disk in it will show activity.
The disks that show no activity are not in the array.
If you have more arrays, you will need to determine which disks they use
as well.
If you have non-array disks, you will need to identify them too.
Once all but one disk is identified, the one left over must be your disk.
Use a command like this to cause your disks to have activity, then look for
the blinking lights:
dd if=/dev/md/1 of=/dev/null bs=64k
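Where the enclosure has no per-disk lights, the same elimination can be sketched in software, assuming the kernel exposes per-device counters in /proc/diskstats (2.6 kernels do). The Python below compares two snapshots of that file and lists the devices whose read counter did not move. The function names and the sample numbers are invented for illustration, and the device names in /proc/diskstats may not match the devfs paths used above.

```python
def read_counts(diskstats_text):
    """Map device name -> reads-completed counter from /proc/diskstats text."""
    counts = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        # fields: major, minor, name, reads completed, ...
        counts[fields[2]] = int(fields[3])
    return counts


def idle_devices(before, after):
    """Devices present in both snapshots whose read counter did not change."""
    b, a = read_counts(before), read_counts(after)
    return sorted(name for name in b if name in a and a[name] == b[name])


# Invented snapshots: sdd and sdf served reads in between, sdy did not.
BEFORE = """\
   8   48 sdd 1000 0 8000 50 10 0 80 5 0 40 55
   8   80 sdf 1200 0 9600 60 10 0 80 5 0 40 65
  65  128 sdy 300 0 2400 20 0 0 0 0 0 20 20
"""
AFTER = """\
   8   48 sdd 1900 0 15200 95 10 0 80 5 0 85 100
   8   80 sdf 2100 0 16800 99 10 0 80 5 0 79 104
  65  128 sdy 300 0 2400 20 0 0 0 0 0 20 20
"""

print(idle_devices(BEFORE, AFTER))
```

In practice the two snapshots would come from reading /proc/diskstats before and after the dd run. An idle disk is only a candidate: it may belong to another array or be unused, as noted above.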
Guy
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* RE: Which (physical) disk is broken?
2004-12-22 16:39 ` Guy
@ 2004-12-22 20:15 ` Michael
2004-12-22 23:10 ` Neil Brown
2004-12-22 22:33 ` Turbo Fredriksson
1 sibling, 1 reply; 12+ messages in thread
From: Michael @ 2004-12-22 20:15 UTC (permalink / raw)
To: linux-raid
<snip>
> Use a command like this to cause your disks to have activity, then look for
> the blinking lights:
> dd if=/dev/md/1 of=/dev/null bs=64k
>
Oh swell!.... I have one blinking light for ALL the disks on all 12 of
our raid systems. There has to be a better way.
Michael
* Re: Which (physical) disk is broken?
2004-12-22 16:39 ` Guy
2004-12-22 20:15 ` Michael
@ 2004-12-22 22:33 ` Turbo Fredriksson
2004-12-22 23:00 ` Neil Brown
1 sibling, 1 reply; 12+ messages in thread
From: Turbo Fredriksson @ 2004-12-22 22:33 UTC (permalink / raw)
To: linux-raid
>>>>> "Guy" == Guy <bugzilla@watkins-home.com> writes:
Guy> If you access your array, every disk in it will have disk activity.
Heh. That's one way, I guess... I was hoping more for some support for
this in mdadm...
But thanks.
--
ammunition SDI nuclear Panama ammonium Albanian NSA nitrate security
CIA president radar SEAL Team 6 NORAD Cocaine
[See http://www.aclu.org/echelonwatch/index.html for more about this]
* Re: Which (physical) disk is broken?
2004-12-22 22:33 ` Turbo Fredriksson
@ 2004-12-22 23:00 ` Neil Brown
2004-12-24 11:09 ` Turbo Fredriksson
0 siblings, 1 reply; 12+ messages in thread
From: Neil Brown @ 2004-12-22 23:00 UTC (permalink / raw)
To: Turbo Fredriksson; +Cc: linux-raid
On Wednesday December 22, turbo@bayour.com wrote:
> >>>>> "Guy" == Guy <bugzilla@watkins-home.com> writes:
>
> Guy> If you access your array, every disk in it will have disk activity.
>
> He. That's one way I guess... I was more hoping for some support for
> this in mdadm...
Would be nice....
but until disks have little blue lights that can be turned on and off
under software control, and the linux block-device layer has an
interface to access this control, there isn't much mdadm can usefully
do.
NeilBrown
* RE: Which (physical) disk is broken?
2004-12-22 20:15 ` Michael
@ 2004-12-22 23:10 ` Neil Brown
2004-12-23 0:52 ` Guy
0 siblings, 1 reply; 12+ messages in thread
From: Neil Brown @ 2004-12-22 23:10 UTC (permalink / raw)
To: michael; +Cc: linux-raid
On Wednesday December 22, michael@insulin-pumpers.org wrote:
> <snip>
> > Use a command like this to cause your disks to have activity, then look for
> > the blinking lights:
> > dd if=/dev/md/1 of=/dev/null bs=64k
> >
>
> Oh swell!.... I have one blinking light for ALL the disks on all 12 of
> our raid systems. There has to be a better way.
There are really only two ways:
1/ make the lights blink.
2/ have labels on the front of each bay saying what SCSI (or
whatever) ID the device has.
If your storage array does not have per-device lights or per-device
labels, you should speak sternly to your hardware provider.
NeilBrown
* RE: Which (physical) disk is broken?
2004-12-22 23:10 ` Neil Brown
@ 2004-12-23 0:52 ` Guy
0 siblings, 0 replies; 12+ messages in thread
From: Guy @ 2004-12-23 0:52 UTC (permalink / raw)
To: michael; +Cc: linux-raid
I have never seen a disk system without per-disk lights. You may need to
open a door or something.
If you don't have per-disk labels, how would you ever know which disk to
remove? At least the manual should give details. When I have run into this
issue, I have added my own labels.
This seems like a large enough system that you should add custom labels so
you know which disk is which, which array uses each disk, which SCSI
tray connects to which SCSI host, ...
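As a rough illustration of such labels, the Python sketch below turns a devfs-style path of the form seen earlier in this thread into a short label line naming host, bus, SCSI ID and array. The label layout and the function name `label_for` are invented; which physical bay holds which SCSI ID still has to be checked against the enclosure itself.

```python
import re

# devfs-style SCSI path as shown in the mdadm output in this thread.
DEVFS = re.compile(
    r"^/dev/scsi/host(\d+)/bus(\d+)/target(\d+)/lun(\d+)/(\w+)$"
)


def label_for(path, array):
    """Build a one-line bay label from a devfs SCSI path and an array name."""
    m = DEVFS.match(path)
    if m is None:
        raise ValueError("not a devfs SCSI path: %r" % path)
    host, bus, target, lun, part = m.groups()
    return f"host{host} bus{bus} ID{target} lun{lun} {part} -> {array}"


members = [
    "/dev/scsi/host3/bus0/target4/lun0/part1",
    "/dev/scsi/host4/bus0/target14/lun0/part1",
]
for path in members:
    print(label_for(path, "md1"))
```

Printing one line per member when the array is built, and sticking it on the bay, gives the paper trail a chance of staying current.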
Guy
* Re: Which (physical) disk is broken?
2004-12-22 23:00 ` Neil Brown
@ 2004-12-24 11:09 ` Turbo Fredriksson
2004-12-24 11:31 ` Theepan
2004-12-24 17:14 ` Guy
0 siblings, 2 replies; 12+ messages in thread
From: Turbo Fredriksson @ 2004-12-24 11:09 UTC (permalink / raw)
To: linux-raid
>>>>> "Neil" == Neil Brown <neilb@cse.unsw.edu.au> writes:
Neil> On Wednesday December 22, turbo@bayour.com wrote:
>> >>>>> "Guy" == Guy <bugzilla@watkins-home.com> writes:
>>
Guy> If you access your array, every disk in it will have disk
Guy> activity.
>> He. That's one way I guess... I was more hoping for some
>> support for this in mdadm...
Neil> Would be nice.... but until disks have little blue lights
Neil> that can be turned on and off under software control, and
Neil> the linux block-device layer has an interface to access this
Neil> control, there isn't much mdadm can usefully do.
I was thinking more of the 'magic'. When I created the array, I included
this broken disk. I later removed it, but there should still (?) be a
record of it in the super block (or wherever mdadm checks).
It knows how many disks there SHOULD be, and it knows how many are working.
----- s n i p -----
aurora:~# mdadm -D /dev/md1
/dev/md1:
Version : 00.90.01
Creation Time : Wed Oct 27 08:12:44 2004
Raid Level : raid5
Array Size : 141483520 (134.93 GiB 144.88 GB)
Device Size : 17685440 (16.87 GiB 18.11 GB)
Raid Devices : 9
Total Devices : 8
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Fri Dec 24 11:45:11 2004
State : clean, degraded
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 32K
    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   /dev/scsi/host3/bus0/target4/lun0/part1
       1       8       81        1      active sync   /dev/scsi/host3/bus0/target8/lun0/part1
       2       8       97        2      active sync   /dev/scsi/host3/bus0/target9/lun0/part1
       3       8      241        3      active sync   /dev/scsi/host4/bus0/target4/lun0/part1
       4      65        1        4      active sync   /dev/scsi/host4/bus0/target5/lun0/part1
       5      65       17        5      active sync   /dev/scsi/host4/bus0/target8/lun0/part1
       6      65       33        6      active sync   /dev/scsi/host4/bus0/target9/lun0/part1
       7      65      113        7      active sync   /dev/scsi/host4/bus0/target14/lun0/part1
       8       0        0       -1      removed
UUID : d0b4e775:290f5785:2e6ad7d1:5a66178b
Events : 0.1005733
----- s n i p -----
Maybe it's too late now, but can't mdadm somehow write a 'tag' saying 'this disk
has broken down' and/or 'this disk has been removed from the array'?
I don't know how the (software) RAID works at the kernel/hardware level, only
how I, as a user/admin, use it :)
But if I check one of the disks, scsi4:4:part1 for example:
----- s n i p -----
aurora:~# mdadm -E /dev/scsi/host4/bus0/target4/lun0/part1
/dev/scsi/host4/bus0/target4/lun0/part1:
Magic : a92b4efc
Version : 00.90.00
UUID : d0b4e775:290f5785:2e6ad7d1:5a66178b
Creation Time : Wed Oct 27 08:12:44 2004
Raid Level : raid5
Device Size : 17685440 (16.87 GiB 18.11 GB)
Raid Devices : 9
Total Devices : 8
Preferred Minor : 1
Update Time : Fri Dec 24 11:49:46 2004
State : clean
Active Devices : 8
Working Devices : 8
Failed Devices : 1
Spare Devices : 0
Checksum : b038c0d3 - correct
Events : 0.1005753
Layout : left-symmetric
Chunk Size : 32K
      Number   Major   Minor   RaidDevice State
this     3       8      241        3      active sync   /dev/scsi/host4/bus0/target4/lun0/part1
   0     0       8       49        0      active sync   /dev/scsi/host3/bus0/target4/lun0/part1
   1     1       8       81        1      active sync   /dev/scsi/host3/bus0/target8/lun0/part1
   2     2       8       97        2      active sync   /dev/scsi/host3/bus0/target9/lun0/part1
   3     3       8      241        3      active sync   /dev/scsi/host4/bus0/target4/lun0/part1
   4     4      65        1        4      active sync   /dev/scsi/host4/bus0/target5/lun0/part1
   5     5      65       17        5      active sync   /dev/scsi/host4/bus0/target8/lun0/part1
   6     6      65       33        6      active sync   /dev/scsi/host4/bus0/target9/lun0/part1
   7     7      65      113        7      active sync   /dev/scsi/host4/bus0/target14/lun0/part1
   8     8       0        0        8      faulty removed
----- s n i p -----
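The Major/Minor columns in these listings can be decoded on their own. As a sketch, assuming the standard Linux numbering for SCSI disks (major 8 covers disks 0-15, major 65 covers disks 16-31, with 16 minors per disk), the Python below maps a pair to the conventional sdX name. The helper name `sd_name` is invented, and this machine uses devfs paths, so the sdX names are shown only for orientation.

```python
# Block majors for SCSI disks and the index of the first disk on each.
# Only the two majors that appear in this thread are listed.
SD_MAJORS = {8: 0, 65: 16}


def sd_name(major, minor):
    """Conventional sdX[N] name for a SCSI disk major/minor pair."""
    if major not in SD_MAJORS:
        raise ValueError("major %d is not handled by this sketch" % major)
    disk = SD_MAJORS[major] + minor // 16   # 16 minors per disk
    partition = minor % 16                  # 0 means the whole disk
    letters = ""
    n = disk
    while True:                             # 0 -> a, 25 -> z, 26 -> aa, ...
        letters = chr(ord("a") + n % 26) + letters
        n = n // 26 - 1
        if n < 0:
            break
    return "sd" + letters + (str(partition) if partition else "")


# Major/minor pairs taken from the member rows above.
for major, minor in [(8, 49), (8, 241), (65, 1), (65, 113)]:
    print(major, minor, sd_name(major, minor))
```

Gaps in the resulting sequence show which kernel disk indexes are not members of this array, but not which bay those disks sit in.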
It (mdadm?) knows exactly which md it belongs to etc... If I check
the disk that I THINK is the 'faulty removed' one, it 'knows nothing':
----- s n i p -----
aurora:~# mdadm -E /dev/scsi/host4/bus0/target15/lun0/part1
mdadm: No super block found on /dev/scsi/host4/bus0/target15/lun0/part1 (Expected magic a92b4efc, got 00000000)
----- s n i p -----
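For context on that error, here is a hedged Python sketch of the check it reports. It assumes the version 0.90 superblock layout: the superblock lives in the last 64 KiB-aligned block that starts at least 64 KiB before the end of the device, and begins with the magic number a92b4efc stored in the byte order of the host that wrote it. The function names are invented and this is not mdadm's own code.

```python
import io
import struct

MD_SB_MAGIC = 0xA92B4EFC
RESERVED = 64 * 1024  # size and alignment of the reserved area, in bytes


def sb_offset(device_size):
    """Byte offset of a 0.90 superblock on a device of `device_size` bytes."""
    return (device_size & ~(RESERVED - 1)) - RESERVED


def has_md_superblock(dev, device_size):
    """True if the 4 bytes at the superblock offset hold the md magic."""
    dev.seek(sb_offset(device_size))
    raw = dev.read(4)
    # 0.90 superblocks are host-endian, so accept either byte order.
    (big,) = struct.unpack(">I", raw)
    (little,) = struct.unpack("<I", raw)
    return MD_SB_MAGIC in (big, little)


# In-memory stand-ins for two partitions of 1 MiB + 5000 bytes each.
size = 1024 * 1024 + 5000
blank = io.BytesIO(bytes(size))

member = io.BytesIO(bytes(size))
member.seek(sb_offset(size))
member.write(struct.pack(">I", MD_SB_MAGIC))  # big-endian, as on sparc64

print(has_md_superblock(blank, size), has_md_superblock(member, size))
```

A result of all zeros, as in the output above, is consistent either with a partition that never held a superblock or with one whose superblock area was wiped; the check alone cannot tell the two apart.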
So, IF this is the disk, then what did mdadm do when it was removed?
Did it clear the super block? Couldn't it write something else instead?
Or at least keep the UUID?
Hmm, looking through the code (I do that freely, before someone throws
it at me :) I see that all mdadm does is an ioctl(), so I guess it's something
that has to be done in the kernel (?)...
But how come mdadm knows that there's one removed? Common sense? There are
9 raid devices (how come it knows that?) and 8 total devices, hence one
must be/have been removed?
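The numbers in the listings above answer part of this. "Raid Devices : 9" is printed by `mdadm -E` straight from a member's superblock, so the count is recorded rather than guessed, and the sizes agree with it: a RAID5 array holds one device's worth of parity, so its usable size is (devices - 1) times the per-device size. A small Python check using the figures from the output (sizes in KiB as mdadm prints them; the function name is invented):

```python
def raid5_array_size(raid_devices, device_size_kib):
    """Usable RAID5 size: one device's worth of space goes to parity."""
    return (raid_devices - 1) * device_size_kib


# Figures from the mdadm output in this thread.
RAID_DEVICES = 9
TOTAL_DEVICES = 8
DEVICE_SIZE_KIB = 17685440
ARRAY_SIZE_KIB = 141483520

print(raid5_array_size(RAID_DEVICES, DEVICE_SIZE_KIB) == ARRAY_SIZE_KIB)
print("missing devices:", RAID_DEVICES - TOTAL_DEVICES)
```

The reported array size only works out for nine slots, so one slot is empty; the arithmetic says nothing about which disk once filled it.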
Looking at md.c in the kernel, I see that it doesn't write to the disk
(when removing a device from an array) when persistent super blocks
are used (which I don't). There doesn't seem to be an option in mdadm
for this (I do remember using it with raidtools long ago)..
Am I way off? It's early on Christmas Eve, and my blood-to-caffeine ratio
is high... :)
--
Iran arrangements [Hello to all my fans in domestic surveillance]
Kennedy pits 767 Peking Clinton assassination Cocaine PLO Ft. Meade
killed $400 million in gold bullion kibo
[See http://www.aclu.org/echelonwatch/index.html for more about this]
* Re: Which (physical) disk is broken?
2004-12-24 11:09 ` Turbo Fredriksson
@ 2004-12-24 11:31 ` Theepan
2004-12-24 12:32 ` Turbo Fredriksson
2004-12-24 17:14 ` Guy
1 sibling, 1 reply; 12+ messages in thread
From: Theepan @ 2004-12-24 11:31 UTC (permalink / raw)
To: linux-raid
>
> I was thinking more of the 'magic'. When I created the array, I included
> this broken disk. I later removed it, but there should still (?) be a
> record of it in the super block (or wherever mdadm checks).
>
> It knows how many disks there SHOULD be, and it knows how many are working.
>
How about something as primitive as checking the syslog messages (sent to
facility kern)? You'll get messages from 'md' when you manipulate arrays; at
least this is true for kernel 2.4. You'll see something like this:
Dec 24 12:25:02 storm kernel: md: trying to remove hdk1 from md4 ...
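A small Python sketch of that search, assuming the message format is exactly the one in the sample line above (other kernel versions may word it differently). The function name `removals` is invented; the log lines would come from whatever file your syslog writes kern messages to.

```python
import re

# Pattern built from the sample kernel message quoted above.
REMOVE = re.compile(r"md: trying to remove (\S+) from (\S+)")


def removals(lines):
    """Return (device, array) pairs for every md removal message found."""
    found = []
    for line in lines:
        m = REMOVE.search(line)
        if m:
            found.append((m.group(1), m.group(2)))
    return found


log = [
    "Dec 24 12:25:02 storm kernel: md: trying to remove hdk1 from md4 ...",
    "Dec 24 12:25:03 storm kernel: some unrelated message",
]
print(removals(log))
```

This only helps while the logs covering the removal are still around.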
--
Theepan
* Re: Which (physical) disk is broken?
2004-12-24 11:31 ` Theepan
@ 2004-12-24 12:32 ` Turbo Fredriksson
0 siblings, 0 replies; 12+ messages in thread
From: Turbo Fredriksson @ 2004-12-24 12:32 UTC (permalink / raw)
To: linux-raid
Quoting "Theepan" <tornado@linuxfromscratch.org>:
> How about something as primitive as checking the syslog messages (sent to
> facility kern)? You'll get messages from 'md' when you manipulate arrays; at
> least this is true for kernel 2.4. You'll see something like this:
>
> Dec 24 12:25:02 storm kernel: md: trying to remove hdk1 from md4 ...
Too long ago... It's not kept on disk any more. I'll have a look at
any backups, but I don't think I keep anything that far back...
--
Albanian quiche Clinton 767 Noriega pits congress Soviet Legion of
Doom Rule Psix Ft. Bragg Serbian strategic jihad arrangements
[See http://www.aclu.org/echelonwatch/index.html for more about this]
* RE: Which (physical) disk is broken?
2004-12-24 11:09 ` Turbo Fredriksson
2004-12-24 11:31 ` Theepan
@ 2004-12-24 17:14 ` Guy
2004-12-27 12:35 ` Turbo Fredriksson
1 sibling, 1 reply; 12+ messages in thread
From: Guy @ 2004-12-24 17:14 UTC (permalink / raw)
To: 'Turbo Fredriksson', linux-raid
The info is in the superblock. But if the disk has failed, you may not be
able to read the superblock.
Did you say you don't use superblocks?
I guess you better keep a paper trail!
You said:
"when persistent super blocks is used (which I don't)."
But the output from mdadm said:
"Persistence : Superblock is persistent"
Guy
* Re: Which (physical) disk is broken?
2004-12-24 17:14 ` Guy
@ 2004-12-27 12:35 ` Turbo Fredriksson
0 siblings, 0 replies; 12+ messages in thread
From: Turbo Fredriksson @ 2004-12-27 12:35 UTC (permalink / raw)
To: linux-raid
>>>>> "Guy" == Guy <bugzilla@watkins-home.com> writes:
Guy> The info is in the superblock. But if the disk has failed,
Guy> you may not be able to read the superblock.
It's not THAT failed. It just has way too many bad blocks to be useful,
so I took it out of service (but, unfortunately, not out of the machine -
I'll remember to do that next time :)...
Too bad there's no 'eject' command for disks :) That would be quite funny,
shooting my colleague(s) with disks from the rack :)
Guy> Did you say you don't use superblocks? I guess you better
Guy> keep a paper trail!
I do, but I always forget to update it...
Guy> You said: "when persistent super blocks is used (which I
Guy> don't)."
I didn't say I don't use 'super blocks', I said 'PERSISTENT super blocks'.
Guy> But the output from mdadm said: "Persistence : Superblock is
Guy> persistent"
Heh, yeah. Sorry. I saw that the same instant it was sent, but too late
to cancel...
I've had quite a lot of time to think about this, and I am now QUITE
sure that I created the array with 'missing' in the place where this
disk should have been.... I can VAGUELY remember that that disk kept
failing the array when I built the machine/array initially. But I'm not
sure whether it was THAT disk or some other (I have a bunch of these disks
that "don't quite work")...
Thanks for all the help, everyone. But seeing the thread
'mdadm -D /dev/md3: 1 0 0 0 sync ???' on this list leads me to
believe I'm not THAT far out in wanting some better information
on which disk is which...
--
Khaddafi Marxist Legion of Doom 767 747 domestic disruption Peking
subway nitrate Uzi Kennedy explosion Rule Psix NORAD Honduras
[See http://www.aclu.org/echelonwatch/index.html for more about this]