* Which (physical) disk is broken?
@ 2004-12-21 11:08 Turbo Fredriksson
2004-12-22 16:39 ` Guy
0 siblings, 1 reply; 12+ messages in thread
From: Turbo Fredriksson @ 2004-12-21 11:08 UTC (permalink / raw)
To: linux-raid
I've forgotten, and I can't seem to find out, which physical disk
is broken/removed from an array...
----- s n i p -----
aurora:~# mdadm -D /dev/md/1
/dev/md/1:
Version : 00.90.01
Creation Time : Wed Oct 27 08:12:44 2004
Raid Level : raid5
Array Size : 141483520 (134.93 GiB 144.88 GB)
Device Size : 17685440 (16.87 GiB 18.11 GB)
Raid Devices : 9
Total Devices : 8
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Tue Dec 21 12:04:43 2004
State : dirty, degraded
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
0 8 49 0 active sync /dev/scsi/host3/bus0/target4/lun0/part1
1 8 81 1 active sync /dev/scsi/host3/bus0/target8/lun0/part1
2 8 97 2 active sync /dev/scsi/host3/bus0/target9/lun0/part1
3 8 241 3 active sync /dev/scsi/host4/bus0/target4/lun0/part1
4 65 1 4 active sync /dev/scsi/host4/bus0/target5/lun0/part1
5 65 17 5 active sync /dev/scsi/host4/bus0/target8/lun0/part1
6 65 33 6 active sync /dev/scsi/host4/bus0/target9/lun0/part1
7 65 113 7 active sync /dev/scsi/host4/bus0/target14/lun0/part1
8 0 0 -1 removed
UUID : d0b4e775:290f5785:2e6ad7d1:5a66178b
Events : 0.974012
----- s n i p -----
I _THINK_ (!!) it's '/dev/scsi/host4/bus0/target15/lun0/part', but
I'm not sure.... Don't want to yank a perfectly working disk :)
----- s n i p -----
aurora:~# mdadm -E /dev/scsi/host4/bus0/target15/lun0/part1
mdadm: No super block found on /dev/scsi/host4/bus0/target15/lun0/part1 (Expected magic a92b4efc, got 00000000)
----- s n i p -----
Is there any way I can find out exactly which disk 'number 8' is?
----- s n i p -----
aurora:~# mdadm --version
mdadm - v1.6.0 - 4 June 2004
aurora:~# uname -a
Linux aurora 2.6.8.1 #1 SMP Sun Dec 12 21:04:58 CET 2004 sparc64 unknown
----- s n i p -----
--
Ortega assassination explosion FBI president PLO Iran domestic
disruption genetic Ft. Bragg Rule Psix $400 million in gold bullion
Clinton Cocaine Treasury
[See http://www.aclu.org/echelonwatch/index.html for more about this]
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: Which (physical) disk is broken?
2004-12-21 11:08 Which (physical) disk is broken? Turbo Fredriksson
@ 2004-12-22 16:39 ` Guy
2004-12-22 20:15 ` Michael
2004-12-22 22:33 ` Turbo Fredriksson
0 siblings, 2 replies; 12+ messages in thread
From: Guy @ 2004-12-22 16:39 UTC (permalink / raw)
To: 'Turbo Fredriksson', linux-raid
Since it was removed, it is not a disk. It is no disk.
The question is: what disk was it?
You could determine which disks it is not!
If you access your array, every disk in it will have disk activity.
The disks that do not have activity are not in the array.
If you have more arrays, you would need to determine which disks they use
also. If you have non array disks, you will need to identify them also.
Once all but one disk is identified, it must be your disk.
Use a command like this to cause your disks to have activity, then look for
the blinking lights:
dd if=/dev/md/1 of=/dev/null bs=64k
Guy
^ permalink raw reply [flat|nested] 12+ messages in thread
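The elimination Guy describes can also be done on paper before looking at any lights. What follows is a minimal sketch, not anything mdadm provides: it assumes the member rows look like the `mdadm -D` listing quoted in this thread, and the names `array_members`, `unaccounted` and `INVENTORY` are hypothetical. The inventory of partitions present on the machine has to be compiled by hand; the one below is invented for illustration.

```python
def array_members(detail_text):
    """Collect device paths from the member rows of `mdadm -D` output."""
    members = set()
    for line in detail_text.splitlines():
        fields = line.split()
        # Member rows start with the slot number and end with a device path;
        # the "removed" row carries no path and is skipped.
        if len(fields) >= 6 and fields[0].isdigit() and fields[-1].startswith("/dev/"):
            members.add(fields[-1])
    return members


def unaccounted(all_devices, members):
    """Return the devices that no inspected array claims, sorted by name."""
    return sorted(set(all_devices) - set(members))


# Excerpt of the `mdadm -D /dev/md/1` listing quoted in this thread.
DETAIL = """\
Number Major Minor RaidDevice State
6 65 33 6 active sync /dev/scsi/host4/bus0/target9/lun0/part1
7 65 113 7 active sync /dev/scsi/host4/bus0/target14/lun0/part1
8 0 0 -1 removed
"""

# Hypothetical inventory (an assumption, not taken from the real machine).
INVENTORY = [
    "/dev/scsi/host4/bus0/target9/lun0/part1",
    "/dev/scsi/host4/bus0/target14/lun0/part1",
    "/dev/scsi/host4/bus0/target15/lun0/part1",
]

for dev in unaccounted(INVENTORY, array_members(DETAIL)):
    print(dev)
```

As Guy notes, a leftover device is only a candidate: members of other arrays and disks that belong to no array have to be ruled out the same way, by merging their member sets before the subtraction.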
* RE: Which (physical) disk is broken?
2004-12-22 16:39 ` Guy
@ 2004-12-22 20:15 ` Michael
2004-12-22 23:10 ` Neil Brown
2004-12-22 22:33 ` Turbo Fredriksson
1 sibling, 1 reply; 12+ messages in thread
From: Michael @ 2004-12-22 20:15 UTC (permalink / raw)
To: linux-raid
<snip>
> Use a command like this to cause your disks to have activity, then look for
> the blinking lights:
> dd if=/dev/md/1 of=/dev/null bs=64k
>
Oh swell!.... I have one blinking light for ALL the disks on all 12 of
our raid systems. There has to be a better way.
Michael
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: Which (physical) disk is broken?
2004-12-22 20:15 ` Michael
@ 2004-12-22 23:10 ` Neil Brown
2004-12-23 0:52 ` Guy
0 siblings, 1 reply; 12+ messages in thread
From: Neil Brown @ 2004-12-22 23:10 UTC (permalink / raw)
To: michael; +Cc: linux-raid
On Wednesday December 22, michael@insulin-pumpers.org wrote:
> <snip>
> > Use a command like this to cause your disks to have activity, then look for
> > the blinking lights:
> > dd if=/dev/md/1 of=/dev/null bs=64k
> >
>
> Oh swell!.... I have one blinking light for ALL the disks on all 12 of
> our raid systems. There has to be a better way.
There are really only two ways:
1/ make lights blink.
2/ have labels on the front of each bay saying what SCSI (or
whatever) ID the device has.
If your storage array does not have per-device lights, or per-device
labels, you should speak sternly to your hardware provider.
NeilBrown
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: Which (physical) disk is broken?
2004-12-22 23:10 ` Neil Brown
@ 2004-12-23 0:52 ` Guy
0 siblings, 0 replies; 12+ messages in thread
From: Guy @ 2004-12-23 0:52 UTC (permalink / raw)
To: michael; +Cc: linux-raid
I have never seen a disk system without per disk lights. You may need to
open a door or something.
If you don't have per disk labels, how would you ever know what disk to
remove? At least the manual should give details. I have seen this issue,
I then add my own labels.
This seems like a large enough system that you should add custom labels so
you know which disk is which, and which array uses each disk. Which SCSI
tray connects to which SCSI host, ...
Guy
^ permalink raw reply [flat|nested] 12+ messages in thread
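For the custom labels Guy recommends, the devfs-style paths that mdadm prints in this thread already encode the SCSI host, bus, target and LUN, so a label sheet can be derived from them. This is only a sketch under that assumption; the function name `bay_label` and the label wording are hypothetical, and the mapping from SCSI target to physical bay still has to come from the enclosure manual.

```python
import re

# devfs-style SCSI path as it appears in the mdadm listings in this thread.
DEVFS = re.compile(
    r"^/dev/scsi/host(\d+)/bus(\d+)/target(\d+)/lun(\d+)(?:/part(\d+))?$"
)


def bay_label(array, slot, path):
    """Build a one-line label naming the array, slot and SCSI address."""
    m = DEVFS.match(path)
    if m is None:
        raise ValueError(f"not a devfs SCSI path: {path}")
    host, bus, target, lun = m.group(1, 2, 3, 4)
    return f"{array} slot {slot}: host{host} bus{bus} target{target} lun{lun}"


# Slot 7 of /dev/md/1, taken from the listing quoted in this thread.
print(bay_label("md1", 7, "/dev/scsi/host4/bus0/target14/lun0/part1"))
```

Printing one such line per member and sticking it on the bay gives the "which disk is which, and which array uses each disk" record that the thread finds missing.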
* Re: Which (physical) disk is broken?
2004-12-22 16:39 ` Guy
2004-12-22 20:15 ` Michael
@ 2004-12-22 22:33 ` Turbo Fredriksson
2004-12-22 23:00 ` Neil Brown
1 sibling, 1 reply; 12+ messages in thread
From: Turbo Fredriksson @ 2004-12-22 22:33 UTC (permalink / raw)
To: linux-raid
>>>>> "Guy" == Guy <bugzilla@watkins-home.com> writes:
Guy> If you access your array, every disk in it will have disk activity.
He. That's one way I guess... I was more hoping for some support for
this in mdadm... But thanx.
--
ammunition SDI nuclear Panama ammonium Albanian NSA nitrate security
CIA president radar SEAL Team 6 NORAD Cocaine
[See http://www.aclu.org/echelonwatch/index.html for more about this]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Which (physical) disk is broken?
2004-12-22 22:33 ` Turbo Fredriksson
@ 2004-12-22 23:00 ` Neil Brown
2004-12-24 11:09 ` Turbo Fredriksson
0 siblings, 1 reply; 12+ messages in thread
From: Neil Brown @ 2004-12-22 23:00 UTC (permalink / raw)
To: Turbo Fredriksson; +Cc: linux-raid
On Wednesday December 22, turbo@bayour.com wrote:
> >>>>> "Guy" == Guy <bugzilla@watkins-home.com> writes:
>
> Guy> If you access your array, every disk in it will have disk activity.
>
> He. That's one way I guess... I was more hoping for some support for
> this in mdadm...
Would be nice.... but until disks have little blue lights that can be
turned on and off under software control, and the linux block-device
layer has an interface to access this control, there isn't much mdadm
can usefully do.
NeilBrown
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Which (physical) disk is broken?
2004-12-22 23:00 ` Neil Brown
@ 2004-12-24 11:09 ` Turbo Fredriksson
2004-12-24 11:31 ` Theepan
2004-12-24 17:14 ` Guy
0 siblings, 2 replies; 12+ messages in thread
From: Turbo Fredriksson @ 2004-12-24 11:09 UTC (permalink / raw)
To: linux-raid
>>>>> "Neil" == Neil Brown <neilb@cse.unsw.edu.au> writes:
Neil> On Wednesday December 22, turbo@bayour.com wrote:
>> >>>>> "Guy" == Guy <bugzilla@watkins-home.com> writes:
>> Guy> If you access your array, every disk in it will have disk activity.
>> He. That's one way I guess... I was more hoping for some
>> support for this in mdadm...
Neil> Would be nice.... but until disks have little blue lights
Neil> that can be turned on and off under software control, and
Neil> the linux block-device layer has an interface to access this
Neil> control, there isn't much mdadm can usefully do.
I was more thinking on the 'magic'. When I created the array, I included
this broken disk. I later removed it, but there should still (?) be a
record of it in the super block (or wherever mdadm checks).
It knows how many disks there SHOULD be, and it knows how many is working.
----- s n i p -----
aurora:~# mdadm -D /dev/md1
/dev/md1:
Version : 00.90.01
Creation Time : Wed Oct 27 08:12:44 2004
Raid Level : raid5
Array Size : 141483520 (134.93 GiB 144.88 GB)
Device Size : 17685440 (16.87 GiB 18.11 GB)
Raid Devices : 9
Total Devices : 8
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Fri Dec 24 11:45:11 2004
State : clean, degraded
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
0 8 49 0 active sync /dev/scsi/host3/bus0/target4/lun0/part1
1 8 81 1 active sync /dev/scsi/host3/bus0/target8/lun0/part1
2 8 97 2 active sync /dev/scsi/host3/bus0/target9/lun0/part1
3 8 241 3 active sync /dev/scsi/host4/bus0/target4/lun0/part1
4 65 1 4 active sync /dev/scsi/host4/bus0/target5/lun0/part1
5 65 17 5 active sync /dev/scsi/host4/bus0/target8/lun0/part1
6 65 33 6 active sync /dev/scsi/host4/bus0/target9/lun0/part1
7 65 113 7 active sync /dev/scsi/host4/bus0/target14/lun0/part1
8 0 0 -1 removed
UUID : d0b4e775:290f5785:2e6ad7d1:5a66178b
Events : 0.1005733
----- s n i p -----
Maybe it's too late now, but can't mdadm write a 'tag' somehow that
'this disk has broken down' and/or 'this disk has been removed from
the array'?
I don't know how the (software) RAID works on kernel/hardware level, only
how I, as a user/admin, use it :)
But if I check one of the disks, scsi4:4:part1 for example:
----- s n i p -----
aurora:~# mdadm -E /dev/scsi/host4/bus0/target4/lun0/part1
/dev/scsi/host4/bus0/target4/lun0/part1:
Magic : a92b4efc
Version : 00.90.00
UUID : d0b4e775:290f5785:2e6ad7d1:5a66178b
Creation Time : Wed Oct 27 08:12:44 2004
Raid Level : raid5
Device Size : 17685440 (16.87 GiB 18.11 GB)
Raid Devices : 9
Total Devices : 8
Preferred Minor : 1
Update Time : Fri Dec 24 11:49:46 2004
State : clean
Active Devices : 8
Working Devices : 8
Failed Devices : 1
Spare Devices : 0
Checksum : b038c0d3 - correct
Events : 0.1005753
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 3 8 241 3 active sync /dev/scsi/host4/bus0/target4/lun0/part1
0 0 8 49 0 active sync /dev/scsi/host3/bus0/target4/lun0/part1
1 1 8 81 1 active sync /dev/scsi/host3/bus0/target8/lun0/part1
2 2 8 97 2 active sync /dev/scsi/host3/bus0/target9/lun0/part1
3 3 8 241 3 active sync /dev/scsi/host4/bus0/target4/lun0/part1
4 4 65 1 4 active sync /dev/scsi/host4/bus0/target5/lun0/part1
5 5 65 17 5 active sync /dev/scsi/host4/bus0/target8/lun0/part1
6 6 65 33 6 active sync /dev/scsi/host4/bus0/target9/lun0/part1
7 7 65 113 7 active sync /dev/scsi/host4/bus0/target14/lun0/part1
8 8 0 0 8 faulty removed
----- s n i p -----
It (mdadm?) knows exactly what md it belongs to etc... If I check the
disk that I THINK is the 'faulty removed' one, it 'knows nothing':
----- s n i p -----
aurora:~# mdadm -E /dev/scsi/host4/bus0/target15/lun0/part1
mdadm: No super block found on /dev/scsi/host4/bus0/target15/lun0/part1 (Expected magic a92b4efc, got 00000000)
----- s n i p -----
So, IF this is the disk, then what did mdadm do when it was removed? Did
it clear the super block? Couldn't it write something else instead? Or at
least keep the UUID?
Hmm, looking through the code (I do that freely, before someone throws it
at me :) I see that all mdadm does is an ioctl() so I guess it's something
that has to be done in the kernel (?)...
But how come mdadm knows that there's one removed? Common sense? There's
9 raid devices (how come it knows that?) and 8 total devices, hence one
must be/have been removed?
Looking at md.c in the kernel, I see that it doesn't write to the disk
(when removing a device from an array) when persistent super blocks is
used (which I don't). There doesn't seem to be an option in mdadm for this
(I do remember using it with raidtools long ago)..
Am I way off? It's early on Christmas Eve, and my blood to caffeine ratio
is high... :)
--
Iran arrangements [Hello to all my fans in domestic surveillance]
Kennedy pits 767 Peking Clinton assassination Cocaine PLO Ft. Meade
killed $400 million in gold bullion kibo
[See http://www.aclu.org/echelonwatch/index.html for more about this]
^ permalink raw reply [flat|nested] 12+ messages in thread
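Turbo's "9 raid devices, 8 total devices, hence one removed" arithmetic can be read straight off the mdadm output. The sketch below assumes the field names and row layout shown in the `mdadm -E` listing quoted in this thread; `summarize` is a hypothetical helper, not part of mdadm. It also shows why the listing cannot name the missing disk: the slot is recorded with major and minor 0 and no path.

```python
def summarize(mdadm_text):
    """Return (number of missing members, slot numbers marked removed)."""
    counts = {}
    removed = []
    for line in mdadm_text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep and key.strip() in ("Raid Devices", "Total Devices"):
            counts[key.strip()] = int(value.split()[0])
            continue
        fields = line.split()
        # Device rows start with the slot number; a removed slot ends with
        # the word "removed" and carries no device path.
        if fields and fields[0].isdigit() and fields[-1] == "removed":
            removed.append(int(fields[0]))
    missing = counts["Raid Devices"] - counts["Total Devices"]
    return missing, removed


# Excerpt of the `mdadm -E` output quoted in this thread.
EXAMINE = """\
Raid Devices : 9
Total Devices : 8
Number Major Minor RaidDevice State
this 3 8 241 3 active sync /dev/scsi/host4/bus0/target4/lun0/part1
7 7 65 113 7 active sync /dev/scsi/host4/bus0/target14/lun0/part1
8 8 0 0 8 faulty removed
"""

missing, slots = summarize(EXAMINE)
print(f"{missing} member(s) missing, removed slot(s): {slots}")
```

The result confirms how many members are gone and which slot they occupied, but nothing in the text identifies the physical device that used to fill it.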
* Re: Which (physical) disk is broken?
2004-12-24 11:09 ` Turbo Fredriksson
@ 2004-12-24 11:31 ` Theepan
2004-12-24 12:32 ` Turbo Fredriksson
2004-12-24 17:14 ` Guy
1 sibling, 1 reply; 12+ messages in thread
From: Theepan @ 2004-12-24 11:31 UTC (permalink / raw)
To: linux-raid
>
> I was more thinking on the 'magic'. When I created the array, I included
> this broken disk. I later removed it, but there should still (?) be a
> record of it in the super block (or wherever mdadm checks).
>
> It knows how many disks there SHOULD be, and it knows how many is working.
>
How about something as primitive as checking the syslog messages (sent to
facility kern)? You'll get messages from 'md' when you manipulate arrays, at
least this is true for kernel 2.4. You'll see something like this:
Dec 24 12:25:02 storm kernel: md: trying to remove hdk1 from md4 ...
--
Theepan
^ permalink raw reply [flat|nested] 12+ messages in thread
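Theepan's log check is easy to script if the logs still exist. A minimal sketch follows; the only message format it knows is the one line quoted above, so the pattern is an assumption for any other kernel version, and `removals` is a hypothetical helper name.

```python
import re

# Pattern taken from the single kernel message quoted in this thread; other
# kernels may word the message differently.
REMOVE = re.compile(r"md: trying to remove (\S+) from (\S+)")


def removals(log_lines):
    """Return (array, device) pairs for each md removal message found."""
    found = []
    for line in log_lines:
        m = REMOVE.search(line)
        if m:
            found.append((m.group(2), m.group(1)))
    return found


SAMPLE_LOG = [
    "Dec 24 12:25:02 storm kernel: md: trying to remove hdk1 from md4 ...",
    "Dec 24 12:25:03 storm kernel: some unrelated message",
]

for array, device in removals(SAMPLE_LOG):
    print(f"{device} was removed from {array}")
```

In practice the lines would come from the rotated kernel logs or from backups of them, which is only useful if they reach back to the day the disk was removed.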
* Re: Which (physical) disk is broken?
2004-12-24 11:31 ` Theepan
@ 2004-12-24 12:32 ` Turbo Fredriksson
0 siblings, 0 replies; 12+ messages in thread
From: Turbo Fredriksson @ 2004-12-24 12:32 UTC (permalink / raw)
To: linux-raid
Quoting "Theepan" <tornado@linuxfromscratch.org>:
> How about something primitive as checking the syslog messages (sent to
> facility kern)? You'll get messages from 'md' when you manipulate arrays, at
> least this is true for kernel 2.4. You'll see something like this:
>
> Dec 24 12:25:02 storm kernel: md: trying to remove hdk1 from md4 ...
Too long ago... It's not kept on disk any more. I'll have a look at
any backups, but I don't think I keep anything that far back...
--
Albanian quiche Clinton 767 Noriega pits congress Soviet Legion of
Doom Rule Psix Ft. Bragg Serbian strategic jihad arrangements
[See http://www.aclu.org/echelonwatch/index.html for more about this]
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: Which (physical) disk is broken?
2004-12-24 11:09 ` Turbo Fredriksson
2004-12-24 11:31 ` Theepan
@ 2004-12-24 17:14 ` Guy
2004-12-27 12:35 ` Turbo Fredriksson
1 sibling, 1 reply; 12+ messages in thread
From: Guy @ 2004-12-24 17:14 UTC (permalink / raw)
To: 'Turbo Fredriksson', linux-raid
The info is in the superblock. But if the disk has failed, you may not be
able to read the superblock.
Did you say you don't use superblocks? I guess you better keep a paper
trail!
You said: "when persistent super blocks is used (which I don't)."
But the output from mdadm said: "Persistence : Superblock is persistent"
Guy
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Which (physical) disk is broken?
2004-12-24 17:14 ` Guy
@ 2004-12-27 12:35 ` Turbo Fredriksson
0 siblings, 0 replies; 12+ messages in thread
From: Turbo Fredriksson @ 2004-12-27 12:35 UTC (permalink / raw)
To: linux-raid
>>>>> "Guy" == Guy <bugzilla@watkins-home.com> writes:
Guy> The info is in the superblock. But if the disk has failed,
Guy> you may not be able to read the superblock.
It's not THAT failed. It just has way too many bad blocks to be useful
so I took it out of service (but, unfortunately not from the machine -
I'll remember to do that next time :)...
Too bad there's no 'eject' command for disks :) That would be quite funny,
shooting my colleague(s) with disks from the rack :)
Guy> Did you say you don't use superblocks? I guess you better
Guy> keep a paper trail!
I do, but I always forget to update it...
Guy> You said: "when persistent super blocks is used (which I
Guy> don't)."
I didn't say I don't use 'super blocks', I said 'PERSISTENT super blocks'.
Guy> But the output from mdadm said: "Persistence : Superblock is
Guy> persistent"
He, yeah. Sorry. I saw that in the same instant it was sent, but too late
to cancel...
I've had quite a lot of time thinking about this, and I am now QUITE sure
that I created the array with 'missing' in the place where this disk should
have been.... I can VAGUELY remember that that disk kept failing the array
when I built the machine/array initially. But I'm not sure it was THAT disk,
or some other (I have a bunch of these disks that "don't quite work")...
Thanx for all the help everyone. But seeing the thread
'mdadm -D /dev/md3: 1 0 0 0 sync ???' on this list leads me to believe
I'm not THAT far out in wanting some better information on what disk is
what...
--
Khaddafi Marxist Legion of Doom 767 747 domestic disruption Peking
subway nitrate Uzi Kennedy explosion Rule Psix NORAD Honduras
[See http://www.aclu.org/echelonwatch/index.html for more about this]
^ permalink raw reply [flat|nested] 12+ messages in thread