* Looking for some advice on best way to identify drives / recover from issues
@ 2014-01-05 15:04 Dylan Distasio
2014-01-05 15:44 ` Mark Knecht
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Dylan Distasio @ 2014-01-05 15:04 UTC (permalink / raw)
To: linux-raid
Hi all-
I've been fortunate enough to not have to email this august group for
advice regarding my mdadm arrays in quite a while, but am looking for
some suggestions.
I woke up this morning to something beeping in my headless Norco
server case at home (never a promising start to the morning). I was
unable to ping the box which increased my dismay. I proceeded to
perform a hard reboot, and still nothing on the ping. At this point,
I plugged a monitor in to see what was happening on reboot.
Let me take a moment to provide details of my basic set up. There are
three separate HD controllers being used in this box: the motherboard
headers, a supermicro PCI-X card (in a PCI slot), and a Highpoint
RocketRaid SAS controller used as JBOD.
I have a number of separate mdadm arrays tied to this physical box
that have been built over the years including a RAID6 one, a RAID10,
and 2 mirrors.
Unfortunately, I did not take the time to physically label the drives
in the box (there are close to 20) as I built these, and had been
meaning to, but life got in the way. Since I have had no issues with
these arrays in a very long time, I don't even remember if I split
them across controllers or what.
So back to the reboot, I can see the motherboard drives showing up as
the POST runs through its paces. I can then see what appears to be
the Supermicro drives showing up, but when the Highpoint controller
gets to its own internal boot screen, it hangs at detecting drives, and
I am unable to get into the controller card BIOS by hitting ctrl-H
(keyboard works though, as I can ctrl-alt-delete, so it is not locking
the PC).
So at this point, I don't know my point of failure. I am guessing the
Highpoint flaked out though, especially since I now believe that was
the component beeping based on the PC restarting ok otherwise.
I am looking for advice on minimizing my risk of making things worse
as I attempt to identify which drives belong with which array. The
RAID6 is my most immediate concern in getting back up and running.
My immediate thought was to disconnect all drives and then reconnect
them one by one from a motherboard header, and use:
mdadm --examine /dev/sdX1
Will that give me enough info to figure out which drive belongs to
which array? Does anyone have any other suggestions? I am not sure
of the current state of ANY of the arrays that were on this box, but I
don't want to make things worse by booting this system up with some
drives missing because I've unplugged them, and having a bad
situation get worse.
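For reference, here is a minimal sketch of how the --examine idea could be scripted. mdadm --examine only reads the superblock and prints, among other things, the UUID of the array the device belongs to, so sorting on that UUID groups the members of each array together. The function names array_uuid and group_by_array are made up for illustration, and the sketch assumes the superblocks live on the first partition of each drive (/dev/sdX1), as in the command above.

```shell
#!/bin/sh
# Hypothetical helper: pull the array UUID out of `mdadm --examine` output
# read on stdin. 1.x superblocks print "Array UUID :", 0.90 superblocks
# print plain "UUID :", so accept both and ignore "Device UUID :".
array_uuid() {
    awk -F' : ' '/^ *(Array )?UUID : / { split($2, a, " "); print a[1]; exit }'
}

# Hypothetical helper: print "<uuid> <device>" for every device given,
# sorted so that members of the same array land on adjacent lines.
# Devices without an md superblock are reported as "no-md-superblock".
group_by_array() {
    for dev in "$@"; do
        uuid=$(mdadm --examine "$dev" 2>/dev/null | array_uuid)
        echo "${uuid:-no-md-superblock} $dev"
    done | sort
}

# Example (as root): group_by_array /dev/sd?1
```

The same --examine output also carries the RAID level, the number of devices and each device's role, which may help tell the RAID6 members from the RAID10 and mirror members.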
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Looking for some advice on best way to identify drives / recover from issues
2014-01-05 15:04 Looking for some advice on best way to identify drives / recover from issues Dylan Distasio
@ 2014-01-05 15:44 ` Mark Knecht
2014-01-05 17:01 ` Dylan Distasio
2014-01-05 16:33 ` Roger Heflin
2014-01-05 18:34 ` Phil Turmel
2 siblings, 1 reply; 11+ messages in thread
From: Mark Knecht @ 2014-01-05 15:44 UTC (permalink / raw)
To: Dylan Distasio; +Cc: Linux-RAID
I'm reacting to nothing more than 20 drives, no documentation and beeping:
1) Are the beeps POST codes?
http://www.computerhope.com/beep.htm
2) Before making any physical changes I'd start by drawing an
accurate picture. Exactly what cables go to exactly what drives. Put a
label on each drive & each cable (masking tape/black pen, etc.) so
that if you do disassemble things you have a chance of getting it back
together later in the same configuration.
3) If I thought it was the High Point I'd likely just remove it from
its slot and try booting again. (Assuming it's not needed to boot.)
4) It's not clear to me from this email exactly what's required (if
anything) in terms of RAID to make the machine boot, but if I could
boot from nothing but what's attached to the MB then that's what I'd
be trying to do first. With 20 drives you could have a power supply
failing and the system isn't getting enough power to run 20 drives,
etc. Minimize as much as possible.
5) If you get to where you can boot then you should run smartctl on
each drive looking for any info. However I would understand if 20
drives over a bunch of years means not all drives support S.M.A.R.T.
6) Once you get it booting I'd run a check of any RAID that's included
at that point to ensure it hadn't been damaged and then look to add
things back in.
Good luck!
- Mark
^ permalink raw reply [flat|nested] 11+ messages in thread
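As a rough illustration of the smartctl step suggested above, the sketch below boils `smartctl -H` output down to one word per drive. The function names smart_verdict and smart_sweep are hypothetical, and the parsing assumes the "overall-health self-assessment" line that smartctl prints for ATA drives; drives that word the result differently, or that lack S.M.A.R.T. entirely, simply come back as UNKNOWN and need a look at the full output.

```shell
#!/bin/sh
# Hypothetical helper: reduce `smartctl -H` output (on stdin) to one word.
# ATA drives report a line of the form
#   SMART overall-health self-assessment test result: PASSED
# Anything without that line is reported as UNKNOWN.
smart_verdict() {
    awk -F': ' '/overall-health self-assessment test result/ { print $2; found = 1 }
                END { if (!found) print "UNKNOWN" }'
}

# Hypothetical wrapper: print "<device> <verdict>" for each whole-disk device.
smart_sweep() {
    for dev in "$@"; do
        echo "$dev $(smartctl -H "$dev" 2>/dev/null | smart_verdict)"
    done
}

# Example (as root): smart_sweep /dev/sd?
```

A one-word verdict is only a first filter; `smartctl -a` on any drive that looks suspicious gives the attribute table and error log.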
* Re: Looking for some advice on best way to identify drives / recover from issues
2014-01-05 15:44 ` Mark Knecht
@ 2014-01-05 17:01 ` Dylan Distasio
2014-01-05 18:05 ` Mark Knecht
0 siblings, 1 reply; 11+ messages in thread
From: Dylan Distasio @ 2014-01-05 17:01 UTC (permalink / raw)
To: linux-raid
> I'm reacting to nothing more than 20 drives, no documentation and beeping:
>
> 1) Are the beeps POST codes?
>
> http://www.computerhope.com/beep.htm
No, I woke up to it beeping, but it POSTs fine. I think the beeping
was probably coming from the Highpoint card indicating array failure.
It no longer beeps, but it also won't pick up any drives.
> 2) Before making any physical changes I'd start by drawing an
> accurate picture. Exactly what cables go to exactly what drives. Put a
> label on each drive & each cable (masking tape/black pen, etc.) so
> that if you do disassemble things you have a chance of getting it back
> together later in the same configuration.
Good idea.
> 3) If I thought it was the High Point I'd likely just remove it from
> its slot and try booting again. (Assuming it's not needed to boot.)
>
> 4) It's not clear to me from this email exactly what's required (if
> anything) in terms of RAID to make the machine boot, but if I could
> boot from nothing but what's attached to the MB then that's what I'd
> be trying to do first. With 20 drives you could have a power supply
> failing and the system isn't getting enough power to run 20 drives,
> etc. Minimize as much as possible.
>
> 5) If you get to where you can boot then you should run smartctl on
> each drive looking for any info. However I would understand if 20
> drives over a bunch of years means not all drives support S.M.A.R.T.
>
> 6) Once you get it booting I'd run a check of any RAID that's included
> at that point to ensure it hadn't been damaged and then look to add
> things back in.
>
> Good luck!
>
> - Mark
The machine will boot fine from what I can see, but I am concerned
with disconnecting some drives and letting it boot because it might
result in a degraded array assembling simply because I did not
reconnect the drives it was looking for when I remove the highpoint
from the picture.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Looking for some advice on best way to identify drives / recover from issues
2014-01-05 17:01 ` Dylan Distasio
@ 2014-01-05 18:05 ` Mark Knecht
0 siblings, 0 replies; 11+ messages in thread
From: Mark Knecht @ 2014-01-05 18:05 UTC (permalink / raw)
To: Dylan Distasio; +Cc: Linux-RAID
On Sun, Jan 5, 2014 at 9:01 AM, Dylan Distasio <interzone@gmail.com> wrote:
>> I'm reacting to nothing more than 20 drives, no documentation and beeping:
>>
>> 1) Are the beeps POST codes?
>>
>> http://www.computerhope.com/beep.htm
>
> No, I woke up to it beeping, but it POSTs fine. I think the beeping
> was probably coming from the Highpoint card indicating array failure.
> It no longer beeps, but it also won't pick up any drives.
>
> The machine will boot fine from what I can see, but I am concerned
> with disconnecting some drives and letting it boot because it might
> result in a degraded array assembling simply because I did not
> reconnect the drives it was looking for when I remove the highpoint
> from the picture.
A couple of things. Again, I'm not an md expert so be careful, but
this is just informational:
1) You said you have a couple of RAIDs in this box but you didn't
describe them very completely. To the extent you can, what are they?
Give all the info you can: RAID type, number of drives, which drives,
do the RAIDs use dev names (/dev/sda) or do you use UUIDs? What
controllers are connected to what RAIDs?
2) IMO if the machine is booting without picking up the drives
attached to the Highpoint then it would seem logical that you could
completely remove the Highpoint - actually take it out of the PCI
slot - and still boot the machine. If you can do that and get to a
Linux prompt then document and test what you have. After that's done
and someone more md knowledgeable checks in on what you are reporting
then you could look at the Highpoint drives and finally the Highpoint
controller.
Personally I wouldn't do anything in terms of assembling the RAID by
hand until I had LOTS of mdadm data (Examine & Detail analysis for all
drives and RAIDs).
If you have another machine where you could test drives then I suppose
you could remove them one at a time and run smartctl on each one to
determine if there's any obvious problem.
Good luck,
Mark
^ permalink raw reply [flat|nested] 11+ messages in thread
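One possible way to gather the "Examine & Detail" data mentioned above before touching anything is sketched below. It saves `mdadm --examine` output for each member device and `mdadm --detail` output for each md device into a directory, along with a copy of /proc/mdstat. The function name collect_md_report and the file layout are assumptions for illustration only; both mdadm modes used here are read-only queries.

```shell
#!/bin/sh
# Hypothetical helper: save mdadm metadata reports into a directory so they
# can be compared (or posted to the list) before any assembly is attempted.
# Usage: collect_md_report OUTDIR DEVICE...
collect_md_report() {
    outdir=$1; shift
    mkdir -p "$outdir" || return 1
    for dev in "$@"; do
        name=$(basename "$dev")
        case $name in
            # md devices: report on the assembled array
            md*) mdadm --detail  "$dev" > "$outdir/$name.detail"  2>&1 ;;
            # everything else: report on the member superblock
            *)   mdadm --examine "$dev" > "$outdir/$name.examine" 2>&1 ;;
        esac
    done
    # Snapshot of what the kernel currently has assembled, if anything.
    cat /proc/mdstat > "$outdir/mdstat" 2>/dev/null
    return 0
}

# Example (as root): collect_md_report /root/md-report /dev/sd?1 /dev/md?*
```

Keeping one file per device makes it easy to diff event counts and device roles between members of the same array.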
* Re: Looking for some advice on best way to identify drives / recover from issues
2014-01-05 15:04 Looking for some advice on best way to identify drives / recover from issues Dylan Distasio
2014-01-05 15:44 ` Mark Knecht
@ 2014-01-05 16:33 ` Roger Heflin
2014-01-05 17:06 ` Dylan Distasio
2014-01-05 18:34 ` Phil Turmel
2 siblings, 1 reply; 11+ messages in thread
From: Roger Heflin @ 2014-01-05 16:33 UTC (permalink / raw)
To: Dylan Distasio; +Cc: Linux RAID
The crude but simple way is this:
Get the machine up with all disks that will work.
dd if=/dev/mdX of=/dev/null on each array, noting which disks light
up, repeat on all arrays. The same process can be done with each disk
(dd if=/dev/sdX of=/dev/null) to see exactly what disk maps to where.
This trick is rather nice since it pretty much works with
everything... even if you have a hw raid controller and a failed disk,
that will be the one disk that never lights, so you can find the
failed one there also. Just make sure that when done you have the
expected number of disks that did not light up.
The biggest issue is that if the md's come up missing the 4 drives it
may complicate things with MD, though at worst that should require
some usage of the raw mdadm command to force things on after doing
this.
^ permalink raw reply [flat|nested] 11+ messages in thread
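The dd trick described above could be wrapped roughly as follows, so that each device is read for a bounded amount of data instead of from end to end. The function name blink_disk and the 2048 MB default are arbitrary choices for this sketch. The read goes to /dev/null, so nothing is written to the disk.

```shell
#!/bin/sh
# Hypothetical helper: read from a device for a while so its activity LED
# stays lit. Data is discarded into /dev/null; the device is never written.
# Usage: blink_disk DEVICE [MEGABYTES]   (default: 2048 MB)
blink_disk() {
    dd if="$1" of=/dev/null bs=1M count="${2:-2048}" 2>/dev/null
}

# Example (as root), one disk at a time while watching the drive bays:
#   blink_disk /dev/sdb
#   blink_disk /dev/md0 4096
```

Reading an md device lights every member of that array at once, while reading a single sdX lights only that drive, which is what makes the two passes complementary.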
* Re: Looking for some advice on best way to identify drives / recover from issues
2014-01-05 16:33 ` Roger Heflin
@ 2014-01-05 17:06 ` Dylan Distasio
2014-01-05 19:37 ` Krzysztof Adamski
2014-01-22 17:16 ` Dylan Distasio
0 siblings, 2 replies; 11+ messages in thread
From: Dylan Distasio @ 2014-01-05 17:06 UTC (permalink / raw)
To: linux-raid
Thanks for the trick. The issue of complicating things with MD is
what I am concerned about. I am afraid to boot the PC up with drives
missing (if for example I remove the highpoint controller) because it
may end up assembling an array with drives missing and degrading it
when it didn't need to be.
I'm really wishing I had labeled my drives now, since I don't know
which ones are part of which array physically, and don't want any
arrays to assemble until I do. I was wondering if booting into a live
CD would be the way to go. I need some way of checking which drive is
in which array without the risk of any arrays assembling.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Looking for some advice on best way to identify drives / recover from issues
2014-01-05 17:06 ` Dylan Distasio
@ 2014-01-05 19:37 ` Krzysztof Adamski
2014-01-22 17:16 ` Dylan Distasio
1 sibling, 0 replies; 11+ messages in thread
From: Krzysztof Adamski @ 2014-01-05 19:37 UTC (permalink / raw)
To: Dylan Distasio; +Cc: linux-raid
Boot with a Knoppix CD and examine the drives without MD starting.
This is after you remove the highpoint card. This way you can figure
out what drives belong to which array.
If you can, use lsdrv while running from Knoppix.
^ permalink raw reply [flat|nested] 11+ messages in thread
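Whether a given live CD leaves md arrays alone at boot can vary, so it may be worth confirming that nothing was assembled before examining drives. The sketch below lists every md device named in /proc/mdstat, whether active or inactive; an empty result means the kernel is not holding any array members. The function name listed_arrays is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the md devices named in /proc/mdstat contents
# supplied on stdin. Array lines look like
#   md0 : active raid6 sdb1[0] sdc1[1] ...
#   md1 : inactive sdd1[0](S)
# and both kinds are reported, since either one means members are claimed.
listed_arrays() {
    awk '$1 ~ /^md/ && $2 == ":" { print $1 }'
}

# Example, from the live environment before touching anything:
#   listed_arrays < /proc/mdstat     # empty output: nothing assembled
#   mdadm --stop /dev/md0            # release an array that did start
```

With nothing assembled, `mdadm --examine` on each partition can then be run without any risk of an array starting degraded.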
* Re: Looking for some advice on best way to identify drives / recover from issues 2014-01-05 17:06 ` Dylan Distasio 2014-01-05 19:37 ` Krzysztof Adamski @ 2014-01-22 17:16 ` Dylan Distasio 1 sibling, 0 replies; 11+ messages in thread From: Dylan Distasio @ 2014-01-22 17:16 UTC (permalink / raw) To: linux-raid I just wanted to provide an update on my situation for those interested. It might help someone in the future with my combination of hardware. I originally suspected my Highpoint controller was the point of failure, and decided to get a LSI 9211 controller card to swap it out with since they are pretty affordable on eBay and seem to be decent low end controllers. I figured that would be the easiest troubleshooting first step based on my situation, especially since the original controller was seizing up. Anyways, I had time yesterday to swap it out after waiting for it to arrive from Hong Kong. When I rebooted, I got a grub error 15. I'll be honest, grub isn't my forte, but I imagined that it might have been related to device order assignment, and that the new hardware had confused it somehow. I did some googling, and decided to roll the dice with a boot repair live CD. I went through the steps to install a newer version of grub and could see that the tool had successfully found the OS drive, so I let it finish, and rebooted. At this point, I now got a cryptic "No upper memory" error. I was beginning to pull out some of what little hair is left at that point. I did some additional googling and stumbled across some threads on Gigabyte motherboards not playing nice with LSI 9211 cards...Doh! I had heard that updating to a Beta Bios had sometimes helped, and proceeded to format a flash drive as a bootable DOS disk with the flash utility. Of course, this is an older motherboard and I could not get it to boot from the flash drive, and I don't have a floppy handy. 
On to the next choice... I had a newer Gigabyte mobo lying around with a
processor and RAM already installed. I swapped that in, and tried the
LSI card. Immediately, I was greeted with the same upper memory error.
Ugh! I decided to flash that mobo with a beta BIOS as a long shot. I
was able to do so with a flash drive. I rebooted, and was greeted with
the grub menu!

It worked; the only problem now was that I noticed the new LSI card was
dropping 4 of the 8 drives. At this point, the lightbulb went off that
it was probably NOT the Highpoint controller that was the point of
failure. I swapped that back in and disconnected the mini-SAS cable to
the problematic drives. Sure enough, it was working fine.

I realized at this point that there were only two choices left as
failure points: the 4 drives (which I was hoping was not the case, as
losing that many at once would not have been good), or the SATA
backplane on my Norco 4020 case. The backplane is divided into 5
separate banks of SATA connectors, each with its own power connection
and each handling 4 drives. I proceeded to pull the 4 drives in their
trays and hook them up directly to the SATA end of the controller card.
I rebooted, and success! My arrays were all running successfully.

I am now working on trying to repair the backplane, assuming I can swap
out the damaged sections. I will need to pull apart and rewire this
entire case, which won't be a fun project, but most importantly I got to
the root cause, and there doesn't appear to be any harm to the arrays.
I am going to run a check on them shortly though.

On Sun, Jan 5, 2014 at 12:06 PM, Dylan Distasio <interzone@gmail.com> wrote:
> Thanks for the trick. The issue of complicating things with MD is
> what I am concerned about. I am afraid to boot the PC up with drives
> missing (if for example I remove the highpoint controller) because it
> may end up assembling an array with drives missing and degrading it
> when it didn't need to be.
>
> I'm really wishing I had labeled my drives now, since I don't know
> which ones are part of which array physically, and don't want any
> arrays to assemble until I do. I was wondering if booting into a live
> CD would be the way to go. I need some way of checking which drive is
> in which array without the risk of any arrays assembling.
>
> On Sun, Jan 5, 2014 at 11:33 AM, Roger Heflin <rogerheflin@gmail.com> wrote:
>> The crude but simple way is this:
>>
>> Get the machine up with all disks that will work.
>>
>> dd if=/dev/mdX of=/dev/null on each array, noting which disks light
>> up, repeat on all arrays, same process can be done with each disk (dd
>> if=/dev/sdX of=/dev/null ) to see exactly what disk maps to where.
>> This trick is rather nice since it pretty much works with
>> everything...even if you have a hw raid controlled and a failed disk,
>> that will be the one disk that never lights, so you can find the
>> failed on there also, just make sure that when done you have the
>> expected number of disks to not light up.
>>
>> The biggest issue is that if the md's come up missing the 4 drives it
>> may complicate things with MD, though at worse that should require
>> some usage of the raw mdadm command to force things on after doing
>> this.
>>
>> On Sun, Jan 5, 2014 at 9:04 AM, Dylan Distasio <interzone@gmail.com> wrote:
>>> Hi all-
>>>
>>> I''ve been fortunate enough to not have to email this august group for
>>> advice regarding my mdadm arrays in quite awhile, but am looking for
>>> some suggestions.
>>>
>>> I woke up this morning to something beeping in my headless Norco
>>> server case at home (never a promising start to the morning). I was
>>> unable to ping the box which increased my dismay. I proceeded to
>>> perform a hard reboot, and still nothing on the ping. At this point,
>>> I plugged a monitor in to see what was happening on reboot.
>>>
>>> Let me take a moment to provide details of my basic set up. There are
>>> three separate HD controllers being used in this box: the motherboard
>>> headers, a supermicro PCI-X card (in a PCI slot), and a Highpoint
>>> RocketRaid SAS controller used as JBOD.
>>>
>>> I have a number of separate mdadm arrays tied to this physical box
>>> that have been built over the years including a RAID6 one, a RAID10,
>>> and 2 mirrors.
>>>
>>> Unfortunately, I did not take the time to physically label the drives
>>> in the box (there are close to 20) as I built these, and had been
>>> meaning to, but life got in the way. Since I have had no issues with
>>> these arrays in a very long time, I don't even remember if I split
>>> them across controllers or what.
>>>
>>> So back to the reboot, I can see the motherboard drives showing up as
>>> the POST runs through its paces. I can then see what appears to be
>>> the Supermicro drives showing up, but when the Highpoint controller
>>> gets to it own internal boot screen, it hangs at detecting drives, and
>>> I am unable to get into the controller card BIOS by hitting ctrl-H
>>> (keyboard works though, as I can ctrl-alt-delete, so it is not locking
>>> the PC).
>>>
>>> So at this point, I don't know my point of failure. I am guessing the
>>> Highpoint flaked out though, especially since I now believe that was
>>> the component beeping based on the PC restarting ok otherwise.
>>>
>>> I am looking for advice on minimizing my risk of making things worse
>>> as I attempt to identify what drives belong which with array. The
>>> RAID6 is my most immediate concern in getting back up and running.
>>>
>>> My immediate thought was to disconnect all drives and then reconnect
>>> them one by one from a motherboard header, and use:
>>>
>>> mdadm --examine /dev/sdX1
>>>
>>> Will that give me enough info to figure out which drive belongs to
>>> which array? Does anyone have any other suggestions? I am not sure
>>> of the current state of ANY of the arrays that were on this box, but I
>>> don't want to make things worse by booting this system up with some
>>> drives missing because I've unplugged them, and having the a bad
>>> situation get worse.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 11+ messages in thread
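
On the question quoted above of whether `mdadm --examine` gives enough
to tell which drive belongs to which array: each report carries the UUID
of the array the member belongs to, so grouping the reports by that
field answers it. Below is a minimal sketch in Python, not a tested
recovery tool. The field names ("Array UUID" for 1.x metadata, "UUID"
for 0.90) are assumed from typical mdadm output and should be checked
against your version; the sample reports, device names, and UUIDs are
made up for illustration.

```python
from collections import defaultdict


def parse_examine(text):
    """Return the 'key : value' fields of one `mdadm --examine` report."""
    fields = {}
    for line in text.splitlines():
        # Split on the first " : " only, so values that contain colons
        # (UUIDs, timestamps) are kept whole.
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def group_by_array(reports):
    """Group device names by the array UUID found in their reports.

    `reports` maps a device name to the text of its examine report.
    """
    arrays = defaultdict(list)
    for device, text in reports.items():
        fields = parse_examine(text)
        # Assumption: 1.x metadata prints "Array UUID", 0.90 prints "UUID".
        uuid = fields.get("Array UUID") or fields.get("UUID")
        if uuid:
            arrays[uuid].append(device)
    return {uuid: sorted(devices) for uuid, devices in arrays.items()}


# Hypothetical, heavily shortened reports; real output has many more fields.
SAMPLE_REPORTS = {
    "/dev/sdb1": "/dev/sdb1:\n     Array UUID : aaaa1111:bbbb2222\n     Raid Level : raid6\n",
    "/dev/sdc1": "/dev/sdc1:\n     Array UUID : aaaa1111:bbbb2222\n     Raid Level : raid6\n",
    "/dev/sdd1": "/dev/sdd1:\n           UUID : cccc3333:dddd4444\n     Raid Level : raid1\n",
}

if __name__ == "__main__":
    for uuid, devices in sorted(group_by_array(SAMPLE_REPORTS).items()):
        print(uuid, " ".join(devices))
```

In practice the report text would come from running
`mdadm --examine /dev/sdX1` once per member and saving the output; how
that is collected (live CD, one drive at a time, and so on) is left
open here.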
* Re: Looking for some advice on best way to identify drives / recover from issues
2014-01-05 15:04 Looking for some advice on best way to identify drives / recover from issues Dylan Distasio
2014-01-05 15:44 ` Mark Knecht
2014-01-05 16:33 ` Roger Heflin
@ 2014-01-05 18:34 ` Phil Turmel
2014-01-06 15:57 ` John Stoffel
2 siblings, 1 reply; 11+ messages in thread
From: Phil Turmel @ 2014-01-05 18:34 UTC (permalink / raw)
To: Dylan Distasio, linux-raid

Good afternoon Dylan,

On 01/05/2014 10:04 AM, Dylan Distasio wrote:
> Hi all-

[trim /]

> Unfortunately, I did not take the time to physically label the drives
> in the box (there are close to 20) as I built these, and had been
> meaning to, but life got in the way. Since I have had no issues with
> these arrays in a very long time, I don't even remember if I split
> them across controllers or what.

[trim /]

> Will that give me enough info to figure out which drive belongs to
> which array? Does anyone have any other suggestions? I am not sure
> of the current state of ANY of the arrays that were on this box, but I
> don't want to make things worse by booting this system up with some
> drives missing because I've unplugged them, and having the a bad
> situation get worse.

I created a script for precisely this type of documentation task, keyed
to drive serial numbers and UUIDs wherever identifiable.

https://github.com/pturmel/lsdrv

(If any of your drive serial numbers are entirely numeric, you'll want
the patch shown in the open issues)

HTH,

Phil

^ permalink raw reply [flat|nested] 11+ messages in thread
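
For readers who want a feel for keying drives to serial numbers as
described in the message above, here is a much smaller illustration in
Python. It is not how lsdrv works internally; it only reads the
persistent-name symlinks that udev usually places under
/dev/disk/by-id, whose names typically embed the model and serial
number. The function name and the partition-suffix pattern are
assumptions made for this sketch.

```python
import os
import re

# Assumption: partition links end in "-part<N>", as udev usually names them.
PARTITION_SUFFIX = re.compile(r"-part\d+$")


def map_ids_to_devices(by_id_dir="/dev/disk/by-id"):
    """Map each whole-disk persistent name to the device node it points at."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if not os.path.islink(link):
            continue
        # Skip partition links so each disk is listed once.
        if PARTITION_SUFFIX.search(name):
            continue
        mapping[name] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    # Only attempt the scan where the directory actually exists.
    if os.path.isdir("/dev/disk/by-id"):
        for name, device in map_ids_to_devices().items():
            print(device, name)
```

Printed next to a label on each drive tray, a listing like this gives
the physical-to-logical mapping the original post was missing. The same
disk can appear under more than one name (for example an `ata-` and a
`wwn-` link), so expect duplicates per device.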
* Re: Looking for some advice on best way to identify drives / recover from issues
2014-01-05 18:34 ` Phil Turmel
@ 2014-01-06 15:57 ` John Stoffel
2014-01-06 16:54 ` Phil Turmel
0 siblings, 1 reply; 11+ messages in thread
From: John Stoffel @ 2014-01-06 15:57 UTC (permalink / raw)
To: Phil Turmel; +Cc: Dylan Distasio, linux-raid

>>>>> "Phil" == Phil Turmel <philip@turmel.org> writes:

Phil> I created a script for precisely this type of documentation task, keyed
Phil> to drive serial numbers and UUIDs wherever identifiable.

Phil> https://github.com/pturmel/lsdrv

Phil,

Thanks for the script, it looks good, but I wanted to poke you about
the continuation and corner vars, which are defined with funky graphic
chars. Would it be hard to put in simpler plain ASCII graphics there
by default, and offer a switch for UTF-8 (???) output?

Also, I wonder if having it going the other way, which is from mount
point down to the device(s) would make sense as well? Now I just need
to find the time to hack your code and see what I can do.

Thanks!
John

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Looking for some advice on best way to identify drives / recover from issues
2014-01-06 15:57 ` John Stoffel
@ 2014-01-06 16:54 ` Phil Turmel
0 siblings, 0 replies; 11+ messages in thread
From: Phil Turmel @ 2014-01-06 16:54 UTC (permalink / raw)
To: John Stoffel; +Cc: Dylan Distasio, linux-raid

On 01/06/2014 10:57 AM, John Stoffel wrote:
>>>>>> "Phil" == Phil Turmel <philip@turmel.org> writes:
>
> Phil> I created a script for precisely this type of documentation task, keyed
> Phil> to drive serial numbers and UUIDs wherever identifiable.
>
> Phil> https://github.com/pturmel/lsdrv
>
> Phil,
>
> Thanks for the script, it looks good, but I wanted to poke you about
> the continuation and corner vars, which are defined with funky graphic
> chars. Would it be hard to put in simpler plain ASCII graphics there
> by default, and offer a switch for UTF-8 (???) output?

I'll take patches that make the script locale-aware using python2's
normal methods. I hadn't bothered to figure out how to do so since
distros have been installing with utf-8 by default for years now. I do
not want to give up the utf-8 line drawing characters for the majority.

> Also, I wonder if having it going the other way, which is from mount
> point down to the device(s) would make sense as well? Now I just need
> to find the time to hack your code and see what I can do.

Shouldn't be hard. I chose not to go in that direction as it would
leave out unmounted devices. My intent is to document the relationships
amongst everything present (with SNs and UUIDs) to the fullest extent
possible.

Thanks for the feedback.

Phil

^ permalink raw reply [flat|nested] 11+ messages in thread
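
One way the locale-aware behaviour discussed in the message above could
look, as a rough sketch only: pick UTF-8 line-drawing characters when
the preferred encoding is UTF-8 and fall back to plain ASCII otherwise.
This is not a patch against lsdrv. It is written for Python 3 rather
than python2, and the glyph tables, key names, and function names are
hypothetical.

```python
import locale

# Hypothetical glyph tables; the characters lsdrv actually uses may differ.
UTF8_GLYPHS = {"branch": "\u251c", "corner": "\u2514"}
ASCII_GLYPHS = {"branch": "|-", "corner": "`-"}


def pick_glyphs(encoding=None):
    """Return the UTF-8 table for UTF-8 locales, the ASCII table otherwise."""
    if encoding is None:
        # Ask the locale module without changing the process locale.
        encoding = locale.getpreferredencoding(False)
    normalized = encoding.lower().replace("-", "").replace("_", "")
    return UTF8_GLYPHS if normalized == "utf8" else ASCII_GLYPHS


def render(parent, children, glyphs):
    """Draw one parent with its children, one per line."""
    lines = [parent]
    for index, child in enumerate(children):
        is_last = index == len(children) - 1
        marker = glyphs["corner"] if is_last else glyphs["branch"]
        lines.append(marker + " " + child)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render("md0", ["sda1", "sdb1"], pick_glyphs()))
```

Detecting the encoding keeps UTF-8 as the default for the majority,
which is the constraint stated above, while a C or POSIX locale gets
ASCII without needing an explicit switch.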
end of thread, other threads:[~2014-01-22 17:16 UTC | newest]

Thread overview: 11+ messages
2014-01-05 15:04 Looking for some advice on best way to identify drives / recover from issues Dylan Distasio
2014-01-05 15:44 ` Mark Knecht
2014-01-05 17:01 ` Dylan Distasio
2014-01-05 18:05 ` Mark Knecht
2014-01-05 16:33 ` Roger Heflin
2014-01-05 17:06 ` Dylan Distasio
2014-01-05 19:37 ` Krzysztof Adamski
2014-01-22 17:16 ` Dylan Distasio
2014-01-05 18:34 ` Phil Turmel
2014-01-06 15:57 ` John Stoffel
2014-01-06 16:54 ` Phil Turmel