* Rebuilding a RAID5 array after drive (hardware) failure
  From: George Duffield <forumscollective@gmail.com> @ 2014-05-22 4:31 UTC
  To: linux-raid

I have a RAID5 array comprising 4 x 3TB Seagate 7200 RPM SATAII drives.
The array was created on Ubuntu Server running on an HP Microserver N54L
using the following command:

   sudo mdadm --create --verbose /dev/md0 --raid-devices=4 --level=5 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

Formatted using:

   mkfs.ext4 -b 4096 -E stride=128,stripe-width=384 /dev/md0

The array is mounted in /etc/fstab by reference to its UUID and is now
nearly full.

A few days back I turned on the server to access some of the files
stored on it and found it was not present on the network. Inspecting the
actual server (connected kb & monitor) I noticed that the machine had
not progressed beyond the BIOS POST screen – one of the drives had
become damaged (the 2nd drive in the same slot of the same Microserver
to be damaged the same way: the drive spins up fine, the machine knows
it's there, but can't communicate successfully with it). In any event,
suffice it to say the drive is history – it and the Microserver will be
RMAd when this is over.

So, I'm now left with a degraded array comprising 3 x 3TB drives. I've
purchased a replacement drive (same make and model) in the interim (and
I've yet to boot this machine with the old drive removed or the new one
inserted, i.e. from an OS standpoint Ubuntu/mdadm does not yet know the
array is degraded).

As I've lost all faith in the Microserver (and it may very well damage
the new drive during recovery of the array) I've also purchased and
assembled a 2nd machine with 6 onboard SATA ports rather than rely on
another Microserver. My intention is to remove the drives from the
Microserver and install them in the new machine, which I'll boot off
the same USB flash drive I used to boot the Microserver from (to
further complicate things, it seems my flash drive may also be
corrupted, so I may have to recover from a fresh Ubuntu install and
reassemble the array).

A few questions if I may:

- Is moving the array to another computer and recovering it on the new
  computer running Ubuntu Server likely to present any particular
  challenges?

- Does the order/sequence of connection of the drives to the
  motherboard matter? Put another way: would mdadm care if drives were
  swapped between Microserver backplane slots / PC SATA ports, such
  that a drive no longer occupies the physical slot or port it occupied
  when the array was created?

- How would I best approach rebuilding the array? My current thinking
  is as follows:
  = Identify with certainty which drive has failed. This will be done
    by removing the OS flash drive from the Microserver, disconnecting
    all drives from the backplane other than the one I believe is
    faulty (first slot on the backplane) and booting the machine. The
    failed drive causes a POST failure and is thus easily identified
    (a software-based alternative is sketched after this message).
  = Remove all drives from the Microserver and install them into the
    new PC referenced above, at the same time replacing the failed
    drive with the replacement I purchased.
  = Power the new PC via a UPS.
  = Boot the PC from the flash drive.
  = Allow the degraded array to be assembled by mdadm when prompted at
    boot.
  = Add the replacement drive to the array and allow the array to be
    re-synchronized.
  = If I'm not able to access the flash drive, create a fresh install
    of Ubuntu Server and reassemble the array in the fresh install.

All thoughts/comments/guidance much appreciated.
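For step one, mapping device names to physical drives can also be done
in software rather than by pulling drives one at a time. A minimal
sketch, assuming smartmontools is installed (the device names are
illustrative and should be checked against lsblk):

   # Print model, serial number and SMART health for each member disk:
   for d in /dev/sd[abcd]; do
       echo "== $d =="
       sudo smartctl -i "$d" | grep -E 'Device Model|Serial Number'
       sudo smartctl -H "$d" | grep -i 'overall-health'
   done
   # Compare the reported serial numbers against the labels printed on
   # the drives to tie a failing device name to a physical slot.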
* Re: Rebuilding a RAID5 array after drive (hardware) failure
  From: NeilBrown <neilb@suse.de> @ 2014-05-22 4:49 UTC
  To: George Duffield; +Cc: linux-raid

On Thu, 22 May 2014 06:31:58 +0200 George Duffield
<forumscollective@gmail.com> wrote:

> [description of the failed drive and the planned move trimmed]
>
> A few questions if I may:
>
> - Is moving the array to another computer and recovering it on the
>   new computer running Ubuntu Server likely to present any particular
>   challenges?

No. If you were trying to boot off the array that you moved it might be
interesting. But as you aren't, I cannot see any possible issue
(assuming the hardware functions correctly).

> - Does the order/sequence of connection of the drives to the
>   motherboard matter? Put another way: would mdadm care if drives were
>   swapped between Microserver backplane slots / PC SATA ports, such
>   that a drive no longer occupies the physical slot or port it
>   occupied when the array was created?

No. mdadm looks at the content of the devices, not their location.
> - How would I best approach rebuilding the array? My current thinking
>   is as follows:
>   [plan trimmed; see the original message above: identify the failed
>   drive, move all drives to the new PC with the replacement fitted,
>   power via UPS, boot from the flash drive, let mdadm assemble the
>   degraded array, add the replacement and re-synchronize, falling
>   back to a fresh Ubuntu Server install if the flash drive is
>   unusable]
>
> All thoughts/comments/guidance much appreciated.

Sounds good. Though I would discourage the boot sequence from
assembling the degraded array if possible. Just get the machine up with
the drives untouched. Then use "mdadm -E" to look at each device and
make sure they are what you think they are (e.g. consistent Event
numbers etc).

Then
   mdadm --assemble /dev/mdWHATEVER ...list-of-devices...

Then make sure that looks good.

Then
   mdadm /dev/mdWHATEVER --add new-device

NeilBrown
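A minimal sketch of the sequence Neil describes, assuming the three
surviving members are sda1, sdb1 and sdc1 and the replacement is
partitioned as sdd1 (illustrative names only – verify yours with lsblk
before running anything):

   sudo mdadm -E /dev/sda1 /dev/sdb1 /dev/sdc1    # compare Array UUID and Event counts
   sudo mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
   # add --run above if mdadm refuses to start a degraded array whose
   # metadata does not yet record the failure
   cat /proc/mdstat                   # expect something like [4/3] [UUU_]
   sudo mdadm /dev/md0 --add /dev/sdd1   # start the rebuild onto the new drive
   watch cat /proc/mdstat             # monitor resync progress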
* Re: Rebuilding a RAID5 array after drive (hardware) failure
  From: George Duffield @ 2014-05-23 18:29 UTC
  To: NeilBrown; +Cc: linux-raid

Thanks for clarifying my questions. Seeing as the flash drive has
indeed failed (Murphy at his proverbial best), I have to change my
approach by creating a fresh install of Ubuntu Server and then
integrating the array into the new install. On top of that, the drive
that was marked faulty is actually up and running again (in the new
machine – I've no idea why/how); all drives passed the POST sequence in
the Microserver and have since been successfully moved to the new
machine. I ran a fresh install of Ubuntu Server last night and
installed mdadm. On rebooting, the array was automatically seen and
reported by mdadm as clean. I did not attempt to mount the array.
Somehow the flash disk with the new OS was corrupted on a reboot (/
could not be mounted), so I shut down the box using shutdown -h now.

Tonight I've reinstalled Ubuntu Server on the flash drive, added mdadm
and rebooted without the RAID drives powered up. After completing the
config of the server OS (nfs, samba etc.) I shut down again, added the
drives and rebooted.

Running lsblk returns the following, showing all of the drives from the
array accounted for:

   $ lsblk
   NAME      MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
   sda         8:0    0  2.7T  0 disk
   └─sda1      8:1    0  2.7T  0 part
     └─md127   9:127  0  8.2T  0 raid5
   sdb         8:16   0  2.7T  0 disk
   └─sdb1      8:17   0  2.7T  0 part
     └─md127   9:127  0  8.2T  0 raid5
   sdc         8:32   0  2.7T  0 disk
   └─sdc1      8:33   0  2.7T  0 part
     └─md127   9:127  0  8.2T  0 raid5
   sdd         8:48   0  2.7T  0 disk
   └─sdd1      8:49   0  2.7T  0 part
     └─md127   9:127  0  8.2T  0 raid5
   sde         8:64   1 14.5G  0 disk
   └─sde1      8:65   1 14.4G  0 part  /

I then tried to assemble the array as follows:

   $ sudo mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
   mdadm: /dev/sda1 is busy - skipping
   mdadm: /dev/sdb1 is busy - skipping
   mdadm: /dev/sdc1 is busy - skipping
   mdadm: /dev/sdd1 is busy - skipping

No idea why the drives are reported as being busy – they're not mounted
nor referenced in /etc/fstab.

What is required in order to reassemble the array?

Thanks again.

On Thu, May 22, 2014 at 6:49 AM, NeilBrown <neilb@suse.de> wrote:
> [full quote of the previous message trimmed]
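As the next message reveals, the members are "busy" because the boot
process has already auto-assembled them into another array. A sketch of
the usual diagnosis and fix, using the device names above:

   cat /proc/mdstat              # shows any array already assembled at boot
   sudo mdadm --detail --scan    # lists it with its name and UUID
   # If an auto-assembled array (here /dev/md127) holds the members,
   # stop it, then assemble under the preferred name:
   sudo mdadm --stop /dev/md127
   sudo mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1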
* Re: Rebuilding a RAID5 array after drive (hardware) failure
  From: George Duffield @ 2014-05-23 18:38 UTC
  To: NeilBrown; +Cc: linux-raid

If it's of use in diagnosing:

   $ cat /proc/mdstat
   Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
   md127 : active (auto-read-only) raid5 sdc1[2] sdb1[1] sda1[0] sdd1[4]
         8790400512 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

So it looks to me like I have an array /dev/md127 and it is healthy:

   $ sudo mdadm --detail /dev/md127
   /dev/md127:
           Version : 1.2
     Creation Time : Sun Feb  2 21:40:15 2014
        Raid Level : raid5
        Array Size : 8790400512 (8383.18 GiB 9001.37 GB)
     Used Dev Size : 2930133504 (2794.39 GiB 3000.46 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

       Update Time : Fri May 23 00:06:34 2014
             State : clean
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

              Name : fileserver:0
              UUID : 8389cd99:a86f705a:15c33960:9f1d7cbe
            Events : 210

       Number   Major   Minor   RaidDevice State
          0       8        1        0      active sync   /dev/sda1
          1       8       17        1      active sync   /dev/sdb1
          2       8       33        2      active sync   /dev/sdc1
          4       8       49        3      active sync   /dev/sdd1

Some questions:
- How did md127 come into existence?
- How do I get it out of the active (auto-read-only) state so I can use
  it?
- Can it be renamed to md0?

On Fri, May 23, 2014 at 8:29 PM, George Duffield
<forumscollective@gmail.com> wrote:
> [full quote of the previous message trimmed]
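A sketch of clearing the auto-read-only flag explicitly (as Neil notes
below, it also clears itself automatically on the first write):

   sudo mdadm --readwrite /dev/md127   # switch the array to normal read-write
   cat /proc/mdstat                    # the "(auto-read-only)" tag should be gone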
* Re: Rebuilding a RAID5 array after drive (hardware) failure
  From: Dylan Distasio @ 2014-05-23 19:26 UTC
  To: George Duffield; +Cc: NeilBrown, linux-raid

I have had similar issues with Ubuntu in the past. I'm not sure what
your mdadm.conf looks like, but it sounds like you need to update the
initramfs, which is assembling the array during the boot process, so
that it reflects the current mdadm config on your filesystem. The md127
version of your array is getting assembled by the initramfs before md0
comes into existence.

Try running:

   sudo update-initramfs -u

Then reboot and see how things look.

On Fri, May 23, 2014 at 2:38 PM, George Duffield
<forumscollective@gmail.com> wrote:
> [full quote of the previous message trimmed]
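On Ubuntu the relevant file is /etc/mdadm/mdadm.conf. A sketch of the
sequence Dylan is describing; the ARRAY line shown is built from the
UUID reported above, but verify it against your own scan output:

   sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
   # Edit the appended line so the array gets the wanted name, e.g.:
   #   ARRAY /dev/md0 metadata=1.2 UUID=8389cd99:a86f705a:15c33960:9f1d7cbe
   sudo update-initramfs -u    # copy the new config into the initramfs
   sudo reboot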
* Re: Rebuilding a RAID5 array after drive (hardware) failure
  From: NeilBrown <neilb@suse.de> @ 2014-05-26 0:46 UTC
  To: George Duffield; +Cc: linux-raid

On Fri, 23 May 2014 20:38:22 +0200 George Duffield
<forumscollective@gmail.com> wrote:

> If it's of use in diagnosing:
>
>    $ cat /proc/mdstat
>    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>    md127 : active (auto-read-only) raid5 sdc1[2] sdb1[1] sda1[0] sdd1[4]
>          8790400512 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

That is why they were "busy" :-)

> So it looks to me like I have an array /dev/md127 and it is healthy:
>
> [mdadm --detail output trimmed; State : clean, Name : fileserver:0,
>  UUID : 8389cd99:a86f705a:15c33960:9f1d7cbe]
>
> Some questions:
> - How did md127 come into existence?

When mdadm tried to assemble the array it found no hard evidence to
suggest the use of "/dev/md0" or similar, so it chose a high unused
number: 127. If the hostname had been "fileserver", mdadm would have
realised that this array was array "0" in this machine and would have
used "/dev/md0". (See the "Name" field.)

> - How do I get it out of the active (auto-read-only) state so I can
>   use it?

Just use it. On first write the read-only status will automatically
disappear.

> - Can it be renamed to md0?

Sure. If this is being assembled by the initrd, then either the initrd
must set the hostname to "fileserver" before mdadm gets to assemble the
array, or the initrd must contain an /etc/mdadm.conf which lists
"/dev/md0" as having the UUID of this array.

NeilBrown
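A quick check of the superblock name Neil refers to against the running
hostname (the member device here is an example):

   sudo mdadm -E /dev/sda1 | grep -E 'Array UUID|Name'
   hostname
   # If the host part of the "Name" field ("fileserver" in the output
   # above) matches the running hostname, mdadm will assemble the
   # array as /dev/md0 even without an mdadm.conf entry.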
* Re: Rebuilding a RAID5 array after drive (hardware) failure
  From: George Duffield @ 2014-05-26 19:57 UTC
  To: NeilBrown; +Cc: linux-raid

> > - Can it be renamed to md0?
>
> Sure. If this is being assembled by the initrd, then either the
> initrd must set the hostname to "fileserver" before mdadm gets to
> assemble the array, or the initrd must contain an /etc/mdadm.conf
> which lists "/dev/md0" as having the UUID of this array.
>
> NeilBrown

I'm learning a few things the hard way here, e.g. I had saved my
mdadm.conf to a file in the home folder of the default user so I could
reference it later if need be – no flash drive, no file. Next time I'll
email it to myself.

If I were to run sudo update-initramfs -u, would that restore things or
pretty much make md127 permanent?

In the event update-initramfs -u is not the way to go, is there an
example of a good mdadm.conf floating about that I can reference to
make the necessary changes?
* Re: Rebuilding a RAID5 array after drive (hardware) failure
  From: NeilBrown <neilb@suse.de> @ 2014-05-26 23:03 UTC
  To: George Duffield; +Cc: linux-raid

On Mon, 26 May 2014 21:57:15 +0200 George Duffield
<forumscollective@gmail.com> wrote:

> If I were to run sudo update-initramfs -u, would that restore things
> or pretty much make md127 permanent?

I can't say; each distro has its own initramfs-building code. What I
would probably do is:

- run update-initramfs
- mkdir /tmp/i; cd /tmp/i; zcat /boot/initramfs | cpio -idv
  # note, filename might be wrong
- inspect /tmp/i/etc/mdadm.conf and fstab etc. and make sure "/dev/md0"
  is used and "/dev/md127" isn't
- make changes as necessary
- find . | cpio -oc | gzip --best > /boot/initramfs.test
  then figure out a way to boot from /boot/initramfs.test

> In the event update-initramfs -u is not the way to go, is there an
> example of a good mdadm.conf floating about that I can reference to
> make the necessary changes?

It is really worth making the effort to read the man page and
understand how to make your own /etc/mdadm.conf.

   mdadm -Ds

is a good start, but its output should be examined and edited to make
sure it matches your requirements. All you really need is:

   ARRAY /dev/md0 UUID=whatever:xxxxxxx......

That is it.

NeilBrown
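On Ubuntu the manual unpack step can be avoided: lsinitramfs (part of
initramfs-tools) lists an image's contents directly. A sketch, assuming
the image path matches the running kernel:

   lsinitramfs /boot/initrd.img-$(uname -r) | grep -i mdadm
   # After editing /etc/mdadm/mdadm.conf and running update-initramfs -u,
   # the listing should include etc/mdadm/mdadm.conf, confirming the
   # ARRAY line will be visible at boot.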