From mboxrd@z Thu Jan 1 00:00:00 1970
From: linux.news@bucksch.org
Subject: Re: md RAID5: Disk wrongly marked "spare", need to force re-add it
Date: Sat, 20 Apr 2013 00:56:17 +0200
Message-ID: <5171CB91.1040708@bucksch.org>
References: <516869D2.9030506@bucksch.org> <516B3077.9020507@schinagl.nl> <516B590C.5060807@bucksch.org> <516AE7A0.4070504@schinagl.nl> <516BD5E0.4040007@bucksch.org> <516FF25B.4000907@bucksch.org> <516FFC13.2030803@ultratux.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <516FFC13.2030803@ultratux.net>
Sender: linux-raid-owner@vger.kernel.org
To: linux-raid@vger.kernel.org
Cc: Maarten
List-Id: linux-raid.ids

Maarten wrote, On 18.04.2013 15:58:
> On 18/04/13 15:17, Ben Bucksch wrote:
>> To re-summarize (for full info, see first post of thread):
>> * There are 2 RAID5 arrays in the machine, each have 8 disks.
>> * I upgraded Ubuntu 10.04 to 12.04.
>> * After reboot, both arrays had each ejected one disk.
>>   The ejected disks are working fine (at least now).
>> * During the resync mandated by above ejection,
>>   one other drive failed, this one fatally with a real hardware failure.
>> * The second array resynced fine, further proving that the
>>   disks ejected during upgrade were working.
>> * Now I am left with: originally 8-disk RAID5, 6 disks are healthy,
>>   1 disk with hardware failure, and 1 disk that was ejected, but is
>>   working.
>> * The latter is currently marked "spare" by md and has an event count
>>   (only) 2 events lower than the other 6 disks.
>> * My task is to get the latter disk back online *with* its data, without
>>   resync.
>>
>> I desperately need help, please.
>>
>> Based on suggestions here by Oliver and on forums, I did (and the result
>> is):
>>
>>> # mdadm --stop /dev/md0
>>> mdadm: stopped /dev/md0
>>> # mdadm --assemble --run --force /dev/md0 /dev/sd[jlmnopq]
>>> mdadm: failed to RUN_ARRAY /dev/md0:
>>> mdadm: Not enough devices to start the array.
> At this point, does dmesg show anything pointing to that input/output
> error ? The procedure is correct

[630786.513314] md: md0 stopped.
[630786.513341] md: unbind
[630786.590662] md: export_rdev(sdl)
[630786.590744] md: unbind
[630786.670652] md: export_rdev(sdj)
[630786.670887] md: unbind
[630786.750650] md: export_rdev(sdq)
[630786.750707] md: unbind
[630786.830649] md: export_rdev(sdn)
[630786.830712] md: unbind
[630786.910651] md: export_rdev(sdp)
[630786.910710] md: unbind
[630786.990649] md: export_rdev(sdo)
[630786.990700] md: unbind
[630787.070649] md: export_rdev(sdm)
[630793.315121] md: md0 stopped.
[630794.785328] md: bind
[630794.785512] md: bind
[630794.785695] md: bind
[630794.785891] md: bind
[630794.786643] md: bind
[630794.787009] md: bind
[630794.788164] md: bind
[630794.788236] md: kicking non-fresh sdl from array!
[630794.788250] md: unbind
[630794.810082] md: export_rdev(sdl)
[630794.812725] raid5: device sdj operational as raid disk 0
[630794.812734] raid5: device sdq operational as raid disk 7
[630794.812740] raid5: device sdn operational as raid disk 6
[630794.812745] raid5: device sdp operational as raid disk 5
[630794.812750] raid5: device sdo operational as raid disk 4
[630794.812755] raid5: device sdm operational as raid disk 3
[630794.813895] raid5: allocated 8490kB for md0
[630794.813966] 0: w=1 pa=0 pr=8 m=1 a=2 r=8 op1=0 op2=0
[630794.813974] 7: w=2 pa=0 pr=8 m=1 a=2 r=8 op1=0 op2=0
[630794.813980] 6: w=3 pa=0 pr=8 m=1 a=2 r=8 op1=0 op2=0
[630794.813986] 5: w=4 pa=0 pr=8 m=1 a=2 r=8 op1=0 op2=0
[630794.813993] 4: w=5 pa=0 pr=8 m=1 a=2 r=8 op1=0 op2=0
[630794.813999] 3: w=6 pa=0 pr=8 m=1 a=2 r=8 op1=0 op2=0
[630794.814005] raid5: not enough operational devices for md0 (2/8 failed)
[630794.820671] RAID5 conf printout:
[630794.820675] --- rd:8 wd:6
[630794.820680] disk 0, o:1, dev:sdj
[630794.820685] disk 3, o:1, dev:sdm
[630794.820689] disk 4, o:1, dev:sdo
[630794.820693] disk 5, o:1, dev:sdp
[630794.820697] disk 6, o:1, dev:sdn
[630794.820701] disk 7, o:1, dev:sdq
[630794.820945] raid5: failed to run raid set md0
[630794.826530] md: pers->run() failed ...
[630794.834455] md: export_rdev(sdl)
[630794.834463] md: export_rdev(sdl)

The problem is:
md: kicking non-fresh sdl from array!
thus:
raid5: not enough operational devices for md0 (2/8 failed)

# mdadm -E /dev/sdl
Checksum : ca6e81a9 - correct
Events : 13274863
# mdadm -E /dev/sdn
Checksum : c9a41046 - correct
Events : 13274865

So, the question is: How do I convince md not to be so anal retentive
and to stop preventing me from accessing any of my data? The drive
***is fine***, has practically all the data (I don't care about these
2 events), just use it already.

Nobody seems to know the magic shell commands to do that. The lack of a
proper shell command for that effectively constitutes a data-loss bug.
I've been patient, but I'm getting more and more upset at md.

Thanks, Maarten, for your help. I hope 1) you or anybody else can help
me, and 2) these kinds of problems will be fixed once and for all by
the devs.

> Good luck!

Thanks.

Ben
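
For reference, the two-event gap between sdl and the other disks can be checked mechanically from the `mdadm -E` excerpts quoted in this message. What follows is a minimal Python sketch, not an mdadm feature: it parses the `Events` line of each excerpt and reports how far each device lags behind the freshest one. The helper names (`event_count`, `event_gaps`) and the hard-coded sample text are illustrative assumptions, and the parsing assumes the plain integer `Events : N` format shown here. It only reads text and does not touch the array.

```python
import re

# Excerpts of `mdadm -E` output, copied from this message.
EXAMINE_OUTPUT = {
    "/dev/sdl": "Checksum : ca6e81a9 - correct\nEvents : 13274863\n",
    "/dev/sdn": "Checksum : c9a41046 - correct\nEvents : 13274865\n",
}


def event_count(examine_text):
    """Return the integer that follows 'Events :' in the given text."""
    match = re.search(r"^\s*Events\s*:\s*(\d+)", examine_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Events line found")
    return int(match.group(1))


def event_gaps(outputs):
    """Map each device to the number of events it lags behind the freshest."""
    counts = {dev: event_count(text) for dev, text in outputs.items()}
    newest = max(counts.values())
    return {dev: newest - count for dev, count in counts.items()}


gaps = event_gaps(EXAMINE_OUTPUT)
for dev, gap in sorted(gaps.items()):
    print(f"{dev}: {gap} events behind")
```

With the two excerpts above, sdl comes out 2 events behind sdn, which matches the "non-fresh" complaint in the kernel log.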