* Panic when finishing raidreconf on 2.4.0-test4 with preempt
@ 2003-09-06 21:32 Chris Meadors
  2003-09-09 18:11 ` Jakob Oestergaard
  0 siblings, 1 reply; 5+ messages in thread

From: Chris Meadors @ 2003-09-06 21:32 UTC (permalink / raw)
To: linux-kernel

I've done this twice now; I'd prefer not to do it again, but I can upon
request, if you really need the oops output.

Running raidreconf to expand a 4-disk array to 5 seems to work correctly
until the very end. I'm guessing it is as the RAID superblock is being
written: a preempt error is triggered and the kernel panics. Upon reboot
the MD driver doesn't think the 5th disk is valid for consideration in
the array and skips over it, leaving a very corrupted file system.
* Re: Panic when finishing raidreconf on 2.4.0-test4 with preempt
  2003-09-06 21:32 Panic when finishing raidreconf on 2.4.0-test4 with preempt Chris Meadors
@ 2003-09-09 18:11 ` Jakob Oestergaard
  2003-09-09 19:21   ` Chris Meadors
  0 siblings, 1 reply; 5+ messages in thread

From: Jakob Oestergaard @ 2003-09-09 18:11 UTC (permalink / raw)
To: Chris Meadors; +Cc: linux-kernel

On Sat, Sep 06, 2003 at 05:32:30PM -0400, Chris Meadors wrote:
> I've done this twice now, I'd prefer not to do it again, but can upon
> request, if you really need the oops output.
>
> Running raidreconf to expand a 4 disk array to 5, seems to work
> correctly until the very end. I'm guessing it is as the RAID super
> block is being written. A preempt error is triggered and the kernel
> panics. Upon reboot the MD driver doesn't think the 5th disk is valid
> for consideration in the array and skips over it. Leaving a very
> corrupted file system.

raidreconf does no "funny business" with the kernel, so I think this
points to either:

*) a bug which mkraid can trigger as well
*) an API change combined with missing error handling, which raidreconf
   now triggers (by calling the old API)
*) a more general kernel bug - there is a *massive* VM load when
   raidreconf does its magic; perhaps calling mkraid after beating the
   VM half way to death can trigger the same error?

raidreconf, upon complete reconfiguration, will set up the new
superblock for your array, mark it as "unclean", and add the disks one
by one. Once all disks are added, the kernel should start calculating
parity information (because raidreconf does not do this during the
conversion, and hence marks the newly set up array as unclean in order
to have the kernel do this dirty work).

There should be nothing special about this, compared to normal mkraid
and raidhotadd usage - except raidreconf is probably a lot more likely
to trigger races.
Ah, fingerpointing ;)

(/me sits back, confident that his code is perfect and the kernel alone
is to blame)

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:
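[The flow described above - write a new superblock marked "unclean", add the disks one by one, and let the kernel's resync thread compute the parity - can be sketched as a toy model. The data structures and field names here are illustrative only, not the real md superblock format.]

```python
# Toy model of what raidreconf relies on: leave the new array "unclean"
# so the kernel-side resync recomputes every stripe's parity.
# The dict layout is illustrative, not the on-disk md superblock.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks, as the md resync does."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

def resync(array):
    """Recompute the parity block of every stripe, then mark the array clean."""
    for stripe in array["stripes"]:
        stripe["parity"] = xor_blocks(stripe["data"])
    array["state"] = "clean"

# One stripe of a freshly reconfigured array: 4 data blocks plus a
# parity block whose contents are stale and must be recomputed.
array = {
    "state": "unclean",  # raidreconf leaves the array in this state
    "stripes": [{"data": [bytes([d]) * 4 for d in range(4)],
                 "parity": b"\xff" * 4}],  # stale parity
}
resync(array)
assert array["state"] == "clean"
```

[If the machine panics between the superblock write and the end of the resync, the stripes keep their stale parity - which matches the failure mode reported in this thread.]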
* Re: Panic when finishing raidreconf on 2.4.0-test4 with preempt
  2003-09-09 18:11 ` Jakob Oestergaard
@ 2003-09-09 19:21   ` Chris Meadors
  2003-09-09 20:42     ` Jakob Oestergaard
  0 siblings, 1 reply; 5+ messages in thread

From: Chris Meadors @ 2003-09-09 19:21 UTC (permalink / raw)
To: Jakob Oestergaard; +Cc: linux-kernel

On Tue, 2003-09-09 at 14:11, Jakob Oestergaard wrote:
> raidreconf does no "funny business" with the kernel, so I think this
> points to either:
> *) a bug which mkraid can trigger as well
> *) an API change combined with missing error handling, which raidreconf
>    now triggers (by calling the old API)
> *) a more general kernel bug - there is a *massive* VM load when
>    raidreconf does its magic, perhaps calling mkraid after beating
>    the VM half way to death can trigger the same error?
>
> raidreconf, upon complete reconfiguration, will set up the new
> superblock for your array, mark it as "unclean", and add the disks one
> by one. Once all disks are added, the kernel should start calculating
> parity information (because raidreconf does not do this during the
> conversion, and hence marks the newly set up array as unclean in order
> to have the kernel do this dirty work).
>
> There should be nothing special about this, compared to normal mkraid
> and raidhotadd usage - except raidreconf is probably a lot more likely
> to trigger races.
>
> Ah, fingerpointing ;)
>
> (/me sits back, confident that his code is perfect and the kernel alone
> is to blame)

I'll mess around this evening a bit if I get a chance. I really wasn't
in the mood to see this error again (losing five years' worth of data
can do that to a person, but I've come to terms (with my own arrogance
and stupidity, along with the data loss (just old e-mails and pictures,
but stuff that is nice to hold onto anyway)) and pre-ordered Plextor's
new DVD burner). But that does leave me with a few blank drives that I
can beat on as much as anyone needs.

I'll be putting -test5 on first.
I had planned on disabling preempt, but since that was reported in the
oops, I'll leave it on. I initially ran my mkraid under 2.4.20, but I'll
see how it does with 2.6.0-test5. I'll mkraid a 4-drive RAID5 setup and
see if it completes, then raidreconf it to 5 drives. I'll scribble down
the oops this time too, if I see it again.

Anything else anyone wants me to try? Or other data to fill in the
blanks?

-- 
Chris
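[The 4-to-5-drive expansion being tested amounts to re-striping every logical block onto the wider geometry. A minimal sketch of that remapping - using a simplified rotating-parity layout, not raidreconf's actual algorithm or md's real parity rotation:]

```python
# Simplified model of growing a RAID5 array: logical data blocks are
# re-striped onto more disks, with one rotating parity slot per stripe.
# The layout here is a plain rotation, chosen for clarity only.

def stripe_map(blocks, ndisks):
    """Lay out logical blocks on ndisks; each stripe holds one parity slot 'P'."""
    dps = ndisks - 1                          # data blocks per stripe
    stripes = []
    for i in range(0, len(blocks), dps):
        row = list(blocks[i:i + dps])
        row += [None] * (dps - len(row))      # pad a short final stripe
        parity_slot = len(stripes) % ndisks   # rotate the parity position
        row.insert(parity_slot, "P")
        stripes.append(row)
    return stripes

def read_back(stripes):
    """Recover the logical block order, skipping parity and padding."""
    return [b for row in stripes for b in row if b not in ("P", None)]

blocks = [f"blk{i}" for i in range(12)]
old = stripe_map(blocks, 4)                # 4-disk array: 3 data blocks/stripe
new = stripe_map(read_back(old), 5)        # reshape: re-stripe onto 5 disks
assert read_back(new) == blocks
```

[Every block moves to a new physical position during the reshape, which is why an interrupted run - or a re-run over a half-converted array - leaves no consistent layout to read back.]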
* Re: Panic when finishing raidreconf on 2.4.0-test4 with preempt
  2003-09-09 19:21 ` Chris Meadors
@ 2003-09-09 20:42   ` Jakob Oestergaard
  2003-09-12  7:16     ` Panic when finishing raidreconf on 2.6.0-test4 with preempt Chris Meadors
  0 siblings, 1 reply; 5+ messages in thread

From: Jakob Oestergaard @ 2003-09-09 20:42 UTC (permalink / raw)
To: Chris Meadors; +Cc: linux-kernel

On Tue, Sep 09, 2003 at 03:21:31PM -0400, Chris Meadors wrote:
...
> I'll mess around this evening a bit if I get a chance. I really wasn't
> in the mood to see this error again (losing five years' worth of data
> can do that to a person, but I've come to terms (with my own arrogance
> and stupidity, along with the data loss (just old e-mails and pictures,
> but stuff that is nice to hold onto anyway)) and pre-ordered Plextor's
> new DVD burner). But that does leave me with a few blank drives that I
> can beat on as much as anyone needs.

Eh, ok, I'm not really sure what you did...

You ran raidreconf once, and after the entire reconfiguration had run,
the kernel barfed.

Then what? You re-ran the reconfiguration? Same as before?

If so, then I can pretty much guarantee you that your data are lost. You
may get Ibas (ibas.no) to scrape off the upper layers of your disk
platters, run some pattern analysis on what's left, and possibly then
retrieve some of your old data, but that's about the only chance I can
see you having.

If you only ran raidreconf once, then there might still be a good chance
to get your data back. To me it doesn't sound like this is the case, but
if it is, please let me know.

Sorry about your loss (but running an experimental raid reconfiguration
tool on an experimental kernel without backups, well... ;)

> I'll be putting -test5 on first. I had planned on disabling the
> preempt, but since that was reported in the oops, I'll leave it on.

Ok. It would be interesting to see if the oops goes away when preempt is
disabled.

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:
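[Why a single interrupted pass can still leave data recoverable, as suggested above: with RAID5 parity intact, any one missing block of a stripe equals the XOR of the remaining blocks plus parity. A toy illustration of that property - not a recovery procedure:]

```python
# RAID5 recovery property: parity is the XOR of the data blocks, so any
# single lost block can be rebuilt from the survivors plus parity.
# A second destructive pass overwrites the survivors, and this no longer holds.

def xor_all(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, v in enumerate(b):
            out[i] ^= v
    return bytes(out)

data = [b"ABCD", b"EFGH", b"IJKL", b"MNOP"]   # 4 data blocks of one stripe
parity = xor_all(data)

# Disk 2 is lost: rebuild its block from the surviving blocks plus parity.
survivors = data[:2] + data[3:]
rebuilt = xor_all(survivors + [parity])
assert rebuilt == data[2]
```

[XORing everything, including parity, cancels each surviving block out and leaves exactly the missing one.]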
* Re: Panic when finishing raidreconf on 2.6.0-test4 with preempt
  2003-09-09 20:42 ` Jakob Oestergaard
@ 2003-09-12  7:16   ` Chris Meadors
  0 siblings, 0 replies; 5+ messages in thread

From: Chris Meadors @ 2003-09-12 7:16 UTC (permalink / raw)
To: linux-kernel

[Kernel version corrected in the subject line.]
[Plus forgot to include l-k.]

On Tue, 2003-09-09 at 16:42, Jakob Oestergaard wrote:
> On Tue, Sep 09, 2003 at 03:21:31PM -0400, Chris Meadors wrote:
>
> Eh, ok, I'm not really sure what you did...
>
> You ran raidreconf once, and after the entire reconfiguration had run,
> the kernel barfed.

That's what I figured...

> Then what? You re-ran the reconfiguration? Same as before?

...after I ran it the second time. The problem was, it takes a while for
the reconf to run, so I went to watch a movie or something. I got back
and my screen was blanked; key presses wouldn't clear it, and even
Alt+SysRq wouldn't respond. So I hit the reset button. That is when I
saw that the kernel wouldn't recognize the drive that should have been
part of the array as being part of the array. I figured the kernel had
panicked, and went to reproduce it. As the reconf was running the second
time, I started thinking to myself that maybe it wasn't a good idea -
that I could probably have recovered data if I had run fsck on the
initial result.

> If so, then I can pretty much guarantee you that your data are lost. You
> may get Ibas (ibas.no) to scrape off the upper layers of your disk
> platters, run some pattern analysis on what's left, and possibly then
> retrieve some of your old data, but that's about the only chance I can
> see you having.
>
> If you only ran raidreconf once, then there might still be a good chance
> to get your data back. To me it doesn't sound like this is the case,
> but if it is, please let me know.

Nope, as I said, I ran it twice. Since my machine was hung solid and the
screen was blank, I didn't know exactly what had happened, and then my
fingers worked faster than my brain.
> Sorry about your loss (but running an experimental raid reconfiguration
> tool on an experimental kernel without backups, well... ;)

Exactly; the raidreconf HOWTO also plainly says, "unless you don't
consider the data important, you should make a backup of your current
RAID array now." I don't know if it is worth adding to the
documentation: do not rerun raidreconf if it fails for whatever reason,
until the array has been recovered to a fully consistent state.

> Ok. It would be interesting to see if the oops goes away when preempt is
> disabled.

Okay, it took some time, but here is what I've tested, all on
2.6.0-test5 now:

First, by mistake, raidreconf on a started array: it gets all the way
through, but when it discovers at the end that md0 already has disks, it
exits gracefully - no oops.

Second, raidreconf still triggers the oops in -test5 when expanding a
4-disk RAID5 to 5 disks.

Third, mkraid completes without any trouble when building a new 4-disk
array.

Last, even in a kernel built without preempt support (I don't know why I
thought that was the problem initially; I must have misread something),
raidreconf still oopses the machine when attempting to write the new
superblocks.

Here is some of the output from the oops, copied by hand, as it hangs
the machine solid and I don't have anything else to capture it with:

EIP is at blk_read_rq_sectors+0x50/0xd0
Process md0_raid5
Stack trace:
  __end_that_request_first+0x127/0x230
  scsi_end_request+0x3f/0xf0
  scsi_io_completion+0x1bb/0x470
  sym_xpt_done+0x3b/0x50
  sd_rw_intr+0x5a/0x1d0
  scsi_finish_command+0x76/0xc0
  run_timer_softirq+0x10a/0x1b0
  scsi_softirq+0x99/0xa0
  do_IRQ+0xfe/0x130
  common_interrupt+0x18/0x20
  xor_p5_mmx_5+0x6b/0x180
  xor_block+0x5b/0xc0
  compute_parity+0x15d/0x340
  default_wake_function+0x0/0x30
  handle_stripe+0x95f/0xcc0
  __wake_up_common+0x31/0x60
  raid5d+0x7d/0x140
  default_wake_function+0x0/0x30
  md_thread+0x0/0x190
  kernel_thread_helper+0x5/0x10

If you need anything else, I can reproduce this at will.
It just takes about 30 minutes to reconf to 5 9GB drives.

-- 
Chris