From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Max T. Woodbury" Subject: Re: ide-io.c, ide_do_request -- race condition? Date: Sat, 10 Jul 2004 15:25:05 -0400 Sender: linux-ide-owner@vger.kernel.org Message-ID: <40F04291.434D8AF9@verizon.net> References: <40E99531.BDB8D339@verizon.net> <200407072143.07618.bzolnier@elka.pw.edu.pl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Return-path: Received: from out003pub.verizon.net ([206.46.170.103]:27831 "EHLO out003.verizon.net") by vger.kernel.org with ESMTP id S266352AbUGJTYk (ORCPT ); Sat, 10 Jul 2004 15:24:40 -0400 List-Id: linux-ide@vger.kernel.org To: Bartlomiej Zolnierkiewicz , linux-ide@vger.kernel.org Bartlomiej Zolnierkiewicz wrote: > > Hi, > > On Tuesday 06 of July 2004 00:51, Max T. Woodbury wrote: > > > (The fact that the machine runs other OSs without noticeable > > problems is also an indication that the underlying hardware > > is in working order. Only the system software and disk > > drive changed between the two setups and I have explained > > why I do not think it is the disk drive.) > > disk drive changed? please explain The drives in the Thinkpad 760 are mounted in caddies that can be easily exchanged when the power is off. I have three drives. One runs the machine as a GPS, the second as a code development Windows box and the third is my Linux code development machine. I'm having a fair amount of trouble getting the Linux setup to do what I want it to do. Not only did the Linux install flake out, but I still can't get the PCMCIA sockets working, but that's another issue for another list and I haven't quite got enough information on that set of problems to make a request for help useful... In order to get to the internet with Linux I have to use its docking station. No such problem with Windoze. (Yeah, absolutely disgusting but that's what's happening.) > > I concluded that there was a race condition someplace in > > the disk drivers sequence and went a hunting through the > > driver code starting from the IDE end. > > > > In ide-io.c there is a block comment just before the place > > next time please just paste the code/comments you refer to, > it will make reading/replying a lot easier It's a bit difficult but willco... > > where the i/o request setup routine is called. It notes > > that some older chipsets do not like to be interrupted > > during the setup process. It also notes that 'massive > > fs corruption" will result if the setup process is > > interrupted. (Hmmmm.) A search of the kernel mailing list > > archives found one note on this piece of code where commands > > were getting lost (rarely) in an SMP environment. I thought > > do you have a link to this note handy? not handy but... linux-kernel 2003-03-27 23:54:19 Hmmm. I don't remember what my original search criteria was, but this search brought up a Lot more stuff... mostly irrelevant... > if I guess right, you are referring to this part of ide-io.c (2.6.7 here): > > /* > * Some systems have trouble with IDE IRQs arriving while > * the driver is still setting things up. So, here we disable > * the IRQ used by this interface while the request is being started. > * This may look bad at first, but pretty much the same thing > * happens anyway when any interrupt comes in, IDE or otherwise > * -- the kernel masks the IRQ while it is being handled. > */ > if (hwif->irq != masked_irq) > disable_irq_nosync(hwif->irq); > > mask IRQ used by the IDE port > > spin_unlock(&ide_lock); > local_irq_enable(); > > enable local IRQs, IDE IRQ stays masked > > /* allow other IRQs while we start this request */ > startstop = start_request(drive, rq); > > start request > > spin_lock_irq(&ide_lock); > > disable local IRQs > > if (hwif->irq != masked_irq) > enable_irq(hwif->irq); > > unmask IDE IRQ Yep. That's the chunk... > > > This is 2.4.22 code, but it has not been changed for 2.4.26. > > There is some significant changes with 2.6.7, but it is worse > > if anything. A little more explanation is probably in order. > > I don't think it's worse, more below. > > > At the top of the routine all interrupts are turned off and > > If you are referring to a __cli() at a top of ide_do_request() > please notice that comment says "paranoia" - IRQs should be already > disabled by block layer which does spin_lock_irqsave() earlier. Yep, but enables and disables should be paired and that is the one that paired with the enable just before the call to start_request. > > > the hardware group busy bit is set. This bit is protected by > > a spinlock, so there should be no need to lock out interrupts > > while manipulating it. There are a few other conditions > > checked next, none requiring interrupt lockout. So a whole > > Please note that the same spinlock can be accessed from IRQ > and non-IRQ context so IRQs must be disabled in non-IRQ context > to prevent deadlocks. Yep. There were LOTS of problems in VMS drivers with exactly that problem. I had to explain it to customers a few times. The concept of spinlock priority and its use to prevent deadlocks and the relation between priority and interrupt lockout level was more complex in VMS than in Linux, since the resource allocation model was more complex. Your design is less complex, thus less prone to problems. Still, the Intel interrupt model reminds me of the one on the cheaper PDP-11s, but that's getting way off topic. At any rate, the concept is about 20 years old and I have a reasonable idea of what you are saying and why it is important. (Spin locks themselves go back at least 30 years.) > __cli() there is just "paranoia" and it is gone in 2.6 kernels That's not quite correct. There is a check and a BUG() call to assure that interrupts are disabled on entry in the 2.6 code I've seen. If I understand the new code correctly, you've replaced the single interrupt disable call at the top of this routine by a bunch of similar calls elsewhere before entering this routine. That would make interrupt latency worse, not better. > > bunch of code has been executing under interrupt lockout when > > there was no need for the lockout. Not a huge problem, just > > strange. Also, in 2.6, the lockout has to begin before the > > routine is called which is why I said 2.6 was worse. > > 2.6 is much better - you have one spinlock per block queue while > in 2.4 you have one global spinlock (io_request_lock) for all > block requests. Yep. That's a little courser than the model I was using on the never completed OS design I did in the early 70s, but it is better than the single global lock in 2.4 and way better than the design of many other OSs I've waded into. Still, you've got a complete interrupt lockout in place at the top of this routine which has two bad effects: 1) the interrupt latency is longer and 2) there is no one place to turn it off any longer. > > Then comes the block comment, the all-CPU lockout of completion > > interrupts, and just when the comment suggests that all > > interrupts should be turned off, they are turned back on. > > > > I can understand the problem the comment might be addressing. > > The interface could well be sensitive to the timing of the > > over-all load sequence or a read from the status register > > checking for 'ready' could bollix the whole setup process. > > This final setup phase really should be an atomic operation. > > > > I've done a little 'playing' with this code. First I tried > > just removing the enable call. It seems to have had absolutely > > no effect. The system did NOT hang because the interrupt lock > > out did not end. The file system corruption showed up again > > on applying a subsequent update. I've also made a slightly > > more venturesome change. I pulled the disable at the head of > > the routine and put it just before the setup call and moved > > the enable to after the setup call. I've seen no problems > > please just inline your changes / patches (please use 'diff -u') If you don't mind, I'd prefer to wait on that. I asked for comments because the results of the modification were not exactly as I expected and further testing seems to indicate that this may not be the real source of the problem. Since my understanding is incomplete, any patch I post is likely to be wrong in one way or another. > > process for an IDE controller could be fairly lengthy and > > that interrupt latency could become a problem, and enabling > > local interrupts would relieve the problem, but not at the > > cost of corrupting system integrity? > > yes, if this is a issue here but we don't know yet > > > C) What is the procedure for getting it included? > > sending patch ('diff -u' format) with a description to Maintainer > (happens to be me in case of IDE ;-) and linux-ide mailing list, > also linux-kernel mailing list if the patch is important / needs > more testers etc.) Thanks. I was hoping to get your attention, but I did not want to presume on your time, thus the post to linux-ide. (If you don't mind, linux-kernel is way too noisy. I subscribed once a good while ago and turned it off because I could not handle the volume of just plain junk that gets posted to that list. Linus must be some kind of saint if he wades through all of it...) > You may also want to read > Documentation/SubmittingPatches Done that even before I posted this. Since this problem is not even close to the reliable patch stage, and is down=level on even the current 2.4 series, much less the 2.6 series, posting a patch did not seem to be appropriate. > and > Documentation/CodingStyle > from the linux kernel source package. Which is honored more in the breach than the observance... (OK. I exaggerate.) > > > I've been going through the linux-ide archives and noticed > > that there have been a number of mystery fs corruption issues > > that just disappeared. This might be related. There was also > > a DMA problem that might have been relevant, but I know it does > > not apply in this case since "hdparm" shows DMA turned off by > > default on this machine. > > dmesg output would be helpful, the same goes for lspci output That is an important part of this issue. Nothing shows in dmesg until it is much too late. The read errors get reported, but no write errors. There should be a 'pirntk' in 'ide_abort' and 'idedisk_abort' (I may have the routine names wrong, I'm doing this from memory) but there isn't, (I'll post a patch for that fix if you want.) so I can't tell if the problem is coming down from the upper layers. I also think there should be a 'printk' associated with the posting of the immediate stop command. (Again, this is from memory. I'll post a patch with all this if you want me to. It will not fix any problems, but might shed light.) > You may also consider opening bug at http://bugme.osdl.org > and attaching all useful files to a bug entry (dmesg, lspci > and hdparm outputs, kernel config etc.). If it was a problem with relevant dmesg or hdparm results, I would have done exactly that. The problem is fairly reproducible with my hardware, in the sense that it will definitely happen within a two hour window while doing an installation from floppy and CD on a stand-alone machine, but there is no command that I can type that will always produce the problem within a reasonable amount of time after that command is issued. I've been trying to produce an installation floppy where I actually have the .config file that goes with the kernel that is blowing up, but dissecting the RedHat installation image is not exactly easy, and the hardware is probably listed as 'unsupported' by them so they will just duck the issue. (Since the machine can NOT boot from a CD, I can not install FC2 which does 2.6 and it does not have enough memory to install E3. My understanding of the kernel is probably better than my knowledge of all the command line tools and procedures.) Still, this is an important problem. File system corruption is just not something an OS should allow to happen unless the user does something extreme. > > Regards, > Bartlomiej Max