From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Max T. Woodbury"
Subject: Re: ide-io.c, ide_do_request -- race condition?
Date: Sun, 11 Jul 2004 10:38:16 -0400
Sender: linux-ide-owner@vger.kernel.org
Message-ID: <40F150D8.B7FB27D3@verizon.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Return-path:
Received: from out012pub.verizon.net ([206.46.170.137]:27377 "EHLO
    out012.verizon.net") by vger.kernel.org with ESMTP
    id S266607AbUGKOiP (ORCPT ); Sun, 11 Jul 2004 10:38:15 -0400
Received: from M2Dual.localdomain ([4.15.37.40]) by out012.verizon.net
    (InterMail vM.5.01.06.06 201-253-122-130-106-20030910) with ESMTP
    id <20040711143814.ZPYF2198.out012.verizon.net@M2Dual.localdomain>
    for ; Sun, 11 Jul 2004 09:38:14 -0500
Received: from verizon.net ([10.0.0.10]) by M2Dual.localdomain
    (8.12.8/8.12.8) with ESMTP id i6BEcClo013904
    for ; Sun, 11 Jul 2004 10:38:13 -0400
List-Id: linux-ide@vger.kernel.org
To: linux-ide@vger.kernel.org

Bartlomiej Zolnierkiewicz wrote:
> On Saturday 10 of July 2004 21:25, Max T. Woodbury wrote:
> > Bartlomiej Zolnierkiewicz wrote:
> > > On Tuesday 06 of July 2004 00:51, Max T. Woodbury wrote:
> > > > (The fact that the machine runs other OSs without noticeable
> > > > problems is also an indication that the underlying hardware
> > > > is in working order. Only the system software and disk
> > > > drive changed between the two setups and I have explained
> > > > why I do not think it is the disk drive.)
> > >
> > > disk drive changed? please explain
> >
> > The drives in the Thinkpad 760 are mounted in caddies that can
> > be easily exchanged when the power is off. I have three drives.
> > One runs the machine as a GPS, the second as a code development
> > Windows box and the third is my Linux code development machine.
> > I'm having a fair amount of trouble getting the Linux setup to do
> > what I want it to do.
> > Not only did the Linux install flake out,
> > but I still can't get the PCMCIA sockets working, but that's another
> > issue for another list and I haven't quite got enough information
> > on that set of problems to make a request for help useful... In
> > order to get to the internet with Linux I have to use its docking
> > station. No such problem with Windoze. (Yeah, absolutely
> > disgusting but that's what's happening.)
>
> Are you sure that 'Linux' disk is okay?
> http://smartmontool.sf.net

Yes. The CDs all passed the scanner RedHat provided and they installed
correctly on a desktop system. The drive I used for linux was zeroed
and it passed a read/write badblock scan on the whole disk and several
on the partition in question. Still, it could be that the drive is
getting the data into its buffer correctly but not out to the magnetic
media, but the fact that the errors move around so much indicates that
it is not the media itself that is bad.

My next comment is on the fringes of my competence and is mostly
opinion: If the drive is failing to actually transfer the data from
its buffer to the media, there should be an error indication of some
sort and it should be passed back to the OS where it can be logged.
I didn't see any such code, but I'm not sure I'd recognize it if I
saw it.

I do wish there were a write check option. It does nasty things to
performance, but improves system integrity. Of course I haven't
checked the upper layers in the system yet. There might be such an
option at the block cache or file system levels and I'd not have seen
it yet. It's been about four years since I went through that code
last so I'm foggy on the details and it has almost certainly changed
since I looked at it.

> > > __cli() there is just "paranoia" and it is gone in 2.6 kernels
> >
> > That's not quite correct. There is a check and a BUG() call to assure
> > that interrupts are disabled on entry in the 2.6 code I've seen.
> > If I
> > understand the new code correctly, you've replaced the single interrupt
> > disable call at the top of this routine by a bunch of similar calls
> > elsewhere before entering this routine. That would make interrupt latency
> > worse, not better.
>
> This is not correct - __cli() is really just a "paranoia", you may remove
> it if you like and it shouldn't change anything (but we would like to know
> if it changes something ie. fixes fs corruption :-).

Hmm. At the place it was, it was harmless paranoia, but I thought
you might not have been being paranoid enough. You should not have
taken the lockout off until after the command was actually started.
But doing that was not enough to fix the problem...

I did a quick scan of the IRQ level code in the arch/i386 tree to get
an idea of what it did and recognized the pattern from other OSs I've
seen, so I did not look at all the details closely. I think I saw
CLI and STI pairs that might have nulled the CLI in ide-io.c in there,
but I did not try to figure exactly which conditionals were on or off
so I was not certain of the exact set of code that was operational.
As I said, I was scanning for an overall pattern.

> Please take a look at generic_unplug_device() in drivers/block/ll_rw_blk.c:

If I remember correctly, that's the block cache layer. The last time
I looked at it, it didn't do hot-plug, so I need to look at it again...
I'll do as you suggest.

> > > > bunch of code has been executing under interrupt lockout when
> > > > there was no need for the lockout. Not a huge problem, just
> > > > strange. Also, in 2.6, the lockout has to begin before the
> > > > routine is called which is why I said 2.6 was worse.
> > >
> > > 2.6 is much better - you have one spinlock per block queue while
> > > in 2.4 you have one global spinlock (io_request_lock) for all
> > > block requests.
> >
> > Yep.
> > That's a little coarser than the model I was using on the never
> > completed OS design I did in the early 70s, but it is better than the
> > single global lock in 2.4 and way better than the design of many other
> > OSs I've waded into. Still, you've got a complete interrupt lockout
> > in place at the top of this routine which has two bad effects: 1) the
> > interrupt latency is longer and 2) there is no one place to turn it off
> > any longer.
>
> 'lockout' happens earlier both in 2.4 and 2.6 -> generic_unplug_device().

Hmm. I'll have to look at that layer again. I think we're talking
about a different level of lockout. I suspect that you mean that this
is a critical region of code and only one thread of execution can be
in it at one time. I was thinking 'atomic' in the sense that it could
not be interrupted and that its timing was consistent. I'm not sure
I am using the term in the same way everyone else does.

> > > > I've been going through the linux-ide archives and noticed
> > > > that there have been a number of mystery fs corruption issues
> > > > that just disappeared. This might be related. There was also
> > > > a DMA problem that might have been relevant, but I know it does
> > > > not apply in this case since "hdparm" shows DMA turned off by
> > > > default on this machine.
> > >
> > > dmesg output would be helpful, the same goes for lspci output
> >
> > That is an important part of this issue. Nothing shows in dmesg
> > until it is much too late. The read errors get reported, but no
> > write errors. There should be a 'printk' in 'ide_abort' and
> > 'idedisk_abort' (I may have the routine names wrong, I'm doing
> > this from memory) but there isn't, (I'll post a patch for that fix
> > if you want.) so I can't tell if the problem is coming down from
> > the upper layers. I also think there should be a 'printk' associated
> > with the posting of the immediate stop command. (Again, this is from
> > memory.
> > I'll post a patch with all this if you want me to. It will
> > not fix any problems, but might shed light.)
>
> I believe that dmesg/lspci would be useful for me or other people reading
> this because it allows us to know a bit more about this specific hardware
> ('Thinkpad 760' is really not enough).

Yep. lspci will be OS version independent. I'll get you that shortly.
(I booted the original FC1 distribution in an attempt to get the PCMCIA
stuff to work and it promptly (well after 12 hours) messed up the file
system.) I'm in the process of cleaning up the disk again.

'dmesg' is somewhat dependent on the OS features configured. I'll get
you the one from the installation boot disk where the problem was first
noticed. I take it you want the initial boot part, not the part after
rc.sysinit or whatever is driving the installation process takes over.
It also makes a difference if the machine is docked or undocked. I'll
get you one of the undocked machine. Both configurations have the
problem but the undocked configuration is simpler.

> > Still, this is an important problem. File system corruption is just not
> > something an OS should allow to happen unless the user does something
> > extreme.
>
> Without more info we won't go further in solving this issue.

I really don't expect you to solve the problem for me. I have to do
most of the work myself. I'd recognized that I was on the borders of
my competence and have asked for advice. I recognized that I was
making assumptions about the code that I needed to check. While you
haven't exactly answered the questions I asked, you have responded in
a way that provides most of the information I need and pointed me to
more places to look for answers.

There was also the possibility that this problem has an impact beyond
my personal involvement. I'm not certain that it does, but it would
be irresponsible of me to ignore such a possibility. (And I'm just
arrogant enough that I think I have such a responsibility...)

Thank you for your help so far.

Max
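
The "write check option" wished for earlier in this message can be
approximated from user space. What follows is only a minimal sketch in
C under stated assumptions, not code from the IDE driver or any kernel
layer; the function name write_verified is hypothetical. It writes a
buffer, calls fsync(), reads the file back and compares. One limitation
matters here: the read-back is normally served from the page cache, so
a match does not prove the data reached the magnetic media. A stricter
check would have to bypass the cache (for example with O_DIRECT) or
rely on a verify operation in the drive itself.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes of buf to path, flush them,
 * then read the file back and compare with the original buffer.
 * Returns 0 when the bytes read back match, -1 on error or mismatch.
 */
int write_verified(const char *path, const void *buf, size_t len)
{
    const char *src = buf;
    char *check = NULL;
    size_t done = 0;
    ssize_t n;
    int rc = -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is written. */
    while (done < len) {
        n = write(fd, src + done, len - done);
        if (n <= 0)
            goto out;
        done += (size_t)n;
    }

    /* Ask the kernel to push dirty data toward the device. */
    if (fsync(fd) != 0)
        goto out;

    if (lseek(fd, 0, SEEK_SET) < 0)
        goto out;

    check = malloc(len ? len : 1);
    if (check == NULL)
        goto out;

    /* Read the data back; a short file counts as a failure. */
    done = 0;
    while (done < len) {
        n = read(fd, check + done, len - done);
        if (n <= 0)
            goto out;
        done += (size_t)n;
    }

    if (memcmp(src, check, len) == 0)
        rc = 0;

out:
    free(check);
    close(fd);
    return rc;
}
```

A caller would treat a -1 return as a reason to log the failure and
retry or abort, which is the kind of error report back to the upper
layers that the message above asks for. The performance cost noted
there is visible here too: every write is followed by a flush and a
full read.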