From: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
To: "Max T. Woodbury" <max.teneyck.woodbury@verizon.net>
Cc: linux-ide@vger.kernel.org
Subject: Re: ide-io.c, ide_do_request -- race condition?
Date: Sat, 10 Jul 2004 22:07:19 +0200
Message-ID: <200407102207.19713.bzolnier@elka.pw.edu.pl>
In-Reply-To: <40F04291.434D8AF9@verizon.net>
On Saturday 10 of July 2004 21:25, Max T. Woodbury wrote:
> Bartlomiej Zolnierkiewicz wrote:
> > Hi,
> >
> > On Tuesday 06 of July 2004 00:51, Max T. Woodbury wrote:
> > > (The fact that the machine runs other OSs without noticeable
> > > problems is also an indication that the underlying hardware
> > > is in working order. Only the system software and disk
> > > drive changed between the two setups and I have explained
> > > why I do not think it is the disk drive.)
> >
> > disk drive changed? please explain
>
> The drives in the Thinkpad 760 are mounted in caddies that can
> be easily exchanged when the power is off. I have three drives.
> One runs the machine as a GPS, the second as a code development
> Windows box and the third is my Linux code development machine.
> I'm having a fair amount of trouble getting the Linux setup to do
> what I want it to do. Not only did the Linux install flake out,
> but I still can't get the PCMCIA sockets working, but that's another
> issue for another list and I haven't quite got enough information
> on that set of problems to make a request for help useful... In
> order to get to the internet with Linux I have to use its docking
> station. No such problem with Windoze. (Yeah, absolutely
> disgusting but that's what's happening.)
Are you sure that the 'Linux' disk is okay?
http://smartmontool.sf.net
> > __cli() there is just "paranoia" and it is gone in 2.6 kernels
>
> That's not quite correct. There is a check and a BUG() call to assure
> that interrupts are disabled on entry in the 2.6 code I've seen. If I
> understand the new code correctly, you've replaced the single interrupt
> disable call at the top of this routine by a bunch of similar calls
> elsewhere before entering this routine. That would make interrupt latency
> worse, not better.
This is not correct - __cli() really is just "paranoia". You may remove
it if you like and it shouldn't change anything (but we would like to know
if it does change something, i.e. fixes the fs corruption :-).
Please take a look at generic_unplug_device() in drivers/block/ll_rw_blk.c:

    spin_lock_irq() disables IRQs
    __generic_unplug_device() calls queue->request_fn (ide_do_request)
    spin_unlock_irq() enables IRQs

The only difference between 2.4 and 2.6 is that 2.4 is using
spin_lock_irqsave() / spin_unlock_irqrestore() variants.
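
A rough userspace sketch of that call sequence may make the point clearer.
This is not kernel code: every model_* name is made up for illustration, a
single flag stands in for the CPU interrupt-enable state, and the model
ignores the fact that 2.4 takes one global lock (io_request_lock) rather
than a per-queue lock. It only shows that the request function is always
entered with IRQs already off, and how the irq and irqsave variants differ
on exit.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: one flag stands in for the CPU's IRQ-enable bit. */
static bool irqs_enabled = true;
static int requests_run = 0;

/* Hypothetical stand-in for a block queue with its own lock. */
struct model_queue {
    bool locked;
    void (*request_fn)(struct model_queue *q);
};

/* Models spin_lock_irq(): disable IRQs, then take the lock. */
static void model_spin_lock_irq(struct model_queue *q)
{
    irqs_enabled = false;
    assert(!q->locked);
    q->locked = true;
}

/* Models spin_unlock_irq(): drop the lock, unconditionally enable IRQs. */
static void model_spin_unlock_irq(struct model_queue *q)
{
    q->locked = false;
    irqs_enabled = true;
}

/* Models spin_lock_irqsave(): remember the previous IRQ state first. */
static bool model_spin_lock_irqsave(struct model_queue *q)
{
    bool flags = irqs_enabled;
    irqs_enabled = false;
    assert(!q->locked);
    q->locked = true;
    return flags;
}

/* Models spin_unlock_irqrestore(): put the IRQ state back as it was. */
static void model_spin_unlock_irqrestore(struct model_queue *q, bool flags)
{
    q->locked = false;
    irqs_enabled = flags;
}

/* Stands in for ide_do_request(): the caller has already disabled IRQs,
 * so disabling them again here would be redundant ("paranoia"). */
static void model_request_fn(struct model_queue *q)
{
    assert(q->locked && !irqs_enabled);
    requests_run++;
}

/* 2.6-style unplug path as described above. */
static void model_unplug_26(struct model_queue *q)
{
    model_spin_lock_irq(q);
    q->request_fn(q);
    model_spin_unlock_irq(q);
}

/* 2.4-style unplug path: same shape, irqsave/irqrestore variants. */
static void model_unplug_24(struct model_queue *q)
{
    bool flags = model_spin_lock_irqsave(q);
    q->request_fn(q);
    model_spin_unlock_irqrestore(q, flags);
}
```

To try it, initialise a struct model_queue with request_fn set to
model_request_fn and call either unplug function. In both paths the
assertion inside model_request_fn holds, which is why an extra interrupt
disable at the top of the request function should not change behaviour.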
> > > bunch of code has been executing under interrupt lockout when
> > > there was no need for the lockout. Not a huge problem, just
> > > strange. Also, in 2.6, the lockout has to begin before the
> > > routine is called which is why I said 2.6 was worse.
> >
> > 2.6 is much better - you have one spinlock per block queue while
> > in 2.4 you have one global spinlock (io_request_lock) for all
> > block requests.
>
> Yep. That's a little coarser than the model I was using on the never
> completed OS design I did in the early 70s, but it is better than the
> single global lock in 2.4 and way better than the design of many other
> OSs I've waded into. Still, you've got a complete interrupt lockout
> in place at the top of this routine which has two bad effects: 1) the
> interrupt latency is longer and 2) there is no one place to turn it off
> any longer.
The 'lockout' happens earlier in both 2.4 and 2.6, in generic_unplug_device().
> Thanks. I was hoping to get your attention, but I did not want to
> presume on your time, thus the post to linux-ide. (If you don't
> mind, linux-kernel is way too noisy. I subscribed once a good while
> ago and turned it off because I could not handle the volume of just
> plain junk that gets posted to that list. Linus must be some kind of
> saint if he wades through all of it...)
well, I don't read everything and I guess Linus does the same 8)
> > > I've been going through the linux-ide archives and noticed
> > > that there have been a number of mystery fs corruption issues
> > > that just disappeared. This might be related. There was also
> > > a DMA problem that might have been relevant, but I know it does
> > > not apply in this case since "hdparm" shows DMA turned off by
> > > default on this machine.
> >
> > dmesg output would be helpful, the same goes for lspci output
>
> That is an important part of this issue. Nothing shows in dmesg
> until it is much too late. The read errors get reported, but no
> write errors. There should be a 'printk' in 'ide_abort' and
> 'idedisk_abort' (I may have the routine names wrong, I'm doing
> this from memory) but there isn't (I'll post a patch for that fix
> if you want), so I can't tell if the problem is coming down from
> the upper layers. I also think there should be a 'printk' associated
> with the posting of the immediate stop command. (Again, this is from
> memory. I'll post a patch with all this if you want me to. It will
> not fix any problems, but might shed light.)
I believe that dmesg/lspci would be useful for me or other people reading
this because it allows us to know a bit more about this specific hardware
('Thinkpad 760' is really not enough).
> Still, this is an important problem. File system corruption is just not
> something an OS should allow to happen unless the user does something
> extreme.
Without more info we won't go further in solving this issue.
Bartlomiej