ide-io.c, ide_do_request -- race condition?


* ide-io.c, ide_do_request -- race condition?
@ 2004-07-05 22:51 Max T. Woodbury
  2004-07-07 19:43 ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 16+ messages in thread
From: Max T. Woodbury @ 2004-07-05 22:51 UTC
  To: linux-ide

I have a question about a specific statement in the ide-io.c
code.  However, I know that I need to establish a context
for that question, so please be patient with the fairly long
discussion that follows.

I recently decided to install Linux (Fedora Core 1 to be
specific) on a venerably old (Thinkpad 760ED) laptop.
The process proved troublesome.  Part way through the
installation, the file system became corrupted, throwing
read errors on a block in the installed packages (i.e.
RPM) database.  I did a bad block scan, zeroed the partition
and tried again.  Twice.  The list of bad blocks was NOT
consistent from time to time.  That eliminated the disk
drive as the source of the problem.  A fourth attempt
produced a clue.  I was watching the installation using
"top" and it took a bit longer for the corruption to occur.
Three attempts later and I had a clean installation.  The
trick was to put a heavy computational load on the machine
("while [[ 0 == 0 ]]; do echo -n; done &" x 5) during the
installation.  However the installation took about 6 hours
as a result, two and a half to three times the normal time.

The problem did not end there.  Updating from the 
installation base to the current patch level also corrupted
the file system.  It took three more tries, from scratch, to
get a usable system.

Surprisingly, a kernel build did NOT produce any corruption.
This pretty much eliminated memory as a source of the problem.
The build process is a much more memory intense process than
the installation process.  It would have blown up faster than
the installations if there was a memory problem.

(The fact that the machine runs other OSs without noticeable
problems is also an indication that the underlying hardware
is in working order.  Only the system software and disk
drive changed between the two setups and I have explained
why I do not think it is the disk drive.)

I concluded that there was a race condition someplace in
the disk drivers sequence and went a hunting through the 
driver code starting from the IDE end.

In ide-io.c there is a block comment just before the place
where the i/o request setup routine is called.  It notes
that some older chipsets do not like to be interrupted
during the setup process.  It also notes that "massive
fs corruption" will result if the setup process is
interrupted.  (Hmmmm.)  A search of the kernel mailing list
archives found one note on this piece of code where commands
were getting lost (rarely) in an SMP environment.  I thought
that this would be a good piece of code to fine tooth.

The code was more or less what I expected.  A pair of calls
locked the register set by turning the relevant interrupt off
before the setup and back on when it was done.  They were the
obvious places to do the SMP synchronization and inspection
proved that that was in fact the case so premature interrupts
should not be a problem if everything else was kosher.

What I found next was very surprising given the comments.
Local interrupts were then turned back on!  I expected to see
ALL interrupts turned off here because the setup had to be
atomic, but the opposite was what the code did.  What is going
on here?

This is 2.4.22 code, but it has not been changed for 2.4.26.
There are some significant changes with 2.6.7, but it is worse
if anything.  A little more explanation is probably in order.

At the top of the routine all interrupts are turned off and
the hardware group busy bit is set.  This bit is protected by
a spinlock, so there should be no need to lock out interrupts
while manipulating it.  There are a few other conditions
checked next, none requiring interrupt lockout.  So a whole
bunch of code has been executing under interrupt lockout when
there was no need for the lockout.  Not a huge problem, just
strange.  Also, in 2.6, the lockout has to begin before the
routine is called which is why I said 2.6 was worse.

Then comes the block comment, the all-CPU lockout of completion
interrupts, and just when the comment suggests that all 
interrupts should be turned off, they are turned back on.

I can understand the problem the comment might be addressing.
The interface could well be sensitive to the timing of the
over-all load sequence or a read from the status register
checking for 'ready' could bollix the whole setup process.
This final setup phase really should be an atomic operation.

I've done a little 'playing' with this code.  First I tried
just removing the enable call.  It seems to have had absolutely
no effect.  The system did NOT hang because the interrupt lock
out did not end.  The file system corruption showed up again
on applying a subsequent update.  I've also made a slightly
more venturesome change.  I pulled the disable at the head of
the routine and put it just before the setup call and moved
the enable to after the setup call.  I've seen no problems
with this variant, but I might not see any with the hardware
I have.  A machine with more than one IDE interface on the
same IRQ line might show a problem, particularly if it was a
multi-processor running SMP.  (I have an SMP machine, but not
one with three IDE controllers.)

I haven't dug around the Linux kernel as much as I probably
should have.  (That does not mean I am not familiar with
kernels.  I did a lot of digging through PDP 11 and VAX
operating system code including RSX-11 A, B, D and M, 
RSTS-11, RT-11, DOS-11 and VMS plus some bare metal [paper 
tape loader!] applications.)  I can see that the setup 
process for an IDE controller could be fairly lengthy and 
that interrupt latency could become a problem, and enabling
local interrupts would relieve the problem, but not at the
cost of corrupting system integrity?

Could someone else fine tooth this?  Just to make sure I'm
not missing something major (or minor)...

1) Modify the FC1 installation environment so that the
   correct lockout sequence is used.  This will require that
   I build a new boot floppy.

2) Do another installation without using the system load
   trick.  This should be fairly definitive since I have not
   been able to do a complete system load without problems in
   half a dozen attempts.

3) Do the updates without the system load trick.

Finally, if this proves to be an acceptable cure to my
problem,

A) Does there need to be a way to turn this fix off when
   it is not needed? (and how do you tell if it is needed?
   boot command line option?  Blacklist?)

B) Should it be included in the standard distribution?

C) What is the procedure for getting it included?

I've been going through the linux-ide archives and noticed
that there have been a number of mystery fs corruption issues
that just disappeared.  This might be related.  There was also
a DMA problem that might have been relevant, but I know it does
not apply in this case since "hdparm" shows DMA turned off by
default on this machine.

Max T.E. Woodbury
max@mtew.isa-geek.net


* Re: ide-io.c, ide_do_request -- race condition?
  2004-07-05 22:51 ide-io.c, ide_do_request -- race condition? Max T. Woodbury
@ 2004-07-07 19:43 ` Bartlomiej Zolnierkiewicz
  2004-07-10 19:25   ` Max T. Woodbury
  0 siblings, 1 reply; 16+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2004-07-07 19:43 UTC
  To: Max T. Woodbury; +Cc: linux-ide


Hi,

On Tuesday 06 of July 2004 00:51, Max T. Woodbury wrote:

> The problem did not end there.  Updating from the
> installation base to the current patch level also corrupted
> the file system.  It took three more tries, from scratch, to
> get a usable system.
>
> Surprisingly, a kernel build did NOT produce any corruption.
> This pretty much eliminated memory as a source of the problem.
> The build process is a much more memory intense process than
> the installation process.  It would have blown up faster than
> the installations if there was a memory problem.
>
> (The fact that the machine runs other OSs without noticeable
> problems is also an indication that the underlying hardware
> is in working order.  Only the system software and disk
> drive changed between the two setups and I have explained
> why I do not think it is the disk drive.)

disk drive changed?  please explain

> I concluded that there was a race condition someplace in
> the disk drivers sequence and went a hunting through the
> driver code starting from the IDE end.
>
> In ide-io.c there is a block comment just before the place

next time please just paste the code/comments you refer to,
it will make reading/replying a lot easier

> where the i/o request setup routine is called.  It notes
> that some older chipsets do not like to be interrupted
> during the setup process.  It also notes that "massive
> fs corruption" will result if the setup process is
> interrupted.  (Hmmmm.)  A search of the kernel mailing list
> archives found one note on this piece of code where commands
> were getting lost (rarely) in an SMP environment.  I thought

do you have a link to this note handy?

> that this would be a good piece of code to fine tooth.
>
> The code was more or less what I expected.  A pair of calls
> locked the register set by turning the relevant interrupt off
> before the setup and back on when it was done.  They were the
> obvious places to do the SMP synchronization and inspection
> proved that that was in fact the case so premature interrupts
> should not be a problem if everything else was kosher.
>
> What I found next was very surprising given the comments.
> Local interrupts were then turned back on!  I expected to see
> ALL interrupts turned off here because the setup had to be
> atomic, but the opposite was what the code did.  What is going
> on here?

if I guess right, you are referring to this part of ide-io.c (2.6.7 here):

		/*
		 * Some systems have trouble with IDE IRQs arriving while
		 * the driver is still setting things up.  So, here we disable
		 * the IRQ used by this interface while the request is being started.
		 * This may look bad at first, but pretty much the same thing
		 * happens anyway when any interrupt comes in, IDE or otherwise
		 *  -- the kernel masks the IRQ while it is being handled.
		 */
		if (hwif->irq != masked_irq)
			disable_irq_nosync(hwif->irq);

mask IRQ used by the IDE port

		spin_unlock(&ide_lock);
		local_irq_enable();

enable local IRQs, IDE IRQ stays masked

			/* allow other IRQs while we start this request */
		startstop = start_request(drive, rq);

start request

		spin_lock_irq(&ide_lock);

disable local IRQs

		if (hwif->irq != masked_irq)
			enable_irq(hwif->irq);

unmask IDE IRQ
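
The sequence annotated above can be traced with a small user-space
model.  This is only a sketch under stated assumptions: struct
irq_model, struct window and model_request_start() are made-up names,
only one CPU is modelled, and the "hwif->irq != masked_irq" test is
left out.  It is not kernel code; it just records which interrupts
could be delivered while start_request() would be running.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state touched by the quoted sequence.  Illustrative names only. */
struct irq_model {
	bool local_irqs_on;	/* CPU interrupt flag (local_irq_enable/disable) */
	bool ide_irq_masked;	/* IDE port's IRQ line (disable_irq_nosync/enable_irq) */
	bool lock_held;		/* stands in for ide_lock */
};

/* What could be delivered while start_request() runs. */
struct window {
	bool ide;	/* an interrupt from the IDE port itself */
	bool other;	/* an interrupt from any other device */
};

/* The IDE interrupt needs both an unmasked line and local IRQs on. */
bool ide_irq_deliverable(const struct irq_model *m)
{
	return m->local_irqs_on && !m->ide_irq_masked;
}

/* Any other interrupt only needs local IRQs on. */
bool other_irq_deliverable(const struct irq_model *m)
{
	return m->local_irqs_on;
}

/* Walk the model through the quoted steps, in the quoted order. */
struct window model_request_start(struct irq_model *m)
{
	struct window w;

	/* Entry state assumed by the model: lock held, local IRQs off. */
	assert(m->lock_held && !m->local_irqs_on);

	m->ide_irq_masked = true;	/* disable_irq_nosync(hwif->irq) */
	m->lock_held = false;		/* spin_unlock(&ide_lock) */
	m->local_irqs_on = true;	/* local_irq_enable() */

	/* start_request(drive, rq) would run here. */
	w.ide = ide_irq_deliverable(m);
	w.other = other_irq_deliverable(m);

	m->local_irqs_on = false;	/* spin_lock_irq(&ide_lock) ... */
	m->lock_held = true;		/* ... which also takes the lock */
	m->ide_irq_masked = false;	/* enable_irq(hwif->irq) */
	return w;
}
```

Starting from the entry state, the model reports that the IDE
interrupt is not deliverable during the window while other interrupts
are, which matches the annotations.  It says nothing about whether
some chipset misbehaves when unrelated interrupts arrive during setup.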

> This is 2.4.22 code, but it has not been changed for 2.4.26.
> There are some significant changes with 2.6.7, but it is worse
> if anything.  A little more explanation is probably in order.

I don't think it's worse, more below.

> At the top of the routine all interrupts are turned off and

If you are referring to the __cli() at the top of ide_do_request(),
please notice that the comment says "paranoia" - IRQs should already
be disabled by the block layer, which does spin_lock_irqsave() earlier.

> the hardware group busy bit is set.  This bit is protected by
> a spinlock, so there should be no need to lock out interrupts
> while manipulating it.  There are a few other conditions
> checked next, none requiring interrupt lockout.  So a whole

Please note that the same spinlock can be accessed from IRQ
and non-IRQ context so IRQs must be disabled in non-IRQ context
to prevent deadlocks.
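
The deadlock rule can be illustrated with a toy single-CPU model.
This is a sketch, not kernel code: struct cpu_model and the model_*
functions are hypothetical stand-ins for spin_lock() and
spin_lock_irq(), and the "deadlock" is only a predicate, nothing
actually spins.

```c
#include <assert.h>
#include <stdbool.h>

/* One CPU running non-IRQ code.  Illustrative names only. */
struct cpu_model {
	bool irqs_on;	/* are local interrupts enabled? */
	bool lock_held;	/* does the non-IRQ code hold the shared lock? */
};

/* Like spin_lock(): take the lock, leave the interrupt state alone. */
void model_spin_lock(struct cpu_model *c)
{
	assert(!c->lock_held);	/* the lock is not recursive */
	c->lock_held = true;
}

/* Like spin_lock_irq(): disable local interrupts, then take the lock. */
void model_spin_lock_irq(struct cpu_model *c)
{
	assert(!c->lock_held);
	c->irqs_on = false;
	c->lock_held = true;
}

/*
 * If an interrupt arrives now and its handler wants the same lock,
 * would it spin forever?  The handler can only run when interrupts
 * are on, and the lock holder it interrupted cannot resume to
 * release the lock until the handler returns.
 */
bool irq_handler_would_deadlock(const struct cpu_model *c)
{
	return c->irqs_on && c->lock_held;
}
```

With the plain lock the predicate is true as soon as the lock is
taken with interrupts on; with the _irq variant it is false, because
the handler cannot run on this CPU until the lock has been dropped and
interrupts are enabled again.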

__cli() there is just "paranoia" and it is gone in 2.6 kernels

> bunch of code has been executing under interrupt lockout when
> there was no need for the lockout.  Not a huge problem, just
> strange.  Also, in 2.6, the lockout has to begin before the
> routine is called which is why I said 2.6 was worse.

2.6 is much better - you have one spinlock per block queue while
in 2.4 you have one global spinlock (io_request_lock) for all
block requests.
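
A rough way to picture the difference is a toy try-lock in user
space.  This is a sketch under assumptions: toy_lock, struct toy_queue
and can_run_concurrently() are invented for illustration and are built
on C11 atomic_flag, not on the kernel's spinlock_t or request queues.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock: set means "held".  Not the kernel's spinlock_t. */
typedef struct {
	atomic_flag busy;
} toy_lock;

void toy_lock_init(toy_lock *l)
{
	atomic_flag_clear(&l->busy);
}

/* test_and_set returns the previous value; false means we got it. */
bool toy_trylock(toy_lock *l)
{
	return !atomic_flag_test_and_set(&l->busy);
}

void toy_unlock(toy_lock *l)
{
	atomic_flag_clear(&l->busy);
}

/* A queue only knows which lock protects it. */
struct toy_queue {
	toy_lock *lock;
};

/*
 * Hold queue a's lock, then see whether work on queue b could start
 * at the same time.  Returns true if b's lock was available.
 */
bool can_run_concurrently(struct toy_queue *a, struct toy_queue *b)
{
	bool got_a = toy_trylock(a->lock);
	bool got_b;

	assert(got_a);	/* the model assumes a's lock starts out free */
	got_b = toy_trylock(b->lock);
	if (got_b)
		toy_unlock(b->lock);
	toy_unlock(a->lock);
	return got_b;
}
```

When both queues point at one shared lock (the 2.4 arrangement
described above) the second queue has to wait; when each queue has its
own lock the second one can proceed.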

> Then comes the block comment, the all-CPU lockout of completion
> interrupts, and just when the comment suggests that all
> interrupts should be turned off, they are turned back on.
>
> I can understand the problem the comment might be addressing.
> The interface could well be sensitive to the timing of the
> over-all load sequence or a read from the status register
> checking for 'ready' could bollix the whole setup process.
> This final setup phase really should be an atomic operation.
>
> I've done a little 'playing' with this code.  First I tried
> just removing the enable call.  It seems to have had absolutely
> no effect.  The system did NOT hang because the interrupt lock
> out did not end.  The file system corruption showed up again
> on applying a subsequent update.  I've also made a slightly
> more venturesome change.  I pulled the disable at the head of
> the routine and put it just before the setup call and moved
> the enable to after the setup call.  I've seen no problems

please just inline your changes / patches (please use 'diff -u')

> with this variant, but I might not see any with the hardware
> I have.  A machine with more than one IDE interface on the
> same IRQ line might show a problem, particularly if it was a
> multi-processor running SMP.  (I have an SMP machine, but not
> one with three IDE controllers.)
>
> I haven't dug around the Linux kernel as much as I probably
> should have.  (That does not mean I am not familiar with
> kernels.  I did a lot of digging through PDP 11 and VAX
> operating system code including RSX-11 A, B, D and M,
> RSTS-11, RT-11, DOS-11 and VMS plus some bare metal [paper
> tape loader!] applications.)  I can see that the setup

:-)

> process for an IDE controller could be fairly lengthy and
> that interrupt latency could become a problem, and enabling
> local interrupts would relieve the problem, but not at the
> cost of corrupting system integrity?

yes, if this is an issue here, but we don't know yet

> Could someone else fine tooth this?  Just to make sure I'm
> not missing something major (or minor)...
>
> 1) Modify the FC1 installation environment so that the
>    correct lockout sequence is used.  This will require that
>    I build a new boot floppy.
>
> 2) Do another installation without using the system load
>    trick.  This should be fairly definitive since I have not
>    been able to do a complete system load without problems in
>    half a dozen attempts.
>
> 3) Do the updates without the system load trick.
>
> Finally, if this proves to be an acceptable cure to my
> problem,
>
> A) Does there need to be a way to turn this fix off when
>    it is not needed? (and how do you tell if it is needed?
>    boot command line option?  Blacklist?)

depends on the fix, we need to see it

> B) Should it be included in the standard distribution?

in the standard kernel, yes

> C) What is the procedure for getting it included?

sending a patch ('diff -u' format) with a description to the maintainer
(happens to be me in case of IDE ;-) and the linux-ide mailing list,
also the linux-kernel mailing list if the patch is important / needs
more testers etc.

You may also want to read
	Documentation/SubmittingPatches
and
	Documentation/CodingStyle
from the linux kernel source package.

> I've been going through the linux-ide archives and noticed
> that there have been a number of mystery fs corruption issues
> that just disappeared.  This might be related.  There was also
> a DMA problem that might have been relevant, but I know it does
> not apply in this case since "hdparm" shows DMA turned off by
> default on this machine.

dmesg output would be helpful, the same goes for lspci output

You may also consider opening a bug at http://bugme.osdl.org
and attaching all useful files to a bug entry (dmesg, lspci
and hdparm outputs, kernel config etc.).

Regards,
Bartlomiej



* Re: ide-io.c, ide_do_request -- race condition?
  2004-07-07 19:43 ` Bartlomiej Zolnierkiewicz
@ 2004-07-10 19:25   ` Max T. Woodbury
  2004-07-10 20:07     ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 16+ messages in thread
From: Max T. Woodbury @ 2004-07-10 19:25 UTC
  To: Bartlomiej Zolnierkiewicz, linux-ide

Bartlomiej Zolnierkiewicz wrote:
> 
> Hi,
> 
> On Tuesday 06 of July 2004 00:51, Max T. Woodbury wrote:
> 
> > (The fact that the machine runs other OSs without noticeable
> > problems is also an indication that the underlying hardware
> > is in working order.  Only the system software and disk
> > drive changed between the two setups and I have explained
> > why I do not think it is the disk drive.)
> 
> disk drive changed?  please explain

The drives in the Thinkpad 760 are mounted in caddies that can
be easily exchanged when the power is off.  I have three drives.
One runs the machine as a GPS, the second as a code development
Windows box and the third is my Linux code development machine.
I'm having a fair amount of trouble getting the Linux setup to do
what I want it to do.  Not only did the Linux install flake out,
but I still can't get the PCMCIA sockets working, but that's another
issue for another list and I haven't quite got enough information
on that set of problems to make a request for help useful...  In
order to get to the internet with Linux I have to use its docking
station.  No such problem with Windoze.  (Yeah, absolutely
disgusting but that's what's happening.)

> > I concluded that there was a race condition someplace in
> > the disk drivers sequence and went a hunting through the
> > driver code starting from the IDE end.
> >
> > In ide-io.c there is a block comment just before the place
> 
> next time please just paste the code/comments you refer to,
> it will make reading/replying a lot easier

It's a bit difficult but wilco...

> > where the i/o request setup routine is called.  It notes
> > that some older chipsets do not like to be interrupted
> > during the setup process.  It also notes that "massive
> > fs corruption" will result if the setup process is
> > interrupted.  (Hmmmm.)  A search of the kernel mailing list
> > archives found one note on this piece of code where commands
> > were getting lost (rarely) in an SMP environment.  I thought
> 
> do you have a link to this note handy?

not handy but... linux-kernel 2003-03-27 23:54:19

Hmmm. I don't remember what my original search criteria were, but
this search brought up a lot more stuff... mostly irrelevant...

> if I guess right, you are referring to this part of ide-io.c (2.6.7 here):
> 
>                 /*
>                  * Some systems have trouble with IDE IRQs arriving while
>                  * the driver is still setting things up.  So, here we disable
>                  * the IRQ used by this interface while the request is being started.
>                  * This may look bad at first, but pretty much the same thing
>                  * happens anyway when any interrupt comes in, IDE or otherwise
>                  *  -- the kernel masks the IRQ while it is being handled.
>                  */
>                 if (hwif->irq != masked_irq)
>                         disable_irq_nosync(hwif->irq);
> 
> mask IRQ used by the IDE port
> 
>                 spin_unlock(&ide_lock);
>                 local_irq_enable();
> 
> enable local IRQs, IDE IRQ stays masked
> 
>                         /* allow other IRQs while we start this request */
>                 startstop = start_request(drive, rq);
> 
> start request
> 
>                 spin_lock_irq(&ide_lock);
> 
> disable local IRQs
> 
>                 if (hwif->irq != masked_irq)
>                         enable_irq(hwif->irq);
> 
> unmask IDE IRQ

Yep.  That's the chunk...

> 
> > This is 2.4.22 code, but it has not been changed for 2.4.26.
> > There are some significant changes with 2.6.7, but it is worse
> > if anything.  A little more explanation is probably in order.
> 
> I don't think it's worse, more below.
> 
> > At the top of the routine all interrupts are turned off and
> 
> If you are referring to the __cli() at the top of ide_do_request(),
> please notice that the comment says "paranoia" - IRQs should already
> be disabled by the block layer, which does spin_lock_irqsave() earlier.

Yep, but enables and disables should be paired and that is the one
that paired with the enable just before the call to start_request.

> 
> > the hardware group busy bit is set.  This bit is protected by
> > a spinlock, so there should be no need to lock out interrupts
> > while manipulating it.  There are a few other conditions
> > checked next, none requiring interrupt lockout.  So a whole
> 
> Please note that the same spinlock can be accessed from IRQ
> and non-IRQ context so IRQs must be disabled in non-IRQ context
> to prevent deadlocks.

Yep.  There were LOTS of problems in VMS drivers with exactly that
problem.  I had to explain it to customers a few times.  The concept
of spinlock priority and its use to prevent deadlocks and the relation
between priority and interrupt lockout level was more complex in VMS
than in Linux, since the resource allocation model was more complex.
Your design is less complex, thus less prone to problems.  Still, the
Intel interrupt model reminds me of the one on the cheaper PDP-11s,
but that's getting way off topic.  At any rate, the concept is about
20 years old and I have a reasonable idea of what you are saying and
why it is important.  (Spin locks themselves go back at least 30 years.)

> __cli() there is just "paranoia" and it is gone in 2.6 kernels

That's not quite correct.  There is a check and a BUG() call to assure
that interrupts are disabled on entry in the 2.6 code I've seen.  If I
understand the new code correctly, you've replaced the single interrupt
disable call at the top of this routine by a bunch of similar calls
elsewhere before entering this routine.  That would make interrupt latency
worse, not better.

> > bunch of code has been executing under interrupt lockout when
> > there was no need for the lockout.  Not a huge problem, just
> > strange.  Also, in 2.6, the lockout has to begin before the
> > routine is called which is why I said 2.6 was worse.
> 
> 2.6 is much better - you have one spinlock per block queue while
> in 2.4 you have one global spinlock (io_request_lock) for all
> block requests.

Yep.  That's a little coarser than the model I was using on the never
completed OS design I did in the early 70s, but it is better than the
single global lock in 2.4 and way better than the design of many other
OSs I've waded into.  Still, you've got a complete interrupt lockout
in place at the top of this routine which has two bad effects: 1) the
interrupt latency is longer and 2) there is no one place to turn it off
any longer.

> > Then comes the block comment, the all-CPU lockout of completion
> > interrupts, and just when the comment suggests that all
> > interrupts should be turned off, they are turned back on.
> >
> > I can understand the problem the comment might be addressing.
> > The interface could well be sensitive to the timing of the
> > over-all load sequence or a read from the status register
> > checking for 'ready' could bollix the whole setup process.
> > This final setup phase really should be an atomic operation.
> >
> > I've done a little 'playing' with this code.  First I tried
> > just removing the enable call.  It seems to have had absolutely
> > no effect.  The system did NOT hang because the interrupt lock
> > out did not end.  The file system corruption showed up again
> > on applying a subsequent update.  I've also made a slightly
> > more venturesome change.  I pulled the disable at the head of
> > the routine and put it just before the setup call and moved
> > the enable to after the setup call.  I've seen no problems
> 
> please just inline your changes / patches (please use 'diff -u')

If you don't mind, I'd prefer to wait on that.  I asked for comments
because the results of the modification were not exactly as I expected
and further testing seems to indicate that this may not be the real
source of the problem.  Since my understanding is incomplete, any patch
I post is likely to be wrong in one way or another.

> > process for an IDE controller could be fairly lengthy and
> > that interrupt latency could become a problem, and enabling
> > local interrupts would relieve the problem, but not at the
> > cost of corrupting system integrity?
> 
> yes, if this is an issue here, but we don't know yet
> 
> > C) What is the procedure for getting it included?
> 
> sending a patch ('diff -u' format) with a description to the maintainer
> (happens to be me in case of IDE ;-) and the linux-ide mailing list,
> also the linux-kernel mailing list if the patch is important / needs
> more testers etc.

Thanks.  I was hoping to get your attention, but I did not want to
presume on your time, thus the post to linux-ide.  (If you don't
mind, linux-kernel is way too noisy.  I subscribed once a good while
ago and turned it off because I could not handle the volume of just
plain junk that gets posted to that list.  Linus must be some kind of
saint if he wades through all of it...)

> You may also want to read
>         Documentation/SubmittingPatches

Done that even before I posted this.  Since this problem is not even
close to the reliable patch stage, and is down-level on even the current
2.4 series, much less the 2.6 series, posting a patch did not seem to be
appropriate.

> and
>         Documentation/CodingStyle
> from the linux kernel source package.

Which is honored more in the breach than the observance... (OK. I
exaggerate.)

> 
> > I've been going through the linux-ide archives and noticed
> > that there have been a number of mystery fs corruption issues
> > that just disappeared.  This might be related.  There was also
> > a DMA problem that might have been relevant, but I know it does
> > not apply in this case since "hdparm" shows DMA turned off by
> > default on this machine.
> 
> dmesg output would be helpful, the same goes for lspci output

That is an important part of this issue.  Nothing shows in dmesg
until it is much too late.  The read errors get reported, but no
write errors.  There should be a 'printk' in 'ide_abort' and
'idedisk_abort' (I may have the routine names wrong, I'm doing
this from memory) but there isn't (I'll post a patch for that fix
if you want), so I can't tell if the problem is coming down from
the upper layers.  I also think there should be a 'printk' associated
with the posting of the immediate stop command.  (Again, this is from
memory.  I'll post a patch with all this if you want me to.  It will
not fix any problems, but might shed light.)

> You may also consider opening bug at http://bugme.osdl.org
> and attaching all useful files to a bug entry (dmesg, lspci
> and hdparm outputs, kernel config etc.).

If it was a problem with relevant dmesg or hdparm results, I would
have done exactly that.  The problem is fairly reproducible with my
hardware, in the sense that it will definitely happen within a two
hour window while doing an installation from floppy and CD on a
stand-alone machine, but there is no command that I can type that will
always produce the problem within a reasonable amount of time after
that command is issued.  I've been trying to produce an installation
floppy where I actually have the .config file that goes with the kernel
that is blowing up, but dissecting the RedHat installation image is not
exactly easy, and the hardware is probably listed as 'unsupported' by
them so they will just duck the issue.  (Since the machine can NOT boot
from a CD, I can not install FC2 which does 2.6 and it does not have
enough memory to install E3.  My understanding of the kernel is probably
better than my knowledge of all the command line tools and procedures.)

Still, this is an important problem.  File system corruption is just not
something an OS should allow to happen unless the user does something
extreme.

> 
> Regards,
> Bartlomiej

Max


* Re: ide-io.c, ide_do_request -- race condition?
  2004-07-10 19:25   ` Max T. Woodbury
@ 2004-07-10 20:07     ` Bartlomiej Zolnierkiewicz
  2004-07-11 15:02       ` Max T. Woodbury
  2004-07-12 15:15       ` Max T. Woodbury
  0 siblings, 2 replies; 16+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2004-07-10 20:07 UTC
  To: Max T. Woodbury; +Cc: linux-ide

On Saturday 10 of July 2004 21:25, Max T. Woodbury wrote:
> Bartlomiej Zolnierkiewicz wrote:
> > Hi,
> >
> > On Tuesday 06 of July 2004 00:51, Max T. Woodbury wrote:
> > > (The fact that the machine runs other OSs without noticeable
> > > problems is also an indication that the underlying hardware
> > > is in working order.  Only the system software and disk
> > > drive changed between the two setups and I have explained
> > > why I do not think it is the disk drive.)
> >
> > disk drive changed?  please explain
>
> The drives in the Thinkpad 760 are mounted in caddies that can
> be easily exchanged when the power is off.  I have three drives.
> One runs the machine as a GPS, the second as a code development
> Windows box and the third is my Linux code development machine.
> I'm having a fair amount of trouble getting the Linux setup to do
> what I want it to do.  Not only did the Linux install flake out,
> but I still can't get the PCMCIA sockets working, but that's another
> issue for another list and I haven't quite got enough information
> on that set of problems to make a request for help useful...  In
> order to get to the internet with Linux I have to use its docking
> station.  No such problem with Windoze.  (Yeah, absolutely
> disgusting but that's what's happening.)

Are you sure that 'Linux' disk is okay?
http://smartmontool.sf.net

> > __cli() there is just "paranoia" and it is gone in 2.6 kernels
>
> That's not quite correct.  There is a check and a BUG() call to assure
> that interrupts are disabled on entry in the 2.6 code I've seen.  If I
> understand the new code correctly, you've replaced the single interrupt
> disable call at the top of this routine by a bunch of similar calls
> elsewhere before entering this routine.  That would make interrupt latency
> worse, not better.

This is not correct - __cli() is really just a "paranoia", you may remove
it if you like and it shouldn't change anything (but we would like to know
if it changes something, i.e. fixes fs corruption :-).

Please take a look at generic_unplug_device() in drivers/block/ll_rw_blk.c:

spin_lock_irq() disables IRQs
__generic_unplug_device() calls queue->request_fn (ide_do_request)
spin_unlock_irq() enables IRQs

The only difference between 2.4 and 2.6 is that 2.4 is using
spin_lock_irqsave() / spin_unlock_irqrestore() variants.
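
The practical difference between the two variants can be shown with a
toy model of the CPU interrupt flag.  This is a sketch only: struct
irq_cpu and the model_* functions are hypothetical user-space
stand-ins, and a bool plays the role of the saved 'flags' word.

```c
#include <assert.h>
#include <stdbool.h>

/* One CPU's interrupt flag plus one lock.  Illustrative names only. */
struct irq_cpu {
	bool irqs_on;
	bool lock_held;
};

/* Like spin_lock_irq(): disable local interrupts unconditionally. */
void model_lock_irq(struct irq_cpu *c)
{
	assert(!c->lock_held);
	c->irqs_on = false;
	c->lock_held = true;
}

/* Like spin_unlock_irq(): enable local interrupts unconditionally. */
void model_unlock_irq(struct irq_cpu *c)
{
	c->lock_held = false;
	c->irqs_on = true;
}

/* Like spin_lock_irqsave(): remember the old state, then disable. */
bool model_lock_irqsave(struct irq_cpu *c)
{
	bool flags = c->irqs_on;

	assert(!c->lock_held);
	c->irqs_on = false;
	c->lock_held = true;
	return flags;
}

/* Like spin_unlock_irqrestore(): put back whatever was saved. */
void model_unlock_irqrestore(struct irq_cpu *c, bool flags)
{
	c->lock_held = false;
	c->irqs_on = flags;
}
```

If the caller already had interrupts off, the _irq pair leaves them on
afterwards, while the _irqsave pair leaves them off.  When the caller
had interrupts on, both pairs end in the same state, and in either
case the code between lock and unlock runs with local interrupts
disabled.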

> > > bunch of code has been executing under interrupt lockout when
> > > there was no need for the lockout.  Not a huge problem, just
> > > strange.  Also, in 2.6, the lockout has to begin before the
> > > routine is called which is why I said 2.6 was worse.
> >
> > 2.6 is much better - you have one spinlock per block queue while
> > in 2.4 you have one global spinlock (io_request_lock) for all
> > block requests.
>
> Yep.  That's a little coarser than the model I was using on the never
> completed OS design I did in the early 70s, but it is better than the
> single global lock in 2.4 and way better than the design of many other
> OSs I've waded into.  Still, you've got a complete interrupt lockout
> in place at the top of this routine which has two bad effects: 1) the
> interrupt latency is longer and 2) there is no one place to turn it off
> any longer.

'lockout' happens earlier both in 2.4 and 2.6 -> generic_unplug_device().

> Thanks.  I was hoping to get your attention, but I did not want to
> presume on your time, thus the post to linux-ide.  (If you don't
> mind, linux-kernel is way too noisy.  I subscribed once a good while
> ago and turned it off because I could not handle the volume of just
> plain junk that gets posted to that list.  Linus must be some kind of
> saint if he wades through all of it...)

well, I don't read everything and I guess Linus does the same 8)

> > > I've been going through the linux-ide archives and noticed
> > > that there have been a number of mystery fs corruption issues
> > > that just disappeared.  This might be related.  There was also
> > > a DMA problem that might have been relevant, but I know it does
> > > not apply in this case since "hdparm" shows DMA turned off by
> > > default on this machine.
> >
> > dmesg output would be helpful, the same goes for lspci output
>
> That is an important part of this issue.  Nothing shows in dmesg
> until it is much too late.  The read errors get reported, but no
> write errors.  There should be a 'printk' in 'ide_abort' and
> 'idedisk_abort' (I may have the routine names wrong, I'm doing
> this from memory) but there isn't,  (I'll post a patch for that fix
> if you want.) so I can't tell if the problem is coming down from
> the upper layers.  I also think there should be a 'printk' associated
> with the posting of the immediate stop command.  (Again, this is from
> memory.  I'll post a patch with all this if you want me to.  It will
> not fix any problems, but might shed light.)

I believe that dmesg/lspci would be useful for me or other people reading
this because it allows us to know a bit more about this specific hardware
('Thinkpad 760' is really not enough).

> Still, this is an important problem.  File system corruption is just not
> something an OS should allow to happen unless the user does something
> extreme.

Without more info we won't go further in solving this issue.

Bartlomiej


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ide-io.c, ide_do_request -- race condition?
  2004-07-10 20:07     ` Bartlomiej Zolnierkiewicz
@ 2004-07-11 15:02       ` Max T. Woodbury
  2004-07-12 15:15       ` Max T. Woodbury
  1 sibling, 0 replies; 16+ messages in thread
From: Max T. Woodbury @ 2004-07-11 15:02 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz, linux-ide

Bartlomiej Zolnierkiewicz wrote:
> 
> I believe that dmesg/lspci would be useful for me or other people reading
> this because it allows us to know a bit more about this specific hardware
> ('Thinkpad 760' is really not enough).
> 

lspci -vv output:

00:00.0 Host bridge: Intel Corp. 430MX - 82437MX Mob. System Ctrlr (MTSC) & 82438MX Data Path (MTDP) (rev 02)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 32

00:01.0 ISA bridge: Intel Corp. 82371FB PIIX ISA [Triton I] (rev 02)
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0

00:02.0 CardBus bridge: Texas Instruments PCI1130 (rev 02)
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 168, cache line size 08
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at 10812000 (32-bit, non-prefetchable) [size=4K]
	Bus: primary=00, secondary=01, subordinate=03, sec-latency=176
	BridgeCtl: Parity- SERR- ISA- VGA- MAbort- >Reset+ 16bInt- PostWrite-
	16-bit legacy interface ports at 0001

00:02.1 CardBus bridge: Texas Instruments PCI1130 (rev 02)
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 168, cache line size 08
	Interrupt: pin B routed to IRQ 0
	Region 0: Memory at 10811000 (32-bit, non-prefetchable) [size=4K]
	Bus: primary=00, secondary=04, subordinate=06, sec-latency=176
	BridgeCtl: Parity- SERR- ISA- VGA- MAbort- >Reset+ 16bInt- PostWrite-
	16-bit legacy interface ports at 0001

00:03.0 VGA compatible controller: Trident Microsystems TGUI 9660/938x/968x (rev d3) (prog-if 00 [VGA])
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at 08000000 (32-bit, non-prefetchable) [size=4M]
	Region 1: Memory at 08400000 (32-bit, non-prefetchable) [size=64K]
	Region 2: Memory at 08800000 (32-bit, non-prefetchable) [size=4M]
	Expansion ROM at 000c0000 [disabled] [size=64K]

00:05.0 Multimedia video controller: IBM MPEG PCI Bridge
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Interrupt: pin A routed to IRQ 11
	Region 0: Memory at 10810000 (32-bit, non-prefetchable) [size=256]


Hmm. No IDE interface... It must be legacy stuff behind the ISA bridge...
I think I need to look at the man page again...  If I find anything, I'll 
post this with the additional stuff.

Still working on the dmesg stuff...

Max

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ide-io.c, ide_do_request -- race condition?
  2004-07-10 20:07     ` Bartlomiej Zolnierkiewicz
  2004-07-11 15:02       ` Max T. Woodbury
@ 2004-07-12 15:15       ` Max T. Woodbury
  2004-07-12 15:47         ` Bartlomiej Zolnierkiewicz
  1 sibling, 1 reply; 16+ messages in thread
From: Max T. Woodbury @ 2004-07-12 15:15 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz; +Cc: linux-ide

Here's the dmesg output:

Linux version 2.4.22-1.2115.nptlBOOT (bhcompile@bugs.devel.redhat.com) (gcc version 3.2.3 20030422 (Red Hat Linux 3.2.3-6)) #1 Wed Oct 29 15:19:13 EST 2003
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000005000000 (usable)
 BIOS-e820: 00000000fffe0000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
80MB LOWMEM available.
On node 0 totalpages: 20480
zone(0): 4096 pages.
zone(1): 16384 pages.
zone(2): 0 pages.
DMI not present.
Kernel command line: initrd=initrd.img ramdisk_size=8192 BOOT_IMAGE=vmlinuz rescue
Initializing CPU#0
Detected 132.634 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 264.60 BogoMIPS
Memory: 77860k/81920k available (1264k kernel code, 3672k reserved, 362k data, 108k init, 0k highmem)
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode cache hash table entries: 8192 (order: 4, 65536 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 4096 (order: 2, 16384 bytes)
Page-cache hash table entries: 32768 (order: 5, 131072 bytes)
Intel Pentium with F0 0F bug - workaround enabled.
CPU:     After generic, caps: 000001bf 00000000 00000000 00000000
CPU:             Common caps: 000001bf 00000000 00000000 00000000
CPU: Intel Pentium 75 - 200 stepping 0c
Checking 'hlt' instruction... OK.
Checking for popad bug... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: none
PCI: PCI BIOS revision 2.10 entry at 0xfd930, last bus=6
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
BIOS EDD facility v0.09 2003-Jan-22, 0 devices found
EDD information not available.
Starting kswapd
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
loop: loaded (max 8 devices)
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIXa: IDE controller at PCI slot 00:01.0
PIIXa: chipset revision 2
PIIXa: not 100% native mode: will probe irqs later
PIIXa: neither IDE port enabled (BIOS)
hda: IBM-DARA-209000, ATA DISK drive
hdb: HITACHI CDR-S100, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: attached ide-disk driver.
hda: host protected area => 1
hda: 12594960 sectors (6449 MB) w/418KiB Cache, CHS=833/240/63
hdb: attached ide-cdrom driver.
hdb: ATAPI 20X CD-ROM drive, 128kB Cache
Uniform CD-ROM driver Revision: 3.12
Partition check:
 hda: hda1 hda2 hda3 hda4
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
host/usb-uhci.c: $Revision: 1.275 $ time 15:20:47 Oct 29 2003
host/usb-uhci.c: High bandwidth mode enabled
host/usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
usb.c: registered new driver hiddev
usb.c: registered new driver hid
hid-core.c: v1.8.1 Andreas Gal, Vojtech Pavlik <vojtech@suse.cz>
hid-core.c: USB HID support drivers
mice: PS/2 mouse device common for all mice
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Initializing Cryptographic API
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 8192 bind 16384)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 533k freed
VFS: Mounted root (ext2 filesystem).
vga16fb: initializing
vga16fb: mapped to 0xc00a0000
Console: switching to colour frame buffer device 80x30
fb0: VGA16 VGA frame buffer device
SCSI subsystem driver Revision: 1.00
ISO 9660 Extensions: RRIP_1991A
Unable to identify CD-ROM format.
VFS: Can't find ext2 filesystem on dev loop(7,0).
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
raid5: measuring checksumming speed
   8regs     :   106.400 MB/sec
   32regs    :    97.600 MB/sec
raid5: using function: 8regs (106.400 MB/sec)
md: raid5 personality registered as nr 4
Journalled Block Device driver loaded
LVM version 1.0.5+(22/07/2002) module loaded
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,2), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,4), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,4), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,4), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,2), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Adding Swap: 514072k swap-space (priority -1)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ide-io.c, ide_do_request -- race condition?
  2004-07-12 15:15       ` Max T. Woodbury
@ 2004-07-12 15:47         ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 16+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2004-07-12 15:47 UTC (permalink / raw)
  To: Max T. Woodbury; +Cc: linux-ide


On Monday 12 of July 2004 17:15, Max T. Woodbury wrote:
> PIIXa: IDE controller at PCI slot 00:01.0
> PIIXa: chipset revision 2
> PIIXa: not 100% native mode: will probe irqs later
> PIIXa: neither IDE port enabled (BIOS)

Do you have IDE ports disabled in BIOS?

This prevents piix IDE driver from working and instead you are using
generic IDE driver which is much slower (no DMA!) and may be unsafe.
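
A rough sketch of the kind of test behind the "neither IDE port enabled"
message, for illustration only.  The register offsets and masks below are
an assumption taken from memory of the PIIX datasheet (bit 15 of the IDE
timing registers at 0x40 and 0x42, i.e. bit 7 of config bytes 0x41 and
0x43) and should be checked against the datasheet and the driver source.
The struct and function names are hypothetical.

```c
#include <stdint.h>

/* One enable test: which config byte to read, which bits to mask,
 * and the value the masked bits must have for the port to be enabled. */
struct enable_bits {
    uint8_t reg;
    uint8_t mask;
    uint8_t val;
};

/* Assumed values for a PIIX-style controller: primary, then secondary. */
static const struct enable_bits piix_enable[2] = {
    { 0x41, 0x80, 0x80 },
    { 0x43, 0x80, 0x80 },
};

/* cfg points at a 256-byte snapshot of the device's PCI config space.
 * Returns nonzero if the given port (0 or 1) has its enable bit set. */
static int port_enabled(const uint8_t *cfg, int port)
{
    const struct enable_bits *e = &piix_enable[port];
    return (cfg[e->reg] & e->mask) == e->val;
}
```

If both ports fail a test of this kind, the chipset driver has nothing to
attach to and the disk is handled through the legacy ports (0x1f0, irq 14)
that show up further down in the dmesg output.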

Bartlomiej


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ide-io.c, ide_do_request -- race condition?
@ 2004-07-11 14:38 Max T. Woodbury
  0 siblings, 0 replies; 16+ messages in thread
From: Max T. Woodbury @ 2004-07-11 14:38 UTC (permalink / raw)
  To: linux-ide

Bartlomiej Zolnierkiewicz wrote:
> On Saturday 10 of July 2004 21:25, Max T. Woodbury wrote:
> > Bartlomiej Zolnierkiewicz wrote:
> > > On Tuesday 06 of July 2004 00:51, Max T. Woodbury wrote:
> > > > (The fact that the machine runs other OSs without noticeable
> > > > problems is also an indication that the underlying hardware
> > > > is in working order.  Only the system software and disk
> > > > drive changed between the two setups and I have explained
> > > > why I do not think it is the disk drive.)
> > >
> > > disk drive changed?  please explain
> >
> > The drives in the Thinkpad 760 are mounted in caddies that can
> > be easily exchanged when the power is off.  I have three drives.
> > One runs the machine as a GPS, the second as a code development
> > Windows box and the third is my Linux code development machine.
> > I'm having a fair amount of trouble getting the Linux setup to do
> > what I want it to do.  Not only did the Linux install flake out,
> > but I still can't get the PCMCIA sockets working, but that's another
> > issue for another list and I haven't quite got enough information
> > on that set of problems to make a request for help useful...  In
> > order to get to the internet with Linux I have to use its docking
> > station.  No such problem with Windoze.  (Yeah, absolutely
> > disgusting but that's what's happening.)
>
> Are you sure that 'Linux' disk is okay?
> http://smartmontool.sf.net

Yes.  The CDs all passed the scanner RedHat provided and they installed
correctly on a desktop system. The drive I used for linux was zeroed
and it passed a read/write badblock scan on the whole disk and several
on the partition in question.  Still, it could be that the drive is
getting the data into its buffer correctly but not out to the magnetic
media, but the fact that the errors move around so much indicates that
it is not the media itself that is bad.  My next comment is on the
fringes of my competence and is mostly opinion:  If the drive is
failing to actually transfer the data from its buffer to the media,
there should be an error indication of some sort and it should be passed
back to the OS where it can be logged.  I didn't see any such code, but
I'm not sure I'd recognize it if I saw it.

I do wish there were a write check option.  It does nasty things to
performance, but improves system integrity.  Of course I haven't checked
the upper layers in the system yet.  There might be such an option at the
block cache or file system levels and I'd not have seen it yet. It's
been about four years since I went through that code last so I'm foggy on
the details and it has almost certainly changed since I looked at it.

> > > __cli() there is just "paranoia" and it is gone in 2.6 kernels
> >
> > That's not quite correct.  There is a check and a BUG() call to assure
> > that interrupts are disabled on entry in the 2.6 code I've seen.  If I
> > understand the new code correctly, you've replaced the single interrupt
> > disable call at the top of this routine by a bunch of similar calls
> > elsewhere before entering this routine.  That would make interrupt latency
> > worse, not better.
>
> This is not correct - __cli() is really just a "paranoia", you may remove
> it if you like and it shouldn't change anything (but we would like to know
> if it changes something ie. fixes fs corruption :-).

Hmm. At the place it was, it was harmless paranoia, but I thought you might
not have been being paranoid enough.  You should not have taken the lockout
off until after the command was actually started.  But doing that was not
enough to fix the problem...

I did a quick scan of the IRQ level code in the arch/i386 tree to get an idea of
what it did and recognized the pattern from other OSs I've seen, so I did not
look at all the details closely.  I think I saw CLI and STI pairs that might
have nulled the CLI in ide-io.c in there, but I did not try to figure exactly
which conditionals were on or off so I was not certain of the exact set of code
that was operational.  As I said, I was scanning for an overall pattern.

> Please take a look at generic_unplug_device() in drivers/block/ll_rw_blk.c:

If I remember correctly, that's the block cache layer.  The last time I looked
at it, it didn't do hot-plug, so I need to look at it again...  I'll do as you
suggest.

> > > > bunch of code has been executing under interrupt lockout when
> > > > there was no need for the lockout.  Not a huge problem, just
> > > > strange.  Also, in 2.6, the lockout has to begin before the
> > > > routine is called which is why I said 2.6 was worse.
> > >
> > > 2.6 is much better - you have one spinlock per block queue while
> > > in 2.4 you have one global spinlock (io_request_lock) for all
> > > block requests.
> >
> > Yep.  That's a little coarser than the model I was using on the never
> > completed OS design I did in the early 70s, but it is better than the
> > single global lock in 2.4 and way better than the design of many other
> > OSs I've waded into.  Still, you've got a complete interrupt lockout
> > in place at the top of this routine which has two bad effects: 1) the
> > interrupt latency is longer and 2) there is no one place to turn it off
> > any longer.
>
> 'lockout' happens earlier both in 2.4 and 2.6 -> generic_unplug_device().

Hmm.  I'll have to look at that layer again.  I think we're talking about
a different level of lockout.  I suspect that you mean that this is a
critical region of code and only one thread of execution can be in it at
one time.  I was thinking 'atomic' in the sense that it could not be
interrupted and that its timing was consistent.  I'm not sure I am using
the term in the same way everyone else does.

> > > > I've been going through the linux-ide archives and noticed
> > > > that there have been a number of mystery fs corruption issues
> > > > that just disappeared.  This might be related.  There was also
> > > > a DMA problem that might have been relevant, but I know it does
> > > > not apply in this case since "hdparm" shows DMA turned off by
> > > > default on this machine.
> > >
> > > dmesg output would be helpful, the same goes for lspci output
> >
> > That is an important part of this issue.  Nothing shows in dmesg
> > until it is much too late.  The read errors get reported, but no
> > write errors.  There should be a 'printk' in 'ide_abort' and
> > 'idedisk_abort' (I may have the routine names wrong, I'm doing
> > this from memory) but there isn't,  (I'll post a patch for that fix
> > if you want.) so I can't tell if the problem is coming down from
> > the upper layers.  I also think there should be a 'printk' associated
> > with the posting of the immediate stop command.  (Again, this is from
> > memory.  I'll post a patch with all this if you want me to.  It will
> > not fix any problems, but might shed light.)
>
> I believe that dmesg/lspci would be useful for me or other people reading
> this because it allows us to know a bit more about this specific hardware
> ('Thinkpad 760' is really not enough).

Yep. lspci will be OS version independent.  I'll get you that shortly.
(I booted the original FC1 distribution in an attempt to get the PCMCIA
stuff to work and it promptly (well after 12 hours) messed up the file
system.)  I'm in the process of cleaning up the disk again.

'dmesg' is somewhat dependent on the OS features configured.  I'll get
you the one from the installation boot disk where the problem was first
noticed.  I take it you want the initial boot part, not the part after
rc.sysinit or whatever is driving the installation process takes over.
It also makes a difference if the machine is docked or undocked.  I'll
get you one of the undocked machine.  Both configurations have the problem
but the undocked configuration is simpler.

> > Still, this is an important problem.  File system corruption is just not
> > something an OS should allow to happen unless the user does something
> > extreme.
>
> Without more info we won't go further in solving this issue.

I really don't expect you to solve the problem for me.  I have to do
most of the work myself.  I'd recognized that I was on the borders of my
competence and have asked for advice.  I recognized that I was making
assumptions about the code that I needed to check.  While you haven't
exactly answered the questions I asked, you have responded in a way that
provides most of the information I need and pointed me to more places to
look for answers.

There was also the possibility that this problem has an impact beyond my
personal involvement.  I'm not certain that it does, but it would be
irresponsible of me to ignore such a possibility.  (And I'm just arrogant
enough that I think I have such a responsibility...)

Thank you for your help so far.

Max

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ide-io.c, ide_do_request -- race condition?
@ 2004-07-12 17:52 Max T. Woodbury
  2004-07-12 18:35 ` Eric D. Mudama
  0 siblings, 1 reply; 16+ messages in thread
From: Max T. Woodbury @ 2004-07-12 17:52 UTC (permalink / raw)
  To: linux-ide

Bartlomiej Zolnierkiewicz wrote:
> 
> On Monday 12 of July 2004 17:15, Max T. Woodbury wrote:
> > PIIXa: IDE controller at PCI slot 00:01.0
> > PIIXa: chipset revision 2
> > PIIXa: not 100% native mode: will probe irqs later
> > PIIXa: neither IDE port enabled (BIOS)
> 
> Do you have IDE ports disabled in BIOS?
> 
> This prevents piix IDE driver from working and instead you are using
> generic IDE driver which is much slower (no DMA!) and may be unsafe.
> 
> Bartlomiej

The Thinkpad 760 BIOS setup does not make this easy.  It's this pretty
GUI thingy with a humming bird for a cursor and practically no text
anywhere.  Very international.  There's a more comprehensive control
program under 'doz.  I'll try to find something on the thinkpad lists...
I don't remember seeing anything in the user docs on this.

Still, why would PIO mode be unsafe?  (I can see slower, but I don't
expect speed from this beast.  Oh well.  Thanks for the pointer.)

Hmm.  Going to DMA would almost certainly make the symptoms disappear
because the I/O timing would change and whatever was screwing up the
I/O would happen at a non-critical point.  Much like the fact that I
could get the symptoms to disappear by putting a computational load
on the machine.  It would NOT actually solve the underlying problem,
whatever that problem really is.

Max

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ide-io.c, ide_do_request -- race condition?
  2004-07-12 17:52 Max T. Woodbury
@ 2004-07-12 18:35 ` Eric D. Mudama
  2004-07-16  6:12   ` Max T. Woodbury
  0 siblings, 1 reply; 16+ messages in thread
From: Eric D. Mudama @ 2004-07-12 18:35 UTC (permalink / raw)
  To: Max T. Woodbury; +Cc: linux-ide

On Mon, Jul 12 at 13:52, Max T. Woodbury wrote:
>Still, why would PIO mode be unsafe?  (I can see slower, but I don't
>expect speed from this beast.  Oh well.  Thanks for the pointer.)

PIO has no data integrity check, so bogus cables that glitch the data
will not be detected.  Not sure if that is what he was talking about,
but is definitely a problem for PIO.

--eric


-- 
Eric D. Mudama
edmudama@mail.bounceswoosh.org


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ide-io.c, ide_do_request -- race condition?
  2004-07-12 18:35 ` Eric D. Mudama
@ 2004-07-16  6:12   ` Max T. Woodbury
  2004-07-16  7:02     ` Jens Axboe
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Max T. Woodbury @ 2004-07-16  6:12 UTC (permalink / raw)
  To: Eric D. Mudama; +Cc: linux-ide

"Eric D. Mudama" wrote:
> 
> On Mon, Jul 12 at 13:52, Max T. Woodbury wrote:
> >Still, why would PIO mode be unsafe?  (I can see slower, but I don't
> >expect speed from this beast.  Oh well.  Thanks for the pointer.)
> 
> PIO has no data integrity check, so bogus cables that glitch the data
> will not be detected.  Not sure if that is what he was talking about,
> but is definitely a problem for PIO.

Huh? Unless something major has changed since the last time I looked at
DMA hardware (and it has been a few years), DMA uses the same transfer
sequence from the device's point of view as PIO.  The fact that the
transfer is under the control of another device rather than a program 
should be transparent to the target device.  Impedance mismatches,
reflections and constructive and destructive interference caused by
cable problems don't care about who's in control of the busses.

I can see a possible problem with cache consistency causing problems
with PIO, but there are similar (albeit in some sense inverted or
reversed) problems with DMA.

max

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ide-io.c, ide_do_request -- race condition?
  2004-07-16  6:12   ` Max T. Woodbury
@ 2004-07-16  7:02     ` Jens Axboe
  2004-07-16 16:33       ` Max T. Woodbury
  2004-07-16  7:06     ` Jeff Garzik
  2004-07-16 17:45     ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2004-07-16  7:02 UTC (permalink / raw)
  To: Max T. Woodbury; +Cc: Eric D. Mudama, linux-ide

On Fri, Jul 16 2004, Max T. Woodbury wrote:
> "Eric D. Mudama" wrote:
> > 
> > On Mon, Jul 12 at 13:52, Max T. Woodbury wrote:
> > >Still, why would PIO mode be unsafe?  (I can see slower, but I don't
> > >expect speed from this beast.  Oh well.  Thanks for the pointer.)
> > 
> > PIO has no data integrity check, so bogus cables that glitch the data
> > will not be detected.  Not sure if that is what he was talking about,
> > but is definitely a problem for PIO.
> 
> Huh? Unless something major has changed since the last time I looked at
> DMA hardware (and it has been a few years), DMA uses the same transfer
> sequence from the device's point of view as PIO.  The fact that the
> transfer is under the control of another device rather than a program 
> should be transparent to the target device.  Impedance mismatches,
> reflections and constructive and destructive interference caused by
> cable problems don't care about who's in control of the busses.
> 
> I can see a possible problem with cache consistency causing problems
> with PIO, but there are similar (albeit in some sense inverted or
> reversed) problems with DMA.

Yes that's very clever of you. But read what Eric writes - PIO has no
data integrity check. DMA transfers are crc'ed so you know if something
goes bad between device and host in the data phase, with PIO you do not.
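
To make the point concrete, here is a small sketch of how a CRC sent along
with a transfer exposes a glitch on the cable.  It is an illustration, not
the ATA algorithm: it uses the generator polynomial x^16 + x^12 + x^5 + 1,
which as far as I know is the one the ATA spec names for Ultra DMA, but
the seed, the bit ordering and the byte-wise processing here are the common
CRC-16/CCITT conventions and are assumptions.  The function names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 with polynomial 0x1021 (x^16 + x^12 + x^5 + 1),
 * most significant bit first, starting from the given seed. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len, uint16_t seed)
{
    uint16_t crc = seed;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)data[i] << 8;
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* The receiver recomputes the CRC over what it actually got and compares
 * it with the value the sender computed over what it meant to send. */
static int transfer_is_intact(const uint8_t *received, size_t len,
                              uint16_t sender_crc)
{
    return crc16_ccitt(received, len, 0xFFFF) == sender_crc;
}
```

A single flipped bit anywhere in the buffer changes the recomputed value,
so transfer_is_intact() returns 0 and the error can be reported.  A PIO
transfer carries no such value, so the same flipped bit arrives unnoticed.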

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ide-io.c, ide_do_request -- race condition?
  2004-07-16  7:02     ` Jens Axboe
@ 2004-07-16 16:33       ` Max T. Woodbury
  2004-07-16 17:57         ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Max T. Woodbury @ 2004-07-16 16:33 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Eric D. Mudama, linux-ide

Jens Axboe wrote:
> 
> On Fri, Jul 16 2004, Max T. Woodbury wrote:
> > "Eric D. Mudama" wrote:
> > >
> > > On Mon, Jul 12 at 13:52, Max T. Woodbury wrote:
> > > >Still, why would PIO mode be unsafe?  (I can see slower, but I don't
> > > >expect speed from this beast.  Oh well.  Thanks for the pointer.)
> > >
> > > PIO has no data integrity check, so bogus cables that glitch the data
> > > will not be detected.  Not sure if that is what he was talking about,
> > > but is definitely a problem for PIO.
> >
> > Huh? Unless something major has changed since the last time I looked at
> > DMA hardware (and it has been a few years), DMA uses the same transfer
> > sequence from the device's point of view as PIO.  The fact that the
> > transfer is under the control of another device rather than a program
> > should be transparent to the target device.  Impedance mismatches,
> > reflections and constructive and destructive interference caused by
> > cable problems don't care about who's in control of the busses.
> >
> > I can see a possible problem with cache consistency causing problems
> > with PIO, but there are similar (albeit in some sense inverted or
> > reversed) problems with DMA.
> 
> Yes that's very clever of you. But read what Eric writes - PIO has no
> data integrity check. DMA transfers are crc'ed so you know if something
> goes bad between device and host in the data phase, with PIO you do not.

Sorry, NO.  From the device point of view, DMA and PIO are indistinguishable.
Both have CRCs on some busses and neither have CRCs on others.  There are
ALWAYS CRCs on transfers across the drive interface cables.  This is controlled
by the IDE/CPU interface chip and not by the DMA hardware.  The transfer of
the CRC is triggered by the termination of data transfer which happens with
both DMA and PIO.  These are design issues that go back at least thirty
years and are generally well understood.

Bartlomiej was talking about timing register setup problems.  They can be
messed up for either mode.  They are separate for DMA and PIO.  His point
was that the default driver left PIO timing setup to the hardware and BIOS
while the special drivers sometimes included more specific initialization.
These are fairly new problems arising from multi-tiered bus technologies
like PCI.

max

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: ide-io.c, ide_do_request -- race condition?
  2004-07-16 16:33       ` Max T. Woodbury
@ 2004-07-16 17:57         ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2004-07-16 17:57 UTC (permalink / raw)
  To: Max T. Woodbury; +Cc: Eric D. Mudama, linux-ide

On Fri, Jul 16 2004, Max T. Woodbury wrote:
> Jens Axboe wrote:
> > 
> > On Fri, Jul 16 2004, Max T. Woodbury wrote:
> > > "Eric D. Mudama" wrote:
> > > >
> > > > On Mon, Jul 12 at 13:52, Max T. Woodbury wrote:
> > > > >Still, why would PIO mode be unsafe?  (I can see slower, but I don't
> > > > >expect speed from this beast.  Oh well.  Thanks for the pointer.)
> > > >
> > > > PIO has no data integrity check, so bogus cables that glitch the data
> > > > will not be detected.  Not sure if that is what he was talking about,
> > > > but is definitely a problem for PIO.
> > >
> > > Huh? Unless something major has changed since the last time I looked at
> > > DMA hardware (and it has been a few years), DMA uses the same transfer
> > > sequence from the device's point of view as PIO.  The fact that the
> > > transfer is under the control of another device rather than a program
> > > should be transparent to the target device.  Impedance mismatches,
> > > reflections and constructive and destructive interference caused by
> > > cable problems don't care about who's in control of the busses.
> > >
> > > I can see a possible problem with cache consistency causing problems
> > > with PIO, but there are similar (albeit in some sense inverted or
> > > reversed) problems with DMA.
> > 
> > Yes that's very clever of you. But read what Eric writes - PIO has no
> > data integrity check. DMA transfers are crc'ed so you know if something
> > goes bad between device and host in the data phase, with PIO you do not.
> 
> Sorry, NO.  From the device's point of view, DMA and PIO are
> indistinguishable.

That's nonsense. Even the commands are different.

> Both have CRCs on some busses and neither has CRCs on others.  There
> are ALWAYS CRCs on transfers across the drive interface cables.  This
> is controlled by the IDE/CPU interface chip and not by the DMA
> hardware.  The transfer of the CRC is triggered by the termination of
> data transfer which happens with both DMA and PIO.  These are design
> issues that go back at least thirty years and are generally well
> understood.

So where is the ICRC bit on the non-DMA read commands? I repeat: if you
are in PIO mode, you will not notice CRC errors in data transferred
between host and device.

-- 
Jens Axboe
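
For readers following the ICRC argument, here is a minimal sketch of
how a host could test for the condition Jens describes. It assumes the
usual ATA register layout (ERR is bit 0 of the status register, ICRC is
bit 7 of the error register); the macro and function names are
hypothetical and are not taken from ide-io.c. The bit is only
meaningful after an Ultra DMA transfer, which is the point being made:
a PIO read has no equivalent indication.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, following the ATA register layout. */
#define ATA_STATUS_ERR 0x01u /* status register: command ended in error */
#define ATA_ERROR_ICRC 0x80u /* error register: interface CRC error     */

/*
 * Hypothetical helper: returns true when the status/error register
 * pair read after an Ultra DMA command reports an interface CRC error.
 */
bool udma_crc_error(uint8_t status, uint8_t error)
{
    return (status & ATA_STATUS_ERR) != 0 &&
           (error & ATA_ERROR_ICRC) != 0;
}
```

A caller would read both registers once the command completes and
retry (or drop to a slower transfer mode) when the helper returns true.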



* Re: ide-io.c, ide_do_request -- race condition?
  2004-07-16  6:12   ` Max T. Woodbury
  2004-07-16  7:02     ` Jens Axboe
@ 2004-07-16  7:06     ` Jeff Garzik
  2004-07-16 17:45     ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 16+ messages in thread
From: Jeff Garzik @ 2004-07-16  7:06 UTC (permalink / raw)
  To: Max T. Woodbury; +Cc: Eric D. Mudama, linux-ide

Max T. Woodbury wrote:
> "Eric D. Mudama" wrote:
> 
>>On Mon, Jul 12 at 13:52, Max T. Woodbury wrote:
>>
>>>Still, why would PIO mode be unsafe?  (I can see slower, but I don't
>>>expect speed from this beast.  Oh well.  Thanks for the pointer.)
>>
>>PIO has no data integrity check, so bogus cables that glitch the data
>>will not be detected.  Not sure if that is what he was talking about,
>>but is definitely a problem for PIO.
> 
> 
> Huh? Unless something major has changed since the last time I looked at
> DMA hardware (and it has been a few years), DMA uses the same transfer
> sequence from the device's point of view as PIO.  The fact that the


Ultra DMA does CRC; PIO does not.  Thus cable glitches are detected
under Ultra DMA but go unnoticed under PIO.

	Jeff
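
To illustrate why a per-transfer CRC catches cable glitches, here is a
rough sketch of a 16-bit CRC accumulated over the 16-bit words of a
data burst. It assumes the generator polynomial x^16 + x^12 + x^5 + 1
and the seed 0x4ABA that the ATA specification gives for Ultra DMA; the
bit ordering below is a plain MSB-first choice and is an assumption, so
the values need not match what real hardware puts on the wire. All
names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UDMA_CRC_POLY 0x1021u /* x^16 + x^12 + x^5 + 1 */
#define UDMA_CRC_SEED 0x4ABAu /* initial CRC register value */

/* Accumulate a CRC over `count` 16-bit words (MSB-first, assumed). */
uint16_t crc16_words(const uint16_t *words, size_t count)
{
    uint16_t crc = UDMA_CRC_SEED;

    for (size_t i = 0; i < count; i++) {
        crc ^= words[i];
        for (int bit = 0; bit < 16; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ UDMA_CRC_POLY);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/*
 * Both ends compute the CRC over what they saw on the cable; a
 * mismatch means the burst was corrupted in transit.
 */
bool burst_intact(const uint16_t *sent, const uint16_t *received,
                  size_t count)
{
    return crc16_words(sent, count) == crc16_words(received, count);
}
```

Any single flipped bit changes the result, so the mismatch is visible
to the host. A PIO transfer carries no such check, and the same flipped
bit would simply be stored.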




* Re: ide-io.c, ide_do_request -- race condition?
  2004-07-16  6:12   ` Max T. Woodbury
  2004-07-16  7:02     ` Jens Axboe
  2004-07-16  7:06     ` Jeff Garzik
@ 2004-07-16 17:45     ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 16+ messages in thread
From: Benjamin Herrenschmidt @ 2004-07-16 17:45 UTC (permalink / raw)
  To: Max T. Woodbury; +Cc: Eric D. Mudama, linux-ide


> Huh? Unless something major has changed since the last time I looked at
> DMA hardware (and it has been a few years), DMA uses the same transfer
> sequence from the device's point of view as PIO.  The fact that the
> transfer is under the control of another device rather than a program 
> should be transparent to the target device.  Impedance mismatches,
> reflections and constructive and destructive interference caused by
> cable problems don't care about who's in control of the busses.
> 
> I can see a possible problem with cache consistency causing problems
> with PIO, but there are similar (albeit in some sense inverted or
> reversed) problems with DMA.

Actually, it's not the same transfer sequence (and it's not the same
ATA commands either). Look for an ATA spec and check out the signals
on the wire; DMA is different at the device level as well.

In addition, U/DMA does CRC.

Ben.
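
As a small illustration of the "not the same ATA commands" point, the
sketch below sorts a few well-known ATA opcodes by the transfer
protocol they use. The opcode values are the standard ones (for example
READ SECTOR(S) is 0x20 and READ DMA is 0xC8); the enum and function
names are hypothetical, and the list is deliberately incomplete.

```c
#include <stdint.h>

enum ata_xfer_kind { ATA_XFER_PIO, ATA_XFER_DMA, ATA_XFER_OTHER };

/* Hypothetical helper: classify a handful of ATA read/write opcodes. */
enum ata_xfer_kind classify_ata_command(uint8_t opcode)
{
    switch (opcode) {
    case 0x20: /* READ SECTOR(S)  */
    case 0x30: /* WRITE SECTOR(S) */
    case 0xC4: /* READ MULTIPLE   */
    case 0xC5: /* WRITE MULTIPLE  */
        return ATA_XFER_PIO;
    case 0xC8: /* READ DMA  */
    case 0xCA: /* WRITE DMA */
        return ATA_XFER_DMA;
    default:   /* anything not listed in this sketch */
        return ATA_XFER_OTHER;
    }
}
```

The device therefore knows from the opcode alone which protocol the
host expects, before any data moves.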




end of thread, other threads:[~2004-07-16 17:58 UTC | newest]

Thread overview: 16+ messages
2004-07-05 22:51 ide-io.c, ide_do_request -- race condition? Max T. Woodbury
2004-07-07 19:43 ` Bartlomiej Zolnierkiewicz
2004-07-10 19:25   ` Max T. Woodbury
2004-07-10 20:07     ` Bartlomiej Zolnierkiewicz
2004-07-11 15:02       ` Max T. Woodbury
2004-07-12 15:15       ` Max T. Woodbury
2004-07-12 15:47         ` Bartlomiej Zolnierkiewicz
  -- strict thread matches above, loose matches on Subject: below --
2004-07-11 14:38 Max T. Woodbury
2004-07-12 17:52 Max T. Woodbury
2004-07-12 18:35 ` Eric D. Mudama
2004-07-16  6:12   ` Max T. Woodbury
2004-07-16  7:02     ` Jens Axboe
2004-07-16 16:33       ` Max T. Woodbury
2004-07-16 17:57         ` Jens Axboe
2004-07-16  7:06     ` Jeff Garzik
2004-07-16 17:45     ` Benjamin Herrenschmidt
