From: Daniel Tram Lux <daniel@starbattle.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Rob Love <rml@ximian.com>,
Marcelo Tosatti <marcelo.tosatti@cyclades.com>,
steve@drifthost.com, James Bourne <jbourne@hardrock.org>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Gergely Tamas <dice@mfa.kfki.hu>,
Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Subject: Re: no DRQ after issuing WRITE was Re: 2.4.23-uv3 patch set released
Date: Sat, 03 Jan 2004 20:27:54 +0100 [thread overview]
Message-ID: <3FF717BA.2000205@starbattle.com> (raw)
In-Reply-To: <Pine.LNX.4.58.0401031024090.20823@home.osdl.org>
Linus Torvalds wrote:
>On Sat, 3 Jan 2004, Daniel Tram Lux wrote:
>
>
>>I tried setting the timeout up as a first fix, it also decreased the
>>frequency of the error,
>>however it did not get rid of the error.
>>
>>
>
>That is scary beyond belief.
>
>Basically you doubled your timeout, and you certainly _should_ have seen
>at least one more status read from the extra 50 msec you waited. And since
>your patch (which fixes it) only adds _one_ status read, that implies that
>the extra 50 msec timeout didn't get a single time through the loop.
>
>
>
>>The device the error occurs with is a cf card. The error also occurs
>>much more frequently in 2.4.23 than in 2.4.20 (but it can be provoked in
>>2.4.20). Neither use the preemption patch and both are from kernel.org.
>>The platform is based on an AMD Elan processor which is a 486 compatible
>>processor, running at 133 Mhz. The IDE subsytem does not use any extra
>>drivers and is not a PCI ide chipset.
>>
>>
>
>Yes, that path is only used for PIO writes, so it's clearly not a
>high-performance IDE setup. Adn yes, I guess with a slow enough CPU and
>enough interrupt load, you literally could spend 50ms just handling irqs.
>Still, that is pretty damn scary. But I guess the load:
>
>
>
>> ... there is a flood ping running and the machine is being
>>flood pinged + there is traffic on three serial ports (RS485).
>>
>>
>
>is pretty extreme.
>
These systems are going to Texas and I am in Denmark, so I wanted to be
sure that the
system also works under heavy load. This is not the typical use however ;-).
>
>
>
>>I saw two possibilities, either disabeling the interrupts while first
>>reading the status and then checking the timeout, after which the
>>interrupts would be enabled again. Or to just make one extra check after
>>the timout has expired because that is cheaper than returning, failing
>>and then resetting the drive. After I applied my patch (using the
>>5*HZ/100 timeout) my test ran for a full weekend without giving the
>>timeout error.
>>
>>
>
>Ok, I think I'm convinced. That loop and the IDE usage of interrupt
>enables is just crap. I don't think your addition is very pretty, but the
>alternative is to rewrite the loop to be sane, which isn't going to happen
>in a stable kernel.
>
>How about a slightly more minimal patch, though? Ie does this work for
>you?
>
> Linus
>
>----
>===== drivers/ide/ide-iops.c 1.18 vs edited =====
>--- 1.18/drivers/ide/ide-iops.c Wed Jun 11 18:23:09 2003
>+++ edited/drivers/ide/ide-iops.c Sat Jan 3 10:54:21 2004
>@@ -647,6 +647,15 @@
> timeout += jiffies;
> while ((stat = hwif->INB(IDE_STATUS_REG)) & BUSY_STAT) {
> if (time_after(jiffies, timeout)) {
>+ /*
>+ * One last read after the timeout in case
>+ * heavy interrupt load made us not make any
>+ * progress during the timeout..
>+ */
>+ stat = hwif->INB(IDE_STATUS_REG);
>+ if (!(stat & BUSY_STAT))
>+ break;
>+
> local_irq_restore(flags);
> *startstop = DRIVER(drive)->error(drive, "status timeout", stat);
> return 1;
>
>
The patch seems functionally to be equivalent, I can test it first thing
on monday. I agree that the loop
should be rewritten too, it is neither clear for reading nor very
accurate when it comes to measuring
the timeout.
I also was planning to test this patch with a reduced timeout of
3*HZ/100, do you think that is worth
testing?
Daniel
P.S.
I will try to make my patches more minimal in future ;-).
next prev parent reply other threads:[~2004-01-03 19:28 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2003-12-29 7:58 2.4.23-uv3 patch set released James Bourne
2003-12-30 11:36 ` Marcelo Tosatti
2003-12-30 19:59 ` Linus Torvalds
2003-12-30 21:46 ` no DRQ after issuing WRITE was " Marcelo Tosatti
2003-12-30 21:57 ` Linus Torvalds
2003-12-30 22:21 ` Marcelo Tosatti
2003-12-30 23:18 ` Eric D. Mudama
2003-12-30 22:23 ` Rob Love
2003-12-30 22:54 ` Linus Torvalds
2003-12-30 22:58 ` Rob Love
2004-01-03 11:22 ` Daniel Tram Lux
2004-01-03 18:57 ` Linus Torvalds
2004-01-03 19:27 ` Daniel Tram Lux [this message]
2004-01-03 22:10 ` Pavel Machek
2003-12-31 6:10 ` James Bourne
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3FF717BA.2000205@starbattle.com \
--to=daniel@starbattle.com \
--cc=B.Zolnierkiewicz@elka.pw.edu.pl \
--cc=dice@mfa.kfki.hu \
--cc=jbourne@hardrock.org \
--cc=linux-kernel@vger.kernel.org \
--cc=marcelo.tosatti@cyclades.com \
--cc=rml@ximian.com \
--cc=steve@drifthost.com \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.