From: Resident Boxholder <resid@boxho.com>
To: undisclosed-recipients:;
Subject: Re: Problem: IDE data corruption with VIA chipsets on 2.4.20-19.8+others
Date: Thu, 11 Sep 2003 19:58:19 -0400
Message-ID: <3F610C1B.6000307@boxho.com>
In-Reply-To: <003601c37826$26d8d220$5d74ad8e@hyperwolf>
>VIA KT333
>VIA KT400
>SiS 745
>The problem definitely occurs on 2.4.20-19.8
B> I saw that on a VIA KT266 with kernel 2.6.0-test5, and it
seems to go away if I use the anticipatory I/O scheduler
instead of the deadline scheduler in the kernel config and
don't use aggressive memory settings in the CMOS setup. And
USB on a server, what's that for?

Did you flash all your BIOSes before trying anything?
I saw ACPI-related problems go away after flashing
an Award BIOS on another server. It seems there was
no graceful failure or default when the ACPI ID info is not
in the BIOS, so the improvement was drastic. ACPI problems
cause USB to go clunk when IDE is active, and downstream
conflicts like that are not too informative or productive
to look at if flashing the BIOS would "fix the code" as
if by magic.
-Bob
Eric Bickle wrote:
>....
>RedHat Linux 8....two IDE
>80gig hard-drives in a Software RAID-1... they all crash, they all
>run RedHat Linux, and they all use IDE.
>
>We also get various IDE errors in the /var/log files, such as the ones
>below (plus another weird IRQ error, even though we're running a stock
>config, i.e. no PCI devices other than one NIC). No idea if they're
>related to the IDE thing <sigh>. The IRQ thing appears to be USB related.
>== during server runtime ==
>kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
>kernel: hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=150637065,
>sector=150636992
>kernel: end_request: I/O error, dev 16:01 (hdc), sector 150636992
>kernel: hdc: dma_intr: status=0x53 { DriveReady SeekComplete Index Error }
>kernel: hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=150630007,
>sector=150629920
>kernel: end_request: I/O error, dev 16:01 (hdc), sector 150629920
>== during server boot ==
>kernel: usb.c: new USB bus registered, assigned bus number 3
>kernel: hub.c: USB hub found
>kernel: hub.c: 2 ports detected
>kernel: PCI: Found IRQ 3 for device 00:10.2
>kernel: IRQ routing conflict for 00:10.0, have irq 9, want irq 3
>kernel: IRQ routing conflict for 00:10.1, have irq 9, want irq 3
>kernel: IRQ routing conflict for 00:10.2, have irq 9, want irq 3
>kernel: IRQ routing conflict for 00:10.3, have irq 9, want irq 3
>kernel: usb-uhci.c: USB UHCI at I/O 0x9400, IRQ 9
>
>Usually after that happens (the IDE errors) it will crash fairly soon. Other
>times no errors are logged but the server is *extremely* slow - likely due
>to disk performance. My guess is that there is some sort of internal IDE
>error (A "CorrectableError"?) that the kernel is recovering from and not
>writing a message to the log.
>
>Once in a while after a major kernel panic or reboot, the system refused to
>reboot at all, going into an endless cycle of disk checking. We tried
>various brand-name RAM suppliers in case it was RAM corruption - no luck.
>Everything points to IDE.
>
>We tried various motherboards, including the following chipsets:
>VIA KT333
>VIA KT400
>SiS 745
>
>The problem definitely occurs on 2.4.20-19.8, but also on some earlier kernel
>versions (which I can't remember). It only happens during extremely
>high disk usage (I've seen it fail with about 8% CPU and memory usage...)
>It's not our database server - it runs fine on our older "BX-Boards" (the
>older 300 MHz Intels) and on various Windows NT/2000 boxes. The configuration
>of the database server is exactly the same as on the other servers (I
>double checked at least 5 times...)
>
>After about 4 months of random server crashes and corruption, various trials
>and testing, I'm fairly certain it must be a hardware interaction between
>the IDE hard-drives and the Linux kernel. We've gone through 20 IDE
>hard-drives that work fine and look fine after the crashes - definitely not
>a physical hard-drive problem. Definitely not a motherboard problem because
>we've been through 4 different motherboards, different manufacturers and 3
>different chipsets.
>
>Ideas..? Help..? :( The clients are going to be banging down the doors any
>day if we don't get operational servers, and so far the only solution that I
>believe works 100% is to install Windows (doh). As a last resort I'm asking
>to see if any of you Kernel-gurus have ideas :-)
>
>Thanks,
>-Eric Bickle
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>
>
>
>
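For anyone reading the quoted dma_intr lines, here is a rough sketch of how
the status= and error= bytes map to the names printed inside the braces. The
bit tables are an assumption based on the usual ATA register layout and the
names the 2.4-era IDE driver prints; parse_line() and decode() are
hypothetical helper names, not anything from the kernel.

```python
import re

# Assumed ATA status register bits, named the way the log prints them.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Assumed ATA error register bits, named the way the log prints them.
ERROR_BITS = {
    0x80: "BadSector",
    0x40: "UncorrectableError",
    0x20: "MediaChanged",
    0x10: "SectorIdNotFound",
    0x08: "MediaChangeRequested",
    0x04: "DriveStatusError",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}

# Matches e.g. "hdc: dma_intr: status=0x51" or "hdc: dma_intr: error=0x40".
LINE_RE = re.compile(r"(hd[a-z]): dma_intr: (status|error)=0x([0-9a-fA-F]{2})")


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in table.items() if value & bit]


def parse_line(line):
    """Return (drive, kind, value, names) for a dma_intr line, else None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    drive, kind, hexval = m.groups()
    value = int(hexval, 16)
    table = STATUS_BITS if kind == "status" else ERROR_BITS
    return drive, kind, value, decode(value, table)


if __name__ == "__main__":
    sample = "kernel: hdc: dma_intr: status=0x53 { DriveReady SeekComplete Index Error }"
    print(parse_line(sample))
```

Under this mapping, a drive reporting a corrected error would show status
bit 0x04 (CorrectedError), which is not set in either 0x51 or 0x53 above.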