* Disk corruption - Abit KT7, 2.2.19+ide patches
@ 2002-01-15 20:23 Nicholas Lee
       [not found] ` <20020115205116.GH51648@niksula.cs.hut.fi>
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Nicholas Lee @ 2002-01-15 20:23 UTC (permalink / raw)
  To: linux-kernel



Following up on 
http://marc.theaimsgroup.com/?l=linux-kernel&m=99889965423508&w=2

Still running the same kernel:

nic@hoppa:/var/log$ uname -a
Linux hoppa 2.2.19 #1 Mon Sep 17 12:56:24 NZST 2001 i686 unknown
nic@hoppa:/var/log$ cat /proc/ide/drivers
ide-cdrom version 4.58
ide-disk version 1.09

New HDD:
nic@hoppa:/var/log$ cat /proc/ide/hda/model
FUJITSU MPG3307AH E

nic@hoppa:/var/log$ sudo hdparm -v /dev/hda

/dev/hda:
 multcount    =  0 (off)
 I/O support  =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  0 (off)
 keepsettings =  0 (off)
 nowerr       =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 3737/255/63, sectors = 60046560, start = 0



I've discovered what seems like some disk corruption.

nic@hoppa:/var/log$ ls -l /var/log/messages
-rw-r-----    1 root     adm        996992 Jan 16 08:56 /var/log/messages

is full of:

Jan 16 08:54:42 hoppa kernel: hda: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jan 16 08:54:42 hoppa kernel: hda: read_intr: error=0x40 { UncorrectableError }, LBAsect=28084900, sector=2959177
Jan 16 08:54:42 hoppa kernel: end_request: I/O error, dev 03:07 (hda), sector 2959177
Jan 16 08:54:48 hoppa kernel: hda: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jan 16 08:54:48 hoppa kernel: hda: read_intr: error=0x40 { UncorrectableError }, LBAsect=28084900, sector=2959177
Jan 16 08:54:48 hoppa kernel: end_request: I/O error, dev 03:07 (hda), sector 2959177
Jan 16 08:56:14 hoppa kernel: hda: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
Jan 16 08:56:14 hoppa kernel: hda: read_intr: error=0x40 { UncorrectableError }, LBAsect=28084900, sector=2959177
Jan 16 08:56:14 hoppa kernel: end_request: I/O error, dev 03:07 (hda), sector 2959177
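The repeated errors above all name the same location. As a rough sketch (not part of the original report), the following Python pulls the LBAsect/sector pairs out of kernel lines of this shape, deduplicates them, and subtracts one from the other. Reading that difference as the starting offset of the partition is an assumption about how the 2.2 IDE driver fills in these two fields; the function name and the regular expression are illustrative only.

```python
import re

# Sample lines copied from the /var/log/messages excerpt above.
LOG = """\
Jan 16 08:54:42 hoppa kernel: hda: read_intr: error=0x40 { UncorrectableError }, LBAsect=28084900, sector=2959177
Jan 16 08:54:48 hoppa kernel: hda: read_intr: error=0x40 { UncorrectableError }, LBAsect=28084900, sector=2959177
Jan 16 08:56:14 hoppa kernel: hda: read_intr: error=0x40 { UncorrectableError }, LBAsect=28084900, sector=2959177
"""

PATTERN = re.compile(r"LBAsect=(\d+), sector=(\d+)")


def bad_sectors(log_text):
    """Return the sorted, deduplicated (LBAsect, sector) pairs in log_text."""
    return sorted({(int(lba), int(rel)) for lba, rel in PATTERN.findall(log_text)})


pairs = bad_sectors(LOG)
for lba, rel in pairs:
    # Assumption: LBAsect counts from the start of the disk,
    # while sector counts from the start of the partition.
    print(f"LBA {lba}, partition sector {rel}, offset {lba - rel}")
```

If every retry names the same pair, the set collapses to a single entry, which would suggest one unreadable location being retried rather than many distinct bad sectors.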



also:



Jan 13 09:29:29 hoppa kernel: hda: write_intr: ^@^@^@^@^@^@^@^@^@^@.....

and



Dec 23 06:25:07 hoppa squid[200]: Pinger socket opened on FD 13 
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Package: gdm
Status: install ok unpacked
Priority: optional
Section: x11
[...]


plus some other binary data.

These Dec 23 entries come AFTER the initial Jan 13 entries at the start of the log.


This rotation of messages has been running since:
nic@hoppa:/var/log$ ls -l messages.0
-rw-r-----    1 root     adm        185510 Jan 13 06:25 messages.0


messages.0 also seems to include corrupted data.


lsof output:

nic@hoppa:/var/log$ sudo lsof | grep "log/message"
syslogd     143  root   17w   REG        3,7  997296    172812 /var/log/messages


uptime:

nic@hoppa:/var/log$ w
  9:11am  up 40 days,  3:59,  2 users,  load average: 1.03, 1.02, 1.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
nic      pts/1    inktiger.kpac.co  8:53am  0.00s  0.11s  0.02s  w 


The only things of note running are a distributed net client, Samba and
a Squid cache.

For the last three days, though, there has been somewhat higher I/O load,
with some 100Mb+ files being copied across our WAN via "rsync -vp -aze ssh
{...}".



The Fujitsu drive that replaced the Seagate drive definitely handles
the rough conditions a lot better. Previously, with the Seagate, the bus
would just lock up and require a hard power reset.


The IDE cables have been replaced and checked to make sure they are
fully seated in the sockets at both ends. There are NO other IDE devices
on either IDE channel, i.e. the HDD is the only IDE device.


The BIOS on this Abit KT7 is recent as of November when the system had
the hard drive replaced and was last rebooted.




I have another system with much more load and the same motherboard: two
HDDs, a CD-ROM and a large application load.

About once a week on average, it logs:
Jan 13 22:44:35 woodcut kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 13 22:44:35 woodcut kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
Jan 13 22:55:40 woodcut kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Jan 13 22:55:40 woodcut kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
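The two machines report different register values, and the bit names in braces are just a decoding of those bytes. A small sketch of that decoding follows; the bit-to-name tables are written from memory of how the Linux IDE driver of this era labels the ATA status and error registers, so treat them as an assumption rather than a copy of the driver source.

```python
# Assumed mapping of ATA status register bits to the names the driver prints.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Assumed mapping for the error register, in the order the names appear in
# the logs. Bit 0x80 is handled separately in decode_error().
ERROR_BITS = [
    (0x04, "DriveStatusError"),
    (0x80, None),
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode_status(value):
    """Return the names of all status bits set in value."""
    return [name for bit, name in STATUS_BITS if value & bit]


def decode_error(value):
    """Return the names of all error bits set in value."""
    names = []
    for bit, name in ERROR_BITS:
        if not value & bit:
            continue
        if bit == 0x80:
            # Assumption: shown as BadCRC when the abort bit (0x04) is also
            # set, and as BadSector otherwise.
            name = "BadCRC" if value & 0x04 else "BadSector"
        names.append(name)
    return names


print(decode_status(0x59), decode_error(0x40))  # hoppa
print(decode_status(0x51), decode_error(0x84))  # woodcut
```

Read this way, hoppa's 0x40 points at data the drive could not read back from the media, while woodcut's 0x84 points at a CRC failure on the transfer between drive and controller, so the two machines may well be showing different faults.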




-- 
Nicholas Lee - nj.lee at plumtree.co dot nz, somewhere on the fish Maui caught.
gpg. 8072 4F86 EDCD 4FC1 18EF  5BDD 07B0 9597 6D58 D70C            icq. 1612865 

                         Quixotic Eccentricity



* Re: Disk corruption - Abit KT7, 2.2.19+ide patches
       [not found]   ` <20020115211032.GC598@inktiger.kiwa.co.nz>
@ 2002-01-15 21:37     ` Nicholas Lee
       [not found]     ` <20020115214049.GI51648@niksula.cs.hut.fi>
  1 sibling, 0 replies; 11+ messages in thread
From: Nicholas Lee @ 2002-01-15 21:37 UTC (permalink / raw)
  To: Ville Herva; +Cc: linux-kernel


Reposting, as the message to the linux-kernel list bounced.

Problem with:
<linux-kernel@vger.rutgers.edu>:
128.6.14.121 does not like recipient.
Remote host said: 550 <linux-kernel@vger.rutgers.edu>... User unknown
Giving up on 128.6.14.121.


On Wed, Jan 16, 2002 at 10:10:32AM +1300, Nicholas Lee wrote:
> On Tue, Jan 15, 2002 at 10:51:16PM +0200, Ville Herva wrote:
> > 
> > We are seeing corruption on KT7-RAID as well. But it's HPT370 only, and
> > looks to be pci transfer corruption. It seems dependent on which pci slots
> > are populated (short story - expect full coverage on linux-kernel later this
> > week as we finish our tests.)
> 
> Interesting. I'd had a feeling that it's related to the PCI bus and
> network traffic.
> 
> 
> > We tried a number of bioses.
> > 
> > A question: what pci cards do you have and in which slots?
> 
> 
> Hoppa is the current problem machine; woodcut only logs a periodic problem
> notice, even though it has the higher load. It is in a different city.
> 
> 
> 
> [nic@woodcut:~] sudo lspci
> 00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
> 00:01.0 PCI bridge: VIA Technologies, Inc.: Unknown device 8305
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
> 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)
> 00:07.2 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 10)
> 00:07.3 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 10)
> 00:07.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 30)
> 00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RT8139 (rev 10)
> 00:0f.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RT8139 (rev 10)
> 00:11.0 VGA compatible controller: Number 9 Computer Company Imagine 128 T2R [Ticket to Ride]
> 
> 
> eth0      Interrupt:11 Base address:0xdc00 
> 
> 
> 00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RT8139 (rev 10)
> 	Subsystem: Realtek Semiconductor Co., Ltd. RT8139
> 	Flags: bus master, medium devsel, latency 32, IRQ 11
> 	I/O ports at dc00
> 	Memory at d6811000 (32-bit, non-prefetchable)
> 
> 
> 
> nic@hoppa:~$ sudo lspci
> 00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0305 (rev 02)
> 00:01.0 PCI bridge: VIA Technologies, Inc.: Unknown device 8305
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
> 00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10)
> 00:07.2 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 10)
> 00:07.3 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 10)
> 00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 30)
> 00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RT8139 (rev 10)
> 01:00.0 VGA compatible controller: Silicon Integrated Systems [SiS] 86C326 (rev d2)
> 
> 
> 00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RT8139 (rev 10)
> 	Subsystem: Realtek Semiconductor Co., Ltd. RT8139
> 	Flags: bus master, medium devsel, latency 32, IRQ 11
> 	I/O ports at ec00
> 	Memory at e6800000 (32-bit, non-prefetchable)
> 
> eth0      Interrupt:11 Base address:0xec00 
> 
> 
> 
> > (Your problem may be - and probably is - a completely separate issue from
> > ours, but I'd like to know.)
> 
> Looks like the active NIC is in the same slot on both machines. 
> 
> > 
> > Alternatively, you may want to try 
> > (1) 2.2.20pre2 that notably includes Via chipset fixes
> > (2) The ide patch from Krzysztof Oledzki <ole@ans.pl>, which includes
> >     2.4 Via ide driver backport (http://www.ans.pl/ide)
> 
> I'll try this, but it'll have to be next week. Busy at the moment.
> 
> I suspect that the drive is now full of bad sectors, it might be
> troublesome compiling. 8-\
> 
> -- 
> Nicholas Lee - nj.lee at plumtree.co dot nz, somewhere on the fish Maui caught.
> gpg. 8072 4F86 EDCD 4FC1 18EF  5BDD 07B0 9597 6D58 D70C            icq. 1612865 
> 
>                          Quixotic Eccentricity

-- 
Nicholas Lee - nj.lee at plumtree.co dot nz, somewhere on the fish Maui caught.
gpg. 8072 4F86 EDCD 4FC1 18EF  5BDD 07B0 9597 6D58 D70C            icq. 1612865 

                         Quixotic Eccentricity


* Re: Disk corruption - Abit KT7, 2.2.19+ide patches
       [not found]     ` <20020115214049.GI51648@niksula.cs.hut.fi>
@ 2002-01-15 22:02       ` Nicholas Lee
  2002-01-15 22:59         ` Ed Sweetman
  0 siblings, 1 reply; 11+ messages in thread
From: Nicholas Lee @ 2002-01-15 22:02 UTC (permalink / raw)
  To: Ville Herva; +Cc: linux-kernel

On Tue, Jan 15, 2002 at 11:40:49PM +0200, Ville Herva wrote:
> 
> Hmm, do the pci ids map somehow to physical pci slots? It seems one

I'm not sure. Someone on the mailing list should know though. 8)

> particular physical pci slot location is troublesome in our case. It caused
> problems with nic and even with a scsi adapter. Unfortunately I can't
> remember which slot it was - I'll have to check (I _think_ it was the third
> counting from bottom).
> 
> So I'm interested in the physical location of your nic...

Ok. I can't check the machine in Wellington, but in the machine here,
not counting the AGP slot, the NIC is sitting in the third slot at the
back of the box.


I'd have to open it (later today when the office is closed) to confirm
it's sitting in the 'third' PCI slot from the CPU.

The NIC is the only PCI card in this machine, the video card being a
basic AGP one. (This is the other difference from the machine in
Wellington, which has an old but expensive PCI video card.)


The problem with moving the card is that the problem shows up very
slowly. After the previous problems with the Seagate drive resetting and
the computer finally crashing badly, I replaced the drive and
everything seemed fine.

Only now have I noticed the problems.

-- 
Nicholas Lee - nj.lee at plumtree.co dot nz, somewhere on the fish Maui caught.
gpg. 8072 4F86 EDCD 4FC1 18EF  5BDD 07B0 9597 6D58 D70C            icq. 1612865 

                         Quixotic Eccentricity


* Re: Disk corruption - Abit KT7, 2.2.19+ide patches
  2002-01-15 22:02       ` Nicholas Lee
@ 2002-01-15 22:59         ` Ed Sweetman
  2002-01-15 23:13           ` Nicholas Lee
  2002-01-16  7:07           ` Ville Herva
  0 siblings, 2 replies; 11+ messages in thread
From: Ed Sweetman @ 2002-01-15 22:59 UTC (permalink / raw)
  To: Nicholas Lee, Ville Herva; +Cc: linux-kernel


> On Tue, Jan 15, 2002 at 11:40:49PM +0200, Ville Herva wrote:
> >
> > Hmm, do the pci ids map somehow to physical pci slots? It seems one
>
> I'm not sure. Someone on the mailing list should know though. 8)
>
> > particular physical pci slot location is troublesome in our case. It caused
> > problems with nic and even with a scsi adapter. Unfortunately I can't
> > remember which slot it was - I'll have to check (I _think_ it was the third
> > counting from bottom).
> >
> > So I'm interested in the physical location of your nic...
>
> Ok. I can't check the machine in Wellington, but in the machine here,
> not counting the AGP slot, the NIC is sitting in the third slot at the
> back of the box.
>
>
> I'd have to open it (later today when the office is closed) to confirm
> it's sitting in the 'third' PCI slot from the CPU.
>
> The NIC is the only PCI card in this machine, the video card being a
> basic AGP one. (This is the other difference from the machine in
> Wellington, which has an old but expensive PCI video card.)
>
>
> The problem with moving the card is that the problem shows up very
> slowly. After the previous problems with the Seagate drive resetting and
> the computer finally crashing badly, I replaced the drive and
> everything seemed fine.
>
> Only now have I noticed the problems.
>

Sounds like you're using the shared-IRQ slot. You might want to verify that
with lspci -vvv to see whether anything else is using the same IRQ as the
card in that slot. Also, some boards do various special things to one of the
last PCI slots; you should be able to find out by looking in the manual.
Some cards just don't play nicely with shared IRQs.

Then there's always the possibility that proximity to some other device in
the case is causing your problem. Many things could cause the problem you're
describing. I'm not really sure how this is a Linux problem, though, since
you mention it's occurring only in a certain physical slot.



* Re: Disk corruption - Abit KT7, 2.2.19+ide patches
  2002-01-15 22:59         ` Ed Sweetman
@ 2002-01-15 23:13           ` Nicholas Lee
  2002-01-16  7:07           ` Ville Herva
  1 sibling, 0 replies; 11+ messages in thread
From: Nicholas Lee @ 2002-01-15 23:13 UTC (permalink / raw)
  To: Ed Sweetman; +Cc: Ville Herva, linux-kernel

On Tue, Jan 15, 2002 at 05:59:19PM -0500, Ed Sweetman wrote:

> Sounds like you're using the shared-IRQ slot. You might want to verify that
> with lspci -vvv to see whether anything else is using the same IRQ as the
> card in that slot. Also, some boards do various special things to one of the
> last PCI slots; you should be able to find out by looking in the manual.
> Some cards just don't play nicely with shared IRQs.

Nope:

nic@hoppa:~$ sudo lspci -vvv  | grep IRQ
	Interrupt: pin D routed to IRQ 10
	Interrupt: pin D routed to IRQ 10
	Interrupt: pin A routed to IRQ 11

IRQ 10 is USB
IRQ 11 is the NIC

00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RT8139 (rev 10)
        Subsystem: Realtek Semiconductor Co., Ltd. RT8139
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 min, 64 max, 32 set
        Interrupt: pin A routed to IRQ 11
        Region 0: I/O ports at ec00
        Region 1: Memory at e6800000 (32-bit, non-prefetchable)


nic@hoppa:~$ dmesg | grep -i irq
VP_IDE: not 100% native mode: will probe irqs later
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
eth0: RealTek RTL8139 Fast Ethernet at 0xec00, IRQ 11, 00:50:bf:04:61:e1.

IDE channel on IRQ 14.
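The check above can also be done mechanically. As a hedged sketch, the Python below groups PCI devices by the IRQ named in lspci -vvv style output and reports any IRQ claimed by more than one device. The sample text is a condensed reconstruction from the listings in this thread (device header lines plus the "routed to IRQ" lines, attributed as described above), not a verbatim capture, and the helper name is hypothetical.

```python
import re
from collections import defaultdict

# Condensed reconstruction of hoppa's lspci -vvv output (assumed layout).
SAMPLE = """\
00:07.2 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 10)
\tInterrupt: pin D routed to IRQ 10
00:07.3 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 10)
\tInterrupt: pin D routed to IRQ 10
00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RT8139 (rev 10)
\tInterrupt: pin A routed to IRQ 11
"""

HEADER = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")
IRQ = re.compile(r"routed to IRQ (\d+)")


def devices_by_irq(text):
    """Map each IRQ number to the list of PCI addresses routed to it."""
    by_irq = defaultdict(list)
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            # Unindented line: start of a new device section.
            current = header.group(1)
            continue
        irq = IRQ.search(line)
        if irq and current is not None:
            by_irq[int(irq.group(1))].append(current)
    return dict(by_irq)


shared = {irq: devs for irq, devs in devices_by_irq(SAMPLE).items() if len(devs) > 1}
print(shared)
```

On this sample only the two USB functions share an IRQ with each other; the NIC on IRQ 11 has it to itself. This covers only what lspci reports, so the IDE channel on IRQ 14 has to be checked separately, as in the dmesg output above.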


> Then there's always the possibility that proximity to some other device in
> the case is causing your problem. Many things could cause the problem you're
> describing. I'm not really sure how this is a Linux problem, though, since
> you mention it's occurring only in a certain physical slot.

I'm not sure about the 'certain' slot. I'll have to test that myself.

-- 
Nicholas Lee - nj.lee at plumtree.co dot nz, somewhere on the fish Maui caught.
gpg. 8072 4F86 EDCD 4FC1 18EF  5BDD 07B0 9597 6D58 D70C            icq. 1612865 

                         Quixotic Eccentricity


* Re: Disk corruption - Abit KT7, 2.2.19+ide patches
  2002-01-15 22:59         ` Ed Sweetman
  2002-01-15 23:13           ` Nicholas Lee
@ 2002-01-16  7:07           ` Ville Herva
  2002-01-19  2:40             ` Nicholas Lee
  1 sibling, 1 reply; 11+ messages in thread
From: Ville Herva @ 2002-01-16  7:07 UTC (permalink / raw)
  To: Ed Sweetman; +Cc: Nicholas Lee, linux-kernel

On Tue, Jan 15, 2002 at 05:59:19PM -0500, you [Ed Sweetman] claimed:
> 
> Sounds like you're using the shared-IRQ slot. You might want to verify that
> with lspci -vvv to see whether anything else is using the same IRQ as the
> card in that slot. Also, some boards do various special things to one of the
> last PCI slots; you should be able to find out by looking in the manual.
> Some cards just don't play nicely with shared IRQs.

Oh, I checked some time ago. Sorry for being vague, but as I said, we expect
to post more info in a couple of days. 

The card was in a slot that shares an IRQ with something called "serial bus
controller" (and a USB gadget, I gather). It's _not_ in the slot that shares
the IRQ with (both) HPT370 controllers.

USB is disabled in the BIOS and in the kernel config. Absolutely no USB
devices are attached.

> describing. I'm not really sure how this is a Linux problem, though, since
> you mention it's occurring only in a certain physical slot.

No. I'm pretty positive this is a case of Via PCI being flaky.


-- v --

v@iki.fi


* Re: Disk corruption - Abit KT7, 2.2.19+ide patches
  2002-01-16  7:07           ` Ville Herva
@ 2002-01-19  2:40             ` Nicholas Lee
  2002-01-19 10:44               ` Ville Herva
  0 siblings, 1 reply; 11+ messages in thread
From: Nicholas Lee @ 2002-01-19  2:40 UTC (permalink / raw)
  To: Ville Herva; +Cc: linux-kernel

On Wed, Jan 16, 2002 at 09:07:11AM +0200, Ville Herva wrote:

> Oh, I checked some time ago. Sorry for being vague, but as I said, we expect
> to post more info in a couple of days. 
> 
> The card was in a slot that shares an IRQ with something called "serial bus
> controller" (and a USB gadget, I gather). It's _not_ in the slot that shares
> the IRQ with (both) HPT370 controllers.
> 
> USB is disabled in the BIOS and in the kernel config. Absolutely no USB
> devices are attached.
> 
> No. I'm pretty positive this is a case of Via PCI being flaky.

I opened the box, and yes the NIC was in PCI slot 3. I moved it to slot
1 and I'll patch up the bad blocks on that drive and see if it happens
again. 

Of course it took several months this time, and it's likely I'll be
upgrading that machine to 2.4. So the new drivers in 2.4 might handle
the buggy chipset.

-- 
Nicholas Lee - nj.lee at plumtree.co dot nz, somewhere on the fish Maui caught.
gpg. 8072 4F86 EDCD 4FC1 18EF  5BDD 07B0 9597 6D58 D70C            icq. 1612865 

                         Quixotic Eccentricity


* Re: Disk corruption - Abit KT7, 2.2.19+ide patches
  2002-01-19  2:40             ` Nicholas Lee
@ 2002-01-19 10:44               ` Ville Herva
  0 siblings, 0 replies; 11+ messages in thread
From: Ville Herva @ 2002-01-19 10:44 UTC (permalink / raw)
  To: Nicholas Lee, linux-kernel

On Sat, Jan 19, 2002 at 03:40:59PM +1300, you [Nicholas Lee] claimed:
> 
> I opened the box, and yes the NIC was in PCI slot 3. I moved it to slot
> 1 and I'll patch up the bad blocks on that drive and see if it happens
> again. 

Interesting. May or may not be the same bug.
 
> Of course it took several months this time, and it's likely I'll be
> upgrading that machine to 2.4. So the new drivers in 2.4 might handle
> the buggy chipset.

We also tried 2.4, but it didn't solve the problem for us.


-- v --

v@iki.fi


* 2.4.18-pre3-ac2 still having problems (was Disk corruption - Abit KT7, 2.2.19+ide patches)
  2002-01-15 20:23 Disk corruption - Abit KT7, 2.2.19+ide patches Nicholas Lee
       [not found] ` <20020115205116.GH51648@niksula.cs.hut.fi>
@ 2002-01-24 22:40 ` Nicholas Lee
  2002-01-26 13:10   ` Hans-Peter Jansen
  2002-01-25  1:14 ` Disk corruption - Abit KT7, 2.2.19+ide patches Tim Moore
  2 siblings, 1 reply; 11+ messages in thread
From: Nicholas Lee @ 2002-01-24 22:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jani Forssell, Ville Herva


Just a follow-up on the previous report.

I replaced the kernel with 2.4.18-pre3-ac2, which includes the recent ATA
driver from Andre Hedrick. The NIC was moved out of slot 3.


nic@hoppa:~$ cat /proc/ide/drivers 
ide-cdrom version 4.59
ide-disk version 1.12


Everything seemed to be running smoothly.

I performed the stress test mentioned here:
http://marc.theaimsgroup.com/?l=linux-kernel&m=101059003125783&w 
with the added complication of ping flooding and being ping flooded.


nic@hoppa:~$ sudo cat /dev/hda > /dev/null & sudo ping -f -s 64000 inktiger
[1] 445
PING inktiger.kpac.co.nz (192.168.9.108): 64000 data bytes
.....................................................................................................................................................................................................................................................................................................................................................................................................
--- inktiger.kpac.co.nz ping statistics ---
406 packets transmitted, 16 packets received, 96% packet loss
round-trip min/avg/max = 258.3/695.8/1783.5 ms
nic@hoppa:~$ uname -a
Linux hoppa 2.4.18-pre3-ac2 #3 Wed Jan 23 10:48:41 NZDT 2002 i686 unknown



I was going to say "Stable as a rock again", but I thought too soon. 

Five minutes later, down it goes:

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=28086092, sector=2960368
end_request: I/O error, dev 03:07 (hda), sector 2960368
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=28086108, sector=2960376
end_request: I/O error, dev 03:07 (hda), sector 2960376
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=28086108, sector=2960384
end_request: I/O error, dev 03:07 (hda), sector 2960384

These start filling up the console.

Lucky I've got ext3, I thought. Nope: as soon as the reboot gets to that
partition, the above messages start filling up the console again.

fsck -c /dev/hda7 only makes things worse. Looks like I'll just have to
deep-six that whole part of the drive.

A power reset (to reset the HD IDE bus) doesn't seem to help matters
either. Looks like the low-level format on that part of the drive might be
corrupted.


Note: /dev/hda7 is the /var mount point. I've noted before that problems
with the drive are often related to CUPS spool events: network and
disk I/O at the same time. 


Looks like that part of the drive is completely toasted. I wonder where
I should send the bill. I'm definitely thinking that I should not
even consider AMD/VIA solutions anywhere near core servers.


Default settings on boot:
[nic@inktiger:~] cat hdparm.log 

/dev/hda:
 multcount    = 16 (on)
 I/O support  =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 nowerr       =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 3737/255/63, sectors = 60046560, start = 0
 busstate     =  1 (on)


-- 
Nicholas Lee - nj.lee at plumtree.co dot nz, somewhere on the fish Maui caught.
gpg. 8072 4F86 EDCD 4FC1 18EF  5BDD 07B0 9597 6D58 D70C            icq. 1612865 

                         Quixotic Eccentricity


* Re: Disk corruption - Abit KT7, 2.2.19+ide patches
  2002-01-15 20:23 Disk corruption - Abit KT7, 2.2.19+ide patches Nicholas Lee
       [not found] ` <20020115205116.GH51648@niksula.cs.hut.fi>
  2002-01-24 22:40 ` 2.4.18-pre3-ac2 still having problems (was Disk corruption - Abit KT7, 2.2.19+ide patches) Nicholas Lee
@ 2002-01-25  1:14 ` Tim Moore
  2 siblings, 0 replies; 11+ messages in thread
From: Tim Moore @ 2002-01-25  1:14 UTC (permalink / raw)
  To: Nicholas Lee; +Cc: linux-kernel

I've tried to approximate your basic test on an Abit KA7, but
cannot trigger i/o errors or oopsen.  Let me know if you
would like anything rerun/modified.

2.2.21pre1 + ide.2.2.19.05042001.patch.  Tests run at runlevel 5
with a moderate amount of the usual stuff.  vmstat was triggered
in a separate xterm and edited into the log at the appropriate
places.  I was surprised to find fewer context switches with
'ping -f -s 64000' than with no ping.

rgds,
tim.

*****

[17:04] abit:~ > cat io.log
Script started on Thu Jan 24 16:20:48 2002

[tim@abit tim]# /usr/bin/time dd if=/dev/hda of=/dev/null bs=1k & \
? /usr/bin/time dd if=/dev/hdc of=/dev/null bs=1k & \
? sleep 15; killall dd
[1] 9786
[2] 9788
[tim@abit tim]# Command terminated by signal 15
0.11user 6.10system 0:15.07elapsed 41%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (111major+16minor)pagefaults 0swaps
Command terminated by signal 15
0.18user 6.65system 0:15.16elapsed 45%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (112major+16minor)pagefaults 0swaps

[2]  + Exit 15                       /usr/bin/time dd if=/dev/hdc of=/dev/null bs=1k
[1]  + Exit 15                       /usr/bin/time dd if=/dev/hda of=/dev/null bs=1k

[16:23] abit:~ > vmstat -n 1
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0    116   2352 373500  18236   0   0     0     0  103   203   0   4  96
 3  0  0    116   3116 372476  18244   0   0     5     0  108   240   9   6  85
 2  0  0    116   2144 373384  18232   0   0 31102     0 7882  8510   7  88   6
 2  0  0    116   2784 372756  18220   0   0 31360     0 7944  8552   1  92   7
 4  0  0    116   2468 373080  18208   0   0 31163     7 7903  8674   2  91   7
 2  0  0    116   2948 372616  18192   0   0 31392     0 7952  8525   4  92   4
 3  0  0    116   2096 373520  18120   0   0 31552     0 7992  8501   4  95   1
 5  0  0    116   2596 373084  18060   0   0 32400     0 8207  8705   3  93   5
 5  0  0    116   2448 373240  18052   0   0 32148     0 8141  8695   4  90   6
 2  0  0    116   2152 373556  18032   0   0 31016     6 7864  8651   2  89   9
 3  0  0    116   2420 373288  18032   0   0 31476     0 7973  8497   3  91   6
 2  0  0    116   2608 373100  18032   0   0 31556     0 7993  8492   2  93   5
 4  0  0    116   2624 373088  18024   0   0 31340     0 7939  8486   4  88   8
 4  0  0    116   2972 373832  16844   0   0 30540     0 7739  8306   3  90   7
 4  0  0    116   2120 374664  16844   0   0 31552     0 7992  8500   3  94   3
 6  0  0    116   2452 374336  16844   0   0 31416     0 7958  8478   5  90   5
 0  0  0    116   2844 374200  16900   0   0 13226     0 3412  3820   7  42  51
 1  0  0    116   2840 374200  16900   0   0     0     0  103   169   1   4  95
 2  0  0    116   2836 374204  16900   0   0     6     0  117   220   0   4  96



[tim@abit tim]# /usr/bin/time dd if=/dev/hda of=/dev/null bs=1k & \
? /usr/bin/time dd if=/dev/hdc of=/dev/null bs=1k & \
? ping -f 192.168.1.11 &\
? sleep 15; killall ping dd
[1] 9794
[2] 9796
[3] 9798
PING 192.168.1.11 (192.168.1.11) from 192.168.1.10 : 56(84) bytes of data.
Warning: no SO_RCVTIMEO support, falling back to poll
0.08user 2.68system 0:15.25elapsed 18%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (111major+16minor)pagefaults 0swaps
Command terminated by signal 15
0.07user 2.82system 0:15.28elapsed 18%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (113major+16minor)pagefaults 0swaps

[3]  + Terminated                    ping -f 192.168.1.11
[2]  + Exit 15                       /usr/bin/time dd if=/dev/hdc of=/dev/null bs=1k
[1]  + Exit 15                       /usr/bin/time dd if=/dev/hda of=/dev/null bs=1k

[16:25] abit:~ > vmstat -n 1
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0    116   2748 374216  16960   0   0     0     0  104   205   1   5  94
 4  0  0    116   2432 374228  16968   0   0    20     0  114   293  13   6  81
 4  0  0    116   2860 373672  16972   0   0 14795     0 6824 22553  14  86   0
 5  0  0    116   2364 373892  16972   0   0 11096     0 8049 25992  21  79   0
 5  0  0    116   2224 374040  16972   0   0 16016     0 10143 23148  24  76   0
 5  0  0    116   2572 373752  16972   0   0 17368     0 10537 22909  21  79   0
 5  0  0    116   2260 373996  16972   0   0 16623     7 9278 20696  21  79   0
 5  0  0    116   2548 373996  16972   0   0 13304     0 8033 20136  22  78   0
 5  0  0    116   2104 374404  16972   0   0 13460     0 7336 22157  18  82   0
 4  0  0    116   2064 374476  16972   0   0 17092     0 8340 19564  16  84   0
 4  0  0    116   2532 374016  16972   0   0 15536     0 8659 20688  27  73   0
 5  0  0    116   2416 373996  16972   0   0 15080    22 7954 19934  23  77   0
 5  0  0    116   2360 374192  16972   0   0 15424     0 8511 20816  17  83   0
 4  0  0    116   2956 373596  16972   0   0 17192     0 7081 17983  17  83   0
 4  0  0    116   2684 373864  16972   0   0 14728     0 6767 20708  16  84   0
 6  0  0    116   2712 373528  16972   0   0 12844     0 8386 22734  26  74   0
 1  0  0    116   3296 373656  16972   0   0 13180    11 6344 12894  16  61  23
 1  0  0    116   3296 373656  16972   0   0     0     0  299   622   3   4  93
 0  0  0    116   3296 373656  16972   0   0     0     0  103   168   2   2  96



[tim@abit tim]# /usr/bin/time dd if=/dev/hda of=/dev/null bs=1k & \
? /usr/bin/time dd if=/dev/hdc of=/dev/null bs=1k & \
? ping -f 192.168.1.11 -s 64000 &\
? sleep 15; killall ping dd
[1] 9802
[2] 9804
[3] 9806
PING 192.168.1.11 (192.168.1.11) from 192.168.1.10 : 64000(64028) bytes of data.
Warning: no SO_RCVTIMEO support, falling back to poll
[tim@abit tim]# Command terminated by signal 15
0.15user 5.49system 0:15.32elapsed 36%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (111major+16minor)pagefaults 0swaps
Command terminated by signal 15
0.17user 5.63system 0:15.32elapsed 37%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (111major+16minor)pagefaults 0swaps

[2]    Exit 15                       /usr/bin/time dd if=/dev/hdc of=/dev/null bs=1k
[1]  + Exit 15                       /usr/bin/time dd if=/dev/hda of=/dev/null bs=1k

[16:26] abit:~ > vmstat -n 1
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 5  0  0    116   2464 354208  37612   0   0     0     0  106   237  11   8  82
 4  0  0    116   2128 354208  37612   0   0     0     0 3470  2162  15  85   0
 3  0  0    116   2356 354204  37604   0   0 10100     0 5932  6888  12  88   0
 3  0  0    116   2280 354344  37604   0   0 16652     0 7632  8982  11  88   1
 3  0  0    116   2224 354492  37604   0   0 19732    13 8408  7217   5  84  11
 3  0  0    116   2616 357800  34204   0   0 20132     0 8552  7206   8  88   4
 3  0  0    116   2568 357864  34204   0   0 20544     0 8604  7352   8  87   5
 5  0  0    116   3008 357436  34204   0   0 20180     0 8453  7300   7  86   8
 5  0  0    116   2636 357808  34204   0   0 19828     0 8422  7303   8  80  13
 4  0  0    116   2360 358388  33904   0   0 19864     8 8427  7239  13  79   8
 7  0  0    116   2844 357908  33904   0   0 20256     0 8522  7199   9  82  10
 4  0  0    116   2512 358244  33900   0   0 19916     0 8378  7169  10  86   4
 5  0  0    116   2536 358220  33900   0   0 18536     0 8300  7213  17  78   6
 4  0  0    116   2392 358392  33892   0   0 19620     0 8358  7096  12  80   9
 5  0  0    116   2896 357892  33892   0   0 20236     0 8459  7221   5  85  10
 1  0  0    116   3152 358132  33948   0   0 12575     0 5395  4783  14  53  33
 1  0  0    116   3152 358132  33948   0   0     0     0  103   177   2   3  95
 0  0  0    116   3152 358132  33948   0   0     0     0  103   173   1   2  97



[tim@abit tim]# grep read_intr /var/log/messages

[tim@abit tim]# ping dell
PING dell.yoyodyne.org (192.168.1.11) from 192.168.1.10 : 56(84) bytes of data.
64 bytes from dell.yoyodyne.org (192.168.1.11): icmp_seq=0 ttl=255 time=270 usec
64 bytes from dell.yoyodyne.org (192.168.1.11): icmp_seq=1 ttl=255 time=235 usec
64 bytes from dell.yoyodyne.org (192.168.1.11): icmp_seq=2 ttl=255 time=239 usec

--- dell.yoyodyne.org ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/mdev = 0.235/0.248/0.270/0.015 ms

[tim@abit tim]# cat /proc/ide/drivers
ide-scsi version 0.9
ide-disk version 1.09

[tim@abit tim]# hdparm -iv /dev/hd{a,c}

/dev/hda:
 multcount    =  0 (off)
 I/O support  =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 nowerr       =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 2501/255/63, sectors = 40188960, start = 0

 Model=IC35L020AVER07-0, FwRev=ER2OA44A, SerialNo=SVPTV0L4268
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=40
 BuffType=DualPortCache, BuffSize=1916kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=40188960
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4 
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 *udma4 udma5 

/dev/hdc:
 multcount    =  0 (off)
 I/O support  =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 nowerr       =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 2491/255/63, sectors = 40021632, start = 0

 Model=Maxtor 32049H2, FwRev=YAH814Y0, SerialNo=L21R7EKC
 Config={ Fixed }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=57
 BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=40021632
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4 
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 *udma4 udma5 

[tim@abit tim]# dmesg
Linux version 2.2.21pre1 (root@abit) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #2 Mon Dec 31 17:38:18 PST 2001
BIOS-provided physical RAM map:
 BIOS-e820: 0009f000 @ 00000000 (usable)
 BIOS-e820: 1ff00000 @ 00100000 (usable)
Detected 848393 kHz processor.
ide_setup: hdb=ide-scsi
ide_setup: hdd=ide-scsi
Console: colour VGA+ 80x25
Calibrating delay loop... 1690.82 BogoMIPS
Memory: 517020k/524288k available (1196k kernel code, 412k reserved, 5592k data, 68k init)
Dentry hash table entries: 65536 (order 7, 512k)
Buffer cache hash table entries: 524288 (order 9, 2048k)
Page cache hash table entries: 131072 (order 7, 512k)
CPU: L1 I Cache: 64K  L1 D Cache: 64K
CPU: L2 Cache: 512K
CPU: AMD Athlon(tm) Processor stepping 02
Checking 386/387 coupling... OK, FPU using exception 16 error reporting.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.35a (19990819) Richard Gooch (rgooch@atnf.csiro.au)
PCI: PCI BIOS revision 2.10 entry at 0xfb4d0
PCI: Probing PCI hardware
Linux NET4.0 for Linux 2.2
Based upon Swansea University Computer Society NET3.039
NET4: Unix domain sockets 1.0 for Linux NET4.0.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
TCP: Hash tables configured (ehash 524288 bhash 65536)
Initializing RT netlink socket
Starting kswapd v 1.5 
parport0: PC-style at 0x378 [SPP,PS2,EPP]
Detected PS/2 Mouse Port.
Serial driver version 4.27 with no serial options enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.09
Crystal 4280/461x + AC97 Audio, version 0.13, 17:39:20 Dec 31 2001
cs461x: Card found at 0xd7106000 and 0xd7000000, IRQ 5
cs461x: Voyetra at 0xd7106000/0xd7000000, IRQ 5
ac97_codec: AC97 Audio codec, vendor id1: 0x4352, id2: 0x5914 (Unknown)
cs461x: Found 1 audio device(s).
loop: registered device at major 7
Uniform Multi-Platform E-IDE driver Revision: 6.30
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller on PCI bus 00 dev 39
VP_IDE: chipset revision 16
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt82c686a (rev 22) IDE UDMA66 controller on pci00:07.1
    ide0: BM-DMA at 0xe000-0xe007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xe008-0xe00f, BIOS settings: hdc:DMA, hdd:DMA
hda: IC35L020AVER07-0, ATA DISK drive
hdb: YAMAHA CRW4416E, ATAPI CDROM drive
hdc: Maxtor 32049H2, ATA DISK drive
hdd: BCD-F520D CD-ROM, ATAPI CDROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: IC35L020AVER07-0, 19623MB w/1916kB Cache, CHS=2501/255/63, UDMA(66)
hdc: Maxtor 32049H2, 19541MB w/2048kB Cache, CHS=39704/16/63, UDMA(66)
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
ppa: Version 2.07 (for Linux 2.2.x)
WARNING - no ppa compatible devices found.
  As of 31/Aug/1998 Iomega started shipping parallel
  port ZIP drives with a different interface which is
  supported by the imm (ZIP Plus) driver. If the
  cable is marked with "AutoDetect", this is what has
  happened.
scsi0 : SCSI host adapter emulation for IDE ATAPI devices
scsi : 1 host.
  Vendor: YAMAHA    Model: CRW4416E          Rev: 1.0e
  Type:   CD-ROM                             ANSI SCSI revision: 02
Detected scsi CD-ROM sr0 at scsi0, channel 0, id 0, lun 0
  Vendor: BCD       Model: F520D CD-ROM      Rev: 2.41
  Type:   CD-ROM                             ANSI SCSI revision: 02
Detected scsi CD-ROM sr1 at scsi0, channel 0, id 1, lun 0
scsi : detected 2 SCSI generics 2 SCSI cdroms total.
sr0: scsi3-mmc drive: 16x/16x writer cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.11
sr1: scsi3-mmc drive: 1x/52x cd/rw xa/form2 cdda tray
tulip.c:v0.91g-ppc 7/16/99 becker@cesdis.gsfc.nasa.gov
eth0: Lite-On 82c168 PNIC rev 32 at 0xec00, 00:A0:CC:57:89:93, IRQ 11.
eth0:  MII transceiver #1 config 3000 status 7829 advertising 01e1.
Partition check:
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 hda9 hda10 >
 hdc: [PTBL] [2491/255/63] hdc1 hdc2 hdc3 hdc4
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 68k freed
Adding Swap: 257032k swap-space (priority 1)
Adding Swap: 530104k swap-space (priority 1)
eth0: Setting full-duplex based on MII#1 link partner capability of 41e1.
ide-scsi: hdb: unsupported command in request queue (0)
end_request: I/O error, dev 03:40 (hdb), sector 0
VFS: Disk change detected on device sr(11,1)
VFS: Disk change detected on device sr(11,1)
ide-scsi: hdb: unsupported command in request queue (0)
end_request: I/O error, dev 03:40 (hdb), sector 0

[tim@abit tim]# lspci -vvv
00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 0391 (rev 02)
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Latency: 0 set
        Region 0: Memory at d0000000 (32-bit, prefetchable)
        Capabilities: [a0] AGP version 2.0
                Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2
                Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>

00:01.0 PCI bridge: VIA Technologies, Inc.: Unknown device 8391 (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Latency: 0 set
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 0000f000-00000fff
        Memory behind bridge: d2000000-d3ffffff
        Prefetchable memory behind bridge: d4000000-d5ffffff
        BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- AuxPwr- DSI- D1+ D2- PME-
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 22)
        Subsystem: VIA Technologies, Inc.: Unknown device 0000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0 set

00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev 10) (prog-if 8a [Master SecP PriP])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 set
        Region 4: I/O ports at e000
        Capabilities: [c0] Power Management version 2
                Flags: PMEClk- AuxPwr- DSI- D1- D2- PME-
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:07.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 30)
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Capabilities: [68] Power Management version 2
                Flags: PMEClk- AuxPwr- DSI- D1- D2- PME-
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:08.0 FireWire (IEEE 1394): Texas Instruments TSB12LV23 OHCI Compliant IEEE-1394 Controller (prog-if 10 [OHCI])
        Subsystem: Ads Technologies Inc: Unknown device 0000
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 3 min, 4 max, 32 set, cache line size 08
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at d7104000 (32-bit, non-prefetchable)
        Region 1: Memory at d7100000 (32-bit, non-prefetchable)
        Capabilities: [44] Power Management version 1
                Flags: PMEClk- AuxPwr+ DSI- D1- D2- PME-
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:09.0 Multimedia audio controller: Cirrus Logic CS 4614/22/24 [CrystalClear SoundFusion Audio Accelerator] (rev 01)
        Subsystem: Voyetra Technologies: Unknown device 3357
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 4 min, 24 max, 32 set
        Interrupt: pin A routed to IRQ 5
        Region 0: Memory at d7106000 (32-bit, non-prefetchable)
        Region 1: Memory at d7000000 (32-bit, non-prefetchable)
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- AuxPwr- DSI+ D1+ D2+ PME-
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0f.0 Ethernet controller: Lite-On Communications Inc LNE100TX (rev 20)
        Subsystem: Netgear FA310TX
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 set
        Interrupt: pin A routed to IRQ 11
        Region 0: I/O ports at ec00
        Region 1: Memory at d7105000 (32-bit, non-prefetchable)

01:00.0 VGA compatible controller: nVidia Corporation Riva TNT2 Model 64 (rev 11) (prog-if 00 [VGA])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 5 min, 1 max, 32 set
        Interrupt: pin A routed to IRQ 10
        Region 0: Memory at d2000000 (32-bit, non-prefetchable)
        Region 1: Memory at d4000000 (32-bit, prefetchable)
        Capabilities: [60] Power Management version 1
                Flags: PMEClk- AuxPwr- DSI- D1- D2- PME-
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [44] AGP version 2.0
                Status: RQ=31 SBA- 64bit- FW- Rate=x1,x2
                Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>

[tim@abit tim]# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Jan15 ?        00:00:05 init
root         2     1  0 Jan15 ?        00:00:02 [kflushd]
root         3     1  0 Jan15 ?        00:00:02 [kupdate]
root         4     1  0 Jan15 ?        00:00:08 [kswapd]
root         5     1  0 Jan15 ?        00:00:00 [keventd]
bin        305     1  0 Jan15 ?        00:00:00 portmap
rpcuser    333     1  0 Jan15 ?        00:00:00 rpc.statd
root       358     1  0 Jan15 ?        00:00:00 rpc.mountd
root       367     1  0 Jan15 ?        00:00:00 [nfsd]
root       368     1  0 Jan15 ?        00:00:00 [nfsd]
root       369     1  0 Jan15 ?        00:00:00 [nfsd]
root       370     1  0 Jan15 ?        00:00:00 [nfsd]
root       371     1  0 Jan15 ?        00:00:00 [nfsd]
root       372     1  0 Jan15 ?        00:00:00 [nfsd]
root       373     1  0 Jan15 ?        00:00:00 [nfsd]
root       374     1  0 Jan15 ?        00:00:00 [nfsd]
root       375   367  0 Jan15 ?        00:00:00 [lockd]
root       376   375  0 Jan15 ?        00:00:00 [rpciod]
root       432     1  0 Jan15 ?        00:00:01 nscd
root       437   432  0 Jan15 ?        00:00:00 nscd
root       438   437  0 Jan15 ?        00:00:01 nscd
root       439   437  0 Jan15 ?        00:00:02 nscd
root       447     1  0 Jan15 ?        00:00:02 syslogd -r -l dell:asus:smp:lap -m 0
root       456     1  0 Jan15 ?        00:00:00 klogd
nobody     470     1  0 Jan15 ?        00:00:00 identd -e -o
nobody     473   470  0 Jan15 ?        00:00:00 identd -e -o
nobody     474   473  0 Jan15 ?        00:00:00 identd -e -o
nobody     475   473  0 Jan15 ?        00:00:00 identd -e -o
nobody     476   473  0 Jan15 ?        00:00:00 identd -e -o
daemon     488     1  0 Jan15 ?        00:00:00 /usr/sbin/atd
root       502     1  0 Jan15 ?        00:00:00 crond
root       516     1  0 Jan15 ?        00:00:00 inetd
root       527     1  0 Jan15 ?        00:00:00 sshd
root       543     1  0 Jan15 ?        00:00:00 xntpd -A
root       557     1  0 Jan15 ?        00:00:00 lpd -l
root       604     1  0 Jan15 ?        00:00:00 sendmail: accepting connections
xfs        667     1  0 Jan15 ?        00:00:00 xfs -droppriv -daemon -port -1
root       693     1  0 Jan15 tty1     00:00:00 /sbin/mingetty tty1
root       694     1  0 Jan15 tty2     00:00:00 /sbin/mingetty tty2
root       695     1  0 Jan15 tty3     00:00:00 /sbin/mingetty tty3
root       696     1  0 Jan15 tty4     00:00:00 /sbin/mingetty tty4
root       697     1  0 Jan15 tty5     00:00:00 /sbin/mingetty tty5
root       698     1  0 Jan15 tty6     00:00:00 /sbin/mingetty tty6
root       699     1  0 Jan15 ?        00:00:00 /usr/X11R6/bin/xdm -nodaemon
root       710   699  1 Jan15 ?        02:53:46 /etc/X11/X -auth /usr/X11R6/lib/X11/xdm/authdir/authfiles/A:0-F1dWBe
root       711   699  0 Jan15 ?        00:00:00 -:0                         
tim        722   711  0 Jan15 ?        00:04:18 /usr/X11R6/bin/afterstep
tim        766     1  0 Jan15 ?        00:00:01 /usr/bin/X11/xclipboard -geometry 357x178+1241+351
root       767     1  1 Jan15 ?        03:50:53 xosview -geometry 253x302+1345+0
tim        768     1  0 Jan15 ?        00:00:00 xcalc -geometry 188x231+1408+329
tim        769     1  0 Jan15 ?        00:00:03 xterm -cm -ls -sl 4500 -sb -vb -rv -bg white -fg black -fn 9x15bold -geometry 15
tim        780   769  0 Jan15 ttyp3    00:00:00 -tcsh
tim        889     1  0 Jan15 ?        00:00:03 ical -geom 1x25+0+15 -iconic
tim       5190   722  0 Jan17 ?        00:00:00 /usr/X11R6/bin/Animate --window 0 --context 8
tim       5191   722  0 Jan17 ?        00:00:01 /usr/X11R6/bin/Wharf --window 0 --context 8
tim       5193     1  0 Jan17 ?        00:00:00 asclock -shape -12 -led green -exe /home/tim/bin/ical_launch
tim       5194   722  0 Jan17 ?        00:00:06 /usr/X11R6/bin/Pager --window 0 --context 8 0 0
tim       5196     1  0 Jan17 ?        00:00:01 ascpu -u 2 -samples 15
tim       5198     1  0 Jan17 ?        00:00:00 asmix -shape
tim      10580     1  0 Jan18 ?        00:00:00 csh /home/tim/bin/x11 -r smp
tim      10581 10580  0 Jan18 ?        00:00:00 rsh -n -l tim smp /usr/bin/X11/xterm -ls -sb -sl 3500 -fg black -bg white -displ
tim      14310     1  0 Jan19 ?        00:00:00 csh /home/tim/bin/x11 -r smp
tim      14311 14310  0 Jan19 ?        00:00:00 rsh -n -l tim smp /usr/bin/X11/xterm -ls -sb -sl 3500 -fg black -bg white -displ
tim      29379     1  0 Jan21 ?        00:00:14 xterm -geometry 90x30 -cm -ls -sl 4500 -sb -vb -rv -bg white -fg black -fn 9x15b
tim      29382 29379  0 Jan21 ttyp4    00:00:00 -tcsh
root      6169     1  0 Jan23 ttyp4    00:00:00 su - nobody -c /usr/sbin/junkbuster /etc/junkbuster/config
nobody    6174  6169  0 Jan23 ttyp4    00:00:01 /usr/sbin/junkbuster /etc/junkbuster/config
tim       8459   780  0 02:11 ttyp3    00:00:00 tail -n 300 -f /var/log/messages
tim       8460   780  0 02:11 ttyp3    00:00:00 egrep -v  named.* [UNX]S|:80 | alive
tim       8529     1  0 02:26 ?        00:00:00 csh /usr/bin/netscape
tim       8530  8529  0 02:26 ?        00:01:43 /opt/netscape/netscape
tim       9836 29382  0 17:00 ttyp4    00:00:00 ps -ef


[tim@abit tim]# exit

Script done on Thu Jan 24 16:29:08 2002
[17:04] abit:~ > 


--


* Re: 2.4.18-pre3-ac2 still having problems (was Disk corruption - Abit KT7, 2.2.19+ide patches)
  2002-01-24 22:40 ` 2.4.18-pre3-ac2 still having problems (was Disk corruption - Abit KT7, 2.2.19+ide patches) Nicholas Lee
@ 2002-01-26 13:10   ` Hans-Peter Jansen
  0 siblings, 0 replies; 11+ messages in thread
From: Hans-Peter Jansen @ 2002-01-26 13:10 UTC (permalink / raw)
  To: Nicholas Lee, linux-kernel; +Cc: Jani Forssell, Ville Herva

On Thursday, 24. January 2002 23:40, Nicholas Lee wrote:
> Just a follow up on the previous report.
>
> Replaced the kernel with 2.4.18-pre3-ac2, which includes the recent ATA
> driver from Andre Hedrick.  NIC was moved from slot 3.
>
>
> nic@hoppa:~$ cat /proc/ide/drivers
> ide-cdrom version 4.59
> ide-disk version 1.12
>
>
> Everything seemed to be running smoothly.
>
> I performed the stress test mentioned here:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=101059003125783&w
> with the added complication of ping flooding and being ping flooded.
>
>
> nic@hoppa:~$ sudo cat /dev/hda > /dev/null & sudo ping -f -s 64000 inktiger
> [1] 445
> PING inktiger.kpac.co.nz (192.168.9.108): 64000 data bytes
> ...........................................................................
>............................................................................
>............................................................................
>............................................................................
>............................................................................
>.......... --- inktiger.kpac.co.nz ping statistics ---
> 406 packets transmitted, 16 packets received, 96% packet loss
> round-trip min/avg/max = 258.3/695.8/1783.5 ms
> nic@hoppa:~$ uname -a
> Linux hoppa 2.4.18-pre3-ac2 #3 Wed Jan 23 10:48:41 NZDT 2002 i686 unknown
>
>
>
> I was going to say "Stable as a rock again", but I thought too soon.
>
> Five minutes later, down it goes:
>
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=28086092, sector=2960368
> end_request: I/O error, dev 03:07 (hda), sector 2960368
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=28086108, sector=2960376
> end_request: I/O error, dev 03:07 (hda), sector 2960376
> hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=28086108, sector=2960384
> end_request: I/O error, dev 03:07 (hda), sector 2960384
>
> Start filling up the console.
>
> Lucky I've got ext3 I think. Nope, as soon as the reboot gets to that
> partition the above messages start filling up the console again.
>
> fsck -c /dev/hda7 only makes things worse. Looks like I'll just have to
> deep six that whole part of the drive.
>
> Power reset - to reset the HD IDE bus - doesn't seem to help matters
> either. Looks like the low-level format part of the drive might be
> corrupted.
>
>
> Note: /dev/hda7 is the /var mount point. I've noted before that problems
> with the drive are often related to CUPS spool events: network and
> disk IO at the same time.
>
>
> Looks like that part of the drive is completely toasted. I wonder where
> I should send the bill to. I'm definitely thinking that I should not
> even consider AMD/VIA solutions near core servers.
>
>
> Default settings on boot:
> [nic@inktiger:~] cat hdparm.log
>
> /dev/hda:
>  multcount    = 16 (on)
>  I/O support  =  1 (32-bit)
>  unmaskirq    =  1 (on)
>  using_dma    =  1 (on)
>  keepsettings =  0 (off)
>  nowerr       =  0 (off)
>  readonly     =  0 (off)
>  readahead    =  8 (on)
>  geometry     = 3737/255/63, sectors = 60046560, start = 0
>  busstate     =  1 (on)

You don't mention your drive manufacturer, but I'd bet on IBM
(you know what this acronym stands for: idiots build...).
Actually not all of them, but definitely those who try to engineer
IDE hard disks.

If my bet is right, search for discussions on IBM HD corruption here.

Basically, if you power-cycle your system during a write operation,
the affected blocks will return hard read errors on future access.
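
As a rough illustration of what "hard read error" means in the quoted log,
the status and error bytes can be decoded bit by bit. This is a minimal
sketch: the bit names are assumed to follow the strings the Linux IDE driver
prints, and the tables and the decode() helper are hypothetical, not taken
from the driver source.

```python
# Bit tables for the ATA status and error registers. The names are an
# assumption modelled on the strings seen in the kernel log above.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

ERROR_BITS = [
    (0x80, "BadSector"),
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x04, "DriveStatusError"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in table if value & mask]


if __name__ == "__main__":
    # Values taken from the dma_intr lines quoted above.
    print("status=0x51:", " ".join(decode(0x51, STATUS_BITS)))
    print("error=0x40: ", " ".join(decode(0x40, ERROR_BITS)))
```

A status of 0x51 with error 0x40 means the drive itself reports that it
could not read the sector back, which fits a block left half-written.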

The likelihood of this affecting /var is high. Here it happened
during a procmail delivery of lkml messages.
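
To see which partition the errors land on, the end_request lines can be
grouped by device. The following is only a sketch: it assumes the classic
IDE numbering with 64 minors per drive (so "dev 03:07" is hda7 and
"dev 03:40" is hdb as a whole), and the failing_sectors() helper and the
regular expression are hypothetical names for illustration.

```python
import re

# Sample lines copied from the quoted kernel log.
LOG = """\
end_request: I/O error, dev 03:07 (hda), sector 2960368
end_request: I/O error, dev 03:07 (hda), sector 2960376
end_request: I/O error, dev 03:07 (hda), sector 2960384
"""

# Captures hex major, hex minor, drive name and decimal sector number.
PATTERN = re.compile(
    r"I/O error, dev ([0-9a-f]{2}):([0-9a-f]{2}) \((\w+)\), sector (\d+)"
)


def failing_sectors(text):
    """Map each device (e.g. 'hda7') to its sorted list of failing sectors."""
    found = {}
    for _major, minor, disk, sector in PATTERN.findall(text):
        # Assumption: 64 minors per IDE drive; 0 within that range means
        # the whole disk, anything else is the partition number.
        partition = int(minor, 16) % 64
        name = disk if partition == 0 else f"{disk}{partition}"
        found.setdefault(name, set()).add(int(sector))
    return {name: sorted(sectors) for name, sectors in found.items()}


if __name__ == "__main__":
    for device, sectors in failing_sectors(LOG).items():
        print(device, sectors)
```

For the log above, every failing sector falls on hda7, the /var partition.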

Try to return those drives to your dealer.

Hans-Peter


end of thread, other threads:[~2002-01-26 13:11 UTC | newest]

Thread overview: 11+ messages
2002-01-15 20:23 Disk corruption - Abit KT7, 2.2.19+ide patches Nicholas Lee
     [not found] ` <20020115205116.GH51648@niksula.cs.hut.fi>
     [not found]   ` <20020115211032.GC598@inktiger.kiwa.co.nz>
2002-01-15 21:37     ` Nicholas Lee
     [not found]     ` <20020115214049.GI51648@niksula.cs.hut.fi>
2002-01-15 22:02       ` Nicholas Lee
2002-01-15 22:59         ` Ed Sweetman
2002-01-15 23:13           ` Nicholas Lee
2002-01-16  7:07           ` Ville Herva
2002-01-19  2:40             ` Nicholas Lee
2002-01-19 10:44               ` Ville Herva
2002-01-24 22:40 ` 2.4.18-pre3-ac2 still having problems (was Disk corruption - Abit KT7, 2.2.19+ide patches) Nicholas Lee
2002-01-26 13:10   ` Hans-Peter Jansen
2002-01-25  1:14 ` Disk corruption - Abit KT7, 2.2.19+ide patches Tim Moore
