PROBLEM: Silicon Image 3112 Lockups

linux-ide.vger.kernel.org archive mirror

* PROBLEM: Silicon Image 3112 Lockups
@ 2005-09-07  1:07 Jeremy Smith
  2005-09-07  2:01 ` Tejun Heo
  0 siblings, 1 reply; 8+ messages in thread
From: Jeremy Smith @ 2005-09-07  1:07 UTC (permalink / raw)
  To: jgarzik; +Cc: linux-ide



I'm working on a system with a Silicon Image 3114 controller (built into 
the ASUS K8N-DL board) running 2.6.12, and I'm experiencing lockups under 
heavy disk access.  I was wondering if the following problem is 
symptomatic of a known bug.

I get momentary freezes, after which things continue.  At some point, 
though, disk access freezes--no panic, but a lockup of all disk 
access.  I have configured the first drive as a single-drive concatenation 
in the "RAID" bootup and haven't done anything with the second drive.  The 
large logical partition on each drive is configured with software RAID.

Here is one instance from the system log.  A hard reboot was required.

Sep  6 17:49:12 localhost kernel: Bootdata ok (command line is 
root=/dev/ram0 mem=3000M init=/linuxrc real_root=/dev/sda2 vga=0x317)
Sep  6 17:49:12 localhost kernel: Memory: 3015772k/3072000k available 
(3153k kernel code, 0k reserved, 1323k data, 224k init)
Sep  6 17:49:14 localhost kernel: ata1: SATA max UDMA/100 cmd 
0xFFFFC2000091E080 ctl 0xFFFFC2000091E08A bmdma 0xFFFFC2000091E000 irq 3
Sep  6 17:49:14 localhost kernel: ata2: SATA max UDMA/100 cmd 
0xFFFFC2000091E0C0 ctl 0xFFFFC2000091E0CA bmdma 0xFFFFC2000091E008 irq 3
Sep  6 17:49:14 localhost kernel: ata3: SATA max UDMA/100 cmd 
0xFFFFC2000091E280 ctl 0xFFFFC2000091E28A bmdma 0xFFFFC2000091E200 irq 3
Sep  6 17:49:14 localhost kernel: ata4: SATA max UDMA/100 cmd 
0xFFFFC2000091E2C0 ctl 0xFFFFC2000091E2CA bmdma 0xFFFFC2000091E208 irq 3
Sep  6 17:49:14 localhost kernel: ata1: dev 0 ATA, max UDMA/133, 488397168 
sectors: lba48
Sep  6 17:49:14 localhost kernel: ata1: dev 0 configured for UDMA/100
Sep  6 17:49:14 localhost kernel: scsi0 : sata_sil
Sep  6 17:49:14 localhost kernel: ata2: dev 0 ATA, max UDMA/133, 488397168 
sectors: lba48
Sep  6 17:49:14 localhost kernel: ata2: dev 0 configured for UDMA/100
Sep  6 17:49:14 localhost kernel: scsi1 : sata_sil
Sep  6 17:49:14 localhost kernel: ata3: no device found (phy stat 
00000000)
Sep  6 17:49:14 localhost kernel: scsi2 : sata_sil
Sep  6 17:49:14 localhost kernel: ata4: no device found (phy stat 
00000000)
Sep  6 17:49:14 localhost kernel: scsi3 : sata_sil
Sep  6 17:49:15 localhost kernel: EXT3-fs: mounted filesystem with ordered 
data mode.
Sep  6 17:49:15 localhost kernel: EXT3-fs: mounted filesystem with ordered 
data mode.
Sep  6 18:09:44 localhost kernel: ata1: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:09:44 localhost kernel: ata1: error=0x04 { DriveStatusError }
Sep  6 18:09:51 localhost kernel: ata2: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:09:51 localhost kernel: ata2: error=0x04 { DriveStatusError }
Sep  6 18:10:01 localhost kernel: ata2: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:01 localhost kernel: ata2: error=0x04 { DriveStatusError }
Sep  6 18:10:09 localhost kernel: ata2: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:09 localhost kernel: ata2: error=0x04 { DriveStatusError }
Sep  6 18:10:16 localhost kernel: ata2: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:16 localhost kernel: ata2: error=0x04 { DriveStatusError }
Sep  6 18:10:17 localhost kernel: ata2: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:17 localhost kernel: ata2: error=0x04 { DriveStatusError }
Sep  6 18:10:20 localhost kernel: ata2: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:20 localhost kernel: ata2: error=0x04 { DriveStatusError }
Sep  6 18:10:22 localhost kernel: ata1: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:22 localhost kernel: ata1: error=0x04 { DriveStatusError }
Sep  6 18:10:23 localhost kernel: ata2: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:23 localhost kernel: ata2: error=0x04 { DriveStatusError }
Sep  6 18:10:23 localhost kernel: ata1: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:23 localhost kernel: ata1: error=0x04 { DriveStatusError }
Sep  6 18:10:24 localhost kernel: ata1: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:24 localhost kernel: ata1: error=0x04 { DriveStatusError }
Sep  6 18:10:24 localhost kernel: ata2: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:24 localhost kernel: ata2: error=0x04 { DriveStatusError }
Sep  6 18:10:25 localhost kernel: ata2: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:25 localhost kernel: ata2: error=0x04 { DriveStatusError }
Sep  6 18:10:25 localhost kernel: ata1: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:25 localhost kernel: ata1: error=0x04 { DriveStatusError }
Sep  6 18:10:26 localhost kernel: ata2: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:26 localhost kernel: ata2: error=0x04 { DriveStatusError }
Sep  6 18:10:26 localhost kernel: ata2: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:26 localhost kernel: ata2: error=0x04 { DriveStatusError }
Sep  6 18:10:27 localhost kernel: ata2: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:27 localhost kernel: ata2: error=0x04 { DriveStatusError }
Sep  6 18:10:43 localhost kernel: ata1: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:43 localhost kernel: ata1: error=0x04 { DriveStatusError }
Sep  6 18:10:50 localhost kernel: ata1: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:50 localhost kernel: ata1: error=0x04 { DriveStatusError }
Sep  6 18:10:50 localhost kernel: ata1: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:50 localhost kernel: ata1: error=0x04 { DriveStatusError }
Sep  6 18:10:53 localhost kernel: ata2: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:53 localhost kernel: ata2: error=0x04 { DriveStatusError }
Sep  6 18:10:55 localhost kernel: ata2: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:10:55 localhost kernel: ata2: error=0x04 { DriveStatusError }
Sep  6 18:11:12 localhost kernel: ata2: status=0x51 { DriveReady 
SeekComplete Error }
Sep  6 18:11:12 localhost kernel: ata2: error=0x04 { DriveStatusError }
Sep  6 19:27:11 localhost kernel: Bootdata ok (command line is 
root=/dev/ram0 mem=3000M init=/linuxrc real_root=/dev/sda2 vga=0x317)
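
For triage, the error lines in a log like the one above can be tallied per 
port.  What follows is only a minimal Python sketch, not something from the 
original report: the name count_ata_errors and the regular expression are 
assumptions matched to the line format shown in this log.

```python
import re
from collections import Counter

# Matches lines such as "... kernel: ata2: error=0x04 { DriveStatusError }".
# The pattern is an assumption based on the log format in this report.
ERROR_RE = re.compile(r"kernel: (ata\d+): error=(0x[0-9a-fA-F]{2})")


def count_ata_errors(lines):
    """Count (port, error value) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2).lower())] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Sep  6 18:09:44 localhost kernel: ata1: error=0x04 { DriveStatusError }",
        "Sep  6 18:09:51 localhost kernel: ata2: error=0x04 { DriveStatusError }",
        "Sep  6 18:10:01 localhost kernel: ata2: error=0x04 { DriveStatusError }",
    ]
    for (port, err), n in sorted(count_ata_errors(sample).items()):
        print(f"{port} error={err}: {n}")
```

Passing an open syslog file instead of the sample list gives the same tally 
for a whole log, which makes it easy to see whether one port or both are 
affected.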


Linux localhost 2.6.12-gentoo-r9 #1 SMP Sat Sep 3 02:05:00 MDT 2005 
x86_64 AMD Opteron(tm) Processor 244 AuthenticAMD GNU/Linux

Gnu C                  3.4.4
Gnu make               3.80
binutils               2.15.92.0.2
util-linux             2.12i
mount                  2.12i
module-init-tools      3.0
e2fsprogs              1.38
reiserfsprogs          line
reiser4progs           line
Linux C Library        2.3.5
Dynamic linker (ldd)   2.3.5
Procps                 3.2.5
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.2.1
udev                   068
Modules Loaded         nvidia vmnet parport_pc parport vmmon snd_ca0106 
snd_ac97_codec snd_pcm snd_timer snd snd_page_alloc tg3 ata_piix sata_sil 
libata sbp2 ohci1394 ieee1394 ohci_hcd uhci_hcd usb_storage usbhid ehci_hcd

dd if=/dev/sda of=/dev/null can reproduce the error, as can several 
other disk-intensive activities
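
The same kind of sequential read can be scripted when timing or a byte 
count is wanted.  This is a rough Python sketch under stated assumptions: 
the function name read_sequential, the 1 MiB chunk size, and the max_bytes 
limit are all illustrative and were not part of the test described here.

```python
import sys
import time


def read_sequential(path, chunk_size=1024 * 1024, max_bytes=None):
    """Read `path` front to back and discard the data, like dd to /dev/null.

    Returns (bytes_read, seconds).  An I/O error reported by the kernel
    surfaces as OSError from read().
    """
    total = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as dev:
        while max_bytes is None or total < max_bytes:
            want = chunk_size if max_bytes is None else min(chunk_size, max_bytes - total)
            data = dev.read(want)
            if not data:  # end of device or file
                break
            total += len(data)
    return total, time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 readtest.py /dev/sda  (needs read access to the device)
    try:
        nbytes, secs = read_sequential(sys.argv[1])
        print(f"read {nbytes} bytes in {secs:.1f}s without error")
    except OSError as exc:
        print(f"read failed: {exc}")
```

If disk access locks up instead of returning an error, the read simply 
stalls, so the system log still has to be watched separately.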

Thanks,
Jer


* Re: PROBLEM: Silicon Image 3112 Lockups
  2005-09-07  1:07 PROBLEM: Silicon Image 3112 Lockups Jeremy Smith
@ 2005-09-07  2:01 ` Tejun Heo
  2005-09-07  2:13   ` Jeff Garzik
  0 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2005-09-07  2:01 UTC (permalink / raw)
  To: Jeremy Smith, Alexander Shaposhnikov, Carlos Pardo, Paul Taylor
  Cc: jgarzik, linux-ide


  Hello, Jeremy.

Jeremy Smith wrote:
> 
> 
> I'm working on a system (K8N-DL) with a 3114 driver running 2.6.12 and 
> experiencing lockups on heavy disk access.  This is the built-in on an 
> ASUS-K8N-DL board.  I was wondering if the following problem is 
> symptomatic of a known bug.
> 
> I will get momentary freezes, and then I'll continue.  At some point 
> though, the disk access will freeze--no panic, but a lockup of all disk 
> access.  I have configured the first drive as a single drive 
> concatenation in the "RAID" bootup and haven't done anything with the 
> second drive.  The large logical partition on each drive is configured 
> with software RAID.

  You're the second person to report a similar problem with the ASUS 
K8N-DL board.  Please see the following threads.

http://marc.theaimsgroup.com/?l=linux-ide&m=112497821103098&w=2
http://marc.theaimsgroup.com/?l=linux-ide&m=112600646820285&w=2

> dd if=/dev/sda of=/dev/null can reproduce error, as can several 
> disk-intensive activities

  In the following mail, I've attached a patch which might alleviate 
errors during writes (as Alexander was reporting CRC errors with write 
commands), but it won't do any good if you're getting errors during reading.

http://marc.theaimsgroup.com/?l=linux-ide&m=112602112819183&w=2

  Carlos and Paul, do you guys know anything about this mainboard? 
Do we need some special tweaking to get these boards to work?  I'll dig 
further into the 3112/3114 documentation, but I'm not sure what I should 
look for.

  Thanks.

-- 
tejun


* Re: PROBLEM: Silicon Image 3112 Lockups
  2005-09-07  2:01 ` Tejun Heo
@ 2005-09-07  2:13   ` Jeff Garzik
  2005-09-07  2:34     ` Tejun Heo
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Garzik @ 2005-09-07  2:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jeremy Smith, Alexander Shaposhnikov, Carlos Pardo, Paul Taylor,
	linux-ide

Tejun Heo wrote:
>  In the following mail, I've attached a patch which might alleviate 
> errors during writes (as Alexander was reporting CRC errors with write 
> commands), but it won't do any good if you're getting errors during 
> reading.
> 
> http://marc.theaimsgroup.com/?l=linux-ide&m=112602112819183&w=2

Note that I would put BIG CAPITAL LETTER WARNINGS on that patch, since 
it messes with the voltage.

	Jeff





* Re: PROBLEM: Silicon Image 3112 Lockups
  2005-09-07  2:13   ` Jeff Garzik
@ 2005-09-07  2:34     ` Tejun Heo
  2005-09-07  2:42       ` Tejun Heo
  0 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2005-09-07  2:34 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Jeremy Smith, Alexander Shaposhnikov, Carlos Pardo, Paul Taylor,
	linux-ide

Jeff Garzik wrote:
> Tejun Heo wrote:
> 
>>  In the following mail, I've attached a patch which might alleviate 
>> errors during writes (as Alexander was reporting CRC errors with write 
>> commands), but it won't do any good if you're getting errors during 
>> reading.
>>
>> http://marc.theaimsgroup.com/?l=linux-ide&m=112602112819183&w=2
> 
> 
> Note that I would put BIG CAPITAL LETTER WARNINGS on that patch, since 
> it messes with the voltage.
> 

  Alexander & Jeremy.

  It's as Jeff said.

  TRY THE PATCH AT YOUR OWN RISK.  IT MIGHT FRY THE PHY OF YOUR DRIVE. 
(enough capitals?)

  Even if you're brave enough to try, DO NOT GO OVER 600mV.  600mV is at 
least inside specified limits.  Also, it won't change anything regarding 
read errors.  All it does is increase the voltage swing while 
transmitting data (writes).

  Thanks.

-- 
tejun


* Re: PROBLEM: Silicon Image 3112 Lockups
  2005-09-07  2:34     ` Tejun Heo
@ 2005-09-07  2:42       ` Tejun Heo
  2005-09-07  3:21         ` Jeremy Smith
  0 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2005-09-07  2:42 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jeff Garzik, Jeremy Smith, Alexander Shaposhnikov, Carlos Pardo,
	Paul Taylor, linux-ide

Tejun Heo wrote:
> Jeff Garzik wrote:
> 
>> Tejun Heo wrote:
>>
>>>  In the following mail, I've attached a patch which might alleviate 
>>> errors during writes (as Alexander was reporting CRC errors with 
>>> write commands), but it won't do any good if you're getting errors 
>>> during reading.
>>>
>>> http://marc.theaimsgroup.com/?l=linux-ide&m=112602112819183&w=2
>>
>>
>>
>> Note that I would put BIG CAPITAL LETTER WARNINGS on that patch, since 
>> it messes with the voltage.
>>
> 
>  Alexander & Jeremy.
> 
>  It's as Jeff said.
> 
>  TRY THE PATCH AT YOUR OWN RISK.  IT MIGHT FRY PHY OF YOUR DRIVE. 
> (enough capitals?)
> 
>  Even if you're brave enough to try, DO NOT GO OVER 600mV.  600mV is at 
> least inside specified limits.  Also, it won't change anything regarding 
> read errors.  All it does is increasing voltage swing while transmitting 
> data (writes).

  Oh.. it might affect writes if errors are occurring due to CRC errors 
during command transmit.  If you're getting ABRT errors instead of 
ICRC's, it might indicate that commands are being mistransferred (again, 
I'm not sure at all).
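
To tell ABRT from ICRC, the status and error values that libata prints can 
be decoded bit by bit.  The following is a minimal Python sketch, assuming 
the classic ATA register bit assignments; the tables and the decode helper 
are illustrative names, not code from the driver or from the patch.

```python
# Classic ATA status register bits (high bit first).
STATUS_BITS = {
    0x80: "BSY",
    0x40: "DRDY",
    0x20: "DF",
    0x10: "DSC",
    0x08: "DRQ",
    0x04: "CORR",
    0x02: "IDX",
    0x01: "ERR",
}

# Classic ATA error register bits (high bit first).
ERROR_BITS = {
    0x80: "ICRC",
    0x40: "UNC",
    0x20: "MC",
    0x10: "IDNF",
    0x08: "MCR",
    0x04: "ABRT",
    0x02: "TK0NF",
    0x01: "AMNF",
}


def decode(value, table):
    """Return the names of all bits set in `value`, highest bit first."""
    return [name for bit, name in table.items() if value & bit]


if __name__ == "__main__":
    print(decode(0x51, STATUS_BITS))  # ['DRDY', 'DSC', 'ERR']
    print(decode(0x04, ERROR_BITS))   # ['ABRT']
    print(decode(0x84, ERROR_BITS))   # ['ICRC', 'ABRT']
```

Read this way, the status=0x51 / error=0x04 pairs in the reported log are 
ABRT without ICRC; the "DriveStatusError" label in the log appears to be 
the kernel's name for that ABRT bit.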

-- 
tejun


* Re: PROBLEM: Silicon Image 3112 Lockups
  2005-09-07  2:42       ` Tejun Heo
@ 2005-09-07  3:21         ` Jeremy Smith
  2005-09-07  6:00           ` Tejun Heo
  0 siblings, 1 reply; 8+ messages in thread
From: Jeremy Smith @ 2005-09-07  3:21 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jeff Garzik, Alexander Shaposhnikov, Carlos Pardo, Paul Taylor,
	linux-ide

On Wed, 7 Sep 2005, Tejun Heo wrote:

> Tejun Heo wrote:
>> Jeff Garzik wrote:
>> 
>>> Tejun Heo wrote:
>>>
>>>>  In the following mail, I've attached a patch which might alleviate 
>>>> errors during writes (as Alexander was reporting CRC errors with write 
>>>> commands), but it won't do any good if you're getting errors during 
>>>> reading.
>>>> 
>>>> http://marc.theaimsgroup.com/?l=linux-ide&m=112602112819183&w=2
>>> 
>>> 
>>> 
>>> Note that I would put BIG CAPITAL LETTER WARNINGS on that patch, since it 
>>> messes with the voltage.
>>> 
>>
>>  Alexander & Jeremy.
>>
>>  It's as Jeff said.
>>
>>  TRY THE PATCH AT YOUR OWN RISK.  IT MIGHT FRY PHY OF YOUR DRIVE. (enough 
>> capitals?)
>>
>>  Even if you're brave enough to try, DO NOT GO OVER 600mV.  600mV is at 
>> least inside specified limits.  Also, it won't change anything regarding 
>> read errors.  All it does is increasing voltage swing while transmitting 
>> data (writes).
>
> Oh.. it might affect writes if errors are occurring due to CRC errors during 
> command transmit.  If you're getting ABRT errors instead of ICRC's, it might 
> indicate that commands are being mistransferred (again, I'm not sure at all).
>
> -- 
> tejun
>

Did you mean reads here?  Because I think it's happening on reads as 
well--it happens on an "e2fsck -b -n" on the drive when I'm booted off a 
CDROM.  I'm willing to try it out if it could help, but if it's unlikely 
to...

I don't have any idea how these drivers work, but the ASUS K8N-DL also has 
the nvidia SATA controller in it--which doesn't appear to work at all, so 
I started by hooking up the drives to the SI controller.  Can the mere 
presence of this additional controller make a difference?

For what it's worth, I don't _think_ I was seeing similar lockups until I 
updated the firmware on this board to the latest version (1004 from 1003), 
but that could be a red herring because I also wasn't paying attention to 
syslog.

I've tried changes to cabling...both drives experience the exact same 
symptoms for me; it certainly could be hardware related, but it would be 
on the board, for which I don't have a spare.

Is there any additional information I can provide?

Jer


* Re: PROBLEM: Silicon Image 3112 Lockups
  2005-09-07  3:21         ` Jeremy Smith
@ 2005-09-07  6:00           ` Tejun Heo
  2005-09-07  6:09             ` Jeff Garzik
  0 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2005-09-07  6:00 UTC (permalink / raw)
  To: Jeremy Smith
  Cc: Jeff Garzik, Alexander Shaposhnikov, Carlos Pardo, Paul Taylor,
	linux-ide

Jeremy Smith wrote:
> 
> On Wed, 7 Sep 2005, Tejun Heo wrote:
> 
>> Tejun Heo wrote:
>>
>>> Jeff Garzik wrote:
>>>
>>>> Tejun Heo wrote:
>>>>
>>>>>  In the following mail, I've attached a patch which might alleviate 
>>>>> errors during writes (as Alexander was reporting CRC errors with 
>>>>> write commands), but it won't do any good if you're getting errors 
>>>>> during reading.
>>>>>
>>>>> http://marc.theaimsgroup.com/?l=linux-ide&m=112602112819183&w=2
>>>>
>>>>
>>>>
>>>>
>>>> Note that I would put BIG CAPITAL LETTER WARNINGS on that patch, 
>>>> since it messes with the voltage.
>>>>
>>>
>>>  Alexander & Jeremy.
>>>
>>>  It's as Jeff said.
>>>
>>>  TRY THE PATCH AT YOUR OWN RISK.  IT MIGHT FRY PHY OF YOUR DRIVE. 
>>> (enough capitals?)
>>>
>>>  Even if you're brave enough to try, DO NOT GO OVER 600mV.  600mV is 
>>> at least inside specified limits.  Also, it won't change anything 
>>> regarding read errors.  All it does is increasing voltage swing while 
>>> transmitting data (writes).
>>
>>
>> Oh.. it might affect writes if errors are occurring due to CRC errors 
>> during command transmit.  If you're getting ABRT errors instead of 
>> ICRC's, it might indicate that commands are being mistransferred 
>> (again, I'm not sure at all).
>>
>> -- 
>> tejun
>>
> 
> Did you mean reads here?  Because I think it's happening on reads as 
> well--it happens on an "e2fsck -b -n" on the drive when I'm booted off a 
> CDROM.  I'm willing to try it out if it could help, but if it's unlikely 
> to...

  Yes, I meant reads.  It would be great if somebody tried the patch 
out.  Maybe you and Alexander can coordinate so that only one of you takes 
the risk. ;-p  If I had access to a K8N-DL, I would have tested it myself, 
but sadly I don't.  I did test with my discrete sii3112 card and a Samsung 
HD160JJ drive at 600mV and had no problem, but this doesn't guarantee 
anything for you guys.

  I think it would be nice if Alexander or you can test it but I have to 
warn you again.

  YOU MAY FRY YOUR HARDWARE WITH THIS.

> I don't have any idea how these drivers work, but the ASUS K8N-DL also 
> has the nvidia SATA controller in it--which doesn't appear to work at 
> all, so I started by hooking up the drives to the SI controller.  Can 
> the mere presence of this additional controller make a difference?

  I doubt that that would have anything to do with this.

> For what it's worth, I don't _think_ I was seeing similar lockups until 
> I updated the firmware on this board to the latest version (1004 from 
> 1003), but that could be a red herring because I also wasn't paying 
> attention to syslog.

  I don't know.  If some specific configurations are required for the 
controller, they are usually done by BIOS (either mainboard BIOS or 
per-controller BIOS), so BIOS update could affect the problem.  But 
these are still just wild speculations.  Maybe we should contact ASUS 
about this?

> I've tried changes to cabling...both drives experience the exact same 
> symptoms for me; it certainly could be hardware related, but it would be 
> on the board, for which I don't have a spare.
> 
> Is there any additional information I can provide?

  Well, I think two identical reports for a not-so-widespread mainboard 
point away from cabling problems.  And I can't think of any more info 
that would be helpful yet.  I'll let you know if something comes up.

  Thanks & good luck.

-- 
tejun


* Re: PROBLEM: Silicon Image 3112 Lockups
  2005-09-07  6:00           ` Tejun Heo
@ 2005-09-07  6:09             ` Jeff Garzik
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Garzik @ 2005-09-07  6:09 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jeremy Smith, Alexander Shaposhnikov, Carlos Pardo, Paul Taylor,
	linux-ide

Tejun Heo wrote:
>  I don't know.  If some specific configurations are required for the 
> controller, they are usually done by BIOS (either mainboard BIOS or 
> per-controller BIOS), so BIOS update could affect the problem.  But 
> these are still just wild speculations.  Maybe we should contact ASUS 
> about this?


Note that, in the past, system BIOS updates have cured sata_sil data 
corruption bug reports.  Updating BIOS is always a good idea.

	Jeff


