linux-ide.vger.kernel.org archive mirror
* Help diagnosing an SATA vs. Sil3112 error on NF7-S 2.0 + FC5/2.6.17 ?
From: jon @ 2006-09-04  2:46 UTC
  To: htejun; +Cc: linux-ide

    Hi,

    I found a page on the Sil m15w bug at

http://home-tj.org/wiki/index.php/Sil_m15w

    I'm having a Sil3112 vs. Seagate SATA drive problem and am hoping
you can help me narrow down the possibilities as to what's happening.
Specifics:

    FC5 Linux running on an Abit NF7-S 2.0, upgraded to
        kernel-2.6.17-1.2174_FC5
    Seagate ST3400633AS Rev 3.AA boot disk
    (and also a couple of older PATA disks on the IDE ports).

    So far I've seen two sorts of errors. They both seem to be preceded
by a sort of "chirp" from the drive. The first resulted in a journal
failure and the affected partition being remounted read-only; the second
appeared to be more of a transient failure: after locking up the
machine for a minute, things resumed. The syslogs looked like this:

    First error:
> Sep  2 23:07:56 rocky kernel: ata1: command 0x25 timeout, stat 0x50 host_stat 0x1
> Sep  2 23:07:56 rocky kernel: ata1: status=0x50 { DriveReady SeekComplete }
> Sep  2 23:07:56 rocky kernel: ata1: error=0x01 { AddrMarkNotFound }
> Sep  2 23:07:56 rocky kernel: sda: Current: sense key: No Sense
> Sep  2 23:07:56 rocky kernel:     Additional sense: No additional sense information
> Sep  2 23:07:56 rocky kernel: EXT3-fs error (device sda4): ext3_free_blocks: Freeing blocks not in datazone - block = 1977993469, count = 1
> Sep  2 23:07:56 rocky kernel: Aborting journal on device sda4.
> Sep  2 23:07:56 rocky kernel: ext3_abort called.
> Sep  2 23:07:56 rocky kernel: EXT3-fs error (device sda4): ext3_journal_start_sb: Detected aborted journal
> Sep  2 23:07:56 rocky kernel: Remounting filesystem read-only
> Sep  2 23:07:56 rocky kernel: EXT3-fs error (device sda4): ext3_free_blocks: Freeing blocks not in datazone - block = 1499238360, count = 1
> Sep  2 23:07:56 rocky kernel: EXT3-fs error (device sda4): ext3_free_blocks: Freeing blocks not in datazone - block = 1092876199, count = 1
  [... and many, many more of the last line - there were hundreds of
   blocks recovered into lost+found after fsck, although their contents
   may all have been from previously deleted files]

    Second error:
> Sep  3 00:02:18 rocky kernel: ata1: command 0xca timeout, stat 0x50 host_stat 0x1
> Sep  3 00:02:18 rocky kernel: ata1: status=0x50 { DriveReady SeekComplete }
> Sep  3 00:02:18 rocky kernel: ata1: error=0x01 { AddrMarkNotFound }
> Sep  3 00:02:18 rocky kernel: sda: Current: sense key: No Sense
> Sep  3 00:02:18 rocky kernel:     Additional sense: No additional sense information
> Sep  3 00:02:18 rocky kernel: Info fld=0x1
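
    For anyone reading these logs later, here is a minimal sketch of
pulling the opcode out of the "command ... timeout" lines and naming
it. The helper name parse_timeout and the table ATA_COMMANDS are made
up for illustration; the table covers only the two opcodes seen above
(0x25 is READ DMA EXT and 0xca is WRITE DMA in the ATA command set),
and the regex assumes the line format shown in this log.

```python
import re

# Only the two opcodes that appear in the log; names per the ATA command set.
ATA_COMMANDS = {0x25: "READ DMA EXT", 0xCA: "WRITE DMA"}

# Matches e.g. "ata1: command 0x25 timeout, stat 0x50 host_stat 0x1".
# \s+ before host_stat tolerates the mail client's line wrap.
TIMEOUT_RE = re.compile(
    r"(ata\d+): command (0x[0-9a-fA-F]+) timeout, "
    r"stat (0x[0-9a-fA-F]+)\s+host_stat (0x[0-9a-fA-F]+)"
)


def parse_timeout(line):
    """Return the fields of a libata timeout line, or None if no match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    port = m.group(1)
    cmd, stat, host_stat = (int(g, 16) for g in m.groups()[1:])
    return {
        "port": port,
        "command": cmd,
        "name": ATA_COMMANDS.get(cmd, "unknown"),
        "stat": stat,
        "host_stat": host_stat,
    }
```

    Read this way, the first timeout hit a read and the second a write,
so the failures do not appear to be specific to one direction of
transfer.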

    Is either of these related to the "m15w" error? Or would you have
any other suggestions as to a known cause of the problem? I looked in
sata_sil.c, and the ST3400633AS is not on the blacklist in this kernel.

    So far I've upgraded the BIOS to the latest from Abit, which
includes a more recent SATA BIOS from Silicon Image, and fiddled with
some of the BIOS settings - particularly changing Ext-P2P Discard from
30us to 1ms, as suggested in a much older NVIDIA/Abit bug dialogue. I
don't know if any of this is actually helping yet, though.

    Thanks very much for any advice you may be able to offer!

    Jon Leech
    jon@alumni.caltech.edu





* Re: Help diagnosing an SATA vs. Sil3112 error on NF7-S 2.0 + FC5/2.6.17 ?
From: Tejun Heo @ 2006-09-04  3:03 UTC
  To: jon; +Cc: linux-ide

Hello,

jon@alumni.caltech.edu wrote:
>     So far I've seen two sorts of errors. They both seem to be preceded
> by a sort of "chirp" from the drive. The first resulted in a journal
> failure and the affected partition being remounted read-only; the second
> appeared to be more of a transient failure: after locking up the
> machine for a minute, things resumed. The syslogs looked like this:
> 
>     First error:
>> Sep  2 23:07:56 rocky kernel: ata1: command 0x25 timeout, stat 0x50
> host_stat 0x1
>> Sep  2 23:07:56 rocky kernel: ata1: status=0x50 { DriveReady SeekComplete }
>> Sep  2 23:07:56 rocky kernel: ata1: error=0x01 { AddrMarkNotFound }
>> Sep  2 23:07:56 rocky kernel: sda: Current: sense key: No Sense
>> Sep  2 23:07:56 rocky kernel:     Additional sense: No additional sense
> information
>> Sep  2 23:07:56 rocky kernel: EXT3-fs error (device sda4):
> ext3_free_blocks: Freeing blocks not in datazone - block = 1977993469,
> count = 1
>> Sep  2 23:07:56 rocky kernel: Aborting journal on device sda4.
>> Sep  2 23:07:56 rocky kernel: ext3_abort called.
>> Sep  2 23:07:56 rocky kernel: EXT3-fs error (device sda4):
> ext3_journal_start_sb: Detected aborted journal
>> Sep  2 23:07:56 rocky kernel: Remounting filesystem read-only
>> Sep  2 23:07:56 rocky kernel: EXT3-fs error (device sda4):
> ext3_free_blocks: Freeing blocks not in datazone - block = 1499238360,
> count = 1
>> Sep  2 23:07:56 rocky kernel: EXT3-fs error (device sda4):
> ext3_free_blocks: Freeing blocks not in datazone - block = 1092876199,
> count = 1
>   [... and many, many more of the last line - there were hundreds of
>    blocks recovered into lost+found after fsck, although their contents
>    may all have been from previously deleted files]
> 
>     Second error:
>> Sep  3 00:02:18 rocky kernel: ata1: command 0xca timeout, stat 0x50
> host_stat 0x1
>> Sep  3 00:02:18 rocky kernel: ata1: status=0x50 { DriveReady SeekComplete }
>> Sep  3 00:02:18 rocky kernel: ata1: error=0x01 { AddrMarkNotFound }
>> Sep  3 00:02:18 rocky kernel: sda: Current: sense key: No Sense
>> Sep  3 00:02:18 rocky kernel:     Additional sense: No additional sense
> information
>> Sep  3 00:02:18 rocky kernel: Info fld=0x1
> 
>     Is either of these related to the "m15w" error? Or would you have
> any other suggestions as to a known cause of the problem? I looked in
> sata_sil.c, and the ST3400633AS is not on the blacklist in this kernel.

No, neither is related to m15w.  It seems that your drive is failing some
commands w/ an ID not found error, which might be a media problem.
Anyway, libata is having problems recovering from the error condition
and retrying the command, thus the catastrophe.
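
As a rough aid for reading the register values in the quoted log, here
is a minimal sketch that decodes the status (0x50) and error (0x01)
bytes bit by bit. The bit names follow the classic ATA task-file
register layout; the decode() helper and the table names are
hypothetical, not anything libata exposes.

```python
# ATA status register bits, most significant first.
STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]

# ATA error register bits, most significant first.
ERROR_BITS = [
    (0x80, "ICRC/BBK"), (0x40, "UNC"), (0x20, "MC"), (0x10, "IDNF"),
    (0x08, "MCR"), (0x04, "ABRT"), (0x02, "TK0NF"), (0x01, "AMNF"),
]


def decode(value, table):
    """Return the names of all bits set in value, per the given table."""
    return [name for mask, name in table if value & mask]


status = decode(0x50, STATUS_BITS)  # ["DRDY", "DSC"]
error = decode(0x01, ERROR_BITS)    # ["AMNF"]
```

This agrees with the kernel's own decode in the log (DriveReady
SeekComplete, AddrMarkNotFound). Note that ERR (bit 0) is clear in
status 0x50, and the error register is normally only meaningful when
ERR is set, so the 0x01 value may not be a reliable error code here.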

>     So far I've upgraded the BIOS to the latest from Abit, which
> includes a more recent SATA BIOS from Silicon Image, and fiddled with
> some of the BIOS settings - particularly changing Ext-P2P Discard from
> 30us to 1ms, as suggested in a much older NVIDIA/Abit bug dialogue. I
> don't know if any of this is actually helping yet, though.

I'm skeptical.

Can you try 2.6.18-rc5?  The latest libata has much improved error handling.
If the errors your drive is reporting are transient, the new libata EH
should be able to recover from most of them and, even if not, it will
help diagnose the problem.

Thanks.

-- 
tejun


* Re: Help diagnosing an SATA vs. Sil3112 error on NF7-S 2.0 + FC5/2.6.17 ?
From: jon @ 2006-09-04 20:51 UTC
  To: linux-ide

Tejun Heo writes:
> No, neither is related to m15w.  It seems that your drive is failing some
> commands w/ an ID not found error, which might be a media problem.
> Anyway, libata is having problems recovering from the error condition
> and retrying the command, thus the catastrophe.
>
>>     So far I've upgraded the BIOS to the latest from Abit, which
>> includes a more recent SATA BIOS from Silicon Image, and fiddled with
>> some of the BIOS settings - particularly changing Ext-P2P Discard from
>> 30us to 1ms, as suggested in a much older NVIDIA/Abit bug dialogue. I
>> don't know if any of this is actually helping yet, though.
>
> I'm skeptical.
>
> Can you try 2.6.18-rc5?  The latest libata has much improved error handling.
> If the errors your drive is reporting are transient, the new libata EH
> should be able to recover from most of them and, even if not, it will
> help diagnose the problem.

    Unfortunately, Fedora hasn't packaged up the 2.6.18 kernels, and
this is not a bad enough problem yet to motivate me to build my own.
The BIOS adjustments and updates may have worked around the problem - it
hasn't happened again since I made them. If not, I'll just back off on
using this drive until I can get a 2.6.18 kernel. I was hoping this was
a well-known problem :-(
    Thanks!
    Jon

