linux-ide.vger.kernel.org archive mirror
* FW: LIBATA issue with SATA drive
@ 2007-10-17 16:13 Ip, Clarence
  2007-10-18  4:09 ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Ip, Clarence @ 2007-10-17 16:13 UTC (permalink / raw)
  To: htejun; +Cc: linux-ide

Hello Tejun,

I've CC'd the mailing list, but couldn't find Torsten's e-mail. Do you
have any suggestions on where I should look for this problem? Did the
boot-up log suggest anything to you?

- Clarence


-----Original Message-----
From: Tejun Heo [mailto:htejun@gmail.com] 
Sent: Wednesday, October 17, 2007 4:32 AM
To: Ip, Clarence
Subject: Re: LIBATA issue with SATA drive

Ip, Clarence wrote:
> Hello Tejun,
> 
> I'm using the 2.6.22 sources with a Sii3132 SATA controller and a
> Seagate HDD. What I've noticed is that it often fails to boot with this
> configuration. Warm boots appear to always fail while cold boots
> (power-cycles) fail maybe 20%-50% of the time. Here's a log from my last
> attempt:

Hmmm... We had a similar report against the -mm kernel, but that case
seemed to be newly introduced by recent -mm changes.  Do you mind cc'ing
the other reporter and the linux-ide@vger.kernel.org mailing list?

Thanks.

-- 
tejun


* Re: FW: LIBATA issue with SATA drive
  2007-10-17 16:13 FW: LIBATA issue with SATA drive Ip, Clarence
@ 2007-10-18  4:09 ` Tejun Heo
  2007-10-18 18:52   ` Torsten Kaiser
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2007-10-18  4:09 UTC (permalink / raw)
  To: Ip, Clarence; +Cc: linux-ide, just.for.lkml, jens.axboe

Hello, all.

Torsten, Clarence is reporting a similar problem on 2.6.22.  The original
message follows.

> I'm using the 2.6.22 sources with a Sii3132 SATA controller and a Seagate HDD.
> What I've noticed is that it often fails to boot with this configuration.
> Warm boots appear to always fail while cold boots (power-cycles) fail maybe
> 20%-50% of the time. Here's a log from my last attempt:
> 
> Loading iSCSI transport class v2.0-724.
> PCI: Enabling device 0000:01:00.0 (0000 -> 0003)
> scsi0 : sata_sil24
> scsi1 : sata_sil24
> ata1: SATA max UDMA/100 cmd 0xe1258000 ctl 0x00000000 bmdma 0x00000000 irq 0
> ata2: SATA max UDMA/100 cmd 0xe125a000 ctl 0x00000000 bmdma 0x00000000 irq 0
> ata1: SATA link down (SStatus 0 SControl 0)
> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 0)
> ata2.00: qc timeout (cmd 0xec)
> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata2: failed to recover some devices, retrying in 5 secs
> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 0)
> ata2.00: qc timeout (cmd 0xec)
> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata2: limiting SATA link speed to 1.5 Gbps
> ata2.00: limiting speed to UDMA7:PIO5
> ata2: failed to recover some devices, retrying in 5 secs
> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
> ata2.00: qc timeout (cmd 0xec)
> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata2: failed to recover some devices, retrying in 5 secs
> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
> ata2: EH pending after completion, repeating EH (cnt=4)
> i2c /dev entries driver
> IBM IIC driver v2.1
> 
> If the boot succeeds, I have no further problems with HDD access.
> I also did not see this problem with the 2.6.17 kernel. Do you have
> any ideas as to what may be happening? We're running on a PPC440SPE
> processor.
> 
> Thanks in advance, 

The symptom seems very similar to yours but the kernel is 2.6.22 which
doesn't have the SG change which you found out to be broken.  Can you
update us on how the testing of patched kernel went?

-- 
tejun
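
Note for readers of the log above: the repeated "qc timeout (cmd 0xec)" and "err_mask=0x4" lines can be decoded mechanically. The sketch below is an illustration only, not code from libata. The bit names are the AC_ERR_* flags as recalled from include/linux/libata.h for kernels of this era, and the command table holds just the two IDENTIFY opcodes; both are assumptions to verify against the tree in use. The helper names (decode_err_mask, describe_line) are hypothetical.

```python
import re

# AC_ERR_* flag names by bit position, as recalled from include/linux/libata.h.
# Assumption: verify the order against the kernel tree you are debugging.
AC_ERR_BITS = [
    "AC_ERR_DEV",       # bit 0: device reported an error
    "AC_ERR_HSM",       # bit 1: host state machine violation
    "AC_ERR_TIMEOUT",   # bit 2: command timed out
    "AC_ERR_MEDIA",     # bit 3: media error
    "AC_ERR_ATA_BUS",   # bit 4: ATA bus error
    "AC_ERR_HOST_BUS",  # bit 5: host bus error
    "AC_ERR_SYSTEM",    # bit 6: system error
    "AC_ERR_INVALID",   # bit 7: invalid argument
    "AC_ERR_OTHER",     # bit 8: unknown error
]

# Only the two IDENTIFY opcodes; other commands are reported as "unknown".
ATA_COMMANDS = {0xEC: "IDENTIFY DEVICE", 0xA1: "IDENTIFY PACKET DEVICE"}


def decode_err_mask(mask):
    """Return the names of all flags set in an err_mask value."""
    names = []
    bit = 0
    while mask >> bit:
        if mask & (1 << bit):
            # Bits beyond the table are reported by position, not guessed.
            names.append(AC_ERR_BITS[bit] if bit < len(AC_ERR_BITS) else f"bit {bit}")
        bit += 1
    return names


def describe_line(line):
    """Decode a 'cmd 0x..' or 'err_mask=0x..' field from one log line."""
    cmd = re.search(r"cmd 0x([0-9a-fA-F]+)", line)
    if cmd:
        return ATA_COMMANDS.get(int(cmd.group(1), 16), "unknown command")
    err = re.search(r"err_mask=0x([0-9a-fA-F]+)", line)
    if err:
        return ", ".join(decode_err_mask(int(err.group(1), 16)))
    return None


if __name__ == "__main__":
    print(describe_line("ata2.00: qc timeout (cmd 0xec)"))
    print(describe_line("ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)"))
```

Read this way, the log shows the IDENTIFY command timing out on every retry rather than the drive returning an error, which fits the later suggestion that the completion interrupt may not be arriving.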


* Re: FW: LIBATA issue with SATA drive
  2007-10-18  4:09 ` Tejun Heo
@ 2007-10-18 18:52   ` Torsten Kaiser
  2007-10-19  1:13     ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Torsten Kaiser @ 2007-10-18 18:52 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Ip, Clarence, linux-ide, jens.axboe

On 10/18/07, Tejun Heo <htejun@gmail.com> wrote:
> Hello, all.
>
> Torsten, Clarence is reporting a similar problem on 2.6.22.  The original
> message follows.

I don't think it matches.

> > I'm using the 2.6.22 sources with a Sii3132 SATA controller and a Seagate HDD.
> > What I've noticed is that it often fails to boot with this configuration.
> > Warm boots appear to always fail while cold boots (power-cycles) fail maybe
> > 20%-50% of the time. Here's a log from my last attempt:

For me warm boots always worked, and even cold boots only failed after
turning the power off for several minutes. (I tested with ~1 hour
downtime.)
But the failures on cold boot were relatively reliable, I would guess >50%.

> > Loading iSCSI transport class v2.0-724.
> > PCI: Enabling device 0000:01:00.0 (0000 -> 0003)
> > scsi0 : sata_sil24
> > scsi1 : sata_sil24
> > ata1: SATA max UDMA/100 cmd 0xe1258000 ctl 0x00000000 bmdma 0x00000000 irq 0
> > ata2: SATA max UDMA/100 cmd 0xe125a000 ctl 0x00000000 bmdma 0x00000000 irq 0
> > ata1: SATA link down (SStatus 0 SControl 0)
> > ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 0)
> > ata2.00: qc timeout (cmd 0xec)
> > ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> > ata2: failed to recover some devices, retrying in 5 secs
> > ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 0)
> > ata2.00: qc timeout (cmd 0xec)
> > ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> > ata2: limiting SATA link speed to 1.5 Gbps
> > ata2.00: limiting speed to UDMA7:PIO5

My failure also came much later. It seemed the first write command (to
an md raid) triggered it.
All of my drives were always detected correctly.

> > ata2: failed to recover some devices, retrying in 5 secs
> > ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
> > ata2.00: qc timeout (cmd 0xec)
> > ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> > ata2: failed to recover some devices, retrying in 5 secs
> > ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
> > ata2: EH pending after completion, repeating EH (cnt=4)
> > i2c /dev entries driver
> > IBM IIC driver v2.1
> >
> > If the boot succeeds, I have no further problems with HDD access.
> > I also did not see this problem with the 2.6.17 kernel. Do you have
> > any ideas as to what may be happening? We're running on a PPC440SPE
> > processor.

I'm using x86_64 on an Opteron system.

> The symptom seems very similar to yours but the kernel is 2.6.22 which
> doesn't have the SG change which you found out to be broken.  Can you
> update us on how the testing of patched kernel went?

Sorry, I didn't realize that you were still waiting for a 'confirm
good'. I intended to mail only if I got the error again, as the debug
output about SGE_TRM confirmed that this fix changes the behavior of
sata_sil24, and there was no truly 200% sure method of detecting that
this bug was gone.

Anyway, after fixing ata_sg_is_last() by adding the +1 I did not have a
single failure.

I'm currently using 2.6.23-mm1 and 16 boots were all good.
(Apart from the unrelated failure with sata_nv and swncq...)

Torsten


* Re: FW: LIBATA issue with SATA drive
  2007-10-18 18:52   ` Torsten Kaiser
@ 2007-10-19  1:13     ` Tejun Heo
  2007-10-19  1:32       ` Jeff Garzik
  2007-10-19  4:51       ` Torsten Kaiser
  0 siblings, 2 replies; 10+ messages in thread
From: Tejun Heo @ 2007-10-19  1:13 UTC (permalink / raw)
  To: Torsten Kaiser; +Cc: Ip, Clarence, linux-ide, jens.axboe

Hello,

Torsten Kaiser wrote:
>> The symptom seems very similar to yours but the kernel is 2.6.22 which
>> doesn't have the SG change which you found out to be broken.  Can you
>> update us on how the testing of patched kernel went?
> 
> Sorry, I didn't realize that you were still waiting for a 'confirm
> good'. I intended to mail only if I got the error again, as the debug
> output about SGE_TRM confirmed that this fix changes the behavior of
> sata_sil24, and there was no truly 200% sure method of detecting that
> this bug was gone.
> 
> Anyway, after fixing ata_sg_is_last() by adding the +1 I did not have a
> single failure.
> 
> I'm currently using 2.6.23-mm1 and 16 boots were all good.
> (Apart from the unrelated failure with sata_nv and swncq...)

I see, a different issue then.  It's just weird to see similar issues 
popping up now after all the time sata_sil24 has been around.

Clarence, it seems it could be that what's broken is irq routing, not 
sata_sil24 itself.  I dunno anything about irq routing on PPC440SPE but I 
bet there are kernel parameters to adjust it, including irqpoll.  Do 
those have any effect?

--
tejun


* Re: FW: LIBATA issue with SATA drive
  2007-10-19  1:13     ` Tejun Heo
@ 2007-10-19  1:32       ` Jeff Garzik
  2007-10-19  2:36         ` Tejun Heo
  2007-10-19  4:51       ` Torsten Kaiser
  1 sibling, 1 reply; 10+ messages in thread
From: Jeff Garzik @ 2007-10-19  1:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Torsten Kaiser, Ip, Clarence, linux-ide, jens.axboe

Tejun Heo wrote:
> Hello,
> 
> Torsten Kaiser wrote:
>>> The symptom seems very similar to yours but the kernel is 2.6.22 which
>>> doesn't have the SG change which you found out to be broken.  Can you
>>> update us on how the testing of patched kernel went?
>>
>> Sorry, I didn't realize that you were still waiting for a 'confirm
>> good'. I intended to mail only if I got the error again, as the debug
>> output about SGE_TRM confirmed that this fix changes the behavior of
>> sata_sil24, and there was no truly 200% sure method of detecting that
>> this bug was gone.
>>
>> Anyway, after fixing ata_sg_is_last() by adding the +1 I did not have a
>> single failure.
>>
>> I'm currently using 2.6.23-mm1 and 16 boots were all good.
>> (Apart from the unrelated failure with sata_nv and swncq...)
> 
> I see, a different issue then.  It's just weird to see similar issues 
> popping up now after all the time sata_sil24 has been around.

What kernel version are we talking about?  If it includes sg-chaining 
via a git tree somewhere, sata_sil24 has a bug that was just fixed (by 
removing ata_sg_is_last).

	Jeff





* Re: FW: LIBATA issue with SATA drive
  2007-10-19  1:32       ` Jeff Garzik
@ 2007-10-19  2:36         ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2007-10-19  2:36 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Torsten Kaiser, Ip, Clarence, linux-ide, jens.axboe

Jeff Garzik wrote:
> Tejun Heo wrote:
>> Hello,
>>
>> Torsten Kaiser wrote:
>>>> The symptom seems very similar to yours but the kernel is 2.6.22 which
>>>> doesn't have the SG change which you found out to be broken.  Can you
>>>> update us on how the testing of patched kernel went?
>>>
>>> Sorry, I didn't realize that you were still waiting for a 'confirm
>>> good'. I intended to mail only if I got the error again, as the debug
>>> output about SGE_TRM confirmed that this fix changes the behavior of
>>> sata_sil24, and there was no truly 200% sure method of detecting that
>>> this bug was gone.
>>>
>>> Anyway, after fixing ata_sg_is_last() by adding the +1 I did not have a
>>> single failure.
>>>
>>> I'm currently using 2.6.23-mm1 and 16 boots were all good.
>>> (Apart from the unrelated failure with sata_nv and swncq...)
>>
>> I see, a different issue then.  It's just weird to see similar issues 
>> popping up now after all the time sata_sil24 has been around.
> 
> What kernel version are we talking about?  If it includes sg-chaining 
> via a git tree somewhere, sata_sil24 has a bug that was just fixed (by 
> removing ata_sg_is_last).

Yeah, that was what I thought and the reason why I cc'd Torsten but the 
kernel version in question here is 2.6.22.  :-(

-- 
tejun


* Re: FW: LIBATA issue with SATA drive
  2007-10-19  1:13     ` Tejun Heo
  2007-10-19  1:32       ` Jeff Garzik
@ 2007-10-19  4:51       ` Torsten Kaiser
  2007-10-19  5:05         ` Tejun Heo
  1 sibling, 1 reply; 10+ messages in thread
From: Torsten Kaiser @ 2007-10-19  4:51 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Ip, Clarence, linux-ide, jens.axboe

On 10/19/07, Tejun Heo <htejun@gmail.com> wrote:
> Torsten Kaiser wrote:
> > I'm currently using 2.6.23-mm1 and 16 boots were all good.
> > (Apart from the unrelated failure with sata_nv and swncq...)
>
> I see, a different issue then.  It's just weird to see similar issues
> popping up now after all the time sata_sil24 has been around.

Just remembered another thing about sata_sil24 that popped up with 2.6.23-mm1.
With this kernel version (compared to 2.6.23-rc8-mm1) the port
probing time goes up from ~0.5 seconds per port/drive to ~2 seconds.
Also the SControl changed:

2.6.23-rc8-mm1:
[    4.110000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
2.6.23-mm1:
[    5.930000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)

But except for the increased delay during boot no errors can be seen; the
drives work normally.

> Clarence, it seems it could be that what's broken is irq routing, not
> sata_sil24 itself.  I dunno anything about irq routing on PPC440SPE but I
> bet there are kernel parameters to adjust it, including irqpoll.  Do
> those have any effect?

(Both kernels assign the same irq (17) to the controller. So at least
this delay is not caused by irq routing.)

Part of what I suspect in this case is the ACPI of my bios. As there
is one checksum error and the cable detection of the pata port is
completely broken, it might be that this delay is also its fault.

But as I think that ACPI is limited to x86, I doubt this is related...

Torsten


* Re: FW: LIBATA issue with SATA drive
  2007-10-19  4:51       ` Torsten Kaiser
@ 2007-10-19  5:05         ` Tejun Heo
  2007-10-19 16:21           ` Torsten Kaiser
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2007-10-19  5:05 UTC (permalink / raw)
  To: Torsten Kaiser; +Cc: Ip, Clarence, linux-ide, jens.axboe

Torsten Kaiser wrote:
> On 10/19/07, Tejun Heo <htejun@gmail.com> wrote:
>> Torsten Kaiser wrote:
>>> I'm currently using 2.6.23-mm1 and 16 boots were all good.
>>> (Apart from the unrelated failure with sata_nv and swncq...)
>> I see, a different issue then.  It's just weird to see similar issues
>> popping up now after all the time sata_sil24 has been around.
> 
> Just remembered another thing about sata_sil24 that popped up with 2.6.23-mm1.
> With this kernel version (compared to 2.6.23-rc8-mm1) the port
> probing time goes up from ~0.5 seconds per port/drive to ~2 seconds.
> Also the SControl changed:
> 
> 2.6.23-rc8-mm1:
> [    4.110000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> 2.6.23-mm1:
> [    5.930000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> 
> But except for the increased delay during boot no errors can be seen; the
> drives work normally.

Yeah, the PARTIAL / SLUMBER mode restriction has been lifted.  Dunno whether 
that's related to the increased delay tho.  Will investigate.

-- 
tejun
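
Note for readers comparing the two link-up lines above: the SControl change from 300 to 0 is what the "PARTIAL / SLUMBER restriction" remark refers to. The sketch below splits the printed register values into their fields. It assumes the values are printed in hexadecimal and uses the field layout as recalled from the SATA specification (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8); treat both as assumptions to check against the specification. The function names are hypothetical and this is not libata code.

```python
import re

# Matches the register dump libata appends to link status lines.
SCR_RE = re.compile(r"SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)")

# SPD field encoding for the generations seen in this thread.
SPEEDS = {1: "1.5 Gbps", 2: "3.0 Gbps"}


def split_scr(value):
    """Split an SStatus/SControl value into its DET, SPD and IPM nibbles."""
    return {
        "det": value & 0xF,         # bits 3:0, device detection
        "spd": (value >> 4) & 0xF,  # bits 7:4, speed (negotiated or limit)
        "ipm": (value >> 8) & 0xF,  # bits 11:8, interface power management
    }


def decode_link_line(line):
    """Decode one 'SATA link ...' log line; return None if it has no registers."""
    match = SCR_RE.search(line)
    if match is None:
        return None
    sstatus = split_scr(int(match.group(1), 16))
    scontrol = split_scr(int(match.group(2), 16))
    return {
        # SStatus DET == 3: device present and PHY communication established.
        "device_present": sstatus["det"] == 3,
        "negotiated_speed": SPEEDS.get(sstatus["spd"]),
        # SControl SPD == 0 means no speed limit is imposed by the host.
        "speed_limit": SPEEDS.get(scontrol["spd"]),
        # SControl IPM bit 0 / bit 1 disable transitions to PARTIAL / SLUMBER.
        "partial_disabled": bool(scontrol["ipm"] & 0x1),
        "slumber_disabled": bool(scontrol["ipm"] & 0x2),
    }


if __name__ == "__main__":
    for line in (
        "ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)",
        "ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)",
        "ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 10)",
    ):
        print(decode_link_line(line))
```

Under these assumptions, SControl 300 has both power-management transitions disabled while SControl 0 allows them, and the SControl 10 seen in Clarence's log is the 1.5 Gbps speed limit applied after the second failed retry.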


* Re: FW: LIBATA issue with SATA drive
  2007-10-19  5:05         ` Tejun Heo
@ 2007-10-19 16:21           ` Torsten Kaiser
  2007-10-23  2:58             ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Torsten Kaiser @ 2007-10-19 16:21 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide

On 10/19/07, Tejun Heo <htejun@gmail.com> wrote:
> Torsten Kaiser wrote:
> > Just remembered another thing about sata_sil24 that popped up with 2.6.23-mm1.
> > With this kernel version (compared to 2.6.23-rc8-mm1) the port
> > probing time goes up from ~0.5 seconds per port/drive to ~2 seconds.
> > Also the SControl changed:
> >
> > 2.6.23-rc8-mm1:
> > [    4.110000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > 2.6.23-mm1:
> > [    5.930000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> >
> > But except for the increased delay during boot no errors can be seen; the
> > drives work normally.
>
> Yeah, PARTIAL / SLUMBER mode restriction is lifted.  Dunno whether
> that's related to the increased delay tho.  Will investigate.

But don't invest too much time in this.
That the delay is caused by the broken ACPI of this board/bios seems
very likely to me.
(And apart from that delay there seem to be no other changes.)

Torsten


* Re: FW: LIBATA issue with SATA drive
  2007-10-19 16:21           ` Torsten Kaiser
@ 2007-10-23  2:58             ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2007-10-23  2:58 UTC (permalink / raw)
  To: Torsten Kaiser; +Cc: linux-ide

Torsten Kaiser wrote:
> On 10/19/07, Tejun Heo <htejun@gmail.com> wrote:
>> Torsten Kaiser wrote:
>>> Just remembered another thing about sata_sil24 that popped up with 2.6.23-mm1.
>>> With this kernel version (compared to 2.6.23-rc8-mm1) the port
>>> probing time goes up from ~0.5 seconds per port/drive to ~2 seconds.
>>> Also the SControl changed:
>>>
>>> 2.6.23-rc8-mm1:
>>> [    4.110000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> 2.6.23-mm1:
>>> [    5.930000] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
>>>
>>> But except for the increased delay during boot no errors can be seen; the
>>> drives work normally.
>> Yeah, PARTIAL / SLUMBER mode restriction is lifted.  Dunno whether
>> that's related to the increased delay tho.  Will investigate.
> 
> But don't invest too much time in this.
> That the delay is caused by the broken ACPI of this board/bios seems
> very likely to me.
> (And apart from that delay there seem to be no other changes.)

Alright, found it.  It's because sata_sil24 now uses hardreset during
probing.  This behavior has changed because sil24 now supports PMP and
some PMPs don't work with only SRST.  HRST uses longer timing values to
make sure the PHY gets stable before proceeding, and that's where the
extra wait is coming from.

Thanks.

-- 
tejun


end of thread, other threads:[~2007-10-23  2:58 UTC | newest]

Thread overview: 10+ messages
2007-10-17 16:13 FW: LIBATA issue with SATA drive Ip, Clarence
2007-10-18  4:09 ` Tejun Heo
2007-10-18 18:52   ` Torsten Kaiser
2007-10-19  1:13     ` Tejun Heo
2007-10-19  1:32       ` Jeff Garzik
2007-10-19  2:36         ` Tejun Heo
2007-10-19  4:51       ` Torsten Kaiser
2007-10-19  5:05         ` Tejun Heo
2007-10-19 16:21           ` Torsten Kaiser
2007-10-23  2:58             ` Tejun Heo
