* System freezes with kernel >2.6.19 - sata_nv
From: Stefan @ 2007-04-29 22:27 UTC
To: linux-ide
[-- Attachment #1: Type: text/plain, Size: 761 bytes --]
Hi folks,
yesterday I upgraded kernel 2.6.19 to 2.6.20 (gentoo kernel). Now my
box locks up about 10 min after boot.
After that I tested a vanilla 2.6.21.1, which shows the same behavior.
I'm attaching a kernel log file from 2.6.20. The 2.6.21.1 kernel locked up so
hard that there was no trace left in the log file.
If I switch back to 2.6.19 everything is fine again.
If necessary I will hook up a laptop to this box, so I can capture
messages via netconsole.
This machine is running an AMD X2 64, NFORCE4 (ASUS A8N-E)
I haven't been following the development of libata for a while, but from
the 2.6.20 changelog it looks like there have been some major changes.
Just let me know what kind of information you need in order to narrow it
down.
Cheers Stefan
[-- Attachment #2: kernboot2.6.20.txt.gz --]
[-- Type: application/x-tar, Size: 12277 bytes --]
[-- Attachment #3: 2.6.21.1config.gz --]
[-- Type: application/x-tar, Size: 11188 bytes --]
* Re: System freezes with kernel >2.6.19 - sata_nv
From: Tejun Heo @ 2007-04-30 10:04 UTC
To: Stefan; +Cc: linux-ide, Robert Hancock
Stefan wrote:
> Hi folks,
>
> yesterday I upgraded kernel 2.6.19 to 2.6.20 (gentoo kernel). Now my
> box locks up about 10 min after boot.
> After that I tested with a vanilla 2.6.21.1 it shows the same behavior.
> I'm attaching a kern log file from the 2.6.20. The 2.6.21.1 locked up so
> hard, that there was no trace left in the log file.
> If I switch back to 2.6.19 everything is fine again.
> If necessary I will hook up a laptop to this box, so I can capture
> messages via netconsole.
Yes please.
> This machine is running an AMD X2 64, NFORCE4 (ASUS A8N-E)
>
> I haven't been following the development of libata for a while, but from
> the 2.6.20 changelog It looks like there have been some major changes.
>
> Just let me know what kind of information you need in order to narrow it
> down.
Does giving the 'sata_nv.adma=0' kernel parameter make any difference?
--
tejun
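For anyone repeating this test, it helps to confirm that the option really reached the running kernel. The following is a minimal Python sketch, not part of the original exchange: it parses a kernel command line string (for example the contents of /proc/cmdline) and reports the value of a `module.param` option. The function name `module_param` is made up for illustration. Note that the `sata_nv.adma=0` form applies when the driver is built into the kernel; for a loadable module the usual equivalent is an `options sata_nv adma=0` line in the modprobe configuration.

```python
def module_param(cmdline, module, param):
    """Return the value of module.param from a kernel command line, or None.

    Hypothetical helper for illustration only.
    """
    wanted = f"{module}.{param}".replace("-", "_")
    value = None
    for token in cmdline.split():
        name, sep, val = token.partition("=")
        # Dashes and underscores are treated as interchangeable in the name.
        if sep and name.replace("-", "_") == wanted:
            value = val  # this sketch keeps the last occurrence
    return value


# Example with a literal string; on a live system one would pass
# open("/proc/cmdline").read() instead.
print(module_param("root=/dev/sda1 ro sata_nv.adma=0", "sata_nv", "adma"))
```

A result of None means the option was not given, so the driver default applies.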
* Re: System freezes with kernel >2.6.19 - sata_nv
From: Robert Hancock @ 2007-04-30 23:44 UTC
To: Tejun Heo; +Cc: Stefan, linux-ide
Tejun Heo wrote:
> Stefan wrote:
>> Hi folks,
>>
>> yesterday I upgraded kernel 2.6.19 to 2.6.20 (gentoo kernel). Now my
>> box locks up about 10 min after boot.
>> After that I tested with a vanilla 2.6.21.1 it shows the same behavior.
>> I'm attaching a kern log file from the 2.6.20. The 2.6.21.1 locked up so
>> hard, that there was no trace left in the log file.
>> If I switch back to 2.6.19 everything is fine again.
>> If necessary I will hook up a laptop to this box, so I can capture
>> messages via netconsole.
>
> Yes please.
>
>> This machine is running an AMD X2 64, NFORCE4 (ASUS A8N-E)
>>
>> I haven't been following the development of libata for a while, but from
>> the 2.6.20 changelog It looks like there have been some major changes.
>>
>> Just let me know what kind of information you need in order to narrow it
>> down.
>
> Does giving 'sata_nv.adma=0' kernel parameter make any difference?
If adma=0 works, then that means it's either an ADMA-related problem or
that SAMSUNG HD401LJ drive has some problems with NCQ (since ADMA off
means NCQ off as well). I would say the latter is more likely.
We should really have some kind of "noncq" kernel parameter we can use
to help debug these problems. Though, later kernels are supposed to
switch it off automatically after too many errors.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
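Since ADMA off implies NCQ off here, it can be useful to check whether NCQ is actually in effect for a drive. A rough Python sketch follows, assuming a kernel that exposes the `queue_depth` attribute under /sys/block/<dev>/device/ (whether the 2.6.20/2.6.21 kernels discussed here expose it is an assumption; the helper names are invented for illustration). A depth of 1 suggests commands are not being queued.

```python
from pathlib import Path


def queue_depth(device, sysfs_root="/sys/block"):
    """Return the queue depth reported for a block device, or None if unavailable."""
    path = Path(sysfs_root) / device / "device" / "queue_depth"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Attribute missing, unreadable, or not a number.
        return None


def ncq_in_use(device, sysfs_root="/sys/block"):
    """Heuristic: treat a queue depth greater than 1 as 'NCQ active'."""
    depth = queue_depth(device, sysfs_root)
    return depth is not None and depth > 1
```

On a live system this would be called as `ncq_in_use("sda")`; the `sysfs_root` argument exists only so the helper can be pointed at a test directory.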
* Re: System freezes with kernel >2.6.19 - sata_nv
From: Tejun Heo @ 2007-05-01 2:24 UTC
To: Robert Hancock; +Cc: Stefan, linux-ide
Robert Hancock wrote:
> Tejun Heo wrote:
>> Stefan wrote:
>>> Hi folks,
>>>
>>> yesterday I upgraded kernel 2.6.19 to 2.6.20 (gentoo kernel). Now my
>>> box locks up about 10 min after boot.
>>> After that I tested with a vanilla 2.6.21.1 it shows the same behavior.
>>> I'm attaching a kern log file from the 2.6.20. The 2.6.21.1 locked up so
>>> hard, that there was no trace left in the log file.
>>> If I switch back to 2.6.19 everything is fine again.
>>> If necessary I will hook up a laptop to this box, so I can capture
>>> messages via netconsole.
>>
>> Yes please.
>>
>>> This machine is running an AMD X2 64, NFORCE4 (ASUS A8N-E)
>>>
>>> I haven't been following the development of libata for a while, but from
>>> the 2.6.20 changelog It looks like there have been some major changes.
>>>
>>> Just let me know what kind of information you need in order to narrow it
>>> down.
>>
>> Does giving 'sata_nv.adma=0' kernel parameter make any difference?
>
> If adma=0 works, then that means it's either an ADMA related problem or
> that SAMSUNG HD401LJ drive has some problems with NCQ (since ADMA off
> means NCQ off as well). I would say the latter is more likely.
I don't have first-hand experience with this particular model, but I'd be
surprised if they screwed up the firmware in the new generation of
hard disks. Firmware on the previous generation of drives was pretty good,
and it doesn't usually get worse.
> We should really have some kind of "noncq" kernel parameter we can use
> to help debugging these problems. Though, later kernels are supposed to
> switch it off automatically after too many errors..
So far there hasn't been any case where broken NCQ prevented a
machine from booting but, yeah, having such a thing would be nice for
debugging.
--
tejun
* Re: System freezes with kernel >2.6.19 - sata_nv [added crash info kern 2.6.21.1]
From: Stefan @ 2007-05-01 23:27 UTC
To: Tejun Heo; +Cc: Robert Hancock, linux-ide
[-- Attachment #1: Type: text/plain, Size: 2323 bytes --]
Tejun Heo wrote:
> Robert Hancock wrote:
>
>> Tejun Heo wrote:
>>
>>> Stefan wrote:
>>>
>>>> Hi folks,
>>>>
>>>> yesterday I upgraded kernel 2.6.19 to 2.6.20 (gentoo kernel). Now my
>>>> box locks up about 10 min after boot.
>>>> After that I tested with a vanilla 2.6.21.1 it shows the same behavior.
>>>> I'm attaching a kern log file from the 2.6.20. The 2.6.21.1 locked up so
>>>> hard, that there was no trace left in the log file.
>>>> If I switch back to 2.6.19 everything is fine again.
>>>> If necessary I will hook up a laptop to this box, so I can capture
>>>> messages via netconsole.
>>>>
>>> Yes please.
>>>
Okay, I had time to set this up. I'm attaching the log messages I got
via netconsole.
>>>
>>>> This machine is running an AMD X2 64, NFORCE4 (ASUS A8N-E)
>>>>
>>>> I haven't been following the development of libata for a while, but from
>>>> the 2.6.20 changelog It looks like there have been some major changes.
>>>>
>>>> Just let me know what kind of information you need in order to narrow it
>>>> down.
>>>>
>>> Does giving 'sata_nv.adma=0' kernel parameter make any difference?
>>>
I tested for about 20 hours with ADMA disabled, and the crash did not occur.
If I remove
sata_nv.adma=0
from the boot options again, it doesn't take long until my machine locks up.
[Attached dmesg output with 2.6.21.1 kernel + crash info I got via
netconsole]
I hope this is useful to you guys.
Cheers Stefan
>> If adma=0 works, then that means it's either an ADMA related problem or
>> that SAMSUNG HD401LJ drive has some problems with NCQ (since ADMA off
>> means NCQ off as well). I would say the latter is more likely.
>>
>
> I don't have first hand experience with the particular model but I'll be
> surprised if they screwed their firmware up with new generation of
> harddisks. Firmware on the previous generation drives was pretty good
> and they don't get worse usually.
>
>
>> We should really have some kind of "noncq" kernel parameter we can use
>> to help debugging these problems. Though, later kernels are supposed to
>> switch it off automatically after too many errors..
>>
>
> Till now there hasn't been any case where a broken NCQ prevented a
> machine from booting but, yeah, having such thing would be nice for
> debugging.
>
>
[-- Attachment #2: bootlog.txt.gz --]
[-- Type: application/x-tar, Size: 7502 bytes --]
[-- Attachment #3: crashtrace.txt.gz --]
[-- Type: application/x-tar, Size: 1250 bytes --]
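The crash trace above was captured over netconsole, which sends kernel messages as UDP datagrams to a second machine. As a minimal sketch of the receiving side (not the tool Stefan used, which the thread does not name), the Python below binds a UDP socket and yields each datagram as text. Port 6666 is netconsole's documented default target port; the function names are made up for illustration.

```python
import socket


def open_listener(port=6666, host="0.0.0.0"):
    """Bind a UDP socket on which netconsole datagrams can be received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def read_messages(sock, max_packets=None, timeout=None):
    """Yield received datagrams as text until the limit or a timeout is hit."""
    sock.settimeout(timeout)
    received = 0
    while max_packets is None or received < max_packets:
        try:
            data, _sender = sock.recvfrom(65535)
        except socket.timeout:
            break  # nothing arrived within the timeout
        received += 1
        yield data.decode("utf-8", errors="replace")


# Typical use on the capturing machine (runs until interrupted):
#
#     with open("crashtrace.txt", "a") as log:
#         for line in read_messages(open_listener()):
#             log.write(line)
#             log.flush()
```

Flushing after every message matters here, because the interesting lines are the last ones sent before the box freezes.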
* Re: System freezes with kernel >2.6.19 - sata_nv [added crash info kern 2.6.21.1]
From: Robert Hancock @ 2007-05-01 23:57 UTC
To: Stefan; +Cc: Tejun Heo, linux-ide
Stefan wrote:
> Okay, I had time to set this up. I'm attaching the log messages I got
> via netconsole.
>
>
> I tested about 20h with adma disabled, the crash won't occur.
>
> If I remove
>
> sata_nv.adma=0
>
> from boot options again it doesn't take long until my machine locks up.
>
> [Attached dmesg output with 2.6.21.1 kernel + crash info I got via
> netconsole]
>
>
> I hope this is useful to you guys.
It looks like you've got SError bits set from the controller; 0x200000
means link layer CRC error (btw, we really should be decoding that error
and printing it in human-readable form rather than making people pore
through the SATA spec and count bits). The first thing you should try is
replacing the SATA cable to that drive.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
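To save counting bits by hand, the decoding described above can be sketched in a few lines of Python. This is an illustration only, not the kernel's code: the bit positions follow the SError register layout as I understand it from the SATA specification, and the table and function names are invented here. 0x200000 is bit 21, which matches the CRC error mentioned in the message.

```python
# SError bit positions (ERR field in bits 0-15, DIAG field in bits 16-31).
# Assumed from the SATA specification; verify against the spec before relying on it.
SERROR_BITS = {
    0: "recovered data integrity error",
    1: "recovered communications error",
    8: "non-recovered transient data integrity error",
    9: "non-recovered persistent communication or data integrity error",
    10: "protocol error",
    11: "internal error",
    16: "PhyRdy change",
    17: "PHY internal error",
    18: "Comm wake",
    19: "10b to 8b decode error",
    20: "disparity error",
    21: "CRC error",
    22: "handshake error",
    23: "link sequence error",
    24: "transport state transition error",
    25: "unrecognized FIS type",
    26: "exchanged",
}


def decode_serror(value):
    """Return a description for every bit set in an SError register value."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(SERROR_BITS.get(bit, f"reserved/unknown bit {bit}"))
    return names


print(decode_serror(0x200000))  # prints ['CRC error']
```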
* Re: System freezes with kernel >2.6.19 - sata_nv [added crash info kern 2.6.21.1]
From: Tejun Heo @ 2007-05-02 11:41 UTC
To: Robert Hancock; +Cc: Stefan, linux-ide
Robert Hancock wrote:
> Stefan wrote:
>> Okay, I had time to set this up. I'm attaching the log messages I got
>> via netconsole.
>>
>> I tested about 20h with adma disabled, the crash won't occur.
>>
>> If I remove
>>
>> sata_nv.adma=0
>>
>> from boot options again it doesn't take long until my machine locks up.
>>
>> [Attached dmesg output with 2.6.21.1 kernel + crash info I got via
>> netconsole]
>>
>> I hope this is useful to you guys.
>
> It looks like you've got SError bits set from the controller, 0x200000
> means link layer CRC error (btw, we really should be decoding that error
> and printing it in human readable form rather than making people pore
> through the SATA spec and count bits).
Hmmm... Maybe, but most of the bits are nearly meaningless to end users
anyway.
> First thing you should try is replacing the SATA cable to that drive.
Yep, please apply some hardware debugging techniques: replace or
reseat the SATA cables and connect the drive to a different power connector.
But it's disturbing to see the machine lock up when a CRC error occurs.
The sata_nv non-ADMA interface locks the whole machine up too after certain
error conditions, but I thought ADMA was saner than that. I hope we can
work around this somehow.
Thanks.
--
tejun
* Re: System freezes with kernel >2.6.19 - sata_nv [added crash info kern 2.6.21.1]
From: Stefan @ 2007-05-02 22:04 UTC
To: Tejun Heo; +Cc: Robert Hancock, linux-ide
Tejun Heo wrote:
> Robert Hancock wrote:
>
>> Stefan wrote:
>>
>>> Okay, I had time to set this up. I'm attaching the log messages I got
>>> via netconsole.
>>>
>>> I tested about 20h with adma disabled, the crash won't occur.
>>>
>>> If I remove
>>>
>>> sata_nv.adma=0
>>>
>>> from boot options again it doesn't take long until my machine locks up.
>>>
>>> [Attached dmesg output with 2.6.21.1 kernel + crash info I got via
>>> netconsole]
>>>
>>> I hope this is useful to you guys.
>>>
>> It looks like you've got SError bits set from the controller, 0x200000
>> means link layer CRC error (btw, we really should be decoding that error
>> and printing it in human readable form rather than making people pore
>> through the SATA spec and count bits).
>>
>
> Hmmm... Maybe, but most of the bits are nearly meaningless to end users
> anyway.
>
>
>> First thing you should try is replacing the SATA cable to that drive.
>>
>
> Yeap, please apply some hardware debugging techniques - replacing /
> reseating SATA cables and connecting it to different power connector.
> But it's disturbing to see machine lock up even if CRC error occurs.
> sata_nv non-adma interface locks the whole machine up too after certain
> error conditions but I thought adma was saner than that. I hope we can
> work around this somehow.
>
> Thanks.
>
>
I have just replaced the cables with brand new ones. Still no change: the
machine freezes ~10 min after boot.
Sometimes I even have to completely power down the machine to be able to
boot again.
* Re: System freezes with kernel >2.6.19 - sata_nv [added crash info kern 2.6.21.1]
From: Robert Hancock @ 2007-05-02 23:46 UTC
To: Stefan; +Cc: Tejun Heo, linux-ide
Stefan wrote:
>>> First thing you should try is replacing the SATA cable to that drive.
>>>
>> Yeap, please apply some hardware debugging techniques - replacing /
>> reseating SATA cables and connecting it to different power connector.
>> But it's disturbing to see machine lock up even if CRC error occurs.
>> sata_nv non-adma interface locks the whole machine up too after certain
>> error conditions but I thought adma was saner than that. I hope we can
>> work around this somehow.
>>
>> Thanks.
>>
>>
> I have just replaced the cables with brand new ones, still no change the
> machine freezes ~10min after boot.
> Sometimes I even have to completely power down the machine to be able to
> boot again.
Power problems would be another possibility.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: System freezes with kernel >2.6.19 - sata_nv [added crash info kern 2.6.21.1]
From: Stefan @ 2007-05-13 19:33 UTC
To: Tejun Heo; +Cc: Robert Hancock, linux-ide
Tejun Heo wrote:
> Robert Hancock wrote:
>
>> Stefan wrote:
>>
>>> Okay, I had time to set this up. I'm attaching the log messages I got
>>> via netconsole.
>>>
>>> I tested about 20h with adma disabled, the crash won't occur.
>>>
>>> If I remove
>>>
>>> sata_nv.adma=0
>>>
>>> from boot options again it doesn't take long until my machine locks up.
>>>
>>> [Attached dmesg output with 2.6.21.1 kernel + crash info I got via
>>> netconsole]
>>>
>>> I hope this is useful to you guys.
>>>
>> It looks like you've got SError bits set from the controller, 0x200000
>> means link layer CRC error (btw, we really should be decoding that error
>> and printing it in human readable form rather than making people pore
>> through the SATA spec and count bits).
>>
>
> Hmmm... Maybe, but most of the bits are nearly meaningless to end users
> anyway.
>
>
>> First thing you should try is replacing the SATA cable to that drive.
>>
>
> Yeap, please apply some hardware debugging techniques - replacing /
> reseating SATA cables and connecting it to different power connector.
> But it's disturbing to see machine lock up even if CRC error occurs.
> sata_nv non-adma interface locks the whole machine up too after certain
> error conditions but I thought adma was saner than that. I hope we can
> work around this somehow.
>
> Thanks.
>
>
Hi folks, got some news:
I replaced the cables, which didn't change anything. My PSU is strong enough
to take a lot more load, so I don't think it could be a power problem.
Therefore I replaced the SAMSUNG HD401LJ with an ST3160812AS.
With the Seagate attached I don't get the crash. So this may be a
problem with the combination HD401LJ + NFORCE4 + ADMA.
--Stefan
* Re: System freezes with kernel >2.6.19 - sata_nv
From: Stefan @ 2007-05-01 0:19 UTC
To: Tejun Heo; +Cc: linux-ide, Robert Hancock
Tejun Heo wrote:
> Stefan wrote:
>
>> Hi folks,
>>
>> yesterday I upgraded kernel 2.6.19 to 2.6.20 (gentoo kernel). Now my
>> box locks up about 10 min after boot.
>> After that I tested with a vanilla 2.6.21.1 it shows the same behavior.
>> I'm attaching a kern log file from the 2.6.20. The 2.6.21.1 locked up so
>> hard, that there was no trace left in the log file.
>> If I switch back to 2.6.19 everything is fine again.
>> If necessary I will hook up a laptop to this box, so I can capture
>> messages via netconsole.
>>
>
> Yes please.
>
>
>> This machine is running an AMD X2 64, NFORCE4 (ASUS A8N-E)
>>
>> I haven't been following the development of libata for a while, but from
>> the 2.6.20 changelog It looks like there have been some major changes.
>>
>> Just let me know what kind of information you need in order to narrow it
>> down.
>>
>
> Does giving 'sata_nv.adma=0' kernel parameter make any difference?
>
>
It seems like switching off ADMA works. I haven't experienced any problems
within the last few hours.
Just to mention, I also use Windows 2003 on this box with NCQ enabled (I
think the NVIDIA driver there states that NCQ is active).
Should I recompile the kernel with some debugging options enabled?