From: "Gabor FUNK" <FUNK.Gabor@hunetkft.hu>
To: Tejun Heo <htejun@gmail.com>
Cc: IDE/ATA development list <linux-ide@vger.kernel.org>
Subject: Re: JMicron - hard resetting link
Date: Fri, 15 Feb 2008 00:02:32 +0100
Message-ID: <005c01c86f5d$b050b4e0$4d0fa8c0@M2007>
In-Reply-To: 47B230CA.9060506@gmail.com
To be honest, I didn't believe that changing anything about the PSU
would make a difference.
However, it seems it did.
I have also updated the BIOS, but I guess that has little
to do with it.
So a second PSU of a different brand was installed alongside the old one,
and it now powers the motherboard and the 4 disks which were failing.
The "old" PSU powers the other 4 HDDs and the 2 system
HDDs.
The test was started yesterday (Feb 13) at about 16:30 CET, including
building the array and copying files. Today (Feb 14) at about 20:22 the
problem appeared again, but it seems to have "moved" with the PSU to the
other group of 4 disks (on the nvidia controller); more precisely, to only
2 of them (the array is still operational).
Feb 14 20:22:32 storage1 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Feb 14 20:22:32 storage1 kernel: ata10.00: cmd c8/00:00:c3:d5:3b/00:00:00:00:00/e2 tag 0 dma 131072 in
Feb 14 20:22:32 storage1 kernel: res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 14 20:22:32 storage1 kernel: ata10.00: status: { DRDY }
Feb 14 20:22:32 storage1 kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Feb 14 20:22:32 storage1 kernel: ata9.00: cmd c8/00:00:c3:d5:3b/00:00:00:00:00/e2 tag 0 dma 131072 in
Feb 14 20:22:32 storage1 kernel: res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 14 20:22:32 storage1 kernel: ata9.00: status: { DRDY }
Feb 14 20:22:33 storage1 kernel: ata10: soft resetting link
Feb 14 20:22:33 storage1 kernel: ata9: soft resetting link
Feb 14 20:22:33 storage1 kernel: ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 14 20:22:33 storage1 kernel: ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 14 20:23:03 storage1 kernel: ata9.00: qc timeout (cmd 0x27)
Feb 14 20:23:03 storage1 kernel: ata9.00: failed to read native max address (err_mask=0x4)
Feb 14 20:23:03 storage1 kernel: ata9.00: HPA support seems broken, will skip HPA handling
Feb 14 20:23:03 storage1 kernel: ata9.00: revalidation failed (errno=-5)
Feb 14 20:23:03 storage1 kernel: ata9: failed to recover some devices, retrying in 5 secs
Feb 14 20:23:03 storage1 kernel: ata10.00: qc timeout (cmd 0x27)
Feb 14 20:23:03 storage1 kernel: ata10.00: failed to read native max address (err_mask=0x4)
Feb 14 20:23:03 storage1 kernel: ata10.00: HPA support seems broken, will skip HPA handling
Feb 14 20:23:03 storage1 kernel: ata10.00: revalidation failed (errno=-5)
Feb 14 20:23:03 storage1 kernel: ata10: failed to recover some devices, retrying in 5 secs
Feb 14 20:23:08 storage1 kernel: ata9: hard resetting link
Feb 14 20:23:08 storage1 kernel: ata10: hard resetting link
...
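For reference, the "SStatus 123" value in the "SATA link up" lines can be decoded by hand. The following is only a minimal sketch: it assumes the value is printed in hexadecimal and uses the standard SATA SStatus field layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11). The helper name decode_sstatus is made up for this illustration and is not part of any kernel or library API.

```python
# Meanings of the SStatus fields as defined for the SATA SStatus register.
# Values not listed here are reserved.
DET = {
    0: "no device detected",
    1: "device present, no phy communication",
    3: "device present, phy communication established",
    4: "phy offline",
}
SPD = {0: "no negotiated speed", 1: "1.5 Gbps", 2: "3.0 Gbps"}
IPM = {0: "not present", 1: "active", 2: "partial", 6: "slumber"}


def decode_sstatus(value):
    """Split an SStatus value into its DET, SPD and IPM fields.

    Hypothetical helper for illustration only.
    """
    det = value & 0xF          # bits 0-3: device detection
    spd = (value >> 4) & 0xF   # bits 4-7: negotiated interface speed
    ipm = (value >> 8) & 0xF   # bits 8-11: interface power management state
    return {
        "DET": DET.get(det, "reserved"),
        "SPD": SPD.get(spd, "reserved"),
        "IPM": IPM.get(ipm, "reserved"),
    }


if __name__ == "__main__":
    # "SStatus 123" from the log, read as hexadecimal 0x123.
    for field, meaning in decode_sstatus(0x123).items():
        print(field, "=", meaning)
```

Decoded this way, 0x123 describes a link that is established, running at 3.0 Gbps and in the active power state, which agrees with the "SATA link up 3.0 Gbps" text. So the link itself comes back after the soft reset; it is the commands issued afterwards that time out.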
Full kern.log is at:
http://www.huweb.hu/maques/tmp/jmicron/kern0214.log
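To see at a glance which ports are affected in a log like this, the libata messages can be tallied per port. This is a rough sketch, not a definitive tool: it assumes each log entry sits on a single line (wrapped lines as in this mail would need to be rejoined first), and the names LINE_RE, EVENTS and tally are invented for this example. The sample lines are taken from the excerpt above.

```python
import re
from collections import Counter

# Matches e.g. "... kernel: ata10.00: exception Emask ..." or
# "... kernel: ata9: hard resetting link"; the optional ".NN" is the device.
LINE_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")

# Message prefixes of interest, mapped to a short label (illustrative choice).
EVENTS = {
    "exception": "exception",
    "qc timeout": "qc timeout",
    "soft resetting link": "soft reset",
    "hard resetting link": "hard reset",
    "revalidation failed": "revalidation failed",
}


def tally(lines):
    """Count selected libata events per port; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        port, message = match.groups()
        for prefix, label in EVENTS.items():
            if message.startswith(prefix):
                counts[(port, label)] += 1
                break
    return counts


SAMPLE = [
    "Feb 14 20:22:32 storage1 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen",
    "Feb 14 20:22:32 storage1 kernel: res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)",
    "Feb 14 20:22:33 storage1 kernel: ata10: soft resetting link",
    "Feb 14 20:23:03 storage1 kernel: ata10.00: qc timeout (cmd 0x27)",
    "Feb 14 20:23:03 storage1 kernel: ata9.00: revalidation failed (errno=-5)",
    "Feb 14 20:23:08 storage1 kernel: ata9: hard resetting link",
]

if __name__ == "__main__":
    for (port, label), count in sorted(tally(SAMPLE).items()):
        print(port, label, count)
```

To run it over a full log, the file can be passed in place of SAMPLE, for instance tally(open("kern0214.log")), assuming the log has been downloaded to that local file name.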
So it seems that there is definitely something wrong with the "old" PSU.
Also, I tried to mount the failed drives, without success.
I thought I would let you know.
Now I will try with only the "new" PSU to see what happens...
G.
----- Original Message -----
From: "Tejun Heo" <htejun@gmail.com>
To: "Gabor FUNK" <FUNK.Gabor@hunetkft.hu>
Cc: "IDE/ATA development list" <linux-ide@vger.kernel.org>
Sent: Wednesday, February 13, 2008 12:50 AM
Subject: Re: JMicron - hard resetting link
> Hello,
>
> Gabor FUNK wrote:
>>> What I said was that timeouts occurring due to transmission errors
>>> should be recoverable. It seems like IRQ delivery didn't work probably
>>> due to screaming IRQ. I need to see the messages before the first
>>> relevant error message. It's always a good idea to post full kernel log
>>> from boot till failure. Things which don't seem relevant are often
>>> relevant.
>> Naturally. Full kern.log with boot:
>> http://www.huweb.hu/maques/tmp/jmicron/kern.log
>> (no edits, there are really only those 2 lines between Feb 6 and Feb 9's
>> 1st exception)
>
> Hmmm... Indeed. This is the first time this mode of failure is reported.
>
>> Previously there was kernel 2.6.23.9 and I noticed the following in
>> syslog by then:
>> Feb 6 19:10:19 storage1 kernel: ata4: D2H reg with I during NCQ, this
>> message won't be printed again
>> Feb 6 19:10:20 storage1 kernel: ata1: D2H reg with I during NCQ, this
>> message won't be printed again
>> Feb 6 19:10:20 storage1 kernel: ata2: D2H reg with I during NCQ, this
>> message won't be printed again
>> Feb 6 19:10:21 storage1 kernel: ata3: D2H reg with I during NCQ, this
>> message won't be printed again
>>
>> I googled and saw that there were some fixes related to this (maybe it
>> was you), so that's why we hoped that 2.6.24 would fix this. Actually the
>> above error messages were gone, but...
>
> Yeap, those are gone.
>
>>> Till now, none of this kind of problem has been tracked down to MB or
>>> the controller while 90% of hardware problems turned out to be power
>>> related.
>> I'll put a brand new, probably different PSU in the case and put the MB
>> and the 4 disks of the problematic controller on it, and put the 2 system
>> and other 4 disks to this one (or even another one).
>
> Yeap, please keep me posted.
>
>> Meanwhile I'd welcome any suggestion as to why a controller reset
>> is causing a "fatal error"...
>> BTW, the drives were accessible after the array broke (when I got there).
>
> What do you mean by 'drives were accessible'? /dev/sdX nodes were
> accessible?
>
> --
> tejun
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>