Re: JMicron - hard resetting link

linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Gabor FUNK" <FUNK.Gabor@hunetkft.hu>
To: Tejun Heo <htejun@gmail.com>
Cc: IDE/ATA development list <linux-ide@vger.kernel.org>
Subject: Re: JMicron - hard resetting link
Date: Fri, 15 Feb 2008 00:02:32 +0100	[thread overview]
Message-ID: <005c01c86f5d$b050b4e0$4d0fa8c0@M2007> (raw)
In-Reply-To: 47B230CA.9060506@gmail.com

To be honest, I didn't believe that doing anything with the PSU
would do something.
However, seemingly it did.
I have also updated the BIOS, but I guess this has not much
to do with it.
So a different brand PSU was additionally installed, and this
one got the motherboard and the 4 disk which were failing.
The "old" PSU got the second 4 hdds and the 2 other system
HDDs.
Test was started yesterday (Feb 13) about 16:30 CET including
array building up and file copies. About today (14) 20:22 the
problem appeared, but seemingly "moved" with the PSU to the
other 4 disks bunch (on nvidia controller) - more precisely, only
2 of them (array is still operational).

Feb 14 20:22:32 storage1 kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 
0x0 action 0x2 frozen
Feb 14 20:22:32 storage1 kernel: ata10.00: cmd 
c8/00:00:c3:d5:3b/00:00:00:00:00/e2 tag 0 dma 131072 in
Feb 14 20:22:32 storage1 kernel:          res 
40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 14 20:22:32 storage1 kernel: ata10.00: status: { DRDY }
Feb 14 20:22:32 storage1 kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 
0x0 action 0x2 frozen
Feb 14 20:22:32 storage1 kernel: ata9.00: cmd 
c8/00:00:c3:d5:3b/00:00:00:00:00/e2 tag 0 dma 131072 in
Feb 14 20:22:32 storage1 kernel:          res 
40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 14 20:22:32 storage1 kernel: ata9.00: status: { DRDY }
Feb 14 20:22:33 storage1 kernel: ata10: soft resetting link
Feb 14 20:22:33 storage1 kernel: ata9: soft resetting link
Feb 14 20:22:33 storage1 kernel: ata10: SATA link up 3.0 Gbps (SStatus 123 
SControl 300)
Feb 14 20:22:33 storage1 kernel: ata9: SATA link up 3.0 Gbps (SStatus 123 
SControl 300)
Feb 14 20:23:03 storage1 kernel: ata9.00: qc timeout (cmd 0x27)
Feb 14 20:23:03 storage1 kernel: ata9.00: failed to read native max address 
(err_mask=0x4)
Feb 14 20:23:03 storage1 kernel: ata9.00: HPA support seems broken, will 
skip HPA handling
Feb 14 20:23:03 storage1 kernel: ata9.00: revalidation failed (errno=-5)
Feb 14 20:23:03 storage1 kernel: ata9: failed to recover some devices, 
retrying in 5 secs
Feb 14 20:23:03 storage1 kernel: ata10.00: qc timeout (cmd 0x27)
Feb 14 20:23:03 storage1 kernel: ata10.00: failed to read native max address 
(err_mask=0x4)
Feb 14 20:23:03 storage1 kernel: ata10.00: HPA support seems broken, will 
skip HPA handling
Feb 14 20:23:03 storage1 kernel: ata10.00: revalidation failed (errno=-5)
Feb 14 20:23:03 storage1 kernel: ata10: failed to recover some devices, 
retrying in 5 secs
Feb 14 20:23:08 storage1 kernel: ata9: hard resetting link
Feb 14 20:23:08 storage1 kernel: ata10: hard resetting link
...

Full kern.log is at:
http://www.huweb.hu/maques/tmp/jmicron/kern0214.log

So it seems that there is definitely something with the "old" PSU.

Also, I tried to mount the failed drives, without success.

Thought I let you know.
Now I will try with the only one, "new" PSU to see what happens...

G.


----- Original Message ----- 
From: "Tejun Heo" <htejun@gmail.com>
To: "Gabor FUNK" <FUNK.Gabor@hunetkft.hu>
Cc: "IDE/ATA development list" <linux-ide@vger.kernel.org>
Sent: Wednesday, February 13, 2008 12:50 AM
Subject: Re: JMicron - hard resetting link


> Hello,
>
> Gabor FUNK wrote:
>>> What I said was that timeouts occurring due to transmission errors
>>> should be recoverable.  It seems like IRQ delivery didn't work probably
>>> due to screaming IRQ.  I need to see the messages before the first
>>> relevant error message.  It's always a good idea to post full kernel log
>>> from boot till failure.  Things which don't seem relevant are often
>>> relevant.
>> Naturally. Full kern.log with boot:
>> http://www.huweb.hu/maques/tmp/jmicron/kern.log
>> (no edits, there are really only those 2 lines between Feb 6 and Feb 9's
>> 1st exception)
>
> Hmmm... Indeed.  This is the first time this mode of failure is reported.
>
>> Previously there was kernel 2.6.23.9 and I noticed the following in
>> syslog by then:
>> Feb  6 19:10:19 storage1 kernel: ata4: D2H reg with I during NCQ, this
>> message won't be printed again
>> Feb  6 19:10:20 storage1 kernel: ata1: D2H reg with I during NCQ, this
>> message won't be printed again
>> Feb  6 19:10:20 storage1 kernel: ata2: D2H reg with I during NCQ, this
>> message won't be printed again
>> Feb  6 19:10:21 storage1 kernel: ata3: D2H reg with I during NCQ, this
>> message won't be printed again
>>
>> I googled and saw that there was some fixes related to this (maybe it
>> was you), so that's why we hoped that 2.6.24 will fix this. Actually the
>> above error messages were gone, but...
>
> Yeap, those are gone.
>
>>> Till now, none of this kind of problem has been tracked down to MB or
>>> the controller while 90% of hardware problems turned out to be power
>>> related.
>> I'll put a brand new, probably different PSU in the case and put the MB
>> and the 4 disks of the problematic controller on it, and put the 2 system
>> and other 4 disks to this one (or even another one).
>
> Yeap, please keep me posted.
>
>> Meanwhile I'd welcome if you have any suggestion why controller reset
>> causing a "fatal error"...
>> BTW, the drives were accessible after the array broke (when I got there).
>
> What do you mean by 'drives were accessible'?  /dev/sdX nodes were
> accessible?
>
> -- 
> tejun
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

next prev parent reply	other threads:[~2008-02-14 23:02 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-12  9:48 JMicron - hard resetting link Gabor FUNK
2008-02-12 13:05 ` Tejun Heo
2008-02-12 14:38   ` Gabor FUNK
2008-02-12 14:52     ` Tejun Heo
2008-02-12 17:27       ` Gabor FUNK
2008-02-12 23:50         ` Tejun Heo
2008-02-14 23:02           ` Gabor FUNK [this message]
2008-02-14 23:32             ` Tejun Heo
2008-02-21 21:45               ` Gabor FUNK
2008-02-22  2:03                 ` Tejun Heo
2008-02-24  9:04                   ` Gabor FUNK

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='005c01c86f5d$b050b4e0$4d0fa8c0@M2007' \
    --to=funk.gabor@hunetkft.hu \
    --cc=htejun@gmail.com \
    --cc=linux-ide@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).