From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Gabor FUNK" <FUNK.Gabor@hunetkft.hu>
Subject: Re: JMicron - hard resetting link
Date: Tue, 12 Feb 2008 18:27:44 +0100
Message-ID: <002f01c86d9c$94542f50$4d0fa8c0@M2007>
References: <009401c86d5c$5eb57bf0$4d0fa8c0@M2007> <47B19997.1010404@gmail.com> <003801c86d84$fdae0510$4d0fa8c0@M2007> <47B1B299.3010208@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain;
	format=flowed;
	charset="iso-8859-2";
	reply-type=original
Content-Transfer-Encoding: 7bit
Return-path: <linux-ide-owner@vger.kernel.org>
Received: from ns1.huweb.hu ([62.112.193.37]:52641 "EHLO ns1.huweb.hu"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753695AbYBLR17 (ORCPT <rfc822;linux-ide@vger.kernel.org>);
	Tue, 12 Feb 2008 12:27:59 -0500
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: Tejun Heo <htejun@gmail.com>
Cc: IDE/ATA development list <linux-ide@vger.kernel.org>

> What I said was that timeouts occurring due to transmission errors
> should be recoverable.  It seems like IRQ delivery didn't work probably
> due to screaming IRQ.  I need to see the messages before the first
> relevant error message.  It's always a good idea to post full kernel log
> from boot till failure.  Things which don't seem relevant are often
> relevant.
Naturally. Full kern.log with boot:
http://www.huweb.hu/maques/tmp/jmicron/kern.log
(unedited; there really are only those 2 lines between Feb 6 and the first
exception on Feb 9)

Previously the machine ran kernel 2.6.23.9, and at that time I noticed the
following in syslog:
Feb  6 19:10:19 storage1 kernel: ata4: D2H reg with I during NCQ, this message won't be printed again
Feb  6 19:10:20 storage1 kernel: ata1: D2H reg with I during NCQ, this message won't be printed again
Feb  6 19:10:20 storage1 kernel: ata2: D2H reg with I during NCQ, this message won't be printed again
Feb  6 19:10:21 storage1 kernel: ata3: D2H reg with I during NCQ, this message won't be printed again

I googled and saw that there were some fixes related to this (maybe they
were yours), which is why we hoped that 2.6.24 would fix it. The above
messages are indeed gone, but...

> Till now, none of this kind of problem has been tracked down to MB or
> the controller while 90% of hardware problems turned out to be power
> related.
I'll install a brand new PSU, probably a different model, in the case and
power the MB and the 4 disks on the problematic controller from it. The 2
system disks and the other 4 disks will go on the current PSU (or even
another one).

Meanwhile, I'd welcome any suggestion as to why a controller reset causes
a "fatal error"...
BTW, the drives were accessible after the array broke (when I got there).

Thanks,
Gabor