From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tejun Heo
Subject: Re: JMicron - hard resetting link
Date: Wed, 13 Feb 2008 08:50:34 +0900
Message-ID: <47B230CA.9060506@gmail.com>
References: <009401c86d5c$5eb57bf0$4d0fa8c0@M2007> <47B19997.1010404@gmail.com> <003801c86d84$fdae0510$4d0fa8c0@M2007> <47B1B299.3010208@gmail.com> <002f01c86d9c$94542f50$4d0fa8c0@M2007>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-2
Content-Transfer-Encoding: 7bit
Return-path:
Received: from el-out-1112.google.com ([209.85.162.176]:3629 "EHLO
	el-out-1112.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1762956AbYBLXun (ORCPT );
	Tue, 12 Feb 2008 18:50:43 -0500
Received: by el-out-1112.google.com with SMTP id v27so2151752ele.23
	for ; Tue, 12 Feb 2008 15:50:42 -0800 (PST)
In-Reply-To: <002f01c86d9c$94542f50$4d0fa8c0@M2007>
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: Gabor FUNK
Cc: IDE/ATA development list

Hello,

Gabor FUNK wrote:
>> What I said was that timeouts occurring due to transmission errors
>> should be recoverable. It seems like IRQ delivery didn't work probably
>> due to screaming IRQ. I need to see the messages before the first
>> relevant error message. It's always a good idea to post full kernel log
>> from boot till failure. Things which don't seem relevant are often
>> relevant.
> Naturally. Full kern.log with boot:
> http://www.huweb.hu/maques/tmp/jmicron/kern.log
> (no edits, there are really only those 2 lines between Feb 6 and Feb 9's
> 1st exception)

Hmmm... Indeed. This is the first time this mode of failure is reported.

> Previously there was kernel 2.6.23.9 and I noticed the following in
> syslog by then:
> Feb 6 19:10:19 storage1 kernel: ata4: D2H reg with I during NCQ, this
> message won't be printed again
> Feb 6 19:10:20 storage1 kernel: ata1: D2H reg with I during NCQ, this
> message won't be printed again
> Feb 6 19:10:20 storage1 kernel: ata2: D2H reg with I during NCQ, this
> message won't be printed again
> Feb 6 19:10:21 storage1 kernel: ata3: D2H reg with I during NCQ, this
> message won't be printed again
>
> I googled and saw that there was some fixes related to this (maybe it
> was you), so that's why we hoped that 2.6.24 will fix this. Actually the
> above error messages were gone, but...

Yeap, those are gone.

>> Till now, none of this kind of problem has been tracked down to MB or
>> the controller while 90% of hardware problems turned out to be power
>> related.
> I'll put a brand new, probably different PSU in the case and put the MB
> and the 4 disks of the problematic controller on it, and put the 2 system
> and other 4 disks to this one (or even another one).

Yeap, please keep me posted.

> Meanwhile I'd welcome if you have any suggestion why controller reset
> causing a "fatal error"...
> BTW, the drives were accessible after the array broke (when I got there).

What do you mean by 'drives were accessible'? /dev/sdX nodes were
accessible?

-- 
tejun