From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark Lord
Subject: Re: XFS shutting down due to IO timeout on SATA disk (pata_via for CX700)
Date: Mon, 15 Sep 2008 23:49:19 -0400
Message-ID: <48CF2CBF.2050406@rtr.ca>
References: <20080911193511.7960bc82@neptune.home> <48CE22E5.9090403@kernel.org> <48CEC5FB.4040503@rtr.ca> <48CEC76E.7020101@kernel.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from rtr.ca ([76.10.145.34]:43262 "EHLO mail.rtr.ca" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752040AbYIPDtO (ORCPT ); Mon, 15 Sep 2008 23:49:14 -0400
In-Reply-To: <48CEC76E.7020101@kernel.org>
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: Tejun Heo
Cc: =?UTF-8?B?QnJ1bm8gUHLDqW1vbnQ=?= , Linux Kernel , linux-ide@vger.kernel.org, Jeff Garzik

Tejun Heo wrote:
>..
> Hmm.. most of FLUSH timeouts I've seen are either a dying drive or bad
> PSU. There just isn't much which can go wrong from the driver side.
> IIRC, there was a problem when the unused part of TF is not cleared
> but that was the only one.
>
>> Smartctl output is clean (no logged errors), and the drives themselves
>> are fine after a reboot -- necessary since libata/scsi kicked the drive out
>> of the RAID array.
>>
>> Something strange is going on here.
>
> Any chance you can trick the client to hook up the drive to a separate PSU?
..

No, the failures happen randomly at customer sites,
and only since they "upgraded" to SLES10 with libata.

I think the PSUs are probably just fine.
Time to hack the drivers to give proper status on the timeouts, too;
otherwise we won't ever have any clue as to what is really happening.

Cheers