Message-ID: <4998CB34.6090300@kernel.org>
Date: Mon, 16 Feb 2009 11:11:00 +0900
From: Tejun Heo
To: Robert Hancock
CC: Serguei Miridonov, linux-kernel@vger.kernel.org, Jeff Garzik
Subject: Re: Intel ICH9M/M-E SATA error-handling/reset problems
References: <200902141206.06419.mirsev@cicese.mx> <49973F4F.1010804@gmail.com> <200902151000.16688.mirsev@cicese.mx> <4998593D.2050300@gmail.com>
In-Reply-To: <4998593D.2050300@gmail.com>

Hello,

Robert Hancock wrote:
> Serguei Miridonov wrote:
>> Hello Robert and Jeff,
>>
>> Thank you for your replies.
>> On Saturday 14 February 2009, Jeff Garzik wrote:
>>> Serguei Miridonov wrote:
>>>> I have some problems with SATA in a new notebook PC (HP Pavilion
>>>> dv5t, Intel chipset). Seagate FreeAgent Pro 1TB external drive
>>>> practically can not be used with eSATA in Linux (fresh install
>>>> from DVD Fedora 10, now fully updated), and yesterday I also had
>>>> problem with DVD recording using internal HL-DT-ST BDDVDRW drive.
>>> Some eSata fixes went into the more-recent kernels... Can you try
>>> 2.6.29-rc5?
>>
>> Unfortunately, right now I can not provide a good testing bed for a
>> new kernel. I was also thinking about bad cable and returned it to the
>> store. Recording DVDs, as you understand, can not be considered for
>> testing: I don't do it on regular basis... I will be looking for a new
>> eSATA cable in a week or two, so when I have it I'll try to download
>> and build the kernel for these experiments.

Please try a shorter (or different) cable.  Most eSATA problems are
cabling problems.  Speeding down to 1.5Gbps often improves the
situation a lot (Windows might do this by default).  There was a stupid
bug in the speed-down logic, and until recently speeding down to
1.5Gbps didn't happen as designed.  The fix went into -stable and
should show up in most distros soon (or just roll your own kernel).

>> I agree with you completely. Nevertheless, something like 10 errors
>> per 2GB transfer can not be the reason to give up. Vista, at least,
>> recovers and continues the data transfer. Linux simply can not return
>> the interface or connected device into operating mode. Do you think it
>> is normal?

Well, there isn't much point in retrying over and over if the same
command keeps failing.  The problem was the broken speed-down logic:
every retry ran at the same link speed and failed, so the FS eventually
received an IO failure.  This should have been fixed by the recent
changes.

....

>>>> It appears that Linux kernel has problems with
>>>> error-handling/reset of SATA hardware. I have found a lot of
>>>> reports regarding SATA problems: data transfer failures, CD/DVD
>>>> recording, waking up from suspend to RAM, etc. Aren't they all
>>>> related? Can Linux SATA chipsets drivers
>>> Not related at all, mostly.. though a lot of people seem to think
>>> they are. Often times people think problems are related because the
>>> error messages seem similar, and even the same error can be
>>> triggered by numerous different problems, often not the fault of
>>> the kernel.

Heh... yeah, this sometimes gets tiring.
Maybe we should reformat ATA error messages every six months or so? :-)

Joking aside, yes, there have been and still are recurring patterns of
failures.  Some have passed (e.g. the ATAPI transfer length ones) and
some stay (cabling, power).  Nonetheless, in most cases, what people
think they are experiencing isn't quite what is actually happening.

>> I'm not talking now about errors triggered by the kernel due to some
>> bugs. What I see in the logs, this is the kernel fault to recover from
>> errors, not causing it. I hope that this is fixed already in newer
>> kernels, though I could not find such information in changelogs.
>>
>> I could be wrong, of course, but it seems to me that if kernel can
>> really reset the interface and return it and connected devices to
>> operating mode, then most of issues mentioned above may become not so
>> critical and people could live with them until root cause is fixed
>> properly.
>>
>> May be resetting the interface will not help in all cases if a device
>> is left in some screwed up state due to earlier poor error handling...
>> Well, this is another issue which can be device-vendor-dependent...
>> However, regarding external Seagate drive, Vista does not have any
>> special driver to handle its errors, it just works...

libata EH actually does pretty well in most cases.  You'll see a lot of
current and archived bug reports, but considering the number of ATA
devices out in the wild (many of them crappy) and that the influx of
bug reports has gone down considerably, I think it's doing pretty well.

In the log, ata2.00 went down after a timeout.  The reset per se isn't
the problem; it is the right thing to do after a timeout, as the
controller and device states are unknown.  Situations like the one in
your log often happen because an ATAPI device shuts down completely
after certain transmission problems.  When this happens, there's not
much the driver can do, and a soft reboot wouldn't recover the device
either.
But seeing you're on dv5, I think you might be experiencing something
else.  Please take a look at the following bugzilla entry.

  http://bugzilla.kernel.org/show_bug.cgi?id=12276

It seems recent HP laptops do something differently and make the AHCI
controller behave strangely.  On dv5 and HDX16t, suspend/resume doesn't
work.  The link simply doesn't come up after resuming, and this is the
_ONLY_ report of this kind of problem for all Intel AHCIs ever, so yeah,
HP is doing something.  I'm trying to contact HP about this but haven't
gotten anywhere yet.

So, I think you're more likely to be seeing a similar problem.  Can you
please test whether you see the same suspend/resume problem?

Thanks.

--
tejun