From: Mark Lord <kernel@teksavvy.com>
To: Tim Small <tim@buttersideup.com>
Cc: Tejun Heo <tj@kernel.org>,
	Justin Piszcz <jpiszcz@lucidpixels.com>,
	"smartmontools-support@lists.sourceforge.net"
	<smartmontools-support@lists.sourceforge.net>,
	linux-ide@vger.kernel.org
Subject: Re: [smartmontools-support] SATA drive reset/disable events on ICH7 ata_piix when polling SMART info
Date: Sat, 06 Feb 2010 12:30:52 -0500
Message-ID: <4B6DA74C.2040007@teksavvy.com>
In-Reply-To: <4B6D8A12.70200@buttersideup.com>

Tim Small wrote:
> Tejun Heo wrote:
>>> The only constants seem to be libata and ICH7/8.
>>> We must have a bug somewhere in there.
>>>     
>> In piix mode or ahci mode?  If in piix mode, ich7 and 8 would behave
>> quite differently.  ICH8 has SIDPR so it can hardreset while 7 can't.
>> ICH SIDPR access had a hardware problem where a write to SControl to
>> clear DET was sometimes ignored, which led to occasional hardreset
>> failures; this got fixed recently.  The reason why ICHs are involved
>> in those incidents could just be that they are extremely popular.
>>   
> 
> It's a non-AHCI capable ICH7, so it's in piix mode.
> 
>> Things to try after such a complete drive shutdown are...
>>   
> 
> Unfortunately I can't do much with this box, as it's a rented box in a
> datacentre, however....
> 
>> * Soft reset the machine.  Can BIOS recognize the drive?
>>   
> 
> Yes, if I 'echo b > /proc/sysrq-trigger', then the BIOS
> recognises the drive, and the box reboots normally.
> 
>> In many cases I've seen, it's usually that the drive's firmware is
>> completely hung and only power cycling the drive brought it back.  But
>> then again, there have been some number of cases which didn't get
>> diagnosed properly, so it's definitely possible that we're doing
>> something wrong in the driver.
>>
>> Anyways, if it happens again, please try the above and try to find out
>> whether the controller or the drive is hung.  Also, please keep in
>> mind that timeouts on 0xEA (flush) are very often indicative of power
>>   
> 
> OK, I didn't think I was seeing those; is it possible to tell from the
> detail which I posted in my original message?  As for the potential for
> PSU shenanigans, I don't have access to the box to fiddle with that,
> unfortunately.  I believe I can stress the I/O subsystem quite heavily
> with dd and/or bonnie, though, and it's only when polling for SMART
> status that these errors show up.  I've just started dd (to the RAID
> mirror) + hdparm -I again to check...
> 
> Do the SMART error counters in the OP make this suspicious?  Is there
> likely to be any difference between running smartctl -a and hdparm -I in
> terms of code path taken through the kernel, or timings on the hardware,
> as far as you know?
..


When I first had this problem here, my theory was that issuing a
FLUSH_CACHE[_EXT] before any PIO command (e.g. SMART) should prevent
the problem.  This was never explored further (by me or others).
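One way to try that theory from userspace is a small script that issues the flush itself and only then sends the SMART command. The following is a minimal sketch, not a libata change and not something tested on the box in this thread. It assumes a Linux host where the drive is reachable through the SG_IO ioctl with libata's translation of the SAT ATA PASS-THROUGH(16) command, root privileges, and a drive that supports the 48-bit FLUSH CACHE EXT (0xEA) command. The helper names (flush_cache_ext_cdb, smart_read_data_cdb, sg_io, smart_read_with_flush) are hypothetical. FLUSH CACHE EXT is sent as a non-data command, then SMART READ DATA (command 0xB0, feature 0xD0) is sent as a PIO data-in command.

```python
import ctypes
import fcntl
import os

# Constants from <scsi/sg.h> (Linux SCSI generic interface).
SG_IO = 0x2285
SG_DXFER_NONE = -1
SG_DXFER_FROM_DEV = -3

# SAT ATA PASS-THROUGH(16) opcode and the ATA commands used in this sketch.
ATA_PASS_THROUGH_16 = 0x85
ATA_FLUSH_CACHE_EXT = 0xEA
ATA_SMART = 0xB0
SMART_READ_DATA = 0xD0  # SMART subcommand, placed in the FEATURES field


class SgIoHdr(ctypes.Structure):
    """ctypes mirror of struct sg_io_hdr (native alignment, as in C)."""
    _fields_ = [
        ("interface_id", ctypes.c_int),
        ("dxfer_direction", ctypes.c_int),
        ("cmd_len", ctypes.c_ubyte),
        ("mx_sb_len", ctypes.c_ubyte),
        ("iovec_count", ctypes.c_ushort),
        ("dxfer_len", ctypes.c_uint),
        ("dxferp", ctypes.c_void_p),
        ("cmdp", ctypes.c_void_p),
        ("sbp", ctypes.c_void_p),
        ("timeout", ctypes.c_uint),
        ("flags", ctypes.c_uint),
        ("pack_id", ctypes.c_int),
        ("usr_ptr", ctypes.c_void_p),
        ("status", ctypes.c_ubyte),
        ("masked_status", ctypes.c_ubyte),
        ("msg_status", ctypes.c_ubyte),
        ("sb_len_wr", ctypes.c_ubyte),
        ("host_status", ctypes.c_ushort),
        ("driver_status", ctypes.c_ushort),
        ("resid", ctypes.c_int),
        ("duration", ctypes.c_uint),
        ("info", ctypes.c_uint),
    ]


def flush_cache_ext_cdb() -> bytes:
    """Hypothetical helper: FLUSH CACHE EXT as a non-data pass-through."""
    cdb = bytearray(16)
    cdb[0] = ATA_PASS_THROUGH_16
    cdb[1] = (3 << 1) | 1  # PROTOCOL=3 (non-data), EXTEND=1 (48-bit command)
    cdb[2] = 0x00          # no data phase, CK_COND=0
    cdb[14] = ATA_FLUSH_CACHE_EXT
    return bytes(cdb)


def smart_read_data_cdb() -> bytes:
    """Hypothetical helper: SMART READ DATA as a PIO data-in pass-through."""
    cdb = bytearray(16)
    cdb[0] = ATA_PASS_THROUGH_16
    cdb[1] = 4 << 1    # PROTOCOL=4 (PIO data-in), EXTEND=0 (28-bit command)
    cdb[2] = 0x0E      # T_DIR=from device, BYT_BLOK=1, T_LENGTH=COUNT field
    cdb[4] = SMART_READ_DATA
    cdb[6] = 1         # COUNT: one 512-byte sector
    cdb[10] = 0x4F     # LBA mid: SMART signature
    cdb[12] = 0xC2     # LBA high: SMART signature
    cdb[14] = ATA_SMART
    return bytes(cdb)


def sg_io(fd: int, cdb: bytes, data_len: int = 0, timeout_ms: int = 30000) -> bytes:
    """Hypothetical helper: send one CDB via SG_IO, return any data read."""
    cdb_buf = ctypes.create_string_buffer(cdb, len(cdb))
    sense = ctypes.create_string_buffer(32)
    data = ctypes.create_string_buffer(data_len) if data_len else None

    hdr = SgIoHdr()
    hdr.interface_id = ord("S")
    hdr.dxfer_direction = SG_DXFER_FROM_DEV if data_len else SG_DXFER_NONE
    hdr.cmd_len = len(cdb)
    hdr.mx_sb_len = len(sense)
    hdr.dxfer_len = data_len
    hdr.dxferp = ctypes.addressof(data) if data is not None else None
    hdr.cmdp = ctypes.addressof(cdb_buf)
    hdr.sbp = ctypes.addressof(sense)
    hdr.timeout = timeout_ms

    fcntl.ioctl(fd, SG_IO, hdr)  # hdr is updated in place with the result
    if hdr.status or hdr.host_status or hdr.driver_status:
        raise OSError(
            f"SG_IO failed: status=0x{hdr.status:02x} "
            f"host=0x{hdr.host_status:04x} driver=0x{hdr.driver_status:04x}"
        )
    return data.raw if data is not None else b""


def smart_read_with_flush(dev: str) -> bytes:
    """Hypothetical helper: flush the write cache, then read SMART data."""
    fd = os.open(dev, os.O_RDWR)
    try:
        sg_io(fd, flush_cache_ext_cdb())
        return sg_io(fd, smart_read_data_cdb(), data_len=512)
    finally:
        os.close(fd)
```

Run as root, smart_read_with_flush("/dev/sda") should return the 512-byte SMART data sector. Polling with this in place of smartctl -a, and comparing how often the resets show up, would be one way to test the theory; it says nothing about whether the problem lies in the drive, the controller, or ata_piix.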

Cheers



Thread overview: 13+ messages
2010-02-05 14:07 SATA drive reset/disable events on ICH7 ata_piix when polling SMART info Tim Small
2010-02-05 14:17 ` [smartmontools-support] " Justin Piszcz
2010-02-05 14:31   ` Tim Small
2010-02-05 14:48     ` Justin Piszcz
2010-02-05 21:47     ` Mark Lord
2010-02-06  3:39       ` Tejun Heo
2010-02-06 15:26         ` Tim Small
2010-02-06 17:30           ` Mark Lord [this message]
2010-02-06 22:22             ` Tim Small
2010-02-07  4:51               ` Mark Lord
2010-02-08  2:40                 ` Tejun Heo
2010-02-08 13:03                 ` Tim Small
2010-02-08  2:49             ` Tejun Heo
