From: Robert Hancock <hancockrwd@gmail.com>
To: Jonathan Steinert <hachi@kuiki.net>
Cc: linux-kernel@vger.kernel.org
Subject: Re: problem with sata_sil24: PCI fault or device removal?
Date: Wed, 20 May 2009 00:10:27 -0600
Message-ID: <4A139ED3.5060201@gmail.com>
In-Reply-To: <20090519213008.GA21409@miyako.harrison.succub.us>
Jonathan Steinert wrote:
> I have a box here running 2.6.26 x86_64 normally (also tested with 2.6.29, and going to test with later versions too) that has major issues with SATA. I'm not sure which things are causes and which things are side effects, so I'm just going to list symptoms.
>
> I'm happy to use this machine for debugging, but I'm not subscribed to lkml. If you could please CC me on responses that would help a lot.
>
> - The first error I usually see is:
>
> sata_sil24: PCI fault or device removal?
>
> - In some cases I've had the box hard-lock (no magic SysRq response or anything) with no errors.
>
> - Booting a live OS with 2.6.25 @ 32bit seems to be just fine, but this could be a side effect of 32 vs 64bit. Not sure yet.
>
> - SMART commands might make the situation worse. I was using smartmontools to run long self-tests on the drives every Sunday, and the box tends to crash on Sunday around the time the long test starts. I was also using hddtemp, and the crashes got a little less frequent when I removed that.
>
> - Lots of IO does make it worse. dd if=/dev/sdwhatever of=/dev/null can get it to start spewing errors within a minute usually.
>
> - Logs and command outputs are at: http://hachi.kuiki.net/bug_reports/20090519-linux-sata/
>
> lspci: http://hachi.kuiki.net/bug_reports/20090519-linux-sata/lspci.txt
> lspci -vvv: http://hachi.kuiki.net/bug_reports/20090519-linux-sata/lspci_long.txt
> dmesg: http://hachi.kuiki.net/bug_reports/20090519-linux-sata/dmesg.txt
> console output during a crash: http://hachi.kuiki.net/bug_reports/20090519-linux-sata/crash1.txt
>
> If anyone has time to help, it would be much appreciated. I'm able and willing to collect any other information you might want.

What is this controller, an add-in card? The fact that you got a status
register read returning all ones (I think that's the usual result of a PCI
abort), and these errors:

[ 2094.921961] ata12: irq_stat 0x00020002, PCI master abort while transferring data
[ 2099.317472] ata12.00: irq_stat 0x00020002, PCI parity error while fetching SGT

really suggest you're having some kind of problem on the PCI bus.
It could be a hardware fault, a power issue, etc.
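
Here is an illustration of the all-ones heuristic mentioned above. When no device responds to a PCI read, the transaction is master-aborted. This can happen, for example, when a card has effectively dropped off the bus. The host bridge then typically completes the read with every data bit set. As a result, a 32-bit register value of 0xffffffff more likely points to a bus fault or a removed device than to a real status. The following is a minimal sketch of that check under this assumption. `looks_like_pci_fault` is a hypothetical helper written for illustration and is not code from sata_sil24:

```python
# Hypothetical helper illustrating the all-ones heuristic; not taken from
# the sata_sil24 driver.
ALL_ONES_32 = 0xFFFFFFFF  # what a master-aborted 32-bit PCI read typically returns


def looks_like_pci_fault(status: int) -> bool:
    """Return True if a 32-bit register value looks like a failed PCI read.

    This is only a heuristic: a device could in principle report all ones,
    but for an interrupt status register that is very unlikely.
    """
    return status == ALL_ONES_32


if __name__ == "__main__":
    # 0x00020002 is the irq_stat value from the log above; 0xffffffff is
    # the value that triggers the "PCI fault or device removal?" message.
    for value in (0x00020002, 0xFFFFFFFF):
        if looks_like_pci_fault(value):
            verdict = "PCI fault or device removal?"
        else:
            verdict = "plausible status"
        print(f"0x{value:08x}: {verdict}")
```

The check deliberately looks only for the all-ones pattern. A genuine error status such as 0x00020002 is still reported through the normal error path.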