From: Justin Piszcz <jpiszcz@lucidpixels.com>
To: Robert Hancock <hancockr@shaw.ca>
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
	linux-ide@vger.kernel.org, xfs@oss.sgi.com,
	Alan Piszcz <ap@solarrain.com>
Subject: Re: Lots of con-current I/O = resets SATA link? (2.6.25.10)
Date: Sun, 6 Jul 2008 08:13:00 -0400 (EDT)
Message-ID: <alpine.DEB.1.10.0807060741010.4930@p34.internal.lan>
In-Reply-To: <alpine.DEB.1.10.0807060447180.5005@p34.internal.lan>
On Sun, 6 Jul 2008, Justin Piszcz wrote:
> On Sat, 5 Jul 2008, Justin Piszcz wrote:
>> On Sat, 5 Jul 2008, Robert Hancock wrote:
>
> In short, using Raptors (especially VelociRaptors) with NCQ on the ICH8 w/AHCI
> & other cards in a RAID 5 configuration is a death trap (a good way to lose
> your data); it appears unsafe to use NCQ w/Raptors in a RAID 5
> configuration. I've gone back to disabling it, as I always do,
> and my RAID5 is rebuilding now.
>
> After the rebuild is completed I will perform more testing.
Running many parallel tar, untar, and copy operations on big files (the
kernel tarball and the kernel source tree):
$ ps auxww | grep -c cp
437
$ ps auxww | grep -c tar
71
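A note on the counts above: grep -c counts every ps line that contains the
pattern, which may include the grep process itself and unrelated names such
as kacpid, so 437 and 71 are best read as upper bounds. A minimal sketch of
an exact-name count follows. It assumes one command name per line, as
printed by `ps -eo comm=`; the sample data is hypothetical and not taken
from this machine.

```python
from collections import Counter


def count_by_command(ps_comm_output):
    """Count processes per exact command name (one name per input line)."""
    names = (line.strip() for line in ps_comm_output.splitlines())
    return Counter(name for name in names if name)


# Hypothetical sample in the shape of `ps -eo comm=` output.
SAMPLE_PS = """\
cp
cp
tar
kacpid
grep
"""

counts = count_by_command(SAMPLE_PS)
# "kacpid" and "grep" are counted under their own names, not as "cp".
print(counts["cp"], counts["tar"])
```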
Context switches peak at about 50k per second (vmstat "cs" column):
procs --------memory--------- -swap ----io---- --system-- ----cpu----
 r  b swpd  free buff   cache si so   bi    bo   in    cs us sy id wa
 9  8  160 48092  160 6640044  0  0  524 21776 2264 50571  0 27 30 43
 0  9  160 46572  160 6642572  0  0  220 22956 2032 45197  0 40 11 49
 0 47  160 51424  160 6642800  0  0    0 22900 1799 39694  0 57  5 38
 0  6  160 48916  160 6646272  0  0  112 23932 1763 41746  0 49 13 38
 0  7  160 49316  160 6646192  0  0    0 25712 1513 37190  0 20 30 50
 0  7  160 49240  160 6646264  0  0    0 28352 1853 38319  0 27 18 55
 0  1  160 46652  160 6649688  0  0  548 22800 1933 34609  0 22 69  8
 0  0  160 47032  160 6651108  0  0 2268 23652 1998 40729  0 22 56 22
 1  0  160 47192  160 6651580  0  0  340 21220 1718 34293  1 17 60 23
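For anyone who wants to compare runs, the context-switch figures can be
pulled out of a vmstat capture with a short script. This is only a sketch:
the function name vmstat_column is made up for illustration, and it assumes
the capture contains the usual column-name row followed by data rows with
the same number of fields. The sample rows are the ones shown above.

```python
def vmstat_column(text, column):
    """Return the integer values of one named column from vmstat output."""
    header = None
    values = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if header is None:
            # The column-name row is the first row that has `column` as a field;
            # the dashed group row ("procs ---memory--- ...") is skipped.
            if column in fields:
                header = fields
            continue
        # Data rows have as many fields as the header and start with a number.
        if len(fields) == len(header) and fields[0].isdigit():
            values.append(int(fields[header.index(column)]))
    return values


VMSTAT_SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
9 8 160 48092 160 6640044 0 0 524 21776 2264 50571 0 27 30 43
0 9 160 46572 160 6642572 0 0 220 22956 2032 45197 0 40 11 49
0 47 160 51424 160 6642800 0 0 0 22900 1799 39694 0 57 5 38
0 6 160 48916 160 6646272 0 0 112 23932 1763 41746 0 49 13 38
0 7 160 49316 160 6646192 0 0 0 25712 1513 37190 0 20 30 50
0 7 160 49240 160 6646264 0 0 0 28352 1853 38319 0 27 18 55
0 1 160 46652 160 6649688 0 0 548 22800 1933 34609 0 22 69 8
0 0 160 47032 160 6651108 0 0 2268 23652 1998 40729 0 22 56 22
1 0 160 47192 160 6651580 0 0 340 21220 1718 34293 1 17 60 23
"""

cs = vmstat_column(VMSTAT_SAMPLE, "cs")
print("samples:", len(cs), "peak cs:", max(cs), "lowest cs:", min(cs))
```

The same helper works for any other column name in the header row, for
example "wa" or "bo".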
This is with the "noapic" boot option and NCQ disabled.
If there are no further errors, I will reboot once more and re-run these
tests without the "noapic" boot option, with both NCQ and irqbalance
disabled; previously I left NCQ enabled when irqbalance was disabled.
Trying to find a pattern here but not having much luck.
When all is said and done, with more than 500 processes doing I/O, NCQ
disabled, and irqbalance disabled w/noapic, I could not reproduce the
problem.
The problem here is the IRQ routing; nearly every device is on IRQ 11:
$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:        100          0          0          0  XT-PIC-XT     timer
  1:          2          0          0          0  XT-PIC-XT     i8042
  2:          0          0          0          0  XT-PIC-XT     cascade
  8:          1          0          0          0  XT-PIC-XT     rtc
  9:      60454          0          0          0  XT-PIC-XT     acpi, HDA Intel, eth2
 10:     129911          0          0          0  XT-PIC-XT     pata_marvell, uhci_hcd:usb4, eth1
 11:   10278157          0          0          0  XT-PIC-XT     sata_sil24, sata_sil24, sata_sil24, ohci1394, ehci_hcd:usb1, ehci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb5, uhci_hcd:usb6, uhci_hcd:usb7, i915@pci:0000:00:02.0
 12:          4          0          0          0  XT-PIC-XT     i8042
377:    3027113          0          0          0  PCI-MSI-edge  eth0
378:    9168537          0          0          0  PCI-MSI-edge  ahci
NMI:          0          0          0          0  Non-maskable interrupts
LOC:    9832917    9837364    9833540    9842241  Local timer interrupts
RES:    2313942    5729262    5207216    5776735  Rescheduling interrupts
CAL:      24888        884      25272      25155  function call interrupts
TLB:       7990      21120      23055      43247  TLB shootdowns
TRM:          0          0          0          0  Thermal event interrupts
THR:          0          0          0          0  Threshold APIC interrupts
SPU:          0          0          0          0  Spurious interrupts
ERR:          0
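To make the sharing easier to see across reboots, the listing can be reduced
to just the IRQ lines that carry more than one device. The following is a
rough sketch, not a tested tool: shared_irqs is a made-up name, and it
assumes the layout shown above (a CPU header row, then "IRQ: counts type
device, device, ..."). The sample is a subset of the rows above.

```python
def shared_irqs(interrupts_text, min_devices=2):
    """Map each numeric IRQ to its device list when several devices share it."""
    lines = interrupts_text.splitlines()
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        # Split at the first colon only; device names may contain colons too.
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip NMI, LOC, ERR and other summary rows
        # Fields: one count per CPU, the controller type, then the device list.
        fields = rest.split(None, n_cpus + 1)
        if len(fields) < n_cpus + 2:
            continue
        devices = [d.strip() for d in fields[n_cpus + 1].split(",")]
        if len(devices) >= min_devices:
            shared[int(label)] = devices
    return shared


INTERRUPTS_SAMPLE = """\
CPU0 CPU1 CPU2 CPU3
0: 100 0 0 0 XT-PIC-XT timer
9: 60454 0 0 0 XT-PIC-XT acpi, HDA Intel, eth2
10: 129911 0 0 0 XT-PIC-XT pata_marvell, uhci_hcd:usb4, eth1
11: 10278157 0 0 0 XT-PIC-XT sata_sil24, sata_sil24, sata_sil24, ohci1394, ehci_hcd:usb1, ehci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb5, uhci_hcd:usb6, uhci_hcd:usb7, i915@pci:0000:00:02.0
378: 9168537 0 0 0 PCI-MSI-edge ahci
LOC: 9832917 9837364 9833540 9842241 Local timer interrupts
"""

shared = shared_irqs(INTERRUPTS_SAMPLE)
for irq, devices in sorted(shared.items()):
    print(irq, len(devices), ", ".join(devices))
```

On the listing above this reports IRQ 11 with eleven entries, including all
three sata_sil24 controllers, while ahci has its own MSI vector.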
Justin.