From: Jim Duchek <jim.duchek@gmail.com>
To: Mark Knecht <markknecht@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Array 'freezes' for some time after large writes?
Date: Tue, 30 Mar 2010 18:22:19 -0600
Message-ID: <m2hdead81ad1003301722g30c943ebo9e0552faa82c3c3d@mail.gmail.com>
In-Reply-To: <5bdc1c8b1003301650g26d7317cwb6254b4bb64596@mail.gmail.com>
Interesting. You're using the AHCI SATA driver and I'm using
ata_piix, so the driver is probably not the common factor. I'm
beginning to think it might be a hardware issue.

Jim
On 30 March 2010 17:50, Mark Knecht <markknecht@gmail.com> wrote:
> On Tue, Mar 30, 2010 at 3:21 PM, Mark Knecht <markknecht@gmail.com> wrote:
>> I just finished a long compile on my dad's i5-661/DH55HC machine which
>> uses this same WD drive and I didn't spot any sign of this happening
>> there. That's a very recent Intel chipset also and probably more or
>> less the same SATA controller.
>>
>> I'm going to turn on the kernel message into dmesg thing for a while
>> and see if anything pops up.
>>
>> I can set up some additional partitions on my local drive to test
>> other file systems, but since you're on ext4 and I'm on ext3 it's
>> probably not the file system, unless the problem carried forward in
>> the code over time.
>>
>> I like the idea of using dd but I want to be careful about that sort
>> of thing. I've not used dd before, but if I could tell it to write a
>> gigabyte without messing up existing stuff then that could be helpful.
>>
>> Back later,
>> Mark
>>
>> On Tue, Mar 30, 2010 at 1:59 PM, Jim Duchek <jim.duchek@gmail.com> wrote:
>>> I'm using ext4 on everything, but it's hard to judge which ext3 bugs
>>> might affect ext4 as well. I really don't have the ability to
>>> destructively test the array, I need all the data that's on it and I
>>> don't have enough spare space elsewhere to back it all up. You might
>>> see if you can trigger it with dd, writing to the drive directly w/no
>>> filesystem?
>>>
>>> Jim
>>>
>
> <SNIP>
>
> I know this isn't going to survive email very well but you might want
> to look at interrupts. I'm seeing the count on CPU #5 rising much more
> quickly than on the other CPUs, and in my case it's generally CPU #5 that
> stalls out with this 100% wait problem.
>
> I'm looking at another 4-processor machine that's been up for a few
> days. Its interrupt counts are fairly balanced, except for TLB
> Shootdowns, whatever that is.
>
> Wouldn't know how to tell if it's related...
>
> - Mark
>
> Using keyboard-interactive authentication.
> Password:
> Last login: Tue Mar 30 15:59:22 PDT 2010 from 192.168.1.65 on pts/0
> keeper ~ # cat /proc/interrupts
>        CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
>   0:    232      0      0      1      0      0      0      0  IO-APIC-edge timer
>   1:      0      0      0      2      0      0      0      0  IO-APIC-edge i8042
>   3:      0      0      0      2      0      0      0      0  IO-APIC-edge
>   8:      0      0      0     91      0      0      0      0  IO-APIC-edge rtc0
>   9:      0      0      0      0      0      0      0      0  IO-APIC-fasteoi acpi
>  12:      0      0      0      4      0      0      0      0  IO-APIC-edge i8042
>  14:      0      0      0      0      0      0      0      0  IO-APIC-edge ide0
>  15:      0      0      0      0      0      0      0      0  IO-APIC-edge ide1
>  16:      0      0      0      0     82      0      0      0  IO-APIC-fasteoi ahci, uhci_hcd:usb1, nvidia
>  18:      0      0      0      0      0      0      0      0  IO-APIC-fasteoi uhci_hcd:usb6, ehci_hcd:usb7
>  19:      0      0      0      0      0   3137      0      0  IO-APIC-fasteoi ahci, firewire_ohci, uhci_hcd:usb3, uhci_hcd:usb5
>  20:      0      0      0      0      0      0    265      0  IO-APIC-fasteoi eth0
>  21:      0      0      0      0      0      0      0      0  IO-APIC-fasteoi uhci_hcd:usb2
>  22:    154      0      0      0      0      0      0      0  IO-APIC-fasteoi hda_intel
>  23:      0      0      0      0      0      0      0      0  IO-APIC-fasteoi uhci_hcd:usb4, ehci_hcd:usb8
> NMI:      0      0      0      0      0      0      0      0  Non-maskable interrupts
> LOC:   7048   6722   3577   3598   3491   8425   3756   3569  Local timer interrupts
> SPU:      0      0      0      0      0      0      0      0  Spurious interrupts
> PMI:      0      0      0      0      0      0      0      0  Performance monitoring interrupts
> PND:      0      0      0      0      0      0      0      0  Performance pending work
> RES:    335    332    353    259    176    173    251     82  Rescheduling interrupts
> CAL:    242    233    258    180    241    160    260    260  Function call interrupts
> TLB:    232    242    270    235    342    474    537    497  TLB shootdowns
> TRM:      0      0      0      0      0      0      0      0  Thermal event interrupts
> THR:      0      0      0      0      0      0      0      0  Threshold APIC interrupts
> MCE:      0      0      0      0      0      0      0      0  Machine check exceptions
> MCP:      2      2      2      2      2      2      2      2  Machine check polls
> ERR:      7
> MIS:      0
> keeper ~ # date
> Tue Mar 30 16:45:13 PDT 2010
> keeper ~ # cat /proc/interrupts
>        CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
>   0:    232      0      0      9      0      0      0      0  IO-APIC-edge timer
>   1:      0      0      0      2      0      0      0      0  IO-APIC-edge i8042
>   3:      0      0      0      2      0      0      0      0  IO-APIC-edge
>   8:      0      0      0     91      0      0      0      0  IO-APIC-edge rtc0
>   9:      0      0      0      0      0      0      0      0  IO-APIC-fasteoi acpi
>  12:      0      0      0      4      0      0      0      0  IO-APIC-edge i8042
>  14:      0      0      0      0      0      0      0      0  IO-APIC-edge ide0
>  15:      0      0      0      0      0      0      0      0  IO-APIC-edge ide1
>  16:      0      0      0      0   2660      0      0      0  IO-APIC-fasteoi ahci, uhci_hcd:usb1, nvidia
>  18:      0      0      0      0      0      0      0      0  IO-APIC-fasteoi uhci_hcd:usb6, ehci_hcd:usb7
>  19:      0      0      0      0      0  20762      0      0  IO-APIC-fasteoi ahci, firewire_ohci, uhci_hcd:usb3, uhci_hcd:usb5
>  20:      0      0      0      0      0      0   1903      0  IO-APIC-fasteoi eth0
>  21:      0      0      0      0      0      0      0      0  IO-APIC-fasteoi uhci_hcd:usb2
>  22:    154      0      0      0      0      0      0      0  IO-APIC-fasteoi hda_intel
>  23:      0      0      0      0      0      0      0      0  IO-APIC-fasteoi uhci_hcd:usb4, ehci_hcd:usb8
> NMI:      0      0      0      0      0      0      0      0  Non-maskable interrupts
> LOC:  10618  11998   8756   6940   6484  22076   7456   6599  Local timer interrupts
> SPU:      0      0      0      0      0      0      0      0  Spurious interrupts
> PMI:      0      0      0      0      0      0      0      0  Performance monitoring interrupts
> PND:      0      0      0      0      0      0      0      0  Performance pending work
> RES:    335    332    353    259    176    173    251     82  Rescheduling interrupts
> CAL:    242    233    258    180    241    160    260    260  Function call interrupts
> TLB:    232    243    270    236    343    475    538    497  TLB shootdowns
> TRM:      0      0      0      0      0      0      0      0  Thermal event interrupts
> THR:      0      0      0      0      0      0      0      0  Threshold APIC interrupts
> MCE:      0      0      0      0      0      0      0      0  Machine check exceptions
> MCP:     10     10     10     10     10     10     10     10  Machine check polls
> ERR:      7
> MIS:      0
> keeper ~ #
>
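
The comparison Mark describes above, taking two /proc/interrupts
snapshots and looking for a CPU whose counts climb much faster than the
others, can be sketched in Python. This is only a rough sketch under a
few assumptions: the function names parse_interrupts, delta and
busiest_cpu are made up for illustration, the parser expects the
unwrapped layout the kernel prints (one row per line, not the
mail-wrapped form), and the sample rows are just the IRQ 16, 19 and 20
lines taken from the two dumps quoted above.

```python
"""Diff two /proc/interrupts snapshots to see which CPU services which IRQ."""

# Sample rows copied from the two quoted dumps (IRQ 16, 19 and 20 only).
BEFORE = """\
     CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
 16: 0 0 0 0 82 0 0 0 IO-APIC-fasteoi ahci, uhci_hcd:usb1, nvidia
 19: 0 0 0 0 0 3137 0 0 IO-APIC-fasteoi ahci, firewire_ohci, uhci_hcd:usb3, uhci_hcd:usb5
 20: 0 0 0 0 0 0 265 0 IO-APIC-fasteoi eth0
"""

AFTER = """\
     CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
 16: 0 0 0 0 2660 0 0 0 IO-APIC-fasteoi ahci, uhci_hcd:usb1, nvidia
 19: 0 0 0 0 0 20762 0 0 IO-APIC-fasteoi ahci, firewire_ohci, uhci_hcd:usb3, uhci_hcd:usb5
 20: 0 0 0 0 0 0 1903 0 IO-APIC-fasteoi eth0
"""


def parse_interrupts(text):
    """Return {label: [count per CPU]} for one /proc/interrupts snapshot."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "label: counts" row
        counts = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break  # reached the description text
            counts.append(int(field))
        table[label.strip()] = counts
    return table


def delta(before, after):
    """Per-CPU increase for every label present in the later snapshot."""
    changes = {}
    for label, new in after.items():
        old = before.get(label, [0] * len(new))
        changes[label] = [n - o for n, o in zip(new, old)]
    return changes


def busiest_cpu(counts):
    """Return (cpu_index, count) for the largest entry, or None if all zero."""
    if not counts or max(counts) == 0:
        return None
    top = max(counts)
    return counts.index(top), top


if __name__ == "__main__":
    changes = delta(parse_interrupts(BEFORE), parse_interrupts(AFTER))
    for label, counts in changes.items():
        hit = busiest_cpu(counts)
        if hit:
            print(f"IRQ {label}: CPU{hit[0]} +{hit[1]}")
```

On a live system the two strings would instead come from reading
/proc/interrupts twice, some seconds apart. With the sample rows, the
IRQ 19 line (one of the two ahci lines) shows all of its growth on
CPU5, which matches what Mark reports.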