Re: Array 'freezes' for some time after large writes?

From: Mark Knecht <markknecht@gmail.com>
To: Jim Duchek <jim.duchek@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Array 'freezes' for some time after large writes?
Date: Tue, 30 Mar 2010 16:50:53 -0700
Message-ID: <5bdc1c8b1003301650g26d7317cwb6254b4bb64596@mail.gmail.com>
In-Reply-To: <5bdc1c8b1003301521s33c9b227s3a1f7434f78c4bc9@mail.gmail.com>

On Tue, Mar 30, 2010 at 3:21 PM, Mark Knecht <markknecht@gmail.com> wrote:
> I just finished a long compile on my dad's i5-661/DH55HC machine which
> uses this same WD drive and I didn't spot any sign of this happening
> there. That's a very recent Intel chipset also and probably more or
> less the same SATA controller.
>
> I'm going to turn on the kernel message into dmesg thing for a while
> and see if anything pops up.
>
> I can set up some additional partitions on my local drive to test
> other file systems but since you're ext3 and I'm ext3 then it's not
> that unless the problem moved forward with code over time.
>
> I like the idea of using dd but I want to be careful about that sort
> of thing. I've not used dd before, but if I could tell it to write a
> gigabyte without messing up existing stuff then that could be helpful.
>
> Back later,
> Mark
>
> On Tue, Mar 30, 2010 at 1:59 PM, Jim Duchek <jim.duchek@gmail.com> wrote:
>> I'm using ext4 on everything, but it's hard to judge which ext3 bugs
>> might affect ext4 as well.  I really don't have the ability to
>> destructively test the array, I need all the data that's on it and I
>> don't have enough spare space elsewhere to back it all up.  You might
>> see if you can trigger it with dd, writing to the drive directly w/no
>> filesystem?
>>
>> Jim
>>

<SNIP>

I know this isn't going to survive email very well, but you might want
to look at interrupts. I'm seeing the count on CPU #5 rising much more
quickly than on the other CPUs (IRQ 19, which ahci shares, went from
3137 to 20762 between the two snapshots below), and in my case it's
generally CPU #5 that stalls out with this 100% wait problem.
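
Comparing two snapshots by eye is error-prone, so here is a minimal
Python sketch of one way to diff them. It is not something used in this
thread; the function names (parse_interrupts, interrupt_deltas,
sample_live) are made up for illustration, and it assumes the usual
/proc/interrupts layout of a CPU header row followed by "label: counts
description" rows, read straight from the file rather than from
line-wrapped email text.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: [count per CPU]}.

    Rows without one count per CPU (such as ERR and MIS) are skipped.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        counts = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break  # reached the controller/device description
            counts.append(int(field))
        if len(counts) == ncpu and label not in ("ERR", "MIS"):
            table[label] = counts
    return table


def interrupt_deltas(before, after):
    """Return {label: [per-CPU increase]} for rows that changed."""
    deltas = {}
    for label, new in after.items():
        old = before.get(label, [0] * len(new))
        row = [n - o for n, o in zip(new, old)]
        if any(row):
            deltas[label] = row
    return deltas


def sample_live(interval=10.0):
    """Take two snapshots of /proc/interrupts, interval seconds apart."""
    with open("/proc/interrupts") as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open("/proc/interrupts") as f:
        after = parse_interrupts(f.read())
    return interrupt_deltas(before, after)
```

Calling sample_live() while a large write is in flight would show which
IRQ lines, and which CPUs, are taking the interrupts during a stall. It
says nothing on its own about whether the interrupts cause the stall.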

I'm looking at another 4-processor machine that's been up for a few
days. Its interrupt counts are fairly balanced, except for TLB
shootdowns, whatever those are.

Wouldn't know how to tell if it's related...

- Mark

Using keyboard-interactive authentication.
Password:
Last login: Tue Mar 30 15:59:22 PDT 2010 from 192.168.1.65 on pts/0
keeper ~ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:        232          0          0          1          0          0          0          0   IO-APIC-edge      timer
  1:          0          0          0          2          0          0          0          0   IO-APIC-edge      i8042
  3:          0          0          0          2          0          0          0          0   IO-APIC-edge
  8:          0          0          0         91          0          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          0          0          4          0          0          0          0   IO-APIC-edge      i8042
 14:          0          0          0          0          0          0          0          0   IO-APIC-edge      ide0
 15:          0          0          0          0          0          0          0          0   IO-APIC-edge      ide1
 16:          0          0          0          0         82          0          0          0   IO-APIC-fasteoi   ahci, uhci_hcd:usb1, nvidia
 18:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb6, ehci_hcd:usb7
 19:          0          0          0          0          0       3137          0          0   IO-APIC-fasteoi   ahci, firewire_ohci, uhci_hcd:usb3, uhci_hcd:usb5
 20:          0          0          0          0          0          0        265          0   IO-APIC-fasteoi   eth0
 21:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb2
 22:        154          0          0          0          0          0          0          0   IO-APIC-fasteoi   hda_intel
 23:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb4, ehci_hcd:usb8
NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:       7048       6722       3577       3598       3491       8425       3756       3569   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
PND:          0          0          0          0          0          0          0          0   Performance pending work
RES:        335        332        353        259        176        173        251         82   Rescheduling interrupts
CAL:        242        233        258        180        241        160        260        260   Function call interrupts
TLB:        232        242        270        235        342        474        537        497   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:          2          2          2          2          2          2          2          2   Machine check polls
ERR:          7
MIS:          0
keeper ~ # date
Tue Mar 30 16:45:13 PDT 2010
keeper ~ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:        232          0          0          9          0          0          0          0   IO-APIC-edge      timer
  1:          0          0          0          2          0          0          0          0   IO-APIC-edge      i8042
  3:          0          0          0          2          0          0          0          0   IO-APIC-edge
  8:          0          0          0         91          0          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          0          0          4          0          0          0          0   IO-APIC-edge      i8042
 14:          0          0          0          0          0          0          0          0   IO-APIC-edge      ide0
 15:          0          0          0          0          0          0          0          0   IO-APIC-edge      ide1
 16:          0          0          0          0       2660          0          0          0   IO-APIC-fasteoi   ahci, uhci_hcd:usb1, nvidia
 18:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb6, ehci_hcd:usb7
 19:          0          0          0          0          0      20762          0          0   IO-APIC-fasteoi   ahci, firewire_ohci, uhci_hcd:usb3, uhci_hcd:usb5
 20:          0          0          0          0          0          0       1903          0   IO-APIC-fasteoi   eth0
 21:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb2
 22:        154          0          0          0          0          0          0          0   IO-APIC-fasteoi   hda_intel
 23:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb4, ehci_hcd:usb8
NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:      10618      11998       8756       6940       6484      22076       7456       6599   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
PND:          0          0          0          0          0          0          0          0   Performance pending work
RES:        335        332        353        259        176        173        251         82   Rescheduling interrupts
CAL:        242        233        258        180        241        160        260        260   Function call interrupts
TLB:        232        243        270        236        343        475        538        497   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:         10         10         10         10         10         10         10         10   Machine check polls
ERR:          7
MIS:          0
keeper ~ #
