From: jim owens <jowens@hp.com>
To: lrhorer@satx.rr.com
Cc: 'Linux RAID' <linux-raid@vger.kernel.org>
Subject: Re: RAID halting
Date: Fri, 10 Apr 2009 08:50:05 -0400
Message-ID: <49DF407D.1080900@hp.com>
In-Reply-To: <20090410045146.TNPB12747.cdptpa-omta01.mail.rr.com@Leslie>
Leslie Rhorer wrote:
>>> for f in /sys/block/*/queue/scheduler; do
>>> echo noop > $f
>>> echo $f "$(cat $f)"
>>> done
>> OK, I did this. Two questions:
>
> It doesn't seem to have helped or hindered. I still get halts, but under
> moderate loads not every time.
>
>>> Leslie: I still think finding out what the kernel is doing during the
>>> stall would be a HUGE hint to the problem. Did you look into oprofile or
>>> ftrace?
>> I couldn't find a Debian source for ftrace, but I did download oprofile.
>
> Something very disturbing is happening now, however. Just a few minutes
> after loading oprofile, the system did a sudden total shutdown. The file
> systems were all left dirty, and power was suddenly cut to the main chassis.
> This has never happened before. I rebooted the system, and the file systems
> replayed their journals. Some data was lost, of course, but nothing
> serious. A few hours later, the exact same thing happened again: A sudden
> shut-down. Nothing like this has ever happened before. Of course the
> system can issue a power shutdown from software, but it is supposed to clean
> up the file systems first, and it's not supposed to just do it autonomously.
There are some problems with oprofile on recent kernels and
various hardware platforms. From the discussions I have seen,
the cause appears to be a conflict between the platform interrupt
handlers that manage things like power events and the CPU
performance counter non-maskable interrupts (NMIs) triggered
by oprofile. The result is that the system goes boom.
This was not reported on your platform/distro, but what
is happening to you sounds like the same problem.
Two approaches have been tried to work around this:
1) Disable those platform management drivers.
2) Run oprofile using the kernel clock (1000 Hz) to collect
events instead of the hardware counters.
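A minimal sketch of workaround 2, assuming the legacy oprofile kernel module and the `opcontrol` tools of that era (check what your distro actually ships). Loading the module with `timer=1` forces timer-interrupt mode, so samples come from the kernel tick (whatever HZ the kernel was built with) rather than from performance-counter NMIs. The function names `oprofile_mode` and `oprofile_use_timer` are hypothetical helpers, not part of oprofile.

```shell
#!/bin/sh
# Sketch only: assumes oprofile's kernel module, oprofilefs mounted at
# /dev/oprofile, and the opcontrol front end. Run the switch as root.

# Report which sampling mode oprofile is in. In timer mode the
# oprofilefs cpu_type file reads "timer"; otherwise it names the CPU.
oprofile_mode() {
    cpu_type_file=${1:-/dev/oprofile/cpu_type}
    if [ ! -r "$cpu_type_file" ]; then
        echo "unloaded"      # module not loaded or oprofilefs not mounted
    elif [ "$(cat "$cpu_type_file")" = "timer" ]; then
        echo "timer"         # sampling from the kernel timer tick
    else
        echo "hwcounters"    # sampling from performance counter NMIs
    fi
}

# Reload oprofile so it samples from the timer instead of the counters.
oprofile_use_timer() {
    opcontrol --shutdown         # stop the oprofile daemon if running
    opcontrol --deinit           # unload the module, unmount oprofilefs
    modprobe oprofile timer=1    # force timer-interrupt mode
    opcontrol --init             # mount oprofilefs (module already loaded)
    opcontrol --setup --no-vmlinux
    opcontrol --start
}
```

As root, source the script and run `oprofile_use_timer`; `oprofile_mode` should then print `timer`. Nothing is executed just by sourcing it, so the checker can be tried safely first.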
Since the cause of this problem was identified only very
recently (and I was not really paying attention), I don't
know how successful either workaround is or when fixes might
be available.
jim