[patch 0/7] fault-injection capabilities (v5)

public inbox for linux-kernel@vger.kernel.org

* [patch 0/7] fault-injection capabilities (v5)
@ 2006-10-12  7:43 Akinobu Mita
  2006-10-12 21:26 ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Akinobu Mita @ 2006-10-12  7:43 UTC
  To: linux-kernel; +Cc: ak, akpm, Don Mullis

Fault-injection capabilities patch set, version 5.
Please see the mail for patch 1/7 for details.

Changes from v3 to v5 (v4 was not delivered to the linux-kernel list):

- call dump_stack() when injecting a failure (if the verbose option is
  enabled)
- updated to the latest kernel version
- add debugfs entries automatically at boot time
  (fault-inject-debugfs.ko has been removed)
- various bugfixes and cleanups



* Re: [patch 0/7] fault-injection capabilities (v5)
  2006-10-12  7:43 [patch 0/7] fault-injection capabilities (v5) Akinobu Mita
@ 2006-10-12 21:26 ` Andrew Morton
  2006-10-13 17:46   ` Akinobu Mita
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2006-10-12 21:26 UTC
  To: Akinobu Mita; +Cc: linux-kernel, ak, Don Mullis

On Thu, 12 Oct 2006 16:43:05 +0900
Akinobu Mita <akinobu.mita@gmail.com> wrote:

> Fault-injection capabilities patch set version 5.

It all looks quite nice, thanks.  Couple of things...

You've presumably run a kernel with these various things enabled.  What
happens?  Does the kernel run really slowly?  Does userspace collapse in a
heap?  Does it oops and die?

Also, one place where this infrastructure could be of benefit is in device
drivers: simulate a bad sector on the disk, a pulled cable, a timeout
reading from a status register, etc.  If that works well and is useful then
I can see us encouraging driver developers to wire up fault-injection in
the major drivers.

Hence it would be useful at some stage to go in and to actually do all this
for a particular driver.  As an example implementation for others to
emulate and as a test for the fault-injection infrastructure itself - we
may discover that new capabilities are needed as this work is done.

I wouldn't say this is an urgent thing to be doing, but it is a logical
next step..


* Re: [patch 0/7] fault-injection capabilities (v5)
  2006-10-12 21:26 ` Andrew Morton
@ 2006-10-13 17:46   ` Akinobu Mita
  2006-10-13 19:00     ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Akinobu Mita @ 2006-10-13 17:46 UTC
  To: Andrew Morton; +Cc: linux-kernel, ak, Don Mullis

On Thu, Oct 12, 2006 at 02:26:25PM -0700, Andrew Morton wrote:

> You've presumably run a kernel with these various things enabled.  What
> happens?  Does the kernel run really slowly?  Does userspace collapse in a
> heap?  Does it oops and die?

I don't notice much slowdown with STACKTRACE and FRAME_POINTER when the
stacktrace filter is enabled. With STACK_UNWIND enabled, however, I see
high latency in X. (The stacktrace filter has two implementations:
[1] using STACKTRACE with FRAME_POINTER, and [2] using STACK_UNWIND.)

I don't know why there is such a difference between plain STACKTRACE and
STACK_UNWIND. I'm about to try using an rb tree rather than a linked list in
the unwind code.

To avoid breaking other userspace programs and to inject failures only
into specific code or a specific process, a process filter and a
stacktrace filter are available. Without them the system would be
almost unusable.
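
To illustrate the two filters mentioned above, here is a minimal userspace
sketch in C. It is not code from the patch set: struct fault_cfg,
should_inject() and stack_hits_range() are hypothetical names, and the call
stack is modeled as a plain array of return addresses. It only shows the
idea that a failure is injected when the calling process matches and some
stack frame falls inside a chosen address range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical settings for one injection point (not the patch's struct). */
struct fault_cfg {
	unsigned interval;    /* inject on every Nth eligible call; 0 = never */
	int only_pid;         /* process filter: -1 means any process */
	uintptr_t text_start; /* stacktrace filter: some frame must lie */
	uintptr_t text_end;   /* in [text_start, text_end) */
	unsigned long calls;  /* eligible calls seen so far */
};

/* True if any recorded return address lies inside [start, end). */
static bool stack_hits_range(const uintptr_t *frames, size_t n,
			     uintptr_t start, uintptr_t end)
{
	for (size_t i = 0; i < n; i++)
		if (frames[i] >= start && frames[i] < end)
			return true;
	return false;
}

/* Decide whether this call should fail artificially. */
static bool should_inject(struct fault_cfg *cfg, int pid,
			  const uintptr_t *frames, size_t nframes)
{
	assert(cfg != NULL);

	/* Process filter: leave every other process alone. */
	if (cfg->only_pid != -1 && pid != cfg->only_pid)
		return false;

	/* Stacktrace filter: only calls coming from the target code. */
	if (!stack_hits_range(frames, nframes, cfg->text_start, cfg->text_end))
		return false;

	/* Only calls that passed both filters count toward the interval. */
	cfg->calls++;
	return cfg->interval != 0 && cfg->calls % cfg->interval == 0;
}
```

With only_pid set to a single test process and the address range set to one
module's text, every other caller is left untouched, which is what keeps
the rest of the system usable while failures are being injected.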

Right now I'm stuck on the script in fault-injection.txt, run against a
random set of 700 modules. The script simply tries to load and unload every
available kernel module. It usually hits several oopses or CPU soft
lockups. A relatively large number of them seem to involve the driver
model (drivers/base/*). (I hope the many recent error-handling fixes,
especially those by Jeff Garzik, take care of them.)

> Also, one place where this infrastructure could be of benefit is in device
> drivers: simulate a bad sector on the disk, a pulled cable, a timeout
> reading from a status register, etc.  If that works well and is useful then
> I can see us encouraging driver developers to wire up fault-injection in
> the major drivers.
> 
> Hence it would be useful at some stage to go in and to actually do all this
> for a particular driver.  As an example implementation for others to
> emulate and as a test for the fault-injection infrastructure itself - we
> may discover that new capabilities are needed as this work is done.
> 
> I wouldn't say this is an urgent thing to be doing, but it is a logical
> next step..

Yes. I'm studying the md/faulty and scsi-debug modules to learn what they
do and how to integrate that kind of feature in a general form.


* Re: [patch 0/7] fault-injection capabilities (v5)
  2006-10-13 17:46   ` Akinobu Mita
@ 2006-10-13 19:00     ` Andrew Morton
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Morton @ 2006-10-13 19:00 UTC
  To: Akinobu Mita, Jan Beulich, Ingo Molnar; +Cc: linux-kernel, ak, Don Mullis

On Sat, 14 Oct 2006 02:46:24 +0900
Akinobu Mita <akinobu.mita@gmail.com> wrote:

> On Thu, Oct 12, 2006 at 02:26:25PM -0700, Andrew Morton wrote:
> 
> > You've presumably run a kernel with these various things enabled.  What
> > happens?  Does the kernel run really slowly?  Does userspace collapse in a
> > heap?  Does it oops and die?
> 
> I don't notice much slowdown with STACKTRACE and FRAME_POINTER when the
> stacktrace filter is enabled. With STACK_UNWIND enabled, however, I see
> high latency in X. (The stacktrace filter has two implementations:
> [1] using STACKTRACE with FRAME_POINTER, and [2] using STACK_UNWIND.)
> 
> I don't know why there is such a difference between plain STACKTRACE and
> STACK_UNWIND. I'm about to try using an rb tree rather than a linked list in
> the unwind code.

umm, we've hit this before, recently - iirc it was making lockdep run
really slowly.

The new unwinding code is apparently really inefficient in some situations.
It wasn't expected that it would be called at a high frequency, except people
_do_ want to do that.

I forget the details, but I can cc people who have better memory.

> To avoid breaking other userspace programs and to inject failures only
> into specific code or a specific process, a process filter and a
> stacktrace filter are available. Without them the system would be
> almost unusable.
> 
> Right now I'm stuck on the script in fault-injection.txt, run against a
> random set of 700 modules. The script simply tries to load and unload every
> available kernel module. It usually hits several oopses or CPU soft
> lockups. A relatively large number of them seem to involve the driver
> model (drivers/base/*).

I'm shocked ;)

> (I hope the many recent error-handling fixes,
> especially those by Jeff Garzik, take care of them.)

We're getting there, but there's a lot more to do.

> > Also, one place where this infrastructure could be of benefit is in device
> > drivers: simulate a bad sector on the disk, a pulled cable, a timeout
> > reading from a status register, etc.  If that works well and is useful then
> > I can see us encouraging driver developers to wire up fault-injection in
> > the major drivers.
> > 
> > Hence it would be useful at some stage to go in and to actually do all this
> > for a particular driver.  As an example implementation for others to
> > emulate and as a test for the fault-injection infrastructure itself - we
> > may discover that new capabilities are needed as this work is done.
> > 
> > I wouldn't say this is an urgent thing to be doing, but it is a logical
> > next step..
> 
> Yes. I'm studying the md/faulty and scsi-debug modules to learn what they
> do and how to integrate that kind of feature in a general form.

Neato, thanks.  Please don't slow down!

It's the GFP_ATOMIC allocations which we care about, really.  Things like
GFP_KERNEL actually don't fail in the current implementation (unless the
caller got oom-killed), and all this check-for-allocation-failures work
we're doing is mainly for completeness, and in anticipation of changes in
the page allocator strategy.
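
As an illustration of the allocation-failure checks discussed above, here is
a minimal userspace sketch in C. It is an assumption-laden model, not kernel
code: test_alloc(), test_free(), fail_nth and struct pair are hypothetical
names, and malloc() stands in for the kernel allocator. It shows the kind of
error path that normally never runs and that an injected failure exercises.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical test hook: fail the Nth allocation (1-based); 0 disables. */
static unsigned fail_nth;
static unsigned alloc_count;
static int live_allocs; /* successful allocations not yet freed */

static void *test_alloc(size_t size)
{
	void *p;

	alloc_count++;
	if (fail_nth != 0 && alloc_count == fail_nth)
		return NULL; /* injected failure */

	p = malloc(size);
	if (p)
		live_allocs++;
	return p;
}

static void test_free(void *p)
{
	if (!p)
		return;
	assert(live_allocs > 0);
	live_allocs--;
	free(p);
}

struct pair {
	char *a;
	char *b;
};

/* Returns 0 on success, -1 on allocation failure with nothing leaked. */
static int pair_init(struct pair *p)
{
	p->a = test_alloc(64);
	if (!p->a)
		return -1;

	p->b = test_alloc(64);
	if (!p->b) {
		/* The rarely exercised path: undo the first allocation. */
		test_free(p->a);
		p->a = NULL;
		return -1;
	}
	return 0;
}

static void pair_destroy(struct pair *p)
{
	test_free(p->a);
	test_free(p->b);
	p->a = p->b = NULL;
}
```

Setting fail_nth to 2 forces the second allocation in pair_init() to fail;
checking that live_allocs returns to zero afterwards confirms that the
cleanup path does not leak the first buffer.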



* Re: [patch 0/7] fault-injection capabilities (v5)
@ 2006-10-14 10:52 Jan Beulich
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2006-10-14 10:52 UTC
  To: mingo, akinobu.mita, akpm; +Cc: dwm, ak, linux-kernel

>> I don't notice much slowdown with STACKTRACE and FRAME_POINTER when the
>> stacktrace filter is enabled. With STACK_UNWIND enabled, however, I see
>> high latency in X. (The stacktrace filter has two implementations:
>> [1] using STACKTRACE with FRAME_POINTER, and [2] using STACK_UNWIND.)
>> 
>> I don't know why there is such a difference between plain STACKTRACE and
>> STACK_UNWIND. I'm about to try using an rb tree rather than a linked list in
>> the unwind code.
>
>umm, we've hit this before, recently - iirc it was making lockdep run
>really slowly.
>
>The new unwinding code is apparently really inefficient in some situations.
>It wasn't expected that it would be called at a high frequency, except people
>_do_ want to do that.
>
>I forget the details, but I can cc people who have better memory.

The problem is that there's currently nothing to allow a binary search through
the unwind descriptors. The easy way to add that is closed, as binutils
doesn't work here due to two independent limitations. Hence I'm going to add a
two-phase initialization, the second phase of which will allocate and
initialize a helper table equivalent to a linker-generated .eh_frame_hdr
section. I'm not certain at this point whether we'll need this for modules too.

Jan
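
The lookup-table idea described in this message can be sketched in plain C
as follows. This is a userspace illustration under stated assumptions, not
the unwinder's actual code: struct unwind_entry, unwind_table_init() and
unwind_lookup() are hypothetical names, and the address ranges are assumed
not to overlap. The table is sorted once at initialization, after which
each lookup bisects instead of walking a list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor: one entry per region with unwind data. */
struct unwind_entry {
	uintptr_t start; /* first address covered */
	uintptr_t end;   /* one past the last address covered */
	const void *fde; /* opaque pointer to the unwind data */
};

/* Order entries by start address. */
static int cmp_entry(const void *a, const void *b)
{
	const struct unwind_entry *x = a;
	const struct unwind_entry *y = b;

	return (x->start > y->start) - (x->start < y->start);
}

/* Second-phase init: sort once so later lookups can bisect. */
static void unwind_table_init(struct unwind_entry *tbl, size_t n)
{
	assert(tbl != NULL || n == 0);
	qsort(tbl, n, sizeof(*tbl), cmp_entry);
}

/*
 * Return the entry whose [start, end) range contains pc, or NULL.
 * Requires a table sorted by unwind_table_init() with disjoint ranges.
 */
static const struct unwind_entry *
unwind_lookup(const struct unwind_entry *tbl, size_t n, uintptr_t pc)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pc < tbl[mid].start)
			hi = mid;
		else if (pc >= tbl[mid].end)
			lo = mid + 1;
		else
			return &tbl[mid];
	}
	return NULL;
}
```

Each lookup then costs O(log n) comparisons rather than a linear scan,
which matters when the unwinder is called at high frequency, as with
lockdep or the stacktrace filter.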

