From: Dave Haywood <tla@oak.selfip.net>
To: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
David Miller <davem@davemloft.net>,
mpm@selenic.com, rjw@sisk.pl, linux-kernel@vger.kernel.org,
akpm@linux-foundation.org, torvalds@linux-foundation.org,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [bug] SLOB crash, 2.6.24-rc2
Date: Thu, 15 Nov 2007 12:18:08 +0000 [thread overview]
Message-ID: <473C3900.8010508@oak.selfip.net> (raw)
In-Reply-To: <20071115112820.GA18228@elte.hu>
Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
>
>> On Thursday 15 November 2007 21:43, Ingo Molnar wrote:
>>
>>> * David Miller <davem@davemloft.net> wrote:
>>>
>>>> From: Matt Mackall <mpm@selenic.com>
>>>> Date: Wed, 14 Nov 2007 17:37:13 -0600
>>>>
>>>>
>>>>> No, the usual strategy for debugging problems -outside- SLOB is to
>>>>> switch to another allocator with more extensive debugging facilities.
>>>>>
>>>> Ok, so the thing we still can do is do a dump_stack() at the list
>>>> debugging assertion trigger points.
>>>>
>>> ok, i'll first try to trigger it again.
>>>
>> I had implemented SLOB in userspace, so I resynched and think I found
>> your problem. Sorry for the attachment format -- this mailer isn't the
>> best. I'm really computer illiterate when it comes to userspace...
>>
>
> thx, i'll try your fix in a minute.
>
>
>> Anyway, I'm really happy to see you're testing and using SLOB upstream
>> :) Is there any particular reason that you're using it?
>>
>
> i sometimes test SLOB for -rt, but this time it's the result of my
> "automated random QA" effort, as part of arch/x86 maintainance/QA.
>
> the main trick is to build and booting random "make randconfig"
> bzImages. That finds build bugs and a good deal of boot hang and crash
> bugs as well. (it also found a compiler bug already) I can build and
> boot about 1000 random kernels in 24 hours, and it's all fully
> automated. I usually run it overnight - when a kernel does not come up
> due to a bootup hang or crash (or the kernel log signals any exception
> condition) then the script stops and i can fix it in the morning.
>
> The first step towards this was to get allyesconfig bzImage kernels to
> build and boot fine. That effort took months (we had many problems in
> this area) - i think you saw bugreports and fixes from me about that on
> lkml.
>
> Once that worked reasonably well i made a small Kconfig patch that
> forcibly selects a "minimum set" of drivers and kernel subsystems that
> are needed to boot up a testsystem. Once a "make allnoconfig" and a
> "make allyesconfig" bzImage kernel boots up fine on the testbox all
> randconfig configs "inbetween" are supposed to build and boot fine as
> well.
>
> I also have a patch that adds all the x86 boot options like nosmp,
> maxcpus=1, nohz=off, hpet=disable to be selectable as .config options -
> so those boot options are randomized as well.
>
> I also have a small patch that disables half a dozen drivers/features
> that are not expected to work out of box in a bzImage kernel. (such as
> ISA drivers that assume the presence of hardware, or root filesystem
> features such as NFSROOT)
>
> the resulting make randconfig kernel still has 99% of the degrees of
> freedom that a stock make randconfig kernel has, so by all practical
> purposes it's a fully random kernel - it just happens to boot on my
> testsystem all the time.
>
> A successful bootup means the test system is able to boot up into a
> stock Fedora 8 userspace and is able to bring up its network interfaces
> and ssh out (automatically) to the build box to signal the completion of
> a successful test cycle. The logs are also analyzed for lockdep
> assertions (if lockdep is enabled - which it is in about 20% of the
> randconfig kernels) and other kernel bugs.
>
> (just in case you were wondering about one of the reasons why the
> arch/x86 unification merge went so smoothly, with nary a regression ;-)
> Thomas is doing other types of automated QA of the x86 queue as well.)
>
> this method found the SG-list corruption bugs the following night after
> Linus committed Jen's SG-list changes, so it's pretty good at finding
> regressions as early as possible.
>
> Ingo
>
How complete is the QA testing? I was reading this interesting thread
and it occurred to me that this sounds like a useful distributed
computing application. ie a central server with all valid Kconfig
combinations (how many are there?) for a particular release (-rc or
otherwise) across all architectures. These are allocated to clients on
request to be built / booted etc. Any errors are fed back to the
central server. I guess this would be a useful resource for
developers. More importantly (and I don't know if this is the case
already!) a new Linux release (2.6.x) could be "certified" with some
level of testing on known hardware / architectures.
tbh, I feel sorry for Ingo's machine compiling 1000 random kernels in
24h! I'm surprised it hasn't called the Samaritans...
Dave.
prev parent reply other threads:[~2007-11-15 12:40 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-11-11 19:58 2.6.24-rc2: Reported regressions from 2.6.23 (updated) Rafael J. Wysocki
2007-11-11 20:09 ` Alan Cox
2007-11-11 20:34 ` Rafael J. Wysocki
2007-11-11 22:22 ` Bartlomiej Zolnierkiewicz
2007-11-11 22:46 ` Alan Cox
2007-11-13 1:11 ` Andrew Morton
2007-11-13 14:09 ` Thomas Lindroth
[not found] ` <3d08dbff0711130534k702f66ebj1f8e91d107eff2a1@mail.gmail.com>
2007-11-13 19:52 ` Andrew Morton
2007-11-11 20:30 ` Ingo Molnar
2007-11-11 20:33 ` Francois Romieu
2007-11-14 11:20 ` [bug] SLOB crash, 2.6.24-rc2 Ingo Molnar
2007-11-14 17:36 ` Matt Mackall
2007-11-14 18:39 ` Matt Mackall
2007-11-14 19:05 ` Ingo Molnar
2007-11-14 19:42 ` Matt Mackall
2007-11-14 22:39 ` David Miller
2007-11-14 22:53 ` Matt Mackall
2007-11-14 23:10 ` David Miller
2007-11-14 23:37 ` Matt Mackall
2007-11-14 23:41 ` David Miller
2007-11-15 0:09 ` Matt Mackall
2007-11-15 10:43 ` Ingo Molnar
2007-11-15 10:51 ` David Miller
2007-11-15 11:03 ` Ingo Molnar
2007-11-15 11:05 ` David Miller
2007-11-15 10:57 ` Nick Piggin
2007-11-15 11:28 ` Ingo Molnar
2007-11-15 11:32 ` [patch] slob: fix memory corruption Ingo Molnar
2007-11-15 12:48 ` Ingo Molnar
2007-11-15 20:25 ` Nick Piggin
2007-11-15 16:00 ` Matt Mackall
2007-11-15 11:39 ` [bug] SLOB crash, 2.6.24-rc2 Nick Piggin
2007-11-15 12:18 ` Dave Haywood [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=473C3900.8010508@oak.selfip.net \
--to=tla@oak.selfip.net \
--cc=akpm@linux-foundation.org \
--cc=davem@davemloft.net \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=mpm@selenic.com \
--cc=nickpiggin@yahoo.com.au \
--cc=rjw@sisk.pl \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox