From: "Frantisek Rysanek" <Frantisek.Rysanek@post.cz>
To: linux-kernel@vger.kernel.org
Subject: [newbie:] Bonnie++2 hangs recent 2.6 kernels? Bash keeps looping in waitpid(), eating 100% CPU
Date: Thu, 13 Sep 2007 16:46:09 +0200
Message-ID: <46E96951.10344.29AE6546@localhost>
Dear everyone,
apologies in advance for a silly question...
I'm using a homebrew stripped-down mini-distro based on Fedora 5,
with various newer kernels, on a live CD, to test hardware with.
The live CD is assembled by a scripted binary copy of the key
necessary components (libc, init, bash, /dev/, /etc/, you know the
rest...) - it's almost like rolling your own MS-DOS boot floppy.
A minimum system is about 4-10 MB, a neat firewall takes up
about 22 MB.
Recently I've stumbled over what seems like a lasting bug
in the Linux kernel. Excuse me for that accusation, which is
admittedly based on rather vague data, dated versions
of the user-space software (libc, bash...), and a homebrew
hacky distro.
First impression:
looped execution of Bonnie++2 makes bash go berserk.
There are two possible flawed behaviors:
1) the bash process that's waiting for Bonnie++2 to return
starts looping, I believe inside the last waitpid() call,
eating 100% CPU.
At least that's what 'top' + 'strace -p <bash PID>' would
suggest. The top and strace have to be running beforehand,
as the same happens to the bash process on any other virtual
console, if you try to run any further command. (The further
command doesn't seem to get executed anymore.)
2) the bash processes don't start eating 100% CPU, but any further
command that you try to execute returns immediately with
a segfault.
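
As a rough cross-check for symptom 1 that does not depend on attaching
strace, the CPU time a process has consumed can be sampled from
/proc/<PID>/stat. The sketch below is only an illustration: the helper
name cpu_ticks is hypothetical, and it assumes the stat field layout
described in proc(5), where utime and stime are fields 14 and 15.

```shell
#!/bin/sh
# cpu_ticks (hypothetical helper): read one /proc/<PID>/stat line on stdin
# and print utime + stime, in clock ticks.
cpu_ticks() {
    # Drop everything through the last ") " so that a command name
    # containing spaces cannot shift the field positions. After that,
    # the original fields 14 (utime) and 15 (stime) become $12 and $13.
    sed 's/.*) //' | awk '{ print $12 + $13 }'
}

# Usage on a live system (PID is the bash process under suspicion):
#   cpu_ticks < /proc/$PID/stat; sleep 2; cpu_ticks < /proc/$PID/stat
```

If the two samples differ by roughly two seconds' worth of ticks, the
process was on a CPU for nearly the whole interval, which would be
consistent with the 100% figure shown by top.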
I boot the CD with just bare bash on all 6 virtual consoles.
I mount a previously created EXT3 FS (several hundred GB
to over 1 TB) on a mountpoint, `cd` into the mountpoint
on one or two consoles, and run
    while true; do bonnie++2 -u root -s 4096; done
Then I run 'iostat 2', 'top' and 'strace -p <bash PID>' on the
remaining consoles. I try running some other command now and then, to
make the paging and block IO subsystems load some more blocks from
the CD.
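
A variant of the loop above that records the exit status of each run
might make it easier to see when things start to go wrong. This is a
sketch, not what was actually run: repeat_logged is a hypothetical
helper, and the iteration cap is added only so the loop terminates.

```shell
#!/bin/sh
# repeat_logged MAX CMD [ARGS...] (hypothetical helper):
# run CMD up to MAX times, print the exit status of every run,
# and stop at the first run that does not exit with status 0.
repeat_logged() {
    max=$1
    shift
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        status=0
        "$@" || status=$?
        echo "iteration=$i status=$status"
        # A process killed by a signal is reported by the shell as
        # 128 + signal number, so a segfault (SIGSEGV, 11) shows as 139.
        [ "$status" -eq 0 ] || return "$status"
    done
}

# Usage with the command line from this report:
#   repeat_logged 1000 bonnie++2 -u root -s 4096
```

Redirecting the output to a file on a different disk than the one under
test would keep the log out of the benchmarked filesystem.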
I believe the `top` output suggests that the Bonnie processes don't
eat all that much RAM, but the kernel-space buffering eats almost all
of it. Only about 50 Megs remain truly "free", most of the RAM gets
"cached". The system stabilizes at this balance, and a few minutes
later it hangs in the aforementioned way.
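
The free-versus-cached balance described above can also be logged
directly from /proc/meminfo rather than read off the top screen. A
minimal sketch, assuming the usual "Name: value kB" layout of that file;
meminfo_summary is a hypothetical helper name.

```shell
#!/bin/sh
# meminfo_summary (hypothetical helper): read /proc/meminfo-style text on
# stdin and print total, free and page-cache memory in whole megabytes.
meminfo_summary() {
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "MemFree:"  { free = $2 }
        $1 == "Cached:"   { cached = $2 }
        END {
            # Values in /proc/meminfo are given in kB.
            printf "total=%dMB free=%dMB cached=%dMB\n",
                   total / 1024, free / 1024, cached / 1024
        }
    '
}

# Usage on a live system, one sample every 2 seconds:
#   while sleep 2; do meminfo_summary < /proc/meminfo; done
```

Matching on the exact first field keeps "SwapCached:" from being counted
as "Cached:".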
This happens without swap. If I mkswap+swapon some free hard drive,
the symptoms seem somewhat more difficult to reproduce, but they do
occur after a somewhat longer period of time.
The symptoms are fairly easily reproduced on 2.6.16.18 through
2.6.16.48, as well as 2.6.18.8. On 2.6.22.6 it seems to take a bit
more time to reproduce the problem.
I've reproduced the problem on three different dual Xeon
boxes, all of them SuperMicro of different sizes/generations,
all of them upgraded to the latest BIOS (now showing no more
IRQ routing mischiefs).
The hardware setups are along the lines of
- Intel 7501 chipset, dual Xeon Northwood, 1 GB RAM,
Adaptec 79xx HBA, external RAID (~80 MBps),
internal Adaptec 2120 RAID (~50 MBps)
- Intel 7520 chipset, dual Xeon Irwindale, 2 GB RAM,
several internal U320 SCSI drives via Adaptec 79xx HBA,
an external RAID (~80 MBps) via LSI 20320 HBA (Fusion MPT)
- Intel 7520 chipset, dual Xeon Nocona, 1 GB RAM,
internal LSI MegaRAID SATA150-6 with 6 disk drives.
I had never seen this before I started using bonnie++2 as a load
generator :-) Both my hardware systems and my Linux CD are otherwise
perfectly stable under sequential IO, cpuburn, older versions of
Bonnie on Linux 2.4 / FreeBSD etc.
I know what it looks like when there's a hardware problem, and I know
how to prove or rule out a hardware problem by selective A/B-style
hardware replacements. I'm fairly good at isolating hardware instability.
Should I start by compiling a fresh libc + bash + whatever else?
Any ideas are welcome :-)
Frank Rysanek