From: Con Kolivas <conman@kolivas.net>
To: linux kernel mailing list <linux-kernel@vger.kernel.org>
Cc: Andrew Morton <akpm@digeo.com>
Subject: Re: Pathological case identified from contest
Date: Thu, 17 Oct 2002 17:16:46 +1000
Message-ID: <1034839006.3dae63de3f69a@kolivas.net>
In-Reply-To: <1034828795.3dae3bfb42911@kolivas.net>
Quoting Con Kolivas <conman@kolivas.net>:
> Quoting Andrew Morton <akpm@digeo.com>:
>
> > Con Kolivas wrote:
> > >
> > > I found a pathological case in 2.5 while running contest with process_load
> > > recently after checking the results, which showed a bad result for 2.5.43-mm1:
> > >
> > > 2.5.43-mm1 101.38 72% 42 31%
> > > 2.5.43-mm1 102.90 75% 34 28%
> > > 2.5.43-mm1 504.12 14% 603 85%
> > > 2.5.43-mm1 96.73 77% 34 26%
> > >
> > > This was very strange, so I looked into it further.
> > >
> > > The default for process_load is this command:
> > >
> > > process_load --processes $nproc --recordsize 8192 --injections 2
> > >
> > > where $nproc=4*num_cpus
> > >
> > > When I changed recordsize to 16384, many of the 2.5 kernels started
> > > exhibiting the same behaviour. While the machine was apparently still alive
> > > and would respond to my request to abort, the kernel compile would all but
> > > stop while process_load just continued, without allowing anything to happen
> > > from kernel compilation for up to 5 minutes at a time. This doesn't happen
> > > with any 2.4 kernels.
> > >
> >
> > Well it doesn't happen on my test machine (UP or SMP). I tried
> > various recordsizes. It's probably related to HZ, memory bandwidth
> > and the precise timing at which things happen.
> >
> > The test describes itself thusly:
> >
> > * This test generates a load which simulates a process-loaded system.
> > *
> > * The test creates a ring of processes, each connected to its predecessor
> > * and successor by a pipe. After the ring is created, the parent process
> > * injects some dummy data records into the ring and then joins. The
> > * processes pass the data records around the ring until they are killed.
> > *
> >
> > It'll be starvation in the CPU scheduler I expect. For some reason
> > the ring of piping processes is just never giving a timeslice to
> > anything else. Or maybe something to do with the exceptional
> > wakeup strategy which pipes use.
> >
> > Don't know, sorry. One for the kernel/*.c guys.
>
> Ok, well I've done some profiling as suggested by wli, and it shows pretty much
> what I find in the results - it gets stuck while doing process_load and never
> moves on.
>
> recordsize 8192 kern profile:
> c01223ac 76997 4.48583 do_anonymous_page
> c0188694 135835 7.91373 __generic_copy_from_user
> c0188610 345071 20.1038 __generic_copy_to_user
> c0105298 801429 46.6911 poll_idle
> sysprofile:
> 00000000 160258 5.03854 (no symbol)
> /lib/i686/libc-2.2.5.so
> c0188610 345071 10.8491 __generic_copy_to_user
> /home/con/kernel/linux-2.5.43/vmlinux
> c0105298 801429 25.1971 poll_idle
> /home/con/kernel/linux-2.5.43/vmlinux
> 00000000 1132668 35.6113 (no symbol)
> /usr/lib/gcc-lib/i686-pc-linux-gnu/2.95.3/cc1
>
> Normal run consistent with doing kernel compilation most of the time.
>
> recordsize 16384 kernprofile:
> c0111ef4 403545 4.3407 do_schedule
> c0105298 558704 6.00965 poll_idle
> c0188694 2571995 27.6655 __generic_copy_from_user
> c0188610 4489796 48.2941 __generic_copy_to_user
> sysprofile:
> c0111ef4 403545 4.24896 do_schedule
> /home/con/kernel/linux-2.5.43/vmlinux
> c0105298 558704 5.88264 poll_idle
> /home/con/kernel/linux-2.5.43/vmlinux
> c0188694 2571995 27.0807 __generic_copy_from_user
> /home/con/kernel/linux-2.5.43/vmlinux
> c0188610 4489796 47.2734 __generic_copy_to_user
> /home/con/kernel/linux-2.5.43/vmlinux
>
> I had to abort the run with recordsize 16384, but you can see it's just stuck
> in process_load copying data between forked processes.
>
> Can someone else on lkml decipher why it gets stuck here?
Well, this has become more common with 2.5.43-mm2. I had to abort the
process_load run 3 times when benchmarking it. Going back to other kernels and
trying them, it didn't happen, so I don't think it's my hardware failing or
anything like that.
Con
Thread overview: 9+ messages
2002-10-17 2:13 Pathological case identified from contest Con Kolivas
2002-10-17 2:49 ` Andrew Morton
2002-10-17 4:26 ` Con Kolivas
2002-10-17 7:16 ` Con Kolivas [this message]
2002-10-17 7:35 ` Andrew Morton
2002-10-17 17:15 ` Rik van Riel
2002-10-20 2:59 ` Con Kolivas
2002-10-20 3:05 ` Andrew Morton
2002-10-20 6:27 ` Con Kolivas