From: Con Kolivas <conman@kolivas.net>
To: linux kernel mailing list <linux-kernel@vger.kernel.org>
Cc: Andrew Morton <akpm@digeo.com>
Subject: Re: Pathological case identified from contest
Date: Thu, 17 Oct 2002 17:16:46 +1000
Message-ID: <1034839006.3dae63de3f69a@kolivas.net>
In-Reply-To: <1034828795.3dae3bfb42911@kolivas.net>

Quoting Con Kolivas <conman@kolivas.net>:

> Quoting Andrew Morton <akpm@digeo.com>:
> 
> > Con Kolivas wrote:
> > > 
> > > I found a pathological case in 2.5 while running contest with process_load
> > > recently after checking the results which showed a bad result for 2.5.43-mm1:
> > > 
> > > 2.5.43-mm1              101.38  72%     42      31%
> > > 2.5.43-mm1              102.90  75%     34      28%
> > > 2.5.43-mm1              504.12  14%     603     85%
> > > 2.5.43-mm1              96.73   77%     34      26%
> > > 
> > > This was very strange so I looked into it further
> > > 
> > > The default for process_load is this command:
> > > 
> > > process_load --processes $nproc --recordsize 8192 --injections 2
> > > 
> > > where $nproc=4*num_cpus
> > > 
> > > When I changed recordsize to 16384, many of the 2.5 kernels started exhibiting
> > > the same behaviour. While the machine was apparently still alive and would
> > > respond to my request to abort, the kernel compile would all but stop while
> > > process_load just continued without allowing anything to happen from kernel
> > > compilation for up to 5 minutes at a time. This doesn't happen with any 2.4
> > > kernels.
> > > 
> > 
> > Well it doesn't happen on my test machine (UP or SMP).  I tried
> > various recordsizes.  It's probably related to HZ, memory bandwidth
> > and the precise timing at which things happen.
> > 
> > The test describes itself thusly:
> > 
> >  *  This test generates a load which simulates a process-loaded system.
> >  *
> >  *  The test creates a ring of processes, each connected to its predecessor
> >  *  and successor by a pipe.  After the ring is created, the parent process
> >  *  injects some dummy data records into the ring and then joins.  The
> >  *  processes pass the data records around the ring until they are killed.
> >  *
> > 
> > It'll be starvation in the CPU scheduler I expect.  For some reason
> > the ring of piping processes is just never giving a timeslice to
> > anything else.  Or maybe something to do with the exceptional
> > wakeup strategy which pipes use.
> > 
> > Don't know, sorry.  One for the kernel/*.c guys.
> 
> Ok well I've done some profiling as suggested by wli and it shows pretty much
> what I find in the results - it gets stuck while doing process_load and never
> moves on.
> 
> recordsize 8192 kern profile:
> c01223ac 76997    4.48583     do_anonymous_page
> c0188694 135835   7.91373     __generic_copy_from_user
> c0188610 345071   20.1038     __generic_copy_to_user
> c0105298 801429   46.6911     poll_idle
> sysprofile:
> 00000000 160258   5.03854     (no symbol)              /lib/i686/libc-2.2.5.so
> c0188610 345071   10.8491     __generic_copy_to_user   /home/con/kernel/linux-2.5.43/vmlinux
> c0105298 801429   25.1971     poll_idle                /home/con/kernel/linux-2.5.43/vmlinux
> 00000000 1132668  35.6113     (no symbol)              /usr/lib/gcc-lib/i686-pc-linux-gnu/2.95.3/cc1
> 
> Normal run consistent with doing kernel compilation most of the time.
> 
> recordsize 16384 kernprofile: 
> c0111ef4 403545   4.3407      do_schedule
> c0105298 558704   6.00965     poll_idle
> c0188694 2571995  27.6655     __generic_copy_from_user
> c0188610 4489796  48.2941     __generic_copy_to_user
> sysprofile:
> c0111ef4 403545   4.24896     do_schedule              /home/con/kernel/linux-2.5.43/vmlinux
> c0105298 558704   5.88264     poll_idle                /home/con/kernel/linux-2.5.43/vmlinux
> c0188694 2571995  27.0807     __generic_copy_from_user /home/con/kernel/linux-2.5.43/vmlinux
> c0188610 4489796  47.2734     __generic_copy_to_user   /home/con/kernel/linux-2.5.43/vmlinux
> 
> I had to abort the run with recordsize 16384 but you can see it's just stuck in
> process_load copying data between forked processes.
> 
> Can someone else on lkml decipher why it gets stuck here?

Well this has become more common with 2.5.43-mm2. I had to abort the
process_load run 3 times when benchmarking it. When I went back to other
kernels and tried them it didn't happen, so I don't think it's my hardware
failing or something like that.
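
For anyone who wants to poke at this without contest installed, here is a
minimal sketch of the kind of ring the quoted test header describes. It is
not the process_load source. The names run_ring, read_full and write_full are
made up for illustration, and the "laps" bound is my own assumption so the
sketch terminates by itself (the real test runs until it is killed). The
parent acts as node 0 of the ring, injects the records and then forwards like
every other node.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read exactly len bytes; returns 0 on success, -1 on EOF or error. */
static int read_full(int fd, char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes; returns 0 on success, -1 on error. */
static int write_full(int fd, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: build a ring of nproc processes (parent included),
 * inject `injections` records of `recordsize` bytes, and stop after the
 * parent has seen injections * laps records come back around.
 * Returns the number of records the parent counted, or -1 on bad arguments.
 */
long run_ring(int nproc, size_t recordsize, int injections, int laps)
{
    if (nproc < 2 || recordsize == 0 || injections < 1 || laps < 1)
        return -1;

    /* pipes[i] carries records INTO node i from its predecessor. */
    int (*pipes)[2] = malloc((size_t)nproc * sizeof *pipes);
    char *buf = malloc(recordsize);
    assert(pipes && buf); /* sketch only: abort on allocation failure */

    for (int i = 0; i < nproc; i++)
        if (pipe(pipes[i]) < 0) { perror("pipe"); exit(EXIT_FAILURE); }

    for (int i = 1; i < nproc; i++) {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); exit(EXIT_FAILURE); }
        if (pid == 0) {
            int in = pipes[i][0], out = pipes[(i + 1) % nproc][1];
            /* Keep only this node's two ends so EOF can propagate. */
            for (int j = 0; j < nproc; j++) {
                if (pipes[j][0] != in) close(pipes[j][0]);
                if (pipes[j][1] != out) close(pipes[j][1]);
            }
            while (read_full(in, buf, recordsize) == 0 &&
                   write_full(out, buf, recordsize) == 0)
                ;
            _exit(0);
        }
    }

    /* Parent is node 0: reads from the last child, writes to child 1. */
    int in = pipes[0][0], out = pipes[1][1];
    for (int j = 0; j < nproc; j++) {
        if (pipes[j][0] != in) close(pipes[j][0]);
        if (pipes[j][1] != out) close(pipes[j][1]);
    }

    memset(buf, 'x', recordsize);          /* dummy record contents */
    for (int k = 0; k < injections; k++)
        write_full(out, buf, recordsize);

    long target = (long)injections * laps, seen = 0;
    while (seen < target && read_full(in, buf, recordsize) == 0) {
        seen++;
        if (seen < target)
            write_full(out, buf, recordsize);
    }

    close(out);                             /* child 1 sees EOF, then the rest */
    while (read_full(in, buf, recordsize) == 0)
        ;                                   /* drain records still in flight */
    close(in);
    for (int i = 1; i < nproc; i++)
        wait(NULL);

    free(buf);
    free(pipes);
    return seen;
}
```

Something like run_ring(4, 16384, 2, 1000) loosely mirrors the arguments in
the process_load command quoted above. On its own it only exercises the pipe
copy path. To see whether anything gets starved you would still have to run it
alongside a kernel compile, and I make no claim that it reproduces the stall.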

Con
