public inbox for linux-kernel@vger.kernel.org
From: Con Kolivas <conman@kolivas.net>
To: Andrew Morton <akpm@digeo.com>
Cc: linux kernel mailing list <linux-kernel@vger.kernel.org>,
	William Lee Irwin <wli@holomorphy.com>
Subject: Re: Pathological case identified from contest
Date: Thu, 17 Oct 2002 14:26:35 +1000	[thread overview]
Message-ID: <1034828795.3dae3bfb42911@kolivas.net> (raw)
In-Reply-To: <3DAE252B.A9A5F6B1@digeo.com>

Quoting Andrew Morton <akpm@digeo.com>:

> Con Kolivas wrote:
> > 
> > I found a pathological case in 2.5 while running contest with process_load
> > recently after checking the results which showed a bad result for
> > 2.5.43-mm1:
> > 
> > 2.5.43-mm1              101.38  72%     42      31%
> > 2.5.43-mm1              102.90  75%     34      28%
> > 2.5.43-mm1              504.12  14%     603     85%
> > 2.5.43-mm1              96.73   77%     34      26%
> > 
> > This was very strange so I looked into it further
> > 
> > The default for process_load is this command:
> > 
> > process_load --processes $nproc --recordsize 8192 --injections 2
> > 
> > where $nproc=4*num_cpus
> > 
> > When I changed recordsize to 16384, many of the 2.5 kernels started
> > exhibiting the same behaviour. While the machine was apparently still
> > alive and would respond to my request to abort, the kernel compile
> > would all but stop while process_load just continued without allowing
> > anything to happen from kernel compilation for up to 5 minutes at a
> > time. This doesn't happen with any 2.4 kernels.
> > 
> 
> Well it doesn't happen on my test machine (UP or SMP).  I tried
> various recordsizes.  It's probably related to HZ, memory bandwidth
> and the precise timing at which things happen.
> 
> The test describes itself thusly:
> 
>  *  This test generates a load which simulates a process-loaded system.
>  *
>  *  The test creates a ring of processes, each connected to its predecessor
>  *  and successor by a pipe.  After the ring is created, the parent process
>  *  injects some dummy data records into the ring and then joins.  The
>  *  processes pass the data records around the ring until they are killed.
>  *
> 
> It'll be starvation in the CPU scheduler I expect.  For some reason
> the ring of piping processes is just never giving a timeslice to
> anything else.  Or maybe something to do with the exceptional
> wakeup strategy which pipes use.
> 
> Don't know, sorry.  One for the kernel/*.c guys.

Ok, I've done some profiling as wli suggested, and it shows pretty much what I
found in the results - it gets stuck while doing process_load and never moves
on.

recordsize 8192 kern profile:
c01223ac 76997    4.48583     do_anonymous_page
c0188694 135835   7.91373     __generic_copy_from_user
c0188610 345071   20.1038     __generic_copy_to_user
c0105298 801429   46.6911     poll_idle
sysprofile:
00000000 160258   5.03854   (no symbol)              /lib/i686/libc-2.2.5.so
c0188610 345071   10.8491   __generic_copy_to_user   /home/con/kernel/linux-2.5.43/vmlinux
c0105298 801429   25.1971   poll_idle                /home/con/kernel/linux-2.5.43/vmlinux
00000000 1132668  35.6113   (no symbol)              /usr/lib/gcc-lib/i686-pc-linux-gnu/2.95.3/cc1

A normal run, consistent with kernel compilation getting most of the CPU time.

recordsize 16384 kern profile:
c0111ef4 403545   4.3407      do_schedule
c0105298 558704   6.00965     poll_idle
c0188694 2571995  27.6655     __generic_copy_from_user
c0188610 4489796  48.2941     __generic_copy_to_user
sysprofile:
c0111ef4 403545   4.24896   do_schedule              /home/con/kernel/linux-2.5.43/vmlinux
c0105298 558704   5.88264   poll_idle                /home/con/kernel/linux-2.5.43/vmlinux
c0188694 2571995  27.0807   __generic_copy_from_user /home/con/kernel/linux-2.5.43/vmlinux
c0188610 4489796  47.2734   __generic_copy_to_user   /home/con/kernel/linux-2.5.43/vmlinux

I had to abort the run with recordsize 16384, but you can see it's just stuck
in process_load copying data between the forked processes.

Can someone else on lkml decipher why it gets stuck here?

Con

Thread overview: 9+ messages
2002-10-17  2:13 Pathological case identified from contest Con Kolivas
2002-10-17  2:49 ` Andrew Morton
2002-10-17  4:26   ` Con Kolivas [this message]
2002-10-17  7:16     ` Con Kolivas
2002-10-17  7:35       ` Andrew Morton
2002-10-17 17:15         ` Rik van Riel
2002-10-20  2:59         ` Con Kolivas
2002-10-20  3:05           ` Andrew Morton
2002-10-20  6:27             ` Con Kolivas
