The Linux Kernel Mailing List
* Bug report: smp affinity patch
@ 2002-02-22 22:02 Joe Korty
  0 siblings, 0 replies; 2+ messages in thread
From: Joe Korty @ 2002-02-22 22:02 UTC (permalink / raw)
  To: rml; +Cc: mingo, linux-kernel, l-k

Hi everyone,
  On occasion, the smp affinity patch can leave one or more runnable
processes in such a state that the scheduler never selects them for
execution.  The reason this occurs is unknown.  This note reports
the symptoms and describes how the problem may be reproduced:

I am using the smp affinity patch from Robert Love, which provides
a /proc/pid/affinity interface to the user.  I presume the problem
is also present in the Ingo Molnar patch, since it is so similar
in implementation, although I have not tested it.

I ran across this problem when I wrote a shell script that
implements cpu shielding.  The shielding script modifies the
affinities of nearly all processes in the system -- for each process,
it either forces that process to run only on the shielded cpu, or
forces it to avoid the shielded cpu altogether.  The only processes
left untouched are the ksoftirqd_CPUxx daemons, each of which must
remain on the cpu it was originally attached to.

Joe

---------------------------------------

Environment:
    linux-2.4.17.tar.gz
    + patch-2.4.18-rc2.gz
    + cpu-affinity-rml-2.4.16-1.patch (from www.tech9.net/rml/linux)
    PC, Pentium III, dual cpus, dual IO APICs, scsi, console via com1.

Test Shell Script used (filename `shield'):

    #!/bin/bash
    # shell script that reserves some cpu to some small
    # set of procs: accomplished by tweaking the affinity
    # masks of all procs -- either by removing that cpu from
    # those pids which are not to use it, or by setting the
    # affinity to only that cpu, for those procs that are
    # to be attached to the shielded cpu.
    #
    # usage: shield unshieldmask shieldmask pid pid ...
    # example: shield 2 1 1027 1028
    # meaning: pids 1027,1028 are to run on cpu0; every
    # other proc is to run on cpu 1.
    # example: shield 3 3
    # meaning: make every cpu available to all procs.
    # note: procs 3 & 4 (ksoftirqd_CPU[0-1]) do not, and
    # must not, have their affinities changed by this script.

    unshieldmask=${1:-e}
    shift
    shieldmask=${1:-1}
    shift
    cd /proc
    for i in $(/bin/ls -d [0-9]*); do
	if [ -d $i ]; then
	    case $i in
		3|4) ;;
		${1:-no}) echo $shieldmask >$i/affinity ;;
		${2:-no}) echo $shieldmask >$i/affinity ;;
		${3:-no}) echo $shieldmask >$i/affinity ;;
		${4:-no}) echo $shieldmask >$i/affinity ;;
		*)  echo $unshieldmask >$i/affinity ;;
	    esac
	fi
    done

Test initialization sequence:

    in window #1:
	top -d1 -i
    in window #2:
	echo 'main() {for(;;);}' >l.c && make l
	./l &
	[1]  1087
	./l &
	[2]  1088
	./l &
	[3]  1089
	./l &
	[4]  1090

Test sequence and results:

    In the tests below, `Stall' means that the scheduler fails to
    give a runnable process any cpu time.

    Notation:
      1088 Stalls	- pid 1088 stalls.  Viewable in the top(1) window
			  as the bottom `running' proc, but with 0% cpu
			  utilization.

      top Stalls	- the top window stops updating, because top
			  itself has become a victim of the scheduling
			  bug.  To see this, run another top in another
			  window.
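
    A stall can also be confirmed without top by sampling a pid's
    accumulated CPU time from /proc.  A minimal sketch (assuming the
    standard /proc/<pid>/stat layout, where fields 14 and 15 are utime
    and stime in clock ticks):

```shell
#!/bin/bash
# stalled <pid>: print "stalled" if the pid accumulated no CPU
# time over a one-second window, "ok" otherwise.  Assumes the
# process name in field 2 of /proc/<pid>/stat contains no spaces.
stalled() {
    local t1 t2
    t1=$(awk '{print $14 + $15}' /proc/$1/stat)
    sleep 1
    t2=$(awk '{print $14 + $15}' /proc/$1/stat)
    if [ "$t2" -eq "$t1" ]; then echo stalled; else echo ok; fi
}
```

    A busy-looping pid that top shows as running but for which this
    prints `stalled' is exhibiting the bug.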


    result		command line executed
    ------------------	---------------------
    ok			shield 1 2
    ok			shield 1 2
    ok			shield 3 1
    ok			shield 3 1 1087
    ok			shield 2 1 1087
    ok			shield 3 3
    top Stalls		shield 2 1 1090
    ok			shield 3 3
    top Stalls		shield 2 1 1090
    ok			shield 3 3
    top Stalls		shield 2 1 1090
    ok			shield 3 3
    1088 Stalls		shield 2 1 1090
    ok			shield 3 3
    ok			shield 2 1
    ok			shield 1 2
    ok			shield 2 1
    ok			shield 1 2
    ok			shield 2 1
    1090 Stalls		shield 1 2 1090
    1090 Stalls		shield 1 2 1090
    top Stalls, plus	shield 2 1 1090
    1087, sendmail,
    crond, init, lots
    of kjournals, and
    syslogd
    ok			shield 3 3
    top + shell window	shield 2 1 1090
    Stalls
    ok			shield 3 3		(executed in another window)
    1087 Stalls		shield 1 2 1090


Sample good and bad top(1) Displays:

------------------------------------------------------------------- good

  9:18pm  up 38 min,  3 users,  load average: 4.00, 4.31, 4.69
48 processes: 43 sleeping, 5 running, 0 zombie, 0 stopped
CPU0 states: 100.0% user,  0.0% system,  0.0% nice,  0.0% idle
CPU1 states: 99.0% user,  1.0% system,  0.0% nice,  0.0% idle
Mem:   513160K av,   44600K used,  468560K free,       0K shrd,   10156K buff
Swap: 1052216K av,       0K used, 1052216K free                   17420K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 1088 root      20   0   268  268   220 R    52.0  0.0  15:28 l
 1089 root      20   0   268  268   220 R    50.0  0.0  15:54 l
 1087 root      18   0   268  268   220 R    48.0  0.0  13:38 l
 1090 root      18   0   268  268   220 R    48.0  0.0  22:44 l
 1027 root      10   0  1060 1060   856 R     1.0  0.2   0:13 top

------------------------------------------------------------------- bad

  9:20pm  up 40 min,  3 users,  load average: 6.12, 5.19, 4.97
49 processes: 43 sleeping, 6 running, 0 zombie, 0 stopped
CPU0 states: 97.0% user,  3.0% system,  0.0% nice,  0.0% idle
CPU1 states: 100.0% user,  0.0% system,  0.0% nice,  0.0% idle
Mem:   513160K av,   44940K used,  468220K free,       0K shrd,   10392K buff
Swap: 1052216K av,       0K used, 1052216K free                   17420K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 1090 root      16   0   268  268   220 R    99.9  0.0  24:24 l
 1089 root      16   0   268  268   220 R    64.9  0.0  16:58 l
 1088 root       9   0   268  268   220 R    33.9  0.0  16:31 l
  978 root       9   0   664  664   552 R     0.0  0.1   0:00 in.telnetd
 1027 root       9   0  1060 1060   856 R     0.0  0.2   0:14 top
 1087 root       9   0   268  268   220 R     0.0  0.0  14:05 l

-------------------------------------------------------------------------


* Re: Bug report: smp affinity patch
       [not found] <Pine.LNX.4.33.0202221834430.19736-100000@coffee.psychology.mcmaster.ca>
@ 2002-02-25 16:13 ` Joe Korty
From: Joe Korty @ 2002-02-25 16:13 UTC (permalink / raw)
  To: Mark Hahn; +Cc: linux-kernel

>> On occasion, the smp affinity patch can leave one or more runnable
>> processes in such a state that the scheduler never selects them for
> 
> out of curiosity, do you have data on some case where this kind of
> affinity fiddling produces a  noticable improvement in performance? 
> people always say it does, but it's not clear whether that's just
> because famous old systems in the past did it...


Hi Mark,
 Sorry about the delay.  Your letter must have come in just as I was
stepping out the door last Friday night.

Such fiddling never `increases performance'; it always results in a
massively suboptimal use of system resources.  Therefore one shields
only when the loss in performance is a non-issue compared to the
gain in timely responsiveness to external events experienced by
applications running on the shielded cpus.  `Timely responsiveness'
is typically measured by how erratic an application is in responding
to an external event repeated over and over; this variation in
response time I call jitter.
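
For a batch of measured response times, the mean delay and the
worst-case deviation from it (the jitter) can be summarized with a
few lines of awk; the numbers below are made-up stand-ins, not
measurements:

```shell
# summarize latencies (microseconds, one per line): mean delay and
# the largest absolute deviation from that mean (jitter)
awk '{ t[NR] = $1; sum += $1 }
     END {
         mean = sum / NR
         for (i = 1; i <= NR; i++) {
             d = t[i] - mean; if (d < 0) d = -d
             if (d > max) max = d
         }
         printf "mean %.1f us, max jitter %.1f us\n", mean, max
     }' <<EOF
10
11
12
11
EOF
```

For the sample input this prints `mean 11.0 us, max jitter 1.0 us'.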

One can measure jitter with a dual trace oscilloscope, a square wave
generator, and a PC with an external input/output interrupt card with
a matching driver which lets an application sleep, waiting on
interrupts from that driver.  The test application is a few lines of
C code that loops forever, sending out an external interrupt every
time it wakes up due to the arrival of an input external interrupt.

With channel 1 of the oscilloscope also attached to the square wave
generator and channel 2 attached to the external interrupt output pin
of the PC, one can graphically see on the scope 1) the average delay
time it takes the PC (hardware, driver, OS, and application in toto)
to respond to an interrupt, and 2) the variations (jitter) from that
average time, how often such variations occur, and their magnitude.

Now I haven't run the above test myself for Linux, but one of the
guys here has, and he says that under Linux we see an average delay
between the input and output pulses of 11 microseconds, and that most
of the delays fall within a few microseconds of that.  However, there
are occasional delays much longer than this, with the longest being
an occasional 10 MILLIseconds, which would be a truly horrendous
jitter, if true.

The above is with shielding off; we have not done experiments with
our shielding code turned on in Linux since that is not working yet.

I imagine the 10 millisecond delay is due to the standard linux
scheduler, which is priority-less and often favors running a
currently running program over switching to a newly-runnable program.
The new O(1) scheduler might fix that, although I haven't looked at
it yet to see if my hope has any basis in fact.

Once the scheduler issues are fixed, shielding becomes useful for
keeping ordinary, noncritical processes off of a cpu.  Process
priority alone does not protect much against the jitter these
processes introduce, as they can at any time enter the kernel via a
system call.  There, they can grab spinlocks and lock out interrupts
for short periods of time; those periods are often in the hundreds
of microseconds for a well-threaded system (which Linux is not,
yet).  These lockouts prevent the scheduler from running and thus
are a source of jitter that is 10x the normal response time of the
PC to an external interrupt.

In combination with process shielding, one should also use IRQ
shielding (the /proc/irq/n/affinity interface) to steer interrupts
from noncritical devices over to the unshielded cpus.  Interrupt
processing is also a source of jitter in the hundreds-of-microseconds
range, and given today's networking cards, whose drivers like to
loop at interrupt level processing large numbers of packets,
interrupts can often be held off for much longer than that.
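
A sketch of that IRQ side, written against a directory parameter so
it can be tried on a copy of the tree first (the per-irq `affinity'
file name follows the patch's interface and is an assumption here;
the stock kernel spells it smp_affinity):

```shell
# shield_irqs <irqdir> <mask>: point every per-irq affinity file
# under <irqdir> at <mask>, skipping IRQ 0 (the timer).  On a live
# system <irqdir> would be /proc/irq.
shield_irqs() {
    local dir=$1 mask=$2 d irq
    for d in $dir/[0-9]*; do
        irq=${d##*/}
        if [ "$irq" != 0 ] && [ -f $d/affinity ]; then
            echo $mask >$d/affinity
        fi
    done
}
```

e.g. `shield_irqs /proc/irq 2' to move all steerable interrupts over
to cpu1.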

Joe

