public inbox for linux-kernel@vger.kernel.org
* [patch] O(1) scheduler, -J4
@ 2002-01-21 15:05 Ingo Molnar
  2002-01-21 15:32 ` [patch] O(1) scheduler, -J4, 2.4.18-pre4 Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2002-01-21 15:05 UTC (permalink / raw)
  To: linux-kernel


the -J4 scheduler patch is available:

    http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.5.3-pre2-J4.patch
    http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.4.17-J4.patch

there are no open/reported bugs, and no bugs were found since -J2. The
scheduler appears to be stabilizing steadily.

-J4 includes two changes to further improve interactiveness:

1)  the introduction of 'super-long' timeslices: the max timeslice is now
    500 msecs (it was 180 msecs before), and the default timeslice is now
    250 msecs (it was 90 msecs before).

the reason for super-long timeslices is that IMO we can now afford them.
The scheduler is pretty good at identifying true interactive tasks these
days, so we can increase timeslice length without risking the loss of good
interactive latencies. Long timeslices have a number of advantages:

 - nice +19 CPU hogs take up less CPU time than they used to.

 - interactive tasks can gather a bigger 'reserve' timeslice, which they
   can use for bursts of processing.

 - CPU hogs will get better cache affinity, due to longer timeslices
   and less context-switching.

Long timeslices also have a disadvantage:

 - under high load, if an interactive task manages to fall into the
   CPU-bound hell then it will take longer for it to get the next slice of
   processing.

in my measurements the pros beat the cons under the workloads i tried, but
YMMV - more testing by more people is needed, comparing -J4's interactive
feel (and nice behavior, and kernel compilation performance) against
-J2's.


2)  slight shrinking of the bonus/penalty range a task can get.

i've shrunk the bonus/penalty range from +-19 priority levels to +-14
priority levels. (from 90% of the full range to 70% of the full range.)
The reason why this can be done without hurting interactiveness is that
it's no longer a necessity to use the maximum range of priorities - the
interactiveness information is stored in p->sleep_avg, which is not
sensitive to the range of priority levels.

The shrinking has two benefits:

 - slightly denser priority arrays, slightly better cache utilization.

 - more isolation of nice levels from each other. E.g. nice -20 tasks now
   have a 6-priority-level 'buffer zone' which cannot be reached by
   normal interactive tasks. nice -20 audio daemons should benefit from
   this. Also, normal CPU hogs are better isolated from nice +19 CPU hogs,
   with the same 6-priority-level 'buffer zone'.

(by shrinking the bonus/penalty range, the -3 rule in the TASK_INTERACTIVE
definition was shrunk as well, to -2.)

Changelog:

 - Erich Focht: optimize max_load, remove prev_max_load.

 - Robert Love: simplify unlock_task_rq().

 - Robert Love: fix the ->cpu offset value in x86's entry.S, used by the
                preemption patch.

 - me: interactiveness updates.

 - me: sched_rr_get_interval() should return the timeslice value based on
       ->__nice, not based on ->prio.

Bug reports, comments, suggestions welcome. (any patch/fix that is not in
-J4 is lost and should be resent.)

	Ingo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch] O(1) scheduler, -J4, 2.4.18-pre4
  2002-01-21 15:05 [patch] O(1) scheduler, -J4 Ingo Molnar
@ 2002-01-21 15:32 ` Ingo Molnar
  2002-01-24  1:43   ` Ingo's O(1) scheduler vs. wait_init_idle Martin J. Bligh
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2002-01-21 15:32 UTC (permalink / raw)
  To: linux-kernel


and due to popular demand there is also a patch against 2.4.18-pre4:

   http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.4.18-pre4-J4.patch

	Ingo



* Ingo's O(1) scheduler vs. wait_init_idle
  2002-01-21 15:32 ` [patch] O(1) scheduler, -J4, 2.4.18-pre4 Ingo Molnar
@ 2002-01-24  1:43   ` Martin J. Bligh
  2002-01-24  9:21     ` Ingo Molnar
  2002-01-25 23:07     ` Performance of Ingo's O(1) scheduler on 8 way NUMA-Q Martin J. Bligh
  0 siblings, 2 replies; 12+ messages in thread
From: Martin J. Bligh @ 2002-01-24  1:43 UTC (permalink / raw)
  To: mingo; +Cc: linux-kernel

> and due to popular demand there is also a patch against 2.4.18-pre4:
> 
>    http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.4.18-pre4-J4.patch

I was trying to test this in my 8 way NUMA box, but this patch
seems to have lost half of the wait_init_idle fix that I put
in a while back. I'm not sure if this is deliberate or not, but
I suspect not, as you only removed part of it, and from your 
comment below (from a previous email), I think you understand 
the reasoning behind it:

> the new rules are this: no schedule() must be called before all bits in
> wait_init_idle are clear. I'd suggest for you to add this to the top of
> schedule():
>
>	if (wait_init_idle)
>		BUG();

Anyway, the machine won't boot without this fix, so I tried adding
it back in, and now it boots just fine. Patch is attached below.

If the removal was accidental, please could you add it back in 
as below ... if not, could we discuss why this was removed, and
maybe we can find another way to fix the problem?

Meanwhile, I'll try to knock out some benchmark figures with the
new scheduler code in place on the 8 way NUMA and a 16 way
NUMA ;-)

Martin.



diff -urN linux-2.4.18-pre4.old/init/main.c linux-2.4.18-pre4.new/init/main.c
--- linux-2.4.18-pre4.old/init/main.c	Wed Jan 23 18:26:56 2002
+++ linux-2.4.18-pre4.new/init/main.c	Wed Jan 23 18:27:04 2002
@@ -508,6 +508,14 @@
 
 	smp_threads_ready=1;
 	smp_commence();
+
+	/* Wait for the other cpus to set up their idle processes */
+	printk("Waiting on wait_init_idle (map = 0x%lx)\n", wait_init_idle);
+	while (wait_init_idle) {
+		cpu_relax();
+		barrier();
+	}
+	printk("All processors have done init_idle\n");
 }
 
 #endif
diff -urN linux-2.4.18-pre4.old/kernel/sched.c linux-2.4.18-pre4.new/kernel/sched.c
--- linux-2.4.18-pre4.old/kernel/sched.c	Wed Jan 23 18:26:56 2002
+++ linux-2.4.18-pre4.new/kernel/sched.c	Wed Jan 23 18:27:09 2002
@@ -1221,6 +1221,8 @@
 		spin_unlock(&rq2->lock);
 }
 
+extern unsigned long wait_init_idle;
+
 void __init init_idle(void)
 {
 	runqueue_t *this_rq = this_rq(), *rq = current->array->rq;
@@ -1237,6 +1239,7 @@
 	current->state = TASK_RUNNING;
 	double_rq_unlock(this_rq, rq);
 	current->need_resched = 1;
+	clear_bit(cpu(), &wait_init_idle);
 	__restore_flags(flags);
 }
 



* Re: Ingo's O(1) scheduler vs. wait_init_idle
  2002-01-24  1:43   ` Ingo's O(1) scheduler vs. wait_init_idle Martin J. Bligh
@ 2002-01-24  9:21     ` Ingo Molnar
  2002-01-24 17:47       ` Martin J. Bligh
  2002-01-25 23:07     ` Performance of Ingo's O(1) scheduler on 8 way NUMA-Q Martin J. Bligh
  1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2002-01-24  9:21 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel


On Wed, 23 Jan 2002, Martin J. Bligh wrote:

> I was trying to test this in my 8 way NUMA box, but this patch seems
> to have lost half of the wait_init_idle fix that I put in a while
> back. [...]

please check out the -J5 2.4.17/18 patch, that's the first 2.4 patch that
has the correct idle-thread fixes (which 2.5.3-pre3 has as well). Do you
still have booting problems?

	Ingo



* Re: Ingo's O(1) scheduler vs. wait_init_idle
  2002-01-24  9:21     ` Ingo Molnar
@ 2002-01-24 17:47       ` Martin J. Bligh
  2002-01-24 19:59         ` Ingo Molnar
  2002-01-26 17:19         ` Jesse Barnes
  0 siblings, 2 replies; 12+ messages in thread
From: Martin J. Bligh @ 2002-01-24 17:47 UTC (permalink / raw)
  To: mingo; +Cc: linux-kernel

>> I was trying to test this in my 8 way NUMA box, but this patch seems
>> to have lost half of the wait_init_idle fix that I put in a while
>> back. [...]
> 
> please check out the -J5 2.4.17/18 patch, thats the first 2.4 patch that
> has the correct idle-thread fixes. (which 2.5.3-pre3 has as well.) Do you
> still have booting problems?

Yes ... tried J6 on 2.4.18-pre4. If you want the garbled panic, it's attached
below. What you're doing in J6 certainly looks different, but still appears
not to be correct. I'll look at it some more, and try to send you a patch
against J6 today.

On the upside, the performance of your J4 patch with the added fix I
sent yesterday seems to be a great improvement - before, I was getting
about 16% of my total system time spent in default_idle on a
make -j16 bzImage. Now it's 0% ... we're actually feeding those CPUs ;-)
Kernel compile time is under a minute (56s) for the first time ever on the
8 way ... more figures later.

Martin.

checking TSC synchronization across CPUs: 
BIOS BUG: CPU#0 improperly initialized, has 52632 usecs TSC skew! FIXED.
BIOS BUG: CPU#1 improperly initialized, has 52632 usecs TSC skew! FIXED.
BIOS BUG: CPU#2 improperly initialized, has 52632 usecs TSC skew! FIXED.
BIOS BUG: CPU#3 improperly initialized, has 52632 usecs TSC skew! FIXED.
BIOS BUG: CPU#4 improperly initialized, has -52632 usecs TSC skew! FIXED.
BIOS BUG: CPU#5 improperly initialized, has -52632 usecs TSC skew! FIXED.
BIOS BUG: CPU#6 improperly initialized, has -52632 usecs TSC skew! FIXED.
BIOS BUG: CPU#7 improperly initialized, has -52633 usecs TSC skew! FIXED.
tecpu 1 has don0 init idl0, do246>c u_idot tai 0idle
ee doin0000u_idle().
00cpu 09  s done idat i0   edoinf cau_00
().
   i: cpu 6 has done  00000d1e  doin: f7d_idl8   
dydinitieli  ds                                  sp:Cf7dadab0e
                3CPU#  slready initialiess 
apperblp1>: 0, sta4,pagreaddlre00)presenack: >c01e<Uc0 l0000 to4ha0dl00ha1d po26
2900 ointe5f36 00000246 00000001 c021def0 
       00000282 00000001 00000015 00000000 c0295ef5 c0295ef7 c0118910 00000c2d 
       00000004 00000000 00000000 c023624f 
Call Trace: [<c0118910>] 

Code:  Bad EIP value.
 in0>Kernop randc: A00
pted to k 2
            tEI i  e ta10!
                          <c0I3 ed2e]   k - noainteding
F AGS: 00010002
eax: 00000029   ebx: 00010007   ecx: c021df08   edx: 00003ffc
esi: c0233b1a   edi: 0000001e   ebp: f7db5fa8   esp: f7da9fb0
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=f7da9000)
Stack: c01ee3c0 00000002 00000002 c0262a00 c0235f36 00000246 00000001 c021def0 
       00000282 00000001 00000015 00000000 c0295ef5 c0295ef7 c0118910 00000c35 
       00000006 00000000 00000000 c023624f 
Call Trace: [<c0118910>] 

Code: 0f 0b 83 c4 0c a1 84 74 2a c0 8d 90 c8 00 00 00 eb 0f 0f a3 
 <0>Kernel panic: Attempted to kill the idle task!
In idle task - not syncing




* Re: Ingo's O(1) scheduler vs. wait_init_idle
  2002-01-24 17:47       ` Martin J. Bligh
@ 2002-01-24 19:59         ` Ingo Molnar
  2002-01-26 17:19         ` Jesse Barnes
  1 sibling, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2002-01-24 19:59 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel


On Thu, 24 Jan 2002, Martin J. Bligh wrote:

> tecpu 1 has don0 init idl0, do246>c u_idot tai 0idle
> ee doin0000u_idle().
> 00cpu 09  s done idat i0   edoinf cau_00

just take out the TSC initialization messages from smpboot.c, that should
ungarble the output. And/or add this to printk.c:

	if (smp_processor_id())
		return;

this way you'll only see a single CPU's printk messages.

	Ingo



* Performance of Ingo's O(1) scheduler on 8 way NUMA-Q
  2002-01-24  1:43   ` Ingo's O(1) scheduler vs. wait_init_idle Martin J. Bligh
  2002-01-24  9:21     ` Ingo Molnar
@ 2002-01-25 23:07     ` Martin J. Bligh
  2002-02-07 13:08       ` Performance of Ingo's O(1) scheduler on NUMA-Q Martin J. Bligh
  1 sibling, 1 reply; 12+ messages in thread
From: Martin J. Bligh @ 2002-01-25 23:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo

Measuring the performance of a parallelized kernel compile with warm caches
on an 8 way NUMA-Q box. Highmem support is turned OFF, so I'm only using
the first 1GB or so of RAM (it's much faster without HIGHMEM).

prepare:
make -j16 dep; make -j16 bzImage; make mrproper; make -j16 dep; 

measured:
time make -j16 bzImage

2.4.18-pre7 

330.06user 99.92system 1:00.35elapsed 712%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (411135major+486026minor)pagefaults 0swaps

2.4.18-pre7 with J6 scheduler

307.19user 88.54system 0:57.63elapsed 686%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (399255major+484472minor)pagefaults 0swaps

This seems to be a significant improvement, giving not only a shorter
elapsed time but also a lower CPU load.

Martin.



* Re: Ingo's O(1) scheduler vs. wait_init_idle
  2002-01-24 17:47       ` Martin J. Bligh
  2002-01-24 19:59         ` Ingo Molnar
@ 2002-01-26 17:19         ` Jesse Barnes
  1 sibling, 0 replies; 12+ messages in thread
From: Jesse Barnes @ 2002-01-26 17:19 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: mingo, linux-kernel

On Thu, Jan 24, 2002 at 09:47:15AM -0800, Martin J. Bligh wrote:
> >> I was trying to test this in my 8 way NUMA box, but this patch seems
> >> to have lost half of the wait_init_idle fix that I put in a while
> >> back. [...]

We had this same trouble on 4 to 12 way Itanium machines, but finally
made them boot using a variation of your fix.  That was with J7.  It
looks like the boot cpu gets stuck waiting for wait_init_idle to
clear.  We'll try to send out a patch on Monday.

Thanks,
Jesse


* Re: Performance of Ingo's O(1) scheduler on NUMA-Q
  2002-01-25 23:07     ` Performance of Ingo's O(1) scheduler on 8 way NUMA-Q Martin J. Bligh
@ 2002-02-07 13:08       ` Martin J. Bligh
  2002-02-07 23:23         ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Martin J. Bligh @ 2002-02-07 13:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo

Measured on a 16 way NUMA-Q, adding Ingo's scheduler patch takes
kernel compile times down from 47 seconds to 31 seconds ... a pretty
impressive benefit.

Mike Kravetz is working on NUMA additions to Ingo's scheduler
which should give further improvements.

Martin.



* Re: Performance of Ingo's O(1) scheduler on NUMA-Q
  2002-02-07 13:08       ` Performance of Ingo's O(1) scheduler on NUMA-Q Martin J. Bligh
@ 2002-02-07 23:23         ` Ingo Molnar
  2002-02-08 15:15           ` Daniel Egger
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2002-02-07 23:23 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel


On Thu, 7 Feb 2002, Martin J. Bligh wrote:

> Measuring kernel compile times on a 16 way NUMA-Q, adding Ingo's
> scheduler patch takes kernel compiles down from 47 seconds to 31
> seconds .... pretty impressive benefit.

cool! By the way, could you try a test-compile with a 'big' .config file?

The reason i'm asking this is that with 31-second compiles, the final
link-time serialization has a significant effect, which makes the compile
itself less scalable. Adding lots of subsystems to the .config will create
a compilation that takes much longer, but which should also compare the
two schedulers better.

	Ingo



* Re: Performance of Ingo's O(1) scheduler on NUMA-Q
  2002-02-07 23:23         ` Ingo Molnar
@ 2002-02-08 15:15           ` Daniel Egger
  2002-02-11 17:39             ` Martin J. Bligh
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Egger @ 2002-02-08 15:15 UTC (permalink / raw)
  To: mingo; +Cc: linux-kernel

On Fri, 2002-02-08 at 00:23, Ingo Molnar wrote:

> > Measuring kernel compile times on a 16 way NUMA-Q, adding Ingo's
> > scheduler patch takes kernel compiles down from 47 seconds to 31
> > seconds .... pretty impressive benefit.
 
> cool! By the way, could you try a test-compile with a 'big' .config file?

I'd assume that a 16-way machine still taking 31s to compile the kernel
already has a 'big' config file.
 
-- 
Servus,
       Daniel



* Re: Performance of Ingo's O(1) scheduler on NUMA-Q
  2002-02-08 15:15           ` Daniel Egger
@ 2002-02-11 17:39             ` Martin J. Bligh
  0 siblings, 0 replies; 12+ messages in thread
From: Martin J. Bligh @ 2002-02-11 17:39 UTC (permalink / raw)
  To: Daniel Egger; +Cc: linux-kernel

>> > Measuring kernel compile times on a 16 way NUMA-Q, adding Ingo's
>> > scheduler patch takes kernel compiles down from 47 seconds to 31
>> > seconds .... pretty impressive benefit.
>  
>> cool! By the way, could you try a test-compile with a 'big' .config file?
> 
> I'd assume that a 16way machine still taking 31s to compile the kernel
> is already having a 'big' config file. 

It's a fairly normal config file, but the machine isn't feeling very
in touch with its NUMAness, so it scales badly. If I only use one
quad (4 processors), the same compile takes 47s.

M.



end of thread, other threads:[~2002-02-11 17:38 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-01-21 15:05 [patch] O(1) scheduler, -J4 Ingo Molnar
2002-01-21 15:32 ` [patch] O(1) scheduler, -J4, 2.4.18-pre4 Ingo Molnar
2002-01-24  1:43   ` Ingo's O(1) scheduler vs. wait_init_idle Martin J. Bligh
2002-01-24  9:21     ` Ingo Molnar
2002-01-24 17:47       ` Martin J. Bligh
2002-01-24 19:59         ` Ingo Molnar
2002-01-26 17:19         ` Jesse Barnes
2002-01-25 23:07     ` Performance of Ingo's O(1) scheduler on 8 way NUMA-Q Martin J. Bligh
2002-02-07 13:08       ` Performance of Ingo's O(1) scheduler on NUMA-Q Martin J. Bligh
2002-02-07 23:23         ` Ingo Molnar
2002-02-08 15:15           ` Daniel Egger
2002-02-11 17:39             ` Martin J. Bligh
