Regression seen for patch "sched:dont decrease idle sleep avg"

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Regression seen for patch "sched:dont decrease idle sleep avg"
@ 2006-05-08 23:18 Tim Chen
  2006-05-09  0:43 ` Con Kolivas
  0 siblings, 1 reply; 42+ messages in thread
From: Tim Chen @ 2006-05-08 23:18 UTC (permalink / raw)
  To: kernel, linux-kernel

Con,

As a result of the patch "sched:dont decrease idle sleep avg" 
introduced after 2.6.15, there is a 4% drop in Volanomark 
throughput on our Itanium test machine.  
Probably the following happened:
Compared to previous code, this patch slightly increases 
the the priority boost when a job is woken up.
This adds priority spread and variations to the wait time of jobs
on run queue if we have a lot of similar jobs in the system.

See patch:
http://www.kernel.org/git/?
p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e72ff0bb2c163eb13014ba113701bd42dab382fe 

Regards,
Tim 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-08 23:18 Regression seen for patch "sched:dont decrease idle sleep avg" Tim Chen
@ 2006-05-09  0:43 ` Con Kolivas
  2006-05-09  1:07   ` Martin Bligh
  2006-05-12  0:04   ` Chen, Kenneth W
  0 siblings, 2 replies; 42+ messages in thread
From: Con Kolivas @ 2006-05-09  0:43 UTC (permalink / raw)
  To: tim.c.chen; +Cc: linux-kernel, mingo

[-- Attachment #1: Type: text/plain, Size: 1453 bytes --]

Tim Chen writes:

> Con,
> 
> As a result of the patch "sched:dont decrease idle sleep avg" 
> introduced after 2.6.15, there is a 4% drop in Volanomark 
> throughput on our Itanium test machine.  
> Probably the following happened:
> Compared to previous code, this patch slightly increases 
> the the priority boost when a job is woken up.
> This adds priority spread and variations to the wait time of jobs
> on run queue if we have a lot of similar jobs in the system.
> 
> See patch:
> http://www.kernel.org/git/?
> p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e72ff0bb2c163eb13014ba113701bd42dab382fe 

Lovely

This patch corrects a bug in the original code which unintentionally dropped 
the priority of tasks that were idle but were already high priority on other 
merits. It doesn't further increase the priority. The 4% almost certainly is 
due to the lack of any locking in the threading model used by the java 
virtual machine on volanomark and it being pure luck that penalising 
particularly idle tasks previously improved the wakeup timing of basically 
yielding dependant threads. This patch did fix bugs related to interactive 
yet idle tasks like consoles misbehaving. The fact that the presence of that 
particular bug improved a multithreaded benchmark that uses such a threading 
model is pure chance and (obviously) not design. I wouldn't like to see this 
bug reintroduced on the basis of this benchmark result.

---
-ck

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont         decrease idle sleep avg"
  2006-05-09  0:43 ` Con Kolivas
@ 2006-05-09  1:07   ` Martin Bligh
  2006-05-12  0:04   ` Chen, Kenneth W
  1 sibling, 0 replies; 42+ messages in thread
From: Martin Bligh @ 2006-05-09  1:07 UTC (permalink / raw)
  To: Con Kolivas; +Cc: tim.c.chen, linux-kernel, mingo

Con Kolivas wrote:
> Tim Chen writes:
> 
>> Con,
>>
>> As a result of the patch "sched:dont decrease idle sleep avg" 
>> introduced after 2.6.15, there is a 4% drop in Volanomark throughput 
>> on our Itanium test machine.  Probably the following happened:
>> Compared to previous code, this patch slightly increases the the 
>> priority boost when a job is woken up.
>> This adds priority spread and variations to the wait time of jobs
>> on run queue if we have a lot of similar jobs in the system.
>>
>> See patch:
>> http://www.kernel.org/git/?
>> p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e72ff0bb2c163eb13014ba113701bd42dab382fe 
> 
> 
> 
> Lovely
> 
> This patch corrects a bug in the original code which unintentionally 
> dropped the priority of tasks that were idle but were already high 
> priority on other merits. It doesn't further increase the priority. The 
> 4% almost certainly is due to the lack of any locking in the threading 
> model used by the java virtual machine on volanomark and it being pure 
> luck that penalising particularly idle tasks previously improved the 
> wakeup timing of basically yielding dependant threads. This patch did 
> fix bugs related to interactive yet idle tasks like consoles 
> misbehaving. The fact that the presence of that particular bug improved 
> a multithreaded benchmark that uses such a threading model is pure 
> chance and (obviously) not design. I wouldn't like to see this bug 
> reintroduced on the basis of this benchmark result.

Volanomark (and most Java benchmarks) are random number generators
anyway, especially when it comes to scheduler patches. They're doing
such utterly stupid things anyway that I don't think we should care
if we break them ...

M.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-09  0:43 ` Con Kolivas
  2006-05-09  1:07   ` Martin Bligh
@ 2006-05-12  0:04   ` Chen, Kenneth W
  2006-05-13 12:27     ` Andrew Morton
  2006-05-14 16:03     ` Con Kolivas
  1 sibling, 2 replies; 42+ messages in thread
From: Chen, Kenneth W @ 2006-05-12  0:04 UTC (permalink / raw)
  To: 'Con Kolivas', tim.c.chen; +Cc: linux-kernel, mingo

Tim Chen writes:
> See patch:
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e72ff0bb2c163eb13014ba113701bd42dab382fe 

Con Kolivas wrote on Monday, May 08, 2006 5:43 PM
> This patch corrects a bug in the original code which unintentionally dropped 
> the priority of tasks that were idle but were already high priority on other 
> merits. It doesn't further increase the priority.

This got me to take a non-casual look at that particular git commit.  The
first portion of the change log description says perfectly about the intent,
but after studying the code, I have to say that the actual code does not
implement what people say it will do.  In recalc_task_prio(), if a task's
sleep_time is more than INTERACTIVE_SLEEP, it will bump up p->sleep_avg all
the way to near maximum (at MAX_SLEEP_AVG - DEF_TIMESLICE), which according
to my calculation, it will have a priority bonus of 4 (out of max 5).

IOW, for a prolonged sleep, a task will immediately get near maximum priority
boost. Is that what the real intent is?  Seems to be on the contrary to what
the source code comments as well.

I think in the if (sleep_time > INTERACTIVE_SLEEP) block, p->sleep_avg should
be treated similarly like what the "else" block is doing: scale it proportionally
with past sleep time, perhaps not the immediate previously prolonged sleep
because that would for sure bump up priority too fast.  A better method might
be p->sleep_avg *= 2 or something like that.

- Ken

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-12  0:04   ` Chen, Kenneth W
@ 2006-05-13 12:27     ` Andrew Morton
  2006-05-13 13:07       ` Mike Galbraith
  2006-05-14 16:03     ` Con Kolivas
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Morton @ 2006-05-13 12:27 UTC (permalink / raw)
  To: Chen, Kenneth W; +Cc: kernel, tim.c.chen, linux-kernel, mingo


(Catching up on lkml)

On Thu, 11 May 2006 17:04:11 -0700
"Chen, Kenneth W" <kenneth.w.chen@intel.com> wrote:

> Tim Chen writes:
> > See patch:
> > http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e72ff0bb2c163eb13014ba113701bd42dab382fe 
> 
> Con Kolivas wrote on Monday, May 08, 2006 5:43 PM
> > This patch corrects a bug in the original code which unintentionally dropped 
> > the priority of tasks that were idle but were already high priority on other 
> > merits. It doesn't further increase the priority.
> 
> 
> This got me to take a non-casual look at that particular git commit.  The
> first portion of the change log description says perfectly about the intent,
> but after studying the code, I have to say that the actual code does not
> implement what people say it will do.  In recalc_task_prio(), if a task's
> sleep_time is more than INTERACTIVE_SLEEP, it will bump up p->sleep_avg all
> the way to near maximum (at MAX_SLEEP_AVG - DEF_TIMESLICE), which according
> to my calculation, it will have a priority bonus of 4 (out of max 5).
> 
> IOW, for a prolonged sleep, a task will immediately get near maximum priority
> boost. Is that what the real intent is?  Seems to be on the contrary to what
> the source code comments as well.
> 
> I think in the if (sleep_time > INTERACTIVE_SLEEP) block, p->sleep_avg should
> be treated similarly like what the "else" block is doing: scale it proportionally
> with past sleep time, perhaps not the immediate previously prolonged sleep
> because that would for sure bump up priority too fast.  A better method might
> be p->sleep_avg *= 2 or something like that.
> 

That seems to be a pretty significant discovery.  Is anything happening
with it?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-13 12:27     ` Andrew Morton
@ 2006-05-13 13:07       ` Mike Galbraith
  0 siblings, 0 replies; 42+ messages in thread
From: Mike Galbraith @ 2006-05-13 13:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Chen, Kenneth W, kernel, tim.c.chen, linux-kernel, mingo

On Sat, 2006-05-13 at 05:27 -0700, Andrew Morton wrote:
> (Catching up on lkml)
> 
> On Thu, 11 May 2006 17:04:11 -0700
> "Chen, Kenneth W" <kenneth.w.chen@intel.com> wrote:
> 
> > Tim Chen writes:
> > > See patch:
> > > http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e72ff0bb2c163eb13014ba113701bd42dab382fe 
> > 
> > Con Kolivas wrote on Monday, May 08, 2006 5:43 PM
> > > This patch corrects a bug in the original code which unintentionally dropped 
> > > the priority of tasks that were idle but were already high priority on other 
> > > merits. It doesn't further increase the priority.
> > 
> > 
> > This got me to take a non-casual look at that particular git commit.  The
> > first portion of the change log description says perfectly about the intent,
> > but after studying the code, I have to say that the actual code does not
> > implement what people say it will do.  In recalc_task_prio(), if a task's
> > sleep_time is more than INTERACTIVE_SLEEP, it will bump up p->sleep_avg all
> > the way to near maximum (at MAX_SLEEP_AVG - DEF_TIMESLICE), which according
> > to my calculation, it will have a priority bonus of 4 (out of max 5).
> > 
> > IOW, for a prolonged sleep, a task will immediately get near maximum priority
> > boost. Is that what the real intent is?  Seems to be on the contrary to what
> > the source code comments as well.
> > 
> > I think in the if (sleep_time > INTERACTIVE_SLEEP) block, p->sleep_avg should
> > be treated similarly like what the "else" block is doing: scale it proportionally
> > with past sleep time, perhaps not the immediate previously prolonged sleep
> > because that would for sure bump up priority too fast.  A better method might
> > be p->sleep_avg *= 2 or something like that.
> > 
> 
> That seems to be a pretty significant discovery.  Is anything happening
> with it?

When I tried to fix that, I ran into resistance.

	-Mike


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-12  0:04   ` Chen, Kenneth W
  2006-05-13 12:27     ` Andrew Morton
@ 2006-05-14 16:03     ` Con Kolivas
  2006-05-15 19:01       ` Chen, Kenneth W
  1 sibling, 1 reply; 42+ messages in thread
From: Con Kolivas @ 2006-05-14 16:03 UTC (permalink / raw)
  To: Chen, Kenneth W; +Cc: tim.c.chen, linux-kernel, mingo, Andrew Morton

On Friday 12 May 2006 10:04, Chen, Kenneth W wrote:
> Tim Chen writes:
> > See patch:
> > http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=co
> >mmit;h=e72ff0bb2c163eb13014ba113701bd42dab382fe
>
> Con Kolivas wrote on Monday, May 08, 2006 5:43 PM
>
> > This patch corrects a bug in the original code which unintentionally
> > dropped the priority of tasks that were idle but were already high
> > priority on other merits. It doesn't further increase the priority.
>
> This got me to take a non-casual look at that particular git commit.  The
> first portion of the change log description says perfectly about the
> intent, but after studying the code, I have to say that the actual code
> does not implement what people say it will do.  In recalc_task_prio(), if a
> task's sleep_time is more than INTERACTIVE_SLEEP, it will bump up
> p->sleep_avg all the way to near maximum (at MAX_SLEEP_AVG -
> DEF_TIMESLICE), which according to my calculation, it will have a priority
> bonus of 4 (out of max 5).
>
> IOW, for a prolonged sleep, a task will immediately get near maximum
> priority boost. Is that what the real intent is?  Seems to be on the
> contrary to what the source code comments as well.
>
> I think in the if (sleep_time > INTERACTIVE_SLEEP) block, p->sleep_avg
> should be treated similarly like what the "else" block is doing: scale it
> proportionally with past sleep time, perhaps not the immediate previously
> prolonged sleep because that would for sure bump up priority too fast.  A
> better method might be p->sleep_avg *= 2 or something like that.

There would be no difference if the priority boost is done lower. The if and 
else blocks both end up equating to the same amount of priority boost, with 
the former having a ceiling on it, so yes it is the intent. You'll see that 
the amount of sleep required to jump from lowest priority to MAX_SLEEP_AVG - 
DEF_TIMESLICE is INTERACTIVE_SLEEP.

-ck

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-14 16:03     ` Con Kolivas
@ 2006-05-15 19:01       ` Chen, Kenneth W
  2006-05-15 23:45         ` Con Kolivas
  2006-05-16  4:07         ` Mike Galbraith
  0 siblings, 2 replies; 42+ messages in thread
From: Chen, Kenneth W @ 2006-05-15 19:01 UTC (permalink / raw)
  To: 'Con Kolivas'; +Cc: tim.c.chen, linux-kernel, mingo, Andrew Morton

Con Kolivas wrote on Sunday, May 14, 2006 9:03 AM
> There would be no difference if the priority boost is done lower. The if and 
> else blocks both end up equating to the same amount of priority boost, with 
> the former having a ceiling on it, so yes it is the intent. You'll see that 
> the amount of sleep required to jump from lowest priority to MAX_SLEEP_AVG - 
> DEF_TIMESLICE is INTERACTIVE_SLEEP.

I don't think the if and the else block is doing the same thing. In the if
block, the p->sleep_avg is unconditionally boosted to ceiling for all tasks,
though it will not reduce sleep_avg for tasks that already exceed the ceiling.
Bumping up sleep_avg will then translate into priority boost of MAX_BONUS-1,
which potentially can be too high.

But that's fine if it is the intent. At minimum, the comment in the source
code should say so instead of fooling people who don't actually read the code.


[patch] sched: update comments in priority calculation w.r.t. implementation.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>

--- ./kernel/sched.c.orig	2006-05-15 12:24:02.000000000 -0700
+++ ./kernel/sched.c	2006-05-15 12:37:16.000000000 -0700
@@ -746,10 +746,12 @@ static int recalc_task_prio(task_t *p, u
 	if (likely(sleep_time > 0)) {
 		/*
 		 * User tasks that sleep a long time are categorised as
-		 * idle. They will only have their sleep_avg increased to a
-		 * level that makes them just interactive priority to stay
-		 * active yet prevent them suddenly becoming cpu hogs and
-		 * starving other processes.
+		 * idle. If they sleep longer than INTERACTIVE_SLEEP, it
+		 * will have its priority boosted to minimum MAX_BONUS-1.
+		 * For short sleep, they will only have their sleep_avg
+		 * increased to a level that makes them just interactive
+		 * priority to stay active yet prevent them suddenly becoming
+		 * cpu hogs and starving other processes.
 		 */
 		if (p->mm && sleep_time > INTERACTIVE_SLEEP(p)) {
 				unsigned long ceiling;



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-15 19:01       ` Chen, Kenneth W
@ 2006-05-15 23:45         ` Con Kolivas
  2006-05-16  1:22           ` Chen, Kenneth W
                             ` (2 more replies)
  2006-05-16  4:07         ` Mike Galbraith
  1 sibling, 3 replies; 42+ messages in thread
From: Con Kolivas @ 2006-05-15 23:45 UTC (permalink / raw)
  To: Chen, Kenneth W; +Cc: tim.c.chen, linux-kernel, mingo, Andrew Morton

On Tuesday 16 May 2006 05:01, Chen, Kenneth W wrote:
> Con Kolivas wrote on Sunday, May 14, 2006 9:03 AM
>
> > There would be no difference if the priority boost is done lower. The if
> > and else blocks both end up equating to the same amount of priority
> > boost, with the former having a ceiling on it, so yes it is the intent.
> > You'll see that the amount of sleep required to jump from lowest priority
> > to MAX_SLEEP_AVG - DEF_TIMESLICE is INTERACTIVE_SLEEP.
>
> I don't think the if and the else block is doing the same thing. In the if
> block, the p->sleep_avg is unconditionally boosted to ceiling for all
> tasks, though it will not reduce sleep_avg for tasks that already exceed
> the ceiling. Bumping up sleep_avg will then translate into priority boost
> of MAX_BONUS-1, which potentially can be too high.

Yes it's only designed to detect something that has been asleep for an 
arbitrary long time and "categorised as idle"; it is not supposed to be a 
priority stepping stone for everything, in this case at MAX_BONUS-1. Mike 
proposed doing this instead, but it was never my intent. Your comment is not 
quite correct as it just happens to be MAX_BONUS-1 at nice 0, and not any 
other nice value.

> But that's fine if it is the intent. At minimum, the comment in the source
> code should say so instead of fooling people who don't actually read the
> code.

Feel free to update it to how you understand it now :) I have this feeling 
we'll be seeing quite some action here soon...

> [patch] sched: update comments in priority calculation w.r.t.
> implementation.

-- 
-ck

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-15 23:45         ` Con Kolivas
@ 2006-05-16  1:22           ` Chen, Kenneth W
  2006-05-16  1:44             ` Con Kolivas
  2006-05-16  4:10           ` Mike Galbraith
  2006-05-16 23:32           ` Tim Chen
  2 siblings, 1 reply; 42+ messages in thread
From: Chen, Kenneth W @ 2006-05-16  1:22 UTC (permalink / raw)
  To: 'Con Kolivas'; +Cc: tim.c.chen, linux-kernel, mingo, Andrew Morton

Con Kolivas wrote on Monday, May 15, 2006 4:45 PM
> On Tuesday 16 May 2006 05:01, Chen, Kenneth W wrote:
> > I don't think the if and the else block is doing the same thing. In the if
> > block, the p->sleep_avg is unconditionally boosted to ceiling for all
> > tasks, though it will not reduce sleep_avg for tasks that already exceed
> > the ceiling. Bumping up sleep_avg will then translate into priority boost
> > of MAX_BONUS-1, which potentially can be too high.
> 
> Yes it's only designed to detect something that has been asleep for an 
> arbitrary long time and "categorised as idle"; it is not supposed to be a 
> priority stepping stone for everything, in this case at MAX_BONUS-1. Mike 
> proposed doing this instead, but it was never my intent. Your comment is not 
> quite correct as it just happens to be MAX_BONUS-1 at nice 0, and not any 
> other nice value.

Huh??

sleep_avg is set at constant:
p->sleep_avg = JIFFIES_TO_NS(MAX_SLEEP_AVG - DEF_TIMESLICE);


The bonus calculation is:

#define CURRENT_BONUS(p) \
        (NS_TO_JIFFIES((p)->sleep_avg) * MAX_BONUS / MAX_SLEEP_AVG)

bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;

None of the calculation that I see uses nice value.  Did I miss something?

- Ken

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-16  1:22           ` Chen, Kenneth W
@ 2006-05-16  1:44             ` Con Kolivas
  0 siblings, 0 replies; 42+ messages in thread
From: Con Kolivas @ 2006-05-16  1:44 UTC (permalink / raw)
  To: Chen, Kenneth W; +Cc: tim.c.chen, linux-kernel, mingo, Andrew Morton

On Tuesday 16 May 2006 11:22, Chen, Kenneth W wrote:
> Con Kolivas wrote on Monday, May 15, 2006 4:45 PM
>
> > On Tuesday 16 May 2006 05:01, Chen, Kenneth W wrote:
> > > I don't think the if and the else block is doing the same thing. In the
> > > if block, the p->sleep_avg is unconditionally boosted to ceiling for
> > > all tasks, though it will not reduce sleep_avg for tasks that already
> > > exceed the ceiling. Bumping up sleep_avg will then translate into
> > > priority boost of MAX_BONUS-1, which potentially can be too high.
> >
> > Yes it's only designed to detect something that has been asleep for an
> > arbitrary long time and "categorised as idle"; it is not supposed to be a
> > priority stepping stone for everything, in this case at MAX_BONUS-1. Mike
> > proposed doing this instead, but it was never my intent. Your comment is
> > not quite correct as it just happens to be MAX_BONUS-1 at nice 0, and not
> > any other nice value.
>
> Huh??
>
> sleep_avg is set at constant:
> p->sleep_avg = JIFFIES_TO_NS(MAX_SLEEP_AVG - DEF_TIMESLICE);
>
>
> The bonus calculation is:
>
> #define CURRENT_BONUS(p) \
>         (NS_TO_JIFFIES((p)->sleep_avg) * MAX_BONUS / MAX_SLEEP_AVG)
>
> bonus = CURRENT_BONUS(p) - MAX_BONUS / 2;
>
> None of the calculation that I see uses nice value.  Did I miss something?

My bad. You're correct, sorry.

-- 
-ck

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-15 23:45         ` Con Kolivas
  2006-05-16  1:22           ` Chen, Kenneth W
@ 2006-05-16  4:10           ` Mike Galbraith
  2006-05-16 23:32           ` Tim Chen
  2 siblings, 0 replies; 42+ messages in thread
From: Mike Galbraith @ 2006-05-16  4:10 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Chen, Kenneth W, tim.c.chen, linux-kernel, mingo, Andrew Morton

On Tue, 2006-05-16 at 09:45 +1000, Con Kolivas wrote:
> Feel free to update it to how you understand it now :) I have this feeling 
> we'll be seeing quite some action here soon...

Really?  Mind telling us why?

	-Mike


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-15 23:45         ` Con Kolivas
  2006-05-16  1:22           ` Chen, Kenneth W
  2006-05-16  4:10           ` Mike Galbraith
@ 2006-05-16 23:32           ` Tim Chen
  2006-05-17  4:25             ` Mike Galbraith
  2006-05-17  8:23             ` Con Kolivas
  2 siblings, 2 replies; 42+ messages in thread
From: Tim Chen @ 2006-05-16 23:32 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Chen, Kenneth W, linux-kernel, mingo, Andrew Morton,
	Mike Galbraith

On Tue, 2006-05-16 at 09:45 +1000, Con Kolivas wrote:

> 
> Yes it's only designed to detect something that has been asleep for an 
> arbitrary long time and "categorised as idle"; it is not supposed to be a 
> priority stepping stone for everything, in this case at MAX_BONUS-1. Mike 
> proposed doing this instead, but it was never my intent. 

It seems like just one sleep longer than INTERACTIVE_SLEEP is needed
kick the priority of a process all the way to MAX_BONUS-1 and boost the
sleep_avg, regardless of what the prior sleep_avg was.

So if there is a cpu hog that has long sleeps occasionally, once it woke
up, its priority will get boosted close to maximum, likely starving out
other processes for a while till its sleep_avg gets reduced.  This
behavior seems like something to avoid according to the original code
comment.  Are we boosting the priority too quickly?  

Tim

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-16 23:32           ` Tim Chen
@ 2006-05-17  4:25             ` Mike Galbraith
  2006-05-17  4:45               ` Peter Williams
  2006-05-17  8:23             ` Con Kolivas
  1 sibling, 1 reply; 42+ messages in thread
From: Mike Galbraith @ 2006-05-17  4:25 UTC (permalink / raw)
  To: tim.c.chen
  Cc: Con Kolivas, Chen, Kenneth W, linux-kernel, mingo, Andrew Morton

On Tue, 2006-05-16 at 16:32 -0700, Tim Chen wrote:
> On Tue, 2006-05-16 at 09:45 +1000, Con Kolivas wrote:
> 
> > 
> > Yes it's only designed to detect something that has been asleep for an 
> > arbitrary long time and "categorised as idle"; it is not supposed to be a 
> > priority stepping stone for everything, in this case at MAX_BONUS-1. Mike 
> > proposed doing this instead, but it was never my intent. 
> 
> It seems like just one sleep longer than INTERACTIVE_SLEEP is needed
> kick the priority of a process all the way to MAX_BONUS-1 and boost the
> sleep_avg, regardless of what the prior sleep_avg was.
> 
> So if there is a cpu hog that has long sleeps occasionally, once it woke
> up, its priority will get boosted close to maximum, likely starving out
> other processes for a while till its sleep_avg gets reduced.  This
> behavior seems like something to avoid according to the original code
> comment.  Are we boosting the priority too quickly?  

The answer to that is, sometimes yes, and when it bites, it bites hard.
Happily, most hogs don't sleep much, and we don't generally have lots of
bursty sleepers.

	-Mike


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-17  4:25             ` Mike Galbraith
@ 2006-05-17  4:45               ` Peter Williams
  2006-05-17  5:24                 ` Mike Galbraith
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Williams @ 2006-05-17  4:45 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: tim.c.chen, Con Kolivas, Chen, Kenneth W, linux-kernel, mingo,
	Andrew Morton

Mike Galbraith wrote:
> On Tue, 2006-05-16 at 16:32 -0700, Tim Chen wrote:
>> On Tue, 2006-05-16 at 09:45 +1000, Con Kolivas wrote:
>>
>>> Yes it's only designed to detect something that has been asleep for an 
>>> arbitrary long time and "categorised as idle"; it is not supposed to be a 
>>> priority stepping stone for everything, in this case at MAX_BONUS-1. Mike 
>>> proposed doing this instead, but it was never my intent. 
>> It seems like just one sleep longer than INTERACTIVE_SLEEP is needed
>> kick the priority of a process all the way to MAX_BONUS-1 and boost the
>> sleep_avg, regardless of what the prior sleep_avg was.
>>
>> So if there is a cpu hog that has long sleeps occasionally, once it woke
>> up, its priority will get boosted close to maximum, likely starving out
>> other processes for a while till its sleep_avg gets reduced.  This
>> behavior seems like something to avoid according to the original code
>> comment.  Are we boosting the priority too quickly?  
> 
> The answer to that is, sometimes yes, and when it bites, it bites hard.
> Happily, most hogs don't sleep much, and we don't generally have lots of
> bursty sleepers.
> 

But it's easy for a malicious user to exploit.  Yes?

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-17  4:45               ` Peter Williams
@ 2006-05-17  5:24                 ` Mike Galbraith
  0 siblings, 0 replies; 42+ messages in thread
From: Mike Galbraith @ 2006-05-17  5:24 UTC (permalink / raw)
  To: Peter Williams
  Cc: tim.c.chen, Con Kolivas, Chen, Kenneth W, linux-kernel, mingo,
	Andrew Morton

On Wed, 2006-05-17 at 14:45 +1000, Peter Williams wrote:
> Mike Galbraith wrote:
> > On Tue, 2006-05-16 at 16:32 -0700, Tim Chen wrote:
> >> On Tue, 2006-05-16 at 09:45 +1000, Con Kolivas wrote:
> >>
> >>> Yes it's only designed to detect something that has been asleep for an 
> >>> arbitrary long time and "categorised as idle"; it is not supposed to be a 
> >>> priority stepping stone for everything, in this case at MAX_BONUS-1. Mike 
> >>> proposed doing this instead, but it was never my intent. 
> >> It seems like just one sleep longer than INTERACTIVE_SLEEP is needed
> >> kick the priority of a process all the way to MAX_BONUS-1 and boost the
> >> sleep_avg, regardless of what the prior sleep_avg was.
> >>
> >> So if there is a cpu hog that has long sleeps occasionally, once it woke
> >> up, its priority will get boosted close to maximum, likely starving out
> >> other processes for a while till its sleep_avg gets reduced.  This
> >> behavior seems like something to avoid according to the original code
> >> comment.  Are we boosting the priority too quickly?  
> > 
> > The answer to that is, sometimes yes, and when it bites, it bites hard.
> > Happily, most hogs don't sleep much, and we don't generally have lots of
> > bursty sleepers.
> > 
> 
> But it's easy for a malicious user to exploit.  Yes?

Without limits, sure.  Burn malicious idiot's library card :)  The real
pain begins when you start examining legitimate loads.  It _can_ get
really and truly fugly.  Generally doesn't.

	-Mike


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-16 23:32           ` Tim Chen
  2006-05-17  4:25             ` Mike Galbraith
@ 2006-05-17  8:23             ` Con Kolivas
  2006-05-17  9:49               ` Mike Galbraith
  2006-05-17 19:33               ` Chen, Kenneth W
  1 sibling, 2 replies; 42+ messages in thread
From: Con Kolivas @ 2006-05-17  8:23 UTC (permalink / raw)
  To: tim.c.chen
  Cc: Chen, Kenneth W, linux-kernel, mingo, Andrew Morton,
	Mike Galbraith

On Wednesday 17 May 2006 09:32, Tim Chen wrote:
> On Tue, 2006-05-16 at 09:45 +1000, Con Kolivas wrote:
> > Yes it's only designed to detect something that has been asleep for an
> > arbitrary long time and "categorised as idle"; it is not supposed to be a
> > priority stepping stone for everything, in this case at MAX_BONUS-1. Mike
> > proposed doing this instead, but it was never my intent.
>
> It seems like just one sleep longer than INTERACTIVE_SLEEP is needed
> kick the priority of a process all the way to MAX_BONUS-1 and boost the
> sleep_avg, regardless of what the prior sleep_avg was.
>
> So if there is a cpu hog that has long sleeps occasionally, once it woke
> up, its priority will get boosted close to maximum, likely starving out
> other processes for a while till its sleep_avg gets reduced.  This
> behavior seems like something to avoid according to the original code
> comment.  Are we boosting the priority too quickly?

Two things strike me here. I'll explain them in the patch below.

How's this look?
---
The relationship between INTERACTIVE_SLEEP and the ceiling is not perfect
and not explicit enough. The sleep boost is not supposed to be any larger
than without this code and the comment is not clear enough about what exactly
it does, just the reason it does it.

There is a ceiling to the priority beyond which tasks that only ever sleep
for very long periods cannot surpass.

Opportunity to micro-optimise and re-use the ceiling variable.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 kernel/sched.c |   28 +++++++++++-----------------
 1 files changed, 11 insertions(+), 17 deletions(-)

Index: linux-2.6.17-rc4-mm1/kernel/sched.c
===================================================================
--- linux-2.6.17-rc4-mm1.orig/kernel/sched.c	2006-05-17 15:57:49.000000000 +1000
+++ linux-2.6.17-rc4-mm1/kernel/sched.c	2006-05-17 18:19:29.000000000 +1000
@@ -904,20 +904,14 @@ static int recalc_task_prio(task_t *p, u
 	}
 
 	if (likely(sleep_time > 0)) {
-		/*
-		 * User tasks that sleep a long time are categorised as
-		 * idle. They will only have their sleep_avg increased to a
-		 * level that makes them just interactive priority to stay
-		 * active yet prevent them suddenly becoming cpu hogs and
-		 * starving other processes.
-		 */
-		if (p->mm && sleep_time > INTERACTIVE_SLEEP(p)) {
-				unsigned long ceiling;
+		unsigned long ceiling = INTERACTIVE_SLEEP(p);
 
-				ceiling = JIFFIES_TO_NS(MAX_SLEEP_AVG -
-					DEF_TIMESLICE);
-				if (p->sleep_avg < ceiling)
-					p->sleep_avg = ceiling;
+		if (p->mm && sleep_time > ceiling && p->sleep_avg < ceiling) {
+			/*
+			 * Prevents user tasks from achieving best priority
+			 * with one single large enough sleep.
+			 */
+			p->sleep_avg = ceiling;
 		} else {
 			/*
 			 * Tasks waking from uninterruptible sleep are
@@ -925,12 +919,12 @@ static int recalc_task_prio(task_t *p, u
 			 * are likely to be waiting on I/O
 			 */
 			if (p->sleep_type == SLEEP_NONINTERACTIVE && p->mm) {
-				if (p->sleep_avg >= INTERACTIVE_SLEEP(p))
+				if (p->sleep_avg >= ceiling)
 					sleep_time = 0;
 				else if (p->sleep_avg + sleep_time >=
-						INTERACTIVE_SLEEP(p)) {
-					p->sleep_avg = INTERACTIVE_SLEEP(p);
-					sleep_time = 0;
+					 ceiling) {
+						p->sleep_avg = ceiling;
+						sleep_time = 0;
 				}
 			}
 
-- 
-ck

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-17  8:23             ` Con Kolivas
@ 2006-05-17  9:49               ` Mike Galbraith
  2006-05-17 10:25                 ` Con Kolivas
  2006-05-17 19:33               ` Chen, Kenneth W
  1 sibling, 1 reply; 42+ messages in thread
From: Mike Galbraith @ 2006-05-17  9:49 UTC (permalink / raw)
  To: Con Kolivas
  Cc: tim.c.chen, Chen, Kenneth W, linux-kernel, mingo, Andrew Morton

On Wed, 2006-05-17 at 18:23 +1000, Con Kolivas wrote:

> There is a ceiling to the priority beyond which tasks that only ever sleep
> for very long periods cannot surpass.

(Hmm.  The intent is more clear, ie reserve the top for low latency
tasks,... but that sounds a bit like xmms protection.) 

The main problem I see with this ceiling, solely from the interactivity
viewpoint, is that interactive tasks which have started burning cpu
and/or freshly forked interactive tasks land in the same spot.  Thud.c
demonstrates this problem quite well.  You don't want a few copies of
thud in the same queue with your interactive task, much less above it if
it's used enough cpu to drop a notch or two.  Much pain ensues.

	-Mike

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-17  9:49               ` Mike Galbraith
@ 2006-05-17 10:25                 ` Con Kolivas
  2006-05-17 11:42                   ` Mike Galbraith
  0 siblings, 1 reply; 42+ messages in thread
From: Con Kolivas @ 2006-05-17 10:25 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: tim.c.chen, Chen, Kenneth W, linux-kernel, mingo, Andrew Morton

On Wednesday 17 May 2006 19:49, Mike Galbraith wrote:
> On Wed, 2006-05-17 at 18:23 +1000, Con Kolivas wrote:
> > There is a ceiling to the priority beyond which tasks that only ever
> > sleep for very long periods cannot surpass.
>
> (Hmm.  The intent is more clear, ie reserve the top for low latency
> tasks,... but that sounds a bit like xmms protection.)
>
> The main problem I see with this ceiling, solely from the interactivity
> viewpoint, is that interactive tasks which have started burning cpu
> and/or freshly forked interactive tasks land in the same spot.  Thud.c
> demonstrates this problem quite well.  You don't want a few copies of
> thud in the same queue with your interactive task, much less above it if
> it's used enough cpu to drop a notch or two.  Much pain ensues.

If there are thud processes that have earned their place they deserve it, just 
as any other task. If something uses only 1% cpu it is unfair just because it 
only ever sleeps for long sleeps to prevent it getting as good priority as 
anything else that uses only 1% cpu. I've noticed that 'top' suffers this 
fate for example. The problem I've had all along with thud as a test case is 
that while it causes a pause on the system for potentially a few seconds, it 
does this by raising the load transiently to a load of 50 (or whatever you 
set it to). I have no problem with a system having a hiccup when the load is 
50, provided the system eventually recovers and isn't starved forever (which 
it isn't). There are other means to prevent one user having that many running 
tasks if so desired.

-- 
-ck

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-17 10:25                 ` Con Kolivas
@ 2006-05-17 11:42                   ` Mike Galbraith
  2006-05-17 12:46                     ` Con Kolivas
  0 siblings, 1 reply; 42+ messages in thread
From: Mike Galbraith @ 2006-05-17 11:42 UTC (permalink / raw)
  To: Con Kolivas
  Cc: tim.c.chen, Chen, Kenneth W, linux-kernel, mingo, Andrew Morton

On Wed, 2006-05-17 at 20:25 +1000, Con Kolivas wrote:
> On Wednesday 17 May 2006 19:49, Mike Galbraith wrote:
> > On Wed, 2006-05-17 at 18:23 +1000, Con Kolivas wrote:
> > > There is a ceiling to the priority beyond which tasks that only ever
> > > sleep for very long periods cannot surpass.
> >
> > (Hmm.  The intent is more clear, ie reserve the top for low latency
> > tasks,... but that sounds a bit like xmms protection.)
> >
> > The main problem I see with this ceiling, solely from the interactivity
> > viewpoint, is that interactive tasks which have started burning cpu
> > and/or freshly forked interactive tasks land in the same spot.  Thud.c
> > demonstrates this problem quite well.  You don't want a few copies of
> > thud in the same queue with your interactive task, much less above it if
> > it's used enough cpu to drop a notch or two.  Much pain ensues.
> 
> If there are thud processes that have earned their place they deserve it, just 
> as any other task. If something uses only 1% cpu it is unfair just because it 

Fair?  I said interactivity wise.  But what the heck, if we're talking
fairness, I can say the same thing about I/O bound tasks.  Heck, it's
not fair to stop any task from reaching the top, and it's certainly not
fair to let them have (for all practical purposes) all the cpu they want
once they sleep enough.  Shoot, the scheduler is unfair even without any
interactivity code.  Long term, it splits tasks into two groups... those
that sleep for more than 50% of the time... yack yack yack... zzzzz

Let's stick to the interactivity side :)

> only ever sleeps for long sleeps to prevent it getting as good priority as 
> anything else that uses only 1% cpu. I've noticed that 'top' suffers this 
> fate for example. The problem I've had all along with thud as a test case is 
> that while it causes a pause on the system for potentially a few seconds, it 
> does this by raising the load transiently to a load of 50 (or whatever you 
> set it to). I have no problem with a system having a hiccup when the load is 
> 50, provided the system eventually recovers and isn't starved forever (which 
> it isn't). There are other means to prevent one user having that many running 
> tasks if so desired.

Three of the little buggers are enough to cause plenty of pain.

	-Mike


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-17 11:42                   ` Mike Galbraith
@ 2006-05-17 12:46                     ` Con Kolivas
  2006-05-17 13:41                       ` Mike Galbraith
  0 siblings, 1 reply; 42+ messages in thread
From: Con Kolivas @ 2006-05-17 12:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mike Galbraith, tim.c.chen, Chen, Kenneth W, mingo, Andrew Morton

On Wednesday 17 May 2006 21:42, Mike Galbraith wrote:
> Fair?  I said interactivity wise.  But what the heck, if we're talking
> fairness, I can say the same thing about I/O bound tasks.  Heck, it's
> not fair to stop any task from reaching the top, and it's certainly not
> fair to let them have (for all practical purposes) all the cpu they want
> once they sleep enough.

Toss out the I/O bound thing and we turn into a steaming dripping pile of dog 
doo whenever anything does disk I/O. And damned if there aren't a lot of pcs 
that have hard disks... 

> Shoot, the scheduler is unfair even without any 
> interactivity code.  Long term, it splits tasks into two groups... those
> that sleep for more than 50% of the time... yack yack yack... zzzzz
>
> Let's stick to the interactivity side :)

It's a deal.

> > only ever sleeps for long sleeps to prevent it getting as good priority
> > as anything else that uses only 1% cpu. I've noticed that 'top' suffers
> > this fate for example. The problem I've had all along with thud as a test
> > case is that while it causes a pause on the system for potentially a few
> > seconds, it does this by raising the load transiently to a load of 50 (or
> > whatever you set it to). I have no problem with a system having a hiccup
> > when the load is 50, provided the system eventually recovers and isn't
> > starved forever (which it isn't). There are other means to prevent one
> > user having that many running tasks if so desired.
>
> Three of the little buggers are enough to cause plenty of pain.

Spits and stutters are not starvation. Luckily it gets no worse with this 
patch.

-- 
-ck

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-17 12:46                     ` Con Kolivas
@ 2006-05-17 13:41                       ` Mike Galbraith
  2006-05-17 15:10                         ` Con Kolivas
  0 siblings, 1 reply; 42+ messages in thread
From: Mike Galbraith @ 2006-05-17 13:41 UTC (permalink / raw)
  To: Con Kolivas
  Cc: linux-kernel, tim.c.chen, Chen, Kenneth W, mingo, Andrew Morton

On Wed, 2006-05-17 at 22:46 +1000, Con Kolivas wrote:
> On Wednesday 17 May 2006 21:42, Mike Galbraith wrote:
> > Fair?  I said interactivity wise.  But what the heck, if we're talking
> > fairness, I can say the same thing about I/O bound tasks.  Heck, it's
> > not fair to stop any task from reaching the top, and it's certainly not
> > fair to let them have (for all practical purposes) all the cpu they want
> > once they sleep enough.
> 
> Toss out the I/O bound thing and we turn into a steaming dripping pile of dog 
> doo whenever anything does disk I/O. And damned if there aren't a lot of pcs 
> that have hard disks...

(you should have tried my patch set)

> Spits and stutters are not starvation. Luckily it gets no worse with this 
> patch.

Ok, I'll accept that.  Spits and stutters _are_ interactivity issues
though yes?.  Knowing full well that plunking long sleepers into the
queue you are plunking them into causes spits and stutters, why do you
insist on doing so?

Oh well, we're well on the way to agreeing to disagree again, so let's
just get it over with.  I hereby agree to disagree.

	Cheers,

	-Mike


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-17 13:41                       ` Mike Galbraith
@ 2006-05-17 15:10                         ` Con Kolivas
  2006-05-17 17:21                           ` Ray Lee
  0 siblings, 1 reply; 42+ messages in thread
From: Con Kolivas @ 2006-05-17 15:10 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: linux-kernel, tim.c.chen, Chen, Kenneth W, mingo, Andrew Morton

On Wednesday 17 May 2006 23:41, Mike Galbraith wrote:
> On Wed, 2006-05-17 at 22:46 +1000, Con Kolivas wrote:
> > On Wednesday 17 May 2006 21:42, Mike Galbraith wrote:
> > > Fair?  I said interactivity wise.  But what the heck, if we're talking
> > > fairness, I can say the same thing about I/O bound tasks.  Heck, it's
> > > not fair to stop any task from reaching the top, and it's certainly not
> > > fair to let them have (for all practical purposes) all the cpu they
> > > want once they sleep enough.
> >
> > Toss out the I/O bound thing and we turn into a steaming dripping pile of
> > dog doo whenever anything does disk I/O. And damned if there aren't a lot
> > of pcs that have hard disks...
>
> (you should have tried my patch set)

Last one of yours I tried suffered this.

> > Spits and stutters are not starvation. Luckily it gets no worse with this
> > patch.
>
> Ok, I'll accept that.  Spits and stutters _are_ interactivity issues
> though yes?.  Knowing full well that plunking long sleepers into the
> queue you are plunking them into causes spits and stutters, why do you
> insist on doing so?

Because I know of no real world workload that thuds us into spits and 
stutters.

> Oh well, we're well on the way to agreeing to disagree again, so let's
> just get it over with.  I hereby agree to disagree.

Indeed.

-- 
-ck

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-17 15:10                         ` Con Kolivas
@ 2006-05-17 17:21                           ` Ray Lee
  0 siblings, 0 replies; 42+ messages in thread
From: Ray Lee @ 2006-05-17 17:21 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Mike Galbraith, linux-kernel, tim.c.chen, Chen, Kenneth W, mingo,
	Andrew Morton

On 5/17/06, Con Kolivas <kernel@kolivas.org> wrote:
> > Ok, I'll accept that.  Spits and stutters _are_ interactivity issues
> > though yes?.  Knowing full well that plunking long sleepers into the
> > queue you are plunking them into causes spits and stutters, why do you
> > insist on doing so?
>
> Because I know of no real world workload that thuds us into spits and
> stutters.

`apt-get dist-upgrade` seems to do wonders for making a system
thoroughly unusable. Possibly because it forks a copy of perl and a
few other tasks, all competing for CPU and disk I/O bandwidth -- at
least, competing when they're not sleeping most of the time. Y'all
might add a chroot'd install of Debian as a test case.

Ray

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-17  8:23             ` Con Kolivas
  2006-05-17  9:49               ` Mike Galbraith
@ 2006-05-17 19:33               ` Chen, Kenneth W
  2006-05-18  0:35                 ` Con Kolivas
  1 sibling, 1 reply; 42+ messages in thread
From: Chen, Kenneth W @ 2006-05-17 19:33 UTC (permalink / raw)
  To: 'Con Kolivas', tim.c.chen
  Cc: linux-kernel, mingo, Andrew Morton, Mike Galbraith

Con Kolivas wrote on Wednesday, May 17, 2006 1:23 AM
> On Wednesday 17 May 2006 09:32, Tim Chen wrote:
> > It seems like just one sleep longer than INTERACTIVE_SLEEP is needed
> > kick the priority of a process all the way to MAX_BONUS-1 and boost the
> > sleep_avg, regardless of what the prior sleep_avg was.
> >
> > So if there is a cpu hog that has long sleeps occasionally, once it woke
> > up, its priority will get boosted close to maximum, likely starving out
> > other processes for a while till its sleep_avg gets reduced.  This
> > behavior seems like something to avoid according to the original code
> > comment.  Are we boosting the priority too quickly?
> 
> Two things strike me here. I'll explain them in the patch below.
> 
> How's this look?
> ---
> The relationship between INTERACTIVE_SLEEP and the ceiling is not perfect
> and not explicit enough. The sleep boost is not supposed to be any larger
> than without this code and the comment is not clear enough about what exactly
> it does, just the reason it does it.
> 
> There is a ceiling to the priority beyond which tasks that only ever sleep
> for very long periods cannot surpass.


It looks bad.  I don't like it. The priority boost is even more peculiar
in this patch.


> --- linux-2.6.17-rc4-mm1.orig/kernel/sched.c	2006-05-17 15:57:49.000000000 +1000
> +++ linux-2.6.17-rc4-mm1/kernel/sched.c	2006-05-17 18:19:29.000000000 +1000
> @@ -904,20 +904,14 @@ static int recalc_task_prio(task_t *p, u
>  	}
>  
>  	if (likely(sleep_time > 0)) {
> -		/*
> -		 * User tasks that sleep a long time are categorised as
> -		 * idle. They will only have their sleep_avg increased to a
> -		 * level that makes them just interactive priority to stay
> -		 * active yet prevent them suddenly becoming cpu hogs and
> -		 * starving other processes.
> -		 */
> -		if (p->mm && sleep_time > INTERACTIVE_SLEEP(p)) {
> -				unsigned long ceiling;
> +		unsigned long ceiling = INTERACTIVE_SLEEP(p);
>  
> -				ceiling = JIFFIES_TO_NS(MAX_SLEEP_AVG -
> -					DEF_TIMESLICE);
> -				if (p->sleep_avg < ceiling)
> -					p->sleep_avg = ceiling;
> +		if (p->mm && sleep_time > ceiling && p->sleep_avg < ceiling) {
> +			/*
> +			 * Prevents user tasks from achieving best priority
> +			 * with one single large enough sleep.
> +			 */
> +			p->sleep_avg = ceiling;


The assignment of p->sleep_avg = ceiling doesn't make much logical sense.
Because INTERACTIVE_SLEEP is scaled proportionally with nice value, e.g.
the lower the nice value, the lower the interactive_sleep.  However, priority
calculation is inverse of p->sleep_avg, e.g. the smaller the sleep_avg, the
smaller the bonus, thus the higher dynamic priority.

Take one concrete example: for a prolonged sleep, say 1 second, nice(-10)
will have a priority boost of 4 while nice(0) will have a priority boost of
9. The ceiling algorithm looks like is reversed. I would think kernel should
at least enforce same ceiling value independent of nice value.

- Ken

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-17 19:33               ` Chen, Kenneth W
@ 2006-05-18  0:35                 ` Con Kolivas
  2006-05-18  1:10                   ` Chen, Kenneth W
  0 siblings, 1 reply; 42+ messages in thread
From: Con Kolivas @ 2006-05-18  0:35 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: tim.c.chen, linux-kernel, mingo, Andrew Morton, Mike Galbraith

On Thursday 18 May 2006 05:33, Chen, Kenneth W wrote:
> The assignment of p->sleep_avg = ceiling doesn't make much logical sense.
> Because INTERACTIVE_SLEEP is scaled proportionally with nice value, e.g.
> the lower the nice value, the lower the interactive_sleep.  However,
> priority calculation is inverse of p->sleep_avg, e.g. the smaller the
> sleep_avg, the smaller the bonus, thus the higher dynamic priority.
>
> Take one concrete example: for a prolonged sleep, say 1 second, nice(-10)
> will have a priority boost of 4 while nice(0) will have a priority boost of
> 9. The ceiling algorithm looks like is reversed. I would think kernel
> should at least enforce same ceiling value independent of nice value.

I see why you don't like it. However I still don't think you understand why 
the ceiling of that magnitude is there. Tasks behave very differently 
depending on whether their priority is low enough that they expire at the end 
of every timeslice and get put on the expired array, or their priority is 
high enough that they get reinserted into the active array. It's an intrinsic 
quirk in the "interactive" design of the scheduler that basically means we 
have two classes of task - interactive enough to be reinserted into the 
active array or not. As the comment already says the ceiling is there to 
prevent one sleep from getting them to best priority. I don't want them to 
get high priority with one large enough sleep, but I do want them to get into 
the reinsertion class which behaves entirely differently. What is missing 
from the comment is to say that it is also designed to stop them at the 
lowest possible priority that still keeps them in the interactive reinsertion 
class. Using a constant ceiling value irrespective of nice will not guarantee 
that tasks fall into the active reinsertion class dependant on their nice 
value.

-- 
-ck

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-18  0:35                 ` Con Kolivas
@ 2006-05-18  1:10                   ` Chen, Kenneth W
  2006-05-18  1:38                     ` Con Kolivas
  0 siblings, 1 reply; 42+ messages in thread
From: Chen, Kenneth W @ 2006-05-18  1:10 UTC (permalink / raw)
  To: 'Con Kolivas'
  Cc: tim.c.chen, linux-kernel, mingo, Andrew Morton, Mike Galbraith

Con Kolivas wrote on Wednesday, May 17, 2006 5:35 PM
> What is missing 
> from the comment is to say that it is also designed to stop them at the 
> lowest possible priority that still keeps them in the interactive reinsertion 
> class.

> Using a constant ceiling value irrespective of nice will not guarantee 
> that tasks fall into the active reinsertion class dependant on their nice 
> value.

If I may ask, how does it work right now?  Ceiling is set at constant value
irrespective to nice value.  Are you saying current code is broken as well?

        ceiling = JIFFIES_TO_NS(MAX_SLEEP_AVG -
                  DEF_TIMESLICE);
        if (p->sleep_avg < ceiling)
                  p->sleep_avg = ceiling;

We maybe also misunderstand each other.  I'm not arguing of removing the
ceiling. Having a ceiling is the right thing to do here.  What I don't like
is that 2.6.17-rc4 has the ceiling set too high, and your patch also does
an inversion of the ceiling value w.r.t nice value.  So it's the detail of
what's the right value for priority boost that I'm uncomfortable with.

- Ken

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-18  1:10                   ` Chen, Kenneth W
@ 2006-05-18  1:38                     ` Con Kolivas
  2006-05-18  5:44                       ` Mike Galbraith
  0 siblings, 1 reply; 42+ messages in thread
From: Con Kolivas @ 2006-05-18  1:38 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: tim.c.chen, linux-kernel, mingo, Andrew Morton, Mike Galbraith

On Thursday 18 May 2006 11:10, Chen, Kenneth W wrote:
> Con Kolivas wrote on Wednesday, May 17, 2006 5:35 PM
>
> > What is missing
> > from the comment is to say that it is also designed to stop them at the
> > lowest possible priority that still keeps them in the interactive
> > reinsertion class.
> >
> > Using a constant ceiling value irrespective of nice will not guarantee
> > that tasks fall into the active reinsertion class dependant on their nice
> > value.
>
> If I may ask, how does it work right now?  Ceiling is set at constant value
> irrespective to nice value.  Are you saying current code is broken as well?

In essence, yes. The approximation was too general and vague as you have 
pointed out, and was only close to the correct ceiling or knee for one nice 
value.  I just want to formalise the relationship between the ceiling, nice 
value and INTERACTIVE_SLEEP and make the comment clear enough to be 
understood.

>         ceiling = JIFFIES_TO_NS(MAX_SLEEP_AVG -
>                   DEF_TIMESLICE);
>         if (p->sleep_avg < ceiling)
>                   p->sleep_avg = ceiling;
>
>
> We maybe also misunderstand each other.  I'm not arguing of removing the
> ceiling. Having a ceiling is the right thing to do here.  What I don't like
> is that 2.6.17-rc4 has the ceiling set too high, and your patch also does
> an inversion of the ceiling value w.r.t nice value.  So it's the detail of
> what's the right value for priority boost that I'm uncomfortable with.

We're in fuzzy land where there are no absolutes I'm afraid. The common case 
is obviously nice 0 and anything else is simply respecting the relationship 
between nice and INTERACTIVE_SLEEP. As timeslices get proportionately larger 
the lower the nice value, it becomes increasingly easy to avoid getting to 
the end of a full timeslice thereby avoiding expiration anyway. It also 
doesn't take much sleep from a task to get to best priority from the priority 
knee/ceiling no matter how low it is. Finally the relationship between likely 
preemption with varying nice levels is sort of vaguely maintained. On balance 
it seems satisfactory to me to maintain this relationship.

-- 
-ck

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-18  1:38                     ` Con Kolivas
@ 2006-05-18  5:44                       ` Mike Galbraith
  2006-05-18  5:52                         ` Con Kolivas
  0 siblings, 1 reply; 42+ messages in thread
From: Mike Galbraith @ 2006-05-18  5:44 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Chen, Kenneth W, tim.c.chen, linux-kernel, mingo, Andrew Morton

On Thu, 2006-05-18 at 11:38 +1000, Con Kolivas wrote:
> I just want to formalise the relationship between the ceiling, nice 
> value and INTERACTIVE_SLEEP and make the comment clear enough to be 
> understood.

Oh yeah, that reminded me...

INTERACTIVE_SLEEP(p) for nice(>=16) tasks is > NS_MAX_SLEEP_AVG.
CURRENT_BONUS(p) if it took the long sleep path can end up being 11,
which will lead to Ka-[fword]-BOOM in scheduler_tick() for an SMP box.
See TIMESLICE_GRANULARITY(p).  (btdt;) 

	-Mike


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-18  5:44                       ` Mike Galbraith
@ 2006-05-18  5:52                         ` Con Kolivas
  2006-05-18  7:04                           ` Mike Galbraith
                                             ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Con Kolivas @ 2006-05-18  5:52 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Chen, Kenneth W, tim.c.chen, linux-kernel, mingo, Andrew Morton

On Thursday 18 May 2006 15:44, Mike Galbraith wrote:
> On Thu, 2006-05-18 at 11:38 +1000, Con Kolivas wrote:
> > I just want to formalise the relationship between the ceiling, nice
> > value and INTERACTIVE_SLEEP and make the comment clear enough to be
> > understood.
>
> Oh yeah, that reminded me...
>
> INTERACTIVE_SLEEP(p) for nice(>=16) tasks is > NS_MAX_SLEEP_AVG.
> CURRENT_BONUS(p) if it took the long sleep path can end up being 11,
> which will lead to Ka-[fword]-BOOM in scheduler_tick() for an SMP box.
> See TIMESLICE_GRANULARITY(p).  (btdt;)

Thanks. This updated one fixes that and clears up the remaining mystery of
why the ceiling is the size it is in the comments.

---
The relationship between INTERACTIVE_SLEEP and the ceiling is not perfect
and not explicit enough. The sleep boost is not supposed to be any larger
than without this code and the comment is not clear enough about what exactly
it does, just the reason it does it.

There is a ceiling to the priority beyond which tasks that only ever sleep
for very long periods cannot surpass.

Opportunity to micro-optimise and re-use the ceiling variable.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 kernel/sched.c |   33 ++++++++++++++++-----------------
 1 files changed, 16 insertions(+), 17 deletions(-)

Index: linux-2.6.17-rc4-mm1/kernel/sched.c
===================================================================
--- linux-2.6.17-rc4-mm1.orig/kernel/sched.c	2006-05-17 15:57:49.000000000 +1000
+++ linux-2.6.17-rc4-mm1/kernel/sched.c	2006-05-18 15:48:47.000000000 +1000
@@ -905,19 +905,18 @@ static int recalc_task_prio(task_t *p, u
 
 	if (likely(sleep_time > 0)) {
 		/*
-		 * User tasks that sleep a long time are categorised as
-		 * idle. They will only have their sleep_avg increased to a
-		 * level that makes them just interactive priority to stay
-		 * active yet prevent them suddenly becoming cpu hogs and
-		 * starving other processes.
+		 * This ceiling is set to the lowest priority that would allow
+		 * a task to be reinserted into the active array on timeslice
+		 * completion.
 		 */
-		if (p->mm && sleep_time > INTERACTIVE_SLEEP(p)) {
-				unsigned long ceiling;
+		unsigned long ceiling = INTERACTIVE_SLEEP(p);
 
-				ceiling = JIFFIES_TO_NS(MAX_SLEEP_AVG -
-					DEF_TIMESLICE);
-				if (p->sleep_avg < ceiling)
-					p->sleep_avg = ceiling;
+		if (p->mm && sleep_time > ceiling && p->sleep_avg < ceiling) {
+			/*
+			 * Prevents user tasks from achieving best priority
+			 * with one single large enough sleep.
+			 */
+			p->sleep_avg = ceiling;
 		} else {
 			/*
 			 * Tasks waking from uninterruptible sleep are
@@ -925,12 +924,12 @@ static int recalc_task_prio(task_t *p, u
 			 * are likely to be waiting on I/O
 			 */
 			if (p->sleep_type == SLEEP_NONINTERACTIVE && p->mm) {
-				if (p->sleep_avg >= INTERACTIVE_SLEEP(p))
+				if (p->sleep_avg >= ceiling)
 					sleep_time = 0;
 				else if (p->sleep_avg + sleep_time >=
-						INTERACTIVE_SLEEP(p)) {
-					p->sleep_avg = INTERACTIVE_SLEEP(p);
-					sleep_time = 0;
+					 ceiling) {
+						p->sleep_avg = ceiling;
+						sleep_time = 0;
 				}
 			}
 
@@ -944,9 +943,9 @@ static int recalc_task_prio(task_t *p, u
 			 */
 			p->sleep_avg += sleep_time;
 
-			if (p->sleep_avg > NS_MAX_SLEEP_AVG)
-				p->sleep_avg = NS_MAX_SLEEP_AVG;
 		}
+		if (p->sleep_avg > NS_MAX_SLEEP_AVG)
+			p->sleep_avg = NS_MAX_SLEEP_AVG;
 	}
 
 	return effective_prio(p);

-- 
-ck

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-18  5:52                         ` Con Kolivas
@ 2006-05-18  7:04                           ` Mike Galbraith
  2006-05-18 12:59                             ` Mike Galbraith
  2006-05-18 23:17                           ` Chen, Kenneth W
  2006-05-18 23:34                           ` Regression seen for patch "sched:dont decrease idle sleep avg" Chen, Kenneth W
  2 siblings, 1 reply; 42+ messages in thread
From: Mike Galbraith @ 2006-05-18  7:04 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Chen, Kenneth W, tim.c.chen, linux-kernel, mingo, Andrew Morton

On Thu, 2006-05-18 at 15:52 +1000, Con Kolivas wrote:
> On Thursday 18 May 2006 15:44, Mike Galbraith wrote:
> > On Thu, 2006-05-18 at 11:38 +1000, Con Kolivas wrote:
> > > I just want to formalise the relationship between the ceiling, nice
> > > value and INTERACTIVE_SLEEP and make the comment clear enough to be
> > > understood.
> >
> > Oh yeah, that reminded me...
> >
> > INTERACTIVE_SLEEP(p) for nice(>=16) tasks is > NS_MAX_SLEEP_AVG.
> > CURRENT_BONUS(p) if it took the long sleep path can end up being 11,
> > which will lead to Ka-[fword]-BOOM in scheduler_tick() for an SMP box.
> > See TIMESLICE_GRANULARITY(p).  (btdt;)
> 
> Thanks. This updated one fixes that and clears up the remaining mystery of
> why the ceiling is the size it is in the comments.

OK, after some brief testing, I think this is a step in the right
direct, but there is another problem.  In the case where the queue isn't
empty, the stated intent is utterly defeated by the on runqueue bonus.

If you fix this, the thud and irman2 kind of pain mostly goes away for
interactive tasks, and a lot of starvation scenarios go as well.

The best way I've found to fix these kind of boost problems is to just
say no if the task is using more than it's fair share of cpu, and in NO
case, let one wait rocket you to the top... that sets you up for a queue
round-robin nightmare (my interactive feeding frenzy scenario).  It
doesn't take much for tasks that nanosleep and round-robin before it
becomes impossible for them to use their sleep_avg.  I would say nuke
that code, except that it is the only chance at fairness some tasks
have.  Nuking it is definitely easier that getting it right.

	-Mike

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-18  7:04                           ` Mike Galbraith
@ 2006-05-18 12:59                             ` Mike Galbraith
  2006-05-19  1:10                               ` Con Kolivas
  0 siblings, 1 reply; 42+ messages in thread
From: Mike Galbraith @ 2006-05-18 12:59 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Chen, Kenneth W, tim.c.chen, linux-kernel, mingo, Andrew Morton

On Thu, 2006-05-18 at 09:04 +0200, Mike Galbraith wrote:

> OK, after some brief testing, I think this is a step in the right
> direct, but there is another problem.  In the case where the queue isn't
> empty, the stated intent is utterly defeated by the on runqueue bonus.

The overly verbose one liner below could serve as a minimal ~fix.

Prevent the on-runqueue bonus logic from defeating the idle sleep logic.

Signed-off-by: Mike Galbraith <efault@gmx.de>

--- linux-2.6.17-rc4-mm1/kernel/sched.c.org	2006-05-18 08:38:13.000000000 +0200
+++ linux-2.6.17-rc4-mm1/kernel/sched.c	2006-05-18 14:47:09.000000000 +0200
@@ -917,6 +917,16 @@ static int recalc_task_prio(task_t *p, u
 			 * with one single large enough sleep.
 			 */
 			p->sleep_avg = ceiling;
+			/*
+			 * Using INTERACTIVE_SLEEP() as a ceiling places a
+			 * nice(0) task 1ms sleep away from promotion, and
+			 * gives it 700ms to round-robin with no chance of
+			 * being demoted.  This is more than generous, so
+			 * mark this sleep as non-interactive to prevent the
+			 * on-runqueue bonus logic from intervening should
+			 * this task not receive cpu immediately.
+			 */
+			p->sleep_type = SLEEP_NONINTERACTIVE;
 		} else {
 			/*
 			 * Tasks waking from uninterruptible sleep are



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-18 12:59                             ` Mike Galbraith
@ 2006-05-19  1:10                               ` Con Kolivas
  0 siblings, 0 replies; 42+ messages in thread
From: Con Kolivas @ 2006-05-19  1:10 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Chen, Kenneth W, tim.c.chen, linux-kernel, mingo, Andrew Morton

On Thursday 18 May 2006 22:59, Mike Galbraith wrote:
> On Thu, 2006-05-18 at 09:04 +0200, Mike Galbraith wrote:
> > OK, after some brief testing, I think this is a step in the right
> > direct, but there is another problem.  In the case where the queue isn't
> > empty, the stated intent is utterly defeated by the on runqueue bonus.
>
> The overly verbose one liner below could serve as a minimal ~fix.
>
> Prevent the on-runqueue bonus logic from defeating the idle sleep logic.
>
> Signed-off-by: Mike Galbraith <efault@gmx.de>
>
> --- linux-2.6.17-rc4-mm1/kernel/sched.c.org	2006-05-18 08:38:13.000000000
> +0200 +++ linux-2.6.17-rc4-mm1/kernel/sched.c	2006-05-18 14:47:09.000000000
> +0200 @@ -917,6 +917,16 @@ static int recalc_task_prio(task_t *p, u
>  			 * with one single large enough sleep.
>  			 */
>  			p->sleep_avg = ceiling;
> +			/*
> +			 * Using INTERACTIVE_SLEEP() as a ceiling places a
> +			 * nice(0) task 1ms sleep away from promotion, and
> +			 * gives it 700ms to round-robin with no chance of
> +			 * being demoted.  This is more than generous, so
> +			 * mark this sleep as non-interactive to prevent the
> +			 * on-runqueue bonus logic from intervening should
> +			 * this task not receive cpu immediately.
> +			 */
> +			p->sleep_type = SLEEP_NONINTERACTIVE;
>  		} else {
>  			/*
>  			 * Tasks waking from uninterruptible sleep are

Yes I like it. It's consistent with the intent and lacking from the current 
design.

Ok I'll wrap this and Ken's patches all together and ask for them to be pushed 
upstream. I'd go so far as to say they are mostly obvious comment and 
bugfixes for the code that is already going into 2.6.17 and we should short 
circuit them to mainline if it's ok with Ingo and Akpm.

-- 
-ck

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-18  5:52                         ` Con Kolivas
  2006-05-18  7:04                           ` Mike Galbraith
@ 2006-05-18 23:17                           ` Chen, Kenneth W
  2006-05-19  1:30                             ` [PATCH] sched: fix interactive ceiling code Con Kolivas
  2006-05-18 23:34                           ` Regression seen for patch "sched:dont decrease idle sleep avg" Chen, Kenneth W
  2 siblings, 1 reply; 42+ messages in thread
From: Chen, Kenneth W @ 2006-05-18 23:17 UTC (permalink / raw)
  To: 'Con Kolivas', Mike Galbraith
  Cc: tim.c.chen, linux-kernel, mingo, Andrew Morton

Con Kolivas wrote on Wednesday, May 17, 2006 10:52 PM
> The relationship between INTERACTIVE_SLEEP and the ceiling is not perfect
> and not explicit enough. The sleep boost is not supposed to be any larger
> than without this code and the comment is not clear enough about what exactly
> it does, just the reason it does it.
> 
> There is a ceiling to the priority beyond which tasks that only ever sleep
> for very long periods cannot surpass.
> 
> Opportunity to micro-optimise and re-use the ceiling variable.


More opportunity to micro-optimize: now that you've put the clamp code of
p->sleep_avg in the common path, there is no need to clamp sleep_time at
the beginning of that function.  Just let the calculation go through and
clamp the final value of p->sleep_avg to NS_MAX_SLEEP_AVG at the end.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>

--- ./kernel/sched.c.orig	2006-05-18 12:51:45.000000000 -0700
+++ ./kernel/sched.c	2006-05-18 14:53:47.000000000 -0700
@@ -731,17 +731,10 @@
 static int recalc_task_prio(task_t *p, unsigned long long now)
 {
 	/* Caller must always ensure 'now >= p->timestamp' */
-	unsigned long long __sleep_time = now - p->timestamp;
-	unsigned long sleep_time;
+	unsigned long sleep_time = now - p->timestamp;
 
 	if (batch_task(p))
 		sleep_time = 0;
-	else {
-		if (__sleep_time > NS_MAX_SLEEP_AVG)
-			sleep_time = NS_MAX_SLEEP_AVG;
-		else
-			sleep_time = (unsigned long)__sleep_time;
-	}
 
 	if (likely(sleep_time > 0)) {
 		/*



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH] sched: fix interactive ceiling code
  2006-05-18 23:17                           ` Chen, Kenneth W
@ 2006-05-19  1:30                             ` Con Kolivas
  2006-05-19  2:02                               ` Mike Galbraith
                                                 ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Con Kolivas @ 2006-05-19  1:30 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: Mike Galbraith, tim.c.chen, linux-kernel, mingo, Andrew Morton,
	mbligh

Ingo, Andrew, I think these are minor logic fixes and comments that correct 
a patch that has already been pushed to 2.6.17- and I would like them short
circuited to mainline if everyone is comfortable with it.
 
Ken, Mike can I ask you to put a signed off on this patch for your 
contributions please?

Martin can I please ask for this to be put on test.kernel.org as a last
minute sanity check? I know URLs are easier so here is the patch for 2.6.17-rc4:
http://ck.kolivas.org/patches/crap/sched-fix_interactive_ceiling.patch

---
The relationship between INTERACTIVE_SLEEP and the ceiling is not perfect
and not explicit enough. The sleep boost is not supposed to be any larger
than without this code and the comment is not clear enough about what exactly
it does, just the reason it does it. Fix it.

There is a ceiling to the priority beyond which tasks that only ever sleep
for very long periods cannot surpass. Fix it.

Prevent the on-runqueue bonus logic from defeating the idle sleep logic.

Opportunity to micro-optimise.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 kernel/sched.c |   52 +++++++++++++++++++++++++++-------------------------
 1 files changed, 27 insertions(+), 25 deletions(-)

Index: linux-2.6.17-rc4/kernel/sched.c
===================================================================
--- linux-2.6.17-rc4.orig/kernel/sched.c	2006-05-19 11:25:01.000000000 +1000
+++ linux-2.6.17-rc4/kernel/sched.c	2006-05-19 11:25:14.000000000 +1000
@@ -731,33 +731,35 @@ static inline void __activate_idle_task(
 static int recalc_task_prio(task_t *p, unsigned long long now)
 {
 	/* Caller must always ensure 'now >= p->timestamp' */
-	unsigned long long __sleep_time = now - p->timestamp;
-	unsigned long sleep_time;
+	unsigned long sleep_time = now - p->timestamp;
 
 	if (batch_task(p))
 		sleep_time = 0;
-	else {
-		if (__sleep_time > NS_MAX_SLEEP_AVG)
-			sleep_time = NS_MAX_SLEEP_AVG;
-		else
-			sleep_time = (unsigned long)__sleep_time;
-	}
 
 	if (likely(sleep_time > 0)) {
 		/*
-		 * User tasks that sleep a long time are categorised as
-		 * idle. They will only have their sleep_avg increased to a
-		 * level that makes them just interactive priority to stay
-		 * active yet prevent them suddenly becoming cpu hogs and
-		 * starving other processes.
+		 * This ceiling is set to the lowest priority that would allow
+		 * a task to be reinserted into the active array on timeslice
+		 * completion.
 		 */
-		if (p->mm && sleep_time > INTERACTIVE_SLEEP(p)) {
-				unsigned long ceiling;
+		unsigned long ceiling = INTERACTIVE_SLEEP(p);
 
-				ceiling = JIFFIES_TO_NS(MAX_SLEEP_AVG -
-					DEF_TIMESLICE);
-				if (p->sleep_avg < ceiling)
-					p->sleep_avg = ceiling;
+		if (p->mm && sleep_time > ceiling && p->sleep_avg < ceiling) {
+			/*
+			 * Prevents user tasks from achieving best priority
+			 * with one single large enough sleep.
+			 */
+			p->sleep_avg = ceiling;
+			/*
+			 * Using INTERACTIVE_SLEEP() as a ceiling places a
+			 * nice(0) task 1ms sleep away from promotion, and
+			 * gives it 700ms to round-robin with no chance of
+			 * being demoted.  This is more than generous, so
+			 * mark this sleep as non-interactive to prevent the
+			 * on-runqueue bonus logic from intervening should
+			 * this task not receive cpu immediately.
+			 */
+			p->sleep_type = SLEEP_NONINTERACTIVE;
 		} else {
 			/*
 			 * Tasks waking from uninterruptible sleep are
@@ -765,12 +767,12 @@ static int recalc_task_prio(task_t *p, u
 			 * are likely to be waiting on I/O
 			 */
 			if (p->sleep_type == SLEEP_NONINTERACTIVE && p->mm) {
-				if (p->sleep_avg >= INTERACTIVE_SLEEP(p))
+				if (p->sleep_avg >= ceiling)
 					sleep_time = 0;
 				else if (p->sleep_avg + sleep_time >=
-						INTERACTIVE_SLEEP(p)) {
-					p->sleep_avg = INTERACTIVE_SLEEP(p);
-					sleep_time = 0;
+					 ceiling) {
+						p->sleep_avg = ceiling;
+						sleep_time = 0;
 				}
 			}
 
@@ -784,9 +786,9 @@ static int recalc_task_prio(task_t *p, u
 			 */
 			p->sleep_avg += sleep_time;
 
-			if (p->sleep_avg > NS_MAX_SLEEP_AVG)
-				p->sleep_avg = NS_MAX_SLEEP_AVG;
 		}
+		if (p->sleep_avg > NS_MAX_SLEEP_AVG)
+			p->sleep_avg = NS_MAX_SLEEP_AVG;
 	}
 
 	return effective_prio(p);
-- 
-ck

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] sched: fix interactive ceiling code
  2006-05-19  1:30                             ` [PATCH] sched: fix interactive ceiling code Con Kolivas
@ 2006-05-19  2:02                               ` Mike Galbraith
  2006-05-19  9:40                               ` Ingo Molnar
  2006-05-19 14:37                               ` Chen, Kenneth W
  2 siblings, 0 replies; 42+ messages in thread
From: Mike Galbraith @ 2006-05-19  2:02 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Chen, Kenneth W, tim.c.chen, linux-kernel, mingo, Andrew Morton,
	mbligh

On Fri, 2006-05-19 at 11:30 +1000, Con Kolivas wrote:
> Ingo, Andrew, I think these are minor logic fixes and comments that correct 
> a patch that has already been pushed to 2.6.17- and I would like them short
> circuited to mainline if everyone is comfortable with it.
>  
> Ken, Mike can I ask you to put a signed off on this patch for your 
> contributions please?

Done.

(I hope nobody ever rolls a patch from 30 contributors;)

> Martin can I please ask for this to be put on test.kernel.org as a last
> minute sanity check? I know URLs are easier so here is the patch for 2.6.17-rc4:
> http://ck.kolivas.org/patches/crap/sched-fix_interactive_ceiling.patch
> 
> ---
> The relationship between INTERACTIVE_SLEEP and the ceiling is not perfect
> and not explicit enough. The sleep boost is not supposed to be any larger
> than without this code and the comment is not clear enough about what exactly
> it does, just the reason it does it. Fix it.
> 
> There is a ceiling to the priority beyond which tasks that only ever sleep
> for very long periods cannot surpass. Fix it.
> 
> Prevent the on-runqueue bonus logic from defeating the idle sleep logic.
> 
> Opportunity to micro-optimise.
> 
> Signed-off-by: Con Kolivas <kernel@kolivas.org>

Signed-off-by: Mike Galbraith <efault@gmx.de>

> 
> ---
>  kernel/sched.c |   52 +++++++++++++++++++++++++++-------------------------
>  1 files changed, 27 insertions(+), 25 deletions(-)
> 
> Index: linux-2.6.17-rc4/kernel/sched.c
> ===================================================================
> --- linux-2.6.17-rc4.orig/kernel/sched.c	2006-05-19 11:25:01.000000000 +1000
> +++ linux-2.6.17-rc4/kernel/sched.c	2006-05-19 11:25:14.000000000 +1000
> @@ -731,33 +731,35 @@ static inline void __activate_idle_task(
>  static int recalc_task_prio(task_t *p, unsigned long long now)
>  {
>  	/* Caller must always ensure 'now >= p->timestamp' */
> -	unsigned long long __sleep_time = now - p->timestamp;
> -	unsigned long sleep_time;
> +	unsigned long sleep_time = now - p->timestamp;
>  
>  	if (batch_task(p))
>  		sleep_time = 0;
> -	else {
> -		if (__sleep_time > NS_MAX_SLEEP_AVG)
> -			sleep_time = NS_MAX_SLEEP_AVG;
> -		else
> -			sleep_time = (unsigned long)__sleep_time;
> -	}
>  
>  	if (likely(sleep_time > 0)) {
>  		/*
> -		 * User tasks that sleep a long time are categorised as
> -		 * idle. They will only have their sleep_avg increased to a
> -		 * level that makes them just interactive priority to stay
> -		 * active yet prevent them suddenly becoming cpu hogs and
> -		 * starving other processes.
> +		 * This ceiling is set to the lowest priority that would allow
> +		 * a task to be reinserted into the active array on timeslice
> +		 * completion.
>  		 */
> -		if (p->mm && sleep_time > INTERACTIVE_SLEEP(p)) {
> -				unsigned long ceiling;
> +		unsigned long ceiling = INTERACTIVE_SLEEP(p);
>  
> -				ceiling = JIFFIES_TO_NS(MAX_SLEEP_AVG -
> -					DEF_TIMESLICE);
> -				if (p->sleep_avg < ceiling)
> -					p->sleep_avg = ceiling;
> +		if (p->mm && sleep_time > ceiling && p->sleep_avg < ceiling) {
> +			/*
> +			 * Prevents user tasks from achieving best priority
> +			 * with one single large enough sleep.
> +			 */
> +			p->sleep_avg = ceiling;
> +			/*
> +			 * Using INTERACTIVE_SLEEP() as a ceiling places a
> +			 * nice(0) task 1ms sleep away from promotion, and
> +			 * gives it 700ms to round-robin with no chance of
> +			 * being demoted.  This is more than generous, so
> +			 * mark this sleep as non-interactive to prevent the
> +			 * on-runqueue bonus logic from intervening should
> +			 * this task not receive cpu immediately.
> +			 */
> +			p->sleep_type = SLEEP_NONINTERACTIVE;
>  		} else {
>  			/*
>  			 * Tasks waking from uninterruptible sleep are
> @@ -765,12 +767,12 @@ static int recalc_task_prio(task_t *p, u
>  			 * are likely to be waiting on I/O
>  			 */
>  			if (p->sleep_type == SLEEP_NONINTERACTIVE && p->mm) {
> -				if (p->sleep_avg >= INTERACTIVE_SLEEP(p))
> +				if (p->sleep_avg >= ceiling)
>  					sleep_time = 0;
>  				else if (p->sleep_avg + sleep_time >=
> -						INTERACTIVE_SLEEP(p)) {
> -					p->sleep_avg = INTERACTIVE_SLEEP(p);
> -					sleep_time = 0;
> +					 ceiling) {
> +						p->sleep_avg = ceiling;
> +						sleep_time = 0;
>  				}
>  			}
>  
> @@ -784,9 +786,9 @@ static int recalc_task_prio(task_t *p, u
>  			 */
>  			p->sleep_avg += sleep_time;
>  
> -			if (p->sleep_avg > NS_MAX_SLEEP_AVG)
> -				p->sleep_avg = NS_MAX_SLEEP_AVG;
>  		}
> +		if (p->sleep_avg > NS_MAX_SLEEP_AVG)
> +			p->sleep_avg = NS_MAX_SLEEP_AVG;
>  	}
>  
>  	return effective_prio(p);


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] sched: fix interactive ceiling code
  2006-05-19  1:30                             ` [PATCH] sched: fix interactive ceiling code Con Kolivas
  2006-05-19  2:02                               ` Mike Galbraith
@ 2006-05-19  9:40                               ` Ingo Molnar
  2006-05-19 14:37                               ` Chen, Kenneth W
  2 siblings, 0 replies; 42+ messages in thread
From: Ingo Molnar @ 2006-05-19  9:40 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Chen, Kenneth W, Mike Galbraith, tim.c.chen, linux-kernel,
	Andrew Morton, mbligh


* Con Kolivas <kernel@kolivas.org> wrote:

> Ingo, Andrew, I think these are minor logic fixes and comments that 
> correct a patch that has already been pushed to 2.6.17- and I would 
> like them short circuited to mainline if everyone is comfortable with 
> it.

the patch looks good, and it certainly wont cause stability problems 
(this type of code typically doesnt). It may cause interactivity 
problems, but then again this affects boundary conditions, so i'm fairly 
optimistic. So if we have say more than say 4 days until 2.6.17 i'd 
suggest it for immediate inclusion.

> Signed-off-by: Con Kolivas <kernel@kolivas.org>

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH] sched: fix interactive ceiling code
  2006-05-19  1:30                             ` [PATCH] sched: fix interactive ceiling code Con Kolivas
  2006-05-19  2:02                               ` Mike Galbraith
  2006-05-19  9:40                               ` Ingo Molnar
@ 2006-05-19 14:37                               ` Chen, Kenneth W
  2006-05-19 16:19                                 ` tim_c_chen
  2 siblings, 1 reply; 42+ messages in thread
From: Chen, Kenneth W @ 2006-05-19 14:37 UTC (permalink / raw)
  To: 'Con Kolivas'
  Cc: Mike Galbraith, tim.c.chen, linux-kernel, mingo, Andrew Morton,
	mbligh

Con Kolivas wrote on Thursday, May 18, 2006 6:31 PM
> Ingo, Andrew, I think these are minor logic fixes and comments that correct 
> a patch that has already been pushed to 2.6.17- and I would like them short
> circuited to mainline if everyone is comfortable with it.
>  
> Ken, Mike can I ask you to put a signed off on this patch for your 
> contributions please?

Yup, looks good. Thanks for all the explanation and certainly your patience.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>



> ---
> The relationship between INTERACTIVE_SLEEP and the ceiling is not perfect
> and not explicit enough. The sleep boost is not supposed to be any larger
> than without this code and the comment is not clear enough about what exactly
> it does, just the reason it does it. Fix it.
> 
> There is a ceiling to the priority beyond which tasks that only ever sleep
> for very long periods cannot surpass. Fix it.
> 
> Prevent the on-runqueue bonus logic from defeating the idle sleep logic.
> 
> Opportunity to micro-optimise.
> 
> Signed-off-by: Con Kolivas <kernel@kolivas.org>
> 
> ---
>  kernel/sched.c |   52 +++++++++++++++++++++++++++-------------------------
>  1 files changed, 27 insertions(+), 25 deletions(-)
> 
> Index: linux-2.6.17-rc4/kernel/sched.c
> ===================================================================
> --- linux-2.6.17-rc4.orig/kernel/sched.c	2006-05-19 11:25:01.000000000 +1000
> +++ linux-2.6.17-rc4/kernel/sched.c	2006-05-19 11:25:14.000000000 +1000
> @@ -731,33 +731,35 @@ static inline void __activate_idle_task(
>  static int recalc_task_prio(task_t *p, unsigned long long now)
>  {
>  	/* Caller must always ensure 'now >= p->timestamp' */
> -	unsigned long long __sleep_time = now - p->timestamp;
> -	unsigned long sleep_time;
> +	unsigned long sleep_time = now - p->timestamp;
>  
>  	if (batch_task(p))
>  		sleep_time = 0;
> -	else {
> -		if (__sleep_time > NS_MAX_SLEEP_AVG)
> -			sleep_time = NS_MAX_SLEEP_AVG;
> -		else
> -			sleep_time = (unsigned long)__sleep_time;
> -	}
>  
>  	if (likely(sleep_time > 0)) {
>  		/*
> -		 * User tasks that sleep a long time are categorised as
> -		 * idle. They will only have their sleep_avg increased to a
> -		 * level that makes them just interactive priority to stay
> -		 * active yet prevent them suddenly becoming cpu hogs and
> -		 * starving other processes.
> +		 * This ceiling is set to the lowest priority that would allow
> +		 * a task to be reinserted into the active array on timeslice
> +		 * completion.
>  		 */
> -		if (p->mm && sleep_time > INTERACTIVE_SLEEP(p)) {
> -				unsigned long ceiling;
> +		unsigned long ceiling = INTERACTIVE_SLEEP(p);
>  
> -				ceiling = JIFFIES_TO_NS(MAX_SLEEP_AVG -
> -					DEF_TIMESLICE);
> -				if (p->sleep_avg < ceiling)
> -					p->sleep_avg = ceiling;
> +		if (p->mm && sleep_time > ceiling && p->sleep_avg < ceiling) {
> +			/*
> +			 * Prevents user tasks from achieving best priority
> +			 * with one single large enough sleep.
> +			 */
> +			p->sleep_avg = ceiling;
> +			/*
> +			 * Using INTERACTIVE_SLEEP() as a ceiling places a
> +			 * nice(0) task 1ms sleep away from promotion, and
> +			 * gives it 700ms to round-robin with no chance of
> +			 * being demoted.  This is more than generous, so
> +			 * mark this sleep as non-interactive to prevent the
> +			 * on-runqueue bonus logic from intervening should
> +			 * this task not receive cpu immediately.
> +			 */
> +			p->sleep_type = SLEEP_NONINTERACTIVE;
>  		} else {
>  			/*
>  			 * Tasks waking from uninterruptible sleep are
> @@ -765,12 +767,12 @@ static int recalc_task_prio(task_t *p, u
>  			 * are likely to be waiting on I/O
>  			 */
>  			if (p->sleep_type == SLEEP_NONINTERACTIVE && p->mm) {
> -				if (p->sleep_avg >= INTERACTIVE_SLEEP(p))
> +				if (p->sleep_avg >= ceiling)
>  					sleep_time = 0;
>  				else if (p->sleep_avg + sleep_time >=
> -						INTERACTIVE_SLEEP(p)) {
> -					p->sleep_avg = INTERACTIVE_SLEEP(p);
> -					sleep_time = 0;
> +					 ceiling) {
> +						p->sleep_avg = ceiling;
> +						sleep_time = 0;
>  				}
>  			}
>  
> @@ -784,9 +786,9 @@ static int recalc_task_prio(task_t *p, u
>  			 */
>  			p->sleep_avg += sleep_time;
>  
> -			if (p->sleep_avg > NS_MAX_SLEEP_AVG)
> -				p->sleep_avg = NS_MAX_SLEEP_AVG;
>  		}
> +		if (p->sleep_avg > NS_MAX_SLEEP_AVG)
> +			p->sleep_avg = NS_MAX_SLEEP_AVG;
>  	}
>  
>  	return effective_prio(p);
> -- 
> -ck

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH] sched: fix interactive ceiling code
  2006-05-19 14:37                               ` Chen, Kenneth W
@ 2006-05-19 16:19                                 ` tim_c_chen
  0 siblings, 0 replies; 42+ messages in thread
From: tim_c_chen @ 2006-05-19 16:19 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: 'Con Kolivas', Mike Galbraith, tim.c.chen, linux-kernel,
	mingo, Andrew Morton, mbligh

The patch did recover the 4% regression for Volanomark which started the
whole thread in the first place.

Tim

> Con Kolivas wrote on Thursday, May 18, 2006 6:31 PM
>> Ingo, Andrew, I think these are minor logic fixes and comments that
>> correct
>> a patch that has already been pushed to 2.6.17- and I would like them
>> short
>> circuited to mainline if everyone is comfortable with it.
>>
>> Ken, Mike can I ask you to put a signed off on this patch for your
>> contributions please?
>
> Yup, looks good. Thanks for all the explanation and certainly your
> patience.
>
> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
>
>
>
>> ---
>> The relationship between INTERACTIVE_SLEEP and the ceiling is not
>> perfect
>> and not explicit enough. The sleep boost is not supposed to be any
>> larger
>> than without this code and the comment is not clear enough about what
>> exactly
>> it does, just the reason it does it. Fix it.
>>
>> There is a ceiling to the priority beyond which tasks that only ever
>> sleep
>> for very long periods cannot surpass. Fix it.
>>
>> Prevent the on-runqueue bonus logic from defeating the idle sleep logic.
>>
>> Opportunity to micro-optimise.
>>
>> Signed-off-by: Con Kolivas <kernel@kolivas.org>
>>
>> ---
>>  kernel/sched.c |   52
>> +++++++++++++++++++++++++++-------------------------
>>  1 files changed, 27 insertions(+), 25 deletions(-)
>>
>> Index: linux-2.6.17-rc4/kernel/sched.c
>> ===================================================================
>> --- linux-2.6.17-rc4.orig/kernel/sched.c	2006-05-19 11:25:01.000000000
>> +1000
>> +++ linux-2.6.17-rc4/kernel/sched.c	2006-05-19 11:25:14.000000000 +1000
>> @@ -731,33 +731,35 @@ static inline void __activate_idle_task(
>>  static int recalc_task_prio(task_t *p, unsigned long long now)
>>  {
>>  	/* Caller must always ensure 'now >= p->timestamp' */
>> -	unsigned long long __sleep_time = now - p->timestamp;
>> -	unsigned long sleep_time;
>> +	unsigned long sleep_time = now - p->timestamp;
>>
>>  	if (batch_task(p))
>>  		sleep_time = 0;
>> -	else {
>> -		if (__sleep_time > NS_MAX_SLEEP_AVG)
>> -			sleep_time = NS_MAX_SLEEP_AVG;
>> -		else
>> -			sleep_time = (unsigned long)__sleep_time;
>> -	}
>>
>>  	if (likely(sleep_time > 0)) {
>>  		/*
>> -		 * User tasks that sleep a long time are categorised as
>> -		 * idle. They will only have their sleep_avg increased to a
>> -		 * level that makes them just interactive priority to stay
>> -		 * active yet prevent them suddenly becoming cpu hogs and
>> -		 * starving other processes.
>> +		 * This ceiling is set to the lowest priority that would allow
>> +		 * a task to be reinserted into the active array on timeslice
>> +		 * completion.
>>  		 */
>> -		if (p->mm && sleep_time > INTERACTIVE_SLEEP(p)) {
>> -				unsigned long ceiling;
>> +		unsigned long ceiling = INTERACTIVE_SLEEP(p);
>>
>> -				ceiling = JIFFIES_TO_NS(MAX_SLEEP_AVG -
>> -					DEF_TIMESLICE);
>> -				if (p->sleep_avg < ceiling)
>> -					p->sleep_avg = ceiling;
>> +		if (p->mm && sleep_time > ceiling && p->sleep_avg < ceiling) {
>> +			/*
>> +			 * Prevents user tasks from achieving best priority
>> +			 * with one single large enough sleep.
>> +			 */
>> +			p->sleep_avg = ceiling;
>> +			/*
>> +			 * Using INTERACTIVE_SLEEP() as a ceiling places a
>> +			 * nice(0) task 1ms sleep away from promotion, and
>> +			 * gives it 700ms to round-robin with no chance of
>> +			 * being demoted.  This is more than generous, so
>> +			 * mark this sleep as non-interactive to prevent the
>> +			 * on-runqueue bonus logic from intervening should
>> +			 * this task not receive cpu immediately.
>> +			 */
>> +			p->sleep_type = SLEEP_NONINTERACTIVE;
>>  		} else {
>>  			/*
>>  			 * Tasks waking from uninterruptible sleep are
>> @@ -765,12 +767,12 @@ static int recalc_task_prio(task_t *p, u
>>  			 * are likely to be waiting on I/O
>>  			 */
>>  			if (p->sleep_type == SLEEP_NONINTERACTIVE && p->mm) {
>> -				if (p->sleep_avg >= INTERACTIVE_SLEEP(p))
>> +				if (p->sleep_avg >= ceiling)
>>  					sleep_time = 0;
>>  				else if (p->sleep_avg + sleep_time >=
>> -						INTERACTIVE_SLEEP(p)) {
>> -					p->sleep_avg = INTERACTIVE_SLEEP(p);
>> -					sleep_time = 0;
>> +					 ceiling) {
>> +						p->sleep_avg = ceiling;
>> +						sleep_time = 0;
>>  				}
>>  			}
>>
>> @@ -784,9 +786,9 @@ static int recalc_task_prio(task_t *p, u
>>  			 */
>>  			p->sleep_avg += sleep_time;
>>
>> -			if (p->sleep_avg > NS_MAX_SLEEP_AVG)
>> -				p->sleep_avg = NS_MAX_SLEEP_AVG;
>>  		}
>> +		if (p->sleep_avg > NS_MAX_SLEEP_AVG)
>> +			p->sleep_avg = NS_MAX_SLEEP_AVG;
>>  	}
>>
>>  	return effective_prio(p);
>> --
>> -ck
>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-18  5:52                         ` Con Kolivas
  2006-05-18  7:04                           ` Mike Galbraith
  2006-05-18 23:17                           ` Chen, Kenneth W
@ 2006-05-18 23:34                           ` Chen, Kenneth W
  2006-05-19  1:07                             ` Con Kolivas
  2 siblings, 1 reply; 42+ messages in thread
From: Chen, Kenneth W @ 2006-05-18 23:34 UTC (permalink / raw)
  To: 'Con Kolivas', Mike Galbraith
  Cc: tim.c.chen, linux-kernel, mingo, Andrew Morton

Con Kolivas wrote on Wednesday, May 17, 2006 10:52 PM
> The relationship between INTERACTIVE_SLEEP and the ceiling is not perfect
> and not explicit enough. The sleep boost is not supposed to be any larger
> than without this code and the comment is not clear enough about what exactly
> it does, just the reason it does it.
> 
> There is a ceiling to the priority beyond which tasks that only ever sleep
> for very long periods cannot surpass.
> 
> Opportunity to micro-optimise and re-use the ceiling variable.
> 
> --- linux-2.6.17-rc4-mm1.orig/kernel/sched.c	2006-05-17 15:57:49.000000000 +1000
> +++ linux-2.6.17-rc4-mm1/kernel/sched.c	2006-05-18 15:48:47.000000000 +1000
> @@ -925,12 +924,12 @@ static int recalc_task_prio(task_t *p, u
>  			 * are likely to be waiting on I/O
>  			 */
>  			if (p->sleep_type == SLEEP_NONINTERACTIVE && p->mm) {
> -				if (p->sleep_avg >= INTERACTIVE_SLEEP(p))
> +				if (p->sleep_avg >= ceiling)
>  					sleep_time = 0;
>  				else if (p->sleep_avg + sleep_time >=
> -						INTERACTIVE_SLEEP(p)) {
> -					p->sleep_avg = INTERACTIVE_SLEEP(p);
> -					sleep_time = 0;
> +					 ceiling) {
> +						p->sleep_avg = ceiling;
> +						sleep_time = 0;


Watch for white space damage, last two lines has one extra tab on the indentation.


By the way, there is all kinds of non-linear behavior with priority boost
adjustment:

        if (p->sleep_type == SLEEP_NONINTERACTIVE && p->mm) {
                if (p->sleep_avg >= ceiling)
                        sleep_time = 0;
                else if (p->sleep_avg + sleep_time >= ceiling) {
                        p->sleep_avg = ceiling;
                        sleep_time = 0;
                }
        }

For large p->sleep_avg, kernel don't clamp it to ceiling, yet clamp small
incremental sleep.  This all seems very fragile.

- Ken

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-18 23:34                           ` Regression seen for patch "sched:dont decrease idle sleep avg" Chen, Kenneth W
@ 2006-05-19  1:07                             ` Con Kolivas
  0 siblings, 0 replies; 42+ messages in thread
From: Con Kolivas @ 2006-05-19  1:07 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: Mike Galbraith, tim.c.chen, linux-kernel, mingo, Andrew Morton

On Friday 19 May 2006 09:34, Chen, Kenneth W wrote:
> Con Kolivas wrote on Wednesday, May 17, 2006 10:52 PM
>
> > The relationship between INTERACTIVE_SLEEP and the ceiling is not perfect
> > and not explicit enough. The sleep boost is not supposed to be any larger
> > than without this code and the comment is not clear enough about what
> > exactly it does, just the reason it does it.
> >
> > There is a ceiling to the priority beyond which tasks that only ever
> > sleep for very long periods cannot surpass.
> >
> > Opportunity to micro-optimise and re-use the ceiling variable.
> >
> > --- linux-2.6.17-rc4-mm1.orig/kernel/sched.c	2006-05-17
> > 15:57:49.000000000 +1000 +++
> > linux-2.6.17-rc4-mm1/kernel/sched.c	2006-05-18 15:48:47.000000000 +1000
> > @@ -925,12 +924,12 @@ static int recalc_task_prio(task_t *p, u
> >  			 * are likely to be waiting on I/O
> >  			 */
> >  			if (p->sleep_type == SLEEP_NONINTERACTIVE && p->mm) {
> > -				if (p->sleep_avg >= INTERACTIVE_SLEEP(p))
> > +				if (p->sleep_avg >= ceiling)
> >  					sleep_time = 0;
> >  				else if (p->sleep_avg + sleep_time >=
> > -						INTERACTIVE_SLEEP(p)) {
> > -					p->sleep_avg = INTERACTIVE_SLEEP(p);
> > -					sleep_time = 0;
> > +					 ceiling) {
> > +						p->sleep_avg = ceiling;
> > +						sleep_time = 0;
>
> Watch for white space damage, last two lines has one extra tab on the
> indentation.

Hmm a component of the if() is up to that tab so it seemed appropriate to 
indent the body further to me.

> By the way, there is all kinds of non-linear behavior with priority boost
> adjustment:
>
>         if (p->sleep_type == SLEEP_NONINTERACTIVE && p->mm) {
>                 if (p->sleep_avg >= ceiling)
>                         sleep_time = 0;
>                 else if (p->sleep_avg + sleep_time >= ceiling) {
>                         p->sleep_avg = ceiling;
>                         sleep_time = 0;
>                 }
>         }
>
> For large p->sleep_avg, kernel don't clamp it to ceiling, yet clamp small
> incremental sleep.  This all seems very fragile.

Yes it is. sleep_avg affecting priority in a linear fashion in the original 
design was basically the reason interactivity was not flexible enough for a 
wide range of workloads. I don't like it much myself any more either, and 
have been maintaining a complete rewrite for some time. However the fact is 
that the number of valid bug reports is very low, so tiny tweaks as issues 
have come up have sufficed so far.

-- 
-ck

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: Regression seen for patch "sched:dont decrease idle sleep avg"
  2006-05-15 19:01       ` Chen, Kenneth W
  2006-05-15 23:45         ` Con Kolivas
@ 2006-05-16  4:07         ` Mike Galbraith
  1 sibling, 0 replies; 42+ messages in thread
From: Mike Galbraith @ 2006-05-16  4:07 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: 'Con Kolivas', tim.c.chen, linux-kernel, mingo,
	Andrew Morton

On Mon, 2006-05-15 at 12:01 -0700, Chen, Kenneth W wrote:

> [patch] sched: update comments in priority calculation w.r.t. implementation.
> 
> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
> 
> --- ./kernel/sched.c.orig	2006-05-15 12:24:02.000000000 -0700
> +++ ./kernel/sched.c	2006-05-15 12:37:16.000000000 -0700
> @@ -746,10 +746,12 @@ static int recalc_task_prio(task_t *p, u
>  	if (likely(sleep_time > 0)) {
>  		/*
>  		 * User tasks that sleep a long time are categorised as
> -		 * idle. They will only have their sleep_avg increased to a
> -		 * level that makes them just interactive priority to stay
> -		 * active yet prevent them suddenly becoming cpu hogs and
> -		 * starving other processes.
> +		 * idle. If they sleep longer than INTERACTIVE_SLEEP, it
> +		 * will have its priority boosted to minimum MAX_BONUS-1.
> +		 * For short sleep, they will only have their sleep_avg
> +		 * increased to a level that makes them just interactive
> +		 * priority to stay active yet prevent them suddenly becoming
> +		 * cpu hogs and starving other processes.
>  		 */
>  		if (p->mm && sleep_time > INTERACTIVE_SLEEP(p)) {
>  				unsigned long ceiling;

This comment still doesn't reflect what the code does.  Please just
delete the last four lines, they are incorrect.

	-Mike


^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2006-05-19 16:19 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-05-08 23:18 Regression seen for patch "sched:dont decrease idle sleep avg" Tim Chen
2006-05-09  0:43 ` Con Kolivas
2006-05-09  1:07   ` Martin Bligh
2006-05-12  0:04   ` Chen, Kenneth W
2006-05-13 12:27     ` Andrew Morton
2006-05-13 13:07       ` Mike Galbraith
2006-05-14 16:03     ` Con Kolivas
2006-05-15 19:01       ` Chen, Kenneth W
2006-05-15 23:45         ` Con Kolivas
2006-05-16  1:22           ` Chen, Kenneth W
2006-05-16  1:44             ` Con Kolivas
2006-05-16  4:10           ` Mike Galbraith
2006-05-16 23:32           ` Tim Chen
2006-05-17  4:25             ` Mike Galbraith
2006-05-17  4:45               ` Peter Williams
2006-05-17  5:24                 ` Mike Galbraith
2006-05-17  8:23             ` Con Kolivas
2006-05-17  9:49               ` Mike Galbraith
2006-05-17 10:25                 ` Con Kolivas
2006-05-17 11:42                   ` Mike Galbraith
2006-05-17 12:46                     ` Con Kolivas
2006-05-17 13:41                       ` Mike Galbraith
2006-05-17 15:10                         ` Con Kolivas
2006-05-17 17:21                           ` Ray Lee
2006-05-17 19:33               ` Chen, Kenneth W
2006-05-18  0:35                 ` Con Kolivas
2006-05-18  1:10                   ` Chen, Kenneth W
2006-05-18  1:38                     ` Con Kolivas
2006-05-18  5:44                       ` Mike Galbraith
2006-05-18  5:52                         ` Con Kolivas
2006-05-18  7:04                           ` Mike Galbraith
2006-05-18 12:59                             ` Mike Galbraith
2006-05-19  1:10                               ` Con Kolivas
2006-05-18 23:17                           ` Chen, Kenneth W
2006-05-19  1:30                             ` [PATCH] sched: fix interactive ceiling code Con Kolivas
2006-05-19  2:02                               ` Mike Galbraith
2006-05-19  9:40                               ` Ingo Molnar
2006-05-19 14:37                               ` Chen, Kenneth W
2006-05-19 16:19                                 ` tim_c_chen
2006-05-18 23:34                           ` Regression seen for patch "sched:dont decrease idle sleep avg" Chen, Kenneth W
2006-05-19  1:07                             ` Con Kolivas
2006-05-16  4:07         ` Mike Galbraith

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox