[sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones"

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones"
@ 2008-06-10 22:58 Dmitry Adamushko
  2008-06-11  8:53 ` Peter Zijlstra
  2008-06-18 10:44 ` Ingo Molnar
  0 siblings, 2 replies; 17+ messages in thread
From: Dmitry Adamushko @ 2008-06-10 22:58 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	linux-kernel


Hi Gregory,


regarding this commit: 45c01e824991b2dd0a332e19efc4901acb31209f


I think we can do it simpler. Please take a look at the patch below.

Instead of having 2 separate arrays (which is + ~800 bytes on x86_32 and twice so on x86_64),
let's add "exclusive" (the ones that are bound to this CPU) tasks to the head of the queue
and "shared" ones -- to the end.

In case of a few newly woken up "exclusive" tasks, they are 'stacked' (not queued as now), meaning that
a task {i+1} is being placed in front of the previously woken up task {i}. But I don't think that
this behavior may cause any realistic problems.


There are a couple of changes on top of this one.

(1) in check_preempt_curr_rt()

I don't think there is a need for the "pick_next_rt_entity(rq, &rq->rt) != &rq->curr->rt" check.

enqueue_task_rt(p) and check_preempt_curr_rt() are always called one after another with rq->lock being held so
the following check "p->rt.nr_cpus_allowed == 1 && rq->curr->rt.nr_cpus_allowed != 1"
should be enough (well, just its left part) to guarantee that 'p' has been queued in front of the 'curr'.

(2) in set_cpus_allowed_rt()

I don't thinks there is a need for requeue_task_rt() here.

Perhaps, the only case when 'requeue' (+ reschedule) might be useful is as follows:

i) weight == 1 && cpu_isset(task_cpu(p), *new_mask)

i.e. a task is being bound to this CPU);

ii) 'p' != rq->curr

but here, 'p' has already been on this CPU for a while and was not migrated. i.e. it's possible that 'rq->curr'
would not have high chances to be migrated right at this particular moment (although, has chance in a bit longer term),
should we allow it to be preempted.


Anyway, I think we should not perhaps make it more complex trying to address some rare corner cases.
For instance, that's why a single queue approach would be preferable. Unless I'm missing something obvious,
this approach gives us similar functionality at lower cost.


Verified only compilation-wise.


(Almost)-Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>

---

diff --git a/kernel/sched.c b/kernel/sched.c
index 268a850..e1c382d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -155,8 +155,7 @@ static inline int task_has_rt_policy(struct task_struct *p)
  */
 struct rt_prio_array {
 	DECLARE_BITMAP(bitmap, MAX_RT_PRIO+1); /* include 1 bit for delimiter */
-	struct list_head xqueue[MAX_RT_PRIO]; /* exclusive queue */
-	struct list_head squeue[MAX_RT_PRIO];  /* shared queue */
+	struct list_head queue[MAX_RT_PRIO];
 };
 
 struct rt_bandwidth {
@@ -7661,8 +7660,7 @@ static void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
 
 	array = &rt_rq->active;
 	for (i = 0; i < MAX_RT_PRIO; i++) {
-		INIT_LIST_HEAD(array->xqueue + i);
-		INIT_LIST_HEAD(array->squeue + i);
+		INIT_LIST_HEAD(array->queue + i);
 		__clear_bit(i, array->bitmap);
 	}
 	/* delimiter for bitsearch: */
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 49a225e..2437504 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -576,16 +576,15 @@ static void enqueue_rt_entity(struct sched_rt_entity *rt_se)
 	struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
 	struct rt_prio_array *array = &rt_rq->active;
 	struct rt_rq *group_rq = group_rt_rq(rt_se);
+	struct list_head *queue = array->queue + rt_se_prio(rt_se);
 
 	if (group_rq && rt_rq_throttled(group_rq))
 		return;
 
 	if (rt_se->nr_cpus_allowed == 1)
-		list_add_tail(&rt_se->run_list,
-			      array->xqueue + rt_se_prio(rt_se));
+		list_add(&rt_se->run_list, queue);
 	else
-		list_add_tail(&rt_se->run_list,
-			      array->squeue + rt_se_prio(rt_se));
+		list_add_tail(&rt_se->run_list, queue);
 
 	__set_bit(rt_se_prio(rt_se), array->bitmap);
 
@@ -598,8 +597,7 @@ static void dequeue_rt_entity(struct sched_rt_entity *rt_se)
 	struct rt_prio_array *array = &rt_rq->active;
 
 	list_del_init(&rt_se->run_list);
-	if (list_empty(array->squeue + rt_se_prio(rt_se))
-	    && list_empty(array->xqueue + rt_se_prio(rt_se)))
+	if (list_empty(array->queue + rt_se_prio(rt_se)))
 		__clear_bit(rt_se_prio(rt_se), array->bitmap);
 
 	dec_rt_tasks(rt_se, rt_rq);
@@ -666,11 +664,6 @@ static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int sleep)
 /*
  * Put task to the end of the run list without the overhead of dequeue
  * followed by enqueue.
- *
- * Note: We always enqueue the task to the shared-queue, regardless of its
- * previous position w.r.t. exclusive vs shared.  This is so that exclusive RR
- * tasks fairly round-robin with all tasks on the runqueue, not just other
- * exclusive tasks.
  */
 static
 void requeue_rt_entity(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
@@ -678,7 +671,7 @@ void requeue_rt_entity(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
 	struct rt_prio_array *array = &rt_rq->active;
 
 	list_del_init(&rt_se->run_list);
-	list_add_tail(&rt_se->run_list, array->squeue + rt_se_prio(rt_se));
+	list_add_tail(&rt_se->run_list, array->queue + rt_se_prio(rt_se));
 }
 
 static void requeue_task_rt(struct rq *rq, struct task_struct *p)
@@ -736,9 +729,6 @@ static int select_task_rq_rt(struct task_struct *p, int sync)
 }
 #endif /* CONFIG_SMP */
 
-static struct sched_rt_entity *pick_next_rt_entity(struct rq *rq,
-						   struct rt_rq *rt_rq);
-
 /*
  * Preempt the current task with a newly woken task if needed:
  */
@@ -764,8 +754,7 @@ static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p)
 	 */
 	if((p->prio == rq->curr->prio)
 	   && p->rt.nr_cpus_allowed == 1
-	   && rq->curr->rt.nr_cpus_allowed != 1
-	   && pick_next_rt_entity(rq, &rq->rt) != &rq->curr->rt) {
+	   && rq->curr->rt.nr_cpus_allowed != 1) {
 		cpumask_t mask;
 
 		if (cpupri_find(&rq->rd->cpupri, rq->curr, &mask))
@@ -789,15 +778,8 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rq *rq,
 	idx = sched_find_first_bit(array->bitmap);
 	BUG_ON(idx >= MAX_RT_PRIO);
 
-	queue = array->xqueue + idx;
-	if (!list_empty(queue))
-		next = list_entry(queue->next, struct sched_rt_entity,
-				  run_list);
-	else {
-		queue = array->squeue + idx;
-		next = list_entry(queue->next, struct sched_rt_entity,
-				  run_list);
-	}
+	queue = array->queue + idx;
+	next = list_entry(queue->next, struct sched_rt_entity, run_list);
 
 	return next;
 }
@@ -867,7 +849,7 @@ static struct task_struct *pick_next_highest_task_rt(struct rq *rq, int cpu)
 			continue;
 		if (next && next->prio < idx)
 			continue;
-		list_for_each_entry(rt_se, array->squeue + idx, run_list) {
+		list_for_each_entry(rt_se, array->queue + idx, run_list) {
 			struct task_struct *p = rt_task_of(rt_se);
 			if (pick_rt_task(rq, p, cpu)) {
 				next = p;
@@ -1249,14 +1231,6 @@ static void set_cpus_allowed_rt(struct task_struct *p,
 		}
 
 		update_rt_migration(rq);
-
-		if (unlikely(weight == 1 || p->rt.nr_cpus_allowed == 1))
-			/*
-			 * If either the new or old weight is a "1", we need
-			 * to requeue to properly move between shared and
-			 * exclusive queues.
-			 */
-			requeue_task_rt(rq, p);
 	}
 
 	p->cpus_allowed    = *new_mask;


---


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones"
  2008-06-10 22:58 [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones" Dmitry Adamushko
@ 2008-06-11  8:53 ` Peter Zijlstra
  2008-06-11 10:05   ` Dmitry Adamushko
  2008-06-18 10:44 ` Ingo Molnar
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2008-06-11  8:53 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: Gregory Haskins, Steven Rostedt, Ingo Molnar, Thomas Gleixner,
	linux-kernel

On Wed, 2008-06-11 at 00:58 +0200, Dmitry Adamushko wrote:
> Hi Gregory,
> 
> 
> regarding this commit: 45c01e824991b2dd0a332e19efc4901acb31209f
> 
> 
> I think we can do it simpler. Please take a look at the patch below.
> 
> Instead of having 2 separate arrays (which is + ~800 bytes on x86_32 and twice so on x86_64),
> let's add "exclusive" (the ones that are bound to this CPU) tasks to the head of the queue
> and "shared" ones -- to the end.
> 
> In case of a few newly woken up "exclusive" tasks, they are 'stacked' (not queued as now), meaning that
> a task {i+1} is being placed in front of the previously woken up task {i}. But I don't think that
> this behavior may cause any realistic problems.

Doesn't this violate POSIX ?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones"
  2008-06-11  8:53 ` Peter Zijlstra
@ 2008-06-11 10:05   ` Dmitry Adamushko
  2008-06-13 13:08     ` Peter Zijlstra
  2008-06-16 14:26     ` Gregory Haskins
  0 siblings, 2 replies; 17+ messages in thread
From: Dmitry Adamushko @ 2008-06-11 10:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gregory Haskins, Steven Rostedt, Ingo Molnar, Thomas Gleixner,
	linux-kernel

2008/6/11 Peter Zijlstra <a.p.zijlstra@chello.nl>:
> On Wed, 2008-06-11 at 00:58 +0200, Dmitry Adamushko wrote:
>> Hi Gregory,
>>
>>
>> regarding this commit: 45c01e824991b2dd0a332e19efc4901acb31209f
>>
>>
>> I think we can do it simpler. Please take a look at the patch below.
>>
>> Instead of having 2 separate arrays (which is + ~800 bytes on x86_32 and twice so on x86_64),
>> let's add "exclusive" (the ones that are bound to this CPU) tasks to the head of the queue
>> and "shared" ones -- to the end.
>>
>> In case of a few newly woken up "exclusive" tasks, they are 'stacked' (not queued as now), meaning that
>> a task {i+1} is being placed in front of the previously woken up task {i}. But I don't think that
>> this behavior may cause any realistic problems.
>
> Doesn't this violate POSIX ?
>

If so, then the idea of "prioritize non-migratable tasks over
migratable ones" violates it, not just an artefact of this particular
implementation.

No matter which implementation is used, we have a situation when a
woken-up single-CPU-bound task (let's call it 'p') can preempt a
current task with effects as follows:

- 'current' is not guaranteed to get another CPU;

- there might have been other pending tasks (of equal prio) on this
queue. As a result, 'p' starts running before them violating currently
used (explicitly requested by POSIX?) round-robin behavior.

Anyway, the original (and my rework) algorithm is somewhat
sub-optimal. It considers only "p->rt.nr_cpus_allowed == 1", while the
ideal algorithm would behave as follows:

1) we get a newly woken-up task on CPU_i;

2) this task have a number (!= 1) of allowed CPUs (which is a sub-set
of all) _but_ all of them are busy at the moment serving high prio
tasks;

3) there are no other pending tasks on the queue (of equal prio) that
are in the same condition as 'p' (that's they can't be migrated to
other CPUs at the moment and to wait for CPU_i is the 'best' effort
for them);

4) 'current' is guaranteed to start running immediatelly on another
CPU (apparently, its affinity allows CPUs that can be used neither by
'p', nor by tasks from (3) ), should we preempt it now.

5) then let 'p' preempt 'current'.

We do cause some 'troubles' (in a way of 'cache' efficienty) for the
'current' but probably we win CPU-time-distribution-wise.

Note, hard real-time tasks would not like to find themselves in the
aforemnetioned situation. But then, one probably should have initially
taken care of assigning different prios or better off, isolating
dedicated CPUs for some tasks wrt how much CPU-time they
consume/latency they expect.

All in all, I think that the initial implementation is sub-optimal in
any case and it's not worth introducing additional overhead (another
priority queue array) to address this issue (which is kind of a corner
case).

We may just consider dropping this idea completely.
(my 0.02$)

-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones"
  2008-06-11 10:05   ` Dmitry Adamushko
@ 2008-06-13 13:08     ` Peter Zijlstra
  2008-06-16 14:26     ` Gregory Haskins
  1 sibling, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2008-06-13 13:08 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: Gregory Haskins, Steven Rostedt, Ingo Molnar, Thomas Gleixner,
	linux-kernel

On Wed, 2008-06-11 at 12:05 +0200, Dmitry Adamushko wrote:
> 2008/6/11 Peter Zijlstra <a.p.zijlstra@chello.nl>:
> > On Wed, 2008-06-11 at 00:58 +0200, Dmitry Adamushko wrote:
> >> Hi Gregory,
> >>
> >>
> >> regarding this commit: 45c01e824991b2dd0a332e19efc4901acb31209f
> >>
> >>
> >> I think we can do it simpler. Please take a look at the patch below.
> >>
> >> Instead of having 2 separate arrays (which is + ~800 bytes on x86_32 and twice so on x86_64),
> >> let's add "exclusive" (the ones that are bound to this CPU) tasks to the head of the queue
> >> and "shared" ones -- to the end.
> >>
> >> In case of a few newly woken up "exclusive" tasks, they are 'stacked' (not queued as now), meaning that
> >> a task {i+1} is being placed in front of the previously woken up task {i}. But I don't think that
> >> this behavior may cause any realistic problems.
> >
> > Doesn't this violate POSIX ?
> >
> 
> If so, then the idea of "prioritize non-migratable tasks over
> migratable ones" violates it, not just an artefact of this particular
> implementation.
> 
> No matter which implementation is used, we have a situation when a
> woken-up single-CPU-bound task (let's call it 'p') can preempt a
> current task with effects as follows:
> 
> - 'current' is not guaranteed to get another CPU;
> 
> - there might have been other pending tasks (of equal prio) on this
> queue. As a result, 'p' starts running before them violating currently
> used (explicitly requested by POSIX?) round-robin behavior.

> We may just consider dropping this idea completely.
> (my 0.02$)

If we cannot guarantee POSIX compliant scheduling I think we should get
rid of this.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritize non-migratabletasks over migratable ones"
       [not found] <48524EDA0200005A00038BA4@sinclair.provo.novell.com>
@ 2008-06-13 14:41 ` Gregory Haskins
  0 siblings, 0 replies; 17+ messages in thread
From: Gregory Haskins @ 2008-06-13 14:41 UTC (permalink / raw)
  To: a.p.zijlstra, dmitry.adamushko; +Cc: mingo, rostedt, tglx, linux-kernel

Sorry for topposting....still on vaca via blackberry..

I am in favor of dropping too...this patch was really just an RFC in response to Dmity's observation....its probably too premature to include this type of thing, if ever at all..

-Greg 
-----Original Message-----
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
To: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Gregory Haskins <GHaskins@novell.com>
Cc:  <linux-kernel@vger.kernel.org>

Sent: 6/13/2008 7:08:04 AM
Subject: Re: [sched-devel, patch-rfc] rework of "prioritize non-migratabletasks over migratable ones"

On Wed, 2008-06-11 at 12:05 +0200, Dmitry Adamushko wrote:
> 2008/6/11 Peter Zijlstra <a.p.zijlstra@chello.nl>:
> > On Wed, 2008-06-11 at 00:58 +0200, Dmitry Adamushko wrote:
> >> Hi Gregory,
> >>
> >>
> >> regarding this commit: 45c01e824991b2dd0a332e19efc4901acb31209f
> >>
> >>
> >> I think we can do it simpler. Please take a look at the patch below.
> >>
> >> Instead of having 2 separate arrays (which is + ~800 bytes on x86_32 and twice so on x86_64),
> >> let's add "exclusive" (the ones that are bound to this CPU) tasks to the head of the queue
> >> and "shared" ones -- to the end.
> >>
> >> In case of a few newly woken up "exclusive" tasks, they are 'stacked' (not queued as now), meaning that
> >> a task {i+1} is being placed in front of the previously woken up task {i}. But I don't think that
> >> this behavior may cause any realistic problems.
> >
> > Doesn't this violate POSIX ?
> >
> 
> If so, then the idea of "prioritize non-migratable tasks over
> migratable ones" violates it, not just an artefact of this particular
> implementation.
> 
> No matter which implementation is used, we have a situation when a
> woken-up single-CPU-bound task (let's call it 'p') can preempt a
> current task with effects as follows:
> 
> - 'current' is not guaranteed to get another CPU;
> 
> - there might have been other pending tasks (of equal prio) on this
> queue. As a result, 'p' starts running before them violating currently
> used (explicitly requested by POSIX?) round-robin behavior.

> We may just consider dropping this idea completely.
> (my 0.02$)

If we cannot guarantee POSIX compliant scheduling I think we should get
rid of this.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones"
  2008-06-11 10:05   ` Dmitry Adamushko
  2008-06-13 13:08     ` Peter Zijlstra
@ 2008-06-16 14:26     ` Gregory Haskins
  2008-06-16 17:59       ` Dmitry Adamushko
  1 sibling, 1 reply; 17+ messages in thread
From: Gregory Haskins @ 2008-06-16 14:26 UTC (permalink / raw)
  To: Peter Zijlstra, Dmitry Adamushko
  Cc: Ingo Molnar, Steven Rostedt, Thomas Gleixner, linux-kernel

Hi Dmitry,
  Just catching up on mail.  I haven't had a chance to review your patch, so I am only responding to the comments in this thread.

>>> On Wed, Jun 11, 2008 at  6:05 AM, in message
<b647ffbd0806110305p2be38608iadc1d2e91e3be57a@mail.gmail.com>, "Dmitry
Adamushko" <dmitry.adamushko@gmail.com> wrote: 
> 2008/6/11 Peter Zijlstra <a.p.zijlstra@chello.nl>:
>> On Wed, 2008-06-11 at 00:58 +0200, Dmitry Adamushko wrote:
>>> Hi Gregory,
>>>
>>>
>>> regarding this commit: 45c01e824991b2dd0a332e19efc4901acb31209f
>>>
>>>
>>> I think we can do it simpler. Please take a look at the patch below.
>>>
>>> Instead of having 2 separate arrays (which is + ~800 bytes on x86_32 and 
> twice so on x86_64),
>>> let's add "exclusive" (the ones that are bound to this CPU) tasks to the 
> head of the queue
>>> and "shared" ones -- to the end.
>>>
>>> In case of a few newly woken up "exclusive" tasks, they are 'stacked' (not 
> queued as now), meaning that
>>> a task {i+1} is being placed in front of the previously woken up task {i}. 
> But I don't think that
>>> this behavior may cause any realistic problems.
>>
>> Doesn't this violate POSIX ?
>>
> 
> If so, then the idea of "prioritize non-migratable tasks over
> migratable ones" violates it, not just an artefact of this particular
> implementation.

This isn't entirely accurate though.  Your proposal does alter the behavior from the original patch.  In fact, I put in the xqueue/squeue concept specifically to *avoid* the condition that you are creating.  I think this was the point Peter was making.

However, to your point: I can agree that either implementation creates the possibility for creating strange scheduling artifacts.  And I think the point you are making is that once we have the artifact, whats another one? ;)

> 
> No matter which implementation is used, we have a situation when a
> woken-up single-CPU-bound task (let's call it 'p') can preempt a
> current task with effects as follows:
> 
> - 'current' is not guaranteed to get another CPU;

Right.  I admit this is one of the sticking points that has potential for creating artifacts.  [1] The downside is there is no real way to avoid this.  You can't atomically guarantee a position on another core until current has been preempted.  So its always going to be a best-effort assessment of the state of the system prior to the preemption.  Not ideal.

Fortunately, this is all w.r.t. equal prio tasks.  And as you state below, if this was an issue the designer should have placed the tasks at different prios.  This fact is what led me to ultimately feel somewhat comfortable with the concept.  [2] A related issue is the potential to have an unbounded task be ping-ponged around (ehem) unbounded. ;)

> 
> - there might have been other pending tasks (of equal prio) on this
> queue. As a result, 'p' starts running before them violating currently
> used (explicitly requested by POSIX?) round-robin behavior.

Yes, except your patch also allows other bound tasks to preempt each other, which adds an additional artifact that is not present in mine nor is it strictly necessary.  [3] You could probably work around this by having the enqueue_task_rt() seek out the first non-bound task position and insert there, instead of HEAD.  In fact I had contemplated doing exactly that, but then decided the memory/speed tradeoff with the xqueue/squeue was better.  I am not against going the other way if the memory cost of the xqueue is too great.

> 
> Anyway, the original (and my rework) algorithm is somewhat
> sub-optimal. It considers only "p->rt.nr_cpus_allowed == 1", while the
> ideal algorithm would behave as follows:
> 
> 1) we get a newly woken-up task on CPU_i;
> 
> 2) this task have a number (!= 1) of allowed CPUs (which is a sub-set
> of all) _but_ all of them are busy at the moment serving high prio
> tasks;

I don't understand the significance of the stipulation that its a "sub-set of all".  Can you elaborate?  Or did you mean to say "it *may* be a sub-set of all"?

> 
> 3) there are no other pending tasks on the queue (of equal prio) that
> are in the same condition as 'p' (that's they can't be migrated to
> other CPUs at the moment and to wait for CPU_i is the 'best' effort
> for them);

Note that this is already accounted for with xqueue/squeue, or with my proposed modifications [3] above

> 
> 4) 'current' is guaranteed to start running immediatelly on another
> CPU (apparently, its affinity allows CPUs that can be used neither by
> 'p', nor by tasks from (3) ), should we preempt it now.

As per [1], we unfortunately can never guarantee this.

> 
> 5) then let 'p' preempt 'current'.

I think everything except your (4) is achieved in my patch, and could be in yours with the slight modification proposed in [3].  So the real question is whether the lack of your point (4) or the issue I mentioned in [2] is acceptable or not.

> 
> We do cause some 'troubles' (in a way of 'cache' efficienty) for the
> 'current' but probably we win CPU-time-distribution-wise.

True, but this in essence is what load-balancing is all about.

> 
> Note, hard real-time tasks would not like to find themselves in the
> aforemnetioned situation. But then, one probably should have initially
> taken care of assigning different prios or better off, isolating
> dedicated CPUs for some tasks wrt how much CPU-time they
> consume/latency they expect.

Right.  The key is that bets are off when considering equal-prio tasks.  Wake-up order, etc, can create similar concerns.  That is what lets this concept be at least semi acceptable.

> 
> 
> All in all, I think that the initial implementation is sub-optimal

Ouch :)

> in
> any case and it's not worth introducing additional overhead (another
> priority queue array) to address this issue (which is kind of a corner
> case).
> 
> We may just consider dropping this idea completely.
> (my 0.02$)

And based on all this (and as I previously stated) I agree with your assessment that perhaps we should just drop this concept outright.

Regards,
-Greg

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones"
  2008-06-16 14:26     ` Gregory Haskins
@ 2008-06-16 17:59       ` Dmitry Adamushko
  2008-06-16 18:44         ` Gregory Haskins
  2008-06-16 19:17         ` Peter Zijlstra
  0 siblings, 2 replies; 17+ messages in thread
From: Dmitry Adamushko @ 2008-06-16 17:59 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Thomas Gleixner,
	linux-kernel

Hi Gregory,

> [ ... ]
>>> Doesn't this violate POSIX ?
>>>
>>
>> If so, then the idea of "prioritize non-migratable tasks over
>> migratable ones" violates it, not just an artefact of this particular
>> implementation.
>
> This isn't entirely accurate though.  Your proposal does alter the behavior from the original patch.  In fact, I put in the xqueue/squeue concept specifically to *avoid* the condition that you are creating.  I think this was the point Peter was making.
>
> However, to your point: I can agree that either implementation creates the possibility for creating strange scheduling artifacts.  And I think the point you are making is that once we have the artifact, whats another one? ;)

My guess was that Peter points out at the possibility of task_2
getting ahead of task_1, despite task_1's longer presence on the
run-queue and both tasks having equal priority.

IOW., it's not anymore a round-robin distribution for equal-prio
tasks (if that's how POSIX requires it... yeah, I'm speculating here
:-) but "migratable" vs. "non-migratable" in addition.

With your implementation, a "non-migratable" task gets ahead of
"migratable" (and possibly, some "migratable" that are on the queue
but for some reason have not been pulled/pushed yet).

With mine, one "non-migratable" can preempt another "non-migratable".

Partly, it could be fixed by : add a "non-migratable" task to the head
of the queue _only_ if the "current" task (or better -- if a
head-of-the-queue task) is "migratable".

One way or another, we have different aritifacts (and mine have likely
more) but conceptually, both "violates" POSIX if a strict round-robin
scheduling is required.

By "strict round-robin scheduling" I mean something as follows:

a woken-up task_N can get a CPU _only_ if none of other currently
pending tasks in the system (that's on all cpus and, sure, _RR and
_FIFO and of equal prio) can run on this CPU.

e.g. the following behavior of the load-balancer would be wrong (migth
have been possible with the 'old' scheduler/ load-balancer):

- task_1 is placed on CPU_N upon a wake-up with wake_idle();

- however, there were task_2 (currently running) and task_3 (pending) on CPU_M;

- task_3 was able to run on CPU_N _but_ has not been considered by the
load-balancer yet (let's say CPU_M became free just a moment ago, not
enough for the load-balancer to notice it).

i.e. task_1 gets ahead of task_3, although both have equal prios and
both are able to run on this cpu.

yeah, enough speculations :-)

>
>> Anyway, the original (and my rework) algorithm is somewhat
>> sub-optimal. It considers only "p->rt.nr_cpus_allowed == 1", while the
>> ideal algorithm would behave as follows:
>>
>> 1) we get a newly woken-up task on CPU_i;
>>
>> 2) this task have a number (!= 1) of allowed CPUs (which is a sub-set
>> of all) _but_ all of them are busy at the moment serving high prio
>> tasks;
>
> I don't understand the significance of the stipulation that its a "sub-set of all".  Can you elaborate?  Or did you mean to say "it *may* be a sub-set of all"?

it should have been "rt_se->nr_cpus_allowed not _necessarily_ equal to
1", meaning rt_se->nr_cpus_allowed is a sub-set of ALL_CPUS, not
strictly a single cpu.

We consider a task to be "bound" if

"rt_se->nr_cpus_allowed == 1"

i.e. a task needs to be bound to a _single_ cpu.

what I mean is e.g.

say, there are 4 cpus and cpu_0 and cpu_1 are busy serving task_0 and task_1;

there are also task_3 and task_4 which have equal prios _but_ lower
(worse) than those of task_0 and task_1.

task_3 can run on _any_ cpu but task_4 can run only on 2 of them --
cpu_1 and cpu_3.

task_3 is running on cpu_3

task_4 is woken up... with your/mine implementation it won't get cpu_3
as it's _not_ considered "bound" (rt_se->nr_cpus_allowed == 2)

while in fact, task_4 still might have been preempted and moved onto cpu_4.

that's why I called the current (mine is similar)
definition/implementation of "bound" (or "non-migratable" in my terms
above) sub-optimal.

hope it's clear now (and none of the important details are escaping me) :-)

>> All in all, I think that the initial implementation is sub-optimal
>
> Ouch :)

ok, mine is no way better :-)

repeating myself, they address just a specific case
(rt_se->nr_cpus_allowed == 1) of the broader "problem" which I
describe above.

>
> Regards,
> -Greg
>

-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones"
  2008-06-16 17:59       ` Dmitry Adamushko
@ 2008-06-16 18:44         ` Gregory Haskins
  2008-06-16 19:17         ` Peter Zijlstra
  1 sibling, 0 replies; 17+ messages in thread
From: Gregory Haskins @ 2008-06-16 18:44 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Thomas Gleixner,
	linux-kernel

>>> On Mon, Jun 16, 2008 at  1:59 PM, in message
<b647ffbd0806161059q7048ea40id0f4235ab19d6ca0@mail.gmail.com>, "Dmitry
Adamushko" <dmitry.adamushko@gmail.com> wrote: 

[snip]

> that's why I called the current (mine is similar)
> definition/implementation of "bound" (or "non-migratable" in my terms
> above) sub-optimal.
> 
> hope it's clear now (and none of the important details are escaping me) :-)

Indeed.  I understand your point now.  To summarize: nr_cpus_allowed is not sufficient to determine if this "load-balancing" operation should be attempted.  Rather, we need to consider whether task_4->cpus_allowed results in no preemption when looking at [1,3], but task_3->cpus_allowed shows the cpu_4 could be preempted.

This problem is easily and efficiently solvable,  and I think your "HEAD" approach is better than my xqueue/squeue idea.  Now the question remains: "Is the whole concept worth it, or should we drop it?"

If we wanted to fix this, we could run both current and the wakee into cpupri for the case where wakee->prio == rq->highest_prio (*).  If wakee comes back with 0 targets but rq->HEAD (*) comes back with potential targets, then wakee should insert to the HEAD and preempt.  Otherwise, tail-insert as always.  (We will still race against the potential targets becoming unavailable to accept current, but as I indicated in the last message, this is unavoidable).

(*) Note that its important to use rq->highest_prio/HEAD and not rq->current so that we don't look at the state of the queue while in the process of rescheduling

Thanks for clarifying, Dmitry.  I think you are right.

Regards,
-Greg

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones"
  2008-06-16 17:59       ` Dmitry Adamushko
  2008-06-16 18:44         ` Gregory Haskins
@ 2008-06-16 19:17         ` Peter Zijlstra
  2008-06-16 19:54           ` [sched-devel, patch-rfc] rework of "prioritize non-migratabletasks " Gregory Haskins
  2008-07-01 10:46           ` [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks " Dmitry Adamushko
  1 sibling, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2008-06-16 19:17 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: Gregory Haskins, Ingo Molnar, Steven Rostedt, Thomas Gleixner,
	linux-kernel

On Mon, 2008-06-16 at 19:59 +0200, Dmitry Adamushko wrote:

> One way or another, we have different aritifacts (and mine have likely
> more) but conceptually, both "violates" POSIX if a strict round-robin
> scheduling is required.

http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_08.html#tag_02_08_04_01

Is quite strict on what FIFO should do, and I know of two points where
we deviate and should work to match.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritize non-migratabletasks over migratable ones"
  2008-06-16 19:17         ` Peter Zijlstra
@ 2008-06-16 19:54           ` Gregory Haskins
  2008-06-18 10:39             ` Ingo Molnar
  2008-07-01 10:46           ` [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks " Dmitry Adamushko
  1 sibling, 1 reply; 17+ messages in thread
From: Gregory Haskins @ 2008-06-16 19:54 UTC (permalink / raw)
  To: Peter Zijlstra, Dmitry Adamushko
  Cc: Ingo Molnar, Steven Rostedt, Thomas Gleixner, linux-kernel

>>> On Mon, Jun 16, 2008 at  3:17 PM, in message
<1213643862.16944.142.camel@twins>, Peter Zijlstra <a.p.zijlstra@chello.nl>
wrote: 
> On Mon, 2008-06-16 at 19:59 +0200, Dmitry Adamushko wrote:
> 
>> One way or another, we have different aritifacts (and mine have likely
>> more) but conceptually, both "violates" POSIX if a strict round-robin
>> scheduling is required.
> 
> http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_08.html#t
> ag_02_08_04_01
> 
> Is quite strict on what FIFO should do, and I know of two points where
> we deviate and should work to match.

Thanks for the link, Peter.  When you read that, its pretty clear that this whole concept violates the standard.  Its probably best to just revert the patch and be done with it.

Regards,
-Greg


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritize non-migratabletasks over migratable ones"
  2008-06-16 19:54           ` [sched-devel, patch-rfc] rework of "prioritize non-migratabletasks " Gregory Haskins
@ 2008-06-18 10:39             ` Ingo Molnar
  2008-06-18 10:47               ` Peter Zijlstra
                                 ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Ingo Molnar @ 2008-06-18 10:39 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Peter Zijlstra, Dmitry Adamushko, Steven Rostedt, Thomas Gleixner,
	linux-kernel


* Gregory Haskins <ghaskins@novell.com> wrote:

> >>> On Mon, Jun 16, 2008 at  3:17 PM, in message
> <1213643862.16944.142.camel@twins>, Peter Zijlstra <a.p.zijlstra@chello.nl>
> wrote: 
> > On Mon, 2008-06-16 at 19:59 +0200, Dmitry Adamushko wrote:
> > 
> >> One way or another, we have different aritifacts (and mine have likely
> >> more) but conceptually, both "violates" POSIX if a strict round-robin
> >> scheduling is required.
> > 
> > http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_08.html#t
> > ag_02_08_04_01
> > 
> > Is quite strict on what FIFO should do, and I know of two points where
> > we deviate and should work to match.
> 
> Thanks for the link, Peter.  When you read that, its pretty clear that 
> this whole concept violates the standard.  Its probably best to just 
> revert the patch and be done with it.

no, there's no spec violation here - the spec is silent on SMP issues.

the spec should not be read to force a global runqueue for RT tasks. 
That would be silly beyond imagination.

so ... lets apply Dmitry's nice simplification, hm?

	Ingo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones"
  2008-06-10 22:58 [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones" Dmitry Adamushko
  2008-06-11  8:53 ` Peter Zijlstra
@ 2008-06-18 10:44 ` Ingo Molnar
  1 sibling, 0 replies; 17+ messages in thread
From: Ingo Molnar @ 2008-06-18 10:44 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: Gregory Haskins, Steven Rostedt, Thomas Gleixner, Peter Zijlstra,
	linux-kernel


* Dmitry Adamushko <dmitry.adamushko@gmail.com> wrote:

> Hi Gregory,
> 
> regarding this commit: 45c01e824991b2dd0a332e19efc4901acb31209f
> 
> I think we can do it simpler. Please take a look at the patch below.

applied your simplification to tip/sched-devel for testing. It still 
maintains strict FIFO queueing on uniprocessors, correct? (which is 
pretty much the only standards question we care about)

	Ingo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritize non-migratabletasks over migratable ones"
  2008-06-18 10:39             ` Ingo Molnar
@ 2008-06-18 10:47               ` Peter Zijlstra
  2008-06-18 11:52               ` [sched-devel, patch-rfc] rework of "prioritizenon-migratabletasks " Gregory Haskins
  2008-06-18 11:58               ` [sched-devel, patch-rfc] rework of "prioritize non-migratabletasks " Dmitry Adamushko
  2 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2008-06-18 10:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gregory Haskins, Dmitry Adamushko, Steven Rostedt,
	Thomas Gleixner, linux-kernel

On Wed, 2008-06-18 at 12:39 +0200, Ingo Molnar wrote:
> * Gregory Haskins <ghaskins@novell.com> wrote:
> 
> > >>> On Mon, Jun 16, 2008 at  3:17 PM, in message
> > <1213643862.16944.142.camel@twins>, Peter Zijlstra <a.p.zijlstra@chello.nl>
> > wrote: 
> > > On Mon, 2008-06-16 at 19:59 +0200, Dmitry Adamushko wrote:
> > > 
> > >> One way or another, we have different aritifacts (and mine have likely
> > >> more) but conceptually, both "violates" POSIX if a strict round-robin
> > >> scheduling is required.
> > > 
> > > http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_08.html#t
> > > ag_02_08_04_01
> > > 
> > > Is quite strict on what FIFO should do, and I know of two points where
> > > we deviate and should work to match.
> > 
> > Thanks for the link, Peter.  When you read that, its pretty clear that 
> > this whole concept violates the standard.  Its probably best to just 
> > revert the patch and be done with it.
> 
> no, there's no spec violation here - the spec is silent on SMP issues.
> 
> the spec should not be read to force a global runqueue for RT tasks. 
> That would be silly beyond imagination.

Sadly, some people do read it like that.

> so ... lets apply Dmitry's nice simplification, hm?

As long as it doesn't wreck the per RQ queue model and only affects the
SMP interaction that would be acceptable I guess.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritizenon-migratabletasks over migratable ones"
  2008-06-18 10:39             ` Ingo Molnar
  2008-06-18 10:47               ` Peter Zijlstra
@ 2008-06-18 11:52               ` Gregory Haskins
  2008-06-18 11:58               ` [sched-devel, patch-rfc] rework of "prioritize non-migratabletasks " Dmitry Adamushko
  2 siblings, 0 replies; 17+ messages in thread
From: Gregory Haskins @ 2008-06-18 11:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Dmitry Adamushko, Steven Rostedt, Thomas Gleixner,
	linux-kernel

>>> On Wed, Jun 18, 2008 at  6:39 AM, in message <20080618103919.GH15255@elte.hu>,
Ingo Molnar <mingo@elte.hu> wrote: 

> * Gregory Haskins <ghaskins@novell.com> wrote:
> 
>> >>> On Mon, Jun 16, 2008 at  3:17 PM, in message
>> <1213643862.16944.142.camel@twins>, Peter Zijlstra <a.p.zijlstra@chello.nl>
>> wrote: 
>> > On Mon, 2008-06-16 at 19:59 +0200, Dmitry Adamushko wrote:
>> > 
>> >> One way or another, we have different aritifacts (and mine have likely
>> >> more) but conceptually, both "violates" POSIX if a strict round-robin
>> >> scheduling is required.
>> > 
>> > 
> http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_08.html#t
>> > ag_02_08_04_01
>> > 
>> > Is quite strict on what FIFO should do, and I know of two points where
>> > we deviate and should work to match.
>> 
>> Thanks for the link, Peter.  When you read that, its pretty clear that 
>> this whole concept violates the standard.  Its probably best to just 
>> revert the patch and be done with it.
> 
> no, there's no spec violation here - the spec is silent on SMP issues.
> 
> the spec should not be read to force a global runqueue for RT tasks. 
> That would be silly beyond imagination.
> 
> so ... lets apply Dmitry's nice simplification, hm?

Hmm...I guess that is a good way to look at it.  Sounds good, thanks!

Perhaps I will write up a patch against his that fixes that suboptimal detection problem that he highlighted, afterall

Thanks,
-Greg



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritize non-migratabletasks over migratable ones"
  2008-06-18 10:39             ` Ingo Molnar
  2008-06-18 10:47               ` Peter Zijlstra
  2008-06-18 11:52               ` [sched-devel, patch-rfc] rework of "prioritizenon-migratabletasks " Gregory Haskins
@ 2008-06-18 11:58               ` Dmitry Adamushko
  2 siblings, 0 replies; 17+ messages in thread
From: Dmitry Adamushko @ 2008-06-18 11:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gregory Haskins, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	linux-kernel

2008/6/18 Ingo Molnar <mingo@elte.hu>:
>
> * Gregory Haskins <ghaskins@novell.com> wrote:
>
>> >>> On Mon, Jun 16, 2008 at  3:17 PM, in message
>> <1213643862.16944.142.camel@twins>, Peter Zijlstra <a.p.zijlstra@chello.nl>
>> wrote:
>> > On Mon, 2008-06-16 at 19:59 +0200, Dmitry Adamushko wrote:
>> >
>> >> One way or another, we have different aritifacts (and mine have likely
>> >> more) but conceptually, both "violates" POSIX if a strict round-robin
>> >> scheduling is required.
>> >
>> > http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_08.html#t
>> > ag_02_08_04_01
>> >
>> > Is quite strict on what FIFO should do, and I know of two points where
>> > we deviate and should work to match.
>>
>> Thanks for the link, Peter.  When you read that, its pretty clear that
>> this whole concept violates the standard.  Its probably best to just
>> revert the patch and be done with it.
>
> no, there's no spec violation here - the spec is silent on SMP issues.
>
> the spec should not be read to force a global runqueue for RT tasks.
> That would be silly beyond imagination.

I don't think it would actually make sense to specify behavior in
terms of per-cpu queues. This is an implementation detail. So the
_global_ (system-wise) queue makes more sense  (at least to me :-) if
one wants to define user-visible scheduling behavior.

e.g. as follows:
(http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_08.html#tag_02_08_04_01)

[quote]
"Threads scheduled under this policy are chosen from a thread list
that is ordered by the time its threads have been on the list without
being executed;"
[/quote]

>From the POV of users, I think only the "global" queue would make
sense. Otherwise, the sentence is rather useless.

[quote]
"generally, the head of the list is the thread that has been on the
list the longest time, and the tail is the thread that has been on the
list the shortest time."
[/quote]

OTOH, some differences can be justified by this "generally" word :-)

so it might be understood as something like this:

task_N can get a CPU only if all other pending runnable tasks that
have been enqueued earlier than task_N (of the same class and prio)
can _not_ start running on this CPU (e.g. due to their affinity).

Note, that our case is somewhat more subtle. We preempt a currently
running task in the hope that it can start running on another CPU
immediately, but we can't guarantee it.

Anyway, we are likely to win wrt cpu time distribution, but this fact
of 'preemption' is a new 'artifact' indeed.

if POSIX can be understood as : a SCHED_FIFO/RR task can be preempted
_only_ by a task with a higher prio and has to consume cpu-time as
long as it wants (FIFO) or till its timeslice expires (RR) , then we
violate it with this new behavior.

>
>        Ingo
>

-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones"
  2008-06-16 19:17         ` Peter Zijlstra
  2008-06-16 19:54           ` [sched-devel, patch-rfc] rework of "prioritize non-migratabletasks " Gregory Haskins
@ 2008-07-01 10:46           ` Dmitry Adamushko
  2008-07-15 12:48             ` Peter Zijlstra
  1 sibling, 1 reply; 17+ messages in thread
From: Dmitry Adamushko @ 2008-07-01 10:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gregory Haskins, Ingo Molnar, Steven Rostedt, Thomas Gleixner,
	linux-kernel

2008/6/16 Peter Zijlstra <a.p.zijlstra@chello.nl>:
> On Mon, 2008-06-16 at 19:59 +0200, Dmitry Adamushko wrote:
>
>> One way or another, we have different aritifacts (and mine have likely
>> more) but conceptually, both "violates" POSIX if a strict round-robin
>> scheduling is required.
>
> http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_08.html#tag_02_08_04_01
>
> Is quite strict on what FIFO should do, and I know of two points where
> we deviate and should work to match.
>

btw., rt group scheduling seems to well, slightly wreck this (per-rq)
FIFO model as well.

say, group_A has N SCHED_FIFO tasks of equal prio. So far so good,
they all run strictly one after another.

Now group_B gets task_S. On a group layer, group_B gets enqueued after group_A.

This changes when a current task (that belongs to group_A)
relinquishes a CPU: dequeue_stack -> __enqueue_rt_entity() will place
group_A in the tail of its list.

So the next task to run is task_S, although group_A migth have plenty
of tasks of the same prio that were enqueued ealrier.

We can't get a strict FIFO ordering with this pure tree-like hierarchy.

btw #2,

Gregory, our new modification also doesn't work nicely with group-scheduling.

We may place a task in the head of its queue, yes. But its group will
still remain where it was.

rt_se->nr_cpus_allowed just has no adequat sense for groups and
__enqueue_rt_entity() always places a group at the tail.

IOW, even if check_preempt_curr_rt() calls resched_task() based on
analysis of the newly arrived task 'p', 'p' won't be necessarily
picked up by pick_next_task_rt(). Although, there is a way to fix it.

-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones"
  2008-07-01 10:46           ` [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks " Dmitry Adamushko
@ 2008-07-15 12:48             ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2008-07-15 12:48 UTC (permalink / raw)
  To: Dmitry Adamushko
  Cc: Gregory Haskins, Ingo Molnar, Steven Rostedt, Thomas Gleixner,
	linux-kernel

On Tue, 2008-07-01 at 12:46 +0200, Dmitry Adamushko wrote:
> 2008/6/16 Peter Zijlstra <a.p.zijlstra@chello.nl>:
> > On Mon, 2008-06-16 at 19:59 +0200, Dmitry Adamushko wrote:
> >
> >> One way or another, we have different aritifacts (and mine have likely
> >> more) but conceptually, both "violates" POSIX if a strict round-robin
> >> scheduling is required.
> >
> > http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_08.html#tag_02_08_04_01
> >
> > Is quite strict on what FIFO should do, and I know of two points where
> > we deviate and should work to match.
> >
> 
> 
> btw., rt group scheduling seems to well, slightly wreck this (per-rq)
> FIFO model as well.
> 
> say, group_A has N SCHED_FIFO tasks of equal prio. So far so good,
> they all run strictly one after another.
> 
> Now group_B gets task_S. On a group layer, group_B gets enqueued after group_A.
> 
> This changes when a current task (that belongs to group_A)
> relinquishes a CPU: dequeue_stack -> __enqueue_rt_entity() will place
> group_A in the tail of its list.
> 
> So the next task to run is task_S, although group_A migth have plenty
> of tasks of the same prio that were enqueued ealrier.
> 
> We can't get a strict FIFO ordering with this pure tree-like hierarchy.
> 
> 
> btw #2,
> 
> Gregory, our new modification also doesn't work nicely with group-scheduling.
> 
> We may place a task in the head of its queue, yes. But its group will
> still remain where it was.
> 
> rt_se->nr_cpus_allowed just has no adequat sense for groups and
> __enqueue_rt_entity() always places a group at the tail.
> 
> IOW, even if check_preempt_curr_rt() calls resched_task() based on
> analysis of the newly arrived task 'p', 'p' won't be necessarily
> picked up by pick_next_task_rt(). Although, there is a way to fix it.


Yeah, I realized this quite a while back, but 1) posix doesn't say
anything about group scheduling like this, and 2) all the rt group
scheduling stuff is still experimental :-)

I'm open to suggestion though.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2008-07-15 12:48 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-10 22:58 [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones" Dmitry Adamushko
2008-06-11  8:53 ` Peter Zijlstra
2008-06-11 10:05   ` Dmitry Adamushko
2008-06-13 13:08     ` Peter Zijlstra
2008-06-16 14:26     ` Gregory Haskins
2008-06-16 17:59       ` Dmitry Adamushko
2008-06-16 18:44         ` Gregory Haskins
2008-06-16 19:17         ` Peter Zijlstra
2008-06-16 19:54           ` [sched-devel, patch-rfc] rework of "prioritize non-migratabletasks " Gregory Haskins
2008-06-18 10:39             ` Ingo Molnar
2008-06-18 10:47               ` Peter Zijlstra
2008-06-18 11:52               ` [sched-devel, patch-rfc] rework of "prioritizenon-migratabletasks " Gregory Haskins
2008-06-18 11:58               ` [sched-devel, patch-rfc] rework of "prioritize non-migratabletasks " Dmitry Adamushko
2008-07-01 10:46           ` [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks " Dmitry Adamushko
2008-07-15 12:48             ` Peter Zijlstra
2008-06-18 10:44 ` Ingo Molnar
     [not found] <48524EDA0200005A00038BA4@sinclair.provo.novell.com>
2008-06-13 14:41 ` [sched-devel, patch-rfc] rework of "prioritize non-migratabletasks " Gregory Haskins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox