public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: highest and lowest priority job of a runqueue
       [not found] <8KLFD-G9-5@gated-at.bofh.it>
@ 2007-07-25 15:18 ` Martin Roehricht
  2007-08-02  8:58 ` Scheduling the highest priority task Martin Roehricht
  1 sibling, 0 replies; 10+ messages in thread
From: Martin Roehricht @ 2007-07-25 15:18 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo

On 07/25/2007 11:50 AM, Martin Roehricht wrote:
> I thought I might use (given a list with tmp pointers to all CPUs)
> 	rq = cpu_rq(tmp->cpu);
> 	task_load = rq->curr->load_weight;
> but this always returns 128 regardless of the fact if a task currently
> runs on that CPU or not. I guess it returns the load of the migration
> thread, but I'm not sure. I would like to migrate specific tasks
> throughout find_busiest_group().

Okay, I think I found that my assumption should be just fine and I had 
trouble with some debugging output via the show_schedstat() function.
The "rq->curr->load_weight" returns what it is supposed to return. :-)

I am still uncertain on how to resolve the specific highest or lowest 
priority job:
> I was wondering how I may retrieve
> (a) the priority/load of the highest and the lowest priority task of a
>     runqueue (in a multiprocessor system), and
> (b) the corresponding pointer to this task?

I will try something of the form (pseudocode like):

int idx;
struct list_head *head;
struct task_struct *task;

idx = sched_find_first_bit(rq->active->bitmap);
head = array->queue + idx;
task = list_entry(head, struct task_struct, run_list);

For the lowest priority task a function like "sched_find_last_bit()" 
might be useful.
Would this be a good way to succeed?

> Furthermore, is it correct, that the current migration strategy
> (move_tasks()) chooses automatically the highest priority task?

Thanks,
Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Scheduling the highest priority task
       [not found] <8KLFD-G9-5@gated-at.bofh.it>
  2007-07-25 15:18 ` highest and lowest priority job of a runqueue Martin Roehricht
@ 2007-08-02  8:58 ` Martin Roehricht
  2007-08-02 11:40   ` Ingo Molnar
  1 sibling, 1 reply; 10+ messages in thread
From: Martin Roehricht @ 2007-08-02  8:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo

Hi,

perhaps someone can give me a hint what I should consider to look for in 
order to change the ("old" 2.6.21) scheduler such that it schedules the 
highest priority task of a given runqueue.
Given a multiprocessor system I currently observe that whenever there 
are two tasks on one CPU, the lower priority one is migrated to another 
CPU. But I don't realize why this happens. From looking at the source 
code I thought it should be the highest priority one (lowest bit set in 
the runqueue's bitmap) according to
	idx = sched_find_first_bit(array->bitmap);
within move_tasks(). The idx value is then used as an index (surprise) 
to the linked list of tasks of this particular priority and one task is 
picked:
	head = array->queue + idx;
         curr = head->prev;
         tmp = list_entry(curr, struct task_struct, run_list);

Is my assumption wrong? Using printk()s within this code section makes 
the system just hang completely quite soon. The schedstats do not notify 
me immediately. So I am a bit lost on how to track down or trace the 
problem.

Can anybody confirm that my observations are correct that the scheduler 
picks the lowest priority job of a runqueue for migration?
What needs to be changed in order to pick the highest priority one?

Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Scheduling the highest priority task
  2007-08-02  8:58 ` Scheduling the highest priority task Martin Roehricht
@ 2007-08-02 11:40   ` Ingo Molnar
  2007-08-02 15:00     ` Martin Roehricht
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2007-08-02 11:40 UTC (permalink / raw)
  To: Martin Roehricht; +Cc: linux-kernel


* Martin Roehricht <ml@felicis.org> wrote:

> perhaps someone can give me a hint what I should consider to look for in 
> order to change the ("old" 2.6.21) scheduler such that it schedules the 
> highest priority task of a given runqueue.
> Given a multiprocessor system I currently observe that whenever there 
> are two tasks on one CPU, the lower priority one is migrated to another 
> CPU. But I don't realize why this happens. From looking at the source 
> code I thought it should be the highest priority one (lowest bit set in 
> the runqueue's bitmap) according to
> 	idx = sched_find_first_bit(array->bitmap);
> within move_tasks(). The idx value is then used as an index (surprise) 
> to the linked list of tasks of this particular priority and one task is 
> picked:
> 	head = array->queue + idx;
>         curr = head->prev;
>         tmp = list_entry(curr, struct task_struct, run_list);
> 
> Can anybody confirm that my observations are correct that the 
> scheduler picks the lowest priority job of a runqueue for migration? 
> What needs to be changed in order to pick the highest priority one?

in the SMP migration code, the 'old scheduler' indeed picks the lowest 
priority one, _except_ if that task is running on another CPU or is too 
'cache hot':

        if (skip_for_load ||
            !can_migrate_task(tmp, busiest, this_cpu, sd, idle, &pinned)) {

also, from the priority-queue at 'idx', we pick head->prev, i.e. we 
process the list in the opposite order as schedule(). (This got changed 
in CFS to process in the same direction - which is more logical and also 
yield the most cache-cold tasks for migration.)

i hope this helps.

> Is my assumption wrong? Using printk()s within this code section makes 
> the system just hang completely quite soon. The schedstats do not 
> notify me immediately. So I am a bit lost on how to track down or 
> trace the problem.

yep, printk locks up. You can use my static tracer:

   http://people.redhat.com/mingo/latency-tracing-patches/

add explicit static tracepoints to the scheduler code you want to 
instrument via trace_special(x,y,z) calls [with parameters that interest 
you most], and you can read out the trace via:

   http://people.redhat.com/mingo/latency-tracing-patches/trace-it.c

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Scheduling the highest priority task
  2007-08-02 11:40   ` Ingo Molnar
@ 2007-08-02 15:00     ` Martin Roehricht
  2007-08-02 15:03       ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Roehricht @ 2007-08-02 15:00 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On 08/02/2007 01:40 PM, Ingo Molnar wrote:
> in the SMP migration code, the 'old scheduler' indeed picks the lowest 
> priority one, _except_ if that task is running on another CPU or is too 
> 'cache hot':

But why is it, that the scheduler picks the lowest priority one? I 
thought sched_find_first_bit() picks the index of the lowest order bit 
in the bitmap and thus the highest priority job. Is that wrong?
What needs to be changed to let the scheduler pick the highest priority 
task from a given runqueue?
I am very confused ...

Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Scheduling the highest priority task
  2007-08-02 15:00     ` Martin Roehricht
@ 2007-08-02 15:03       ` Ingo Molnar
  2007-08-02 15:14         ` Martin Roehricht
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2007-08-02 15:03 UTC (permalink / raw)
  To: Martin Roehricht; +Cc: linux-kernel


* Martin Roehricht <ml@felicis.org> wrote:

> On 08/02/2007 01:40 PM, Ingo Molnar wrote:
> >in the SMP migration code, the 'old scheduler' indeed picks the lowest 
> >priority one, _except_ if that task is running on another CPU or is too 
> >'cache hot':
> 
> But why is it, that the scheduler picks the lowest priority one? I 
> thought sched_find_first_bit() picks the index of the lowest order bit 
> in the bitmap and thus the highest priority job. Is that wrong? What 
> needs to be changed to let the scheduler pick the highest priority 
> task from a given runqueue? I am very confused ...

it first picks the lowest index (i.e. the highest priority active 
priority-queue), but within those tasks (each task in that priority 
queue has equal priority) the load-balancer has freedom to pick any. 
Based on performance data we went for picking from the tail of the 
queue.

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Scheduling the highest priority task
  2007-08-02 15:03       ` Ingo Molnar
@ 2007-08-02 15:14         ` Martin Roehricht
  2007-08-02 15:19           ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Roehricht @ 2007-08-02 15:14 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On 08/02/2007 05:03 PM, Ingo Molnar wrote:
> * Martin Roehricht <ml@felicis.org> wrote:
> 
>> On 08/02/2007 01:40 PM, Ingo Molnar wrote:
>> >in the SMP migration code, the 'old scheduler' indeed picks the lowest 
>> >priority one, _except_ if that task is running on another CPU or is too 
>> >'cache hot':
>> 
>> But why is it, that the scheduler picks the lowest priority one? I 
>> thought sched_find_first_bit() picks the index of the lowest order bit 
>> in the bitmap and thus the highest priority job. Is that wrong? What 
>> needs to be changed to let the scheduler pick the highest priority 
>> task from a given runqueue? I am very confused ...
> 
> it first picks the lowest index (i.e. the highest priority active 
> priority-queue), but within those tasks (each task in that priority 
> queue has equal priority) the load-balancer has freedom to pick any. 
> Based on performance data we went for picking from the tail of the 
> queue.

That's fine with me, that within the same priority-queue any task can be 
chosen. But assume two tasks with highly different priorities, such as 
105 and 135 are scheduled on the same processor and one of them is now 
to be migrated -- shouldn't be the queue with task P=105 considered 
first for migration by this code?
Both tasks would use different queues with their own linked lists, right?

Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Scheduling the highest priority task
  2007-08-02 15:14         ` Martin Roehricht
@ 2007-08-02 15:19           ` Ingo Molnar
  2007-08-02 15:46             ` Martin Roehricht
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2007-08-02 15:19 UTC (permalink / raw)
  To: Martin Roehricht; +Cc: linux-kernel


* Martin Roehricht <ml@felicis.org> wrote:

> That's fine with me, that within the same priority-queue any task can 
> be chosen. But assume two tasks with highly different priorities, such 
> as 105 and 135 are scheduled on the same processor and one of them is 
> now to be migrated -- shouldn't be the queue with task P=105 
> considered first for migration by this code? Both tasks would use 
> different queues with their own linked lists, right?

yes. What makes you believe that the lower priority one (prio 135) is 
chosen? [ as i said before, that will only be chosen if all tasks in the 
higher-priority queue (prio 105) are either already running on a CPU or 
have recently run so that the cache-hot logic skips them. ]

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Scheduling the highest priority task
  2007-08-02 15:19           ` Ingo Molnar
@ 2007-08-02 15:46             ` Martin Roehricht
  2007-08-02 19:48               ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Roehricht @ 2007-08-02 15:46 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On 08/02/2007 05:19 PM, Ingo Molnar wrote:
> * Martin Roehricht <ml@felicis.org> wrote:
> 
>> That's fine with me, that within the same priority-queue any task can 
>> be chosen. But assume two tasks with highly different priorities, such 
>> as 105 and 135 are scheduled on the same processor and one of them is 
>> now to be migrated -- shouldn't be the queue with task P=105 
>> considered first for migration by this code? Both tasks would use 
>> different queues with their own linked lists, right?
> 
> yes. What makes you believe that the lower priority one (prio 135) is 
> chosen? [ as i said before, that will only be chosen if all tasks in the 
> higher-priority queue (prio 105) are either already running on a CPU or 
> have recently run so that the cache-hot logic skips them. ]

This believe is primarily based on my observations of multiple benchmark 
runs and also on your statement earlier: »in the SMP migration code, the 
'old scheduler' indeed picks the lowest priority one«.

Perhaps it is just an unfortunate coincidence that at ~90% of the time a 
migration decision is made, the higher priority process is currently 
cache hot whereas the lower priority process is not. That would be 
unlucky for me as I would like to decide upon specific runtime 
circumstances whether the highest or the lowest priority job of a 
runqueue should be migrated to another CPU. :-/

Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Scheduling the highest priority task
  2007-08-02 15:46             ` Martin Roehricht
@ 2007-08-02 19:48               ` Ingo Molnar
  2007-08-02 21:05                 ` Martin Roehricht
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2007-08-02 19:48 UTC (permalink / raw)
  To: Martin Roehricht; +Cc: linux-kernel


* Martin Roehricht <ml@felicis.org> wrote:

> On 08/02/2007 05:19 PM, Ingo Molnar wrote:
> >* Martin Roehricht <ml@felicis.org> wrote:
> >
> >>That's fine with me, that within the same priority-queue any task can 
> >>be chosen. But assume two tasks with highly different priorities, such 
> >>as 105 and 135 are scheduled on the same processor and one of them is 
> >>now to be migrated -- shouldn't be the queue with task P=105 
> >>considered first for migration by this code? Both tasks would use 
> >>different queues with their own linked lists, right?
> >
> >yes. What makes you believe that the lower priority one (prio 135) is 
> >chosen? [ as i said before, that will only be chosen if all tasks in the 
> >higher-priority queue (prio 105) are either already running on a CPU or 
> >have recently run so that the cache-hot logic skips them. ]
> 
> This believe is primarily based on my observations of multiple 
> benchmark runs and also on your statement earlier: »in the SMP 
> migration code, the 'old scheduler' indeed picks the lowest priority 
> one«.

oh, sorry, that was meant to be the 'highest priority one' :-/

so i think you got it all right, i just typoed that first sentence.

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Scheduling the highest priority task
  2007-08-02 19:48               ` Ingo Molnar
@ 2007-08-02 21:05                 ` Martin Roehricht
  0 siblings, 0 replies; 10+ messages in thread
From: Martin Roehricht @ 2007-08-02 21:05 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On 02.08.2007 21:48, Ingo Molnar wrote:
> * Martin Roehricht <ml@felicis.org> wrote:
> 
>> On 08/02/2007 05:19 PM, Ingo Molnar wrote:
>> >* Martin Roehricht <ml@felicis.org> wrote:
>> >
>> >>That's fine with me, that within the same priority-queue any task can 
>> >>be chosen. But assume two tasks with highly different priorities, such 
>> >>as 105 and 135 are scheduled on the same processor and one of them is 
>> >>now to be migrated -- shouldn't be the queue with task P=105 
>> >>considered first for migration by this code? Both tasks would use 
>> >>different queues with their own linked lists, right?
>> >
>> >yes. What makes you believe that the lower priority one (prio 135) is 
>> >chosen? [ as i said before, that will only be chosen if all tasks in the 
>> >higher-priority queue (prio 105) are either already running on a CPU or 
>> >have recently run so that the cache-hot logic skips them. ]
>> 
>> This believe is primarily based on my observations of multiple 
>> benchmark runs and also on your statement earlier: »in the SMP 
>> migration code, the 'old scheduler' indeed picks the lowest priority 
>> one«.
> 
> oh, sorry, that was meant to be the 'highest priority one' :-/
> 
> so i think you got it all right, i just typoed that first sentence.

Okay, now I think I understood this part of the code correctly. The 
reason why I observe a continous migration of the _lower_ priority tasks 
is most probably due to the fact that the higher priority one is 
currently running, according to:
	can_migrate_task() in move_tasks(), and therein:

	if (task_running(rq, p))
		return 0;

I tracked down via an extended /proc/schedstats that my tasks fall 
frequently into this pitfall. I basically solved it by making use of the 
more active push-strategy which is called later by load_balance() once 
the move_tasks() function did not succeed. So in case I need the higher 
priority tasks, I return immediately from move_tasks().

Thanks for your help,
Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2007-08-02 21:08 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <8KLFD-G9-5@gated-at.bofh.it>
2007-07-25 15:18 ` highest and lowest priority job of a runqueue Martin Roehricht
2007-08-02  8:58 ` Scheduling the highest priority task Martin Roehricht
2007-08-02 11:40   ` Ingo Molnar
2007-08-02 15:00     ` Martin Roehricht
2007-08-02 15:03       ` Ingo Molnar
2007-08-02 15:14         ` Martin Roehricht
2007-08-02 15:19           ` Ingo Molnar
2007-08-02 15:46             ` Martin Roehricht
2007-08-02 19:48               ` Ingo Molnar
2007-08-02 21:05                 ` Martin Roehricht

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox