From: Peter Zijlstra <peterz@infradead.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Gregory Haskins <ghaskins@novell.com>,
Ingo Molnar <mingo@elte.hu>,
linux-rt-users <linux-rt-users@vger.kernel.org>,
kravetz@us.ibm.com, LKML <linux-kernel@vger.kernel.org>,
pmorreale@novell.com, sdietrich@novell.com
Subject: Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.
Date: Tue, 09 Oct 2007 20:16:59 +0200 [thread overview]
Message-ID: <1191953819.5797.9.camel@lappy> (raw)
In-Reply-To: <1191952777.23198.8.camel@localhost.localdomain>
On Tue, 2007-10-09 at 13:59 -0400, Steven Rostedt wrote:
> This has been complied tested (and no more ;-)
>
>
> The idea here is when we find a situation that we just scheduled in an
> RT task and we either pushed a lesser RT task away or more than one RT
> task was scheduled on this CPU before scheduling occurred.
>
> The answer that this patch does is to do a O(n) search of CPUs for the
> CPU with the lowest prio task running. When that CPU is found the next
> highest RT task is pushed to that CPU.
>
> Some notes:
>
> 1) no lock is taken while looking for the lowest priority CPU. When one
> is found, only that CPU's lock is taken and after that a check is made
> to see if it is still a candidate to push the RT task over. If not, we
> try the search again, for a max of 3 tries.
>
> 2) I only do this for the second highest RT task on the CPU queue. This
> can be easily changed to do it for all RT tasks until no more can be
> pushed off to other CPUs.
>
> This is a simple approach right now, and is only being posted for
> comments. I'm sure more can be done to make this more efficient or just
> simply better.
>
> -- Steve
Do we really want this PREEMPT_RT only?
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>
> Index: linux-2.6.23-rc9-rt2/kernel/sched.c
> ===================================================================
> --- linux-2.6.23-rc9-rt2.orig/kernel/sched.c
> +++ linux-2.6.23-rc9-rt2/kernel/sched.c
> @@ -304,6 +304,7 @@ struct rq {
> #ifdef CONFIG_PREEMPT_RT
> unsigned long rt_nr_running;
> unsigned long rt_nr_uninterruptible;
> + int curr_prio;
> #endif
>
> unsigned long switch_timestamp;
> @@ -1485,6 +1486,87 @@ next_in_queue:
> static int double_lock_balance(struct rq *this_rq, struct rq *busiest);
>
> /*
> + * If the current CPU has more than one RT task, see if the non
> + * running task can migrate over to a CPU that is running a task
> + * of lesser priority.
> + */
> +static int push_rt_task(struct rq *this_rq)
> +{
> + struct task_struct *next_task;
> + struct rq *lowest_rq = NULL;
> + int tries;
> + int cpu;
> + int dst_cpu = -1;
> + int ret = 0;
> +
> + BUG_ON(!spin_is_locked(&this_rq->lock));
assert_spin_locked(&this_rq->lock);
> +
> + next_task = rt_next_highest_task(this_rq);
> + if (!next_task)
> + return 0;
> +
> + /* We might release this_rq lock */
> + get_task_struct(next_task);
Can the rest of the code suffer this? (the caller that is)
> + /* Only try this algorithm three times */
> + for (tries = 0; tries < 3; tries++) {
magic numbers.. maybe a magic #define with a descriptive name?
> + /*
> + * Scan each rq for the lowest prio.
> + */
> + for_each_cpu_mask(cpu, next_task->cpus_allowed) {
> + struct rq *rq = &per_cpu(runqueues, cpu);
> +
> + if (cpu == smp_processor_id())
> + continue;
> +
> + /* no locking for now */
> + if (rq->curr_prio > next_task->prio &&
> + (!lowest_rq || rq->curr_prio < lowest_rq->curr_prio)) {
> + dst_cpu = cpu;
> + lowest_rq = rq;
> + }
> + }
> +
> + if (!lowest_rq)
> + break;
> +
> + if (double_lock_balance(this_rq, lowest_rq)) {
> + /*
> + * We had to unlock the run queue. In
> + * the mean time, next_task could have
> + * migrated already or had its affinity changed.
> + */
> + if (unlikely(task_rq(next_task) != this_rq ||
> + !cpu_isset(dst_cpu, next_task->cpus_allowed))) {
> + spin_unlock(&lowest_rq->lock);
> + break;
> + }
> + }
> +
> + /* if the prio of this runqueue changed, try again */
> + if (lowest_rq->curr_prio <= next_task->prio) {
> + spin_unlock(&lowest_rq->lock);
> + continue;
> + }
> +
> + deactivate_task(this_rq, next_task, 0);
> + set_task_cpu(next_task, dst_cpu);
> + activate_task(lowest_rq, next_task, 0);
> +
> + set_tsk_need_resched(lowest_rq->curr);
Use resched_task(), that will notify the remote cpu too.
> +
> + spin_unlock(&lowest_rq->lock);
> + ret = 1;
> +
> + break;
> + }
> +
> + put_task_struct(next_task);
> +
> + return ret;
> +}
> +
> +/*
> * Pull RT tasks from other CPUs in the RT-overload
> * case. Interrupts are disabled, local rq is locked.
> */
> @@ -2207,7 +2289,8 @@ static inline void finish_task_switch(st
> * If we pushed an RT task off the runqueue,
> * then kick other CPUs, they might run it:
> */
> - if (unlikely(rt_task(current) && rq->rt_nr_running > 1)) {
> + rq->curr_prio = current->prio;
> + if (unlikely(rt_task(current) && push_rt_task(rq))) {
> schedstat_inc(rq, rto_schedule);
> smp_send_reschedule_allbutself_cpumask(current->cpus_allowed);
Which will allow you to remove this thing.
> }
> Index: linux-2.6.23-rc9-rt2/kernel/sched_rt.c
> ===================================================================
> --- linux-2.6.23-rc9-rt2.orig/kernel/sched_rt.c
> +++ linux-2.6.23-rc9-rt2/kernel/sched_rt.c
> @@ -96,6 +96,48 @@ static struct task_struct *pick_next_tas
> return next;
> }
>
> +#ifdef CONFIG_PREEMPT_RT
> +static struct task_struct *rt_next_highest_task(struct rq *rq)
> +{
> + struct rt_prio_array *array = &rq->rt.active;
> + struct task_struct *next;
> + struct list_head *queue;
> + int idx;
> +
> + if (likely (rq->rt_nr_running < 2))
> + return NULL;
> +
> + idx = sched_find_first_bit(array->bitmap);
> + if (idx >= MAX_RT_PRIO) {
> + WARN_ON(1); /* rt_nr__running is bad */
> + return NULL;
> + }
> +
> + queue = array->queue + idx;
> + if (queue->next->next != queue) {
> + /* same prio task */
> + next = list_entry(queue->next->next, struct task_struct, run_list);
> + goto out;
> + }
> +
> + /* slower, but more flexible */
> + idx = find_next_bit(array->bitmap, MAX_RT_PRIO, idx+1);
> + if (idx >= MAX_RT_PRIO) {
> + WARN_ON(1); /* rt_nr_running was 2 and above! */
> + return NULL;
> + }
> +
> + queue = array->queue + idx;
> + next = list_entry(queue->next, struct task_struct, run_list);
> +
> + out:
> + return next;
> +
> +}
> +#else /* CONFIG_PREEMPT_RT */
> +
> +#endif /* CONFIG_PREEMPT_RT */
> +
> static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
> {
> update_curr_rt(rq);
>
>
next prev parent reply other threads:[~2007-10-09 18:18 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-10-09 14:25 [PATCH 0/5] RT: scheduler fixes and rt_overload enhancements Gregory Haskins
2007-10-09 14:25 ` [PATCH 1/5] RT - fix for scheduling issue Gregory Haskins
2007-10-09 14:25 ` [PATCH 2/5] RT - fix reschedule IPI Gregory Haskins
2007-10-09 14:25 ` [PATCH 3/5] RT - fix mistargeted RESCHED_IPI Gregory Haskins
2007-10-09 14:26 ` [PATCH 4/5] RT: Add a per-cpu rt_overload indication Gregory Haskins
2007-10-09 14:26 ` [PATCH 5/5] RT - Track which CPUs should get IPI'd on rt-overload Gregory Haskins
2007-10-09 15:00 ` [PATCH 0/5] RT: scheduler fixes and rt_overload enhancements Peter Zijlstra
2007-10-09 15:00 ` Steven Rostedt
2007-10-09 15:33 ` Gregory Haskins
2007-10-09 15:39 ` Peter Zijlstra
2007-10-09 17:59 ` [RFC PATCH RT] push waiting rt tasks to cpus with lower prios Steven Rostedt
2007-10-09 18:14 ` Steven Rostedt
2007-10-09 18:16 ` Peter Zijlstra [this message]
2007-10-09 18:45 ` Steven Rostedt
2007-10-09 20:39 ` mike kravetz
2007-10-09 20:50 ` Steven Rostedt
2007-10-09 21:17 ` mike kravetz
2007-10-10 2:12 ` Girish kathalagiri
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1191953819.5797.9.camel@lappy \
--to=peterz@infradead.org \
--cc=ghaskins@novell.com \
--cc=kravetz@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rt-users@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=pmorreale@novell.com \
--cc=rostedt@goodmis.org \
--cc=sdietrich@novell.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox