public inbox for linux-kernel@vger.kernel.org
From: Steven Rostedt <rostedt@goodmis.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>,
	Gregory Haskins <ghaskins@novell.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Christoph Lameter <clameter@sgi.com>,
	Steven Rostedt <srostedt@redhat.com>
Subject: Re: [PATCH v4 19/20] Optimize out cpu_clears
Date: Tue, 20 Nov 2007 21:10:57 -0500	[thread overview]
Message-ID: <20071121021057.GA24815@goodmis.org> (raw)
In-Reply-To: <20071121011251.913977352@goodmis.org>

On Tue, Nov 20, 2007 at 08:01:13PM -0500, Steven Rostedt wrote:
>  
>  static DEFINE_PER_CPU(cpumask_t, local_cpu_mask);
> -static DEFINE_PER_CPU(cpumask_t, valid_cpu_mask);
>  
>  static int find_lowest_cpus(struct task_struct *task, cpumask_t *lowest_mask)
>  {
> -	int       cpu;
> -	cpumask_t *valid_mask = &__get_cpu_var(valid_cpu_mask);
>  	int       lowest_prio = -1;
> +	int       lowest_cpu  = 0;
>  	int       count       = 0;
> +	int       cpu;
>  
> -	cpus_clear(*lowest_mask);
> -	cpus_and(*valid_mask, cpu_online_map, task->cpus_allowed);
> +	cpus_and(*lowest_mask, cpu_online_map, task->cpus_allowed);
>  
>  	/*
>  	 * Scan each rq for the lowest prio.
>  	 */
> -	for_each_cpu_mask(cpu, *valid_mask) {
> +	for_each_cpu_mask(cpu, *lowest_mask) {
>  		struct rq *rq = cpu_rq(cpu);
>  
>  		/* We look for lowest RT prio or non-rt CPU */
> @@ -325,13 +323,30 @@ static int find_lowest_cpus(struct task_
>  			if (rq->rt.highest_prio > lowest_prio) {
>  				/* new low - clear old data */
>  				lowest_prio = rq->rt.highest_prio;
> -				if (count) {
> -					cpus_clear(*lowest_mask);
> -					count = 0;
> -				}

Gregory Haskins pointed out to me that this logic is slightly wrong. I
originally wrote this patch before adding his "count" optimization, and
did not take into account that on finding a non-RT queue we may leave
some stale bits set, because the cpus_clear() is not performed when
count is zero -- and count is set back to zero whenever a new low is
found, so we never clean up.

The fix is to check for lowest_cpu > 0 instead of count on finding a
non-RT runqueue. A non-zero lowest_cpu tells us that we need to clear
the mask. Otherwise, if lowest_cpu == 0, we can return the mask
untouched: the proper bit is already set, and the return value of 1
will have the rest of the algorithm use the first set bit.

Below is the updated patch. The full series is at:

  http://rostedt.homelinux.com/rt/rt-balance-patches-v5.tar.bz2


> +				lowest_cpu = cpu;
> +				count = 0;
>  			}
>  			cpu_set(rq->cpu, *lowest_mask);
>  			count++;



From: Steven Rostedt <srostedt@redhat.com>

This patch removes several cpumask operations by keeping track
of the first of the CPUS that is of the lowest priority. When
the search for the lowest priority runqueue is completed, all
the bits up to the first CPU with the lowest priority runqueue
is cleared.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

---
 kernel/sched_rt.c |   48 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

Index: linux-compile.git/kernel/sched_rt.c
===================================================================
--- linux-compile.git.orig/kernel/sched_rt.c	2007-11-20 19:53:15.000000000 -0500
+++ linux-compile.git/kernel/sched_rt.c	2007-11-20 20:35:04.000000000 -0500
@@ -293,29 +293,36 @@ static struct task_struct *pick_next_hig
 }
 
 static DEFINE_PER_CPU(cpumask_t, local_cpu_mask);
-static DEFINE_PER_CPU(cpumask_t, valid_cpu_mask);
 
 static int find_lowest_cpus(struct task_struct *task, cpumask_t *lowest_mask)
 {
-	int       cpu;
-	cpumask_t *valid_mask = &__get_cpu_var(valid_cpu_mask);
 	int       lowest_prio = -1;
+	int       lowest_cpu  = 0;
 	int       count       = 0;
+	int       cpu;
 
-	cpus_clear(*lowest_mask);
-	cpus_and(*valid_mask, cpu_online_map, task->cpus_allowed);
+	cpus_and(*lowest_mask, cpu_online_map, task->cpus_allowed);
 
 	/*
 	 * Scan each rq for the lowest prio.
 	 */
-	for_each_cpu_mask(cpu, *valid_mask) {
+	for_each_cpu_mask(cpu, *lowest_mask) {
 		struct rq *rq = cpu_rq(cpu);
 
 		/* We look for lowest RT prio or non-rt CPU */
 		if (rq->rt.highest_prio >= MAX_RT_PRIO) {
-			if (count)
+			/*
+			 * if we already found a low RT queue
+			 * and now we found this non-rt queue
+			 * clear the mask and set our bit.
+			 * Otherwise just return the queue as is
+			 * and the count==1 will cause the algorithm
+			 * to use the first bit found.
+			 */
+			if (lowest_cpu) {
 				cpus_clear(*lowest_mask);
-			cpu_set(rq->cpu, *lowest_mask);
+				cpu_set(rq->cpu, *lowest_mask);
+			}
 			return 1;
 		}
 
@@ -325,13 +332,30 @@ static int find_lowest_cpus(struct task_
 			if (rq->rt.highest_prio > lowest_prio) {
 				/* new low - clear old data */
 				lowest_prio = rq->rt.highest_prio;
-				if (count) {
-					cpus_clear(*lowest_mask);
-					count = 0;
-				}
+				lowest_cpu = cpu;
+				count = 0;
 			}
 			cpu_set(rq->cpu, *lowest_mask);
 			count++;
+		} else
+			cpu_clear(cpu, *lowest_mask);
+	}
+
+	/*
+	 * Clear out all the set bits that represent
+	 * runqueues that were of higher prio than
+	 * the lowest_prio.
+	 */
+	if (lowest_cpu) {
+		/*
+		 * Perhaps we could add another cpumask op to
+		 * zero out bits. Like cpu_zero_bits(cpumask, nrbits);
+		 * Then that could be optimized to use memset and such.
+		 */
+		for_each_cpu_mask(cpu, *lowest_mask) {
+			if (cpu >= lowest_cpu)
+				break;
+			cpu_clear(cpu, *lowest_mask);
 		}
 	}
 
