public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] sched/fair: Refactor can_migrate_task() to elimate looping
@ 2025-01-08 17:29 I Hsin Cheng
  2025-01-09  8:58 ` Nysal Jan K.A.
  2025-01-09  9:46 ` Peter Zijlstra
  0 siblings, 2 replies; 7+ messages in thread
From: I Hsin Cheng @ 2025-01-08 17:29 UTC (permalink / raw)
  To: mingo
  Cc: peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, linux-kernel, I Hsin Cheng

The function "can_migrate_task()" utilize "for_each_cpu_and" with a
"if" statement inside to find the destination cpu. It's the same logic
to find the first set bit of the result of the bitwise-AND of
"env->dst_grpmask", "env->cpus" and "p->cpus_ptr".

Refactor it by directly perform bitwise-AND for "env->dst_grpmask",
"env->cpus" and "p->cpus_ptr" and use "cpumask_first()" to select the
destination cpu, so we can elimate the need of looping and multiple
times of branch.

After the refactoring this part of the code can speed up from ~130ns to
~93ns, according to the test below.

Ran the test for 5 times and the result is showned in the following
table, and the test script is paste in next section.

-------------------------------------------------------
|Old method|  123|  135|  126|  129|  135|  avg ~130ns|
-------------------------------------------------------
|New method|   87|   97|   90|   92|   98|  avg  ~93ns|
-------------------------------------------------------

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
---
Test is done on Linux 6.8.0-48-generic x86_64 with
Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

Test is executed in the form of kernel module.

Test script:

int init_module(void)
{
    struct cpumask cur_mask, custom_mask, result_mask;
    struct task_struct *p = current;
    int cpu, cpu1 = nr_cpu_ids, cpu2 = nr_cpu_ids;
    unsigned tmp = 0;

    cpumask_copy(&cur_mask, cpu_online_mask);
    /* Self-implemented function, didn't paste here because the length */
    generate_random_cpumask(&custom_mask);

    ktime_t start_1 = ktime_get();
    for_each_cpu_and(cpu, &cur_mask, &custom_mask) {
        if (cpumask_test_cpu(cpu, p->cpus_ptr)) {
            /* imitate load balance operation */
            tmp |= 0x01010101;
            cpu1 = cpu;
            break;
        }
    }
    ktime_t end_1 = ktime_get();

    ktime_t start_2 = ktime_get();
    cpumask_and(&result_mask, &cur_mask, &custom_mask);
    cpumask_and(&result_mask, &result_mask, p->cpus_ptr);

    cpu = cpumask_first(&result_mask);

    if (cpu < nr_cpu_ids) {
        /* imitate load balance operation */
        tmp |= 0x01010101;
        cpu2 = cpu;
    }
    ktime_t end_2 = ktime_get();

    if (cpu1 != cpu2) {
        pr_err("Failed Assertion, cpu1 = %d, cpu2 = %d\n", cpu1, cpu2);
        return 0;
    }

    pr_info("Old method spend time : %lld\n", ktime_to_ns(end_1 - start_1));
    pr_info("New method spend time : %lld\n", ktime_to_ns(end_2 - start_2));

    return 0;
}
---
 kernel/sched/fair.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2d16c8545..ce46f61da 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9404,12 +9404,16 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 			return 0;
 
 		/* Prevent to re-select dst_cpu via env's CPUs: */
-		for_each_cpu_and(cpu, env->dst_grpmask, env->cpus) {
-			if (cpumask_test_cpu(cpu, p->cpus_ptr)) {
-				env->flags |= LBF_DST_PINNED;
-				env->new_dst_cpu = cpu;
-				break;
-			}
+		struct cpumask dst_mask;
+
+		cpumask_and(&dst_mask, env->dst_grpmask, env->cpus);
+		cpumask_and(&dst_mask, &dst_mask, p->cpus_ptr);
+
+		cpu = cpumask_first(&dst_mask);
+
+		if (cpu < nr_cpu_ids) {
+			env->flags |= LBF_DST_PINNED;
+			env->new_dst_cpu = cpu;
 		}
 
 		return 0;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-01-09 15:23 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-08 17:29 [RFC PATCH] sched/fair: Refactor can_migrate_task() to elimate looping I Hsin Cheng
2025-01-09  8:58 ` Nysal Jan K.A.
2025-01-09 10:19   ` I Hsin Cheng
2025-01-09  9:46 ` Peter Zijlstra
2025-01-09 10:12   ` I Hsin Cheng
2025-01-09 10:21     ` Peter Zijlstra
2025-01-09 15:23       ` I Hsin Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox