linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] block/blk-mq: Don't complete locally if capacities are different
@ 2024-01-22 22:42 Qais Yousef
  2024-01-23  8:47 ` Christoph Hellwig
  2024-02-02 23:20 ` Qais Yousef
  0 siblings, 2 replies; 5+ messages in thread
From: Qais Yousef @ 2024-01-22 22:42 UTC (permalink / raw)
  To: Jens Axboe, Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Dietmar Eggemann
  Cc: linux-kernel, linux-block, Sudeep Holla, Wei Wang, Jaegeuk Kim,
	Bart Van Assche, Qais Yousef

The logic in blk_mq_complete_need_ipi() assumes SMP systems where all
CPUs have equal capacities and only LLC cache can make a different on
perceived performance. But this assumption falls apart on HMP systems
where LLC is shared, but the CPUs have different capacities. Staying
local then can have a big performance impact if the IO request was done
from a CPU with higher capacity but the interrupt is serviced on a lower
capacity CPU.

Introduce new cpus_gte_capacity() function to enable do the additional
check.

Without the patch I see the BLOCK softirq always running on little cores
(where the hardirq is serviced). With it I can see it running on all
cores.

This was noticed after the topology change [1] where now on a big.LITTLE
we truly get that the LLC is shared between all cores where as in the
past it was being misrepresented for historical reasons. The logic
exposed a missing dependency on capacities for such systems where there
can be a big performance difference between the CPUs.

This of course introduced a noticeable change in behavior depending on
how the topology is presented. Leading to regressions in some workloads
as the performance of the BLOCK softirq on littles can be noticeably
worse.

[1] https://lpc.events/event/16/contributions/1342/attachments/962/1883/LPC-2022-Android-MC-Phantom-Domains.pdf

Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
---
 block/blk-mq.c                 | 5 +++--
 include/linux/sched/topology.h | 6 ++++++
 kernel/sched/core.c            | 8 ++++++++
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ac18f802c027..9b2d278a7ae7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1163,10 +1163,11 @@ static inline bool blk_mq_complete_need_ipi(struct request *rq)
 	if (force_irqthreads())
 		return false;
 
-	/* same CPU or cache domain?  Complete locally */
+	/* same CPU or cache domain and capacity?  Complete locally */
 	if (cpu == rq->mq_ctx->cpu ||
 	    (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
-	     cpus_share_cache(cpu, rq->mq_ctx->cpu)))
+	     cpus_share_cache(cpu, rq->mq_ctx->cpu) &&
+	     cpus_gte_capacity(cpu, rq->mq_ctx->cpu)))
 		return false;
 
 	/* don't try to IPI to an offline CPU */
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index a6e04b4a21d7..31cef5780ba4 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -176,6 +176,7 @@ extern void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
 cpumask_var_t *alloc_sched_domains(unsigned int ndoms);
 void free_sched_domains(cpumask_var_t doms[], unsigned int ndoms);
 
+bool cpus_gte_capacity(int this_cpu, int that_cpu);
 bool cpus_share_cache(int this_cpu, int that_cpu);
 bool cpus_share_resources(int this_cpu, int that_cpu);
 
@@ -226,6 +227,11 @@ partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
 {
 }
 
+static inline bool cpus_gte_capacity(int this_cpu, int that_cpu)
+{
+	return true;
+}
+
 static inline bool cpus_share_cache(int this_cpu, int that_cpu)
 {
 	return true;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index db4be4921e7f..db5ab4b3cee7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3954,6 +3954,14 @@ void wake_up_if_idle(int cpu)
 	}
 }
 
+bool cpus_gte_capacity(int this_cpu, int that_cpu)
+{
+	if (this_cpu == that_cpu)
+		return true;
+
+	return arch_scale_cpu_capacity(this_cpu) >= arch_scale_cpu_capacity(that_cpu);
+}
+
 bool cpus_share_cache(int this_cpu, int that_cpu)
 {
 	if (this_cpu == that_cpu)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-02-02 23:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-22 22:42 [PATCH] block/blk-mq: Don't complete locally if capacities are different Qais Yousef
2024-01-23  8:47 ` Christoph Hellwig
2024-01-23 15:58   ` Jens Axboe
2024-01-24  0:24     ` Qais Yousef
2024-02-02 23:20 ` Qais Yousef

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).