* [RFC PATCH 1/3] sched/topology: Split out SD_* flags declaration to its own file
2020-03-11 18:33 [RFC PATCH 0/3] sched: Instrument sched domain flags Valentin Schneider
@ 2020-03-11 18:33 ` Valentin Schneider
2020-03-23 13:42 ` Morten Rasmussen
2020-03-11 18:33 ` [RFC PATCH 2/3] sched/topology: Define and assign sched_domain flag metadata Valentin Schneider
2020-03-11 18:33 ` [RFC PATCH 3/3] sched/topology: Verify SD_* flags setup when sched_debug is on Valentin Schneider
2 siblings, 1 reply; 7+ messages in thread
From: Valentin Schneider @ 2020-03-11 18:33 UTC (permalink / raw)
To: linux-kernel
Cc: mingo, peterz, vincent.guittot, dietmar.eggemann,
morten.rasmussen
To associate the SD flags with some metadata, we need some more structure
in the way they are declared.
Rather than shove that in a free-standing macro list, move the declaration
in a separate file that can be re-imported with different SD_FLAG
definitions. This is inspired by the syscalls numbering header.
No change in functionality.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
include/linux/sched/sd_flags.h | 33 +++++++++++++++++++++++++++++++++
include/linux/sched/topology.h | 17 +++--------------
2 files changed, 36 insertions(+), 14 deletions(-)
create mode 100644 include/linux/sched/sd_flags.h
diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
new file mode 100644
index 000000000000..685bbe736945
--- /dev/null
+++ b/include/linux/sched/sd_flags.h
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * sched-domains (multiprocessor balancing) flag declarations.
+ */
+
+/* Balance when about to become idle */
+SD_FLAG(SD_BALANCE_NEWIDLE, 0)
+/* Balance on exec */
+SD_FLAG(SD_BALANCE_EXEC, 1)
+/* Balance on fork, clone */
+SD_FLAG(SD_BALANCE_FORK, 2)
+/* Balance on wakeup */
+SD_FLAG(SD_BALANCE_WAKE, 3)
+/* Wake task to waking CPU */
+SD_FLAG(SD_WAKE_AFFINE, 4)
+/* Domain members have different CPU capacities */
+SD_FLAG(SD_ASYM_CPUCAPACITY, 5)
+/* Domain members share CPU capacity */
+SD_FLAG(SD_SHARE_CPUCAPACITY, 6)
+/* Domain members share power domain */
+SD_FLAG(SD_SHARE_POWERDOMAIN, 7)
+/* Domain members share CPU pkg resources */
+SD_FLAG(SD_SHARE_PKG_RESOURCES, 8)
+/* Only a single load balancing instance */
+SD_FLAG(SD_SERIALIZE, 9)
+/* Place busy groups earlier in the domain */
+SD_FLAG(SD_ASYM_PACKING, 10)
+/* Prefer to place tasks in a sibling domain */
+SD_FLAG(SD_PREFER_SIBLING, 11)
+/* sched_domains of this level overlap */
+SD_FLAG(SD_OVERLAP, 12)
+/* cross-node balancing */
+SD_FLAG(SD_NUMA, 13)
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 8de2f9744569..db7d24c0174b 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -11,20 +11,9 @@
*/
#ifdef CONFIG_SMP
-#define SD_BALANCE_NEWIDLE 0x0001 /* Balance when about to become idle */
-#define SD_BALANCE_EXEC 0x0002 /* Balance on exec */
-#define SD_BALANCE_FORK 0x0004 /* Balance on fork, clone */
-#define SD_BALANCE_WAKE 0x0008 /* Balance on wakeup */
-#define SD_WAKE_AFFINE 0x0010 /* Wake task to waking CPU */
-#define SD_ASYM_CPUCAPACITY 0x0020 /* Domain members have different CPU capacities */
-#define SD_SHARE_CPUCAPACITY 0x0040 /* Domain members share CPU capacity */
-#define SD_SHARE_POWERDOMAIN 0x0080 /* Domain members share power domain */
-#define SD_SHARE_PKG_RESOURCES 0x0100 /* Domain members share CPU pkg resources */
-#define SD_SERIALIZE 0x0200 /* Only a single load balancing instance */
-#define SD_ASYM_PACKING 0x0400 /* Place busy groups earlier in the domain */
-#define SD_PREFER_SIBLING 0x0800 /* Prefer to place tasks in a sibling domain */
-#define SD_OVERLAP 0x1000 /* sched_domains of this level overlap */
-#define SD_NUMA 0x2000 /* cross-node balancing */
+#define SD_FLAG(name, idx) static const unsigned int name = BIT(idx);
+#include <linux/sched/sd_flags.h>
+#undef SD_FLAG
#ifdef CONFIG_SCHED_SMT
static inline int cpu_smt_flags(void)
--
2.24.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [RFC PATCH 2/3] sched/topology: Define and assign sched_domain flag metadata
2020-03-11 18:33 [RFC PATCH 0/3] sched: Instrument sched domain flags Valentin Schneider
2020-03-11 18:33 ` [RFC PATCH 1/3] sched/topology: Split out SD_* flags declaration to its own file Valentin Schneider
@ 2020-03-11 18:33 ` Valentin Schneider
2020-03-11 18:33 ` [RFC PATCH 3/3] sched/topology: Verify SD_* flags setup when sched_debug is on Valentin Schneider
2 siblings, 0 replies; 7+ messages in thread
From: Valentin Schneider @ 2020-03-11 18:33 UTC (permalink / raw)
To: linux-kernel
Cc: mingo, peterz, vincent.guittot, dietmar.eggemann,
morten.rasmussen
There are some expectations regarding how sched domain flags should be laid
out, but none of them are checked or asserted in
sched_domain_debug_one(). After staring at said flags for a while, I've
come to realize they all (except *one*) fall in either of two categories:
- Shared with children: those flags are set from the base CPU domain
upwards. Any domain that has it set will have it set in its children. It
hints at "some property holds true / some behaviour is enabled until this
level".
- Shared with parents: those flags are set from the topmost domain
downwards. Any domain that has it set will have it set in its parents. It
hints at "some property isn't visible / some behaviour is disabled until
this level".
The odd one out is SD_PREFER_SIBLING, which is cleared below levels with
SD_ASYM_CPUCAPACITY. The change was introduced by commit
9c63e84db29b ("sched/core: Disable SD_PREFER_SIBLING on asymmetric CPU capacity domains")
as it could break misfit migration on some systems. In light of this, we
might want to change it back to make it fit one of the two categories and
fix the issue another way.
Tweak the sched_domain flag declaration to assign each flag an expected
layout, and include the rationale for each flag "meta type" assignment as a
comment. Consolidate the flag metadata into an array; the index of a flag's
metadata can easily be found with log2(flag), IOW __ffs(flag).
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
include/linux/sched/sd_flags.h | 162 +++++++++++++++++++++++++++------
include/linux/sched/topology.h | 14 ++-
2 files changed, 147 insertions(+), 29 deletions(-)
diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 685bbe736945..1082e4796ffe 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -3,31 +3,137 @@
* sched-domains (multiprocessor balancing) flag declarations.
*/
-/* Balance when about to become idle */
-SD_FLAG(SD_BALANCE_NEWIDLE, 0)
-/* Balance on exec */
-SD_FLAG(SD_BALANCE_EXEC, 1)
-/* Balance on fork, clone */
-SD_FLAG(SD_BALANCE_FORK, 2)
-/* Balance on wakeup */
-SD_FLAG(SD_BALANCE_WAKE, 3)
-/* Wake task to waking CPU */
-SD_FLAG(SD_WAKE_AFFINE, 4)
-/* Domain members have different CPU capacities */
-SD_FLAG(SD_ASYM_CPUCAPACITY, 5)
-/* Domain members share CPU capacity */
-SD_FLAG(SD_SHARE_CPUCAPACITY, 6)
-/* Domain members share power domain */
-SD_FLAG(SD_SHARE_POWERDOMAIN, 7)
-/* Domain members share CPU pkg resources */
-SD_FLAG(SD_SHARE_PKG_RESOURCES, 8)
-/* Only a single load balancing instance */
-SD_FLAG(SD_SERIALIZE, 9)
-/* Place busy groups earlier in the domain */
-SD_FLAG(SD_ASYM_PACKING, 10)
-/* Prefer to place tasks in a sibling domain */
-SD_FLAG(SD_PREFER_SIBLING, 11)
-/* sched_domains of this level overlap */
-SD_FLAG(SD_OVERLAP, 12)
-/* cross-node balancing */
-SD_FLAG(SD_NUMA, 13)
+#ifndef SD_FLAG
+#define SD_FLAG(x, y, z)
+#endif
+
+/*
+ * Expected flag uses
+ *
+ * SHARED_CHILD: These flags are meant to be set from the base domain upwards.
+ * If a domain has this flag set, all of its children should have it set. This
+ * is usually because the flag describes some shared resource, or because they
+ * are tied to a scheduling behaviour that we want to disable at some point in
+ * the hierarchy for scalability reasons.
+ *
+ * In those cases it doesn't make sense to not have the flag set for a domain
+ * but not have it in (some of) its children: (non-overlapping) sched domains
+ * ALWAYS span their child domains, so operations done with those parent domains
+ * would cover the CPUs in the lower domains that did not have this flag anyway.
+ *
+ *
+ * SHARED_PARENT: These flags are meant to be set from the highest domain
+ * downwards. If a domain has this flag set, all of its parents should have it
+ * set. This is usually for topology properties that start to appear above a
+ * certain level (e.g. domain starts spanning CPUs outside of the base CPU's
+ * socket), which translates to scheduler behaviours that should start taking
+ * effect from a given level.
+ */
+#define SDF_SHARED_CHILD 0x1
+#define SDF_SHARED_PARENT 0x2
+
+/*
+ * Balance when about to become idle
+ *
+ * SHARED_CHILD: Can be set from cpuset.sched_relax_domain_level downwards.
+ */
+SD_FLAG(SD_BALANCE_NEWIDLE, 0, SDF_SHARED_CHILD)
+
+/*
+ * Balance on exec
+ *
+ * SHARED_CHILD: Can be set from NUMA reclaim level downwards.
+ */
+SD_FLAG(SD_BALANCE_EXEC, 1, SDF_SHARED_CHILD)
+
+/*
+ * Balance on fork, clone
+ *
+ * SHARED_CHILD: Can be set from NUMA reclaim level downwards.
+ */
+SD_FLAG(SD_BALANCE_FORK, 2, SDF_SHARED_CHILD)
+
+/*
+ * Balance on wakeup
+ *
+ * SHARED_CHILD: Can be set from cpuset.sched_relax_domain_level downwards.
+ */
+SD_FLAG(SD_BALANCE_WAKE, 3, SDF_SHARED_CHILD)
+
+/*
+ * Wake task to waking CPU
+ *
+ * SHARED_CHILD: Only for levels below NUMA reclaim distance
+ */
+SD_FLAG(SD_WAKE_AFFINE, 4, SDF_SHARED_CHILD)
+
+/*
+ * Domain members have different CPU capacities
+ *
+ * SHARED_PARENT: CPU capacity asymmetry is visible from a given level
+ * upwards.
+ */
+SD_FLAG(SD_ASYM_CPUCAPACITY, 5, SDF_SHARED_PARENT)
+
+/*
+ * Domain members share CPU capacity
+ *
+ * SHARED_CHILD: CPU capacity is shared from the base domain up to a given level.
+ */
+SD_FLAG(SD_SHARE_CPUCAPACITY, 6, SDF_SHARED_CHILD)
+
+/*
+ * Domain members share power domain
+ *
+ * SHARED_CHILD: Power domains are shared from the base domain up to a given
+ * level.
+ */
+SD_FLAG(SD_SHARE_POWERDOMAIN, 7, SDF_SHARED_CHILD)
+
+/*
+ * Domain members share CPU package resources (i.e. caches)
+ *
+ * SHARED_CHILD: Caches are shared from the base domain up to a given level.
+ */
+SD_FLAG(SD_SHARE_PKG_RESOURCES, 8, SDF_SHARED_CHILD)
+
+/*
+ * Only a single load balancing instance
+ *
+ * SHARED_PARENT: Set for all NUMA levels (incl. NODE), which sit atop
+ * "regular" levels (those built from sched_domain_topology). Could be set
+ * differently, but this is still a property that needs to be propagated to
+ * parents: it doesn't make sense to have it e.g. in the first few levels and
+ * not higher up.
+ */
+SD_FLAG(SD_SERIALIZE, 9, SDF_SHARED_PARENT)
+
+/*
+ * Place busy groups earlier in the domain
+ *
+ * SHARED_CHILD: Set for SMT. Technically could be set further up, but currently
+ * assumed to be set from the base domain upwards (see update_top_cache_domain()).
+ */
+SD_FLAG(SD_ASYM_PACKING, 10, SDF_SHARED_CHILD)
+
+/*
+ * Prefer to place tasks in a sibling domain
+ *
+ * NONE: Set up until domains start spanning NUMA. Close to being a
+ * SHARED_CHILD flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
+ */
+SD_FLAG(SD_PREFER_SIBLING, 11, 0)
+
+/*
+ * sched_domains of this level overlap
+ *
+ * SHARED_PARENT: Set for all NUMA levels above NODE.
+ */
+SD_FLAG(SD_OVERLAP, 12, SDF_SHARED_PARENT)
+
+/*
+ * cross-node balancing
+ *
+ * SHARED_PARENT: Set for all NUMA levels above NODE.
+ */
+SD_FLAG(SD_NUMA, 13, SDF_SHARED_PARENT)
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index db7d24c0174b..cdb71ee6f53a 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -11,10 +11,22 @@
*/
#ifdef CONFIG_SMP
-#define SD_FLAG(name, idx) static const unsigned int name = BIT(idx);
+#define SD_FLAG(name, idx, type) static const unsigned int name = BIT(idx);
#include <linux/sched/sd_flags.h>
#undef SD_FLAG
+#ifdef CONFIG_SCHED_DEBUG
+#define SD_FLAG(_name, idx, _meta_flags) [idx] = {.meta_flags = _meta_flags, .name = #_name},
+
+static const struct {
+ unsigned int meta_flags;
+ char *name;
+} sd_flag_debug[] = {
+#include <linux/sched/sd_flags.h>
+};
+#undef SD_FLAG
+#endif
+
#ifdef CONFIG_SCHED_SMT
static inline int cpu_smt_flags(void)
{
--
2.24.0
^ permalink raw reply related [flat|nested] 7+ messages in thread* [RFC PATCH 3/3] sched/topology: Verify SD_* flags setup when sched_debug is on
2020-03-11 18:33 [RFC PATCH 0/3] sched: Instrument sched domain flags Valentin Schneider
2020-03-11 18:33 ` [RFC PATCH 1/3] sched/topology: Split out SD_* flags declaration to its own file Valentin Schneider
2020-03-11 18:33 ` [RFC PATCH 2/3] sched/topology: Define and assign sched_domain flag metadata Valentin Schneider
@ 2020-03-11 18:33 ` Valentin Schneider
2 siblings, 0 replies; 7+ messages in thread
From: Valentin Schneider @ 2020-03-11 18:33 UTC (permalink / raw)
To: linux-kernel
Cc: mingo, peterz, vincent.guittot, dietmar.eggemann,
morten.rasmussen
Now that we have some description of what we expect the flags layout to
be, we can use that to assert at runtime that the actual layout is sane.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
kernel/sched/topology.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index af30e2198b22..2e9aee29b3a6 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -29,6 +29,7 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
struct cpumask *groupmask)
{
struct sched_group *group = sd->groups;
+ int flags = sd->flags;
cpumask_clear(groupmask);
@@ -43,6 +44,21 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
printk(KERN_ERR "ERROR: domain->groups does not contain CPU%d\n", cpu);
}
+ for (; flags; flags &= flags - 1) {
+ unsigned int idx = __ffs(flags);
+ unsigned int flag = BIT(idx);
+ unsigned int meta_flags = sd_flag_debug[idx].meta_flags;
+
+ if ((meta_flags & SDF_SHARED_CHILD) && sd->child &&
+ !(sd->child->flags & flag))
+ printk(KERN_ERR "ERROR: flag %s set here but not in child\n",
+ sd_flag_debug[idx].name);
+ else if ((meta_flags & SDF_SHARED_PARENT) && sd->parent &&
+ !(sd->parent->flags & flag))
+ printk(KERN_ERR "ERROR: flag %s set here but not in parent\n",
+ sd_flag_debug[idx].name);
+ }
+
printk(KERN_DEBUG "%*s groups:", level + 1, "");
do {
if (!group) {
--
2.24.0
^ permalink raw reply related [flat|nested] 7+ messages in thread