public inbox for linux-riscv@lists.infradead.org
* [RFC PATCH 0/5] RISC-V: hwprobe related stuff
@ 2023-09-21 12:55 Andrew Jones
  2023-09-21 12:55 ` [RFC PATCH 1/5] RISC-V: hwprobe: Clarify cpus size parameter Andrew Jones
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Andrew Jones @ 2023-09-21 12:55 UTC (permalink / raw)
  To: linux-riscv; +Cc: paul.walmsley, palmer, aou, evan, conor.dooley, apatel

This series clarifies the hwprobe interface and introduces the first
hwprobe flag. The flag essentially reverses hwprobe: instead of getting
values for a set of cpus, it gets the set of cpus which match a set of
values. I've made the series an RFC because, for the first part, the
clarification, I'm not sure whether I'm the only person who needed
clarifying. If it's just me, then consider me clarified and we can drop
it. The second part, the new flag, is an RFC because so far the only
use case is the CBO hwprobe selftest, which this series is based on;
the last patch of the series applies the new hwprobe variant to it. If
we don't expect other use cases to crop up, at least none that care
about the inefficient probing the CBO test has to do before this
series, then the second part of this series can also be dropped.

Based-on: 20230918131518.56803-8-ajones@ventanamicro.com

Thanks,
drew


Andrew Jones (5):
  RISC-V: hwprobe: Clarify cpus size parameter
  RISC-V: selftests: Replace cpu_count with cpusetsize
  RISC-V: hwprobe: Introduce which-cpus flag
  RISC-V: selftests: Add which-cpus hwprobe test
  RISC-V: selftests: Apply which-cpus flag to CBO hwprobe test

 Documentation/riscv/hwprobe.rst               |  27 ++-
 arch/riscv/include/uapi/asm/hwprobe.h         |   3 +
 arch/riscv/kernel/sys_riscv.c                 | 162 +++++++++++++++++-
 arch/riscv/kernel/vdso/hwprobe.c              |  10 +-
 .../testing/selftests/riscv/hwprobe/Makefile  |   5 +-
 tools/testing/selftests/riscv/hwprobe/cbo.c   |  26 +--
 .../testing/selftests/riscv/hwprobe/hwprobe.c |   2 +-
 .../testing/selftests/riscv/hwprobe/hwprobe.h |   2 +-
 .../selftests/riscv/hwprobe/which-cpus.c      | 156 +++++++++++++++++
 .../selftests/riscv/vector/vstate_prctl.c     |   2 +-
 10 files changed, 353 insertions(+), 42 deletions(-)
 create mode 100644 tools/testing/selftests/riscv/hwprobe/which-cpus.c

-- 
2.41.0


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


* [RFC PATCH 1/5] RISC-V: hwprobe: Clarify cpus size parameter
  2023-09-21 12:55 [RFC PATCH 0/5] RISC-V: hwprobe related stuff Andrew Jones
@ 2023-09-21 12:55 ` Andrew Jones
  2023-09-25 11:23   ` Palmer Dabbelt
  2023-09-21 12:55 ` [RFC PATCH 2/5] RISC-V: selftests: Replace cpu_count with cpusetsize Andrew Jones
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Andrew Jones @ 2023-09-21 12:55 UTC (permalink / raw)
  To: linux-riscv; +Cc: paul.walmsley, palmer, aou, evan, conor.dooley, apatel

The "count" parameter associated with the 'cpus' parameter of the
hwprobe syscall is the size in bytes of 'cpus'. Naming it 'cpu_count'
may mislead users (it did me) into thinking it's the number of CPUs
that are or can be represented by 'cpus' instead. This is particularly
easy (IMO) to get wrong since 'cpus' is documented to be defined by
CPU_SET(3), and CPU_SET(3) also documents a CPU_COUNT() macro (the
number of CPUs in the set). CPU_SET(3) refers to the size of cpu sets
with 'setsize'. Adopt 'cpusetsize' for the hwprobe parameter and
explicitly state in Documentation/riscv/hwprobe.rst that it is in
bytes, to clarify.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
---
 Documentation/riscv/hwprobe.rst  | 15 ++++++++-------
 arch/riscv/kernel/sys_riscv.c    | 14 +++++++-------
 arch/riscv/kernel/vdso/hwprobe.c | 10 +++++-----
 3 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
index 7b2384de471f..132e9acaa8f4 100644
--- a/Documentation/riscv/hwprobe.rst
+++ b/Documentation/riscv/hwprobe.rst
@@ -12,7 +12,7 @@ is defined in <asm/hwprobe.h>::
     };
 
     long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
-                           size_t cpu_count, cpu_set_t *cpus,
+                           size_t cpusetsize, cpu_set_t *cpus,
                            unsigned int flags);
 
 The arguments are split into three groups: an array of key-value pairs, a CPU
@@ -20,12 +20,13 @@ set, and some flags. The key-value pairs are supplied with a count. Userspace
 must prepopulate the key field for each element, and the kernel will fill in the
 value if the key is recognized. If a key is unknown to the kernel, its key field
 will be cleared to -1, and its value set to 0. The CPU set is defined by
-CPU_SET(3). For value-like keys (eg. vendor/arch/impl), the returned value will
-be only be valid if all CPUs in the given set have the same value. Otherwise -1
-will be returned. For boolean-like keys, the value returned will be a logical
-AND of the values for the specified CPUs. Usermode can supply NULL for cpus and
-0 for cpu_count as a shortcut for all online CPUs. There are currently no flags,
-this value must be zero for future compatibility.
+CPU_SET(3) with size ``cpusetsize`` bytes. For value-like keys (eg. vendor,
+arch, impl), the returned value will only be valid if all CPUs in the given set
+have the same value. Otherwise -1 will be returned. For boolean-like keys, the
+value returned will be a logical AND of the values for the specified CPUs.
+Usermode can supply NULL for ``cpus`` and 0 for ``cpusetsize`` as a shortcut for
+all online CPUs. There are currently no flags, this value must be zero for
+future compatibility.
 
 On success 0 is returned, on failure a negative error code is returned.
 
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index d7266a9abc66..14b6dfaa5d9f 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -246,7 +246,7 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
 }
 
 static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
-			    size_t pair_count, size_t cpu_count,
+			    size_t pair_count, size_t cpusetsize,
 			    unsigned long __user *cpus_user,
 			    unsigned int flags)
 {
@@ -264,13 +264,13 @@ static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
 	 * 0 as a shortcut to all online CPUs.
 	 */
 	cpumask_clear(&cpus);
-	if (!cpu_count && !cpus_user) {
+	if (!cpusetsize && !cpus_user) {
 		cpumask_copy(&cpus, cpu_online_mask);
 	} else {
-		if (cpu_count > cpumask_size())
-			cpu_count = cpumask_size();
+		if (cpusetsize > cpumask_size())
+			cpusetsize = cpumask_size();
 
-		ret = copy_from_user(&cpus, cpus_user, cpu_count);
+		ret = copy_from_user(&cpus, cpus_user, cpusetsize);
 		if (ret)
 			return -EFAULT;
 
@@ -347,10 +347,10 @@ arch_initcall_sync(init_hwprobe_vdso_data);
 #endif /* CONFIG_MMU */
 
 SYSCALL_DEFINE5(riscv_hwprobe, struct riscv_hwprobe __user *, pairs,
-		size_t, pair_count, size_t, cpu_count, unsigned long __user *,
+		size_t, pair_count, size_t, cpusetsize, unsigned long __user *,
 		cpus, unsigned int, flags)
 {
-	return do_riscv_hwprobe(pairs, pair_count, cpu_count,
+	return do_riscv_hwprobe(pairs, pair_count, cpusetsize,
 				cpus, flags);
 }
 
diff --git a/arch/riscv/kernel/vdso/hwprobe.c b/arch/riscv/kernel/vdso/hwprobe.c
index d40bec6ac078..3000e9fc0569 100644
--- a/arch/riscv/kernel/vdso/hwprobe.c
+++ b/arch/riscv/kernel/vdso/hwprobe.c
@@ -8,21 +8,21 @@
 #include <vdso/helpers.h>
 
 extern int riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
-			 size_t cpu_count, unsigned long *cpus,
+			 size_t cpusetsize, unsigned long *cpus,
 			 unsigned int flags);
 
 /* Add a prototype to avoid -Wmissing-prototypes warning. */
 int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
-			 size_t cpu_count, unsigned long *cpus,
+			 size_t cpusetsize, unsigned long *cpus,
 			 unsigned int flags);
 
 int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
-			 size_t cpu_count, unsigned long *cpus,
+			 size_t cpusetsize, unsigned long *cpus,
 			 unsigned int flags)
 {
 	const struct vdso_data *vd = __arch_get_vdso_data();
 	const struct arch_vdso_data *avd = &vd->arch_data;
-	bool all_cpus = !cpu_count && !cpus;
+	bool all_cpus = !cpusetsize && !cpus;
 	struct riscv_hwprobe *p = pairs;
 	struct riscv_hwprobe *end = pairs + pair_count;
 
@@ -33,7 +33,7 @@ int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
 	 * masks.
 	 */
 	if ((flags != 0) || (!all_cpus && !avd->homogeneous_cpus))
-		return riscv_hwprobe(pairs, pair_count, cpu_count, cpus, flags);
+		return riscv_hwprobe(pairs, pair_count, cpusetsize, cpus, flags);
 
 	/* This is something we can handle, fill out the pairs. */
 	while (p < end) {
-- 
2.41.0




* [RFC PATCH 2/5] RISC-V: selftests: Replace cpu_count with cpusetsize
  2023-09-21 12:55 [RFC PATCH 0/5] RISC-V: hwprobe related stuff Andrew Jones
  2023-09-21 12:55 ` [RFC PATCH 1/5] RISC-V: hwprobe: Clarify cpus size parameter Andrew Jones
@ 2023-09-21 12:55 ` Andrew Jones
  2023-09-25 11:23   ` Palmer Dabbelt
  2023-09-21 12:55 ` [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag Andrew Jones
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Andrew Jones @ 2023-09-21 12:55 UTC (permalink / raw)
  To: linux-riscv; +Cc: paul.walmsley, palmer, aou, evan, conor.dooley, apatel

Use 'cpusetsize' instead of 'cpu_count' for the size of the cpu
set parameter of the hwprobe syscall.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
---
 tools/testing/selftests/riscv/hwprobe/hwprobe.c     | 2 +-
 tools/testing/selftests/riscv/hwprobe/hwprobe.h     | 2 +-
 tools/testing/selftests/riscv/vector/vstate_prctl.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/riscv/hwprobe/hwprobe.c b/tools/testing/selftests/riscv/hwprobe/hwprobe.c
index c474891df307..d53e0889b59e 100644
--- a/tools/testing/selftests/riscv/hwprobe/hwprobe.c
+++ b/tools/testing/selftests/riscv/hwprobe/hwprobe.c
@@ -47,7 +47,7 @@ int main(int argc, char **argv)
 	ksft_test_result(out != 0, "Bad CPU set\n");
 
 	out = riscv_hwprobe(pairs, 8, 1, 0, 0);
-	ksft_test_result(out != 0, "NULL CPU set with non-zero count\n");
+	ksft_test_result(out != 0, "NULL CPU set with non-zero size\n");
 
 	pairs[0].key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR;
 	out = riscv_hwprobe(pairs, 1, 1, &cpus, 0);
diff --git a/tools/testing/selftests/riscv/hwprobe/hwprobe.h b/tools/testing/selftests/riscv/hwprobe/hwprobe.h
index 721b0ce73a56..e3fccb390c4d 100644
--- a/tools/testing/selftests/riscv/hwprobe/hwprobe.h
+++ b/tools/testing/selftests/riscv/hwprobe/hwprobe.h
@@ -10,6 +10,6 @@
  * contain the call.
  */
 long riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
-		   size_t cpu_count, unsigned long *cpus, unsigned int flags);
+		   size_t cpusetsize, unsigned long *cpus, unsigned int flags);
 
 #endif
diff --git a/tools/testing/selftests/riscv/vector/vstate_prctl.c b/tools/testing/selftests/riscv/vector/vstate_prctl.c
index b348b475be57..540fe588e78f 100644
--- a/tools/testing/selftests/riscv/vector/vstate_prctl.c
+++ b/tools/testing/selftests/riscv/vector/vstate_prctl.c
@@ -13,7 +13,7 @@
  * contain the call.
  */
 long riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
-		   size_t cpu_count, unsigned long *cpus, unsigned int flags);
+		   size_t cpusetsize, unsigned long *cpus, unsigned int flags);
 
 #define NEXT_PROGRAM "./vstate_exec_nolibc"
 static int launch_test(int test_inherit)
-- 
2.41.0




* [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-09-21 12:55 [RFC PATCH 0/5] RISC-V: hwprobe related stuff Andrew Jones
  2023-09-21 12:55 ` [RFC PATCH 1/5] RISC-V: hwprobe: Clarify cpus size parameter Andrew Jones
  2023-09-21 12:55 ` [RFC PATCH 2/5] RISC-V: selftests: Replace cpu_count with cpusetsize Andrew Jones
@ 2023-09-21 12:55 ` Andrew Jones
  2023-09-25 11:23   ` Palmer Dabbelt
                     ` (2 more replies)
  2023-09-21 12:55 ` [RFC PATCH 4/5] RISC-V: selftests: Add which-cpus hwprobe test Andrew Jones
  2023-09-21 12:55 ` [RFC PATCH 5/5] RISC-V: selftests: Apply which-cpus flag to CBO " Andrew Jones
  4 siblings, 3 replies; 26+ messages in thread
From: Andrew Jones @ 2023-09-21 12:55 UTC (permalink / raw)
  To: linux-riscv; +Cc: paul.walmsley, palmer, aou, evan, conor.dooley, apatel

Introduce the first flag for the hwprobe syscall. The flag essentially
reverses the syscall's behavior: instead of populating the values of
keys for a given set of cpus, the syscall returns the set of cpus which
support the given values of the keys. To do this, implement pair merge
and pair compare functions which take the type of value (a single value
vs. a bitmap of booleans) into consideration. The flow for the
which-cpus syscall variant is as follows:

  1. Merge pairs into a set of pairs with unique keys
  2. If any unknown keys are seen, return an empty set of cpus
  3. If the platform is homogeneous, then check all the pairs
     against the "all cpu" values and return early
  4. Otherwise, check all the pairs against each cpu individually

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
---
 Documentation/riscv/hwprobe.rst       |  16 ++-
 arch/riscv/include/uapi/asm/hwprobe.h |   3 +
 arch/riscv/kernel/sys_riscv.c         | 148 +++++++++++++++++++++++++-
 3 files changed, 163 insertions(+), 4 deletions(-)

diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
index 132e9acaa8f4..97b1e97e7dd2 100644
--- a/Documentation/riscv/hwprobe.rst
+++ b/Documentation/riscv/hwprobe.rst
@@ -25,8 +25,20 @@ arch, impl), the returned value will only be valid if all CPUs in the given set
 have the same value. Otherwise -1 will be returned. For boolean-like keys, the
 value returned will be a logical AND of the values for the specified CPUs.
 Usermode can supply NULL for ``cpus`` and 0 for ``cpusetsize`` as a shortcut for
-all online CPUs. There are currently no flags, this value must be zero for
-future compatibility.
+all online CPUs. The currently supported flags are:
+
+* :c:macro:`RISCV_HWPROBE_WHICH_CPUS`: This flag basically reverses the behavior
+  of sys_riscv_hwprobe().  Instead of populating the values of keys for a given
+  set of CPUs, the set of CPUs is initially all unset and the values of each key
+  are given.  Upon return, the CPUs which all match each of the given key-value
+  pairs are set in ``cpus``.  How matching is done depends on the key type.  For
+  value-like keys, matching means to be the exact same as the value.  For
+  boolean-like keys, matching means the result of a logical AND of the pair's
+  value with the CPU's value is exactly the same as the pair's value.  ``cpus``
+  may also initially have set bits, in which case the bits of any CPUs which do
+  not match the pairs will be cleared, but no other bits will be set.
+
+All other flags are reserved for future compatibility and must be zero.
 
 On success 0 is returned, on failure a negative error code is returned.
 
diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
index 86d08a0e617b..36683307c3e4 100644
--- a/arch/riscv/include/uapi/asm/hwprobe.h
+++ b/arch/riscv/include/uapi/asm/hwprobe.h
@@ -40,4 +40,7 @@ struct riscv_hwprobe {
 #define RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE	6
 /* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
 
+/* Flags */
+#define RISCV_HWPROBE_WHICH_CPUS	(1 << 0)
+
 #endif
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index 14b6dfaa5d9f..c70a72fe6aee 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -245,14 +245,145 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
 	}
 }
 
+static bool hwprobe_key_is_map(__s64 key)
+{
+	switch (key) {
+	case RISCV_HWPROBE_KEY_BASE_BEHAVIOR:
+	case RISCV_HWPROBE_KEY_IMA_EXT_0:
+	case RISCV_HWPROBE_KEY_CPUPERF_0:
+		return true;
+	}
+
+	return false;
+}
+
+static int hwprobe_pair_merge(struct riscv_hwprobe *to,
+			      struct riscv_hwprobe *from)
+{
+	if (to->key != from->key)
+		return -EINVAL;
+
+	if (hwprobe_key_is_map(to->key)) {
+		to->value |= from->value;
+		return 0;
+	}
+
+	return to->value == from->value ? 0 : -EINVAL;
+}
+
+static bool hwprobe_pair_cmp(struct riscv_hwprobe *pair,
+			     struct riscv_hwprobe *other_pair)
+{
+	if (pair->key != other_pair->key)
+		return false;
+
+	if (hwprobe_key_is_map(pair->key))
+		return (pair->value & other_pair->value) == other_pair->value;
+
+	return pair->value == other_pair->value;
+}
+
+static int hwprobe_which_cpus(struct riscv_hwprobe __user *pairs_user,
+			      size_t pair_count, size_t cpusetsize,
+			      cpumask_t *cpus)
+{
+	struct riscv_hwprobe pairs[RISCV_HWPROBE_MAX_KEY + 1] = {
+		[0 ... RISCV_HWPROBE_MAX_KEY] = (struct riscv_hwprobe){ .key = -1 }
+	};
+	struct riscv_hwprobe pair;
+	struct vdso_data *vd = __arch_get_k_vdso_data();
+	struct arch_vdso_data *avd = &vd->arch_data;
+	bool clear_all = false;
+	cpumask_t one_cpu;
+	int cpu, ret;
+	size_t i;
+
+	for (i = 0; i < pair_count; i++) {
+		ret = copy_from_user(&pair, &pairs_user[i], sizeof(pair));
+		if (ret)
+			return -EFAULT;
+
+		if (pair.key >= 0 && pair.key <= RISCV_HWPROBE_MAX_KEY) {
+			if (pairs[pair.key].key == -1) {
+				pairs[pair.key] = pair;
+			} else {
+				ret = hwprobe_pair_merge(&pairs[pair.key], &pair);
+				if (ret)
+					return ret;
+			}
+		} else {
+			pair.key = -1;
+			pair.value = 0;
+			ret = copy_to_user(&pairs_user[i], &pair, sizeof(pair));
+			if (ret)
+				return -EFAULT;
+			clear_all = true;
+		}
+	}
+
+	if (clear_all) {
+		cpumask_clear(cpus);
+		return 0;
+	}
+
+	if (avd->homogeneous_cpus) {
+		for (i = 0; i <= RISCV_HWPROBE_MAX_KEY; i++) {
+			if (pairs[i].key == -1)
+				continue;
+
+			pair.key = pairs[i].key;
+			pair.value = avd->all_cpu_hwprobe_values[pairs[i].key];
+
+			if (!hwprobe_pair_cmp(&pair, &pairs[i])) {
+				cpumask_clear(cpus);
+				return 0;
+			}
+		}
+
+		return 0;
+	}
+
+	cpumask_clear(&one_cpu);
+
+	for_each_cpu(cpu, cpus) {
+		cpumask_set_cpu(cpu, &one_cpu);
+
+		for (i = 0; i <= RISCV_HWPROBE_MAX_KEY; i++) {
+			if (pairs[i].key == -1)
+				continue;
+
+			pair.key = pairs[i].key;
+			pair.value = 0;
+			hwprobe_one_pair(&pair, &one_cpu);
+
+			if (!hwprobe_pair_cmp(&pair, &pairs[i])) {
+				cpumask_clear_cpu(cpu, cpus);
+				break;
+			}
+		}
+
+		cpumask_clear_cpu(cpu, &one_cpu);
+	}
+
+	return 0;
+}
+
 static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
 			    size_t pair_count, size_t cpusetsize,
 			    unsigned long __user *cpus_user,
 			    unsigned int flags)
 {
+	bool which_cpus = false;
+	cpumask_t cpus;
 	size_t out;
 	int ret;
-	cpumask_t cpus;
+
+	if (flags & RISCV_HWPROBE_WHICH_CPUS) {
+		if (!cpusetsize || !cpus_user)
+			return -EINVAL;
+		flags &= ~RISCV_HWPROBE_WHICH_CPUS;
+		which_cpus = true;
+	}
 
 	/* Check the reserved flags. */
 	if (flags != 0)
@@ -274,11 +405,24 @@ static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
 		if (ret)
 			return -EFAULT;
 
+		cpumask_and(&cpus, &cpus, cpu_online_mask);
+
+		if (which_cpus) {
+			if (cpumask_empty(&cpus))
+				cpumask_copy(&cpus, cpu_online_mask);
+			ret = hwprobe_which_cpus(pairs, pair_count, cpusetsize, &cpus);
+			if (ret)
+				return ret;
+			ret = copy_to_user(cpus_user, &cpus, cpusetsize);
+			if (ret)
+				return -EFAULT;
+			return 0;
+		}
+
 		/*
 		 * Userspace must provide at least one online CPU, without that
 		 * there's no way to define what is supported.
 		 */
-		cpumask_and(&cpus, &cpus, cpu_online_mask);
 		if (cpumask_empty(&cpus))
 			return -EINVAL;
 	}
-- 
2.41.0




* [RFC PATCH 4/5] RISC-V: selftests: Add which-cpus hwprobe test
  2023-09-21 12:55 [RFC PATCH 0/5] RISC-V: hwprobe related stuff Andrew Jones
                   ` (2 preceding siblings ...)
  2023-09-21 12:55 ` [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag Andrew Jones
@ 2023-09-21 12:55 ` Andrew Jones
  2023-09-25 11:23   ` Palmer Dabbelt
  2023-09-21 12:55 ` [RFC PATCH 5/5] RISC-V: selftests: Apply which-cpus flag to CBO " Andrew Jones
  4 siblings, 1 reply; 26+ messages in thread
From: Andrew Jones @ 2023-09-21 12:55 UTC (permalink / raw)
  To: linux-riscv; +Cc: paul.walmsley, palmer, aou, evan, conor.dooley, apatel

Test the RISCV_HWPROBE_WHICH_CPUS flag of hwprobe. The test also
provides a command line interface for getting the cpu list for
arbitrary hwprobe pairs.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
---
 .../testing/selftests/riscv/hwprobe/Makefile  |   5 +-
 .../selftests/riscv/hwprobe/which-cpus.c      | 156 ++++++++++++++++++
 2 files changed, 160 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/riscv/hwprobe/which-cpus.c

diff --git a/tools/testing/selftests/riscv/hwprobe/Makefile b/tools/testing/selftests/riscv/hwprobe/Makefile
index f224b84591fb..cec81610a5f2 100644
--- a/tools/testing/selftests/riscv/hwprobe/Makefile
+++ b/tools/testing/selftests/riscv/hwprobe/Makefile
@@ -4,7 +4,7 @@
 
 CFLAGS += -I$(top_srcdir)/tools/include
 
-TEST_GEN_PROGS := hwprobe cbo
+TEST_GEN_PROGS := hwprobe cbo which-cpus
 
 include ../../lib.mk
 
@@ -13,3 +13,6 @@ $(OUTPUT)/hwprobe: hwprobe.c sys_hwprobe.S
 
 $(OUTPUT)/cbo: cbo.c sys_hwprobe.S
 	$(CC) -static -o$@ $(CFLAGS) $(LDFLAGS) $^
+
+$(OUTPUT)/which-cpus: which-cpus.c sys_hwprobe.S
+	$(CC) -static -o$@ $(CFLAGS) $(LDFLAGS) $^
diff --git a/tools/testing/selftests/riscv/hwprobe/which-cpus.c b/tools/testing/selftests/riscv/hwprobe/which-cpus.c
new file mode 100644
index 000000000000..83877a715b08
--- /dev/null
+++ b/tools/testing/selftests/riscv/hwprobe/which-cpus.c
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Ventana Micro Systems Inc.
+ *
+ * Test the RISCV_HWPROBE_WHICH_CPUS flag of hwprobe. Also provides a command
+ * line interface to get the cpu list for arbitrary hwprobe pairs.
+ */
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sched.h>
+#include <unistd.h>
+#include <assert.h>
+
+#include "hwprobe.h"
+#include "../../kselftest.h"
+
+static void help(void)
+{
+	printf("\n"
+	       "which-cpus: [-h] [<key=value> [<key=value> ...]]\n\n"
+	       "   Without parameters, tests the RISCV_HWPROBE_WHICH_CPUS flag of hwprobe.\n"
+	       "   With parameters, where each parameter is a hwprobe pair written as\n"
+	       "   <key=value>, outputs the cpulist for cpus which all match the given set\n"
+	       "   of pairs.  'key' and 'value' should be in numeric form, e.g. 4=0x3b\n");
+}
+
+static void print_cpulist(cpu_set_t *cpus)
+{
+	int start = 0, end = 0;
+
+	if (!CPU_COUNT(cpus)) {
+		printf("cpus: None\n");
+		return;
+	}
+
+	printf("cpus:");
+	for (int i = 0, c = 0; i < CPU_COUNT(cpus); i++, c++) {
+		if (start != end && !CPU_ISSET(c, cpus))
+			printf("-%d", end);
+
+		while (!CPU_ISSET(c, cpus))
+			++c;
+
+		if (i != 0 && c == end + 1) {
+			end = c;
+			continue;
+		}
+
+		printf("%c%d", i == 0 ? ' ' : ',', c);
+		start = end = c;
+	}
+	if (start != end)
+		printf("-%d", end);
+	printf("\n");
+}
+
+static void do_which_cpus(int argc, char **argv)
+{
+	struct riscv_hwprobe *pairs;
+	int nr_pairs = argc - 1;
+	cpu_set_t cpus;
+	char *start, *end;
+	int rc;
+
+	pairs = malloc(nr_pairs * sizeof(struct riscv_hwprobe));
+	assert(pairs);
+
+	for (int i = 0; i < nr_pairs; i++) {
+		start = argv[i + 1];
+		pairs[i].key = strtol(start, &end, 0);
+		assert(end != start && *end == '=');
+		start = end + 1;
+		pairs[i].value = strtoul(start, &end, 0);
+		assert(end != start && *end == '\0');
+	}
+
+	CPU_ZERO(&cpus);
+	rc = riscv_hwprobe(pairs, nr_pairs, sizeof(cpu_set_t), (unsigned long *)&cpus, RISCV_HWPROBE_WHICH_CPUS);
+	assert(rc == 0);
+	print_cpulist(&cpus);
+	free(pairs);
+}
+
+int main(int argc, char **argv)
+{
+	struct riscv_hwprobe pairs[2];
+	cpu_set_t cpus_aff, cpus;
+	__u64 ext0_all;
+	long rc;
+
+	if (argc > 1) {
+		if (!strcmp(argv[1], "-h"))
+			help();
+		else
+			do_which_cpus(argc, argv);
+		return 0;
+	}
+
+	rc = sched_getaffinity(0, sizeof(cpu_set_t), &cpus_aff);
+	assert(rc == 0);
+
+	ksft_print_header();
+	ksft_set_plan(7);
+
+	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, };
+	rc = riscv_hwprobe(pairs, 1, 0, NULL, 0);
+	assert(rc == 0 && pairs[0].key == RISCV_HWPROBE_KEY_BASE_BEHAVIOR &&
+	       pairs[0].value == RISCV_HWPROBE_BASE_BEHAVIOR_IMA);
+
+	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_IMA_EXT_0, };
+	rc = riscv_hwprobe(pairs, 1, 0, NULL, 0);
+	assert(rc == 0 && pairs[0].key == RISCV_HWPROBE_KEY_IMA_EXT_0);
+	ext0_all = pairs[0].value;
+
+	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, .value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA, };
+	CPU_ZERO(&cpus);
+	rc = riscv_hwprobe(pairs, 1, 0, (unsigned long *)&cpus, RISCV_HWPROBE_WHICH_CPUS);
+	ksft_test_result(rc == -EINVAL, "no cpusetsize\n");
+
+	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, .value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA, };
+	rc = riscv_hwprobe(pairs, 1, sizeof(cpu_set_t), NULL, RISCV_HWPROBE_WHICH_CPUS);
+	ksft_test_result(rc == -EINVAL, "NULL cpus\n");
+
+	pairs[0] = (struct riscv_hwprobe){ .key = 0xbadc0de, };
+	CPU_ZERO(&cpus);
+	rc = riscv_hwprobe(pairs, 1, sizeof(cpu_set_t), (unsigned long *)&cpus, RISCV_HWPROBE_WHICH_CPUS);
+	ksft_test_result(rc == 0 && CPU_COUNT(&cpus) == 0, "unknown key\n");
+
+	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, .value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA, };
+	pairs[1] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, .value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA, };
+	CPU_ZERO(&cpus);
+	rc = riscv_hwprobe(pairs, 2, sizeof(cpu_set_t), (unsigned long *)&cpus, RISCV_HWPROBE_WHICH_CPUS);
+	ksft_test_result(rc == 0, "duplicate keys\n");
+
+	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, .value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA, };
+	pairs[1] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_IMA_EXT_0, .value = ext0_all, };
+	CPU_ZERO(&cpus);
+	rc = riscv_hwprobe(pairs, 2, sizeof(cpu_set_t), (unsigned long *)&cpus, RISCV_HWPROBE_WHICH_CPUS);
+	ksft_test_result(rc == 0 && CPU_COUNT(&cpus) == sysconf(_SC_NPROCESSORS_ONLN), "set all cpus\n");
+
+	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, .value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA, };
+	pairs[1] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_IMA_EXT_0, .value = ext0_all, };
+	memcpy(&cpus, &cpus_aff, sizeof(cpu_set_t));
+	rc = riscv_hwprobe(pairs, 2, sizeof(cpu_set_t), (unsigned long *)&cpus, RISCV_HWPROBE_WHICH_CPUS);
+	ksft_test_result(rc == 0 && CPU_EQUAL(&cpus, &cpus_aff), "set all affinity cpus\n");
+
+	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, .value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA, };
+	pairs[1] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_IMA_EXT_0, .value = ~ext0_all, };
+	memcpy(&cpus, &cpus_aff, sizeof(cpu_set_t));
+	rc = riscv_hwprobe(pairs, 2, sizeof(cpu_set_t), (unsigned long *)&cpus, RISCV_HWPROBE_WHICH_CPUS);
+	ksft_test_result(rc == 0 && CPU_COUNT(&cpus) == 0, "clear all cpus\n");
+
+	ksft_finished();
+}
-- 
2.41.0




* [RFC PATCH 5/5] RISC-V: selftests: Apply which-cpus flag to CBO hwprobe test
  2023-09-21 12:55 [RFC PATCH 0/5] RISC-V: hwprobe related stuff Andrew Jones
                   ` (3 preceding siblings ...)
  2023-09-21 12:55 ` [RFC PATCH 4/5] RISC-V: selftests: Add which-cpus hwprobe test Andrew Jones
@ 2023-09-21 12:55 ` Andrew Jones
  2023-09-25 11:23   ` Palmer Dabbelt
  4 siblings, 1 reply; 26+ messages in thread
From: Andrew Jones @ 2023-09-21 12:55 UTC (permalink / raw)
  To: linux-riscv; +Cc: paul.walmsley, palmer, aou, evan, conor.dooley, apatel

Now that we can efficiently clear cpus which do not meet a key-value
criterion from a cpu set, use that in the CBO hwprobe selftest instead
of looping over hwprobe one cpu at a time.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
---
 tools/testing/selftests/riscv/hwprobe/cbo.c | 26 +++++++--------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/tools/testing/selftests/riscv/hwprobe/cbo.c b/tools/testing/selftests/riscv/hwprobe/cbo.c
index 50a2cc8aef38..375a343bc2c6 100644
--- a/tools/testing/selftests/riscv/hwprobe/cbo.c
+++ b/tools/testing/selftests/riscv/hwprobe/cbo.c
@@ -135,27 +135,19 @@ static void check_no_zicboz_cpus(cpu_set_t *cpus)
 {
 	struct riscv_hwprobe pair = {
 		.key = RISCV_HWPROBE_KEY_IMA_EXT_0,
+		.value = RISCV_HWPROBE_EXT_ZICBOZ,
 	};
-	cpu_set_t one_cpu;
-	int i = 0, c = 0;
+	cpu_set_t tmp;
 	long rc;
 
-	while (i++ < CPU_COUNT(cpus)) {
-		while (!CPU_ISSET(c, cpus))
-			++c;
-
-		CPU_ZERO(&one_cpu);
-		CPU_SET(c, &one_cpu);
-
-		rc = riscv_hwprobe(&pair, 1, sizeof(cpu_set_t), (unsigned long *)&one_cpu, 0);
-		assert(rc == 0 && pair.key == RISCV_HWPROBE_KEY_IMA_EXT_0);
+	memcpy(&tmp, cpus, sizeof(cpu_set_t));
+	rc = riscv_hwprobe(&pair, 1, sizeof(cpu_set_t), (unsigned long *)&tmp, RISCV_HWPROBE_WHICH_CPUS);
+	assert(rc == 0 && pair.key == RISCV_HWPROBE_KEY_IMA_EXT_0);
 
-		if (pair.value & RISCV_HWPROBE_EXT_ZICBOZ)
-			ksft_exit_fail_msg("Zicboz is only present on a subset of harts.\n"
-					   "Use taskset to select a set of harts where Zicboz\n"
-					   "presence (present or not) is consistent for each hart\n");
-		++c;
-	}
+	if (CPU_COUNT(&tmp))
+		ksft_exit_fail_msg("Zicboz is only present on a subset of harts.\n"
+				   "Use taskset to select a set of harts where Zicboz\n"
+				   "presence (present or not) is consistent for each hart\n");
 }
 
 enum {
-- 
2.41.0




* Re: [RFC PATCH 1/5] RISC-V: hwprobe: Clarify cpus size parameter
  2023-09-21 12:55 ` [RFC PATCH 1/5] RISC-V: hwprobe: Clarify cpus size parameter Andrew Jones
@ 2023-09-25 11:23   ` Palmer Dabbelt
  2023-09-25 12:07     ` Andrew Jones
  0 siblings, 1 reply; 26+ messages in thread
From: Palmer Dabbelt @ 2023-09-25 11:23 UTC (permalink / raw)
  To: ajones; +Cc: linux-riscv, Paul Walmsley, aou, Evan Green, Conor Dooley, apatel

On Thu, 21 Sep 2023 05:55:20 PDT (-0700), ajones@ventanamicro.com wrote:
> The "count" parameter associated with the 'cpus' parameter of the
> hwprobe syscall is the size in bytes of 'cpus'. Naming it 'cpu_count'
> may mislead users (it did me) to think it's the number of CPUs that
> are or can be represented by 'cpus' instead. This is particularly
> easy (IMO) to get wrong since 'cpus' is documented to be defined by
> CPU_SET(3) and CPU_SET(3) also documents a CPU_COUNT() (the number
> of CPUs in set) macro. CPU_SET(3) refers to the size of cpu sets
> with 'setsize'. Adopt 'cpusetsize' for the hwprobe parameter and
> specifically state it is in bytes in Documentation/riscv/hwprobe.rst
> to clarify.
>
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  Documentation/riscv/hwprobe.rst  | 15 ++++++++-------
>  arch/riscv/kernel/sys_riscv.c    | 14 +++++++-------
>  arch/riscv/kernel/vdso/hwprobe.c | 10 +++++-----
>  3 files changed, 20 insertions(+), 19 deletions(-)
>
> diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> index 7b2384de471f..132e9acaa8f4 100644
> --- a/Documentation/riscv/hwprobe.rst
> +++ b/Documentation/riscv/hwprobe.rst
> @@ -12,7 +12,7 @@ is defined in <asm/hwprobe.h>::
>      };
>
>      long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> -                           size_t cpu_count, cpu_set_t *cpus,
> +                           size_t cpusetsize, cpu_set_t *cpus,
>                             unsigned int flags);
>
>  The arguments are split into three groups: an array of key-value pairs, a CPU
> @@ -20,12 +20,13 @@ set, and some flags. The key-value pairs are supplied with a count. Userspace
>  must prepopulate the key field for each element, and the kernel will fill in the
>  value if the key is recognized. If a key is unknown to the kernel, its key field
>  will be cleared to -1, and its value set to 0. The CPU set is defined by
> -CPU_SET(3). For value-like keys (eg. vendor/arch/impl), the returned value will
> -be only be valid if all CPUs in the given set have the same value. Otherwise -1
> -will be returned. For boolean-like keys, the value returned will be a logical
> -AND of the values for the specified CPUs. Usermode can supply NULL for cpus and
> -0 for cpu_count as a shortcut for all online CPUs. There are currently no flags,
> -this value must be zero for future compatibility.
> +CPU_SET(3) with size ``cpusetsize`` bytes. For value-like keys (eg. vendor,

So the only difference is "with size ``cpusetsize`` bytes" and a bunch
of line-wrap changes?  That always makes these doc updates hard to
read, as I go a bit cross-eyed trying to diff it manually.  I don't
know of a better way to do it, though.

> +arch, impl), the returned value will only be valid if all CPUs in the given set
> +have the same value. Otherwise -1 will be returned. For boolean-like keys, the
> +value returned will be a logical AND of the values for the specified CPUs.
> +Usermode can supply NULL for ``cpus`` and 0 for ``cpusetsize`` as a shortcut for
> +all online CPUs. There are currently no flags, this value must be zero for
> +future compatibility.
>
>  On success 0 is returned, on failure a negative error code is returned.
>
> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> index d7266a9abc66..14b6dfaa5d9f 100644
> --- a/arch/riscv/kernel/sys_riscv.c
> +++ b/arch/riscv/kernel/sys_riscv.c
> @@ -246,7 +246,7 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
>  }
>
>  static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
> -			    size_t pair_count, size_t cpu_count,
> +			    size_t pair_count, size_t cpusetsize,
>  			    unsigned long __user *cpus_user,
>  			    unsigned int flags)
>  {
> @@ -264,13 +264,13 @@ static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
>  	 * 0 as a shortcut to all online CPUs.
>  	 */
>  	cpumask_clear(&cpus);
> -	if (!cpu_count && !cpus_user) {
> +	if (!cpusetsize && !cpus_user) {
>  		cpumask_copy(&cpus, cpu_online_mask);
>  	} else {
> -		if (cpu_count > cpumask_size())
> -			cpu_count = cpumask_size();
> +		if (cpusetsize > cpumask_size())
> +			cpusetsize = cpumask_size();
>
> -		ret = copy_from_user(&cpus, cpus_user, cpu_count);
> +		ret = copy_from_user(&cpus, cpus_user, cpusetsize);
>  		if (ret)
>  			return -EFAULT;
>
> @@ -347,10 +347,10 @@ arch_initcall_sync(init_hwprobe_vdso_data);
>  #endif /* CONFIG_MMU */
>
>  SYSCALL_DEFINE5(riscv_hwprobe, struct riscv_hwprobe __user *, pairs,
> -		size_t, pair_count, size_t, cpu_count, unsigned long __user *,
> +		size_t, pair_count, size_t, cpusetsize, unsigned long __user *,
>  		cpus, unsigned int, flags)
>  {
> -	return do_riscv_hwprobe(pairs, pair_count, cpu_count,
> +	return do_riscv_hwprobe(pairs, pair_count, cpusetsize,
>  				cpus, flags);
>  }
>
> diff --git a/arch/riscv/kernel/vdso/hwprobe.c b/arch/riscv/kernel/vdso/hwprobe.c
> index d40bec6ac078..3000e9fc0569 100644
> --- a/arch/riscv/kernel/vdso/hwprobe.c
> +++ b/arch/riscv/kernel/vdso/hwprobe.c
> @@ -8,21 +8,21 @@
>  #include <vdso/helpers.h>
>
>  extern int riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> -			 size_t cpu_count, unsigned long *cpus,
> +			 size_t cpusetsize, unsigned long *cpus,
>  			 unsigned int flags);
>
>  /* Add a prototype to avoid -Wmissing-prototypes warning. */
>  int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> -			 size_t cpu_count, unsigned long *cpus,
> +			 size_t cpusetsize, unsigned long *cpus,
>  			 unsigned int flags);
>
>  int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> -			 size_t cpu_count, unsigned long *cpus,
> +			 size_t cpusetsize, unsigned long *cpus,
>  			 unsigned int flags)
>  {
>  	const struct vdso_data *vd = __arch_get_vdso_data();
>  	const struct arch_vdso_data *avd = &vd->arch_data;
> -	bool all_cpus = !cpu_count && !cpus;
> +	bool all_cpus = !cpusetsize && !cpus;
>  	struct riscv_hwprobe *p = pairs;
>  	struct riscv_hwprobe *end = pairs + pair_count;
>
> @@ -33,7 +33,7 @@ int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
>  	 * masks.
>  	 */
>  	if ((flags != 0) || (!all_cpus && !avd->homogeneous_cpus))
> -		return riscv_hwprobe(pairs, pair_count, cpu_count, cpus, flags);
> +		return riscv_hwprobe(pairs, pair_count, cpusetsize, cpus, flags);
>
>  	/* This is something we can handle, fill out the pairs. */
>  	while (p < end) {

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>


* Re: [RFC PATCH 2/5] RISC-V: selftests: Replace cpu_count with cpusetsize
  2023-09-21 12:55 ` [RFC PATCH 2/5] RISC-V: selftests: Replace cpu_count with cpusetsize Andrew Jones
@ 2023-09-25 11:23   ` Palmer Dabbelt
  0 siblings, 0 replies; 26+ messages in thread
From: Palmer Dabbelt @ 2023-09-25 11:23 UTC (permalink / raw)
  To: ajones; +Cc: linux-riscv, Paul Walmsley, aou, Evan Green, Conor Dooley, apatel

On Thu, 21 Sep 2023 05:55:21 PDT (-0700), ajones@ventanamicro.com wrote:
> Use 'cpusetsize' instead of 'cpu_count' for the size of the cpu
> set parameter of the hwprobe syscall.
>
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  tools/testing/selftests/riscv/hwprobe/hwprobe.c     | 2 +-
>  tools/testing/selftests/riscv/hwprobe/hwprobe.h     | 2 +-
>  tools/testing/selftests/riscv/vector/vstate_prctl.c | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/selftests/riscv/hwprobe/hwprobe.c b/tools/testing/selftests/riscv/hwprobe/hwprobe.c
> index c474891df307..d53e0889b59e 100644
> --- a/tools/testing/selftests/riscv/hwprobe/hwprobe.c
> +++ b/tools/testing/selftests/riscv/hwprobe/hwprobe.c
> @@ -47,7 +47,7 @@ int main(int argc, char **argv)
>  	ksft_test_result(out != 0, "Bad CPU set\n");
>
>  	out = riscv_hwprobe(pairs, 8, 1, 0, 0);
> -	ksft_test_result(out != 0, "NULL CPU set with non-zero count\n");
> +	ksft_test_result(out != 0, "NULL CPU set with non-zero size\n");
>
>  	pairs[0].key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR;
>  	out = riscv_hwprobe(pairs, 1, 1, &cpus, 0);
> diff --git a/tools/testing/selftests/riscv/hwprobe/hwprobe.h b/tools/testing/selftests/riscv/hwprobe/hwprobe.h
> index 721b0ce73a56..e3fccb390c4d 100644
> --- a/tools/testing/selftests/riscv/hwprobe/hwprobe.h
> +++ b/tools/testing/selftests/riscv/hwprobe/hwprobe.h
> @@ -10,6 +10,6 @@
>   * contain the call.
>   */
>  long riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> -		   size_t cpu_count, unsigned long *cpus, unsigned int flags);
> +		   size_t cpusetsize, unsigned long *cpus, unsigned int flags);
>
>  #endif
> diff --git a/tools/testing/selftests/riscv/vector/vstate_prctl.c b/tools/testing/selftests/riscv/vector/vstate_prctl.c
> index b348b475be57..540fe588e78f 100644
> --- a/tools/testing/selftests/riscv/vector/vstate_prctl.c
> +++ b/tools/testing/selftests/riscv/vector/vstate_prctl.c
> @@ -13,7 +13,7 @@
>   * contain the call.
>   */
>  long riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> -		   size_t cpu_count, unsigned long *cpus, unsigned int flags);
> +		   size_t cpusetsize, unsigned long *cpus, unsigned int flags);
>
>  #define NEXT_PROGRAM "./vstate_exec_nolibc"
>  static int launch_test(int test_inherit)

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>


* Re: [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-09-21 12:55 ` [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag Andrew Jones
@ 2023-09-25 11:23   ` Palmer Dabbelt
  2023-09-25 12:12     ` Andrew Jones
  2023-09-25 16:26     ` Evan Green
  2023-09-25 11:23   ` Palmer Dabbelt
  2023-09-25 16:16   ` Evan Green
  2 siblings, 2 replies; 26+ messages in thread
From: Palmer Dabbelt @ 2023-09-25 11:23 UTC (permalink / raw)
  To: ajones; +Cc: linux-riscv, Paul Walmsley, aou, Evan Green, Conor Dooley, apatel

On Thu, 21 Sep 2023 05:55:22 PDT (-0700), ajones@ventanamicro.com wrote:
> Introduce the first flag for the hwprobe syscall. The flag basically
> reverses its behavior, i.e. instead of populating the values of keys
> for a given set of cpus, the set of cpus after the call is the result
> of finding a set which supports the values of the keys. In order to
> do this, we implement pair merge and pair compare functions which
> take the type of value (a single value vs. a bitmap of booleans) into
> consideration. The flow for the which-cpus syscall variant is as
> follows:
>
>   1. Merge pairs into a set of pairs with unique keys
>   2. If any unknown keys are seen, return an empty set of cpus
>   3. If the platform is homogeneous, then check all the pairs
>      against the "all cpu" values and return early
>   4. Otherwise, check all the pairs against each cpu individually
>
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  Documentation/riscv/hwprobe.rst       |  16 ++-
>  arch/riscv/include/uapi/asm/hwprobe.h |   3 +
>  arch/riscv/kernel/sys_riscv.c         | 148 +++++++++++++++++++++++++-
>  3 files changed, 163 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> index 132e9acaa8f4..97b1e97e7dd2 100644
> --- a/Documentation/riscv/hwprobe.rst
> +++ b/Documentation/riscv/hwprobe.rst
> @@ -25,8 +25,20 @@ arch, impl), the returned value will only be valid if all CPUs in the given set
>  have the same value. Otherwise -1 will be returned. For boolean-like keys, the
>  value returned will be a logical AND of the values for the specified CPUs.
>  Usermode can supply NULL for ``cpus`` and 0 for ``cpusetsize`` as a shortcut for
> -all online CPUs. There are currently no flags, this value must be zero for
> -future compatibility.
> +all online CPUs. The currently supported flags are:
> +
> +* :c:macro:`RISCV_HWPROBE_WHICH_CPUS`: This flag basically reverses the behavior
> +  of sys_riscv_hwprobe().  Instead of populating the values of keys for a given
> +  set of CPUs, the set of CPUs is initially all unset and the values of each key
> +  are given.  Upon return, the CPUs which all match each of the given key-value
> +  pairs are set in ``cpus``.  How matching is done depends on the key type.  For
> +  value-like keys, matching means to be the exact same as the value.  For
> +  boolean-like keys, matching means the result of a logical AND of the pair's
> +  value with the CPU's value is exactly the same as the pair's value.  ``cpus``
> +  may also initially have set bits, in which case the bits of any CPUs which do
> +  not match the pairs will be cleared, but no other bits will be set.
> +
> +All other flags are reserved for future compatibility and must be zero.
>
>  On success 0 is returned, on failure a negative error code is returned.
>
> diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
> index 86d08a0e617b..36683307c3e4 100644
> --- a/arch/riscv/include/uapi/asm/hwprobe.h
> +++ b/arch/riscv/include/uapi/asm/hwprobe.h
> @@ -40,4 +40,7 @@ struct riscv_hwprobe {
>  #define RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE	6
>  /* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
>
> +/* Flags */
> +#define RISCV_HWPROBE_WHICH_CPUS	(1 << 0)
> +
>  #endif
> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> index 14b6dfaa5d9f..c70a72fe6aee 100644
> --- a/arch/riscv/kernel/sys_riscv.c
> +++ b/arch/riscv/kernel/sys_riscv.c
> @@ -245,14 +245,145 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
>  	}
>  }
>
> +static bool hwprobe_key_is_map(__s64 key)
> +{
> +	switch (key) {
> +	case RISCV_HWPROBE_KEY_BASE_BEHAVIOR:
> +	case RISCV_HWPROBE_KEY_IMA_EXT_0:
> +	case RISCV_HWPROBE_KEY_CPUPERF_0:
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +static int hwprobe_pair_merge(struct riscv_hwprobe *to,
> +			      struct riscv_hwprobe *from)
> +{
> +	if (to->key != from->key)
> +		return -EINVAL;
> +
> +	if (hwprobe_key_is_map(to->key)) {
> +		to->value |= from->value;
> +		return 0;
> +	}
> +
> +	return to->value == from->value ? 0 : -EINVAL;
> +}
> +
> +static bool hwprobe_pair_cmp(struct riscv_hwprobe *pair,
> +			     struct riscv_hwprobe *other_pair)
> +{
> +	if (pair->key != other_pair->key)
> +		return false;
> +
> +	if (hwprobe_key_is_map(pair->key))
> +		return (pair->value & other_pair->value) == other_pair->value;
> +
> +	return pair->value == other_pair->value;
> +}
> +
> +static int hwprobe_which_cpus(struct riscv_hwprobe __user *pairs_user,
> +			      size_t pair_count, size_t cpusetsize,
> +			      cpumask_t *cpus)
> +{
> +	struct riscv_hwprobe pairs[RISCV_HWPROBE_MAX_KEY + 1] = {
> +		[0 ... RISCV_HWPROBE_MAX_KEY] = (struct riscv_hwprobe){ .key = -1 }
> +	};
> +	struct riscv_hwprobe pair;
> +	struct vdso_data *vd = __arch_get_k_vdso_data();
> +	struct arch_vdso_data *avd = &vd->arch_data;
> +	bool clear_all = false;
> +	cpumask_t one_cpu;
> +	int cpu, ret;
> +	size_t i;
> +
> +	for (i = 0; i < pair_count; i++) {
> +		ret = copy_from_user(&pair, &pairs_user[i], sizeof(pair));
> +		if (ret)
> +			return -EFAULT;
> +
> +		if (pair.key >= 0 && pair.key <= RISCV_HWPROBE_MAX_KEY) {
> +			if (pairs[pair.key].key == -1) {
> +				pairs[pair.key] = pair;
> +			} else {
> +				ret = hwprobe_pair_merge(&pairs[pair.key], &pair);
> +				if (ret)
> +					return ret;
> +			}
> +		} else {
> +			pair.key = -1;
> +			pair.value = 0;
> +			ret = copy_to_user(&pairs_user[i], &pair, sizeof(pair));
> +			if (ret)
> +				return -EFAULT;
> +			clear_all = true;
> +		}
> +	}
> +
> +	if (clear_all) {
> +		cpumask_clear(cpus);
> +		return 0;
> +	}
> +
> +	if (avd->homogeneous_cpus) {
> +		for (i = 0; i <= RISCV_HWPROBE_MAX_KEY; i++) {
> +			if (pairs[i].key == -1)
> +				continue;
> +
> +			pair.key = pairs[i].key;
> +			pair.value = avd->all_cpu_hwprobe_values[pairs[i].key];
> +
> +			if (!hwprobe_pair_cmp(&pair, &pairs[i])) {
> +				cpumask_clear(cpus);
> +				return 0;
> +			}
> +		}
> +
> +		return 0;
> +	}
> +
> +	cpumask_clear(&one_cpu);
> +
> +	for_each_cpu(cpu, cpus) {
> +		cpumask_set_cpu(cpu, &one_cpu);
> +
> +		for (i = 0; i <= RISCV_HWPROBE_MAX_KEY; i++) {
> +			if (pairs[i].key == -1)
> +				continue;
> +
> +			pair.key = pairs[i].key;
> +			pair.value = 0;
> +			hwprobe_one_pair(&pair, &one_cpu);
> +
> +			if (!hwprobe_pair_cmp(&pair, &pairs[i])) {
> +				cpumask_clear_cpu(cpu, cpus);
> +				break;
> +			}
> +		}
> +
> +		cpumask_clear_cpu(cpu, &one_cpu);
> +	}
> +
> +	return 0;
> +}
> +
>  static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
>  			    size_t pair_count, size_t cpusetsize,
>  			    unsigned long __user *cpus_user,
>  			    unsigned int flags)
>  {
> +	bool which_cpus = false;
> +	cpumask_t cpus;
>  	size_t out;
>  	int ret;
> -	cpumask_t cpus;
> +
> +	if (flags & RISCV_HWPROBE_WHICH_CPUS) {
> +		if (!cpusetsize || !cpus_user)
> +			return -EINVAL;
> +		flags &= ~RISCV_HWPROBE_WHICH_CPUS;
> +		which_cpus = true;
> +	}
>
>  	/* Check the reserved flags. */
>  	if (flags != 0)
> @@ -274,11 +405,24 @@ static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
>  		if (ret)
>  			return -EFAULT;
>
> +		cpumask_and(&cpus, &cpus, cpu_online_mask);
> +
> +		if (which_cpus) {
> +			if (cpumask_empty(&cpus))
> +				cpumask_copy(&cpus, cpu_online_mask);
> +			ret = hwprobe_which_cpus(pairs, pair_count, cpusetsize, &cpus);
> +			if (ret)
> +				return ret;
> +			ret = copy_to_user(cpus_user, &cpus, cpusetsize);
> +			if (ret)
> +				return -EFAULT;
> +			return 0;

So this is now essentially two syscalls.  IMO it'd be cleaner to split
out the implementations into two functions (i.e.,
hwprobe_{which_cpus,which_features}() or whatever) rather than have an
early out and the rest inline.

Also: maybe we want a whole hwprobe file?  It's sort of its own thing 
now, and it's only going to get bigger...

> +		}
> +
>  		/*
>  		 * Userspace must provide at least one online CPU, without that
>  		 * there's no way to define what is supported.
>  		 */
> -		cpumask_and(&cpus, &cpus, cpu_online_mask);
>  		if (cpumask_empty(&cpus))
>  			return -EINVAL;
>  	}


* Re: [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-09-21 12:55 ` [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag Andrew Jones
  2023-09-25 11:23   ` Palmer Dabbelt
@ 2023-09-25 11:23   ` Palmer Dabbelt
  2023-09-25 12:14     ` Andrew Jones
  2023-10-09 14:15     ` Andrew Jones
  2023-09-25 16:16   ` Evan Green
  2 siblings, 2 replies; 26+ messages in thread
From: Palmer Dabbelt @ 2023-09-25 11:23 UTC (permalink / raw)
  To: ajones; +Cc: linux-riscv, Paul Walmsley, aou, Evan Green, Conor Dooley, apatel

On Thu, 21 Sep 2023 05:55:22 PDT (-0700), ajones@ventanamicro.com wrote:
> Introduce the first flag for the hwprobe syscall. The flag basically
> reverses its behavior, i.e. instead of populating the values of keys
> for a given set of cpus, the set of cpus after the call is the result
> of finding a set which supports the values of the keys. In order to
> do this, we implement pair merge and pair compare functions which
> take the type of value (a single value vs. a bitmap of booleans) into
> consideration. The flow for the which-cpus syscall variant is as
> follows:
>
>   1. Merge pairs into a set of pairs with unique keys
>   2. If any unknown keys are seen, return an empty set of cpus
>   3. If the platform is homogeneous, then check all the pairs
>      against the "all cpu" values and return early
>   4. Otherwise, check all the pairs against each cpu individually

IIRC we talked about this in the patchwork call at some point, but IMO
this feature makes sense and you weren't the first person to bring up
having something like this.  So I think at a high level it's completely
reasonable; some implementation comments follow.

>
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  Documentation/riscv/hwprobe.rst       |  16 ++-
>  arch/riscv/include/uapi/asm/hwprobe.h |   3 +
>  arch/riscv/kernel/sys_riscv.c         | 148 +++++++++++++++++++++++++-
>  3 files changed, 163 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> index 132e9acaa8f4..97b1e97e7dd2 100644
> --- a/Documentation/riscv/hwprobe.rst
> +++ b/Documentation/riscv/hwprobe.rst
> @@ -25,8 +25,20 @@ arch, impl), the returned value will only be valid if all CPUs in the given set
>  have the same value. Otherwise -1 will be returned. For boolean-like keys, the
>  value returned will be a logical AND of the values for the specified CPUs.
>  Usermode can supply NULL for ``cpus`` and 0 for ``cpusetsize`` as a shortcut for
> -all online CPUs. There are currently no flags, this value must be zero for
> -future compatibility.
> +all online CPUs. The currently supported flags are:
> +
> +* :c:macro:`RISCV_HWPROBE_WHICH_CPUS`: This flag basically reverses the behavior
> +  of sys_riscv_hwprobe().  Instead of populating the values of keys for a given
> +  set of CPUs, the set of CPUs is initially all unset and the values of each key
> +  are given.  Upon return, the CPUs which all match each of the given key-value
> +  pairs are set in ``cpus``.  How matching is done depends on the key type.  For

Kind of a meta-comment here, but we should add the key type to each key 
number in the docs.  I suppose maybe it should be obvious which is which 
from the meaning of the keys, but can't hurt to be explicit about it.

> +  value-like keys, matching means to be the exact same as the value.  For
> +  boolean-like keys, matching means the result of a logical AND of the pair's
> +  value with the CPU's value is exactly the same as the pair's value.  ``cpus``
> +  may also initially have set bits, in which case the bits of any CPUs which do
> +  not match the pairs will be cleared, but no other bits will be set.
> +
> +All other flags are reserved for future compatibility and must be zero.
>
>  On success 0 is returned, on failure a negative error code is returned.
>
> diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
> index 86d08a0e617b..36683307c3e4 100644
> --- a/arch/riscv/include/uapi/asm/hwprobe.h
> +++ b/arch/riscv/include/uapi/asm/hwprobe.h
> @@ -40,4 +40,7 @@ struct riscv_hwprobe {
>  #define RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE	6
>  /* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
>
> +/* Flags */
> +#define RISCV_HWPROBE_WHICH_CPUS	(1 << 0)
> +
>  #endif
> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> index 14b6dfaa5d9f..c70a72fe6aee 100644
> --- a/arch/riscv/kernel/sys_riscv.c
> +++ b/arch/riscv/kernel/sys_riscv.c
> @@ -245,14 +245,145 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
>  	}
>  }
>
> +static bool hwprobe_key_is_map(__s64 key)
> +{
> +	switch (key) {
> +	case RISCV_HWPROBE_KEY_BASE_BEHAVIOR:
> +	case RISCV_HWPROBE_KEY_IMA_EXT_0:
> +	case RISCV_HWPROBE_KEY_CPUPERF_0:
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +static int hwprobe_pair_merge(struct riscv_hwprobe *to,
> +			      struct riscv_hwprobe *from)
> +{
> +	if (to->key != from->key)
> +		return -EINVAL;
> +
> +	if (hwprobe_key_is_map(to->key)) {
> +		to->value |= from->value;
> +		return 0;
> +	}
> +
> +	return to->value == from->value ? 0 : -EINVAL;
> +}
> +
> +static bool hwprobe_pair_cmp(struct riscv_hwprobe *pair,
> +			     struct riscv_hwprobe *other_pair)
> +{
> +	if (pair->key != other_pair->key)
> +		return false;
> +
> +	if (hwprobe_key_is_map(pair->key))
> +		return (pair->value & other_pair->value) == other_pair->value;
> +
> +	return pair->value == other_pair->value;
> +}
> +
> +static int hwprobe_which_cpus(struct riscv_hwprobe __user *pairs_user,
> +			      size_t pair_count, size_t cpusetsize,
> +			      cpumask_t *cpus)
> +{
> +	struct riscv_hwprobe pairs[RISCV_HWPROBE_MAX_KEY + 1] = {
> +		[0 ... RISCV_HWPROBE_MAX_KEY] = (struct riscv_hwprobe){ .key = -1 }
> +	};
> +	struct riscv_hwprobe pair;
> +	struct vdso_data *vd = __arch_get_k_vdso_data();
> +	struct arch_vdso_data *avd = &vd->arch_data;
> +	bool clear_all = false;
> +	cpumask_t one_cpu;
> +	int cpu, ret;
> +	size_t i;
> +
> +	for (i = 0; i < pair_count; i++) {
> +		ret = copy_from_user(&pair, &pairs_user[i], sizeof(pair));
> +		if (ret)
> +			return -EFAULT;
> +
> +		if (pair.key >= 0 && pair.key <= RISCV_HWPROBE_MAX_KEY) {
> +			if (pairs[pair.key].key == -1) {
> +				pairs[pair.key] = pair;
> +			} else {
> +				ret = hwprobe_pair_merge(&pairs[pair.key], &pair);
> +				if (ret)
> +					return ret;
> +			}
> +		} else {
> +			pair.key = -1;
> +			pair.value = 0;
> +			ret = copy_to_user(&pairs_user[i], &pair, sizeof(pair));
> +			if (ret)
> +				return -EFAULT;
> +			clear_all = true;
> +		}
> +	}
> +
> +	if (clear_all) {
> +		cpumask_clear(cpus);
> +		return 0;
> +	}
> +
> +	if (avd->homogeneous_cpus) {
> +		for (i = 0; i <= RISCV_HWPROBE_MAX_KEY; i++) {
> +			if (pairs[i].key == -1)
> +				continue;
> +
> +			pair.key = pairs[i].key;
> +			pair.value = avd->all_cpu_hwprobe_values[pairs[i].key];
> +
> +			if (!hwprobe_pair_cmp(&pair, &pairs[i])) {
> +				cpumask_clear(cpus);
> +				return 0;
> +			}
> +		}
> +
> +		return 0;
> +	}
> +
> +	cpumask_clear(&one_cpu);
> +
> +	for_each_cpu(cpu, cpus) {
> +		cpumask_set_cpu(cpu, &one_cpu);
> +
> +		for (i = 0; i <= RISCV_HWPROBE_MAX_KEY; i++) {
> +			if (pairs[i].key == -1)
> +				continue;
> +
> +			pair.key = pairs[i].key;
> +			pair.value = 0;
> +			hwprobe_one_pair(&pair, &one_cpu);
> +
> +			if (!hwprobe_pair_cmp(&pair, &pairs[i])) {
> +				cpumask_clear_cpu(cpu, cpus);
> +				break;
> +			}
> +		}
> +
> +		cpumask_clear_cpu(cpu, &one_cpu);
> +	}
> +
> +	return 0;
> +}
> +
>  static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
>  			    size_t pair_count, size_t cpusetsize,
>  			    unsigned long __user *cpus_user,
>  			    unsigned int flags)
>  {
> +	bool which_cpus = false;
> +	cpumask_t cpus;
>  	size_t out;
>  	int ret;
> -	cpumask_t cpus;
> +
> +	if (flags & RISCV_HWPROBE_WHICH_CPUS) {
> +		if (!cpusetsize || !cpus_user)
> +			return -EINVAL;
> +		flags &= ~RISCV_HWPROBE_WHICH_CPUS;
> +		which_cpus = true;
> +	}
>
>  	/* Check the reserved flags. */
>  	if (flags != 0)
> @@ -274,11 +405,24 @@ static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
>  		if (ret)
>  			return -EFAULT;
>
> +		cpumask_and(&cpus, &cpus, cpu_online_mask);
> +
> +		if (which_cpus) {
> +			if (cpumask_empty(&cpus))
> +				cpumask_copy(&cpus, cpu_online_mask);
> +			ret = hwprobe_which_cpus(pairs, pair_count, cpusetsize, &cpus);
> +			if (ret)
> +				return ret;
> +			ret = copy_to_user(cpus_user, &cpus, cpusetsize);
> +			if (ret)
> +				return -EFAULT;
> +			return 0;
> +		}
> +
>  		/*
>  		 * Userspace must provide at least one online CPU, without that
>  		 * there's no way to define what is supported.
>  		 */
> -		cpumask_and(&cpus, &cpus, cpu_online_mask);
>  		if (cpumask_empty(&cpus))
>  			return -EINVAL;
>  	}


* Re: [RFC PATCH 4/5] RISC-V: selftests: Add which-cpus hwprobe test
  2023-09-21 12:55 ` [RFC PATCH 4/5] RISC-V: selftests: Add which-cpus hwprobe test Andrew Jones
@ 2023-09-25 11:23   ` Palmer Dabbelt
  0 siblings, 0 replies; 26+ messages in thread
From: Palmer Dabbelt @ 2023-09-25 11:23 UTC (permalink / raw)
  To: ajones; +Cc: linux-riscv, Paul Walmsley, aou, Evan Green, Conor Dooley, apatel

On Thu, 21 Sep 2023 05:55:23 PDT (-0700), ajones@ventanamicro.com wrote:
> Test the RISCV_HWPROBE_WHICH_CPUS flag of hwprobe. The test also
> has a command line interface in order to get the cpu list for
> arbitrary hwprobe pairs.
>
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  .../testing/selftests/riscv/hwprobe/Makefile  |   5 +-
>  .../selftests/riscv/hwprobe/which-cpus.c      | 156 ++++++++++++++++++
>  2 files changed, 160 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/riscv/hwprobe/which-cpus.c
>
> diff --git a/tools/testing/selftests/riscv/hwprobe/Makefile b/tools/testing/selftests/riscv/hwprobe/Makefile
> index f224b84591fb..cec81610a5f2 100644
> --- a/tools/testing/selftests/riscv/hwprobe/Makefile
> +++ b/tools/testing/selftests/riscv/hwprobe/Makefile
> @@ -4,7 +4,7 @@
>
>  CFLAGS += -I$(top_srcdir)/tools/include
>
> -TEST_GEN_PROGS := hwprobe cbo
> +TEST_GEN_PROGS := hwprobe cbo which-cpus
>
>  include ../../lib.mk
>
> @@ -13,3 +13,6 @@ $(OUTPUT)/hwprobe: hwprobe.c sys_hwprobe.S
>
>  $(OUTPUT)/cbo: cbo.c sys_hwprobe.S
>  	$(CC) -static -o$@ $(CFLAGS) $(LDFLAGS) $^
> +
> +$(OUTPUT)/which-cpus: which-cpus.c sys_hwprobe.S
> +	$(CC) -static -o$@ $(CFLAGS) $(LDFLAGS) $^
> diff --git a/tools/testing/selftests/riscv/hwprobe/which-cpus.c b/tools/testing/selftests/riscv/hwprobe/which-cpus.c
> new file mode 100644
> index 000000000000..83877a715b08
> --- /dev/null
> +++ b/tools/testing/selftests/riscv/hwprobe/which-cpus.c
> @@ -0,0 +1,156 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023 Ventana Micro Systems Inc.
> + *
> + * Test the RISCV_HWPROBE_WHICH_CPUS flag of hwprobe. Also provides a command
> + * line interface to get the cpu list for arbitrary hwprobe pairs.
> + */
> +#define _GNU_SOURCE
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sched.h>
> +#include <unistd.h>
> +#include <assert.h>
> +
> +#include "hwprobe.h"
> +#include "../../kselftest.h"
> +
> +static void help(void)
> +{
> +	printf("\n"
> +	       "which-cpus: [-h] [<key=value> [<key=value> ...]]\n\n"
> +	       "   Without parameters, tests the RISCV_HWPROBE_WHICH_CPUS flag of hwprobe.\n"
> +	       "   With parameters, where each parameter is a hwprobe pair written as\n"
> +	       "   <key=value>, outputs the cpulist for cpus which all match the given set\n"
> +	       "   of pairs.  'key' and 'value' should be in numeric form, e.g. 4=0x3b\n");
> +}
> +
> +static void print_cpulist(cpu_set_t *cpus)
> +{
> +	int start = 0, end = 0;
> +
> +	if (!CPU_COUNT(cpus)) {
> +		printf("cpus: None\n");
> +		return;
> +	}
> +
> +	printf("cpus:");
> +	for (int i = 0, c = 0; i < CPU_COUNT(cpus); i++, c++) {
> +		if (start != end && !CPU_ISSET(c, cpus))
> +			printf("-%d", end);
> +
> +		while (!CPU_ISSET(c, cpus))
> +			++c;
> +
> +		if (i != 0 && c == end + 1) {
> +			end = c;
> +			continue;
> +		}
> +
> +		printf("%c%d", i == 0 ? ' ' : ',', c);
> +		start = end = c;
> +	}
> +	if (start != end)
> +		printf("-%d", end);
> +	printf("\n");
> +}
> +
> +static void do_which_cpus(int argc, char **argv)
> +{
> +	struct riscv_hwprobe *pairs;
> +	int nr_pairs = argc - 1;
> +	cpu_set_t cpus;
> +	char *start, *end;
> +	int rc;
> +
> +	pairs = malloc(nr_pairs * sizeof(struct riscv_hwprobe));
> +	assert(pairs);
> +
> +	for (int i = 0; i < nr_pairs; i++) {
> +		start = argv[i + 1];
> +		pairs[i].key = strtol(start, &end, 0);
> +		assert(end != start && *end == '=');
> +		start = end + 1;
> +		pairs[i].value = strtoul(start, &end, 0);
> +		assert(end != start && *end == '\0');
> +	}
> +
> +	CPU_ZERO(&cpus);
> +	rc = riscv_hwprobe(pairs, nr_pairs, sizeof(cpu_set_t), (unsigned long *)&cpus, RISCV_HWPROBE_WHICH_CPUS);
> +	assert(rc == 0);
> +	print_cpulist(&cpus);
> +	free(pairs);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	struct riscv_hwprobe pairs[2];
> +	cpu_set_t cpus_aff, cpus;
> +	__u64 ext0_all;
> +	long rc;
> +
> +	if (argc > 1) {
> +		if (!strcmp(argv[1], "-h"))
> +			help();
> +		else
> +			do_which_cpus(argc, argv);
> +		return 0;
> +	}
> +
> +	rc = sched_getaffinity(0, sizeof(cpu_set_t), &cpus_aff);
> +	assert(rc == 0);
> +
> +	ksft_print_header();
> +	ksft_set_plan(7);
> +
> +	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, };
> +	rc = riscv_hwprobe(pairs, 1, 0, NULL, 0);
> +	assert(rc == 0 && pairs[0].key == RISCV_HWPROBE_KEY_BASE_BEHAVIOR &&
> +	       pairs[0].value == RISCV_HWPROBE_BASE_BEHAVIOR_IMA);
> +
> +	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_IMA_EXT_0, };
> +	rc = riscv_hwprobe(pairs, 1, 0, NULL, 0);
> +	assert(rc == 0 && pairs[0].key == RISCV_HWPROBE_KEY_IMA_EXT_0);
> +	ext0_all = pairs[0].value;
> +
> +	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, .value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA, };
> +	CPU_ZERO(&cpus);
> +	rc = riscv_hwprobe(pairs, 1, 0, (unsigned long *)&cpus, RISCV_HWPROBE_WHICH_CPUS);
> +	ksft_test_result(rc == -EINVAL, "no cpusetsize\n");
> +
> +	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, .value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA, };
> +	rc = riscv_hwprobe(pairs, 1, sizeof(cpu_set_t), NULL, RISCV_HWPROBE_WHICH_CPUS);
> +	ksft_test_result(rc == -EINVAL, "NULL cpus\n");
> +
> +	pairs[0] = (struct riscv_hwprobe){ .key = 0xbadc0de, };
> +	CPU_ZERO(&cpus);
> +	rc = riscv_hwprobe(pairs, 1, sizeof(cpu_set_t), (unsigned long *)&cpus, RISCV_HWPROBE_WHICH_CPUS);
> +	ksft_test_result(rc == 0 && CPU_COUNT(&cpus) == 0, "unknown key\n");
> +
> +	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, .value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA, };
> +	pairs[1] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, .value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA, };
> +	CPU_ZERO(&cpus);
> +	rc = riscv_hwprobe(pairs, 2, sizeof(cpu_set_t), (unsigned long *)&cpus, RISCV_HWPROBE_WHICH_CPUS);
> +	ksft_test_result(rc == 0, "duplicate keys\n");
> +
> +	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, .value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA, };
> +	pairs[1] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_IMA_EXT_0, .value = ext0_all, };
> +	CPU_ZERO(&cpus);
> +	rc = riscv_hwprobe(pairs, 2, sizeof(cpu_set_t), (unsigned long *)&cpus, RISCV_HWPROBE_WHICH_CPUS);
> +	ksft_test_result(rc == 0 && CPU_COUNT(&cpus) == sysconf(_SC_NPROCESSORS_ONLN), "set all cpus\n");
> +
> +	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, .value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA, };
> +	pairs[1] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_IMA_EXT_0, .value = ext0_all, };
> +	memcpy(&cpus, &cpus_aff, sizeof(cpu_set_t));
> +	rc = riscv_hwprobe(pairs, 2, sizeof(cpu_set_t), (unsigned long *)&cpus, RISCV_HWPROBE_WHICH_CPUS);
> +	ksft_test_result(rc == 0 && CPU_EQUAL(&cpus, &cpus_aff), "set all affinity cpus\n");
> +
> +	pairs[0] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR, .value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA, };
> +	pairs[1] = (struct riscv_hwprobe){ .key = RISCV_HWPROBE_KEY_IMA_EXT_0, .value = ~ext0_all, };
> +	memcpy(&cpus, &cpus_aff, sizeof(cpu_set_t));
> +	rc = riscv_hwprobe(pairs, 2, sizeof(cpu_set_t), (unsigned long *)&cpus, RISCV_HWPROBE_WHICH_CPUS);
> +	ksft_test_result(rc == 0 && CPU_COUNT(&cpus) == 0, "clear all cpus\n");
> +
> +	ksft_finished();
> +}

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


* Re: [RFC PATCH 5/5] RISC-V: selftests: Apply which-cpus flag to CBO hwprobe test
  2023-09-21 12:55 ` [RFC PATCH 5/5] RISC-V: selftests: Apply which-cpus flag to CBO " Andrew Jones
@ 2023-09-25 11:23   ` Palmer Dabbelt
  0 siblings, 0 replies; 26+ messages in thread
From: Palmer Dabbelt @ 2023-09-25 11:23 UTC (permalink / raw)
  To: ajones; +Cc: linux-riscv, Paul Walmsley, aou, Evan Green, Conor Dooley, apatel

On Thu, 21 Sep 2023 05:55:24 PDT (-0700), ajones@ventanamicro.com wrote:
> Now that we can efficiently clear cpus that do not meet a key-value
> criterion from a cpu set, apply it to the cpus used in the CBO
> hwprobe selftest, instead of looping over hwprobe.
>
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> ---
>  tools/testing/selftests/riscv/hwprobe/cbo.c | 26 +++++++--------------
>  1 file changed, 9 insertions(+), 17 deletions(-)
>
> diff --git a/tools/testing/selftests/riscv/hwprobe/cbo.c b/tools/testing/selftests/riscv/hwprobe/cbo.c
> index 50a2cc8aef38..375a343bc2c6 100644
> --- a/tools/testing/selftests/riscv/hwprobe/cbo.c
> +++ b/tools/testing/selftests/riscv/hwprobe/cbo.c
> @@ -135,27 +135,19 @@ static void check_no_zicboz_cpus(cpu_set_t *cpus)
>  {
>  	struct riscv_hwprobe pair = {
>  		.key = RISCV_HWPROBE_KEY_IMA_EXT_0,
> +		.value = RISCV_HWPROBE_EXT_ZICBOZ,
>  	};
> -	cpu_set_t one_cpu;
> -	int i = 0, c = 0;
> +	cpu_set_t tmp;
>  	long rc;
>
> -	while (i++ < CPU_COUNT(cpus)) {
> -		while (!CPU_ISSET(c, cpus))
> -			++c;
> -
> -		CPU_ZERO(&one_cpu);
> -		CPU_SET(c, &one_cpu);
> -
> -		rc = riscv_hwprobe(&pair, 1, sizeof(cpu_set_t), (unsigned long *)&one_cpu, 0);
> -		assert(rc == 0 && pair.key == RISCV_HWPROBE_KEY_IMA_EXT_0);
> +	memcpy(&tmp, cpus, sizeof(cpu_set_t));
> +	rc = riscv_hwprobe(&pair, 1, sizeof(cpu_set_t), (unsigned long *)&tmp, RISCV_HWPROBE_WHICH_CPUS);
> +	assert(rc == 0 && pair.key == RISCV_HWPROBE_KEY_IMA_EXT_0);
>
> -		if (pair.value & RISCV_HWPROBE_EXT_ZICBOZ)
> -			ksft_exit_fail_msg("Zicboz is only present on a subset of harts.\n"
> -					   "Use taskset to select a set of harts where Zicboz\n"
> -					   "presence (present or not) is consistent for each hart\n");
> -		++c;
> -	}
> +	if (CPU_COUNT(&tmp))
> +		ksft_exit_fail_msg("Zicboz is only present on a subset of harts.\n"
> +				   "Use taskset to select a set of harts where Zicboz\n"
> +				   "presence (present or not) is consistent for each hart\n");
>  }
>
>  enum {

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>


* Re: [RFC PATCH 1/5] RISC-V: hwprobe: Clarify cpus size parameter
  2023-09-25 11:23   ` Palmer Dabbelt
@ 2023-09-25 12:07     ` Andrew Jones
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Jones @ 2023-09-25 12:07 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: linux-riscv, Paul Walmsley, aou, Evan Green, Conor Dooley, apatel

On Mon, Sep 25, 2023 at 04:23:22AM -0700, Palmer Dabbelt wrote:
> On Thu, 21 Sep 2023 05:55:20 PDT (-0700), ajones@ventanamicro.com wrote:
> > The "count" parameter associated with the 'cpus' parameter of the
> > hwprobe syscall is the size in bytes of 'cpus'. Naming it 'cpu_count'
> > may mislead users (it did me) to think it's the number of CPUs that
> > are or can be represented by 'cpus' instead. This is particularly
> > easy (IMO) to get wrong since 'cpus' is documented to be defined by
> > CPU_SET(3) and CPU_SET(3) also documents a CPU_COUNT() (the number
> > of CPUs in set) macro. CPU_SET(3) refers to the size of cpu sets
> > with 'setsize'. Adopt 'cpusetsize' for the hwprobe parameter and
> > specifically state it is in bytes in Documentation/riscv/hwprobe.rst
> > to clarify.
> > 
> > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> > ---
> >  Documentation/riscv/hwprobe.rst  | 15 ++++++++-------
> >  arch/riscv/kernel/sys_riscv.c    | 14 +++++++-------
> >  arch/riscv/kernel/vdso/hwprobe.c | 10 +++++-----
> >  3 files changed, 20 insertions(+), 19 deletions(-)
> > 
> > diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> > index 7b2384de471f..132e9acaa8f4 100644
> > --- a/Documentation/riscv/hwprobe.rst
> > +++ b/Documentation/riscv/hwprobe.rst
> > @@ -12,7 +12,7 @@ is defined in <asm/hwprobe.h>::
> >      };
> > 
> >      long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> > -                           size_t cpu_count, cpu_set_t *cpus,
> > +                           size_t cpusetsize, cpu_set_t *cpus,
> >                             unsigned int flags);
> > 
> >  The arguments are split into three groups: an array of key-value pairs, a CPU
> > @@ -20,12 +20,13 @@ set, and some flags. The key-value pairs are supplied with a count. Userspace
> >  must prepopulate the key field for each element, and the kernel will fill in the
> >  value if the key is recognized. If a key is unknown to the kernel, its key field
> >  will be cleared to -1, and its value set to 0. The CPU set is defined by
> > -CPU_SET(3). For value-like keys (eg. vendor/arch/impl), the returned value will
> > -be only be valid if all CPUs in the given set have the same value. Otherwise -1
> > -will be returned. For boolean-like keys, the value returned will be a logical
> > -AND of the values for the specified CPUs. Usermode can supply NULL for cpus and
> > -0 for cpu_count as a shortcut for all online CPUs. There are currently no flags,
> > -this value must be zero for future compatibility.
> > +CPU_SET(3) with size ``cpusetsize`` bytes. For value-like keys (eg. vendor,
> 
> So the only difference is just "with size ``cpusetsize`` bytes" and a bunch
> of line wrap changes?  That always makes these doc updates hard to read, as
> I go a bit cross-eyed trying to diff it manually.  I don't know of a better
> way to do it, though.

After applying the patch we can use 'git show --word-diff=color' on it.
In this case, we'll see the "with size ``cpusetsize``" change, and that I
also changed "vendor/arch/impl" to "vendor, arch, impl", dropped the first
'be' from "will be only be" and added `` around cpus.
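
As a quick illustration (a throwaway-repo sketch; the file name and
contents below are made up, not the actual patch):

```shell
#!/bin/sh
# Sketch: see a word-level diff of a doc change in a scratch repo.
# The file name and text here are illustrative, not the real hwprobe.rst.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf 'Usermode can supply NULL for cpus and 0 for cpu_count.\n' > doc.rst
git add doc.rst
git -c user.email=ed@example.com -c user.name=ed commit -qm before
printf 'Usermode can supply NULL for cpus and 0 for cpusetsize.\n' > doc.rst
git add doc.rst
git -c user.email=ed@example.com -c user.name=ed commit -qm after
# --word-diff=plain marks removals as [-...-] and additions as {+...+};
# --word-diff=color shows the same thing with colors instead of markers.
git show --word-diff=plain HEAD -- doc.rst
```

With that, the rewrapped paragraph collapses to just the changed words
instead of a wall of reflowed lines.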

> 
> > +arch, impl), the returned value will only be valid if all CPUs in the given set
> > +have the same value. Otherwise -1 will be returned. For boolean-like keys, the
> > +value returned will be a logical AND of the values for the specified CPUs.
> > +Usermode can supply NULL for ``cpus`` and 0 for ``cpusetsize`` as a shortcut for
> > +all online CPUs. There are currently no flags, this value must be zero for
> > +future compatibility.
> > 
> >  On success 0 is returned, on failure a negative error code is returned.
> > 
> > diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> > index d7266a9abc66..14b6dfaa5d9f 100644
> > --- a/arch/riscv/kernel/sys_riscv.c
> > +++ b/arch/riscv/kernel/sys_riscv.c
> > @@ -246,7 +246,7 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
> >  }
> > 
> >  static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
> > -			    size_t pair_count, size_t cpu_count,
> > +			    size_t pair_count, size_t cpusetsize,
> >  			    unsigned long __user *cpus_user,
> >  			    unsigned int flags)
> >  {
> > @@ -264,13 +264,13 @@ static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
> >  	 * 0 as a shortcut to all online CPUs.
> >  	 */
> >  	cpumask_clear(&cpus);
> > -	if (!cpu_count && !cpus_user) {
> > +	if (!cpusetsize && !cpus_user) {
> >  		cpumask_copy(&cpus, cpu_online_mask);
> >  	} else {
> > -		if (cpu_count > cpumask_size())
> > -			cpu_count = cpumask_size();
> > +		if (cpusetsize > cpumask_size())
> > +			cpusetsize = cpumask_size();
> > 
> > -		ret = copy_from_user(&cpus, cpus_user, cpu_count);
> > +		ret = copy_from_user(&cpus, cpus_user, cpusetsize);
> >  		if (ret)
> >  			return -EFAULT;
> > 
> > @@ -347,10 +347,10 @@ arch_initcall_sync(init_hwprobe_vdso_data);
> >  #endif /* CONFIG_MMU */
> > 
> >  SYSCALL_DEFINE5(riscv_hwprobe, struct riscv_hwprobe __user *, pairs,
> > -		size_t, pair_count, size_t, cpu_count, unsigned long __user *,
> > +		size_t, pair_count, size_t, cpusetsize, unsigned long __user *,
> >  		cpus, unsigned int, flags)
> >  {
> > -	return do_riscv_hwprobe(pairs, pair_count, cpu_count,
> > +	return do_riscv_hwprobe(pairs, pair_count, cpusetsize,
> >  				cpus, flags);
> >  }
> > 
> > diff --git a/arch/riscv/kernel/vdso/hwprobe.c b/arch/riscv/kernel/vdso/hwprobe.c
> > index d40bec6ac078..3000e9fc0569 100644
> > --- a/arch/riscv/kernel/vdso/hwprobe.c
> > +++ b/arch/riscv/kernel/vdso/hwprobe.c
> > @@ -8,21 +8,21 @@
> >  #include <vdso/helpers.h>
> > 
> >  extern int riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> > -			 size_t cpu_count, unsigned long *cpus,
> > +			 size_t cpusetsize, unsigned long *cpus,
> >  			 unsigned int flags);
> > 
> >  /* Add a prototype to avoid -Wmissing-prototypes warning. */
> >  int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> > -			 size_t cpu_count, unsigned long *cpus,
> > +			 size_t cpusetsize, unsigned long *cpus,
> >  			 unsigned int flags);
> > 
> >  int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> > -			 size_t cpu_count, unsigned long *cpus,
> > +			 size_t cpusetsize, unsigned long *cpus,
> >  			 unsigned int flags)
> >  {
> >  	const struct vdso_data *vd = __arch_get_vdso_data();
> >  	const struct arch_vdso_data *avd = &vd->arch_data;
> > -	bool all_cpus = !cpu_count && !cpus;
> > +	bool all_cpus = !cpusetsize && !cpus;
> >  	struct riscv_hwprobe *p = pairs;
> >  	struct riscv_hwprobe *end = pairs + pair_count;
> > 
> > @@ -33,7 +33,7 @@ int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> >  	 * masks.
> >  	 */
> >  	if ((flags != 0) || (!all_cpus && !avd->homogeneous_cpus))
> > -		return riscv_hwprobe(pairs, pair_count, cpu_count, cpus, flags);
> > +		return riscv_hwprobe(pairs, pair_count, cpusetsize, cpus, flags);
> > 
> >  	/* This is something we can handle, fill out the pairs. */
> >  	while (p < end) {
> 
> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>

Thanks,
drew


* Re: [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-09-25 11:23   ` Palmer Dabbelt
@ 2023-09-25 12:12     ` Andrew Jones
  2023-09-25 16:26     ` Evan Green
  1 sibling, 0 replies; 26+ messages in thread
From: Andrew Jones @ 2023-09-25 12:12 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: linux-riscv, Paul Walmsley, aou, Evan Green, Conor Dooley, apatel

On Mon, Sep 25, 2023 at 04:23:29AM -0700, Palmer Dabbelt wrote:
> On Thu, 21 Sep 2023 05:55:22 PDT (-0700), ajones@ventanamicro.com wrote:
...
> > @@ -274,11 +405,24 @@ static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
> >  		if (ret)
> >  			return -EFAULT;
> > 
> > +		cpumask_and(&cpus, &cpus, cpu_online_mask);
> > +
> > +		if (which_cpus) {
> > +			if (cpumask_empty(&cpus))
> > +				cpumask_copy(&cpus, cpu_online_mask);
> > +			ret = hwprobe_which_cpus(pairs, pair_count, cpusetsize, &cpus);
> > +			if (ret)
> > +				return ret;
> > +			ret = copy_to_user(cpus_user, &cpus, cpusetsize);
> > +			if (ret)
> > +				return -EFAULT;
> > +			return 0;
> 
> So this is now essentially two syscalls.  IMO it'd be cleaner to split out
> the implementations into two functions (i.e.,
> hwprobe_{which_cpus,which_features}() or whatever) rather than have an early
> out and the rest inline.
> 
> Also: maybe we want a whole hwprobe file?  It's sort of its own thing now,
> and it's only going to get bigger...

I'll do both for v1.

Thanks,
drew


* Re: [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-09-25 11:23   ` Palmer Dabbelt
@ 2023-09-25 12:14     ` Andrew Jones
  2023-10-09 14:15     ` Andrew Jones
  1 sibling, 0 replies; 26+ messages in thread
From: Andrew Jones @ 2023-09-25 12:14 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: linux-riscv, Paul Walmsley, aou, Evan Green, Conor Dooley, apatel

On Mon, Sep 25, 2023 at 04:23:32AM -0700, Palmer Dabbelt wrote:
> On Thu, 21 Sep 2023 05:55:22 PDT (-0700), ajones@ventanamicro.com wrote:
> > Introduce the first flag for the hwprobe syscall. The flag basically
> > reverses its behavior, i.e. instead of populating the values of keys
> > for a given set of cpus, the set of cpus after the call is the result
> > of finding a set which supports the values of the keys. In order to
> > do this, we implement pair merge and pair compare functions which
> > take the type of value (a single value vs. a bitmap of booleans) into
> > consideration. The flow for the which-cpus syscall variant is as
> > follows:
> > 
> >   1. Merge pairs into a set of pairs with unique keys
> >   2. If any unknown keys are seen, return an empty set of cpus
> >   3. If the platform is homogeneous, then check all the pairs
> >      against the "all cpu" values and return early
> >   4. Otherwise, check all the pairs against each cpu individually
> 
> IIRC we talked about this in the patchwork call at some point, but IMO this
> feature makes sense and you weren't the first person to bring up having
> something like this.  So I think at a high level it's completely reasonable,
> some implementation comments follow.
> 
> > 
> > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> > ---
> >  Documentation/riscv/hwprobe.rst       |  16 ++-
> >  arch/riscv/include/uapi/asm/hwprobe.h |   3 +
> >  arch/riscv/kernel/sys_riscv.c         | 148 +++++++++++++++++++++++++-
> >  3 files changed, 163 insertions(+), 4 deletions(-)
> > 
> > diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> > index 132e9acaa8f4..97b1e97e7dd2 100644
> > --- a/Documentation/riscv/hwprobe.rst
> > +++ b/Documentation/riscv/hwprobe.rst
> > @@ -25,8 +25,20 @@ arch, impl), the returned value will only be valid if all CPUs in the given set
> >  have the same value. Otherwise -1 will be returned. For boolean-like keys, the
> >  value returned will be a logical AND of the values for the specified CPUs.
> >  Usermode can supply NULL for ``cpus`` and 0 for ``cpusetsize`` as a shortcut for
> > -all online CPUs. There are currently no flags, this value must be zero for
> > -future compatibility.
> > +all online CPUs. The currently supported flags are:
> > +
> > +* :c:macro:`RISCV_HWPROBE_WHICH_CPUS`: This flag basically reverses the behavior
> > +  of sys_riscv_hwprobe().  Instead of populating the values of keys for a given
> > +  set of CPUs, the set of CPUs is initially all unset and the values of each key
> > +  are given.  Upon return, the CPUs which all match each of the given key-value
> > +  pairs are set in ``cpus``.  How matching is done depends on the key type.  For
> 
> Kind of a meta-comment here, but we should add the key type to each key
> number in the docs.  I suppose maybe it should be obvious which is which
> from the meaning of the keys, but can't hurt to be explicit about it.

I'll add for v1.

> 
> > +  value-like keys, matching means to be the exact same as the value.  For
> > +  boolean-like keys, matching means the result of a logical AND of the pair's
> > +  value with the CPU's value is exactly the same as the pair's value.  ``cpus``
> > +  may also initially have set bits, in which case the bits of any CPUs which do
> > +  not match the pairs will be cleared, but no other bits will be set.
> > +
> > +All other flags are reserved for future compatibility and must be zero.
> > 
> >  On success 0 is returned, on failure a negative error code is returned.
> > 
> > diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
> > index 86d08a0e617b..36683307c3e4 100644
> > --- a/arch/riscv/include/uapi/asm/hwprobe.h
> > +++ b/arch/riscv/include/uapi/asm/hwprobe.h
> > @@ -40,4 +40,7 @@ struct riscv_hwprobe {
> >  #define RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE	6
> >  /* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
> > 
> > +/* Flags */
> > +#define RISCV_HWPROBE_WHICH_CPUS	(1 << 0)
> > +
> >  #endif
> > diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> > index 14b6dfaa5d9f..c70a72fe6aee 100644
> > --- a/arch/riscv/kernel/sys_riscv.c
> > +++ b/arch/riscv/kernel/sys_riscv.c
> > @@ -245,14 +245,145 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
> >  	}
> >  }
> > 
> > +static bool hwprobe_key_is_map(__s64 key)
> > +{
> > +	switch (key) {
> > +	case RISCV_HWPROBE_KEY_BASE_BEHAVIOR:
> > +	case RISCV_HWPROBE_KEY_IMA_EXT_0:
> > +	case RISCV_HWPROBE_KEY_CPUPERF_0:
> > +		return true;
> > +	}
> > +
> > +	return false;
> > +}
> > +
> > +static int hwprobe_pair_merge(struct riscv_hwprobe *to,
> > +			      struct riscv_hwprobe *from)
> > +{
> > +	if (to->key != from->key)
> > +		return -EINVAL;
> > +
> > +	if (hwprobe_key_is_map(to->key)) {
> > +		to->value |= from->value;
> > +		return 0;
> > +	}
> > +
> > +	return to->value == from->value ? 0 : -EINVAL;
> > +}
> > +
> > +static bool hwprobe_pair_cmp(struct riscv_hwprobe *pair,
> > +			     struct riscv_hwprobe *other_pair)
> > +{
> > +	if (pair->key != other_pair->key)
> > +		return false;
> > +
> > +	if (hwprobe_key_is_map(pair->key))
> > +		return (pair->value & other_pair->value) == other_pair->value;
> > +
> > +	return pair->value == other_pair->value;
> > +}
> > +
> > +static int hwprobe_which_cpus(struct riscv_hwprobe __user *pairs_user,
> > +			      size_t pair_count, size_t cpusetsize,
> > +			      cpumask_t *cpus)
> > +{
> > +	struct riscv_hwprobe pairs[RISCV_HWPROBE_MAX_KEY + 1] = {
> > +		[0 ... RISCV_HWPROBE_MAX_KEY] = (struct riscv_hwprobe){ .key = -1 }
> > +	};
> > +	struct riscv_hwprobe pair;
> > +	struct vdso_data *vd = __arch_get_k_vdso_data();
> > +	struct arch_vdso_data *avd = &vd->arch_data;
> > +	bool clear_all = false;
> > +	cpumask_t one_cpu;
> > +	int cpu, ret;
> > +	size_t i;
> > +
> > +	for (i = 0; i < pair_count; i++) {
> > +		ret = copy_from_user(&pair, &pairs_user[i], sizeof(pair));
> > +		if (ret)
> > +			return -EFAULT;
> > +
> > +		if (pair.key >= 0 && pair.key <= RISCV_HWPROBE_MAX_KEY) {
> > +			if (pairs[pair.key].key == -1) {
> > +				pairs[pair.key] = pair;
> > +			} else {
> > +				ret = hwprobe_pair_merge(&pairs[pair.key], &pair);
> > +				if (ret)
> > +					return ret;
> > +			}
> > +		} else {
> > +			pair.key = -1;
> > +			pair.value = 0;
> > +			ret = copy_to_user(&pairs_user[i], &pair, sizeof(pair));
> > +			if (ret)
> > +				return -EFAULT;
> > +			clear_all = true;
> > +		}
> > +	}
> > +
> > +	if (clear_all) {
> > +		cpumask_clear(cpus);
> > +		return 0;
> > +	}
> > +
> > +	if (avd->homogeneous_cpus) {
> > +		for (i = 0; i <= RISCV_HWPROBE_MAX_KEY; i++) {
> > +			if (pairs[i].key == -1)
> > +				continue;
> > +
> > +			pair.key = pairs[i].key;
> > +			pair.value = avd->all_cpu_hwprobe_values[pairs[i].key];
> > +
> > +			if (!hwprobe_pair_cmp(&pair, &pairs[i])) {
> > +				cpumask_clear(cpus);
> > +				return 0;
> > +			}
> > +		}
> > +
> > +		return 0;
> > +	}
> > +
> > +	cpumask_clear(&one_cpu);
> > +
> > +	for_each_cpu(cpu, cpus) {
> > +		cpumask_set_cpu(cpu, &one_cpu);
> > +
> > +		for (i = 0; i <= RISCV_HWPROBE_MAX_KEY; i++) {
> > +			if (pairs[i].key == -1)
> > +				continue;
> > +
> > +			pair.key = pairs[i].key;
> > +			pair.value = 0;
> > +			hwprobe_one_pair(&pair, &one_cpu);
> > +
> > +			if (!hwprobe_pair_cmp(&pair, &pairs[i])) {
> > +				cpumask_clear_cpu(cpu, cpus);
> > +				break;
> > +			}
> > +		}
> > +
> > +		cpumask_clear_cpu(cpu, &one_cpu);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
> >  			    size_t pair_count, size_t cpusetsize,
> >  			    unsigned long __user *cpus_user,
> >  			    unsigned int flags)
> >  {
> > +	bool which_cpus = false;
> > +	cpumask_t cpus;
> >  	size_t out;
> >  	int ret;
> > -	cpumask_t cpus;
> > +
> > +	if (flags & RISCV_HWPROBE_WHICH_CPUS) {
> > +		if (!cpusetsize || !cpus_user)
> > +			return -EINVAL;
> > +		flags &= ~RISCV_HWPROBE_WHICH_CPUS;
> > +		which_cpus = true;
> > +	}
> > 
> >  	/* Check the reserved flags. */
> >  	if (flags != 0)
> > @@ -274,11 +405,24 @@ static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
> >  		if (ret)
> >  			return -EFAULT;
> > 
> > +		cpumask_and(&cpus, &cpus, cpu_online_mask);
> > +
> > +		if (which_cpus) {
> > +			if (cpumask_empty(&cpus))
> > +				cpumask_copy(&cpus, cpu_online_mask);
> > +			ret = hwprobe_which_cpus(pairs, pair_count, cpusetsize, &cpus);
> > +			if (ret)
> > +				return ret;
> > +			ret = copy_to_user(cpus_user, &cpus, cpusetsize);
> > +			if (ret)
> > +				return -EFAULT;
> > +			return 0;
> > +		}
> > +
> >  		/*
> >  		 * Userspace must provide at least one online CPU, without that
> >  		 * there's no way to define what is supported.
> >  		 */
> > -		cpumask_and(&cpus, &cpus, cpu_online_mask);
> >  		if (cpumask_empty(&cpus))
> >  			return -EINVAL;
> >  	}

I think the rest of the comments ended up in a separate mail. Hopefully
none got lost.

Thanks,
drew


* Re: [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-09-21 12:55 ` [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag Andrew Jones
  2023-09-25 11:23   ` Palmer Dabbelt
  2023-09-25 11:23   ` Palmer Dabbelt
@ 2023-09-25 16:16   ` Evan Green
  2023-10-05 13:23     ` Andrew Jones
  2 siblings, 1 reply; 26+ messages in thread
From: Evan Green @ 2023-09-25 16:16 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, paul.walmsley, palmer, aou, conor.dooley, apatel

On Thu, Sep 21, 2023 at 5:55 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> Introduce the first flag for the hwprobe syscall. The flag basically
> reverses its behavior, i.e. instead of populating the values of keys
> for a given set of cpus, the set of cpus after the call is the result
> of finding a set which supports the values of the keys. In order to
> do this, we implement pair merge and pair compare functions which
> take the type of value (a single value vs. a bitmap of booleans) into
> consideration. The flow for the which-cpus syscall variant is as
> follows:
>
>   1. Merge pairs into a set of pairs with unique keys
>   2. If any unknown keys are seen, return an empty set of cpus
>   3. If the platform is homogeneous, then check all the pairs
>      against the "all cpu" values and return early
>   4. Otherwise, check all the pairs against each cpu individually
>
> Signed-off-by: Andrew Jones <ajones@ventanamicro.com>

Overall the concept seems reasonable to me. I could see an application
like Chrome using this to schedule intensive or latency-sensitive
activities (video playback comes to mind) on only cores that meet a
certain performance bar, if any do.

> ---
>  Documentation/riscv/hwprobe.rst       |  16 ++-
>  arch/riscv/include/uapi/asm/hwprobe.h |   3 +
>  arch/riscv/kernel/sys_riscv.c         | 148 +++++++++++++++++++++++++-
>  3 files changed, 163 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> index 132e9acaa8f4..97b1e97e7dd2 100644
> --- a/Documentation/riscv/hwprobe.rst
> +++ b/Documentation/riscv/hwprobe.rst
> @@ -25,8 +25,20 @@ arch, impl), the returned value will only be valid if all CPUs in the given set
>  have the same value. Otherwise -1 will be returned. For boolean-like keys, the
>  value returned will be a logical AND of the values for the specified CPUs.
>  Usermode can supply NULL for ``cpus`` and 0 for ``cpusetsize`` as a shortcut for
> -all online CPUs. There are currently no flags, this value must be zero for
> -future compatibility.
> +all online CPUs. The currently supported flags are:
> +
> +* :c:macro:`RISCV_HWPROBE_WHICH_CPUS`: This flag basically reverses the behavior
> +  of sys_riscv_hwprobe().  Instead of populating the values of keys for a given
> +  set of CPUs, the set of CPUs is initially all unset and the values of each key
> +  are given.  Upon return, the CPUs which all match each of the given key-value
> +  pairs are set in ``cpus``.  How matching is done depends on the key type.  For
> +  value-like keys, matching means to be the exact same as the value.  For
> +  boolean-like keys, matching means the result of a logical AND of the pair's
> +  value with the CPU's value is exactly the same as the pair's value.  ``cpus``
> +  may also initially have set bits, in which case the bits of any CPUs which do
> +  not match the pairs will be cleared, but no other bits will be set.
> +
> +All other flags are reserved for future compatibility and must be zero.
>
>  On success 0 is returned, on failure a negative error code is returned.
>
> diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
> index 86d08a0e617b..36683307c3e4 100644
> --- a/arch/riscv/include/uapi/asm/hwprobe.h
> +++ b/arch/riscv/include/uapi/asm/hwprobe.h
> @@ -40,4 +40,7 @@ struct riscv_hwprobe {
>  #define RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE    6
>  /* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
>
> +/* Flags */
> +#define RISCV_HWPROBE_WHICH_CPUS       (1 << 0)
> +
>  #endif
> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> index 14b6dfaa5d9f..c70a72fe6aee 100644
> --- a/arch/riscv/kernel/sys_riscv.c
> +++ b/arch/riscv/kernel/sys_riscv.c
> @@ -245,14 +245,145 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
>         }
>  }
>
> +static bool hwprobe_key_is_map(__s64 key)

The word "map" confuses me some, as the closest my brain can get is a
synonym for dictionary, which makes me think "unique value". Based on
the keys returning true, you really mean bitfield.

> +{
> +       switch (key) {
> +       case RISCV_HWPROBE_KEY_BASE_BEHAVIOR:
> +       case RISCV_HWPROBE_KEY_IMA_EXT_0:
> +       case RISCV_HWPROBE_KEY_CPUPERF_0:
> +               return true;
> +       }
> +
> +       return false;
> +}
> +
> +static int hwprobe_pair_merge(struct riscv_hwprobe *to,
> +                             struct riscv_hwprobe *from)

Why is the merge step necessary? Maybe I'll see further below. Oh yes,
I do, but see my comment below on how we might get rid of this.

> +{
> +       if (to->key != from->key)
> +               return -EINVAL;
> +
> +       if (hwprobe_key_is_map(to->key)) {
> +               to->value |= from->value;
> +               return 0;
> +       }
> +
> +       return to->value == from->value ? 0 : -EINVAL;
> +}
> +
> +static bool hwprobe_pair_cmp(struct riscv_hwprobe *pair,
> +                            struct riscv_hwprobe *other_pair)
> +{
> +       if (pair->key != other_pair->key)
> +               return false;
> +
> +       if (hwprobe_key_is_map(pair->key))
> +               return (pair->value & other_pair->value) == other_pair->value;
> +
> +       return pair->value == other_pair->value;
> +}
> +
> +static int hwprobe_which_cpus(struct riscv_hwprobe __user *pairs_user,
> +                             size_t pair_count, size_t cpusetsize,
> +                             cpumask_t *cpus)
> +{
> +       struct riscv_hwprobe pairs[RISCV_HWPROBE_MAX_KEY + 1] = {
> +               [0 ... RISCV_HWPROBE_MAX_KEY] = (struct riscv_hwprobe){ .key = -1 }

Minor note: I'd prefer we avoid baking in things that blow up if the
key space becomes non-contiguous or large. I suspect vendor-specific
hwprobe keys are coming, and if we assign them a separate high range
of keyspace, we'll have to unwind anything in the implementation that
assumes a contiguous small key space.

> +       };
> +       struct riscv_hwprobe pair;
> +       struct vdso_data *vd = __arch_get_k_vdso_data();
> +       struct arch_vdso_data *avd = &vd->arch_data;
> +       bool clear_all = false;
> +       cpumask_t one_cpu;
> +       int cpu, ret;
> +       size_t i;
> +
> +       for (i = 0; i < pair_count; i++) {
> +               ret = copy_from_user(&pair, &pairs_user[i], sizeof(pair));
> +               if (ret)
> +                       return -EFAULT;
> +
> +               if (pair.key >= 0 && pair.key <= RISCV_HWPROBE_MAX_KEY) {
> +                       if (pairs[pair.key].key == -1) {
> +                               pairs[pair.key] = pair;
> +                       } else {
> +                               ret = hwprobe_pair_merge(&pairs[pair.key], &pair);
> +                               if (ret)
> +                                       return ret;
> +                       }
> +               } else {
> +                       pair.key = -1;
> +                       pair.value = 0;
> +                       ret = copy_to_user(&pairs_user[i], &pair, sizeof(pair));
> +                       if (ret)
> +                               return -EFAULT;
> +                       clear_all = true;
> +               }
> +       }

So if I write out this algorithm, I get something like:
 * Create an array of every possible key, and dedupe the caller's list
of pairs into this array.
 * For each remaining cpu, go through this array and either confirm
the big array's element matches this cpu's value, or clear the cpu
from the result set.

But why do we go to all the effort of de-duping the caller's array of
pairs? Can't they do that themselves (or pay a small performance
penalty for "yes" results)? Instead, couldn't it be something like:
For each pair in the user's set, for each remaining cpu in the set,
compare the values, or clear the cpu in the remaining set.

Doing that would also take the runtime from O(keyspace * ncpus) to
O(query_length * ncpus).

> +
> +       if (clear_all) {
> +               cpumask_clear(cpus);
> +               return 0;
> +       }
> +
> +       if (avd->homogeneous_cpus) {
> +               for (i = 0; i <= RISCV_HWPROBE_MAX_KEY; i++) {
> +                       if (pairs[i].key == -1)
> +                               continue;
> +
> +                       pair.key = pairs[i].key;
> +                       pair.value = avd->all_cpu_hwprobe_values[pairs[i].key];
> +
> +                       if (!hwprobe_pair_cmp(&pair, &pairs[i])) {
> +                               cpumask_clear(cpus);
> +                               return 0;
> +                       }
> +               }
> +
> +               return 0;
> +       }

We could move the whole above homogeneous_cpus part into the vDSO
function, avoiding a syscall on homogeneous systems. It would mean
you'd have to either duplicate the logic here as well, or have it only
in the vDSO and kernel callers don't get the optimization. Probably
there are very few customers of this API in the kernel, so maybe just
having it in the vDSO is a good place to start.

> +
> +       cpumask_clear(&one_cpu);
> +
> +       for_each_cpu(cpu, cpus) {
> +               cpumask_set_cpu(cpu, &one_cpu);
> +
> +               for (i = 0; i <= RISCV_HWPROBE_MAX_KEY; i++) {
> +                       if (pairs[i].key == -1)
> +                               continue;
> +
> +                       pair.key = pairs[i].key;
> +                       pair.value = 0;
> +                       hwprobe_one_pair(&pair, &one_cpu);
> +
> +                       if (!hwprobe_pair_cmp(&pair, &pairs[i])) {
> +                               cpumask_clear_cpu(cpu, cpus);
> +                               break;
> +                       }
> +               }
> +
> +               cpumask_clear_cpu(cpu, &one_cpu);
> +       }
> +
> +       return 0;
> +}
> +
>  static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
>                             size_t pair_count, size_t cpusetsize,
>                             unsigned long __user *cpus_user,
>                             unsigned int flags)
>  {
> +       bool which_cpus = false;
> +       cpumask_t cpus;
>         size_t out;
>         int ret;
> -       cpumask_t cpus;
> +
> +       if (flags & RISCV_HWPROBE_WHICH_CPUS) {
> +               if (!cpusetsize || !cpus_user)
> +                       return -EINVAL;
> +               flags &= ~RISCV_HWPROBE_WHICH_CPUS;
> +               which_cpus = true;
> +       }
>
>         /* Check the reserved flags. */
>         if (flags != 0)
> @@ -274,11 +405,24 @@ static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
>                 if (ret)
>                         return -EFAULT;
>
> +               cpumask_and(&cpus, &cpus, cpu_online_mask);
> +
> +               if (which_cpus) {
> +                       if (cpumask_empty(&cpus))
> +                               cpumask_copy(&cpus, cpu_online_mask);
> +                       ret = hwprobe_which_cpus(pairs, pair_count, cpusetsize, &cpus);
> +                       if (ret)
> +                               return ret;
> +                       ret = copy_to_user(cpus_user, &cpus, cpusetsize);
> +                       if (ret)
> +                               return -EFAULT;
> +                       return 0;
> +               }
> +
>                 /*
>                  * Userspace must provide at least one online CPU, without that
>                  * there's no way to define what is supported.
>                  */
> -               cpumask_and(&cpus, &cpus, cpu_online_mask);
>                 if (cpumask_empty(&cpus))
>                         return -EINVAL;
>         }
> --
> 2.41.0
>

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-09-25 11:23   ` Palmer Dabbelt
  2023-09-25 12:12     ` Andrew Jones
@ 2023-09-25 16:26     ` Evan Green
  1 sibling, 0 replies; 26+ messages in thread
From: Evan Green @ 2023-09-25 16:26 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: ajones, linux-riscv, Paul Walmsley, aou, Conor Dooley, apatel

On Mon, Sep 25, 2023 at 4:23 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Thu, 21 Sep 2023 05:55:22 PDT (-0700), ajones@ventanamicro.com wrote:
> > Introduce the first flag for the hwprobe syscall. The flag basically
> > reverses its behavior, i.e. instead of populating the values of keys
> > for a given set of cpus, the set of cpus after the call is the result
> > of finding a set which supports the values of the keys. In order to
> > do this, we implement pair merge and pair compare functions which
> > take the type of value (a single value vs. a bitmap of booleans) into
> > consideration. The flow for the which-cpus syscall variant is as
> > follows:
> >
> >   1. Merge pairs into a set of pairs with unique keys
> >   2. If any unknown keys are seen, return an empty set of cpus
> >   3. If the platform is homogeneous, then check all the pairs
> >      against the "all cpu" values and return early
> >   4. Otherwise, check all the pairs against each cpu individually
> >
> > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> > ---
> >  Documentation/riscv/hwprobe.rst       |  16 ++-
> >  arch/riscv/include/uapi/asm/hwprobe.h |   3 +
> >  arch/riscv/kernel/sys_riscv.c         | 148 +++++++++++++++++++++++++-
> >  3 files changed, 163 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> > index 132e9acaa8f4..97b1e97e7dd2 100644
> > --- a/Documentation/riscv/hwprobe.rst
> > +++ b/Documentation/riscv/hwprobe.rst
> > @@ -25,8 +25,20 @@ arch, impl), the returned value will only be valid if all CPUs in the given set
> >  have the same value. Otherwise -1 will be returned. For boolean-like keys, the
> >  value returned will be a logical AND of the values for the specified CPUs.
> >  Usermode can supply NULL for ``cpus`` and 0 for ``cpusetsize`` as a shortcut for
> > -all online CPUs. There are currently no flags, this value must be zero for
> > -future compatibility.
> > +all online CPUs. The currently supported flags are:
> > +
> > +* :c:macro:`RISCV_HWPROBE_WHICH_CPUS`: This flag basically reverses the behavior
> > +  of sys_riscv_hwprobe().  Instead of populating the values of keys for a given
> > +  set of CPUs, the set of CPUs is initially all unset and the values of each key
> > +  are given.  Upon return, the CPUs which all match each of the given key-value
> > +  pairs are set in ``cpus``.  How matching is done depends on the key type.  For
> > +  value-like keys, matching means to be the exact same as the value.  For
> > +  boolean-like keys, matching means the result of a logical AND of the pair's
> > +  value with the CPU's value is exactly the same as the pair's value.  ``cpus``
> > +  may also initially have set bits, in which case the bits of any CPUs which do
> > +  not match the pairs will be cleared, but no other bits will be set.
> > +
> > +All other flags are reserved for future compatibility and must be zero.
> >
> >  On success 0 is returned, on failure a negative error code is returned.
> >
> > diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
> > index 86d08a0e617b..36683307c3e4 100644
> > --- a/arch/riscv/include/uapi/asm/hwprobe.h
> > +++ b/arch/riscv/include/uapi/asm/hwprobe.h
> > @@ -40,4 +40,7 @@ struct riscv_hwprobe {
> >  #define RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE  6
> >  /* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
> >
> > +/* Flags */
> > +#define RISCV_HWPROBE_WHICH_CPUS     (1 << 0)
> > +
> >  #endif
> > diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> > index 14b6dfaa5d9f..c70a72fe6aee 100644
> > --- a/arch/riscv/kernel/sys_riscv.c
> > +++ b/arch/riscv/kernel/sys_riscv.c
> > @@ -245,14 +245,145 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
> >       }
> >  }
> >
> > +static bool hwprobe_key_is_map(__s64 key)
> > +{
> > +     switch (key) {
> > +     case RISCV_HWPROBE_KEY_BASE_BEHAVIOR:
> > +     case RISCV_HWPROBE_KEY_IMA_EXT_0:
> > +     case RISCV_HWPROBE_KEY_CPUPERF_0:
> > +             return true;
> > +     }
> > +
> > +     return false;
> > +}
> > +
> > +static int hwprobe_pair_merge(struct riscv_hwprobe *to,
> > +                           struct riscv_hwprobe *from)
> > +{
> > +     if (to->key != from->key)
> > +             return -EINVAL;
> > +
> > +     if (hwprobe_key_is_map(to->key)) {
> > +             to->value |= from->value;
> > +             return 0;
> > +     }
> > +
> > +     return to->value == from->value ? 0 : -EINVAL;
> > +}
> > +
> > +static bool hwprobe_pair_cmp(struct riscv_hwprobe *pair,
> > +                          struct riscv_hwprobe *other_pair)
> > +{
> > +     if (pair->key != other_pair->key)
> > +             return false;
> > +
> > +     if (hwprobe_key_is_map(pair->key))
> > +             return (pair->value & other_pair->value) == other_pair->value;
> > +
> > +     return pair->value == other_pair->value;
> > +}
> > +
> > +static int hwprobe_which_cpus(struct riscv_hwprobe __user *pairs_user,
> > +                           size_t pair_count, size_t cpusetsize,
> > +                           cpumask_t *cpus)
> > +{
> > +     struct riscv_hwprobe pairs[RISCV_HWPROBE_MAX_KEY + 1] = {
> > +             [0 ... RISCV_HWPROBE_MAX_KEY] = (struct riscv_hwprobe){ .key = -1 }
> > +     };
> > +     struct riscv_hwprobe pair;
> > +     struct vdso_data *vd = __arch_get_k_vdso_data();
> > +     struct arch_vdso_data *avd = &vd->arch_data;
> > +     bool clear_all = false;
> > +     cpumask_t one_cpu;
> > +     int cpu, ret;
> > +     size_t i;
> > +
> > +     for (i = 0; i < pair_count; i++) {
> > +             ret = copy_from_user(&pair, &pairs_user[i], sizeof(pair));
> > +             if (ret)
> > +                     return -EFAULT;
> > +
> > +             if (pair.key >= 0 && pair.key <= RISCV_HWPROBE_MAX_KEY) {
> > +                     if (pairs[pair.key].key == -1) {
> > +                             pairs[pair.key] = pair;
> > +                     } else {
> > +                             ret = hwprobe_pair_merge(&pairs[pair.key], &pair);
> > +                             if (ret)
> > +                                     return ret;
> > +                     }
> > +             } else {
> > +                     pair.key = -1;
> > +                     pair.value = 0;
> > +                     ret = copy_to_user(&pairs_user[i], &pair, sizeof(pair));
> > +                     if (ret)
> > +                             return -EFAULT;
> > +                     clear_all = true;
> > +             }
> > +     }
> > +
> > +     if (clear_all) {
> > +             cpumask_clear(cpus);
> > +             return 0;
> > +     }
> > +
> > +     if (avd->homogeneous_cpus) {
> > +             for (i = 0; i <= RISCV_HWPROBE_MAX_KEY; i++) {
> > +                     if (pairs[i].key == -1)
> > +                             continue;
> > +
> > +                     pair.key = pairs[i].key;
> > +                     pair.value = avd->all_cpu_hwprobe_values[pairs[i].key];
> > +
> > +                     if (!hwprobe_pair_cmp(&pair, &pairs[i])) {
> > +                             cpumask_clear(cpus);
> > +                             return 0;
> > +                     }
> > +             }
> > +
> > +             return 0;
> > +     }
> > +
> > +     cpumask_clear(&one_cpu);
> > +
> > +     for_each_cpu(cpu, cpus) {
> > +             cpumask_set_cpu(cpu, &one_cpu);
> > +
> > +             for (i = 0; i <= RISCV_HWPROBE_MAX_KEY; i++) {
> > +                     if (pairs[i].key == -1)
> > +                             continue;
> > +
> > +                     pair.key = pairs[i].key;
> > +                     pair.value = 0;
> > +                     hwprobe_one_pair(&pair, &one_cpu);
> > +
> > +                     if (!hwprobe_pair_cmp(&pair, &pairs[i])) {
> > +                             cpumask_clear_cpu(cpu, cpus);
> > +                             break;
> > +                     }
> > +             }
> > +
> > +             cpumask_clear_cpu(cpu, &one_cpu);
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> >  static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
> >                           size_t pair_count, size_t cpusetsize,
> >                           unsigned long __user *cpus_user,
> >                           unsigned int flags)
> >  {
> > +     bool which_cpus = false;
> > +     cpumask_t cpus;
> >       size_t out;
> >       int ret;
> > -     cpumask_t cpus;
> > +
> > +     if (flags & RISCV_HWPROBE_WHICH_CPUS) {
> > +             if (!cpusetsize || !cpus_user)
> > +                     return -EINVAL;
> > +             flags &= ~RISCV_HWPROBE_WHICH_CPUS;
> > +             which_cpus = true;
> > +     }
> >
> >       /* Check the reserved flags. */
> >       if (flags != 0)
> > @@ -274,11 +405,24 @@ static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
> >               if (ret)
> >                       return -EFAULT;
> >
> > +             cpumask_and(&cpus, &cpus, cpu_online_mask);
> > +
> > +             if (which_cpus) {
> > +                     if (cpumask_empty(&cpus))
> > +                             cpumask_copy(&cpus, cpu_online_mask);
> > +                     ret = hwprobe_which_cpus(pairs, pair_count, cpusetsize, &cpus);
> > +                     if (ret)
> > +                             return ret;
> > +                     ret = copy_to_user(cpus_user, &cpus, cpusetsize);
> > +                     if (ret)
> > +                             return -EFAULT;
> > +                     return 0;
>
> So this is now essentially two syscalls.  IMO it'd be cleaner to split
> out the implementations into two functions (i.e.,
> hwprobe_{which_cpus,which_features}() or whatever) rather than have an
> early out and the rest inline.

I don't have a strong opinion, but I can see it both ways. He'll also
need a vDSO function, and that function will dip into the same vDSO
data as hwprobe, so they're very connected. To your point though, the
code paths diverge immediately, so maybe it's better to just have two
vDSO functions and two syscalls?

>
> Also: maybe we want a whole hwprobe file?  It's sort of its own thing
> now, and it's only going to get bigger...
>
> > +             }
> > +
> >               /*
> >                * Userspace must provide at least one online CPU, without that
> >                * there's no way to define what is supported.
> >                */
> > -             cpumask_and(&cpus, &cpus, cpu_online_mask);
> >               if (cpumask_empty(&cpus))
> >                       return -EINVAL;
> >       }


* Re: [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-09-25 16:16   ` Evan Green
@ 2023-10-05 13:23     ` Andrew Jones
  2023-10-05 17:12       ` Evan Green
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Jones @ 2023-10-05 13:23 UTC (permalink / raw)
  To: Evan Green; +Cc: linux-riscv, paul.walmsley, palmer, aou, conor.dooley, apatel

On Mon, Sep 25, 2023 at 09:16:01AM -0700, Evan Green wrote:
> On Thu, Sep 21, 2023 at 5:55 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> >
> > Introduce the first flag for the hwprobe syscall. The flag basically
> > reverses its behavior, i.e. instead of populating the values of keys
> > for a given set of cpus, the set of cpus after the call is the result
> > of finding a set which supports the values of the keys. In order to
> > do this, we implement pair merge and pair compare functions which
> > take the type of value (a single value vs. a bitmap of booleans) into
> > consideration. The flow for the which-cpus syscall variant is as
> > follows:
> >
> >   1. Merge pairs into a set of pairs with unique keys
> >   2. If any unknown keys are seen, return an empty set of cpus
> >   3. If the platform is homogeneous, then check all the pairs
> >      against the "all cpu" values and return early
> >   4. Otherwise, check all the pairs against each cpu individually
> >
> > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> 
> Overall the concept seems reasonable to me. I could see an application
> like Chrome using this to schedule intensive or latency-sensitive
> activities (video playback comes to mind) on only cores that meet a
> certain performance bar, if any do.

Thanks for the review, Evan, and sorry for my slow return to this.

> 
> > ---
> >  Documentation/riscv/hwprobe.rst       |  16 ++-
> >  arch/riscv/include/uapi/asm/hwprobe.h |   3 +
> >  arch/riscv/kernel/sys_riscv.c         | 148 +++++++++++++++++++++++++-
> >  3 files changed, 163 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> > index 132e9acaa8f4..97b1e97e7dd2 100644
> > --- a/Documentation/riscv/hwprobe.rst
> > +++ b/Documentation/riscv/hwprobe.rst
> > @@ -25,8 +25,20 @@ arch, impl), the returned value will only be valid if all CPUs in the given set
> >  have the same value. Otherwise -1 will be returned. For boolean-like keys, the
> >  value returned will be a logical AND of the values for the specified CPUs.
> >  Usermode can supply NULL for ``cpus`` and 0 for ``cpusetsize`` as a shortcut for
> > -all online CPUs. There are currently no flags, this value must be zero for
> > -future compatibility.
> > +all online CPUs. The currently supported flags are:
> > +
> > +* :c:macro:`RISCV_HWPROBE_WHICH_CPUS`: This flag basically reverses the behavior
> > +  of sys_riscv_hwprobe().  Instead of populating the values of keys for a given
> > +  set of CPUs, the set of CPUs is initially all unset and the values of each key
> > +  are given.  Upon return, the CPUs which all match each of the given key-value
> > +  pairs are set in ``cpus``.  How matching is done depends on the key type.  For
> > +  value-like keys, matching means to be the exact same as the value.  For
> > +  boolean-like keys, matching means the result of a logical AND of the pair's
> > +  value with the CPU's value is exactly the same as the pair's value.  ``cpus``
> > +  may also initially have set bits, in which case the bits of any CPUs which do
> > +  not match the pairs will be cleared, but no other bits will be set.
> > +
> > +All other flags are reserved for future compatibility and must be zero.
> >
> >  On success 0 is returned, on failure a negative error code is returned.
> >
> > diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
> > index 86d08a0e617b..36683307c3e4 100644
> > --- a/arch/riscv/include/uapi/asm/hwprobe.h
> > +++ b/arch/riscv/include/uapi/asm/hwprobe.h
> > @@ -40,4 +40,7 @@ struct riscv_hwprobe {
> >  #define RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE    6
> >  /* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
> >
> > +/* Flags */
> > +#define RISCV_HWPROBE_WHICH_CPUS       (1 << 0)
> > +
> >  #endif
> > diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> > index 14b6dfaa5d9f..c70a72fe6aee 100644
> > --- a/arch/riscv/kernel/sys_riscv.c
> > +++ b/arch/riscv/kernel/sys_riscv.c
> > @@ -245,14 +245,145 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
> >         }
> >  }
> >
> > +static bool hwprobe_key_is_map(__s64 key)
> 
> The word "map" confuses me some, as the closest my brain can get is a
> synonym for dictionary, which makes me think "unique value". Based on
> the keys returning true, you really mean bitfield.

I was thinking bitmap, which I prefer using over bitfield, since C
bitfields come to mind first when I see that word. I'll spell out
bitmap for v1.

> 
> > +{
> > +       switch (key) {
> > +       case RISCV_HWPROBE_KEY_BASE_BEHAVIOR:
> > +       case RISCV_HWPROBE_KEY_IMA_EXT_0:
> > +       case RISCV_HWPROBE_KEY_CPUPERF_0:
> > +               return true;
> > +       }
> > +
> > +       return false;
> > +}
> > +
> > +static int hwprobe_pair_merge(struct riscv_hwprobe *to,
> > +                             struct riscv_hwprobe *from)
> 
> Why is the merge step necessary? Maybe I'll see further below. Oh yes,
> I do, but see my comment below on how we might get rid of this.
> 
> > +{
> > +       if (to->key != from->key)
> > +               return -EINVAL;
> > +
> > +       if (hwprobe_key_is_map(to->key)) {
> > +               to->value |= from->value;
> > +               return 0;
> > +       }
> > +
> > +       return to->value == from->value ? 0 : -EINVAL;
> > +}
> > +
> > +static bool hwprobe_pair_cmp(struct riscv_hwprobe *pair,
> > +                            struct riscv_hwprobe *other_pair)
> > +{
> > +       if (pair->key != other_pair->key)
> > +               return false;
> > +
> > +       if (hwprobe_key_is_map(pair->key))
> > +               return (pair->value & other_pair->value) == other_pair->value;
> > +
> > +       return pair->value == other_pair->value;
> > +}
> > +
> > +static int hwprobe_which_cpus(struct riscv_hwprobe __user *pairs_user,
> > +                             size_t pair_count, size_t cpusetsize,
> > +                             cpumask_t *cpus)
> > +{
> > +       struct riscv_hwprobe pairs[RISCV_HWPROBE_MAX_KEY + 1] = {
> > +               [0 ... RISCV_HWPROBE_MAX_KEY] = (struct riscv_hwprobe){ .key = -1 }
> 
> Minor note: I'd prefer we avoid baking in things that blow up if the
> key space becomes non-contiguous or large. I suspect vendor-specific
> hwprobe keys are coming, and if we assign them a separate high range
> of keyspace, we'll have to unwind anything in the implementation that
> assumes a contiguous small key space.

Good point. I'll try to find a better way to manage key space for v1.

> 
> > +       };
> > +       struct riscv_hwprobe pair;
> > +       struct vdso_data *vd = __arch_get_k_vdso_data();
> > +       struct arch_vdso_data *avd = &vd->arch_data;
> > +       bool clear_all = false;
> > +       cpumask_t one_cpu;
> > +       int cpu, ret;
> > +       size_t i;
> > +
> > +       for (i = 0; i < pair_count; i++) {
> > +               ret = copy_from_user(&pair, &pairs_user[i], sizeof(pair));
> > +               if (ret)
> > +                       return -EFAULT;
> > +
> > +               if (pair.key >= 0 && pair.key <= RISCV_HWPROBE_MAX_KEY) {
> > +                       if (pairs[pair.key].key == -1) {
> > +                               pairs[pair.key] = pair;
> > +                       } else {
> > +                               ret = hwprobe_pair_merge(&pairs[pair.key], &pair);
> > +                               if (ret)
> > +                                       return ret;
> > +                       }
> > +               } else {
> > +                       pair.key = -1;
> > +                       pair.value = 0;
> > +                       ret = copy_to_user(&pairs_user[i], &pair, sizeof(pair));
> > +                       if (ret)
> > +                               return -EFAULT;
> > +                       clear_all = true;
> > +               }
> > +       }
> 
> So if I write out this algorithm, I get something like:
>  * Create an array of every possible key, and dedupe the caller's list
> of pairs into this array.
>  * For each remaining cpu, go through this array and either confirm
> the big array's element matches this cpu's value, or clear the cpu
> from the result set.
> 
> But why do we go to all the effort of de-duping the caller's array of
> pairs? Can't they do that themselves (or pay a small performance
> penalty for "yes" results)? Instead, couldn't it be something like:
> For each pair in the user's set, for each remaining cpu in the set,
> compare the values, or clear the cpu in the remaining set.
> 
> Doing that would also take the runtime from O(keyspace * ncpus) to
> O(query_length * ncpus).

I want to de-dupe for two reasons:
 * query_length is unbounded, but keyspace is bounded (and is currently
   small)
 * The merge function also provides an opportunity for sanity checks of
   value-type duplicates. If the values don't match then the syscall
   returns -EINVAL.

> 
> > +
> > +       if (clear_all) {
> > +               cpumask_clear(cpus);
> > +               return 0;
> > +       }
> > +
> > +       if (avd->homogeneous_cpus) {
> > +               for (i = 0; i <= RISCV_HWPROBE_MAX_KEY; i++) {
> > +                       if (pairs[i].key == -1)
> > +                               continue;
> > +
> > +                       pair.key = pairs[i].key;
> > +                       pair.value = avd->all_cpu_hwprobe_values[pairs[i].key];
> > +
> > +                       if (!hwprobe_pair_cmp(&pair, &pairs[i])) {
> > +                               cpumask_clear(cpus);
> > +                               return 0;
> > +                       }
> > +               }
> > +
> > +               return 0;
> > +       }
> 
> We could move the whole above homogeneous_cpus part into the vDSO
> function, avoiding a syscall on homogeneous systems. It would mean
> you'd have to either duplicate the logic here as well, or have it only
> in the vDSO and kernel callers don't get the optimization. Probably
> there are very few customers of this API in the kernel, so maybe just
> having it in the vDSO is a good place to start.

Since I'll be splitting which-cpus into its own function, where other
vDSO code needs to be duplicated anyway, I'll also look into moving the
above logic into that vDSO code and dropping it from here, since, as you
suggest, the vDSO location should be sufficient.

Thanks,
drew


* Re: [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-10-05 13:23     ` Andrew Jones
@ 2023-10-05 17:12       ` Evan Green
  2023-10-05 18:11         ` Andrew Jones
  0 siblings, 1 reply; 26+ messages in thread
From: Evan Green @ 2023-10-05 17:12 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, paul.walmsley, palmer, aou, conor.dooley, apatel

On Thu, Oct 5, 2023 at 6:23 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Mon, Sep 25, 2023 at 09:16:01AM -0700, Evan Green wrote:
> > On Thu, Sep 21, 2023 at 5:55 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> > >
> > > Introduce the first flag for the hwprobe syscall. The flag basically
> > > reverses its behavior, i.e. instead of populating the values of keys
> > > for a given set of cpus, the set of cpus after the call is the result
> > > of finding a set which supports the values of the keys. In order to
> > > do this, we implement pair merge and pair compare functions which
> > > take the type of value (a single value vs. a bitmap of booleans) into
> > > consideration. The flow for the which-cpus syscall variant is as
> > > follows:
> > >
> > >   1. Merge pairs into a set of pairs with unique keys
> > >   2. If any unknown keys are seen, return an empty set of cpus
> > >   3. If the platform is homogeneous, then check all the pairs
> > >      against the "all cpu" values and return early
> > >   4. Otherwise, check all the pairs against each cpu individually
> > >
> > > Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
> >
> > Overall the concept seems reasonable to me. I could see an application
> > like Chrome using this to schedule intensive or latency-sensitive
> > activities (video playback comes to mind) on only cores that meet a
> > certain performance bar, if any do.
>
> Thanks for the review, Evan and sorry for my slow return to this.
>
> >
> > > ---
> > >  Documentation/riscv/hwprobe.rst       |  16 ++-
> > >  arch/riscv/include/uapi/asm/hwprobe.h |   3 +
> > >  arch/riscv/kernel/sys_riscv.c         | 148 +++++++++++++++++++++++++-
> > >  3 files changed, 163 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> > > index 132e9acaa8f4..97b1e97e7dd2 100644
> > > --- a/Documentation/riscv/hwprobe.rst
> > > +++ b/Documentation/riscv/hwprobe.rst
> > > @@ -25,8 +25,20 @@ arch, impl), the returned value will only be valid if all CPUs in the given set
> > >  have the same value. Otherwise -1 will be returned. For boolean-like keys, the
> > >  value returned will be a logical AND of the values for the specified CPUs.
> > >  Usermode can supply NULL for ``cpus`` and 0 for ``cpusetsize`` as a shortcut for
> > > -all online CPUs. There are currently no flags, this value must be zero for
> > > -future compatibility.
> > > +all online CPUs. The currently supported flags are:
> > > +
> > > +* :c:macro:`RISCV_HWPROBE_WHICH_CPUS`: This flag basically reverses the behavior
> > > +  of sys_riscv_hwprobe().  Instead of populating the values of keys for a given
> > > +  set of CPUs, the set of CPUs is initially all unset and the values of each key
> > > +  are given.  Upon return, the CPUs which all match each of the given key-value
> > > +  pairs are set in ``cpus``.  How matching is done depends on the key type.  For
> > > +  value-like keys, matching means to be the exact same as the value.  For
> > > +  boolean-like keys, matching means the result of a logical AND of the pair's
> > > +  value with the CPU's value is exactly the same as the pair's value.  ``cpus``
> > > +  may also initially have set bits, in which case the bits of any CPUs which do
> > > +  not match the pairs will be cleared, but no other bits will be set.
> > > +
> > > +All other flags are reserved for future compatibility and must be zero.
> > >
> > >  On success 0 is returned, on failure a negative error code is returned.
> > >
> > > diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
> > > index 86d08a0e617b..36683307c3e4 100644
> > > --- a/arch/riscv/include/uapi/asm/hwprobe.h
> > > +++ b/arch/riscv/include/uapi/asm/hwprobe.h
> > > @@ -40,4 +40,7 @@ struct riscv_hwprobe {
> > >  #define RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE    6
> > >  /* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
> > >
> > > +/* Flags */
> > > +#define RISCV_HWPROBE_WHICH_CPUS       (1 << 0)
> > > +
> > >  #endif
> > > diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> > > index 14b6dfaa5d9f..c70a72fe6aee 100644
> > > --- a/arch/riscv/kernel/sys_riscv.c
> > > +++ b/arch/riscv/kernel/sys_riscv.c
> > > @@ -245,14 +245,145 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
> > >         }
> > >  }
> > >
> > > +static bool hwprobe_key_is_map(__s64 key)
> >
> > The word "map" confuses me some, as the closest my brain can get is a
> > synonym for dictionary, which makes me think "unique value". Based on
> > the keys returning true, you really mean bitfield.
>
> I was thinking bitmap, which I prefer using over bitfield, since C
> bitfields come to mind first when I see that word. I'll spell out
> bitmap for v1.

SGTM.

>
> >
> > > +{
> > > +       switch (key) {
> > > +       case RISCV_HWPROBE_KEY_BASE_BEHAVIOR:
> > > +       case RISCV_HWPROBE_KEY_IMA_EXT_0:
> > > +       case RISCV_HWPROBE_KEY_CPUPERF_0:
> > > +               return true;
> > > +       }
> > > +
> > > +       return false;
> > > +}
> > > +
> > > +static int hwprobe_pair_merge(struct riscv_hwprobe *to,
> > > +                             struct riscv_hwprobe *from)
> >
> > Why is the merge step necessary? Maybe I'll see further below. Oh yes,
> > I do, but see my comment below on how we might get rid of this.
> >
> > > +{
> > > +       if (to->key != from->key)
> > > +               return -EINVAL;
> > > +
> > > +       if (hwprobe_key_is_map(to->key)) {
> > > +               to->value |= from->value;
> > > +               return 0;
> > > +       }
> > > +
> > > +       return to->value == from->value ? 0 : -EINVAL;
> > > +}
> > > +
> > > +static bool hwprobe_pair_cmp(struct riscv_hwprobe *pair,
> > > +                            struct riscv_hwprobe *other_pair)
> > > +{
> > > +       if (pair->key != other_pair->key)
> > > +               return false;
> > > +
> > > +       if (hwprobe_key_is_map(pair->key))
> > > +               return (pair->value & other_pair->value) == other_pair->value;
> > > +
> > > +       return pair->value == other_pair->value;
> > > +}
> > > +
> > > +static int hwprobe_which_cpus(struct riscv_hwprobe __user *pairs_user,
> > > +                             size_t pair_count, size_t cpusetsize,
> > > +                             cpumask_t *cpus)
> > > +{
> > > +       struct riscv_hwprobe pairs[RISCV_HWPROBE_MAX_KEY + 1] = {
> > > +               [0 ... RISCV_HWPROBE_MAX_KEY] = (struct riscv_hwprobe){ .key = -1 }
> >
> > Minor note: I'd prefer we avoid baking in things that blow up if the
> > key space becomes non-contiguous or large. I suspect vendor-specific
> > hwprobe keys are coming, and if we assign them a separate high range
> > of keyspace, we'll have to unwind anything in the implementation that
> > assumes a contiguous small key space.
>
> Good point. I'll try to find a better way to manage key space for v1.
>
> >
> > > +       };
> > > +       struct riscv_hwprobe pair;
> > > +       struct vdso_data *vd = __arch_get_k_vdso_data();
> > > +       struct arch_vdso_data *avd = &vd->arch_data;
> > > +       bool clear_all = false;
> > > +       cpumask_t one_cpu;
> > > +       int cpu, ret;
> > > +       size_t i;
> > > +
> > > +       for (i = 0; i < pair_count; i++) {
> > > +               ret = copy_from_user(&pair, &pairs_user[i], sizeof(pair));
> > > +               if (ret)
> > > +                       return -EFAULT;
> > > +
> > > +               if (pair.key >= 0 && pair.key <= RISCV_HWPROBE_MAX_KEY) {
> > > +                       if (pairs[pair.key].key == -1) {
> > > +                               pairs[pair.key] = pair;
> > > +                       } else {
> > > +                               ret = hwprobe_pair_merge(&pairs[pair.key], &pair);
> > > +                               if (ret)
> > > +                                       return ret;
> > > +                       }
> > > +               } else {
> > > +                       pair.key = -1;
> > > +                       pair.value = 0;
> > > +                       ret = copy_to_user(&pairs_user[i], &pair, sizeof(pair));
> > > +                       if (ret)
> > > +                               return -EFAULT;
> > > +                       clear_all = true;
> > > +               }
> > > +       }
> >
> > So if I write out this algorithm, I get something like:
> >  * Create an array of every possible key, and dedupe the caller's list
> > of pairs into this array.
> >  * For each remaining cpu, go through this array and either confirm
> > the big array's element matches this cpu's value, or clear the cpu
> > from the result set.
> >
> > But why do we go to all the effort of de-duping the caller's array of
> > pairs? Can't they do that themselves (or pay a small performance
> > penalty for "yes" results)? Instead, couldn't it be something like:
> > For each pair in the user's set, for each remaining cpu in the set,
> > compare the values, or clear the cpu in the remaining set.
> >
> > Doing that would also take the runtime from O(keyspace * ncpus) to
> > O(query_length * ncpus).
>
> I want to de-dupe for two reasons:
>  * query_length is unbounded, but keyspace is bounded (and is currently
>    small)

Ok, but remember that if we ship this behavior today, we're committed
to it forever. The keyspace is likely to grow, it would be unfortunate
if this step started to cause a noticeable performance delay.

>  * The merge function also provides an opportunity for sanity checks of
>    value-type duplicates. If the values don't match then the syscall
>    returns -EINVAL.

But the behavior if we don't do the deduping is that we get an empty
set of CPUs (e.g. the caller has asked "which CPUs have featureX==1 &&
featureX==2", answer: none). Other than perhaps a diagnostic print,
can the application take any practically different course of action from
these two results? In other words, the only logical choices I can see
an app or library taking to either of these two responses is 1) shrug
and move on or 2) panic and abort.

In my opinion the deduping has real costs associated with it, but
other than more directly pointing out that usermode made a silly
request, I don't understand the benefit.

-Evan

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-10-05 17:12       ` Evan Green
@ 2023-10-05 18:11         ` Andrew Jones
  2023-10-09 15:39           ` Andrew Jones
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Jones @ 2023-10-05 18:11 UTC (permalink / raw)
  To: Evan Green; +Cc: linux-riscv, paul.walmsley, palmer, aou, conor.dooley, apatel

On Thu, Oct 05, 2023 at 10:12:14AM -0700, Evan Green wrote:
> On Thu, Oct 5, 2023 at 6:23 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> >
> > On Mon, Sep 25, 2023 at 09:16:01AM -0700, Evan Green wrote:
...
> > > So if I write out this algorithm, I get something like:
> > >  * Create an array of every possible key, and dedupe the caller's list
> > > of pairs into this array.
> > >  * For each remaining cpu, go through this array and either confirm
> > > the big array's element matches this cpu's value, or clear the cpu
> > > from the result set.
> > >
> > > But why do we go to all the effort of de-duping the caller's array of
> > > pairs? Can't they do that themselves (or pay a small performance
> > > penalty for "yes" results)? Instead, couldn't it be something like:
> > > For each pair in the user's set, for each remaining cpu in the set,
> > > compare the values, or clear the cpu in the remaining set.
> > >
> > > Doing that would also take the runtime from O(keyspace * ncpus) to
> > > O(query_length * ncpus).
> >
> > I want to de-dupe for two reasons:
> >  * query_length is unbounded, but keyspace is bounded (and is currently
> >    small)
> 
> Ok, but remember that if we ship this behavior today, we're committed
> to it forever. The keyspace is likely to grow, it would be unfortunate
> if this step started to cause a noticeable performance delay.

Maybe it's not too late to put a bound on pairs, i.e. pair_count greater
than some number should return E2BIG.

> 
> >  * The merge function also provides an opportunity for sanity checks of
> >    value-type duplicates. If the values don't match then the syscall
> >    returns -EINVAL.
> 
> But the behavior if we don't do the deduping is that we get an empty
> set of CPUs (e.g. the caller has asked "which CPUs have featureX==1 &&
> featureX==2", answer: none). Other than perhaps a diagnostic print,
> can the application take any practically different course of action from
> these two results? In other words, the only logical choices I can see
> an app or library taking to either of these two responses is 1) shrug
> and move on or 2) panic and abort.

For (2), an EINVAL would indicate what went wrong to the developer and
it may be fixed. Also, for EINVAL, there's (3), which is to propagate
the error, possibly all the way to the user, where the inputs could get
corrected and the call may be retried.

> In my opinion the deduping has real costs associated with it, but
> other than more directly pointing out that usermode made a silly
> request, I don't understand the benefit.
>

When pair_count is large, then the cost should be less to loop over the
pairs once, de-duping, than to loop over them once for each CPU. If
we opt to limit pair_count, then I agree there's less need for a de-
dupe loop, but we still need to copy_from_user() each pair, and that
should only be done once. So, we either complicate the body of the main
loop in order to do it on the first time through, or we do it in a
separate loop first (where we might as well also de-dupe :-)
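Sketched as a tiny userspace model, the merge/sanity-check step under
discussion might look roughly like the following (the key numbers, names,
and plain struct here are hypothetical stand-ins for the real uapi
definitions, not the actual kernel code):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key numbers for illustration only; the real values live
 * in arch/riscv/include/uapi/asm/hwprobe.h. */
#define KEY_IMA_EXT_0		4	/* bitmask-like key */
#define KEY_ZICBOZ_BLOCK	6	/* value-like key */

struct pair {
	int64_t key;
	uint64_t value;
};

static bool key_is_bitmask(int64_t key)
{
	return key == KEY_IMA_EXT_0;
}

/*
 * Merge a duplicate pair into its deduped slot: bitmask keys OR their
 * requested bits together; value keys must agree exactly, otherwise the
 * request is self-contradictory and gets -EINVAL.
 */
static int pair_merge(struct pair *to, const struct pair *from)
{
	if (to->key != from->key)
		return -EINVAL;

	if (key_is_bitmask(to->key)) {
		to->value |= from->value;
		return 0;
	}

	return to->value == from->value ? 0 : -EINVAL;
}
```

In this shape, bitmask-type duplicates simply accumulate bits, while
conflicting value-type duplicates surface as -EINVAL instead of silently
producing an empty CPU set.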

Thanks,
drew


* Re: [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-09-25 11:23   ` Palmer Dabbelt
  2023-09-25 12:14     ` Andrew Jones
@ 2023-10-09 14:15     ` Andrew Jones
  1 sibling, 0 replies; 26+ messages in thread
From: Andrew Jones @ 2023-10-09 14:15 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: linux-riscv, Paul Walmsley, aou, Evan Green, Conor Dooley, apatel

On Mon, Sep 25, 2023 at 04:23:32AM -0700, Palmer Dabbelt wrote:
> Kind of a meta-comment here, but we should add the key type to each key
> number in the docs.  I suppose maybe it should be obvious which is which
> from the meaning of the keys, but can't hurt to be explicit about it.
>

I was just about to do this task, but I see it's already written this
way. The doc uses "Contains the value of ..." and "An unsigned int..."
for value types and "A bitmask..." for bitmask types. As long as we
stay consistent with that as more are added, then I think we're good to
go.

Thanks,
drew


* Re: [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-10-05 18:11         ` Andrew Jones
@ 2023-10-09 15:39           ` Andrew Jones
  2023-10-09 16:50             ` Evan Green
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Jones @ 2023-10-09 15:39 UTC (permalink / raw)
  To: Evan Green; +Cc: linux-riscv, paul.walmsley, palmer, aou, conor.dooley, apatel

On Thu, Oct 05, 2023 at 08:11:25PM +0200, Andrew Jones wrote:
> On Thu, Oct 05, 2023 at 10:12:14AM -0700, Evan Green wrote:
> > On Thu, Oct 5, 2023 at 6:23 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> > >
> > > On Mon, Sep 25, 2023 at 09:16:01AM -0700, Evan Green wrote:
> ...
> > > > So if I write out this algorithm, I get something like:
> > > >  * Create an array of every possible key, and dedupe the caller's list
> > > > of pairs into this array.
> > > >  * For each remaining cpu, go through this array and either confirm
> > > > the big array's element matches this cpu's value, or clear the cpu
> > > > from the result set.
> > > >
> > > > But why do we go to all the effort of de-duping the caller's array of
> > > > pairs? Can't they do that themselves (or pay a small performance
> > > > penalty for "yes" results)? Instead, couldn't it be something like:
> > > > For each pair in the user's set, for each remaining cpu in the set,
> > > > compare the values, or clear the cpu in the remaining set.
> > > >
> > > > Doing that would also take the runtime from O(keyspace * ncpus) to
> > > > O(query_length * ncpus).
> > >
> > > I want to de-dupe for two reasons:
> > >  * query_length is unbounded, but keyspace is bounded (and is currently
> > >    small)
> > 
> > Ok, but remember that if we ship this behavior today, we're committed
> > to it forever. The keyspace is likely to grow, it would be unfortunate
> > if this step started to cause a noticeable performance delay.
> 
> Maybe it's not too late to put a bound on pairs, i.e. pair_count greater
> than some number should return E2BIG.
>

Scratch this idea. I went looking for precedent for limiting the length of
input arrays to syscalls, but couldn't find any. To the contrary, I found
that move_pages() used to return E2BIG when there were "too many pages to
move", but it hasn't done so since 2.6.29 and, even then, it appears the
concern was multiplication overflow, not having "too much" work. So, we
should leave the number of pairs unbounded, but, to me, that means we
should de-dupe, since we can then do the copy_from_user() just once for
each pair, and any user which decides to provide each bit separately for
a bitmask key type should get a tiny speedup.

Thanks,
drew


* Re: [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-10-09 15:39           ` Andrew Jones
@ 2023-10-09 16:50             ` Evan Green
  2023-10-10 10:44               ` Andrew Jones
  0 siblings, 1 reply; 26+ messages in thread
From: Evan Green @ 2023-10-09 16:50 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, paul.walmsley, palmer, aou, conor.dooley, apatel

On Mon, Oct 9, 2023 at 8:39 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Thu, Oct 05, 2023 at 08:11:25PM +0200, Andrew Jones wrote:
> > On Thu, Oct 05, 2023 at 10:12:14AM -0700, Evan Green wrote:
> > > On Thu, Oct 5, 2023 at 6:23 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> > > >
> > > > On Mon, Sep 25, 2023 at 09:16:01AM -0700, Evan Green wrote:
> > ...
> > > > > So if I write out this algorithm, I get something like:
> > > > >  * Create an array of every possible key, and dedupe the caller's list
> > > > > of pairs into this array.
> > > > >  * For each remaining cpu, go through this array and either confirm
> > > > > the big array's element matches this cpu's value, or clear the cpu
> > > > > from the result set.
> > > > >
> > > > > But why do we go to all the effort of de-duping the caller's array of
> > > > > pairs? Can't they do that themselves (or pay a small performance
> > > > > penalty for "yes" results)? Instead, couldn't it be something like:
> > > > > For each pair in the user's set, for each remaining cpu in the set,
> > > > > compare the values, or clear the cpu in the remaining set.
> > > > >
> > > > > Doing that would also take the runtime from O(keyspace * ncpus) to
> > > > > O(query_length * ncpus).
> > > >
> > > > I want to de-dupe for two reasons:
> > > >  * query_length is unbounded, but keyspace is bounded (and is currently
> > > >    small)
> > >
> > > Ok, but remember that if we ship this behavior today, we're committed
> > > to it forever. The keyspace is likely to grow, it would be unfortunate
> > > if this step started to cause a noticeable performance delay.
> >
> > Maybe it's not too late to put a bound on pairs, i.e. pair_count greater
> > than some number should return E2BIG.
> >
>
> Scratch this idea. I went looking for precedent for limiting the length of
> input arrays to syscalls, but couldn't find any. To the contrary, I found
> that move_pages() used to return E2BIG when there were "too many pages to
> move", but it hasn't done so since 2.6.29 and, even then, it appears the
> concern was multiplication overflow, not having "too much" work. So, we
> should leave the number of pairs unbounded, but, to me, that means we
> should de-dupe, since we can then do the copy_from_user() just once for
> each pair, and any user which decides to provide each bit separately for
> a bitmask key type should get a tiny speedup.

I still don't get the use case where usermode is going to be submitting
this large jumble of keys, replete with many duplicates. And even if
such a case did exist, why should the kernel be the one deduping,
dragging in a runtime penalty for all other callers that didn't submit
duplicates? Usermode could de-dupe on their own, I'd think.

-Evan


* Re: [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-10-09 16:50             ` Evan Green
@ 2023-10-10 10:44               ` Andrew Jones
  2023-10-10 15:14                 ` Evan Green
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Jones @ 2023-10-10 10:44 UTC (permalink / raw)
  To: Evan Green; +Cc: linux-riscv, paul.walmsley, palmer, aou, conor.dooley, apatel

On Mon, Oct 09, 2023 at 09:50:00AM -0700, Evan Green wrote:
> On Mon, Oct 9, 2023 at 8:39 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> >
> > On Thu, Oct 05, 2023 at 08:11:25PM +0200, Andrew Jones wrote:
> > > On Thu, Oct 05, 2023 at 10:12:14AM -0700, Evan Green wrote:
> > > > On Thu, Oct 5, 2023 at 6:23 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> > > > >
> > > > > On Mon, Sep 25, 2023 at 09:16:01AM -0700, Evan Green wrote:
> > > ...
> > > > > > So if I write out this algorithm, I get something like:
> > > > > >  * Create an array of every possible key, and dedupe the caller's list
> > > > > > of pairs into this array.
> > > > > >  * For each remaining cpu, go through this array and either confirm
> > > > > > the big array's element matches this cpu's value, or clear the cpu
> > > > > > from the result set.
> > > > > >
> > > > > > But why do we go to all the effort of de-duping the caller's array of
> > > > > > pairs? Can't they do that themselves (or pay a small performance
> > > > > > penalty for "yes" results)? Instead, couldn't it be something like:
> > > > > > For each pair in the user's set, for each remaining cpu in the set,
> > > > > > compare the values, or clear the cpu in the remaining set.
> > > > > >
> > > > > > Doing that would also take the runtime from O(keyspace * ncpus) to
> > > > > > O(query_length * ncpus).
> > > > >
> > > > > I want to de-dupe for two reasons:
> > > > >  * query_length is unbounded, but keyspace is bounded (and is currently
> > > > >    small)
> > > >
> > > > Ok, but remember that if we ship this behavior today, we're committed
> > > > to it forever. The keyspace is likely to grow, it would be unfortunate
> > > > if this step started to cause a noticeable performance delay.
> > >
> > > Maybe it's not too late to put a bound on pairs, i.e. pair_count greater
> > > than some number should return E2BIG.
> > >
> >
> > Scratch this idea. I went looking for precedent for limiting the length of
> > input arrays to syscalls, but couldn't find any. To the contrary, I found
> > that move_pages() used to return E2BIG when there were "too many pages to
> > move", but it hasn't done so since 2.6.29 and, even then, it appears the
> > concern was multiplication overflow, not having "too much" work. So, we
> > should leave the number of pairs unbounded, but, to me, that means we
> > should de-dupe, since we can then do the copy_from_user() just once for
> > each pair, and any user which decides to provide each bit separately for
> > a bitmask key type should get a tiny speedup.
> 
> I still don't get the use case where usermode is going to be submitting
> this large jumble of keys, replete with many duplicates. And even if
> such a case did exist, why should the kernel be the one deduping,
> dragging in a runtime penalty for all other callers that didn't submit
> duplicates? Usermode could de-dupe on their own, I'd think.

It's not about use cases, but about best handling of all allowed inputs.

If callers don't submit duplicates or make the call with a small
pair_count, then the de-dupe loop would have negligible overhead in
terms of time. However, thinking about this some more, we can avoid
issuing copy_from_user() multiple times by looping over cpus within
the loop over user input pairs. We'll do cpumask_set_cpu() and
cpumask_clear_cpu() on the one_cpu cpumask more times instead, but
that's just a couple loads and some arithmetic. I think this would
be the better approach, not because of time overhead, but because
we won't have to store pairs at all, avoiding memory allocation or
the assumption that we can store a pair per possible key on the stack.
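As a rough userspace model of that pair-outer/CPU-inner shape (with
hypothetical key numbers and per-CPU tables standing in for
hwprobe_one_pair() and the cpumask helpers, not the real kernel code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NCPUS 4

/* Hypothetical key numbers for illustration only. */
#define KEY_IMA_EXT_0		4	/* bitmask-like key */
#define KEY_ZICBOZ_BLOCK	6	/* value-like key */

struct pair {
	int64_t key;
	uint64_t value;
};

/* Made-up per-CPU properties, standing in for hwprobe_one_pair(). */
static const uint64_t cpu_ext[NCPUS]  = { 0x3, 0x3, 0x1, 0x3 };
static const uint64_t cpu_cboz[NCPUS] = { 64, 64, 64, 128 };

static bool cpu_matches(int cpu, const struct pair *p)
{
	switch (p->key) {
	case KEY_IMA_EXT_0:	/* bitmask: all requested bits must be set */
		return (cpu_ext[cpu] & p->value) == p->value;
	case KEY_ZICBOZ_BLOCK:	/* value: must be exactly equal */
		return cpu_cboz[cpu] == p->value;
	}
	return false;		/* unknown key: matches no CPU */
}

/*
 * Outer loop over the user's pairs (each one "copied in" exactly once),
 * inner loop over the surviving candidate CPUs. cpus[] is the candidate
 * set on entry and the answer on return; no pair storage is needed.
 */
static void which_cpus(const struct pair *pairs, size_t n, bool cpus[NCPUS])
{
	for (size_t i = 0; i < n; i++)
		for (int cpu = 0; cpu < NCPUS; cpu++)
			if (cpus[cpu] && !cpu_matches(cpu, &pairs[i]))
				cpus[cpu] = false;
}
```

A CPU cleared by one pair is never re-checked against later pairs, so
"no" answers still get the early exit, and an unrecognized key empties
the set without any extra bookkeeping.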

Thanks,
drew


* Re: [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-10-10 10:44               ` Andrew Jones
@ 2023-10-10 15:14                 ` Evan Green
  2023-10-10 16:22                   ` Andrew Jones
  0 siblings, 1 reply; 26+ messages in thread
From: Evan Green @ 2023-10-10 15:14 UTC (permalink / raw)
  To: Andrew Jones
  Cc: linux-riscv, paul.walmsley, palmer, aou, conor.dooley, apatel

On Tue, Oct 10, 2023 at 3:44 AM Andrew Jones <ajones@ventanamicro.com> wrote:
>
> On Mon, Oct 09, 2023 at 09:50:00AM -0700, Evan Green wrote:
> > On Mon, Oct 9, 2023 at 8:39 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> > >
> > > On Thu, Oct 05, 2023 at 08:11:25PM +0200, Andrew Jones wrote:
> > > > On Thu, Oct 05, 2023 at 10:12:14AM -0700, Evan Green wrote:
> > > > > On Thu, Oct 5, 2023 at 6:23 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> > > > > >
> > > > > > On Mon, Sep 25, 2023 at 09:16:01AM -0700, Evan Green wrote:
> > > > ...
> > > > > > > So if I write out this algorithm, I get something like:
> > > > > > >  * Create an array of every possible key, and dedupe the caller's list
> > > > > > > of pairs into this array.
> > > > > > >  * For each remaining cpu, go through this array and either confirm
> > > > > > > the big array's element matches this cpu's value, or clear the cpu
> > > > > > > from the result set.
> > > > > > >
> > > > > > > But why do we go to all the effort of de-duping the caller's array of
> > > > > > > pairs? Can't they do that themselves (or pay a small performance
> > > > > > > penalty for "yes" results)? Instead, couldn't it be something like:
> > > > > > > For each pair in the user's set, for each remaining cpu in the set,
> > > > > > > compare the values, or clear the cpu in the remaining set.
> > > > > > >
> > > > > > > Doing that would also take the runtime from O(keyspace * ncpus) to
> > > > > > > O(query_length * ncpus).
> > > > > >
> > > > > > I want to de-dupe for two reasons:
> > > > > >  * query_length is unbounded, but keyspace is bounded (and is currently
> > > > > >    small)
> > > > >
> > > > > Ok, but remember that if we ship this behavior today, we're committed
> > > > > to it forever. The keyspace is likely to grow, it would be unfortunate
> > > > > if this step started to cause a noticeable performance delay.
> > > >
> > > > Maybe it's not too late to put a bound on pairs, i.e. pair_count greater
> > > > than some number should return E2BIG.
> > > >
> > >
> > > Scratch this idea. I went looking for precedent for limiting the length of
> > > input arrays to syscalls, but couldn't find any. To the contrary, I found
> > > that move_pages() used to return E2BIG when there were "too many pages to
> > > move", but it hasn't done so since 2.6.29 and, even then, it appears the
> > > concern was multiplication overflow, not having "too much" work. So, we
> > > should leave the number of pairs unbounded, but, to me, that means we
> > > should de-dupe, since we can then do the copy_from_user() just once for
> > > each pair, and any user which decides to provide each bit separately for
> > > a bitmask key type should get a tiny speedup.
> >
> > I still don't get the use case where usermode is going to be submitting
> > this large jumble of keys, replete with many duplicates. And even if
> > such a case did exist, why should the kernel be the one deduping,
> > dragging in a runtime penalty for all other callers that didn't submit
> > duplicates? Usermode could de-dupe on their own, I'd think.
>
> It's not about use cases, but about best handling of all allowed inputs.
>
> If callers don't submit duplicates or make the call with a small
> pair_count, then the de-dupe loop would have negligible overhead in
> terms of time. However, thinking about this some more, we can avoid
> issuing copy_from_user() multiple times by looping over cpus within
> the loop over user input pairs. We'll do cpumask_set_cpu() and
> cpumask_clear_cpu() on the one_cpu cpumask more times instead, but
> that's just a couple loads and some arithmetic. I think this would
> be the better approach, not because of time overhead, but because
> we won't have to store pairs at all, avoiding memory allocation or
> the assumption that we can store a pair per possible key on the stack.

That sounds good to me. If I understand the proposal right, you'll
also be able to keep your optimization of looping only over the cpus
that are "still in the running", which was a nice early exit for
certain "no" answers.
-Evan


* Re: [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag
  2023-10-10 15:14                 ` Evan Green
@ 2023-10-10 16:22                   ` Andrew Jones
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Jones @ 2023-10-10 16:22 UTC (permalink / raw)
  To: Evan Green; +Cc: linux-riscv, paul.walmsley, palmer, aou, conor.dooley, apatel

On Tue, Oct 10, 2023 at 08:14:16AM -0700, Evan Green wrote:
> On Tue, Oct 10, 2023 at 3:44 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> >
> > On Mon, Oct 09, 2023 at 09:50:00AM -0700, Evan Green wrote:
> > > On Mon, Oct 9, 2023 at 8:39 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> > > >
> > > > On Thu, Oct 05, 2023 at 08:11:25PM +0200, Andrew Jones wrote:
> > > > > On Thu, Oct 05, 2023 at 10:12:14AM -0700, Evan Green wrote:
> > > > > > On Thu, Oct 5, 2023 at 6:23 AM Andrew Jones <ajones@ventanamicro.com> wrote:
> > > > > > >
> > > > > > > On Mon, Sep 25, 2023 at 09:16:01AM -0700, Evan Green wrote:
> > > > > ...
> > > > > > > > So if I write out this algorithm, I get something like:
> > > > > > > >  * Create an array of every possible key, and dedupe the caller's list
> > > > > > > > of pairs into this array.
> > > > > > > >  * For each remaining cpu, go through this array and either confirm
> > > > > > > > the big array's element matches this cpu's value, or clear the cpu
> > > > > > > > from the result set.
> > > > > > > >
> > > > > > > > But why do we go to all the effort of de-duping the caller's array of
> > > > > > > > pairs? Can't they do that themselves (or pay a small performance
> > > > > > > > penalty for "yes" results)? Instead, couldn't it be something like:
> > > > > > > > For each pair in the user's set, for each remaining cpu in the set,
> > > > > > > > compare the values, or clear the cpu in the remaining set.
> > > > > > > >
> > > > > > > > Doing that would also take the runtime from O(keyspace * ncpus) to
> > > > > > > > O(query_length * ncpus).
> > > > > > >
> > > > > > > I want to de-dupe for two reasons:
> > > > > > >  * query_length is unbounded, but keyspace is bounded (and is currently
> > > > > > >    small)
> > > > > >
> > > > > > Ok, but remember that if we ship this behavior today, we're committed
> > > > > > to it forever. The keyspace is likely to grow, it would be unfortunate
> > > > > > if this step started to cause a noticeable performance delay.
> > > > >
> > > > > Maybe it's not too late to put a bound on pairs, i.e. pair_count greater
> > > > > than some number should return E2BIG.
> > > > >
> > > >
> > > > Scratch this idea. I went looking for precedent for limiting the length of
> > > > input arrays to syscalls, but couldn't find any. To the contrary, I found
> > > > that move_pages() used to return E2BIG when there were "too many pages to
> > > > move", but it hasn't done so since 2.6.29 and, even then, it appears the
> > > > concern was multiplication overflow, not having "too much" work. So, we
> > > > should leave the number of pairs unbounded, but, to me, that means we
> > > > should de-dupe, since we can then do the copy_from_user() just once for
> > > > each pair, and any user which decides to provide each bit separately for
> > > > a bitmask key type should get a tiny speedup.
> > >
> > > I still don't get the use case where usermode is going to be submitting
> > > this large jumble of keys, replete with many duplicates. And even if
> > > such a case did exist, why should the kernel be the one deduping,
> > > dragging in a runtime penalty for all other callers that didn't submit
> > > duplicates? Usermode could de-dupe on their own, I'd think.
> >
> > It's not about use cases, but about best handling of all allowed inputs.
> >
> > If callers don't submit duplicates or make the call with a small
> > pair_count, then the de-dupe loop would have negligible overhead in
> > terms of time. However, thinking about this some more, we can avoid
> > issuing copy_from_user() multiple times by looping over cpus within
> > the loop over user input pairs. We'll do cpumask_set_cpu() and
> > cpumask_clear_cpu() on the one_cpu cpumask more times instead, but
> > that's just a couple loads and some arithmetic. I think this would
> > be the better approach, not because of time overhead, but because
> > we won't have to store pairs at all, avoiding memory allocation or
> > the assumption that we can store a pair per possible key on the stack.
> 
> That sounds good to me. If I understand the proposal right, you'll
> also be able to keep your optimization of looping only over the cpus
> that are "still in the running", which was a nice early exit for
> certain "no" answers.

Yup. If a cpu is dropped due to one pair, then it won't be checked against
later pairs. And, if a pair's key is not recognized, then no cpus will be
checked against later pairs, as that's a "clear all" condition.

Thanks,
drew


end of thread, other threads:[~2023-10-10 16:23 UTC | newest]

Thread overview: 26+ messages
2023-09-21 12:55 [RFC PATCH 0/5] RISC-V: hwprobe related stuff Andrew Jones
2023-09-21 12:55 ` [RFC PATCH 1/5] RISC-V: hwprobe: Clarify cpus size parameter Andrew Jones
2023-09-25 11:23   ` Palmer Dabbelt
2023-09-25 12:07     ` Andrew Jones
2023-09-21 12:55 ` [RFC PATCH 2/5] RISC-V: selftests: Replace cpu_count with cpusetsize Andrew Jones
2023-09-25 11:23   ` Palmer Dabbelt
2023-09-21 12:55 ` [RFC PATCH 3/5] RISC-V: hwprobe: Introduce which-cpus flag Andrew Jones
2023-09-25 11:23   ` Palmer Dabbelt
2023-09-25 12:12     ` Andrew Jones
2023-09-25 16:26     ` Evan Green
2023-09-25 11:23   ` Palmer Dabbelt
2023-09-25 12:14     ` Andrew Jones
2023-10-09 14:15     ` Andrew Jones
2023-09-25 16:16   ` Evan Green
2023-10-05 13:23     ` Andrew Jones
2023-10-05 17:12       ` Evan Green
2023-10-05 18:11         ` Andrew Jones
2023-10-09 15:39           ` Andrew Jones
2023-10-09 16:50             ` Evan Green
2023-10-10 10:44               ` Andrew Jones
2023-10-10 15:14                 ` Evan Green
2023-10-10 16:22                   ` Andrew Jones
2023-09-21 12:55 ` [RFC PATCH 4/5] RISC-V: selftests: Add which-cpus hwprobe test Andrew Jones
2023-09-25 11:23   ` Palmer Dabbelt
2023-09-21 12:55 ` [RFC PATCH 5/5] RISC-V: selftests: Apply which-cpus flag to CBO " Andrew Jones
2023-09-25 11:23   ` Palmer Dabbelt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox