* [PATCH v2 -next 01/11] cgroup/cpuset: introduce cpuset-v1.c
2024-08-26 13:26 [PATCH v2 -next 00/11] cgroup:cpuset:separate legacy cgroup v1 code and put under config option Chen Ridong
@ 2024-08-26 13:26 ` Chen Ridong
2024-08-26 19:07 ` Waiman Long
2024-08-26 13:26 ` [PATCH v2 -next 02/11] cgroup/cpuset: move common code to cpuset-internal.h Chen Ridong
` (9 subsequent siblings)
10 siblings, 1 reply; 18+ messages in thread
From: Chen Ridong @ 2024-08-26 13:26 UTC
To: tj, lizefan.x, hannes, longman, adityakali, sergeh, mkoutny
Cc: cgroups, linux-kernel, chenridong
This patch introduces the cgroup/cpuset-v1.c source file, which will be
used for all legacy (cgroup v1) cpuset cgroup code. It also introduces
cgroup/cpuset-internal.h to keep declarations shared between
cgroup/cpuset.c and cgroup/cpuset-v1.c.

For now, compile it whenever CONFIG_CPUSETS is set. Later on it can be
switched to a separate config option, so that the legacy code won't be
compiled if not required.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
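The message above anticipates moving the legacy code behind its own config option later. A common kernel idiom for this is to declare the real function in the shared header when the option is enabled and to supply a static inline stub otherwise, so that callers in cpuset.c compile either way. The plain-C sketch below models that idiom; HYPOTHETICAL_CONFIG_CPUSETS_V1 and hypothetical_v1_hook() are made-up names, not symbols from this series.

```c
/*
 * Hypothetical switch standing in for a future Kconfig option; set it to 0
 * to model a build in which the legacy (v1) code is compiled out.
 */
#define HYPOTHETICAL_CONFIG_CPUSETS_V1 1

#if HYPOTHETICAL_CONFIG_CPUSETS_V1
/*
 * Real implementation. In the kernel the declaration would sit in
 * cpuset-internal.h and the body in cpuset-v1.c; both are kept in one
 * file here so the sketch is self-contained.
 */
static int hypothetical_v1_hook(int val)
{
	return val * 2;
}
#else
/* Stub seen by callers when the legacy code is not built. */
static inline int hypothetical_v1_hook(int val)
{
	(void)val;
	return 0;
}
#endif
```

With the switch set to 0, callers still compile and simply get the stub's result, which is what lets the legacy file drop out of the build without touching its call sites.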
MAINTAINERS | 2 ++
kernel/cgroup/Makefile | 2 +-
kernel/cgroup/cpuset-internal.h | 7 +++++++
kernel/cgroup/cpuset-v1.c | 4 ++++
4 files changed, 14 insertions(+), 1 deletion(-)
create mode 100644 kernel/cgroup/cpuset-internal.h
create mode 100644 kernel/cgroup/cpuset-v1.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 82e3924816d2..3b5ec1cafd95 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5698,6 +5698,8 @@ S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git
F: Documentation/admin-guide/cgroup-v1/cpusets.rst
F: include/linux/cpuset.h
+F: kernel/cgroup/cpuset-internal.h
+F: kernel/cgroup/cpuset-v1.c
F: kernel/cgroup/cpuset.c
F: tools/testing/selftests/cgroup/test_cpuset.c
F: tools/testing/selftests/cgroup/test_cpuset_prs.sh
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 12f8457ad1f9..005ac4c675cb 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -4,6 +4,6 @@ obj-y := cgroup.o rstat.o namespace.o cgroup-v1.o freezer.o
obj-$(CONFIG_CGROUP_FREEZER) += legacy_freezer.o
obj-$(CONFIG_CGROUP_PIDS) += pids.o
obj-$(CONFIG_CGROUP_RDMA) += rdma.o
-obj-$(CONFIG_CPUSETS) += cpuset.o
+obj-$(CONFIG_CPUSETS) += cpuset.o cpuset-v1.o
obj-$(CONFIG_CGROUP_MISC) += misc.o
obj-$(CONFIG_CGROUP_DEBUG) += debug.o
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
new file mode 100644
index 000000000000..6605be417e32
--- /dev/null
+++ b/kernel/cgroup/cpuset-internal.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef __CPUSET_INTERNAL_H
+#define __CPUSET_INTERNAL_H
+
+#endif /* __CPUSET_INTERNAL_H */
+
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
new file mode 100644
index 000000000000..ae166eb4f75d
--- /dev/null
+++ b/kernel/cgroup/cpuset-v1.c
@@ -0,0 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include "cpuset-internal.h"
+
--
2.34.1
* Re: [PATCH v2 -next 01/11] cgroup/cpuset: introduce cpuset-v1.c
From: Waiman Long @ 2024-08-26 19:07 UTC
To: Chen Ridong, tj, lizefan.x, hannes, adityakali, sergeh, mkoutny
Cc: cgroups, linux-kernel, chenridong
On 8/26/24 09:26, Chen Ridong wrote:
> [...]
> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
> new file mode 100644
> index 000000000000..ae166eb4f75d
> --- /dev/null
> +++ b/kernel/cgroup/cpuset-v1.c
> @@ -0,0 +1,4 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +#include "cpuset-internal.h"
> +
Don't leave a blank line at the end of a file. You will get the
following error when applying the patch.
0001-cgroup_cpuset-introduce-cpuset-v1.c.patch:70: new blank line at EOF.
All your patches except the last one have this problem.
Cheers,
Longman
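The warning quoted above is triggered because the patch adds an empty line after the last real line of the new file (after the #include "cpuset-internal.h" line). Running git diff --check before generating the patches reports the same condition. The hypothetical helper below is a simplified sketch of such a check for a file held in memory; it only treats a completely empty final line as blank.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if the text in buf[0..len) ends with
 * an empty line, i.e. a newline immediately followed by a final newline.
 * A file consisting of a single "\n" also counts as one blank line.
 */
static bool ends_with_blank_line(const char *buf, size_t len)
{
	if (len == 1)
		return buf[0] == '\n';
	return len >= 2 && buf[len - 2] == '\n' && buf[len - 1] == '\n';
}
```

A file that ends with exactly one newline after its last line passes; the cpuset-v1.c created by this patch, which ends in "\n\n", does not.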
* Re: [PATCH v2 -next 01/11] cgroup/cpuset: introduce cpuset-v1.c
From: Chen Ridong @ 2024-08-27 1:45 UTC
To: Waiman Long, tj, lizefan.x, hannes, adityakali, sergeh, mkoutny
Cc: cgroups, linux-kernel
On 2024/8/27 3:07, Waiman Long wrote:
> On 8/26/24 09:26, Chen Ridong wrote:
>> [...]
>
> Don't leave a blank line at the end of a file. You will get the
> following error when applying the patch.
>
> 0001-cgroup_cpuset-introduce-cpuset-v1.c.patch:70: new blank line at EOF.
>
> All your patches except the last one have this problem.
>
> Cheers,
> Longman
Thank you, will fix it.
Best regards,
Ridong
* [PATCH v2 -next 02/11] cgroup/cpuset: move common code to cpuset-internal.h
From: Chen Ridong @ 2024-08-26 13:26 UTC
To: tj, lizefan.x, hannes, longman, adityakali, sergeh, mkoutny
Cc: cgroups, linux-kernel, chenridong
Move the declarations that will be shared by cpuset v1 and v2,
including 'struct cpuset', 'cpuset_flagbits_t', 'cpuset_filetype_t'
and the flag-test helpers, to cpuset-internal.h.

No logical change.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
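The helpers moved into cpuset-internal.h rely on two idioms: struct cpuset embeds its struct cgroup_subsys_state, so css_cs() recovers the enclosing cpuset with container_of(), and the per-cpuset flags are individual bits of an unsigned long checked with test_bit(). The userspace sketch below reproduces that shape with simplified stand-in types; its container_of() omits the kernel's type checking, and flag_set() is a non-atomic substitute for test_bit(), an assumption made only to keep the example self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for container_of(); no type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins, not the real kernel layouts. */
struct cgroup_subsys_state {
	struct cgroup_subsys_state *parent;
};

typedef enum {
	CS_ONLINE,
	CS_CPU_EXCLUSIVE,
	CS_MEM_EXCLUSIVE,
} cpuset_flagbits_t;

struct cpuset {
	/* Embedded, not a pointer; container_of() works at any offset. */
	struct cgroup_subsys_state css;
	unsigned long flags;
};

/* Same shape as css_cs(): NULL-safe conversion from css to cpuset. */
static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
{
	return css ? container_of(css, struct cpuset, css) : NULL;
}

/* The root's css has no parent, so parent_cs() returns NULL for it. */
static inline struct cpuset *parent_cs(struct cpuset *cs)
{
	return css_cs(cs->css.parent);
}

/* Non-atomic stand-in for test_bit(bit, &cs->flags). */
static inline bool flag_set(const struct cpuset *cs, cpuset_flagbits_t bit)
{
	return (cs->flags >> bit) & 1UL;
}

static inline bool is_cpu_exclusive(const struct cpuset *cs)
{
	return flag_set(cs, CS_CPU_EXCLUSIVE);
}
```

Because only a pointer to the embedded css is handed around by the cgroup core, this conversion is what lets every cpuset file share one set of accessors once they live in the common header.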
kernel/cgroup/cpuset-internal.h | 236 +++++++++++++++++++++++++++++++-
kernel/cgroup/cpuset.c | 236 +-------------------------------
2 files changed, 236 insertions(+), 236 deletions(-)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 6605be417e32..ffea3eefebdf 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -3,5 +3,239 @@
#ifndef __CPUSET_INTERNAL_H
#define __CPUSET_INTERNAL_H
-#endif /* __CPUSET_INTERNAL_H */
+#include <linux/union_find.h>
+#include <linux/cpumask.h>
+#include <linux/spinlock.h>
+#include <linux/cpuset.h>
+#include <linux/cgroup.h>
+
+/* See "Frequency meter" comments, below. */
+
+struct fmeter {
+ int cnt; /* unprocessed events count */
+ int val; /* most recent output value */
+ time64_t time; /* clock (secs) when val computed */
+ spinlock_t lock; /* guards read or write of above */
+};
+
+/*
+ * Invalid partition error code
+ */
+enum prs_errcode {
+ PERR_NONE = 0,
+ PERR_INVCPUS,
+ PERR_INVPARENT,
+ PERR_NOTPART,
+ PERR_NOTEXCL,
+ PERR_NOCPUS,
+ PERR_HOTPLUG,
+ PERR_CPUSEMPTY,
+ PERR_HKEEPING,
+ PERR_ACCESS,
+};
+
+/* bits in struct cpuset flags field */
+typedef enum {
+ CS_ONLINE,
+ CS_CPU_EXCLUSIVE,
+ CS_MEM_EXCLUSIVE,
+ CS_MEM_HARDWALL,
+ CS_MEMORY_MIGRATE,
+ CS_SCHED_LOAD_BALANCE,
+ CS_SPREAD_PAGE,
+ CS_SPREAD_SLAB,
+} cpuset_flagbits_t;
+
+/* The various types of files and directories in a cpuset file system */
+
+typedef enum {
+ FILE_MEMORY_MIGRATE,
+ FILE_CPULIST,
+ FILE_MEMLIST,
+ FILE_EFFECTIVE_CPULIST,
+ FILE_EFFECTIVE_MEMLIST,
+ FILE_SUBPARTS_CPULIST,
+ FILE_EXCLUSIVE_CPULIST,
+ FILE_EFFECTIVE_XCPULIST,
+ FILE_ISOLATED_CPULIST,
+ FILE_CPU_EXCLUSIVE,
+ FILE_MEM_EXCLUSIVE,
+ FILE_MEM_HARDWALL,
+ FILE_SCHED_LOAD_BALANCE,
+ FILE_PARTITION_ROOT,
+ FILE_SCHED_RELAX_DOMAIN_LEVEL,
+ FILE_MEMORY_PRESSURE_ENABLED,
+ FILE_MEMORY_PRESSURE,
+ FILE_SPREAD_PAGE,
+ FILE_SPREAD_SLAB,
+} cpuset_filetype_t;
+
+struct cpuset {
+ struct cgroup_subsys_state css;
+
+ unsigned long flags; /* "unsigned long" so bitops work */
+
+ /*
+ * On default hierarchy:
+ *
+ * The user-configured masks can only be changed by writing to
+ * cpuset.cpus and cpuset.mems, and won't be limited by the
+ * parent masks.
+ *
+ * The effective masks is the real masks that apply to the tasks
+ * in the cpuset. They may be changed if the configured masks are
+ * changed or hotplug happens.
+ *
+ * effective_mask == configured_mask & parent's effective_mask,
+ * and if it ends up empty, it will inherit the parent's mask.
+ *
+ *
+ * On legacy hierarchy:
+ *
+ * The user-configured masks are always the same with effective masks.
+ */
+
+ /* user-configured CPUs and Memory Nodes allow to tasks */
+ cpumask_var_t cpus_allowed;
+ nodemask_t mems_allowed;
+
+ /* effective CPUs and Memory Nodes allow to tasks */
+ cpumask_var_t effective_cpus;
+ nodemask_t effective_mems;
+
+ /*
+ * Exclusive CPUs dedicated to current cgroup (default hierarchy only)
+ *
+ * The effective_cpus of a valid partition root comes solely from its
+ * effective_xcpus and some of the effective_xcpus may be distributed
+ * to sub-partitions below & hence excluded from its effective_cpus.
+ * For a valid partition root, its effective_cpus have no relationship
+ * with cpus_allowed unless its exclusive_cpus isn't set.
+ *
+ * This value will only be set if either exclusive_cpus is set or
+ * when this cpuset becomes a local partition root.
+ */
+ cpumask_var_t effective_xcpus;
+
+ /*
+ * Exclusive CPUs as requested by the user (default hierarchy only)
+ *
+ * Its value is independent of cpus_allowed and designates the set of
+ * CPUs that can be granted to the current cpuset or its children when
+ * it becomes a valid partition root. The effective set of exclusive
+ * CPUs granted (effective_xcpus) depends on whether those exclusive
+ * CPUs are passed down by its ancestors and not yet taken up by
+ * another sibling partition root along the way.
+ *
+ * If its value isn't set, it defaults to cpus_allowed.
+ */
+ cpumask_var_t exclusive_cpus;
+
+ /*
+ * This is old Memory Nodes tasks took on.
+ *
+ * - top_cpuset.old_mems_allowed is initialized to mems_allowed.
+ * - A new cpuset's old_mems_allowed is initialized when some
+ * task is moved into it.
+ * - old_mems_allowed is used in cpuset_migrate_mm() when we change
+ * cpuset.mems_allowed and have tasks' nodemask updated, and
+ * then old_mems_allowed is updated to mems_allowed.
+ */
+ nodemask_t old_mems_allowed;
+
+ struct fmeter fmeter; /* memory_pressure filter */
+
+ /*
+ * Tasks are being attached to this cpuset. Used to prevent
+ * zeroing cpus/mems_allowed between ->can_attach() and ->attach().
+ */
+ int attach_in_progress;
+
+ /* for custom sched domain */
+ int relax_domain_level;
+
+ /* number of valid local child partitions */
+ int nr_subparts;
+ /* partition root state */
+ int partition_root_state;
+
+ /*
+ * number of SCHED_DEADLINE tasks attached to this cpuset, so that we
+ * know when to rebuild associated root domain bandwidth information.
+ */
+ int nr_deadline_tasks;
+ int nr_migrate_dl_tasks;
+ u64 sum_migrate_dl_bw;
+
+ /* Invalid partition error code, not lock protected */
+ enum prs_errcode prs_err;
+
+ /* Handle for cpuset.cpus.partition */
+ struct cgroup_file partition_file;
+
+ /* Remote partition silbling list anchored at remote_children */
+ struct list_head remote_sibling;
+
+ /* Used to merge intersecting subsets for generate_sched_domains */
+ struct uf_node node;
+};
+
+static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
+{
+ return css ? container_of(css, struct cpuset, css) : NULL;
+}
+
+/* Retrieve the cpuset for a task */
+static inline struct cpuset *task_cs(struct task_struct *task)
+{
+ return css_cs(task_css(task, cpuset_cgrp_id));
+}
+
+static inline struct cpuset *parent_cs(struct cpuset *cs)
+{
+ return css_cs(cs->css.parent);
+}
+
+/* convenient tests for these bits */
+static inline bool is_cpuset_online(struct cpuset *cs)
+{
+ return test_bit(CS_ONLINE, &cs->flags) && !css_is_dying(&cs->css);
+}
+
+static inline int is_cpu_exclusive(const struct cpuset *cs)
+{
+ return test_bit(CS_CPU_EXCLUSIVE, &cs->flags);
+}
+
+static inline int is_mem_exclusive(const struct cpuset *cs)
+{
+ return test_bit(CS_MEM_EXCLUSIVE, &cs->flags);
+}
+
+static inline int is_mem_hardwall(const struct cpuset *cs)
+{
+ return test_bit(CS_MEM_HARDWALL, &cs->flags);
+}
+
+static inline int is_sched_load_balance(const struct cpuset *cs)
+{
+ return test_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+}
+
+static inline int is_memory_migrate(const struct cpuset *cs)
+{
+ return test_bit(CS_MEMORY_MIGRATE, &cs->flags);
+}
+
+static inline int is_spread_page(const struct cpuset *cs)
+{
+ return test_bit(CS_SPREAD_PAGE, &cs->flags);
+}
+
+static inline int is_spread_slab(const struct cpuset *cs)
+{
+ return test_bit(CS_SPREAD_SLAB, &cs->flags);
+}
+
+#endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 7db55eed63cf..61763dd70de5 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -22,11 +22,9 @@
* distribution for more details.
*/
#include "cgroup-internal.h"
+#include "cpuset-internal.h"
#include <linux/cpu.h>
-#include <linux/cpumask.h>
-#include <linux/cpuset.h>
-#include <linux/delay.h>
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/kernel.h>
@@ -40,13 +38,10 @@
#include <linux/sched/mm.h>
#include <linux/sched/task.h>
#include <linux/security.h>
-#include <linux/spinlock.h>
#include <linux/oom.h>
#include <linux/sched/isolation.h>
-#include <linux/cgroup.h>
#include <linux/wait.h>
#include <linux/workqueue.h>
-#include <linux/union_find.h>
DEFINE_STATIC_KEY_FALSE(cpusets_pre_enable_key);
DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key);
@@ -58,31 +53,6 @@ DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key);
*/
DEFINE_STATIC_KEY_FALSE(cpusets_insane_config_key);
-/* See "Frequency meter" comments, below. */
-
-struct fmeter {
- int cnt; /* unprocessed events count */
- int val; /* most recent output value */
- time64_t time; /* clock (secs) when val computed */
- spinlock_t lock; /* guards read or write of above */
-};
-
-/*
- * Invalid partition error code
- */
-enum prs_errcode {
- PERR_NONE = 0,
- PERR_INVCPUS,
- PERR_INVPARENT,
- PERR_NOTPART,
- PERR_NOTEXCL,
- PERR_NOCPUS,
- PERR_HOTPLUG,
- PERR_CPUSEMPTY,
- PERR_HKEEPING,
- PERR_ACCESS,
-};
-
static const char * const perr_strings[] = {
[PERR_INVCPUS] = "Invalid cpu list in cpuset.cpus.exclusive",
[PERR_INVPARENT] = "Parent is an invalid partition root",
@@ -95,117 +65,6 @@ static const char * const perr_strings[] = {
[PERR_ACCESS] = "Enable partition not permitted",
};
-struct cpuset {
- struct cgroup_subsys_state css;
-
- unsigned long flags; /* "unsigned long" so bitops work */
-
- /*
- * On default hierarchy:
- *
- * The user-configured masks can only be changed by writing to
- * cpuset.cpus and cpuset.mems, and won't be limited by the
- * parent masks.
- *
- * The effective masks is the real masks that apply to the tasks
- * in the cpuset. They may be changed if the configured masks are
- * changed or hotplug happens.
- *
- * effective_mask == configured_mask & parent's effective_mask,
- * and if it ends up empty, it will inherit the parent's mask.
- *
- *
- * On legacy hierarchy:
- *
- * The user-configured masks are always the same with effective masks.
- */
-
- /* user-configured CPUs and Memory Nodes allow to tasks */
- cpumask_var_t cpus_allowed;
- nodemask_t mems_allowed;
-
- /* effective CPUs and Memory Nodes allow to tasks */
- cpumask_var_t effective_cpus;
- nodemask_t effective_mems;
-
- /*
- * Exclusive CPUs dedicated to current cgroup (default hierarchy only)
- *
- * The effective_cpus of a valid partition root comes solely from its
- * effective_xcpus and some of the effective_xcpus may be distributed
- * to sub-partitions below & hence excluded from its effective_cpus.
- * For a valid partition root, its effective_cpus have no relationship
- * with cpus_allowed unless its exclusive_cpus isn't set.
- *
- * This value will only be set if either exclusive_cpus is set or
- * when this cpuset becomes a local partition root.
- */
- cpumask_var_t effective_xcpus;
-
- /*
- * Exclusive CPUs as requested by the user (default hierarchy only)
- *
- * Its value is independent of cpus_allowed and designates the set of
- * CPUs that can be granted to the current cpuset or its children when
- * it becomes a valid partition root. The effective set of exclusive
- * CPUs granted (effective_xcpus) depends on whether those exclusive
- * CPUs are passed down by its ancestors and not yet taken up by
- * another sibling partition root along the way.
- *
- * If its value isn't set, it defaults to cpus_allowed.
- */
- cpumask_var_t exclusive_cpus;
-
- /*
- * This is old Memory Nodes tasks took on.
- *
- * - top_cpuset.old_mems_allowed is initialized to mems_allowed.
- * - A new cpuset's old_mems_allowed is initialized when some
- * task is moved into it.
- * - old_mems_allowed is used in cpuset_migrate_mm() when we change
- * cpuset.mems_allowed and have tasks' nodemask updated, and
- * then old_mems_allowed is updated to mems_allowed.
- */
- nodemask_t old_mems_allowed;
-
- struct fmeter fmeter; /* memory_pressure filter */
-
- /*
- * Tasks are being attached to this cpuset. Used to prevent
- * zeroing cpus/mems_allowed between ->can_attach() and ->attach().
- */
- int attach_in_progress;
-
- /* for custom sched domain */
- int relax_domain_level;
-
- /* number of valid local child partitions */
- int nr_subparts;
-
- /* partition root state */
- int partition_root_state;
-
- /*
- * number of SCHED_DEADLINE tasks attached to this cpuset, so that we
- * know when to rebuild associated root domain bandwidth information.
- */
- int nr_deadline_tasks;
- int nr_migrate_dl_tasks;
- u64 sum_migrate_dl_bw;
-
- /* Invalid partition error code, not lock protected */
- enum prs_errcode prs_err;
-
- /* Handle for cpuset.cpus.partition */
- struct cgroup_file partition_file;
-
- /* Remote partition silbling list anchored at remote_children */
- struct list_head remote_sibling;
-
- /* Used to merge intersecting subsets for generate_sched_domains */
- struct uf_node node;
-};
-
/*
* Legacy hierarchy call to cgroup_transfer_tasks() is handled asynchrously
*/
@@ -274,22 +133,6 @@ struct tmpmasks {
cpumask_var_t new_cpus; /* For update_cpumasks_hier() */
};
-static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
-{
- return css ? container_of(css, struct cpuset, css) : NULL;
-}
-
-/* Retrieve the cpuset for a task */
-static inline struct cpuset *task_cs(struct task_struct *task)
-{
- return css_cs(task_css(task, cpuset_cgrp_id));
-}
-
-static inline struct cpuset *parent_cs(struct cpuset *cs)
-{
- return css_cs(cs->css.parent);
-}
-
void inc_dl_tasks_cs(struct task_struct *p)
{
struct cpuset *cs = task_cs(p);
@@ -304,59 +147,6 @@ void dec_dl_tasks_cs(struct task_struct *p)
cs->nr_deadline_tasks--;
}
-/* bits in struct cpuset flags field */
-typedef enum {
- CS_ONLINE,
- CS_CPU_EXCLUSIVE,
- CS_MEM_EXCLUSIVE,
- CS_MEM_HARDWALL,
- CS_MEMORY_MIGRATE,
- CS_SCHED_LOAD_BALANCE,
- CS_SPREAD_PAGE,
- CS_SPREAD_SLAB,
-} cpuset_flagbits_t;
-
-/* convenient tests for these bits */
-static inline bool is_cpuset_online(struct cpuset *cs)
-{
- return test_bit(CS_ONLINE, &cs->flags) && !css_is_dying(&cs->css);
-}
-
-static inline int is_cpu_exclusive(const struct cpuset *cs)
-{
- return test_bit(CS_CPU_EXCLUSIVE, &cs->flags);
-}
-
-static inline int is_mem_exclusive(const struct cpuset *cs)
-{
- return test_bit(CS_MEM_EXCLUSIVE, &cs->flags);
-}
-
-static inline int is_mem_hardwall(const struct cpuset *cs)
-{
- return test_bit(CS_MEM_HARDWALL, &cs->flags);
-}
-
-static inline int is_sched_load_balance(const struct cpuset *cs)
-{
- return test_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
-}
-
-static inline int is_memory_migrate(const struct cpuset *cs)
-{
- return test_bit(CS_MEMORY_MIGRATE, &cs->flags);
-}
-
-static inline int is_spread_page(const struct cpuset *cs)
-{
- return test_bit(CS_SPREAD_PAGE, &cs->flags);
-}
-
-static inline int is_spread_slab(const struct cpuset *cs)
-{
- return test_bit(CS_SPREAD_SLAB, &cs->flags);
-}
-
static inline int is_partition_valid(const struct cpuset *cs)
{
return cs->partition_root_state > 0;
@@ -3529,30 +3319,6 @@ static void cpuset_attach(struct cgroup_taskset *tset)
mutex_unlock(&cpuset_mutex);
}
-/* The various types of files and directories in a cpuset file system */
-
-typedef enum {
- FILE_MEMORY_MIGRATE,
- FILE_CPULIST,
- FILE_MEMLIST,
- FILE_EFFECTIVE_CPULIST,
- FILE_EFFECTIVE_MEMLIST,
- FILE_SUBPARTS_CPULIST,
- FILE_EXCLUSIVE_CPULIST,
- FILE_EFFECTIVE_XCPULIST,
- FILE_ISOLATED_CPULIST,
- FILE_CPU_EXCLUSIVE,
- FILE_MEM_EXCLUSIVE,
- FILE_MEM_HARDWALL,
- FILE_SCHED_LOAD_BALANCE,
- FILE_PARTITION_ROOT,
- FILE_SCHED_RELAX_DOMAIN_LEVEL,
- FILE_MEMORY_PRESSURE_ENABLED,
- FILE_MEMORY_PRESSURE,
- FILE_SPREAD_PAGE,
- FILE_SPREAD_SLAB,
-} cpuset_filetype_t;
-
static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
u64 val)
{
--
2.34.1
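The struct cpuset comment moved by the patch above states the default-hierarchy rule effective_mask == configured_mask & parent's effective_mask, with the parent's mask inherited when the result is empty. A minimal sketch of that rule follows, assuming a hypothetical 64-bit integer in place of cpumask_var_t (bit n = CPU n); the kernel itself uses cpumask_var_t and nodemask_t rather than a plain integer.

```c
#include <stdint.h>

/* Hypothetical 64-CPU mask standing in for cpumask_var_t. */
typedef uint64_t cpumask64_t;

/*
 * effective = configured & parent's effective; if that is empty,
 * fall back to the parent's effective mask.
 */
static cpumask64_t compute_effective(cpumask64_t configured,
				     cpumask64_t parent_effective)
{
	cpumask64_t eff = configured & parent_effective;

	return eff ? eff : parent_effective;
}
```

For example, with CPUs 0-3 in the parent's effective mask (0x0F) and CPUs 2-5 configured (0x3C), the child's effective mask is CPUs 2-3 (0x0C). Configuring only CPUs 4-7 (0xF0) gives an empty intersection, so the child inherits 0x0F.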
* [PATCH v2 -next 03/11] cgroup/cpuset: move memory_pressure to cpuset-v1.c
From: Chen Ridong @ 2024-08-26 13:26 UTC
To: tj, lizefan.x, hannes, longman, adityakali, sergeh, mkoutny
Cc: cgroups, linux-kernel, chenridong
Collection of memory_pressure can be enabled by writing 1 to the cpuset
file 'memory_pressure_enabled', which exists only in cpuset v1. Therefore,
move the corresponding code to cpuset-v1.c.

Since fmeter_init() and fmeter_getrate() are still called from cpuset.c,
declare them in cpuset-internal.h so that cpuset.c can use them.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
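The fmeter comment moved by this patch describes a single-pole IIR filter: for each elapsed second the value is multiplied by FM_COEF/FM_SCALE (933/1000), and then 67/1000 of the accumulated event count is added. The lock-free userspace copy below is a sketch for experimenting with that arithmetic. It takes the current time as a parameter instead of calling ktime_get_seconds() and drops the spinlock, but otherwise follows fmeter_update(), fmeter_markevent() and fmeter_getrate() as moved here; the sim_ names are hypothetical.

```c
#include <stdint.h>

#define FM_COEF		933		/* coefficient for half-life of 10 secs */
#define FM_MAXTICKS	99u		/* useless computing more ticks than this */
#define FM_MAXCNT	1000000		/* limit cnt to avoid overflow */
#define FM_SCALE	1000		/* faux fixed point scale */

/* Userspace copy of struct fmeter without the spinlock. */
struct sim_fmeter {
	int cnt;	/* unprocessed events count (scaled) */
	int val;	/* most recent output value */
	int64_t time;	/* seconds when val was computed */
};

/* Decay val once per elapsed second, then fold in the pending count. */
static void sim_fmeter_update(struct sim_fmeter *fmp, int64_t now)
{
	uint32_t ticks = (uint32_t)(now - fmp->time);

	if (ticks == 0)
		return;

	if (ticks > FM_MAXTICKS)
		ticks = FM_MAXTICKS;
	while (ticks-- > 0)
		fmp->val = (FM_COEF * fmp->val) / FM_SCALE;
	fmp->time = now;

	fmp->val += ((FM_SCALE - FM_COEF) * fmp->cnt) / FM_SCALE;
	fmp->cnt = 0;
}

/* Process previous ticks, then bump cnt by one event (times scale). */
static void sim_fmeter_markevent(struct sim_fmeter *fmp, int64_t now)
{
	sim_fmeter_update(fmp, now);
	fmp->cnt = fmp->cnt + FM_SCALE < FM_MAXCNT ?
		   fmp->cnt + FM_SCALE : FM_MAXCNT;
}

/* Process previous ticks, then return the current rate. */
static int sim_fmeter_getrate(struct sim_fmeter *fmp, int64_t now)
{
	sim_fmeter_update(fmp, now);
	return fmp->val;
}
```

Feeding it five events per second for a few minutes settles just under 5000 (N*1000 for N = 5, slightly lower because of integer truncation), and a value of 10000 left without new events for 10 seconds drops to roughly half, in line with the 10-second half-life described in the comment.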
kernel/cgroup/cpuset-internal.h | 7 ++
kernel/cgroup/cpuset-v1.c | 134 ++++++++++++++++++++++++++++++++
kernel/cgroup/cpuset.c | 134 --------------------------------
3 files changed, 141 insertions(+), 134 deletions(-)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index ffea3eefebdf..7911c86bf012 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -238,4 +238,11 @@ static inline int is_spread_slab(const struct cpuset *cs)
return test_bit(CS_SPREAD_SLAB, &cs->flags);
}
+/*
+ * cpuset-v1.c
+ */
+
+void fmeter_init(struct fmeter *fmp);
+int fmeter_getrate(struct fmeter *fmp);
+
#endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index ae166eb4f75d..f17ba44bc566 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -2,3 +2,137 @@
#include "cpuset-internal.h"
+/*
+ * Frequency meter - How fast is some event occurring?
+ *
+ * These routines manage a digitally filtered, constant time based,
+ * event frequency meter. There are four routines:
+ * fmeter_init() - initialize a frequency meter.
+ * fmeter_markevent() - called each time the event happens.
+ * fmeter_getrate() - returns the recent rate of such events.
+ * fmeter_update() - internal routine used to update fmeter.
+ *
+ * A common data structure is passed to each of these routines,
+ * which is used to keep track of the state required to manage the
+ * frequency meter and its digital filter.
+ *
+ * The filter works on the number of events marked per unit time.
+ * The filter is single-pole low-pass recursive (IIR). The time unit
+ * is 1 second. Arithmetic is done using 32-bit integers scaled to
+ * simulate 3 decimal digits of precision (multiplied by 1000).
+ *
+ * With an FM_COEF of 933, and a time base of 1 second, the filter
+ * has a half-life of 10 seconds, meaning that if the events quit
+ * happening, then the rate returned from the fmeter_getrate()
+ * will be cut in half each 10 seconds, until it converges to zero.
+ *
+ * It is not worth doing a real infinitely recursive filter. If more
+ * than FM_MAXTICKS ticks have elapsed since the last filter event,
+ * just compute FM_MAXTICKS ticks worth, by which point the level
+ * will be stable.
+ *
+ * Limit the count of unprocessed events to FM_MAXCNT, so as to avoid
+ * arithmetic overflow in the fmeter_update() routine.
+ *
+ * Given the simple 32 bit integer arithmetic used, this meter works
+ * best for reporting rates between one per millisecond (msec) and
+ * one per 32 (approx) seconds. At constant rates faster than one
+ * per msec it maxes out at values just under 1,000,000. At constant
+ * rates between one per msec, and one per second it will stabilize
+ * to a value N*1000, where N is the rate of events per second.
+ * At constant rates between one per second and one per 32 seconds,
+ * it will be choppy, moving up on the seconds that have an event,
+ * and then decaying until the next event. At rates slower than
+ * about one in 32 seconds, it decays all the way back to zero between
+ * each event.
+ */
+
+#define FM_COEF 933 /* coefficient for half-life of 10 secs */
+#define FM_MAXTICKS ((u32)99) /* useless computing more ticks than this */
+#define FM_MAXCNT 1000000 /* limit cnt to avoid overflow */
+#define FM_SCALE 1000 /* faux fixed point scale */
+
+/* Initialize a frequency meter */
+void fmeter_init(struct fmeter *fmp)
+{
+ fmp->cnt = 0;
+ fmp->val = 0;
+ fmp->time = 0;
+ spin_lock_init(&fmp->lock);
+}
+
+/* Internal meter update - process cnt events and update value */
+static void fmeter_update(struct fmeter *fmp)
+{
+ time64_t now;
+ u32 ticks;
+
+ now = ktime_get_seconds();
+ ticks = now - fmp->time;
+
+ if (ticks == 0)
+ return;
+
+ ticks = min(FM_MAXTICKS, ticks);
+ while (ticks-- > 0)
+ fmp->val = (FM_COEF * fmp->val) / FM_SCALE;
+ fmp->time = now;
+
+ fmp->val += ((FM_SCALE - FM_COEF) * fmp->cnt) / FM_SCALE;
+ fmp->cnt = 0;
+}
+
+/* Process any previous ticks, then bump cnt by one (times scale). */
+static void fmeter_markevent(struct fmeter *fmp)
+{
+ spin_lock(&fmp->lock);
+ fmeter_update(fmp);
+ fmp->cnt = min(FM_MAXCNT, fmp->cnt + FM_SCALE);
+ spin_unlock(&fmp->lock);
+}
+
+/* Process any previous ticks, then return current value. */
+int fmeter_getrate(struct fmeter *fmp)
+{
+ int val;
+
+ spin_lock(&fmp->lock);
+ fmeter_update(fmp);
+ val = fmp->val;
+ spin_unlock(&fmp->lock);
+ return val;
+}
+
+/*
+ * Collection of memory_pressure is suppressed unless
+ * this flag is enabled by writing "1" to the special
+ * cpuset file 'memory_pressure_enabled' in the root cpuset.
+ */
+
+int cpuset_memory_pressure_enabled __read_mostly;
+
+/*
+ * __cpuset_memory_pressure_bump - keep stats of per-cpuset reclaims.
+ *
+ * Keep a running average of the rate of synchronous (direct)
+ * page reclaim efforts initiated by tasks in each cpuset.
+ *
+ * This represents the rate at which some task in the cpuset
+ * ran low on memory on all nodes it was allowed to use, and
+ * had to enter the kernels page reclaim code in an effort to
+ * create more free memory by tossing clean pages or swapping
+ * or writing dirty pages.
+ *
+ * Display to user space in the per-cpuset read-only file
+ * "memory_pressure". Value displayed is an integer
+ * representing the recent rate of entry into the synchronous
+ * (direct) page reclaim by any task attached to the cpuset.
+ */
+
+void __cpuset_memory_pressure_bump(void)
+{
+ rcu_read_lock();
+ fmeter_markevent(&task_cs(current)->fmeter);
+ rcu_read_unlock();
+}
+
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 61763dd70de5..17f7984a41f5 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2990,107 +2990,6 @@ static int update_prstate(struct cpuset *cs, int new_prs)
return 0;
}
-/*
- * Frequency meter - How fast is some event occurring?
- *
- * These routines manage a digitally filtered, constant time based,
- * event frequency meter. There are four routines:
- * fmeter_init() - initialize a frequency meter.
- * fmeter_markevent() - called each time the event happens.
- * fmeter_getrate() - returns the recent rate of such events.
- * fmeter_update() - internal routine used to update fmeter.
- *
- * A common data structure is passed to each of these routines,
- * which is used to keep track of the state required to manage the
- * frequency meter and its digital filter.
- *
- * The filter works on the number of events marked per unit time.
- * The filter is single-pole low-pass recursive (IIR). The time unit
- * is 1 second. Arithmetic is done using 32-bit integers scaled to
- * simulate 3 decimal digits of precision (multiplied by 1000).
- *
- * With an FM_COEF of 933, and a time base of 1 second, the filter
- * has a half-life of 10 seconds, meaning that if the events quit
- * happening, then the rate returned from the fmeter_getrate()
- * will be cut in half each 10 seconds, until it converges to zero.
- *
- * It is not worth doing a real infinitely recursive filter. If more
- * than FM_MAXTICKS ticks have elapsed since the last filter event,
- * just compute FM_MAXTICKS ticks worth, by which point the level
- * will be stable.
- *
- * Limit the count of unprocessed events to FM_MAXCNT, so as to avoid
- * arithmetic overflow in the fmeter_update() routine.
- *
- * Given the simple 32 bit integer arithmetic used, this meter works
- * best for reporting rates between one per millisecond (msec) and
- * one per 32 (approx) seconds. At constant rates faster than one
- * per msec it maxes out at values just under 1,000,000. At constant
- * rates between one per msec, and one per second it will stabilize
- * to a value N*1000, where N is the rate of events per second.
- * At constant rates between one per second and one per 32 seconds,
- * it will be choppy, moving up on the seconds that have an event,
- * and then decaying until the next event. At rates slower than
- * about one in 32 seconds, it decays all the way back to zero between
- * each event.
- */
-
-#define FM_COEF 933 /* coefficient for half-life of 10 secs */
-#define FM_MAXTICKS ((u32)99) /* useless computing more ticks than this */
-#define FM_MAXCNT 1000000 /* limit cnt to avoid overflow */
-#define FM_SCALE 1000 /* faux fixed point scale */
-
-/* Initialize a frequency meter */
-static void fmeter_init(struct fmeter *fmp)
-{
- fmp->cnt = 0;
- fmp->val = 0;
- fmp->time = 0;
- spin_lock_init(&fmp->lock);
-}
-
-/* Internal meter update - process cnt events and update value */
-static void fmeter_update(struct fmeter *fmp)
-{
- time64_t now;
- u32 ticks;
-
- now = ktime_get_seconds();
- ticks = now - fmp->time;
-
- if (ticks == 0)
- return;
-
- ticks = min(FM_MAXTICKS, ticks);
- while (ticks-- > 0)
- fmp->val = (FM_COEF * fmp->val) / FM_SCALE;
- fmp->time = now;
-
- fmp->val += ((FM_SCALE - FM_COEF) * fmp->cnt) / FM_SCALE;
- fmp->cnt = 0;
-}
-
-/* Process any previous ticks, then bump cnt by one (times scale). */
-static void fmeter_markevent(struct fmeter *fmp)
-{
- spin_lock(&fmp->lock);
- fmeter_update(fmp);
- fmp->cnt = min(FM_MAXCNT, fmp->cnt + FM_SCALE);
- spin_unlock(&fmp->lock);
-}
-
-/* Process any previous ticks, then return current value. */
-static int fmeter_getrate(struct fmeter *fmp)
-{
- int val;
-
- spin_lock(&fmp->lock);
- fmeter_update(fmp);
- val = fmp->val;
- spin_unlock(&fmp->lock);
- return val;
-}
-
static struct cpuset *cpuset_attach_old_cs;
/*
@@ -4780,39 +4679,6 @@ void cpuset_print_current_mems_allowed(void)
rcu_read_unlock();
}
-/*
- * Collection of memory_pressure is suppressed unless
- * this flag is enabled by writing "1" to the special
- * cpuset file 'memory_pressure_enabled' in the root cpuset.
- */
-
-int cpuset_memory_pressure_enabled __read_mostly;
-
-/*
- * __cpuset_memory_pressure_bump - keep stats of per-cpuset reclaims.
- *
- * Keep a running average of the rate of synchronous (direct)
- * page reclaim efforts initiated by tasks in each cpuset.
- *
- * This represents the rate at which some task in the cpuset
- * ran low on memory on all nodes it was allowed to use, and
- * had to enter the kernels page reclaim code in an effort to
- * create more free memory by tossing clean pages or swapping
- * or writing dirty pages.
- *
- * Display to user space in the per-cpuset read-only file
- * "memory_pressure". Value displayed is an integer
- * representing the recent rate of entry into the synchronous
- * (direct) page reclaim by any task attached to the cpuset.
- */
-
-void __cpuset_memory_pressure_bump(void)
-{
- rcu_read_lock();
- fmeter_markevent(&task_cs(current)->fmeter);
- rcu_read_unlock();
-}
-
#ifdef CONFIG_PROC_PID_CPUSET
/*
* proc_cpuset_show()
--
2.34.1
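The frequency meter moved by this patch is easiest to study in isolation. Below is a hypothetical user-space model, not kernel code. The names fmeter_model, fm_update, fm_markevent and fm_getrate are illustrative. It copies the patch's constants and update arithmetic, takes the current second as a parameter instead of calling ktime_get_seconds(), and drops the spinlock because it is single-threaded. Treat it as a sketch for experimenting with the filter.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the patch. */
#define FM_COEF     933u     /* decay per second, scaled by FM_SCALE */
#define FM_MAXTICKS 99u      /* never apply more decay steps than this */
#define FM_MAXCNT   1000000u /* cap on pending scaled events */
#define FM_SCALE    1000u    /* faux fixed-point scale */

/* Hypothetical stand-in for struct fmeter (no spinlock: single-threaded). */
struct fmeter_model {
	uint32_t cnt;  /* pending events since last update, each worth FM_SCALE */
	uint32_t val;  /* filtered rate, scaled by FM_SCALE */
	int64_t time;  /* second of the last update */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Same arithmetic as fmeter_update(); "now" replaces ktime_get_seconds(). */
static void fm_update(struct fmeter_model *fm, int64_t now)
{
	uint32_t ticks;

	assert(now >= fm->time);
	ticks = (uint32_t)(now - fm->time);
	if (ticks == 0)
		return;

	/* Decay the old value once per elapsed second (capped). */
	ticks = min_u32(FM_MAXTICKS, ticks);
	while (ticks-- > 0)
		fm->val = (FM_COEF * fm->val) / FM_SCALE;
	fm->time = now;

	/* Fold in the events counted since the previous update. */
	fm->val += ((FM_SCALE - FM_COEF) * fm->cnt) / FM_SCALE;
	fm->cnt = 0;
}

/* Like fmeter_markevent(): settle old ticks, then count one event. */
static void fm_markevent(struct fmeter_model *fm, int64_t now)
{
	fm_update(fm, now);
	fm->cnt = min_u32(FM_MAXCNT, fm->cnt + FM_SCALE);
}

/* Like fmeter_getrate(): settle old ticks, then report the value. */
static uint32_t fm_getrate(struct fmeter_model *fm, int64_t now)
{
	fm_update(fm, now);
	return fm->val;
}
```

If the model is driven with one event per second, the value should settle just under 1000. This matches the N*1000 rule from the original comment with N = 1, and integer truncation keeps it slightly low. Ten idle seconds later, the value should be roughly half as large, consistent with the stated 10-second half-life.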
* [PATCH v2 -next 04/11] cgroup/cpuset: move relax_domain_level to cpuset-v1.c
From: Chen Ridong @ 2024-08-26 13:26 UTC
To: tj, lizefan.x, hannes, longman, adityakali, sergeh, mkoutny
Cc: cgroups, linux-kernel, chenridong
Setting relax_domain_level is not supported in cpuset v2, so move the
corresponding code into cpuset-v1.c.
'cpuset_write_s64' and 'cpuset_read_s64' are only used for setting the
domain level, so move them to cpuset-v1.c as well. For now they are
exposed to cpuset.c; once the legacy interface files are moved to
cpuset-v1.c, they can become static. 'rebuild_sched_domains_locked' is
exposed to cpuset-v1.c.
The only change from the original code is that the 'cpuset_lock' and
'cpuset_unlock' helpers are used to lock and unlock cpuset_mutex.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cpuset-internal.h | 6 +++-
kernel/cgroup/cpuset-v1.c | 59 +++++++++++++++++++++++++++++++
kernel/cgroup/cpuset.c | 62 ++-------------------------------
3 files changed, 66 insertions(+), 61 deletions(-)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 7911c86bf012..1058a45f05ec 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -238,11 +238,15 @@ static inline int is_spread_slab(const struct cpuset *cs)
return test_bit(CS_SPREAD_SLAB, &cs->flags);
}
+void rebuild_sched_domains_locked(void);
+
/*
* cpuset-v1.c
*/
-
void fmeter_init(struct fmeter *fmp);
int fmeter_getrate(struct fmeter *fmp);
+int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
+ s64 val);
+s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft);
#endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index f17ba44bc566..175638f2e7b7 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -136,3 +136,62 @@ void __cpuset_memory_pressure_bump(void)
rcu_read_unlock();
}
+static int update_relax_domain_level(struct cpuset *cs, s64 val)
+{
+#ifdef CONFIG_SMP
+ if (val < -1 || val > sched_domain_level_max + 1)
+ return -EINVAL;
+#endif
+
+ if (val != cs->relax_domain_level) {
+ cs->relax_domain_level = val;
+ if (!cpumask_empty(cs->cpus_allowed) &&
+ is_sched_load_balance(cs))
+ rebuild_sched_domains_locked();
+ }
+
+ return 0;
+}
+
+int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
+ s64 val)
+{
+ struct cpuset *cs = css_cs(css);
+ cpuset_filetype_t type = cft->private;
+ int retval = -ENODEV;
+
+ cpus_read_lock();
+ cpuset_lock();
+ if (!is_cpuset_online(cs))
+ goto out_unlock;
+
+ switch (type) {
+ case FILE_SCHED_RELAX_DOMAIN_LEVEL:
+ retval = update_relax_domain_level(cs, val);
+ break;
+ default:
+ retval = -EINVAL;
+ break;
+ }
+out_unlock:
+ cpuset_unlock();
+ cpus_read_unlock();
+ return retval;
+}
+
+s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
+{
+ struct cpuset *cs = css_cs(css);
+ cpuset_filetype_t type = cft->private;
+
+ switch (type) {
+ case FILE_SCHED_RELAX_DOMAIN_LEVEL:
+ return cs->relax_domain_level;
+ default:
+ BUG();
+ }
+
+ /* Unreachable but makes gcc happy */
+ return 0;
+}
+
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 17f7984a41f5..45031a17e068 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1069,7 +1069,7 @@ partition_and_rebuild_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
*
* Call with cpuset_mutex held. Takes cpus_read_lock().
*/
-static void rebuild_sched_domains_locked(void)
+void rebuild_sched_domains_locked(void)
{
struct cgroup_subsys_state *pos_css;
struct sched_domain_attr *attr;
@@ -1121,7 +1121,7 @@ static void rebuild_sched_domains_locked(void)
partition_and_rebuild_sched_domains(ndoms, doms, attr);
}
#else /* !CONFIG_SMP */
-static void rebuild_sched_domains_locked(void)
+void rebuild_sched_domains_locked(void)
{
}
#endif /* CONFIG_SMP */
@@ -2788,23 +2788,6 @@ bool current_cpuset_is_being_rebound(void)
return ret;
}
-static int update_relax_domain_level(struct cpuset *cs, s64 val)
-{
-#ifdef CONFIG_SMP
- if (val < -1 || val > sched_domain_level_max + 1)
- return -EINVAL;
-#endif
-
- if (val != cs->relax_domain_level) {
- cs->relax_domain_level = val;
- if (!cpumask_empty(cs->cpus_allowed) &&
- is_sched_load_balance(cs))
- rebuild_sched_domains_locked();
- }
-
- return 0;
-}
-
/**
* update_tasks_flags - update the spread flags of tasks in the cpuset.
* @cs: the cpuset in which each task's spread flags needs to be changed
@@ -3267,32 +3250,6 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
return retval;
}
-static int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
- s64 val)
-{
- struct cpuset *cs = css_cs(css);
- cpuset_filetype_t type = cft->private;
- int retval = -ENODEV;
-
- cpus_read_lock();
- mutex_lock(&cpuset_mutex);
- if (!is_cpuset_online(cs))
- goto out_unlock;
-
- switch (type) {
- case FILE_SCHED_RELAX_DOMAIN_LEVEL:
- retval = update_relax_domain_level(cs, val);
- break;
- default:
- retval = -EINVAL;
- break;
- }
-out_unlock:
- mutex_unlock(&cpuset_mutex);
- cpus_read_unlock();
- return retval;
-}
-
/*
* Common handling for a write to a "cpus" or "mems" file.
*/
@@ -3443,21 +3400,6 @@ static u64 cpuset_read_u64(struct cgroup_subsys_state *css, struct cftype *cft)
return 0;
}
-static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
-{
- struct cpuset *cs = css_cs(css);
- cpuset_filetype_t type = cft->private;
- switch (type) {
- case FILE_SCHED_RELAX_DOMAIN_LEVEL:
- return cs->relax_domain_level;
- default:
- BUG();
- }
-
- /* Unreachable but makes gcc happy */
- return 0;
-}
-
static int sched_partition_show(struct seq_file *seq, void *v)
{
struct cpuset *cs = css_cs(seq_css(seq));
--
2.34.1
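The validation in update_relax_domain_level() is easier to see as a standalone function. Below is a hypothetical user-space model. struct cpuset_model and model_update_relax_domain_level() are illustrative names. The maximum level is passed in instead of reading sched_domain_level_max, and a counter replaces the call to rebuild_sched_domains_locked(). It covers only the CONFIG_SMP branch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the cpuset fields the function touches. */
struct cpuset_model {
	int relax_domain_level;
	int nr_cpus_allowed;    /* > 0 stands in for !cpumask_empty(cs->cpus_allowed) */
	int sched_load_balance; /* stands in for is_sched_load_balance(cs) */
	int rebuilds;           /* counts simulated rebuild_sched_domains_locked() calls */
};

/* Accepts values in [-1, level_max + 1] and rebuilds only on a real change. */
static int model_update_relax_domain_level(struct cpuset_model *cs,
					   int64_t val, int level_max)
{
	if (val < -1 || val > level_max + 1)
		return -EINVAL;

	if (val != cs->relax_domain_level) {
		cs->relax_domain_level = (int)val;
		if (cs->nr_cpus_allowed > 0 && cs->sched_load_balance)
			cs->rebuilds++;
	}
	return 0;
}
```

Writing the same value twice triggers only one simulated rebuild. Writing a value outside the range leaves the cpuset untouched.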
* [PATCH v2 -next 05/11] cgroup/cpuset: move memory_spread to cpuset-v1.c
From: Chen Ridong @ 2024-08-26 13:26 UTC
To: tj, lizefan.x, hannes, longman, adityakali, sergeh, mkoutny
Cc: cgroups, linux-kernel, chenridong
'memory_spread' is only set in cpuset v1, so move the corresponding code
into cpuset-v1.c.
For now, 'cpuset_update_task_spread_flags' and 'update_tasks_flags' are
exposed to cpuset.c.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cpuset-internal.h | 3 +++
kernel/cgroup/cpuset-v1.c | 42 +++++++++++++++++++++++++++++++++
kernel/cgroup/cpuset.c | 42 ---------------------------------
3 files changed, 45 insertions(+), 42 deletions(-)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 1058a45f05ec..02c4b0c74fa9 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -248,5 +248,8 @@ int fmeter_getrate(struct fmeter *fmp);
int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
s64 val);
s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft);
+void cpuset_update_task_spread_flags(struct cpuset *cs,
+ struct task_struct *tsk);
+void update_tasks_flags(struct cpuset *cs);
#endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 175638f2e7b7..320abd4bf2c3 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -195,3 +195,45 @@ s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
return 0;
}
+/*
+ * update task's spread flag if cpuset's page/slab spread flag is set
+ *
+ * Call with callback_lock or cpuset_mutex held. The check can be skipped
+ * if on default hierarchy.
+ */
+void cpuset_update_task_spread_flags(struct cpuset *cs,
+ struct task_struct *tsk)
+{
+ if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys))
+ return;
+
+ if (is_spread_page(cs))
+ task_set_spread_page(tsk);
+ else
+ task_clear_spread_page(tsk);
+
+ if (is_spread_slab(cs))
+ task_set_spread_slab(tsk);
+ else
+ task_clear_spread_slab(tsk);
+}
+
+/**
+ * update_tasks_flags - update the spread flags of tasks in the cpuset.
+ * @cs: the cpuset in which each task's spread flags needs to be changed
+ *
+ * Iterate through each task of @cs updating its spread flags. As this
+ * function is called with cpuset_mutex held, cpuset membership stays
+ * stable.
+ */
+void update_tasks_flags(struct cpuset *cs)
+{
+ struct css_task_iter it;
+ struct task_struct *task;
+
+ css_task_iter_start(&cs->css, 0, &it);
+ while ((task = css_task_iter_next(&it)))
+ cpuset_update_task_spread_flags(cs, task);
+ css_task_iter_end(&it);
+}
+
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 45031a17e068..0a3347e4dddc 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -401,29 +401,6 @@ static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
nodes_and(*pmask, cs->effective_mems, node_states[N_MEMORY]);
}
-/*
- * update task's spread flag if cpuset's page/slab spread flag is set
- *
- * Call with callback_lock or cpuset_mutex held. The check can be skipped
- * if on default hierarchy.
- */
-static void cpuset_update_task_spread_flags(struct cpuset *cs,
- struct task_struct *tsk)
-{
- if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys))
- return;
-
- if (is_spread_page(cs))
- task_set_spread_page(tsk);
- else
- task_clear_spread_page(tsk);
-
- if (is_spread_slab(cs))
- task_set_spread_slab(tsk);
- else
- task_clear_spread_slab(tsk);
-}
-
/*
* is_cpuset_subset(p, q) - Is cpuset p a subset of cpuset q?
*
@@ -2788,25 +2765,6 @@ bool current_cpuset_is_being_rebound(void)
return ret;
}
-/**
- * update_tasks_flags - update the spread flags of tasks in the cpuset.
- * @cs: the cpuset in which each task's spread flags needs to be changed
- *
- * Iterate through each task of @cs updating its spread flags. As this
- * function is called with cpuset_mutex held, cpuset membership stays
- * stable.
- */
-static void update_tasks_flags(struct cpuset *cs)
-{
- struct css_task_iter it;
- struct task_struct *task;
-
- css_task_iter_start(&cs->css, 0, &it);
- while ((task = css_task_iter_next(&it)))
- cpuset_update_task_spread_flags(cs, task);
- css_task_iter_end(&it);
-}
-
/*
* update_flag - read a 0 or a 1 in a file and update associated flag
* bit: the bit to update (see cpuset_flagbits_t)
--
2.34.1
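The spread-flag logic in this patch copies each cpuset bit to every task, and clears the task bit when the cpuset bit is unset. Below is a hypothetical user-space model of that logic. The MODEL_SPREAD_* bits, struct task_model and struct cpuset_spread_model are illustrative; the kernel uses separate per-cpuset and per-task bits with different names. A plain array replaces the css_task_iter walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the page/slab spread bits. */
#define MODEL_SPREAD_PAGE 0x1u
#define MODEL_SPREAD_SLAB 0x2u

struct task_model {
	unsigned int spread;  /* stands in for the task's spread bits */
};

struct cpuset_spread_model {
	unsigned int flags;   /* MODEL_SPREAD_* requested on the cpuset */
	bool on_dfl;          /* true when on the default (v2) hierarchy */
};

/* Mirrors cpuset_update_task_spread_flags(): copy each cpuset bit to
 * the task, clearing it when unset; do nothing on the v2 hierarchy. */
static void model_update_task_spread_flags(const struct cpuset_spread_model *cs,
					   struct task_model *tsk)
{
	if (cs->on_dfl)
		return;

	if (cs->flags & MODEL_SPREAD_PAGE)
		tsk->spread |= MODEL_SPREAD_PAGE;
	else
		tsk->spread &= ~MODEL_SPREAD_PAGE;

	if (cs->flags & MODEL_SPREAD_SLAB)
		tsk->spread |= MODEL_SPREAD_SLAB;
	else
		tsk->spread &= ~MODEL_SPREAD_SLAB;
}

/* Mirrors update_tasks_flags(); an array replaces css_task_iter. */
static void model_update_tasks_flags(const struct cpuset_spread_model *cs,
				     struct task_model *tasks, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		model_update_task_spread_flags(cs, &tasks[i]);
}
```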
* [PATCH v2 -next 06/11] cgroup/cpuset: add callback_lock helper
From: Chen Ridong @ 2024-08-26 13:26 UTC
To: tj, lizefan.x, hannes, longman, adityakali, sergeh, mkoutny
Cc: cgroups, linux-kernel, chenridong
Modifying a cpuset requires holding both cpuset_mutex and callback_lock.
Add helpers so that cpuset-v1.c can take and release callback_lock.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cpuset-internal.h | 2 ++
kernel/cgroup/cpuset.c | 10 ++++++++++
2 files changed, 12 insertions(+)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 02c4b0c74fa9..9a60dd6681e4 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -239,6 +239,8 @@ static inline int is_spread_slab(const struct cpuset *cs)
}
void rebuild_sched_domains_locked(void);
+void callback_lock_irq(void);
+void callback_unlock_irq(void);
/*
* cpuset-v1.c
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 0a3347e4dddc..2b2dc963299b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -269,6 +269,16 @@ void cpuset_unlock(void)
static DEFINE_SPINLOCK(callback_lock);
+void callback_lock_irq(void)
+{
+ spin_lock_irq(&callback_lock);
+}
+
+void callback_unlock_irq(void)
+{
+ spin_unlock_irq(&callback_lock);
+}
+
static struct workqueue_struct *cpuset_migrate_mm_wq;
static DECLARE_WAIT_QUEUE_HEAD(cpuset_attach_wq);
--
2.34.1
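The helper pattern in this patch keeps the lock object file-local and exports only lock/unlock functions. Below is a hypothetical user-space sketch of that pattern, with three commented sections standing in for the three files. All model_* names are illustrative. A pthread mutex stands in for the spinlock because spin_lock_irq() also disables local interrupts, which has no user-space equivalent.

```c
#include <assert.h>
#include <pthread.h>

/* --- "cpuset-internal.h": only the helper prototypes are shared --- */
void model_callback_lock_irq(void);
void model_callback_unlock_irq(void);

/* --- "cpuset.c": the lock object itself stays file-local --- */
static pthread_mutex_t model_callback_lock = PTHREAD_MUTEX_INITIALIZER;

void model_callback_lock_irq(void)
{
	int err = pthread_mutex_lock(&model_callback_lock);

	assert(err == 0);
	(void)err;
}

void model_callback_unlock_irq(void)
{
	int err = pthread_mutex_unlock(&model_callback_lock);

	assert(err == 0);
	(void)err;
}

/* --- "cpuset-v1.c": updates shared state through the helpers only --- */
struct model_masks {
	unsigned long cpus_allowed;
	unsigned long effective_cpus;
};

static void model_v1_set_cpus(struct model_masks *cs, unsigned long cpus)
{
	model_callback_lock_irq();
	cs->cpus_allowed = cpus;
	cs->effective_cpus = cpus;
	model_callback_unlock_irq();
}
```

This mirrors how the next patch uses callback_lock_irq() and callback_unlock_irq() around the mask copies in hotplug_update_tasks_legacy().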
* [PATCH v2 -next 07/11] cgroup/cpuset: move legacy hotplug update to cpuset-v1.c
From: Chen Ridong @ 2024-08-26 13:26 UTC
To: tj, lizefan.x, hannes, longman, adityakali, sergeh, mkoutny
Cc: cgroups, linux-kernel, chenridong
Hotplug updates are handled differently in cpuset v1 and cpuset v2. Move
the legacy code to cpuset-v1.c.
'update_tasks_cpumask' and 'update_tasks_nodemask' are used by both cpuset
v1 and cpuset v2, so declare them in cpuset-internal.h.
The only change from the original code is that the callback_lock helpers
are used to lock and unlock callback_lock.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cpuset-internal.h | 5 ++
kernel/cgroup/cpuset-v1.c | 91 +++++++++++++++++++++++++++++++
kernel/cgroup/cpuset.c | 96 +--------------------------------
3 files changed, 98 insertions(+), 94 deletions(-)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 9a60dd6681e4..7cd30ad809d5 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -241,6 +241,8 @@ static inline int is_spread_slab(const struct cpuset *cs)
void rebuild_sched_domains_locked(void);
void callback_lock_irq(void);
void callback_unlock_irq(void);
+void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus);
+void update_tasks_nodemask(struct cpuset *cs);
/*
* cpuset-v1.c
@@ -253,5 +255,8 @@ s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft);
void cpuset_update_task_spread_flags(struct cpuset *cs,
struct task_struct *tsk);
void update_tasks_flags(struct cpuset *cs);
+void hotplug_update_tasks_legacy(struct cpuset *cs,
+ struct cpumask *new_cpus, nodemask_t *new_mems,
+ bool cpus_updated, bool mems_updated);
#endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 320abd4bf2c3..ce1d00746e92 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -2,6 +2,14 @@
#include "cpuset-internal.h"
+/*
+ * Legacy hierarchy call to cgroup_transfer_tasks() is handled asynchrously
+ */
+struct cpuset_remove_tasks_struct {
+ struct work_struct work;
+ struct cpuset *cs;
+};
+
/*
* Frequency meter - How fast is some event occurring?
*
@@ -237,3 +245,86 @@ void update_tasks_flags(struct cpuset *cs)
css_task_iter_end(&it);
}
+/*
+ * If CPU and/or memory hotplug handlers, below, unplug any CPUs
+ * or memory nodes, we need to walk over the cpuset hierarchy,
+ * removing that CPU or node from all cpusets. If this removes the
+ * last CPU or node from a cpuset, then move the tasks in the empty
+ * cpuset to its next-highest non-empty parent.
+ */
+static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
+{
+ struct cpuset *parent;
+
+ /*
+ * Find its next-highest non-empty parent, (top cpuset
+ * has online cpus, so can't be empty).
+ */
+ parent = parent_cs(cs);
+ while (cpumask_empty(parent->cpus_allowed) ||
+ nodes_empty(parent->mems_allowed))
+ parent = parent_cs(parent);
+
+ if (cgroup_transfer_tasks(parent->css.cgroup, cs->css.cgroup)) {
+ pr_err("cpuset: failed to transfer tasks out of empty cpuset ");
+ pr_cont_cgroup_name(cs->css.cgroup);
+ pr_cont("\n");
+ }
+}
+
+static void cpuset_migrate_tasks_workfn(struct work_struct *work)
+{
+ struct cpuset_remove_tasks_struct *s;
+
+ s = container_of(work, struct cpuset_remove_tasks_struct, work);
+ remove_tasks_in_empty_cpuset(s->cs);
+ css_put(&s->cs->css);
+ kfree(s);
+}
+
+void hotplug_update_tasks_legacy(struct cpuset *cs,
+ struct cpumask *new_cpus, nodemask_t *new_mems,
+ bool cpus_updated, bool mems_updated)
+{
+ bool is_empty;
+
+ callback_lock_irq();
+ cpumask_copy(cs->cpus_allowed, new_cpus);
+ cpumask_copy(cs->effective_cpus, new_cpus);
+ cs->mems_allowed = *new_mems;
+ cs->effective_mems = *new_mems;
+ callback_unlock_irq();
+
+ /*
+ * Don't call update_tasks_cpumask() if the cpuset becomes empty,
+ * as the tasks will be migrated to an ancestor.
+ */
+ if (cpus_updated && !cpumask_empty(cs->cpus_allowed))
+ update_tasks_cpumask(cs, new_cpus);
+ if (mems_updated && !nodes_empty(cs->mems_allowed))
+ update_tasks_nodemask(cs);
+
+ is_empty = cpumask_empty(cs->cpus_allowed) ||
+ nodes_empty(cs->mems_allowed);
+
+ /*
+ * Move tasks to the nearest ancestor with execution resources,
+ * This is full cgroup operation which will also call back into
+ * cpuset. Execute it asynchronously using workqueue.
+ */
+ if (is_empty && cs->css.cgroup->nr_populated_csets &&
+ css_tryget_online(&cs->css)) {
+ struct cpuset_remove_tasks_struct *s;
+
+ s = kzalloc(sizeof(*s), GFP_KERNEL);
+ if (WARN_ON_ONCE(!s)) {
+ css_put(&cs->css);
+ return;
+ }
+
+ s->cs = cs;
+ INIT_WORK(&s->work, cpuset_migrate_tasks_workfn);
+ schedule_work(&s->work);
+ }
+}
+
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 2b2dc963299b..b93ef0b48eae 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -65,14 +65,6 @@ static const char * const perr_strings[] = {
[PERR_ACCESS] = "Enable partition not permitted",
};
-/*
- * Legacy hierarchy call to cgroup_transfer_tasks() is handled asynchrously
- */
-struct cpuset_remove_tasks_struct {
- struct work_struct work;
- struct cpuset *cs;
-};
-
/*
* Exclusive CPUs distributed out to sub-partitions of top_cpuset
*/
@@ -1138,7 +1130,7 @@ void rebuild_sched_domains(void)
* is used instead of effective_cpus to make sure all offline CPUs are also
* included as hotplug code won't update cpumasks for tasks in top_cpuset.
*/
-static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
+void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
{
struct css_task_iter it;
struct task_struct *task;
@@ -2591,7 +2583,7 @@ static void *cpuset_being_rebound;
* effective cpuset's. As this function is called with cpuset_mutex held,
* cpuset membership stays stable.
*/
-static void update_tasks_nodemask(struct cpuset *cs)
+void update_tasks_nodemask(struct cpuset *cs)
{
static nodemask_t newmems; /* protected by cpuset_mutex */
struct css_task_iter it;
@@ -3923,90 +3915,6 @@ int __init cpuset_init(void)
return 0;
}
-/*
- * If CPU and/or memory hotplug handlers, below, unplug any CPUs
- * or memory nodes, we need to walk over the cpuset hierarchy,
- * removing that CPU or node from all cpusets. If this removes the
- * last CPU or node from a cpuset, then move the tasks in the empty
- * cpuset to its next-highest non-empty parent.
- */
-static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
-{
- struct cpuset *parent;
-
- /*
- * Find its next-highest non-empty parent, (top cpuset
- * has online cpus, so can't be empty).
- */
- parent = parent_cs(cs);
- while (cpumask_empty(parent->cpus_allowed) ||
- nodes_empty(parent->mems_allowed))
- parent = parent_cs(parent);
-
- if (cgroup_transfer_tasks(parent->css.cgroup, cs->css.cgroup)) {
- pr_err("cpuset: failed to transfer tasks out of empty cpuset ");
- pr_cont_cgroup_name(cs->css.cgroup);
- pr_cont("\n");
- }
-}
-
-static void cpuset_migrate_tasks_workfn(struct work_struct *work)
-{
- struct cpuset_remove_tasks_struct *s;
-
- s = container_of(work, struct cpuset_remove_tasks_struct, work);
- remove_tasks_in_empty_cpuset(s->cs);
- css_put(&s->cs->css);
- kfree(s);
-}
-
-static void
-hotplug_update_tasks_legacy(struct cpuset *cs,
- struct cpumask *new_cpus, nodemask_t *new_mems,
- bool cpus_updated, bool mems_updated)
-{
- bool is_empty;
-
- spin_lock_irq(&callback_lock);
- cpumask_copy(cs->cpus_allowed, new_cpus);
- cpumask_copy(cs->effective_cpus, new_cpus);
- cs->mems_allowed = *new_mems;
- cs->effective_mems = *new_mems;
- spin_unlock_irq(&callback_lock);
-
- /*
- * Don't call update_tasks_cpumask() if the cpuset becomes empty,
- * as the tasks will be migrated to an ancestor.
- */
- if (cpus_updated && !cpumask_empty(cs->cpus_allowed))
- update_tasks_cpumask(cs, new_cpus);
- if (mems_updated && !nodes_empty(cs->mems_allowed))
- update_tasks_nodemask(cs);
-
- is_empty = cpumask_empty(cs->cpus_allowed) ||
- nodes_empty(cs->mems_allowed);
-
- /*
- * Move tasks to the nearest ancestor with execution resources,
- * This is full cgroup operation which will also call back into
- * cpuset. Execute it asynchronously using workqueue.
- */
- if (is_empty && cs->css.cgroup->nr_populated_csets &&
- css_tryget_online(&cs->css)) {
- struct cpuset_remove_tasks_struct *s;
-
- s = kzalloc(sizeof(*s), GFP_KERNEL);
- if (WARN_ON_ONCE(!s)) {
- css_put(&cs->css);
- return;
- }
-
- s->cs = cs;
- INIT_WORK(&s->work, cpuset_migrate_tasks_workfn);
- schedule_work(&s->work);
- }
-}
-
static void
hotplug_update_tasks(struct cpuset *cs,
struct cpumask *new_cpus, nodemask_t *new_mems,
--
2.34.1
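The legacy hotplug path treats a cpuset as empty when it has no CPUs or no memory nodes. In that case, its tasks are handed to the nearest ancestor that has both. Below is a hypothetical user-space model of those two checks. struct cs_node, cs_is_empty() and nearest_nonempty_ancestor() are illustrative names. Plain bitmasks replace cpumask/nodemask, and the workqueue and cgroup_transfer_tasks() steps are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cpuset node: plain bitmasks replace cpumask/nodemask. */
struct cs_node {
	struct cs_node *parent;      /* NULL only for the top cpuset */
	unsigned long cpus_allowed;
	unsigned long mems_allowed;
};

/* The is_empty test from hotplug_update_tasks_legacy(). */
static bool cs_is_empty(const struct cs_node *cs)
{
	return cs->cpus_allowed == 0 || cs->mems_allowed == 0;
}

/* The parent walk from remove_tasks_in_empty_cpuset(): climb until an
 * ancestor has both CPUs and memory nodes. The kernel relies on the top
 * cpuset never being empty; the model also stops at the root. */
static struct cs_node *nearest_nonempty_ancestor(struct cs_node *cs)
{
	struct cs_node *parent;

	assert(cs->parent != NULL); /* never called for the top cpuset */
	parent = cs->parent;
	while (parent->parent && cs_is_empty(parent))
		parent = parent->parent;
	return parent;
}
```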
* [PATCH v2 -next 08/11] cgroup/cpuset: move validate_change_legacy to cpuset-v1.c
From: Chen Ridong @ 2024-08-26 13:27 UTC
To: tj, lizefan.x, hannes, longman, adityakali, sergeh, mkoutny
Cc: cgroups, linux-kernel, chenridong
validate_change_legacy() is only used for v1, so move it to cpuset-v1.c.
The two macros 'cpuset_for_each_child' and 'cpuset_for_each_descendant_pre'
are shared by v1 and v2, so move them to cpuset-internal.h.
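The legacy rules enforced by validate_change_legacy() can be checked with a small model. Every child must stay a subset of the proposed cpuset (-EBUSY otherwise), and the proposed cpuset must stay a subset of its parent (-EACCES otherwise). Below is a hypothetical user-space sketch. struct vcs, is_subset() and validate_legacy() are illustrative names. Bitmasks replace cpumask/nodemask, and an explicit child array replaces cpuset_for_each_child and RCU.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cpuset: bitmasks plus the two exclusive flags. */
struct vcs {
	const struct vcs *parent;   /* NULL for the top cpuset */
	unsigned long cpus, mems;
	bool cpu_excl, mem_excl;
};

static bool mask_subset(unsigned long a, unsigned long b)
{
	return (a & ~b) == 0;
}

/* Mirrors is_cpuset_subset(): masks nested, exclusivity not widened. */
static bool is_subset(const struct vcs *p, const struct vcs *q)
{
	return mask_subset(p->cpus, q->cpus) &&
	       mask_subset(p->mems, q->mems) &&
	       p->cpu_excl <= q->cpu_excl &&
	       p->mem_excl <= q->mem_excl;
}

/* Mirrors validate_change_legacy() for a proposed update @trial of @cur. */
static int validate_legacy(const struct vcs *cur, const struct vcs *trial,
			   const struct vcs *const *children, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (!is_subset(children[i], trial))
			return -EBUSY;
	if (cur->parent && !is_subset(trial, cur->parent))
		return -EACCES;
	return 0;
}
```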
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cpuset-internal.h | 29 +++++++++++++
kernel/cgroup/cpuset-v1.c | 45 ++++++++++++++++++++
kernel/cgroup/cpuset.c | 73 ---------------------------------
3 files changed, 74 insertions(+), 73 deletions(-)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 7cd30ad809d5..07551ff0812e 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -238,6 +238,34 @@ static inline int is_spread_slab(const struct cpuset *cs)
return test_bit(CS_SPREAD_SLAB, &cs->flags);
}
+/**
+ * cpuset_for_each_child - traverse online children of a cpuset
+ * @child_cs: loop cursor pointing to the current child
+ * @pos_css: used for iteration
+ * @parent_cs: target cpuset to walk children of
+ *
+ * Walk @child_cs through the online children of @parent_cs. Must be used
+ * with RCU read locked.
+ */
+#define cpuset_for_each_child(child_cs, pos_css, parent_cs) \
+ css_for_each_child((pos_css), &(parent_cs)->css) \
+ if (is_cpuset_online(((child_cs) = css_cs((pos_css)))))
+
+/**
+ * cpuset_for_each_descendant_pre - pre-order walk of a cpuset's descendants
+ * @des_cs: loop cursor pointing to the current descendant
+ * @pos_css: used for iteration
+ * @root_cs: target cpuset to walk ancestor of
+ *
+ * Walk @des_cs through the online descendants of @root_cs. Must be used
+ * with RCU read locked. The caller may modify @pos_css by calling
+ * css_rightmost_descendant() to skip subtree. @root_cs is included in the
+ * iteration and the first node to be visited.
+ */
+#define cpuset_for_each_descendant_pre(des_cs, pos_css, root_cs) \
+ css_for_each_descendant_pre((pos_css), &(root_cs)->css) \
+ if (is_cpuset_online(((des_cs) = css_cs((pos_css)))))
+
void rebuild_sched_domains_locked(void);
void callback_lock_irq(void);
void callback_unlock_irq(void);
@@ -258,5 +286,6 @@ void update_tasks_flags(struct cpuset *cs);
void hotplug_update_tasks_legacy(struct cpuset *cs,
struct cpumask *new_cpus, nodemask_t *new_mems,
bool cpus_updated, bool mems_updated);
+int validate_change_legacy(struct cpuset *cur, struct cpuset *trial);
#endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index ce1d00746e92..246fc962f549 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -328,3 +328,48 @@ void hotplug_update_tasks_legacy(struct cpuset *cs,
}
}
+/*
+ * is_cpuset_subset(p, q) - Is cpuset p a subset of cpuset q?
+ *
+ * One cpuset is a subset of another if all its allowed CPUs and
+ * Memory Nodes are a subset of the other, and its exclusive flags
+ * are only set if the other's are set. Call holding cpuset_mutex.
+ */
+
+static int is_cpuset_subset(const struct cpuset *p, const struct cpuset *q)
+{
+ return cpumask_subset(p->cpus_allowed, q->cpus_allowed) &&
+ nodes_subset(p->mems_allowed, q->mems_allowed) &&
+ is_cpu_exclusive(p) <= is_cpu_exclusive(q) &&
+ is_mem_exclusive(p) <= is_mem_exclusive(q);
+}
+
+/*
+ * validate_change_legacy() - Validate conditions specific to legacy (v1)
+ * behavior.
+ */
+int validate_change_legacy(struct cpuset *cur, struct cpuset *trial)
+{
+ struct cgroup_subsys_state *css;
+ struct cpuset *c, *par;
+ int ret;
+
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
+ /* Each of our child cpusets must be a subset of us */
+ ret = -EBUSY;
+ cpuset_for_each_child(c, css, cur)
+ if (!is_cpuset_subset(c, trial))
+ goto out;
+
+ /* On legacy hierarchy, we must be a subset of our parent cpuset. */
+ ret = -EACCES;
+ par = parent_cs(cur);
+ if (par && !is_cpuset_subset(trial, par))
+ goto out;
+
+ ret = 0;
+out:
+ return ret;
+}
+
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b93ef0b48eae..4412a4168902 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -180,34 +180,6 @@ static struct cpuset top_cpuset = {
.remote_sibling = LIST_HEAD_INIT(top_cpuset.remote_sibling),
};
-/**
- * cpuset_for_each_child - traverse online children of a cpuset
- * @child_cs: loop cursor pointing to the current child
- * @pos_css: used for iteration
- * @parent_cs: target cpuset to walk children of
- *
- * Walk @child_cs through the online children of @parent_cs. Must be used
- * with RCU read locked.
- */
-#define cpuset_for_each_child(child_cs, pos_css, parent_cs) \
- css_for_each_child((pos_css), &(parent_cs)->css) \
- if (is_cpuset_online(((child_cs) = css_cs((pos_css)))))
-
-/**
- * cpuset_for_each_descendant_pre - pre-order walk of a cpuset's descendants
- * @des_cs: loop cursor pointing to the current descendant
- * @pos_css: used for iteration
- * @root_cs: target cpuset to walk ancestor of
- *
- * Walk @des_cs through the online descendants of @root_cs. Must be used
- * with RCU read locked. The caller may modify @pos_css by calling
- * css_rightmost_descendant() to skip subtree. @root_cs is included in the
- * iteration and the first node to be visited.
- */
-#define cpuset_for_each_descendant_pre(des_cs, pos_css, root_cs) \
- css_for_each_descendant_pre((pos_css), &(root_cs)->css) \
- if (is_cpuset_online(((des_cs) = css_cs((pos_css)))))
-
/*
* There are two global locks guarding cpuset structures - cpuset_mutex and
* callback_lock. We also require taking task_lock() when dereferencing a
@@ -403,22 +375,6 @@ static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
nodes_and(*pmask, cs->effective_mems, node_states[N_MEMORY]);
}
-/*
- * is_cpuset_subset(p, q) - Is cpuset p a subset of cpuset q?
- *
- * One cpuset is a subset of another if all its allowed CPUs and
- * Memory Nodes are a subset of the other, and its exclusive flags
- * are only set if the other's are set. Call holding cpuset_mutex.
- */
-
-static int is_cpuset_subset(const struct cpuset *p, const struct cpuset *q)
-{
- return cpumask_subset(p->cpus_allowed, q->cpus_allowed) &&
- nodes_subset(p->mems_allowed, q->mems_allowed) &&
- is_cpu_exclusive(p) <= is_cpu_exclusive(q) &&
- is_mem_exclusive(p) <= is_mem_exclusive(q);
-}
-
/**
* alloc_cpumasks - allocate three cpumasks for cpuset
* @cs: the cpuset that have cpumasks to be allocated.
@@ -549,35 +505,6 @@ static inline bool cpusets_are_exclusive(struct cpuset *cs1, struct cpuset *cs2)
return true;
}
-/*
- * validate_change_legacy() - Validate conditions specific to legacy (v1)
- * behavior.
- */
-static int validate_change_legacy(struct cpuset *cur, struct cpuset *trial)
-{
- struct cgroup_subsys_state *css;
- struct cpuset *c, *par;
- int ret;
-
- WARN_ON_ONCE(!rcu_read_lock_held());
-
- /* Each of our child cpusets must be a subset of us */
- ret = -EBUSY;
- cpuset_for_each_child(c, css, cur)
- if (!is_cpuset_subset(c, trial))
- goto out;
-
- /* On legacy hierarchy, we must be a subset of our parent cpuset. */
- ret = -EACCES;
- par = parent_cs(cur);
- if (par && !is_cpuset_subset(trial, par))
- goto out;
-
- ret = 0;
-out:
- return ret;
-}
-
/*
* validate_change() - Used to validate that any proposed cpuset change
* follows the structural rules for cpusets.
--
2.34.1
* [PATCH v2 -next 09/11] cgroup/cpuset: move v1 interfaces to cpuset-v1.c
2024-08-26 13:26 [PATCH v2 -next 00/11] cgroup:cpuset:separate legacy cgroup v1 code and put under config option Chen Ridong
` (7 preceding siblings ...)
2024-08-26 13:27 ` [PATCH v2 -next 08/11] cgroup/cpuset: move validate_change_legacy " Chen Ridong
@ 2024-08-26 13:27 ` Chen Ridong
2024-08-26 19:30 ` Waiman Long
2024-08-26 13:27 ` [PATCH v2 -next 10/11] cgroup/cpuset: guard cpuset-v1 code under CONFIG_CPUSETS_V1 Chen Ridong
2024-08-26 13:27 ` [PATCH v2 -next 11/11] cgroup/cpuset: add sefltest for cpuset v1 Chen Ridong
10 siblings, 1 reply; 18+ messages in thread
From: Chen Ridong @ 2024-08-26 13:27 UTC (permalink / raw)
To: tj, lizefan.x, hannes, longman, adityakali, sergeh, mkoutny
Cc: cgroups, linux-kernel, chenridong
Move the legacy cpuset controller interface files and the corresponding
code into cpuset-v1.c. 'update_flag', 'cpuset_write_resmask' and
'cpuset_common_seq_show' are also used by v1, so declare them in
cpuset-internal.h.

'cpuset_write_s64', 'cpuset_read_s64' and 'fmeter_getrate' are now only
used in cpuset-v1.c, so make them static.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cpuset-internal.h | 9 +-
kernel/cgroup/cpuset-v1.c | 194 ++++++++++++++++++++++++++++++-
kernel/cgroup/cpuset.c | 195 +-------------------------------
3 files changed, 199 insertions(+), 199 deletions(-)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 07551ff0812e..a6c71c86e58d 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -271,15 +271,16 @@ void callback_lock_irq(void);
void callback_unlock_irq(void);
void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus);
void update_tasks_nodemask(struct cpuset *cs);
+int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, int turning_on);
+ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off);
+int cpuset_common_seq_show(struct seq_file *sf, void *v);
/*
* cpuset-v1.c
*/
+extern struct cftype legacy_files[];
void fmeter_init(struct fmeter *fmp);
-int fmeter_getrate(struct fmeter *fmp);
-int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
- s64 val);
-s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft);
void cpuset_update_task_spread_flags(struct cpuset *cs,
struct task_struct *tsk);
void update_tasks_flags(struct cpuset *cs);
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 246fc962f549..ffb8711cc8fa 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -100,7 +100,7 @@ static void fmeter_markevent(struct fmeter *fmp)
}
/* Process any previous ticks, then return current value. */
-int fmeter_getrate(struct fmeter *fmp)
+static int fmeter_getrate(struct fmeter *fmp)
{
int val;
@@ -161,7 +161,7 @@ static int update_relax_domain_level(struct cpuset *cs, s64 val)
return 0;
}
-int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
+static int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
s64 val)
{
struct cpuset *cs = css_cs(css);
@@ -187,7 +187,7 @@ int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
return retval;
}
-s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
+static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
{
struct cpuset *cs = css_cs(css);
cpuset_filetype_t type = cft->private;
@@ -373,3 +373,191 @@ int validate_change_legacy(struct cpuset *cur, struct cpuset *trial)
return ret;
}
+static u64 cpuset_read_u64(struct cgroup_subsys_state *css, struct cftype *cft)
+{
+ struct cpuset *cs = css_cs(css);
+ cpuset_filetype_t type = cft->private;
+
+ switch (type) {
+ case FILE_CPU_EXCLUSIVE:
+ return is_cpu_exclusive(cs);
+ case FILE_MEM_EXCLUSIVE:
+ return is_mem_exclusive(cs);
+ case FILE_MEM_HARDWALL:
+ return is_mem_hardwall(cs);
+ case FILE_SCHED_LOAD_BALANCE:
+ return is_sched_load_balance(cs);
+ case FILE_MEMORY_MIGRATE:
+ return is_memory_migrate(cs);
+ case FILE_MEMORY_PRESSURE_ENABLED:
+ return cpuset_memory_pressure_enabled;
+ case FILE_MEMORY_PRESSURE:
+ return fmeter_getrate(&cs->fmeter);
+ case FILE_SPREAD_PAGE:
+ return is_spread_page(cs);
+ case FILE_SPREAD_SLAB:
+ return is_spread_slab(cs);
+ default:
+ BUG();
+ }
+
+ /* Unreachable but makes gcc happy */
+ return 0;
+}
+
+static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
+ u64 val)
+{
+ struct cpuset *cs = css_cs(css);
+ cpuset_filetype_t type = cft->private;
+ int retval = 0;
+
+ cpus_read_lock();
+ cpuset_lock();
+ if (!is_cpuset_online(cs)) {
+ retval = -ENODEV;
+ goto out_unlock;
+ }
+
+ switch (type) {
+ case FILE_CPU_EXCLUSIVE:
+ retval = update_flag(CS_CPU_EXCLUSIVE, cs, val);
+ break;
+ case FILE_MEM_EXCLUSIVE:
+ retval = update_flag(CS_MEM_EXCLUSIVE, cs, val);
+ break;
+ case FILE_MEM_HARDWALL:
+ retval = update_flag(CS_MEM_HARDWALL, cs, val);
+ break;
+ case FILE_SCHED_LOAD_BALANCE:
+ retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, val);
+ break;
+ case FILE_MEMORY_MIGRATE:
+ retval = update_flag(CS_MEMORY_MIGRATE, cs, val);
+ break;
+ case FILE_MEMORY_PRESSURE_ENABLED:
+ cpuset_memory_pressure_enabled = !!val;
+ break;
+ case FILE_SPREAD_PAGE:
+ retval = update_flag(CS_SPREAD_PAGE, cs, val);
+ break;
+ case FILE_SPREAD_SLAB:
+ retval = update_flag(CS_SPREAD_SLAB, cs, val);
+ break;
+ default:
+ retval = -EINVAL;
+ break;
+ }
+out_unlock:
+ cpuset_unlock();
+ cpus_read_unlock();
+ return retval;
+}
+
+/*
+ * for the common functions, 'private' gives the type of file
+ */
+
+struct cftype legacy_files[] = {
+ {
+ .name = "cpus",
+ .seq_show = cpuset_common_seq_show,
+ .write = cpuset_write_resmask,
+ .max_write_len = (100U + 6 * NR_CPUS),
+ .private = FILE_CPULIST,
+ },
+
+ {
+ .name = "mems",
+ .seq_show = cpuset_common_seq_show,
+ .write = cpuset_write_resmask,
+ .max_write_len = (100U + 6 * MAX_NUMNODES),
+ .private = FILE_MEMLIST,
+ },
+
+ {
+ .name = "effective_cpus",
+ .seq_show = cpuset_common_seq_show,
+ .private = FILE_EFFECTIVE_CPULIST,
+ },
+
+ {
+ .name = "effective_mems",
+ .seq_show = cpuset_common_seq_show,
+ .private = FILE_EFFECTIVE_MEMLIST,
+ },
+
+ {
+ .name = "cpu_exclusive",
+ .read_u64 = cpuset_read_u64,
+ .write_u64 = cpuset_write_u64,
+ .private = FILE_CPU_EXCLUSIVE,
+ },
+
+ {
+ .name = "mem_exclusive",
+ .read_u64 = cpuset_read_u64,
+ .write_u64 = cpuset_write_u64,
+ .private = FILE_MEM_EXCLUSIVE,
+ },
+
+ {
+ .name = "mem_hardwall",
+ .read_u64 = cpuset_read_u64,
+ .write_u64 = cpuset_write_u64,
+ .private = FILE_MEM_HARDWALL,
+ },
+
+ {
+ .name = "sched_load_balance",
+ .read_u64 = cpuset_read_u64,
+ .write_u64 = cpuset_write_u64,
+ .private = FILE_SCHED_LOAD_BALANCE,
+ },
+
+ {
+ .name = "sched_relax_domain_level",
+ .read_s64 = cpuset_read_s64,
+ .write_s64 = cpuset_write_s64,
+ .private = FILE_SCHED_RELAX_DOMAIN_LEVEL,
+ },
+
+ {
+ .name = "memory_migrate",
+ .read_u64 = cpuset_read_u64,
+ .write_u64 = cpuset_write_u64,
+ .private = FILE_MEMORY_MIGRATE,
+ },
+
+ {
+ .name = "memory_pressure",
+ .read_u64 = cpuset_read_u64,
+ .private = FILE_MEMORY_PRESSURE,
+ },
+
+ {
+ .name = "memory_spread_page",
+ .read_u64 = cpuset_read_u64,
+ .write_u64 = cpuset_write_u64,
+ .private = FILE_SPREAD_PAGE,
+ },
+
+ {
+ /* obsolete, may be removed in the future */
+ .name = "memory_spread_slab",
+ .read_u64 = cpuset_read_u64,
+ .write_u64 = cpuset_write_u64,
+ .private = FILE_SPREAD_SLAB,
+ },
+
+ {
+ .name = "memory_pressure_enabled",
+ .flags = CFTYPE_ONLY_ON_ROOT,
+ .read_u64 = cpuset_read_u64,
+ .write_u64 = cpuset_write_u64,
+ .private = FILE_MEMORY_PRESSURE_ENABLED,
+ },
+
+ { } /* terminate */
+};
+
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 4412a4168902..2f52fe488f3a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1107,8 +1107,6 @@ enum partition_cmd {
partcmd_invalidate, /* Make partition invalid */
};
-static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
- int turning_on);
static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
struct tmpmasks *tmp);
@@ -2703,7 +2701,7 @@ bool current_cpuset_is_being_rebound(void)
* Call with cpuset_mutex held.
*/
-static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
+int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
int turning_on)
{
struct cpuset *trialcs;
@@ -3088,59 +3086,10 @@ static void cpuset_attach(struct cgroup_taskset *tset)
mutex_unlock(&cpuset_mutex);
}
-static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
- u64 val)
-{
- struct cpuset *cs = css_cs(css);
- cpuset_filetype_t type = cft->private;
- int retval = 0;
-
- cpus_read_lock();
- mutex_lock(&cpuset_mutex);
- if (!is_cpuset_online(cs)) {
- retval = -ENODEV;
- goto out_unlock;
- }
-
- switch (type) {
- case FILE_CPU_EXCLUSIVE:
- retval = update_flag(CS_CPU_EXCLUSIVE, cs, val);
- break;
- case FILE_MEM_EXCLUSIVE:
- retval = update_flag(CS_MEM_EXCLUSIVE, cs, val);
- break;
- case FILE_MEM_HARDWALL:
- retval = update_flag(CS_MEM_HARDWALL, cs, val);
- break;
- case FILE_SCHED_LOAD_BALANCE:
- retval = update_flag(CS_SCHED_LOAD_BALANCE, cs, val);
- break;
- case FILE_MEMORY_MIGRATE:
- retval = update_flag(CS_MEMORY_MIGRATE, cs, val);
- break;
- case FILE_MEMORY_PRESSURE_ENABLED:
- cpuset_memory_pressure_enabled = !!val;
- break;
- case FILE_SPREAD_PAGE:
- retval = update_flag(CS_SPREAD_PAGE, cs, val);
- break;
- case FILE_SPREAD_SLAB:
- retval = update_flag(CS_SPREAD_SLAB, cs, val);
- break;
- default:
- retval = -EINVAL;
- break;
- }
-out_unlock:
- mutex_unlock(&cpuset_mutex);
- cpus_read_unlock();
- return retval;
-}
-
/*
* Common handling for a write to a "cpus" or "mems" file.
*/
-static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
+ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
struct cpuset *cs = css_cs(of_css(of));
@@ -3215,7 +3164,7 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
* and since these maps can change value dynamically, one could read
* gibberish by doing partial reads while a list was changing.
*/
-static int cpuset_common_seq_show(struct seq_file *sf, void *v)
+int cpuset_common_seq_show(struct seq_file *sf, void *v)
{
struct cpuset *cs = css_cs(seq_css(sf));
cpuset_filetype_t type = seq_cft(sf)->private;
@@ -3256,37 +3205,6 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
return ret;
}
-static u64 cpuset_read_u64(struct cgroup_subsys_state *css, struct cftype *cft)
-{
- struct cpuset *cs = css_cs(css);
- cpuset_filetype_t type = cft->private;
- switch (type) {
- case FILE_CPU_EXCLUSIVE:
- return is_cpu_exclusive(cs);
- case FILE_MEM_EXCLUSIVE:
- return is_mem_exclusive(cs);
- case FILE_MEM_HARDWALL:
- return is_mem_hardwall(cs);
- case FILE_SCHED_LOAD_BALANCE:
- return is_sched_load_balance(cs);
- case FILE_MEMORY_MIGRATE:
- return is_memory_migrate(cs);
- case FILE_MEMORY_PRESSURE_ENABLED:
- return cpuset_memory_pressure_enabled;
- case FILE_MEMORY_PRESSURE:
- return fmeter_getrate(&cs->fmeter);
- case FILE_SPREAD_PAGE:
- return is_spread_page(cs);
- case FILE_SPREAD_SLAB:
- return is_spread_slab(cs);
- default:
- BUG();
- }
-
- /* Unreachable but makes gcc happy */
- return 0;
-}
-
static int sched_partition_show(struct seq_file *seq, void *v)
{
struct cpuset *cs = css_cs(seq_css(seq));
@@ -3350,113 +3268,6 @@ static ssize_t sched_partition_write(struct kernfs_open_file *of, char *buf,
return retval ?: nbytes;
}
-/*
- * for the common functions, 'private' gives the type of file
- */
-
-static struct cftype legacy_files[] = {
- {
- .name = "cpus",
- .seq_show = cpuset_common_seq_show,
- .write = cpuset_write_resmask,
- .max_write_len = (100U + 6 * NR_CPUS),
- .private = FILE_CPULIST,
- },
-
- {
- .name = "mems",
- .seq_show = cpuset_common_seq_show,
- .write = cpuset_write_resmask,
- .max_write_len = (100U + 6 * MAX_NUMNODES),
- .private = FILE_MEMLIST,
- },
-
- {
- .name = "effective_cpus",
- .seq_show = cpuset_common_seq_show,
- .private = FILE_EFFECTIVE_CPULIST,
- },
-
- {
- .name = "effective_mems",
- .seq_show = cpuset_common_seq_show,
- .private = FILE_EFFECTIVE_MEMLIST,
- },
-
- {
- .name = "cpu_exclusive",
- .read_u64 = cpuset_read_u64,
- .write_u64 = cpuset_write_u64,
- .private = FILE_CPU_EXCLUSIVE,
- },
-
- {
- .name = "mem_exclusive",
- .read_u64 = cpuset_read_u64,
- .write_u64 = cpuset_write_u64,
- .private = FILE_MEM_EXCLUSIVE,
- },
-
- {
- .name = "mem_hardwall",
- .read_u64 = cpuset_read_u64,
- .write_u64 = cpuset_write_u64,
- .private = FILE_MEM_HARDWALL,
- },
-
- {
- .name = "sched_load_balance",
- .read_u64 = cpuset_read_u64,
- .write_u64 = cpuset_write_u64,
- .private = FILE_SCHED_LOAD_BALANCE,
- },
-
- {
- .name = "sched_relax_domain_level",
- .read_s64 = cpuset_read_s64,
- .write_s64 = cpuset_write_s64,
- .private = FILE_SCHED_RELAX_DOMAIN_LEVEL,
- },
-
- {
- .name = "memory_migrate",
- .read_u64 = cpuset_read_u64,
- .write_u64 = cpuset_write_u64,
- .private = FILE_MEMORY_MIGRATE,
- },
-
- {
- .name = "memory_pressure",
- .read_u64 = cpuset_read_u64,
- .private = FILE_MEMORY_PRESSURE,
- },
-
- {
- .name = "memory_spread_page",
- .read_u64 = cpuset_read_u64,
- .write_u64 = cpuset_write_u64,
- .private = FILE_SPREAD_PAGE,
- },
-
- {
- /* obsolete, may be removed in the future */
- .name = "memory_spread_slab",
- .read_u64 = cpuset_read_u64,
- .write_u64 = cpuset_write_u64,
- .private = FILE_SPREAD_SLAB,
- },
-
- {
- .name = "memory_pressure_enabled",
- .flags = CFTYPE_ONLY_ON_ROOT,
- .read_u64 = cpuset_read_u64,
- .write_u64 = cpuset_write_u64,
- .private = FILE_MEMORY_PRESSURE_ENABLED,
- },
-
- { } /* terminate */
-};
-
/*
* This is currently a minimal set for the default hierarchy. It can be
* expanded later on by migrating more features and control files from v1.
--
2.34.1
* Re: [PATCH v2 -next 09/11] cgroup/cpuset: move v1 interfaces to cpuset-v1.c
2024-08-26 13:27 ` [PATCH v2 -next 09/11] cgroup/cpuset: move v1 interfaces " Chen Ridong
@ 2024-08-26 19:30 ` Waiman Long
2024-08-26 19:40 ` Tejun Heo
0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2024-08-26 19:30 UTC (permalink / raw)
To: Chen Ridong, tj, lizefan.x, hannes, adityakali, sergeh, mkoutny
Cc: cgroups, linux-kernel, chenridong
On 8/26/24 09:27, Chen Ridong wrote:
> Move legacy cpuset controller interfaces files and corresponding code
> into cpuset-v1.c. 'update_flag', 'cpuset_write_resmask' and
> 'cpuset_common_seq_show' are also used for v1, so declare them in
> cpuset-internal.h.
>
> 'cpuset_write_s64', 'cpuset_read_s64' and 'fmeter_getrate' are only used
> cpuset-v1.c now, make it static.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> kernel/cgroup/cpuset-internal.h | 9 +-
> kernel/cgroup/cpuset-v1.c | 194 ++++++++++++++++++++++++++++++-
> kernel/cgroup/cpuset.c | 195 +-------------------------------
> 3 files changed, 199 insertions(+), 199 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
> index 07551ff0812e..a6c71c86e58d 100644
> --- a/kernel/cgroup/cpuset-internal.h
> +++ b/kernel/cgroup/cpuset-internal.h
> @@ -271,15 +271,16 @@ void callback_lock_irq(void);
> void callback_unlock_irq(void);
> void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus);
> void update_tasks_nodemask(struct cpuset *cs);
> +int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, int turning_on);
> +ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
> + char *buf, size_t nbytes, loff_t off);
> +int cpuset_common_seq_show(struct seq_file *sf, void *v);
>
> /*
> * cpuset-v1.c
> */
> +extern struct cftype legacy_files[];
The legacy_files name is rather generic. By making it globally visible
within the kernel, it runs the risk of conflicting with another variable
of the same name (namespace pollution). I would suggest adding a "cpuset_"
prefix to make it unique to cpuset.
The following functions also have a similar issue.
- update_flag()
- update_tasks_flags()
- validate_change_legacy()
- callback_lock_irq()
- callback_unlock_irq().
Another alternative is to include cpuset-v1.c directly into cpuset.c like

#ifdef CONFIG_CPUSETS_V1
#include "cpuset-v1.c"
#else
....
#endif

Then you don't need to change the names and will not need
cpuset-internal.h. It is up to you to decide what you want to do.
Cheers,
Longman
* Re: [PATCH v2 -next 09/11] cgroup/cpuset: move v1 interfaces to cpuset-v1.c
2024-08-26 19:30 ` Waiman Long
@ 2024-08-26 19:40 ` Tejun Heo
2024-08-26 19:47 ` Waiman Long
0 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2024-08-26 19:40 UTC (permalink / raw)
To: Waiman Long
Cc: Chen Ridong, lizefan.x, hannes, adityakali, sergeh, mkoutny,
cgroups, linux-kernel, chenridong
On Mon, Aug 26, 2024 at 03:30:14PM -0400, Waiman Long wrote:
...
> Another alternative is to include cpuset-v1.c directly into cpuset.c like
>
> #ifdef CONFIG_CPUSETS_V1
> #include "cpuset-v1.c"
> #else
> ....
> #endif
>
> Then you don't need to change the names and will not need cpuset-internal.h.
> It is up to you to decide what you want to do.
FWIW, I'd prefer to have cpuset1_ prefixed functions declared in cpuset1.h
or something rather than including .c file.
Thanks.
--
tejun
* Re: [PATCH v2 -next 09/11] cgroup/cpuset: move v1 interfaces to cpuset-v1.c
2024-08-26 19:40 ` Tejun Heo
@ 2024-08-26 19:47 ` Waiman Long
2024-08-27 1:47 ` chenridong
0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2024-08-26 19:47 UTC (permalink / raw)
To: Tejun Heo
Cc: Chen Ridong, lizefan.x, hannes, adityakali, sergeh, mkoutny,
cgroups, linux-kernel, chenridong
On 8/26/24 15:40, Tejun Heo wrote:
> On Mon, Aug 26, 2024 at 03:30:14PM -0400, Waiman Long wrote:
> ...
>> Another alternative is to include cpuset-v1.c directly into cpuset.c like
>>
>> #ifdef CONFIG_CPUSETS_V1
>> #include "cpuset-v1.c"
>> #else
>> ....
>> #endif
>>
>> Then you don't need to change the names and will not need cpuset-internal.h.
>> It is up to you to decide what you want to do.
> FWIW, I'd prefer to have cpuset1_ prefixed functions declared in cpuset1.h
> or something rather than including .c file.
Sure. Let's have "cpuset1_" prefix if it is v1 specific and "cpuset_"
prefix if it is used by both v1 and v2. That applies only to newly
exposed names.
Cheers,
Longman
* Re: [PATCH v2 -next 09/11] cgroup/cpuset: move v1 interfaces to cpuset-v1.c
2024-08-26 19:47 ` Waiman Long
@ 2024-08-27 1:47 ` chenridong
0 siblings, 0 replies; 18+ messages in thread
From: chenridong @ 2024-08-27 1:47 UTC (permalink / raw)
To: Waiman Long, Tejun Heo
Cc: lizefan.x, hannes, adityakali, sergeh, mkoutny, cgroups,
linux-kernel, chenridong
On 2024/8/27 3:47, Waiman Long wrote:
>
> On 8/26/24 15:40, Tejun Heo wrote:
>> On Mon, Aug 26, 2024 at 03:30:14PM -0400, Waiman Long wrote:
>> ...
>>> Another alternative is to include cpuset-v1.c directly into cpuset.c
>>> like
>>>
>>> #ifdef CONFIG_CPUSETS_V1
>>> #include "cpuset-v1.c"
>>> #else
>>> ....
>>> #endif
>>>
>>> Then you don't need to change the names and will not need
>>> cpuset-internal.h.
>>> It is up to you to decide what you want to do.
>> FWIW, I'd prefer to have cpuset1_ prefixed functions declared in
>> cpuset1.h
>> or something rather than including .c file.
>
> Sure. Let's have "cpuset1_" prefix if it is v1 specific and "cpuset_"
> prefix if it is used by both v1 and v2. That applies only to newly
> exposed names.
>
> Cheers,
> Longman
>
I will rename the functions with the cpuset1_/cpuset_ prefixes.
Thanks,
Ridong
* [PATCH v2 -next 10/11] cgroup/cpuset: guard cpuset-v1 code under CONFIG_CPUSETS_V1
2024-08-26 13:26 [PATCH v2 -next 00/11] cgroup:cpuset:separate legacy cgroup v1 code and put under config option Chen Ridong
` (8 preceding siblings ...)
2024-08-26 13:27 ` [PATCH v2 -next 09/11] cgroup/cpuset: move v1 interfaces " Chen Ridong
@ 2024-08-26 13:27 ` Chen Ridong
2024-08-26 13:27 ` [PATCH v2 -next 11/11] cgroup/cpuset: add sefltest for cpuset v1 Chen Ridong
10 siblings, 0 replies; 18+ messages in thread
From: Chen Ridong @ 2024-08-26 13:27 UTC (permalink / raw)
To: tj, lizefan.x, hannes, longman, adityakali, sergeh, mkoutny
Cc: cgroups, linux-kernel, chenridong
This patch introduces CONFIG_CPUSETS_V1 and guards the cpuset-v1 code
under it. CONFIG_CPUSETS_V1 defaults to N, so that users who have adopted
v2 don't have to 'pay' for cpuset v1.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
include/linux/cpuset.h | 4 ++++
init/Kconfig | 13 +++++++++++++
kernel/cgroup/Makefile | 3 ++-
kernel/cgroup/cpuset-internal.h | 13 +++++++++++++
kernel/cgroup/cpuset.c | 2 ++
5 files changed, 34 insertions(+), 1 deletion(-)
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 2a6981eeebf8..835e7b793f6a 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -99,6 +99,7 @@ static inline bool cpuset_zone_allowed(struct zone *z, gfp_t gfp_mask)
extern int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
const struct task_struct *tsk2);
+#ifdef CONFIG_CPUSETS_V1
#define cpuset_memory_pressure_bump() \
do { \
if (cpuset_memory_pressure_enabled) \
@@ -106,6 +107,9 @@ extern int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
} while (0)
extern int cpuset_memory_pressure_enabled;
extern void __cpuset_memory_pressure_bump(void);
+#else
+static inline void cpuset_memory_pressure_bump(void) { }
+#endif
extern void cpuset_task_status_allowed(struct seq_file *m,
struct task_struct *task);
diff --git a/init/Kconfig b/init/Kconfig
index a465ea9525bd..8bf091354bea 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1143,6 +1143,19 @@ config CPUSETS
Say N if unsure.
+config CPUSETS_V1
+ bool "Legacy cgroup v1 cpusets controller"
+ depends on CPUSETS
+ default n
+ help
+ Legacy cgroup v1 cpusets controller which has been deprecated by
+ the cgroup v2 implementation. The v1 controller is kept for legacy
+ applications which haven't migrated to the new cgroup v2 interface
+ yet. If you do not have any such applications, you can safely leave
+ this option disabled.
+
+ Say N if unsure.
+
config PROC_PID_CPUSET
bool "Include legacy /proc/<pid>/cpuset file"
depends on CPUSETS
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 005ac4c675cb..a5c9359d516f 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -4,6 +4,7 @@ obj-y := cgroup.o rstat.o namespace.o cgroup-v1.o freezer.o
obj-$(CONFIG_CGROUP_FREEZER) += legacy_freezer.o
obj-$(CONFIG_CGROUP_PIDS) += pids.o
obj-$(CONFIG_CGROUP_RDMA) += rdma.o
-obj-$(CONFIG_CPUSETS) += cpuset.o cpuset-v1.o
+obj-$(CONFIG_CPUSETS) += cpuset.o
+obj-$(CONFIG_CPUSETS_V1) += cpuset-v1.o
obj-$(CONFIG_CGROUP_MISC) += misc.o
obj-$(CONFIG_CGROUP_DEBUG) += debug.o
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index a6c71c86e58d..7ec702dff877 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -279,6 +279,7 @@ int cpuset_common_seq_show(struct seq_file *sf, void *v);
/*
* cpuset-v1.c
*/
+#ifdef CONFIG_CPUSETS_V1
extern struct cftype legacy_files[];
void fmeter_init(struct fmeter *fmp);
void cpuset_update_task_spread_flags(struct cpuset *cs,
@@ -289,4 +290,16 @@ void hotplug_update_tasks_legacy(struct cpuset *cs,
bool cpus_updated, bool mems_updated);
int validate_change_legacy(struct cpuset *cur, struct cpuset *trial);
+#else
+static inline void fmeter_init(struct fmeter *fmp) {}
+static inline void cpuset_update_task_spread_flags(struct cpuset *cs,
+ struct task_struct *tsk) {}
+static inline void update_tasks_flags(struct cpuset *cs) {}
+static inline void hotplug_update_tasks_legacy(struct cpuset *cs,
+ struct cpumask *new_cpus, nodemask_t *new_mems,
+ bool cpus_updated, bool mems_updated) {}
+static inline int validate_change_legacy(struct cpuset *cur,
+ struct cpuset *trial) { return 0; }
+
+#endif /* CONFIG_CPUSETS_V1 */
#endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 2f52fe488f3a..2cefeaaff742 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3617,7 +3617,9 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
.can_fork = cpuset_can_fork,
.cancel_fork = cpuset_cancel_fork,
.fork = cpuset_fork,
+#ifdef CONFIG_CPUSETS_V1
.legacy_cftypes = legacy_files,
+#endif
.dfl_cftypes = dfl_files,
.early_init = true,
.threaded = true,
--
2.34.1
* [PATCH v2 -next 11/11] cgroup/cpuset: add sefltest for cpuset v1
2024-08-26 13:26 [PATCH v2 -next 00/11] cgroup:cpuset:separate legacy cgroup v1 code and put under config option Chen Ridong
` (9 preceding siblings ...)
2024-08-26 13:27 ` [PATCH v2 -next 10/11] cgroup/cpuset: guard cpuset-v1 code under CONFIG_CPUSETS_V1 Chen Ridong
@ 2024-08-26 13:27 ` Chen Ridong
10 siblings, 0 replies; 18+ messages in thread
From: Chen Ridong @ 2024-08-26 13:27 UTC (permalink / raw)
To: tj, lizefan.x, hannes, longman, adityakali, sergeh, mkoutny
Cc: cgroups, linux-kernel, chenridong
Currently there is only a hotplug test for cpuset v1. Add a basic
read/write test for the cpuset v1 interface files.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
MAINTAINERS | 1 +
.../selftests/cgroup/test_cpuset_v1_base.sh | 77 +++++++++++++++++++
2 files changed, 78 insertions(+)
create mode 100755 tools/testing/selftests/cgroup/test_cpuset_v1_base.sh
diff --git a/MAINTAINERS b/MAINTAINERS
index 3b5ec1cafd95..b59f54e1e30d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5703,6 +5703,7 @@ F: kernel/cgroup/cpuset-v1.c
F: kernel/cgroup/cpuset.c
F: tools/testing/selftests/cgroup/test_cpuset.c
F: tools/testing/selftests/cgroup/test_cpuset_prs.sh
+F: tools/testing/selftests/cgroup/test_cpuset_v1_base.sh
CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
M: Johannes Weiner <hannes@cmpxchg.org>
diff --git a/tools/testing/selftests/cgroup/test_cpuset_v1_base.sh b/tools/testing/selftests/cgroup/test_cpuset_v1_base.sh
new file mode 100755
index 000000000000..42a6628fb8bc
--- /dev/null
+++ b/tools/testing/selftests/cgroup/test_cpuset_v1_base.sh
@@ -0,0 +1,77 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Basic write/read test for cpuset v1 interfaces
+#
+
+skip_test() {
+ echo "$1"
+ echo "Test SKIPPED"
+ exit 4 # ksft_skip
+}
+
+write_test() {
+ dir=$1
+ interface=$2
+ value=$3
+ expect=$4
+ echo "testing $interface $value"
+ echo $value > $dir/$interface
+ new=$(cat $dir/$interface)
+ [[ "$new" != "$expect" ]] && {
+ echo "$interface write $value failed: new:$new"
+ exit 1
+ }
+}
+
+[[ $(id -u) -eq 0 ]] || skip_test "Test must be run as root!"
+
+# Find cpuset v1 mount point
+CPUSET=$(mount -t cgroup | grep cpuset | head -1 | awk '{print $3}')
+[[ -n "$CPUSET" ]] || skip_test "cpuset v1 mount point not found!"
+
+#
+# Create a test cpuset, read write test
+#
+TDIR=test$$
+[[ -d $CPUSET/$TDIR ]] || mkdir $CPUSET/$TDIR
+
+ITF_MATRIX=(
+ #interface value expect root_only
+ 'cpuset.cpus 0-1 0-1 0'
+ 'cpuset.mem_exclusive 1 1 0'
+ 'cpuset.mem_exclusive 0 0 0'
+ 'cpuset.mem_hardwall 1 1 0'
+ 'cpuset.mem_hardwall 0 0 0'
+ 'cpuset.memory_migrate 1 1 0'
+ 'cpuset.memory_migrate 0 0 0'
+ 'cpuset.memory_spread_page 1 1 0'
+ 'cpuset.memory_spread_page 0 0 0'
+ 'cpuset.memory_spread_slab 1 1 0'
+ 'cpuset.memory_spread_slab 0 0 0'
+ 'cpuset.mems 0 0 0'
+ 'cpuset.sched_load_balance 1 1 0'
+ 'cpuset.sched_load_balance 0 0 0'
+ 'cpuset.sched_relax_domain_level 2 2 0'
+ 'cpuset.memory_pressure_enabled 1 1 1'
+ 'cpuset.memory_pressure_enabled 0 0 1'
+)
+
+run_test()
+{
+ cnt="${ITF_MATRIX[@]}"
+ for i in "${ITF_MATRIX[@]}" ; do
+ args=($i)
+ root_only=${args[3]}
+ [[ $root_only -eq 1 ]] && {
+ write_test "$CPUSET" "${args[0]}" "${args[1]}" "${args[2]}"
+ continue
+ }
+ write_test "$CPUSET/$TDIR" "${args[0]}" "${args[1]}" "${args[2]}"
+ done
+}
+
+run_test
+rmdir $CPUSET/$TDIR
+echo "Test PASSED"
+exit 0
--
2.34.1