public inbox for cgroups@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] sched/cpuset: Fix DL BW accounting in case can_attach() fails
@ 2023-03-22 13:59 Dietmar Eggemann
       [not found] ` <20230322135959.1998790-1-dietmar.eggemann-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Dietmar Eggemann @ 2023-03-22 13:59 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Waiman Long, Tejun Heo,
	Zefan Li, Johannes Weiner, Hao Luo
  Cc: Steven Rostedt, Vincent Guittot, Daniel Bristot de Oliveira,
	Luca Abeni, Tommaso Cucinotta, Qais Yousef, Wei Wang,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

I followed Longman's idea to add a `deadline task transfer count` into
cpuset and only update the `dl task count` in cpuset_attach().

Moreover, I switched from per-task DL BW request to a per-cpuset one.
This way we don't have to free per-task in case xxx_can_attach() fails.

The DL BW freeing is handled in cpuset_cancel_attach() for the case
`multiple controllers and one of the non-cpuset can_attach() fails`.

Only lightly tested on cgroup v1 with exclusive cpusets so far.

Dietmar Eggemann (2):
  sched/deadline: Create DL BW alloc, free & check overflow interface
  cgroup/cpuset: Free DL BW in case can_attach() fails

 include/linux/sched.h   |  4 ++-
 kernel/cgroup/cpuset.c  | 55 +++++++++++++++++++++++++++++++++++++----
 kernel/sched/core.c     | 19 +++-----------
 kernel/sched/deadline.c | 53 +++++++++++++++++++++++++++++----------
 kernel/sched/sched.h    |  2 +-
 5 files changed, 97 insertions(+), 36 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC PATCH 1/2] sched/deadline: Create DL BW alloc, free & check overflow interface
       [not found] ` <20230322135959.1998790-1-dietmar.eggemann-5wv7dgnIgG8@public.gmane.org>
@ 2023-03-22 13:59   ` Dietmar Eggemann
  2023-03-22 13:59   ` [RFC PATCH 2/2] cgroup/cpuset: Free DL BW in case can_attach() fails Dietmar Eggemann
  2023-03-23  9:33   ` [RFC PATCH 0/2] sched/cpuset: Fix DL BW accounting " Juri Lelli
  2 siblings, 0 replies; 6+ messages in thread
From: Dietmar Eggemann @ 2023-03-22 13:59 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Waiman Long, Tejun Heo,
	Zefan Li, Johannes Weiner, Hao Luo
  Cc: Steven Rostedt, Vincent Guittot, Daniel Bristot de Oliveira,
	Luca Abeni, Tommaso Cucinotta, Qais Yousef, Wei Wang,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Rework the existing dl_cpu_busy() interface which offered DL BW
overflow checking and per-task DL BW allocation.

Add dl_bw_free() as an interface to be able to free DL BW.
It will be used to allow freeing of the DL BW request done during
cpuset_can_attach() in case multiple controllers are attached to the
cgroup next to cpuset and one of the non-cpuset can_attach() fails.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann-5wv7dgnIgG8@public.gmane.org>
---
 include/linux/sched.h   |  2 ++
 kernel/sched/core.c     |  4 ++--
 kernel/sched/deadline.c | 53 +++++++++++++++++++++++++++++++----------
 kernel/sched/sched.h    |  2 +-
 4 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4df2b3e76b30..658e997ba057 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1847,6 +1847,8 @@ current_restore_flags(unsigned long orig_flags, unsigned long flags)
 
 extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
 extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus);
+extern int dl_bw_alloc(int cpu, u64 dl_bw);
+extern void dl_bw_free(int cpu, u64 dl_bw);
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d586a8440348..2f07aecb7434 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9226,7 +9226,7 @@ int task_can_attach(struct task_struct *p,
 
 		if (unlikely(cpu >= nr_cpu_ids))
 			return -EINVAL;
-		ret = dl_cpu_busy(cpu, p);
+		ret = dl_bw_alloc(cpu, p->dl.dl_bw);
 	}
 
 out:
@@ -9511,7 +9511,7 @@ static void cpuset_cpu_active(void)
 static int cpuset_cpu_inactive(unsigned int cpu)
 {
 	if (!cpuhp_tasks_frozen) {
-		int ret = dl_cpu_busy(cpu, NULL);
+		int ret = dl_bw_check_overflow(cpu);
 
 		if (ret)
 			return ret;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 71b24371a6f7..41ed6c6d2628 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -3033,26 +3033,38 @@ int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur,
 	return ret;
 }
 
-int dl_cpu_busy(int cpu, struct task_struct *p)
+enum dl_bw_request {
+	dl_bw_req_check_overflow = 0,
+	dl_bw_req_alloc,
+	dl_bw_req_free
+};
+
+static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
 {
-	unsigned long flags, cap;
+	unsigned long flags;
 	struct dl_bw *dl_b;
-	bool overflow;
+	bool overflow = 0;
 
 	rcu_read_lock_sched();
 	dl_b = dl_bw_of(cpu);
 	raw_spin_lock_irqsave(&dl_b->lock, flags);
-	cap = dl_bw_capacity(cpu);
-	overflow = __dl_overflow(dl_b, cap, 0, p ? p->dl.dl_bw : 0);
 
-	if (!overflow && p) {
-		/*
-		 * We reserve space for this task in the destination
-		 * root_domain, as we can't fail after this point.
-		 * We will free resources in the source root_domain
-		 * later on (see set_cpus_allowed_dl()).
-		 */
-		__dl_add(dl_b, p->dl.dl_bw, dl_bw_cpus(cpu));
+	if (req == dl_bw_req_free) {
+		__dl_sub(dl_b, dl_bw, dl_bw_cpus(cpu));
+	} else {
+		unsigned long cap = dl_bw_capacity(cpu);
+
+		overflow = __dl_overflow(dl_b, cap, 0, dl_bw);
+
+		if (req == dl_bw_req_alloc && !overflow) {
+			/*
+			 * We reserve space in the destination
+			 * root_domain, as we can't fail after this point.
+			 * We will free resources in the source root_domain
+			 * later on (see set_cpus_allowed_dl()).
+			 */
+			__dl_add(dl_b, dl_bw, dl_bw_cpus(cpu));
+		}
 	}
 
 	raw_spin_unlock_irqrestore(&dl_b->lock, flags);
@@ -3060,6 +3072,21 @@ int dl_cpu_busy(int cpu, struct task_struct *p)
 
 	return overflow ? -EBUSY : 0;
 }
+
+int dl_bw_check_overflow(int cpu)
+{
+	return dl_bw_manage(dl_bw_req_check_overflow, cpu, 0);
+}
+
+int dl_bw_alloc(int cpu, u64 dl_bw)
+{
+	return dl_bw_manage(dl_bw_req_alloc, cpu, dl_bw);
+}
+
+void dl_bw_free(int cpu, u64 dl_bw)
+{
+	dl_bw_manage(dl_bw_req_free, cpu, dl_bw);
+}
 #endif
 
 #ifdef CONFIG_SCHED_DEBUG
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3e8df6d31c1e..6cb4cf878fe2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -330,7 +330,7 @@ extern void __getparam_dl(struct task_struct *p, struct sched_attr *attr);
 extern bool __checkparam_dl(const struct sched_attr *attr);
 extern bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr);
 extern int  dl_cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
-extern int  dl_cpu_busy(int cpu, struct task_struct *p);
+extern int  dl_bw_check_overflow(int cpu);
 
 #ifdef CONFIG_CGROUP_SCHED
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC PATCH 2/2] cgroup/cpuset: Free DL BW in case can_attach() fails
       [not found] ` <20230322135959.1998790-1-dietmar.eggemann-5wv7dgnIgG8@public.gmane.org>
  2023-03-22 13:59   ` [RFC PATCH 1/2] sched/deadline: Create DL BW alloc, free & check overflow interface Dietmar Eggemann
@ 2023-03-22 13:59   ` Dietmar Eggemann
       [not found]     ` <20230322135959.1998790-3-dietmar.eggemann-5wv7dgnIgG8@public.gmane.org>
  2023-03-23  9:33   ` [RFC PATCH 0/2] sched/cpuset: Fix DL BW accounting " Juri Lelli
  2 siblings, 1 reply; 6+ messages in thread
From: Dietmar Eggemann @ 2023-03-22 13:59 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Waiman Long, Tejun Heo,
	Zefan Li, Johannes Weiner, Hao Luo
  Cc: Steven Rostedt, Vincent Guittot, Daniel Bristot de Oliveira,
	Luca Abeni, Tommaso Cucinotta, Qais Yousef, Wei Wang,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

cpuset_can_attach() can fail. Postpone DL BW allocation until all task
have been checked. DL BW is not allocated per-task but as a sum over
all DL tasks migrating.

If multiple controllers are attached to the cgroup next to cuset a
non-cpuset can_attach() can fail. In this case free DL BW in
cpuset_cancel_attach().

Finally, update cpuset DL task count (nr_deadline_tasks) only in
cpuset_attach().

Suggested-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann-5wv7dgnIgG8@public.gmane.org>
---
 include/linux/sched.h  |  2 +-
 kernel/cgroup/cpuset.c | 55 ++++++++++++++++++++++++++++++++++++++----
 kernel/sched/core.c    | 17 ++-----------
 3 files changed, 53 insertions(+), 21 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 658e997ba057..675ec74469d7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1846,7 +1846,7 @@ current_restore_flags(unsigned long orig_flags, unsigned long flags)
 }
 
 extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
-extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus);
+extern int task_can_attach(struct task_struct *p);
 extern int dl_bw_alloc(int cpu, u64 dl_bw);
 extern void dl_bw_free(int cpu, u64 dl_bw);
 #ifdef CONFIG_SMP
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index f46192d2e97e..fdc476eefbed 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -198,6 +198,8 @@ struct cpuset {
 	 * know when to rebuild associated root domain bandwidth information.
 	 */
 	int nr_deadline_tasks;
+	int nr_migrate_dl_tasks;
+	u64 sum_migrate_dl_bw;
 
 	/* Invalid partition error code, not lock protected */
 	enum prs_errcode prs_err;
@@ -2462,16 +2464,23 @@ static int fmeter_getrate(struct fmeter *fmp)
 
 static struct cpuset *cpuset_attach_old_cs;
 
+static void reset_migrate_dl_data(struct cpuset *cs)
+{
+	cs->nr_migrate_dl_tasks = 0;
+	cs->sum_migrate_dl_bw = 0;
+}
+
 /* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
 static int cpuset_can_attach(struct cgroup_taskset *tset)
 {
 	struct cgroup_subsys_state *css;
-	struct cpuset *cs;
+	struct cpuset *cs, *oldcs;
 	struct task_struct *task;
 	int ret;
 
 	/* used later by cpuset_attach() */
 	cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
+	oldcs = cpuset_attach_old_cs;
 	cs = css_cs(css);
 
 	mutex_lock(&cpuset_mutex);
@@ -2489,7 +2498,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 		goto out_unlock;
 
 	cgroup_taskset_for_each(task, css, tset) {
-		ret = task_can_attach(task, cs->effective_cpus);
+		ret = task_can_attach(task);
 		if (ret)
 			goto out_unlock;
 		ret = security_task_setscheduler(task);
@@ -2497,11 +2506,31 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 			goto out_unlock;
 
 		if (dl_task(task)) {
-			cs->nr_deadline_tasks++;
-			cpuset_attach_old_cs->nr_deadline_tasks--;
+			cs->nr_migrate_dl_tasks++;
+			cs->sum_migrate_dl_bw += task->dl.dl_bw;
+		}
+	}
+
+	if (!cs->nr_migrate_dl_tasks)
+		goto out_succes;
+
+	if (!cpumask_intersects(oldcs->effective_cpus, cs->effective_cpus)) {
+		int cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
+
+		if (unlikely(cpu >= nr_cpu_ids)) {
+			reset_migrate_dl_data(cs);
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
+		if (ret) {
+			reset_migrate_dl_data(cs);
+			goto out_unlock;
 		}
 	}
 
+out_succes:
 	/*
 	 * Mark attach is in progress.  This makes validate_change() fail
 	 * changes which zero cpus/mems_allowed.
@@ -2516,11 +2545,21 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 static void cpuset_cancel_attach(struct cgroup_taskset *tset)
 {
 	struct cgroup_subsys_state *css;
+	struct cpuset *cs;
 
 	cgroup_taskset_first(tset, &css);
+	cs = css_cs(css);
 
 	mutex_lock(&cpuset_mutex);
-	css_cs(css)->attach_in_progress--;
+	cs->attach_in_progress--;
+
+	if (cs->nr_migrate_dl_tasks) {
+		int cpu = cpumask_any(cs->effective_cpus);
+
+		dl_bw_free(cpu, cs->sum_migrate_dl_bw);
+		reset_migrate_dl_data(cs);
+	}
+
 	mutex_unlock(&cpuset_mutex);
 }
 
@@ -2615,6 +2654,12 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 out:
 	cs->old_mems_allowed = cpuset_attach_nodemask_to;
 
+	if (cs->nr_migrate_dl_tasks) {
+		cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks;
+		oldcs->nr_deadline_tasks -= cs->nr_migrate_dl_tasks;
+		reset_migrate_dl_data(cs);
+	}
+
 	cs->attach_in_progress--;
 	if (!cs->attach_in_progress)
 		wake_up(&cpuset_attach_wq);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2f07aecb7434..4fb058b72886 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9201,8 +9201,7 @@ int cpuset_cpumask_can_shrink(const struct cpumask *cur,
 	return ret;
 }
 
-int task_can_attach(struct task_struct *p,
-		    const struct cpumask *cs_effective_cpus)
+int task_can_attach(struct task_struct *p)
 {
 	int ret = 0;
 
@@ -9215,21 +9214,9 @@ int task_can_attach(struct task_struct *p,
 	 * success of set_cpus_allowed_ptr() on all attached tasks
 	 * before cpus_mask may be changed.
 	 */
-	if (p->flags & PF_NO_SETAFFINITY) {
+	if (p->flags & PF_NO_SETAFFINITY)
 		ret = -EINVAL;
-		goto out;
-	}
-
-	if (dl_task(p) && !cpumask_intersects(task_rq(p)->rd->span,
-					      cs_effective_cpus)) {
-		int cpu = cpumask_any_and(cpu_active_mask, cs_effective_cpus);
 
-		if (unlikely(cpu >= nr_cpu_ids))
-			return -EINVAL;
-		ret = dl_bw_alloc(cpu, p->dl.dl_bw);
-	}
-
-out:
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 2/2] cgroup/cpuset: Free DL BW in case can_attach() fails
       [not found]     ` <20230322135959.1998790-3-dietmar.eggemann-5wv7dgnIgG8@public.gmane.org>
@ 2023-03-22 14:48       ` Waiman Long
  0 siblings, 0 replies; 6+ messages in thread
From: Waiman Long @ 2023-03-22 14:48 UTC (permalink / raw)
  To: Dietmar Eggemann, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Tejun Heo, Zefan Li, Johannes Weiner, Hao Luo
  Cc: Steven Rostedt, Vincent Guittot, Daniel Bristot de Oliveira,
	Luca Abeni, Tommaso Cucinotta, Qais Yousef, Wei Wang,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On 3/22/23 09:59, Dietmar Eggemann wrote:
> cpuset_can_attach() can fail. Postpone DL BW allocation until all task
> have been checked. DL BW is not allocated per-task but as a sum over
> all DL tasks migrating.
>
> If multiple controllers are attached to the cgroup next to cuset a
Typo: "cuset" -> "cpuset"
> non-cpuset can_attach() can fail. In this case free DL BW in
> cpuset_cancel_attach().
>
> Finally, update cpuset DL task count (nr_deadline_tasks) only in
> cpuset_attach().
>
> Suggested-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann-5wv7dgnIgG8@public.gmane.org>
> ---
>   include/linux/sched.h  |  2 +-
>   kernel/cgroup/cpuset.c | 55 ++++++++++++++++++++++++++++++++++++++----
>   kernel/sched/core.c    | 17 ++-----------
>   3 files changed, 53 insertions(+), 21 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 658e997ba057..675ec74469d7 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1846,7 +1846,7 @@ current_restore_flags(unsigned long orig_flags, unsigned long flags)
>   }
>   
>   extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
> -extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus);
> +extern int task_can_attach(struct task_struct *p);
>   extern int dl_bw_alloc(int cpu, u64 dl_bw);
>   extern void dl_bw_free(int cpu, u64 dl_bw);
>   #ifdef CONFIG_SMP
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index f46192d2e97e..fdc476eefbed 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -198,6 +198,8 @@ struct cpuset {
>   	 * know when to rebuild associated root domain bandwidth information.
>   	 */
>   	int nr_deadline_tasks;
> +	int nr_migrate_dl_tasks;
> +	u64 sum_migrate_dl_bw;
>   
>   	/* Invalid partition error code, not lock protected */
>   	enum prs_errcode prs_err;
> @@ -2462,16 +2464,23 @@ static int fmeter_getrate(struct fmeter *fmp)
>   
>   static struct cpuset *cpuset_attach_old_cs;
>   
> +static void reset_migrate_dl_data(struct cpuset *cs)
> +{
> +	cs->nr_migrate_dl_tasks = 0;
> +	cs->sum_migrate_dl_bw = 0;
> +}
> +
>   /* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
>   static int cpuset_can_attach(struct cgroup_taskset *tset)
>   {
>   	struct cgroup_subsys_state *css;
> -	struct cpuset *cs;
> +	struct cpuset *cs, *oldcs;
>   	struct task_struct *task;
>   	int ret;
>   
>   	/* used later by cpuset_attach() */
>   	cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
> +	oldcs = cpuset_attach_old_cs;
>   	cs = css_cs(css);
>   
>   	mutex_lock(&cpuset_mutex);
> @@ -2489,7 +2498,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>   		goto out_unlock;
>   
>   	cgroup_taskset_for_each(task, css, tset) {
> -		ret = task_can_attach(task, cs->effective_cpus);
> +		ret = task_can_attach(task);
>   		if (ret)
>   			goto out_unlock;
>   		ret = security_task_setscheduler(task);
> @@ -2497,11 +2506,31 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>   			goto out_unlock;
>   
>   		if (dl_task(task)) {
> -			cs->nr_deadline_tasks++;
> -			cpuset_attach_old_cs->nr_deadline_tasks--;
> +			cs->nr_migrate_dl_tasks++;
> +			cs->sum_migrate_dl_bw += task->dl.dl_bw;
> +		}
> +	}
> +
> +	if (!cs->nr_migrate_dl_tasks)
> +		goto out_succes;
> +
> +	if (!cpumask_intersects(oldcs->effective_cpus, cs->effective_cpus)) {
> +		int cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
> +
> +		if (unlikely(cpu >= nr_cpu_ids)) {
> +			reset_migrate_dl_data(cs);
> +			ret = -EINVAL;
> +			goto out_unlock;
> +		}
> +
> +		ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
> +		if (ret) {
> +			reset_migrate_dl_data(cs);
> +			goto out_unlock;

I do have a question about just picking any cpu to see if there is 
enough DL bandwidth. What happens if there are multiple DL tasks to be 
migrated and their total bandwidth exceed any one cpu can provide?

Other than that, the overall workflow looks good to me from the cpuset 
point of view.

Cheers,
Longman

>   		}
>   	}
>   
> +out_succes:
>   	/*
>   	 * Mark attach is in progress.  This makes validate_change() fail
>   	 * changes which zero cpus/mems_allowed.
> @@ -2516,11 +2545,21 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
>   static void cpuset_cancel_attach(struct cgroup_taskset *tset)
>   {
>   	struct cgroup_subsys_state *css;
> +	struct cpuset *cs;
>   
>   	cgroup_taskset_first(tset, &css);
> +	cs = css_cs(css);
>   
>   	mutex_lock(&cpuset_mutex);
> -	css_cs(css)->attach_in_progress--;
> +	cs->attach_in_progress--;
> +
> +	if (cs->nr_migrate_dl_tasks) {
> +		int cpu = cpumask_any(cs->effective_cpus);
> +
> +		dl_bw_free(cpu, cs->sum_migrate_dl_bw);
> +		reset_migrate_dl_data(cs);
> +	}
> +
>   	mutex_unlock(&cpuset_mutex);
>   }
>   
> @@ -2615,6 +2654,12 @@ static void cpuset_attach(struct cgroup_taskset *tset)
>   out:
>   	cs->old_mems_allowed = cpuset_attach_nodemask_to;
>   
> +	if (cs->nr_migrate_dl_tasks) {
> +		cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks;
> +		oldcs->nr_deadline_tasks -= cs->nr_migrate_dl_tasks;
> +		reset_migrate_dl_data(cs);
> +	}
> +
>   	cs->attach_in_progress--;
>   	if (!cs->attach_in_progress)
>   		wake_up(&cpuset_attach_wq);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2f07aecb7434..4fb058b72886 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9201,8 +9201,7 @@ int cpuset_cpumask_can_shrink(const struct cpumask *cur,
>   	return ret;
>   }
>   
> -int task_can_attach(struct task_struct *p,
> -		    const struct cpumask *cs_effective_cpus)
> +int task_can_attach(struct task_struct *p)
>   {
>   	int ret = 0;
>   
> @@ -9215,21 +9214,9 @@ int task_can_attach(struct task_struct *p,
>   	 * success of set_cpus_allowed_ptr() on all attached tasks
>   	 * before cpus_mask may be changed.
>   	 */
> -	if (p->flags & PF_NO_SETAFFINITY) {
> +	if (p->flags & PF_NO_SETAFFINITY)
>   		ret = -EINVAL;
> -		goto out;
> -	}
> -
> -	if (dl_task(p) && !cpumask_intersects(task_rq(p)->rd->span,
> -					      cs_effective_cpus)) {
> -		int cpu = cpumask_any_and(cpu_active_mask, cs_effective_cpus);
>   
> -		if (unlikely(cpu >= nr_cpu_ids))
> -			return -EINVAL;
> -		ret = dl_bw_alloc(cpu, p->dl.dl_bw);
> -	}
> -
> -out:
>   	return ret;
>   }
>   


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 0/2] sched/cpuset: Fix DL BW accounting in case can_attach() fails
       [not found] ` <20230322135959.1998790-1-dietmar.eggemann-5wv7dgnIgG8@public.gmane.org>
  2023-03-22 13:59   ` [RFC PATCH 1/2] sched/deadline: Create DL BW alloc, free & check overflow interface Dietmar Eggemann
  2023-03-22 13:59   ` [RFC PATCH 2/2] cgroup/cpuset: Free DL BW in case can_attach() fails Dietmar Eggemann
@ 2023-03-23  9:33   ` Juri Lelli
       [not found]     ` <ZBwc4l0ZyyRQPiSP-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  2 siblings, 1 reply; 6+ messages in thread
From: Juri Lelli @ 2023-03-23  9:33 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, Peter Zijlstra, Waiman Long, Tejun Heo, Zefan Li,
	Johannes Weiner, Hao Luo, Steven Rostedt, Vincent Guittot,
	Daniel Bristot de Oliveira, Luca Abeni, Tommaso Cucinotta,
	Qais Yousef, Wei Wang, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Hi,

On 22/03/23 14:59, Dietmar Eggemann wrote:
> I followed Longman's idea to add a `deadline task transfer count` into
> cpuset and only update the `dl task count` in cpuset_attach().
> 
> Moreover, I switched from per-task DL BW request to a per-cpuset one.
> This way we don't have to free per-task in case xxx_can_attach() fails.
> 
> The DL BW freeing is handled in cpuset_cancel_attach() for the case
> `multiple controllers and one of the non-cpuset can_attach() fails`.
> 
> Only lightly tested on cgroup v1 with exclusive cpusets so far.

This makes sense to me. Thanks for working on it!

Guess I might incorporate these in my (RFC) series and re-post the whole
lot?

Best,
Juri


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 0/2] sched/cpuset: Fix DL BW accounting in case can_attach() fails
       [not found]     ` <ZBwc4l0ZyyRQPiSP-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2023-03-24 14:56       ` Dietmar Eggemann
  0 siblings, 0 replies; 6+ messages in thread
From: Dietmar Eggemann @ 2023-03-24 14:56 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Ingo Molnar, Peter Zijlstra, Waiman Long, Tejun Heo, Zefan Li,
	Johannes Weiner, Hao Luo, Steven Rostedt, Vincent Guittot,
	Daniel Bristot de Oliveira, Luca Abeni, Tommaso Cucinotta,
	Qais Yousef, Wei Wang, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On 23/03/2023 10:33, Juri Lelli wrote:
> Hi,
> 
> On 22/03/23 14:59, Dietmar Eggemann wrote:

[...]

> This makes sense to me. Thanks for working on it!
> 
> Guess I might incorporate these in my (RFC) series and re-post the whole
> lot?

Yes, please do so.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-03-24 14:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-03-22 13:59 [RFC PATCH 0/2] sched/cpuset: Fix DL BW accounting in case can_attach() fails Dietmar Eggemann
     [not found] ` <20230322135959.1998790-1-dietmar.eggemann-5wv7dgnIgG8@public.gmane.org>
2023-03-22 13:59   ` [RFC PATCH 1/2] sched/deadline: Create DL BW alloc, free & check overflow interface Dietmar Eggemann
2023-03-22 13:59   ` [RFC PATCH 2/2] cgroup/cpuset: Free DL BW in case can_attach() fails Dietmar Eggemann
     [not found]     ` <20230322135959.1998790-3-dietmar.eggemann-5wv7dgnIgG8@public.gmane.org>
2023-03-22 14:48       ` Waiman Long
2023-03-23  9:33   ` [RFC PATCH 0/2] sched/cpuset: Fix DL BW accounting " Juri Lelli
     [not found]     ` <ZBwc4l0ZyyRQPiSP-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2023-03-24 14:56       ` Dietmar Eggemann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox