[PATCH 0/3] SCHED_DEADLINE fix AC and SMP scheduling

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] SCHED_DEADLINE fix AC and SMP scheduling
@ 2014-09-19  9:22 Juri Lelli
  2014-09-19  9:22 ` [PATCH 1/3] sched/deadline: clear dl_entity params when setscheduling to different class Juri Lelli
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Juri Lelli @ 2014-09-19  9:22 UTC (permalink / raw)
  To: peterz
  Cc: mingo, juri.lelli, raistlin, michael, fchecconi, daniel.wagner,
	vincent, luca.abeni, linux-kernel, Juri Lelli

Hello everyone,

This patchset fixes admission control, bandwidth management and
(clustered) SMP scheduling for SCHED_DEADLINE tasks.

Patch 1/3 properly clears things up when a task leaves SCHED_DEADLINE,
without dying (and it is a different solution for the problem spotted
by Daniel, http://marc.info/?l=linux-kernel&m=140974782707197&w=2);
patch 2/3 fixes how cpusets bandwidth is checked and updated; patch 3/3
fixes SMP (clustered) scheduling.

Best Regards,

- Juri

Juri Lelli (3):
  sched/deadline: clear dl_entity params when setscheduling to different
    class
  sched/deadline: fix bandwidth check/update when migrating tasks
    between exclusive cpusets
  sched/deadline: fix inter- exclusive cpusets migrations

 include/linux/sched.h      |  2 ++
 kernel/cpuset.c            | 13 ++-----
 kernel/sched/core.c        | 89 ++++++++++++++++++++++++++++++++++------------
 kernel/sched/cpudeadline.c |  4 +--
 kernel/sched/deadline.c    | 34 ++++++++++++++++--
 kernel/sched/sched.h       | 22 ++++++++++++
 6 files changed, 125 insertions(+), 39 deletions(-)

-- 
2.1.0



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/3] sched/deadline: clear dl_entity params when setscheduling to different class
  2014-09-19  9:22 [PATCH 0/3] SCHED_DEADLINE fix AC and SMP scheduling Juri Lelli
@ 2014-09-19  9:22 ` Juri Lelli
  2014-09-19 11:44   ` Daniel Wagner
                     ` (3 more replies)
  2014-09-19  9:22 ` [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets Juri Lelli
  2014-09-19  9:22 ` [PATCH 3/3] sched/deadline: fix inter- exclusive cpusets migrations Juri Lelli
  2 siblings, 4 replies; 32+ messages in thread
From: Juri Lelli @ 2014-09-19  9:22 UTC (permalink / raw)
  To: peterz
  Cc: mingo, juri.lelli, raistlin, michael, fchecconi, daniel.wagner,
	vincent, luca.abeni, linux-kernel, Juri Lelli

When a task is using SCHED_DEADLINE and the user setschedules it to a different
class its sched_dl_entity static parameters are not cleaned up. This causes a
bug if the user sets it back to SCHED_DEADLINE with the same parameters again.
The problem resides in the check we perform at the very beginning of
dl_overflow():

	if (new_bw == p->dl.dl_bw)
		return 0;

This condition is met in the case depicted above, so the function returns and
dl_b->total_bw is not updated (the p->dl.dl_bw is not added to it). After this,
admission control is broken.

This patch fixes the thing, properly clearing static parameters for a task
that ceases to use SCHED_DEADLINE.

Reported-by: Daniele Alessandrelli <daniele.alessandrelli@gmail.com>
Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Reported-by: Vincent Legout <vincent@legout.info>
Tested-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Michael Trimarchi <michael@amarulasolutions.com>
Cc: Fabio Checconi <fchecconi@gmail.com>
Cc: linux-kernel@vger.kernel.org
---
 kernel/sched/core.c     | 19 +++++++++++++++----
 kernel/sched/deadline.c |  2 ++
 kernel/sched/sched.h    |  3 +++
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec1a286..581a429 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1776,6 +1776,20 @@ int wake_up_state(struct task_struct *p, unsigned int state)
 }
 
 /*
+ * This function clears the sched_dl_entity static params.
+ */
+void __dl_clear_params(struct task_struct *p)
+{
+	struct sched_dl_entity *dl_se = &p->dl;
+
+	dl_se->dl_runtime = 0;
+	dl_se->dl_deadline = 0;
+	dl_se->dl_period = 0;
+	dl_se->flags = 0;
+	dl_se->dl_bw = 0;
+}
+
+/*
  * Perform scheduler related setup for a newly forked process p.
  * p is forked by current.
  *
@@ -1799,10 +1813,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 
 	RB_CLEAR_NODE(&p->dl.rb_node);
 	hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	p->dl.dl_runtime = p->dl.runtime = 0;
-	p->dl.dl_deadline = p->dl.deadline = 0;
-	p->dl.dl_period = 0;
-	p->dl.flags = 0;
+	__dl_clear_params(p);
 
 	INIT_LIST_HEAD(&p->rt.run_list);
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 255ce13..4a51b14 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1569,6 +1569,8 @@ static void switched_from_dl(struct rq *rq, struct task_struct *p)
 	if (hrtimer_active(&p->dl.dl_timer) && !dl_policy(p->policy))
 		hrtimer_try_to_cancel(&p->dl.dl_timer);
 
+	__dl_clear_params(p);
+
 #ifdef CONFIG_SMP
 	/*
 	 * Since this might be the only -deadline task on the rq,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 579712f..4890484 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -126,6 +126,9 @@ struct rt_bandwidth {
 	u64			rt_runtime;
 	struct hrtimer		rt_period_timer;
 };
+
+void __dl_clear_params(struct task_struct *p);
+
 /*
  * To keep the bandwidth of -deadline tasks and groups under control
  * we need some place where:
-- 
2.1.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] sched/deadline: clear dl_entity params when setscheduling to different class
  2014-09-19  9:22 ` [PATCH 1/3] sched/deadline: clear dl_entity params when setscheduling to different class Juri Lelli
@ 2014-09-19 11:44   ` Daniel Wagner
  2014-09-19 12:43     ` Juri Lelli
  2014-09-22 18:50   ` Vincent Legout
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Daniel Wagner @ 2014-09-19 11:44 UTC (permalink / raw)
  To: Juri Lelli, peterz
  Cc: mingo, juri.lelli, raistlin, michael, fchecconi, vincent,
	luca.abeni, linux-kernel

Hi,

On 09/19/2014 11:22 AM, Juri Lelli wrote:
> When a task is using SCHED_DEADLINE and the user setschedules it to a different
> class its sched_dl_entity static parameters are not cleaned up. This causes a
> bug if the user sets it back to SCHED_DEADLINE with the same parameters again.
> The problem resides in the check we perform at the very beginning of
> dl_overflow():
> 
> 	if (new_bw == p->dl.dl_bw)
> 		return 0;
> 
> This condition is met in the case depicted above, so the function returns and
> dl_b->total_bw is not updated (the p->dl.dl_bw is not added to it). After this,
> admission control is broken.
> 
> This patch fixes the thing, properly clearing static parameters for a task
> that ceases to use SCHED_DEADLINE.
> 
> Reported-by: Daniele Alessandrelli <daniele.alessandrelli@gmail.com>
> Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>

Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>

Fixes my problem.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] sched/deadline: clear dl_entity params when setscheduling to different class
  2014-09-19 11:44   ` Daniel Wagner
@ 2014-09-19 12:43     ` Juri Lelli
  0 siblings, 0 replies; 32+ messages in thread
From: Juri Lelli @ 2014-09-19 12:43 UTC (permalink / raw)
  To: Daniel Wagner, peterz@infradead.org
  Cc: mingo@redhat.com, juri.lelli@gmail.com, raistlin@linux.it,
	michael@amarulasolutions.com, fchecconi@gmail.com,
	vincent@legout.info, luca.abeni@unitn.it,
	linux-kernel@vger.kernel.org

Hi Daniel,

On 19/09/14 12:44, Daniel Wagner wrote:
> Hi,
> 
> On 09/19/2014 11:22 AM, Juri Lelli wrote:
>> When a task is using SCHED_DEADLINE and the user setschedules it to a different
>> class its sched_dl_entity static parameters are not cleaned up. This causes a
>> bug if the user sets it back to SCHED_DEADLINE with the same parameters again.
>> The problem resides in the check we perform at the very beginning of
>> dl_overflow():
>>
>> 	if (new_bw == p->dl.dl_bw)
>> 		return 0;
>>
>> This condition is met in the case depicted above, so the function returns and
>> dl_b->total_bw is not updated (the p->dl.dl_bw is not added to it). After this,
>> admission control is broken.
>>
>> This patch fixes the thing, properly clearing static parameters for a task
>> that ceases to use SCHED_DEADLINE.
>>
>> Reported-by: Daniele Alessandrelli <daniele.alessandrelli@gmail.com>
>> Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
> 
> Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
> 
> Fixes my problem.
> 

Great! Thanks for testing it.

Best,

- Juri


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] sched/deadline: clear dl_entity params when setscheduling to different class
  2014-09-19  9:22 ` [PATCH 1/3] sched/deadline: clear dl_entity params when setscheduling to different class Juri Lelli
  2014-09-19 11:44   ` Daniel Wagner
@ 2014-09-22 18:50   ` Vincent Legout
  2014-09-24 14:54   ` [tip:sched/core] sched/deadline: Clear " tip-bot for Juri Lelli
  2014-10-08 12:32   ` [PATCH 1/3] sched/deadline: clear " Wanpeng Li
  3 siblings, 0 replies; 32+ messages in thread
From: Vincent Legout @ 2014-09-22 18:50 UTC (permalink / raw)
  To: Juri Lelli
  Cc: peterz, mingo, juri.lelli, raistlin, michael, fchecconi,
	daniel.wagner, luca.abeni, linux-kernel

Hello,

Juri Lelli <juri.lelli@arm.com> writes:

> When a task is using SCHED_DEADLINE and the user setschedules it to a different
> class its sched_dl_entity static parameters are not cleaned up. This causes a
> bug if the user sets it back to SCHED_DEADLINE with the same parameters again.
> The problem resides in the check we perform at the very beginning of
> dl_overflow():
>
> 	if (new_bw == p->dl.dl_bw)
> 		return 0;
>
> This condition is met in the case depicted above, so the function returns and
> dl_b->total_bw is not updated (the p->dl.dl_bw is not added to it). After this,
> admission control is broken.
>
> This patch fixes the thing, properly clearing static parameters for a task
> that ceases to use SCHED_DEADLINE.

Thanks, this fixes the issue I had:

Tested-by: Vincent Legout <vincent@legout.info>

Thanks,
Vincent

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [tip:sched/core] sched/deadline: Clear dl_entity params when setscheduling to different class
  2014-09-19  9:22 ` [PATCH 1/3] sched/deadline: clear dl_entity params when setscheduling to different class Juri Lelli
  2014-09-19 11:44   ` Daniel Wagner
  2014-09-22 18:50   ` Vincent Legout
@ 2014-09-24 14:54   ` tip-bot for Juri Lelli
  2014-10-08 12:32   ` [PATCH 1/3] sched/deadline: clear " Wanpeng Li
  3 siblings, 0 replies; 32+ messages in thread
From: tip-bot for Juri Lelli @ 2014-09-24 14:54 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, luca.abeni, hpa, mingo, torvalds, peterz,
	daniel.wagner, daniele.alessandrelli, raistlin, fchecconi,
	vincent, juri.lelli, michael, tglx

Commit-ID:  a5e7be3b28a235108c59561bea55eea1072b23b0
Gitweb:     http://git.kernel.org/tip/a5e7be3b28a235108c59561bea55eea1072b23b0
Author:     Juri Lelli <juri.lelli@arm.com>
AuthorDate: Fri, 19 Sep 2014 10:22:39 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 24 Sep 2014 14:46:56 +0200

sched/deadline: Clear dl_entity params when setscheduling to different class

When a task is using SCHED_DEADLINE and the user setschedules it to a
different class its sched_dl_entity static parameters are not cleaned
up. This causes a bug if the user sets it back to SCHED_DEADLINE with
the same parameters again.  The problem resides in the check we
perform at the very beginning of dl_overflow():

	if (new_bw == p->dl.dl_bw)
		return 0;

This condition is met in the case depicted above, so the function
returns and dl_b->total_bw is not updated (the p->dl.dl_bw is not
added to it). After this, admission control is broken.

This patch fixes the thing, properly clearing static parameters for a
task that ceases to use SCHED_DEADLINE.

Reported-by: Daniele Alessandrelli <daniele.alessandrelli@gmail.com>
Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Reported-by: Vincent Legout <vincent@legout.info>
Tested-by: Luca Abeni <luca.abeni@unitn.it>
Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Tested-by: Vincent Legout <vincent@legout.info>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Fabio Checconi <fchecconi@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Michael Trimarchi <michael@amarulasolutions.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1411118561-26323-2-git-send-email-juri.lelli@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c     | 19 +++++++++++++++----
 kernel/sched/deadline.c |  2 ++
 kernel/sched/sched.h    |  3 +++
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a284190..09bde2a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1809,6 +1809,20 @@ int wake_up_state(struct task_struct *p, unsigned int state)
 }
 
 /*
+ * This function clears the sched_dl_entity static params.
+ */
+void __dl_clear_params(struct task_struct *p)
+{
+	struct sched_dl_entity *dl_se = &p->dl;
+
+	dl_se->dl_runtime = 0;
+	dl_se->dl_deadline = 0;
+	dl_se->dl_period = 0;
+	dl_se->flags = 0;
+	dl_se->dl_bw = 0;
+}
+
+/*
  * Perform scheduler related setup for a newly forked process p.
  * p is forked by current.
  *
@@ -1832,10 +1846,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 
 	RB_CLEAR_NODE(&p->dl.rb_node);
 	hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	p->dl.dl_runtime = p->dl.runtime = 0;
-	p->dl.dl_deadline = p->dl.deadline = 0;
-	p->dl.dl_period = 0;
-	p->dl.flags = 0;
+	__dl_clear_params(p);
 
 	INIT_LIST_HEAD(&p->rt.run_list);
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index aaa5abb..efb9412 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1565,6 +1565,8 @@ static void switched_from_dl(struct rq *rq, struct task_struct *p)
 	if (hrtimer_active(&p->dl.dl_timer) && !dl_policy(p->policy))
 		hrtimer_try_to_cancel(&p->dl.dl_timer);
 
+	__dl_clear_params(p);
+
 #ifdef CONFIG_SMP
 	/*
 	 * Since this might be the only -deadline task on the rq,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1bc6aad..76f3a38 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -130,6 +130,9 @@ struct rt_bandwidth {
 	u64			rt_runtime;
 	struct hrtimer		rt_period_timer;
 };
+
+void __dl_clear_params(struct task_struct *p);
+
 /*
  * To keep the bandwidth of -deadline tasks and groups under control
  * we need some place where:

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] sched/deadline: clear dl_entity params when setscheduling to different class
  2014-09-19  9:22 ` [PATCH 1/3] sched/deadline: clear dl_entity params when setscheduling to different class Juri Lelli
                     ` (2 preceding siblings ...)
  2014-09-24 14:54   ` [tip:sched/core] sched/deadline: Clear " tip-bot for Juri Lelli
@ 2014-10-08 12:32   ` Wanpeng Li
  2014-10-21 12:15     ` Wanpeng Li
  3 siblings, 1 reply; 32+ messages in thread
From: Wanpeng Li @ 2014-10-08 12:32 UTC (permalink / raw)
  To: Juri Lelli, peterz
  Cc: mingo, juri.lelli, raistlin, michael, fchecconi, daniel.wagner,
	vincent, luca.abeni, linux-kernel

Hi Juri,

于 9/19/14, 5:22 PM, Juri Lelli 写道:
> When a task is using SCHED_DEADLINE and the user setschedules it to a different
> class its sched_dl_entity static parameters are not cleaned up. This causes a
> bug if the user sets it back to SCHED_DEADLINE with the same parameters again.
> The problem resides in the check we perform at the very beginning of
> dl_overflow():
>
> 	if (new_bw == p->dl.dl_bw)
> 		return 0;
>
> This condition is met in the case depicted above, so the function returns and
> dl_b->total_bw is not updated (the p->dl.dl_bw is not added to it). After this,

As you know there is no static parameter clear before this patch, so if 
p->dl.dl_bw will decrease from dl_b->total_bw when the user setschedules 
to a different class instead of dl?

Regards,
Wanpeng Li

> admission control is broken.
>
> This patch fixes the thing, properly clearing static parameters for a task
> that ceases to use SCHED_DEADLINE.
>
> Reported-by: Daniele Alessandrelli <daniele.alessandrelli@gmail.com>
> Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
> Reported-by: Vincent Legout <vincent@legout.info>
> Tested-by: Luca Abeni <luca.abeni@unitn.it>
> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@gmail.com>
> Cc: Dario Faggioli <raistlin@linux.it>
> Cc: Michael Trimarchi <michael@amarulasolutions.com>
> Cc: Fabio Checconi <fchecconi@gmail.com>
> Cc: linux-kernel@vger.kernel.org
> ---
>   kernel/sched/core.c     | 19 +++++++++++++++----
>   kernel/sched/deadline.c |  2 ++
>   kernel/sched/sched.h    |  3 +++
>   3 files changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ec1a286..581a429 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1776,6 +1776,20 @@ int wake_up_state(struct task_struct *p, unsigned int state)
>   }
>   
>   /*
> + * This function clears the sched_dl_entity static params.
> + */
> +void __dl_clear_params(struct task_struct *p)
> +{
> +	struct sched_dl_entity *dl_se = &p->dl;
> +
> +	dl_se->dl_runtime = 0;
> +	dl_se->dl_deadline = 0;
> +	dl_se->dl_period = 0;
> +	dl_se->flags = 0;
> +	dl_se->dl_bw = 0;
> +}
> +
> +/*
>    * Perform scheduler related setup for a newly forked process p.
>    * p is forked by current.
>    *
> @@ -1799,10 +1813,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
>   
>   	RB_CLEAR_NODE(&p->dl.rb_node);
>   	hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> -	p->dl.dl_runtime = p->dl.runtime = 0;
> -	p->dl.dl_deadline = p->dl.deadline = 0;
> -	p->dl.dl_period = 0;
> -	p->dl.flags = 0;
> +	__dl_clear_params(p);
>   
>   	INIT_LIST_HEAD(&p->rt.run_list);
>   
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 255ce13..4a51b14 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1569,6 +1569,8 @@ static void switched_from_dl(struct rq *rq, struct task_struct *p)
>   	if (hrtimer_active(&p->dl.dl_timer) && !dl_policy(p->policy))
>   		hrtimer_try_to_cancel(&p->dl.dl_timer);
>   
> +	__dl_clear_params(p);
> +
>   #ifdef CONFIG_SMP
>   	/*
>   	 * Since this might be the only -deadline task on the rq,
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 579712f..4890484 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -126,6 +126,9 @@ struct rt_bandwidth {
>   	u64			rt_runtime;
>   	struct hrtimer		rt_period_timer;
>   };
> +
> +void __dl_clear_params(struct task_struct *p);
> +
>   /*
>    * To keep the bandwidth of -deadline tasks and groups under control
>    * we need some place where:


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] sched/deadline: clear dl_entity params when setscheduling to different class
  2014-10-08 12:32   ` [PATCH 1/3] sched/deadline: clear " Wanpeng Li
@ 2014-10-21 12:15     ` Wanpeng Li
  2014-10-21 13:15       ` Juri Lelli
  0 siblings, 1 reply; 32+ messages in thread
From: Wanpeng Li @ 2014-10-21 12:15 UTC (permalink / raw)
  To: Juri Lelli, peterz
  Cc: mingo, juri.lelli, raistlin, michael, fchecconi, daniel.wagner,
	vincent, luca.abeni, linux-kernel

ping,

于 10/8/14, 8:32 PM, Wanpeng Li 写道:
> Hi Juri,
>
> 于 9/19/14, 5:22 PM, Juri Lelli 写道:
>> When a task is using SCHED_DEADLINE and the user setschedules it to a 
>> different
>> class its sched_dl_entity static parameters are not cleaned up. This 
>> causes a
>> bug if the user sets it back to SCHED_DEADLINE with the same 
>> parameters again.
>> The problem resides in the check we perform at the very beginning of
>> dl_overflow():
>>
>>     if (new_bw == p->dl.dl_bw)
>>         return 0;
>>
>> This condition is met in the case depicted above, so the function 
>> returns and
>> dl_b->total_bw is not updated (the p->dl.dl_bw is not added to it). 
>> After this,
>
> As you know there is no static parameter clear before this patch, so 
> if p->dl.dl_bw will decrease from dl_b->total_bw when the user 
> setschedules to a different class instead of dl?
>
> Regards,
> Wanpeng Li
>
>> admission control is broken.
>>
>> This patch fixes the thing, properly clearing static parameters for a 
>> task
>> that ceases to use SCHED_DEADLINE.
>>
>> Reported-by: Daniele Alessandrelli <daniele.alessandrelli@gmail.com>
>> Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
>> Reported-by: Vincent Legout <vincent@legout.info>
>> Tested-by: Luca Abeni <luca.abeni@unitn.it>
>> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Juri Lelli <juri.lelli@gmail.com>
>> Cc: Dario Faggioli <raistlin@linux.it>
>> Cc: Michael Trimarchi <michael@amarulasolutions.com>
>> Cc: Fabio Checconi <fchecconi@gmail.com>
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>   kernel/sched/core.c     | 19 +++++++++++++++----
>>   kernel/sched/deadline.c |  2 ++
>>   kernel/sched/sched.h    |  3 +++
>>   3 files changed, 20 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index ec1a286..581a429 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1776,6 +1776,20 @@ int wake_up_state(struct task_struct *p, 
>> unsigned int state)
>>   }
>>     /*
>> + * This function clears the sched_dl_entity static params.
>> + */
>> +void __dl_clear_params(struct task_struct *p)
>> +{
>> +    struct sched_dl_entity *dl_se = &p->dl;
>> +
>> +    dl_se->dl_runtime = 0;
>> +    dl_se->dl_deadline = 0;
>> +    dl_se->dl_period = 0;
>> +    dl_se->flags = 0;
>> +    dl_se->dl_bw = 0;
>> +}
>> +
>> +/*
>>    * Perform scheduler related setup for a newly forked process p.
>>    * p is forked by current.
>>    *
>> @@ -1799,10 +1813,7 @@ static void __sched_fork(unsigned long 
>> clone_flags, struct task_struct *p)
>>         RB_CLEAR_NODE(&p->dl.rb_node);
>>       hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>> -    p->dl.dl_runtime = p->dl.runtime = 0;
>> -    p->dl.dl_deadline = p->dl.deadline = 0;
>> -    p->dl.dl_period = 0;
>> -    p->dl.flags = 0;
>> +    __dl_clear_params(p);
>>         INIT_LIST_HEAD(&p->rt.run_list);
>>   diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>> index 255ce13..4a51b14 100644
>> --- a/kernel/sched/deadline.c
>> +++ b/kernel/sched/deadline.c
>> @@ -1569,6 +1569,8 @@ static void switched_from_dl(struct rq *rq, 
>> struct task_struct *p)
>>       if (hrtimer_active(&p->dl.dl_timer) && !dl_policy(p->policy))
>>           hrtimer_try_to_cancel(&p->dl.dl_timer);
>>   +    __dl_clear_params(p);
>> +
>>   #ifdef CONFIG_SMP
>>       /*
>>        * Since this might be the only -deadline task on the rq,
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 579712f..4890484 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -126,6 +126,9 @@ struct rt_bandwidth {
>>       u64            rt_runtime;
>>       struct hrtimer        rt_period_timer;
>>   };
>> +
>> +void __dl_clear_params(struct task_struct *p);
>> +
>>   /*
>>    * To keep the bandwidth of -deadline tasks and groups under control
>>    * we need some place where:
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] sched/deadline: clear dl_entity params when setscheduling to different class
  2014-10-21 12:15     ` Wanpeng Li
@ 2014-10-21 13:15       ` Juri Lelli
  0 siblings, 0 replies; 32+ messages in thread
From: Juri Lelli @ 2014-10-21 13:15 UTC (permalink / raw)
  To: Wanpeng Li, peterz@infradead.org
  Cc: mingo@redhat.com, juri.lelli@gmail.com, raistlin@linux.it,
	michael@amarulasolutions.com, fchecconi@gmail.com,
	daniel.wagner@bmw-carit.de, vincent@legout.info,
	luca.abeni@unitn.it, linux-kernel@vger.kernel.org

Hi,

On 21/10/14 13:15, Wanpeng Li wrote:
> ping,
> 
> 于 10/8/14, 8:32 PM, Wanpeng Li 写道:
>> Hi Juri,
>>
>> 于 9/19/14, 5:22 PM, Juri Lelli 写道:
>>> When a task is using SCHED_DEADLINE and the user setschedules it to a 
>>> different
>>> class its sched_dl_entity static parameters are not cleaned up. This 
>>> causes a
>>> bug if the user sets it back to SCHED_DEADLINE with the same 
>>> parameters again.
>>> The problem resides in the check we perform at the very beginning of
>>> dl_overflow():
>>>
>>>     if (new_bw == p->dl.dl_bw)
>>>         return 0;
>>>
>>> This condition is met in the case depicted above, so the function 
>>> returns and
>>> dl_b->total_bw is not updated (the p->dl.dl_bw is not added to it). 
>>> After this,
>>
>> As you know there is no static parameter clear before this patch, so 
>> if p->dl.dl_bw will decrease from dl_b->total_bw when the user 
>> setschedules to a different class instead of dl?
>>

We remove p->dl.dl_bw from dl_b->total_bw in dl_overflow(). Then this
patch introduces the clearing of p->dl params.

Thanks,

- Juri

>> Regards,
>> Wanpeng Li
>>
>>> admission control is broken.
>>>
>>> This patch fixes the thing, properly clearing static parameters for a 
>>> task
>>> that ceases to use SCHED_DEADLINE.
>>>
>>> Reported-by: Daniele Alessandrelli <daniele.alessandrelli@gmail.com>
>>> Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
>>> Reported-by: Vincent Legout <vincent@legout.info>
>>> Tested-by: Luca Abeni <luca.abeni@unitn.it>
>>> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Cc: Juri Lelli <juri.lelli@gmail.com>
>>> Cc: Dario Faggioli <raistlin@linux.it>
>>> Cc: Michael Trimarchi <michael@amarulasolutions.com>
>>> Cc: Fabio Checconi <fchecconi@gmail.com>
>>> Cc: linux-kernel@vger.kernel.org
>>> ---
>>>   kernel/sched/core.c     | 19 +++++++++++++++----
>>>   kernel/sched/deadline.c |  2 ++
>>>   kernel/sched/sched.h    |  3 +++
>>>   3 files changed, 20 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index ec1a286..581a429 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -1776,6 +1776,20 @@ int wake_up_state(struct task_struct *p, 
>>> unsigned int state)
>>>   }
>>>     /*
>>> + * This function clears the sched_dl_entity static params.
>>> + */
>>> +void __dl_clear_params(struct task_struct *p)
>>> +{
>>> +    struct sched_dl_entity *dl_se = &p->dl;
>>> +
>>> +    dl_se->dl_runtime = 0;
>>> +    dl_se->dl_deadline = 0;
>>> +    dl_se->dl_period = 0;
>>> +    dl_se->flags = 0;
>>> +    dl_se->dl_bw = 0;
>>> +}
>>> +
>>> +/*
>>>    * Perform scheduler related setup for a newly forked process p.
>>>    * p is forked by current.
>>>    *
>>> @@ -1799,10 +1813,7 @@ static void __sched_fork(unsigned long 
>>> clone_flags, struct task_struct *p)
>>>         RB_CLEAR_NODE(&p->dl.rb_node);
>>>       hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>>> -    p->dl.dl_runtime = p->dl.runtime = 0;
>>> -    p->dl.dl_deadline = p->dl.deadline = 0;
>>> -    p->dl.dl_period = 0;
>>> -    p->dl.flags = 0;
>>> +    __dl_clear_params(p);
>>>         INIT_LIST_HEAD(&p->rt.run_list);
>>>   diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>>> index 255ce13..4a51b14 100644
>>> --- a/kernel/sched/deadline.c
>>> +++ b/kernel/sched/deadline.c
>>> @@ -1569,6 +1569,8 @@ static void switched_from_dl(struct rq *rq, 
>>> struct task_struct *p)
>>>       if (hrtimer_active(&p->dl.dl_timer) && !dl_policy(p->policy))
>>>           hrtimer_try_to_cancel(&p->dl.dl_timer);
>>>   +    __dl_clear_params(p);
>>> +
>>>   #ifdef CONFIG_SMP
>>>       /*
>>>        * Since this might be the only -deadline task on the rq,
>>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>>> index 579712f..4890484 100644
>>> --- a/kernel/sched/sched.h
>>> +++ b/kernel/sched/sched.h
>>> @@ -126,6 +126,9 @@ struct rt_bandwidth {
>>>       u64            rt_runtime;
>>>       struct hrtimer        rt_period_timer;
>>>   };
>>> +
>>> +void __dl_clear_params(struct task_struct *p);
>>> +
>>>   /*
>>>    * To keep the bandwidth of -deadline tasks and groups under control
>>>    * we need some place where:
>>
> 
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
  2014-09-19  9:22 [PATCH 0/3] SCHED_DEADLINE fix AC and SMP scheduling Juri Lelli
  2014-09-19  9:22 ` [PATCH 1/3] sched/deadline: clear dl_entity params when setscheduling to different class Juri Lelli
@ 2014-09-19  9:22 ` Juri Lelli
  2014-09-19 11:47     ` Daniel Wagner
                     ` (3 more replies)
  2014-09-19  9:22 ` [PATCH 3/3] sched/deadline: fix inter- exclusive cpusets migrations Juri Lelli
  2 siblings, 4 replies; 32+ messages in thread
From: Juri Lelli @ 2014-09-19  9:22 UTC (permalink / raw)
  To: peterz
  Cc: mingo, juri.lelli, raistlin, michael, fchecconi, daniel.wagner,
	vincent, luca.abeni, linux-kernel, Juri Lelli, Li Zefan, cgroups

Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
affinity (performing what is commonly called clustered scheduling).
Unfortunately, such thing is currently broken for two reasons:

 - No check is performed when the user tries to attach a task to
   an exlusive cpuset (recall that exclusive cpusets have an
   associated maximum allowed bandwidth).

 - Bandwidths of source and destination cpusets are not correctly
   updated after a task is migrated between them.

This patch fixes both things at once, as they are opposite faces
of the same coin.

The check is performed in cpuset_can_attach(), as there aren't any
points of failure after that function. The updated is split in two
halves. We first reserve bandwidth in the destination cpuset, after
we pass the check in cpuset_can_attach(). And we then release
bandwidth from the source cpuset when the task's affinity is
actually changed. Even if there can be time windows when sched_setattr()
may erroneously fail in the source cpuset, we are fine with it, as
we can't perfom an atomic update of both cpusets at once.

Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Reported-by: Vincent Legout <vincent@legout.info>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Michael Trimarchi <michael@amarulasolutions.com>
Cc: Fabio Checconi <fchecconi@gmail.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: linux-kernel@vger.kernel.org
Cc: cgroups@vger.kernel.org
---
 include/linux/sched.h   |  2 ++
 kernel/cpuset.c         | 13 ++-------
 kernel/sched/core.c     | 70 +++++++++++++++++++++++++++++++++++--------------
 kernel/sched/deadline.c | 25 ++++++++++++++++--
 kernel/sched/sched.h    | 19 ++++++++++++++
 5 files changed, 97 insertions(+), 32 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5c2c885..3a65995 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2029,6 +2029,8 @@ static inline void tsk_restore_flags(struct task_struct *task,
 	task->flags |= orig_flags & flags;
 }
 
+extern int task_can_attach(struct task_struct *p,
+			   const struct cpumask *cs_cpus_allowed);
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p,
 			       const struct cpumask *new_mask);
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 22874d7..ab9be24 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1428,17 +1428,8 @@ static int cpuset_can_attach(struct cgroup_subsys_state *css,
 		goto out_unlock;
 
 	cgroup_taskset_for_each(task, tset) {
-		/*
-		 * Kthreads which disallow setaffinity shouldn't be moved
-		 * to a new cpuset; we don't want to change their cpu
-		 * affinity and isolating such threads by their set of
-		 * allowed nodes is unnecessary.  Thus, cpusets are not
-		 * applicable for such threads.  This prevents checking for
-		 * success of set_cpus_allowed_ptr() on all attached tasks
-		 * before cpus_allowed may be changed.
-		 */
-		ret = -EINVAL;
-		if (task->flags & PF_NO_SETAFFINITY)
+		ret = task_can_attach(task, cs->cpus_allowed);
+		if (ret)
 			goto out_unlock;
 		ret = security_task_setscheduler(task);
 		if (ret)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 581a429..092143d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2013,25 +2013,6 @@ static inline int dl_bw_cpus(int i)
 }
 #endif
 
-static inline
-void __dl_clear(struct dl_bw *dl_b, u64 tsk_bw)
-{
-	dl_b->total_bw -= tsk_bw;
-}
-
-static inline
-void __dl_add(struct dl_bw *dl_b, u64 tsk_bw)
-{
-	dl_b->total_bw += tsk_bw;
-}
-
-static inline
-bool __dl_overflow(struct dl_bw *dl_b, int cpus, u64 old_bw, u64 new_bw)
-{
-	return dl_b->bw != -1 &&
-	       dl_b->bw * cpus < dl_b->total_bw - old_bw + new_bw;
-}
-
 /*
  * We must be sure that accepting a new task (or allowing changing the
  * parameters of an existing one) is consistent with the bandwidth
@@ -4606,6 +4587,57 @@ void init_idle(struct task_struct *idle, int cpu)
 #endif
 }
 
+int task_can_attach(struct task_struct *p,
+		    const struct cpumask *cs_cpus_allowed)
+{
+	int ret = 0;
+
+	/*
+	 * Kthreads which disallow setaffinity shouldn't be moved
+	 * to a new cpuset; we don't want to change their cpu
+	 * affinity and isolating such threads by their set of
+	 * allowed nodes is unnecessary.  Thus, cpusets are not
+	 * applicable for such threads.  This prevents checking for
+	 * success of set_cpus_allowed_ptr() on all attached tasks
+	 * before cpus_allowed may be changed.
+	 */
+	if (p->flags & PF_NO_SETAFFINITY) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+#ifdef CONFIG_SMP
+	if (dl_task(p) && !cpumask_intersects(task_rq(p)->rd->span,
+					      cs_cpus_allowed)) {
+		unsigned int dest_cpu = cpumask_any_and(cpu_active_mask,
+							cs_cpus_allowed);
+		struct dl_bw *dl_b = dl_bw_of(dest_cpu);
+		bool overflow;
+		int cpus;
+		unsigned long flags;
+
+		raw_spin_lock_irqsave(&dl_b->lock, flags);
+		cpus = dl_bw_cpus(dest_cpu);
+		overflow = __dl_overflow(dl_b, cpus, 0, p->dl.dl_bw);
+		if (overflow)
+			ret = -EBUSY;
+		else {
+			/*
+			 * We reserve space for this task in the destination
+			 * root_domain, as we can't fail after this point.
+			 * We will free resources in the source root_domain
+			 * later on (see set_cpus_allowed_dl()).
+			 */
+			__dl_add(dl_b, p->dl.dl_bw);
+		}
+		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
+
+	}
+#endif
+out:
+	return ret;
+}
+
 #ifdef CONFIG_SMP
 void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 {
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 4a51b14..56cb157 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1495,10 +1495,33 @@ static void set_cpus_allowed_dl(struct task_struct *p,
 				const struct cpumask *new_mask)
 {
 	struct rq *rq;
+	struct root_domain *src_rd;
 	int weight;
 
 	BUG_ON(!dl_task(p));
 
+	rq = task_rq(p);
+	src_rd = rq->rd;
+	/*
+	 * Migrating a SCHED_DEADLINE task between exclusive
+	 * cpusets (different root_domains) entails a bandwidth
+	 * update. We already made space for us in the destination
+	 * domain (see cpuset_can_attach()).
+	 */
+	if (!cpumask_intersects(src_rd->span, new_mask)) {
+		struct dl_bw *src_dl_b;
+
+		src_dl_b = dl_bw_of(cpu_of(rq));
+		/*
+		 * We now free resources of the root_domain we are migrating
+		 * off. In the worst case, sched_setattr() may temporary fail
+		 * until we complete the update.
+		 */
+		raw_spin_lock(&src_dl_b->lock);
+		__dl_clear(src_dl_b, p->dl.dl_bw);
+		raw_spin_unlock(&src_dl_b->lock);
+	}
+
 	/*
 	 * Update only if the task is actually running (i.e.,
 	 * it is on the rq AND it is not throttled).
@@ -1515,8 +1538,6 @@ static void set_cpus_allowed_dl(struct task_struct *p,
 	if ((p->nr_cpus_allowed > 1) == (weight > 1))
 		return;
 
-	rq = task_rq(p);
-
 	/*
 	 * The process used to be able to migrate OR it can now migrate
 	 */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4890484..a0e9e81 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -171,6 +171,25 @@ struct dl_bw {
 	u64 bw, total_bw;
 };
 
+static inline
+void __dl_clear(struct dl_bw *dl_b, u64 tsk_bw)
+{
+	dl_b->total_bw -= tsk_bw;
+}
+
+static inline
+void __dl_add(struct dl_bw *dl_b, u64 tsk_bw)
+{
+	dl_b->total_bw += tsk_bw;
+}
+
+static inline
+bool __dl_overflow(struct dl_bw *dl_b, int cpus, u64 old_bw, u64 new_bw)
+{
+	return dl_b->bw != -1 &&
+	       dl_b->bw * cpus < dl_b->total_bw - old_bw + new_bw;
+}
+
 extern struct mutex sched_domains_mutex;
 
 #ifdef CONFIG_CGROUP_SCHED
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
  2014-09-19  9:22 ` [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets Juri Lelli
@ 2014-09-19 11:47     ` Daniel Wagner
       [not found]   ` <1411118561-26323-3-git-send-email-juri.lelli-5wv7dgnIgG8@public.gmane.org>
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: Daniel Wagner @ 2014-09-19 11:47 UTC (permalink / raw)
  To: Juri Lelli, peterz
  Cc: mingo, juri.lelli, raistlin, michael, fchecconi, vincent,
	luca.abeni, linux-kernel, Li Zefan, cgroups

Hi,

On 09/19/2014 11:22 AM, Juri Lelli wrote:
> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
> affinity (performing what is commonly called clustered scheduling).
> Unfortunately, such thing is currently broken for two reasons:
> 
>  - No check is performed when the user tries to attach a task to
>    an exlusive cpuset (recall that exclusive cpusets have an
>    associated maximum allowed bandwidth).
> 
>  - Bandwidths of source and destination cpusets are not correctly
>    updated after a task is migrated between them.
> 
> This patch fixes both things at once, as they are opposite faces
> of the same coin.
> 
> The check is performed in cpuset_can_attach(), as there aren't any
> points of failure after that function. The updated is split in two
> halves. We first reserve bandwidth in the destination cpuset, after
> we pass the check in cpuset_can_attach(). And we then release
> bandwidth from the source cpuset when the task's affinity is
> actually changed. Even if there can be time windows when sched_setattr()
> may erroneously fail in the source cpuset, we are fine with it, as
> we can't perfom an atomic update of both cpusets at once.
> 
> Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>

Ack

I have no special test for this, I just let my test running which was
fixed by patch #1. Works fine though. I'll plan to write some test for this.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
@ 2014-09-19 11:47     ` Daniel Wagner
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Wagner @ 2014-09-19 11:47 UTC (permalink / raw)
  To: Juri Lelli, peterz
  Cc: mingo, juri.lelli, raistlin, michael, fchecconi, vincent,
	luca.abeni, linux-kernel, Li Zefan, cgroups

Hi,

On 09/19/2014 11:22 AM, Juri Lelli wrote:
> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
> affinity (performing what is commonly called clustered scheduling).
> Unfortunately, such thing is currently broken for two reasons:
> 
>  - No check is performed when the user tries to attach a task to
>    an exlusive cpuset (recall that exclusive cpusets have an
>    associated maximum allowed bandwidth).
> 
>  - Bandwidths of source and destination cpusets are not correctly
>    updated after a task is migrated between them.
> 
> This patch fixes both things at once, as they are opposite faces
> of the same coin.
> 
> The check is performed in cpuset_can_attach(), as there aren't any
> points of failure after that function. The updated is split in two
> halves. We first reserve bandwidth in the destination cpuset, after
> we pass the check in cpuset_can_attach(). And we then release
> bandwidth from the source cpuset when the task's affinity is
> actually changed. Even if there can be time windows when sched_setattr()
> may erroneously fail in the source cpuset, we are fine with it, as
> we can't perfom an atomic update of both cpusets at once.
> 
> Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>

Ack

I have no special test for this, I just let my test running which was
fixed by patch #1. Works fine though. I'll plan to write some test for this.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 32+ messages in thread

[parent not found: <541C17D6.5020608-98C5kh4wR6ohFhg+JK9F0w@public.gmane.org>]

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
  2014-09-19 11:47     ` Daniel Wagner
@ 2014-09-19 12:46         ` Juri Lelli
  -1 siblings, 0 replies; 32+ messages in thread
From: Juri Lelli @ 2014-09-19 12:46 UTC (permalink / raw)
  To: Daniel Wagner, peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
  Cc: mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	juri.lelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	raistlin-k2GhghHVRtY@public.gmane.org,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/@public.gmane.org,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	vincent-9z8vmPu0pS/iB9QmIjCX8w@public.gmane.org,
	luca.abeni-3IIOeSMMxS4@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Li Zefan,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hi Daniel,

On 19/09/14 12:47, Daniel Wagner wrote:
> Hi,
> 
> On 09/19/2014 11:22 AM, Juri Lelli wrote:
>> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
>> affinity (performing what is commonly called clustered scheduling).
>> Unfortunately, such thing is currently broken for two reasons:
>>
>>  - No check is performed when the user tries to attach a task to
>>    an exlusive cpuset (recall that exclusive cpusets have an
>>    associated maximum allowed bandwidth).
>>
>>  - Bandwidths of source and destination cpusets are not correctly
>>    updated after a task is migrated between them.
>>
>> This patch fixes both things at once, as they are opposite faces
>> of the same coin.
>>
>> The check is performed in cpuset_can_attach(), as there aren't any
>> points of failure after that function. The updated is split in two
>> halves. We first reserve bandwidth in the destination cpuset, after
>> we pass the check in cpuset_can_attach(). And we then release
>> bandwidth from the source cpuset when the task's affinity is
>> actually changed. Even if there can be time windows when sched_setattr()
>> may erroneously fail in the source cpuset, we are fine with it, as
>> we can't perfom an atomic update of both cpusets at once.
>>
>> Reported-by: Daniel Wagner <daniel.wagner-98C5kh4wR6ohFhg+JK9F0w@public.gmane.org>
> 
> Ack
> 
> I have no special test for this, I just let my test running which was
> fixed by patch #1. Works fine though. I'll plan to write some test for this.
> 

Ok, thanks. Just mind that the problem fixed by patch 3/3 may sometime
affect this too. I should have definitely put 3/3 on top of the patchset :/.

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
@ 2014-09-19 12:46         ` Juri Lelli
  0 siblings, 0 replies; 32+ messages in thread
From: Juri Lelli @ 2014-09-19 12:46 UTC (permalink / raw)
  To: Daniel Wagner, peterz@infradead.org
  Cc: mingo@redhat.com, juri.lelli@gmail.com, raistlin@linux.it,
	michael@amarulasolutions.com, fchecconi@gmail.com,
	vincent@legout.info, luca.abeni@unitn.it,
	linux-kernel@vger.kernel.org, Li Zefan, cgroups@vger.kernel.org

Hi Daniel,

On 19/09/14 12:47, Daniel Wagner wrote:
> Hi,
> 
> On 09/19/2014 11:22 AM, Juri Lelli wrote:
>> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
>> affinity (performing what is commonly called clustered scheduling).
>> Unfortunately, such thing is currently broken for two reasons:
>>
>>  - No check is performed when the user tries to attach a task to
>>    an exlusive cpuset (recall that exclusive cpusets have an
>>    associated maximum allowed bandwidth).
>>
>>  - Bandwidths of source and destination cpusets are not correctly
>>    updated after a task is migrated between them.
>>
>> This patch fixes both things at once, as they are opposite faces
>> of the same coin.
>>
>> The check is performed in cpuset_can_attach(), as there aren't any
>> points of failure after that function. The updated is split in two
>> halves. We first reserve bandwidth in the destination cpuset, after
>> we pass the check in cpuset_can_attach(). And we then release
>> bandwidth from the source cpuset when the task's affinity is
>> actually changed. Even if there can be time windows when sched_setattr()
>> may erroneously fail in the source cpuset, we are fine with it, as
>> we can't perfom an atomic update of both cpusets at once.
>>
>> Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
> 
> Ack
> 
> I have no special test for this, I just let my test running which was
> fixed by patch #1. Works fine though. I'll plan to write some test for this.
> 

Ok, thanks. Just mind that the problem fixed by patch 3/3 may sometime
affect this too. I should have definitely put 3/3 on top of the patchset :/.

Thanks,

- Juri


^ permalink raw reply	[flat|nested] 32+ messages in thread

[parent not found: <1411118561-26323-3-git-send-email-juri.lelli-5wv7dgnIgG8@public.gmane.org>]

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
  2014-09-19  9:22 ` [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets Juri Lelli
@ 2014-09-19 21:25       ` Peter Zijlstra
       [not found]   ` <1411118561-26323-3-git-send-email-juri.lelli-5wv7dgnIgG8@public.gmane.org>
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2014-09-19 21:25 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo-H+wXaHxf7aLQT0dZR+AlfA, juri.lelli-Re5JQEeQqe8AvxtiuMwx3w,
	raistlin-k2GhghHVRtY, michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w,
	daniel.wagner-98C5kh4wR6ohFhg+JK9F0w,
	vincent-9z8vmPu0pS/iB9QmIjCX8w, luca.abeni-3IIOeSMMxS4,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Li Zefan,
	cgroups-u79uwXL29TY76Z2rM5mHXA

On Fri, Sep 19, 2014 at 10:22:40AM +0100, Juri Lelli wrote:
> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
> affinity (performing what is commonly called clustered scheduling).
> Unfortunately, such thing is currently broken for two reasons:
> 
>  - No check is performed when the user tries to attach a task to
>    an exlusive cpuset (recall that exclusive cpusets have an
>    associated maximum allowed bandwidth).
> 
>  - Bandwidths of source and destination cpusets are not correctly
>    updated after a task is migrated between them.
> 
> This patch fixes both things at once, as they are opposite faces
> of the same coin.
> 
> The check is performed in cpuset_can_attach(), as there aren't any
> points of failure after that function. The updated is split in two
> halves. We first reserve bandwidth in the destination cpuset, after
> we pass the check in cpuset_can_attach(). And we then release
> bandwidth from the source cpuset when the task's affinity is
> actually changed. Even if there can be time windows when sched_setattr()
> may erroneously fail in the source cpuset, we are fine with it, as
> we can't perfom an atomic update of both cpusets at once.

The thing I cannot find is if we correctly deal with updates to the
cpuset. Say we first setup 2 (exclusive) sets A:cpu0 B:cpu1-3. Then
assign tasks and then update the cpu masks like: B:cpu2,3, A:cpu1,2.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
@ 2014-09-19 21:25       ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2014-09-19 21:25 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo, juri.lelli, raistlin, michael, fchecconi, daniel.wagner,
	vincent, luca.abeni, linux-kernel, Li Zefan, cgroups

On Fri, Sep 19, 2014 at 10:22:40AM +0100, Juri Lelli wrote:
> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
> affinity (performing what is commonly called clustered scheduling).
> Unfortunately, such thing is currently broken for two reasons:
> 
>  - No check is performed when the user tries to attach a task to
>    an exlusive cpuset (recall that exclusive cpusets have an
>    associated maximum allowed bandwidth).
> 
>  - Bandwidths of source and destination cpusets are not correctly
>    updated after a task is migrated between them.
> 
> This patch fixes both things at once, as they are opposite faces
> of the same coin.
> 
> The check is performed in cpuset_can_attach(), as there aren't any
> points of failure after that function. The updated is split in two
> halves. We first reserve bandwidth in the destination cpuset, after
> we pass the check in cpuset_can_attach(). And we then release
> bandwidth from the source cpuset when the task's affinity is
> actually changed. Even if there can be time windows when sched_setattr()
> may erroneously fail in the source cpuset, we are fine with it, as
> we can't perfom an atomic update of both cpusets at once.

The thing I cannot find is if we correctly deal with updates to the
cpuset. Say we first setup 2 (exclusive) sets A:cpu0 B:cpu1-3. Then
assign tasks and then update the cpu masks like: B:cpu2,3, A:cpu1,2.



^ permalink raw reply	[flat|nested] 32+ messages in thread

[parent not found: <20140919212547.GG2832-IIpfhp3q70wB9AHHLWeGtNQXobZC6xk2@public.gmane.org>]

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
  2014-09-19 21:25       ` Peter Zijlstra
@ 2014-09-23  8:12           ` Juri Lelli
  -1 siblings, 0 replies; 32+ messages in thread
From: Juri Lelli @ 2014-09-23  8:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	juri.lelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	raistlin-k2GhghHVRtY@public.gmane.org,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/@public.gmane.org,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	daniel.wagner-98C5kh4wR6ohFhg+JK9F0w@public.gmane.org,
	vincent-9z8vmPu0pS/iB9QmIjCX8w@public.gmane.org,
	luca.abeni-3IIOeSMMxS4@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Li Zefan,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hi Peter,

On 19/09/14 22:25, Peter Zijlstra wrote:
> On Fri, Sep 19, 2014 at 10:22:40AM +0100, Juri Lelli wrote:
>> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
>> affinity (performing what is commonly called clustered scheduling).
>> Unfortunately, such thing is currently broken for two reasons:
>>
>>  - No check is performed when the user tries to attach a task to
>>    an exlusive cpuset (recall that exclusive cpusets have an
>>    associated maximum allowed bandwidth).
>>
>>  - Bandwidths of source and destination cpusets are not correctly
>>    updated after a task is migrated between them.
>>
>> This patch fixes both things at once, as they are opposite faces
>> of the same coin.
>>
>> The check is performed in cpuset_can_attach(), as there aren't any
>> points of failure after that function. The updated is split in two
>> halves. We first reserve bandwidth in the destination cpuset, after
>> we pass the check in cpuset_can_attach(). And we then release
>> bandwidth from the source cpuset when the task's affinity is
>> actually changed. Even if there can be time windows when sched_setattr()
>> may erroneously fail in the source cpuset, we are fine with it, as
>> we can't perfom an atomic update of both cpusets at once.
> 
> The thing I cannot find is if we correctly deal with updates to the
> cpuset. Say we first setup 2 (exclusive) sets A:cpu0 B:cpu1-3. Then
> assign tasks and then update the cpu masks like: B:cpu2,3, A:cpu1,2.
> 

Right, next week I should be able to properly test this.

Thanks a lot,

- Juri

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
@ 2014-09-23  8:12           ` Juri Lelli
  0 siblings, 0 replies; 32+ messages in thread
From: Juri Lelli @ 2014-09-23  8:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo@redhat.com, juri.lelli@gmail.com, raistlin@linux.it,
	michael@amarulasolutions.com, fchecconi@gmail.com,
	daniel.wagner@bmw-carit.de, vincent@legout.info,
	luca.abeni@unitn.it, linux-kernel@vger.kernel.org, Li Zefan,
	cgroups@vger.kernel.org

Hi Peter,

On 19/09/14 22:25, Peter Zijlstra wrote:
> On Fri, Sep 19, 2014 at 10:22:40AM +0100, Juri Lelli wrote:
>> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
>> affinity (performing what is commonly called clustered scheduling).
>> Unfortunately, such thing is currently broken for two reasons:
>>
>>  - No check is performed when the user tries to attach a task to
>>    an exlusive cpuset (recall that exclusive cpusets have an
>>    associated maximum allowed bandwidth).
>>
>>  - Bandwidths of source and destination cpusets are not correctly
>>    updated after a task is migrated between them.
>>
>> This patch fixes both things at once, as they are opposite faces
>> of the same coin.
>>
>> The check is performed in cpuset_can_attach(), as there aren't any
>> points of failure after that function. The updated is split in two
>> halves. We first reserve bandwidth in the destination cpuset, after
>> we pass the check in cpuset_can_attach(). And we then release
>> bandwidth from the source cpuset when the task's affinity is
>> actually changed. Even if there can be time windows when sched_setattr()
>> may erroneously fail in the source cpuset, we are fine with it, as
>> we can't perfom an atomic update of both cpusets at once.
> 
> The thing I cannot find is if we correctly deal with updates to the
> cpuset. Say we first setup 2 (exclusive) sets A:cpu0 B:cpu1-3. Then
> assign tasks and then update the cpu masks like: B:cpu2,3, A:cpu1,2.
> 

Right, next week I should be able to properly test this.

Thanks a lot,

- Juri


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
  2014-09-19 21:25       ` Peter Zijlstra
@ 2014-10-07  8:59           ` Juri Lelli
  -1 siblings, 0 replies; 32+ messages in thread
From: Juri Lelli @ 2014-10-07  8:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	juri.lelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	raistlin-k2GhghHVRtY@public.gmane.org,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/@public.gmane.org,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	daniel.wagner-98C5kh4wR6ohFhg+JK9F0w@public.gmane.org,
	vincent-9z8vmPu0pS/iB9QmIjCX8w@public.gmane.org,
	luca.abeni-3IIOeSMMxS4@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Li Zefan,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hi Peter,

On 19/09/14 22:25, Peter Zijlstra wrote:
> On Fri, Sep 19, 2014 at 10:22:40AM +0100, Juri Lelli wrote:
>> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
>> affinity (performing what is commonly called clustered scheduling).
>> Unfortunately, such thing is currently broken for two reasons:
>>
>>  - No check is performed when the user tries to attach a task to
>>    an exlusive cpuset (recall that exclusive cpusets have an
>>    associated maximum allowed bandwidth).
>>
>>  - Bandwidths of source and destination cpusets are not correctly
>>    updated after a task is migrated between them.
>>
>> This patch fixes both things at once, as they are opposite faces
>> of the same coin.
>>
>> The check is performed in cpuset_can_attach(), as there aren't any
>> points of failure after that function. The updated is split in two
>> halves. We first reserve bandwidth in the destination cpuset, after
>> we pass the check in cpuset_can_attach(). And we then release
>> bandwidth from the source cpuset when the task's affinity is
>> actually changed. Even if there can be time windows when sched_setattr()
>> may erroneously fail in the source cpuset, we are fine with it, as
>> we can't perfom an atomic update of both cpusets at once.
> 
> The thing I cannot find is if we correctly deal with updates to the
> cpuset. Say we first setup 2 (exclusive) sets A:cpu0 B:cpu1-3. Then
> assign tasks and then update the cpu masks like: B:cpu2,3, A:cpu1,2.
> 

So, what follows should address the problem you describe.

Assuming you intended that we try to update masks as A:cpu0,3 and
B:cpu1,2, with what below we are able to check that removing cpu3
from B doesn't break guarantees. After that cpu3 can be put in A.

Does it make any sense?

Thanks,

- Juri

From 0e52c2211879eec92cd435e7717ab628f3b3084b Mon Sep 17 00:00:00 2001
From: Juri Lelli <juri.lelli-5wv7dgnIgG8@public.gmane.org>
Date: Tue, 7 Oct 2014 09:52:11 +0100
Subject: [PATCH] sched/deadline: ensure that updates to exclusive cpusets
 don't break AC

Signed-off-by: Juri Lelli <juri.lelli-5wv7dgnIgG8@public.gmane.org>
---
 include/linux/sched.h |  2 ++
 kernel/cpuset.c       | 10 ++++++++++
 kernel/sched/core.c   | 19 +++++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 163295f..24696d3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2041,6 +2041,8 @@ static inline void tsk_restore_flags(struct task_struct *task,
 	task->flags |= orig_flags & flags;
 }
 
+extern int cpuset_cpumask_can_shrink(const struct cpumask *cur,
+				     const struct cpumask *trial);
 extern int task_can_attach(struct task_struct *p,
 			   const struct cpumask *cs_cpus_allowed);
 #ifdef CONFIG_SMP
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 4a7ebde..f96b47f 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -506,6 +506,16 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
 			goto out;
 	}
 
+	/*
+	 * We can't shrink if we won't have enough room for SCHED_DEADLINE
+	 * tasks.
+	 */
+	ret = -EBUSY;
+	if (is_cpu_exclusive(cur) &&
+	    !cpuset_cpumask_can_shrink(cur->cpus_allowed,
+				       trial->cpus_allowed))
+		goto out;
+
 	ret = 0;
 out:
 	rcu_read_unlock();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 092143d..b4bd8fa 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4587,6 +4587,25 @@ void init_idle(struct task_struct *idle, int cpu)
 #endif
 }
 
+int cpuset_cpumask_can_shrink(const struct cpumask *cur,
+			      const struct cpumask *trial)
+{
+	int ret = 1, trial_cpus;
+	struct dl_bw *cur_dl_b;
+	unsigned long flags;
+
+	cur_dl_b = dl_bw_of(cpumask_any(cur));
+	trial_cpus = cpumask_weight(trial);
+
+	raw_spin_lock_irqsave(&cur_dl_b->lock, flags);
+	if (cur_dl_b->bw != -1 &&
+	    cur_dl_b->bw * trial_cpus < cur_dl_b->total_bw)
+		ret = 0;
+	raw_spin_unlock_irqrestore(&cur_dl_b->lock, flags);
+
+	return ret;
+}
+
 int task_can_attach(struct task_struct *p,
 		    const struct cpumask *cs_cpus_allowed)
 {
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
@ 2014-10-07  8:59           ` Juri Lelli
  0 siblings, 0 replies; 32+ messages in thread
From: Juri Lelli @ 2014-10-07  8:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo@redhat.com, juri.lelli@gmail.com, raistlin@linux.it,
	michael@amarulasolutions.com, fchecconi@gmail.com,
	daniel.wagner@bmw-carit.de, vincent@legout.info,
	luca.abeni@unitn.it, linux-kernel@vger.kernel.org, Li Zefan,
	cgroups@vger.kernel.org

Hi Peter,

On 19/09/14 22:25, Peter Zijlstra wrote:
> On Fri, Sep 19, 2014 at 10:22:40AM +0100, Juri Lelli wrote:
>> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
>> affinity (performing what is commonly called clustered scheduling).
>> Unfortunately, such thing is currently broken for two reasons:
>>
>>  - No check is performed when the user tries to attach a task to
>>    an exlusive cpuset (recall that exclusive cpusets have an
>>    associated maximum allowed bandwidth).
>>
>>  - Bandwidths of source and destination cpusets are not correctly
>>    updated after a task is migrated between them.
>>
>> This patch fixes both things at once, as they are opposite faces
>> of the same coin.
>>
>> The check is performed in cpuset_can_attach(), as there aren't any
>> points of failure after that function. The updated is split in two
>> halves. We first reserve bandwidth in the destination cpuset, after
>> we pass the check in cpuset_can_attach(). And we then release
>> bandwidth from the source cpuset when the task's affinity is
>> actually changed. Even if there can be time windows when sched_setattr()
>> may erroneously fail in the source cpuset, we are fine with it, as
>> we can't perfom an atomic update of both cpusets at once.
> 
> The thing I cannot find is if we correctly deal with updates to the
> cpuset. Say we first setup 2 (exclusive) sets A:cpu0 B:cpu1-3. Then
> assign tasks and then update the cpu masks like: B:cpu2,3, A:cpu1,2.
> 

So, what follows should address the problem you describe.

Assuming you intended that we try to update masks as A:cpu0,3 and
B:cpu1,2, with what below we are able to check that removing cpu3
from B doesn't break guarantees. After that cpu3 can be put in A.

Does it make any sense?

Thanks,

- Juri

>From 0e52c2211879eec92cd435e7717ab628f3b3084b Mon Sep 17 00:00:00 2001
From: Juri Lelli <juri.lelli@arm.com>
Date: Tue, 7 Oct 2014 09:52:11 +0100
Subject: [PATCH] sched/deadline: ensure that updates to exclusive cpusets
 don't break AC

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
---
 include/linux/sched.h |  2 ++
 kernel/cpuset.c       | 10 ++++++++++
 kernel/sched/core.c   | 19 +++++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 163295f..24696d3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2041,6 +2041,8 @@ static inline void tsk_restore_flags(struct task_struct *task,
 	task->flags |= orig_flags & flags;
 }
 
+extern int cpuset_cpumask_can_shrink(const struct cpumask *cur,
+				     const struct cpumask *trial);
 extern int task_can_attach(struct task_struct *p,
 			   const struct cpumask *cs_cpus_allowed);
 #ifdef CONFIG_SMP
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 4a7ebde..f96b47f 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -506,6 +506,16 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
 			goto out;
 	}
 
+	/*
+	 * We can't shrink if we won't have enough room for SCHED_DEADLINE
+	 * tasks.
+	 */
+	ret = -EBUSY;
+	if (is_cpu_exclusive(cur) &&
+	    !cpuset_cpumask_can_shrink(cur->cpus_allowed,
+				       trial->cpus_allowed))
+		goto out;
+
 	ret = 0;
 out:
 	rcu_read_unlock();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 092143d..b4bd8fa 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4587,6 +4587,25 @@ void init_idle(struct task_struct *idle, int cpu)
 #endif
 }
 
+int cpuset_cpumask_can_shrink(const struct cpumask *cur,
+			      const struct cpumask *trial)
+{
+	int ret = 1, trial_cpus;
+	struct dl_bw *cur_dl_b;
+	unsigned long flags;
+
+	cur_dl_b = dl_bw_of(cpumask_any(cur));
+	trial_cpus = cpumask_weight(trial);
+
+	raw_spin_lock_irqsave(&cur_dl_b->lock, flags);
+	if (cur_dl_b->bw != -1 &&
+	    cur_dl_b->bw * trial_cpus < cur_dl_b->total_bw)
+		ret = 0;
+	raw_spin_unlock_irqrestore(&cur_dl_b->lock, flags);
+
+	return ret;
+}
+
 int task_can_attach(struct task_struct *p,
 		    const struct cpumask *cs_cpus_allowed)
 {
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

[parent not found: <5433AB8A.7050908-5wv7dgnIgG8@public.gmane.org>]

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
  2014-10-07  8:59           ` Juri Lelli
@ 2014-10-07 12:31               ` Peter Zijlstra
  -1 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2014-10-07 12:31 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	juri.lelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	raistlin-k2GhghHVRtY@public.gmane.org,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/@public.gmane.org,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	daniel.wagner-98C5kh4wR6ohFhg+JK9F0w@public.gmane.org,
	vincent-9z8vmPu0pS/iB9QmIjCX8w@public.gmane.org,
	luca.abeni-3IIOeSMMxS4@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Li Zefan,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Tue, Oct 07, 2014 at 09:59:54AM +0100, Juri Lelli wrote:
> Hi Peter,
> 
> On 19/09/14 22:25, Peter Zijlstra wrote:
> > On Fri, Sep 19, 2014 at 10:22:40AM +0100, Juri Lelli wrote:
> >> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
> >> affinity (performing what is commonly called clustered scheduling).
> >> Unfortunately, such thing is currently broken for two reasons:
> >>
> >>  - No check is performed when the user tries to attach a task to
> >>    an exlusive cpuset (recall that exclusive cpusets have an
> >>    associated maximum allowed bandwidth).
> >>
> >>  - Bandwidths of source and destination cpusets are not correctly
> >>    updated after a task is migrated between them.
> >>
> >> This patch fixes both things at once, as they are opposite faces
> >> of the same coin.
> >>
> >> The check is performed in cpuset_can_attach(), as there aren't any
> >> points of failure after that function. The updated is split in two
> >> halves. We first reserve bandwidth in the destination cpuset, after
> >> we pass the check in cpuset_can_attach(). And we then release
> >> bandwidth from the source cpuset when the task's affinity is
> >> actually changed. Even if there can be time windows when sched_setattr()
> >> may erroneously fail in the source cpuset, we are fine with it, as
> >> we can't perfom an atomic update of both cpusets at once.
> > 
> > The thing I cannot find is if we correctly deal with updates to the
> > cpuset. Say we first setup 2 (exclusive) sets A:cpu0 B:cpu1-3. Then
> > assign tasks and then update the cpu masks like: B:cpu2,3, A:cpu1,2.
> > 
> 
> So, what follows should address the problem you describe.
> 
> Assuming you intended that we try to update masks as A:cpu0,3 and
> B:cpu1,2, with what below we are able to check that removing cpu3
> from B doesn't break guarantees. After that cpu3 can be put in A.
> 
> Does it make any sense?

Yeah, I think that about covers is. Could you write a changelog with it?

The reason I hadn't applied your patch #2 yet is because I thought it
triggered the splat reported in this thread. But later emails seem to
suggest this is a separate/pre-existing issue?


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
@ 2014-10-07 12:31               ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2014-10-07 12:31 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo@redhat.com, juri.lelli@gmail.com, raistlin@linux.it,
	michael@amarulasolutions.com, fchecconi@gmail.com,
	daniel.wagner@bmw-carit.de, vincent@legout.info,
	luca.abeni@unitn.it, linux-kernel@vger.kernel.org, Li Zefan,
	cgroups@vger.kernel.org

On Tue, Oct 07, 2014 at 09:59:54AM +0100, Juri Lelli wrote:
> Hi Peter,
> 
> On 19/09/14 22:25, Peter Zijlstra wrote:
> > On Fri, Sep 19, 2014 at 10:22:40AM +0100, Juri Lelli wrote:
> >> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
> >> affinity (performing what is commonly called clustered scheduling).
> >> Unfortunately, such thing is currently broken for two reasons:
> >>
> >>  - No check is performed when the user tries to attach a task to
> >>    an exlusive cpuset (recall that exclusive cpusets have an
> >>    associated maximum allowed bandwidth).
> >>
> >>  - Bandwidths of source and destination cpusets are not correctly
> >>    updated after a task is migrated between them.
> >>
> >> This patch fixes both things at once, as they are opposite faces
> >> of the same coin.
> >>
> >> The check is performed in cpuset_can_attach(), as there aren't any
> >> points of failure after that function. The updated is split in two
> >> halves. We first reserve bandwidth in the destination cpuset, after
> >> we pass the check in cpuset_can_attach(). And we then release
> >> bandwidth from the source cpuset when the task's affinity is
> >> actually changed. Even if there can be time windows when sched_setattr()
> >> may erroneously fail in the source cpuset, we are fine with it, as
> >> we can't perfom an atomic update of both cpusets at once.
> > 
> > The thing I cannot find is if we correctly deal with updates to the
> > cpuset. Say we first setup 2 (exclusive) sets A:cpu0 B:cpu1-3. Then
> > assign tasks and then update the cpu masks like: B:cpu2,3, A:cpu1,2.
> > 
> 
> So, what follows should address the problem you describe.
> 
> Assuming you intended that we try to update masks as A:cpu0,3 and
> B:cpu1,2, with what below we are able to check that removing cpu3
> from B doesn't break guarantees. After that cpu3 can be put in A.
> 
> Does it make any sense?

Yeah, I think that about covers is. Could you write a changelog with it?

The reason I hadn't applied your patch #2 yet is because I thought it
triggered the splat reported in this thread. But later emails seem to
suggest this is a separate/pre-existing issue?



^ permalink raw reply	[flat|nested] 32+ messages in thread

[parent not found: <20141007123109.GG19379-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>]

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
  2014-10-07 12:31               ` Peter Zijlstra
@ 2014-10-07 13:12                   ` Juri Lelli
  -1 siblings, 0 replies; 32+ messages in thread
From: Juri Lelli @ 2014-10-07 13:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	juri.lelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	raistlin-k2GhghHVRtY@public.gmane.org,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/@public.gmane.org,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	daniel.wagner-98C5kh4wR6ohFhg+JK9F0w@public.gmane.org,
	vincent-9z8vmPu0pS/iB9QmIjCX8w@public.gmane.org,
	luca.abeni-3IIOeSMMxS4@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Li Zefan,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hi Peter,

On 07/10/14 13:31, Peter Zijlstra wrote:
> On Tue, Oct 07, 2014 at 09:59:54AM +0100, Juri Lelli wrote:
>> Hi Peter,
>>
>> On 19/09/14 22:25, Peter Zijlstra wrote:
>>> On Fri, Sep 19, 2014 at 10:22:40AM +0100, Juri Lelli wrote:
>>>> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
>>>> affinity (performing what is commonly called clustered scheduling).
>>>> Unfortunately, such thing is currently broken for two reasons:
>>>>
>>>>  - No check is performed when the user tries to attach a task to
>>>>    an exlusive cpuset (recall that exclusive cpusets have an
>>>>    associated maximum allowed bandwidth).
>>>>
>>>>  - Bandwidths of source and destination cpusets are not correctly
>>>>    updated after a task is migrated between them.
>>>>
>>>> This patch fixes both things at once, as they are opposite faces
>>>> of the same coin.
>>>>
>>>> The check is performed in cpuset_can_attach(), as there aren't any
>>>> points of failure after that function. The updated is split in two
>>>> halves. We first reserve bandwidth in the destination cpuset, after
>>>> we pass the check in cpuset_can_attach(). And we then release
>>>> bandwidth from the source cpuset when the task's affinity is
>>>> actually changed. Even if there can be time windows when sched_setattr()
>>>> may erroneously fail in the source cpuset, we are fine with it, as
>>>> we can't perfom an atomic update of both cpusets at once.
>>>
>>> The thing I cannot find is if we correctly deal with updates to the
>>> cpuset. Say we first setup 2 (exclusive) sets A:cpu0 B:cpu1-3. Then
>>> assign tasks and then update the cpu masks like: B:cpu2,3, A:cpu1,2.
>>>
>>
>> So, what follows should address the problem you describe.
>>
>> Assuming you intended that we try to update masks as A:cpu0,3 and
>> B:cpu1,2, with what below we are able to check that removing cpu3
>> from B doesn't break guarantees. After that cpu3 can be put in A.
>>
>> Does it make any sense?
> 
> Yeah, I think that about covers is. Could you write a changelog with it?
> 

Sure. Didn't write it in the first instance because I though to squash it
in 2/3. But it is actually fixing a different issue, so please find it
below. 

> The reason I hadn't applied your patch #2 yet is because I thought it
> triggered the splat reported in this thread. But later emails seem to
> suggest this is a separate/pre-existing issue?
> 

Right. I think that is a separate (PI related) issue. But we could also
wait for that to be nailed down and then apply all the fixes at once.
As you think is better.

Thanks,

- Juri

From af75e5f56c9eafacfddad79485149e2e2cb889b9 Mon Sep 17 00:00:00 2001
From: Juri Lelli <juri.lelli-5wv7dgnIgG8@public.gmane.org>
Date: Tue, 7 Oct 2014 09:52:11 +0100
Subject: [PATCH] sched/deadline: ensure that updates to exclusive cpusets
 don't break AC

How we deal with updates to exclusive cpusets is currently broken.
As an example, suppose we have an exclusive cpuset composed of
two cpus: A[cpu0,cpu1]. We can assign SCHED_DEADLINE task to it
up to the allowed bandwidth. If we want now to modify cpusetA's
cpumask, we have to check that removing a cpu's amount of
bandwidth doesn't break AC guarantees. This thing isn't checked
in the current code.

This patch fixes the problem above, denying an update if the
new cpumask won't have enough bandwidth for SCHED_DEADLINE tasks
that are currently active.

Signed-off-by: Juri Lelli <juri.lelli-5wv7dgnIgG8@public.gmane.org>
---
 include/linux/sched.h |  2 ++
 kernel/cpuset.c       | 10 ++++++++++
 kernel/sched/core.c   | 19 +++++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 163295f..24696d3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2041,6 +2041,8 @@ static inline void tsk_restore_flags(struct task_struct *task,
 	task->flags |= orig_flags & flags;
 }
 
+extern int cpuset_cpumask_can_shrink(const struct cpumask *cur,
+				     const struct cpumask *trial);
 extern int task_can_attach(struct task_struct *p,
 			   const struct cpumask *cs_cpus_allowed);
 #ifdef CONFIG_SMP
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 4a7ebde..f96b47f 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -506,6 +506,16 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
 			goto out;
 	}
 
+	/*
+	 * We can't shrink if we won't have enough room for SCHED_DEADLINE
+	 * tasks.
+	 */
+	ret = -EBUSY;
+	if (is_cpu_exclusive(cur) &&
+	    !cpuset_cpumask_can_shrink(cur->cpus_allowed,
+				       trial->cpus_allowed))
+		goto out;
+
 	ret = 0;
 out:
 	rcu_read_unlock();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 092143d..b4bd8fa 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4587,6 +4587,25 @@ void init_idle(struct task_struct *idle, int cpu)
 #endif
 }
 
+int cpuset_cpumask_can_shrink(const struct cpumask *cur,
+			      const struct cpumask *trial)
+{
+	int ret = 1, trial_cpus;
+	struct dl_bw *cur_dl_b;
+	unsigned long flags;
+
+	cur_dl_b = dl_bw_of(cpumask_any(cur));
+	trial_cpus = cpumask_weight(trial);
+
+	raw_spin_lock_irqsave(&cur_dl_b->lock, flags);
+	if (cur_dl_b->bw != -1 &&
+	    cur_dl_b->bw * trial_cpus < cur_dl_b->total_bw)
+		ret = 0;
+	raw_spin_unlock_irqrestore(&cur_dl_b->lock, flags);
+
+	return ret;
+}
+
 int task_can_attach(struct task_struct *p,
 		    const struct cpumask *cs_cpus_allowed)
 {
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
@ 2014-10-07 13:12                   ` Juri Lelli
  0 siblings, 0 replies; 32+ messages in thread
From: Juri Lelli @ 2014-10-07 13:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo@redhat.com, juri.lelli@gmail.com, raistlin@linux.it,
	michael@amarulasolutions.com, fchecconi@gmail.com,
	daniel.wagner@bmw-carit.de, vincent@legout.info,
	luca.abeni@unitn.it, linux-kernel@vger.kernel.org, Li Zefan,
	cgroups@vger.kernel.org

Hi Peter,

On 07/10/14 13:31, Peter Zijlstra wrote:
> On Tue, Oct 07, 2014 at 09:59:54AM +0100, Juri Lelli wrote:
>> Hi Peter,
>>
>> On 19/09/14 22:25, Peter Zijlstra wrote:
>>> On Fri, Sep 19, 2014 at 10:22:40AM +0100, Juri Lelli wrote:
>>>> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
>>>> affinity (performing what is commonly called clustered scheduling).
>>>> Unfortunately, such thing is currently broken for two reasons:
>>>>
>>>>  - No check is performed when the user tries to attach a task to
>>>>    an exlusive cpuset (recall that exclusive cpusets have an
>>>>    associated maximum allowed bandwidth).
>>>>
>>>>  - Bandwidths of source and destination cpusets are not correctly
>>>>    updated after a task is migrated between them.
>>>>
>>>> This patch fixes both things at once, as they are opposite faces
>>>> of the same coin.
>>>>
>>>> The check is performed in cpuset_can_attach(), as there aren't any
>>>> points of failure after that function. The updated is split in two
>>>> halves. We first reserve bandwidth in the destination cpuset, after
>>>> we pass the check in cpuset_can_attach(). And we then release
>>>> bandwidth from the source cpuset when the task's affinity is
>>>> actually changed. Even if there can be time windows when sched_setattr()
>>>> may erroneously fail in the source cpuset, we are fine with it, as
>>>> we can't perfom an atomic update of both cpusets at once.
>>>
>>> The thing I cannot find is if we correctly deal with updates to the
>>> cpuset. Say we first setup 2 (exclusive) sets A:cpu0 B:cpu1-3. Then
>>> assign tasks and then update the cpu masks like: B:cpu2,3, A:cpu1,2.
>>>
>>
>> So, what follows should address the problem you describe.
>>
>> Assuming you intended that we try to update masks as A:cpu0,3 and
>> B:cpu1,2, with what below we are able to check that removing cpu3
>> from B doesn't break guarantees. After that cpu3 can be put in A.
>>
>> Does it make any sense?
> 
> Yeah, I think that about covers is. Could you write a changelog with it?
> 

Sure. Didn't write it in the first instance because I though to squash it
in 2/3. But it is actually fixing a different issue, so please find it
below. 

> The reason I hadn't applied your patch #2 yet is because I thought it
> triggered the splat reported in this thread. But later emails seem to
> suggest this is a separate/pre-existing issue?
> 

Right. I think that is a separate (PI related) issue. But we could also
wait for that to be nailed down and then apply all the fixes at once.
As you think is better.

Thanks,

- Juri

>From af75e5f56c9eafacfddad79485149e2e2cb889b9 Mon Sep 17 00:00:00 2001
From: Juri Lelli <juri.lelli@arm.com>
Date: Tue, 7 Oct 2014 09:52:11 +0100
Subject: [PATCH] sched/deadline: ensure that updates to exclusive cpusets
 don't break AC

How we deal with updates to exclusive cpusets is currently broken.
As an example, suppose we have an exclusive cpuset composed of
two cpus: A[cpu0,cpu1]. We can assign SCHED_DEADLINE task to it
up to the allowed bandwidth. If we want now to modify cpusetA's
cpumask, we have to check that removing a cpu's amount of
bandwidth doesn't break AC guarantees. This thing isn't checked
in the current code.

This patch fixes the problem above, denying an update if the
new cpumask won't have enough bandwidth for SCHED_DEADLINE tasks
that are currently active.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
---
 include/linux/sched.h |  2 ++
 kernel/cpuset.c       | 10 ++++++++++
 kernel/sched/core.c   | 19 +++++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 163295f..24696d3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2041,6 +2041,8 @@ static inline void tsk_restore_flags(struct task_struct *task,
 	task->flags |= orig_flags & flags;
 }
 
+extern int cpuset_cpumask_can_shrink(const struct cpumask *cur,
+				     const struct cpumask *trial);
 extern int task_can_attach(struct task_struct *p,
 			   const struct cpumask *cs_cpus_allowed);
 #ifdef CONFIG_SMP
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 4a7ebde..f96b47f 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -506,6 +506,16 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
 			goto out;
 	}
 
+	/*
+	 * We can't shrink if we won't have enough room for SCHED_DEADLINE
+	 * tasks.
+	 */
+	ret = -EBUSY;
+	if (is_cpu_exclusive(cur) &&
+	    !cpuset_cpumask_can_shrink(cur->cpus_allowed,
+				       trial->cpus_allowed))
+		goto out;
+
 	ret = 0;
 out:
 	rcu_read_unlock();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 092143d..b4bd8fa 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4587,6 +4587,25 @@ void init_idle(struct task_struct *idle, int cpu)
 #endif
 }
 
+int cpuset_cpumask_can_shrink(const struct cpumask *cur,
+			      const struct cpumask *trial)
+{
+	int ret = 1, trial_cpus;
+	struct dl_bw *cur_dl_b;
+	unsigned long flags;
+
+	cur_dl_b = dl_bw_of(cpumask_any(cur));
+	trial_cpus = cpumask_weight(trial);
+
+	raw_spin_lock_irqsave(&cur_dl_b->lock, flags);
+	if (cur_dl_b->bw != -1 &&
+	    cur_dl_b->bw * trial_cpus < cur_dl_b->total_bw)
+		ret = 0;
+	raw_spin_unlock_irqrestore(&cur_dl_b->lock, flags);
+
+	return ret;
+}
+
 int task_can_attach(struct task_struct *p,
 		    const struct cpumask *cs_cpus_allowed)
 {
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [tip:sched/core] sched/deadline: Ensure that updates to exclusive cpusets don't break AC
  2014-10-07 13:12                   ` Juri Lelli
  (?)
@ 2014-10-28 11:07                   ` tip-bot for Juri Lelli
  -1 siblings, 0 replies; 32+ messages in thread
From: tip-bot for Juri Lelli @ 2014-10-28 11:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, torvalds, mingo, peterz, lizefan, linux-kernel, hpa,
	juri.lelli

Commit-ID:  f82f80426f7afcf55953924e71555984a4bd6ce6
Gitweb:     http://git.kernel.org/tip/f82f80426f7afcf55953924e71555984a4bd6ce6
Author:     Juri Lelli <juri.lelli@arm.com>
AuthorDate: Tue, 7 Oct 2014 09:52:11 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 28 Oct 2014 10:48:00 +0100

sched/deadline: Ensure that updates to exclusive cpusets don't break AC

How we deal with updates to exclusive cpusets is currently broken.
As an example, suppose we have an exclusive cpuset composed of
two cpus: A[cpu0,cpu1]. We can assign SCHED_DEADLINE task to it
up to the allowed bandwidth. If we want now to modify cpusetA's
cpumask, we have to check that removing a cpu's amount of
bandwidth doesn't break AC guarantees. This thing isn't checked
in the current code.

This patch fixes the problem above, denying an update if the
new cpumask won't have enough bandwidth for SCHED_DEADLINE tasks
that are currently active.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Link: http://lkml.kernel.org/r/5433E6AF.5080105@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched.h |  2 ++
 kernel/cpuset.c       | 10 ++++++++++
 kernel/sched/core.c   | 19 +++++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1d1fa08..320a977 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2052,6 +2052,8 @@ static inline void tsk_restore_flags(struct task_struct *task,
 	task->flags |= orig_flags & flags;
 }
 
+extern int cpuset_cpumask_can_shrink(const struct cpumask *cur,
+				     const struct cpumask *trial);
 extern int task_can_attach(struct task_struct *p,
 			   const struct cpumask *cs_cpus_allowed);
 #ifdef CONFIG_SMP
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 7af8577..723cfc9 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -506,6 +506,16 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
 			goto out;
 	}
 
+	/*
+	 * We can't shrink if we won't have enough room for SCHED_DEADLINE
+	 * tasks.
+	 */
+	ret = -EBUSY;
+	if (is_cpu_exclusive(cur) &&
+	    !cpuset_cpumask_can_shrink(cur->cpus_allowed,
+				       trial->cpus_allowed))
+		goto out;
+
 	ret = 0;
 out:
 	rcu_read_unlock();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9993fee..0456a55 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4650,6 +4650,25 @@ void init_idle(struct task_struct *idle, int cpu)
 #endif
 }
 
+int cpuset_cpumask_can_shrink(const struct cpumask *cur,
+			      const struct cpumask *trial)
+{
+	int ret = 1, trial_cpus;
+	struct dl_bw *cur_dl_b;
+	unsigned long flags;
+
+	cur_dl_b = dl_bw_of(cpumask_any(cur));
+	trial_cpus = cpumask_weight(trial);
+
+	raw_spin_lock_irqsave(&cur_dl_b->lock, flags);
+	if (cur_dl_b->bw != -1 &&
+	    cur_dl_b->bw * trial_cpus < cur_dl_b->total_bw)
+		ret = 0;
+	raw_spin_unlock_irqrestore(&cur_dl_b->lock, flags);
+
+	return ret;
+}
+
 int task_can_attach(struct task_struct *p,
 		    const struct cpumask *cs_cpus_allowed)
 {

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
  2014-09-19  9:22 ` [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets Juri Lelli
  2014-09-19 11:47     ` Daniel Wagner
       [not found]   ` <1411118561-26323-3-git-send-email-juri.lelli-5wv7dgnIgG8@public.gmane.org>
@ 2014-09-22 19:24   ` Vincent Legout
  2014-09-23  8:09     ` Juri Lelli
  2014-10-28 11:07   ` [tip:sched/core] sched/deadline: Fix bandwidth check/ update " tip-bot for Juri Lelli
  3 siblings, 1 reply; 32+ messages in thread
From: Vincent Legout @ 2014-09-22 19:24 UTC (permalink / raw)
  To: Juri Lelli
  Cc: peterz, mingo, juri.lelli, raistlin, michael, fchecconi,
	daniel.wagner, luca.abeni, linux-kernel, Li Zefan, cgroups

Hello,

Juri Lelli <juri.lelli@arm.com> writes:

> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
> affinity (performing what is commonly called clustered scheduling).
> Unfortunately, such thing is currently broken for two reasons:
>
>  - No check is performed when the user tries to attach a task to
>    an exlusive cpuset (recall that exclusive cpusets have an
>    associated maximum allowed bandwidth).
>
>  - Bandwidths of source and destination cpusets are not correctly
>    updated after a task is migrated between them.
>
> This patch fixes both things at once, as they are opposite faces
> of the same coin.
>
> The check is performed in cpuset_can_attach(), as there aren't any
> points of failure after that function. The updated is split in two
> halves. We first reserve bandwidth in the destination cpuset, after
> we pass the check in cpuset_can_attach(). And we then release
> bandwidth from the source cpuset when the task's affinity is
> actually changed. Even if there can be time windows when sched_setattr()
> may erroneously fail in the source cpuset, we are fine with it, as
> we can't perfom an atomic update of both cpusets at once.

Thanks, this seems to fix the other problem I had. However, this bug,
which I never had before, now happens randomly (with or without patch
3/3):

Sep 19 09:54:37 starbuck kernel: [ 1309.728678] ------------[ cut here ]------------
Sep 19 09:54:37 starbuck kernel: [ 1309.728699] kernel BUG at kernel/sched/deadline.c:819!
Sep 19 09:54:37 starbuck kernel: [ 1309.728719] invalid opcode: 0000 [#1] PREEMPT SMP 
Sep 19 09:54:37 starbuck kernel: [ 1309.728744] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dm_crypt nfsd auth_rpcgss oid_registry exportfs nfs_acl nfs lockd sunrpc bridge stp llc lp coretemp kvm_intel kvm ppdev ioatdma microcode ipmi_si parport_pc lpc_ich dca mfd_core parport ipmi_msghandler joydev serio_raw hid_generic usbhid hid crc32c_intel psmouse e1000e ptp pps_core
Sep 19 09:54:37 starbuck kernel: [ 1309.728928] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 3.16.0+ #20
Sep 19 09:54:37 starbuck kernel: [ 1309.728950] Hardware name: empty empty/S7002, BIOS 'V1.10.B10   ' 05/03/2011
Sep 19 09:54:37 starbuck kernel: [ 1309.728977] task: ffff88023691c920 ti: ffff88023692c000 task.ti: ffff88023692c000
Sep 19 09:54:37 starbuck kernel: [ 1309.729003] RIP: 0010:[<ffffffff810a543e>]  [<ffffffff810a543e>] enqueue_task_dl+0x44e/0x450
Sep 19 09:54:37 starbuck kernel: [ 1309.729041] RSP: 0018:ffff88043fc23e68  EFLAGS: 00010082
Sep 19 09:54:37 starbuck kernel: [ 1309.729060] RAX: 0000000000000000 RBX: ffff880434edb0c0 RCX: ffff880434edb2f8
Sep 19 09:54:37 starbuck kernel: [ 1309.729086] RDX: 0000000000000008 RSI: ffff880434edb0c0 RDI: 0000000000000000
Sep 19 09:54:37 starbuck kernel: [ 1309.729140] RBP: ffff88043fc23ea8 R08: 0000000000000001 R09: 000002cb4aebb39d
Sep 19 09:54:37 starbuck kernel: [ 1309.729193] R10: 13955a8129438cf2 R11: 0000000000000202 R12: 0000000000000008
Sep 19 09:54:37 starbuck kernel: [ 1309.729247] R13: ffff88043fc33f00 R14: ffff88043fc2e0e0 R15: ffff880434edb2f8
Sep 19 09:54:37 starbuck kernel: [ 1309.729301] FS:  0000000000000000(0000) GS:ffff88043fc20000(0000) knlGS:0000000000000000
Sep 19 09:54:37 starbuck kernel: [ 1309.729383] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 19 09:54:37 starbuck kernel: [ 1309.729436] CR2: 0000000000000000 CR3: 0000000435a07000 CR4: 00000000000027e0

My script launches 6 processes and schedules them on 2 cpusets where
each cpuset contains only one cpu. It moves processes from one cpuset to
another and also updates their runtime. I can investigate more and try
to provide a short script to reproduce if needed.

Thanks,
Vincent

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
  2014-09-22 19:24   ` [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets Vincent Legout
@ 2014-09-23  8:09     ` Juri Lelli
       [not found]       ` <54212AB7.3070406-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 32+ messages in thread
From: Juri Lelli @ 2014-09-23  8:09 UTC (permalink / raw)
  To: Vincent Legout
  Cc: peterz@infradead.org, mingo@redhat.com, juri.lelli@gmail.com,
	raistlin@linux.it, michael@amarulasolutions.com,
	fchecconi@gmail.com, daniel.wagner@bmw-carit.de,
	luca.abeni@unitn.it, linux-kernel@vger.kernel.org, Li Zefan,
	cgroups@vger.kernel.org

Hi Vincent,

On 22/09/14 20:24, Vincent Legout wrote:
> Hello,
> 
> Juri Lelli <juri.lelli@arm.com> writes:
> 
>> Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
>> affinity (performing what is commonly called clustered scheduling).
>> Unfortunately, such thing is currently broken for two reasons:
>>
>>  - No check is performed when the user tries to attach a task to
>>    an exlusive cpuset (recall that exclusive cpusets have an
>>    associated maximum allowed bandwidth).
>>
>>  - Bandwidths of source and destination cpusets are not correctly
>>    updated after a task is migrated between them.
>>
>> This patch fixes both things at once, as they are opposite faces
>> of the same coin.
>>
>> The check is performed in cpuset_can_attach(), as there aren't any
>> points of failure after that function. The updated is split in two
>> halves. We first reserve bandwidth in the destination cpuset, after
>> we pass the check in cpuset_can_attach(). And we then release
>> bandwidth from the source cpuset when the task's affinity is
>> actually changed. Even if there can be time windows when sched_setattr()
>> may erroneously fail in the source cpuset, we are fine with it, as
>> we can't perfom an atomic update of both cpusets at once.
> 
> Thanks, this seems to fix the other problem I had. However, this bug,
> which I never had before, now happens randomly (with or without patch
> 3/3):
> 
> Sep 19 09:54:37 starbuck kernel: [ 1309.728678] ------------[ cut here ]------------
> Sep 19 09:54:37 starbuck kernel: [ 1309.728699] kernel BUG at kernel/sched/deadline.c:819!
> Sep 19 09:54:37 starbuck kernel: [ 1309.728719] invalid opcode: 0000 [#1] PREEMPT SMP 
> Sep 19 09:54:37 starbuck kernel: [ 1309.728744] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dm_crypt nfsd auth_rpcgss oid_registry exportfs nfs_acl nfs lockd sunrpc bridge stp llc lp coretemp kvm_intel kvm ppdev ioatdma microcode ipmi_si parport_pc lpc_ich dca mfd_core parport ipmi_msghandler joydev serio_raw hid_generic usbhid hid crc32c_intel psmouse e1000e ptp pps_core
> Sep 19 09:54:37 starbuck kernel: [ 1309.728928] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 3.16.0+ #20
> Sep 19 09:54:37 starbuck kernel: [ 1309.728950] Hardware name: empty empty/S7002, BIOS 'V1.10.B10   ' 05/03/2011
> Sep 19 09:54:37 starbuck kernel: [ 1309.728977] task: ffff88023691c920 ti: ffff88023692c000 task.ti: ffff88023692c000
> Sep 19 09:54:37 starbuck kernel: [ 1309.729003] RIP: 0010:[<ffffffff810a543e>]  [<ffffffff810a543e>] enqueue_task_dl+0x44e/0x450
> Sep 19 09:54:37 starbuck kernel: [ 1309.729041] RSP: 0018:ffff88043fc23e68  EFLAGS: 00010082
> Sep 19 09:54:37 starbuck kernel: [ 1309.729060] RAX: 0000000000000000 RBX: ffff880434edb0c0 RCX: ffff880434edb2f8
> Sep 19 09:54:37 starbuck kernel: [ 1309.729086] RDX: 0000000000000008 RSI: ffff880434edb0c0 RDI: 0000000000000000
> Sep 19 09:54:37 starbuck kernel: [ 1309.729140] RBP: ffff88043fc23ea8 R08: 0000000000000001 R09: 000002cb4aebb39d
> Sep 19 09:54:37 starbuck kernel: [ 1309.729193] R10: 13955a8129438cf2 R11: 0000000000000202 R12: 0000000000000008
> Sep 19 09:54:37 starbuck kernel: [ 1309.729247] R13: ffff88043fc33f00 R14: ffff88043fc2e0e0 R15: ffff880434edb2f8
> Sep 19 09:54:37 starbuck kernel: [ 1309.729301] FS:  0000000000000000(0000) GS:ffff88043fc20000(0000) knlGS:0000000000000000
> Sep 19 09:54:37 starbuck kernel: [ 1309.729383] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Sep 19 09:54:37 starbuck kernel: [ 1309.729436] CR2: 0000000000000000 CR3: 0000000435a07000 CR4: 00000000000027e0
> 

This can be related to the problems Daniel is also experiencing.

> My script launches 6 processes and schedules them on 2 cpusets where
> each cpuset contains only one cpu. It moves processes from one cpuset to
> another and also updates their runtime. I can investigate more and try
> to provide a short script to reproduce if needed.
> 

I should be able to dig into this next week. But yes, in the meantime a
script would be useful to reproduce the problem.

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 32+ messages in thread

[parent not found: <54212AB7.3070406-5wv7dgnIgG8@public.gmane.org>]

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
  2014-09-23  8:09     ` Juri Lelli
@ 2014-09-23 13:08           ` Daniel Wagner
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Wagner @ 2014-09-23 13:08 UTC (permalink / raw)
  To: Juri Lelli, Vincent Legout
  Cc: peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
	mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	juri.lelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	raistlin-k2GhghHVRtY@public.gmane.org,
	michael-dyjBcgdgk7Pe9wHmmfpqLFaTQe2KTcn/@public.gmane.org,
	fchecconi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	luca.abeni-3IIOeSMMxS4@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Li Zefan,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hi,

On 09/23/2014 10:09 AM, Juri Lelli wrote:
>> Sep 19 09:54:37 starbuck kernel: [ 1309.728678] ------------[ cut here ]------------
>> Sep 19 09:54:37 starbuck kernel: [ 1309.728699] kernel BUG at kernel/sched/deadline.c:819!
>> Sep 19 09:54:37 starbuck kernel: [ 1309.728719] invalid opcode: 0000 [#1] PREEMPT SMP 
>> Sep 19 09:54:37 starbuck kernel: [ 1309.728744] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dm_crypt nfsd auth_rpcgss oid_registry exportfs nfs_acl nfs lockd sunrpc bridge stp llc lp coretemp kvm_intel kvm ppdev ioatdma microcode ipmi_si parport_pc lpc_ich dca mfd_core parport ipmi_msghandler joydev serio_raw hid_generic usbhid hid crc32c_intel psmouse e1000e ptp pps_core
>> Sep 19 09:54:37 starbuck kernel: [ 1309.728928] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 3.16.0+ #20
>> Sep 19 09:54:37 starbuck kernel: [ 1309.728950] Hardware name: empty empty/S7002, BIOS 'V1.10.B10   ' 05/03/2011
>> Sep 19 09:54:37 starbuck kernel: [ 1309.728977] task: ffff88023691c920 ti: ffff88023692c000 task.ti: ffff88023692c000
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729003] RIP: 0010:[<ffffffff810a543e>]  [<ffffffff810a543e>] enqueue_task_dl+0x44e/0x450
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729041] RSP: 0018:ffff88043fc23e68  EFLAGS: 00010082
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729060] RAX: 0000000000000000 RBX: ffff880434edb0c0 RCX: ffff880434edb2f8
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729086] RDX: 0000000000000008 RSI: ffff880434edb0c0 RDI: 0000000000000000
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729140] RBP: ffff88043fc23ea8 R08: 0000000000000001 R09: 000002cb4aebb39d
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729193] R10: 13955a8129438cf2 R11: 0000000000000202 R12: 0000000000000008
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729247] R13: ffff88043fc33f00 R14: ffff88043fc2e0e0 R15: ffff880434edb2f8
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729301] FS:  0000000000000000(0000) GS:ffff88043fc20000(0000) knlGS:0000000000000000
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729383] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729436] CR2: 0000000000000000 CR3: 0000000435a07000 CR4: 00000000000027e0
>>
> 
> This can be related to the problems Daniel is also experiencing.

I nailed my problem down to mutex with PI inheritance. This will trigger

  kernel BUG at kernel/sched/deadline.c:819!

reliable.

cheers,
daniel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets
@ 2014-09-23 13:08           ` Daniel Wagner
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Wagner @ 2014-09-23 13:08 UTC (permalink / raw)
  To: Juri Lelli, Vincent Legout
  Cc: peterz@infradead.org, mingo@redhat.com, juri.lelli@gmail.com,
	raistlin@linux.it, michael@amarulasolutions.com,
	fchecconi@gmail.com, luca.abeni@unitn.it,
	linux-kernel@vger.kernel.org, Li Zefan, cgroups@vger.kernel.org

Hi,

On 09/23/2014 10:09 AM, Juri Lelli wrote:
>> Sep 19 09:54:37 starbuck kernel: [ 1309.728678] ------------[ cut here ]------------
>> Sep 19 09:54:37 starbuck kernel: [ 1309.728699] kernel BUG at kernel/sched/deadline.c:819!
>> Sep 19 09:54:37 starbuck kernel: [ 1309.728719] invalid opcode: 0000 [#1] PREEMPT SMP 
>> Sep 19 09:54:37 starbuck kernel: [ 1309.728744] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dm_crypt nfsd auth_rpcgss oid_registry exportfs nfs_acl nfs lockd sunrpc bridge stp llc lp coretemp kvm_intel kvm ppdev ioatdma microcode ipmi_si parport_pc lpc_ich dca mfd_core parport ipmi_msghandler joydev serio_raw hid_generic usbhid hid crc32c_intel psmouse e1000e ptp pps_core
>> Sep 19 09:54:37 starbuck kernel: [ 1309.728928] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 3.16.0+ #20
>> Sep 19 09:54:37 starbuck kernel: [ 1309.728950] Hardware name: empty empty/S7002, BIOS 'V1.10.B10   ' 05/03/2011
>> Sep 19 09:54:37 starbuck kernel: [ 1309.728977] task: ffff88023691c920 ti: ffff88023692c000 task.ti: ffff88023692c000
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729003] RIP: 0010:[<ffffffff810a543e>]  [<ffffffff810a543e>] enqueue_task_dl+0x44e/0x450
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729041] RSP: 0018:ffff88043fc23e68  EFLAGS: 00010082
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729060] RAX: 0000000000000000 RBX: ffff880434edb0c0 RCX: ffff880434edb2f8
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729086] RDX: 0000000000000008 RSI: ffff880434edb0c0 RDI: 0000000000000000
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729140] RBP: ffff88043fc23ea8 R08: 0000000000000001 R09: 000002cb4aebb39d
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729193] R10: 13955a8129438cf2 R11: 0000000000000202 R12: 0000000000000008
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729247] R13: ffff88043fc33f00 R14: ffff88043fc2e0e0 R15: ffff880434edb2f8
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729301] FS:  0000000000000000(0000) GS:ffff88043fc20000(0000) knlGS:0000000000000000
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729383] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> Sep 19 09:54:37 starbuck kernel: [ 1309.729436] CR2: 0000000000000000 CR3: 0000000435a07000 CR4: 00000000000027e0
>>
> 
> This can be related to the problems Daniel is also experiencing.

I nailed my problem down to mutex with PI inheritance. This will trigger

  kernel BUG at kernel/sched/deadline.c:819!

reliable.

cheers,
daniel

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [tip:sched/core] sched/deadline: Fix bandwidth check/ update when migrating tasks between exclusive cpusets
  2014-09-19  9:22 ` [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets Juri Lelli
                     ` (2 preceding siblings ...)
  2014-09-22 19:24   ` [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets Vincent Legout
@ 2014-10-28 11:07   ` tip-bot for Juri Lelli
  3 siblings, 0 replies; 32+ messages in thread
From: tip-bot for Juri Lelli @ 2014-10-28 11:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: michael, mingo, juri.lelli, daniel.wagner, fchecconi, tglx, hpa,
	raistlin, linux-kernel, vincent, lizefan, torvalds, peterz

Commit-ID:  7f51412a415d87ea8598d14722fb31e4f5701257
Gitweb:     http://git.kernel.org/tip/7f51412a415d87ea8598d14722fb31e4f5701257
Author:     Juri Lelli <juri.lelli@arm.com>
AuthorDate: Fri, 19 Sep 2014 10:22:40 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 28 Oct 2014 10:47:58 +0100

sched/deadline: Fix bandwidth check/update when migrating tasks between exclusive cpusets

Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks
affinity (performing what is commonly called clustered scheduling).
Unfortunately, such thing is currently broken for two reasons:

 - No check is performed when the user tries to attach a task to
   an exlusive cpuset (recall that exclusive cpusets have an
   associated maximum allowed bandwidth).

 - Bandwidths of source and destination cpusets are not correctly
   updated after a task is migrated between them.

This patch fixes both things at once, as they are opposite faces
of the same coin.

The check is performed in cpuset_can_attach(), as there aren't any
points of failure after that function. The updated is split in two
halves. We first reserve bandwidth in the destination cpuset, after
we pass the check in cpuset_can_attach(). And we then release
bandwidth from the source cpuset when the task's affinity is
actually changed. Even if there can be time windows when sched_setattr()
may erroneously fail in the source cpuset, we are fine with it, as
we can't perfom an atomic update of both cpusets at once.

Reported-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Reported-by: Vincent Legout <vincent@legout.info>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Michael Trimarchi <michael@amarulasolutions.com>
Cc: Fabio Checconi <fchecconi@gmail.com>
Cc: michael@amarulasolutions.com
Cc: luca.abeni@unitn.it
Cc: Li Zefan <lizefan@huawei.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: cgroups@vger.kernel.org
Link: http://lkml.kernel.org/r/1411118561-26323-3-git-send-email-juri.lelli@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched.h   |  2 ++
 kernel/cpuset.c         | 13 ++-------
 kernel/sched/core.c     | 70 +++++++++++++++++++++++++++++++++++--------------
 kernel/sched/deadline.c | 25 ++++++++++++++++--
 kernel/sched/sched.h    | 19 ++++++++++++++
 5 files changed, 97 insertions(+), 32 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5e344bb..1d1fa08 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2052,6 +2052,8 @@ static inline void tsk_restore_flags(struct task_struct *task,
 	task->flags |= orig_flags & flags;
 }
 
+extern int task_can_attach(struct task_struct *p,
+			   const struct cpumask *cs_cpus_allowed);
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p,
 			       const struct cpumask *new_mask);
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 1f107c7..7af8577 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -1429,17 +1429,8 @@ static int cpuset_can_attach(struct cgroup_subsys_state *css,
 		goto out_unlock;
 
 	cgroup_taskset_for_each(task, tset) {
-		/*
-		 * Kthreads which disallow setaffinity shouldn't be moved
-		 * to a new cpuset; we don't want to change their cpu
-		 * affinity and isolating such threads by their set of
-		 * allowed nodes is unnecessary.  Thus, cpusets are not
-		 * applicable for such threads.  This prevents checking for
-		 * success of set_cpus_allowed_ptr() on all attached tasks
-		 * before cpus_allowed may be changed.
-		 */
-		ret = -EINVAL;
-		if (task->flags & PF_NO_SETAFFINITY)
+		ret = task_can_attach(task, cs->cpus_allowed);
+		if (ret)
 			goto out_unlock;
 		ret = security_task_setscheduler(task);
 		if (ret)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5c067fd..9993fee 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2034,25 +2034,6 @@ static inline int dl_bw_cpus(int i)
 }
 #endif
 
-static inline
-void __dl_clear(struct dl_bw *dl_b, u64 tsk_bw)
-{
-	dl_b->total_bw -= tsk_bw;
-}
-
-static inline
-void __dl_add(struct dl_bw *dl_b, u64 tsk_bw)
-{
-	dl_b->total_bw += tsk_bw;
-}
-
-static inline
-bool __dl_overflow(struct dl_bw *dl_b, int cpus, u64 old_bw, u64 new_bw)
-{
-	return dl_b->bw != -1 &&
-	       dl_b->bw * cpus < dl_b->total_bw - old_bw + new_bw;
-}
-
 /*
  * We must be sure that accepting a new task (or allowing changing the
  * parameters of an existing one) is consistent with the bandwidth
@@ -4669,6 +4650,57 @@ void init_idle(struct task_struct *idle, int cpu)
 #endif
 }
 
+int task_can_attach(struct task_struct *p,
+		    const struct cpumask *cs_cpus_allowed)
+{
+	int ret = 0;
+
+	/*
+	 * Kthreads which disallow setaffinity shouldn't be moved
+	 * to a new cpuset; we don't want to change their cpu
+	 * affinity and isolating such threads by their set of
+	 * allowed nodes is unnecessary.  Thus, cpusets are not
+	 * applicable for such threads.  This prevents checking for
+	 * success of set_cpus_allowed_ptr() on all attached tasks
+	 * before cpus_allowed may be changed.
+	 */
+	if (p->flags & PF_NO_SETAFFINITY) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+#ifdef CONFIG_SMP
+	if (dl_task(p) && !cpumask_intersects(task_rq(p)->rd->span,
+					      cs_cpus_allowed)) {
+		unsigned int dest_cpu = cpumask_any_and(cpu_active_mask,
+							cs_cpus_allowed);
+		struct dl_bw *dl_b = dl_bw_of(dest_cpu);
+		bool overflow;
+		int cpus;
+		unsigned long flags;
+
+		raw_spin_lock_irqsave(&dl_b->lock, flags);
+		cpus = dl_bw_cpus(dest_cpu);
+		overflow = __dl_overflow(dl_b, cpus, 0, p->dl.dl_bw);
+		if (overflow)
+			ret = -EBUSY;
+		else {
+			/*
+			 * We reserve space for this task in the destination
+			 * root_domain, as we can't fail after this point.
+			 * We will free resources in the source root_domain
+			 * later on (see set_cpus_allowed_dl()).
+			 */
+			__dl_add(dl_b, p->dl.dl_bw);
+		}
+		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
+
+	}
+#endif
+out:
+	return ret;
+}
+
 #ifdef CONFIG_SMP
 /*
  * move_queued_task - move a queued task to new rq.
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 9d1e76a..8aaa971 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1517,10 +1517,33 @@ static void set_cpus_allowed_dl(struct task_struct *p,
 				const struct cpumask *new_mask)
 {
 	struct rq *rq;
+	struct root_domain *src_rd;
 	int weight;
 
 	BUG_ON(!dl_task(p));
 
+	rq = task_rq(p);
+	src_rd = rq->rd;
+	/*
+	 * Migrating a SCHED_DEADLINE task between exclusive
+	 * cpusets (different root_domains) entails a bandwidth
+	 * update. We already made space for us in the destination
+	 * domain (see cpuset_can_attach()).
+	 */
+	if (!cpumask_intersects(src_rd->span, new_mask)) {
+		struct dl_bw *src_dl_b;
+
+		src_dl_b = dl_bw_of(cpu_of(rq));
+		/*
+		 * We now free resources of the root_domain we are migrating
+		 * off. In the worst case, sched_setattr() may temporary fail
+		 * until we complete the update.
+		 */
+		raw_spin_lock(&src_dl_b->lock);
+		__dl_clear(src_dl_b, p->dl.dl_bw);
+		raw_spin_unlock(&src_dl_b->lock);
+	}
+
 	/*
 	 * Update only if the task is actually running (i.e.,
 	 * it is on the rq AND it is not throttled).
@@ -1537,8 +1560,6 @@ static void set_cpus_allowed_dl(struct task_struct *p,
 	if ((p->nr_cpus_allowed > 1) == (weight > 1))
 		return;
 
-	rq = task_rq(p);
-
 	/*
 	 * The process used to be able to migrate OR it can now migrate
 	 */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 57aacea..ec3917c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -176,6 +176,25 @@ struct dl_bw {
 	u64 bw, total_bw;
 };
 
+static inline
+void __dl_clear(struct dl_bw *dl_b, u64 tsk_bw)
+{
+	dl_b->total_bw -= tsk_bw;
+}
+
+static inline
+void __dl_add(struct dl_bw *dl_b, u64 tsk_bw)
+{
+	dl_b->total_bw += tsk_bw;
+}
+
+static inline
+bool __dl_overflow(struct dl_bw *dl_b, int cpus, u64 old_bw, u64 new_bw)
+{
+	return dl_b->bw != -1 &&
+	       dl_b->bw * cpus < dl_b->total_bw - old_bw + new_bw;
+}
+
 extern struct mutex sched_domains_mutex;
 
 #ifdef CONFIG_CGROUP_SCHED

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/3] sched/deadline: fix inter- exclusive cpusets migrations
  2014-09-19  9:22 [PATCH 0/3] SCHED_DEADLINE fix AC and SMP scheduling Juri Lelli
  2014-09-19  9:22 ` [PATCH 1/3] sched/deadline: clear dl_entity params when setscheduling to different class Juri Lelli
  2014-09-19  9:22 ` [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets Juri Lelli
@ 2014-09-19  9:22 ` Juri Lelli
  2014-09-24 14:55   ` [tip:sched/core] sched/deadline: Fix " tip-bot for Juri Lelli
  2 siblings, 1 reply; 32+ messages in thread
From: Juri Lelli @ 2014-09-19  9:22 UTC (permalink / raw)
  To: peterz
  Cc: mingo, juri.lelli, raistlin, michael, fchecconi, daniel.wagner,
	vincent, luca.abeni, linux-kernel, Juri Lelli

Users can perform clustered scheduling using the cpuset facility.
After an exclusive cpuset is created, task migrations happen only
between CPUs belonging to the same cpuset. Inter- cpuset migrations
can only happen when the user requires so, moving a task between
different cpusets. This behaviour is broken in SCHED_DEADLINE, as
currently spurious inter- cpuset migration may happen without
user intervention.

This patch fix the problem (and shuffles the code a bit to improve
clarity).

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Michael Trimarchi <michael@amarulasolutions.com>
Cc: Fabio Checconi <fchecconi@gmail.com>
Cc: linux-kernel@vger.kernel.org
---
 kernel/sched/cpudeadline.c | 4 +---
 kernel/sched/deadline.c    | 7 +++++++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index bd95963..539ca3c 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -107,9 +107,7 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
 	int best_cpu = -1;
 	const struct sched_dl_entity *dl_se = &p->dl;
 
-	if (later_mask && cpumask_and(later_mask, cp->free_cpus,
-			&p->cpus_allowed) && cpumask_and(later_mask,
-			later_mask, cpu_active_mask)) {
+	if (later_mask && cpumask_and(later_mask, later_mask, cp->free_cpus)) {
 		best_cpu = cpumask_any(later_mask);
 		goto out;
 	} else if (cpumask_test_cpu(cpudl_maximum(cp), &p->cpus_allowed) &&
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 56cb157..2799441 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1169,6 +1169,13 @@ static int find_later_rq(struct task_struct *task)
 	if (task->nr_cpus_allowed == 1)
 		return -1;
 
+	/*
+	 * We have to consider system topology and task affinity
+	 * first, then we can look for a suitable cpu.
+	 */
+	cpumask_copy(later_mask, task_rq(task)->rd->span);
+	cpumask_and(later_mask, later_mask, cpu_active_mask);
+	cpumask_and(later_mask, later_mask, &task->cpus_allowed);
 	best_cpu = cpudl_find(&task_rq(task)->rd->cpudl,
 			task, later_mask);
 	if (best_cpu == -1)
-- 
2.1.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [tip:sched/core] sched/deadline: Fix inter- exclusive cpusets migrations
  2014-09-19  9:22 ` [PATCH 3/3] sched/deadline: fix inter- exclusive cpusets migrations Juri Lelli
@ 2014-09-24 14:55   ` tip-bot for Juri Lelli
  0 siblings, 0 replies; 32+ messages in thread
From: tip-bot for Juri Lelli @ 2014-09-24 14:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, peterz, juri.lelli, tglx

Commit-ID:  91ec6778ec4f963fcb2c2793610919b572f633b0
Gitweb:     http://git.kernel.org/tip/91ec6778ec4f963fcb2c2793610919b572f633b0
Author:     Juri Lelli <juri.lelli@arm.com>
AuthorDate: Fri, 19 Sep 2014 10:22:41 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 24 Sep 2014 14:46:57 +0200

sched/deadline: Fix inter- exclusive cpusets migrations

Users can perform clustered scheduling using the cpuset facility.
After an exclusive cpuset is created, task migrations happen only
between CPUs belonging to the same cpuset. Inter- cpuset migrations
can only happen when the user requires so, moving a task between
different cpusets. This behaviour is broken in SCHED_DEADLINE, as
currently spurious inter- cpuset migration may happen without user
intervention.

This patch fix the problem (and shuffles the code a bit to improve
clarity).

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: raistlin@linux.it
Cc: michael@amarulasolutions.com
Cc: fchecconi@gmail.com
Cc: daniel.wagner@bmw-carit.de
Cc: vincent@legout.info
Cc: luca.abeni@unitn.it
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1411118561-26323-4-git-send-email-juri.lelli@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/cpudeadline.c | 4 +---
 kernel/sched/deadline.c    | 7 +++++++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index bd95963..539ca3c 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -107,9 +107,7 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
 	int best_cpu = -1;
 	const struct sched_dl_entity *dl_se = &p->dl;
 
-	if (later_mask && cpumask_and(later_mask, cp->free_cpus,
-			&p->cpus_allowed) && cpumask_and(later_mask,
-			later_mask, cpu_active_mask)) {
+	if (later_mask && cpumask_and(later_mask, later_mask, cp->free_cpus)) {
 		best_cpu = cpumask_any(later_mask);
 		goto out;
 	} else if (cpumask_test_cpu(cpudl_maximum(cp), &p->cpus_allowed) &&
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index efb9412..abfaf3d 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1164,6 +1164,13 @@ static int find_later_rq(struct task_struct *task)
 	if (task->nr_cpus_allowed == 1)
 		return -1;
 
+	/*
+	 * We have to consider system topology and task affinity
+	 * first, then we can look for a suitable cpu.
+	 */
+	cpumask_copy(later_mask, task_rq(task)->rd->span);
+	cpumask_and(later_mask, later_mask, cpu_active_mask);
+	cpumask_and(later_mask, later_mask, &task->cpus_allowed);
 	best_cpu = cpudl_find(&task_rq(task)->rd->cpudl,
 			task, later_mask);
 	if (best_cpu == -1)

^ permalink raw reply related	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2014-10-28 11:09 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-19  9:22 [PATCH 0/3] SCHED_DEADLINE fix AC and SMP scheduling Juri Lelli
2014-09-19  9:22 ` [PATCH 1/3] sched/deadline: clear dl_entity params when setscheduling to different class Juri Lelli
2014-09-19 11:44   ` Daniel Wagner
2014-09-19 12:43     ` Juri Lelli
2014-09-22 18:50   ` Vincent Legout
2014-09-24 14:54   ` [tip:sched/core] sched/deadline: Clear " tip-bot for Juri Lelli
2014-10-08 12:32   ` [PATCH 1/3] sched/deadline: clear " Wanpeng Li
2014-10-21 12:15     ` Wanpeng Li
2014-10-21 13:15       ` Juri Lelli
2014-09-19  9:22 ` [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets Juri Lelli
2014-09-19 11:47   ` Daniel Wagner
2014-09-19 11:47     ` Daniel Wagner
     [not found]     ` <541C17D6.5020608-98C5kh4wR6ohFhg+JK9F0w@public.gmane.org>
2014-09-19 12:46       ` Juri Lelli
2014-09-19 12:46         ` Juri Lelli
     [not found]   ` <1411118561-26323-3-git-send-email-juri.lelli-5wv7dgnIgG8@public.gmane.org>
2014-09-19 21:25     ` Peter Zijlstra
2014-09-19 21:25       ` Peter Zijlstra
     [not found]       ` <20140919212547.GG2832-IIpfhp3q70wB9AHHLWeGtNQXobZC6xk2@public.gmane.org>
2014-09-23  8:12         ` Juri Lelli
2014-09-23  8:12           ` Juri Lelli
2014-10-07  8:59         ` Juri Lelli
2014-10-07  8:59           ` Juri Lelli
     [not found]           ` <5433AB8A.7050908-5wv7dgnIgG8@public.gmane.org>
2014-10-07 12:31             ` Peter Zijlstra
2014-10-07 12:31               ` Peter Zijlstra
     [not found]               ` <20141007123109.GG19379-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2014-10-07 13:12                 ` Juri Lelli
2014-10-07 13:12                   ` Juri Lelli
2014-10-28 11:07                   ` [tip:sched/core] sched/deadline: Ensure that updates to exclusive cpusets don't break AC tip-bot for Juri Lelli
2014-09-22 19:24   ` [PATCH 2/3] sched/deadline: fix bandwidth check/update when migrating tasks between exclusive cpusets Vincent Legout
2014-09-23  8:09     ` Juri Lelli
     [not found]       ` <54212AB7.3070406-5wv7dgnIgG8@public.gmane.org>
2014-09-23 13:08         ` Daniel Wagner
2014-09-23 13:08           ` Daniel Wagner
2014-10-28 11:07   ` [tip:sched/core] sched/deadline: Fix bandwidth check/ update " tip-bot for Juri Lelli
2014-09-19  9:22 ` [PATCH 3/3] sched/deadline: fix inter- exclusive cpusets migrations Juri Lelli
2014-09-24 14:55   ` [tip:sched/core] sched/deadline: Fix " tip-bot for Juri Lelli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.