* [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
@ 2016-09-17 1:28 Joonwoo Park
2016-09-17 23:28 ` Wanpeng Li
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Joonwoo Park @ 2016-09-17 1:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Srivatsa Vaddagiri, Ingo Molnar, linux-kernel, Joonwoo Park
From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
(just when they would have exceeded their ideal_runtime). It makes use
of a per-cpu hrtimer resource and hence alarming that hrtimer should
be based on total SCHED_FAIR tasks a cpu has across its various cfs_rqs,
rather than being based on number of tasks in a particular cfs_rq (as
implemented currently). As a result, with current code, its possible for
a running task (which is the sole task in its cfs_rq) to be preempted
much after its ideal_runtime has elapsed, resulting in increased latency
for tasks in other cfs_rq on same cpu.
Fix this by alarming sched hrtimer based on total number of SCHED_FAIR
tasks a CPU has across its various cfs_rqs.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
---
joonwoop: Do we also need to update or remove if-statement inside
hrtick_update()?
I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
has large number of nr_running where slice is longer than sched_latency.
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4088eed..c55c566 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
WARN_ON(task_rq(p) != rq);
- if (cfs_rq->nr_running > 1) {
+ if (rq->cfs.h_nr_running > 1) {
u64 slice = sched_slice(cfs_rq, se);
u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
s64 delta = slice - ran;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
2016-09-17 1:28 [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks Joonwoo Park
@ 2016-09-17 23:28 ` Wanpeng Li
2016-09-19 18:08 ` Joonwoo Park
2016-09-19 8:21 ` Peter Zijlstra
2016-09-22 14:01 ` [tip:sched/core] sched/fair: " tip-bot for Srivatsa Vaddagiri
2 siblings, 1 reply; 8+ messages in thread
From: Wanpeng Li @ 2016-09-17 23:28 UTC (permalink / raw)
To: Joonwoo Park
Cc: Peter Zijlstra, Srivatsa Vaddagiri, Ingo Molnar,
linux-kernel@vger.kernel.org
2016-09-17 9:28 GMT+08:00 Joonwoo Park <joonwoop@codeaurora.org>:
> From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
>
> SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
> (just when they would have exceeded their ideal_runtime). It makes use
> of a per-cpu hrtimer resource and hence alarming that hrtimer should
> be based on total SCHED_FAIR tasks a cpu has across its various cfs_rqs,
> rather than being based on number of tasks in a particular cfs_rq (as
> implemented currently). As a result, with current code, its possible for
> a running task (which is the sole task in its cfs_rq) to be preempted
not be preempted much, right?
> much after its ideal_runtime has elapsed, resulting in increased latency
> for tasks in other cfs_rq on same cpu.
>
> Fix this by alarming sched hrtimer based on total number of SCHED_FAIR
> tasks a CPU has across its various cfs_rqs.
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
> ---
>
> joonwoop: Do we also need to update or remove if-statement inside
> hrtick_update()?
> I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
> has large number of nr_running where slice is longer than sched_latency.
>
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4088eed..c55c566 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
>
> WARN_ON(task_rq(p) != rq);
>
> - if (cfs_rq->nr_running > 1) {
> + if (rq->cfs.h_nr_running > 1) {
> u64 slice = sched_slice(cfs_rq, se);
> u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> s64 delta = slice - ran;
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> hosted by The Linux Foundation
>
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
2016-09-17 23:28 ` Wanpeng Li
@ 2016-09-19 18:08 ` Joonwoo Park
2016-09-19 23:30 ` Wanpeng Li
0 siblings, 1 reply; 8+ messages in thread
From: Joonwoo Park @ 2016-09-19 18:08 UTC (permalink / raw)
To: Wanpeng Li
Cc: Peter Zijlstra, Srivatsa Vaddagiri, Ingo Molnar,
linux-kernel@vger.kernel.org
On Sun, Sep 18, 2016 at 07:28:32AM +0800, Wanpeng Li wrote:
> 2016-09-17 9:28 GMT+08:00 Joonwoo Park <joonwoop@codeaurora.org>:
> > From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> >
> > SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
> > (just when they would have exceeded their ideal_runtime). It makes use
> > of a per-cpu hrtimer resource and hence alarming that hrtimer should
> > be based on total SCHED_FAIR tasks a cpu has across its various cfs_rqs,
> > rather than being based on number of tasks in a particular cfs_rq (as
> > implemented currently). As a result, with current code, its possible for
> > a running task (which is the sole task in its cfs_rq) to be preempted
>
> not be preempted much, right?
I don't think so....
By saying 'to be preempted much after its ideal_runtime has elapsed' I
wanted to describe the current suboptimal behaviour.
Thanks,
Joonwoo
>
> > much after its ideal_runtime has elapsed, resulting in increased latency
> > for tasks in other cfs_rq on same cpu.
> >
> > Fix this by alarming sched hrtimer based on total number of SCHED_FAIR
> > tasks a CPU has across its various cfs_rqs.
> >
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> > Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
> > ---
> >
> > joonwoop: Do we also need to update or remove if-statement inside
> > hrtick_update()?
> > I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
> > has large number of nr_running where slice is longer than sched_latency.
> >
> > kernel/sched/fair.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 4088eed..c55c566 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
> >
> > WARN_ON(task_rq(p) != rq);
> >
> > - if (cfs_rq->nr_running > 1) {
> > + if (rq->cfs.h_nr_running > 1) {
> > u64 slice = sched_slice(cfs_rq, se);
> > u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> > s64 delta = slice - ran;
> > --
> > The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> > hosted by The Linux Foundation
> >
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
2016-09-19 18:08 ` Joonwoo Park
@ 2016-09-19 23:30 ` Wanpeng Li
0 siblings, 0 replies; 8+ messages in thread
From: Wanpeng Li @ 2016-09-19 23:30 UTC (permalink / raw)
To: Joonwoo Park
Cc: Peter Zijlstra, Srivatsa Vaddagiri, Ingo Molnar,
linux-kernel@vger.kernel.org
2016-09-20 2:08 GMT+08:00 Joonwoo Park <joonwoop@codeaurora.org>:
> On Sun, Sep 18, 2016 at 07:28:32AM +0800, Wanpeng Li wrote:
>> 2016-09-17 9:28 GMT+08:00 Joonwoo Park <joonwoop@codeaurora.org>:
>> > From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
>> >
>> > SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
>> > (just when they would have exceeded their ideal_runtime). It makes use
>> > of a per-cpu hrtimer resource and hence alarming that hrtimer should
>> > be based on total SCHED_FAIR tasks a cpu has across its various cfs_rqs,
>> > rather than being based on number of tasks in a particular cfs_rq (as
>> > implemented currently). As a result, with current code, its possible for
>> > a running task (which is the sole task in its cfs_rq) to be preempted
>>
>> not be preempted much, right?
>
> I don't think so....
> By saying 'to be preempted much after its ideal_runtime has elapsed' I
> wanted to describe the current suboptimal behaviour.
I also described the current suboptimal behaviour. A running task
(which is the sole task in its cfs_rq) as the test case 2 you
mentioned in another reply is preempted after 10ms since the scheduler
tick, however, it will be preempted after 1.5ms since the hrtick fire
in test case 1. That's what I mean "not be preempted much for test
case 2".
Regards,
Wanpeng Li
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
2016-09-17 1:28 [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks Joonwoo Park
2016-09-17 23:28 ` Wanpeng Li
@ 2016-09-19 8:21 ` Peter Zijlstra
2016-09-19 18:04 ` Joonwoo Park
2016-09-22 14:01 ` [tip:sched/core] sched/fair: " tip-bot for Srivatsa Vaddagiri
2 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2016-09-19 8:21 UTC (permalink / raw)
To: Joonwoo Park; +Cc: Srivatsa Vaddagiri, Ingo Molnar, linux-kernel
On Fri, Sep 16, 2016 at 06:28:51PM -0700, Joonwoo Park wrote:
> From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
>
> SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
Right, but I always found the overhead of the thing too high to be
really useful.
How come you're using this?
> joonwoop: Do we also need to update or remove if-statement inside
> hrtick_update()?
> I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
> has large number of nr_running where slice is longer than sched_latency.
Right, you want that to match with whatever sched_slice() does.
> +++ b/kernel/sched/fair.c
> @@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
>
> WARN_ON(task_rq(p) != rq);
>
> - if (cfs_rq->nr_running > 1) {
> + if (rq->cfs.h_nr_running > 1) {
> u64 slice = sched_slice(cfs_rq, se);
> u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> s64 delta = slice - ran;
Yeah, that looks right. I don't think I've ever tried hrtick with
cgroups enabled...
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
2016-09-19 8:21 ` Peter Zijlstra
@ 2016-09-19 18:04 ` Joonwoo Park
2016-09-19 18:17 ` Joonwoo Park
0 siblings, 1 reply; 8+ messages in thread
From: Joonwoo Park @ 2016-09-19 18:04 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Srivatsa Vaddagiri, Ingo Molnar, linux-kernel
On Mon, Sep 19, 2016 at 10:21:58AM +0200, Peter Zijlstra wrote:
> On Fri, Sep 16, 2016 at 06:28:51PM -0700, Joonwoo Park wrote:
> > From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> >
> > SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
>
> Right, but I always found the overhead of the thing too high to be
> really useful.
>
> How come you're using this?
This patch was in our internal tree for decades so I unfortunately cannot
find actual usecase or history.
But I guess it was about excessive latency when there are number of CPU
bound tasks running on a CPU but on different cfs_rqs and CONFIG_HZ = 100.
See how I recreated :
* run 4 cpu hogs on the same cgroup [1] :
dd-960 [000] d..3 110.651060: sched_switch: prev_comm=dd prev_pid=960 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=959 next_prio=120
dd-959 [000] d..3 110.652566: sched_switch: prev_comm=dd prev_pid=959 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=961 next_prio=120
dd-961 [000] d..3 110.654072: sched_switch: prev_comm=dd prev_pid=961 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=962 next_prio=120
dd-962 [000] d..3 110.655578: sched_switch: prev_comm=dd prev_pid=962 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=960 next_prio=120
preempt every 1.5ms slice by hrtick.
* run 4 CPU hogs on 4 different cgroups [2] :
dd-964 [000] d..3 24.169873: sched_switch: prev_comm=dd prev_pid=964 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=966 next_prio=120
dd-966 [000] d..3 24.179873: sched_switch: prev_comm=dd prev_pid=966 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=965 next_prio=120
dd-965 [000] d..3 24.189873: sched_switch: prev_comm=dd prev_pid=965 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=967 next_prio=120
dd-967 [000] d..3 24.199873: sched_switch: prev_comm=dd prev_pid=967 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=964 next_prio=120
preempt every 10ms by scheduler tick so that all tasks suffers from 40ms preemption latency.
[1] :
dd if=/dev/zero of=/dev/zero &
dd if=/dev/zero of=/dev/zero &
dd if=/dev/zero of=/dev/zero &
dd if=/dev/zero of=/dev/zero &
[2] :
mount -t cgroup -o cpu cpu /sys/fs/cgroup
mkdir /sys/fs/cgroup/grp1
mkdir /sys/fs/cgroup/grp2
mkdir /sys/fs/cgroup/grp3
mkdir /sys/fs/cgroup/grp4
dd if=/dev/zero of=/dev/zero &
echo $! > /sys/fs/cgroup/grp1/tasks
dd if=/dev/zero of=/dev/zero &
echo $! > /sys/fs/cgroup/grp2/tasks
dd if=/dev/zero of=/dev/zero &
echo $! > /sys/fs/cgroup/grp3/tasks
dd if=/dev/zero of=/dev/zero &
echo $! > /sys/fs/cgroup/grp4/tasks
I could confirm this patch makes the latter behaves as same as the former in terms of preemption latency.
>
>
> > joonwoop: Do we also need to update or remove if-statement inside
> > hrtick_update()?
>
> > I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
> > has large number of nr_running where slice is longer than sched_latency.
>
> Right, you want that to match with whatever sched_slice() does.
Cool. Thank you!
Thanks,
Joonwoo
>
> > +++ b/kernel/sched/fair.c
> > @@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
> >
> > WARN_ON(task_rq(p) != rq);
> >
> > - if (cfs_rq->nr_running > 1) {
> > + if (rq->cfs.h_nr_running > 1) {
> > u64 slice = sched_slice(cfs_rq, se);
> > u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> > s64 delta = slice - ran;
>
> Yeah, that looks right. I don't think I've ever tried hrtick with
> cgroups enabled...
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
2016-09-19 18:04 ` Joonwoo Park
@ 2016-09-19 18:17 ` Joonwoo Park
0 siblings, 0 replies; 8+ messages in thread
From: Joonwoo Park @ 2016-09-19 18:17 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Srivatsa Vaddagiri, Ingo Molnar, linux-kernel
On Mon, Sep 19, 2016 at 11:04:49AM -0700, Joonwoo Park wrote:
> On Mon, Sep 19, 2016 at 10:21:58AM +0200, Peter Zijlstra wrote:
> > On Fri, Sep 16, 2016 at 06:28:51PM -0700, Joonwoo Park wrote:
> > > From: Srivatsa Vaddagiri <vatsa@codeaurora.org>
> > >
> > > SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
> >
> > Right, but I always found the overhead of the thing too high to be
> > really useful.
> >
> > How come you're using this?
>
> This patch was in our internal tree for decades so I unfortunately cannot
> find actual usecase or history.
> But I guess it was about excessive latency when there are number of CPU
> bound tasks running on a CPU but on different cfs_rqs and CONFIG_HZ = 100.
>
> See how I recreated :
>
> * run 4 cpu hogs on the same cgroup [1] :
> dd-960 [000] d..3 110.651060: sched_switch: prev_comm=dd prev_pid=960 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=959 next_prio=120
> dd-959 [000] d..3 110.652566: sched_switch: prev_comm=dd prev_pid=959 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=961 next_prio=120
> dd-961 [000] d..3 110.654072: sched_switch: prev_comm=dd prev_pid=961 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=962 next_prio=120
> dd-962 [000] d..3 110.655578: sched_switch: prev_comm=dd prev_pid=962 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=960 next_prio=120
> preempt every 1.5ms slice by hrtick.
>
> * run 4 CPU hogs on 4 different cgroups [2] :
> dd-964 [000] d..3 24.169873: sched_switch: prev_comm=dd prev_pid=964 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=966 next_prio=120
> dd-966 [000] d..3 24.179873: sched_switch: prev_comm=dd prev_pid=966 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=965 next_prio=120
> dd-965 [000] d..3 24.189873: sched_switch: prev_comm=dd prev_pid=965 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=967 next_prio=120
> dd-967 [000] d..3 24.199873: sched_switch: prev_comm=dd prev_pid=967 prev_prio=120 prev_state=R+ ==> next_comm=dd next_pid=964 next_prio=120
> preempt every 10ms by scheduler tick so that all tasks suffers from 40ms preemption latency.
>
> [1] :
> dd if=/dev/zero of=/dev/zero &
Ugh.. of=/dev/null instead.
> dd if=/dev/zero of=/dev/zero &
> dd if=/dev/zero of=/dev/zero &
> dd if=/dev/zero of=/dev/zero &
>
> [2] :
> mount -t cgroup -o cpu cpu /sys/fs/cgroup
> mkdir /sys/fs/cgroup/grp1
> mkdir /sys/fs/cgroup/grp2
> mkdir /sys/fs/cgroup/grp3
> mkdir /sys/fs/cgroup/grp4
> dd if=/dev/zero of=/dev/zero &
> echo $! > /sys/fs/cgroup/grp1/tasks
> dd if=/dev/zero of=/dev/zero &
> echo $! > /sys/fs/cgroup/grp2/tasks
> dd if=/dev/zero of=/dev/zero &
> echo $! > /sys/fs/cgroup/grp3/tasks
> dd if=/dev/zero of=/dev/zero &
> echo $! > /sys/fs/cgroup/grp4/tasks
>
> I could confirm this patch makes the latter behaves as same as the former in terms of preemption latency.
>
> >
> >
> > > joonwoop: Do we also need to update or remove if-statement inside
> > > hrtick_update()?
> >
> > > I guess not because hrtick_update() doesn't want to start hrtick when cfs_rq
> > > has large number of nr_running where slice is longer than sched_latency.
> >
> > Right, you want that to match with whatever sched_slice() does.
>
> Cool. Thank you!
>
> Thanks,
> Joonwoo
>
> >
> > > +++ b/kernel/sched/fair.c
> > > @@ -4458,7 +4458,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
> > >
> > > WARN_ON(task_rq(p) != rq);
> > >
> > > - if (cfs_rq->nr_running > 1) {
> > > + if (rq->cfs.h_nr_running > 1) {
> > > u64 slice = sched_slice(cfs_rq, se);
> > > u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
> > > s64 delta = slice - ran;
> >
> > Yeah, that looks right. I don't think I've ever tried hrtick with
> > cgroups enabled...
^ permalink raw reply [flat|nested] 8+ messages in thread
* [tip:sched/core] sched/fair: Fix SCHED_HRTICK bug leading to late preemption of tasks
2016-09-17 1:28 [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks Joonwoo Park
2016-09-17 23:28 ` Wanpeng Li
2016-09-19 8:21 ` Peter Zijlstra
@ 2016-09-22 14:01 ` tip-bot for Srivatsa Vaddagiri
2 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Srivatsa Vaddagiri @ 2016-09-22 14:01 UTC (permalink / raw)
To: linux-tip-commits
Cc: torvalds, hpa, linux-kernel, tglx, mingo, peterz, joonwoop, vatsa
Commit-ID: 8bf46a39be910937d4c9e8d999a7438a7ae1a75b
Gitweb: http://git.kernel.org/tip/8bf46a39be910937d4c9e8d999a7438a7ae1a75b
Author: Srivatsa Vaddagiri <vatsa@codeaurora.org>
AuthorDate: Fri, 16 Sep 2016 18:28:51 -0700
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 22 Sep 2016 15:20:18 +0200
sched/fair: Fix SCHED_HRTICK bug leading to late preemption of tasks
SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
(just when they would have exceeded their ideal_runtime).
It makes use of a per-CPU hrtimer resource and hence arming that
hrtimer should be based on total SCHED_FAIR tasks a CPU has across its
various cfs_rqs, rather than being based on number of tasks in a
particular cfs_rq (as implemented currently).
As a result, with current code, its possible for a running task (which
is the sole task in its cfs_rq) to be preempted much after its
ideal_runtime has elapsed, resulting in increased latency for tasks in
other cfs_rq on same CPU.
Fix this by arming sched hrtimer based on total number of SCHED_FAIR
tasks a CPU has across its various cfs_rqs.
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1474075731-11550-1-git-send-email-joonwoop@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 986c10c..8fb4d19 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4469,7 +4469,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
WARN_ON(task_rq(p) != rq);
- if (cfs_rq->nr_running > 1) {
+ if (rq->cfs.h_nr_running > 1) {
u64 slice = sched_slice(cfs_rq, se);
u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
s64 delta = slice - ran;
^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-09-22 14:02 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-09-17 1:28 [PATCH] sched: Fix SCHED_HRTICK bug leading to late preemption of tasks Joonwoo Park
2016-09-17 23:28 ` Wanpeng Li
2016-09-19 18:08 ` Joonwoo Park
2016-09-19 23:30 ` Wanpeng Li
2016-09-19 8:21 ` Peter Zijlstra
2016-09-19 18:04 ` Joonwoo Park
2016-09-19 18:17 ` Joonwoo Park
2016-09-22 14:01 ` [tip:sched/core] sched/fair: " tip-bot for Srivatsa Vaddagiri
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.