* [PATCH] sched/core: Minor optimize ttwu_runnable()
@ 2022-11-02 10:23 Chengming Zhou
2022-11-04 18:19 ` Valentin Schneider
2022-11-08 9:20 ` Peter Zijlstra
0 siblings, 2 replies; 11+ messages in thread
From: Chengming Zhou @ 2022-11-02 10:23 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, bristot, vschneid
Cc: linux-kernel, Chengming Zhou
ttwu_runnable() is used as a fast wakeup path when the wakee task
is between set_current_state() and schedule(), in which case all
we need to do is change p->state back to TASK_RUNNING. So we don't
need to update_rq_clock() and check_preempt_curr() in this case.
Some performance numbers using mmtests/perfpipe on Intel Xeon server:
linux-next patched
Min Time 8.67 ( 0.00%) 8.66 ( 0.13%)
1st-qrtle Time 8.83 ( 0.00%) 8.72 ( 1.19%)
2nd-qrtle Time 8.90 ( 0.00%) 8.76 ( 1.57%)
3rd-qrtle Time 8.98 ( 0.00%) 8.82 ( 1.82%)
Max-1 Time 8.67 ( 0.00%) 8.66 ( 0.13%)
Max-5 Time 8.67 ( 0.00%) 8.66 ( 0.13%)
Max-10 Time 8.79 ( 0.00%) 8.69 ( 1.09%)
Max-90 Time 9.01 ( 0.00%) 8.84 ( 1.94%)
Max-95 Time 9.02 ( 0.00%) 8.85 ( 1.86%)
Max-99 Time 9.02 ( 0.00%) 8.88 ( 1.56%)
Max Time 9.59 ( 0.00%) 8.89 ( 7.29%)
Amean Time 8.92 ( 0.00%) 8.77 * 1.65%*
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
kernel/sched/core.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 87c9cdf37a26..3785418de127 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3718,9 +3718,8 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags)
rq = __task_rq_lock(p, &rf);
if (task_on_rq_queued(p)) {
- /* check_preempt_curr() may use rq clock */
- update_rq_clock(rq);
- ttwu_do_wakeup(rq, p, wake_flags, &rf);
+ WRITE_ONCE(p->__state, TASK_RUNNING);
+ trace_sched_wakeup(p);
ret = 1;
}
__task_rq_unlock(rq, &rf);
--
2.37.2
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH] sched/core: Minor optimize ttwu_runnable() 2022-11-02 10:23 [PATCH] sched/core: Minor optimize ttwu_runnable() Chengming Zhou @ 2022-11-04 18:19 ` Valentin Schneider 2022-11-05 1:34 ` Chengming Zhou 2022-11-08 9:20 ` Peter Zijlstra 1 sibling, 1 reply; 11+ messages in thread From: Valentin Schneider @ 2022-11-04 18:19 UTC (permalink / raw) To: Chengming Zhou, mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot Cc: linux-kernel, Chengming Zhou On 02/11/22 18:23, Chengming Zhou wrote: > ttwu_runnable() is used as a fast wakeup path when the wakee task > is between set_current_state() and schedule(), in which case all > we need to do is change p->state back to TASK_RUNNING. So we don't > need to update_rq_clock() and check_preempt_curr() in this case. > > Some performance numbers using mmtests/perfpipe on Intel Xeon server: > > linux-next patched > Min Time 8.67 ( 0.00%) 8.66 ( 0.13%) > 1st-qrtle Time 8.83 ( 0.00%) 8.72 ( 1.19%) > 2nd-qrtle Time 8.90 ( 0.00%) 8.76 ( 1.57%) > 3rd-qrtle Time 8.98 ( 0.00%) 8.82 ( 1.82%) > Max-1 Time 8.67 ( 0.00%) 8.66 ( 0.13%) > Max-5 Time 8.67 ( 0.00%) 8.66 ( 0.13%) > Max-10 Time 8.79 ( 0.00%) 8.69 ( 1.09%) > Max-90 Time 9.01 ( 0.00%) 8.84 ( 1.94%) > Max-95 Time 9.02 ( 0.00%) 8.85 ( 1.86%) > Max-99 Time 9.02 ( 0.00%) 8.88 ( 1.56%) > Max Time 9.59 ( 0.00%) 8.89 ( 7.29%) > Amean Time 8.92 ( 0.00%) 8.77 * 1.65%* > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > --- > kernel/sched/core.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 87c9cdf37a26..3785418de127 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -3718,9 +3718,8 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags) > > rq = __task_rq_lock(p, &rf); > if (task_on_rq_queued(p)) { > - /* check_preempt_curr() may use rq clock */ > - update_rq_clock(rq); > - ttwu_do_wakeup(rq, p, wake_flags, &rf); > + WRITE_ONCE(p->__state, TASK_RUNNING); > + trace_sched_wakeup(p); This also loses the class->task_woken() call, AFAICT we could have !p->on_cpu here (e.g. an IRQ hit before the task got to schedule()), but then again if there is a need to push we should have done that at the IRQ preempt via set_next_task_{rt, dl}()... So I'm starting to think this is OK, but that needs elaborating in the changelog. > ret = 1; > } > __task_rq_unlock(rq, &rf); > -- > 2.37.2 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched/core: Minor optimize ttwu_runnable() 2022-11-04 18:19 ` Valentin Schneider @ 2022-11-05 1:34 ` Chengming Zhou 2022-11-07 12:56 ` Valentin Schneider 0 siblings, 1 reply; 11+ messages in thread From: Chengming Zhou @ 2022-11-05 1:34 UTC (permalink / raw) To: Valentin Schneider, mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot Cc: linux-kernel On 2022/11/5 02:19, Valentin Schneider wrote: > On 02/11/22 18:23, Chengming Zhou wrote: >> ttwu_runnable() is used as a fast wakeup path when the wakee task >> is between set_current_state() and schedule(), in which case all >> we need to do is change p->state back to TASK_RUNNING. So we don't >> need to update_rq_clock() and check_preempt_curr() in this case. >> >> Some performance numbers using mmtests/perfpipe on Intel Xeon server: >> >> linux-next patched >> Min Time 8.67 ( 0.00%) 8.66 ( 0.13%) >> 1st-qrtle Time 8.83 ( 0.00%) 8.72 ( 1.19%) >> 2nd-qrtle Time 8.90 ( 0.00%) 8.76 ( 1.57%) >> 3rd-qrtle Time 8.98 ( 0.00%) 8.82 ( 1.82%) >> Max-1 Time 8.67 ( 0.00%) 8.66 ( 0.13%) >> Max-5 Time 8.67 ( 0.00%) 8.66 ( 0.13%) >> Max-10 Time 8.79 ( 0.00%) 8.69 ( 1.09%) >> Max-90 Time 9.01 ( 0.00%) 8.84 ( 1.94%) >> Max-95 Time 9.02 ( 0.00%) 8.85 ( 1.86%) >> Max-99 Time 9.02 ( 0.00%) 8.88 ( 1.56%) >> Max Time 9.59 ( 0.00%) 8.89 ( 7.29%) >> Amean Time 8.92 ( 0.00%) 8.77 * 1.65%* >> >> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> >> --- >> kernel/sched/core.c | 5 ++--- >> 1 file changed, 2 insertions(+), 3 deletions(-) >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 87c9cdf37a26..3785418de127 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -3718,9 +3718,8 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags) >> >> rq = __task_rq_lock(p, &rf); >> if (task_on_rq_queued(p)) { >> - /* check_preempt_curr() may use rq clock */ >> - update_rq_clock(rq); >> - ttwu_do_wakeup(rq, p, wake_flags, &rf); >> + WRITE_ONCE(p->__state, TASK_RUNNING); >> + trace_sched_wakeup(p); > > This also loses the class->task_woken() call, AFAICT we could have > !p->on_cpu here (e.g. an IRQ hit before the task got to schedule()), but > then again if there is a need to push we should have done that at the IRQ > preempt via set_next_task_{rt, dl}()... So I'm starting to think this is > OK, but that needs elaborating in the changelog. Sorry, I don't get why we could have !p->on_cpu here? I thought if we have task_on_rq_queued(p) here, it means p hasn't got to __schedule() to deactivate_task(), so p should still be on_cpu? Thanks. > > >> ret = 1; >> } >> __task_rq_unlock(rq, &rf); >> -- >> 2.37.2 > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched/core: Minor optimize ttwu_runnable() 2022-11-05 1:34 ` Chengming Zhou @ 2022-11-07 12:56 ` Valentin Schneider 2022-11-07 13:19 ` Chengming Zhou 0 siblings, 1 reply; 11+ messages in thread From: Valentin Schneider @ 2022-11-07 12:56 UTC (permalink / raw) To: Chengming Zhou, mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot Cc: linux-kernel On 05/11/22 09:34, Chengming Zhou wrote: > On 2022/11/5 02:19, Valentin Schneider wrote: >> On 02/11/22 18:23, Chengming Zhou wrote: >>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >>> index 87c9cdf37a26..3785418de127 100644 >>> --- a/kernel/sched/core.c >>> +++ b/kernel/sched/core.c >>> @@ -3718,9 +3718,8 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags) >>> >>> rq = __task_rq_lock(p, &rf); >>> if (task_on_rq_queued(p)) { >>> - /* check_preempt_curr() may use rq clock */ >>> - update_rq_clock(rq); >>> - ttwu_do_wakeup(rq, p, wake_flags, &rf); >>> + WRITE_ONCE(p->__state, TASK_RUNNING); >>> + trace_sched_wakeup(p); >> >> This also loses the class->task_woken() call, AFAICT we could have >> !p->on_cpu here (e.g. an IRQ hit before the task got to schedule()), but >> then again if there is a need to push we should have done that at the IRQ >> preempt via set_next_task_{rt, dl}()... So I'm starting to think this is >> OK, but that needs elaborating in the changelog. > > Sorry, I don't get why we could have !p->on_cpu here? > > I thought if we have task_on_rq_queued(p) here, it means p hasn't got to > __schedule() to deactivate_task(), so p should still be on_cpu? > > Thanks. > So p is still on the rq for sure, but it may not be the currently running task. Consider, on a CONFIG_PREEMPT kernel, task p0 in a wait loop: 0 for (;;) { 1 set_current_state(TASK_UNINTERRUPTIBLE); 2 3 if (CONDITION) 4 break; 5 6 schedule(); 7 } 8 __set_current_state(TASK_RUNNING); Now further consider p0 executes line 1, but then gets interrupted by an IRQ, at the end of which preempt_schedule_irq() happens. We enter __schedule(SM_PREEMPT), so p0 isn't dequeued due to its non-zero task state, but we *can* end up switching the CPU's current task. ISTR that's why Peter renamed that function ttwu_*runnable*() and not ttwu_*running*(). >> >> >>> ret = 1; >>> } >>> __task_rq_unlock(rq, &rf); >>> -- >>> 2.37.2 >> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched/core: Minor optimize ttwu_runnable() 2022-11-07 12:56 ` Valentin Schneider @ 2022-11-07 13:19 ` Chengming Zhou 2022-11-07 15:54 ` Valentin Schneider 0 siblings, 1 reply; 11+ messages in thread From: Chengming Zhou @ 2022-11-07 13:19 UTC (permalink / raw) To: Valentin Schneider, mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot Cc: linux-kernel On 2022/11/7 20:56, Valentin Schneider wrote: > On 05/11/22 09:34, Chengming Zhou wrote: >> On 2022/11/5 02:19, Valentin Schneider wrote: >>> On 02/11/22 18:23, Chengming Zhou wrote: >>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >>>> index 87c9cdf37a26..3785418de127 100644 >>>> --- a/kernel/sched/core.c >>>> +++ b/kernel/sched/core.c >>>> @@ -3718,9 +3718,8 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags) >>>> >>>> rq = __task_rq_lock(p, &rf); >>>> if (task_on_rq_queued(p)) { >>>> - /* check_preempt_curr() may use rq clock */ >>>> - update_rq_clock(rq); >>>> - ttwu_do_wakeup(rq, p, wake_flags, &rf); >>>> + WRITE_ONCE(p->__state, TASK_RUNNING); >>>> + trace_sched_wakeup(p); >>> >>> This also loses the class->task_woken() call, AFAICT we could have >>> !p->on_cpu here (e.g. an IRQ hit before the task got to schedule()), but >>> then again if there is a need to push we should have done that at the IRQ >>> preempt via set_next_task_{rt, dl}()... So I'm starting to think this is >>> OK, but that needs elaborating in the changelog. >> >> Sorry, I don't get why we could have !p->on_cpu here? >> >> I thought if we have task_on_rq_queued(p) here, it means p hasn't got to >> __schedule() to deactivate_task(), so p should still be on_cpu? >> >> Thanks. >> > > So p is still on the rq for sure, but it may not be the currently running > task. Consider, on a CONFIG_PREEMPT kernel, task p0 in a wait loop: > > 0 for (;;) { > 1 set_current_state(TASK_UNINTERRUPTIBLE); > 2 > 3 if (CONDITION) > 4 break; > 5 > 6 schedule(); > 7 } > 8 __set_current_state(TASK_RUNNING); > > Now further consider p0 executes line 1, but then gets interrupted by an > IRQ, at the end of which preempt_schedule_irq() happens. We enter > __schedule(SM_PREEMPT), so p0 isn't dequeued due to its non-zero task > state, but we *can* end up switching the CPU's current task. Thank you very much for this detailed explanation, I get it. Yes, this process can be seen on CONFIG_PREEMPT kernel. > > ISTR that's why Peter renamed that function ttwu_*runnable*() and not > ttwu_*running*(). So this task p didn't really sleep, it's preempted, later scheduled in, get to execute line 6 schedule(), but its state has been set to TASK_RUNNING, it won't deactivate_task(p). Looks like this patch is still reasonable? Should I put this detailed explanation in the changelog and send v2? Thanks! > >>> >>> >>>> ret = 1; >>>> } >>>> __task_rq_unlock(rq, &rf); >>>> -- >>>> 2.37.2 >>> > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched/core: Minor optimize ttwu_runnable() 2022-11-07 13:19 ` Chengming Zhou @ 2022-11-07 15:54 ` Valentin Schneider 2022-11-08 7:38 ` Chengming Zhou 2022-11-08 9:11 ` Peter Zijlstra 0 siblings, 2 replies; 11+ messages in thread From: Valentin Schneider @ 2022-11-07 15:54 UTC (permalink / raw) To: Chengming Zhou, mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot Cc: linux-kernel On 07/11/22 21:19, Chengming Zhou wrote: > On 2022/11/7 20:56, Valentin Schneider wrote: >> >> So p is still on the rq for sure, but it may not be the currently running >> task. Consider, on a CONFIG_PREEMPT kernel, task p0 in a wait loop: >> >> 0 for (;;) { >> 1 set_current_state(TASK_UNINTERRUPTIBLE); >> 2 >> 3 if (CONDITION) >> 4 break; >> 5 >> 6 schedule(); >> 7 } >> 8 __set_current_state(TASK_RUNNING); >> >> Now further consider p0 executes line 1, but then gets interrupted by an >> IRQ, at the end of which preempt_schedule_irq() happens. We enter >> __schedule(SM_PREEMPT), so p0 isn't dequeued due to its non-zero task >> state, but we *can* end up switching the CPU's current task. > > Thank you very much for this detailed explanation, I get it. Yes, this > process can be seen on CONFIG_PREEMPT kernel. > >> >> ISTR that's why Peter renamed that function ttwu_*runnable*() and not >> ttwu_*running*(). > > So this task p didn't really sleep, it's preempted, later scheduled in, > get to execute line 6 schedule(), but its state has been set to TASK_RUNNING, > it won't deactivate_task(p). > Right! > Looks like this patch is still reasonable? Should I put this detailed > explanation in the changelog and send v2? > So that's the part for the p->sched_class->task_woken() callback, which only affects RT and DL (and only does something when !p->on_cpu). I *think* it's fine to remove it from ttwu_runnable() as any push/pull should have happened when other tasks were enqueued on the same CPU - with that said, it wouldn't hurt to double check this :-) As for the check_preempt_curr(), since per the above p can be preempted, you could have scenarios right now with CFS tasks where ttwu_runnable()->check_preempt_curr() causes NEED_RESCHED to be set. e.g. p0 does set_current_state(TASK_UNINTERRUPTIBLE) but then gets interrupted by the tick, a p1 gets selected to run instead because of check_preempt_tick(), and then runs long enough to have check_preempt_curr() decide to let p0 preempt p1. That does require specific timing (lower tick frequency should make this more likely) and probably task niceness distribution too, but isn't impossible. Maybe try reading p->on_cpu, and only do the quick task state update if it's still the current task, otherwise do the preemption checks? > Thanks! > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched/core: Minor optimize ttwu_runnable() 2022-11-07 15:54 ` Valentin Schneider @ 2022-11-08 7:38 ` Chengming Zhou 2022-11-08 9:11 ` Peter Zijlstra 1 sibling, 0 replies; 11+ messages in thread From: Chengming Zhou @ 2022-11-08 7:38 UTC (permalink / raw) To: Valentin Schneider, mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot Cc: linux-kernel On 2022/11/7 23:54, Valentin Schneider wrote: > On 07/11/22 21:19, Chengming Zhou wrote: >> On 2022/11/7 20:56, Valentin Schneider wrote: >>> >>> So p is still on the rq for sure, but it may not be the currently running >>> task. Consider, on a CONFIG_PREEMPT kernel, task p0 in a wait loop: >>> >>> 0 for (;;) { >>> 1 set_current_state(TASK_UNINTERRUPTIBLE); >>> 2 >>> 3 if (CONDITION) >>> 4 break; >>> 5 >>> 6 schedule(); >>> 7 } >>> 8 __set_current_state(TASK_RUNNING); >>> >>> Now further consider p0 executes line 1, but then gets interrupted by an >>> IRQ, at the end of which preempt_schedule_irq() happens. We enter >>> __schedule(SM_PREEMPT), so p0 isn't dequeued due to its non-zero task >>> state, but we *can* end up switching the CPU's current task. >> >> Thank you very much for this detailed explanation, I get it. Yes, this >> process can be seen on CONFIG_PREEMPT kernel. >> >>> >>> ISTR that's why Peter renamed that function ttwu_*runnable*() and not >>> ttwu_*running*(). >> >> So this task p didn't really sleep, it's preempted, later scheduled in, >> get to execute line 6 schedule(), but its state has been set to TASK_RUNNING, >> it won't deactivate_task(p). >> > > Right! > >> Looks like this patch is still reasonable? Should I put this detailed >> explanation in the changelog and send v2? >> > > So that's the part for the p->sched_class->task_woken() callback, which > only affects RT and DL (and only does something when !p->on_cpu). I *think* > it's fine to remove it from ttwu_runnable() as any push/pull should have > happened when other tasks were enqueued on the same CPU - with that said, > it wouldn't hurt to double check this :-) Yes, ttwu_runnable() should be fine to remove p->sched_class->task_woken(). > > > As for the check_preempt_curr(), since per the above p can be preempted, > you could have scenarios right now with CFS tasks where > ttwu_runnable()->check_preempt_curr() causes NEED_RESCHED to be set. > > e.g. p0 does > > set_current_state(TASK_UNINTERRUPTIBLE) > > but then gets interrupted by the tick, a p1 gets selected to run instead > because of check_preempt_tick(), and then runs long enough to have > check_preempt_curr() decide to let p0 preempt p1. > > That does require specific timing (lower tick frequency should make this > more likely) and probably task niceness distribution too, but isn't > impossible. > > Maybe try reading p->on_cpu, and only do the quick task state update if > it's still the current task, otherwise do the preemption checks? I think it's a good suggestion, so we still have the preemption check when p0 is preempted by p1, letting p0 to be scheduled in more timely. I will update and send v2 later. > >> Thanks! >> > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched/core: Minor optimize ttwu_runnable() 2022-11-07 15:54 ` Valentin Schneider 2022-11-08 7:38 ` Chengming Zhou @ 2022-11-08 9:11 ` Peter Zijlstra 2022-11-08 10:08 ` Peter Zijlstra 1 sibling, 1 reply; 11+ messages in thread From: Peter Zijlstra @ 2022-11-08 9:11 UTC (permalink / raw) To: Valentin Schneider Cc: Chengming Zhou, mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-kernel On Mon, Nov 07, 2022 at 03:54:38PM +0000, Valentin Schneider wrote: > So that's the part for the p->sched_class->task_woken() callback, which > only affects RT and DL (and only does something when !p->on_cpu). I *think* > it's fine to remove it from ttwu_runnable() as any push/pull should have > happened when other tasks were enqueued on the same CPU - with that said, > it wouldn't hurt to double check this :-) > > > As for the check_preempt_curr(), since per the above p can be preempted, > you could have scenarios right now with CFS tasks where > ttwu_runnable()->check_preempt_curr() causes NEED_RESCHED to be set. > > e.g. p0 does > > set_current_state(TASK_UNINTERRUPTIBLE) > > but then gets interrupted by the tick, a p1 gets selected to run instead > because of check_preempt_tick(), and then runs long enough to have > check_preempt_curr() decide to let p0 preempt p1. > > That does require specific timing (lower tick frequency should make this > more likely) and probably task niceness distribution too, but isn't > impossible. > > Maybe try reading p->on_cpu, and only do the quick task state update if > it's still the current task, otherwise do the preemption checks? I'm confused... So all relevant parties take rq->lock: - __schedule() - scheduler_tick() - ttwu_runnable() So if ttwu_runnable() sees on_rq and switches state back to RUNNING then neither check_preempt_curr() nor task_woken() make any sense. Specifically: - you can't very well preempt yourself (which is what check_preempt_curr() is trying to determine -- if the woken task should preempt the running task, they're both the same in this case); - the task did not actually wake up, because it was still on the runqueue to begin with. This path prevents a sleep, rather than issues a wakeup. So yes, I think the patch as proposed is ok. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched/core: Minor optimize ttwu_runnable() 2022-11-08 9:11 ` Peter Zijlstra @ 2022-11-08 10:08 ` Peter Zijlstra 2022-11-08 11:17 ` Chengming Zhou 0 siblings, 1 reply; 11+ messages in thread From: Peter Zijlstra @ 2022-11-08 10:08 UTC (permalink / raw) To: Valentin Schneider Cc: Chengming Zhou, mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-kernel On Tue, Nov 08, 2022 at 10:11:49AM +0100, Peter Zijlstra wrote: > On Mon, Nov 07, 2022 at 03:54:38PM +0000, Valentin Schneider wrote: > > > So that's the part for the p->sched_class->task_woken() callback, which > > only affects RT and DL (and only does something when !p->on_cpu). I *think* > > it's fine to remove it from ttwu_runnable() as any push/pull should have > > happened when other tasks were enqueued on the same CPU - with that said, > > it wouldn't hurt to double check this :-) > > > > > > As for the check_preempt_curr(), since per the above p can be preempted, > > you could have scenarios right now with CFS tasks where > > ttwu_runnable()->check_preempt_curr() causes NEED_RESCHED to be set. > > > > e.g. p0 does > > > > set_current_state(TASK_UNINTERRUPTIBLE) > > > > but then gets interrupted by the tick, a p1 gets selected to run instead > > because of check_preempt_tick(), and then runs long enough to have > > check_preempt_curr() decide to let p0 preempt p1. > > > > That does require specific timing (lower tick frequency should make this > > more likely) and probably task niceness distribution too, but isn't > > impossible. > > > > Maybe try reading p->on_cpu, and only do the quick task state update if > > it's still the current task, otherwise do the preemption checks? > > I'm confused... I am and Valentin has a point. It could indeed be preempted and in that case check_preempt_curr() could indeed make it get back on. In that case his suggestion might make sense; something along the lines of so I suppose... (And I think we can still do the reorg I proposed elsewhere, but I've not yet tried.) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index cb2aa2b54c7a..6944d9473295 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3698,9 +3698,16 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags) rq = __task_rq_lock(p, &rf); if (task_on_rq_queued(p)) { - /* check_preempt_curr() may use rq clock */ - update_rq_clock(rq); - ttwu_do_wakeup(rq, p, wake_flags, &rf); + if (!p->on_cpu) { + /* + * When on_rq && !on_cpu the task is preempted, see if + * it should preempt whatever is current there now. + */ + update_rq_clock(rq); + check_preempt_curr(rq, p, wake_flags); + } + WRITE_ONCE(p->__state, TASK_RUNNING); + trace_sched_wakeup(p); ret = 1; } __task_rq_unlock(rq, &rf); ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] sched/core: Minor optimize ttwu_runnable() 2022-11-08 10:08 ` Peter Zijlstra @ 2022-11-08 11:17 ` Chengming Zhou 0 siblings, 0 replies; 11+ messages in thread From: Chengming Zhou @ 2022-11-08 11:17 UTC (permalink / raw) To: Peter Zijlstra, Valentin Schneider Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-kernel On 2022/11/8 18:08, Peter Zijlstra wrote: > On Tue, Nov 08, 2022 at 10:11:49AM +0100, Peter Zijlstra wrote: >> On Mon, Nov 07, 2022 at 03:54:38PM +0000, Valentin Schneider wrote: >> >>> So that's the part for the p->sched_class->task_woken() callback, which >>> only affects RT and DL (and only does something when !p->on_cpu). I *think* >>> it's fine to remove it from ttwu_runnable() as any push/pull should have >>> happened when other tasks were enqueued on the same CPU - with that said, >>> it wouldn't hurt to double check this :-) >>> >>> >>> As for the check_preempt_curr(), since per the above p can be preempted, >>> you could have scenarios right now with CFS tasks where >>> ttwu_runnable()->check_preempt_curr() causes NEED_RESCHED to be set. >>> >>> e.g. p0 does >>> >>> set_current_state(TASK_UNINTERRUPTIBLE) >>> >>> but then gets interrupted by the tick, a p1 gets selected to run instead >>> because of check_preempt_tick(), and then runs long enough to have >>> check_preempt_curr() decide to let p0 preempt p1. >>> >>> That does require specific timing (lower tick frequency should make this >>> more likely) and probably task niceness distribution too, but isn't >>> impossible. >>> >>> Maybe try reading p->on_cpu, and only do the quick task state update if >>> it's still the current task, otherwise do the preemption checks? >> >> I'm confused... > > I am and Valentin has a point. It could indeed be preempted and in that > case check_preempt_curr() could indeed make it get back on. > > In that case his suggestion might make sense; something along the lines > of so I suppose... > > (And I think we can still do the reorg I proposed elsewhere, but I've not > yet tried.) Ok, thanks for these suggestions. I will try to do this, do some tests and send v2. Thanks. > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index cb2aa2b54c7a..6944d9473295 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -3698,9 +3698,16 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags) > > rq = __task_rq_lock(p, &rf); > if (task_on_rq_queued(p)) { > - /* check_preempt_curr() may use rq clock */ > - update_rq_clock(rq); > - ttwu_do_wakeup(rq, p, wake_flags, &rf); > + if (!p->on_cpu) { > + /* > + * When on_rq && !on_cpu the task is preempted, see if > + * it should preempt whatever is current there now. > + */ > + update_rq_clock(rq); > + check_preempt_curr(rq, p, wake_flags); > + } > + WRITE_ONCE(p->__state, TASK_RUNNING); > + trace_sched_wakeup(p); > ret = 1; > } > __task_rq_unlock(rq, &rf); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sched/core: Minor optimize ttwu_runnable() 2022-11-02 10:23 [PATCH] sched/core: Minor optimize ttwu_runnable() Chengming Zhou 2022-11-04 18:19 ` Valentin Schneider @ 2022-11-08 9:20 ` Peter Zijlstra 1 sibling, 0 replies; 11+ messages in thread From: Peter Zijlstra @ 2022-11-08 9:20 UTC (permalink / raw) To: Chengming Zhou Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, linux-kernel On Wed, Nov 02, 2022 at 06:23:43PM +0800, Chengming Zhou wrote: > ttwu_runnable() is used as a fast wakeup path when the wakee task > is between set_current_state() and schedule(), in which case all > we need to do is change p->state back to TASK_RUNNING. So we don't > need to update_rq_clock() and check_preempt_curr() in this case. > > Some performance numbers using mmtests/perfpipe on Intel Xeon server: > > linux-next patched > Min Time 8.67 ( 0.00%) 8.66 ( 0.13%) > 1st-qrtle Time 8.83 ( 0.00%) 8.72 ( 1.19%) > 2nd-qrtle Time 8.90 ( 0.00%) 8.76 ( 1.57%) > 3rd-qrtle Time 8.98 ( 0.00%) 8.82 ( 1.82%) > Max-1 Time 8.67 ( 0.00%) 8.66 ( 0.13%) > Max-5 Time 8.67 ( 0.00%) 8.66 ( 0.13%) > Max-10 Time 8.79 ( 0.00%) 8.69 ( 1.09%) > Max-90 Time 9.01 ( 0.00%) 8.84 ( 1.94%) > Max-95 Time 9.02 ( 0.00%) 8.85 ( 1.86%) > Max-99 Time 9.02 ( 0.00%) 8.88 ( 1.56%) > Max Time 9.59 ( 0.00%) 8.89 ( 7.29%) > Amean Time 8.92 ( 0.00%) 8.77 * 1.65%* > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > --- > kernel/sched/core.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 87c9cdf37a26..3785418de127 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -3718,9 +3718,8 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags) > > rq = __task_rq_lock(p, &rf); > if (task_on_rq_queued(p)) { > - /* check_preempt_curr() may use rq clock */ > - update_rq_clock(rq); > - ttwu_do_wakeup(rq, p, wake_flags, &rf); > + WRITE_ONCE(p->__state, TASK_RUNNING); > + trace_sched_wakeup(p); > ret = 1; > } > __task_rq_unlock(rq, &rf); Yes, I think this is correct; however I would re-organize code a little. How's this? diff --git a/kernel/sched/core.c b/kernel/sched/core.c index cb2aa2b54c7a..43d9a1551a5d 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3602,15 +3602,40 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_flags) __schedstat_inc(p->stats.nr_wakeups_sync); } /* - * Mark the task runnable and perform wakeup-preemption. + * Mark the task runnable... */ -static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags, - struct rq_flags *rf) +static inline void ttwu_do_wakeup(struct task_struct *p) { - check_preempt_curr(rq, p, wake_flags); WRITE_ONCE(p->__state, TASK_RUNNING); trace_sched_wakeup(p); +} + +static void +ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags, + struct rq_flags *rf) +{ + int en_flags = ENQUEUE_WAKEUP | ENQUEUE_NOCLOCK; + + lockdep_assert_rq_held(rq); + + if (p->sched_contributes_to_load) + rq->nr_uninterruptible--; + +#ifdef CONFIG_SMP + if (wake_flags & WF_MIGRATED) + en_flags |= ENQUEUE_MIGRATED; + else +#endif + if (p->in_iowait) { + delayacct_blkio_end(p); + atomic_dec(&task_rq(p)->nr_iowait); + } + + activate_task(rq, p, en_flags); + check_preempt_curr(rq, p, wake_flags); + + ttwu_do_wakeup(p); #ifdef CONFIG_SMP if (p->sched_class->task_woken) { @@ -3640,31 +3666,6 @@ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags, #endif } -static void -ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags, - struct rq_flags *rf) -{ - int en_flags = ENQUEUE_WAKEUP | ENQUEUE_NOCLOCK; - - lockdep_assert_rq_held(rq); - - if (p->sched_contributes_to_load) - rq->nr_uninterruptible--; - -#ifdef CONFIG_SMP - if (wake_flags & WF_MIGRATED) - en_flags |= ENQUEUE_MIGRATED; - else -#endif - if (p->in_iowait) { - delayacct_blkio_end(p); - atomic_dec(&task_rq(p)->nr_iowait); - } - - activate_task(rq, p, en_flags); - ttwu_do_wakeup(rq, p, wake_flags, rf); -} - /* * Consider @p being inside a wait loop: * @@ -3698,9 +3699,7 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags) rq = __task_rq_lock(p, &rf); if (task_on_rq_queued(p)) { - /* check_preempt_curr() may use rq clock */ - update_rq_clock(rq); - ttwu_do_wakeup(rq, p, wake_flags, &rf); + ttwu_do_wakeup(p); ret = 1; } __task_rq_unlock(rq, &rf); @@ -4062,8 +4061,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) goto out; trace_sched_waking(p); - WRITE_ONCE(p->__state, TASK_RUNNING); - trace_sched_wakeup(p); + ttwu_do_wakeup(p); goto out; } ^ permalink raw reply related [flat|nested] 11+ messages in thread
end of thread, other threads:[~2022-11-08 11:18 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-11-02 10:23 [PATCH] sched/core: Minor optimize ttwu_runnable() Chengming Zhou 2022-11-04 18:19 ` Valentin Schneider 2022-11-05 1:34 ` Chengming Zhou 2022-11-07 12:56 ` Valentin Schneider 2022-11-07 13:19 ` Chengming Zhou 2022-11-07 15:54 ` Valentin Schneider 2022-11-08 7:38 ` Chengming Zhou 2022-11-08 9:11 ` Peter Zijlstra 2022-11-08 10:08 ` Peter Zijlstra 2022-11-08 11:17 ` Chengming Zhou 2022-11-08 9:20 ` Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox