From: Valentin Schneider <Valentin.Schneider@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vincent Donnefort <Vincent.Donnefort@arm.com>,
peterz@infradead.org, mingo@redhat.com,
linux-kernel@vger.kernel.org, mgorman@techsingularity.net,
dietmar.eggemann@arm.com
Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task
Date: Thu, 25 Nov 2021 15:30:05 +0000 [thread overview]
Message-ID: <87zgpsb6de.mognet@arm.com> (raw)
In-Reply-To: <CAKfTPtDPskVdEd-KQ_cwe-R_zVFPQOgdbk9x+3eD12pKs8fGFw@mail.gmail.com>
On 25/11/21 14:23, Vincent Guittot wrote:
> On Thu, 25 Nov 2021 at 12:16, Valentin Schneider
> <Valentin.Schneider@arm.com> wrote:
>> I think you can still hit this on a symmetric system; let me try to
>> reformulate my other email.
>>
>> If this (non-patched) condition evaluates to true, it means the previous
>> condition
>>
>> (available_idle_cpu(target) || sched_idle_cpu(target)) &&
>> asym_fits_capacity(task_util, target)
>>
>> evaluated to false, so for a symmetric system target sure isn't idle.
>>
>> prev == smp_processor_id() implies prev == target, IOW prev isn't
>> idle. Now, consider:
>>
>> p0.prev = CPU1
>> p1.prev = CPU1
>>
>> CPU0 CPU1
>> current = don't care current = swapper/1
>>
>> ttwu(p1)
>> ttwu_queue(p1, CPU1)
>> // or
>> ttwu_queue_wakelist(p1, CPU1)
>>
>> hrtimer_wakeup()
>> wake_up_process()
>> ttwu()
>> idle_cpu(CPU1)? no
>>
>> is_per_cpu_kthread(current)? yes
>> prev == smp_processor_id()? yes
>> this_rq()->nr_running <= 1? yes
>> => self enqueue
>>
>> ...
>> schedule_idle()
>>
>> This works if CPU0 does either a full enqueue (rq->nr_running == 1) or just
>> a wakelist enqueue (rq->ttwu_pending > 0). If there was an idle CPU3
>> around, we'd still be stacking p0 and p1 onto CPU1.
>>
>> IOW this opens a window between a remote ttwu() and the idle task invoking
>> schedule_idle() where the idle task can stack more tasks onto its CPU.
>
> Your use case above is out of the scope of this patch and has always
> been there, even for other per cpu kthreads. In such case, the wake up
> is not triggered by current (idle or another per cpu kthread) but by
> an interrupt (hrtimer in your case).
Technically the idle task didn't pass is_per_cpu_kthread(p) when that
condition was added, this is somewhat of a "new development" - but you're
right on the hardirq side of things.
> If we want to filter wakeup
> generated by interrupt context while a per cpu kthread is running, it
> would be better to fix all cases and test the running context like
> this
>
I think that could make sense - though can the idle task issue wakeups in
process context? If so that won't be sufficient. A quick audit tells me:
o rcu_nocb_flush_deferred_wakeup() happens before calling into cpuidle
o I didn't see any wakeup issued from the cpu_pm_notifier call chain
o I'm not entirely sure about flush_smp_call_function_from_idle(). I found
this thing in RCU:
smp_call_function_single(cpu, rcu_exp_handler)
rcu_exp_handler()
rcu_report_exp_rdp()
rcu_report_exp_cpu_mult()
__rcu_report_exp_rnp()
swake_up_one()
IIUC if set_nr_if_polling() then the smp_call won't send an IPI and should be
handled in that flush_foo_from_idle() call.
I'd be tempted to stick your VincentD's conditions together, just to be
safe...
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6397,7 +6397,8 @@ static int select_idle_sibling(struct
> task_struct *p, int prev, int target)
> * essentially a sync wakeup. An obvious example of this
> * pattern is IO completions.
> */
> - if (is_per_cpu_kthread(current) &&
> + if (!in_interrupt() &&
> + is_per_cpu_kthread(current) &&
> prev == smp_processor_id() &&
> this_rq()->nr_running <= 1) {
> return prev;
>
>>
>> >
>> >> --
>> >> 2.25.1
>> >>
next prev parent reply other threads:[~2021-11-25 15:32 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-24 15:42 [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task Vincent Donnefort
2021-11-24 16:28 ` Valentin Schneider
2021-11-25 9:05 ` Vincent Guittot
2021-11-25 11:16 ` Valentin Schneider
2021-11-25 13:17 ` Dietmar Eggemann
2021-11-25 13:23 ` Vincent Guittot
2021-11-25 15:30 ` Valentin Schneider [this message]
2021-11-26 8:23 ` Vincent Guittot
2021-11-26 13:32 ` Valentin Schneider
2021-11-26 14:40 ` Vincent Guittot
2021-11-26 16:49 ` Valentin Schneider
2021-11-26 17:18 ` Vincent Donnefort
2021-11-29 15:49 ` Vincent Guittot
2021-11-29 16:54 ` Vincent Donnefort
2021-11-30 13:35 ` Dietmar Eggemann
2021-11-30 15:42 ` Vincent Guittot
2021-12-01 14:40 ` Vincent Donnefort
2021-12-01 16:19 ` Vincent Guittot
2021-11-29 8:36 ` [sched/fair] 8d0920b981: stress-ng.sem.ops_per_sec 11.9% improvement kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87zgpsb6de.mognet@arm.com \
--to=valentin.schneider@arm.com \
--cc=Vincent.Donnefort@arm.com \
--cc=dietmar.eggemann@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@techsingularity.net \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox