From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: yury.norov@gmail.com
Cc: vschneid@redhat.com, dietmar.eggemann@arm.com,
rostedt@goodmis.org, mingo@redhat.com, peterz@infradead.org,
kprateek.nayak@amd.com, huschle@linux.ibm.com,
srikar@linux.ibm.com, linux-kernel@vger.kernel.org,
christophe.leroy@csgroup.eu, linuxppc-dev@lists.ozlabs.org,
gregkh@linuxfoundation.org, maddy@linux.ibm.com,
tglx@linutronix.de, juri.lelli@redhat.com,
vincent.guittot@linaro.org
Subject: Re: [RFC v2 6/9] sched/core: Push current task out if CPU is marked as avoid
Date: Wed, 13 Aug 2025 00:10:23 +0530
Message-ID: <d87c4b4f-959b-4726-9b4b-4ddeb7488b37@linux.ibm.com>
In-Reply-To: <20250625191108.1646208-7-sshegde@linux.ibm.com>
Sorry for the delay in responding to the bloat-o-meter report. Since stop_one_cpu_nowait() needs
protection against a race, I need to add a field in rq. So an #ifdef check of CONFIG_PARAVIRT makes sense.
>
> Since the task is running, need to use the stopper class to push the
> task out. Use __balance_push_cpu_stop to achieve that.
>
> This currently works only CFS and RT.
>
> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> ---
> kernel/sched/core.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> kernel/sched/sched.h | 1 +
> 2 files changed, 45 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 13e44d7a0b90..aea4232e3ec4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5577,6 +5577,10 @@ void sched_tick(void)
>
> sched_clock_tick();
>
> + /* push the current task out if cpu is marked as avoid */
> + if (cpu_avoid(cpu))
> + push_current_task(rq);
> +
> rq_lock(rq, &rf);
> donor = rq->donor;
>
> @@ -8028,6 +8032,43 @@ static void balance_hotplug_wait(void)
> TASK_UNINTERRUPTIBLE);
> }
>
> +static DEFINE_PER_CPU(struct cpu_stop_work, push_task_work);
> +
> +/* A CPU is marked as Avoid when there is contention for underlying
> + * physical CPU and using this CPU will lead to hypervisor preemptions.
> + * It is better not to use this CPU.
> + *
> + * In case any task is scheduled on such CPU, move it out. In
> + * select_fallback_rq a non_avoid CPU will be chosen and henceforth
> + * task shouldn't come back to this CPU
> + */
> +void push_current_task(struct rq *rq)
> +{
> + struct task_struct *push_task = rq->curr;
> + unsigned long flags;
> +
> + /* idle task can't be pused out */
> + if (rq->curr == rq->idle || !cpu_avoid(rq->cpu))
> + return;
> +
> + /* Do for only SCHED_NORMAL AND RT for now */
> + if (push_task->sched_class != &fair_sched_class &&
> + push_task->sched_class != &rt_sched_class)
> + return;
> +
> + if (kthread_is_per_cpu(push_task) ||
> + is_migration_disabled(push_task))
> + return;
> +
> + local_irq_save(flags);
> + get_task_struct(push_task);
> + preempt_disable();
> +
> + stop_one_cpu_nowait(rq->cpu, __balance_push_cpu_stop, push_task,
> + this_cpu_ptr(&push_task_work));
Running perf record occasionally caused a crash. This happens because stop_one_cpu_nowait()
expects its callers to synchronize: push_task_work must stay untouched until the stopper executes.
So I had to do something similar to what is done for active_balance:
add a field in rq and set/unset it accordingly.

Using this field in __balance_push_cpu_stop is also hacky. I have to do something like below,

	if (rq->balance_callback != &balance_push_callback)
		rq->push_task_work_pending = 0;

or I have to copy __balance_push_cpu_stop and do the above in the copy.

After this, it makes sense to put all of this under CONFIG_PARAVIRT.

(I also explored using the stop_one_cpu() variant, by scheduling a work item and executing it in
preemptible context. That occasionally ends up in a deadlock. Due to some issues at my end, I haven't
debugged that further. It remains a backup option for the nowait variant.)
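
For readers following along, the "pending" guard described above can be modeled roughly in
userspace C. This is only a sketch under assumptions, not kernel code: struct rq_model,
try_queue_push() and push_stop_done() are hypothetical names, the counter stands in for the
stop_one_cpu_nowait() call, and a C11 atomic compare-and-swap is swapped in where the scheduler
would presumably set and clear the flag under the rq lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct rq; only the guard is modeled. */
struct rq_model {
	atomic_int push_task_work_pending;	/* 1 while the single work slot is in flight */
	int queued;				/* how many times work was actually queued */
};

/* Models the tick path: queue only if no earlier request is still pending. */
static bool try_queue_push(struct rq_model *rq)
{
	int expected = 0;

	/* Claim the slot atomically; a second caller sees 1 and backs off. */
	if (!atomic_compare_exchange_strong(&rq->push_task_work_pending,
					    &expected, 1))
		return false;

	rq->queued++;	/* stands in for stop_one_cpu_nowait() */
	return true;
}

/* Models the end of the stopper callback: release the slot for reuse. */
static void push_stop_done(struct rq_model *rq)
{
	assert(atomic_load(&rq->push_task_work_pending) == 1);
	atomic_store(&rq->push_task_work_pending, 0);
}
```

In this model a second tick that arrives before the stopper has run is simply dropped, so the
single work slot is never reused while it is still queued.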
> + preempt_enable();
> + local_irq_restore(flags);
> +}
> #else /* !CONFIG_HOTPLUG_CPU: */
>
> static inline void balance_push(struct rq *rq)
> @@ -8042,6 +8083,9 @@ static inline void balance_hotplug_wait(void)
> {
> }
>
> +void push_current_task(struct rq *rq)
> +{
> +}
> #endif /* !CONFIG_HOTPLUG_CPU */
>
> void set_rq_online(struct rq *rq)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 105190b18020..b9614873762e 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1709,6 +1709,7 @@ struct rq_flags {
> };
>
> extern struct balance_callback balance_push_callback;
> +void push_current_task(struct rq *rq);
>
> #ifdef CONFIG_SCHED_CLASS_EXT
> extern const struct sched_class ext_sched_class;
I hope to send out v3 soon, addressing the comments.
Name-wise, I am going to keep cpu_paravirt_mask and cpu_paravirt(cpu).
Thread overview: 20+ messages
2025-06-25 19:10 [RFC v2 0/9] cpu avoid state and push task mechanism Shrikanth Hegde
2025-06-25 19:11 ` [RFC v2 1/9] sched/docs: Document avoid_cpu_mask and avoid CPU concept Shrikanth Hegde
2025-06-25 19:11 ` [RFC v2 2/9] cpumask: Introduce cpu_avoid_mask Shrikanth Hegde
2025-06-25 19:11 ` [RFC v2 3/9] sched/core: Dont allow to use CPU marked as avoid Shrikanth Hegde
2025-06-25 19:11 ` [RFC v2 4/9] sched/fair: Don't use CPU marked as avoid for wakeup and load balance Shrikanth Hegde
2025-06-26 0:02 ` Yury Norov
2025-06-26 13:42 ` Shrikanth Hegde
2025-06-25 19:11 ` [RFC v2 5/9] sched/rt: Don't select CPU marked as avoid for wakeup and push/pull rt task Shrikanth Hegde
2025-06-25 19:11 ` [RFC v2 6/9] sched/core: Push current task out if CPU is marked as avoid Shrikanth Hegde
2025-08-12 18:40 ` Shrikanth Hegde [this message]
2025-06-25 19:11 ` [RFC v2 7/9] sched: Add static key check for cpu_avoid Shrikanth Hegde
2025-06-26 0:12 ` Yury Norov
2025-06-25 19:11 ` [RFC v2 8/9] sysfs: Add cpu_avoid file Shrikanth Hegde
2025-07-01 9:35 ` Greg KH
2025-07-02 6:05 ` Shrikanth Hegde
2025-06-25 19:11 ` [RFC v2 9/9] [DEBUG] powerpc: add debug file for set/unset cpu avoid Shrikanth Hegde
2025-06-25 22:53 ` Yury Norov
2025-06-26 13:39 ` Shrikanth Hegde
2025-06-25 21:55 ` [RFC v2 0/9] cpu avoid state and push task mechanism Yury Norov
2025-06-26 14:33 ` Shrikanth Hegde