linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: vschneid@redhat.com, iii@linux.ibm.com, huschle@linux.ibm.com,
	rostedt@goodmis.org, dietmar.eggemann@arm.com,
	vineeth@bitbyteword.org, jgross@suse.com, pbonzini@redhat.com,
	seanjc@google.com, mingo@redhat.com, peterz@infradead.org,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	tglx@linutronix.de, yury.norov@gmail.com, maddy@linux.ibm.com,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	gregkh@linuxfoundation.org
Subject: Re: [RFC PATCH v3 07/10] sched/core: Push current task from paravirt CPU
Date: Thu, 11 Sep 2025 22:22:48 +0530	[thread overview]
Message-ID: <1617b0fb-273a-4d86-8247-c67968c07b3b@linux.ibm.com> (raw)
In-Reply-To: <7227822a-0b4a-47cc-af7f-190f6d3b3e07@amd.com>



On 9/11/25 11:10 AM, K Prateek Nayak wrote:
> Hello Shrikanth,
> 
> On 9/10/2025 11:12 PM, Shrikanth Hegde wrote:
>> Actively push out any task running on a paravirt CPU. Since the task is
>> running on the CPU need to spawn a stopper thread and push the task out.
>>
>> If task is sleeping, when it wakes up it is expected to move out. In
>> case it still chooses a paravirt CPU, next tick will move it out.
>> However, if the task in pinned only to paravirt CPUs, it will continue
>> running there.
>>
>> Though code is almost same as __balance_push_cpu_stop and quite close to
>> push_cpu_stop, it provides a cleaner implementation w.r.t to PARAVIRT
>> config.
>>
>> Add push_task_work_done flag to protect pv_push_task_work buffer. This has
>> been placed at the empty slot available considering 64/128 byte
>> cacheline.
>>
>> This currently works only FAIR and RT.
> 
> EXT can perhaps use the ops->cpu_{release,acquire}() if they are
> interested in this.
>

>>
>> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>> ---
>>   kernel/sched/core.c  | 84 ++++++++++++++++++++++++++++++++++++++++++++
>>   kernel/sched/sched.h |  9 ++++-
>>   2 files changed, 92 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 279b0dd72b5e..1f9df5b8a3a2 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -5629,6 +5629,10 @@ void sched_tick(void)
>>   
>>   	sched_clock_tick();
>>   
>> +	/* push the current task out if a paravirt CPU */
>> +	if (is_cpu_paravirt(cpu))
>> +		push_current_from_paravirt_cpu(rq);
> 
> Does this mean paravirt CPU is capable of handling an interrupt but may
> not be continuously available to run a task?

When i run hackbench which involves fair bit of IRQ stuff, it moves out.

For example,

echo 600-710 > /sys/devices/system/cpu/paravirt

11:31:54 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
11:31:57 AM  598    2.04    0.00   77.55    0.00   18.37    0.00    1.02    0.00    0.00    1.02
11:31:57 AM  599    1.01    0.00   79.80    0.00   17.17    0.00    1.01    0.00    0.00    1.01
11:31:57 AM  600    0.00    0.00    0.00    0.00    0.00    0.00    0.99    0.00    0.00   99.01
11:31:57 AM  601    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
11:31:57 AM  602    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00


There could some workloads which doesn't move irq's out, for which needs irqbalance change.
Looking into it.

  Or is the VMM expected to set
> the CPU on the paravirt mask and give the vCPU sufficient time to move the
> task before yanking it away from the pCPU?
>

If the vCPU is running something, it is going to run at some point on pCPU.
hypervisor will give the cycles to this vCPU by preempting some other vCPU.

It is that using this infra, there is should be nothing on that paravirt vCPU.
That way collectively VMM gets only limited request for pCPU which it can satify
without vCPU preemption.

  
>> +
>>   	rq_lock(rq, &rf);
>>   	donor = rq->donor;
>>   
>> @@ -10977,4 +10981,84 @@ void sched_enq_and_set_task(struct sched_enq_and_set_ctx *ctx)
>>   struct cpumask __cpu_paravirt_mask __read_mostly;
>>   EXPORT_SYMBOL(__cpu_paravirt_mask);
>>   DEFINE_STATIC_KEY_FALSE(cpu_paravirt_push_tasks);
>> +
>> +static DEFINE_PER_CPU(struct cpu_stop_work, pv_push_task_work);
>> +
>> +static int paravirt_push_cpu_stop(void *arg)
>> +{
>> +	struct task_struct *p = arg;
> 
> Can we move all pushable tasks at once instead of just the rq->curr at
> the time of the tick? It can also avoid keeping the reference to "p"
> and only selectively pushing it. Thoughts?
> 

I think that is doable.
need to pass rq as arg and go through all tasks in rq in the stopped thread.

>> +	struct rq *rq = this_rq();
>> +	struct rq_flags rf;
>> +	int cpu;
>> +
>> +	raw_spin_lock_irq(&p->pi_lock);
>> +	rq_lock(rq, &rf);
>> +	rq->push_task_work_done = 0;
>> +
>> +	update_rq_clock(rq);
>> +
>> +	if (task_rq(p) == rq && task_on_rq_queued(p)) {
>> +		cpu = select_fallback_rq(rq->cpu, p);
>> +		rq = __migrate_task(rq, &rf, p, cpu);
>> +	}
>> +
>> +	rq_unlock(rq, &rf);
>> +	raw_spin_unlock_irq(&p->pi_lock);
>> +	put_task_struct(p);
>> +
>> +	return 0;
>> +}
>> +
>> +/* A CPU is marked as Paravirt when there is contention for underlying
>> + * physical CPU and using this CPU will lead to hypervisor preemptions.
>> + * It is better not to use this CPU.
>> + *
>> + * In case any task is scheduled on such CPU, move it out. In
>> + * select_fallback_rq a non paravirt CPU will be chosen and henceforth
>> + * task shouldn't come back to this CPU
>> + */
>> +void push_current_from_paravirt_cpu(struct rq *rq)
>> +{
>> +	struct task_struct *push_task = rq->curr;
>> +	unsigned long flags;
>> +	struct rq_flags rf;
>> +
>> +	if (!is_cpu_paravirt(rq->cpu))
>> +		return;
>> +
>> +	/* Idle task can't be pused out */
>> +	if (rq->curr == rq->idle)
>> +		return;
>> +
>> +	/* Do for only SCHED_NORMAL AND RT for now */
>> +	if (push_task->sched_class != &fair_sched_class &&
>> +	    push_task->sched_class != &rt_sched_class)
>> +		return;
>> +
>> +	if (kthread_is_per_cpu(push_task) ||
>> +	    is_migration_disabled(push_task))
>> +		return;
>> +
>> +	/* Is it affine to only paravirt cpus? */
>> +	if (cpumask_subset(push_task->cpus_ptr, cpu_paravirt_mask))
>> +		return;
>> +
>> +	/* There is already a stopper thread for this. Dont race with it */
>> +	if (rq->push_task_work_done == 1)
>> +		return;
>> +
>> +	local_irq_save(flags);
>> +	preempt_disable();
> 
> Disabling IRQs implies preemption is disabled.
>

Most of the places stop_one_cpu_nowait called with preemption & irq disabled.
stopper runs at the next possible opportunity.

stop_one_cpu_nowait
  ->queues the task into stopper list
     -> wake_up_process(stopper)
        -> set need_resched
          -> stopper runs as early as possible.
          
>> +
>> +	get_task_struct(push_task);
>> +
>> +	rq_lock(rq, &rf);
>> +	rq->push_task_work_done = 1;
>> +	rq_unlock(rq, &rf);
>> +
>> +	stop_one_cpu_nowait(rq->cpu, paravirt_push_cpu_stop, push_task,
>> +			    this_cpu_ptr(&pv_push_task_work));
>> +	preempt_enable();
>> +	local_irq_restore(flags);
>> +}
>>   #endif


  reply	other threads:[~2025-09-11 16:53 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-10 17:42 [RFC PATCH v3 00/10] paravirt CPUs and push task for less vCPU preemption Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 01/10] sched/docs: Document cpu_paravirt_mask and Paravirt CPU concept Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 02/10] cpumask: Introduce cpu_paravirt_mask Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 03/10] sched: Static key to check paravirt cpu push Shrikanth Hegde
2025-09-11  1:53   ` Yury Norov
2025-09-11 14:37     ` Shrikanth Hegde
2025-09-11 15:29       ` Yury Norov
2025-09-10 17:42 ` [RFC PATCH v3 04/10] sched/core: Dont allow to use CPU marked as paravirt Shrikanth Hegde
2025-09-11  5:16   ` K Prateek Nayak
2025-09-11 14:44     ` Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 05/10] sched/fair: Don't consider paravirt CPUs for wakeup and load balance Shrikanth Hegde
2025-09-11  5:23   ` K Prateek Nayak
2025-09-11 15:56     ` Shrikanth Hegde
2025-09-11 16:55       ` K Prateek Nayak
2025-11-08 12:04     ` Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 06/10] sched/rt: Don't select paravirt CPU for wakeup and push/pull rt task Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 07/10] sched/core: Push current task from paravirt CPU Shrikanth Hegde
2025-09-11  5:40   ` K Prateek Nayak
2025-09-11 16:52     ` Shrikanth Hegde [this message]
2025-09-11 17:06       ` K Prateek Nayak
2025-09-12  5:22         ` Shrikanth Hegde
2025-09-12  8:48           ` K Prateek Nayak
2025-09-12 12:49             ` Shrikanth Hegde
2025-11-10  4:54     ` Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 08/10] sysfs: Add paravirt CPU file Shrikanth Hegde
2025-09-10 17:42 ` [RFC PATCH v3 09/10] powerpc: Add debug file for set/unset paravirt CPUs Shrikanth Hegde
2025-09-10 17:42 ` [HELPER PATCH] sysfs: Provide write method for paravirt Shrikanth Hegde
2025-10-20 14:32 ` [RFC PATCH v3 00/10] paravirt CPUs and push task for less vCPU preemption Sean Christopherson
2025-10-20 15:05   ` Paolo Bonzini
2025-10-23  4:03     ` Shrikanth Hegde
2025-10-21  6:10   ` Shrikanth Hegde
2025-10-22 18:46     ` Sean Christopherson
2025-10-30 17:43       ` Shrikanth Hegde

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1617b0fb-273a-4d86-8247-c67968c07b3b@linux.ibm.com \
    --to=sshegde@linux.ibm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=huschle@linux.ibm.com \
    --cc=iii@linux.ibm.com \
    --cc=jgross@suse.com \
    --cc=juri.lelli@redhat.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=maddy@linux.ibm.com \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    --cc=vineeth@bitbyteword.org \
    --cc=vschneid@redhat.com \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).