public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Vishal Chourasia <vishalc@linux.ibm.com>
To: Phil Auld <pauld@redhat.com>
Cc: Madadi Vineeth Reddy <vineethr@linux.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Waiman Long <longman@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [CHANGE 1/2] sched/isolation: Make use of more than one housekeeping cpu
Date: Fri, 21 Feb 2025 10:12:37 +0530	[thread overview]
Message-ID: <Z7gEPXdPrXgM0-yB@linux.ibm.com> (raw)
In-Reply-To: <20250220155229.GB89684@pauld.westford.csb>

On Thu, Feb 20, 2025 at 10:52:29AM -0500, Phil Auld wrote:
> 
> Hi Vishal,
> 
> On Thu, Feb 20, 2025 at 02:50:40PM +0530 Vishal Chourasia wrote:
> > On Tue, Feb 18, 2025 at 10:00:59AM -0500, Phil Auld wrote:
> > > Hi Vishal.
> > > 
> > > On Fri, Feb 14, 2025 at 11:08:19AM +0530 Vishal Chourasia wrote:
> > > > Hi Phil, Vineeth
> > > > 
> > > > On Thu, Feb 13, 2025 at 09:26:53AM -0500, Phil Auld wrote:
> > > > > On Thu, Feb 13, 2025 at 10:14:04AM +0530 Madadi Vineeth Reddy wrote:
> > > > > > Hi Phil Auld,
> > > > > > 
> > > > > > On 11/02/25 19:31, Phil Auld wrote:
> > > > > > > The exising code uses housekeeping_any_cpu() to select a cpu for
> > > > > > > a given housekeeping task. However, this often ends up calling
> > > > > > > cpumask_any_and() which is defined as cpumask_first_and() which has
> > > > > > > the effect of alyways using the first cpu among those available.
> > > > > > > 
> > > > > > > The same applies when multiple NUMA nodes are involved. In that
> > > > > > > case the first cpu in the local node is chosen which does provide
> > > > > > > a bit of spreading but with multiple HK cpus per node the same
> > > > > > > issues arise.
> > > > > > > 
> > > > > > > Spread the HK work out by having housekeeping_any_cpu() and
> > > > > > > sched_numa_find_closest() use cpumask_any_and_distribute()
> > > > > > > instead of cpumask_any_and().
> > > > > > > 
> > > > > > 
> > > > > > Got the overall intent of the patch for better load distribution on
> > > > > > housekeeping tasks. However, one potential drawback could be that by
> > > > > > spreading HK work across multiple CPUs might reduce the time that
> > > > > > some cores can spend in deeper idle states which can be beneficial for
> > > > > > power-sensitive systems.
> > > > > > 
> > > > > > Thoughts?
> > > > > 
> > > > > NOHZ_full setups are not generally used in power sensitive systems I think.
> > > > > They aren't in our use cases at least. 
> > > > > 
> > > > > In cases with many cpus a single housekeeping cpu can not keep up. Having
> > > > > other HK cpus in deep idle states while the one in use is overloaded is
> > > > > not a win. 
> > > > 
> > > > To me, an overloaded CPU sounds like where more than one tasks are ready
> > > > to run, and a HK CPU is one receiving periodic scheduling clock
> > > > ticks, so HP CPU is bound to comes out of any power-saving state it is in.
> > > 
> > > If the overload is caused by HK and interrupts there is nothing in the
> > > system to help. Tasks, sure, can get load balanced.
> > > 
> > > And as you say, the HK cpus will have generally ticks happening anyway.
> > > 
> > > > > 
> > > > > If your single HK cpu can keep up then only configure that one HK cpu.
> > > > > The others will go idle and stay there.  And since they are nohz_full
> > > > > might get to stay idle even longer.
> > > > While it is good to distribute the load across each HK CPU in the HK 
> > > > cpumask (queuing jobs on different CPUs each time), this can cause
> > > > jitter in virtualized environments. Unnecessaryily evicting other
> > > > tenants, when it's better to overload a VP than to wake up other VPs of a
> > > > tenant.
> > > > 
> > > 
> > > Sorry I'm not sure I understand your setup. Are your running virtual
> > > tenants on the HK cpus?  nohz_full in the guests? Maybe you only need
> > > on HK then it won't matter.
> > > 
> > Firstly, I am unaware if nohz_full is being used in virtualized environment.
> > Please correct me it is. I am not saying it can't or shouldn't be used, it's just
> > I don't know if anybody is using it.
> >
> 
> I've seen some people trying it in various ways and to varying degrees of
> success if I recall correctly.
> 
> 
> > nohz_full in guests would mean that tick is disabled inside the guest but
> > the host might still be getting ticks. So, I am unsure, whether it is a good
> > idea to have nohz_full in virtualized environment.
> > 
> > Nevertheless, the idea of nohz_full is to reduce to the kernel interference
> > for CPUs marked as nohz_full. And, it does help with guest interference.
> > 
> > I would like to mention, In SPLPAR environment, scheduling work on
> > different HK CPU each time can caused VM preemption in a multi-tenant
> > setup in cases where CPUs in HK cpumask spans across VPs, its better to
> > consolidate them within few VPs.
> > 
> > VP is virtual core/processor.
> 
> I have a hard time reconciling you saying you are not using virtualization
> and then talking about VMs and VPs ;)
I was thinking of a possibility in virtualized environments, this isn't
specific to PPC 

Thanks
vishalc :)
> 
> Sometimes I forget how interesting PPC can be...
> 
> > 
> > > My concern is that currently there is no point in having more than
> > > one HK cpu (per node in a NUMA case). The code as currently implemented
> > > is just not doing what it needs to.
> > > 
> > > We have numerous cases where a single HK cpu just cannot keep up and
> > > the remote_tick warning fires. It also can lead to the other things
> > > (orchastration sw, HA keepalives etc) on the HK cpus getting starved
> > > which leads to other issues.  In these cases we recommend increasing
> > > the number of HK cpus.  But... that only helps the userspace tasks
> > > somewhat. It does not help the actual housekeeping part.
> > > 
> > > It seems clear to me that the intent of the cpumask_any_and() calls
> > > is to pick _any_ cpu in the hk mask. Not just the first, otherwise
> > > it would just use cpumask_first_and().
> > > 
> > > I'm open to alternate suggestions of how to fix this.
> > Your approach looks good to me.
> > 
> > I wanted to mention the case of overcommitted multi-tenant setup
> > where this will cause noisy neighbour sort of situation, but this can be
> > avoided by carefully selecting HK CPUs.
> 
> Yes, it can be configured away, hopefully.
> 
> Still, this could easily be a sched feat if that make sense to people.
> 
> 
> Thanks for the review!
> 
> 
> Cheers,
> Phil
> 
> 
> 
> > 
> > Thanks,
> > vishalc
> > > 
> > > 
> > > Cheers,
> > > Phil
> > > 
> > > > > 
> > > > > I do have a patch that has this controlled by a sched feature if that
> > > > > is of interest. Then it could be disabled if you don't want it.
> > > > 
> > > > Vishal
> > > > > 
> > > > > Cheers,
> > > > > Phil
> > > > > 
> > > > > > 
> > > > > > Thanks,
> > > > > > Madadi Vineeth Reddy
> > > > > > 
> > > > > > > Signed-off-by: Phil Auld <pauld@redhat.com>
> > > > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > > > Cc: Juri Lelli <juri.lelli@redhat.com>
> > > > > > > Cc: Frederic Weisbecker <frederic@kernel.org>
> > > > > > > Cc: Waiman Long <longman@redhat.com>
> > > > > > > Cc: linux-kernel@vger.kernel.org
> > > > > > > ---
> > > > > > >  kernel/sched/isolation.c | 2 +-
> > > > > > >  kernel/sched/topology.c  | 2 +-
> > > > > > >  2 files changed, 2 insertions(+), 2 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> > > > > > > index 81bc8b329ef1..93b038d48900 100644
> > > > > > > --- a/kernel/sched/isolation.c
> > > > > > > +++ b/kernel/sched/isolation.c
> > > > > > > @@ -40,7 +40,7 @@ int housekeeping_any_cpu(enum hk_type type)
> > > > > > >  			if (cpu < nr_cpu_ids)
> > > > > > >  				return cpu;
> > > > > > >  
> > > > > > > -			cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
> > > > > > > +			cpu = cpumask_any_and_distribute(housekeeping.cpumasks[type], cpu_online_mask);
> > > > > > >  			if (likely(cpu < nr_cpu_ids))
> > > > > > >  				return cpu;
> > > > > > >  			/*
> > > > > > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > > > > > > index c49aea8c1025..94133f843485 100644
> > > > > > > --- a/kernel/sched/topology.c
> > > > > > > +++ b/kernel/sched/topology.c
> > > > > > > @@ -2101,7 +2101,7 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
> > > > > > >  	for (i = 0; i < sched_domains_numa_levels; i++) {
> > > > > > >  		if (!masks[i][j])
> > > > > > >  			break;
> > > > > > > -		cpu = cpumask_any_and(cpus, masks[i][j]);
> > > > > > > +		cpu = cpumask_any_and_distribute(cpus, masks[i][j]);
> > > > > > >  		if (cpu < nr_cpu_ids) {
> > > > > > >  			found = cpu;
> > > > > > >  			break;
> > > > > > 
> > > > > 
> > > > > -- 
> > > > > 
> > > > 
> > > 
> > > -- 
> > > 
> > 
> 
> -- 
> 

      reply	other threads:[~2025-02-21  4:42 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-11 14:01 [CHANGE 1/2] sched/isolation: Make use of more than one housekeeping cpu Phil Auld
2025-02-11 14:14 ` [PATCH] " Phil Auld
2025-02-11 14:24   ` Waiman Long
2025-02-13  4:44 ` [CHANGE 1/2] " Madadi Vineeth Reddy
2025-02-13 14:26   ` Phil Auld
2025-02-14  5:38     ` Vishal Chourasia
2025-02-18 15:00       ` Phil Auld
2025-02-18 15:23         ` Waiman Long
2025-02-18 15:30           ` Phil Auld
2025-02-18 15:33             ` Waiman Long
2025-02-20  9:20         ` Vishal Chourasia
2025-02-20 15:52           ` Phil Auld
2025-02-21  4:42             ` Vishal Chourasia [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z7gEPXdPrXgM0-yB@linux.ibm.com \
    --to=vishalc@linux.ibm.com \
    --cc=frederic@kernel.org \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=pauld@redhat.com \
    --cc=peterz@infradead.org \
    --cc=vineethr@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox