All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Cc: peterz@infradead.org, vincent.guittot@linaro.org,
	srikar@linux.vnet.ibm.com, linux-kernel@vger.kernel.org,
	mingo@redhat.com, dietmar.eggemann@arm.com, mgorman@suse.de
Subject: Re: [RFC PATCH] sched/fair: Skip idle CPU search on busy system
Date: Wed, 4 Oct 2023 18:25:15 +0200	[thread overview]
Message-ID: <ZR2R6wMhOpx6PVGT@gmail.com> (raw)
In-Reply-To: <20230726093612.1882644-1-sshegde@linux.vnet.ibm.com>


* Shrikanth Hegde <sshegde@linux.vnet.ibm.com> wrote:

> When the system is fully busy, there will not be any idle CPU's.
> In that case, load_balance will be called mainly with CPU_NOT_IDLE
> type. In should_we_balance its currently checking for an idle CPU if
> one exist. When system is 100% busy, there will not be an idle CPU and
> these idle_cpu checks can be skipped. This would avoid fetching those rq
> structures.
> 
> This is a minor optimization for a specific case of 100% utilization.
> 
> .....
> Coming to the current implementation. It is a very basic approach to the
> issue. It may not be the best/perfect way to this.  It works only in
> case of CONFIG_NO_HZ_COMMON. nohz.nr_cpus is a global info available which
> tracks idle CPU's. AFAIU there isn't any other. If there is such info, we
> can use that instead. nohz.nr_cpus is atomic, which might be costly too.
> 
> Alternative way would be to add a new attribute to sched_domain and update
> it in cpu idle entry/exit path per CPU. Advantage is, check can be per
> env->sd instead of global. Slightly complicated, but maybe better. there
> could other advantage at wake up to limit the scan etc.
> 
> Your feedback would really help. Does this optimization makes sense?
> 
> Signed-off-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
> ---
>  kernel/sched/fair.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 373ff5f55884..903d59b5290c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10713,6 +10713,12 @@ static int should_we_balance(struct lb_env *env)
>  		return 1;
>  	}
> 
> +#ifdef CONFIG_NO_HZ_COMMON
> +	/* If the system is fully busy, its better to skip the idle checks */
> +	if (env->idle == CPU_NOT_IDLE && atomic_read(&nohz.nr_cpus) == 0)
> +		return group_balance_cpu(sg) == env->dst_cpu;
> +#endif

Not a big fan of coupling NOHZ to a scheduler optimization in this fashion, 
and not a big fan of the nohz.nr_cpus global cacheline either.

I think it should be done unconditionally, via the scheduler topology tree:

 - We should probably slow-propagate "permanently busy" status of a CPU
   down the topology tree, ie.:

     - mark a domain fully-busy with a delay & batching, probably driven
       by the busy-tick only,

     - while marking a domain idle instantly & propagating this up the
       domain tree only if necessary. The propagation can stop if it
       finds a non-busy domain, so usually it won't reach the root domain.

 - This approach ensures there's no real overhead problem in the domain 
   tree: think of hundreds of CPUs all accessing the nohz.nr_cpus global 
   variable... I bet it's a measurable problem already on large systems.

 - The "atomic_read(&nohz.nr_cpus) == 0" condition in your patch is simply
   the busy-flag checked at the root domain: a readonly global cacheline
   that never gets modified on a permanently busy system.

Thanks,

	Ingo

      parent reply	other threads:[~2023-10-04 16:25 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-26  9:36 [RFC PATCH] sched/fair: Skip idle CPU search on busy system Shrikanth Hegde
2023-07-27  7:25 ` Chen Yu
2023-07-27 15:04   ` Shrikanth Hegde
2023-08-09 18:44 ` Vishal Chourasia
2023-08-10 15:44   ` Shrikanth Hegde
2023-10-04 16:25 ` Ingo Molnar [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZR2R6wMhOpx6PVGT@gmail.com \
    --to=mingo@kernel.org \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=sshegde@linux.vnet.ibm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.