public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Yury Norov <yury.norov@gmail.com>
To: Sander Vanheule <sander@svanheule.net>
Cc: Valentin Schneider <vschneid@redhat.com>,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mel Gorman <mgorman@suse.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Heiko Carstens <hca@linux.ibm.com>,
	Tony Luck <tony.luck@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Gal Pressman <gal@nvidia.com>, Tariq Toukan <tariqt@nvidia.com>,
	Jesse Brandeburg <jesse.brandeburg@intel.com>
Subject: Re: [PATCH v3 1/9] cpumask: Make cpumask_full() check for nr_cpu_ids bits
Date: Sun, 28 Aug 2022 09:38:35 -0700	[thread overview]
Message-ID: <YwuaCxd+0yHA+bxk@yury-laptop> (raw)
In-Reply-To: <15255a7223fe405808bcedb5ab19bf2108637e08.camel@svanheule.net>

On Sun, Aug 28, 2022 at 10:35:38AM +0200, Sander Vanheule wrote:

 ...

> > It's really a puzzle, and some of my thoughts are below. So. 
> > 
> > This is a question what for we need nr_cpumask_bits while we already
> > have nr_cpu_ids. When OFFSTACK is ON, they are obviously the same.
> > When it's of - the nr_cpumask_bits is an alias to NR_CPUS.
> > 
> > I tried to wire the nr_cpumask_bits to nr_cpu_ids unconditionally, and
> > it works even when OFFSTACK is OFF, no surprises.
> > 
> > I didn't find any discussions describing what for we need nr_cpumask_bits,
> > and the code adding it dates to a very long ago.
> > 
> > If I alias nr_cpumask_bits to nr_cpu_ids unconditionally on my VM with
> > NR_CPUs == 256 and nr_cpu_ids == 4, there's obviously a clear win in
> > performance, but the Image size gets 2.5K bigger. Probably that's the
> > reason for what nr_cpumask_bits was needed...
> 
> I think it makes sense to have a compile-time-constant value for nr_cpumask_bits
> in some cases. For example on embedded platforms, where every opportunity to
> save a few kB should be used, or cases where NR_CPUS <= BITS_PER_LONG.
> 
> > 
> > There's also a very old misleading comment in cpumask.h:
> > 
> >  *  If HOTPLUG is enabled, then cpu_possible_mask is forced to have
> >  *  all NR_CPUS bits set, otherwise it is just the set of CPUs that
> >  *  ACPI reports present at boot.
> > 
> > It lies, and I checked with x86_64 that cpu_possible_mask is populated
> > during boot time with 0b1111, if I create a 4-cpu VM. Hence, the
> > nr_cpu_ids is 4, while NR_CPUS == 256.
> > 
> > Interestingly, there's no a single user of the cpumask_full(),
> > obviously, because it's broken. This is really a broken dead code.
> > 
> > Now that we have a test that checks sanity of cpumasks, this mess
> > popped up.
> > 
> > Your fix doesn't look correct, because it fixes one function, and
> > doesn't touch others. For example, the cpumask subset() may fail
> > if src1p will have set bits after nr_cpu_ids, while cpumask_full()
> > will be returning true.
> 
> It appears the documentation for cpumask_full() is also incorrect, because it
> claims to check if all CPUs < nr_cpu_ids are set. Meanwhile, the implementation
> checks if all CPUs < nr_cpumask_bits are set.
> 
> cpumask_weight() has a similar issue, and maybe also other cpumask_*() functions
> (I didn't check in detail yet).
> 
> > 
> > In -next, there is an update from Sander for the cpumask test that
> > removes this check, and probably if you rebase on top of -next, you
> > can drop this and 2nd patch of your series.
> > 
> > What about proper fix? I think that a long time ago we didn't have
> > ACPI tables for possible cpus, and didn't populate cpumask_possible
> > from that, so the
> > 
> >         #define nr_cpumask_bits NR_CPUS
> > 
> > worked well. Now that we have cpumask_possible partially filled,
> > we have to always
> > 
> >         #define nr_cpumask_bits nr_cpu_ids
> > 
> > and pay +2.5K price in size even if OFFSTACK is OFF. At least, it wins
> > at runtime...
> > 
> > Any thoughts?
> 
> It looks like both nr_cpumask_bits and nr_cpu_ids are used in a number of places
> outside of lib/cpumask.c. Documentation for cpumask_*() functions almost always
> refers to nr_cpu_ids as a highest valid value.
> 
> Perhaps nr_cpumask_bits should become an variable for internal cpumask usage,
> and external users should only use nr_cpu_ids? The changes in 6.0 are my first
> real interaction with cpumask, so it's possible that there are things I'm
> missing here.
> 
> That being said, some of the cpumask tests compare results to nr_cpumask_bits,
> so those should then probably be fixed to compare against nr_cpu_ids instead.

Aha, and it kills me how we have such a mess in a very core subsystem.

We have 3 problems here:
 - mess with nr_cpumask_bits and nr_cpu_ids;
 - ineffectiveness of cpumask routines when nr_cpumask_bits > nr_cpu_ids;
 - runtime nature of nr_cpu_ids, even for those embedded systems with
   taught memory constraints. So that if we just drop nr_cpumask_bits,
   it will add 2.5K to the Image.

I think that dropping nr_cpumask_bits is our only choice, and to avoid
Image bloating for embedded users, we can hint the kernel that NR_CPUS
is an exact number, so that it will skip setting it in runtime.

I added a EXACT_NR_CPUS option for this, which works like this:

  #if (NR_CPUS == 1) || defined(CONFIG_EXACT_NR_CPUS)
  #define nr_cpu_ids      ((unsigned int)NR_CPUS)
  #else
  extern unsigned int nr_cpu_ids;
  #endif

  /* Deprecated */ 
  #define nr_cpumask_bits nr_cpu_ids

I tried it with arm64 4-CPU build. When the EXACT_NR_CPUS is enabled,
the difference is:
  add/remove: 3/4 grow/shrink: 46/729 up/down: 652/-46952 (-46300)
  Total: Before=25670945, After=25624645, chg -0.18%

Looks quite impressive to me. I'll send a patch soon.

Thanks,
Yury

  reply	other threads:[~2022-08-28 16:38 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-25 18:12 [PATCH v3 0/9] sched, net: NUMA-aware CPU spreading interface Valentin Schneider
2022-08-25 18:12 ` [PATCH v3 1/9] cpumask: Make cpumask_full() check for nr_cpu_ids bits Valentin Schneider
2022-08-25 20:49   ` Yury Norov
2022-08-28  8:35     ` Sander Vanheule
2022-08-28 16:38       ` Yury Norov [this message]
2022-08-25 18:12 ` [PATCH v3 2/9] lib/test_cpumask: Make test_cpumask_last " Valentin Schneider
2022-08-25 18:12 ` [PATCH v3 3/9] bitops: Introduce find_next_andnot_bit() Valentin Schneider
2022-08-25 21:05   ` Yury Norov
2022-08-25 23:17     ` Valentin Schneider
2022-08-25 18:12 ` [PATCH v3 4/9] cpumask: Introduce for_each_cpu_andnot() Valentin Schneider
2022-08-25 21:14   ` Yury Norov
2022-09-05 16:44     ` Valentin Schneider
2022-09-05 18:33       ` Yury Norov
2022-08-25 18:12 ` [PATCH v3 5/9] lib/test_cpumask: Add for_each_cpu_and(not) tests Valentin Schneider
2022-08-25 18:12 ` [PATCH v3 6/9] sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot() Valentin Schneider
2022-08-25 21:16   ` Yury Norov
2022-08-25 23:20     ` Valentin Schneider
2022-08-25 18:12 ` [PATCH v3 7/9] sched/topology: Introduce sched_numa_hop_mask() Valentin Schneider
2022-08-26  8:14   ` Yicong Yang
2022-09-05 16:51     ` Valentin Schneider
2022-08-25 18:12 ` [PATCH v3 8/9] sched/topology: Introduce for_each_numa_hop_cpu() Valentin Schneider
2022-09-05  9:46   ` Tariq Toukan
2022-09-05 16:44     ` Valentin Schneider
2022-08-25 18:12 ` [PATCH v3 9/9] SHOWCASE: net/mlx5e: Leverage for_each_numa_hop_cpu() Valentin Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YwuaCxd+0yHA+bxk@yury-laptop \
    --to=yury.norov@gmail.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=davem@davemloft.net \
    --cc=dietmar.eggemann@arm.com \
    --cc=edumazet@google.com \
    --cc=gal@nvidia.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hca@linux.ibm.com \
    --cc=jesse.brandeburg@intel.com \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=mgorman@suse.de \
    --cc=mingo@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=saeedm@nvidia.com \
    --cc=sander@svanheule.net \
    --cc=tariqt@nvidia.com \
    --cc=tony.luck@intel.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox