From: Yury Norov <yury.norov@gmail.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Gabriele Monaco <gmonaco@redhat.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Michael Jeanson <mjeanson@efficios.com>,
Jens Axboe <axboe@kernel.dk>,
"Paul E. McKenney" <paulmck@kernel.org>,
"Gautham R. Shenoy" <gautham.shenoy@amd.com>,
Florian Weimer <fweimer@redhat.com>,
Tim Chen <tim.c.chen@intel.com>
Subject: Re: [patch V2 07/20] cpumask: Introduce cpumask_or_and_calc_weight()
Date: Thu, 23 Oct 2025 12:37:40 -0400
Message-ID: <aPpZ1HHXh_RMGIjR@yury>
In-Reply-To: <20251022110555.837390652@linutronix.de>
On Wed, Oct 22, 2025 at 02:55:28PM +0200, Thomas Gleixner wrote:
> CID management OR's two cpumasks and then calculates the weight on the
> result. That's inefficient as that has to walk the same stuff twice. As
> this is done with runqueue lock held, there is a real benefit of speeding
> this up. Depending on the system this results in 10-20% less cycles spent
> with runqueue lock held for a 4K cpumask.
>
> Provide cpumask_or_and_calc_weight() and the corresponding bitmap functions
> which return the weight of the OR result right away.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Yury Norov <yury.norov@gmail.com>
> ---
> V2: Rename and use the BITMAP_WEIGHT() macro - Yury
> ---
> include/linux/bitmap.h | 16 ++++++++++++++++
> include/linux/cpumask.h | 16 ++++++++++++++++
> lib/bitmap.c | 6 ++++++
> 3 files changed, 38 insertions(+)
>
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -45,6 +45,8 @@ struct device;
> * bitmap_copy(dst, src, nbits) *dst = *src
> * bitmap_and(dst, src1, src2, nbits) *dst = *src1 & *src2
> * bitmap_or(dst, src1, src2, nbits) *dst = *src1 | *src2
> + * bitmap_or_and_calc_weight(dst, src1, src2, nbits)
> + * *dst = *src1 | *src2. Returns Hamming Weight of dst
> * bitmap_xor(dst, src1, src2, nbits) *dst = *src1 ^ *src2
> * bitmap_andnot(dst, src1, src2, nbits) *dst = *src1 & ~(*src2)
> * bitmap_complement(dst, src, nbits) *dst = ~(*src)
> @@ -165,6 +167,8 @@ bool __bitmap_and(unsigned long *dst, co
> const unsigned long *bitmap2, unsigned int nbits);
> void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1,
> const unsigned long *bitmap2, unsigned int nbits);
> +unsigned int __bitmap_or_and_calc_weight(unsigned long *dst, const unsigned long *bitmap1,
> + const unsigned long *bitmap2, unsigned int nbits);
> void __bitmap_xor(unsigned long *dst, const unsigned long *bitmap1,
> const unsigned long *bitmap2, unsigned int nbits);
> bool __bitmap_andnot(unsigned long *dst, const unsigned long *bitmap1,
> @@ -338,6 +342,18 @@ void bitmap_or(unsigned long *dst, const
> }
>
> static __always_inline
> +unsigned int bitmap_or_and_calc_weight(unsigned long *dst, const unsigned long *src1,
> + const unsigned long *src2, unsigned int nbits)
> +{
> + if (small_const_nbits(nbits)) {
> + *dst = *src1 | *src2;
> + return hweight_long(*dst & BITMAP_LAST_WORD_MASK(nbits));
> + } else {
> + return __bitmap_or_and_calc_weight(dst, src1, src2, nbits);
> + }
> +}
> +
> +static __always_inline
> void bitmap_xor(unsigned long *dst, const unsigned long *src1,
> const unsigned long *src2, unsigned int nbits)
> {
> --- a/include/linux/cpumask.h
> +++ b/include/linux/cpumask.h
> @@ -729,6 +729,22 @@ void cpumask_or(struct cpumask *dstp, co
> }
>
> /**
> + * cpumask_or_and_calc_weight - *dstp = *src1p | *src2p and return the weight of the result
> + * @dstp: the cpumask result
> + * @src1p: the first input
> + * @src2p: the second input
> + *
> + * Return: The number of bits set in the resulting cpumask @dstp
> + */
> +static __always_inline
> +unsigned int cpumask_or_and_calc_weight(struct cpumask *dstp, const struct cpumask *src1p,
> + const struct cpumask *src2p)
> +{
> + return bitmap_or_and_calc_weight(cpumask_bits(dstp), cpumask_bits(src1p),
> + cpumask_bits(src2p), small_cpumask_bits);
> +}
> +
> +/**
> * cpumask_xor - *dstp = *src1p ^ *src2p
> * @dstp: the cpumask result
> * @src1p: the first input
> --- a/lib/bitmap.c
> +++ b/lib/bitmap.c
> @@ -355,6 +355,12 @@ unsigned int __bitmap_weight_andnot(cons
> }
> EXPORT_SYMBOL(__bitmap_weight_andnot);
>
> +unsigned int __bitmap_or_and_calc_weight(unsigned long *dst, const unsigned long *bitmap1,
> + const unsigned long *bitmap2, unsigned int bits)
> +{
> + return BITMAP_WEIGHT(({dst[idx] = bitmap1[idx] | bitmap2[idx]; dst[idx]; }), bits);
Nice!
> +}
> +
Reviewed-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>

Maybe name it bitmap_weighted_or() instead?
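
For readers following along, here is a minimal userspace sketch of the single-pass idea described in the changelog: OR each word into the destination and count its bits in the same loop, instead of walking the result a second time. It is an illustration only, not the kernel code. The names weighted_or() and popcount_ul() and the local BITS_PER_LONG definition are assumptions made for this example; the patch itself relies on hweight_long(), BITMAP_LAST_WORD_MASK() and the BITMAP_WEIGHT() macro.

```c
#include <limits.h>

/* Local stand-in for the kernel's BITS_PER_LONG (assumption for this sketch). */
#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Portable population count; the kernel would use hweight_long() here. */
static unsigned int popcount_ul(unsigned long v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)	/* clear the lowest set bit each round */
		n++;
	return n;
}

/*
 * Hypothetical fused helper: dst = src1 | src2, returning the number of
 * set bits among the first nbits bits of dst, in a single pass.
 */
static unsigned int weighted_or(unsigned long *dst, const unsigned long *src1,
				const unsigned long *src2, unsigned int nbits)
{
	unsigned int full = nbits / BITS_PER_LONG;
	unsigned int tail = nbits % BITS_PER_LONG;
	unsigned int idx, w = 0;

	for (idx = 0; idx < full; idx++) {
		dst[idx] = src1[idx] | src2[idx];
		w += popcount_ul(dst[idx]);
	}

	/* Partial last word: store the OR, but count only the low 'tail' bits. */
	if (tail) {
		dst[full] = src1[full] | src2[full];
		w += popcount_ul(dst[full] & ((1UL << tail) - 1));
	}
	return w;
}
```

A caller passes two source bitmaps and a destination of the same length in words. Bits beyond nbits in the last word are still written to dst but are excluded from the returned count, which matches the masking done in the inline fast path of the quoted patch.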