From: Yury Norov <yury.norov@gmail.com>
To: I Hsin Cheng <richard120310@gmail.com>
Cc: anshuman.khandual@arm.com, arnd@arndb.de,
linux-kernel@vger.kernel.org, jserv@ccns.ncku.edu.tw,
skhan@linuxfoundation.org
Subject: Re: [PATCH 1/2] uapi: Refactor __GENMASK() for speed-up
Date: Tue, 11 Feb 2025 13:39:32 -0500 [thread overview]
Message-ID: <Z6uZZPpQ9YYfrL-I@thinkpad> (raw)
In-Reply-To: <20250211162412.477655-2-richard120310@gmail.com>
On Wed, Feb 12, 2025 at 12:24:11AM +0800, I Hsin Cheng wrote:
> The calculation of "((~_UL(0)) - (_UL(1) << (l)) + 1)" is to generate a
> bitmask with "l" trailing zeroes, which is equivalent to
> "(~_UL(0) << (l))".
I used to think that GENMASK() is a compile-time macro. __GENMASK() is
not, but it has very limited usage through the kernel, all in the uapi.
> Refactor the calculation so the number of arithmetic instruction will be
> reduced from 3 to 1.
I'd like to look at it! Please share disassembly.
> Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
> ---
> Test is done to show the speed-up we can get from reducing the number of
> instruction. The test machine runs with 6.9.0-0-generic kernel on x86_64
> architecture with processor AMD Ryzen 7 5700X3D 8-Core Processor.
So you CC arm maintainers and provide tests against x86_64. Are your
improvements consistent for arm, power and other arches?
Can you run bloat-o-meter against your change?
> The test executes in the form of kernel module.
>
> Test result:
> [451675.644026] new __GENMASK() : 5985416
> [451675.644030] old __GENMASK() : 6006406
The difference is 0.35%? That's impressive!
Can you please run your test multiple times and print those numbers
together with confidence interval?
> Test script snippet:
> /* switch to __BITS_PER_LONG_LONG when testing __GENMASK_ULL() */
> \#define __GENMASK_NEW(h, l) \
> ((~_UL(0) << (l)) & \
> (~_UL(0) >> (__BITS_PER_LONG - 1 - (h))))
>
> int init_module(void)
> {
> ktime_t start, end, total1 = 0, total2 = 0;
> for (int k = 0; k < 100; k++) {
> for (int i = 0; i < __BITS_PER_LONG; i++) {
> for (int j = 0; j <= i; j++) {
> unsigned result1, result2;
>
> start = ktime_get();
> result1 = __GENMASK_NEW(i, j);
> end = ktime_get();
> total1 += (end - start);
Nah, this is wrong. I feel like you measure performance of
ktime_get() rather than GENMASK().
Do it this way:
start = ktime_get();
for (...) {
__always_used result = GENMASK();
}
time = ktime_get() - start;
>
> start = ktime_get();
> result2 = __GENMASK(i, j);
Please test GENMASK(), not __GENMASK().
> end = ktime_get();
> total2 += (end - start);
>
> if (result1 != result2) {
> pr_info("Wrong calculation of GENMASK_NEW()\n");
> return 0;
> }
For functional testing we have lib/test_bits.c, lib/test_bitmap.c and
others. I expect you'll run them all to check correctness. Once you do
that, you don't need to test correctness here.
> }
> }
> }
>
> pr_info("new __GENMASK() : %lld\n", total1);
> pr_info("old __GENMASK() : %lld\n", total2);
>
> return 0;
> }
Please incorporate this in one of existing tests and prepend the
series with it. That way you'll let the others to run before/after for
your series.
> ---
> include/uapi/linux/bits.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/bits.h b/include/uapi/linux/bits.h
> index 5ee30f882736..8fc7fea65288 100644
> --- a/include/uapi/linux/bits.h
> +++ b/include/uapi/linux/bits.h
> @@ -5,7 +5,7 @@
> #define _UAPI_LINUX_BITS_H
>
> #define __GENMASK(h, l) \
> - (((~_UL(0)) - (_UL(1) << (l)) + 1) & \
> + ((~_UL(0) << (l)) & \
> (~_UL(0) >> (__BITS_PER_LONG - 1 - (h))))
Now it would fit a single line, right?
If it works, this is a nice simplification, even though it's only a
compile time thing.
Please address all the comments above and run low-level tests I
mentioned above, so that we'll make sure it has no side effects.
Thanks,
Yury
>
> #define __GENMASK_ULL(h, l) \
> --
> 2.43.0
next prev parent reply other threads:[~2025-02-11 18:39 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-11 16:24 [PATCH 0/2] uapi: Refactor __GENMASK() and __GENMASK_ULL() for I Hsin Cheng
2025-02-11 16:24 ` [PATCH 1/2] uapi: Refactor __GENMASK() for speed-up I Hsin Cheng
2025-02-11 18:39 ` Yury Norov [this message]
2025-02-11 18:44 ` Yury Norov
2025-02-12 14:01 ` I Hsin Cheng
2025-02-13 19:54 ` Yury Norov
2025-02-14 14:51 ` I Hsin Cheng
2025-02-14 15:55 ` Yury Norov
2025-02-12 13:50 ` I Hsin Cheng
2025-02-11 16:24 ` [PATCH 2/2] uapi: Refactor __GENMASK_ULL() " I Hsin Cheng
2025-02-11 22:30 ` David Laight
2025-02-12 12:39 ` I Hsin Cheng
2025-02-12 13:52 ` David Laight
2025-02-12 14:12 ` Yury Norov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z6uZZPpQ9YYfrL-I@thinkpad \
--to=yury.norov@gmail.com \
--cc=anshuman.khandual@arm.com \
--cc=arnd@arndb.de \
--cc=jserv@ccns.ncku.edu.tw \
--cc=linux-kernel@vger.kernel.org \
--cc=richard120310@gmail.com \
--cc=skhan@linuxfoundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.