public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Yury Norov <yury.norov@gmail.com>
To: I Hsin Cheng <richard120310@gmail.com>
Cc: anshuman.khandual@arm.com, arnd@arndb.de,
	linux-kernel@vger.kernel.org, jserv@ccns.ncku.edu.tw,
	skhan@linuxfoundation.org
Subject: Re: [PATCH 1/2] uapi: Refactor __GENMASK() for speed-up
Date: Tue, 11 Feb 2025 13:39:32 -0500	[thread overview]
Message-ID: <Z6uZZPpQ9YYfrL-I@thinkpad> (raw)
In-Reply-To: <20250211162412.477655-2-richard120310@gmail.com>

On Wed, Feb 12, 2025 at 12:24:11AM +0800, I Hsin Cheng wrote:
> The calculation of "((~_UL(0)) - (_UL(1) << (l)) + 1)" is to generate a
> bitmask with "l" trailing zeroes, which is equivalent to
> "(~_UL(0) << (l))".

I used to think that GENMASK() is a compile-time macro. __GENMASK() is
not, but it has very limited usage through the kernel, all in the uapi.
 
> Refactor the calculation so the number of arithmetic instruction will be
> reduced from 3 to 1.

I'd like to look at it! Please share disassembly.

> Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
> ---
> Test is done to show the speed-up we can get from reducing the number of
> instruction. The test machine runs with 6.9.0-0-generic kernel on x86_64
> architecture with processor AMD Ryzen 7 5700X3D 8-Core Processor.

So you CC arm maintainers and provide tests against x86_64. Are your
improvements consistent for arm, power and other arches?

Can you run bloat-o-meter against your change?

> The test executes in the form of kernel module.
> 
> Test result:
> [451675.644026] new __GENMASK() : 5985416
> [451675.644030] old __GENMASK() : 6006406

The difference is 0.35%? That's impressive!

Can you please run your test multiple times and print those numbers
together with confidence interval?
 
> Test script snippet:
> /* switch to __BITS_PER_LONG_LONG when testing __GENMASK_ULL() */
> \#define __GENMASK_NEW(h, l) \
>         ((~_UL(0) << (l)) & \
>          (~_UL(0) >> (__BITS_PER_LONG - 1 - (h))))
>
> int init_module(void)
> {
>     ktime_t start, end, total1 = 0, total2 = 0;
>     for (int k = 0; k < 100; k++) {
>         for (int i = 0; i < __BITS_PER_LONG; i++) {
>             for (int j = 0; j <= i; j++) {
>                 unsigned result1, result2;
> 
>                 start = ktime_get();
>                 result1 = __GENMASK_NEW(i, j);
>                 end = ktime_get();
>                 total1 += (end - start);

Nah, this is wrong. I feel like you measure performance of
ktime_get() rather than GENMASK().

Do it this way:

        start = ktime_get();
        for (...) {
                __always_used result = GENMASK();
        }
        time = ktime_get() - start;


> 
>                 start = ktime_get();
>                 result2 = __GENMASK(i, j);

Please test GENMASK(), not __GENMASK().

>                 end = ktime_get();
>                 total2 += (end - start);
> 
>                 if (result1 != result2) {
>                     pr_info("Wrong calculation of GENMASK_NEW()\n");
>                     return 0;
>                 }

For functional testing we have lib/test_bits.c, lib/test_bitmap.c and
others. I expect you'll run them all to check correctness. Once you do
that, you don't need to test correctness here.

>             }
>         }
>     }
> 
>     pr_info("new __GENMASK() : %lld\n", total1);
>     pr_info("old __GENMASK() : %lld\n", total2);
> 
>     return 0;
> }

Please incorporate this in one of existing tests and prepend the
series with it. That way you'll let the others to run before/after for
your series.

> ---
>  include/uapi/linux/bits.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/bits.h b/include/uapi/linux/bits.h
> index 5ee30f882736..8fc7fea65288 100644
> --- a/include/uapi/linux/bits.h
> +++ b/include/uapi/linux/bits.h
> @@ -5,7 +5,7 @@
>  #define _UAPI_LINUX_BITS_H
>  
>  #define __GENMASK(h, l) \
> -        (((~_UL(0)) - (_UL(1) << (l)) + 1) & \
> +	((~_UL(0) << (l)) & \
>           (~_UL(0) >> (__BITS_PER_LONG - 1 - (h))))

Now it would fit a single line, right?

If it works, this is a nice simplification, even though it's only a
compile time thing.

Please address all the comments above and run low-level tests I
mentioned above, so that we'll make sure it has no side effects.

Thanks,
Yury

>  
>  #define __GENMASK_ULL(h, l) \
> -- 
> 2.43.0

  reply	other threads:[~2025-02-11 18:39 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-11 16:24 [PATCH 0/2] uapi: Refactor __GENMASK() and __GENMASK_ULL() for I Hsin Cheng
2025-02-11 16:24 ` [PATCH 1/2] uapi: Refactor __GENMASK() for speed-up I Hsin Cheng
2025-02-11 18:39   ` Yury Norov [this message]
2025-02-11 18:44     ` Yury Norov
2025-02-12 14:01       ` I Hsin Cheng
2025-02-13 19:54         ` Yury Norov
2025-02-14 14:51           ` I Hsin Cheng
2025-02-14 15:55             ` Yury Norov
2025-02-12 13:50     ` I Hsin Cheng
2025-02-11 16:24 ` [PATCH 2/2] uapi: Refactor __GENMASK_ULL() " I Hsin Cheng
2025-02-11 22:30   ` David Laight
2025-02-12 12:39     ` I Hsin Cheng
2025-02-12 13:52       ` David Laight
2025-02-12 14:12       ` Yury Norov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z6uZZPpQ9YYfrL-I@thinkpad \
    --to=yury.norov@gmail.com \
    --cc=anshuman.khandual@arm.com \
    --cc=arnd@arndb.de \
    --cc=jserv@ccns.ncku.edu.tw \
    --cc=linux-kernel@vger.kernel.org \
    --cc=richard120310@gmail.com \
    --cc=skhan@linuxfoundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox