From: Yury Norov <yury.norov@gmail.com>
To: Alexander Potapenko <glider@google.com>
Cc: catalin.marinas@arm.com, will@kernel.org, pcc@google.com,
andreyknvl@gmail.com, andriy.shevchenko@linux.intel.com,
linux@rasmusvillemoes.dk, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, eugenis@google.com,
syednwaris@gmail.com, william.gray@linaro.org,
Arnd Bergmann <arnd@arndb.de>
Subject: Re: [PATCH v4 1/5] lib/bitmap: add bitmap_{set,get}_value()
Date: Fri, 4 Aug 2023 12:55:01 -0700
Message-ID: <ZM1XlUbAJ7Qpd6OO@yury-ThinkPad>
In-Reply-To: <CAG_fn=VrPJj6YowHNki5RGAAs8qvwZpUVN4K9qw=cf4aW7Qw9A@mail.gmail.com>
On Fri, Aug 04, 2023 at 06:07:00PM +0200, Alexander Potapenko wrote:
> > space >= nbits <=>
> > BITS_PER_LONG - offset >= nbits <=>
> > offset + nbits <= BITS_PER_LONG
> >
> > > map[index] &= (fit ? (~(GENMASK(nbits - 1, 0) << offset)) :
> >
> > So here GENMASK(nbits + offset - 1, offset) is at max:
> > GENMASK(BITS_PER_LONG - 1, offset). And it never overflows, which is my
> > point. Does it make sense?
>
> It indeed does. Perhaps pulling offset inside GENMASK is not a bug
> after all (a simple test does not show any difference between their
> behavior).
> But `GENMASK(nbits - 1 + offset, offset)` blows up the code (see below).
> My guess is that this happens because the compiler fails to reuse the
> value of `GENMASK(nbits - 1, 0)` used to clamp the value to write, and
> calculates `GENMASK(nbits - 1 + offset, offset)` from scratch.
OK. Can you put a comment explaining this? Or maybe it would be even
better to use BITMAP_LAST_WORD_MASK() here:
        mask = BITMAP_LAST_WORD_MASK(nbits);
        value &= mask;
        ...
        map[index] &= (fit ? ~(mask << offset) :
> > > ~BITMAP_FIRST_WORD_MASK(start));
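A brute-force check is one way to see that the mask-based spelling is
equivalent to the GENMASK() ones. The sketch below is a userspace
approximation, not kernel code: the macros are re-declared to mirror
include/linux/bits.h and include/linux/bitmap.h, and masks_agree() is a
hypothetical helper name. It assumes 1 <= nbits <= BITS_PER_LONG and
offset + nbits <= BITS_PER_LONG (the "fit" case discussed above).

```c
#include <assert.h>
#include <limits.h>

/* Userspace stand-ins for the kernel macros (assumed equivalent). */
#define BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))
#define GENMASK(h, l) \
	((~0UL << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
#define BITMAP_LAST_WORD_MASK(nbits) \
	(~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))

/*
 * Returns 1 if, for every nbits in [1, BITS_PER_LONG] and every offset
 * with offset + nbits <= BITS_PER_LONG, all three spellings of the
 * mask agree; returns 0 on the first mismatch.
 */
int masks_agree(void)
{
	unsigned long nbits, offset;

	for (nbits = 1; nbits <= BITS_PER_LONG; nbits++) {
		unsigned long mask = BITMAP_LAST_WORD_MASK(nbits);

		/* Low nbits set: same as GENMASK(nbits - 1, 0). */
		if (mask != GENMASK(nbits - 1, 0))
			return 0;

		for (offset = 0; offset + nbits <= BITS_PER_LONG; offset++) {
			/* Shifting the low mask vs. building it in place. */
			if ((mask << offset) !=
			    GENMASK(nbits - 1 + offset, offset))
				return 0;
		}
	}
	return 1;
}
```

A caller could simply do assert(masks_agree()). The check says nothing
about code size, which is the actual concern in this thread; it only
covers functional equivalence under the stated assumptions.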
> >
> > As I said, ~BITMAP_FIRST_WORD_MASK() is the same as BITMAP_LAST_WORD_MASK()
> > and vice versa.
>
> Surprisingly, ~BITMAP_FIRST_WORD_MASK() generates better code than
> BITMAP_LAST_WORD_MASK().
Wow... If that's consistent across different compilers/arches, we'd
just drop the latter. Thanks for pointing that out. I'll check.
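For reference, the relationship between the two macros can be checked
in userspace. This is a sketch under assumptions: the macro bodies are
copied by hand to mirror include/linux/bitmap.h, and masks_match() and
count_mismatches() are hypothetical helper names. With these
definitions the two spellings differ only when the argument is a
multiple of BITS_PER_LONG, where ~BITMAP_FIRST_WORD_MASK() is 0 and
BITMAP_LAST_WORD_MASK() is all ones.

```c
#include <assert.h>
#include <limits.h>

/* Userspace stand-ins for the kernel macros (assumed equivalent). */
#define BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))
#define BITMAP_FIRST_WORD_MASK(start) \
	(~0UL << ((start) & (BITS_PER_LONG - 1)))
#define BITMAP_LAST_WORD_MASK(nbits) \
	(~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))

/* Returns 1 if ~BITMAP_FIRST_WORD_MASK(n) == BITMAP_LAST_WORD_MASK(n). */
int masks_match(unsigned long n)
{
	return ~BITMAP_FIRST_WORD_MASK(n) == BITMAP_LAST_WORD_MASK(n);
}

/* Counts the arguments in [0, limit) for which the two masks differ. */
unsigned long count_mismatches(unsigned long limit)
{
	unsigned long n, count = 0;

	for (n = 0; n < limit; n++)
		if (!masks_match(n))
			count++;
	return count;
}
```

In the non-fit paths of the function under discussion, start is not
word-aligned and (start + nbits) % BITS_PER_LONG is nonzero as long as
nbits <= BITS_PER_LONG, so the aligned corner case should not arise
there.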
> > > map[index] |= value << offset;
> > > if (fit)
> > > return;
> > >
> > > map[index + 1] &= ~BITMAP_LAST_WORD_MASK(start + nbits);
>
> OTOH I managed to shave three more bytes off by replacing
> ~BITMAP_LAST_WORD_MASK with a BITMAP_FIRST_WORD_MASK here.
>
> > > map[index + 1] |= (value >> space);
> > > }
>
> I'll post the implementations together with the disassembly below.
> I used some Clang 17.0.0 version that is a couple of months behind
> upstream, but that still produces substantially shorter code (~48 bytes
> less) than the trunk GCC on Godbolt.
>
> 1. Original implementation of bitmap_write() from this patch - 164
> bytes (interestingly, it's 157 bytes with Clang 14.0.6)
I noticed that too in another case. Newer compilers tend to
generate bigger code, but the result usually runs faster. In my
case, one particular reason was loop unrolling.
[...]
> 3. My improved version built on top of yours and mentioned above under
> the name bitmap_write_new() - 116 bytes:
30% better in size - that's impressive!
> ==================================================================
> void bitmap_write_new(unsigned long *map, unsigned long value,
>                       unsigned long start, unsigned long nbits)
> {
>         unsigned long offset;
>         unsigned long space;
>         size_t index;
>         bool fit;
>
>         if (unlikely(!nbits))
>                 return;
>
>         value &= GENMASK(nbits - 1, 0);
>         offset = start % BITS_PER_LONG;
>         space = BITS_PER_LONG - offset;
>         index = BIT_WORD(start);
>         fit = space >= nbits;
>
>         map[index] &= (fit ? (~(GENMASK(nbits - 1, 0) << offset)) :
>                        ~BITMAP_FIRST_WORD_MASK(start));
>         map[index] |= value << offset;
>         if (fit)
>                 return;
>
>         map[index + 1] &= BITMAP_FIRST_WORD_MASK(start + nbits);
>         map[index + 1] |= (value >> space);
> }
Thanks,
Yury
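
To experiment with the quoted bitmap_write_new() outside the kernel,
something like the following userspace sketch could be used. The
function body follows the version quoted above; everything around it is
an assumption made for illustration: the macros are hand-written
stand-ins for the kernel ones, unlikely() is reduced to a no-op, and
the caller is assumed to pass nbits <= BITS_PER_LONG and a map with
room for the second word.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel macros (assumed equivalent). */
#define BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))
#define BIT_WORD(nr) ((nr) / BITS_PER_LONG)
#define GENMASK(h, l) \
	((~0UL << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
#define BITMAP_FIRST_WORD_MASK(start) \
	(~0UL << ((start) & (BITS_PER_LONG - 1)))
#define unlikely(x) (x)	/* no branch hint in this sketch */

/* Write the low nbits of value into map, starting at bit start. */
void bitmap_write_new(unsigned long *map, unsigned long value,
		      unsigned long start, unsigned long nbits)
{
	unsigned long offset;
	unsigned long space;
	size_t index;
	bool fit;

	if (unlikely(!nbits))
		return;

	value &= GENMASK(nbits - 1, 0);
	offset = start % BITS_PER_LONG;
	space = BITS_PER_LONG - offset;
	index = BIT_WORD(start);
	fit = space >= nbits;	/* does the value fit in one word? */

	/* Clear the destination bits in the first word, then set them. */
	map[index] &= (fit ? (~(GENMASK(nbits - 1, 0) << offset)) :
		       ~BITMAP_FIRST_WORD_MASK(start));
	map[index] |= value << offset;
	if (fit)
		return;

	/* The remaining high bits of value spill into the next word. */
	map[index + 1] &= BITMAP_FIRST_WORD_MASK(start + nbits);
	map[index + 1] |= (value >> space);
}
```

For example, writing the 4-bit value 0x5 at bit BITS_PER_LONG - 2 of an
all-ones two-word map crosses the word boundary: the top two bits of
the first word receive the low two bits of the value, and the low two
bits of the second word receive the rest.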