From: Nicholas Piggin <npiggin@gmail.com>
To: Christophe Leroy <christophe.leroy@csgroup.eu>,
Segher Boessenkool <segher@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 1/2] powerpc/bitops: Use immediate operand when possible
Date: Wed, 14 Apr 2021 12:01:21 +1000
Message-ID: <1618365589.67fxh7cot9.astroid@bobo.none>
In-Reply-To: <20210413215803.GT26583@gate.crashing.org>
Excerpts from Segher Boessenkool's message of April 14, 2021 7:58 am:
> On Tue, Apr 13, 2021 at 06:33:19PM +0200, Christophe Leroy wrote:
>> Le 12/04/2021 à 23:54, Segher Boessenkool a écrit :
>> >On Thu, Apr 08, 2021 at 03:33:44PM +0000, Christophe Leroy wrote:
>> >>For clear bits, on 32 bits 'rlwinm' can be used instead of 'andc' for
>> >>when all bits to be cleared are consecutive.
>> >
>> >Also on 64-bits, as long as both the top and bottom bits are in the low
>> >32-bit half (for 32 bit mode, it can wrap as well).
>>
>> Yes. But here we are talking about clearing a few bits, all other ones must
>> remain unchanged. An rlwinm on PPC64 will always clear the upper part,
>> which is unlikely to be what we want.
>
> No, it does not. It takes the low 32 bits of the source reg, duplicated
> to the top half as well, then rotated, then ANDed with the mask (which
> can wrap around). This isn't very often very useful, but :-)
>
> (One useful operation is splatting 32 bits to both halves of a 64-bit
> register, which is just rlwinm d,s,0,1,0).
>
> If you only look at the low 32 bits, it does exactly the same as on
> 32-bit implementations.
>
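For illustration, here is a minimal C model of what rlwinm does on a 64-bit CPU. It is written from the behaviour described above: the low word is duplicated into both halves, rotated, then ANDed with a mask that wraps when MB > ME. The helper names (ppc_mask64, rotl64, rlwinm64) are hypothetical and are not kernel or GCC interfaces. Bit numbers follow the ISA's IBM convention, where bit 0 is the most significant bit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * MASK(mstart, mstop) in IBM bit numbering (bit 0 is the MSB of the
 * 64-bit register). When mstart > mstop the run of ones wraps around
 * from bit 63 back to bit 0.
 */
static uint64_t ppc_mask64(unsigned int mstart, unsigned int mstop)
{
	uint64_t from_start = ~0ULL >> mstart;		/* bits mstart..63 */
	uint64_t to_stop = ~0ULL << (63 - mstop);	/* bits 0..mstop */

	return mstart <= mstop ? (from_start & to_stop)
			       : (from_start | to_stop);
}

static uint64_t rotl64(uint64_t x, unsigned int n)
{
	return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Hypothetical model of "rlwinm ra,rs,sh,mb,me" executed on a 64-bit CPU. */
static uint64_t rlwinm64(uint64_t rs, unsigned int sh, unsigned int mb,
			 unsigned int me)
{
	uint64_t lo = rs & 0xffffffffULL;
	uint64_t dup = (lo << 32) | lo;	/* low word copied to both halves */

	assert(sh < 32 && mb < 32 && me < 32);
	return rotl64(dup, sh) & ppc_mask64(mb + 32, me + 32);
}
```

Take rlwinm64(0x1234567800000021, 0, 27, 25), which clears bit 5 of the low word. It returns 0x0000002100000001. The low 32 bits match what a 32-bit CPU would produce, but the upper word is now a copy of the low word rather than the original 0x12345678. rlwinm64(x, 0, 1, 0) returns the low word of x splatted into both halves.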
>> >>For the time being only
>> >>handle the single bit case, which we detect by checking whether the
>> >>mask is a power of two.
>> >
>> >You could look at rs6000_is_valid_mask in GCC:
>> > <https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/rs6000/rs6000.c;h=48b8efd732b251c059628096314848305deb0c0b;hb=HEAD#l11148>
>> >used by rs6000_is_valid_and_mask immediately after it. You probably
>> >want to allow only rlwinm in your case, and please note this checks if
>> >something is a valid mask, not the inverse of a valid mask (as you
>> >want here).
>>
>> This check looks more complex than what I need. It is used for both rlw...
>> and rld..., and it calculates the operands. The only thing I need is to
>> validate the mask.
>
> It has to do exactly the same thing for rlwinm as for all 64-bit
> variants (rldicl, rldicr, rldic).
>
> One side effect of calculating the bit positions with exact_log2 is that
> it returns a negative value if the argument is not a power of two.
>
> Here is a simpler way that handles all cases; input in "u32 val":
>
> 	if (!val)
> 		return nonono;
> 	if (val & 1)
> 		val = ~val;		// make the mask non-wrapping
> 	val += val & -val;	// adding the low set bit should result in
> 				// at most one bit set
> 	if (!(val & (val - 1)))
> 		return okidoki_all_good;
>
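Segher's check can be turned into compilable C roughly as follows. is_rlwinm_mask() follows his pseudocode directly. rlwinm_clear_operands() is a hypothetical helper, not part of the patch. It shows how MB and ME (IBM bit numbering) could then be derived for an rlwinm that clears a given set of low-word bits, using the GCC builtins __builtin_clz() and __builtin_ctz().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if val is a contiguous, possibly wrapping, run of ones. */
static bool is_rlwinm_mask(uint32_t val)
{
	if (!val)
		return false;
	if (val & 1)
		val = ~val;		/* make the mask non-wrapping */
	val += val & -val;		/* adding the low set bit should leave
					   at most one bit set */
	return !(val & (val - 1));
}

/*
 * Hypothetical helper: MB/ME (IBM numbering, bit 0 = MSB) for an
 * "rlwinm rX,rX,0,MB,ME" that clears 'bits' and keeps every other bit
 * of the low word. The AND mask is ~bits, which must itself be a mask.
 */
static void rlwinm_clear_operands(uint32_t bits, unsigned int *mb,
				  unsigned int *me)
{
	uint32_t keep = ~bits;

	assert(bits != 0 && is_rlwinm_mask(keep));
	if (!(keep & 1) || !(keep >> 31)) {
		/* keep does not wrap: ones run from bit MB to bit ME */
		*mb = __builtin_clz(keep);
		*me = 31 - __builtin_ctz(keep);
	} else {
		/* keep wraps, so the bits to clear are a non-wrapping hole */
		*me = __builtin_clz(bits) - 1;
		*mb = 32 - __builtin_ctz(bits);
	}
}
```

For instance, clearing bit 5 (0x20) gives MB = 27 and ME = 25, i.e. rlwinm rX,rX,0,27,25, which uses a wrapping mask. Passing 0 (nothing to clear) or a non-contiguous set of bits trips the assertion.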
>> I found a way: by ANDing the mask with the complement of itself rotated
>> left by 1 bit, we identify the transitions from 0 to 1. If the result is a
>> power of 2, there is only one transition, so the mask is as expected.
>
> That does not handle all cases (it misses the all-bits-set mask, at least).
> That one isn't all that interesting of course, but it is a valid mask (it
> won't clear any bits, though, so not too interesting for your specific case :-) )
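The transition-counting approach described above might look like the sketch below. It reads "rotated by left bits to 1" as "rotated left by 1 bit". rotl32() and one_rising_edge() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned int n)
{
	assert(n < 32);
	return n ? (x << n) | (x >> (32 - n)) : x;
}

/*
 * Bit i of 'rise' is set where the mask has a 1 at bit i and a 0 at
 * bit i-1, counting cyclically: a 0-to-1 transition. A single
 * transition means a single (possibly wrapping) run of ones.
 */
static bool one_rising_edge(uint32_t mask)
{
	uint32_t rise = mask & ~rotl32(mask, 1);

	return rise && !(rise & (rise - 1));
}
```

The check accepts wrapping masks such as 0x80000001 and 0xffffffdf, and it rejects 0x5. It also returns false for 0xffffffff, because a mask with no 0-to-1 transition leaves 'rise' at zero. That is the case pointed out above: a valid mask, though of no use for clearing bits.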
Would be nice if we could let the compiler deal with it all...
static inline unsigned long lr(unsigned long *mem)
{
	unsigned long val;

	/*
	 * This doesn't clobber memory, but we want to avoid memory
	 * operations moving ahead of it.
	 */
	asm volatile("ldarx %0, %y1" : "=r"(val) : "Z"(*mem) : "memory");

	return val;
}

static inline bool stc(unsigned long *mem, unsigned long val)
{
	/*
	 * This doesn't really clobber memory either, but same reason as
	 * above; also, asm goto can't specify outputs.
	 */
	asm volatile goto(
		"stdcx. %0, %y1 \n\t"
		"bne- %l[fail] \n\t"
		: : "r"(val), "Z"(*mem) : "cr0", "memory" : fail);

	return true;
fail: __attribute__((cold));
	return false;
}

static inline void atomic_add(unsigned long *mem, unsigned long val)
{
	unsigned long old, new;

	do {
		old = lr(mem);
		new = old + val;
	} while (unlikely(!stc(mem, new)));
}
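For comparison, the same load, modify, conditional-store retry loop can be written with the GCC/Clang __atomic builtins, which leaves the ldarx/stdcx. sequence to the compiler. This is a sketch. atomic_add_portable() is a hypothetical name and is not the kernel's atomic API. It uses relaxed ordering, matching the unordered kernel atomic_add(). Unlike the asm above, however, it does not act as a compiler barrier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static inline void atomic_add_portable(unsigned long *mem, unsigned long val)
{
	unsigned long old;

	/* As with ldarx/stdcx., the atomic accesses need natural alignment. */
	assert((uintptr_t)mem % sizeof(*mem) == 0);

	old = __atomic_load_n(mem, __ATOMIC_RELAXED);
	/* A failed (possibly spurious) weak CAS refreshes 'old'; retry. */
	while (!__atomic_compare_exchange_n(mem, &old, old + val, true,
					    __ATOMIC_RELAXED,
					    __ATOMIC_RELAXED))
		;
}
```

A weak compare-exchange is used because, like stdcx., it may fail spuriously, and the loop simply retries. In practice, __atomic_fetch_add(mem, val, __ATOMIC_RELAXED) expresses the same operation in a single call.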
Thread overview: 26+ messages
2021-04-08 15:33 [PATCH v1 1/2] powerpc/bitops: Use immediate operand when possible Christophe Leroy
2021-04-08 15:33 ` [PATCH v1 2/2] powerpc/atomics: " Christophe Leroy
2021-04-12 22:08 ` Segher Boessenkool
2021-04-13 16:36 ` Christophe Leroy
2021-04-12 21:54 ` [PATCH v1 1/2] powerpc/bitops: " Segher Boessenkool
2021-04-13 16:33 ` Christophe Leroy
2021-04-13 21:58 ` Segher Boessenkool
2021-04-14 2:01 ` Nicholas Piggin [this message]
2021-04-14 12:24 ` Segher Boessenkool
2021-04-14 12:42 ` Christophe Leroy
2021-04-14 15:19 ` Segher Boessenkool
2021-04-14 15:32 ` David Laight
2021-04-14 17:20 ` Segher Boessenkool