From: Jisheng Zhang <jszhang@kernel.org>
To: Nick Kossifidis <mick@ics.forth.gr>
Cc: Paul Walmsley <paul.walmsley@sifive.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Albert Ou <aou@eecs.berkeley.edu>,
linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
Matteo Croce <mcroce@microsoft.com>
Subject: Re: [PATCH 3/3] riscv: optimized memset
Date: Tue, 30 Jan 2024 21:25:54 +0800
Message-ID: <Zbj44v8QsQPtQ_jD@xhacker>
In-Reply-To: <b7ae944c-2b7c-4c8d-8623-a8387b8d4e02@ics.forth.gr>
On Tue, Jan 30, 2024 at 02:07:37PM +0200, Nick Kossifidis wrote:
> On 1/28/24 13:10, Jisheng Zhang wrote:
> > diff --git a/arch/riscv/lib/string.c b/arch/riscv/lib/string.c
> > index 20677c8067da..022edda68f1c 100644
> > --- a/arch/riscv/lib/string.c
> > +++ b/arch/riscv/lib/string.c
> > @@ -144,3 +144,44 @@ void *memmove(void *dest, const void *src, size_t count) __weak __alias(__memmov
> > EXPORT_SYMBOL(memmove);
> > void *__pi_memmove(void *dest, const void *src, size_t count) __alias(__memmove);
> > void *__pi___memmove(void *dest, const void *src, size_t count) __alias(__memmove);
> > +
> > +void *__memset(void *s, int c, size_t count)
> > +{
> > + union types dest = { .as_u8 = s };
> > +
> > + if (count >= MIN_THRESHOLD) {
> > + unsigned long cu = (unsigned long)c;
> > +
> > + /* Compose an ulong with 'c' repeated 4/8 times */
> > +#ifdef CONFIG_ARCH_HAS_FAST_MULTIPLIER
> > + cu *= 0x0101010101010101UL;
Here we need to check BITS_PER_LONG and use 0x01010101UL for rv32.
> > +#else
> > + cu |= cu << 8;
> > + cu |= cu << 16;
> > + /* Suppress warning on 32 bit machines */
> > + cu |= (cu << 16) << 16;
> > +#endif
>
> I guess you could check against __SIZEOF_LONG__ here.
Hmm, I believe we can drop the OR-and-shift code entirely and just use
the ARCH_HAS_FAST_MULTIPLIER multiply instead; see
https://lore.kernel.org/linux-riscv/20240125145703.913-1-jszhang@kernel.org/
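A minimal userspace sketch of the pattern construction, for illustration only: ~0UL / 0xff evaluates to 0x01010101 when unsigned long is 32 bits and to 0x0101010101010101 when it is 64 bits, so the multiply needs no BITS_PER_LONG check at all (I believe the kernel's REPEAT_BYTE() macro uses the same trick). repeat_byte_mul() and repeat_byte_shift() are hypothetical names. Both truncate c to unsigned char first, as memset() semantics require; the quoted patch casts c straight to unsigned long, so a negative or out-of-range c would leak into the upper bytes of cu while the byte-wise loops store (unsigned char)c.

```c
#include <assert.h>
#include <limits.h>

/* This sketch assumes unsigned long is 32 or 64 bits (rv32 / rv64). */
static_assert(sizeof(unsigned long) == 4 || sizeof(unsigned long) == 8,
	      "unexpected unsigned long width");

/*
 * Multiply variant: ~0UL / 0xff is 0x01010101 on 32-bit longs and
 * 0x0101010101010101 on 64-bit longs. c is truncated to unsigned char
 * first, matching memset() semantics.
 */
static unsigned long repeat_byte_mul(int c)
{
	return (~0UL / 0xff) * (unsigned char)c;
}

/* Shift-and-OR variant for CPUs without a fast multiplier. */
static unsigned long repeat_byte_shift(int c)
{
	unsigned long cu = (unsigned char)c;

	cu |= cu << 8;
	cu |= cu << 16;
#if ULONG_MAX > 0xffffffffUL
	/* Only compiled with 64-bit longs, so the shift count is in range. */
	cu |= cu << 32;
#endif
	return cu;
}
```

For example, repeat_byte_mul(0xab) yields 0xabababab with 32-bit longs and 0xabababababababab with 64-bit longs.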
>
> > + if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) {
> > + /*
> > + * Fill the buffer one byte at time until
> > + * the destination is word aligned.
> > + */
> > + for (; count && dest.as_uptr & WORD_MASK; count--)
> > + *dest.as_u8++ = c;
> > + }
> > +
> > + /* Copy using the largest size allowed */
> > + for (; count >= BYTES_LONG; count -= BYTES_LONG)
> > + *dest.as_ulong++ = cu;
> > + }
> > +
> > + /* copy the remainder */
> > + while (count--)
> > + *dest.as_u8++ = c;
> > +
> > + return s;
> > +}
> > +EXPORT_SYMBOL(__memset);
>
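The fill strategy in the quoted patch (byte stores up to word alignment, word stores for the bulk, byte stores for the tail) can be checked against libc in userspace with a sketch like the one below. It assumes a threshold of two words for the hypothetical SKETCH_MIN_THRESHOLD, since the patch's MIN_THRESHOLD value is not shown in this excerpt, and it always aligns, i.e. it models the !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS path. Word stores go through memcpy() to stay within C aliasing rules outside the kernel; compilers normally lower that to a single store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the patch's MIN_THRESHOLD and WORD_MASK. */
#define SKETCH_MIN_THRESHOLD	(2 * sizeof(unsigned long))
#define SKETCH_WORD_MASK	(sizeof(unsigned long) - 1)

static void *sketch_memset(void *s, int c, size_t count)
{
	unsigned char *d = s;
	unsigned char byte = (unsigned char)c;

	if (count >= SKETCH_MIN_THRESHOLD) {
		/* Repeat the byte across a word without a width check. */
		unsigned long cu = (~0UL / 0xff) * byte;

		/* Byte stores until d is word aligned. */
		for (; count && ((uintptr_t)d & SKETCH_WORD_MASK); count--)
			*d++ = byte;

		/* At most one word minus one byte was consumed above. */
		assert(((uintptr_t)d & SKETCH_WORD_MASK) == 0);

		/* Word-sized stores for the bulk of the buffer. */
		for (; count >= sizeof(cu); count -= sizeof(cu)) {
			memcpy(d, &cu, sizeof(cu));
			d += sizeof(cu);
		}
	}

	/* Byte stores for the tail and for short buffers. */
	while (count--)
		*d++ = byte;

	return s;
}
```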
> BTW a similar approach could be used for memchr, e.g.:
>
> #if __SIZEOF_LONG__ == 8
> #define HAS_ZERO(_x) (((_x) - 0x0101010101010101ULL) & ~(_x) & 0x8080808080808080ULL)
> #else
> #define HAS_ZERO(_x) (((_x) - 0x01010101UL) & ~(_x) & 0x80808080UL)
> #endif
>
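To spell out why HAS_ZERO() works: subtracting 0x01 from every byte lane borrows out of any lane that was 0x00, leaving its bit 7 set, and "& ~x" discards lanes whose bit 7 was already set, so the result is nonzero exactly when some lane is zero. The borrow can also flag lanes above a genuine zero (a 0x01 just above a 0x00, for instance), so only the lowest flagged lane identifies a match reliably. The sketch below builds the constants as ~0UL / 0xff, which covers both __SIZEOF_LONG__ cases without an #if; has_zero(), lowest_zero_lane(), LANE_ONES and LANE_HIGHS are illustrative names, and __builtin_ctzl() is a GCC/Clang builtin.

```c
#include <assert.h>
#include <limits.h>

/* The "/ 8" below assumes 8-bit bytes. */
static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

/* Width-independent versions of the quoted constants. */
#define LANE_ONES	(~0UL / 0xff)		/* 0x0101...01 */
#define LANE_HIGHS	(LANE_ONES << 7)	/* 0x8080...80 */

/* Nonzero iff at least one byte lane of x is 0x00. */
static unsigned long has_zero(unsigned long x)
{
	return (x - LANE_ONES) & ~x & LANE_HIGHS;
}

/*
 * Lane of the lowest zero byte, counted from the least significant byte,
 * or -1 if there is none. Lanes above the first zero may be falsely
 * flagged by the borrow, so only the lowest flag is used.
 */
static int lowest_zero_lane(unsigned long x)
{
	unsigned long flags = has_zero(x);

	if (!flags)
		return -1;
	return __builtin_ctzl(flags) / 8;
}
```

In the memchr() quoted here, XOR-ing each word with the repeated mask turns every byte equal to c into 0x00, which is what HAS_ZERO() then detects; for instance, lowest_zero_lane(0x64636261UL ^ (LANE_ONES * 0x63)) is 2.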
> void *
> memchr(const void *src_ptr, int c, size_t len)
> {
> union const_data src = { .as_bytes = src_ptr };
> unsigned char byte = (unsigned char) c;
> unsigned long mask = (unsigned long) c;
> size_t remaining = len;
>
> /* Nothing to do */
> if (!src_ptr || !len)
> return NULL;
>
> if (len < 2 * WORD_SIZE)
> goto trailing;
>
> mask |= mask << 8;
> mask |= mask << 16;
> #if __SIZEOF_LONG__ == 8
> mask |= mask << 32;
> #endif
>
> /* Search by byte up to the src's alignment boundary */
> for(; src.as_uptr & WORD_MASK; remaining--, src.as_bytes++) {
> if (*src.as_bytes == byte)
> return (void*) src.as_bytes;
> }
>
> /* Search word by word using the mask */
> for(; remaining >= WORD_SIZE; remaining -= WORD_SIZE, src.as_ulong++) {
> unsigned long check = *src.as_ulong ^ mask;
> if(HAS_ZERO(check))
> break;
> }
>
> trailing:
> for(; remaining > 0; remaining--, src.as_bytes++) {
> if (*src.as_bytes == byte)
> return (void*) src.as_bytes;
> }
>
> return NULL;
> }
>
> Regards,
> Nick
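Putting the pieces together, a userspace version of the quoted memchr() might look like the following. This is a sketch under a few assumptions: the names are hypothetical, the constants use ~0UL / 0xff instead of the #if on __SIZEOF_LONG__, the NULL-pointer check is dropped (standard memchr() does not accept NULL), and aligned words are loaded with memcpy() to respect C aliasing rules outside the kernel. As in the quoted version, a word-level hit only triggers a break, and the byte loop then locates the exact position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ONES		(~0UL / 0xff)		/* 0x0101...01 */
#define SK_HIGHS	(SK_ONES << 7)		/* 0x8080...80 */
#define SK_WORD		sizeof(unsigned long)

static void *sketch_memchr(const void *src, int c, size_t len)
{
	const unsigned char *p = src;
	unsigned char byte = (unsigned char)c;
	unsigned long mask = SK_ONES * byte;	/* byte repeated in every lane */

	if (len >= 2 * SK_WORD) {
		/* Byte search up to the word alignment boundary. */
		for (; (uintptr_t)p & (SK_WORD - 1); p++, len--)
			if (*p == byte)
				return (void *)p;

		/* At most SK_WORD - 1 bytes were consumed, so a full word remains. */
		assert(len > SK_WORD);

		/* Word search: a matching byte becomes 0x00 after the XOR. */
		for (; len >= SK_WORD; p += SK_WORD, len -= SK_WORD) {
			unsigned long w;

			memcpy(&w, p, SK_WORD);
			w ^= mask;
			if ((w - SK_ONES) & ~w & SK_HIGHS)
				break;	/* the byte loop below pinpoints the match */
		}
	}

	/* Short inputs, the word that contained a match, and the tail. */
	for (; len; p++, len--)
		if (*p == byte)
			return (void *)p;

	return NULL;
}
```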
Thread overview:
2024-01-28 11:10 [PATCH 0/3] riscv: optimize memcpy/memmove/memset Jisheng Zhang
2024-01-28 11:10 ` [PATCH 1/3] riscv: optimized memcpy Jisheng Zhang
2024-01-28 12:35 ` David Laight
2024-01-30 12:11 ` Nick Kossifidis
2024-01-30 22:44 ` kernel test robot
2024-01-31 0:19 ` kernel test robot
2024-01-28 11:10 ` [PATCH 2/3] riscv: optimized memmove Jisheng Zhang
2024-01-28 12:47 ` David Laight
2024-01-30 11:30 ` Jisheng Zhang
2024-01-30 11:51 ` David Laight
2024-01-30 11:39 ` Nick Kossifidis
2024-01-30 13:12 ` Jisheng Zhang
2024-01-30 16:52 ` Nick Kossifidis
2024-01-31 5:25 ` Jisheng Zhang
2024-01-31 9:13 ` Nick Kossifidis
2024-01-28 11:10 ` [PATCH 3/3] riscv: optimized memset Jisheng Zhang
2024-01-30 12:07 ` Nick Kossifidis
2024-01-30 13:25 ` Jisheng Zhang [this message]
2024-02-01 23:04 ` David Laight
2024-01-29 18:16 ` [PATCH 0/3] riscv: optimize memcpy/memmove/memset Conor Dooley
2024-01-30 2:28 ` Jisheng Zhang
-- strict thread matches above, loose matches on Subject: below --
2021-06-15 2:38 [PATCH 0/3] riscv: optimized mem* functions Matteo Croce
2021-06-15 2:38 ` [PATCH 3/3] riscv: optimized memset Matteo Croce