From: Tao Zhou <tao.zhou@linux.dev>
To: Yu-Jen Chang <arthurchang09@gmail.com>, tao.zhou@linux.dev
Cc: ak@linux.intel.com, jdike@linux.intel.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, hpa@zytor.com, keescook@chromium.org,
linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
richard@nod.at, anton.ivanov@cambridgegreys.com,
johannes@sipsolutions.net, linux-um@lists.infradead.org,
jserv@ccns.ncku.edu.tw
Subject: Re: [PATCH 1/2] x86/lib: Optimize memchr()
Date: Sun, 29 May 2022 00:41:07 +0800 [thread overview]
Message-ID: <YpJQoxUt9RhKb0Pr@geo.homenetwork> (raw)
In-Reply-To: <20220528081236.3020-2-arthurchang09@gmail.com>
On Sat, May 28, 2022 at 04:12:35PM +0800, Yu-Jen Chang wrote:
> The original assembly version of memchr() is implemented with
> the byte-wise comparing technique, which does not fully
> use 64-bits registers in x86_64 CPU. We use word-wide
> comparing so that 8 characters can be compared at the same time
> on x86_64 CPU. First we align the input and then use word-wise
> comparing to find the first 64-bit word that contain the target.
> Secondly, we compare every byte in the word and get the output.
>
> We create two files to measure the performance. The first file
> contains on average 10 characters ahead the target character.
> The second file contains at least 1000 characters ahead the
> target character. Our implementation of “memchr()” is slightly
> better in the first test and nearly 4x faster than the orginal
> implementation in the second test.
>
> Signed-off-by: Yu-Jen Chang <arthurchang09@gmail.com>
> Signed-off-by: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
> ---
> arch/x86/include/asm/string_64.h | 3 ++
> arch/x86/lib/Makefile | 1 +
> arch/x86/lib/string_64.c | 78 ++++++++++++++++++++++++++++++++
> 3 files changed, 82 insertions(+)
> create mode 100644 arch/x86/lib/string_64.c
>
> diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
> index 6e450827f..edce657e0 100644
> --- a/arch/x86/include/asm/string_64.h
> +++ b/arch/x86/include/asm/string_64.h
> @@ -14,6 +14,9 @@
> extern void *memcpy(void *to, const void *from, size_t len);
> extern void *__memcpy(void *to, const void *from, size_t len);
>
> +#define __HAVE_ARCH_MEMCHR
> +extern void *memchr(const void *cs, int c, size_t length);
> +
> #define __HAVE_ARCH_MEMSET
> void *memset(void *s, int c, size_t n);
> void *__memset(void *s, int c, size_t n);
> diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
> index f76747862..4d530e559 100644
> --- a/arch/x86/lib/Makefile
> +++ b/arch/x86/lib/Makefile
> @@ -69,5 +69,6 @@ else
> lib-y += clear_page_64.o copy_page_64.o
> lib-y += memmove_64.o memset_64.o
> lib-y += copy_user_64.o
> + lib-y += string_64.o
> lib-y += cmpxchg16b_emu.o
> endif
> diff --git a/arch/x86/lib/string_64.c b/arch/x86/lib/string_64.c
> new file mode 100644
> index 000000000..4e067d5be
> --- /dev/null
> +++ b/arch/x86/lib/string_64.c
> @@ -0,0 +1,78 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/string.h>
> +#include <linux/export.h>
> +#include <linux/align.h>
> +
> +/* How many bytes are loaded each iteration of the word copy loop */
> +#define LBLOCKSIZE (sizeof(long))
> +
> +#ifdef __HAVE_ARCH_MEMCHR
> +
> +void *memchr(const void *cs, int c, size_t length)
> +{
> + const unsigned char *src = (const unsigned char *)cs, d = c;
I don't know why this 'd = c' is not error.
d is a char pointer and c is int. At least this is not safe me do not know.
> + while (!IS_ALIGNED((long)src, sizeof(long))) {
> + if (!length--)
> + return NULL;
> + if (*src == d)
Compare a character value to a pointer value and this value is c.
May be right do not know.
Or:
char d = c;
...
> + return (void *)src;
> + src++;
> + }
> + if (length >= LBLOCKSIZE) {
> + unsigned long mask = d << 8 | d;
> + unsigned int i = 32;
> + long xor, data;
> + const long consta = 0xFEFEFEFEFEFEFEFF,
> + constb = 0x8080808080808080;
Two magic number..
> + /*
> + * Create a 8-bytes mask for word-wise comparing.
> + * For example, a mask for 'a' is 0x6161616161616161.
> + */
> +
> + mask |= mask << 16;
> + for (i = 32; i < LBLOCKSIZE * 8; i <<= 1)
> + mask |= mask << i;
> + /*
> + * We perform word-wise comparing with following operation:
> + * 1. Perform xor on the long word @src and @mask
> + * and put into @xor.
> + * 2. Add @xor with @consta.
> + * 3. ~@xor & @constb.
> + * 4. Perform & with the result of step 2 and 3.
> + *
> + * Step 1 creates a byte which is 0 in the long word if
> + * there is at least one target byte in it.
> + *
> + * Step 2 to Step 4 find if there is a byte with 0 in
> + * the long word.
> + */
> + asm volatile("1:\n\t"
> + "movq (%0),%1\n\t"
> + "xorq %6,%1\n\t"
> + "lea (%1,%4), %2\n\t"
> + "notq %1\n\t"
> + "andq %5,%1\n\t"
> + "testq %1,%2\n\t"
> + "jne 2f\n\t"
s/jne/jnz/
Lack much here from me. But I give example that should check the
CF flag is zero.
1) contain matching byte.
1111111011111111(consta)
+add
0000000001101100(xor)
------------------------
1111111101101011 (%1)
1111111110010100(~xor)
&and
1000000010000000(constb)
------------------------
1000000010000000 (%2)
the logical and of %1 and %2 is
1000000000000000 that is not zero.
2) not contain matching byte
1111111011111111
+
0110111011011100
----------------
0110110111011011(%1)
1001000100100011
&
1000000010000000
----------------
1000000000000000(%2)
%1 and %2 is
0000000000000000 that is zero.
I guess that here should use jump instruction jnz instead.
Even though, I do not know why that two magic number is so magical..
Thanks,
Tao
> + "add $8,%0\n\t"
> + "sub $8,%3\n\t"
> + "cmp $7,%3\n\t"
> + "ja 1b\n\t"
> + "2:\n\t"
> + : "=D"(src), "=r"(xor), "=r"(data), "=r"(length)
> + : "r"(consta), "r"(constb), "r"(mask), "0"(src),
> + "1"(xor), "2"(data), "3"(length)
> + : "memory", "cc");
> + }
> +
> + while (length--) {
> + if (*src == d)
> + return (void *)src;
> + src++;
> + }
> + return NULL;
> +}
> +EXPORT_SYMBOL(memchr);
> +#endif
> --
> 2.25.1
>
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
next prev parent reply other threads:[~2022-05-28 16:40 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-28 8:12 [PATCH 0/2] x86: Optimize memchr() for x86-64 Yu-Jen Chang
2022-05-28 8:12 ` [PATCH 1/2] x86/lib: Optimize memchr() Yu-Jen Chang
2022-05-28 16:41 ` Tao Zhou [this message]
2022-05-29 12:05 ` arthur chang arthur
2022-05-30 8:09 ` David Laight
2022-06-01 5:58 ` Yu-Jen Chang
2022-06-01 8:25 ` David Laight
2022-06-06 3:25 ` Yu-Jen Chang
2022-05-28 8:12 ` [PATCH 2/2] x86/um: Use x86_64-optimized memchr Yu-Jen Chang
2022-05-29 1:10 ` [PATCH 0/2] x86: Optimize memchr() for x86-64 Andi Kleen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YpJQoxUt9RhKb0Pr@geo.homenetwork \
--to=tao.zhou@linux.dev \
--cc=ak@linux.intel.com \
--cc=anton.ivanov@cambridgegreys.com \
--cc=arthurchang09@gmail.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=jdike@linux.intel.com \
--cc=johannes@sipsolutions.net \
--cc=jserv@ccns.ncku.edu.tw \
--cc=keescook@chromium.org \
--cc=linux-hardening@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-um@lists.infradead.org \
--cc=mingo@redhat.com \
--cc=richard@nod.at \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox