public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrey Semashev <andrey.semashev@gmail.com>
To: Yu-Jen Chang <arthurchang09@gmail.com>
Cc: andy@kernel.org, akinobu.mita@gmail.com,
	Ching-Chun Huang <jserv@ccns.ncku.edu.tw>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] lib/string.c: Optimize memchr()
Date: Mon, 11 Jul 2022 18:00:27 +0300	[thread overview]
Message-ID: <48db247e-f6fd-cb4b-7cc5-455bf26bb153@gmail.com> (raw)
In-Reply-To: <CAD4RrFPihC+8LScC1RJ5GfOsLs4kze0QwALS1ykNH_m89Z1NGg@mail.gmail.com>

On 7/11/22 17:52, Yu-Jen Chang wrote:
> Andrey Semashev <andrey.semashev@gmail.com> 於 2022年7月11日 週一 凌晨4:01寫道:
>>
>> On 7/10/22 17:28, Yu-Jen Chang wrote:
>>> The original version of memchr() is implemented with the byte-wise
>>> comparing technique, which does not fully use 64-bits or 32-bits
>>> registers in CPU. We use word-wide comparing so that 8 characters
>>> can be compared at the same time on CPU. This code is base on
>>> David Laight's implementation.
>>>
>>> We create two files to measure the performance. The first file
>>> contains on average 10 characters ahead the target character.
>>> The second file contains at least 1000 characters ahead the
>>> target character. Our implementation of “memchr()” is slightly
>>> better in the first test and nearly 4x faster than the orginal
>>> implementation in the second test.
>>>
>>> Signed-off-by: Yu-Jen Chang <arthurchang09@gmail.com>
>>> Signed-off-by: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
>>> ---
>>>  lib/string.c | 28 +++++++++++++++++++++-------
>>>  1 file changed, 21 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/lib/string.c b/lib/string.c
>>> index 80469e6c3..8ca965431 100644
>>> --- a/lib/string.c
>>> +++ b/lib/string.c
>>> @@ -905,21 +905,35 @@ EXPORT_SYMBOL(strnstr);
>>>  #ifndef __HAVE_ARCH_MEMCHR
>>>  /**
>>>   * memchr - Find a character in an area of memory.
>>> - * @s: The memory area
>>> + * @p: The memory area
>>>   * @c: The byte to search for
>>> - * @n: The size of the area.
>>> + * @length: The size of the area.
>>>   *
>>>   * returns the address of the first occurrence of @c, or %NULL
>>>   * if @c is not found
>>>   */
>>> -void *memchr(const void *s, int c, size_t n)
>>> +void *memchr(const void *p, int c, unsigned long length)

I didn't comment on this initially, but is the change of length type
intentional? Why?

>>>  {
>>> -     const unsigned char *p = s;
>>> -     while (n-- != 0) {
>>> -             if ((unsigned char)c == *p++) {
>>> -                     return (void *)(p - 1);
>>> +     u64 mask, val;
>>> +     const void *end = p + length;
>>> +
>>> +     c &= 0xff;
>>> +     if (p <= end - 8) {
>>> +             mask = c;
>>> +             MEMCHR_MASK_GEN(mask);
>>> +
>>> +             for (; p <= end - 8; p += 8) {
>>> +                     val = *(u64 *)p ^ mask;
>>
>> What if p is not aligned to 8 (or 4 on 32-bit targets) bytes? Not all
>> targets support (efficient) unaligned loads, do they?
> 
> I think it works if p is not aligned to 8 or 4 bytes.
> 
> Let's say the string is 10 bytes. The for loop here will search the first
> 8 bytes. If the target character is in the last 2 bytes, the second for
> loop will find it. It also work like this on 32-bit machine.

I think you're missing the point. Loads at unaligned addresses may not
be allowed by hardware using conventional load instructions or may be
inefficient. Given that this memchr implementation is used as a fallback
when no hardware-specific version is available, you should be
conservative wrt. hardware capabilities and behavior. You should
probably have a pre-alignment loop.

>>
>>> +                     if ((val + 0xfefefefefefefeffu) &
>>> +                         (~val & 0x8080808080808080u))
>>> +                             break;
>>>               }
>>>       }
>>> +
>>> +     for (; p < end; p++)
>>> +             if (*(unsigned char *)p == c)
>>> +                     return (void *)p;
>>> +
>>>       return NULL;
>>>  }
>>>  EXPORT_SYMBOL(memchr);
>>


  reply	other threads:[~2022-07-11 15:01 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-10 14:28 [PATCH 0/2] Optimize memchr() Yu-Jen Chang
2022-07-10 14:28 ` [PATCH 1/2] lib/string.c: Add a macro for memchr_inv() Yu-Jen Chang
2022-07-10 14:28 ` [PATCH 2/2] lib/string.c: Optimize memchr() Yu-Jen Chang
2022-07-10 15:16   ` Joe Perches
2022-07-11 14:50     ` Yu-Jen Chang
2022-07-10 16:58   ` kernel test robot
2022-07-10 20:01   ` Andrey Semashev
2022-07-11 14:52     ` Yu-Jen Chang
2022-07-11 15:00       ` Andrey Semashev [this message]
2022-07-11 15:03         ` Andy Shevchenko
2022-07-12 14:58         ` Yu-Jen Chang
2022-07-12 15:08           ` Andy Shevchenko
2022-07-13  9:39           ` David Laight
2022-07-13  9:49             ` Andrey Semashev
2022-07-13 10:02               ` Andrey Semashev
2022-07-13 10:24                 ` David Laight
2022-07-22 16:08                   ` Yu-Jen Chang
     [not found]                     ` <CAHp75Vfy6wYqzT-T9aEjVEAQCZ_k=0qN8S8OwG3knbrC-oOkMQ@mail.gmail.com>
2022-07-29 15:42                       ` Yu-Jen Chang
2022-07-13  9:57             ` Andrey Semashev
2022-07-21  5:06   ` kernel test robot
2022-08-12 19:06 ` [PATCH 0/2] " Pavel Machek
2022-08-15 10:59   ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48db247e-f6fd-cb4b-7cc5-455bf26bb153@gmail.com \
    --to=andrey.semashev@gmail.com \
    --cc=akinobu.mita@gmail.com \
    --cc=andy@kernel.org \
    --cc=arthurchang09@gmail.com \
    --cc=jserv@ccns.ncku.edu.tw \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox