From: Ivan Orlov <ivan.orlov0322@gmail.com>
To: David Laight <David.Laight@ACULAB.COM>,
"paul.walmsley@sifive.com" <paul.walmsley@sifive.com>,
"palmer@dabbelt.com" <palmer@dabbelt.com>,
"aou@eecs.berkeley.edu" <aou@eecs.berkeley.edu>
Cc: "conor.dooley@microchip.com" <conor.dooley@microchip.com>,
"ajones@ventanamicro.com" <ajones@ventanamicro.com>,
"samuel@sholland.org" <samuel@sholland.org>,
"alexghiti@rivosinc.com" <alexghiti@rivosinc.com>,
"linux-riscv@lists.infradead.org"
<linux-riscv@lists.infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"skhan@linuxfoundation.org" <skhan@linuxfoundation.org>
Subject: Re: [PATCH] riscv: lib: Optimize 'strlen' function
Date: Sun, 17 Dec 2023 23:23:19 +0000
Message-ID: <45351e30-d197-4b9c-864f-8ff5f9b6ab61@gmail.com>
In-Reply-To: <a210197c479e48778672aa13287eef88@AcuMS.aculab.com>
On 12/17/23 18:10, David Laight wrote:
> From: Ivan Orlov
>> Sent: 13 December 2023 15:46
>
> Looking at the old code...
>
>> 1:
>> - lbu t0, 0(t1)
>> - beqz t0, 2f
>> - addi t1, t1, 1
>> - j 1b
>
> I suspect there is (at least) a two clock stall between
> the 'ldu' and 'beqz'.
Hmm, does the stall come from the memory access? Why do two
back-to-back memory accesses (as in the example you provided) do the
trick? Is it because the two "ldb"s can be issued in parallel?
> Allowing for one clock for the 'predicted taken' branch
> that is 7 clocks/byte.
>
> Try this one - especially on 32bit:
>
> mov t0, a0
> and t1, t0, 1
> sub t0, t0, t1
> bnez t1, 2f
> 1:
> ldb t1, 0(t0)
> 2: ldb t2, 1(t0)
> add t0, t0, 2
> beqz t1, 3f
> bnez t2, 1b
> add t0, t0, 1
> 3: sub t0, t0, 2
> sub a0, t0, a0
> ret
>
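For reference, the quoted loop can be modelled in C roughly as follows. This is only a sketch: the name strlen_pairs is made up here, and it assumes the byte right after the terminator is readable whenever it sits in the same 2-byte-aligned pair (the assembly makes the same assumption, but plain C does not guarantee it).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: two independent byte loads per iteration.
 * Assumption: reading one byte past the NUL within the same
 * 2-byte-aligned pair is safe for the caller's buffer. */
size_t strlen_pairs(const char *s)
{
	size_t i = 0;

	/* Odd start address: check one byte so the main loop only
	 * ever touches 2-byte-aligned pairs. */
	if ((uintptr_t)s & 1) {
		if (s[0] == '\0')
			return 0;
		i = 1;
	}

	for (;; i += 2) {
		/* Both loads are issued before either result is tested,
		 * so the second load does not wait on the first branch. */
		char b0 = s[i];
		char b1 = s[i + 1];

		if (b0 == '\0')
			return i;
		if (b1 == '\0')
			return i + 1;
	}
}
```

The odd-address case is handled with an explicit check instead of stepping the pointer back by one as the assembly does; the returned length is the same.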
I tested it on my 64bit board, and this variant is definitely faster
than the original implementation! Here are the results of the benchmark,
which compares this variant with the word-oriented one:
Test count per size: 1000
Size: 1 (+-0), mean_old: 711, mean_new: 708
Size: 2 (+-0), mean_old: 649, mean_new: 713
Size: 4 (+-0), mean_old: 499, mean_new: 506
Size: 8 (+-0), mean_old: 344, mean_new: 350
Size: 16 (+-0), mean_old: 342, mean_new: 362
Size: 32 (+-0), mean_old: 369, mean_new: 387
Size: 64 (+-0), mean_old: 393, mean_new: 401
Size: 128 (+-4), mean_old: 457, mean_new: 424
Size: 256 (+-13), mean_old: 578, mean_new: 476
Size: 512 (+-31), mean_old: 842, mean_new: 573
Size: 1024 (+-19), mean_old: 1305, mean_new: 777
Size: 2048 (+-97), mean_old: 2280, mean_new: 1193
Size: 4096 (+-149), mean_old: 4226, mean_new: 2002
Size: 8192 (+-439), mean_old: 8131, mean_new: 3634
Size: 16384 (+-615), mean_old: 16353, mean_new: 6905
Size: 32768 (+-2566), mean_old: 37075, mean_new: 14232
Size: 65536 (+-6047), mean_old: 73797, mean_new: 37090
Size: 131072 (+-10071), mean_old: 146802, mean_new: 73402
Size: 262144 (+-18150), mean_old: 293003, mean_new: 146118
Size: 524288 (+-21247), mean_old: 585057, mean_new: 291324
Benchmark code:
https://github.com/ivanorlov2206/strlen-benchmark/blob/main/strlentest.c
It looks like the variant you suggested could be faster for shorter
strings even on the 64bit platform.
Maybe we could enhance it even more by loading 4 consecutive bytes into
different registers, so the memory loads would still be parallelized?
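To make the idea concrete, here is a rough C sketch of the 4-byte version. It is untested for performance, the name strlen_quads is made up, and it assumes that reading up to 3 bytes past the terminator is safe as long as they are in the same 4-byte-aligned group.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: four independent byte loads per iteration.
 * Assumption: bytes after the NUL inside the same 4-byte-aligned
 * group are readable. */
size_t strlen_quads(const char *s)
{
	size_t i = 0;

	/* Go byte by byte until s + i is 4-byte aligned. */
	while (((uintptr_t)(s + i) & 3) != 0) {
		if (s[i] == '\0')
			return i;
		i++;
	}

	for (;; i += 4) {
		/* Four loads into separate variables, tested afterwards. */
		char b0 = s[i];
		char b1 = s[i + 1];
		char b2 = s[i + 2];
		char b3 = s[i + 3];

		if (b0 == '\0')
			return i;
		if (b1 == '\0')
			return i + 1;
		if (b2 == '\0')
			return i + 2;
		if (b3 == '\0')
			return i + 3;
	}
}
```

Whether the extra loads pay for the extra branches would have to be measured on real hardware; this only shows the shape of the loop.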
--
Kind regards,
Ivan Orlov