From: Ivan Orlov <ivan.orlov0322@gmail.com>
To: David Laight <David.Laight@ACULAB.COM>,
"paul.walmsley@sifive.com" <paul.walmsley@sifive.com>,
"palmer@dabbelt.com" <palmer@dabbelt.com>,
"aou@eecs.berkeley.edu" <aou@eecs.berkeley.edu>
Cc: "conor.dooley@microchip.com" <conor.dooley@microchip.com>,
"ajones@ventanamicro.com" <ajones@ventanamicro.com>,
"samuel@sholland.org" <samuel@sholland.org>,
"alexghiti@rivosinc.com" <alexghiti@rivosinc.com>,
"linux-riscv@lists.infradead.org"
<linux-riscv@lists.infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"skhan@linuxfoundation.org" <skhan@linuxfoundation.org>
Subject: Re: [PATCH] riscv: lib: Optimize 'strlen' function
Date: Mon, 18 Dec 2023 10:03:21 +0000 [thread overview]
Message-ID: <f12ef9ea-14fe-491a-890b-a645bb4827cb@gmail.com> (raw)
In-Reply-To: <bd9159806a2e4fd188a78515b58ec51e@AcuMS.aculab.com>
On 12/18/23 09:20, David Laight wrote:
> From: Ivan Orlov
>> Sent: 18 December 2023 01:42
>>
>> On 12/17/23 17:00, David Laight wrote:
>>> I'd also guess that pretty much all the calls in-kernel are short.
>>> You might try counting as: histogram[ilog2(strlen_result)]++
>>> and seeing what it shows for some workload.
>>> I bet you (a beer if I see you!) that you won't see many over 1k.
>>
>> Hi David,
>>
>> Here is the statistics for strlen result:
>>
>> [ 223.169575] Calls count for 2^0: 6150
>> [ 223.173293] Calls count for 2^1: 184852
>> [ 223.177142] Calls count for 2^2: 313896
>> [ 223.180990] Calls count for 2^3: 185844
>> [ 223.184881] Calls count for 2^4: 87868
>> [ 223.188660] Calls count for 2^5: 9916
>> [ 223.192368] Calls count for 2^6: 1865
>> [ 223.196062] Calls count for 2^7: 0
>> [ 223.199483] Calls count for 2^8: 0
>> [ 223.202952] Calls count for 2^9: 0
>> ...
>>
>> Looks like I've just lost a beer :)
>>
>> Considering this statistics, I'd say implementing the word-oriented
>> strlen is an overcomplication - we wouldn't get any performance gain and
>> it just doesn't worth it.
>
> And the 32bit version is about half the speed of the 64bit one.
>
> Of course, the fast way to do strlen is add a custom instruction!
>
>> I simplified your code a little bit, it looks like the alignment there
>> is unnecessary: QEMU test shows the same performance independently from
>> alignment. Tests on the board gave the same result (perhaps because the
>> CPU on the board has 2 DDR channels?)
>
> The alignment is there because it can overread the string end
> by one byte - and that mustn't cross a page boundary.
> So you either have to mark the second load as 'may fault return
> zero' or just not do it.
>
> If the data isn't in cache the cache load will dominate.
> The DDR channels only affect cache load times.
> Get a TLB miss and add a few thousand more clocks!
>
Ah, right, sounds reasonable...
Overall, I believe your solution is better and it would be more fair if
you send it as a patch :) Here is benchmark results for your version vs
the original (the old) one on the Starfive VisionFive2 RISC-V board:
Size: 1 (+-0), mean_old: 350, mean_new: 340
Size: 2 (+-0), mean_old: 337, mean_new: 347
Size: 4 (+-0), mean_old: 322, mean_new: 355
Size: 8 (+-0), mean_old: 345, mean_new: 335
Size: 16 (+-0), mean_old: 352, mean_new: 367
Size: 32 (+-0), mean_old: 425, mean_new: 362
Size: 64 (+-4), mean_old: 507, mean_new: 407
Size: 128 (+-10), mean_old: 730, mean_new: 442
Size: 256 (+-19), mean_old: 1142, mean_new: 592
Size: 512 (+-6), mean_old: 1945, mean_new: 812
Size: 1024 (+-21), mean_old: 3565, mean_new: 1312
Size: 2048 (+-108), mean_old: 6812, mean_new: 2280
Size: 4096 (+-362), mean_old: 13302, mean_new: 4242
Size: 8192 (+-385), mean_old: 26393, mean_new: 8160
Size: 16384 (+-1115), mean_old: 52689, mean_new: 15953
Size: 32768 (+-2515), mean_old: 107293, mean_new: 32391
Size: 65536 (+-6041), mean_old: 213789, mean_new: 74354
Size: 131072 (+-12352), mean_old: 426619, mean_new: 146972
Size: 262144 (+-2635), mean_old: 848115, mean_new: 291309
Size: 524288 (+-3336), mean_old: 1712847, mean_new: 589654
>>
>> --
>> Kind regards,
>> Ivan Orlov
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
--
Kind regards,
Ivan Orlov
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
next prev parent reply other threads:[~2023-12-18 10:03 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-13 15:45 [PATCH] riscv: lib: Optimize 'strlen' function Ivan Orlov
2023-12-17 17:00 ` David Laight
2023-12-17 22:52 ` Ivan Orlov
2023-12-18 1:41 ` Ivan Orlov
2023-12-18 9:20 ` David Laight
2023-12-18 10:03 ` Ivan Orlov [this message]
2023-12-18 10:12 ` David Laight
2023-12-17 18:10 ` David Laight
2023-12-17 23:23 ` Ivan Orlov
2023-12-18 9:12 ` David Laight
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f12ef9ea-14fe-491a-890b-a645bb4827cb@gmail.com \
--to=ivan.orlov0322@gmail.com \
--cc=David.Laight@ACULAB.COM \
--cc=ajones@ventanamicro.com \
--cc=alexghiti@rivosinc.com \
--cc=aou@eecs.berkeley.edu \
--cc=conor.dooley@microchip.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=palmer@dabbelt.com \
--cc=paul.walmsley@sifive.com \
--cc=samuel@sholland.org \
--cc=skhan@linuxfoundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).