* [PATCH v2] flattree: Optimize stringtable_insert() with memmem()
@ 2025-04-17 8:59 Yao Zi
2025-04-18 7:58 ` David Gibson
0 siblings, 1 reply; 2+ messages in thread
From: Yao Zi @ 2025-04-17 8:59 UTC (permalink / raw)
To: devicetree-compiler; +Cc: David Gibson, Yao Zi
According to perf result, stringtable_insert() is one of the five hotest
functions, which is obvious since it depends on a brute-force,
quadratic-complexity method to deduplicate the string block and is
called at creation of every property.
This patch replaces the brute-force deduplication with libc-provided
memmem(), which is guaranteed to be in linear complexity on both glibc
and musl libc.
On an i7-7200U Linux system with musl-libc, building the "dtbs" target
in Linux 6.11's arm64 port takes 19.1s less in average, achieving 25.1%
speed up.
Signed-off-by: Yao Zi <ziyao@disroot.org>
---
Changed from v1
- Drop the hashtable for simplicity and update the commit message
- Add a comment to explain restrictions on deduplication
- Link to v1: https://lore.kernel.org/devicetree-compiler/20250412100351.9009-2-ziyao@disroot.org/
flattree.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/flattree.c b/flattree.c
index 30e6de2044b2..d7328690e008 100644
--- a/flattree.c
+++ b/flattree.c
@@ -220,17 +220,24 @@ static struct emitter asm_emitter = {
static int stringtable_insert(struct data *d, const char *str)
{
- unsigned int i;
+ const char *dup;
+ size_t size;
- /* FIXME: do this more efficiently? */
+ /*
+ * Reuse (subsequence of) existing string if possible. E.g.
+ * "enable-method" could be reused when inserting "method".
+ *
+ * The terminating '\0' must be taken into account, or "foo\0" may be
+ * wrongly recognized as subsequence of "foobar\0".
+ */
+ size = strlen(str) + 1;
+ dup = memmem(d->val, d->len, str, size);
+ if (dup)
+ return dup - d->val;
- for (i = 0; i < d->len; i++) {
- if (streq(str, d->val + i))
- return i;
- }
+ *d = data_append_data(*d, str, size);
- *d = data_append_data(*d, str, strlen(str)+1);
- return i;
+ return d->len - size;
}
static void flatten_tree(struct node *tree, struct emitter *emit,
--
2.49.0
^ permalink raw reply related [flat|nested] 2+ messages in thread* Re: [PATCH v2] flattree: Optimize stringtable_insert() with memmem()
2025-04-17 8:59 [PATCH v2] flattree: Optimize stringtable_insert() with memmem() Yao Zi
@ 2025-04-18 7:58 ` David Gibson
0 siblings, 0 replies; 2+ messages in thread
From: David Gibson @ 2025-04-18 7:58 UTC (permalink / raw)
To: Yao Zi; +Cc: devicetree-compiler
[-- Attachment #1: Type: text/plain, Size: 2641 bytes --]
On Thu, Apr 17, 2025 at 08:59:00AM +0000, Yao Zi wrote:
> According to perf result, stringtable_insert() is one of the five hotest
> functions, which is obvious since it depends on a brute-force,
> quadratic-complexity method to deduplicate the string block and is
> called at creation of every property.
>
> This patch replaces the brute-force deduplication with libc-provided
> memmem(), which is guaranteed to be in linear complexity on both glibc
> and musl libc.
>
> On an i7-7200U Linux system with musl-libc, building the "dtbs" target
> in Linux 6.11's arm64 port takes 19.1s less in average, achieving 25.1%
> speed up.
>
> Signed-off-by: Yao Zi <ziyao@disroot.org>
This looks great, except for one detail. I went to apply this, but it
failed the github tests on Windows. Looks like the Windows libc
doesn't include memmem(), so we'll need to have a fallback
implementation. Trickiest part will be working out when to use the
alternative.
> ---
>
> Changed from v1
> - Drop the hashtable for simplicity and update the commit message
> - Add a comment to explain restrictions on deduplication
> - Link to v1: https://lore.kernel.org/devicetree-compiler/20250412100351.9009-2-ziyao@disroot.org/
>
> flattree.c | 23 +++++++++++++++--------
> 1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/flattree.c b/flattree.c
> index 30e6de2044b2..d7328690e008 100644
> --- a/flattree.c
> +++ b/flattree.c
> @@ -220,17 +220,24 @@ static struct emitter asm_emitter = {
>
> static int stringtable_insert(struct data *d, const char *str)
> {
> - unsigned int i;
> + const char *dup;
> + size_t size;
>
> - /* FIXME: do this more efficiently? */
> + /*
> + * Reuse (subsequence of) existing string if possible. E.g.
> + * "enable-method" could be reused when inserting "method".
> + *
> + * The terminating '\0' must be taken into account, or "foo\0" may be
> + * wrongly recognized as subsequence of "foobar\0".
> + */
> + size = strlen(str) + 1;
> + dup = memmem(d->val, d->len, str, size);
> + if (dup)
> + return dup - d->val;
>
> - for (i = 0; i < d->len; i++) {
> - if (streq(str, d->val + i))
> - return i;
> - }
> + *d = data_append_data(*d, str, size);
>
> - *d = data_append_data(*d, str, strlen(str)+1);
> - return i;
> + return d->len - size;
> }
>
> static void flatten_tree(struct node *tree, struct emitter *emit,
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2025-04-18 7:58 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-17 8:59 [PATCH v2] flattree: Optimize stringtable_insert() with memmem() Yao Zi
2025-04-18 7:58 ` David Gibson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox