All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] flattree: Optimize stringtable_insert() with memmem()
@ 2025-04-17  8:59 Yao Zi
  2025-04-18  7:58 ` David Gibson
  0 siblings, 1 reply; 2+ messages in thread
From: Yao Zi @ 2025-04-17  8:59 UTC (permalink / raw)
  To: devicetree-compiler; +Cc: David Gibson, Yao Zi

According to perf result, stringtable_insert() is one of the five hotest
functions, which is obvious since it depends on a brute-force,
quadratic-complexity method to deduplicate the string block and is
called at creation of every property.

This patch replaces the brute-force deduplication with libc-provided
memmem(), which is guaranteed to be in linear complexity on both glibc
and musl libc.

On an i7-7200U Linux system with musl-libc, building the "dtbs" target
in Linux 6.11's arm64 port takes 19.1s less in average, achieving 25.1%
speed up.

Signed-off-by: Yao Zi <ziyao@disroot.org>
---

Changed from v1
- Drop the hashtable for simplicity and update the commit message
- Add a comment to explain restrictions on deduplication
- Link to v1: https://lore.kernel.org/devicetree-compiler/20250412100351.9009-2-ziyao@disroot.org/

 flattree.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/flattree.c b/flattree.c
index 30e6de2044b2..d7328690e008 100644
--- a/flattree.c
+++ b/flattree.c
@@ -220,17 +220,24 @@ static struct emitter asm_emitter = {
 
 static int stringtable_insert(struct data *d, const char *str)
 {
-	unsigned int i;
+	const char *dup;
+	size_t size;
 
-	/* FIXME: do this more efficiently? */
+	/*
+	 * Reuse (subsequence of) existing string if possible. E.g.
+	 * "enable-method" could be reused when inserting "method".
+	 *
+	 * The terminating '\0' must be taken into account, or "foo\0" may be
+	 * wrongly recognized as subsequence of "foobar\0".
+	 */
+	size = strlen(str) + 1;
+	dup = memmem(d->val, d->len, str, size);
+	if (dup)
+		return dup - d->val;
 
-	for (i = 0; i < d->len; i++) {
-		if (streq(str, d->val + i))
-			return i;
-	}
+	*d = data_append_data(*d, str, size);
 
-	*d = data_append_data(*d, str, strlen(str)+1);
-	return i;
+	return d->len - size;
 }
 
 static void flatten_tree(struct node *tree, struct emitter *emit,
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH v2] flattree: Optimize stringtable_insert() with memmem()
  2025-04-17  8:59 [PATCH v2] flattree: Optimize stringtable_insert() with memmem() Yao Zi
@ 2025-04-18  7:58 ` David Gibson
  0 siblings, 0 replies; 2+ messages in thread
From: David Gibson @ 2025-04-18  7:58 UTC (permalink / raw)
  To: Yao Zi; +Cc: devicetree-compiler

[-- Attachment #1: Type: text/plain, Size: 2641 bytes --]

On Thu, Apr 17, 2025 at 08:59:00AM +0000, Yao Zi wrote:
> According to perf result, stringtable_insert() is one of the five hotest
> functions, which is obvious since it depends on a brute-force,
> quadratic-complexity method to deduplicate the string block and is
> called at creation of every property.
> 
> This patch replaces the brute-force deduplication with libc-provided
> memmem(), which is guaranteed to be in linear complexity on both glibc
> and musl libc.
> 
> On an i7-7200U Linux system with musl-libc, building the "dtbs" target
> in Linux 6.11's arm64 port takes 19.1s less in average, achieving 25.1%
> speed up.
> 
> Signed-off-by: Yao Zi <ziyao@disroot.org>

This looks great, except for one detail.  I went to apply this, but it
failed the github tests on Windows.  Looks like the Windows libc
doesn't include memmem(), so we'll need to have a fallback
implementation.  Trickiest part will be working out when to use the
alternative.

> ---
> 
> Changed from v1
> - Drop the hashtable for simplicity and update the commit message
> - Add a comment to explain restrictions on deduplication
> - Link to v1: https://lore.kernel.org/devicetree-compiler/20250412100351.9009-2-ziyao@disroot.org/
> 
>  flattree.c | 23 +++++++++++++++--------
>  1 file changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/flattree.c b/flattree.c
> index 30e6de2044b2..d7328690e008 100644
> --- a/flattree.c
> +++ b/flattree.c
> @@ -220,17 +220,24 @@ static struct emitter asm_emitter = {
>  
>  static int stringtable_insert(struct data *d, const char *str)
>  {
> -	unsigned int i;
> +	const char *dup;
> +	size_t size;
>  
> -	/* FIXME: do this more efficiently? */
> +	/*
> +	 * Reuse (subsequence of) existing string if possible. E.g.
> +	 * "enable-method" could be reused when inserting "method".
> +	 *
> +	 * The terminating '\0' must be taken into account, or "foo\0" may be
> +	 * wrongly recognized as subsequence of "foobar\0".
> +	 */
> +	size = strlen(str) + 1;
> +	dup = memmem(d->val, d->len, str, size);
> +	if (dup)
> +		return dup - d->val;
>  
> -	for (i = 0; i < d->len; i++) {
> -		if (streq(str, d->val + i))
> -			return i;
> -	}
> +	*d = data_append_data(*d, str, size);
>  
> -	*d = data_append_data(*d, str, strlen(str)+1);
> -	return i;
> +	return d->len - size;
>  }
>  
>  static void flatten_tree(struct node *tree, struct emitter *emit,

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2025-04-18  7:58 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-17  8:59 [PATCH v2] flattree: Optimize stringtable_insert() with memmem() Yao Zi
2025-04-18  7:58 ` David Gibson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.