[PATCH v3 next 02/17] tools/nolibc: Optimise and common up the number to ascii functions

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: david.laight.linux@gmail.com
To: "Willy Tarreau" <w@1wt.eu>,
	"Thomas Weißschuh" <linux@weissschuh.net>,
	linux-kernel@vger.kernel.org, "Cheng Li" <lechain@gmail.com>
Cc: David Laight <david.laight.linux@gmail.com>
Subject: [PATCH v3 next 02/17] tools/nolibc: Optimise and common up the number to ascii functions
Date: Mon, 23 Feb 2026 10:17:20 +0000	[thread overview]
Message-ID: <20260223101735.2922-3-david.laight.linux@gmail.com> (raw)
In-Reply-To: <20260223101735.2922-1-david.laight.linux@gmail.com>

From: David Laight <david.laight.linux@gmail.com>

Implement u[64]to[ah]_r() using a common function that uses multiply
by reciprocal to generate the least significant digit first and then
reverses the string.

On 32bit this is five multiplies (with 64bit product) for each output
digit. I think the old utoa_r() always did 36 multiplies and a lot
of subtracts - so this is likely faster even for 32bit values.
Definitely better for 64bit values (especially small ones).

Clearly shifts are faster for base 16, but reversing the output buffer
makes a big difference.

Sharing the code reduces the footprint (unless gcc decides to constant
fold the functions).
Definitely helps vfprintf() where the constants get loaded and a single
call is done.
Also makes it cheap to add octal support to vfprintf for completeness.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
---

This was previously a separate patch.
Moved here since a major user in the printf code and the printf tests
test this code.
v3: Revert to the code from v1. But with a simple test not a loop.

 tools/include/nolibc/stdlib.h | 169 +++++++++++++++++-----------------
 1 file changed, 82 insertions(+), 87 deletions(-)

diff --git a/tools/include/nolibc/stdlib.h b/tools/include/nolibc/stdlib.h
index f184e108ed0a..186635656c15 100644
--- a/tools/include/nolibc/stdlib.h
+++ b/tools/include/nolibc/stdlib.h
@@ -188,36 +188,91 @@ void *realloc(void *old_ptr, size_t new_size)
 	return ret;
 }
 
-/* Converts the unsigned long integer <in> to its hex representation into
+/* Converts the unsigned 64bit integer <in> to base <base> ascii into
  * buffer <buffer>, which must be long enough to store the number and the
- * trailing zero (17 bytes for "ffffffffffffffff" or 9 for "ffffffff"). The
- * buffer is filled from the first byte, and the number of characters emitted
- * (not counting the trailing zero) is returned. The function is constructed
- * in a way to optimize the code size and avoid any divide that could add a
- * dependency on large external functions.
+ * trailing zero. The buffer is filled from the first byte, and the number
+ * of characters emitted (not counting the trailing zero) is returned.
+ * The function uses 'multiply by reciprocal' for the divisions and
+ * requires the caller pass the correct reciprocal.
+ *
+ * Note that unlike __div64_const32() in asm-generic/div64.h there isn't
+ * an extra shift done (by ___p), the reciprocal has to be lower resulting
+ * in a slightly low quotient.
+ * Keep things simple by correcting for the error.
+ * This also saves calculating the 'low * low' product (e2 below) which is
+ * very unlikely to be significant.
+ *
+ * Some maths:
+ *	recip = p2 / base - e1;		// With e1 < base.
+ *	q = (recip * in - e2) / p2;	// With e2 < p2.
+ *        = base / in - (e1 * in + e2) / p2;
+ *        > base / in - (e1 * p2 + p2) / p2;
+ *        = base / in - ((e1 + 1) * p2) / p2;
+ *        > base / in - base;
+ * So the maximum error is less than 'base'.
+ * Hence the largest possible digit is '2 * base - 1'.
+ * For base 10 e1 is 6 and you can get digits of 15 (eg from 2**64-1).
+ * Error e1 is largest for a base that is a factor of 2**64+1, the smallest is 274177
+ * and converting 2**42-1 in base 274177 does generate a digit of 274177+274175.
+ * This all means only a single correction is needed rather than a loop.
+ *
+ * __int128 isn't used for mips because gcc prior to 10.0 will call
+ * __multi3 for MIPS64r6.
  */
-static __attribute__((unused))
-int utoh_r(unsigned long in, char *buffer)
+#define _NOLIBC_U64TOA_RECIP(base) ((base) & 1 ? ~0ull / (base) : (1ull << 63) / ((base) / 2))
+static __attribute__((unused, noinline))
+int _nolibc_u64toa_base(uint64_t in, char *buffer, unsigned int base, uint64_t recip)
 {
-	signed char pos = (~0UL > 0xfffffffful) ? 60 : 28;
-	int digits = 0;
-	int dig;
+	unsigned int digits = 0;
+	unsigned int dig;
+	uint64_t q;
+	char *p;
 
+	/* Generate least significant digit first */
 	do {
-		dig = in >> pos;
-		in -= (uint64_t)dig << pos;
-		pos -= 4;
-		if (dig || digits || pos < 0) {
-			if (dig > 9)
-				dig += 'a' - '0' - 10;
-			buffer[digits++] = '0' + dig;
+#if defined(__SIZEOF_INT128__) && !defined(__mips__)
+		q = ((unsigned __int128)in * recip) >> 64;
+#else
+		uint64_t p = (uint32_t)in * (recip >> 32);
+		q = (in >> 32) * (recip >> 32) + (p >> 32);
+		p = (uint32_t)p + (in >> 32) * (uint32_t)recip;
+		q += p >> 32;
+#endif
+		dig = in - q * base;
+		/* Correct for any rounding errors */
+		if (dig >= base) {
+			dig -= base;
+			q++;
 		}
-	} while (pos >= 0);
+		if (dig > 9)
+			dig += 'a' - '0' - 10;
+		buffer[digits++] = '0' + dig;
+	} while ((in = q));
 
 	buffer[digits] = 0;
+
+	/* Order reverse to result */
+	for (p = buffer + digits - 1; p > buffer; buffer++, p--) {
+		dig = *buffer;
+		*buffer = *p;
+		*p = dig;
+	}
+
 	return digits;
 }
 
+/* Converts the unsigned long integer <in> to its hex representation into
+ * buffer <buffer>, which must be long enough to store the number and the
+ * trailing zero (17 bytes for "ffffffffffffffff" or 9 for "ffffffff"). The
+ * buffer is filled from the first byte, and the number of characters emitted
+ * (not counting the trailing zero) is returned.
+ */
+static __inline__ __attribute__((unused))
+int utoh_r(unsigned long in, char *buffer)
+{
+	return _nolibc_u64toa_base(in, buffer, 16, _NOLIBC_U64TOA_RECIP(16));
+}
+
 /* converts unsigned long <in> to an hex string using the static itoa_buffer
  * and returns the pointer to that string.
  */
@@ -233,30 +288,11 @@ char *utoh(unsigned long in)
  * trailing zero (21 bytes for 18446744073709551615 in 64-bit, 11 for
  * 4294967295 in 32-bit). The buffer is filled from the first byte, and the
  * number of characters emitted (not counting the trailing zero) is returned.
- * The function is constructed in a way to optimize the code size and avoid
- * any divide that could add a dependency on large external functions.
  */
-static __attribute__((unused))
+static __inline__ __attribute__((unused))
 int utoa_r(unsigned long in, char *buffer)
 {
-	unsigned long lim;
-	int digits = 0;
-	int pos = (~0UL > 0xfffffffful) ? 19 : 9;
-	int dig;
-
-	do {
-		for (dig = 0, lim = 1; dig < pos; dig++)
-			lim *= 10;
-
-		if (digits || in >= lim || !pos) {
-			for (dig = 0; in >= lim; dig++)
-				in -= lim;
-			buffer[digits++] = '0' + dig;
-		}
-	} while (pos--);
-
-	buffer[digits] = 0;
-	return digits;
+	return _nolibc_u64toa_base(in, buffer, 10, _NOLIBC_U64TOA_RECIP(10));
 }
 
 /* Converts the signed long integer <in> to its string representation into
@@ -324,34 +360,12 @@ char *utoa(unsigned long in)
  * buffer <buffer>, which must be long enough to store the number and the
  * trailing zero (17 bytes for "ffffffffffffffff"). The buffer is filled from
  * the first byte, and the number of characters emitted (not counting the
- * trailing zero) is returned. The function is constructed in a way to optimize
- * the code size and avoid any divide that could add a dependency on large
- * external functions.
+ * trailing zero) is returned.
  */
-static __attribute__((unused))
+static __inline__ __attribute__((unused))
 int u64toh_r(uint64_t in, char *buffer)
 {
-	signed char pos = 60;
-	int digits = 0;
-	int dig;
-
-	do {
-		if (sizeof(long) >= 8) {
-			dig = (in >> pos) & 0xF;
-		} else {
-			/* 32-bit platforms: avoid a 64-bit shift */
-			uint32_t d = (pos >= 32) ? (in >> 32) : in;
-			dig = (d >> (pos & 31)) & 0xF;
-		}
-		if (dig > 9)
-			dig += 'a' - '0' - 10;
-		pos -= 4;
-		if (dig || digits || pos < 0)
-			buffer[digits++] = '0' + dig;
-	} while (pos >= 0);
-
-	buffer[digits] = 0;
-	return digits;
+	return _nolibc_u64toa_base(in, buffer, 16, _NOLIBC_U64TOA_RECIP(16));
 }
 
 /* converts uint64_t <in> to an hex string using the static itoa_buffer and
@@ -368,31 +382,12 @@ char *u64toh(uint64_t in)
  * buffer <buffer>, which must be long enough to store the number and the
  * trailing zero (21 bytes for 18446744073709551615). The buffer is filled from
  * the first byte, and the number of characters emitted (not counting the
- * trailing zero) is returned. The function is constructed in a way to optimize
- * the code size and avoid any divide that could add a dependency on large
- * external functions.
+ * trailing zero) is returned.
  */
-static __attribute__((unused))
+static __inline__ __attribute__((unused))
 int u64toa_r(uint64_t in, char *buffer)
 {
-	unsigned long long lim;
-	int digits = 0;
-	int pos = 19; /* start with the highest possible digit */
-	int dig;
-
-	do {
-		for (dig = 0, lim = 1; dig < pos; dig++)
-			lim *= 10;
-
-		if (digits || in >= lim || !pos) {
-			for (dig = 0; in >= lim; dig++)
-				in -= lim;
-			buffer[digits++] = '0' + dig;
-		}
-	} while (pos--);
-
-	buffer[digits] = 0;
-	return digits;
+	return _nolibc_u64toa_base(in, buffer, 10, _NOLIBC_U64TOA_RECIP(10));
 }
 
 /* Converts the signed 64-bit integer <in> to its string representation into
-- 
2.39.5

next prev parent reply	other threads:[~2026-02-23 10:18 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-23 10:17 [PATCH v3 next 00/17] Enhance printf() david.laight.linux
2026-02-23 10:17 ` [PATCH v3 next 01/17] tools/nolibc: Add _NOLIBC_OPTIMIZER_HIDE_VAR() to compiler.h david.laight.linux
2026-02-25 21:25   ` Thomas Weißschuh
2026-02-25 22:17     ` David Laight
2026-02-25 22:24       ` Thomas Weißschuh
2026-02-23 10:17 ` david.laight.linux [this message]
2026-02-25 21:40   ` [PATCH v3 next 02/17] tools/nolibc: Optimise and common up the number to ascii functions Thomas Weißschuh
2026-02-25 22:09     ` David Laight
2026-02-23 10:17 ` [PATCH v3 next 03/17] selftests/nolibc: Fix build with host headers and libc david.laight.linux
2026-02-25 21:24   ` Thomas Weißschuh
2026-02-23 10:17 ` [PATCH v3 next 04/17] selftests/nolibc: Improve reporting of vfprintf() errors david.laight.linux
2026-02-25 21:56   ` Thomas Weißschuh
2026-02-26 10:12     ` David Laight
2026-02-26 21:39       ` Thomas Weißschuh
2026-02-23 10:17 ` [PATCH v3 next 05/17] tools/nolibc: Implement strerror() in terms of strerror_r() david.laight.linux
2026-02-25 22:09   ` Thomas Weißschuh
2026-02-25 22:58     ` David Laight
2026-02-23 10:17 ` [PATCH v3 next 06/17] tools/nolibc/printf: Change variables 'c' to 'ch' and 'tmpbuf[]' to 'outbuf[]' david.laight.linux
2026-02-25 22:23   ` Thomas Weißschuh
2026-02-23 10:17 ` [PATCH v3 next 07/17] tools/nolibc/printf: Move snprintf length check to callback david.laight.linux
2026-02-25 22:37   ` Thomas Weißschuh
2026-02-25 23:12     ` David Laight
2026-02-26 21:29       ` Thomas Weißschuh
2026-02-26 22:11         ` David Laight
2026-02-23 10:17 ` [PATCH v3 next 08/17] tools/nolibc/printf: Output pad characters in 16 byte chunks david.laight.linux
2026-02-23 10:17 ` [PATCH v3 next 09/17] tools/nolibc/printf: Simplify __nolibc_printf() david.laight.linux
2026-02-23 10:17 ` [PATCH v3 next 10/17] tools/nolibc/printf: Use goto and reduce indentation david.laight.linux
2026-02-23 10:17 ` [PATCH v3 next 11/17] tools/nolibc/printf: Use bit-masks to hold requested flag, length and conversion chars david.laight.linux
2026-02-23 10:17 ` [PATCH v3 next 12/17] tools/nolibc/printf: Handle "%s" with the numeric formats david.laight.linux
2026-02-23 10:17 ` [PATCH v3 next 13/17] tools/nolibc/printf: Add support for conversion flags david.laight.linux
2026-02-23 10:17 ` [PATCH v3 next 14/17] tools/nolibc/printf: Add support for left aligning fields david.laight.linux
2026-02-23 10:17 ` [PATCH v3 next 15/17] tools/nolibc/printf: Add support for zero padding and field precision david.laight.linux
2026-02-23 10:17 ` [PATCH v3 next 16/17] tools/nolibc/printf: Add support for octal output david.laight.linux
2026-02-23 10:17 ` [PATCH v3 next 17/17] selftests/nolibc: Use printf variable field widths and precisions david.laight.linux

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:f184e108ed0 dfblob:186635656c1 )
 OR (
bs:"[PATCH v3 next 02/17] tools/nolibc: Optimise and common up the number to ascii functions" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260223101735.2922-3-david.laight.linux@gmail.com \
    --to=david.laight.linux@gmail.com \
    --cc=lechain@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@weissschuh.net \
    --cc=w@1wt.eu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox