From: david.laight.linux@gmail.com
To: "Willy Tarreau" <w@1wt.eu>,
	"Thomas Weißschuh" <linux@weissschuh.net>,
	linux-kernel@vger.kernel.org, "Cheng Li" <lechain@gmail.com>
Cc: David Laight <david.laight.linux@gmail.com>
Subject: [PATCH v5 next 08/17] tools/nolibc/printf: Use bit-masks to hold requested flag, length and conversion chars
Date: Sun,  8 Mar 2026 11:37:33 +0000
Message-ID: <20260308113742.12649-9-david.laight.linux@gmail.com>
In-Reply-To: <20260308113742.12649-1-david.laight.linux@gmail.com>

From: David Laight <david.laight.linux@gmail.com>

Use flag bits (1u << (ch & 31)) for the flags, length modifiers, and
conversion specifiers.
This makes it easy to test for multiple values at once.
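
As a rough illustration (not part of the patch), the sketch below copies
the macros this patch adds to stdio.h into a standalone file, renamed
without the _NOLIBC_ prefix. pf_collect_flags() and pf_selfcheck() are
hypothetical helpers invented for the example; they only mirror the flag
scanning loop in __nolibc_printf().

```c
#include <assert.h>

/* Local copies of the patch's macros, renamed without the _NOLIBC_ prefix. */
#define PF_FLAG(ch) (1u << ((ch) & 0x1f))
#define PF_FLAG_NZ(ch) ((ch) ? PF_FLAG(ch) : 0)
#define PF_FLAG8(c1, c2, c3, c4, c5, c6, c7, c8, ...) \
	(PF_FLAG_NZ(c1) | PF_FLAG_NZ(c2) | PF_FLAG_NZ(c3) | PF_FLAG_NZ(c4) | \
	 PF_FLAG_NZ(c5) | PF_FLAG_NZ(c6) | PF_FLAG_NZ(c7) | PF_FLAG_NZ(c8))
#define PF_FLAGS_CONTAIN(flags, ...) \
	((flags) & PF_FLAG8(__VA_ARGS__, 0, 0, 0, 0, 0, 0, 0))
#define PF_CHAR_IS_ONE_OF(ch, c1, ...) \
	((unsigned int)(ch) - ((c1) & 0xe0) > 0x1f ? 0 : \
		PF_FLAGS_CONTAIN(PF_FLAG(ch), c1, __VA_ARGS__))

/* Hypothetical helper: OR together the leading flag characters of fmt. */
static unsigned int pf_collect_flags(const char *fmt)
{
	unsigned int flags = 0, ch_flag;

	while ((ch_flag = PF_CHAR_IS_ONE_OF(*fmt, ' ', '#', '+', '-', '0')) != 0) {
		flags |= ch_flag;
		fmt++;
	}
	return flags;
}

/* Hypothetical self-check of the properties the patch relies on. */
static void pf_selfcheck(void)
{
	/* "-0 " sets three flag bits; the scan stops at '5'. */
	assert(pf_collect_flags("-0 5d") ==
	       (PF_FLAG('-') | PF_FLAG('0') | PF_FLAG(' ')));
	/* 'p' and '0' map to the same bit (both have low 5 bits 0x10). */
	assert(PF_FLAG('p') == PF_FLAG('0'));
	/* 'l' and 'j' do not collide with any of the flag characters. */
	assert(PF_FLAGS_CONTAIN(PF_FLAG('l') | PF_FLAG('j'),
				' ', '#', '+', '-', '0') == 0);
}
```

Calling pf_selfcheck() from a test program should pass silently. The
shared bit for 'p' and '0' is why the patch masks 'p' out of flags
before the combined test for 'long' arguments.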

Detect the conversion flags " #+-0" although they are currently all ignored.

Unconditionally generate the signed values (for %d) to remove a second
set of checks for the size.

Separate out the formatting of single characters from numbers.
Output the sign for negative values then negate and treat as unsigned.
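
The negate step can be sketched in isolation as below. This is only an
illustration: pf_magnitude() and pf_magnitude_selfcheck() are
hypothetical names, and the rationale in the comments (that negating via
signed_v + 1 avoids signed overflow for the most negative value) is an
assumption about why the patch uses two steps, not something the patch
states.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: magnitude of a signed value as an unsigned one. */
static unsigned long long pf_magnitude(long long signed_v)
{
	unsigned long long v = (unsigned long long)signed_v;

	if (signed_v < 0) {
		/*
		 * -(x + 1) is representable for every negative x, including
		 * LLONG_MIN, whereas -x would overflow for LLONG_MIN.
		 */
		v = (unsigned long long)-(signed_v + 1);
		v++;
	}
	return v;
}

/* Hypothetical self-check covering the boundary cases. */
static void pf_magnitude_selfcheck(void)
{
	assert(pf_magnitude(42) == 42ull);
	assert(pf_magnitude(-1) == 1ull);
	assert(pf_magnitude(LLONG_MIN) == (unsigned long long)LLONG_MAX + 1);
}
```

In the patch the '-' is written to the output buffer first, and the
resulting unsigned value is then converted by u64toa_r() like any other
unsigned number.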

Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David Laight <david.laight.linux@gmail.com>
---

Unchanged for v5.

Changes for v4:
- Move the support for length modifiers t, j, q, L and formats
  i and X to the next patch.
- Convert ll to j (not q) since q isn't added until the next patch.

Changes for v3:
- Patch 6 in v2.
- Move all the variable definitions to the top of the function.
  The loop body is a bit long to hide definitions at its top.
- Avoid -Wtype-limits warnings when validating format characters.
- Include changes to the selftests.

Changes for v2:
- Use #defines to make the code a lot more readable.
- Include the changes from the old patch 10 that used masks for the
  conversion specifiers.
- Detect all the valid flag characters even though they are not implemented.
- Support for left-justifying fields is moved to patch 7.

 tools/include/nolibc/stdio.h | 157 ++++++++++++++++++++++++-----------
 1 file changed, 108 insertions(+), 49 deletions(-)

diff --git a/tools/include/nolibc/stdio.h b/tools/include/nolibc/stdio.h
index 710a4bce5e81..1c2b2cf9a1f3 100644
--- a/tools/include/nolibc/stdio.h
+++ b/tools/include/nolibc/stdio.h
@@ -291,10 +291,14 @@ int fseek(FILE *stream, long offset, int whence)
 }
 
 
-/* minimal printf(). It supports the following formats:
- *  - %[l*]{d,u,c,x,p}
- *  - %s
- *  - unknown modifiers are ignored.
+/* printf(). Supports the following integer and string formats.
+ *  - %[#-+ 0][width][{l,ll,j}]{c,d,u,x,p,s,m,%}
+ *  - %% generates a single %
+ *  - %m outputs strerror(errno).
+ *  - The modifiers [#-+ 0] are currently ignored.
+ *  - No support for precision or variable widths.
+ *  - No support for floating point or wide characters.
+ *  - Invalid formats are copied to the output buffer.
  *
  * Called by vfprintf() and snprintf() to do the actual formatting.
  * The callers provide a callback function to save the formatted data.
@@ -305,15 +309,43 @@ int fseek(FILE *stream, long offset, int whence)
  *  - with (NULL, 0) at the end of the __nolibc_printf.
  * If the callback returns non-zero __nolibc_printf() immediately returns -1.
  */
+
 typedef int (*__nolibc_printf_cb)(void *state, const char *buf, size_t size);
 
+/* This code uses 'flag' variables that are indexed by the low 5 bits
+ * of characters to optimise checks for multiple characters.
+ *
+ * _NOLIBC_PF_FLAGS_CONTAIN(flags, 'a', 'b', ...)
+ * returns non-zero if the bit for any of the specified characters is set.
+ *
+ * _NOLIBC_PF_CHAR_IS_ONE_OF(ch, 'a', 'b', ...)
+ * returns the flag bit for ch if it is one of the specified characters.
+ * All the characters must be in the same 32 character block (non-alphabetic,
+ * upper case, or lower case) of the ASCII character set.
+ */
+#define _NOLIBC_PF_FLAG(ch) (1u << ((ch) & 0x1f))
+#define _NOLIBC_PF_FLAG_NZ(ch) ((ch) ? _NOLIBC_PF_FLAG(ch) : 0)
+#define _NOLIBC_PF_FLAG8(cmp_1, cmp_2, cmp_3, cmp_4, cmp_5, cmp_6, cmp_7, cmp_8, ...) \
+	(_NOLIBC_PF_FLAG_NZ(cmp_1) | _NOLIBC_PF_FLAG_NZ(cmp_2) | \
+	 _NOLIBC_PF_FLAG_NZ(cmp_3) | _NOLIBC_PF_FLAG_NZ(cmp_4) | \
+	 _NOLIBC_PF_FLAG_NZ(cmp_5) | _NOLIBC_PF_FLAG_NZ(cmp_6) | \
+	 _NOLIBC_PF_FLAG_NZ(cmp_7) | _NOLIBC_PF_FLAG_NZ(cmp_8))
+#define _NOLIBC_PF_FLAGS_CONTAIN(flags, ...) \
+	((flags) & _NOLIBC_PF_FLAG8(__VA_ARGS__, 0, 0, 0, 0, 0, 0, 0))
+#define _NOLIBC_PF_CHAR_IS_ONE_OF(ch, cmp_1, ...) \
+	((unsigned int)(ch) - (cmp_1 & 0xe0) > 0x1f ? 0 : \
+		_NOLIBC_PF_FLAGS_CONTAIN(_NOLIBC_PF_FLAG(ch), cmp_1, __VA_ARGS__))
+
 static __attribute__((unused, format(printf, 3, 0)))
 int __nolibc_printf(__nolibc_printf_cb cb, void *state, const char *fmt, va_list args)
 {
-	char lpref, ch;
+	char ch;
 	unsigned long long v;
+	long long signed_v;
 	int written, width, len;
+	unsigned int flags, ch_flag;
 	char outbuf[21];
+	char *out;
 	const char *outstr;
 
 	written = 0;
@@ -324,6 +356,7 @@ int __nolibc_printf(__nolibc_printf_cb cb, void *state, const char *fmt, va_list
 			break;
 
 		width = 0;
+		flags = 0;
 		if (ch != '%') {
 			while (*fmt && *fmt != '%')
 				fmt++;
@@ -334,7 +367,14 @@ int __nolibc_printf(__nolibc_printf_cb cb, void *state, const char *fmt, va_list
 
 		/* we're in a format sequence */
 
-		ch = *fmt++;
+		/* Conversion flag characters */
+		while (1) {
+			ch = *fmt++;
+			ch_flag = _NOLIBC_PF_CHAR_IS_ONE_OF(ch, ' ', '#', '+', '-', '0');
+			if (!ch_flag)
+				break;
+			flags |= ch_flag;
+		}
 
 		/* width */
 		while (ch >= '0' && ch <= '9') {
@@ -344,62 +384,78 @@ int __nolibc_printf(__nolibc_printf_cb cb, void *state, const char *fmt, va_list
 			ch = *fmt++;
 		}
 
-		/* Length modifiers */
-		if (ch == 'l') {
-			lpref = 1;
-			ch = *fmt++;
-			if (ch == 'l') {
-				lpref = 2;
-				ch = *fmt++;
+		/* Length modifier.
+		 * Their bits do not clash with the flag characters " #+-0" so they can share flags.
+		 * Change ll to j (both always 64bits).
+		 */
+		ch_flag = _NOLIBC_PF_CHAR_IS_ONE_OF(ch, 'l', 'j');
+		if (ch_flag != 0) {
+			if (ch == 'l' && fmt[0] == 'l') {
+				fmt++;
+				ch_flag = _NOLIBC_PF_FLAG('j');
 			}
-		} else if (ch == 'j') {
-			/* intmax_t is long long */
-			lpref = 2;
+			flags |= ch_flag;
 			ch = *fmt++;
-		} else {
-			lpref = 0;
 		}
 
-		if (ch == 'c' || ch == 'd' || ch == 'u' || ch == 'x' || ch == 'p') {
-			char *out = outbuf;
+		/* Conversion specifiers. */
 
-			if (ch == 'p')
+		/* Numeric and pointer conversion specifiers.
+		 *
+		 * Use an explicit bound check (rather than _NOLIBC_PF_CHAR_IS_ONE_OF())
+		 * so ch_flag can be used later.
+		 */
+		ch_flag = _NOLIBC_PF_FLAG(ch);
+		if ((ch >= 'a' && ch <= 'z') &&
+		    _NOLIBC_PF_FLAGS_CONTAIN(ch_flag, 'c', 'd', 'u', 'x', 'p')) {
+			/* 'long' is needed for pointer conversions and ltz lengths.
+			 * A single test can be used provided 'p' (the same bit as '0')
+			 * is masked from flags.
+			 */
+			if (_NOLIBC_PF_FLAGS_CONTAIN(ch_flag | (flags & ~_NOLIBC_PF_FLAG('p')),
+						     'p', 'l')) {
 				v = va_arg(args, unsigned long);
-			else if (lpref) {
-				if (lpref > 1)
-					v = va_arg(args, unsigned long long);
-				else
-					v = va_arg(args, unsigned long);
-			} else
+				signed_v = (long)v;
+			} else if (_NOLIBC_PF_FLAGS_CONTAIN(flags, 'j')) {
+				v = va_arg(args, unsigned long long);
+				signed_v = v;
+			} else {
 				v = va_arg(args, unsigned int);
+				signed_v = (int)v;
+			}
 
-			if (ch == 'd') {
-				/* sign-extend the value */
-				if (lpref == 0)
-					v = (long long)(int)v;
-				else if (lpref == 1)
-					v = (long long)(long)v;
+			if (ch == 'c') {
+				/* "%c" - single character. */
+				outbuf[0] = v;
+				len = 1;
+				outstr = outbuf;
+				goto do_output;
 			}
 
-			switch (ch) {
-			case 'c':
-				out[0] = v;
-				out[1] = 0;
-				break;
-			case 'd':
-				i64toa_r(v, out);
-				break;
-			case 'u':
+			out = outbuf;
+
+			if (_NOLIBC_PF_FLAGS_CONTAIN(ch_flag, 'd')) {
+				/* "%d" and "%i" - signed decimal numbers. */
+				if (signed_v < 0) {
+					*out++ = '-';
+					v = -(signed_v + 1);
+					v++;
+				}
+			}
+
+			/* Convert the number to ascii in the required base. */
+			if (_NOLIBC_PF_FLAGS_CONTAIN(ch_flag, 'd', 'u')) {
+				/* Base 10 */
 				u64toa_r(v, out);
-				break;
-			case 'p':
-				*(out++) = '0';
-				*(out++) = 'x';
-				__nolibc_fallthrough;
-			default: /* 'x' and 'p' above */
+			} else {
+				/* Base 16 */
+				if (_NOLIBC_PF_FLAGS_CONTAIN(ch_flag, 'p')) {
+					*(out++) = '0';
+					*(out++) = 'x';
+				}
 				u64toh_r(v, out);
-				break;
 			}
+
 			outstr = outbuf;
 			goto do_strlen_output;
 		}
@@ -442,6 +498,9 @@ int __nolibc_printf(__nolibc_printf_cb cb, void *state, const char *fmt, va_list
 do_output:
 		written += len;
 
+		/* Stop gcc back-merging this code into one of the conditionals above. */
+		_NOLIBC_OPTIMIZER_HIDE_VAR(len);
+
 		width -= len;
 		while (width > 0) {
 			/* Output pad in 16 byte blocks with the small block first. */
-- 
2.39.5


Thread overview (26+ messages):
2026-03-08 11:37 [PATCH v5 next 00/17] Enhance printf() david.laight.linux
2026-03-08 11:37 ` [PATCH v5 next 01/17] tools/nolibc: Add _NOLIBC_OPTIMIZER_HIDE_VAR() to compiler.h david.laight.linux
2026-03-08 11:37 ` [PATCH v5 next 02/17] selftests/nolibc: Rename w to written in expect_vfprintf() david.laight.linux
2026-03-08 11:37 ` [PATCH v5 next 03/17] tools/nolibc: Implement strerror() in terms of strerror_r() david.laight.linux
2026-03-08 11:37 ` [PATCH v5 next 04/17] tools/nolibc: Rename the 'errnum' parameter to strerror() david.laight.linux
2026-03-08 11:37 ` [PATCH v5 next 05/17] tools/nolibc/printf: Output pad characters in 16 byte chunks david.laight.linux
2026-03-08 11:37 ` [PATCH v5 next 06/17] tools/nolibc/printf: Simplify __nolibc_printf() david.laight.linux
2026-03-08 11:37 ` [PATCH v5 next 07/17] tools/nolibc/printf: Use goto and reduce indentation david.laight.linux
2026-03-08 11:37 ` david.laight.linux [this message]
2026-03-08 11:37 ` [PATCH v5 next 09/17] tools/nolibc/printf: Add support for length modifiers tzqL and formats iX david.laight.linux
2026-03-08 11:37 ` [PATCH v5 next 10/17] tools/nolibc/printf: Handle "%s" with the numeric formats david.laight.linux
2026-03-08 11:37 ` [PATCH v5 next 11/17] tools/nolibc/printf: Prepend sign to converted number david.laight.linux
2026-03-08 11:37 ` [PATCH v5 next 12/17] tools/nolibc/printf: Add support for conversion flags space and plus david.laight.linux
2026-03-08 11:37 ` [PATCH v5 next 13/17] tools/nolibc/printf: Special case 0 and add support for %#x david.laight.linux
2026-03-08 11:37 ` [PATCH v5 next 14/17] tools/nolibc/printf: Add support for left aligning fields david.laight.linux
2026-03-08 11:37 ` [PATCH v5 next 15/17] tools/nolibc/printf: Add support for zero padding and field precision david.laight.linux
2026-03-08 11:37 ` [PATCH v5 next 16/17] tools/nolibc/printf: Add support for octal output david.laight.linux
2026-03-08 11:37 ` [PATCH v5 next 17/17] selftests/nolibc: Use printf variable field widths and precisions david.laight.linux
2026-03-08 11:58 ` [PATCH v5 next 00/17] Enhance printf() Willy Tarreau
2026-03-08 21:01 ` Thomas Weißschuh
2026-03-08 22:41   ` David Laight
2026-03-09  6:55     ` Willy Tarreau
2026-03-09  9:20       ` David Laight
2026-03-13 20:07     ` Thomas Weißschuh
2026-03-13 22:40       ` David Laight
2026-03-14  4:48         ` Willy Tarreau
