public inbox for llvm@lists.linux.dev
 help / color / mirror / Atom feed
* [PATCH v2 0/2] Add wcslen()
@ 2025-03-26 16:32 Nathan Chancellor
  2025-03-26 16:32 ` [PATCH v2 1/2] include: Move typedefs in nls.h to their own header Nathan Chancellor
  2025-03-26 16:32 ` [PATCH v2 2/2] lib/string.c: Add wcslen() Nathan Chancellor
  0 siblings, 2 replies; 8+ messages in thread
From: Nathan Chancellor @ 2025-03-26 16:32 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Shevchenko, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Ard Biesheuvel, linux-kernel, linux-hardening, llvm, stable,
	Nathan Chancellor

Hi all,

A recent LLVM change [1] introduces a call to wcslen() in
fs/smb/client/smb2pdu.c through UniStrcat() via
alloc_path_with_tree_prefix(). Similar to the bcmp() and stpcpy()
additions that happened in 5f074f3e192f and 1e1b6d63d634, add wcslen()
to fix the linkage failure.

[1]: https://github.com/llvm/llvm-project/commit/9694844d7e36fd5e01011ab56b64f27b867aa72d

---
Changes in v2:
- Refactor typedefs from nls.h into nls_types.h to make it safe to
  include in string.h, which may be included in many places throughout
  the kernel that may not like the other stuff nls.h brings in:
  https://lore.kernel.org/202503260611.MDurOUhF-lkp@intel.com/
- Drop libstub change due to the above change, as it is no longer
  necessary.
- Move prototype shuffle of patch 2 into the patch that adds wcslen()
  (Andy)
- Use new nls_types.h in string.{c,h}
- Link to v1: https://lore.kernel.org/r/20250325-string-add-wcslen-for-llvm-opt-v1-0-b8f1e2c17888@kernel.org

---
Nathan Chancellor (2):
      include: Move typedefs in nls.h to their own header
      lib/string.c: Add wcslen()

 include/linux/nls.h       | 19 +------------------
 include/linux/nls_types.h | 25 +++++++++++++++++++++++++
 include/linux/string.h    |  2 ++
 lib/string.c              | 11 +++++++++++
 4 files changed, 39 insertions(+), 18 deletions(-)
---
base-commit: 78ab93c78fb31c5dfe207318aa2b7bd4e41f8dba
change-id: 20250324-string-add-wcslen-for-llvm-opt-705791db92c0

Best regards,
-- 
Nathan Chancellor <nathan@kernel.org>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/2] include: Move typedefs in nls.h to their own header
  2025-03-26 16:32 [PATCH v2 0/2] Add wcslen() Nathan Chancellor
@ 2025-03-26 16:32 ` Nathan Chancellor
  2025-03-26 18:26   ` Andy Shevchenko
  2025-03-26 16:32 ` [PATCH v2 2/2] lib/string.c: Add wcslen() Nathan Chancellor
  1 sibling, 1 reply; 8+ messages in thread
From: Nathan Chancellor @ 2025-03-26 16:32 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Shevchenko, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Ard Biesheuvel, linux-kernel, linux-hardening, llvm, stable,
	Nathan Chancellor

In order to allow commonly included headers such as string.h to access
typedefs such as wchar_t without running into issues with the rest of
the NLS library, refactor the typedefs out into their own header that
can be included in a much safer manner.

Cc: stable@vger.kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
 include/linux/nls.h       | 19 +------------------
 include/linux/nls_types.h | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/include/linux/nls.h b/include/linux/nls.h
index e0bf8367b274..3d416d1f60b6 100644
--- a/include/linux/nls.h
+++ b/include/linux/nls.h
@@ -3,24 +3,7 @@
 #define _LINUX_NLS_H
 
 #include <linux/init.h>
-
-/* Unicode has changed over the years.  Unicode code points no longer
- * fit into 16 bits; as of Unicode 5 valid code points range from 0
- * to 0x10ffff (17 planes, where each plane holds 65536 code points).
- *
- * The original decision to represent Unicode characters as 16-bit
- * wchar_t values is now outdated.  But plane 0 still includes the
- * most commonly used characters, so we will retain it.  The newer
- * 32-bit unicode_t type can be used when it is necessary to
- * represent the full Unicode character set.
- */
-
-/* Plane-0 Unicode character */
-typedef u16 wchar_t;
-#define MAX_WCHAR_T	0xffff
-
-/* Arbitrary Unicode character */
-typedef u32 unicode_t;
+#include <linux/nls_types.h>
 
 struct nls_table {
 	const char *charset;
diff --git a/include/linux/nls_types.h b/include/linux/nls_types.h
new file mode 100644
index 000000000000..8caefdba19b1
--- /dev/null
+++ b/include/linux/nls_types.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_NLS_TYPES_H
+#define _LINUX_NLS_TYPES_H
+
+#include <linux/types.h>
+
+/* Unicode has changed over the years.  Unicode code points no longer
+ * fit into 16 bits; as of Unicode 5 valid code points range from 0
+ * to 0x10ffff (17 planes, where each plane holds 65536 code points).
+ *
+ * The original decision to represent Unicode characters as 16-bit
+ * wchar_t values is now outdated.  But plane 0 still includes the
+ * most commonly used characters, so we will retain it.  The newer
+ * 32-bit unicode_t type can be used when it is necessary to
+ * represent the full Unicode character set.
+ */
+
+/* Plane-0 Unicode character */
+typedef u16 wchar_t;
+#define MAX_WCHAR_T	0xffff
+
+/* Arbitrary Unicode character */
+typedef u32 unicode_t;
+
+#endif /* _LINUX_NLS_TYPES_H */

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/2] lib/string.c: Add wcslen()
  2025-03-26 16:32 [PATCH v2 0/2] Add wcslen() Nathan Chancellor
  2025-03-26 16:32 ` [PATCH v2 1/2] include: Move typedefs in nls.h to their own header Nathan Chancellor
@ 2025-03-26 16:32 ` Nathan Chancellor
  2025-03-26 18:39   ` Andy Shevchenko
  1 sibling, 1 reply; 8+ messages in thread
From: Nathan Chancellor @ 2025-03-26 16:32 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Shevchenko, Nick Desaulniers, Bill Wendling, Justin Stitt,
	Ard Biesheuvel, linux-kernel, linux-hardening, llvm, stable,
	Nathan Chancellor

A recent optimization change in LLVM [1] aims to transform certain loop
idioms into calls to strlen() or wcslen(). This change transforms the
first while loop in UniStrcat() into a call to wcslen(), breaking the
build when UniStrcat() gets inlined into alloc_path_with_tree_prefix():

  ld.lld: error: undefined symbol: wcslen
  >>> referenced by nls_ucs2_utils.h:54 (fs/smb/client/../../nls/nls_ucs2_utils.h:54)
  >>>               vmlinux.o:(alloc_path_with_tree_prefix)
  >>> referenced by nls_ucs2_utils.h:54 (fs/smb/client/../../nls/nls_ucs2_utils.h:54)
  >>>               vmlinux.o:(alloc_path_with_tree_prefix)

The kernel does not build with '-ffreestanding' (which would avoid this
transformation) because it does want libcall optimizations in general
and turning on '-ffreestanding' disables the majority of them. While
'-fno-builtin-wcslen' would be more targeted at the problem, it does not
work with LTO.

Add a basic wcslen() to avoid this linkage failure. While no
architecture or FORTIFY_SOURCE overrides this, add it to string.c
instead of string_helpers.c so that it is built with '-ffreestanding',
otherwise the compiler might transform it into a call to itself.

Cc: stable@vger.kernel.org
Link: https://github.com/llvm/llvm-project/commit/9694844d7e36fd5e01011ab56b64f27b867aa72d [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
 include/linux/string.h |  2 ++
 lib/string.c           | 11 +++++++++++
 2 files changed, 13 insertions(+)

diff --git a/include/linux/string.h b/include/linux/string.h
index 0403a4ca4c11..4a48f8eac301 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -7,6 +7,7 @@
 #include <linux/cleanup.h>	/* for DEFINE_FREE() */
 #include <linux/compiler.h>	/* for inline */
 #include <linux/types.h>	/* for size_t */
+#include <linux/nls_types.h>	/* for wchar_t */
 #include <linux/stddef.h>	/* for NULL */
 #include <linux/err.h>		/* for ERR_PTR() */
 #include <linux/errno.h>	/* for E2BIG */
@@ -203,6 +204,7 @@ extern __kernel_size_t strlen(const char *);
 #ifndef __HAVE_ARCH_STRNLEN
 extern __kernel_size_t strnlen(const char *,__kernel_size_t);
 #endif
+extern __kernel_size_t wcslen(const wchar_t *s);
 #ifndef __HAVE_ARCH_STRPBRK
 extern char * strpbrk(const char *,const char *);
 #endif
diff --git a/lib/string.c b/lib/string.c
index eb4486ed40d2..2c6f8c8f4159 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -21,6 +21,7 @@
 #include <linux/errno.h>
 #include <linux/limits.h>
 #include <linux/linkage.h>
+#include <linux/nls_types.h>
 #include <linux/stddef.h>
 #include <linux/string.h>
 #include <linux/types.h>
@@ -429,6 +430,16 @@ size_t strnlen(const char *s, size_t count)
 EXPORT_SYMBOL(strnlen);
 #endif
 
+size_t wcslen(const wchar_t *s)
+{
+	const wchar_t *sc;
+
+	for (sc = s; *sc != '\0'; ++sc)
+		/* nothing */;
+	return sc - s;
+}
+EXPORT_SYMBOL(wcslen);
+
 #ifndef __HAVE_ARCH_STRSPN
 /**
  * strspn - Calculate the length of the initial substring of @s which only contain letters in @accept

-- 
2.49.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] include: Move typedefs in nls.h to their own header
  2025-03-26 16:32 ` [PATCH v2 1/2] include: Move typedefs in nls.h to their own header Nathan Chancellor
@ 2025-03-26 18:26   ` Andy Shevchenko
  0 siblings, 0 replies; 8+ messages in thread
From: Andy Shevchenko @ 2025-03-26 18:26 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Kees Cook, Andy Shevchenko, Nick Desaulniers, Bill Wendling,
	Justin Stitt, Ard Biesheuvel, linux-kernel, linux-hardening, llvm,
	stable

On Wed, Mar 26, 2025 at 7:19 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> In order to allow commonly included headers such as string.h to access
> typedefs such as wchar_t without running into issues with the rest of
> the NLS library, refactor the typedefs out into their own header that
> can be included in a much safer manner.

...

While the below is the original text, we can reduce churn for the
future by doing the following change while at it (no need to recend,
maybe Kees can amend this while applying):

> +/* Unicode has changed over the years.  Unicode code points no longer

/*
 * This is an incorrect comment style. Should be
 * like in this example.
 */

> + * fit into 16 bits; as of Unicode 5 valid code points range from 0
> + * to 0x10ffff (17 planes, where each plane holds 65536 code points).
> + *
> + * The original decision to represent Unicode characters as 16-bit
> + * wchar_t values is now outdated.  But plane 0 still includes the
> + * most commonly used characters, so we will retain it.  The newer
> + * 32-bit unicode_t type can be used when it is necessary to
> + * represent the full Unicode character set.
> + */

In either case it's fine and not a big deal,
Reviewed-by: Andy Shevchenko <andy@kernel.org>

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] lib/string.c: Add wcslen()
  2025-03-26 16:32 ` [PATCH v2 2/2] lib/string.c: Add wcslen() Nathan Chancellor
@ 2025-03-26 18:39   ` Andy Shevchenko
  2025-03-26 18:41     ` Andy Shevchenko
  2025-03-26 22:35     ` Nathan Chancellor
  0 siblings, 2 replies; 8+ messages in thread
From: Andy Shevchenko @ 2025-03-26 18:39 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Kees Cook, Andy Shevchenko, Nick Desaulniers, Bill Wendling,
	Justin Stitt, Ard Biesheuvel, linux-kernel, linux-hardening, llvm,
	stable

On Wed, Mar 26, 2025 at 7:19 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> A recent optimization change in LLVM [1] aims to transform certain loop
> idioms into calls to strlen() or wcslen(). This change transforms the
> first while loop in UniStrcat() into a call to wcslen(), breaking the
> build when UniStrcat() gets inlined into alloc_path_with_tree_prefix():
>
>   ld.lld: error: undefined symbol: wcslen
>   >>> referenced by nls_ucs2_utils.h:54 (fs/smb/client/../../nls/nls_ucs2_utils.h:54)
>   >>>               vmlinux.o:(alloc_path_with_tree_prefix)
>   >>> referenced by nls_ucs2_utils.h:54 (fs/smb/client/../../nls/nls_ucs2_utils.h:54)
>   >>>               vmlinux.o:(alloc_path_with_tree_prefix)
>
> The kernel does not build with '-ffreestanding' (which would avoid this
> transformation) because it does want libcall optimizations in general
> and turning on '-ffreestanding' disables the majority of them. While
> '-fno-builtin-wcslen' would be more targeted at the problem, it does not
> work with LTO.
>
> Add a basic wcslen() to avoid this linkage failure. While no
> architecture or FORTIFY_SOURCE overrides this, add it to string.c
> instead of string_helpers.c so that it is built with '-ffreestanding',
> otherwise the compiler might transform it into a call to itself.

...

> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> @@ -7,6 +7,7 @@
>  #include <linux/cleanup.h>     /* for DEFINE_FREE() */
>  #include <linux/compiler.h>    /* for inline */
>  #include <linux/types.h>       /* for size_t */

> +#include <linux/nls_types.h>   /* for wchar_t */

I know it's not ordered, but can we at least not make it worse, i.e.
squeeze this to be after the compiler.h? Or even somewhere after below
the err*.h? Whatever gives a better (sparsely) ordered overall
result...

>  #include <linux/stddef.h>      /* for NULL */
>  #include <linux/err.h>         /* for ERR_PTR() */

...

>  #ifndef __HAVE_ARCH_STRNLEN
>  extern __kernel_size_t strnlen(const char *,__kernel_size_t);
>  #endif
> +extern __kernel_size_t wcslen(const wchar_t *s);

I'm wondering why we still continue putting this 'extern' keyword.
Yes, I see that the rest is like this, but for new code do we really
need it?

>  #ifndef __HAVE_ARCH_STRPBRK
>  extern char * strpbrk(const char *,const char *);
>  #endif


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] lib/string.c: Add wcslen()
  2025-03-26 18:39   ` Andy Shevchenko
@ 2025-03-26 18:41     ` Andy Shevchenko
  2025-03-26 22:28       ` Nathan Chancellor
  2025-03-26 22:35     ` Nathan Chancellor
  1 sibling, 1 reply; 8+ messages in thread
From: Andy Shevchenko @ 2025-03-26 18:41 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Kees Cook, Andy Shevchenko, Nick Desaulniers, Bill Wendling,
	Justin Stitt, Ard Biesheuvel, linux-kernel, linux-hardening, llvm,
	stable

On Wed, Mar 26, 2025 at 8:39 PM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> On Wed, Mar 26, 2025 at 7:19 PM Nathan Chancellor <nathan@kernel.org> wrote:

...

> > --- a/include/linux/string.h
> > +++ b/include/linux/string.h
> > @@ -7,6 +7,7 @@
> >  #include <linux/cleanup.h>     /* for DEFINE_FREE() */
> >  #include <linux/compiler.h>    /* for inline */
> >  #include <linux/types.h>       /* for size_t */
>
> > +#include <linux/nls_types.h>   /* for wchar_t */
>
> I know it's not ordered, but can we at least not make it worse, i.e.
> squeeze this to be after the compiler.h? Or even somewhere after below
> the err*.h? Whatever gives a better (sparsely) ordered overall
> result...

I just checked, and the only unordered piece is those two: types +
stddef right now, and if you move nls_types.h after errno.h it will
keep the status quo.

> >  #include <linux/stddef.h>      /* for NULL */
> >  #include <linux/err.h>         /* for ERR_PTR() */

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] lib/string.c: Add wcslen()
  2025-03-26 18:41     ` Andy Shevchenko
@ 2025-03-26 22:28       ` Nathan Chancellor
  0 siblings, 0 replies; 8+ messages in thread
From: Nathan Chancellor @ 2025-03-26 22:28 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Kees Cook, Andy Shevchenko, Nick Desaulniers, Bill Wendling,
	Justin Stitt, Ard Biesheuvel, linux-kernel, linux-hardening, llvm,
	stable

On Wed, Mar 26, 2025 at 08:41:44PM +0200, Andy Shevchenko wrote:
> On Wed, Mar 26, 2025 at 8:39 PM Andy Shevchenko
> <andy.shevchenko@gmail.com> wrote:
> > On Wed, Mar 26, 2025 at 7:19 PM Nathan Chancellor <nathan@kernel.org> wrote:
> 
> ...
> 
> > > --- a/include/linux/string.h
> > > +++ b/include/linux/string.h
> > > @@ -7,6 +7,7 @@
> > >  #include <linux/cleanup.h>     /* for DEFINE_FREE() */
> > >  #include <linux/compiler.h>    /* for inline */
> > >  #include <linux/types.h>       /* for size_t */
> >
> > > +#include <linux/nls_types.h>   /* for wchar_t */
> >
> > I know it's not ordered, but can we at least not make it worse, i.e.
> > squeeze this to be after the compiler.h? Or even somewhere after below
> > the err*.h? Whatever gives a better (sparsely) ordered overall
> > result...
> 
> I just checked, and the only unordered piece is those two: types +
> stddef right now, and if you move nls_types.h after errno.h it will
> keep the status quo.
> 
> > >  #include <linux/stddef.h>      /* for NULL */
> > >  #include <linux/err.h>         /* for ERR_PTR() */

Yeah, I had noticed there was no alphabetical consistency, so I decided
to use this place to keep the types together but I have no strong
opinion. I can send v3 tomorrow unless Kees is happy with this version
and is okay with just applying this diff on top.

Cheers,
Nathan

diff --git a/include/linux/string.h b/include/linux/string.h
index 4a48f8eac301..750715768a62 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -7,10 +7,10 @@
 #include <linux/cleanup.h>	/* for DEFINE_FREE() */
 #include <linux/compiler.h>	/* for inline */
 #include <linux/types.h>	/* for size_t */
-#include <linux/nls_types.h>	/* for wchar_t */
 #include <linux/stddef.h>	/* for NULL */
 #include <linux/err.h>		/* for ERR_PTR() */
 #include <linux/errno.h>	/* for E2BIG */
+#include <linux/nls_types.h>	/* for wchar_t */
 #include <linux/overflow.h>	/* for check_mul_overflow() */
 #include <linux/stdarg.h>
 #include <uapi/linux/string.h>

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] lib/string.c: Add wcslen()
  2025-03-26 18:39   ` Andy Shevchenko
  2025-03-26 18:41     ` Andy Shevchenko
@ 2025-03-26 22:35     ` Nathan Chancellor
  1 sibling, 0 replies; 8+ messages in thread
From: Nathan Chancellor @ 2025-03-26 22:35 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Kees Cook, Andy Shevchenko, Nick Desaulniers, Bill Wendling,
	Justin Stitt, Ard Biesheuvel, linux-kernel, linux-hardening, llvm,
	stable

On Wed, Mar 26, 2025 at 08:39:37PM +0200, Andy Shevchenko wrote:
> On Wed, Mar 26, 2025 at 7:19 PM Nathan Chancellor <nathan@kernel.org> wrote:
> >  #ifndef __HAVE_ARCH_STRNLEN
> >  extern __kernel_size_t strnlen(const char *,__kernel_size_t);
> >  #endif
> > +extern __kernel_size_t wcslen(const wchar_t *s);
> 
> I'm wondering why we still continue putting this 'extern' keyword.
> Yes, I see that the rest is like this, but for new code do we really
> need it?

Yeah, I just did it to keep it consistent with what is around it but
there should be no reason that it cannot be removed. I am happy to do
that in v3 if desired.

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-03-26 22:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-26 16:32 [PATCH v2 0/2] Add wcslen() Nathan Chancellor
2025-03-26 16:32 ` [PATCH v2 1/2] include: Move typedefs in nls.h to their own header Nathan Chancellor
2025-03-26 18:26   ` Andy Shevchenko
2025-03-26 16:32 ` [PATCH v2 2/2] lib/string.c: Add wcslen() Nathan Chancellor
2025-03-26 18:39   ` Andy Shevchenko
2025-03-26 18:41     ` Andy Shevchenko
2025-03-26 22:28       ` Nathan Chancellor
2025-03-26 22:35     ` Nathan Chancellor

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox