public inbox for linux-man@vger.kernel.org
* man/man7/pathname.7: Correct handling of pathnames
@ 2025-01-27 11:22 Alejandro Colomar
  2025-01-27 14:50 ` Jason Yundt
From: Alejandro Colomar @ 2025-01-27 11:22 UTC (permalink / raw)
  To: Jason Yundt; +Cc: linux-man, Florian Weimer, G. Branden Robinson

Hi Jason,

I think the recommendation to use the current locale for handling
pathnames isn't good.

If I use the C locale (and I do have systems with the C locale), then
programs that follow that recommendation would mishandle files that pass
through such a system.  Let's say you send me María.song, and I download
it on a system using the C locale.  Programs would fail to copy the file.

Instead, I think a good recommendation would be to behave in one of the
following ways:

-  Accept only the POSIX Portable Filename Character Set.
-  Assume UTF-8, but reject control characters.
-  Assume UTF-8.
-  Accept anything, but reject control characters.
-  Accept anything, just like the kernel.

The current locale should actively be ignored when handling pathnames.
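
To make the first and fourth behaviors above concrete, here is a minimal
sketch of two locale-independent checks.  The function names
is_portable_filename() and has_control_byte() are made up for
illustration; they are not part of any library or of the manual page.
Neither check calls setlocale(3) or the <ctype.h> classification
functions, so the result is the same under any locale.

```c
#include <stdbool.h>
#include <string.h>

/* POSIX Portable Filename Character Set: A-Z a-z 0-9 . _ - */
static const char portable_set[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789._-";

/* Hypothetical helper: true if name is non-empty and consists only of
 * characters from the portable set.  strspn(3) compares bytes and does
 * not consult the locale. */
bool
is_portable_filename(const char *name)
{
    return name[0] != '\0' && name[strspn(name, portable_set)] == '\0';
}

/* Hypothetical helper: true if name contains an ASCII control byte
 * (0x01..0x1F or 0x7F).  Bytes >= 0x80 are accepted unchanged, so
 * UTF-8 names pass; C1 controls encoded in UTF-8 are NOT detected. */
bool
has_control_byte(const char *name)
{
    const unsigned char *p;

    for (p = (const unsigned char *) name; *p != '\0'; p++) {
        if (*p < 0x20 || *p == 0x7F)
            return true;
    }
    return false;
}
```

With these, "path.c" passes the portable check, the UTF-8 name María
fails it but contains no control byte, and a name with an embedded
newline is caught by has_control_byte().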

I've modified the example in the manual page to use a filename that's
non-ASCII, to make it more interesting.  See how it fails:

	alx@devuan:~/tmp/gcc$ cat path.c 
	     #include <err.h>
	     #include <iconv.h>
	     #include <langinfo.h>
	     #include <locale.h>
	     #include <stdio.h>
	     #include <stdlib.h>
	     #include <uchar.h>

	     #define NELEMS(a)  (sizeof(a) / sizeof(a[0]))

	     int
	     main(void)
	     {
		 char      *locale_pathname;
		 char      *in, *out;
		 FILE      *fp;
		 size_t    size;
		 size_t    inbytes, outbytes;
		 iconv_t   cd;
		 char32_t  utf32_pathname[] = U"María";

		 if (setlocale(LC_ALL, "") == NULL)
		     err(EXIT_FAILURE, "setlocale");

		 size = NELEMS(utf32_pathname) * MB_CUR_MAX;
		 locale_pathname = malloc(size);
		 if (locale_pathname == NULL)
		     err(EXIT_FAILURE, "malloc");

		 cd = iconv_open(nl_langinfo(CODESET), "UTF-32");
		 if (cd == (iconv_t)-1)
		     err(EXIT_FAILURE, "iconv_open");

		 in = (char *) utf32_pathname;
		 inbytes = sizeof(utf32_pathname);
		 out = locale_pathname;
		 outbytes = size;
		 if (iconv(cd, &in, &inbytes, &out, &outbytes) == (size_t) -1)
		     err(EXIT_FAILURE, "iconv");

		 if (iconv_close(cd) == -1)
		     err(EXIT_FAILURE, "iconv_close");

		 fp = fopen(locale_pathname, "w");
		 if (fp == NULL)
		     err(EXIT_FAILURE, "fopen");

		 fputs("Hello, world!\n", fp);
		 if (fclose(fp) == EOF)
		     err(EXIT_FAILURE, "fclose");

		 free(locale_pathname);
		 exit(EXIT_SUCCESS);
	     }

	alx@devuan:~/tmp/gcc$ cc -Wall -Wextra path.c 
	alx@devuan:~/tmp/gcc$ ls
	a.out  path.c
	alx@devuan:~/tmp/gcc$ ./a.out ; echo $?
	0
	alx@devuan:~/tmp/gcc$ ls
	María  a.out  path.c
	alx@devuan:~/tmp/gcc$ cat María 
	Hello, world!
	alx@devuan:~/tmp/gcc$ LC_ALL=C ./a.out ; echo $?
	a.out: iconv: Invalid or incomplete multibyte or wide character
	1
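
For contrast, here is a sketch of the "accept anything, just like the
kernel" behavior: the pathname is passed to fopen(3) byte for byte, with
no setlocale(3) and no iconv(3) in between.  The helper name
write_greeting() is hypothetical, and the sketch assumes the caller
already has the name as a byte string (for example, UTF-8 bytes taken
from argv or spelled out in the source).

```c
#include <stdio.h>

/* Hypothetical helper: create the file named by pathname and write a
 * greeting to it.  The name is treated as an opaque C string; the
 * current locale is never consulted.  Returns 0 on success, -1 on
 * failure. */
int
write_greeting(const char *pathname)
{
    FILE *fp;

    fp = fopen(pathname, "w");
    if (fp == NULL)
        return -1;

    if (fputs("Hello, world!\n", fp) == EOF) {
        fclose(fp);
        return -1;
    }

    return (fclose(fp) == EOF) ? -1 : 0;
}
```

A call such as write_greeting("Mar\xc3\xad" "a") passes the UTF-8 bytes
of María straight through.  On Linux this should not depend on LC_ALL,
since the helper never consults the locale and the kernel only compares
bytes.  (The string literal is split so that the hex escape does not
swallow the following 'a'.)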

What do you think?


Have a lovely day!
Alex

-- 
<https://www.alejandro-colomar.es/>

