* [PATCH v2] filename.7: new manual page
@ 2021-10-17 23:07 Thaddeus H. Black
2021-10-18 16:25 ` Thaddeus H. Black
2021-10-18 16:33 ` [PATCH v3] " Thaddeus H. Black
0 siblings, 2 replies; 10+ messages in thread
From: Thaddeus H. Black @ 2021-10-17 23:07 UTC
To: linux-man
Cc: Alejandro Colomar, G. Branden Robinson, Michael Kerrisk,
Hendrik Boom
[-- Attachment #1: Type: text/plain, Size: 22819 bytes --]
Alejandro, Branden and Hendrik:
Thank you for the useful suggestions and revisions. Patch v2, below,
assimilates them.
---------------------------------------------------------------------------
CHANGES IN v2
---------------------------------------------------------------------------
Hendrik Boom, G. Branden Robinson and Alejandro Colomar (1):
Write "uppercase" and "lowercase" rather than "capital" and "small"
Alejandro Colomar and G. Branden Robinson (2):
Use semantic newlines
Avoid \f, but rather use separate lines
Alejandro Colomar (11):
Use subsections instead of sections
Use subsubsections instead of subsections
Remove unnecessary .P after .S[HS]
Use .PP rather than .P
Fix indentation of paragraph, which continues talking about \0
Mention FAT
For consistency, list "-" with "-name" and ".name"; s/a pair of/some/
Delete the redundant mention of "."
By s/all but/almost/, avoid double negation
Reference filesystems(5) under SEE ALSO
Under CONFORMING TO, write only, "POSIX.1‐2001 and later."
G. Branden Robinson (3):
Write "letter case" rather than "capitalization"
Reference section-3 pages not under SEE ALSO but only in passing
Avoid \c
Thaddeus H. Black (2):
Reword subsubsect "Special semantics" to support Alejandro's change no. 7
Avoid beginning any subsect or subsubsect with specially formatted text
---------------------------------------------------------------------------
GROFF SOURCE v2 (IN PATCH FORMAT)
---------------------------------------------------------------------------
--- /dev/null 2021-10-17 12:05:11.393541700 +0000
+++ b/man7/filename.7 2021-10-17 20:21:24.938542310 +0000
@@ -0,0 +1,654 @@
+.\" Copyright (C) 2021 Thaddeus H. Black <thb@debian.org>
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date. The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein. The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.\" 2021-10-18, Thaddeus H. Black <thb@debian.org>
+.\" Wrote the manual page's initial version.
+.\"
+.TH FILENAME 7 2021-10-18 "Linux" "Linux Programmer's Manual"
+.SH NAME
+filename \- requirements and conventions for the naming of files
+.SH DESCRIPTION
+This manual page sets forth requirements for
+and delineates conventions regarding filenames
+on a Linux system,
+where a
+.I filename
+is either (as the word suggests) the name of a regular file
+or the name of another object held by the system's filesystem
+such as a directory, symbolic link, named pipe or device.
+.SS Legal filenames
+A filename on a Linux system can consist
+of almost any sequence of UTF-8 characters
+or, indeed, almost any sequence of bytes.
+The exceptions are as follows.
+.TP
+.B Reserved characters
+.RS
+The following characters are reserved.
+.TP
+.B /
+The solidus is reserved to separate pathname components
+as for example in
+.IR /usr/share/doc ,
+each component being itself a filename.
+For this reason, no filename may include a solidus.
+More precisely,
+no filename may include the byte that,
+in ASCII and UTF-8,
+exclusively represents the solidus.
+.TP
+.B \e0
+The null character is reserved for the filesystem to append
+to terminate a filename's representation in memory.
+For this reason, no filename may include a null character.
+More precisely,
+no filename may include the byte that,
+in ASCII and UTF-8,
+exclusively represents the null character.
+(When appended by the filesystem
+to terminate a filename's representation in memory,
+the byte in question is called the
+.I terminating null
+.IR byte .
+Though familiar to\~C programmers,
+the terminating null byte is usually invisible to users.)
+.IP
+Note
+.RB that\~ \e0 ,
+the null character (or null byte), differs
+.RB from\~ 0 ,
+the printable digit-zero character.
+The null character (or null byte)
+is unprintable and registers in ASCII and UTF-8
+as the eight-bit pattern\~0x00,
+whereas the printable digit zero registers as\~0x30
+[see the \(lqHex\(rq column in
+.BR ascii (7)'s
+character table].
+Nothing prevents a filename from including a printable digit zero,
+as for instance the filename
+.I intel-m10-bmc.h
+from the kernel's source does.
+.RE
+.TP
+.B Reserved names
+.RS
+The following names are reserved.
+.TP
+.B .
+The filename consisting of a single full stop
+is reserved to represent the current directory.
+.TP
+.B ..
+The filename consisting of two full stops
+is reserved to represent the parent directory.
+.TP
+(empty)
+The empty filename,
+consisting of no bytes at all
+(except a terminating null byte),
+is not allowed.
+.PP
+The aforementioned current and parent directories are the current
+.I working
+directory and its parent except when
+.RB the\~ .
+.RB or\~ ..
+occurs in the middle or at the end of a pathname,
+in which case the current and parent directories
+are taken relative to preceding pathname elements.
+For example, if the current working directory were
+.IR /home/jsmith ,
+then
+.I ../rjones
+would mean
+.I /home/rjones
+but
+.I foo/bar/../baz
+would mean
+.IR /home/jsmith/foo/baz ,
+whereas
+.I foo/bar/./baz
+would mean
+.IR /home/jsmith/foo/bar/baz .
+.RE
+.TP
+.B Long names
+.RS
+No filename may exceed\~255 bytes in length,
+or\~256 bytes after counting the terminating null byte.
+.RB ( Reserved
+.B characters
+above explains the terminating null byte.)
+.RE
+.TP
+.B Non-UTF-8 names
+.RS
+Filenames need not consist of valid UTF-8 characters
+(although, except where a non-UTF-8 legacy encoding is in use,
+most filenames do).
+As long as the requirements
+of the preceding subsubsections
+are met,
+any sequence of bytes can legally serve as a filename.
+.RE
+.SS Conventional filenames
+Merely because a filename is legal
+does not make its use advisable, though.
+Some legal filenames cause practical troubles.
+For example, the legal filenames
+.IR m=3 ,
+.IR \(tijsmith ,
+.I \-v
+and
+.I My\~Document.txt
+are susceptible to misinterpretation by a shell.
+Workarounds typically exist,
+chiefly via quotation, escape
+and the explicit termination of options processing
+[see
+.BR sh (1)];
+but when reprocessing of shell-command text
+requires requotation and re-escape,
+the workarounds become an inconvenient, confusing, error-prone hassle.
+.PP
+The use of conventional filenames averts the hassle.
+It also makes filenames more recognizable to experienced users.
+.PP
+This subsection introduces broadly observed conventions for filenames.
+.TP
+.B The POSIX Portable Filename Character Set
+.RS
+In general contexts,
+especially for international applications,
+conventional filenames
+are composed using the\~65 ASCII characters
+of the POSIX Portable Filename Character Set.
+The POSIX Portable Filename Character Set consists of the following.
+.TP
+.BR A \- Z
+The\~26 uppercase or capital ASCII letters.
+.TP
+.BR a \- z
+The\~26 lowercase or small ASCII letters.
+.TP
+.BR 0 \- 9
+The ten ASCII digits.
+.TP
+.B . \_ \-
+These three ASCII punctuators: full stop; low line; hyphen-minus.
+.PP
+Special contexts often employ additional characters but,
+in general contexts for international applications,
+conventional filenames exclude characters other than the listed\~65.
+(For noninternational applications, see
+.B Locales and Unicode
+below.)
+.PP
+Observe that the
+.RB space\~ \(aq\0\(aq \~( \eu0020 )
+is not listed despite being an ASCII character.
+Filenames that include spaces
+are often encountered for various reasons in certain contexts,
+but such filenames are unconventional in general
+and are inconvenient to use with tools.
+Within filenames, the low
+.RB line\~ \_
+or
+.RB hyphen-minus\~ \-
+is conventionally employed as necessary instead of the space.
+(See
+.B Unconventional filenames
+and, under
+.B Soft
+.BR conventions ,
+also
+.B Low line versus hyphen-minus
+below.)
+.PP
+Incidentally, uppercase and lowercase letters
+are normally distinct within filenames on a Linux system;
+so, for example,
+.I README
+and
+.I readme
+name two different files.
+(Exception: the FAT filesystem on Linux is case-insensitive,
+so uppercase and lowercase letters
+are indistinct where FAT is in use.
+For further observations regarding letter case, see
+.B Letter case
+under
+.B Soft conventions
+below.)
+.RE
+.TP
+.B Special semantics
+.RS
+Besides the preceding subsubsection's POSIX convention,
+some conventions derived from core utilities
+are almost always respected, as well.
+.TP
+.B \-
+The one-character name
+consisting of a lone hyphen-minus
+is sometimes understood by a shell
+to refer to the previous working directory
+and sometimes understood by tools
+to refer to standard input or standard output.
+Therefore, no conventional filename
+consists of a lone hyphen-minus.
+.TP
+.BR \- name
+A name (other than the just-mentioned one-character name)
+that begins with a hyphen-minus
+is usually interpreted by tools as a
+command-line option rather than as a filename.
+Therefore, no conventional filename
+begins with the hyphen-minus.
+.TP
+.BR . name
+Conventional filenames may indeed begin with the full stop.
+This is normal and does not generally cause trouble.
+However, filenames that begin with the full stop
+conventionally designate
+.I hidden files
+(or hidden directories, etc.),
+a familiar example being the
+.I .profile
+typically found in a user's home directory.
+Hidden files behave normally but, by default, are ignored by
+.BR ls (1)
+and certain other tools.
+.RE
+.TP
+.B The full stop to introduce a format extension
+.RS
+Other than at a filename's beginning
+(a case the preceding subsubsection has discussed),
+the full stop is employed in filenames
+for various further conventional purposes.
+No single rule governs all conventional uses of the full stop.
+.PP
+However, except at a filename's beginning,
+the most common conventional use of the full stop
+is to append to a filename's stem
+an extension to indicate the format of the file's contents.
+An example is the filename
+.IR UnicodeData.txt ,
+in which
+.I UnicodeData
+is the stem and
+.RI the\~ .txt
+indicates that the file contains plain text.
+Multiple format extensions are even appended to some filename stems,
+as in
+.I my-archive.tar.xz
+for instance, which is the name of a tape archive
+.I my-archive.tar
+that has subsequently been compressed by
+.BR xz (1).
+.PP
+The format-extension convention is almost universally recognized.
+Even nontechnical users are typically familiar with it.
+However, many users also employ full stops
+for various purposes unrelated to format extensions;
+and they do so often enough
+that such unrelated usage can hardly be called unconventional.
+Except at a filename's beginning,
+convention supports free use of the full stop.
+.PP
+.I You
+may reserve the full stop solely
+to append format extensions if you wish,
+of course.
+Many users do.
+.PP
+.\" The next sentence has been corrected
+.\" according to Charles Plessy's helpful advice
+.\" [https://lists.debian.org/debian-devel/2021/08/msg00557.html].
+(If your machine is configured as a desktop or laptop
+rather than as a server,
+then you can probably find a fairly comprehensive catalog
+of conventional filename extensions,
+identifying the format each extension implies,
+on your machine in a file such as
+.I /etc/mime.types
+or
+.IR /usr/share/mime/globs .)
+.RE
+.SS Soft conventions
+Further filenaming conventions are softer.
+Though often observed,
+such softer conventions can be bent or broken
+without rendering filenames unconventional.
+.PP
+This subsection introduces soft conventions for filenames.
+.TP
+.B Low line versus hyphen-minus
+.RS
+Whether to use the low
+.RB line\~ \_
+or the
+.RB hyphen-minus\~ \-
+in filenames is a matter of preference.
+Except as stated above,
+convention does not strongly prefer the one over the other.
+.PP
+If you would like advice anyway,
+then the kernel's source sets an example.
+Most filenames in the kernel's source prefer the hyphen-minus.
+You can do the same if you wish.
+.PP
+Even if you prefer the hyphen-minus, though,
+some exceptions arise, as follows.
+.IP \(bu
+The contents of a program's source files usually designate various
+.I entities
+such as variables, functions, types and so forth.
+In\~C and similar programming languages,
+the hyphen-minus is a minus sign,
+so the designations of entities must use the low line, instead.
+Where a file is named after an entity the file introduces,
+the filename should use low lines as the entity's designation does.
+Examples include the file
+.IR lock\_events.h ,
+which introduces the entity
+.IR lock\_events ,
+in the kernel's source.
+.IP \(bu
+Where distinct separators with different semantics are required,
+a filename can use the low line as an alternate separator.
+Examples include the file
+.IR coreutils\_8.30-3\_amd64.deb ,
+which provides revision\~3 of the Debian binary package
+that installs version\~8.30 of the GNU core utilities
+for the amd64/x86-64 architecture.
+.IP \(bu
+Occasionally, the name of a file
+that provides
+private, internal, ephemeral, uninterfaceable or undocumented aspects
+of an implementation
+will
+.I begin
+with a low line to hint that the file
+.RS
+.IP +
+does not require the user's or programmer's attention or
+.IP +
+is unsuitable for external agents to access directly.
+.RE
+.IP
+Examples include the file
+.\" On the author's PC using Groff's default output device,
+.\" Groff typesets the next line's italicized low line inconsistently
+.\" compared to the manual page's other italicized low lines.
+.\" Presumably, Groff does this
+.\" because the low line in question begins its word
+.\" (though why Groff thinks beginning the word significant
+.\" is unclear), but the inconsistency is slightly distracting.
+.I \_sd-common.h
+in systemd's source.
+.IP \(bu
+Sometimes, the low line
+stands for an unspecified letter of the alphabet.
+.PP
+Otherwise,
+although the low line and the hyphen-minus
+are both conventional,
+if you want advice,
+prefer the hyphen-minus.
+.RE
+.TP
+.B Letter case
+.RS
+A loosely observed convention
+favors lowercase letters in filenames
+where no reason to use uppercase exists.
+Many exceptions occur, though, as for example the oft-encountered
+.I Makefile
+that instructs
+.BR make (1)
+how to build an executable program or other autogeneratable file.
+.PP
+The reason convention favors lowercase
+is that the general use of lowercase
+leaves uppercase to be employed for emphasis.
+Where the default\~C (or C.UTF-8) locale is in use,
+the uppercase ASCII letters
+are collated before all the lowercase ones,
+whereby
+.BR ls (1)
+lists filenames like
+.I Makefile
+and
+.I README
+before filenames like
+.I a.out
+and
+.IR foo.c .
+[If your locale causes
+.BR ls (1)
+to collate differently
+when you would have preferred the just-described default collation,
+then try
+.B LC\_ALL=C ls
+or
+.B LC\_ALL=C.UTF-8 ls
+to suppress the locale.
+See
+.BR locale (7).]
+.PP
+Programming styles in languages like\~C++ and Python
+occasionally uppercase some letters
+in the names of types and of certain other entities.
+Such casing can spill over to affect filenames,
+so it is hard to state a general rule.
+.RE
+.SS Locales and Unicode
+.\" If another subsubsection were added to the manual page,
+.\" then this subsection might be demoted to a subsubsection and,
+.\" if appropriate, grouped with the new subsubsection together
+.\" under a new subsection entitled "Further considerations."
+If your application is local rather than international,
+then you can relax POSIX's aforementioned character-set convention
+at your discretion
+by including graphic Unicode characters;
+specifically, by including non-ASCII Unicode characters for which
+.BR iswgraph (3)
+returns true
+in your locale
+or (if your system has it) in the C.UTF-8 locale.
+[For the relationship between
+.BR unicode (7),
+.BR utf-8 (7)
+and
+.BR ascii (7),
+see the respective manual pages.
+Approximately, in brief,
+Unicode is a character set,
+UTF-8 is a byte-oriented scheme
+by which Unicode characters can be encoded,
+and ASCII is both a character set
+and a byte-oriented scheme
+that is a subset of both Unicode and UTF-8.]
+.PP
+To suggest an exact noninternational filenaming rule, other than the
+.BR iswgraph (3)
+rule, for every locale would exceed the scope of this manual page;
+but approximately,
+in a Japanese or French application for instance,
+a filename might respectively
+include kanji ideographs or accented Latin letters.
+Filenames that include kanji ideographs or accented Latin letters
+might be hard for international users to read or type,
+but insofar as such filenames
+exclude spaces, control characters, ASCII symbols
+.RB (like\~ $
+.RB or\~ = ),
+and ASCII punctuators
+other than the three punctuators POSIX recommends,
+such filenames will not normally cause trouble for tools
+and, thus, may be regarded as conventional within the local context.
+.PP
+The use of nonbreaking spaces
+.RB like\~ \eu00A0 ,
+.BR \eu2007 ,
+.B \eu202F
+.RB or\~ \euFEFF
+in filenames is probably inadvisable for most locales, even though
+.BR iswgraph (3)
+returns true.
+[The use of ordinary, breaking spaces
+.RB like\~ \eu0020
+(the familiar ASCII space),
+.BR \eu1680 ,
+.B \eu2000
+.RB through\~ \eu2006 ,
+.BR \eu2008 ,
+.BR \eu2009 ,
+.BR \eu200A ,
+.B \eu205F
+.RB and\~ \eu3000
+is probably also inadvisable, but
+.BR iswgraph (3)
+returns false for those, anyway.]
+.PP
+If calling
+.BR iswgraph (3)
+from a program, incidentally, see also
+.BR mbrtowc (3)
+and
+.BR wcrtomb (3).
+.SS Unconventional filenames
+More than a few files on a typical Linux system,
+occasionally even including standard files
+employed by and/or automatically installed
+by an operating-system distribution,
+have unconventional filenames.
+For example, on a Debian GNU/Linux system,
+some names of files that supply software packages use the
+.RB characters\~ +
+.RB and\~ \(ti
+which, though unconventional in general,
+are normal and expected within that context.
+For another example,
+in the kernel's source, certain filenames use the
+.RB character\~ ,
+to separate a device's designator
+from the name of the device's manufacturer.
+You may have noticed the unconventionally named
+.I lost+found
+directory lurking at a filesystem's root on your computer;
+and there are further examples, as well.
+.PP
+There are many reasons to use unconventional filenames.
+.PP
+It is hard to give a general rule,
+with respect to a particular context,
+as to which unconventional filenames
+are likely to cause practical troubles
+and which are not.
+If unsure, you can avoid troubles by adhering to convention;
+but if you wish or need to depart from convention,
+then the only suggestions this manual page would make are
+.IP \(bu
+that unconventional filenames not be used without context;
+.IP \(bu
+that unconventional filenames not be used without reason;
+.IP \(bu
+that, even where filenames are unconventional, the recommendations of
+.B Special semantics
+above still be followed if practicable;
+.IP \(bu
+that, where several unconventionally named files are collected,
+the use of unconventional characters be systematic
+(for example,
+.IR 16:30.log ,
+.IR 16:45.log ,
+.I 17:00.log
+and so on);
+.IP \(bu
+that, even if unconventional symbols or punctuators
+are employed within filenames,
+one think twice before
+.I beginning
+a filename with an unconventional symbol or punctuator;
+specifically, before beginning a filename
+with a nonalphanumeric ASCII character
+other than the full stop or low line
+(consider for example a filename that began with
+.RB the\~ \(ti
+.RB or\~ $
+symbol, which a shell might misinterpret as if it were a reference
+to a home directory or shell parameter);
+.IP \(bu
+that, even if non-POSIX characters are used,
+non-ASCII characters be avoided
+to the extent to which the application is international;
+.IP \(bu
+that the shell's four standard globbing
+.RB characters\~ *?[]
+be avoided in most instances; and
+.IP \(bu
+that, even if none of the other suggestions is followed,
+control characters be avoided in any event,
+.I control characters
+being characters, including the
+.RB tab\~ \et
+and
+.RB line-feed\~ \en
+characters, for which
+.BR iswcntrl (3)
+returns true.
+(Note that,
+although the use of the space in filenames
+contravenes POSIX and anyway annoys many Linux users,
+the space is the sole nongraphic ASCII character
+that, by definition, is not a control character.
+Spaces in filenames are unconventional and perhaps inadvisable,
+but they are hardly unusual;
+whereas tabs and line feeds are,
+for good reason,
+practically never seen.)
+.SH CONFORMING TO
+POSIX.1-2001 and later.
+.SH SEE ALSO
+.BR ls (1),
+.BR sh (1),
+.BR ext4 (5),
+.BR filesystems (5),
+.BR ascii (7),
+.BR locale (7),
+.BR unicode (7),
+.BR utf-8 (7)
+.PP
+info
+.B coreutils
+.\" The author, Thaddeus H. Black, thanks
+.\" his wife Kristie, daughter Naomi and son George
+.\" for their review and proofreading
+.\" of various parts of this manual page; and also thanks
+.\" Alejandro Colomar, G. Branden Robinson and Hendrik Boom
+.\" for their subsequent inspection, discussion and revision.
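For readers who want to check the page's rules mechanically, the "Legal filenames" and "Conventional filenames" subsections can be approximated in C as below. This is only a minimal sketch under stated assumptions. It is not part of the patch and not an existing libc or kernel interface. The helper names (is_legal_filename, is_portable_char, is_conventional_filename, is_locally_conventional) and the SKETCH_NAME_MAX macro are hypothetical. The fixed 255-byte limit stands in for the per-filesystem value that pathconf(3) with _PC_NAME_MAX would report. The locale-dependent variant also assumes that the caller has already called setlocale(3).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Assumption: the 255-byte limit stated under "Long names".  Real code
 * should ask the filesystem, e.g. pathconf(dir, _PC_NAME_MAX). */
#define SKETCH_NAME_MAX 255

/* True if NAME obeys the "Legal filenames" rules.  Because NAME is a
 * NUL-terminated C string, it cannot contain an embedded null byte. */
static bool is_legal_filename(const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name);

    if (len == 0 || len > SKETCH_NAME_MAX)
        return false;                       /* empty or too long */
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return false;                       /* reserved names */
    return strchr(name, '/') == NULL;       /* the solidus is reserved */
}

/* True if C is one of the 65 characters of the POSIX Portable
 * Filename Character Set: A-Z, a-z, 0-9, '.', '_' and '-'. */
static bool is_portable_char(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-';
}

/* True if NAME is legal, uses only portable characters and does not
 * begin with a hyphen-minus (see "Special semantics"). */
static bool is_conventional_filename(const char *name)
{
    if (!is_legal_filename(name) || name[0] == '-')
        return false;
    for (const char *p = name; *p != '\0'; p++)
        if (!is_portable_char(*p))
            return false;
    return true;
}

/* Relaxed, locale-dependent variant ("Locales and Unicode"): ASCII
 * bytes must still be portable, but any non-ASCII character for which
 * iswgraph() is true is accepted.  The caller is assumed to have set
 * LC_CTYPE, e.g. with setlocale(LC_ALL, "").  This follows iswgraph()
 * literally, so nonbreaking spaces may be accepted. */
static bool is_locally_conventional(const char *name)
{
    if (!is_legal_filename(name) || name[0] == '-')
        return false;

    mbstate_t state;
    memset(&state, 0, sizeof state);
    const char *p = name;
    size_t left = strlen(name);

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, p, left, &state);

        if (n == (size_t)-1 || n == (size_t)-2 || n == 0)
            return false;                   /* invalid or incomplete */
        if (n == 1 && (unsigned char)*p < 0x80) {
            if (!is_portable_char(*p))
                return false;               /* e.g. space, '$', '=' */
        } else if (!iswgraph((wint_t)wc)) {
            return false;                   /* nongraphic non-ASCII */
        }
        p += n;
        left -= n;
    }
    return true;
}
```

For example, is_conventional_filename() accepts "my-archive.tar.xz" and ".profile" but rejects "m=3", "-v" and "My Document.txt". All three of those names nevertheless pass is_legal_filename().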
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
* Re: [PATCH v2] filename.7: new manual page
2021-10-17 23:07 [PATCH v2] filename.7: new manual page Thaddeus H. Black
@ 2021-10-18 16:25 ` Thaddeus H. Black
2021-10-18 16:33 ` [PATCH v3] " Thaddeus H. Black
1 sibling, 0 replies; 10+ messages in thread
From: Thaddeus H. Black @ 2021-10-18 16:25 UTC
To: linux-man; +Cc: Alejandro Colomar, G. Branden Robinson, Michael Kerrisk
[-- Attachment #1: Type: text/plain, Size: 129 bytes --]
Apparently, my patches have had the wrong format. Please disregard
patch v2. I will submit patch v3 in the right format, next.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
* [PATCH v3] filename.7: new manual page
2021-10-17 23:07 [PATCH v2] filename.7: new manual page Thaddeus H. Black
2021-10-18 16:25 ` Thaddeus H. Black
@ 2021-10-18 16:33 ` Thaddeus H. Black
2021-10-19 8:54 ` Florian Weimer
2021-10-19 13:38 ` Alejandro Colomar (man-pages)
1 sibling, 2 replies; 10+ messages in thread
From: Thaddeus H. Black @ 2021-10-18 16:33 UTC
To: linux-man; +Cc: Alejandro Colomar, G. Branden Robinson, Michael Kerrisk
[-- Attachment #1: Type: text/plain, Size: 23623 bytes --]
Please find patch v3 below, via "git format-patch".
(If the format is still wrong, or if there is some other way
in which I should better support the team's workflow, kindly advise.)
---------------------------------------------------------------------------
CHANGES IN v3
---------------------------------------------------------------------------
Thaddeus H. Black (3):
Polish prose under "Special semantics"
Polish and clarify prose under "Letter case"
Clarify the final paragraph under "Locales and Unicode"
---------------------------------------------------------------------------
CHANGES IN v2
---------------------------------------------------------------------------
Hendrik Boom, G. Branden Robinson and Alejandro Colomar (1):
Write "uppercase" and "lowercase" rather than "capital" and "small"
Alejandro Colomar and G. Branden Robinson (2):
Use semantic newlines
Avoid \f, but rather use separate lines
Alejandro Colomar (11):
Use subsections instead of sections
Use subsubsections instead of subsections
Remove unnecessary .P after .S[HS]
Use .PP rather than .P
Fix indentation of paragraph, which continues talking about \0
Mention FAT
For consistency, list "-" with "-name" and ".name"; s/a pair of/some/
Delete the redundant mention of "."
By s/all but/almost/, avoid double negation
Reference filesystems(5) under SEE ALSO
Under CONFORMING TO, write only, "POSIX.1‐2001 and later."
G. Branden Robinson (3):
Write "letter case" rather than "capitalization"
Reference section-3 pages not under SEE ALSO but only in passing
Avoid \c
Thaddeus H. Black (2):
Reword subsubsect "Special semantics" to support Alejandro's change no. 7
Avoid beginning any subsect or subsubsect with specially formatted text
---------------------------------------------------------------------------
GROFF SOURCE v3 (IN GIT'S PATCH FORMAT)
---------------------------------------------------------------------------
---
man7/filename.7 | 660 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 660 insertions(+)
create mode 100644 man7/filename.7
diff --git a/man7/filename.7 b/man7/filename.7
new file mode 100644
index 000000000..9c53f8c7b
--- /dev/null
+++ b/man7/filename.7
@@ -0,0 +1,660 @@
+.\" Copyright (C) 2021 Thaddeus H. Black <thb@debian.org>
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date. The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein. The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.\" 2021-10-18, Thaddeus H. Black <thb@debian.org>
+.\" Wrote the manual page's initial version.
+.\"
+.TH FILENAME 7 2021-10-18 "Linux" "Linux Programmer's Manual"
+.SH NAME
+filename \- requirements and conventions for the naming of files
+.SH DESCRIPTION
+This manual page sets forth requirements for
+and delineates conventions regarding filenames
+on a Linux system,
+where a
+.I filename
+is either (as the word suggests) the name of a regular file
+or the name of another object held by the system's filesystem
+such as a directory, symbolic link, named pipe or device.
+.SS Legal filenames
+A filename on a Linux system can consist
+of almost any sequence of UTF-8 characters
+or, indeed, almost any sequence of bytes.
+The exceptions are as follows.
+.TP
+.B Reserved characters
+.RS
+The following characters are reserved.
+.TP
+.B /
+The solidus is reserved to separate pathname components
+as for example in
+.IR /usr/share/doc ,
+each component being itself a filename.
+For this reason, no filename may include a solidus.
+More precisely,
+no filename may include the byte that,
+in ASCII and UTF-8,
+exclusively represents the solidus.
+.TP
+.B \e0
+The null character is reserved for the filesystem to append
+to terminate a filename's representation in memory.
+For this reason, no filename may include a null character.
+More precisely,
+no filename may include the byte that,
+in ASCII and UTF-8,
+exclusively represents the null character.
+(When appended by the filesystem
+to terminate a filename's representation in memory,
+the byte in question is called the
+.I terminating null
+.IR byte .
+Though familiar to\~C programmers,
+the terminating null byte is usually invisible to users.)
+.IP
+Note
+.RB that\~ \e0 ,
+the null character (or null byte), differs
+.RB from\~ 0 ,
+the printable digit-zero character.
+The null character (or null byte)
+is unprintable and registers in ASCII and UTF-8
+as the eight-bit pattern\~0x00,
+whereas the printable digit zero registers as\~0x30
+[see the \(lqHex\(rq column in
+.BR ascii (7)'s
+character table].
+Nothing prevents a filename from including a printable digit zero,
+as for instance the filename
+.I intel-m10-bmc.h
+from the kernel's source does.
+.RE
+.TP
+.B Reserved names
+.RS
+The following names are reserved.
+.TP
+.B .
+The filename consisting of a single full stop
+is reserved to represent the current directory.
+.TP
+.B ..
+The filename consisting of two full stops
+is reserved to represent the parent directory.
+.TP
+(empty)
+The empty filename,
+consisting of no bytes at all
+(except a terminating null byte),
+is not allowed.
+.PP
+The aforementioned current and parent directories are the current
+.I working
+directory and its parent except when
+.RB the\~ .
+.RB or\~ ..
+occurs in the middle or at the end of a pathname,
+in which case the current and parent directories
+are taken relative to preceding pathname elements.
+For example, if the current working directory were
+.IR /home/jsmith ,
+then
+.I ../rjones
+would mean
+.I /home/rjones
+but
+.I foo/bar/../baz
+would mean
+.IR /home/jsmith/foo/baz ,
+whereas
+.I foo/bar/./baz
+would mean
+.IR /home/jsmith/foo/bar/baz .
+.RE
+.TP
+.B Long names
+.RS
+No filename may exceed\~255 bytes in length,
+or\~256 bytes after counting the terminating null byte.
+.RB ( Reserved
+.B characters
+above explains the terminating null byte.)
+.RE
+.TP
+.B Non-UTF-8 names
+.RS
+Filenames need not consist of valid UTF-8 characters
+(although, except where a non-UTF-8 legacy encoding is in use,
+most filenames do).
+As long as the requirements
+of the preceding subsubsections
+are met,
+any sequence of bytes can legally serve as a filename.
+.RE
+.SS Conventional filenames
+Merely because a filename is legal
+does not make its use advisable, though.
+Some legal filenames cause practical troubles.
+For example, the legal filenames
+.IR m=3 ,
+.IR \(tijsmith ,
+.I \-v
+and
+.I My\~Document.txt
+are susceptible to misinterpretation by a shell.
+Workarounds typically exist,
+chiefly via quotation, escape
+and the explicit termination of options processing
+[see
+.BR sh (1)];
+but when reprocessing of shell-command text
+requires requotation and re-escape,
+the workarounds become an inconvenient, confusing, error-prone hassle.
+.PP
+The use of conventional filenames averts the hassle.
+It also makes filenames more recognizable to experienced users.
+.PP
+This subsection introduces broadly observed conventions for filenames.
+.TP
+.B The POSIX Portable Filename Character Set
+.RS
+In general contexts,
+especially for international applications,
+conventional filenames
+are composed using the\~65 ASCII characters
+of the POSIX Portable Filename Character Set.
+The POSIX Portable Filename Character Set consists of the following.
+.TP
+.BR A \- Z
+The\~26 uppercase or capital ASCII letters.
+.TP
+.BR a \- z
+The\~26 lowercase or small ASCII letters.
+.TP
+.BR 0 \- 9
+The ten ASCII digits.
+.TP
+.B . \_ \-
+These three ASCII punctuators: full stop; low line; hyphen-minus.
+.PP
+Special contexts often employ additional characters but,
+in general contexts for international applications,
+conventional filenames exclude characters other than the listed\~65.
+(For noninternational applications, see
+.B Locales and Unicode
+below.)
+.PP
+Observe that the
+.RB space\~ \(aq\0\(aq \~( \eu0020 )
+is not listed despite being an ASCII character.
+Filenames that include spaces
+are often encountered for various reasons in certain contexts,
+but such filenames are unconventional in general
+and are inconvenient to use with tools.
+Within filenames, the low
+.RB line\~ \_
+or
+.RB hyphen-minus\~ \-
+is conventionally employed as necessary instead of the space.
+(See
+.B Unconventional filenames
+and, under
+.B Soft
+.BR conventions ,
+also
+.B Low line versus hyphen-minus
+below.)
+.PP
+Incidentally, uppercase and lowercase letters
+are normally distinct within filenames on a Linux system;
+so, for example,
+.I README
+and
+.I readme
+name two different files.
+(Exception: the FAT filesystem on Linux is case-insensitive,
+so uppercase and lowercase letters
+are indistinct where FAT is in use.
+For further observations regarding letter case, see
+.B Letter case
+under
+.B Soft conventions
+below.)
+.RE
+.TP
+.B Special semantics
+.RS
+Besides the preceding subsubsection's POSIX convention,
+some conventions derived from core utilities
+are almost always respected, as well.
+.TP
+.B \-
+The one-character name
+consisting of a lone hyphen-minus
+is sometimes understood by a shell
+to refer to the previous working directory
+and sometimes understood by tools
+to refer to standard input or standard output.
+Therefore, no conventional filename
+consists of a lone hyphen-minus.
+.TP
+.BR \- name
+A name (other than the just-mentioned one-character name)
+that begins with a hyphen-minus
+is usually interpreted by tools as a
+command-line option rather than as a filename.
+Therefore, no conventional filename
+begins with a hyphen-minus.
+.TP
+.BR . name
+Conventional filenames may indeed begin with the full stop.
+To begin with the full stop is unexceptionable;
+it does not generally cause trouble.
+However, filenames that begin with the full stop
+conventionally designate
+.I hidden files
+(or hidden directories, etc.),
+a familiar example being the
+.I .profile
+typically found in a user's home directory.
+Hidden files behave normally but, by default, are ignored by
+.BR ls (1)
+and certain other tools.
+.RE
+.TP
+.B The full stop to introduce a format extension
+.RS
+Other than at a filename's beginning
+(a case the preceding subsubsection has discussed),
+the full stop is employed in filenames
+for various further conventional purposes.
+No single rule governs all conventional uses of the full stop.
+.PP
+However, except at a filename's beginning,
+the most common conventional use of the full stop
+is to append to a filename's stem
+an extension to indicate the format of the file's contents.
+An example is the filename
+.IR UnicodeData.txt ,
+in which
+.I UnicodeData
+is the stem and
+.RI the\~ .txt
+indicates that the file contains plain text.
+Multiple format extensions are even appended to some filename stems,
+as in
+.I my-archive.tar.xz
+for instance, which is the name of a tape archive
+.I my-archive.tar
+that, after archival, has subsequently been compressed by
+.BR xz (1).
+.PP
+The format-extension convention is almost universally recognized.
+Even nontechnical users are typically familiar with it.
+However, many users also employ full stops
+for purposes unrelated to format extensions;
+and they do so often enough
+that such unrelated usage can hardly be called unconventional.
+Except at a filename's beginning,
+convention supports free use of the full stop.
+.PP
+.I You
+may reserve the full stop solely
+to append format extensions if you wish,
+of course.
+Many users do.
+.PP
+.\" The next sentence has been corrected
+.\" according to Charles Plessy's helpful advice
+.\" [https://lists.debian.org/debian-devel/2021/08/msg00557.html].
+(If your machine is configured as a desktop or laptop
+rather than as a server,
+then you can probably find a fairly comprehensive catalog
+of conventional filename extensions,
+identifying the format each extension implies,
+on your machine in a file such as
+.I /etc/mime.types
+or
+.IR /usr/share/mime/globs .)
+.RE
+.SS Soft conventions
+Further filenaming conventions are softer.
+Though often observed,
+such softer conventions can be bent or broken
+without rendering filenames unconventional.
+.PP
+This subsection introduces soft conventions for filenames.
+.TP
+.B Low line versus hyphen-minus
+.RS
+Whether to use the low
+.RB line\~ \_
+or the
+.RB hyphen-minus\~ \-
+in filenames is a matter of preference.
+Except as stated above,
+convention does not strongly prefer the one over the other.
+.PP
+If you would like advice anyway,
+then the kernel's source sets an example.
+Most filenames in the kernel's source prefer the hyphen-minus.
+You can do the same if you wish.
+.PP
+Even if you prefer the hyphen-minus, though,
+some exceptions arise, as follows.
+.IP \(bu
+The contents of a program's source files usually designate various
+.I entities
+such as variables, functions, types and so forth.
+In\~C and similar programming languages,
+the hyphen-minus is a minus sign,
+so the designations of entities must use the low line, instead.
+Where a file is named after an entity the file introduces,
+the filename should use low lines as the entity's designation does.
+Examples include the file
+.IR lock\_events.h ,
+which introduces the entity
+.IR lock\_events ,
+in the kernel's source.
+.IP \(bu
+Where distinct separators with different semantics are required,
+a filename can use the low line as an alternate separator.
+Examples include the file
+.IR coreutils\_8.30-3\_amd64.deb ,
+which provides revision\~3 of the Debian binary package
+that installs version\~8.30 of the GNU core utilities
+for the amd64/x86-64 architecture.
+.IP \(bu
+Occasionally, the name of a file
+that provides
+private, internal, ephemeral, uninterfaceable or undocumented aspects
+of an implementation
+will
+.I begin
+with a low line to hint that the file
+.RS
+.IP +
+does not require the user's or programmer's attention or
+.IP +
+is unsuitable for external agents to access directly.
+.RE
+.IP
+Examples include the file
+.\" On the author's PC using Groff's default output device,
+.\" Groff typesets the next line's italicized low line inconsistently
+.\" compared to the manual page's other italicized low lines.
+.\" Presumably, Groff does this
+.\" because the low line in question begins its word
+.\" (though why Groff thinks beginning the word significant
+.\" is unclear), but the inconsistency is slightly distracting.
+.I \_sd-common.h
+in systemd's source.
+.IP \(bu
+Sometimes, the low line
+stands for an unspecified letter of the alphabet.
+.PP
+Otherwise,
+although the low line and the hyphen-minus
+are both conventional,
+if you want advice,
+prefer the hyphen-minus.
+.RE
+.TP
+.B Letter case
+.RS
+A loosely observed convention
+favors lowercase letters in filenames
+where no reason to use uppercase exists.
+Many exceptions occur, though, as for example the oft-encountered
+.I Makefile
+that instructs
+.BR make (1)
+how to build an executable program or other autogeneratable file.
+.PP
+The reason convention favors lowercase
+is that the general use of lowercase
+leaves uppercase to be employed for emphasis.
+Where the default\~C (or C.UTF-8) locale is in use,
+the uppercase ASCII letters
+are collated before all the lowercase ones,
+whereby
+.BR ls (1)
+lists filenames like
+.I Makefile
+and
+.I README
+before filenames like
+.I a.out
+and
+.IR foo.c .
+[If your locale causes
+.BR ls (1)
+to collate differently
+when you would have preferred the just-described default collation,
+then try
+.B LC\_ALL=C ls
+or
+.B LC\_ALL=C.UTF-8 ls
+to suppress the locale.
+See
+.BR locale (7).]
+.PP
+Despite the foregoing,
+programming styles in languages like\~C++ and Python
+occasionally uppercase some letters
+in the names of types and of certain other entities.
+Such styles can spill over to affect filenames.
+For this among other reasons,
+it is hard to state a comprehensive rule regarding letter casing.
+.PP
+Nevertheless, if in doubt,
+prefer lowercase where no reason to use uppercase exists.
+.RE
+.SS Locales and Unicode
+.\" If another subsubsection were added to the manual page,
+.\" then this subsection might be demoted to a subsubsection and,
+.\" if appropriate, grouped with the new subsubsection together
+.\" under a new subsection entitled "Further considerations."
+If your application is local rather than international,
+then you can relax POSIX's aforementioned character-set convention
+at your discretion
+by including graphic Unicode characters;
+specifically, by including non-ASCII Unicode characters for which
+.BR iswgraph (3)
+returns true
+in your locale
+or (if your system has it) in the C.UTF-8 locale.
+[For the relationship between
+.BR unicode (7),
+.BR utf-8 (7)
+and
+.BR ascii (7),
+see the respective manual pages.
+Approximately, in brief,
+Unicode is a character set,
+UTF-8 is a byte-oriented scheme
+by which Unicode characters can be encoded,
+and ASCII is both a character set
+and a byte-oriented scheme
+that is a subset of both Unicode and UTF-8.]
+.PP
+Suggesting an exact noninternational filenaming rule for every locale,
+beyond the
+.BR iswgraph (3)
+rule, would exceed the scope of this manual page;
+but approximately,
+in a Japanese or French application for instance,
+a filename might respectively
+include kanji ideographs or accented Latin letters.
+Filenames that include kanji ideographs or accented Latin letters
+might be hard for international users to read or type,
+but insofar as such filenames
+exclude spaces, control characters, ASCII symbols
+.RB (like\~ $
+.RB or\~ = ),
+and ASCII punctuators
+other than the three punctuators POSIX recommends,
+such filenames will not normally cause trouble for tools
+and, thus, may be regarded as conventional within the local context.
+.PP
+The use of nonbreaking spaces
+.RB like\~ \eu00A0 ,
+.BR \eu2007 ,
+.B \eu202F
+.RB or\~ \euFEFF
+in filenames is probably inadvisable for most locales, even though
+.BR iswgraph (3)
+returns true for them.
+[The use of ordinary, breaking spaces
+.RB like\~ \eu0020
+(the familiar ASCII space),
+.BR \eu1680 ,
+.B \eu2000
+.RB through\~ \eu2006 ,
+.BR \eu2008 ,
+.BR \eu2009 ,
+.BR \eu200A ,
+.B \eu205F
+.RB and\~ \eu3000
+is probably also inadvisable, but
+.BR iswgraph (3)
+returns false for those, anyway.]
+.PP
+If calling
+.BR iswgraph (3),
+incidentally, see also
+.BR mbrtowc (3)
+and
+.BR wcrtomb (3).
+.SS Unconventional filenames
+More than a few files on a typical Linux system,
+occasionally even including standard files
+employed by and/or automatically installed
+by an operating-system distribution,
+have unconventional filenames.
+For example, on a Debian GNU/Linux system,
+some names of files that supply software packages use the
+.RB characters\~ +
+.RB and\~ \(ti
+which, though unconventional in general,
+are normal and expected within that context.
+For another example,
+in the kernel's source, certain filenames use the
+.RB character\~ ,
+to separate a device's designator
+from the name of the device's manufacturer.
+You may have noticed the unconventionally named
+.I lost+found
+directory lurking at a filesystem's root on your computer;
+and there are further examples, as well.
+.PP
+There are many reasons to use unconventional filenames.
+.PP
+It is hard to give a general rule,
+with respect to a particular context,
+as to which unconventional filenames
+are likely to cause practical troubles
+and which are not.
+If unsure, you can avoid troubles by adhering to convention;
+but if you wish or need to depart from convention,
+then the only suggestions this manual page would make are
+.IP \(bu
+that unconventional filenames not be used without context;
+.IP \(bu
+that unconventional filenames not be used without reason;
+.IP \(bu
+that, even where filenames are unconventional, the recommendations of
+.B Special semantics
+above still be followed if practicable;
+.IP \(bu
+that, where several unconventionally named files are collected,
+the use of unconventional characters be systematic
+(for example,
+.IR 16:30.log ,
+.IR 16:45.log ,
+.I 17:00.log
+and so on);
+.IP \(bu
+that, even if unconventional symbols or punctuators
+are employed within filenames,
+one think twice before
+.I beginning
+a filename with an unconventional symbol or punctuator;
+specifically, before beginning a filename
+with a nonalphanumeric ASCII character
+other than the full stop or low line
+(consider for example a filename that began with
+.RB the\~ \(ti
+.RB or\~ $
+symbol, which a shell might misinterpret as if it were a reference
+to a home directory or shell parameter);
+.IP \(bu
+that, even if non-POSIX characters are used,
+non-ASCII characters be avoided
+to the extent to which the application is international;
+.IP \(bu
+that the shell's four standard globbing
+.RB characters\~ *?[]
+be avoided in most instances; and
+.IP \(bu
+that, even if none of the other suggestions is followed,
+control characters be avoided in any event,
+.I control characters
+being characters, including the
+.RB tab\~ \et
+and
+.RB line-feed\~ \en
+characters, for which
+.BR iswcntrl (3)
+returns true.
+(Note that,
+although the space lies outside the POSIX Portable Filename Character Set
+and, in filenames, annoys many Linux users,
+it is the sole nongraphic ASCII character
+that, by definition, is not a control character.
+Spaces in filenames are unconventional and perhaps inadvisable,
+but they are hardly unusual;
+whereas tabs and line feeds are,
+for good reason,
+practically never seen.)
+.SH CONFORMING TO
+POSIX.1-2001 and later.
+.SH SEE ALSO
+.BR ls (1),
+.BR sh (1),
+.BR ext4 (5),
+.BR filesystems (5),
+.BR ascii (7),
+.BR locale (7),
+.BR unicode (7),
+.BR utf-8 (7)
+.PP
+info
+.B coreutils
+.\" The author, Thaddeus H. Black, thanks
+.\" his wife Kristie, daughter Naomi and son George
+.\" for their review and proofreading
+.\" of various parts of this manual page; and also thanks
+.\" Alejandro Colomar, G. Branden Robinson and Hendrik Boom
+.\" for their subsequent inspection, discussion and revision.
--
2.30.2
* Re: [PATCH v3] filename.7: new manual page
2021-10-18 16:33 ` [PATCH v3] " Thaddeus H. Black
@ 2021-10-19 8:54 ` Florian Weimer
2021-10-19 11:05 ` Thaddeus H. Black
2021-10-19 13:38 ` Alejandro Colomar (man-pages)
1 sibling, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2021-10-19 8:54 UTC (permalink / raw)
To: Thaddeus H. Black
Cc: linux-man, Alejandro Colomar, G. Branden Robinson,
Michael Kerrisk
* Thaddeus H. Black:
> +.TH FILENAME 7 2021-10-18 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +filename \- requirements and conventions for the naming of files
> +.SH DESCRIPTION
> +This manual page sets forth requirements for
> +and delineates conventions regarding filenames
> +on a Linux system,
> +where a
> +.I filename
> +is either (as the word suggests) the name of a regular file
> +or the name of another object held by the system's filesystem
> +such as a directory, symbolic link, named pipe or device.
Maybe add: “A pathname contains zero or more filenames.”
> +.SS Legal filenames
> +A filename on a Linux system can consist
> +of almost any sequence of UTF-8 characters
> +or, indeed, almost any sequence of bytes.
> +The exceptions are as follows.
> +.TP
> +.B Reserved characters
> +.RS
> +The following characters are reserved.
> +.TP
> +.B /
> +The solidus is reserved to separate pathname components
> +as for example in
> +.IR /usr/share/doc ,
> +each component being itself a filename.
> +For this reason, no filename may include a solidus.
> +More precisely,
> +no filename may include the byte that,
> +in ASCII and UTF-8,
> +exclusively represents the solidus.
What does this mean? I think only byte 0x2f is reserved. The UTF-8
comment is misleading. A historic/overlong encoding of / in multiple
UTF-8 bytes is *not* reserved.
> +.B \e0
> +The null character is reserved for the filesystem to append
> +to terminate a filename's representation in memory.
> +For this reason, no filename may include a null character.
> +More precisely,
> +no filename may include the byte that,
> +in ASCII and UTF-8,
> +exclusively represents the null character.
See above.
> +.B Reserved names
> +.RS
> +The following names are reserved.
> +.TP
> +.B .
> +The filename consisting of a single full stop
> +is reserved to represent the current directory.
> +.TP
> +.B ..
> +The filename consisting of two full stops
> +is reserved to represent the parent directory.
> +.TP
> +(empty)
> +The empty filename,
> +consisting of no bytes at all
> +(except a terminating null byte),
> +is not allowed.
This conflicts with the presentation of / as a separator in pathnames, I
think: The pathname "/usr/" contains two empty filenames.
> +.TP
> +.B Long names
> +.RS
> +No filename may exceed\~255 bytes in length,
> +or\~256 bytes after counting the terminating null byte.
This is not correct for Linux. Despite the definition of NAME_MAX,
filenames can be longer than 255 bytes. NTFS and CIFS have a limit of
255 UTF-16 characters, which translates to about 768 bytes in the UTF-8
encoding used by Linux.
Thanks,
Florian
* Re: [PATCH v3] filename.7: new manual page
2021-10-19 8:54 ` Florian Weimer
@ 2021-10-19 11:05 ` Thaddeus H. Black
2021-10-19 13:55 ` Alejandro Colomar (man-pages)
2021-10-20 8:12 ` Florian Weimer
0 siblings, 2 replies; 10+ messages in thread
From: Thaddeus H. Black @ 2021-10-19 11:05 UTC (permalink / raw)
To: Florian Weimer
Cc: linux-man, Alejandro Colomar, G. Branden Robinson,
Michael Kerrisk
On Tue, Oct 19, 2021 at 10:54:11AM +0200, Florian Weimer wrote:
> Maybe add: “A pathname contains zero or more filenames.”
Okay.
> What does this mean? I think only byte 0x2f is reserved. The UTF-8
> comment is misleading. A historic/overlong encoding of / in multiple
> UTF-8 bytes is *not* reserved.
I had not known that UTF-8 had an alternate encoding for any ASCII
character. Does it indeed have an alternate encoding? If so, where
can I learn more?
The new filename(7) manual page wishes to be correct but, otherwise,
would like to inflict upon the reader as little difficult technical
prose as it can. The page wants to remain readable. In this light, can
you advise me how the page should speak to your point?
> This conflicts with the presentation of / as a separator in pathnames, I
> think: The pathname "/usr/" contains two empty filenames.
I had not thought of that. Good point.
Thus, the empty filename is not forbidden but rather is reserved.
> > +No filename may exceed\~255 bytes in length,
> > +or\~256 bytes after counting the terminating null byte.
>
> This is not correct for Linux. Despite the definition of NAME_MAX,
> filenames can be longer than 255 bytes. NTFS and CIFS have a limit of
> 255 UTF-16 characters, which translates to about 768 bytes in the UTF-8
> encoding used by Linux.
I see.
Your feedback is helpful and appreciated (especially since you are the
first Fedora-class user to return a review).
* Re: [PATCH v3] filename.7: new manual page
2021-10-18 16:33 ` [PATCH v3] " Thaddeus H. Black
2021-10-19 8:54 ` Florian Weimer
@ 2021-10-19 13:38 ` Alejandro Colomar (man-pages)
2021-11-07 14:36 ` Thaddeus H. Black
1 sibling, 1 reply; 10+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-10-19 13:38 UTC (permalink / raw)
To: Thaddeus H. Black; +Cc: G. Branden Robinson, linux-man, Michael Kerrisk
Hi Thaddeus,
On 10/18/21 6:33 PM, Thaddeus H. Black wrote:
> Please find patch v3 below, via "git format-patch".
> (If the format is still wrong, or if there is some other way
> in which I should better support the team's work flow, kindly advise.)
'git format-patch' is the preferred method :)
What I missed here is the long (and great) commit message from v1, which
I'm going to save as the commit message. Please, when you send v4,
include the original text.
Ephemeral stuff that should not go into the commit message (like
changelogs between versions of the patch), you can put it above, in a
"scissor patch" format (see git-format-patch(1) if necessary).
Or if it's short/simple enough, below the '---' (and just above the
patch itself; it is actually ignored by git, unless it's so complex that
it is misinterpreted as part of the patch). This method is what I
usually use, since it doesn't require specifying '--scissors', and I
usually only write normal text that can't be confused with the patch. I
don't know if this is documented anywhere, but it's very useful.
Thanks,
Alex
>
> ---------------------------------------------------------------------------
> CHANGES IN v3
> ---------------------------------------------------------------------------
>
> Thaddeus H. Black (3):
> Polish prose under "Special semantics"
> Polish and clarify prose under "Letter case"
> Clarify the final paragraph under "Locales and Unicode"
>
> ---------------------------------------------------------------------------
> CHANGES IN v2
> ---------------------------------------------------------------------------
>
> Hendrik Boom, G. Branden Robinson and Alejandro Colomar (1):
> Write "uppercase" and "lowercase" rather than "capital" and "small"
>
> Alejandro Colomar and G. Branden Robinson (2):
> Use semantic newlines
> Avoid \f, but rather use separate lines
>
> Alejandro Colomar (11):
> Use subsections instead of sections
> Use subsubsections instead of subsections
> Remove unnecessary .P after .S[HS]
> Use .PP rather than .P
> Fix indentation of paragraph, which continues talking about \0
> Mention FAT
> For consistency, list "-" with "-name" and ".name"; s/a pair of/some/
> Delete the redundant mention of "."
> By s/all but/almost/, avoid double negation
> Reference filesystems(5) under SEE ALSO
> Under CONFORMING TO, write only, "POSIX.1‐2001 and later."
>
> G. Branden Robinson (3):
> Write "letter case" rather than "capitalization"
> Reference section-3 pages not under SEE ALSO but only in passing
> Avoid \c
>
> Thaddeus H. Black (2):
> Reword subsubsect "Special semantics" to support Alejandro's change no. 7
> Avoid beginning any subsect or subsubsect with specially formatted text
>
> ---------------------------------------------------------------------------
> GROFF SOURCE v3 (IN GIT'S PATCH FORMAT)
> ---------------------------------------------------------------------------
>
> ---
Here you can write your patch changelogs.
> man7/filename.7 | 660 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 660 insertions(+)
> create mode 100644 man7/filename.7
>
> diff --git a/man7/filename.7 b/man7/filename.7
> new file mode 100644
> index 000000000..9c53f8c7b
> --- /dev/null
> +++ b/man7/filename.7
> @@ -0,0 +1,660 @@
> +.\" Copyright (C) 2021 Thaddeus H. Black <thb@debian.org>
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
* Re: [PATCH v3] filename.7: new manual page
2021-10-19 11:05 ` Thaddeus H. Black
@ 2021-10-19 13:55 ` Alejandro Colomar (man-pages)
2021-10-20 8:12 ` Florian Weimer
1 sibling, 0 replies; 10+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-10-19 13:55 UTC (permalink / raw)
To: Thaddeus H. Black, Florian Weimer
Cc: linux-man, Michael Kerrisk, G. Branden Robinson
Hi, Florian!
On 10/19/21 1:05 PM, Thaddeus H. Black wrote:
> On Tue, Oct 19, 2021 at 10:54:11AM +0200, Florian Weimer wrote:
>> Maybe add: “A pathname contains zero or more filenames.”
>
> Okay.
>
>> What does this mean? I think only byte 0x2f is reserved. The UTF-8
>> comment is misleading. A historic/overlong encoding of / in multiple
>> UTF-8 bytes is *not* reserved.
>
> I had not known that UTF-8 had an alternate encoding for any ASCII
> character. Does it indeed have an alternate encoding? If so, where
> can I learn more?
>
> The new filename(7) manual page wishes to be correct but, otherwise,
> would like to inflict upon the reader as little difficult technical
> prose as it can. The page wants to remain readable. In this light, can
> you advise me how the page should speak to your point?
>
>> This conflicts with the presentation of / as a separator in pathnames, I
>> think: The pathname "/usr/" contains two empty filenames.
>
> I had not thought of that. Good point.
>
> Thus, the empty filename is not forbidden but rather is reserved.
Not according to POSIX:
<https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_271>
[
3.271 Pathname
A string that is used to identify a file. In the context of
POSIX.1-2017, a pathname may be limited to {PATH_MAX} bytes, including
the terminating null byte. It has optional beginning <slash> characters,
followed by zero or more filenames separated by <slash> characters. A
pathname can optionally contain one or more trailing <slash> characters.
Multiple successive <slash> characters are considered to be the same as
one <slash>, except for the case of exactly two leading <slash> characters.
Note:
If a pathname consists of only bytes corresponding to characters
from the portable filename character set (see Portable Filename
Character Set), <slash> characters, and a single terminating <NUL>
character, the pathname will be usable as a character string in all
supported locales; otherwise, the pathname might only be a string
(rather than a character string). Additionally, since the single-byte
encoding of the <slash> character is required to be the same across all
locales and to not occur within a multi-byte character, references to a
<slash> character within a pathname are well-defined even when the
pathname is not a character string. However, this property does not
necessarily hold for the remaining characters within the portable
filename character set.
Pathname Resolution is defined in detail in Pathname Resolution.
3.272 Pathname Component
See Filename in Filename.
]
[
3.170 Filename
A sequence of bytes consisting of 1 to {NAME_MAX} bytes used to name a
file. The bytes composing the name shall not contain the <NUL> or
<slash> characters. In the context of a pathname, each filename shall be
followed by a <slash> or a <NUL> character; elsewhere, a filename
followed by a <NUL> character forms a string (but not necessarily a
character string). The filenames dot and dot-dot have special meaning. A
filename is sometimes referred to as a "pathname component". See also
Pathname.
Note:
Pathname Resolution is defined in detail in Pathname Resolution .
]
According to the above, there's no optionally-empty always-existing
initial filename in a pathname. It's the initial slash that is
optional, and the first filename is the one that goes after the first
optional slash. That's especially true in some systems such as Cygwin,
which has a special meaning for an initial '//'.
Multiple successive non-initial slashes also don't have empty filenames
between them, but are a single token, equivalent to a single slash,
according to POSIX.
All of the above, AFAIK :)
>
>>> +No filename may exceed\~255 bytes in length,
>>> +or\~256 bytes after counting the terminating null byte.
>>
>> This is not correct for Linux. Despite the definition of NAME_MAX,
>> filenames can be longer than 255 bytes. NTFS and CIFS have a limit of
>> 255 UTF-16 characters, which translates to about 768 bytes in the UTF-8
>> encoding used by Linux.
>
> I see.
>
> Your feedback is helpful and appreciated (especially since you are the
> first Fedora-class user to return a review).
>
Thank you both!
Alex
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
* Re: [PATCH v3] filename.7: new manual page
2021-10-19 11:05 ` Thaddeus H. Black
2021-10-19 13:55 ` Alejandro Colomar (man-pages)
@ 2021-10-20 8:12 ` Florian Weimer
2021-10-21 12:18 ` Thaddeus H. Black
1 sibling, 1 reply; 10+ messages in thread
From: Florian Weimer @ 2021-10-20 8:12 UTC (permalink / raw)
To: Thaddeus H. Black
Cc: linux-man, Alejandro Colomar, G. Branden Robinson,
Michael Kerrisk
* Thaddeus H. Black:
>> What does this mean? I think only byte 0x2f is reserved. The UTF-8
>> comment is misleading. A historic/overlong encoding of / in multiple
>> UTF-8 bytes is *not* reserved.
>
> I had not known that UTF-8 had an alternate encoding for any ASCII
> character. Does it indeed have an alternate encoding? If so, where
> can I learn more?
See the Security Considerations section in the RFC:
<https://datatracker.ietf.org/doc/html/rfc3629#section-10>
Most file systems do not treat file names as UTF-8, so they do not
perform any validation.
Thanks,
Florian
* Re: [PATCH v3] filename.7: new manual page
2021-10-20 8:12 ` Florian Weimer
@ 2021-10-21 12:18 ` Thaddeus H. Black
0 siblings, 0 replies; 10+ messages in thread
From: Thaddeus H. Black @ 2021-10-21 12:18 UTC (permalink / raw)
To: Florian Weimer
Cc: linux-man, Alejandro Colomar, G. Branden Robinson,
Michael Kerrisk
This long email asks for no one's close attention but Florian's. Other
readers can skim the email or skip it, at their discretion.
On Wed, Oct 20, 2021 at 10:12:02AM +0200, Florian Weimer wrote:
> > > What does this mean? I think only byte 0x2f is reserved. The UTF-8
> > > comment is misleading. A historic/overlong encoding of / in multiple
> > > UTF-8 bytes is *not* reserved.
> >
> > I had not known that UTF-8 had an alternate encoding for any ASCII
> > character. Does it indeed have an alternate encoding? If so, where
> > can I learn more?
>
> See the Security Considerations section in the RFC:
>
> <https://datatracker.ietf.org/doc/html/rfc3629#section-10>
>
> Most file systems do not treat file names as UTF-8, so they do not
> perform any validation.
I see. That RFC explains it well: there exists no legal alternate
encoding, but rather several illegal encodings that, were they not
illegal, *would be* alternate encodings. In the case of the solidus,
the legal encoding is 2F but the illegal encodings are
C0 AF
E0 80 AF
F0 80 80 AF
F8 80 80 80 AF
FC 80 80 80 80 AF
This problem has nothing to do with Unicode but is merely an artifact
of UTF-8 -- and that's your point, isn't it? Most filesystems do not
care about UTF-8, so they do not perform any validation.
In view of your advice, I should think about how to rewrite the relevant
prose so that it is neither [i] confusing to inexperienced users
nor [ii] inaccurate.
Question: the filename(7) manual page ought to emphasize the
requirements of filesystems widely deployed for general-purpose use on
standard Linux installations. As far as I know, exactly three such
filesystems exist:
* ext4;
* xfs;
* btrfs.
Do any other such filesystems exist?
Comments:
1. I have heard of reiserfs and reiser4 but have not heard of anyone
actually using them in about 15 years.
2. There are also nfs, iso9660/joliet/rockridge, vfat, ntfs, cifs
and a few others. These are network-oriented, archive-oriented,
special-purpose, foreign and/or compatibility-oriented filesystems. If
the filename(7) manual page mentions the requirements of such
filesystems at all, it should mention them only briefly, in passing.
Otherwise, the page would become too confusing and grow too long.
(Also, I know too little about most of these extra filesystems to write
about them.)
3. Happily, the three main filesystems -- ext4, xfs and btrfs -- all
have similar filename requirements as far as I know.
* Re: [PATCH v3] filename.7: new manual page
2021-10-19 13:38 ` Alejandro Colomar (man-pages)
@ 2021-11-07 14:36 ` Thaddeus H. Black
0 siblings, 0 replies; 10+ messages in thread
From: Thaddeus H. Black @ 2021-11-07 14:36 UTC (permalink / raw)
To: Alejandro Colomar (man-pages)
Cc: G. Branden Robinson, linux-man, Michael Kerrisk
[-- Attachment #1: Type: text/plain, Size: 1464 bytes --]
On Tue, Oct 19, 2021 at 03:38:09PM +0200, Alejandro Colomar (man-pages) wrote:
> 'git format-patch' is the preferred method :)
>
> What I missed here is the long (and great) commit message from v1, which I'm
> going to save as the commit message. Please, when you send v4, include the
> original text.
Okay.
> Ephemeral stuff that should not go into the commit message (like changelogs
> between versions of the patch), you can put it above, in a "scissor patch"
> format (see git-format-patch(1) if necessary).
>
> Or if it's short/simple enough, below the '---' (and just above the patch
> itself; it is actually ignored by git, unless it's so complex that it is
> misinterpreted as part of the patch). This method is what I usually use,
> since it doesn't require specifying '--scissors', and I usually only write
> normal text that can't be confused with the patch. I don't know if this is
> documented anywhere, but it's very useful.
In view of the feedback, v4 requires substantial rewriting. Rewritten
text wants polishing, and polishing takes time.
In short, I haven't forgotten, but v4 is not yet ready.
Note to other readers: if any latecomers to the conversation wish to
suggest further significant changes, now would be the time to do so,
because once the draft has been reassembled and repolished, it will be
too late. Nonsubscribers can find v3 at [1].
1: https://www.spinics.net/lists/linux-man/msg21267.html
end of thread, other threads:[~2021-11-07 15:21 UTC | newest]
Thread overview: 10+ messages
2021-10-17 23:07 [PATCH v2] filename.7: new manual page Thaddeus H. Black
2021-10-18 16:25 ` Thaddeus H. Black
2021-10-18 16:33 ` [PATCH v3] " Thaddeus H. Black
2021-10-19 8:54 ` Florian Weimer
2021-10-19 11:05 ` Thaddeus H. Black
2021-10-19 13:55 ` Alejandro Colomar (man-pages)
2021-10-20 8:12 ` Florian Weimer
2021-10-21 12:18 ` Thaddeus H. Black
2021-10-19 13:38 ` Alejandro Colomar (man-pages)
2021-11-07 14:36 ` Thaddeus H. Black