* [PATCH] strverscmp.3: this is NOT the ordering used by ls -v
@ 2024-12-15 20:17 Ahelenia Ziemiańska
2024-12-15 20:43 ` Alejandro Colomar
0 siblings, 1 reply; 6+ messages in thread
From: Ahelenia Ziemiańska @ 2024-12-15 20:17 UTC (permalink / raw)
To: alx; +Cc: linux-man
Compare, given:
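#define _GNU_SOURCE  /* strverscmp() is a GNU extension */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* qsort() comparator; each element is a char * from argv */
static int compar(const void *l, const void *r) {
    return strverscmp(*(char *const *)l, *(char *const *)r);
}
int main(int argc, char **argv) {
    qsort(argv + 1, argc - 1, sizeof(*argv), compar);
    for (int i = 1; i < argc; ++i)
        puts(argv[i]);
}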
yields:
$ /bin/ls -v1 a* # coreutils ls
a-1.0a
a-1.0.1a
$ ../vers a* # as above
a-1.0.1a
a-1.0a
$ ls -v1 a* # voreutils ls @ 5781698 with strverscmp()-equivalent sorting
a-1.0.1a
a-1.0a
compare also the results for real data like
netstat-nat-1.{0,1{,.1},2,3.1,4{,.{1,2,3,4,5,6,7,8,9,10}}}.tar.gz
Thus, coreutils ls -v does NOT use strverscmp(3),
it uses a similar algorithm that actually properly sorts versions,
not just single numbers.
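The disagreement on this pair can also be checked directly, without sorting anything. A minimal sketch, assuming glibc's strverscmp(); the helper name vers_sign() is made up for illustration and is not part of any library:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* strverscmp() is a GNU extension */
#endif
#include <string.h>

/* Hypothetical helper: collapse strverscmp()'s result to -1, 0 or 1. */
static int vers_sign(const char *l, const char *r) {
    int c = strverscmp(l, r);
    return (c > 0) - (c < 0);
}
```

vers_sign("a-1.0.1a", "a-1.0a") comes out negative, matching the ../vers output above: the strings first differ at '.' versus 'a', neither byte is a digit, and glibc then falls back to plain byte order. vers_sign("jan2", "jan10") is also negative, which is the numeric case strverscmp() handles.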
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
man/man3/strverscmp.3 | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/man/man3/strverscmp.3 b/man/man3/strverscmp.3
index 41bc1ddbd..7c3643860 100644
--- a/man/man3/strverscmp.3
+++ b/man/man3/strverscmp.3
@@ -25,16 +25,7 @@ .SH DESCRIPTION
orders them
.IR jan1 ", " jan10 ", ..., " jan2 ", ..., " jan9 .
.\" classical solution: "rename jan jan0 jan?"
-In order to rectify this, GNU introduced the
-.I \-v
-option to
-.BR ls (1),
-which is implemented using
-.BR versionsort (3),
-which again uses
-.BR strverscmp ().
-.P
-Thus, the task of
+The task of
.BR strverscmp ()
is to compare two strings and find the "right" order, while
.BR strcmp (3)
@@ -44,6 +35,10 @@ .SH DESCRIPTION
.BR LC_COLLATE ,
so is meant mostly for situations
where the strings are expected to be in ASCII.
+This is not actually the ordering produced by
+.BR ls (1)
+.BR -v .
+.\" because it considers a-1.0.1a < a-1.0a; this is not what you want
.P
What this function does is the following.
If both strings are equal, return 0.
--
2.39.5
* Re: [PATCH] strverscmp.3: this is NOT the ordering used by ls -v
2024-12-15 20:17 [PATCH] strverscmp.3: this is NOT the ordering used by ls -v Ahelenia Ziemiańska
@ 2024-12-15 20:43 ` Alejandro Colomar
2024-12-15 21:02 ` [PATCH v2] " наб
0 siblings, 1 reply; 6+ messages in thread
From: Alejandro Colomar @ 2024-12-15 20:43 UTC (permalink / raw)
To: Ahelenia Ziemiańska; +Cc: linux-man
Hi nab,

On Sun, Dec 15, 2024 at 09:17:59PM +0100, Ahelenia Ziemiańska wrote:
> Compare, given:
> #define _GNU_SOURCE  /* strverscmp() is a GNU extension */
> #include <stdlib.h>
> #include <stdio.h>
> #include <string.h>
> /* qsort() comparator; each element is a char * from argv */
> static int compar(const void *l, const void *r) {
>     return strverscmp(*(char *const *)l, *(char *const *)r);
> }
> int main(int argc, char **argv) {
>     qsort(argv + 1, argc - 1, sizeof(*argv), compar);
>     for (int i = 1; i < argc; ++i)
>         puts(argv[i]);
> }
> yields:
> $ /bin/ls -v1 a* # coreutils ls
> a-1.0a
> a-1.0.1a
> $ ../vers a* # as above
> a-1.0.1a
> a-1.0a
> $ ls -v1 a* # voreutils ls @ 5781698 with strverscmp()-equivalent sorting
> a-1.0.1a
> a-1.0a

Should we file a bug against glibc strverscmp(3)? We probably should.

> compare also the results for real data like
> netstat-nat-1.{0,1{,.1},2,3.1,4{,.{1,2,3,4,5,6,7,8,9,10}}}.tar.gz
>
> Thus, coreutils ls -v does NOT use strverscmp(3),
> it uses a similar algorithm that actually properly sorts versions,
> not just single numbers.

First time I learn about ls(1) having a -v option. :|
Were people too lazy to type `ls | sort -V`?

>
> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
> ---
> man/man3/strverscmp.3 | 15 +++++----------
> 1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/man/man3/strverscmp.3 b/man/man3/strverscmp.3
> index 41bc1ddbd..7c3643860 100644
> --- a/man/man3/strverscmp.3
> +++ b/man/man3/strverscmp.3
> @@ -25,16 +25,7 @@ .SH DESCRIPTION
> orders them
> .IR jan1 ", " jan10 ", ..., " jan2 ", ..., " jan9 .
> .\" classical solution: "rename jan jan0 jan?"
> -In order to rectify this, GNU introduced the
> -.I \-v
> -option to
> -.BR ls (1),
> -which is implemented using
> -.BR versionsort (3),
> -which again uses
> -.BR strverscmp ().
> -.P
> -Thus, the task of
> +The task of
> .BR strverscmp ()
> is to compare two strings and find the "right" order, while
> .BR strcmp (3)
> @@ -44,6 +35,10 @@ .SH DESCRIPTION
> .BR LC_COLLATE ,
> so is meant mostly for situations
> where the strings are expected to be in ASCII.
> +This is not actually the ordering produced by
> +.BR ls (1)
> +.BR -v .
> +.\" because it considers a-1.0.1a < a-1.0a; this is not what you want

I hate this reference to ls(1). ls(1) should not even have a -v option.
Please refer to sort(1) instead. I would wipe any references to file
names in this page, as I don't think they are relevant at all.

And the reference to sort(1), I'd put it in BUGS, saying that this API
is broken, and does not sort properly. Sounds good?

Have a lovely night!
Alex

> .P
> What this function does is the following.
> If both strings are equal, return 0.
> --
> 2.39.5

--
<https://www.alejandro-colomar.es/>
* [PATCH v2] strverscmp.3: this is NOT the ordering used by ls -v
2024-12-15 20:43 ` Alejandro Colomar
@ 2024-12-15 21:02 ` наб
2024-12-15 21:44 ` Alejandro Colomar
0 siblings, 1 reply; 6+ messages in thread
From: наб @ 2024-12-15 21:02 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: linux-man
On Sun, Dec 15, 2024 at 09:43:58PM +0100, Alejandro Colomar wrote:
> On Sun, Dec 15, 2024 at 09:17:59PM +0100, Ahelenia Ziemiańska wrote:
> > Compare, given:
> > #define _GNU_SOURCE  /* strverscmp() is a GNU extension */
> > #include <stdlib.h>
> > #include <stdio.h>
> > #include <string.h>
> > /* qsort() comparator; each element is a char * from argv */
> > static int compar(const void *l, const void *r) {
> >     return strverscmp(*(char *const *)l, *(char *const *)r);
> > }
> > int main(int argc, char **argv) {
> >     qsort(argv + 1, argc - 1, sizeof(*argv), compar);
> >     for (int i = 1; i < argc; ++i)
> >         puts(argv[i]);
> > }
> > yields:
> > $ /bin/ls -v1 a* # coreutils ls
> > a-1.0a
> > a-1.0.1a
> > $ ../vers a* # as above
> > a-1.0.1a
> > a-1.0a
> > $ ls -v1 a* # voreutils ls @ 5781698 with strverscmp()-equivalent sorting
> > a-1.0.1a
> > a-1.0a
> Should we file a bug against glibc strverscmp(3)? We probably should.
>
> And the reference to sort(1), I'd put it in BUGS, saying that this API
> is broken, and does not sort properly. Sounds good?

No, this API works as-documented, and the implementation is useful.
It's just not what ls -v does.

> > @@ -44,6 +35,10 @@ .SH DESCRIPTION
> > .BR LC_COLLATE ,
> > so is meant mostly for situations
> > where the strings are expected to be in ASCII.
> > +This is not actually the ordering produced by
> > +.BR ls (1)
> > +.BR -v .
> > +.\" because it considers a-1.0.1a < a-1.0a; this is not what you want
> Please refer to sort(1) instead. I would wipe any references to file
> names in this page, as I don't think they are relevant at all.
Applied in scissor-patch, below

Best,
-- >8 --
From: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Subject: [PATCH] strverscmp.3: this is NOT the ordering used by ls -v

Compare, given:
#define _GNU_SOURCE  /* strverscmp() is a GNU extension */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* qsort() comparator; each element is a char * from argv */
static int compar(const void *l, const void *r) {
    return strverscmp(*(char *const *)l, *(char *const *)r);
}
int main(int argc, char **argv) {
    qsort(argv + 1, argc - 1, sizeof(*argv), compar);
    for (int i = 1; i < argc; ++i)
        puts(argv[i]);
}
yields:
$ /bin/ls -v1 a* # coreutils ls
a-1.0a
a-1.0.1a
$ ../vers a* # as above
a-1.0.1a
a-1.0a
$ ls -v1 a* # voreutils ls @ 5781698 with strverscmp()-equivalent sorting
a-1.0.1a
a-1.0a
compare also the results for real data like
netstat-nat-1.{0,1{,.1},2,3.1,4{,.{1,2,3,4,5,6,7,8,9,10}}}.tar.gz

Thus, coreutils ls -v does NOT use strverscmp(3),
it uses a similar algorithm that actually properly sorts versions,
not just single numbers.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
man/man3/strverscmp.3 | 23 ++++++++---------------
1 file changed, 8 insertions(+), 15 deletions(-)
diff --git a/man/man3/strverscmp.3 b/man/man3/strverscmp.3
index 41bc1ddbd..65346410c 100644
--- a/man/man3/strverscmp.3
+++ b/man/man3/strverscmp.3
@@ -18,25 +18,14 @@ .SH SYNOPSIS
.BI "int strverscmp(const char *" s1 ", const char *" s2 );
.fi
.SH DESCRIPTION
-Often one has files
+For a dataset like
.IR jan1 ", " jan2 ", ..., " jan9 ", " jan10 ", ..."
-and it feels wrong when
-.BR ls (1)
-orders them
+sorting it lexicographically yields
.IR jan1 ", " jan10 ", ..., " jan2 ", ..., " jan9 .
.\" classical solution: "rename jan jan0 jan?"
-In order to rectify this, GNU introduced the
-.I \-v
-option to
-.BR ls (1),
-which is implemented using
-.BR versionsort (3),
-which again uses
-.BR strverscmp ().
-.P
-Thus, the task of
+The task of
.BR strverscmp ()
-is to compare two strings and find the "right" order, while
+is to compare two strings yielding the former order, while
.BR strcmp (3)
finds only the lexicographic order.
This function does not use
@@ -44,6 +33,10 @@ .SH DESCRIPTION
.BR LC_COLLATE ,
so is meant mostly for situations
where the strings are expected to be in ASCII.
+This is different from the ordering produced by
+.BR sort (1)
+.BR -V .
+.\" because it considers a-1.0.1a < a-1.0a; this is not what you want
.P
What this function does is the following.
If both strings are equal, return 0.
--
2.39.5
* Re: [PATCH v2] strverscmp.3: this is NOT the ordering used by ls -v
2024-12-15 21:02 ` [PATCH v2] " наб
@ 2024-12-15 21:44 ` Alejandro Colomar
2024-12-16 1:00 ` [PATCH v3] " наб
0 siblings, 1 reply; 6+ messages in thread
From: Alejandro Colomar @ 2024-12-15 21:44 UTC (permalink / raw)
To: наб; +Cc: linux-man
Hi,

On Sun, Dec 15, 2024 at 10:02:42PM +0100, наб wrote:
> > Should we file a bug against glibc strverscmp(3)? We probably should.
> >
> > And the reference to sort(1), I'd put it in BUGS, saying that this API
> > is broken, and does not sort properly. Sounds good?
> No, this API works as-documented, and the implementation is useful.

What does useful mean?

> It's just not what ls -v does.

While version sort isn't something standard, I think GNU should be
self-consistent.

> @@ -44,6 +33,10 @@ .SH DESCRIPTION
> .BR LC_COLLATE ,
> so is meant mostly for situations
> where the strings are expected to be in ASCII.
> +This is different from the ordering produced by
> +.BR sort (1)
> +.BR -V .
> +.\" because it considers a-1.0.1a < a-1.0a; this is not what you want

Re: "it": sort(1) -V or strverscmp(3)? (it's the latter, I think, but
don't use "it".)

Re: "this is not what you want": Who is "you"? What is "this"? And why
does "you" not want "this"? Please clarify.

Cheers,
Alex

> .P
> What this function does is the following.
> If both strings are equal, return 0.
> --
> 2.39.5
>

--
<https://www.alejandro-colomar.es/>
* [PATCH v3] strverscmp.3: this is NOT the ordering used by ls -v
2024-12-15 21:44 ` Alejandro Colomar
@ 2024-12-16 1:00 ` наб
2024-12-16 9:57 ` Alejandro Colomar
0 siblings, 1 reply; 6+ messages in thread
From: наб @ 2024-12-16 1:00 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: linux-man
On Sun, Dec 15, 2024 at 10:44:26PM +0100, Alejandro Colomar wrote:
> On Sun, Dec 15, 2024 at 10:02:42PM +0100, наб wrote:
> > > Should we file a bug against glibc strverscmp(3)? We probably should.
> > >
> > > And the reference to sort(1), I'd put it in BUGS, saying that this API
> > > is broken, and does not sort properly. Sounds good?
> > No, this API works as-documented, and the implementation is useful.
> What does useful mean?

There are applications where a lexicographical-except-numeric comparison
like this is what you want (it's most of them). Calling it a "version
sort" is silly + goofy but, whatever.

> > It's just not what ls -v does.
> While version sort isn't something standard, I think GNU should be
> self-consistent.

It is, ls -v and sort -V are consistent.

Having just implemented the /actual/ algorithm they use for voreutils,
that is by far /not/ universally applicable, much hairier, and hard-tuned for
"versions that are kinda like debian describes and sorts them (but not actually)
AND ALSO we put them in filenames where we can assume the format a little bit
AND ALSO {4 special cases to make ls -v work}".
Replacing this well-defined lexicographical-except-numeric sorter with... that,
isn't really applicable.
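To make "lexicographical-except-numeric" concrete, here is a small sketch that sorts an in-memory array with strverscmp() rather than argv. It assumes glibc; sort_versions() and cmp_strvers() are illustrative names, not part of any library:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* strverscmp() is a GNU extension */
#endif
#include <stdlib.h>
#include <string.h>

/* qsort() comparator over an array of const char * elements. */
static int cmp_strvers(const void *l, const void *r) {
    return strverscmp(*(const char *const *)l, *(const char *const *)r);
}

/* Illustrative helper: sort n strings in place by strverscmp() order. */
static void sort_versions(const char **names, size_t n) {
    qsort(names, n, sizeof(*names), cmp_strvers);
}
```

Given the array { "jan10", "jan9", "jan1", "jan2" }, sort_versions() leaves it as jan1, jan2, jan9, jan10, whereas a strcmp(3)-based comparator would put jan10 right after jan1.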
Best,
-- >8 --
From: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Subject: [PATCH v3] strverscmp.3: this is NOT the ordering used by ls -v

Compare, given:
#define _GNU_SOURCE  /* strverscmp() is a GNU extension */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* qsort() comparator; each element is a char * from argv */
static int compar(const void *l, const void *r) {
    return strverscmp(*(char *const *)l, *(char *const *)r);
}
int main(int argc, char **argv) {
    qsort(argv + 1, argc - 1, sizeof(*argv), compar);
    for (int i = 1; i < argc; ++i)
        puts(argv[i]);
}
yields:
$ /bin/ls -v1 a* # coreutils ls
a-1.0a
a-1.0.1a
$ ../vers a* # as above
a-1.0.1a
a-1.0a
$ ls -v1 a* # voreutils ls @ 5781698 with strverscmp()-equivalent sorting
a-1.0.1a
a-1.0a
compare also the results for real data like
netstat-nat-1.{0,1{,.1},2,3.1,4{,.{1,2,3,4,5,6,7,8,9,10}}}.tar.gz

Thus, coreutils ls -v does NOT use strverscmp(3);
it uses a modified Debian version comparison algorithm with additional
suffix processing and ls -v-specific exceptions.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
man/man3/strverscmp.3 | 23 ++++++++---------------
1 file changed, 8 insertions(+), 15 deletions(-)
diff --git a/man/man3/strverscmp.3 b/man/man3/strverscmp.3
index 41bc1ddbd..e028d6788 100644
--- a/man/man3/strverscmp.3
+++ b/man/man3/strverscmp.3
@@ -18,25 +18,14 @@ .SH SYNOPSIS
.BI "int strverscmp(const char *" s1 ", const char *" s2 );
.fi
.SH DESCRIPTION
-Often one has files
+For a dataset like
.IR jan1 ", " jan2 ", ..., " jan9 ", " jan10 ", ..."
-and it feels wrong when
-.BR ls (1)
-orders them
+sorting it lexicographically yields
.IR jan1 ", " jan10 ", ..., " jan2 ", ..., " jan9 .
.\" classical solution: "rename jan jan0 jan?"
-In order to rectify this, GNU introduced the
-.I \-v
-option to
-.BR ls (1),
-which is implemented using
-.BR versionsort (3),
-which again uses
-.BR strverscmp ().
-.P
-Thus, the task of
+The task of
.BR strverscmp ()
-is to compare two strings and find the "right" order, while
+is to compare two strings yielding the former order, while
.BR strcmp (3)
finds only the lexicographic order.
This function does not use
@@ -44,6 +33,10 @@ .SH DESCRIPTION
.BR LC_COLLATE ,
so is meant mostly for situations
where the strings are expected to be in ASCII.
+This is different from the ordering produced by
+.BR sort (1)
+.BR -V .
+.\" sort -V sorts a-1.0a < a-1.0.1a; strverscmp() does not
.P
What this function does is the following.
If both strings are equal, return 0.
--
2.39.5
* Re: [PATCH v3] strverscmp.3: this is NOT the ordering used by ls -v
2024-12-16 1:00 ` [PATCH v3] " наб
@ 2024-12-16 9:57 ` Alejandro Colomar
0 siblings, 0 replies; 6+ messages in thread
From: Alejandro Colomar @ 2024-12-16 9:57 UTC (permalink / raw)
To: наб; +Cc: linux-man
Hi nab,

On Mon, Dec 16, 2024 at 02:00:45AM +0100, наб wrote:
> On Sun, Dec 15, 2024 at 10:44:26PM +0100, Alejandro Colomar wrote:
> > On Sun, Dec 15, 2024 at 10:02:42PM +0100, наб wrote:
> > > > Should we file a bug against glibc strverscmp(3)? We probably should.
> > > >
> > > > And the reference to sort(1), I'd put it in BUGS, saying that this API
> > > > is broken, and does not sort properly. Sounds good?
> > > No, this API works as-documented, and the implementation is useful.
> > What does useful mean?
> There are applications where a lexicographical-except-numeric comparison
> like this is what you want (it's most of them). Calling it a "version
> sort" is silly + goofy but, whatever.

Hmmm, yeah, we can live with that for historical raisins.

> > > It's just not what ls -v does.
> > While version sort isn't something standard, I think GNU should be
> > self-consistent.
> It is, ls -v and sort -V are consistent.
> Having just implemented the /actual/ algorithm they use for voreutils,
> that is by far /not/ universally applicable, much hairier, and hard-tuned for
> "versions that are kinda like debian describes and sorts them (but not actually)
> AND ALSO we put them in filenames where we can assume the format a little bit
> AND ALSO {4 special cases to make ls -v work}".
> Replacing this well-defined lexicographical-except-numeric sorter with... that,
> isn't really applicable.

Sounds reasonable.

>
> Best,
> -- >8 --
> From: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
> Subject: [PATCH v3] strverscmp.3: this is NOT the ordering used by ls -v
>
> Compare, given:
> #define _GNU_SOURCE  /* strverscmp() is a GNU extension */
> #include <stdlib.h>
> #include <stdio.h>
> #include <string.h>
> /* qsort() comparator; each element is a char * from argv */
> static int compar(const void *l, const void *r) {
>     return strverscmp(*(char *const *)l, *(char *const *)r);
> }
> int main(int argc, char **argv) {
>     qsort(argv + 1, argc - 1, sizeof(*argv), compar);
>     for (int i = 1; i < argc; ++i)
>         puts(argv[i]);
> }
> yields:
> $ /bin/ls -v1 a* # coreutils ls
> a-1.0a
> a-1.0.1a
> $ ../vers a* # as above
> a-1.0.1a
> a-1.0a
> $ ls -v1 a* # voreutils ls @ 5781698 with strverscmp()-equivalent sorting
> a-1.0.1a
> a-1.0a
> compare also the results for real data like
> netstat-nat-1.{0,1{,.1},2,3.1,4{,.{1,2,3,4,5,6,7,8,9,10}}}.tar.gz
>
> Thus, coreutils ls -v does NOT use strverscmp(3);
> it uses a modified Debian version comparison algorithm with additional
> suffix processing and ls -v-specific exceptions.
>
> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>

Patch applied. Thanks!

Have a lovely day!
Alex

> ---
> man/man3/strverscmp.3 | 23 ++++++++---------------
> 1 file changed, 8 insertions(+), 15 deletions(-)
>
> diff --git a/man/man3/strverscmp.3 b/man/man3/strverscmp.3
> index 41bc1ddbd..e028d6788 100644
> --- a/man/man3/strverscmp.3
> +++ b/man/man3/strverscmp.3
> @@ -18,25 +18,14 @@ .SH SYNOPSIS
> .BI "int strverscmp(const char *" s1 ", const char *" s2 );
> .fi
> .SH DESCRIPTION
> -Often one has files
> +For a dataset like
> .IR jan1 ", " jan2 ", ..., " jan9 ", " jan10 ", ..."
> -and it feels wrong when
> -.BR ls (1)
> -orders them
> +sorting it lexicographically yields
> .IR jan1 ", " jan10 ", ..., " jan2 ", ..., " jan9 .
> .\" classical solution: "rename jan jan0 jan?"
> -In order to rectify this, GNU introduced the
> -.I \-v
> -option to
> -.BR ls (1),
> -which is implemented using
> -.BR versionsort (3),
> -which again uses
> -.BR strverscmp ().
> -.P
> -Thus, the task of
> +The task of
> .BR strverscmp ()
> -is to compare two strings and find the "right" order, while
> +is to compare two strings yielding the former order, while
> .BR strcmp (3)
> finds only the lexicographic order.
> This function does not use
> @@ -44,6 +33,10 @@ .SH DESCRIPTION
> .BR LC_COLLATE ,
> so is meant mostly for situations
> where the strings are expected to be in ASCII.
> +This is different from the ordering produced by
> +.BR sort (1)
> +.BR -V .
> +.\" sort -V sorts a-1.0a < a-1.0.1a; strverscmp() does not
> .P
> What this function does is the following.
> If both strings are equal, return 0.
> --
> 2.39.5
>

--
<https://www.alejandro-colomar.es/>