From: Alejandro Colomar <alx@kernel.org>
To: Brian.Inglis@Shaw.ca, linux-man@vger.kernel.org
Cc: Deri <deri@chuzzlewit.myzen.co.uk>
Subject: Re: No 6.05/.01 pdf book available
Date: Mon, 28 Aug 2023 23:11:56 +0200
Message-ID: <fab31245-a17a-f472-f570-f5b35d2c79b4@kernel.org>
In-Reply-To: <1435b3f6-b4fb-28b1-3c54-547c9a7e919a@Shaw.ca>


Hi Brian,

On 2023-08-28 20:24, Brian Inglis wrote:
> On 2023-08-28 06:17, Alejandro Colomar wrote:
>> Hi Brian,
>>
>> On 2023-08-22 01:45, Brian Inglis wrote:
>>> I am in favour of all punctuation being treated as word spaces and sorting
>>> "cat ..." before "cat..." but find the real orders more evocative and easier to
>>> decide about than examples.
>>
>> Here's an excerpt of how treating - and _ as spaces looks like.  I think
>> it's a reasonable order.  Should I apply that diff?
>>
>> Cheers,
>> Alex
>>
>> $ git diff
>> diff --git a/scripts/sortman b/scripts/sortman
>> index a8f70bab5..6d1d92f09 100755
>> --- a/scripts/sortman
>> +++ b/scripts/sortman
>> @@ -9,7 +9,7 @@ sed   -E '/\/intro./  s/.*\.([[:digit:]])/\10\t&/' \
>>   | sed -E '            s/\t(.*)/&\n\1/' \
>>   | sed -E '/\t/        s/\.[[:digit:]]([[:alpha:]][[:alnum:]]*)?\>.*//' \
>>   | sed -E '/\t/        s/\/[_-]*/\//g' \
>> -| sed -E '/\t/        s/[_-]/_/g' \
>> +| sed -E '/\t/        s/[_-]/ /g' \
>>   | sed -E '/\t/        {N;s/\n/\t/;}' \
>>   | sort -fV -k1,2 \
>>   | cut -f3;
>> $ touch man8/ld-z.8
>> $ touch man8/ld.8
>> $ find man8 | ./scripts/sortman
>> man8/intro.8
>> man8/iconvconfig.8
>> man8/ld.8
>> man8/ld-linux.8
>> man8/ld-linux.so.8
>> man8/ld-z.8
>> man8/ld.so.8
>> man8/ldconfig.8
>> man8/nscd.8
>> man8/sln.8
>> man8/tzselect.8
>> man8/zdump.8
>> man8/zic.8
>> man8
> 
> Looks better,

Thanks, I've applied and pushed the patch.

> but should your sort *key* field instance also drop the section 
> suffix (already in prefix)

It is already dropped.  Am I understanding you correctly?
Here's a debug patch to view the sort key field:

diff --git a/scripts/sortman b/scripts/sortman
index 6d1d92f09..e690f23ea 100755
--- a/scripts/sortman
+++ b/scripts/sortman
@@ -12,4 +12,5 @@ sed   -E '/\/intro./  s/.*\.([[:digit:]])/\10\t&/' \
 | sed -E '/\t/        s/[_-]/ /g' \
 | sed -E '/\t/        {N;s/\n/\t/;}' \
 | sort -fV -k1,2 \
+| tee /dev/tty \
 | cut -f3;


And here's how it looks with man8 (plus the dummy files):


$ find man8 -type f | ./scripts/sortman
80	man8/intro	man8/intro.8
81	man8/iconvconfig	man8/iconvconfig.8
81	man8/ld	man8/ld.8
81	man8/ld linux	man8/ld-linux.8
81	man8/ld linux.so	man8/ld-linux.so.8
81	man8/ld z	man8/ld-z.8
81	man8/ld.so	man8/ld.so.8
81	man8/ldconfig	man8/ldconfig.8
81	man8/nscd	man8/nscd.8
81	man8/sln	man8/sln.8
81	man8/tzselect	man8/tzselect.8
81	man8/zdump	man8/zdump.8
81	man8/zic	man8/zic.8
man8/intro.8
man8/iconvconfig.8
man8/ld.8
man8/ld-linux.8
man8/ld-linux.so.8
man8/ld-z.8
man8/ld.so.8
man8/ldconfig.8
man8/nscd.8
man8/sln.8
man8/tzselect.8
man8/zdump.8
man8/zic.8


There are no suffixes in the second field.

> and also treat "." as space?

I had been thinking about it, but haven't formed an opinion.
Since they are rare, I think making them stand out a little bit
by having a special order rather than just being mixed with the
underscores would make sense.  But I'm open to change that.

> Where would you expect to see ld.so?

Not sure.

> 
> Also, in `sed`, instead of cloning the line, at the start of a series of 
> executions, make them all into a single inline command script, start with `h` to 
> *hold* the input line, and end with `G` instead of `N` to append '\n' then the 
> held line, convert to `\t`, drop the braces, and you can skip the then redundant 
> tests, something like the following should get you close (tried it earlier, now 
> sadly already gone from history):
> 
> | sed -E '
> 	h
> 	/\/intro./  s/.*\.([[:digit:]])/\10\t&/
> 	s/\.[[:digit:]]([[:alpha:]][[:alnum:]]*)?\>.*//
> 	s/\/[_-]*/\//g
> 	s/[_-]/_/g
> 	s/[_-]/ /g
> 	G
> 	s/\n/\t/
> 	' \
> | ...

I prefer having many one-liners for a few reasons:

-  Not everybody knows what h and G do.  I didn't.  And I will
   soon forget.  In contrast, my implementation has nothing
   unusual in it.

-  I can inspect the contents at each of the steps easily by
   adding a line with `| tee /dev/tty \`, for debug purposes.
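
For reference, here is a minimal sketch of what `h` and `G` do in the
suggestion quoted above.  It assumes GNU sed (for `\t` and `\n` in the
substitutions).  The function name `keyed` and the simplified suffix
regex are illustrative only; neither is part of scripts/sortman.

```shell
# Hypothetical helper, not part of scripts/sortman.
# For each input line it prints "sort-key<TAB>original-line":
#   h            copy the line (pattern space) into the hold space
#   s/.../       build the key: drop a one-digit suffix, turn _ and - into spaces
#   G            append a newline plus the held original line
#   s/\n/\t/     join key and original with a tab
keyed() {
    sed -E '
        h
        s/\.[[:digit:]]$//
        s/[_-]/ /g
        G
        s/\n/\t/
    '
}

printf '%s\n' man8/ld-linux.8 man8/ld.so.8 | keyed
# prints:
#   man8/ld linux<TAB>man8/ld-linux.8
#   man8/ld.so<TAB>man8/ld.so.8
```

The one-liner pipeline gets the same pairing by duplicating the line
first and joining the two copies with `N` at the end.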

In general, I avoid embedding large scripts in other languages.
I prefer piping many one-liners, even if that might be less
efficient (though it can use more cores, so it might end up being
faster; I've seen that happen many times).
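
As a rough illustration of the debugging point, the tap can sit between
any two stages of such a pipeline.  The sketch below is a hypothetical
two-stage miniature, not the real script, and it writes the copy to
/dev/stderr instead of /dev/tty (an assumption, so that it also runs
without a controlling terminal).

```shell
# Hypothetical miniature of a one-liner pipeline.
# The tee line is the temporary debug tap; delete it when done.
sort_names() {
    sed -E 's/[_-]/ /g' \
    | tee /dev/stderr \
    | LC_ALL=C sort -f
}

# stderr shows the intermediate lines in input order;
# stdout carries the sorted result.
printf '%s\n' ld-z ld_linux | sort_names
# stdout:
#   ld linux
#   ld z
```

Moving the tee line up or down by one stage shows the data at that
point without touching the other commands.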

Cheers,
Alex 

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

