Re: Using C23 digit separators not locale digit grouping characters

public inbox for linux-man@vger.kernel.org

From: Alejandro Colomar <alx.manpages@gmail.com>
To: Linux Man-Pages <linux-man@vger.kernel.org>,
	Brian Inglis <Brian.Inglis@Shaw.ca>
Cc: наб <nabijaczleweli@nabijaczleweli.xyz>,
	"G. Branden Robinson" <g.branden.robinson@gmail.com>
Subject: Re: Using C23 digit separators not locale digit grouping characters
Date: Sun, 29 Jan 2023 22:19:49 +0100
Message-ID: <5c2be1e7-4c75-dc20-8d2e-a528edea7e32@gmail.com>
In-Reply-To: <aebef9ae-1bd0-b0e7-b333-7269dbaf50a2@Shaw.ca>



Hi Brian, Branden,

On 1/29/23 22:04, Brian Inglis wrote:
> On 2023-01-29 07:38, Alejandro Colomar wrote:
>> On 1/28/23 21:40, Brian Inglis wrote:
>>> Seeing the recent tv_nsec patches drop the standard locale digit grouping 
>>> characters "," from the member range [0-999,999,999] made me regret the loss 
>>> of the punctuation which provides better and quicker comprehension of long 
>>> strings of digits.
> 
>> Nice! I didn't remember that separator.  It makes a lot of sense to use it
>> in comments and the like in the pages.  Maybe we should be a bit more
>> cautious in source code examples, but big numbers outside of running code
>> should definitely have them.
> The major compilers support them from draft C23, and the code is in examples, 
> not source that has to compile on older compilers, so not much to be concerned 
> about there, although some more opinions would be helpful.

My version of gcc only supports them if you specify -std=c2x or -std=gnu2x.
They haven't been backported to -std=gnu17 (the default) so far, AFAICS.

$ cc -Wall -Wextra quote.c
quote.c: In function ‘main’:
quote.c:5:18: warning: multi-character character constant [-Wmultichar]
     5 |         int x = 1'23'4;
       |                  ^~~~
quote.c:5:18: error: expected ‘,’ or ‘;’ before '\x3233'
$ cc -Wall -Wextra quote.c -std=gnu2x
$


Since most people compile with default settings, I'd prefer to avoid them in
example programs for now.  When C23 is finally released and GCC switches to
gnu23 by default, I'd use them in example programs too.  Does that make sense
to you?
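
As a rough sketch of how an example could stay buildable under both defaults,
the separators can be guarded on __STDC_VERSION__.  NSEC_MAX and
nsec_is_valid() are hypothetical names made up for illustration, not taken
from any page.  Pre-release -std=c2x modes may report a value below 202311L,
in which case the plain spelling is simply picked:

```c
#include <assert.h>

/* Hypothetical macro for illustration; not from any man page. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
/* C23 or later: digit separators are part of the language. */
# define NSEC_MAX  999'999'999L
#else
/* Older standards (e.g. the gnu17 default): plain literal. */
# define NSEC_MAX  999999999L
#endif

/* Both spellings must denote the same value. */
static_assert(NSEC_MAX == 999999999L, "NSEC_MAX must be 999999999");

/* Hypothetical helper: nonzero if nsec is within the tv_nsec range. */
static inline int
nsec_is_valid(long nsec)
{
	return nsec >= 0 && nsec <= NSEC_MAX;
}
```

Whether that extra preprocessor noise is worth it in a short example program
is a separate question; it mostly shows why waiting for the default to change
is simpler.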

> 
>> наб, would you please update your patches with that?  I also have a few
>> comments that I'll write in a moment in answers to your patches.
>>> It may be time to consider using the locale independent C23 digit
>>> separator characters "'" wherever more than a handful of digits occur,
>>> possibly convert grouping character uses in existing man pages as they are
>>> changed, and specify a future standard policy approach to provide better
>>> and quicker comprehension of long strings of digits: perhaps using a new
>>> digit separator register and glyph escape sequence \*ds \*[ds] \[ds] \(ds
>>> if not in use by base groff?
>> The sequence for the unslanted single quote is \(aq.
> Granted, but would it not be better to consider using a semantic digit separator 
> groff man escape sequence, especially in text, whose rendering could be tweaked, 
> rather than a generic literal apostrophe quote used everywhere?
> If nothing else is proposed and accepted, I will use the generic \(aq, and if 
> future changes are required, they can be targeted by digit context.

man(7) offers little semantic markup, as opposed to mdoc(7).  I think it
will be simpler to just use \(aq.

Branden, any opinion?
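
For anyone scripting the conversion, here is a minimal sketch of a grouping
helper, assuming the digit string has already been extracted from the page.
group_digits() is a hypothetical name; the separator is a parameter, so the
same code can emit a plain "'" or the groff escape \(aq, and the group width
can be 3 or 4 depending on the radix:

```c
#include <assert.h>
#include <string.h>

/*
 * Copy `digits` into `out`, inserting `sep` between groups of `group`
 * digits counted from the right.  Returns 0 on success, or -1 if `out`
 * (of `outsize` bytes) is too small.
 */
static int
group_digits(const char *digits, size_t group, const char *sep,
             char *out, size_t outsize)
{
	size_t  len = strlen(digits);
	size_t  seplen = strlen(sep);
	size_t  pos = 0;

	assert(group > 0);

	for (size_t i = 0; i < len; i++) {
		/* Separator goes before digit i when whole groups remain. */
		if (i > 0 && (len - i) % group == 0) {
			if (pos + seplen >= outsize)
				return -1;
			memcpy(out + pos, sep, seplen);
			pos += seplen;
		}
		if (pos + 1 >= outsize)
			return -1;
		out[pos++] = digits[i];
	}
	if (pos >= outsize)
		return -1;
	out[pos] = '\0';
	return 0;
}
```

Calling it with "999999999", a group of 3, and "\\(aq" as the separator would
produce the man page source form 999\(aq999\(aq999; with "10ffff", a group of
4, and "'" it would produce 10'ffff.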

> 
>> We could add somewhere in man-pages(7) that decimal numbers should use a 
>> separator every 3 digits, and hex and binary should use it every 4 digits.
> As well as the 3 decimal, 4 binary/hex, we could use yyyy'mm['dd]L for POSIX and 
> similar date digit strings, and 0x10'ffff for Unicode code points, 
> distinguishing between the Basic and Supplementary Multilingual Plane indices 
> and codes, just as examples from what I've seen so far.
> 
> I've also noticed a lot of apparently random decimal digit strings that are 
> binary powers or close deltas: those would be more comprehensible if rendered in 
> text as Ki/Mi/Gi[+/-n], so would that be preferable, using the IEC i suffix to 
> avoid ambiguity?

In running text, I'd do it case by case.  In some cases I guess that'll make
sense.  In others, 2^32 will make more sense...  But yes, big magic numbers
are not nice.
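
To help spot candidates, a sketch of the Ki/Mi/Gi[+/-n] rendering could look
like the following.  format_iec() and its max_delta cutoff are assumptions
made for illustration: a value is rewritten only if it lies within max_delta
of a multiple of the unit, and is otherwise printed unchanged:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Render n as "<m>Gi", "<m>Mi", or "<m>Ki", with an optional "+d" or
 * "-d" when n is within max_delta of a multiple of the unit.  Falls
 * back to plain decimal.  Returns what snprintf(3) returns.
 */
static int
format_iec(uint64_t n, uint64_t max_delta, char *out, size_t outsize)
{
	static const struct {
		uint64_t     size;
		const char  *suffix;
	} units[] = {
		{ UINT64_C(1) << 30, "Gi" },
		{ UINT64_C(1) << 20, "Mi" },
		{ UINT64_C(1) << 10, "Ki" },
	};

	assert(out != NULL);

	/* Try the largest unit first. */
	for (size_t i = 0; i < sizeof(units) / sizeof(units[0]); i++) {
		uint64_t     m = n / units[i].size;
		uint64_t     r = n % units[i].size;
		const char  *s = units[i].suffix;

		if (m > 0 && r == 0)
			return snprintf(out, outsize, "%" PRIu64 "%s", m, s);
		if (m > 0 && r <= max_delta)
			return snprintf(out, outsize,
			                "%" PRIu64 "%s+%" PRIu64, m, s, r);
		if (units[i].size - r <= max_delta)
			return snprintf(out, outsize, "%" PRIu64 "%s-%" PRIu64,
			                m + 1, s, units[i].size - r);
	}
	return snprintf(out, outsize, "%" PRIu64, n);
}
```

With a max_delta of 16, 4294967296 would come out as 4Gi and 65535 as 64Ki-1,
while 1000000 would be left as it is.  Whether the result reads better than
2^32 or the raw number is still a per-case call.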

> 
>>> As well as the recently modified pages:
>>>
>>> clock_getres.2
>>> timer_settime.2
>>> timerfd_create.2
>>> utimensat.2
>>>
>>> there appear to be obvious occurrences in only the following pages:
>>>
>>> futex.2
>>> read.2
>>> sendfile.2
>>> write.2
>>> mallopt.3
>>> keyrings.7
>>> mq_overview.7
>>> sched.7
>>> time_namespaces.7
>>>
>>> but there appear to be about 400 pages with more than 6 decimal digit
>>> strings (some spurious glibc hex commits and address outputs) where it
>>> could perhaps help, such as in POSIX version dates e.g. 2001'12L, and
>>> undoubtedly more with long digit strings in other radixes.
>> Would you mind preparing a patch for all of those?  If you'll do it, better
>> wait until we merge наб's patches, to avoid conflicts.
> I'll start anyway, need to review over 300 files with over 900 digit strings, 
> having cut a bunch more pages with output examples.

Sure.

> 
> Any particular subdivision of files patched into git logged patches, by section, 
> by type of edit, separate logged patches for files with many edits, or...?

Whatever you prefer, I guess.  I'd split first by the kind of change, and
then by the section within a page where it appears.  But you're writing it,
so I guess you'll find the best separation.  As long as the patches are
consistent enough to avoid many context switches when reviewing, it should be
good.

> 
> FYI although many hits are likely output, the top candidates so far are:
> 
> 80 man5/proc.5
> 55 man2/statfs.2
> 34 man7/feature_test_macros.7
> 32 man3/dl_iterate_phdr.3
> 30 man7/units.7
> 30 man5/rpc.5
> 23 man3/termios.3
> 20 man3/malloc_info.3
> 17 man2/userfaultfd.2
> 16 man7/keyrings.7
> 15 man7/time_namespaces.7
> 14 man7/posixoptions.7
> 14 man3/mallopt.3
> 13 man7/utf-8.7
> 12 man2/reboot.2
> 12 man2/keyctl.2
> 

Cheers,

Alex
-- 
<http://www.alejandro-colomar.es/>


