From: Alejandro Colomar <alx.manpages@gmail.com>
To: Linux Man-Pages <linux-man@vger.kernel.org>,
	Brian Inglis <Brian.Inglis@Shaw.ca>
Cc: наб <nabijaczleweli@nabijaczleweli.xyz>,
	"G. Branden Robinson" <g.branden.robinson@gmail.com>
Subject: Re: Using C23 digit separators not locale digit grouping characters
Date: Sun, 29 Jan 2023 22:19:49 +0100
Message-ID: <5c2be1e7-4c75-dc20-8d2e-a528edea7e32@gmail.com>
In-Reply-To: <aebef9ae-1bd0-b0e7-b333-7269dbaf50a2@Shaw.ca>
Hi Brian, Branden,
On 1/29/23 22:04, Brian Inglis wrote:
> On 2023-01-29 07:38, Alejandro Colomar wrote:
>> On 1/28/23 21:40, Brian Inglis wrote:
>>> Seeing the recent tv_nsec patches drop the standard locale digit grouping
>>> characters "," from the member range [0-999,999,999] made me regret the loss
>>> of the punctuation which provides better and quicker comprehension of long
>>> strings of digits.
>
>> Nice! I didn't remember that separator. It makes a lot of sense to use it
>> in comments and the like in the pages. Maybe we should be a bit more
>> cautious in source code examples, but big numbers outside of running code
>> should definitely have them.
> The major compilers support them from draft C23, and the code is in examples,
> not source that has to compile on older compilers, so not much to be concerned
> about there, although some more opinions would be helpful.
My version of gcc only supports it if you specify -std=c2x or -std=gnu2x. It
hasn't been backported to -std=gnu17 (the default) so far, AFAICS.
$ cc -Wall -Wextra quote.c
quote.c: In function ‘main’:
quote.c:5:18: warning: multi-character character constant [-Wmultichar]
    5 |         int x = 1'23'4;
      |                  ^~~~
quote.c:5:18: error: expected ‘,’ or ‘;’ before '\x3233'
$ cc -Wall -Wextra quote.c -std=gnu2x
$
Since most people compile with the default settings, I prefer to avoid the
separators in example programs for now. When C23 is finally released, and GCC
switches to gnu23 by default, I'd use them there too. Does that make sense to you?
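In the meantime, an example program that wants the separators could guard
them on the language version. This is only a sketch: NSEC_MAX and
nsec_is_valid() are names made up for illustration, and the check assumes
that any __STDC_VERSION__ greater than the C17 value (201710L) means a
C2x/C23 mode that accepts the separator.

```c
/*
 * C17 reports __STDC_VERSION__ == 201710L.  Anything larger is assumed
 * here to be a C2x/C23 mode in which ' is a valid digit separator.
 */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ > 201710L
# define NSEC_MAX  999'999'999L
#else
# define NSEC_MAX  999999999L
#endif

/* Hypothetical helper: is nsec within the valid tv_nsec range? */
int
nsec_is_valid(long nsec)
{
	return 0 <= nsec && nsec <= NSEC_MAX;
}
```

Both branches define the same value, so the program behaves identically
with or without -std=gnu2x; only the readability of the source differs.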
>
>> наб, would you please update your patches with that? I also have a few
>> comments that I'll write in a moment in answers to your patches.
>>> It may be time to consider using the locale independent C23 digit
>>> separator characters "'" wherever more than a handful of digits occur,
>>> possibly convert grouping character uses in existing man pages as they are
>>> changed, and specify a future standard policy approach to provide better
>>> and quicker comprehension of long strings of digits: perhaps using a new
>>> digit separator register and glyph escape sequence \*ds \*[ds] \[ds] \(ds
>>> if not in use by base groff?
>> The sequence for the unslanted single quote is \(aq.
> Granted, but would it not be better to consider using a semantic digit separator
> groff man escape sequence, especially in text, whose rendering could be tweaked,
> rather than a generic literal apostrophe quote used everywhere?
> If nothing else is proposed and accepted, I will use the generic \(aq, and if
> future changes are required, they can be targeted by digit context.
man(7) has very little semantic markup, as opposed to mdoc(7). I think it
will be simpler to just use \(aq.
Branden, any opinion?
>
>> We could add somewhere in man-pages(7) that decimal numbers should use a
>> separator every 3 digits, and hex and binary should use it every 4 digits.
> As well as the 3 decimal, 4 binary/hex, we could use yyyy'mm['dd]L for POSIX and
> similar date digit strings, and 0x10'ffff for Unicode code points,
> distinguishing between the Basic and Supplementary Multilingual Plane indices
> and codes, just as examples from what I've seen so far.
>
> I've also noticed a lot of apparently random decimal digit strings that are
> binary powers or close deltas: those would be more comprehensible if rendered in
> text as Ki/Mi/Gi[+/-n], so would that be preferable, using the IEC i suffix to
> avoid ambiguity?
In running text, I'd do it case by case. In some cases I guess that'll make
sense. In others, 2^32 will make more sense... But yes, big magic fatnums are
not nice.
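For the mechanical part (a separator every 3 digits for decimal, every 4
for hex and binary), something like the following could help when checking
candidate strings. It is a rough sketch, not an existing tool:
group_digits() and its parameters are invented here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the digit string `digits` into `dst`, inserting `sep` between
 * groups of `group` digits, counted from the right.
 * Returns 0 on success, or -1 if `dst` (of size `dstsize`) is too small.
 */
int
group_digits(char *dst, size_t dstsize, const char *digits,
             size_t group, char sep)
{
	size_t  len, nsep, i, j;

	assert(group > 0);

	len = strlen(digits);
	nsep = (len > 0) ? (len - 1) / group : 0;
	if (len + nsep + 1 > dstsize)
		return -1;

	for (i = 0, j = 0; i < len; i++) {
		/* A separator goes before digit i when a whole number of
		   groups remains to its right (but never at the front). */
		if (i > 0 && (len - i) % group == 0)
			dst[j++] = sep;
		dst[j++] = digits[i];
	}
	dst[j] = '\0';

	return 0;
}
```

Called with "999999999", a group size of 3, and '\'' as the separator, it
writes 999'999'999; with "10ffff" and a group size of 4, it writes 10'ffff.
Any radix prefix such as 0x has to be handled by the caller.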
>
>>> As well as the recently modified pages:
>>>
>>>     clock_getres.2
>>>     timer_settime.2
>>>     timerfd_create.2
>>>     utimensat.2
>>>
>>> there appear to be obvious occurrences in only the following pages:
>>>
>>>     futex.2
>>>     read.2
>>>     sendfile.2
>>>     write.2
>>>     mallopt.3
>>>     keyrings.7
>>>     mq_overview.7
>>>     sched.7
>>>     time_namespaces.7
>>>
>>> but there appear to be about 400 pages with more than 6 decimal digit
>>> strings (some spurious glibc hex commits and address outputs) where it
>>> could perhaps help, such as in POSIX version dates e.g. 2001'12L, and
>>> undoubtedly more with long digit strings in other radixes.
>> Would you mind preparing a patch for all of those? If you'll do it, better
>> wait until we merge наб's patches, to avoid conflicts.
> I'll start anyway, need to review over 300 files with over 900 digit strings,
> having cut a bunch more pages with output examples.
Sure.
>
> Any particular subdivision of files patched into git logged patches, by section,
> by type of edit, separate logged patches for files with many edits, or...?
Whatever you prefer, I guess. I'd divide first by the kind of change, and then
by the section within a page where it appears. But you're writing it, so I
guess you'll find the best separation. As long as the patches are consistent
enough that reviewing them doesn't need many context switches, it should be
good.
>
> FYI although many hits are likely output, the top candidates so far are:
>
> 80 man5/proc.5
> 55 man2/statfs.2
> 34 man7/feature_test_macros.7
> 32 man3/dl_iterate_phdr.3
> 30 man7/units.7
> 30 man5/rpc.5
> 23 man3/termios.3
> 20 man3/malloc_info.3
> 17 man2/userfaultfd.2
> 16 man7/keyrings.7
> 15 man7/time_namespaces.7
> 14 man7/posixoptions.7
> 14 man3/mallopt.3
> 13 man7/utf-8.7
> 12 man2/reboot.2
> 12 man2/keyctl.2
>
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>