From: Sam James <sam@gentoo.org>
To: Alejandro Colomar <alx.manpages@gmail.com>
Cc: Alexis <flexibeast@gmail.com>,
groff@gnu.org, linux-man <linux-man@vger.kernel.org>,
Ingo Schwarze <schwarze@usta.de>, Dirk Gouders <dirk@gouders.net>,
Colin Watson <cjwatson@debian.org>,
Ralph Corderoy <ralph@inputplus.co.uk>,
Kerin Millar <kfm@plushkava.net>
Subject: Re: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
Date: Wed, 12 Apr 2023 09:13:13 +0100
Message-ID: <875ya1ecq1.fsf@gentoo.org>
In-Reply-To: <c6e9eb6a-a2ba-1de1-211f-bc6ccc3f7a9a@gmail.com>
Alejandro Colomar <alx.manpages@gmail.com> writes:
> [Added back linux-man@, and people that commented on this (sub)topic]
> [Added Sam, I've got a question for you]
>
> Hi Alexis,
>
> Please keep (at least) linux-man@ in the loop.
>
> On 4/9/23 08:44, Alexis wrote:
>>
>> As a related data point, i'd like to mention Gentoo's position on
>> this, i.e. that man pages will continue to be bzip2-compressed by
>> default:
>>
>> "app-text/mandoc bzip2 support"
>> https://bugs.gentoo.org/854267
>>
>> "Remove /usr/share/man from default inclusion list for docompress"
>> https://bugs.gentoo.org/836367
>
> As Ingo said[1] 3 years ago, I don't think in this year it makes any
> sense to compress pages anymore. However, since it's simple for me
> to add support for that, and it can be interesting for testing
> purposes, I added support for installing the Linux man-pages
> compressed with bzip2 using the Makefile[2]. While I was at it, I
> also added support for generating .tar.bz2 release tarballs[3].
>
> With this, I was able to test a bit more than what I did yesterday:
>
>
> $ sudo rm -rf /opt/local/man/
> $ sudo make install-man prefix=/opt/local/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/man -j LINK_PAGES=symlink Z= | wc -l
> 2570
> $ du -sh /opt/local/man/*
> 5.4M /opt/local/man/bz2
> 5.5M /opt/local/man/gz_
> 9.4M /opt/local/man/man
>
>
> $ export MANPATH=/opt/local/man/gz_/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do zcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.24
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.14
>
>
> $ export MANPATH=/opt/local/man/bz2/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 10.90
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.33
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.21
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.22
>
>
> $ export MANPATH=/opt/local/man/man/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
>
> Weird thing: today, the symlink bug in man(1) was reproducible in
> all kinds of pages, while yesterday it only reproduced in
> uncompressed ones.
>
> Another weird thing: times today changed considerably for the
> find(1) pipelines (half of yesterday's). It's not a thing of
> using dash(1), because I get similar times with bash(1) and its
> builtin time(1).
>
> Important note: Sam, are you sure you want your pages compressed
> with bz2? Have you seen the 10 seconds it takes man-db's man(1) to
> find a word in the pages? I suggest that at least you try to
> reproduce these tests in your machine, and see if it's just me or
> man-db's man(1) is pretty bad at non-gz pages.
>
> Test results:
>
> - man-db's man(1) is slower with plain man(7) source than with .gz
> pages for some mysterious reason.
>
> - man-db's man(1) is turtle slow with .bz2 pages.

I started looking into switching to xz (or just... not bz2, anyway),
partially motivated by https://gitlab.com/man-db/man-db/-/issues/4 and
partly out of local interest (without having done measurements to see
whether it would be worth a global change). The xz maintainer ended up
recommending an entirely different approach to how man-db currently
handles external utilities (which I have a draft of).

The xz author also had some suggestions on the best parameters to use
for man pages, which I need to look into and dig up...
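
Since no measurements have been done yet, here is a minimal sketch of
how the per-page size difference between compressors could be checked
locally. The function name compare_sizes is hypothetical, and the -9
level is only an assumption for illustration; it is not the parameter
set suggested by the xz author, which is not reproduced here.

```shell
#!/bin/sh
# compare_sizes FILE
# Hypothetical helper: print "<tool><TAB><bytes>" for the raw file and
# for each compressor that happens to be installed.
compare_sizes() {
    page=$1
    # Uncompressed size as the baseline.
    printf '%s\t%s\n' none "$(wc -c <"$page" | tr -d ' ')"
    for tool in gzip bzip2 xz; do
        # Skip compressors that are not available on this machine.
        command -v "$tool" >/dev/null 2>&1 || continue
        # -9 (assumed level) and -c (write to stdout) are accepted by
        # all three tools; the page itself is left untouched.
        printf '%s\t%s\n' "$tool" \
            "$("$tool" -9 -c <"$page" | wc -c | tr -d ' ')"
    done
}
```

Running it over a handful of pages of different lengths (for example,
compare_sizes path/to/page.2) gives a rough idea of whether a change of
format would be worth it before measuring a whole tree with du(1).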

https://bugs.gentoo.org/169260 was an interesting discussion
about our choice of bz2 (it came up a bit in
https://bugs.gentoo.org/372653 too).

(I'll get back and read the rest of the thread later, but wanted
to add this tidbit.)

Definitely surprised to learn bz2 is *that* bad though!

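
For anyone wanting to cross-check the man -K timings quoted above
against a plain decompress-and-grep baseline, a minimal sketch follows.
It is a variation on the quoted find(1) loops, not man-db's actual
search: the function name search_pages is hypothetical, and it picks a
decompressor from the file suffix so the same loop works on .gz, .bz2,
.xz, and uncompressed trees.

```shell
#!/bin/sh
# search_pages DIR PATTERN
# Hypothetical helper: list files under DIR whose decompressed content
# matches PATTERN (a basic regular expression, as taken by grep -e).
search_pages() {
    dir=$1
    pattern=$2
    # IFS= and -r keep leading blanks and backslashes in file names intact.
    find "$dir" -type f | while IFS= read -r f; do
        case $f in
            *.gz)  cat_cmd='gzip -dc' ;;
            *.bz2) cat_cmd='bzip2 -dc' ;;
            *.xz)  cat_cmd='xz -dc' ;;
            *)     cat_cmd='cat' ;;
        esac
        # $cat_cmd is deliberately unquoted so that it splits into the
        # command name and its flags.
        if $cat_cmd <"$f" | grep -q -e "$pattern"; then
            printf '%s\n' "$f"
        fi
    done
}
```

With the trees from the quoted message, something like
search_pages /opt/local/man/bz2/share/man RLIMIT_NOFILE | wc -l
could be saved in a script and run under /bin/time -f %e, as in the
quoted commands, to compare against man -Kaw on the same tree.
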
best,
sam