linux-man.vger.kernel.org archive mirror
From: Alejandro Colomar <alx@kernel.org>
To: debian-policy@lists.debian.org,
	 "Dr. Tobias Quathamer" <toddy@debian.org>
Cc: linux-man@vger.kernel.org,
	Helge Kreutzmann <debian@helgefjell.de>,
	 "G. Branden Robinson" <branden@debian.org>,
	Colin Watson <cjwatson@debian.org>
Subject: Stop compressing manual pages (was: Bug#1123959: manpages: Please consider shipping uncompressed man pages)
Date: Thu, 25 Dec 2025 15:15:03 +0100
Message-ID: <aU1D0aL00gy1V-NX@devuan>
In-Reply-To: <fec615b5-af5b-46cd-ae09-d9343db6da77@debian.org>

Hi,

On Thu, Dec 25, 2025 at 02:47:33PM +0100, Dr. Tobias Quathamer wrote:
> On 25.12.25 at 12:20, Alejandro Colomar wrote:
> > Hello Helge, Tobias,
> > 
> > On Thu, Dec 25, 2025 at 06:07:57AM +0000, Helge Kreutzmann wrote:
> > > Hello Tobias,
> > > if you look at mansect(1), the example given does not work in Debian.
> > > I reported this upstream and got the following reply:
> > > 
> > > > The issue is that Debian compresses manual pages.  Please consider
> > > > changing the policy to not compress manual pages.  The storage savings
> > > > are irrelevant in this age.
> > > 
> > > Could you consider this?
> > 
> > Thanks!
> > 
> > Indeed, compressed manual pages are a pain to work with.  You can't use
> > regular Unix tools to work with them.  With uncompressed manual pages,
> > you can go to /usr/share/man, and run a pipe of programs to do a complex
> > search.  With tools like zgrep(1) and zcat(1), you can do some stuff,
> > but not everything.
> 
> Hi Helge and Alex,
> 
> thanks for your bug report and the provided statistics. I haven't thought
> about this up until now, because it violates Debian Policy. Quoting from
> Section 12.1
> (https://www.debian.org/doc/debian-policy/ch-docs.html#manual-pages):
> 
> "Manual pages should be installed compressed using gzip -9."
> 
> And regarding the terminology using the word "should", this is defined in
> section 1.1 (https://www.debian.org/doc/debian-policy/ch-scope.html#scope):
> 
> "The terms should and should not, and the adjective recommended, denote best
> practices. Non-conformance with these guidelines will generally be
> considered a bug, but will not necessarily render a package unsuitable for
> distribution. These statements correspond to bug severities of important,
> normal, and minor. They are collectively called Policy recommendations."
> 
> So by not compressing the man pages, the Debian package would introduce a
> bug. Moreover, I'd have to explicitly opt out of automatic compression in
> the build stage of the package.
> 
> All of this is doable, of course. But I'm a bit hesitant with just making
> the switch, given that the manpages package is certainly the package with
> the most man pages in the Debian ecosystem -- by a large margin.
> 
> So it might be better to discuss the pros and cons with a broader audience,
> trying to understand why compression was chosen initially. Maybe
> only due to disk space limitations back then, but maybe there are other
> reasons as well -- which might still be valid today.

Yup, I'd like that policy to change.  I've added debian-policy@ to this
mail (and also linux-man@).

For those joining the thread with this email, please have a look at
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1123959>, which
itself also references a discussion in the Linux man-pages project from
a couple of years ago:
<https://lore.kernel.org/linux-man/c8cf5be0-04e7-f0a1-179f-eada6182c33e@gmail.com/T/#m272e6ee8939d0836999dd8bb28f2e0e94f48dfc7>.

I'll paste the numbers again:

        $ sudo make install-man prefix=/opt/local/man/gz__1 -j LINK_PAGES=symlink Z=.gz  GZIPFLAGS=-1  | wc -l
        2571
        $ sudo make install-man prefix=/opt/local/man/gz__9 -j LINK_PAGES=symlink Z=.gz  GZIPFLAGS=-9  | wc -l
        2571
        $ sudo make install-man prefix=/opt/local/man/man__ -j LINK_PAGES=symlink Z=                   | wc -l
        2571

        $ du -sh /opt/local/man/*
        5.7M    /opt/local/man/gz__1
        5.5M    /opt/local/man/gz__9
        5.5M    /opt/local/man/gz___
        9.4M    /opt/local/man/man__

        $ export MANPATH=/opt/local/man/gz__1/share/man
        $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
        17; 0.21
        $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
        17; 1.16

        $ export MANPATH=/opt/local/man/gz__9/share/man
        $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
        17; 0.20
        $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l | xargs printf '%s; '"
        17; 1.17

        $ export MANPATH=/opt/local/man/man__/share/man
        $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
        17; 0.55
        $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l | xargs printf '%s; '"
        17; 0.01

With uncompressed manual pages, complex searches with pipelines are
simpler and about two orders of magnitude faster (0.01 s versus 1.16 s
above).  In the simple cases where man(1) is enough, the speed stays in
the same order of magnitude (0.55 s uncompressed versus 0.20 s
compressed).
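
To make the difference in the pipelines concrete, here is a minimal,
self-contained sketch.  It does not touch /usr/share/man; it builds a
throwaway tree with three made-up pages (the file names and contents are
assumptions for illustration only) and runs the same search over the
plain and the gzip-compressed copies.

```shell
#!/bin/sh
# Sketch: the same search over uncompressed and gzip-compressed pages.
# The tree and the page contents below are invented for this demo.

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/plain/man2" "$tmp/gz/man2"

printf '.TH getrlimit 2\n.SH DESCRIPTION\nRLIMIT_NOFILE limits open files.\n' \
    > "$tmp/plain/man2/getrlimit.2"
printf '.TH open 2\n.SH ERRORS\nEMFILE: see RLIMIT_NOFILE.\n' \
    > "$tmp/plain/man2/open.2"
printf '.TH read 2\n.SH DESCRIPTION\nRead from a file descriptor.\n' \
    > "$tmp/plain/man2/read.2"

# Build the compressed copies, as a distribution package would ship them.
for f in "$tmp"/plain/man2/*; do
    gzip -9 -c "$f" > "$tmp/gz/man2/$(basename "$f").gz"
done

# Uncompressed: ordinary tools compose directly.
plain=$(find "$tmp/plain" -type f -exec grep -l RLIMIT_NOFILE {} + \
        | wc -l | tr -d ' ')

# Compressed: every file must be decompressed before grep can read it.
gz=$(find "$tmp/gz" -type f | while read -r f; do
         gzip -dc "$f" | grep -q RLIMIT_NOFILE && echo "$f"
     done | wc -l | tr -d ' ')

echo "plain=$plain gz=$gz"
```

Both searches find the same two pages, so the script prints
"plain=2 gz=2".  The point is the shape of the second pipeline: a loop
with one decompressor process per file, where the first needs only
find(1) and grep(1).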

Also, the compression only cuts storage by about 40% (from 9.4M to
5.5M), so not even by half, let alone an order of magnitude.  In this
age, where storage is relatively cheap, systems that have manual pages
installed most likely have room enough for the uncompressed pages.
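
The storage figure can be checked against the du(1) output above.  A
small sketch, with the two sizes hard-coded from that listing (the
variable names are mine, for illustration):

```shell
#!/bin/sh
# Sizes as reported by du -sh above; hard-coded for illustration.
plain=9.4   # uncompressed install (man__)
gz9=5.5     # gzip -9 install (gz__9)

# Fraction of space saved by compression, as a whole percentage.
saved=$(awk -v p="$plain" -v z="$gz9" 'BEGIN { printf "%.0f", (1 - z / p) * 100 }')
# How many times larger the uncompressed tree is.
ratio=$(awk -v p="$plain" -v z="$gz9" 'BEGIN { printf "%.1f", p / z }')

echo "gzip -9 saves about ${saved}% (${ratio}x smaller)"
```

This prints "gzip -9 saves about 41% (1.7x smaller)", i.e. less than a
factor of two.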

Please change Debian policy 12.1 ("Manual pages") to recommend
uncompressed pages.


Have a lovely day!
Alex

> 
> Regards,
> Tobias

-- 
<https://www.alejandro-colomar.es>
