public inbox for linux-man@vger.kernel.org
* man -K finds repeated entries for each symlink page
From: Alejandro Colomar @ 2023-04-09 13:58 UTC (permalink / raw)
  To: Colin Watson, man-db-devel; +Cc: linux-man


Hi Colin,

For a reproducer, run the following commands from a clone of the Linux
man-pages repo (although you should be able to reproduce in any Debian
installation, I guess).

$ sudo rm -r /opt/local/man/
$ sudo make install-man2 prefix=/opt/local/man LINK_PAGES=symlink -j | wc -l
503
$ export MANPATH=/opt/local/man/share/man
$ man -Kaw RLIMIT_NOFILE | sort | uniq -c
      3 /opt/local/man/share/man/man2/dup.2
      2 /opt/local/man/share/man/man2/fcntl.2
      5 /opt/local/man/share/man/man2/getrlimit.2
      3 /opt/local/man/share/man/man2/open.2
      1 /opt/local/man/share/man/man2/pidfd_getfd.2
      1 /opt/local/man/share/man/man2/pidfd_open.2
      2 /opt/local/man/share/man/man2/poll.2
      1 /opt/local/man/share/man/man2/seccomp_unotify.2
      4 /opt/local/man/share/man/man2/select.2

Those numbers coincide with 1+ the number of symlinks for each of the
pages.  For example, see select.2:

$ find /opt/local/man/share/man -type l | xargs readlink | grep -c /select.2
3

man(1) found the original page, plus the 3 symlinks.

I think the fix is for man(1) to skip link pages for -K: a link page
resolves to the same source as its target, so searching it again can
only repeat a result that was already found.
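
As a rough sketch of that behavior (this is not man-db's actual
implementation; search_pages is a hypothetical helper, and it only
handles uncompressed pages), restricting the search to regular files
visits each page source exactly once:

```shell
# Hypothetical helper: list pages under a man tree that contain a
# fixed-string keyword, without ever visiting a symlink.
search_pages() {
    dir=$1
    keyword=$2
    # -type f matches regular files only, so link pages are skipped
    # and every page source is searched at most once.
    find "$dir" -type f -exec grep -l -F -e "$keyword" {} +
}
```

With the reproducer above, running
"search_pages /opt/local/man/share/man RLIMIT_NOFILE | sort | uniq -c"
should then count each page once, assuming the tree was installed with
LINK_PAGES=symlink.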

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


* Re: man -K finds repeated entries for each symlink page
From: Colin Watson @ 2023-04-09 14:55 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: man-db-devel, linux-man

On Sun, Apr 09, 2023 at 03:58:28PM +0200, Alejandro Colomar wrote:
> $ man -Kaw RLIMIT_NOFILE | sort | uniq -c
>       3 /opt/local/man/share/man/man2/dup.2
>       2 /opt/local/man/share/man/man2/fcntl.2
>       5 /opt/local/man/share/man/man2/getrlimit.2
>       3 /opt/local/man/share/man/man2/open.2
>       1 /opt/local/man/share/man/man2/pidfd_getfd.2
>       1 /opt/local/man/share/man/man2/pidfd_open.2
>       2 /opt/local/man/share/man/man2/poll.2
>       1 /opt/local/man/share/man/man2/seccomp_unotify.2
>       4 /opt/local/man/share/man/man2/select.2
> 
> Those numbers coincide with 1+ the number of symlinks for each of the
> pages.  For example, see select.2:

Thanks for the report.  Fixed by this commit:

  https://gitlab.com/man-db/man-db/-/commit/7ef30573a7023eb78bf70a34edaa4e3906531993

-- 
Colin Watson (he/him)                              [cjwatson@debian.org]


* Poor performance of man -K for uncompressed pages (was: man -K finds repeated entries for each symlink page)
From: Alejandro Colomar @ 2023-04-09 15:20 UTC (permalink / raw)
  To: Colin Watson; +Cc: man-db-devel, linux-man


Hi Colin,

On 4/9/23 16:55, Colin Watson wrote:
> On Sun, Apr 09, 2023 at 03:58:28PM +0200, Alejandro Colomar wrote:
>> $ man -Kaw RLIMIT_NOFILE | sort | uniq -c
>>       3 /opt/local/man/share/man/man2/dup.2
>>       2 /opt/local/man/share/man/man2/fcntl.2
>>       5 /opt/local/man/share/man/man2/getrlimit.2
>>       3 /opt/local/man/share/man/man2/open.2
>>       1 /opt/local/man/share/man/man2/pidfd_getfd.2
>>       1 /opt/local/man/share/man/man2/pidfd_open.2
>>       2 /opt/local/man/share/man/man2/poll.2
>>       1 /opt/local/man/share/man/man2/seccomp_unotify.2
>>       4 /opt/local/man/share/man/man2/select.2
>>
>> Those numbers coincide with 1+ the number of symlinks for each of the
>> pages.  For example, see select.2:
> 
> Thanks for the report.  Fixed by this commit:
> 
>   https://gitlab.com/man-db/man-db/-/commit/7ef30573a7023eb78bf70a34edaa4e3906531993

Heh, that was fast :)

As a side effect of no longer reading each file several times,
performance improved considerably: about 3x for bzip2 and about 2x for
gzip.

I built man from source (tweaking with -O3, so I cheated a little bit),
and here are the results:


$ export MANPATH=/tmp/man/gz_/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
17
0.19
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.14


$ export MANPATH=/tmp/man/bz2/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
17
3.05
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.20


$ export MANPATH=/tmp/man/man/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
17
0.52
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
17
0.01


Please consider this a new bug report, about performance.  See the last
block of commands.  man(1) takes half a second, while my pipeline with
find(1) and grep(1) is barely measurable.  I could understand that
man(1) has some overhead, but 52x feels like there's some serious
performance problem; especially when man(1) is faster reading
gzip-compressed pages (0.19 s; see the first block).
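
For reference, the three baseline commands above can be folded into one
helper, so the same code path is timed for every tree.  This is only a
sketch: page_text and grep_pages are hypothetical names, not part of
man-db, and it assumes compressed pages carry a .gz or .bz2 suffix.

```shell
# Hypothetical helpers, not part of man-db.

# Print the text of one page, decompressing according to its suffix.
page_text() {
    case $1 in
        *.gz)  gzip -dc <"$1" ;;
        *.bz2) bzip2 -dc <"$1" ;;
        *)     cat <"$1" ;;
    esac
}

# List every regular file under $1 whose text contains the fixed
# string $2.  Symlinks are skipped by -type f.
grep_pages() {
    find "$1" -type f | while IFS= read -r f; do
        if page_text "$f" | grep -q -F -e "$2"; then
            printf '%s\n' "$f"
        fi
    done
}
```

It spawns a few processes per page, so it is slower than the xargs
variant for uncompressed trees; it is meant as a like-for-like baseline,
not as the fastest possible search.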


Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
