* man -K finds repeated entries for each symlink page
@ 2023-04-09 13:58 Alejandro Colomar
2023-04-09 14:55 ` Colin Watson
From: Alejandro Colomar @ 2023-04-09 13:58 UTC (permalink / raw)
To: Colin Watson, man-db-devel; +Cc: linux-man
Hi Colin,
To reproduce, run the following commands from a clone of the Linux
man-pages repo (although I guess it should be reproducible on any
Debian installation).
$ sudo rm -r /opt/local/man/
$ sudo make install-man2 prefix=/opt/local/man LINK_PAGES=symlink -j | wc -l
503
$ export MANPATH=/opt/local/man/share/man
$ man -Kaw RLIMIT_NOFILE | sort | uniq -c
3 /opt/local/man/share/man/man2/dup.2
2 /opt/local/man/share/man/man2/fcntl.2
5 /opt/local/man/share/man/man2/getrlimit.2
3 /opt/local/man/share/man/man2/open.2
1 /opt/local/man/share/man/man2/pidfd_getfd.2
1 /opt/local/man/share/man/man2/pidfd_open.2
2 /opt/local/man/share/man/man2/poll.2
1 /opt/local/man/share/man/man2/seccomp_unotify.2
4 /opt/local/man/share/man/man2/select.2
Each count equals 1 plus the number of symlinks pointing to that
page. For example, see select.2:
$ find /opt/local/man/share/man -type l | xargs readlink | grep -c /select.2
3
man(1) found the original page, plus the 3 symlinks.
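The same check can be run for every page at once. A minimal sketch,
assuming symlink-style link pages as installed above; the function name
count_page_names is made up for illustration and is not part of man-db
or man-pages:

```shell
# Hypothetical helper (not part of man-db or man-pages).
# For each page under directory $1, print "<count> <page>", where the
# count is 1 for the regular file plus 1 per symlink pointing at it.
count_page_names() {
    {
        # The page itself, by base name.
        find "$1" -type f -exec basename {} \;
        # Every symlink, reported under the base name of its target.
        find "$1" -type l -exec readlink {} \; | sed 's|.*/||'
    } | sort | uniq -c
}
```

With the tree installed above, `count_page_names /opt/local/man/share/man`
should list select.2 with a count of 4 (the page plus its 3 symlinks),
matching the duplicated output of man -Kaw.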
The solution should be that man(1) ignores link pages for -K: a link
page has no source of its own, so searching it again can only repeat
the result already found for the page it points to.
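A rough sketch of that idea, purely illustrative and not taken from
man-db: walk the tree, skip anything that is a symlink, and search only
the real page sources. The name search_pages is hypothetical, and the
sketch handles only uncompressed pages and symlink-style links (not
.so link pages):

```shell
# Hypothetical sketch, not man-db's code: print each page under
# directory $2 whose source contains the fixed string $1, skipping
# symlinks so that every page is reported at most once.
search_pages() {
    pattern=$1
    dir=$2
    find "$dir" \( -type f -o -type l \) | while IFS= read -r f; do
        # A link page adds nothing new; its target is searched anyway.
        if [ -L "$f" ]; then
            continue
        fi
        if grep -qF -- "$pattern" "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `search_pages RLIMIT_NOFILE /opt/local/man/share/man | sort | uniq -c`
would be expected to show a count of 1 for every page.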
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* Re: man -K finds repeated entries for each symlink page
From: Colin Watson @ 2023-04-09 14:55 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: man-db-devel, linux-man
On Sun, Apr 09, 2023 at 03:58:28PM +0200, Alejandro Colomar wrote:
> $ man -Kaw RLIMIT_NOFILE | sort | uniq -c
> 3 /opt/local/man/share/man/man2/dup.2
> 2 /opt/local/man/share/man/man2/fcntl.2
> 5 /opt/local/man/share/man/man2/getrlimit.2
> 3 /opt/local/man/share/man/man2/open.2
> 1 /opt/local/man/share/man/man2/pidfd_getfd.2
> 1 /opt/local/man/share/man/man2/pidfd_open.2
> 2 /opt/local/man/share/man/man2/poll.2
> 1 /opt/local/man/share/man/man2/seccomp_unotify.2
> 4 /opt/local/man/share/man/man2/select.2
>
> Those numbers coincide with 1+ the number of symlinks for each of the
> pages. For example, see select.2:
Thanks for the report. Fixed by this commit:
https://gitlab.com/man-db/man-db/-/commit/7ef30573a7023eb78bf70a34edaa4e3906531993
--
Colin Watson (he/him) [cjwatson@debian.org]
* Poor performance of man -K for uncompressed pages (was: man -K finds repeated entries for each symlink page)
From: Alejandro Colomar @ 2023-04-09 15:20 UTC (permalink / raw)
To: Colin Watson; +Cc: man-db-devel, linux-man
Hi Colin,
On 4/9/23 16:55, Colin Watson wrote:
> On Sun, Apr 09, 2023 at 03:58:28PM +0200, Alejandro Colomar wrote:
>> $ man -Kaw RLIMIT_NOFILE | sort | uniq -c
>> 3 /opt/local/man/share/man/man2/dup.2
>> 2 /opt/local/man/share/man/man2/fcntl.2
>> 5 /opt/local/man/share/man/man2/getrlimit.2
>> 3 /opt/local/man/share/man/man2/open.2
>> 1 /opt/local/man/share/man/man2/pidfd_getfd.2
>> 1 /opt/local/man/share/man/man2/pidfd_open.2
>> 2 /opt/local/man/share/man/man2/poll.2
>> 1 /opt/local/man/share/man/man2/seccomp_unotify.2
>> 4 /opt/local/man/share/man/man2/select.2
>>
>> Those numbers coincide with 1+ the number of symlinks for each of the
>> pages. For example, see select.2:
>
> Thanks for the report. Fixed by this commit:
>
> https://gitlab.com/man-db/man-db/-/commit/7ef30573a7023eb78bf70a34edaa4e3906531993
Heh, that was fast :)
As a side effect of no longer reading the same files repeatedly,
performance improved considerably for bzip2 (~3x), and for gzip (~2x).
I built man from source (tweaking with -O3, so I cheated a little bit),
and here are the results:
$ export MANPATH=/tmp/man/gz_/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
17
0.19
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.14
$ export MANPATH=/tmp/man/bz2/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
17
3.05
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.20
$ export MANPATH=/tmp/man/man/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
17
0.52
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
17
0.01
Please consider this a new bug report, about performance. See the last
block of commands: with uncompressed pages, man(1) takes half a second,
while my pipeline with find(1) and grep(1) is almost unmeasurable. I
could understand that man(1) has some overhead, but 52x feels like
there's some serious performance problem, especially when man(1) is
faster reading gzip-compressed pages (0.19 s; see the first block).
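For reference, the ratios behind those figures can be recomputed from
the timings above. This is only a convenience sketch; the helper name
slowdown is made up, and the inputs are the single-run numbers already
shown, not new measurements:

```shell
# Hypothetical helper: time taken by man(1) divided by the time taken
# by the reference find/grep pipeline, printed with one decimal.
# Values below 1 mean man(1) was the faster of the two.
slowdown() {
    LC_ALL=C awk -v man="$1" -v ref="$2" 'BEGIN { printf "%.1f\n", man / ref }'
}

slowdown 0.19 1.14   # gzip pages
slowdown 3.05 1.20   # bzip2 pages
slowdown 0.52 0.01   # uncompressed pages
```

This prints 0.2, 2.5 and 52.0: man(1) beats the shell loop for gzip,
loses moderately for bzip2, and is 52x slower on uncompressed pages.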
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5