* Optimize script for generating LinuxManBook.pdf
@ 2023-11-22 14:58 Alejandro Colomar
2023-11-22 17:33 ` Deri
2023-11-23 18:41 ` Deri
0 siblings, 2 replies; 10+ messages in thread
From: Alejandro Colomar @ 2023-11-22 14:58 UTC (permalink / raw)
To: Deri James, linux-man; +Cc: groff
Hi Deri,
I've optimized the script from 18.5 s down to 16.3 s by splitting the
pipeline with this wrapper (and slightly reducing the Perl script to
just print the pages to stdout). BTW, now it can be run from any
directory. And every step can be debugged by just introducing
| tee /dev/tty \
wherever you want to debug. It's all pushed to master.
The PDF is now printed to stdout, to avoid hard-coding file names.
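As a rough illustration of the debugging trick above, the sketch below taps a toy pipeline with tee(1). The stages (printf, sort, tr) are stand-ins rather than the real groff tools, and the DEBUG_SINK variable is hypothetical; it defaults to /dev/stderr so that the sketch also runs without a terminal, whereas interactively it would be /dev/tty.

```shell
#!/bin/sh
# Toy pipeline; printf, sort and tr stand in for the real stages.
# DEBUG_SINK is a hypothetical variable; set it to /dev/tty when
# working interactively.
debug_sink="${DEBUG_SINK:-/dev/stderr}"

result=$(
	printf 'b\na\n' \
	| tee "$debug_sink" \
	| sort \
	| tr '\n' ' '
)
printf '%s\n' "$result"
```

The tee stage copies its input to the debug sink and passes it on unchanged, so the later stages see the same data whether or not the tap is present.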
I still need to split a bit more and reduce the longest lines. How does
this script look to you?
Cheers,
Alex :-)
$ cat scripts/LinuxManBook/build_linux_man_book.sh
#!/bin/sh
# Copyright 2023, Alejandro Colomar <alx@kernel.org>
# SPDX-License-Identifier: GPL-3.0-or-later
(
"$(dirname "$0")"/prepare_linux_man_book.pl "$1" \
| groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 -dpaper=a4 -Tpdf -k -pet \
-M"$(dirname "$0")" -mandoc -manmark \
-F"$(dirname "$0")" -P-pa4 \-rC1 -rCHECKSTYLE=3 2>&1 \
| LC_ALL=C grep '^\. *ds ';
"$(dirname "$0")"/prepare_linux_man_book.pl "$1";
) \
| preconv \
| tbl \
| eqn -Tpdf \
| (
troff -Tpdf -ms <"$(dirname "$0")"/LMBfront.ms;
troff -Tpdf -M"$(dirname "$0")" -mandoc -manmark \
-F"$(dirname "$0")" -dpaper=a4;
) \
| gropdf -F"$(dirname "$0")" -pa4;
--
<https://www.alejandro-colomar.es/>
* Re: Optimize script for generating LinuxManBook.pdf
2023-11-22 14:58 Optimize script for generating LinuxManBook.pdf Alejandro Colomar
@ 2023-11-22 17:33 ` Deri
2023-11-22 17:39 ` Alejandro Colomar
2023-11-23 18:41 ` Deri
1 sibling, 1 reply; 10+ messages in thread
From: Deri @ 2023-11-22 17:33 UTC (permalink / raw)
To: linux-man, Alejandro Colomar; +Cc: groff
On Wednesday, 22 November 2023 14:58:56 GMT Alejandro Colomar wrote:
> Hi Deri,
>
> I've optimized the script from 18.5 s down to 16.3 s by splitting the
> pipeline with this wrapper (and slightly reducing the Perl script to
> just print the pages to stdout). BTW, now it can be run from any
> directory. And every step can be debugged by just introducing
>
> | tee /dev/tty \
>
> wherever you want to debug. It's all pushed to master.
>
> The PDF is now printed to stdout, to avoid hard-coding file names.
>
> I still need to split a bit more and reduce the longest lines. How does
> this script look to you?
>
> Cheers,
> Alex :-)
>
>
> $ cat scripts/LinuxManBook/build_linux_man_book.sh
> #!/bin/sh
> # Copyright 2023, Alejandro Colomar <alx@kernel.org>
> # SPDX-License-Identifier: GPL-3.0-or-later
>
> (
> "$(dirname "$0")"/prepare_linux_man_book.pl "$1" \
> | groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 -dpaper=a4 -Tpdf -k -pet \
> -M"$(dirname "$0")" -mandoc -manmark \
> -F"$(dirname "$0")" -P-pa4 \-rC1 -rCHECKSTYLE=3 2>&1 \
> | LC_ALL=C grep '^\. *ds ';
> "$(dirname "$0")"/prepare_linux_man_book.pl "$1";
> ) \
> | preconv \
> | tbl \
> | eqn -Tpdf \
> | (
> troff -Tpdf -ms <"$(dirname "$0")"/LMBfront.ms;
> troff -Tpdf -M"$(dirname "$0")" -mandoc -manmark \
> -F"$(dirname "$0")" -dpaper=a4;
> ) \
> | gropdf -F"$(dirname "$0")" -pa4;
Hi Alex,
Is there a git address I can clone to see the changes "in the round" and give
it a go?
Cheers
Deri
* Re: Optimize script for generating LinuxManBook.pdf
2023-11-22 17:33 ` Deri
@ 2023-11-22 17:39 ` Alejandro Colomar
0 siblings, 0 replies; 10+ messages in thread
From: Alejandro Colomar @ 2023-11-22 17:39 UTC (permalink / raw)
To: Deri; +Cc: linux-man, groff
Hi Deri,
On Wed, Nov 22, 2023 at 05:33:52PM +0000, Deri wrote:
> Hi Alex,
>
> Is there a git address I can clone to see the changes "in the round" and give
> it a go?
Here's an unstable branch (I rebase often):
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/log/?h=contrib>
It's similar to Linux's 'next' branch.
But most of those changes are already in the master branch.
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: Optimize script for generating LinuxManBook.pdf
2023-11-22 14:58 Optimize script for generating LinuxManBook.pdf Alejandro Colomar
2023-11-22 17:33 ` Deri
@ 2023-11-23 18:41 ` Deri
2023-11-23 22:12 ` Alejandro Colomar
1 sibling, 1 reply; 10+ messages in thread
From: Deri @ 2023-11-23 18:41 UTC (permalink / raw)
To: linux-man, Alejandro Colomar; +Cc: groff
On Wednesday, 22 November 2023 14:58:56 GMT Alejandro Colomar wrote:
> Hi Deri,
>
> I've optimized the script from 18.5 s down to 16.3 s by splitting the
> pipeline with this wrapper (and slightly reducing the Perl script to
> just print the pages to stdout). BTW, now it can be run from any
> directory. And every step can be debugged by just introducing
>
> | tee /dev/tty \
>
> wherever you want to debug. It's all pushed to master.
>
> The PDF is now printed to stdout, to avoid hard-coding file names.
>
> I still need to split a bit more and reduce the longest lines. How does
> this script look to you?
>
> Cheers,
> Alex :-)
>
Hi Alex,
It looks fine, although you have to run the code in
"prepare_linux_man_book.pl" twice (to avoid using a temporary file). If you
are going to run preconv it is best to run it first - stops pic spitting out
loads of warnings. You also dropped one stage in second pass, no pic in the
pipeline. This may explain part of the speedup you observed. I don't know if
any of your man pages require pic but they could in the future. The changes I
would advise are:-
--- a/scripts/LinuxManBook/build_linux_man_book.sh
+++ b/scripts/LinuxManBook/build_linux_man_book.sh
@@ -4,8 +4,8 @@
(
"$(dirname "$0")"/prepare_linux_man_book.pl "$1" \
- | pic \
| preconv \
+ | pic \
| tbl \
| eqn -Tpdf \
| troff -Tpdf -dPDF.EXPORT=1 -dLABEL.REFS=1 -dpaper=a4 \
@@ -16,6 +16,7 @@
"$(dirname "$0")"/prepare_linux_man_book.pl "$1";
) \
| preconv \
+| pic \
| tbl \
| eqn -Tpdf \
| (
Cheers
Deri
* Re: Optimize script for generating LinuxManBook.pdf
2023-11-23 18:41 ` Deri
@ 2023-11-23 22:12 ` Alejandro Colomar
2023-11-24 11:46 ` Alejandro Colomar
0 siblings, 1 reply; 10+ messages in thread
From: Alejandro Colomar @ 2023-11-23 22:12 UTC (permalink / raw)
To: Deri; +Cc: linux-man, groff
Hi Deri,
On Thu, Nov 23, 2023 at 06:41:14PM +0000, Deri wrote:
> Hi Alex,
>
> It looks fine, although you have to run the code in
> "prepare_linux_man_book.pl" twice (to avoid using a temporary file).
Yep. I was wondering if we could change something in the design of
prepare_linux_man_book.pl so that it could be run once without needing
a temporary file. Maybe if it could insert something in the pages that
the latter troff(1) would process in one take, without having to put all
the bookmarks at the start of the file. That would be an important
simplification of the scripts, and probably also an optimization.
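For reference, the reason the current wrapper runs the generator twice can be shown with a toy sketch: a subshell first emits data derived from a full pass over the input, then emits the input itself, so the consumer sees the derived lines up front without a temporary file. Here gen is a hypothetical stand-in for prepare_linux_man_book.pl, and a line count stands in for the bookmark definitions.

```shell
#!/bin/sh
# gen is a hypothetical stand-in for prepare_linux_man_book.pl.
gen()
{
	printf 'alpha\nbeta\n'
}

out=$(
	(
		gen | grep -c '';   # pass 1: derived data (here, a line count)
		gen;                # pass 2: the pages themselves
	) \
	| tr '\n' ' '
)
printf '%s\n' "$out"
```

The cost of this layout is that the generator does its work twice, which is what a single-pass design would avoid.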
> If you
> are going to run preconv it is best to run it first - stops pic spitting out
> loads of warnings.
Thanks! Makes sense.
> You also dropped one stage in second pass, no pic in the
> pipeline. This may explain part of the speedup you observed. I don't know if
> any of your man pages require pic but they could in the future. The changes I
> would advise are:-
Thanks! I've applied those changes; will push in a moment. The speedup
is still the same; probably because pic(1)'s throughput is faster than
the consumers of its output, and its latency is negligible. Since it
doesn't slow down, I've added it, just in case we want to use pic(1) in
some page in the future, as you say.
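The reasoning above rests on the stages of a shell pipeline running concurrently, so wall time is dominated by the slowest stage rather than by the sum of all stages. A minimal sketch, with sleep(1) as a stand-in for real work (the one-second figures are arbitrary):

```shell
#!/bin/sh
# Three stages that each take about one second; they run at the
# same time, so the pipeline finishes in about one second, not three.
start=$(date +%s)
sleep 1 | sleep 1 | sleep 1
end=$(date +%s)

elapsed=$((end - start))
printf 'elapsed: %s s\n' "$elapsed"
```

By the same logic, an extra filter that keeps up with its neighbours adds little to the total, which would be consistent with the unchanged timing after adding pic(1).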
Cheers,
Alex
>
> --- a/scripts/LinuxManBook/build_linux_man_book.sh
> +++ b/scripts/LinuxManBook/build_linux_man_book.sh
> @@ -4,8 +4,8 @@
>
> (
> "$(dirname "$0")"/prepare_linux_man_book.pl "$1" \
> - | pic \
> | preconv \
> + | pic \
> | tbl \
> | eqn -Tpdf \
> | troff -Tpdf -dPDF.EXPORT=1 -dLABEL.REFS=1 -dpaper=a4 \
> @@ -16,6 +16,7 @@
> "$(dirname "$0")"/prepare_linux_man_book.pl "$1";
> ) \
> | preconv \
> +| pic \
> | tbl \
> | eqn -Tpdf \
> | (
>
> Cheers
>
> Deri
>
>
>
>
--
<https://www.alejandro-colomar.es/>
* Re: Optimize script for generating LinuxManBook.pdf
2023-11-23 22:12 ` Alejandro Colomar
@ 2023-11-24 11:46 ` Alejandro Colomar
2023-11-30 16:56 ` Deri
0 siblings, 1 reply; 10+ messages in thread
From: Alejandro Colomar @ 2023-11-24 11:46 UTC (permalink / raw)
To: Deri; +Cc: linux-man, groff
On Thu, Nov 23, 2023 at 11:12:45PM +0100, Alejandro Colomar wrote:
> > It looks fine, although you have to run the code in
> > "prepare_linux_man_book.pl" twice (to avoid using a temporary file).
>
> Yep. I was wondering if we could change something in the design of
> prepare_linux_man_book.pl so that it could be run once without needing
> a temporary file. Maybe if it could insert something in the pages that
> the latter troff(1) would process in one take, without having to put all
> the bookmarks at the start of the file. That would be an important
> simplification of the scripts, and probably also an optimization.
Hi Deri,
I have another optimization: split the sort. It saves around 0.3 s.
diff --git a/scripts/LinuxManBook/prepare_linux_man_book.pl b/scripts/LinuxManBook/prepare_linux_man_book.pl
index 0a79df4e5..5a4aad429 100755
--- a/scripts/LinuxManBook/prepare_linux_man_book.pl
+++ b/scripts/LinuxManBook/prepare_linux_man_book.pl
@@ -88,7 +88,16 @@ sub BuildBook
{
print ".pdfpagenumbering D . 1\n";
- foreach my $fn (sort sortman glob("$dir/man*/*")) {
+ foreach my $fn (sort glob("$dir/man*")) {
+ BuildSec($fn);
+ }
+}
+
+sub BuildSec
+{
+ my $manSdir=shift;
+
+ foreach my $fn (sort sortman glob("$manSdir/*")) {
BuildPage($fn);
}
}
I didn't think of this as an optimization, but rather to move code from
BuildPage() into BuildSec(). However, since it doesn't block until all
of the pages are sorted, it reduces the latency of the script (that's my
guess).
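The same idea can be sketched in shell, assuming a toy directory tree: each section directory is expanded and sorted on its own, so output for the first section can begin before later sections have been listed. The build_sec function and the file names are hypothetical, and plain glob ordering stands in for the sortman comparator.

```shell
#!/bin/sh
# Toy tree; the directory and page names are made up for illustration.
tmp=$(mktemp -d)
mkdir "$tmp/man1" "$tmp/man2"
touch "$tmp/man1/ls.1" "$tmp/man1/cat.1" \
      "$tmp/man2/open.2" "$tmp/man2/close.2"

# Emit the pages of one section; the glob is expanded in sorted order.
build_sec()
{
	for page in "$1"/*; do
		printf '%s\n' "${page##*/}"
	done
}

# Sections are handled one at a time instead of sorting every page first.
order=$(
	for sec in "$tmp"/man*; do
		build_sec "$sec"
	done \
	| tr '\n' ' '
)
rm -rf "$tmp"
printf '%s\n' "$order"
```

Whether this accounts for the measured 0.3 s is not something the sketch can confirm; it only shows the per-section ordering.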
I think moving stuff from BuildPage() to BuildSec() would both simplify
and optimize, so please check it when you can. (I'm also checking it,
but while I'm learning Perl with this, I'm still very limited.)
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: Optimize script for generating LinuxManBook.pdf
2023-11-24 11:46 ` Alejandro Colomar
@ 2023-11-30 16:56 ` Deri
2023-11-30 22:38 ` Alejandro Colomar
0 siblings, 1 reply; 10+ messages in thread
From: Deri @ 2023-11-30 16:56 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: linux-man
On Friday, 24 November 2023 11:46:30 GMT Alejandro Colomar wrote:
> On Thu, Nov 23, 2023 at 11:12:45PM +0100, Alejandro Colomar wrote:
> > > It looks fine, although you have to run the code in
> > > "prepare_linux_man_book.pl" twice (to avoid using a temporary file).
> >
> > Yep. I was wondering if we could change something in the design of
> > prepare_linux_man_book.pl so that it could be run once without needing
> > a temporary file. Maybe if it could insert something in the pages that
> > the latter troff(1) would process in one take, without having to put all
> > the bookmarks at the start of the file. That would be an important
> > simplification of the scripts, and probably also an optimization.
>
> Hi Deri,
>
> I have another optimization: split the sort. It saves around 0.3 s.
>
> diff --git a/scripts/LinuxManBook/prepare_linux_man_book.pl b/scripts/LinuxManBook/prepare_linux_man_book.pl
> index 0a79df4e5..5a4aad429 100755
> --- a/scripts/LinuxManBook/prepare_linux_man_book.pl
> +++ b/scripts/LinuxManBook/prepare_linux_man_book.pl
> @@ -88,7 +88,16 @@ sub BuildBook
> {
> print ".pdfpagenumbering D . 1\n";
>
> - foreach my $fn (sort sortman glob("$dir/man*/*")) {
> + foreach my $fn (sort glob("$dir/man*")) {
> + BuildSec($fn);
> + }
> +}
> +
> +sub BuildSec
> +{
> + my $manSdir=shift;
> +
> + foreach my $fn (sort sortman glob("$manSdir/*")) {
> BuildPage($fn);
> }
> }
>
>
> I didn't think of this as an optimization, but rather to move code from
> BuildPage() into BuildSec(). However, since it doesn't block until all
> of the pages are sorted, it reduces the latency of the script (that's my
> guess).
>
> I think moving stuff from BuildPage() to BuildSec() would both simplify
> and optimize, so please check it when you can. (I'm also checking it,
> but while I'm learning Perl with this, I'm still very limited.)
>
> Cheers,
> Alex
Hi Alex,
I have attached the latest iteration of my work, managed to knock two seconds
off the current code in your git. It no longer uses temporary files, outputs
the pdf to stdout, can be run from any directory and runs groff once.
It replaces the complete LinuxManBook directory and the executable is now
called BuildLinuxMan2.pl.
Cheers
Deri
[-- Attachment #2: LinuxManBook.tgz --]
[-- Type: application/x-compressed-tar, Size: 371754 bytes --]
* Re: Optimize script for generating LinuxManBook.pdf
2023-11-30 16:56 ` Deri
@ 2023-11-30 22:38 ` Alejandro Colomar
2023-12-01 0:14 ` Alejandro Colomar
0 siblings, 1 reply; 10+ messages in thread
From: Alejandro Colomar @ 2023-11-30 22:38 UTC (permalink / raw)
To: Deri; +Cc: linux-man
Hi Deri,
On Thu, Nov 30, 2023 at 04:56:38PM +0000, Deri wrote:
> Hi Alex,
>
> I have attached the latest iteration of my work, managed to knock two seconds
> off the current code in your git.
Nice!
> It no longer uses temporary files,
It is creating a temporary dj.Z file. Is that a leftover?
> outputs
> the pdf to stdout, can be run from any directory and runs groff once.
Great.
> It replaces the complete LinuxManBook directory and the executable is now
> called BuildLinuxMan2.pl.
I'd prefer if the huge groff code would go in a separate file. Would
that make sense?
Thanks,
Alex
>
> Cheers
>
> Deri
>
>
--
<https://www.alejandro-colomar.es/>
* Re: Optimize script for generating LinuxManBook.pdf
2023-11-30 22:38 ` Alejandro Colomar
@ 2023-12-01 0:14 ` Alejandro Colomar
2023-12-01 0:37 ` Alejandro Colomar
0 siblings, 1 reply; 10+ messages in thread
From: Alejandro Colomar @ 2023-12-01 0:14 UTC (permalink / raw)
To: Deri; +Cc: linux-man
Hi Deri,
On Thu, Nov 30, 2023 at 11:38:18PM +0100, Alejandro Colomar wrote:
> > It replaces the complete LinuxManBook directory and the executable is now
> > called BuildLinuxMan2.pl.
>
> I'd prefer if the huge groff code would go in a separate file. Would
> that make sense?
I've applied a few tweaks to the script you sent before committing it.
The performance is similar, and it's less of a change to the current
code.
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=0e64299d4dd0df90ea52b3fe6777e5ebfb2484da>
I think we should be able to cut half a second or so if we add a
BuildSec() function so that we don't block the first page until the
entire sort is done. And it would be also more readable. Please check.
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>
* Re: Optimize script for generating LinuxManBook.pdf
2023-12-01 0:14 ` Alejandro Colomar
@ 2023-12-01 0:37 ` Alejandro Colomar
0 siblings, 0 replies; 10+ messages in thread
From: Alejandro Colomar @ 2023-12-01 0:37 UTC (permalink / raw)
To: Deri; +Cc: linux-man
On Fri, Dec 01, 2023 at 01:14:58AM +0100, Alejandro Colomar wrote:
> Hi Deri,
>
> On Thu, Nov 30, 2023 at 11:38:18PM +0100, Alejandro Colomar wrote:
> > > It replaces the complete LinuxManBook directory and the executable is now
> > > called BuildLinuxMan2.pl.
> >
> > I'd prefer if the huge groff code would go in a separate file. Would
> > that make sense?
>
> I've applied a few tweaks to the script you sent before committing it.
> The performance is similar, and it's less of a change to the current
> code.
>
> <https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=0e64299d4dd0df90ea52b3fe6777e5ebfb2484da>
>
> I think we should be able to cut half a second or so if we add a
> BuildSec() function so that we don't block the first page until the
> entire sort is done. And it would be also more readable. Please check.
I amended a bit more, to keep the old LMBfront and an.tmac files with
minimal changes.
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=1adc2425f771bdd4089b06646d21c9eecf73c69c>
Cheers,
Alex
--
<https://www.alejandro-colomar.es/>