From: Alejandro Colomar <alx@kernel.org>
To: Deri <deri@chuzzlewit.myzen.co.uk>
Cc: Jonny Grant <jg@jguk.org>, linux-man <linux-man@vger.kernel.org>
Subject: Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated)
Date: Sun, 19 Nov 2023 21:58:03 +0100
Message-ID: <ZVp24b1vXfoS8ABi@devuan>
In-Reply-To: <12344046.3XHVMEB1Be@pip>
On Sun, Nov 19, 2023 at 04:21:45PM +0000, Deri wrote:
> > $ touch man2/membarrier.2
> > $ make build-pdf
> > PRECONV .tmp/man/man2/membarrier.2.tbl
> > TBL .tmp/man/man2/membarrier.2.eqn
> > EQN .tmp/man/man2/membarrier.2.pdf.troff
> > TROFF .tmp/man/man2/membarrier.2.pdf.set
> > GROPDF .tmp/man/man2/membarrier.2.pdf
> >
> > That helps debug the pipeline, and also learn about it.
> >
> > If that helps parallelize some tasks, then that'll be welcome.
>
> Hi Alex,
Hi Deri,
> Doing it that way actually stops the jobs being run in parallel! Each step
Hmm, kind of makes sense.
> completes before the next step starts, whereas if you let groff build the
> pipeline all the processes are run in parallel. Using separate steps may be
> desirable for "understanding every little step of the groff pipeline", (and
Still a useful thing for our build system.
> may aid debugging an issue), but once such knowledge is obtained it is
> probably better to leave the pipelining to groff, in a production environment.
Unless performance is really a problem, I prefer the understanding and
debugging aid. It'll help not only me, but others who see the project
and would like to learn how all this magic works.
> > > The time saved would be absolutely minimal. It is obvious that to produce a
> > > pdf containing all the man pages then all the man pages have to be
> > > consumed by groff, not just the page which has changed.
> >
> > But do you need to run the entire pipeline, or can you reuse most of it?
> > I can process in parallel much faster, with `make -jN ...`. I guess
> > the .pdf.troff files can be reused; maybe even the .pdf.set ones?
> >
> > Could you change the script at least to produce intermediate files as in
> > the pipeline shown above?  As many as possible would be excellent.
>
> Perhaps it would help if I explain the stages of my script.  First, a look at
> what the script needs to do to produce a pdf of all man pages.  There are too
> many files to put every filename on a single command line, and groff has no
> mechanism for passing a list of filenames, so the first job is
You can always do `find ... | xargs cat | troff /dev/stdin`.
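For instance, a minimal sketch of that idea, with throwaway files standing in for real man pages (the real invocation would end in troff rather than plain output):

```shell
# find(1) can enumerate arbitrarily many files, and xargs(1) batches them
# onto cat's command line, so no single command line grows too long.
# Throwaway files stand in for real man pages here.
tmp=$(mktemp -d)
printf 'alpha\n' > "$tmp/a.2"
printf 'beta\n'  > "$tmp/b.2"
find "$tmp" -name '*.2' | sort | xargs cat    # prints "alpha" then "beta"
rm -r "$tmp"
```

In the real tree the pipeline would continue into `troff` (or the whole groff pipeline), reading the concatenated stream from standard input.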
> to concatenate all the separate files into one input file for groff. And while
> we are doing that, add the "magic sauce" which makes all the pdf links in the
> book and sorts out the aliases which point to another man page.
Yep, I think I partially understood that part of the script today. It's
what this `... | LC_ALL=C grep '^\\. *ds' |` pipeline produces and
passes to groff, right?
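If I understand it, the effect of that grep can be simulated like this (the `.ds` register names below are invented for illustration; the real ones come from the pdf macros):

```shell
# Simulated stderr of the first (-z) groff pass: warnings mixed with
# ".ds" string definitions.  The grep keeps only the definitions, which
# are then fed back into the second groff pass.
printf '%s\n' \
    '.ds pdf:look(man-accept-2) 123' \
    'troff: some warning' \
    '. ds pdf:look(man-socket-2) 456' \
  | LC_ALL=C grep '^\. *ds'
```

Only the first and third lines survive the filter; the warning is discarded.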
> After this is done there is a single troff file, called LMB.man, which is the
That's what's currently called LinuxManBook.Z, right?
> file groff is going to process. In the script you should see something like
> this:-
>
> my $temp='LMB.man';
I don't. Maybe you have a slightly different version of it?
> [...]
>
> my $format='pdf';
> my $paper=$fpaper || 'a4';
> my $cmdstring="-T$format -k -pet -M. -F. -mandoc -manmark -dpaper=$paper -P-p$paper -rC1 -rCHECKSTYLE=3";
> my $front='LMBfront.t';
> my $frontdit='LMBfront.set';
> my $mandit='LinuxManBook.set';
> my $book="LinuxManBook.$format";
>
> system("groff -T$format -dpaper=$paper -P-p$paper -ms $front -Z > $frontdit");
This creates the front page .set file
> system("groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 $temp $cmdstring 2>&1 |
> LC_ALL=C grep '^\\. *ds' |
This creates the bookmarks, right?
> groff -T$format $cmdstring - $temp -Z > $mandit");
And this is the main .set file.
> system("./gro$format -F.:/usr/share/groff/current/font $frontdit $mandit -p$paper > $book");
And finally we have the book.
>
> (This includes changes by Brian Inglis).  If you remove the lines which
> call system you will end up with just the single file LMB.man (in about a
> quarter of a second). You can treat this file just the same as your single
> page example if you want to.
>
> The first system call creates the title page from the troff source file
> LMBfront.t and produces LMBfront.set, this can be added to your makefile as an
> entirely separate rule depending on whether the .set file needs to be built.
>
> The second and third system calls are the calls to groff which could be put
> into your makefile or split into separate stages to avoid parallelism.
>
> The second system call produces LinuxManBook.set and the third system combines
> this with LMBfront.set to produce the pdf.
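Putting the three system() calls into make rules might look something like this (a sketch only: the `$(PAPER)` variable, file names, and flag bundle are copied from the script above and may need adjusting to the real build):

```make
# Hypothetical Makefile rules mirroring the script's three system() calls.
CMDFLAGS = -Tpdf -k -pet -M. -F. -mandoc -manmark \
           -dpaper=$(PAPER) -P-p$(PAPER) -rC1 -rCHECKSTYLE=3

# First call: typeset the title page.
LMBfront.set: LMBfront.t
	groff -Tpdf -dpaper=$(PAPER) -P-p$(PAPER) -ms $< -Z > $@

# Second call: two groff passes; the first (-z) emits ".ds" label data,
# which is filtered and prepended to the second, real pass.
LinuxManBook.set: LMB.man
	groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 $< $(CMDFLAGS) 2>&1 \
	| LC_ALL=C grep '^\. *ds' \
	| groff $(CMDFLAGS) - $< -Z > $@

# Third call: combine front matter and body into the final PDF.
LinuxManBook.pdf: LMBfront.set LinuxManBook.set
	gropdf -F.:/usr/share/groff/current/font $^ -p$(PAPER) > $@
```

With rules like these, make could at least build LMBfront.set and LinuxManBook.set in parallel, since they are independent.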
>
> The "./" in the third system call is because I gave you a pre-release gropdf,
> you may be using the released 1.23.0 gropdf now.
>
> > > On my system this takes about 18
> > > seconds to produce the 2800+ pages of the book. Of this, a quarter of a
> > > second is consumed by the "magic" part of the script, the rest of the 18
> > > seconds is consumed by calls to groff and gropdf.
> >
> > But how much of that work needs to be on a single process? I bought a
> > new CPU with 24 cores. Gotta use them all :D
>
> I realise you are having difficulty in letting go of your idea of re-using
> previous work, rather than starting afresh each time. Imagine a single word
> change in one man page causes it to grow from 2 pages to 3, so all links to
> pages after this changed entry would be one page adrift. This is why very
> little previous work is useful, and why the whole book has to be dealt with as
> a single process.
Does such a change need re-running troff(1), or is gropdf(1) enough?
My problem is probably that I don't know what's done by `gropdf`, and
what's done by `troff -Tpdf`. I was hoping that `troff -Tpdf` still
didn't need to know about the entire book, and that only gropdf(1) would
need that.
> If each entry was processed separately, as you would like, to
> use all your shiny new cores, how would the process dealing with accept(2)
> know which page socket(2) would be on when it adds it as a link in the text?  I
> hope you can see that at some point it has to be treated as a homogeneous whole
> in order to calculate correct links between entries.
>
> > > So any splitting of the perl script is
> > > only going to have an effect on the quarter of a second!
> > >
> > > I don't understand why the perl script can't be included in your make file
> > > as part of build-pdf target.
> >
> > It can. I just prefer to be strict about the Makefile having "one rule
> > per each file", while currently the script generates 4 files (T, two
> > .Z's, and the .pdf).
>
> Explained above how to separate them, so that the script only generates
> LMB.man and the system calls are moved to the makefile.
Thanks!
> > > Presumably it would be dependent on running after
> > > the scripts which add the revision label and date to each man page.
> >
> > I only set the revision and date on dist tarballs. For the git HEAD
> > book, I'd keep the (unreleased) version and (date). So, no worries
> > there.
>
> Given that you seem to intend to offer these interim books as a download, it
> would make sense if they included either a date or a git commit ID to
> differentiate them; if someone queries something, it would be useful to know
> exactly what they were looking at.
The books for releases are available at
<https://www.alejandro-colomar.es/share/dist/man-pages/6/6.05/6.05.01/man-pages-6.05.01.pdf>
(replace the version numbers for other versions, or navigate the dirs)
I need to document that in the README of the project.
For git HEAD, I plan to have something like
<https://www.alejandro-colomar.es/share/dist/man-pages/git/man-pages-HEAD.pdf>
It's mainly intended for easily checking what git HEAD looks like, and
will be discarded later.  If the audience asks for version numbers,
though, I could provide `git describe` versions and dates in the pages.
> Cheers
>
> Deri
>
> > > > Since I don't understand Perl, and don't know much of gropdf(1) either,
> > > > I need help.
> > > >
> > > > Maybe Deri or Branden can help with that. If anyone else understands it
> > > > and can also help, that's very welcome too!
> > >
> > > You are probably better placed to add the necessaries to your makefile.
> > > You
> > > would then just need to remember to make build-pdf any time you alter one
> > > of the source man pages. Since you are manually running my script to
> > > produce the pdf, it should not be difficult to automate it in a makefile.
> > >
> > > > Then I could install a hook in my server that runs
> > > >
> > > > $ make build-pdf docdir=/srv/www/...
> > >
> > > And wait 18s each time the hook is actioned!! Or, set the build to place
> > > the generated pdf somewhere in /srv/www/... and include the build in your
> > > normal workflow when a man page is changed.
> >
> > Hmm. I still hope some of it can be parallelized, but 18s could be
> > reasonable, if the server does that in the background after pushing.
> > My old raspberry pi would burn, but the new computer should handle that
> > just fine.
>
> I'm confused.  The 18s is how long it takes to generate the book, so if the
> book is built in response to an access to a particular url, the http server
> can't start "pushing" for the 18s; then add on the transfer time for the pdf,
> and I suspect you will have a lot of aborted transfers.  Additionally, the
> script, and any makefile equivalent you write, is not designed for concurrent
> invocation, so if two people visit the same url within the 18 second window
> neither user will receive a valid pdf.
No, my intention is that whenever I `git push` via SSH, the receiving
server runs `make build-book-pdf` after receiving the changes. That is
run after the git SSH connection has closed, so I wouldn't notice.
HTTP connections wouldn't trigger anything in my server, except Nginx
serving the file, of course.
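That hook could look something like the sketch below (the repository path, docdir, and target name are my guesses; the hook is written to a scratch directory here so the example is self-contained):

```shell
# Sketch of a post-receive git hook.  Paths and the make target are
# assumptions; adjust them to the real server layout.
hooks=$(mktemp -d)                    # stands in for repo/.git/hooks/
cat > "$hooks/post-receive" <<'EOF'
#!/bin/sh
# Runs after the push completes; build in the background so the
# pushing client's SSH session is not kept waiting.
make -C /srv/src/man-pages build-pdf docdir=/srv/www >/dev/null 2>&1 &
EOF
chmod +x "$hooks/post-receive"
test -x "$hooks/post-receive" && echo 'hook installed'
rm -r "$hooks"
```

git runs `post-receive` once per push, after the refs have been updated, which matches the "after receiving the changes" behaviour described above.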
> I advise the build becomes part of your workflow after making changes, and
> then place the pdf in a location where it can be served by the http server.
>
> Your model of slicing and dicing man pages to be processed individually is
> doable using a website to serve the individual pages, see:-
>
> http://chuzzlewit.co.uk/WebManPDF.pl/man:/2/accept
>
> This is running on a 1" cube no more powerful than a raspberry pi 3. The
> difference is that the "magic sauce" added to each man page sets the links to
> external http calls back to itself to produce another man page, rather than
> internal links to another part of the pdf. You can get an index of all the man
> pages, on the (very old) system, here.
>
> http://chuzzlewit.co.uk/
Yep, I've seen that server :)
Long term, I also intend to provide one-page PDFs and HTML files of the
pages, although I prefer pre-generating them instead of generating them
on demand.  Maybe a git hook, or maybe a cron job that regenerates them
once a day or so.
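For the cron variant, a single crontab entry would suffice (the time and paths below are invented for illustration):

```
# Regenerate the per-page PDF/HTML files nightly (illustrative entry).
17 3 * * *  cd /srv/src/man-pages && make build-pdf docdir=/srv/www
```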
Cheers,
Alex
>
> Cheers
>
> Deri
--
<https://www.alejandro-colomar.es/>