From: Alejandro Colomar <alx@kernel.org>
To: Deri <deri@chuzzlewit.myzen.co.uk>
Cc: Jonny Grant <jg@jguk.org>, linux-man <linux-man@vger.kernel.org>
Subject: Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated)
Date: Sun, 19 Nov 2023 21:58:03 +0100
Message-ID: <ZVp24b1vXfoS8ABi@devuan>
In-Reply-To: <12344046.3XHVMEB1Be@pip>
On Sun, Nov 19, 2023 at 04:21:45PM +0000, Deri wrote:
> > $ touch man2/membarrier.2
> > $ make build-pdf
> > PRECONV .tmp/man/man2/membarrier.2.tbl
> > TBL .tmp/man/man2/membarrier.2.eqn
> > EQN .tmp/man/man2/membarrier.2.pdf.troff
> > TROFF .tmp/man/man2/membarrier.2.pdf.set
> > GROPDF .tmp/man/man2/membarrier.2.pdf
> >
> > That helps debug the pipeline, and also learn about it.
> >
> > If that helps parallelize some tasks, then that'll be welcome.
>
> Hi Alex,
Hi Deri,
> Doing it that way actually stops the jobs being run in parallel! Each step
Hmm, kind of makes sense.
> completes before the next step starts, whereas if you let groff build the
> pipeline all the processes are run in parallel. Using separate steps may be
> desirable for "understanding every little step of the groff pipeline", (and
Still a useful thing for our build system.
> may aid debugging an issue), but once such knowledge is obtained it is
> probably better to leave the pipelining to groff, in a production environment.
Unless performance is really a problem, I prefer the understanding and
debugging aid. It'll help not only me, but others who see the project
and would like to learn how all this magic works.
> > > The time saved would be absolutely minimal. It is obvious that to produce a
> > > pdf containing all the man pages then all the man pages have to be
> > > consumed by groff, not just the page which has changed.
> >
> > But do you need to run the entire pipeline, or can you reuse most of it?
> > I can process in parallel much faster, with `make -jN ...`. I guess
> > the .pdf.troff files can be reused; maybe even the .pdf.set ones?
> >
> > Could you change the script at least to produce intermediate files as in
> > the pipeline shown above?  As many as possible would be excellent.
>
> Perhaps it would help if I explain the stages of my script.  First, a look at
> what the script needs to do to produce a pdf of all man pages.  There are too
> many files to put every filename on a single command line, and groff has no
> mechanism for passing a list of filenames, so the first job is
You can always do `find ... | xargs cat | troff /dev/stdin`.
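For instance, a minimal sketch of that idea, with throwaway files standing in for real man pages (the real invocation would end in troff rather than plain output):

```shell
# find(1) can enumerate arbitrarily many files, and xargs(1) batches them
# onto cat's command line, so no single command line grows too long.
# Throwaway files stand in for real man pages here.
tmp=$(mktemp -d)
printf 'alpha\n' > "$tmp/a.2"
printf 'beta\n'  > "$tmp/b.2"
find "$tmp" -name '*.2' | sort | xargs cat    # prints "alpha" then "beta"
rm -r "$tmp"
```

In the real tree the pipeline would continue into `troff` (or the whole groff pipeline), reading the concatenated stream from standard input.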
> to concatenate all the separate files into one input file for groff. And while
> we are doing that, add the "magic sauce" which makes all the pdf links in the
> book and sorts out the aliases which point to another man page.
Yep, I think I partially understood that part of the script today. It's
what this `... | LC_ALL=C grep '^\\. *ds' |` pipeline produces and
passes to groff, right?
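If I understand it, the effect of that grep can be simulated like this (the `.ds` register names below are invented for illustration; the real ones come from the pdf macros):

```shell
# Simulated stderr of the first (-z) groff pass: warnings mixed with
# ".ds" string definitions.  The grep keeps only the definitions, which
# are then fed back into the second groff pass.
printf '%s\n' \
    '.ds pdf:look(man-accept-2) 123' \
    'troff: some warning' \
    '. ds pdf:look(man-socket-2) 456' \
  | LC_ALL=C grep '^\. *ds'
```

Only the first and third lines survive the filter; the warning is discarded.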
> After this is done there is a single troff file, called LMB.man, which is the
That's what's currently called LinuxManBook.Z, right?
> file groff is going to process. In the script you should see something like
> this:-
>
> my $temp='LMB.man';
I don't. Maybe you have a slightly different version of it?
> [...]
>
> my $format='pdf';
> my $paper=$fpaper || 'a4';
> my $cmdstring="-T$format -k -pet -M. -F. -mandoc -manmark -dpaper=$paper -P-p$paper -rC1 -rCHECKSTYLE=3";
> my $front='LMBfront.t';
> my $frontdit='LMBfront.set';
> my $mandit='LinuxManBook.set';
> my $book="LinuxManBook.$format";
>
> system("groff -T$format -dpaper=$paper -P-p$paper -ms $front -Z > $frontdit");
This creates the front page .set file
> system("groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 $temp $cmdstring 2>&1 |
> LC_ALL=C grep '^\\. *ds' |
This creates the bookmarks, right?
> groff -T$format $cmdstring - $temp -Z > $mandit");
And this is the main .set file.
> system("./gro$format -F.:/usr/share/groff/current/font $frontdit $mandit -p$paper > $book");
And finally we have the book.
>
> (This includes changes by Brian Inglis).  If you remove the lines which
> call system you will end up with just the single file LMB.man (in about a
> quarter of a second). You can treat this file just the same as your single
> page example if you want to.
>
> The first system call creates the title page from the troff source file
> LMBfront.t and produces LMBfront.set, this can be added to your makefile as an
> entirely separate rule depending on whether the .set file needs to be built.
>
> The second and third system calls are the calls to groff which could be put
> into your makefile or split into separate stages to avoid parallelism.
>
> The second system call produces LinuxManBook.set and the third system combines
> this with LMBfront.set to produce the pdf.
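Putting the three system() calls into make rules might look something like this (a sketch only: the `$(PAPER)` variable, file names, and flag bundle are copied from the script above and may need adjusting to the real build):

```make
# Hypothetical Makefile rules mirroring the script's three system() calls.
CMDFLAGS = -Tpdf -k -pet -M. -F. -mandoc -manmark \
           -dpaper=$(PAPER) -P-p$(PAPER) -rC1 -rCHECKSTYLE=3

# First call: typeset the title page.
LMBfront.set: LMBfront.t
	groff -Tpdf -dpaper=$(PAPER) -P-p$(PAPER) -ms $< -Z > $@

# Second call: two groff passes; the first (-z) emits ".ds" label data,
# which is filtered and prepended to the second, real pass.
LinuxManBook.set: LMB.man
	groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 $< $(CMDFLAGS) 2>&1 \
	| LC_ALL=C grep '^\. *ds' \
	| groff $(CMDFLAGS) - $< -Z > $@

# Third call: combine front matter and body into the final PDF.
LinuxManBook.pdf: LMBfront.set LinuxManBook.set
	gropdf -F.:/usr/share/groff/current/font $^ -p$(PAPER) > $@
```

With rules like these, make could at least build LMBfront.set and LinuxManBook.set in parallel, since they are independent.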
>
> The "./" in the third system call is because I gave you a pre-release gropdf,
> you may be using the released 1.23.0 gropdf now.
>
> > > On my system this takes about 18
> > > seconds to produce the 2800+ pages of the book. Of this, a quarter of a
> > > second is consumed by the "magic" part of the script, the rest of the 18
> > > seconds is consumed by calls to groff and gropdf.
> >
> > But how much of that work needs to be on a single process? I bought a
> > new CPU with 24 cores. Gotta use them all :D
>
> I realise you are having difficulty in letting go of your idea of re-using
> previous work, rather than starting afresh each time. Imagine a single word
> change in one man page causes it to grow from 2 pages to 3, so all links to
> pages after this changed entry would be one page adrift. This is why very
> little previous work is useful, and why the whole book has to be dealt with as
> a single process.
Does such a change need re-running troff(1), or is gropdf(1) enough?
My problem is probably that I don't know what's done by `gropdf`, and
what's done by `troff -Tpdf`. I was hoping that `troff -Tpdf` still
didn't need to know about the entire book, and that only gropdf(1) would
need that.
> If each entry was processed separately, as you would like, to
> use all your shiny new cores, how would the process dealing with accept(2)
> know which page socket(2) would be on when it adds it as a link in the text?  I
> hope you can see that at some point it has to be treated as a homogeneous whole
> in order to calculate correct links between entries.
>
> > > So any splitting of the perl script is
> > > only going to have an effect on the quarter of a second!
> > >
> > > I don't understand why the perl script can't be included in your make file
> > > as part of build-pdf target.
> >
> > It can. I just prefer to be strict about the Makefile having "one rule
> > per each file", while currently the script generates 4 files (T, two
> > .Z's, and the .pdf).
>
> Explained above how to separate them, so that the script only generates
> LMB.man and the system calls are moved to the makefile.
Thanks!
> > > Presumably it would be dependent on running after
> > > the scripts which add the revision label and date to each man page.
> >
> > I only set the revision and date on dist tarballs. For the git HEAD
> > book, I'd keep the (unreleased) version and (date). So, no worries
> > there.
>
> Given that you seem to intend to offer these interim books as a download, it
> would make sense if they included either a date or a git commit ID to
> differentiate them; if someone queries something, it would be useful to know
> exactly what they were looking at.
The books for releases are available at
<https://www.alejandro-colomar.es/share/dist/man-pages/6/6.05/6.05.01/man-pages-6.05.01.pdf>
(replace the version numbers for other versions, or navigate the dirs)
I need to document that in the README of the project.
For git HEAD, I plan to have something like
<https://www.alejandro-colomar.es/share/dist/man-pages/git/man-pages-HEAD.pdf>
It's mainly intended for easily checking what git HEAD looks like, and
will be discarded later.  If the audience asks for version numbers,
though, I could provide `git describe` versions and dates in the pages.
> Cheers
>
> Deri
>
> > > > Since I don't understand Perl, and don't know much of gropdf(1) either,
> > > > I need help.
> > > >
> > > > Maybe Deri or Branden can help with that. If anyone else understands it
> > > > and can also help, that's very welcome too!
> > >
> > > You are probably better placed to add the necessaries to your makefile.
> > > You
> > > would then just need to remember to make build-pdf any time you alter one
> > > of the source man pages. Since you are manually running my script to
> > > produce the pdf, it should not be difficult to automate it in a makefile.
> > >
> > > > Then I could install a hook in my server that runs
> > > >
> > > > $ make build-pdf docdir=/srv/www/...
> > >
> > > And wait 18s each time the hook is actioned!! Or, set the build to place
> > > the generated pdf somewhere in /srv/www/... and include the build in your
> > > normal workflow when a man page is changed.
> >
> > Hmm. I still hope some of it can be parallelized, but 18s could be
> > reasonable, if the server does that in the background after pushing.
> > My old raspberry pi would burn, but the new computer should handle that
> > just fine.
>
> I'm confused.  The 18s is how long it takes to generate the book, so if the
> book is built in response to an access to a particular url, the http server
> can't start "pushing" for the 18s; then add on the transfer time for the pdf,
> and I suspect you will have a lot of aborted transfers.  Additionally, the
> script, and any makefile equivalent you write, is not designed for concurrent
> invocation, so if two people visit the same url within the 18 second window
> neither user will receive a valid pdf.
No, my intention is that whenever I `git push` via SSH, the receiving
server runs `make build-book-pdf` after receiving the changes. That is
run after the git SSH connection has closed, so I wouldn't notice.
HTTP connections wouldn't trigger anything in my server, except Nginx
serving the file, of course.
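That hook could look something like the sketch below (the repository path, docdir, and target name are my guesses; the hook is written to a scratch directory here so the example is self-contained):

```shell
# Sketch of a post-receive git hook.  Paths and the make target are
# assumptions; adjust them to the real server layout.
hooks=$(mktemp -d)                    # stands in for repo/.git/hooks/
cat > "$hooks/post-receive" <<'EOF'
#!/bin/sh
# Runs after the push completes; build in the background so the
# pushing client's SSH session is not kept waiting.
make -C /srv/src/man-pages build-pdf docdir=/srv/www >/dev/null 2>&1 &
EOF
chmod +x "$hooks/post-receive"
test -x "$hooks/post-receive" && echo 'hook installed'
rm -r "$hooks"
```

git runs `post-receive` once per push, after the refs have been updated, which matches the "after receiving the changes" behaviour described above.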
> I advise the build becomes part of your workflow after making changes, and
> then place the pdf in a location where it can be served by the http server.
>
> Your model of slicing and dicing man pages to be processed individually is
> doable using a website to serve the individual pages, see:-
>
> http://chuzzlewit.co.uk/WebManPDF.pl/man:/2/accept
>
> This is running on a 1" cube no more powerful than a raspberry pi 3. The
> difference is that the "magic sauce" added to each man page sets the links to
> external http calls back to itself to produce another man page, rather than
> internal links to another part of the pdf. You can get an index of all the man
> pages, on the (very old) system, here.
>
> http://chuzzlewit.co.uk/
Yep, I've seen that server :)
Long term, I also intend to provide one-page PDFs and HTML files of the
pages, although I prefer pre-generating them instead of generating them
on demand.  Maybe a git hook, or maybe a cron job that regenerates them
once a day or so.
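For the cron variant, a single crontab entry would suffice (the time and paths below are invented for illustration):

```
# Regenerate the per-page PDF/HTML files nightly (illustrative entry).
17 3 * * *  cd /srv/src/man-pages && make build-pdf docdir=/srv/www
```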
Cheers,
Alex
>
> Cheers
>
> Deri
--
<https://www.alejandro-colomar.es/>