All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alejandro Colomar <alx@kernel.org>
To: Deri <deri@chuzzlewit.myzen.co.uk>
Cc: Jonny Grant <jg@jguk.org>, linux-man <linux-man@vger.kernel.org>
Subject: Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated)
Date: Sun, 19 Nov 2023 21:58:03 +0100	[thread overview]
Message-ID: <ZVp24b1vXfoS8ABi@devuan> (raw)
In-Reply-To: <12344046.3XHVMEB1Be@pip>

[-- Attachment #1: Type: text/plain, Size: 11139 bytes --]

On Sun, Nov 19, 2023 at 04:21:45PM +0000, Deri wrote:
> > 	$ touch man2/membarrier.2
> > 	$ make build-pdf
> > 	PRECONV	.tmp/man/man2/membarrier.2.tbl
> > 	TBL	.tmp/man/man2/membarrier.2.eqn
> > 	EQN	.tmp/man/man2/membarrier.2.pdf.troff
> > 	TROFF	.tmp/man/man2/membarrier.2.pdf.set
> > 	GROPDF	.tmp/man/man2/membarrier.2.pdf
> > 
> > That helps debug the pipeline, and also learn about it.
> > 
> > If that helps parallelize some tasks, then that'll be welcome.
> 
> Hi Alex,

Hi Deri,

> Doing it that way actually stops the jobs being run in parallel! Each step 

Hmm, kind of makes sense.

> completes before the next step starts, whereas if you let groff build the 
> pipeline all the processes are run in parallel. Using separate steps may be 
> desirable for "understanding every little step of the groff pipeline", (and 

Still a useful thing for our build system.

> may aid debugging an issue), but once such knowledge is obtained it is 
> probably better to leave the pipelining to groff, in a production environment.

Unless performance is really a problem, I prefer the understanding and
debugging aid.  It'll help not only me, but others who see the project
and would like to learn how all this magic works.

> > > The time saved would be absolutely minimal. It is obvious that to produce
> > > a
> > > pdf containing all the man pages then all the man pages have to be
> > > consumed by groff, not just the page which has changed.
> > 
> > But do you need to run the entire pipeline, or can you reuse most of it?
> > I can process in parallel much faster, with `make -jN ...`.  I guess
> > the .pdf.troff files can be reused; maybe even the .pdf.set ones?
> > 
> > Could you change the script at least to produce intermediary files as in
> > the pipeline shown above?  As many as possible would be excellent.
> 
> Perhaps it would help if I explain the stages of my script. First a look at 
> what the script needs to do to produce a pdf of all man pages. There are too 
> many files to produce a single command line with all the filenames of each 
> man, groff has no mechanism for passing a list of filenames, so first job is 

You can always `find ... | xargs cat | troff /dev/stdin`

> to concatenate all the separate files into one input file for groff. And while 
> we are doing that, add the "magic sauce" which makes all the pdf links in the 
> book and sorts out the aliases which point to another man page.

Yep, I think I partially understood that part of the script today.  It's
what this `... | LC_ALL=C grep '^\\. *ds' |` pipeline produces and
passes to groff, right?

> After this is done there is a single troff file, called LMB.man, which is the 

That's what's currently called LinuxManBook.Z, right?

> file groff is going to process. In the script you should see something like 
> this:-
> 
> my $temp='LMB.man';

I don't.  Maybe you have a slightly different version of it?

> [...]
> 
> my $format='pdf';
> my $paper=$fpaper ||';
> my $cmdstring="-T$format -k -pet -M. -F. -mandoc -manmark -dpaper=$paper -P-
> p$paper -rC1 -rCHECKSTYLE=3";
> my $front='LMBfront.t';
> my $frontdit='LMBfront.set';
> my $mandit='LinuxManBook.set';
> my $book="LinuxManBook.$format";
> 
> system("groff -T$format -dpaper=$paper -P-p$paper -ms $front -Z > $frontdit");

This creates the front page .set file

> system("groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 $temp $cmdstring 2>&1 | 
> LC_ALL=C grep '^\\. *ds' |

This creates the bookmarks, right?

> groff -T$format $cmdstring - $temp -Z > $mandit");

And this is the main .set file.

> system("./gro$format -F.:/usr/share/groff/current/font $frontdit $mandit -
> p$paper  > $book");

And finally we have the book.

> 
> (This includes changes by Brian Inglish ts). If you remove the lines which 
> call system you will end up with just the single file LMB.man (in about a 
> quarter of a second). You can treat this file just the same as your single 
> page example if you want to.
> 
> The first system call creates the title page from the troff source file 
> LMBfront.t and produces LMBfront.set, this can be added to your makefile as an 
> entirely separate rule depending on whether the .set file needs to be built.
> 
> The second and third system calls are the calls to groff which could be put 
> into your makefile or split into separate stages to avoid parallelism.
> 
> The second system call produces LinuxManBook.set and the third system combines 
> this with LMBfront.set to produce the pdf.
> 
> The "./" in the third system call is because I gave you a pre-release gropdf, 
> you may be using the released 1.23.0 gropdf now.
> 
> > > On my system this takes about 18
> > > seconds to produce the 2800+ pages of the book. Of this, a quarter of a
> > > second is consumed by the "magic" part of the script, the rest of the 18
> > > seconds is consumed by calls to groff and gropdf.
> > 
> > But how much of that work needs to be on a single process?  I bought a
> > new CPU with 24 cores.  Gotta use them all  :D
> 
> I realise you are having difficulty in letting go of your idea of re-using 
> previous work, rather than starting afresh each time. Imagine a single word 
> change in one man page causes it to grow from 2 pages to 3, so all links to 
> pages after this changed entry would be one page adrift. This is why very 
> little previous work is useful, and why the whole book has to be dealt with as 
> a single process.

Does such a change need re-running troff(1)?  Or is gropdf(1) enough?  If
troff(1)

My problem is probably that I don't know what's done by `gropdf`, and
what's done by `troff -Tpdf`.  I was hoping that `troff -Tpdf` still
didn't need to know about the entire book, and that only gropdf(1) would
need that.

> If each entry was processed separately, as you would like to 
> use all your shiny new cores, how would the process dealing with accept(2) 
> know which page socket(2) would be on when it adds it as a link in the text. I 
> hope you can see that at some point it has to be treated as a homogenous whole 
> in order calculate correct links between entries.
> 
> > > So any splitting of the perl script is
> > > only going to have an effect on the quarter of a second!
> > > 
> > > I don't understand why the perl script can't be included in your make file
> > > as part of build-pdf target.
> > 
> > It can.  I just prefer to be strict about the Makefile having "one rule
> > per each file", while currently the script generates 4 files (T, two
> > .Z's, and the .pdf).
> 
> Explained how to separate above so that the script only generates LMB.man and 
> the system calls moved to the makefile.

Thanks!

> > > Presumably it would be dependent on running after
> > > the scripts which add the revision label and date to each man page.
> > 
> > I only set the revision and date on dist tarballs.  For the git HEAD
> > book, I'd keep the (unreleased) version and (date).  So, no worries
> > there.
> 
> Given that you seem to intend to offer these interim books as a download, it 
> would make sense if they included either a date or git commit ID to 
> differenciate them, if someone queries something it would be useful to know 
> exactly what they were looking at.

The books for releases are available at

<https://www.alejandro-colomar.es/share/dist/man-pages/6/6.05/6.05.01/man-pages-6.05.01.pdf>

(replace the version numbers for other versions, or navigate the dirs)
I need to document that in the README of the project.

For git HEAD, I plan to have something like

<https://www.alejandro-colomar.es/share/dist/man-pages/git/man-pages-HEAD.pdf>

It's mainly intended for easily checking what git HEAD looks like, and
discard that later.  If the audience asks for version numbers, though,
I could create provide `git --describe` versions and dates in the pages.

> Cheers 
> 
> Deri
> 
> > > > Since I don't understand Perl, and don't know much of gropdf(1) either,
> > > > I need help.
> > > > 
> > > > Maybe Deri or Branden can help with that.  If anyone else understands it
> > > > and can also help, that's very welcome too!
> > > 
> > > You are probably better placed to add the necessaries to your makefile.
> > > You
> > > would then just need to remember to make build-pdf any time you alter one
> > > of the source man pages. Since you are manually running my script to
> > > produce the pdf, it should not be difficult to automate it in a makefile.
> > > 
> > > > Then I could install a hook in my server that runs
> > > > 
> > > > 	$ make build-pdf docdir=/srv/www/...
> > > 
> > > And wait 18s each time the hook is actioned!! Or, set the build to place
> > > the generated pdf somewhere in /srv/www/... and include the build in your
> > > normal workflow when a man page is changed.
> > 
> > Hmm.  I still hope some of it can be parallelized, but 18s could be
> > reasonable, if the server does that in the background after pushing.
> > My old raspberry pi would burn, but the new computer should handle that
> > just fine.
> 
> I'm confused. The 18s is how long it takes to generate the book, so if the 
> book is built in response to an access to a particular url the http server 
> can't start "pushing" for the 18s, then addon the transfer time for the pdf 
> and I suspect you will have a lot of aborted transfers. Additionally, the 
> script, and any makefile equivalent you write, is not designed for concurrent 
> invocation, so if two people visit the same url within the 18 second window 
> neither user will receive a valid pdf.

No, my intention is that whenever I `git push` via SSH, the receiving
server runs `make build-book-pdf` after receiving the changes.  That is
run after the git SSH connection has closed, so I wouldn't notice.

HTTP connections wouldn't trigger anything in my server, except Nginx
serving the file, of course.

> I advise the build becomes part of your workflow after making changes, and 
> then place the pdf in a location where it can be served by the http server.
> 
> Your model of slicing and dicing man pages to be processed individually is 
> doable using a website to serve the individual pages, see:-
> 
> http://chuzzlewit.co.uk/WebManPDF.pl/man:/2/accept
> 
> This is running on a 1" cube no more powerful than a raspberry pi 3. The 
> difference is that the "magic sauce" added to each man page sets the links to 
> external http calls back to itself to produce another man page, rather than 
> internal links to another part of the pdf. You can get an index of all the man 
> pages, on the (very old) system, here.
> 
> http://chuzzlewit.co.uk/

Yep, I've seen that server :)
Long term I also intend to provide one-page PDFs and HTML files of the
pages.  Although I prefer pre-generating them, instead of on-demand.
Maybe a git hook, or maybe a cron job that re-generates them once a day
or so.

Cheers,
Alex

> 
> Cheers 
> 
> Deri

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2023-11-19 20:55 UTC|newest]

Thread overview: 138+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant
2023-11-04 19:33 ` Alejandro Colomar
2023-11-04 21:18   ` Jonny Grant
2023-11-05  1:36     ` Alejandro Colomar
2023-11-05 21:16   ` Jonny Grant
2023-11-05 23:31     ` Alejandro Colomar
2023-11-07 11:52       ` Jonny Grant
2023-11-07 13:23         ` Alejandro Colomar
2023-11-07 14:19           ` Jonny Grant
2023-11-07 16:17             ` Alejandro Colomar
2023-11-07 17:00               ` Jonny Grant
2023-11-07 17:20                 ` Alejandro Colomar
2023-11-08  6:18               ` Oskari Pirhonen
2023-11-08  9:51                 ` Alejandro Colomar
2023-11-08  9:59                   ` Thorsten Kukuk
2023-11-08 15:09                     ` Alejandro Colomar
     [not found]                     ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com>
2023-11-08 15:44                       ` Thorsten Kukuk
2023-11-08 17:26                         ` Adhemerval Zanella Netto
2023-11-08 14:06                   ` Zack Weinberg
2023-11-08 15:07                     ` Alejandro Colomar
2023-11-08 19:45                       ` G. Branden Robinson
2023-11-08 21:35                       ` Carlos O'Donell
2023-11-08 22:11                         ` Alejandro Colomar
2023-11-08 23:31                           ` Paul Eggert
2023-11-09  0:29                             ` Alejandro Colomar
2023-11-09 10:13                               ` Jonny Grant
2023-11-09 11:08                                 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
2023-11-09 14:06                                   ` catenate vs concatenate Jonny Grant
2023-11-27 14:33                                   ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg
2023-11-27 15:08                                     ` Alejandro Colomar
2023-11-27 15:13                                       ` Alejandro Colomar
2023-11-27 16:59                                       ` G. Branden Robinson
2023-11-27 18:35                                         ` Zack Weinberg
2023-11-27 23:45                                           ` G. Branden Robinson
2023-11-09 11:13                                 ` strncpy clarify result may not be null terminated Alejandro Colomar
2023-11-09 14:05                                   ` Jonny Grant
2023-11-09 15:04                                     ` Alejandro Colomar
2023-11-08 19:04                   ` DJ Delorie
2023-11-08 19:40                     ` Alejandro Colomar
2023-11-08 19:58                       ` DJ Delorie
2023-11-08 20:13                         ` Alejandro Colomar
2023-11-08 21:07                           ` DJ Delorie
2023-11-08 21:50                             ` Alejandro Colomar
2023-11-08 22:17                               ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
2023-11-08 23:06                                 ` Paul Eggert
2023-11-08 23:28                                   ` DJ Delorie
2023-11-09  0:24                                   ` Alejandro Colomar
2023-11-09 14:11                                   ` Jonny Grant
2023-11-09 14:35                                     ` Alejandro Colomar
2023-11-09 14:47                                       ` Jonny Grant
2023-11-09 15:02                                         ` Alejandro Colomar
2023-11-09 17:30                                           ` DJ Delorie
2023-11-09 17:54                                             ` Andreas Schwab
2023-11-09 18:00                                             ` Alejandro Colomar
2023-11-09 19:42                                             ` Jonny Grant
2023-11-09  7:23                                 ` Oskari Pirhonen
2023-11-09 15:20                                 ` [PATCH v2 1/2] " Alejandro Colomar
2023-11-09 15:20                                 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar
2023-11-10  5:47                                   ` Oskari Pirhonen
2023-11-10 10:47                                     ` Alejandro Colomar
2023-11-08  2:12           ` strncpy clarify result may not be null terminated Matthew House
2023-11-08 19:33             ` Alejandro Colomar
2023-11-08 19:40               ` Alejandro Colomar
2023-11-09  3:13               ` Matthew House
2023-11-09 10:26                 ` Jonny Grant
2023-11-09 10:31                 ` Jonny Grant
2023-11-09 11:38                   ` Alejandro Colomar
2023-11-09 12:43                     ` Alejandro Colomar
2023-11-09 12:51                     ` Xi Ruoyao
2023-11-09 14:01                       ` Alejandro Colomar
2023-11-09 18:11                     ` Paul Eggert
2023-11-09 23:48                       ` Alejandro Colomar
2023-11-10  5:36                         ` Paul Eggert
2023-11-10 11:05                           ` Alejandro Colomar
2023-11-10 11:47                             ` Alejandro Colomar
2023-11-10 17:58                             ` Paul Eggert
2023-11-10 18:36                               ` Alejandro Colomar
2023-11-10 20:19                                 ` Alejandro Colomar
2023-11-10 23:44                                   ` Jonny Grant
2023-11-10 19:52                               ` Alejandro Colomar
2023-11-10 22:14                                 ` Paul Eggert
2023-11-11 21:13                                   ` Alejandro Colomar
2023-11-11 22:20                                     ` Paul Eggert
2023-11-12  9:52                                     ` Jonny Grant
2023-11-12 10:59                                       ` Alejandro Colomar
2023-11-12 20:49                                         ` Paul Eggert
2023-11-12 21:00                                           ` Alejandro Colomar
2023-11-12 21:45                                             ` Alejandro Colomar
2023-11-13 23:46                                           ` Jonny Grant
2023-11-17 21:57                                         ` Jonny Grant
2023-11-18 10:12                                           ` Alejandro Colomar
2023-11-18 23:03                                             ` Jonny Grant
2023-11-10 11:36                           ` Jonny Grant
2023-11-10 13:15                             ` Alejandro Colomar
2023-11-18 23:40                               ` Jonny Grant
2023-11-20 11:56                                 ` Jonny Grant
2023-11-20 15:12                                   ` Alejandro Colomar
2023-11-20 23:08                                     ` Jonny Grant
2023-11-20 23:42                                       ` Alejandro Colomar
2023-11-10 11:23                     ` Jonny Grant
2023-11-09 12:23                 ` Alejandro Colomar
2023-11-09 12:35                   ` Alejandro Colomar
2023-11-10  7:06                   ` Oskari Pirhonen
2023-11-10 11:18                     ` Alejandro Colomar
2023-11-11  7:55                       ` Oskari Pirhonen
2023-11-10 16:06                   ` Matthew House
2023-11-10 17:48                     ` Alejandro Colomar
2023-11-13 15:01                       ` Matthew House
2023-11-11 20:55                     ` Jonny Grant
2023-11-11 21:15                       ` Jonny Grant
2023-11-11 22:36                         ` Alejandro Colomar
2023-11-11 23:19                           ` Alejandro Colomar
2023-11-17 21:46                           ` Jonny Grant
2023-11-18  9:37                             ` PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) Alejandro Colomar
2023-11-19  0:22                               ` Deri
2023-11-19  1:19                                 ` Alejandro Colomar
2023-11-19  9:29                                   ` Alejandro Colomar
2023-11-19 16:21                                   ` Deri
2023-11-19 20:58                                     ` Alejandro Colomar [this message]
2023-11-20  0:46                                       ` G. Branden Robinson
2023-11-20  9:43                                         ` Alejandro Colomar
2023-11-18  9:44                             ` NULL safety " Alejandro Colomar
2023-11-18 23:21                               ` NULL safety Jonny Grant
2023-11-24 22:25                                 ` Alejandro Colomar
2023-11-25  0:57                                   ` Jonny Grant
2023-11-10 10:40               ` strncpy clarify result may not be null terminated Stefan Puiu
2023-11-10 11:06                 ` Jonny Grant
2023-11-10 11:20                 ` Alejandro Colomar
2023-11-12  9:17 ` [PATCH 0/2] Expand BUGS section of string_copying(7) Alejandro Colomar
2023-11-12  9:18 ` [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
2023-11-12  9:18 ` [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
2023-11-12 11:26 ` [PATCH v2 0/3] Improve string_copying(7) Alejandro Colomar
2023-11-12 11:26 ` [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
2023-11-17 21:43   ` Jonny Grant
2023-11-18  0:25     ` Signing all patches and email to this list Matthew House
2023-11-18 23:24       ` Jonny Grant
2023-11-12 11:26 ` [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
2023-11-12 11:27 ` [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3) Alejandro Colomar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZVp24b1vXfoS8ABi@devuan \
    --to=alx@kernel.org \
    --cc=deri@chuzzlewit.myzen.co.uk \
    --cc=jg@jguk.org \
    --cc=linux-man@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.