Date: Sun, 19 Nov 2023 21:58:03 +0100
From: Alejandro Colomar
To: Deri
Cc: Jonny Grant, linux-man
Subject: Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated)
References: <4567510.7DdL66CAHx@pip> <12344046.3XHVMEB1Be@pip>
In-Reply-To: <12344046.3XHVMEB1Be@pip>

On Sun, Nov 19, 2023 at 04:21:45PM +0000, Deri wrote:
> >     $ touch man2/membarrier.2
> >     $ make build-pdf
> >     PRECONV  .tmp/man/man2/membarrier.2.tbl
> >     TBL      .tmp/man/man2/membarrier.2.eqn
> >     EQN      .tmp/man/man2/membarrier.2.pdf.troff
> >     TROFF    .tmp/man/man2/membarrier.2.pdf.set
> >     GROPDF   .tmp/man/man2/membarrier.2.pdf
> >
> > That helps debug the pipeline, and also learn about it.
> >
> > If that helps parallelize some tasks, then that'll be welcome.
>
> Hi Alex,

Hi Deri,

> Doing it that way actually stops the jobs being run in parallel!  Each
> step completes before the next step starts, whereas if you let groff
> build the pipeline all the processes are run in parallel.

Hmm, kind of makes sense.

> Using separate steps may be desirable for "understanding every little
> step of the groff pipeline" (and may aid debugging an issue), but once
> such knowledge is obtained it is probably better to leave the
> pipelining to groff, in a production environment.

Still a useful thing for our build system.

Unless performance is really a problem, I prefer the understanding and
debugging aid.  It'll help not only me, but others who see the project
and would like to learn how all this magic works.
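For reference, those split steps correspond roughly to the following
explicit commands (the file names follow the make output above; the
exact flags our build system uses may differ):

    $ preconv man2/membarrier.2                   >membarrier.2.tbl
    $ tbl membarrier.2.tbl                        >membarrier.2.eqn
    $ eqn -Tpdf membarrier.2.eqn                  >membarrier.2.pdf.troff
    $ troff -Tpdf -mandoc membarrier.2.pdf.troff  >membarrier.2.pdf.set
    $ gropdf membarrier.2.pdf.set                 >membarrier.2.pdf

    # Equivalent, but letting groff(1) build the pipe itself, so that
    # all of the processes run concurrently:
    $ groff -k -t -e -Tpdf -mandoc man2/membarrier.2  >membarrier.2.pdf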
> > > The time saved would be absolutely minimal.  It is obvious that
> > > to produce a pdf containing all the man pages then all the man
> > > pages have to be consumed by groff, not just the page which has
> > > changed.
> >
> > But do you need to run the entire pipeline, or can you reuse most of
> > it?  I can process in parallel much faster, with `make -jN ...`.  I
> > guess the .pdf.troff files can be reused; maybe even the .pdf.set
> > ones?
> >
> > Could you change the script at least to produce intermediary files
> > as in the pipeline shown above?  As many as possible would be
> > excellent.
>
> Perhaps it would help if I explain the stages of my script.  First a
> look at what the script needs to do to produce a pdf of all man pages.
> There are too many files to produce a single command line with all the
> filenames of each man page; groff has no mechanism for passing a list
> of filenames, so the first job is

You can always `find ... | xargs cat | troff /dev/stdin`

> to concatenate all the separate files into one input file for groff.
> And while we are doing that, add the "magic sauce" which makes all the
> pdf links in the book and sorts out the aliases which point to another
> man page.

Yep, I think I partially understood that part of the script today.
It's what this `... | LC_ALL=C grep '^\\. *ds' |` pipeline produces
and passes to groff, right?

> After this is done there is a single troff file, called LMB.man, which
> is the

That's what's currently called LinuxManBook.Z, right?

> file groff is going to process.  In the script you should see
> something like this:-
>
> my $temp='LMB.man';

I don't.  Maybe you have a slightly different version of it?

> [...]
>
> my $format='pdf';
> my $paper=$fpaper || 'a4';
> my $cmdstring="-T$format -k -pet -M. -F. -mandoc -manmark -dpaper=$paper -P-p$paper -rC1 -rCHECKSTYLE=3";
> my $front='LMBfront.t';
> my $frontdit='LMBfront.set';
> my $mandit='LinuxManBook.set';
> my $book="LinuxManBook.$format";
>
> system("groff -T$format -dpaper=$paper -P-p$paper -ms $front -Z > $frontdit");

This creates the front page .set file.

> system("groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 $temp $cmdstring 2>&1 |
> LC_ALL=C grep '^\\. *ds' |

This creates the bookmarks, right?

> groff -T$format $cmdstring - $temp -Z > $mandit");

And this is the main .set file.

> system("./gro$format -F.:/usr/share/groff/current/font $frontdit $mandit -p$paper > $book");

And finally we have the book.

> (This includes changes by Brian Inglis.)  If you remove the lines
> which call system you will end up with just the single file LMB.man
> (in about a quarter of a second).  You can treat this file just the
> same as your single page example if you want to.
>
> The first system call creates the title page from the troff source
> file LMBfront.t and produces LMBfront.set; this can be added to your
> makefile as an entirely separate rule, depending on whether the .set
> file needs to be built.
>
> The second and third system calls are the calls to groff which could
> be put into your makefile, or split into separate stages to avoid
> parallelism.
>
> The second system call produces LinuxManBook.set, and the third
> combines this with LMBfront.set to produce the pdf.
>
> The "./" in the third system call is because I gave you a pre-release
> gropdf; you may be using the released 1.23.0 gropdf now.
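Thanks; that's clear.  To check my understanding, pulled out of the
script and translated into makefile rules, those three calls would look
roughly like this (the paper size is assumed, the rule and variable
names are mine, and recipe lines need a leading tab):

    paper = a4
    cmd   = -Tpdf -k -pet -M. -F. -mandoc -manmark -dpaper=$(paper) \
            -P-p$(paper) -rC1 -rCHECKSTYLE=3

    # LMB.man itself would still be produced by the (trimmed) script.

    # Title page (first system call).
    LMBfront.set: LMBfront.t
            groff -Tpdf -dpaper=$(paper) -P-p$(paper) -ms $< -Z >$@

    # Main body; the first groff pass only collects the link/bookmark
    # strings, which the second pass consumes (second system call).
    LinuxManBook.set: LMB.man
            groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 $< $(cmd) 2>&1 \
            | LC_ALL=C grep '^\. *ds' \
            | groff -Tpdf $(cmd) - $< -Z >$@

    # Combine both .set files into the book (third system call).
    LinuxManBook.pdf: LMBfront.set LinuxManBook.set
            gropdf -F.:/usr/share/groff/current/font $^ -p$(paper) >$@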
> > > On my system this takes about 18 seconds to produce the 2800+
> > > pages of the book.  Of this, a quarter of a second is consumed by
> > > the "magic" part of the script; the rest of the 18 seconds is
> > > consumed by calls to groff and gropdf.
> >
> > But how much of that work needs to be on a single process?  I bought
> > a new CPU with 24 cores.  Gotta use them all :D
>
> I realise you are having difficulty in letting go of your idea of
> re-using previous work, rather than starting afresh each time.
> Imagine a single word change in one man page causes it to grow from 2
> pages to 3: all links to pages after this changed entry would be one
> page adrift.  This is why very little previous work is useful, and why
> the whole book has to be dealt with as a single process.

Does such a change need re-running troff(1)?  Or is gropdf(1) enough?

My problem is probably that I don't know what's done by `gropdf`, and
what's done by `troff -Tpdf`.  I was hoping that `troff -Tpdf` still
didn't need to know about the entire book, and that only gropdf(1)
would need that.

> If each entry was processed separately, as you would like to use all
> your shiny new cores, how would the process dealing with accept(2)
> know which page socket(2) would be on when it adds it as a link in the
> text?  I hope you can see that at some point it has to be treated as a
> homogeneous whole in order to calculate correct links between entries.
>
> > > So any splitting of the perl script is only going to have an
> > > effect on the quarter of a second!
> > >
> > > I don't understand why the perl script can't be included in your
> > > makefile as part of the build-pdf target.
> >
> > It can.  I just prefer to be strict about the Makefile having "one
> > rule per each file", while currently the script generates 4 files
> > (T, two .Z's, and the .pdf).
>
> Explained above how to separate them, so that the script only
> generates LMB.man, and the system calls are moved to the makefile.

Thanks!

> > > Presumably it would be dependent on running after the scripts
> > > which add the revision label and date to each man page.
> >
> > I only set the revision and date on dist tarballs.  For the git HEAD
> > book, I'd keep the (unreleased) version and (date).  So, no worries
> > there.
>
> Given that you seem to intend to offer these interim books as a
> download, it would make sense if they included either a date or a git
> commit ID to differentiate them; if someone queries something, it
> would be useful to know exactly what they were looking at.

The books for releases are available at (replace the version numbers
for other versions, or navigate the dirs)

I need to document that in the README of the project.

For git HEAD, I plan to have something like

It's mainly intended for easily checking what git HEAD looks like, and
discarding it later.  If the audience asks for version numbers, though,
I could provide `git describe` versions and dates in the pages.
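Something like this, run on the server after checkout, is roughly what
I have in mind (it assumes the (date) and (unreleased) placeholders in
the .TH lines; the exact commands are only a sketch):

    $ git grep -l '^\.TH ' -- 'man*/' \
      | xargs sed -i \
            -e "s/(date)/$(git log -1 --format=%cs)/" \
            -e "s/(unreleased)/$(git describe)/"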
> Cheers
>
> Deri
>
> > > > Since I don't understand Perl, and don't know much of gropdf(1)
> > > > either, I need help.
> > > >
> > > > Maybe Deri or Branden can help with that.  If anyone else
> > > > understands it and can also help, that's very welcome too!
> > >
> > > You are probably better placed to add the necessaries to your
> > > makefile.  You would then just need to remember to make build-pdf
> > > any time you alter one of the source man pages.  Since you are
> > > manually running my script to produce the pdf, it should not be
> > > difficult to automate it in a makefile.
> > >
> > > > Then I could install a hook in my server that runs
> > > >
> > > >     $ make build-pdf docdir=/srv/www/...
> > >
> > > And wait 18s each time the hook is actioned!!  Or, set the build
> > > to place the generated pdf somewhere in /srv/www/... and include
> > > the build in your normal workflow when a man page is changed.
> >
> > Hmm.  I still hope some of it can be parallelized, but 18s could be
> > reasonable, if the server does that in the background after pushing.
> > My old raspberry pi would burn, but the new computer should handle
> > that just fine.
>
> I'm confused.  The 18s is how long it takes to generate the book, so
> if the book is built in response to an access to a particular url, the
> http server can't start "pushing" for the 18s; then add on the
> transfer time for the pdf, and I suspect you will have a lot of
> aborted transfers.  Additionally, the script, and any makefile
> equivalent you write, is not designed for concurrent invocation, so if
> two people visit the same url within the 18-second window, neither
> user will receive a valid pdf.

No, my intention is that whenever I `git push` via SSH, the receiving
server runs `make build-book-pdf` after receiving the changes.  That is
run after the git SSH connection has closed, so I wouldn't notice.
HTTP connections wouldn't trigger anything in my server, except Nginx
serving the file, of course.

> I advise that the build becomes part of your workflow after making
> changes, and that you then place the pdf in a location where it can
> be served by the http server.
>
> Your model of slicing and dicing man pages to be processed
> individually is doable using a website to serve the individual pages,
> see:-
>
> http://chuzzlewit.co.uk/WebManPDF.pl/man:/2/accept
>
> This is running on a 1" cube no more powerful than a raspberry pi 3.
> The difference is that the "magic sauce" added to each man page sets
> the links to external http calls back to itself to produce another
> man page, rather than internal links to another part of the pdf.  You
> can get an index of all the man pages, on the (very old) system, here:
>
> http://chuzzlewit.co.uk/

Yep, I've seen that server :)

Long term, I also intend to provide one-page PDFs and HTML files of the
pages, although I prefer pre-generating them, instead of on-demand.
Maybe a git hook, or maybe a cron job that re-generates them once a day
or so.

Cheers,
Alex

> Cheers
>
> Deri