From: Sam Vilain <sam@vilain.net>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git@vger.kernel.org, msysgit@googlegroups.com
Subject: Re: WIP: asciidoc replacement
Date: Wed, 03 Oct 2007 14:56:11 +1300
Message-ID: <4702F6BB.60908@vilain.net>
In-Reply-To: <Pine.LNX.4.64.0710030133020.28395@racer.site>
Johannes Schindelin wrote:
> Hi,
>
> I do not want to depend on more than necessary in msysGit, and therefore I
> started to write an asciidoc replacement.
>
> So here it is: a perl script that does a good job on many .txt files in
> Documentation/, although for some it deviates from "make man"'s output,
> and for others it is outright broken. It is meant to be run in
> Documentation/.
>
> My intention is not to fix the script for all cases, but to make patches
> to Documentation/*.txt themselves, so that they are more consistent (and
> incidentally nicer to the script).
>
> Now, I hear you already moan: "But Dscho, you know you suck at Perl!"
>
> Yeah, I know, but maybe instead of bashing on me (pun intended), you may
> want to enlighten me with tips how to make it nicer to read. (Yes, there
> are no comments; yes, I will gladly add them where appropriate; yes, html
> is just a stub.)
>
> So here, without further ado, da script:
It's pretty good; I certainly wouldn't have trouble reading or
maintaining it, but I'll give you suggestions anyway.
Nice work, replacing a massive XML/XSL/etc stack with a small Perl
script ;-)
Sam.
>
> -- snip --
> #!/usr/bin/perl
Add -w for warnings, and also "use strict;".
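A minimal sketch of such a preamble, assuming nothing beyond core Perl. The variable names mirror the script's globals, but the hash is only a stand-in for the real converter object:

```perl
#!/usr/bin/perl -w
use strict;

# Under strict, every variable must be declared before use.
# %conv is a hypothetical stand-in for the man_page object.
my $par  = '';
my %conv = (manual => 'Git Manual');

# A typo such as $parr would now be a compile-time error
# instead of a silently created empty global.
$par .= "some text\n";
print "$conv{manual}: $par";
```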
> $conv = new man_page();
> $conv->{manual} = 'Git Manual';
> $conv->{git_version} = 'Git ' . `cat ../GIT-VERSION-FILE`;
> $conv->{git_version} =~ s/GIT_VERSION = //;
> $conv->{git_version} =~ s/-/\\-/;
> $conv->{git_version} =~ s/\n//;
> $conv->{date} = `date +%m/%d/%Y`;
> $conv->{date} =~ s/\n//;
>
> $par = '';
> handle_file($ARGV[0]);
> $conv->finish();
>
> sub handle_text {
This function acts on globals; make them explicit arguments to the function.
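Roughly like this, as a sketch only. The Recorder package is a hypothetical stand-in for man_page so the example runs on its own, and only two of the branches are shown:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the man_page object: it only records
# which method was called and with what text.
{
    package Recorder;
    sub new    { my ($class) = @_; return bless { calls => [] }, $class }
    sub verse  { my ($self, $text) = @_; push @{ $self->{calls} }, "verse:$text";  return }
    sub normal { my ($self, $text) = @_; push @{ $self->{calls} }, "normal:$text"; return }
}

# The converter and the paragraph arrive as arguments; nothing is
# read from or written to file-scoped variables.
sub handle_text {
    my ($conv, $par) = @_;
    return if $par eq '';
    if ($par =~ /^\[verse\]/) {
        $par =~ s/\[verse\] *\n?//;
        $conv->verse($par);
    }
    else {
        $conv->normal($par);
    }
    return;
}

my $conv = Recorder->new();
handle_text($conv, "[verse]\nroses are red\n");
handle_text($conv, "plain paragraph\n");
print $_ for @{ $conv->{calls} };
```

The caller then resets its own paragraph buffer after each call, rather than relying on the function to clear a global.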
> if ($par =~ /^\. /s) {
> my @lines = split(/^\. /m, $par);
> shift @lines;
> $conv->enumeration(\@lines);
> } elsif ($par =~ /^\* /s) {
Uncuddle your elsifs; also consider making this a "tabular ternary"
with the actions in separate functions, i.e.

  $result = ( $par =~ /^\. /s      ? $conv->do_enum($par)  :
              $par =~ /^\[verse\]/ ? $conv->do_verse($par) :
              ... )

However, I have a suspicion that your script is doing line-based parsing
instead of recursive descent; I don't know whether that's the right
thing for asciidoc. It's actually fairly easy to convert a grammar to
code blocks using tricks from MJD's _Higher Order Perl_. Is recursive
descent necessary for the asciidoc grammar?
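A runnable sketch of the tabular-ternary layout. The helper name paragraph_kind and the returned labels are made up for illustration; the patterns are taken from the first few branches of handle_text:

```perl
use strict;
use warnings;

# Hypothetical helper: classify a paragraph by its leading markup.
# Each row is "condition ? result", and the last row is the default.
sub paragraph_kind {
    my ($par) = @_;
    return
          $par =~ /^\. /       ? 'enumeration'
        : $par =~ /^\* /       ? 'bullets'
        : $par =~ /^\[verse\]/ ? 'verse'
        : $par =~ /^(?:\t| +)/ ? 'indent'
        :                        'normal';
}

print paragraph_kind("* first\n* second\n"), "\n";
```

The rows are tried top to bottom, exactly like the if/elsif chain, so the order of the tests still matters.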
> my @lines = split(/^\* /m, $par);
> shift @lines;
> $conv->enumeration(\@lines, 'unnumbered');
> } elsif ($par =~ /^\[verse\]/) {
> $par =~ s/\[verse\] *\n?//;
> $conv->verse($par);
> } elsif ($par =~ /^(\t| +)/s) {
> $par =~ s/^$1//mg;
> $par =~ s/^\+$//mg;
> $conv->indent($par);
> } elsif ($par =~ /^([^\n]*)::\n((\t| +).*)$/s) {
> my ($first, $rest, $indent) = ($1, $2, $3);
> $rest =~ s/^\+$//mg;
> while ($rest =~ /^(.*?\n\n)--+\n(.*?\n)--+\n\n(.*)$/s) {
> my ($pre, $verb, $post) = ($1, $2, $3);
>
> $pre =~ s/^(\t|$indent)//mg;
> if ($first ne '') {
> $conv->begin_item($first, $pre);
> $first = '';
> } else {
> $conv->normal($pre);
> }
>
> $conv->verbatim($verb);
> $rest = $post;
> }
> $rest =~ s/^(\t|$indent)//mg;
> if ($first ne '') {
> $conv->begin_item($first, $rest);
> } else {
> $conv->normal($rest);
> }
> $conv->end_item();
> } elsif ($par =~ /^-+\n(.*\n)-+\n$/s) {
> $conv->verbatim($1);
> } else {
> $conv->normal($par);
> }
> $par = '';
> }
>
> sub handle_file {
> my $in;
> open($in, '<' . $_[0]);
> while (<$in>) {
> if (/^=+$/) {
> if ($par ne '' && length($_) >= length($par)) {
> $conv->header($par);
> $par = '';
> next;
> }
> } elsif (/^-+$/) {
> if ($par ne '' && length($_) >= length($par)) {
> $conv->section($par);
> $par = '';
> next;
> }
> } elsif (/^~+$/) {
> if ($par ne '' && length($_) >= length($par)) {
> $conv->subsection($par);
> $par = '';
> next;
> }
> } elsif (/^\[\[(.*)\]\]$/) {
> handle_text();
> $conv->anchor($1);
> next;
> } elsif (/^$/) {
> if ($par =~ /^-+\n.*[^-]\n$/s) {
> # fallthru; is verbatim, but needs more.
> } elsif ($par =~ /::\n$/s) {
> # is item, but needs more.
> next;
> } else {
> handle_text();
> next;
> }
> } elsif (/^include::(.*)\[\]$/) {
> handle_text();
> handle_file($1);
> next;
> }
>
> # convert "\--" to "--"
> s/\\--/--/g;
> # convert "\*" to "*"
> s/\\\*/*/g;
>
> # handle gitlink:
> s/gitlink:([^\[ ]*)\[(\d+)\]/sprintf "%s",
> $conv->get_link($1, $2)/ge;
> # handle link:
> s/link:([^\[ ]*)\[(.+)\]/sprintf "%s",
> $conv->get_link($1, $2, 'external')/ge;
These REs suffer from LTS (Leaning Toothpick Syndrome). Consider using
s{foo}{bar} and adding the 'x' modifier to space out groups.
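For the gitlink: case that might look like the sketch below. The free function get_link is a stand-in for $conv->get_link(), with the formatting copied from the script's non-external branch:

```perl
use strict;
use warnings;

# Stand-in for $conv->get_link(); same output as the script's
# non-external branch.
sub get_link {
    my ($command, $section) = @_;
    return '\fB' . $command . '\fR(' . $section . ')';
}

my $line = "See gitlink:git-commit[1] for details.\n";

# With /x, whitespace and comments in the pattern are ignored.
# Keep sigils out of the comments, since they would be interpolated.
$line =~ s{
    gitlink:
    ( [^\[\ ]* )     # group 1: command name, up to "[" or a space
    \[ ( \d+ ) \]    # group 2: manual section number
}{ get_link($1, $2) }gex;

print $line;
```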
>
> $par .= $_;
> }
> close($in);
> handle_text();
> }
>
> package man_page;
>
> sub new {
> my ($class) = @_;
> my $self = {
> sep => '',
> links => [],
> # generator => 'Home grown git txt2man converter'
> generator => 'DocBook XSL Stylesheets v1.71.1 <http://docbook.sf.net/>'
> };
> bless $self, $class;
> return $self;
> }
>
> sub header {
> my ($self, $text) = @_;
> $text =~ s/-/\\-/g;
>
> if ($self->{preamble_shown} == undef) {
> $title = $text;
> $title =~ s/\(\d+\)$//;
> print '.\" Title: ' . $title
> . '.\" Author: ' . "\n"
> . '.\" Generator: ' . $self->{generator} . "\n"
> . '.\" Date: ' . $self->{date} . "\n"
> . '.\" Manual: ' . $self->{manual} . "\n"
> . '.\" Source: ' . $self->{git_version} . "\n"
> . '.\"' . "\n";
> }
I'd consider a HERE-doc, or multi-line qq{ } more readable than this.
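A sketch of the here-doc form. The %self hash stands in for the object's fields, and the values in it are made-up sample data:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for $self and $title.
my %self = (
    generator   => 'Home grown git txt2man converter',
    date        => '10/03/2007',
    manual      => 'Git Manual',
    git_version => 'Git 1.5.3',
);
my $title = 'GIT-COMMIT';

# In an interpolating here-doc, \\ yields one backslash, so
# each line starts with the roff comment marker .\"
my $preamble = <<"EOT";
.\\" Title: $title
.\\" Author:
.\\" Generator: $self{generator}
.\\" Date: $self{date}
.\\" Manual: $self{manual}
.\\" Source: $self{git_version}
.\\"
EOT

print $preamble;
```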
>
> $text =~ tr/a-z/A-Z/;
> my $suffix = "\"$self->{date}\" \"$self->{git_version}\""
> . " \"$self->{manual}\"";
Use qq{} when making strings with lots of embedded double quotes and
interpolation.
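For example, with a plain hash standing in for $self and made-up sample values:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the object's fields.
my %self = (
    date        => '10/03/2007',
    git_version => 'Git 1.5.3',
    manual      => 'Git Manual',
);

# qq{} interpolates like double quotes, but the embedded
# double quotes no longer need backslashes.
my $suffix = qq{"$self{date}" "$self{git_version}" "$self{manual}"};

print "$suffix\n";
```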
> $text =~ s/^(.*)\((\d+)\)$/.TH "\1" "\2" $suffix/;
> print $text;
>
> if ($self->{preamble_shown} == undef) {
> print '.\" disable hyphenation' . "\n"
> . '.nh' . "\n"
> . '.\" disable justification (adjust text to left'
> . ' margin only)' . "\n"
> . '.ad l' . "\n";
Using commas rather than "." will save you a concat when printing to
filehandles, but that's a very small nit to pick :)
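As a sketch, writing to an in-memory filehandle so the result can be inspected (the script itself prints to STDOUT):

```perl
use strict;
use warnings;

# An in-memory filehandle is used here only to make the output
# easy to check.
my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory handle: $!";

# print takes a list, so the pieces are written one after another
# without first being joined into a single string.
print {$fh} '.\" disable hyphenation', "\n",
            '.nh', "\n",
            '.ad l', "\n";
close($fh);

print $out;
```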
> $self->{preamble_shown} = 1;
> }
>
> $self->{last_op} = 'header';
> }
>
> sub section {
> my ($self, $text) = @_;
>
> $text =~ tr/a-z/A-Z/;
> $text =~ s/^(.*)$/.SH "\1"/;
>
> print $text;
>
> $self->{last_op} = 'section';
> }
>
> sub subsection {
> my ($self, $text) = @_;
>
> $text =~ s/^(.*)$/.SS "\1"/;
>
> print $text;
>
> $self->{last_op} = 'subsection';
> }
>
> sub get_link {
> my ($self, $command, $section, $option) = @_;
>
> if ($option eq 'external') {
> my $links = $self->{links};
> push(@$links, $command);
> $command =~ s/\.html$//;
> $command =~ s/-/ /g;
> push(@$links, $command);
> return '\fI' . $command . '\fR\&[1]';
> } else {
> return '\fB' . $command . '\fR(' . $section . ')';
> }
> }
>
> sub common {
> my ($self, $text, $option) = @_;
>
> # escape backslashes, but not in "\n", "\&" or "\fB"
> $text =~ s/\\(?!n|f[A-Z]|&)/\\\\/g;
> # escape "-"
> $text =~ s/-/\\-/g;
> # handle ...
> $text =~ s/(\.\.\.)/\\&\1/g;
> # remove double space after full stop or comma
> $text =~ s/([\.,])  /\1 /g;
>
> if ($option ne 'no-markup') {
> # make 'italic'
> $text =~ s/'([^'\n]*)'/\\fI\1\\fR/g;
> # ignore `
> $text =~ s/`//g;
> # make *bold*
> $text =~ s/\*([^\*\n]*)\*/\\fB\1\\fR/g;
> # handle <<sections>
> $text =~ s/<<([^>]*)>>/the section called \\(lq\1\\(rq/g;
Hmm, that regex would not match <<foo > bar>>. If you care, you'd
need to write something like <<((?:[^>]+|>[^>])*)>>
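A quick check of the two patterns side by side; the sample text is made up:

```perl
use strict;
use warnings;

my $simple  = qr/<<([^>]*)>>/;              # pattern from the script
my $lenient = qr/<<((?:[^>]+|>[^>])*)>>/;   # allows a lone ">" inside

my $text = 'see <<foo > bar>> here';

# List assignment: an empty list (no match) leaves the variable undef.
my ($simple_hit)  = $text =~ $simple;
my ($lenient_hit) = $text =~ $lenient;

print defined $simple_hit ? "simple: $simple_hit\n" : "simple: no match\n";
print "lenient: $lenient_hit\n";
```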
> }
>
> return $text;
> }
>
> sub normal {
> my ($self, $text) = @_;
>
> if ($text eq "") {
> return;
> }
>
> $text = $self->common($text);
>
> $text =~ s/ *\n(.)/ \1/g;
>
> if ($self->{last_op} eq 'normal') {
> print "\n";
> }
>
> print $text;
>
> $self->{last_op} = 'normal';
> }
>
> sub verse {
> my ($self, $text) = @_;
>
> $text = $self->common($text);
> $text =~ s/^\t/ /mg;
>
> print ".sp\n.RS 4\n.nf\n" . $text . ".fi\n.RE\n";
>
> $self->{last_op} = 'verse';
> }
>
> sub enumeration {
> my ($self, $text, $option) = @_;
>
> my $counter = 0;
> foreach $line (@$text) {
> $counter++;
> print ".TP 4\n"
> . ($option eq 'unnumbered' ? '\(bu' : $counter . '.')
> . "\n"
> . $self->common($line);
> }
>
> $self->{last_op} = 'enumeration';
> }
>
> sub begin_item {
> my ($self, $item, $text) = @_;
>
> $item = $self->common($item);
> $text = $self->common($text);
>
> $text =~ s/([^\n]) *\n([^\n])/\1 \2/g;
"." is the same as [^\n] (without the 's' modifier).
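A small sketch showing that both spellings give the same result here. The sample text is made up, and the replacement uses $1 and $2 in place of \1 and \2:

```perl
use strict;
use warnings;

my $with_class = "first line  \nsecond line\n\nnext para\n";
my $with_dot   = $with_class;

# Join a line with the one after it, but leave blank lines alone.
$with_class =~ s/([^\n]) *\n([^\n])/$1 $2/g;

# Without the /s modifier, "." never matches a newline either.
$with_dot =~ s/(.) *\n(.)/$1 $2/g;

print $with_dot;
```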
>
> print ".PP\n" . $item . "\n.RS 4\n" . $text;
>
> $self->{last_op} = 'item';
> }
>
> sub end_item {
> my ($self) = @_;
>
> print ".RE\n";
>
> $self->{last_op} = 'end_item';
> }
>
> sub indent {
> my ($self, $text) = @_;
>
> $text = $self->common($text, 'no-markup');
> $text =~ s/^\t/ /mg;
>
> if ($self->{last_op} eq 'normal') {
> print "\n";
> }
>
> print ".sp\n.RS 4\n.nf\n" . $text . ".fi\n.RE\n";
>
> $self->{last_op} = 'indent';
> }
>
> sub verbatim {
> my ($self, $text) = @_;
>
> $text = $self->common($text, 'no-markup');
>
> # convert tabs to spaces
> $text =~ s/^\t/ /mg;
> # remove trailing empty lines
> $text =~ s/\n\n*$/\n/;
>
> if ($self->{last_op} eq 'normal') {
> print "\n";
> }
>
> print ".sp\n.RS 4\n.nf\n.ft C\n" . $text . ".ft\n\n.fi\n.RE\n";
>
> $self->{last_op} = 'verbatim';
> }
>
> sub anchor {
> my ($self, $text) = @_;
>
> $self->{last_op} = 'anchor';
> }
>
> sub finish {
> my ($self) = @_;
> my $links = $self->{links};
>
> if ($#$links >= 0) {
> print '.SH "REFERENCES"' . "\n";
> my $i = 1;
> while ($#$links >= 0) {
Just use if (@$links) and while (@$links) here.
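In other words, something like this sketch. The link data is made up, and the pairs are collected into an array instead of being printed as roff:

```perl
use strict;
use warnings;

# Made-up sample data in the script's layout: reference, then label.
my $links = [ 'git-commit.html', 'git commit', 'git-log.html', 'git log' ];
my @pairs;

# An array in boolean context is true while it has elements left.
if (@$links) {
    while (@$links) {
        my $ref   = shift @$links;
        my $label = shift @$links;
        push @pairs, "$label -> $ref";
    }
}

print "$_\n" for @pairs;
```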
> my $ref = shift(@$links);
> $ref =~ s/-/\\-/g;
> my $label = shift(@$links);
> printf (".IP \"% 2d.\" 4\n%s\n.RS 4\n\\%%%s\n.RE\n",
> $i++, $label, $ref);
> }
> } else {
> print "\n";
> }
> }
>
> package html_page;
>
> sub new {
> my ($class) = @_;
> my $self = {};
> bless $self, $class;
> return $self;
> }
>
> -- snap --
>
> Ciao,
> Dscho
>
> P.S.: I need to catch some Zs, and do some real work, so do not be
> surprised if I do not respond within the next 24 hours.
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html