Re: WIP: asciidoc replacement


From: Sam Vilain <sam@vilain.net>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git@vger.kernel.org, msysgit@googlegroups.com
Subject: Re: WIP: asciidoc replacement
Date: Wed, 03 Oct 2007 14:56:11 +1300
Message-ID: <4702F6BB.60908@vilain.net>
In-Reply-To: <Pine.LNX.4.64.0710030133020.28395@racer.site>

Johannes Schindelin wrote:
> Hi,
> 
> I do not want to depend on more than necessary in msysGit, and therefore I 
> started to write an asciidoc replacement.
> 
> So here it is: a perl script that does a good job on many .txt files in 
> Documentation/, although for some it deviates from "make man"'s output, 
> and for others it is outright broken.  It is meant to be run in 
> Documentation/.
> 
> My intention is not to fix the script for all cases, but to make patches 
> to Documentation/*.txt themselves, so that they are more consistent (and 
> incidentally nicer to the script).
> 
> Now, I hear you already moan: "But Dscho, you know you suck at Perl!"
> 
> Yeah, I know, but maybe instead of bashing on me (pun intended), you may 
> want to enlighten me with tips how to make it nicer to read.  (Yes, there 
> are no comments; yes, I will gladly add them where appropriate; yes, html 
> is just a stub.)
> 
> So here, without further ado, da script:

It's pretty good; I certainly wouldn't have trouble reading or
maintaining it, but I'll give you some suggestions anyway.

Nice work, replacing a massive XML/XSL/etc stack with a small Perl
script ;-)

Sam.

> 
> -- snip --
> #!/usr/bin/perl

Add -w for warnings, and also "use strict;".
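
A minimal sketch of what the two switches buy you, assuming nothing
beyond core Perl. The variable name $undeclared is made up for the
illustration; the string eval is only there so the failure can be
observed without aborting the sketch itself.

```perl
#!/usr/bin/perl -w
use strict;

# Under "use strict", assigning to an undeclared variable is a
# compile-time error. The string eval inherits the pragma, so the
# bad statement is rejected and $@ carries the message.
my $compiled = eval q{ $undeclared = 1; 1 };
my $error    = $@;

print "rejected: $error" unless $compiled;
```

In the script itself, this means declaring $conv, $par and $title with
"my" before their first use.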

> $conv = new man_page();
> $conv->{manual} = 'Git Manual';
> $conv->{git_version} = 'Git ' . `cat ../GIT-VERSION-FILE`;
> $conv->{git_version} =~ s/GIT_VERSION = //;
> $conv->{git_version} =~ s/-/\\-/;
> $conv->{git_version} =~ s/\n//;
> $conv->{date} = `date +%m/%d/%Y`;
> $conv->{date} =~ s/\n//;
> 
> $par = '';
> handle_file($ARGV[0]);
> $conv->finish();
> 
> sub handle_text {

This function acts on globals ($par and $conv); make them explicit
arguments to the function.
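
Roughly what I mean, as a sketch only. The Recorder package is a
hypothetical stand-in for man_page so that the example runs on its own,
and only two of the branches are shown.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the man_page converter: it records which
# method was called instead of printing roff.
package Recorder;

sub new {
    my ($class) = @_;
    return bless { calls => [] }, $class;
}

sub enumeration {
    my ($self, $lines) = @_;
    push @{ $self->{calls} }, [ enum => scalar @$lines ];
    return;
}

sub normal {
    my ($self, $text) = @_;
    push @{ $self->{calls} }, [ normal => $text ];
    return;
}

package main;

# The converter and the paragraph are passed in rather than read from
# globals; the new (empty) paragraph buffer is handed back.
sub handle_text {
    my ($conv, $par) = @_;

    if ($par =~ /^\. /) {
        my @lines = split /^\. /m, $par;
        shift @lines;                  # drop the empty leading field
        $conv->enumeration(\@lines);
    }
    else {
        $conv->normal($par);
    }
    return '';
}

my $conv = Recorder->new;
my $par  = ". one\n. two\n";
$par = handle_text($conv, $par);
```

The caller then owns the buffer: handle_file would write
"$par = handle_text($conv, $par);" wherever it now calls handle_text().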

> 	if ($par =~ /^\. /s) {
> 		my @lines = split(/^\. /m, $par);
> 		shift @lines;
> 		$conv->enumeration(\@lines);
> 	} elsif ($par =~ /^\* /s) {

Uncuddle your elsifs; also consider making this a "tabular ternary"
with the actions in separate functions.

i.e.

$result = ( $par =~ /^\. /s      ? $conv->do_enum($par)  :
            $par =~ /^\[verse\]/ ? $conv->do_verse($par) :
            ... )
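
A runnable sketch of the same shape. The handler names (do_enum,
do_bullet, do_verse, do_normal) are hypothetical and just return a tag
here; in the real script they would call into $conv.

```perl
use strict;
use warnings;

# Hypothetical handlers, one per paragraph type.
sub do_enum   { return 'enum'   }
sub do_bullet { return 'bullet' }
sub do_verse  { return 'verse'  }
sub do_normal { return 'normal' }

# Each row of the "table" is one test and its action; the last row is
# the default. The first test that matches wins.
sub classify {
    my ($par) = @_;
    return ( $par =~ /^\. /       ? do_enum($par)
           : $par =~ /^\* /       ? do_bullet($par)
           : $par =~ /^\[verse\]/ ? do_verse($par)
           :                        do_normal($par) );
}

print classify("* first\n* second\n"), "\n";
```

The order of the rows matters in the same way the order of the elsif
branches does.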

However I have a suspicion that your script is doing line-based parsing
instead of recursive descent; I don't know whether that's the right
thing for asciidoc.  It's actually fairly easy to convert a grammar to
code blocks using tricks from MJD's _Higher Order Perl_.  Is it
necessary for the asciidoc grammar?

> 		my @lines = split(/^\* /m, $par);
> 		shift @lines;
> 		$conv->enumeration(\@lines, 'unnumbered');
> 	} elsif ($par =~ /^\[verse\]/) {
> 		$par =~ s/\[verse\] *\n?//;
> 		$conv->verse($par);
> 	} elsif ($par =~ /^(\t|  +)/s) {
> 		$par =~ s/^$1//mg;
> 		$par =~ s/^\+$//mg;
> 		$conv->indent($par);
> 	} elsif ($par =~ /^([^\n]*)::\n((\t|  +).*)$/s) {
> 		my ($first, $rest, $indent) = ($1, $2, $3);
> 		$rest =~ s/^\+$//mg;
> 		while ($rest =~ /^(.*?\n\n)--+\n(.*?\n)--+\n\n(.*)$/s) {
> 			my ($pre, $verb, $post) = ($1, $2, $3);
> 
> 			$pre =~ s/^(\t|$indent)//mg;
> 			if ($first ne '') {
> 				$conv->begin_item($first, $pre);
> 				$first = '';
> 			} else {
> 				$conv->normal($pre);
> 			}
> 
> 			$conv->verbatim($verb);
> 			$rest = $post;
> 		}
> 		$rest =~ s/^(\t|$indent)//mg;
> 		if ($first ne '') {
> 			$conv->begin_item($first, $rest);
> 		} else {
> 			$conv->normal($rest);
> 		}
> 		$conv->end_item();
> 	} elsif ($par =~ /^-+\n(.*\n)-+\n$/s) {
> 		$conv->verbatim($1);
> 	} else {
> 		$conv->normal($par);
> 	}
> 	$par = '';
> }
> 
> sub handle_file {
> 	my $in;
> 	open($in, '<' . $_[0]);
> 	while (<$in>) {
> 		if (/^=+$/) {
> 			if ($par ne '' && length($_) >= length($par)) {
> 				$conv->header($par);
> 				$par = '';
> 				next;
> 			}
> 		} elsif (/^-+$/) {
> 			if ($par ne '' && length($_) >= length($par)) {
> 				$conv->section($par);
> 				$par = '';
> 				next;
> 			}
> 		} elsif (/^~+$/) {
> 			if ($par ne '' && length($_) >= length($par)) {
> 				$conv->subsection($par);
> 				$par = '';
> 				next;
> 			}
> 		} elsif (/^\[\[(.*)\]\]$/) {
> 			handle_text();
> 			$conv->anchor($1);
> 			next;
> 		} elsif (/^$/) {
> 			if ($par =~ /^-+\n.*[^-]\n$/s) {
> 				# fallthru; is verbatim, but needs more.
> 			} elsif ($par =~ /::\n$/s) {
> 				# is item, but needs more.
> 				next;
> 			} else {
> 				handle_text();
> 				next;
> 			}
> 		} elsif (/^include::(.*)\[\]$/) {
> 			handle_text();
> 			handle_file($1);
> 			next;
> 		}
> 
> 		# convert "\--" to "--"
> 		s/\\--/--/g;
> 		# convert "\*" to "*"
> 		s/\\\*/*/g;
> 
> 		# handle gitlink:
> 		s/gitlink:([^\[ ]*)\[(\d+)\]/sprintf "%s",
> 			$conv->get_link($1, $2)/ge;
> 		# handle link:
> 		s/link:([^\[ ]*)\[(.+)\]/sprintf "%s",
> 			$conv->get_link($1, $2, 'external')/ge;

These REs suffer from LTS (Leaning Toothpick Syndrome).  Consider using
s{foo}{bar} and adding the 'x' modifier to space out groups.
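
For instance, the gitlink substitution could be spelled like this. It
is a sketch: get_link below is a hypothetical plain function standing in
for $conv->get_link, and the input line is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $conv->get_link (internal links only).
sub get_link {
    my ($command, $section) = @_;
    return '\fB' . $command . '\fR(' . $section . ')';
}

my $line = 'See gitlink:git-commit[1] for details.';

# Brace delimiters keep "/" out of the way, and /x lets the groups be
# spaced out and commented.
$line =~ s{
    gitlink:
    ( [^\[\ ]* )     # command name, up to an opening bracket or space
    \[ (\d+) \]      # manual section number
}{ get_link($1, $2) }gex;

print "$line\n";
```

The sprintf "%s" wrapper in the original is not needed, since /e
already uses the value of the expression as the replacement.
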

> 
> 		$par .= $_;
> 	}
> 	close($in);
> 	handle_text();
> }
> 
> package man_page;
> 
> sub new {
> 	my ($class) = @_;
> 	my $self = {
> 		sep => '',
> 		links => [],
> #		generator => 'Home grown git txt2man converter'
> 		generator => 'DocBook XSL Stylesheets v1.71.1 <http://docbook.sf.net/>'
> 	};
> 	bless $self, $class;
> 	return $self;
> }
> 
> sub header {
> 	my ($self, $text) = @_;
> 	$text =~ s/-/\\-/g;
> 
> 	if ($self->{preamble_shown} == undef) {
> 		$title = $text;
> 		$title =~ s/\(\d+\)$//;
> 		print '.\"     Title: ' . $title
> 			. '.\"    Author: ' . "\n"
> 			. '.\" Generator: ' . $self->{generator} . "\n"
> 			. '.\"      Date: ' . $self->{date} . "\n"
> 			. '.\"    Manual: ' . $self->{manual} . "\n"
> 			. '.\"    Source: ' . $self->{git_version} . "\n"
> 			. '.\"' . "\n";
> 	}

I'd consider a HERE-doc, or multi-line qq{ } more readable than this.
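
Something along these lines, as a sketch. The %page hash and its values
are hypothetical placeholders for the fields of $self.

```perl
use strict;
use warnings;

# Hypothetical values standing in for $self->{...}.
my %page = (
    title     => 'git-commit',
    generator => 'Home grown git txt2man converter',
    date      => '10/03/2007',
    manual    => 'Git Manual',
    source    => 'Git 1.5.3',
);

# An interpolating here-doc: each roff comment marker needs its
# backslash doubled, but the layout is visible at a glance.
my $preamble = <<"EOT";
.\\"     Title: $page{title}
.\\"    Author:
.\\" Generator: $page{generator}
.\\"      Date: $page{date}
.\\"    Manual: $page{manual}
.\\"    Source: $page{source}
.\\"
EOT

print $preamble;
```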

> 
> 	$text =~ tr/a-z/A-Z/;
> 	my $suffix = "\"$self->{date}\" \"$self->{git_version}\""
> 		. " \"$self->{manual}\"";

Use qq{} when making strings with lots of embedded double quotes and
interpolation.
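
A sketch of the difference, using a hypothetical $self with made-up
values:

```perl
use strict;
use warnings;

# Hypothetical object fields.
my $self = {
    date        => '10/03/2007',
    git_version => 'Git 1.5.3',
    manual      => 'Git Manual',
};

# Escaped double quotes, as in the original.
my $escaped = "\"$self->{date}\" \"$self->{git_version}\""
    . " \"$self->{manual}\"";

# The same string with qq{}: no escaping, still interpolates.
my $suffix = qq{"$self->{date}" "$self->{git_version}" "$self->{manual}"};

print "$suffix\n";
```

The braces of the hash subscripts are balanced, so they do not end the
qq{} early.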

> 	$text =~ s/^(.*)\((\d+)\)$/.TH "\1" "\2" $suffix/;
> 	print $text;
> 
> 	if ($self->{preamble_shown} == undef) {
> 		print '.\" disable hyphenation' . "\n"
> 			. '.nh' . "\n"
> 			. '.\" disable justification (adjust text to left'
> 				. ' margin only)' . "\n"
> 			. '.ad l' . "\n";

Using commas rather than "." will save you a concat when printing to
filehandles, but that's a very small nit to pick :)
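
A small sketch of the comma form. It prints into an in-memory
filehandle (an assumption made only so the result can be inspected);
printing to STDOUT works the same way.

```perl
use strict;
use warnings;

# In-memory filehandle: everything printed ends up in $buffer.
open my $out, '>', \my $buffer
    or die "cannot open in-memory handle: $!";

# print takes a list, so the pieces need not be joined first.
print {$out} '.\" disable hyphenation', "\n",
             '.nh', "\n";

close $out;
```

One caveat: the list form inserts the value of $, between items if that
variable has been set, whereas the concatenated form does not.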

> 		$self->{preamble_shown} = 1;
> 	}
> 
> 	$self->{last_op} = 'header';
> }
> 
> sub section {
> 	my ($self, $text) = @_;
> 
> 	$text =~ tr/a-z/A-Z/;
> 	$text =~ s/^(.*)$/.SH "\1"/;
> 
> 	print $text;
> 
> 	$self->{last_op} = 'section';
> }
> 
> sub subsection {
> 	my ($self, $text) = @_;
> 
> 	$text =~ s/^(.*)$/.SS "\1"/;
> 
> 	print $text;
> 
> 	$self->{last_op} = 'subsection';
> }
> 
> sub get_link {
> 	my ($self, $command, $section, $option) = @_;
> 
> 	if ($option eq 'external') {
> 		my $links = $self->{links};
> 		push(@$links, $command);
> 		$command =~ s/\.html$//;
> 		$command =~ s/-/ /g;
> 		push(@$links, $command);
> 		return '\fI' . $command . '\fR\&[1]';
> 	} else {
> 		return '\fB' . $command . '\fR(' . $section . ')';
> 	}
> }
> 
> sub common {
> 	my ($self, $text, $option) = @_;
> 
> 	# escape backslashes, but not in "\n", "\&" or "\fB"
> 	$text =~ s/\\(?!n|f[A-Z]|&)/\\\\/g;
> 	# escape "-"
> 	$text =~ s/-/\\-/g;
> 	# handle ...
> 	$text =~ s/(\.\.\.)/\\&\1/g;
> 	# remove double space after full stop or comma
> 	$text =~ s/([\.,])  /\1 /g;
> 
> 	if ($option ne 'no-markup') {
> 		# make 'italic'
> 		$text =~ s/'([^'\n]*)'/\\fI\1\\fR/g;
> 		# ignore `
> 		$text =~ s/`//g;
> 		# make *bold*
> 		$text =~ s/\*([^\*\n]*)\*/\\fB\1\\fR/g;
> 		# handle <<sections>
> 		$text =~ s/<<([^>]*)>>/the section called \\(lq\1\\(rq/g;

Hmm, that regex would not match <<foo > bar>>; if you care, you'd
need to write something like <<((?:[^>]+|>[^>])*)>>
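
A quick check of both patterns against that input, as a sketch (the
variable names are made up):

```perl
use strict;
use warnings;

my $simple   = qr/<<([^>]*)>>/;
my $tolerant = qr/<<((?:[^>]+|>[^>])*)>>/;

my $text = 'see <<foo > bar>> for details';

# In list context a failed match yields an empty list, so the
# variable stays undefined.
my ($simple_hit)   = $text =~ $simple;
my ($tolerant_hit) = $text =~ $tolerant;

print defined $simple_hit ? "simple: $simple_hit\n" : "simple: no match\n";
print "tolerant: $tolerant_hit\n";
```

The tolerant form accepts a single ">" as long as it is followed by
something other than ">", which is what lets "foo > bar" through.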

> 	}
> 
> 	return $text;
> }
> 
> sub normal {
> 	my ($self, $text) = @_;
> 
> 	if ($text eq "") {
> 		return;
> 	}
> 
> 	$text = $self->common($text);
> 
> 	$text =~ s/ *\n(.)/ \1/g;
> 
> 	if ($self->{last_op} eq 'normal') {
> 		print "\n";
> 	}
> 
> 	print $text;
> 
> 	$self->{last_op} = 'normal';
> }
> 
> sub verse {
> 	my ($self, $text) = @_;
> 
> 	$text = $self->common($text);
> 	$text =~ s/^\t/        /mg;
> 
> 	print ".sp\n.RS 4\n.nf\n" . $text . ".fi\n.RE\n";
> 
> 	$self->{last_op} = 'verse';
> }
> 
> sub enumeration {
> 	my ($self, $text, $option) = @_;
> 
> 	my $counter = 0;
> 	foreach $line (@$text) {
> 		$counter++;
> 		print ".TP 4\n"
> 			. ($option eq 'unnumbered' ? '\(bu' : $counter . '.')
> 			. "\n"
> 			. $self->common($line);
> 	}
> 
> 	$self->{last_op} = 'enumeration';
> }
> 
> sub begin_item {
> 	my ($self, $item, $text) = @_;
> 
> 	$item = $self->common($item);
> 	$text = $self->common($text);
> 
> 	$text =~ s/([^\n]) *\n([^\n])/\1 \2/g;

"." is the same as [^\n] (without the 's' modifier).

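A sketch showing that the two spellings behave the same on a sample
string (the input is made up, and $1/$2 are used in the replacement
instead of \1/\2):

```perl
use strict;
use warnings;

my $text = "foo  \nbar\n\nbaz\n";

# Join a line with the next one unless a blank line separates them.
(my $with_class = $text) =~ s/([^\n]) *\n([^\n])/$1 $2/g;
(my $with_dot   = $text) =~ s/(.) *\n(.)/$1 $2/g;

print $with_dot;
```

Without /s, "." never matches a newline, so the blank line between
"bar" and "baz" is left alone in both cases.
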
> 
> 	print ".PP\n" . $item . "\n.RS 4\n" . $text;
> 
> 	$self->{last_op} = 'item'; 
> }
> 
> sub end_item {
> 	my ($self) = @_;
> 
> 	print ".RE\n";
> 
> 	$self->{last_op} = 'end_item';
> }
> 
> sub indent {
> 	my ($self, $text) = @_;
> 
> 	$text = $self->common($text, 'no-markup');
> 	$text =~ s/^\t/        /mg;
> 
> 	if ($self->{last_op} eq 'normal') {
> 		print "\n";
> 	}
> 
> 	print ".sp\n.RS 4\n.nf\n" . $text . ".fi\n.RE\n";
> 
> 	$self->{last_op} = 'indent';
> }
> 
> sub verbatim {
> 	my ($self, $text) = @_;
> 
> 	$text = $self->common($text, 'no-markup');
> 
> 	# convert tabs to spaces
> 	$text =~ s/^\t/        /mg;
> 	# remove trailing empty lines
> 	$text =~ s/\n\n*$/\n/;
> 
> 	if ($self->{last_op} eq 'normal') {
> 		print "\n";
> 	}
> 
> 	print ".sp\n.RS 4\n.nf\n.ft C\n" . $text . ".ft\n\n.fi\n.RE\n";
> 
> 	$self->{last_op} = 'verbatim';
> }
> 
> sub anchor {
> 	my ($self, $text) = @_;
> 
> 	$self->{last_op} = 'anchor';
> }
> 
> sub finish {
> 	my ($self) = @_;
> 	my $links = $self->{links};
> 
> 	if ($#$links >= 0) {
> 		print '.SH "REFERENCES"' . "\n";
> 		my $i = 1;
> 		while ($#$links >= 0) {

Just use if (@$links) and while (@$links).
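
As a sketch, with a made-up list of link/label pairs collected into
@refs instead of being printed:

```perl
use strict;
use warnings;

# Hypothetical contents of $self->{links}: target, label, target, label.
my $links = [ 'git-commit.html', 'git commit', 'git-add.html', 'git add' ];
my @refs;

# An array in boolean context is true while it still has elements.
if (@$links) {
    while (@$links) {
        my $ref   = shift @$links;
        my $label = shift @$links;
        push @refs, "$label: $ref";
    }
}

print "$_\n" for @refs;
```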

> 			my $ref = shift(@$links);
> 			$ref =~ s/-/\\-/g;
> 			my $label = shift(@$links);
> 			printf (".IP \"% 2d.\" 4\n%s\n.RS 4\n\\%%%s\n.RE\n",
> 				$i++, $label, $ref);
> 		}
> 	} else {
> 		print "\n";
> 	}
> }
> 
> package html_page;
> 
> sub new {
> 	my ($class) = @_;
> 	my $self = {};
> 	bless $self, $class;
> 	return $self;
> }
> 
> -- snap --
> 
> Ciao,
> Dscho
> 
> P.S.: I need to catch some Zs, and do some real work, so do not be 
> surprised if I do not respond within the next 24 hours.

