From: Jakub Narebski <jnareb@gmail.com>
To: Thomas Rast <trast@student.ethz.ch>
Cc: git@vger.kernel.org, "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Jonathan Nieder" <jrnieder@gmail.com>,
"Junio C Hamano" <gitster@pobox.com>,
"Sverre Rabbelier" <srabbelier@gmail.com>
Subject: Re: [PATCH] Documentation: complete config list from other manpages
Date: Sun, 24 Oct 2010 16:36:50 +0200 [thread overview]
Message-ID: <201010241636.51106.jnareb@gmail.com> (raw)
In-Reply-To: <8145782bddf60325909f328337cb76d25c4402cf.1287872553.git.trast@student.ethz.ch>
This is second review, with the same comments ad the one before,
and containing proposal for simpler autogeneration script.
On Sat, 24 Oct 2010, Thomas Rast wrote:
> Add an autogeneration script that inserts links to other manpages
> describing config variables,
Actualy it is "Add ... and run it to generate config-vars.txt",
or "Create ... and use it to ...", isn't it.
> as follows:
>
> * parse config-vars-src.txt (as it now needs to be renamed) to find
> out its current contents
>
> * parse each manpage source (following includes) for config variable
> headers (the blocks are ignored)
>
> * assemble a new config-vars.txt that completes the original list with
> "See linkgit:git-foo[1]" entries for all variables that were not in
> it.
I find this itemized list slightly hard to read (as you can see by the
confusion shown in previous review of this patch). I think it would be
better to describe _goal_ of this script first, and only then how it
does that.
For example:
The Documentation/make-config-list.perl script finds all config
variables that are defined in individual manpages that are missing
from the list of all config variable (in git-config(1) manpage).
It then generates stub[1] documentation for those missing config
variables at appropriate place in Documentation/config-vars.txt,
using the following form:
foo.bar::
foo.baz::
See linkgit:git-foo[1].
It does that by parsing config-vars-src.txt (now source for
config-vars.txt) getting list of all config variables to find later
which ones are missing, parsing each manpage source provided on
command line (following includes) for config variable headers to
find which config variables are described directly on individual
manpages, then it assembles config-vars.txt with missing variables
added[2].
NOTE that as a side-effect list of config variables would be sorted
in alphabetical order[3]
[1] Not a best word, but I couldn't think of a better one.
[2] This paragraph could be done as itemized list, as before.
[3] In my proposal it is not necessary; note that some of config
variables are grouped _by function_ to be close together, and
your script would destroy that.
>
> Signed-off-by: Thomas Rast <trast@student.ethz.ch>
> ---
>
> Jakub Narebski wrote:
> > Could you please resend this patch using rename detection
> > (git format-patch -M)? It would make it clear what the difference
> > between config-vars and config-vars-src is, if any.
>
> Right, sorry about that. There wasn't supposed to be any; I'm
> resending what I pushed out for everyone's convenience, but the stray
> change will be gone in the final version.
Stray changes (there are 2 of them).
> Documentation/Makefile | 9 ++-
> .../{config-vars.txt => config-vars-src.txt} | 2 +-
> Documentation/make-config-list.perl | 131 ++++++++++++++++++++
> 3 files changed, 140 insertions(+), 2 deletions(-)
> rename Documentation/{config-vars.txt => config-vars-src.txt} (99%)
> create mode 100755 Documentation/make-config-list.perl
>
> diff --git a/Documentation/Makefile b/Documentation/Makefile
> index e117bc4..747b849 100644
> --- a/Documentation/Makefile
> +++ b/Documentation/Makefile
> @@ -125,7 +125,7 @@ endif
>
> SHELL_PATH ?= $(SHELL)
> # Shell quote;
> -SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
> +SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH)) #'
>
> #
> # Please note that there is a minor bug in asciidoc.
Stray change, doesn't belong in this commit (and probably should be
skipped altogether).
BTW. there was a proposal to do something like that by saving quote
in a variable, and using it in shell quoting...
> @@ -320,6 +320,13 @@ howto-index.txt: howto-index.sh $(wildcard howto/*.txt)
> '$(SHELL_PATH_SQ)' ./howto-index.sh $(wildcard howto/*.txt) >$@+ && \
> mv $@+ $@
>
> +config-vars.txt: config-vars-src.txt $(MAN1_TXT) $(MAN5_TXT) $(MAN7_TXT)
> + ./make-config-list.perl --mainlist=config-vars-src.txt \
> + --ignore=config-vars.txt \
> + $(MAN1_TXT) $(MAN5_TXT) $(MAN7_TXT) \
> + > config-vars.txt+ && \
Very, very minor nitpick: wouldn't it read better as the following?
+ ./make-config-list.perl \
+ --mainlist=config-vars-src.txt \
+ --ignore=config-vars.txt \
+ $(MAN1_TXT) $(MAN5_TXT) $(MAN7_TXT) \
+ > config-vars.txt+ && \
> + > config-vars.txt+ && \
> + mv config-vars.txt+ config-vars.txt
> +
> $(patsubst %,%.html,$(ARTICLES)) : %.html : %.txt
> $(QUIET_ASCIIDOC)$(ASCIIDOC) $(ASCIIDOC_EXTRA) -b xhtml11 $*.txt
>
> diff --git a/Documentation/config-vars.txt b/Documentation/config-vars-src.txt
> similarity index 99%
> rename from Documentation/config-vars.txt
> rename to Documentation/config-vars-src.txt
> index a8d37a7..949259c 100644
> --- a/Documentation/config-vars.txt
> +++ b/Documentation/config-vars-src.txt
> @@ -936,7 +936,7 @@ gitcvs.dbname::
>
> gitcvs.dbdriver::
> Used Perl DBI driver. You can specify any available driver
> - for this here, but it might not work. git-cvsserver is tested
> + for this here, but it might not work. git-cvsserver is tested
> with 'DBD::SQLite', reported to work with 'DBD::Pg', and
> reported *not* to work with 'DBD::mysql'. Experimental feature.
> May not contain double colons (`:`). Default: 'SQLite'.
Stray change (spaces to tab), which should be done as separate
"whitespace cleanup" patch.
> diff --git a/Documentation/make-config-list.perl b/Documentation/make-config-list.perl
> new file mode 100755
> index 0000000..f086867
> --- /dev/null
> +++ b/Documentation/make-config-list.perl
> @@ -0,0 +1,131 @@
> +#!/usr/bin/perl
> +
> +use strict;
> +use warnings;
> +use Getopt::Long;
> +
> +my %ignore;
> +
> +my $rc = GetOptions(
> + "mainlist=s" => \my $mainlistfile,
> + "ignore=s" => sub { $ignore{$_[1]} = 1 },
> + "no-sort" => \my $no_sort,
> + );
Errr, wouldn't it be better to do it this way:
+my $sort = 1;
+my $rc = GetOptions(
+ "mainlist=s" => \my $mainlistfile,
+ "ignore=s" => \my @ignore,
+ "sort!" => \$sort,
+);
+my %ignore = map { $_ => 1 } @ignore;
Though it is not necessary; I guess it is a matter of taste.
BTW. does your sript work correctly with `--no-sort`?
> +
> +if (!$rc or (!-r $mainlistfile)) {
> + print "$0 --mainlist=<mainlist> [--ignore=<ignore>...] <asciidoc_manpage>...\n";
> + exit 1;
> +}
I'd rather you did a better error handling:
+if (!$rc || !defined $mainlistfile) {
+ print "$0 --mainlist=<mainlist> [--ignore=<ignore>...] <asciidoc_manpage>...\n";
+ exit 1;
+}
and leave handling '!-r $mainlistfile' to read_varlist() subroutine.
> +
This subroutine needs description, I think; at least oneline comment
describing it.
> +sub read_varlist {
> + my ($file) = @_;
> +
> + open my $fh, "<", $file or die "cannot open $file: $!";
You are a bit inconsistent here. read_varlist() has
open my $fh, "<", $file or die "cannot open $file: $!";
while read_file() has
my $fp;
open $fp, '<', $name or die "open $name failed: $!";
Note that you can use the same name of filehandle (e.g. $fh), as it is
local to the subroutine.
> + my (%mainlist, @mainvars);
> +
> + my ($v, $last_v);
> + my $in_block = 0;
I would done this differently, and IMHO in a simpler way. Instead of
generating %mainvars structure, that holds for each variable the contents
of its description (in the form of list of lines), adding missing
variables to it, and then (re)generating config-vars.txt from it:
--mainlist=config-vars-src.txt
|
v
@mainlist, %mainvars
|
| <-- %vars <-- <asciidoc_manpage>...
|
v
@all_keys, %out
|
v
> config-vars.txt+
wouldn't it be better to simply store where description of each variable
begins in a file (line number, or file position), and then use it to
inject generated description for missing variables in correct place?
> + while (<$fh>) {
> + if (/^(\S+)::/) {
> + $v = lc $1;
Note: because you are (re)generating config-vars.txt from those data
structures this has an unfortunate side-effect of lowercasing all config
variable names in final config-bars.txt, e.g. it would replace
`core.ignoreCygwinFSTricks` with `core.ignorecygwinfstricks`.
> + $in_block = 0;
> + push @{$mainlist{$v}}, $_;
> + push @mainvars, $v;
So in the proposed implementation it would simply be:
+ $mainlist{lc($v)} = $.;
+ push @mainvars, $v;
I think lc($v) is needed only in %mainlist keys.
> + } elsif (/^$/ && !$in_block) {
> + if (defined $last_v && !$#{$mainlist{$last_v}}) {
> + $mainlist{$last_v} = $mainlist{$v};
> + }
> + $last_v = $v;
Besides, if I understand it correctly, that any empty line which does
not precede config variable definition, and is not inside literal block,
is a *bug* in AsciiDoc markup. It should be "^+$" (continuation),
not "^$".
This branch of conditional would be simply not required in proposed
alternate solution.
By the way, why you do not follow includes in this file (for example
'include::merge-config.txt[]'?
> + } elsif (defined $v) {
> + push @{$mainlist{$v}}, $_;
> + $in_block = !$in_block if /^--$/;
> + }
This branch of conditional would be simply not required in proposed
alternate solution.
Well... assuming that we do not have lines matching /^(\S+)::/ in literal
blocks (which we don't, and which you also assume).
> + }
> +
> + close $fh or die "eh? close failed: $!";
You are a bit inconsistent here. read_varlist() has
close $fh or die "eh? close failed: $!";
while read_file() has
close $fp or die "close $name failed: $!";
> +
> + return \%mainlist, \@mainvars;
> +}
> +
> +my %vars;
> +my %sections;
I think those variables needs better names, for example
%vars_manpages (as it is mapping from config variable name found in
manpage to list of manpages it was found in) and %manpage_section
(as it is mapping from manpage name to man section, for purpose of
linking).
Though perhaps %vars could be left as is...
> +
This subroutine needs description, I think; at least oneline comment
describing it. And probably better name, like read_manpage().
> +sub read_file {
> + my ($name, $main_name) = @_;
> + if (!defined $main_name) {
> + $main_name = $name;
> + }
> + my $manpage_name = $main_name;
> + $manpage_name =~ s/\.txt//;
Wouldn't it be simpler to pass $filename, $manpage as arguments to
read_file(), rather than $filename/$name, $main_name?
It would read as:
+# Parse manpages for config variables and sections, following includes
+sub read_manpage {
+ my ($filename, $manpage) = @_;
+ if (!defined $manpage) {
+ $manpage = $filename;
+ $manpage =~ s/\.txt//;
+ }
And of course s/$name/$filename/ and s/$manpage_name/$manpage/. Well,
the change of variable names is not really necessary.
> + my $fp;
> + open $fp, '<', $name or die "open $name failed: $!";
> + while (<$fp>) {
> + if ($. < 5 && /^$manpage_name\((\d+)\)/) {
> + $sections{$manpage_name} = $1;
> + }
> + if (/^([a-z0-9]+\.[a-z<>0-9.]+)::/) {
> + push @{$vars{$1}}, $manpage_name;
> + }
> + if (/^include::\s*(\S+)\s*\[\]/
> + && !exists $ignore{$1}) {
> + read_file($1, $main_name);
> + }
> + }
> + close $fp or die "close $name failed: $!";
> +}
> +
> +foreach my $name (@ARGV) {
> + read_file($name);
> +}
O.K. nicely done. This would be unchanged in proposed alternate
solution.
> +
> +my ($mainlist, $mainvars) = read_varlist($mainlistfile);
> +
> +my @all_keys = @$mainvars;
> +foreach my $k (keys %vars) {
> + if (!exists $mainlist->{$k}) {
> + push @all_keys, $k;
> + }
> +}
I propose the following:
+my @missing_vars = ();
+foreach my $k (keys %vars) {
+ if (!exists $mainlist->{$k}) {
+ push @missing_vars, $k;
+ }
+}
or even
+foreach my $k (keys %vars) {
+ if (exists $mainlist->{$k}) {
+ delete $vars{$k};
+ }
+}
so that %vars would now contain only missing variables.
> +
> +@all_keys = sort @all_keys unless $no_sort;
So if you pass `--no-sort` to this script, all missing variables would
be put at the end, and if you pass it, the order of variables would be
changed, isn't it?
> +
> +my %out;
> +foreach my $k (@all_keys) {
> + if (exists $mainlist->{$k}) {
> + push @{$out{$k}}, @{$mainlist->{$k}}, "\n";
> + } elsif (exists $vars{$k}) {
> + push @{$out{$k}}, $k . "::\n";
> + push @{$out{$k}}, "\tSee ";
> + my $sep = " ";
> + foreach my $p (sort @{$vars{$k}}) {
> + next if ($p =~ /$mainlistfile/);
> + if (!exists $sections{$p}) {
> + warn "section for $p unknown";
> + $sections{$p} = 1;
> + }
> + push @{$out{$k}}, $sep . "linkgit:" . $p . "[" . $sections{$p} . "]";
> + $sep = ", ";
> + }
> + push @{$out{$k}}, ".\n\n";
> + } else {
> + die "can't happen: $k not in any source";
> + }
> +}
> +
> +for (my $i = 0; $i < $#all_keys; $i++) {
> + next if $#{$out{$all_keys[$i]}} != $#{$out{$all_keys[$i+1]}};
> + my $same = 1;
> + for (my $j = 1; $j <= $#{$out{$all_keys[$i]}}; $j++) {
> + if ($out{$all_keys[$i]}[$j] ne $out{$all_keys[$i+1]}[$j]) {
> + $same = 0;
> + last;
> + }
> + }
> + if ($same) {
> + $out{$all_keys[$i]} = [$out{$all_keys[$i]}[0]];
> + }
> +}
> +
> +foreach my $k (@all_keys) {
> + print @{$out{$k}};
> +}
Instead of that, I would propose the following UNTESTED code:
-- 8< --
%vars = find_insertion_points($mainlist, \%vars);
# %vars has now the following format:
# 'name' => {
# 'lineno' => line of $mainfile where description should be inserted
# 'manpages' => [ <manpage_name>... ]
# }
# group by insertion point
my %insertion_point; ## or just %insert
foreach my $v (keys %vars) {
push @{$insertion_point{$v->{'lineno'}}}, $v;
}
open my $fh, '<', $mainfile
or die "cannot open '$mainfile': $!";
while (my $line = <$fh>) {
if (exists $insertion_point{$.}) {
print document_vars($insertion_point{$.}, \%vars);
print "\n";
}
print $line;
}
# special case: insertion after last line in $mainfile
print document_vars($insertion_point{-1}, \%vars)
if exists $insertion_point{-1};
close $fh
or die "cannot close '$mainfile': $!";
exit 0;
## those subroutines should probably be defined earlier
sub find_insertion_points {
my ($mainlist, $missing_vars) = @_;
my %res; ## not required if we modify %$missing_vars in-place
my %all_vars = (%$mainlist, %$missing_vars);
my $lineno = -1; # means after last line
# reverse order because we want to find a place before which to insert
# generated documentation; it is easy to find where description
# of variable begins, but in general harder to find where it ends.
foreach my $key (reverse sort { lc($a) cmp lc($b) } keys %all_vars) {
my $val = $all_vars{$key};
if (ref $val) {
# this came from %$missing_vars
$res{$v} = { 'lineno' => $lineno, 'manpages' => $val };
## or $missing_vars->{$v} instead of $res{$v} if in-place
} else {
# this came from %$mainlist
$lineno = $value;
}
}
return %res; ## not required if in-place
}
## this is single place where we need to change code if output changes,
## e.g. to
##
## format.attach linkgit:git-format-patch[1]
## format.cc linkgit:git-format-patch[1]
## format.headers linkgit:git-format-patch[1]
## format.pretty linkgit:gitpretty[7]
sub document_vars {
my ($keylist, $vars) = @_;
# generate output for each key now, to compact output, because it is
# easier to compare strings than arrays
foreach my $k (@$keylist) {
$vars->{$k}{'output'} = "\tSee: ".gen_links($vars->{$k}{'manpages'}).".\n";
}
my $output = '';
## the below could be done in different way, e.g. using new array for
## sorted keys, and using 'my $k = pop @keys' etc.
$keylist = [ sort @$keylist ];
for (my $i = 0; $i < @$keylist; $i++) {
my $k = $keylist->[$i];
$output .= "$k::\n";
unless ($i <= $#@$keylist &&
$keylist->[$i]{'output'} eq $keylist->[$i+1]{'output'}) {
$output .= $k->{'output'};
}
}
return $output;
}
## we can make it smarter in the future, for example:
## 'linkgit:git-foo[1], linkgit:git-bar[1] and linkgit:git-baz[1] for details';
sub gen_links {
my $manpages = shift;
return join(", ", map { linkgit($_) } @$manpages);
}
sub linkgit {
my $manpage = shift;
if (!exists $manpage_section{$manpage}) {
warn "section for $manpage unknown\n";
$manpage_section{$manpage} = 1;
}
return "linkgit:${manpage}[$manpage_section{$manpage}]";
}
__END__
-- >8 --
--
Jakub Narebski
Poland
next prev parent reply other threads:[~2010-10-24 14:37 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-10-22 5:02 [RFC PATCH v2 0/3] Documentation: refactor config variable descriptions Thomas Rast
2010-10-22 5:02 ` [RFC PATCH v2 3/3] Documentation: move format.* documentation to format-patch Thomas Rast
[not found] ` <c3f621cd062b2c4f80aa2e8dadcfddbc042aefaa.1287690696.git.trast@student.ethz.ch>
2010-10-22 8:18 ` [RFC PATCH v2 1/3] Documentation: Move variables from config.txt to separate file Jakub Narebski
2010-10-22 15:17 ` [RFC PATCH v2 0/3] Documentation: refactor config variable descriptions Jakub Narebski
2010-10-22 15:53 ` Jeff King
2010-10-24 1:24 ` Thomas Rast
2010-10-25 15:04 ` Jeff King
2010-10-25 12:44 ` Jakub Narebski
2010-10-25 15:11 ` Jeff King
2010-10-25 15:49 ` Jakub Narebski
2010-10-27 10:56 ` Jakub Narebski
2010-11-02 13:42 ` Jens Lehmann
[not found] ` <8145782bddf60325909f328337cb76d25c4402cf.1287690696.git.trast@student.ethz.ch>
2010-10-22 6:19 ` [RFC PATCH v2 2/3] Documentation: complete config list from other manpages Jakub Narebski
2010-10-22 14:31 ` Jakub Narebski
2010-10-23 22:24 ` [PATCH] " Thomas Rast
2010-10-23 22:30 ` Jakub Narebski
2010-10-24 14:36 ` Jakub Narebski [this message]
2010-10-24 20:44 ` [RFC PATCH v3 2/3] " Jakub Narebski
2010-10-24 20:55 ` [RFC PATCH v3 2/3 (amend)] " Jakub Narebski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=201010241636.51106.jnareb@gmail.com \
--to=jnareb@gmail.com \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jrnieder@gmail.com \
--cc=srabbelier@gmail.com \
--cc=trast@student.ethz.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).