All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: "Jeff King" <peff@peff.net>,
	git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Johannes Sixt" <j6t@kdbg.org>,
	"Øystein Walle" <oystwa@gmail.com>
Subject: Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
Date: Thu, 21 Oct 2021 01:14:37 +0200	[thread overview]
Message-ID: <211021.86v91rmftn.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <YXCKqAEwtwFozWk6@nand.local>


On Wed, Oct 20 2021, Taylor Blau wrote:

> On Wed, Oct 20, 2021 at 04:35:38PM -0400, Jeff King wrote:
>> On Wed, Oct 20, 2021 at 08:39:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> > This series is based off an off-hand comment I made about making the
>> > cmdlist.sh faster, in the meantime much of the same methods are
>> > already cooking in "next" for the "lint-docs" target.
>> >
>> > See 7/8 for the main performance numbers, along the way I stole some
>> > patches from Johannes Sixt who'd worked on optimizing the script
>> > before, which compliment this new method of generating this file by
>> > piggy-backing more on GNU make for managing a dependency graph for us.
>>
>> I still think this is a much more complicated and error-prone approach
>> than just making the script faster. I know we can't rely on perl, but
>> could we use it optimistically?

Jeff: Just in terms of error prone both of these implementations will
accept bad input that's being caught in 8/8 of this series.

We accept a lot of bad input now, ending up with some combinations of
bad output or compile errors if you screw with the input *.txt files. I
think I've addressed all of those in this series.

If you mean the general concept of making a "foo.gen" from a "foo.txt"
as an intermediate with make as a way to get to "many-foo.h" I don't
really see how it's error prone conceptually. You get error checking
each step of the way, and it encourages logic that's simpler each step
of the way.

> I'll take credit for this terrible idea of using Perl when available.
>
> But I don't think we even need to, since we could just rely on Awk. That
> has all the benefits you described while still avoiding the circular
> dependency on libgit.a. But the killer feature is that we don't have to
> rely on two implementations, the lesser-used of which is likely to
> bitrot over time.
>
> The resulting awk is a little ugly, because of the nested-ness. I'm also
> no awk-spert, but I think that something like the below gets the job
> done.
>
> It also has the benefit of being slightly faster than the equivalent
> Perl implementation, for whatever those extra ~9 ms are worth ;).
>
> Benchmark #1: sh generate-cmdlist.sh command-list.txt
>   Time (mean ± σ):      25.3 ms ±   5.3 ms    [User: 31.1 ms, System: 8.3 ms]
>   Range (min … max):    15.5 ms …  31.7 ms    95 runs
>
> Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
>   Time (mean ± σ):      34.9 ms ±   9.8 ms    [User: 41.0 ms, System: 6.9 ms]
>   Range (min … max):    22.4 ms …  54.8 ms    64 runs
>
> Summary
>   'sh generate-cmdlist.sh command-list.txt' ran
>     1.38 ± 0.49 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
>
> ---
>
> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
> index a1ab2b1f07..39338ef1cc 100755
> --- a/generate-cmdlist.sh
> +++ b/generate-cmdlist.sh
> @@ -64,12 +64,19 @@ print_command_list () {
>  	echo "static struct cmdname_help command_list[] = {"
>
>  	command_list "$1" |
> -	while read cmd rest
> -	do
> -		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
> -		printf " | CAT_%s" $(echo "$rest" | get_category_line)
> -		echo " },"
> -	done
> +	awk '{
> +		f="Documentation/" $1 ".txt"
> +		while((getline line<f) > 0) {
> +			if (match(line, "^" $1 " - ")) {
> +				syn=substr(line, RLENGTH+1)
> +				printf "\t{ \"%s\", N_(\"%s\"), 0", $1, syn
> +				for (i=2; i<=NF; i++) {
> +					printf " | CAT_%s", $i
> +				}
> +				print " },"
> +			}
> +		}
> +	}'
>  	echo "};"
>  }

Per Eric's Sunshine's upthread comments an awk and Perl implementation
were both considered before[1].

I also care a bit about the timings of the from-scratch build, but I
think they're way less interesting than a partial build.

I.e. I think if you e.g. touch Documentation/git-a*.txt with this series
with/without this awk version the difference in runtime is within the
error bars. I.e. making the loop faster isn't necessary. It's better to
get to a point where make can save you from doing all/most of the work
by checking modification times, rather than making an O(n) loop faster.

The only reason there's even a loop there is because it's used by the
cmake logic in contrib/* (how we've ended up with a hard dependency in
contrib is another matter...).

I'm also interested in (and have WIP patches for) simplifying things
more generally in the Makefile. Once we have a file exploded out has
just the synopsis line that can be used to replace what's now in
Documentation/cmd-list.perl, i.e. those summary blurbs also end up in
"man git".

There's subtle dependency issues there as well, and just having a
one-off solution for the the command-list.h doesn't get us closer to
addressing that sibling implementation.

In terms of future Makefile work I was hoping to get this in, untangle
some of the complexity between the inter-dependency of Makefile &
Documentation/Makefile (eventually just merging the two, and leaving a
stub in Documentation/Makefile). I've also got a working implementation
for getting rid of all of the "FORCE" dependencies (except the version
one).

1. https://lore.kernel.org/git/CAPig+cSzKoOzU-zPOZqfNpPYBFpcWqvDP3mwLvAn5WkiNW0UMw@mail.gmail.com/

  reply	other threads:[~2021-10-20 23:34 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-24 13:16 Why the Makefile is so eager to re-build & re-link Ævar Arnfjörð Bjarmason
2021-06-24 15:16 ` Jeff King
2021-06-24 15:28   ` Ævar Arnfjörð Bjarmason
2021-06-24 21:30   ` Johannes Sixt
2021-06-25  8:34     ` Ævar Arnfjörð Bjarmason
2021-06-25  9:01       ` Ævar Arnfjörð Bjarmason
2021-06-29  2:13       ` Jeff King
2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 2/8] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 3/8] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 4/8] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 5/8] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-21 14:42             ` Jeff King
2021-10-21 16:25               ` Jeff King
2021-10-20 18:39           ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason
2021-10-21 14:45             ` Jeff King
2021-10-21 18:24               ` Junio C Hamano
2021-10-21 22:46             ` Øystein Walle
2021-10-20 18:39           ` [PATCH 8/8] Makefile: assert correct generate-cmdlist.sh output Ævar Arnfjörð Bjarmason
2021-10-20 20:35           ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Jeff King
2021-10-20 21:31             ` Taylor Blau
2021-10-20 23:14               ` Ævar Arnfjörð Bjarmason [this message]
2021-10-20 23:46                 ` Jeff King
2021-10-21  0:48                   ` Ævar Arnfjörð Bjarmason
2021-10-21  2:20                     ` Taylor Blau
2021-10-22 12:37                       ` Ævar Arnfjörð Bjarmason
2021-10-21 14:34                     ` Jeff King
2021-10-21 22:34                       ` Junio C Hamano
2021-10-22 10:51                       ` Ævar Arnfjörð Bjarmason
2021-10-22 18:31                         ` Jeff King
2021-10-22 20:50                           ` Ævar Arnfjörð Bjarmason
2021-10-21  5:39                 ` Eric Sunshine
2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-25 18:29               ` Junio C Hamano
2021-10-25 21:22                 ` Ævar Arnfjörð Bjarmason
2021-10-25 21:26                   ` Junio C Hamano
2021-10-22 19:36             ` [PATCH v2 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-10-25 16:39               ` Jeff King
2021-10-22 19:36             ` [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-10-25 16:46               ` Jeff King
2021-10-25 17:52                 ` Jeff King
2021-10-22 19:36             ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-10-23 22:19               ` Junio C Hamano
2021-10-23 22:26               ` Junio C Hamano
2021-10-22 19:36             ` [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell Ævar Arnfjörð Bjarmason
2021-10-23 22:26               ` Junio C Hamano
2021-10-22 21:20             ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Taylor Blau
2021-10-23 22:34             ` Junio C Hamano
2021-10-25 16:57             ` Jeff King
2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
2021-11-05 14:07               ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-11-05 22:45                 ` Junio C Hamano
2021-11-06  4:26                   ` Ævar Arnfjörð Bjarmason
2021-11-08 19:18                     ` Junio C Hamano
2021-11-05 14:08               ` [PATCH v3 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-11-05 22:47                 ` Junio C Hamano
2021-11-06  4:23                   ` Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 10/10] generate-cmdlist.sh: don't parse command-list.txt thrice Ævar Arnfjörð Bjarmason
2021-06-25 21:17   ` Why the Makefile is so eager to re-build & re-link Felipe Contreras
2021-06-29  5:04   ` Eric Sunshine
2021-06-24 23:35 ` Øystein Walle
2021-06-24 23:39   ` Øystein Walle
2021-06-25  0:11   ` Ævar Arnfjörð Bjarmason
2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason
2021-07-02 15:53   ` Junio C Hamano
2021-07-03 11:58     ` Ævar Arnfjörð Bjarmason
2021-07-05 19:48       ` Junio C Hamano
2021-07-03  1:05   ` Felipe Contreras
2021-07-03 12:03     ` Ævar Arnfjörð Bjarmason
2021-07-03 18:56       ` Felipe Contreras
2021-07-05 19:38       ` Junio C Hamano
2021-07-06 22:25         ` Felipe Contreras

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211021.86v91rmftn.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=me@ttaylorr.com \
    --cc=oystwa@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.