All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Johannes Sixt <j6t@kdbg.org>
Cc: Jeff King <peff@peff.net>, Git List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>, Taylor Blau <me@ttaylorr.com>,
	Emily Shaffer <emilyshaffer@google.com>,
	Eric Sunshine <sunshine@sunshineco.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: Why the Makefile is so eager to re-build & re-link
Date: Fri, 25 Jun 2021 10:34:20 +0200	[thread overview]
Message-ID: <87r1gqxqxn.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <fb23a23e-13be-14a8-4fbe-5ca2b4bcdb52@kdbg.org>


On Thu, Jun 24 2021, Johannes Sixt wrote:

> Am 24.06.21 um 17:16 schrieb Jeff King:
>> On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>  * {command,config}-list.h (and in-flight, my hook-list.h): Every time
>>>    you touch a Documentation/git-*.txt we need to re-generate these, and
>>>    since their mtime changes we re-compile and re-link all the way up to
>>>    libgit and our other tools.
>>>
>>>    I think the best solution here is to make the generate-*.sh
>>>    shellscripts faster (just one takes ~300ms of nested shellscripting,
>>>    just to grep out the first few lines of every git-*.txt, in e.g. Perl
>>>    or a smarter awk script this would be <5ms).
>> 
>> Yeah, I think Eric mentioned he had looked into doing this in perl, but
>> we weren't entirely happy with the dependency. Here's another really odd
>> thing I noticed:
>> 
>>   $ time sh ./generate-cmdlist.sh command-list.txt >one
>>   real	0m1.323s
>>   user	0m1.531s
>>   sys	0m0.834s
>> 
>>   $ time sh -x ./generate-cmdlist.sh command-list.txt >two
>>   [a bunch of trace output]
>>   real	0m0.513s
>>   user	0m0.754s
>>   sys	0m0.168s
>> 
>>   $ cmp one two
>>   [no output]
>> 
>> Er, what? Running with "-x" makes it almost 3 times faster to generate
>> the same output? I'd have said this is an anomaly, but it's repeatable
>> (and swapping the order produces the same output, so it's not some weird
>> priming thing). And then to top it all off, redirecting the trace is
>> slow again:
>> 
>>   $ time sh -x ./generate-cmdlist.sh command-list.txt >two 2>/dev/null
>>   real	0m1.363s
>>   user	0m1.538s
>>   sys	0m0.902s
>> 
>> A little mini-mystery that I think I may leave unsolved for now.
>
> Strange, really. Reminds me of the case that the `read` built-in must
> read input byte by byte if stdin is not connected to a (searchable) file.
>
> I have two patches that speed up generate-cmdlist.sh a bit:
>
> generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
> (https://github.com/j6t/git/commit/b6d05f653bede727bc001f299b57969f62d3bc03)
> generate-cmdlist.sh: spawn fewer processes
> (https://github.com/j6t/git/commit/fd8721ee8fae06d7b584fa5166f32bf5521ca304)
>
> that I can submit (give me some time, though) or interested parties
> could pick up.

Interesting, but I think rather than micro-optimizing the O(n) loop it
makes more sense to turn it into a series of O(1) in -j parallel,
i.e. actually use the make dependency graph for this as I suggested in:
https://lore.kernel.org/git/87wnqiyejg.fsf@evledraar.gmail.com/

Something like the hacky throwaway patch that follows. Now when you
touch a file in Documentation/git-*.txt you re-make just that file
chain, which gets assembled into the command-list.h:
	
	$ touch Documentation/git-add.txt; time make -k -j $(nproc) git 2>&1
	GIT_VERSION = 2.32.0.94.g87ef2a6b7ed.dirty
	    GEN build/Documentation/git-add.txt.cmdlist.in
	    CC version.o
	    GEN build/Documentation/git-add.txt.cmdlist
	    GEN command-list.h
	    CC help.o
	    AR libgit.a
	    LINK git

Those build/* files are, respectively, the relevant line corresponding
to the *.txt file from command-list.txt:

    $ cat build/Documentation/git-add.txt.cmdlist.in
    git-add                                 mainporcelain           worktree

And then the line we want in command-list.h at the end:

    $ cat build/Documentation/git-add.txt.cmdlist
        { "git-add", N_("Add file contents to the index"), 0 | CAT_mainporcelain | CAT_worktree },

The build process in then just a matter of keeping those files
up-to-date, and having "make" create the command-list.h is just a matter
of:

    cat build/Documentation/*.cmdlist

Well, with the categories still being an O(n) affair, but the cost of
generating those is trivial.

I think that aside from optimizing for speed it just makes things
clearer, if you modify e.g. git-add.txt you see a clear chain of output
from make about how we generate output from that, and then make the
command-list.h.

Now we'd just show a mysterious command-list.h entry. Whenever you have
"make" create one file from many it becomes hard to see at a glance
what's going on.

I'd be curious to see what how that performs e.g. on Windows, on Linux I
get e.g. (warm cache):
	
	$ touch Documentation/git-a*.txt; time make -k -j $(nproc) command-list.h 2>&1
	    GEN build/Documentation/git-apply.txt.cmdlist.in
	    GEN build/Documentation/git-annotate.txt.cmdlist.in
	    GEN build/Documentation/git-am.txt.cmdlist.in
	    GEN build/Documentation/git-add.txt.cmdlist.in
	    GEN build/Documentation/git-archimport.txt.cmdlist.in
	    GEN build/Documentation/git-archive.txt.cmdlist.in
	    GEN build/Documentation/git-apply.txt.cmdlist
	    GEN build/Documentation/git-annotate.txt.cmdlist
	    GEN build/Documentation/git-am.txt.cmdlist
	    GEN build/Documentation/git-add.txt.cmdlist
	    GEN build/Documentation/git-archimport.txt.cmdlist
	    GEN build/Documentation/git-archive.txt.cmdlist
	    GEN command-list.h
	
	real    0m0.214s
	user    0m0.196s
	sys     0m0.071s
	
Doing the same on master takes around 600ms for me at best:
	
	$ touch Documentation/git-a*.txt; time make -k -j $(nproc) command-list.h 2>&1
	    GEN command-list.h
	
	real    0m0.611s
	user    0m0.756s
	sys     0m0.112s

It's even faster when I have to make all of them (my -j is = 8), or
around 450ms after a "touch Documentation/git-*.txt".

We have ~170 lines we process in command-list.txt, I'd think on Windows
you'd get much better results, instead of optimizing those number of
"sort | uniq" to the same number of "sort -u" the common case is just
running 1-5 of those, as that's all the *.txt files you changed, then
just "cat-ing" the full set.

diff --git a/.gitignore b/.gitignore
index 311841f9bed..9b365395496 100644
--- a/.gitignore
+++ b/.gitignore
@@ -13,6 +13,7 @@
 /GIT-USER-AGENT
 /GIT-VERSION-FILE
 /bin-wrappers/
+/build/
 /git
 /git-add
 /git-add--interactive
diff --git a/Makefile b/Makefile
index c3565fc0f8f..5e845bd0f69 100644
--- a/Makefile
+++ b/Makefile
@@ -2231,12 +2231,30 @@ config-list.h: Documentation/*config.txt Documentation/config/*.txt
 	$(QUIET_GEN)$(SHELL_PATH) ./generate-configlist.sh \
 		>$@+ && mv $@+ $@
 
-command-list.h: generate-cmdlist.sh command-list.txt
+build/Documentation:
+	$(QUIET_GEN)mkdir -p build/Documentation
+.PRECIOUS: build/Documentation/git%.txt.cmdlist.in
+build/Documentation/git%.txt.cmdlist.in: Documentation/git%.txt
+	$(QUIET_GEN)if ! grep -w $(patsubst build/Documentation/%.txt.cmdlist.in,%,$@) command-list.txt >$@; \
+	then \
+		# For e.g. git-init-db, which has a *.txt file, but no \
+		# command-list.h entry \
+		>$@; \
+	fi
+build/Documentation/git%.txt.cmdlist: build/Documentation/git%.txt.cmdlist.in
+	$(QUIET_GEN)./generate-cmdlist.sh --tail $< >$@
 
-command-list.h: $(wildcard Documentation/git*.txt)
-	$(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh \
+COMMAND_LIST_H_FILES = $(filter-out Documentation/git.txt,$(wildcard Documentation/git*.txt))
+
+COMMAND_LIST_GEN = $(patsubst Documentation/%.txt,build/Documentation/%.txt.cmdlist,$(COMMAND_LIST_H_FILES))
+command-list.h: build/Documentation generate-cmdlist.sh command-list.txt $(COMMAND_LIST_GEN)
+	$(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh --header \
 		$(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \
-		command-list.txt >$@+ && mv $@+ $@
+		command-list.txt >$@+ && \
+	echo "static struct cmdname_help command_list[] = {" >>$@+ && \
+	cat build/Documentation/*.txt.cmdlist >>$@+ && \
+	echo "};" >>$@+ && \
+	mv $@+ $@
 
 SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
 	$(localedir_SQ):$(NO_CURL):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 9dbbb08e70a..f80266d5138 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -1,5 +1,8 @@
 #!/bin/sh
 
+mode=$1
+shift
+
 die () {
 	echo "$@" >&2
 	exit 1
@@ -61,8 +64,6 @@ define_category_names () {
 }
 
 print_command_list () {
-	echo "static struct cmdname_help command_list[] = {"
-
 	command_list "$1" |
 	while read cmd rest
 	do
@@ -73,6 +74,9 @@ print_command_list () {
 		done
 		echo " },"
 	done
+}
+
+end_print_command_list () {
 	echo "};"
 }
 
@@ -84,6 +88,12 @@ do
 	shift
 done
 
+if test "$mode" = "--tail"
+then
+	print_command_list "$1"
+	exit 0
+fi
+
 echo "/* Automatically generated by generate-cmdlist.sh */
 struct cmdname_help {
 	const char *name;
@@ -94,5 +104,3 @@ struct cmdname_help {
 define_categories "$1"
 echo
 define_category_names "$1"
-echo
-print_command_list "$1"

  reply	other threads:[~2021-06-25  8:57 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-24 13:16 Why the Makefile is so eager to re-build & re-link Ævar Arnfjörð Bjarmason
2021-06-24 15:16 ` Jeff King
2021-06-24 15:28   ` Ævar Arnfjörð Bjarmason
2021-06-24 21:30   ` Johannes Sixt
2021-06-25  8:34     ` Ævar Arnfjörð Bjarmason [this message]
2021-06-25  9:01       ` Ævar Arnfjörð Bjarmason
2021-06-29  2:13       ` Jeff King
2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 2/8] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 3/8] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 4/8] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 5/8] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-21 14:42             ` Jeff King
2021-10-21 16:25               ` Jeff King
2021-10-20 18:39           ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason
2021-10-21 14:45             ` Jeff King
2021-10-21 18:24               ` Junio C Hamano
2021-10-21 22:46             ` Øystein Walle
2021-10-20 18:39           ` [PATCH 8/8] Makefile: assert correct generate-cmdlist.sh output Ævar Arnfjörð Bjarmason
2021-10-20 20:35           ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Jeff King
2021-10-20 21:31             ` Taylor Blau
2021-10-20 23:14               ` Ævar Arnfjörð Bjarmason
2021-10-20 23:46                 ` Jeff King
2021-10-21  0:48                   ` Ævar Arnfjörð Bjarmason
2021-10-21  2:20                     ` Taylor Blau
2021-10-22 12:37                       ` Ævar Arnfjörð Bjarmason
2021-10-21 14:34                     ` Jeff King
2021-10-21 22:34                       ` Junio C Hamano
2021-10-22 10:51                       ` Ævar Arnfjörð Bjarmason
2021-10-22 18:31                         ` Jeff King
2021-10-22 20:50                           ` Ævar Arnfjörð Bjarmason
2021-10-21  5:39                 ` Eric Sunshine
2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-25 18:29               ` Junio C Hamano
2021-10-25 21:22                 ` Ævar Arnfjörð Bjarmason
2021-10-25 21:26                   ` Junio C Hamano
2021-10-22 19:36             ` [PATCH v2 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-10-25 16:39               ` Jeff King
2021-10-22 19:36             ` [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-10-25 16:46               ` Jeff King
2021-10-25 17:52                 ` Jeff King
2021-10-22 19:36             ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-10-23 22:19               ` Junio C Hamano
2021-10-23 22:26               ` Junio C Hamano
2021-10-22 19:36             ` [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell Ævar Arnfjörð Bjarmason
2021-10-23 22:26               ` Junio C Hamano
2021-10-22 21:20             ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Taylor Blau
2021-10-23 22:34             ` Junio C Hamano
2021-10-25 16:57             ` Jeff King
2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
2021-11-05 14:07               ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-11-05 22:45                 ` Junio C Hamano
2021-11-06  4:26                   ` Ævar Arnfjörð Bjarmason
2021-11-08 19:18                     ` Junio C Hamano
2021-11-05 14:08               ` [PATCH v3 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-11-05 22:47                 ` Junio C Hamano
2021-11-06  4:23                   ` Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 10/10] generate-cmdlist.sh: don't parse command-list.txt thrice Ævar Arnfjörð Bjarmason
2021-06-25 21:17   ` Why the Makefile is so eager to re-build & re-link Felipe Contreras
2021-06-29  5:04   ` Eric Sunshine
2021-06-24 23:35 ` Øystein Walle
2021-06-24 23:39   ` Øystein Walle
2021-06-25  0:11   ` Ævar Arnfjörð Bjarmason
2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason
2021-07-02 15:53   ` Junio C Hamano
2021-07-03 11:58     ` Ævar Arnfjörð Bjarmason
2021-07-05 19:48       ` Junio C Hamano
2021-07-03  1:05   ` Felipe Contreras
2021-07-03 12:03     ` Ævar Arnfjörð Bjarmason
2021-07-03 18:56       ` Felipe Contreras
2021-07-05 19:38       ` Junio C Hamano
2021-07-06 22:25         ` Felipe Contreras

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r1gqxqxn.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.