git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] GIT-VERSION-GEN: allow it to be run in parallel
@ 2025-01-09 14:24 Johannes Schindelin via GitGitGadget
  2025-01-09 18:33 ` Martin Ågren
  2025-01-10 11:48 ` [PATCH v2] " Johannes Schindelin via GitGitGadget
  0 siblings, 2 replies; 5+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-01-09 14:24 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

"Why would one want to run it in parallel?" I hear you ask. I am glad
you are curious, because a curious story is what it is, indeed.

The `GIT-VERSION-GEN` script is quite a pillar of Git's source code,
with most lines being unchanged for the past 15 years. Until the v2.48.0
release candidate cycle.

Its original purpose was to generate the version string and store it in
the `GIT-VERSION-FILE`.

This paradigm changed quite dramatically when support for building with
Meson was introduced. Most crucially, a38edab7c88b (Makefile: generate
doc versions via GIT-VERSION-GEN, 2024-12-06) changed the way the
documentation is built by using the `GIT-VERSION-GEN` file to write out
the `asciidocor-extensions.rb` and `asciidoc.conf` files with now
hard-coded version strings.

Crucially, the Makefile rule to generate those files needs to be run in
every build because `GIT_VERSION` could have been specified in the
`make` command-line, which would require these files to be modified.

This introduced a surprising race condition!

And this is how that race surfaces: When calling `make -j2 html man`
from the top-level directory (a variant of which is invoked in Git for
Windows' release process), two sub-processes are spawned, a `make -C
Documentation html` one and a `make -C Documentation man` one. Both run
the rule to (re-)generate `asciidoctor-extensions.rb` or
`asciidoc.conf`, invoking `GIT-VERSION-GEN` to do so. That script first
generates a temporary file (appending the `+` character to the
filename), then looks whether it contains something different than the
already existing file (if it exists, that is), and either replaces it if
needed, or removes the temporary file. If one of the two parallel
invocations removes that temporary file before the other can compare it,
or even worse: if one tries to replace the target file just after the
other _started_ writing the temporary file (but did not finish writing
it yet), that race condition now causes bad builds.

This may sound highly theoretical, but due to the design of Git's build
process, Git for Windows is forced to use a (slow) POSIX emulation layer
to run that script and in the blink of an eye it becomes very much not
theoretical at all. See Exhibit A: These GitHub workflow runs failed
because one of the two competing `make` processes tried to remove the
temporary file when the other process had already done so:

https://github.com/git-for-windows/git-sdk-32/actions/runs/12663456654
https://github.com/git-for-windows/git-sdk-32/actions/runs/12683174970
https://github.com/git-for-windows/git-sdk-64/actions/runs/12649348496

While it is undesirable to run this script over and over again,
certainly when this involves above-mentioned slow POSIX emulation layer,
the stage of the release cycle in which we are presently finding
ourselves does not lend itself to a re-design where this script could be
run once, and once only, but instead dictates that a quick and reliable
work-around be implemented that prevents the race condition without
changing the overall architecture of the build process.

This patch does that: By using a filename suffix for the temporary file
which is based on the currently-executing script's process ID, We
guarantee that the two competing invocations cannot overwrite or remove
each others' temporary files.

Incidentally, this also fixes something else: The `+` character is
not even a valid filename character on Windows. The only reason why Git
for Windows did not need this is that above-mentioned POSIX emulation
layer also plays a couple of tricks with filenames (tricks that are not
interoperable with regular Windows programs, though), and previous
attempts to remedy this in git/git were unsuccessful, see e.g.
https://lore.kernel.org/git/pull.216.git.gitgitgadget@gmail.com/

This commit fixes one of the issues that are currently delaying Git for
Windows v2.48.0-rc2.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    GIT-VERSION-GEN: allow it to be run in parallel

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1850%2Fdscho%2Fasciidoctor-extensions-gen-race-work-around-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1850/dscho/asciidoctor-extensions-gen-race-work-around-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1850

 GIT-VERSION-GEN | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/GIT-VERSION-GEN b/GIT-VERSION-GEN
index 6d1cb66d69a..5b49e2d72fb 100755
--- a/GIT-VERSION-GEN
+++ b/GIT-VERSION-GEN
@@ -86,11 +86,11 @@ sed -e "s|@GIT_VERSION@|$GIT_VERSION|" \
 	-e "s|@GIT_BUILT_FROM_COMMIT@|$GIT_BUILT_FROM_COMMIT|" \
 	-e "s|@GIT_USER_AGENT@|$GIT_USER_AGENT|" \
 	-e "s|@GIT_DATE@|$GIT_DATE|" \
-	"$INPUT" >"$OUTPUT"+
+	"$INPUT" >"$OUTPUT".$$
 
-if ! test -f "$OUTPUT" || ! cmp "$OUTPUT"+ "$OUTPUT" >/dev/null
+if ! test -f "$OUTPUT" || ! cmp "$OUTPUT".$$ "$OUTPUT" >/dev/null
 then
-	mv "$OUTPUT"+ "$OUTPUT"
+	mv "$OUTPUT".$$ "$OUTPUT"
 else
-	rm "$OUTPUT"+
+	rm "$OUTPUT".$$
 fi

base-commit: a60673e9252b08d4eca90543b3729f4798b9aafd
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] GIT-VERSION-GEN: allow it to be run in parallel
  2025-01-09 14:24 [PATCH] GIT-VERSION-GEN: allow it to be run in parallel Johannes Schindelin via GitGitGadget
@ 2025-01-09 18:33 ` Martin Ågren
  2025-01-09 20:00   ` Junio C Hamano
  2025-01-10 11:48 ` [PATCH v2] " Johannes Schindelin via GitGitGadget
  1 sibling, 1 reply; 5+ messages in thread
From: Martin Ågren @ 2025-01-09 18:33 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Patrick Steinhardt, Johannes Schindelin

On Thu, 9 Jan 2025 at 15:24, Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> And this is how that race surfaces: When calling `make -j2 html man`
> from the top-level directory (a variant of which is invoked in Git for
> Windows' release process), two sub-processes are spawned, a `make -C
> Documentation html` one and a `make -C Documentation man` one. Both run
> the rule to (re-)generate `asciidoctor-extensions.rb` or
> `asciidoc.conf`, invoking `GIT-VERSION-GEN` to do so.

Nicely described. Indeed, there's a reason recursive make is considered
harmful. This is of course not the time or place for addressing that.

> Incidentally, this also fixes something else: The `+` character is
> not even a valid filename character on Windows. The only reason why Git
> for Windows did not need this is that above-mentioned POSIX emulation
> layer also plays a couple of tricks with filenames (tricks that are not
> interoperable with regular Windows programs, though), and previous
> attempts to remedy this in git/git were unsuccessful, see e.g.
> https://lore.kernel.org/git/pull.216.git.gitgitgadget@gmail.com/

> -       "$INPUT" >"$OUTPUT"+
> +       "$INPUT" >"$OUTPUT".$$
>
> -if ! test -f "$OUTPUT" || ! cmp "$OUTPUT"+ "$OUTPUT" >/dev/null
> +if ! test -f "$OUTPUT" || ! cmp "$OUTPUT".$$ "$OUTPUT" >/dev/null
>  then
> -       mv "$OUTPUT"+ "$OUTPUT"
> +       mv "$OUTPUT".$$ "$OUTPUT"
>  else
> -       rm "$OUTPUT"+
> +       rm "$OUTPUT".$$
>  fi

Our `.gitignore` contains an entry "*+" to ignore this sort of temporary
files. Yes, they're supposed to disappear within a second or so, but
according to f9bbaa384e (Add intermediate build products to .gitignore,
2009-11-08), they can linger after interrupted builds. Maybe separate
tooling built around git could pick up these as untracked files for a
second, causing them to come and go in whatever GUI.

You could use "$OUTPUT"."$$"+ to restore this. That of course
invalidates your remark about "Incidentally, ..." above, but might give
this fix a tiny bit less chance of regressing something somewhere?

Martin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] GIT-VERSION-GEN: allow it to be run in parallel
  2025-01-09 18:33 ` Martin Ågren
@ 2025-01-09 20:00   ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2025-01-09 20:00 UTC (permalink / raw)
  To: Martin Ågren
  Cc: Johannes Schindelin via GitGitGadget, git, Patrick Steinhardt,
	Johannes Schindelin

Martin Ågren <martin.agren@gmail.com> writes:

>> ... attempts to remedy this in git/git were unsuccessful, see e.g.
>> https://lore.kernel.org/git/pull.216.git.gitgitgadget@gmail.com/
> ...
> You could use "$OUTPUT"."$$"+ to restore this. That of course
> invalidates your remark about "Incidentally, ..." above, but might give
> this fix a tiny bit less chance of regressing something somewhere?

Thanks for being careful.

My reading of that old thread cited there tells me that the reason
that previous one failed was mostly because it wasn't being self
consistent and only touched the use of "+" in the Documentation
directory but not what the top-level Makefile did, and also because
it did not adjust .gitignore patterns, so it is good that somebody
actually read the cited thread and made sure this time we do better.

Again, I was not opposed to moving from "+" to something else that
is equally short-and-sweet, and I still am not ("~" is a fine suffix
for this kind of thing, for example).  But if we are aiming for a
short-term fix, I think your ".$$+" may make the most sense.

Thanks.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2] GIT-VERSION-GEN: allow it to be run in parallel
  2025-01-09 14:24 [PATCH] GIT-VERSION-GEN: allow it to be run in parallel Johannes Schindelin via GitGitGadget
  2025-01-09 18:33 ` Martin Ågren
@ 2025-01-10 11:48 ` Johannes Schindelin via GitGitGadget
  2025-01-10 16:55   ` Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2025-01-10 11:48 UTC (permalink / raw)
  To: git
  Cc: Patrick Steinhardt, Martin Ågren, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

"Why would one want to run it in parallel?" I hear you ask. I am glad
you are curious, because a curious story is what it is, indeed.

The `GIT-VERSION-GEN` script is quite a pillar of Git's source code,
with most lines being unchanged for the past 15 years. Until the v2.48.0
release candidate cycle.

Its original purpose was to generate the version string and store it in
the `GIT-VERSION-FILE`.

This paradigm changed quite dramatically when support for building with
Meson was introduced. Most crucially, a38edab7c88b (Makefile: generate
doc versions via GIT-VERSION-GEN, 2024-12-06) changed the way the
documentation is built by using the `GIT-VERSION-GEN` file to write out
the `asciidocor-extensions.rb` and `asciidoc.conf` files with now
hard-coded version strings.

Crucially, the Makefile rule to generate those files needs to be run in
every build because `GIT_VERSION` could have been specified in the
`make` command-line, which would require these files to be modified.

This introduced a surprising race condition!

And this is how that race surfaces: When calling `make -j2 html man`
from the top-level directory (a variant of which is invoked in Git for
Windows' release process), two sub-processes are spawned, a `make -C
Documentation html` one and a `make -C Documentation man` one. Both run
the rule to (re-)generate `asciidoctor-extensions.rb` or
`asciidoc.conf`, invoking `GIT-VERSION-GEN` to do so. That script first
generates a temporary file (appending the `+` character to the
filename), then looks whether it contains something different than the
already existing file (if it exists, that is), and either replaces it if
needed, or removes the temporary file. If one of the two parallel
invocations removes that temporary file before the other can compare it,
or even worse: if one tries to replace the target file just after the
other _started_ writing the temporary file (but did not finish writing
it yet), that race condition now causes bad builds.

This may sound highly theoretical, but due to the design of Git's build
process, Git for Windows is forced to use a (slow) POSIX emulation layer
to run that script and in the blink of an eye it becomes very much not
theoretical at all. See Exhibit A: These GitHub workflow runs failed
because one of the two competing `make` processes tried to remove the
temporary file when the other process had already done so:

https://github.com/git-for-windows/git-sdk-32/actions/runs/12663456654
https://github.com/git-for-windows/git-sdk-32/actions/runs/12683174970
https://github.com/git-for-windows/git-sdk-64/actions/runs/12649348496

While it is undesirable to run this script over and over again,
certainly when this involves above-mentioned slow POSIX emulation layer,
the stage of the release cycle in which we are presently finding
ourselves does not lend itself to a re-design where this script could be
run once, and once only, but instead dictates that a quick and reliable
work-around be implemented that prevents the race condition without
changing the overall architecture of the build process.

This patch does that: By using a filename suffix for the temporary file
which is based on the currently-executing script's process ID, We
guarantee that the two competing invocations cannot overwrite or remove
each others' temporary files.

The filename suffix still ends in `+` to ensure that the temporary
artifacts are matched by the `*+` pattern in `.gitignore` that was added
in f9bbaa384ef (Add intermediate build products to .gitignore,
2009-11-08).

Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    GIT-VERSION-GEN: allow it to be run in parallel
    
    Changes since v1:
    
     * Appended + again, to get the benefit of the .gitignore pattern that
       prevents the temporary files from being committed.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1850%2Fdscho%2Fasciidoctor-extensions-gen-race-work-around-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1850/dscho/asciidoctor-extensions-gen-race-work-around-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1850

Range-diff vs v1:

 1:  7a1c39b9075 ! 1:  4057596dc16 GIT-VERSION-GEN: allow it to be run in parallel
     @@ Commit message
          guarantee that the two competing invocations cannot overwrite or remove
          each others' temporary files.
      
     -    Incidentally, this also fixes something else: The `+` character is
     -    not even a valid filename character on Windows. The only reason why Git
     -    for Windows did not need this is that above-mentioned POSIX emulation
     -    layer also plays a couple of tricks with filenames (tricks that are not
     -    interoperable with regular Windows programs, though), and previous
     -    attempts to remedy this in git/git were unsuccessful, see e.g.
     -    https://lore.kernel.org/git/pull.216.git.gitgitgadget@gmail.com/
     -
     -    This commit fixes one of the issues that are currently delaying Git for
     -    Windows v2.48.0-rc2.
     +    The filename suffix still ends in `+` to ensure that the temporary
     +    artifacts are matched by the `*+` pattern in `.gitignore` that was added
     +    in f9bbaa384ef (Add intermediate build products to .gitignore,
     +    2009-11-08).
      
     +    Helped-by: Martin Ågren <martin.agren@gmail.com>
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
       ## GIT-VERSION-GEN ##
     @@ GIT-VERSION-GEN: sed -e "s|@GIT_VERSION@|$GIT_VERSION|" \
       	-e "s|@GIT_USER_AGENT@|$GIT_USER_AGENT|" \
       	-e "s|@GIT_DATE@|$GIT_DATE|" \
      -	"$INPUT" >"$OUTPUT"+
     -+	"$INPUT" >"$OUTPUT".$$
     ++	"$INPUT" >"$OUTPUT".$$+
       
      -if ! test -f "$OUTPUT" || ! cmp "$OUTPUT"+ "$OUTPUT" >/dev/null
     -+if ! test -f "$OUTPUT" || ! cmp "$OUTPUT".$$ "$OUTPUT" >/dev/null
     ++if ! test -f "$OUTPUT" || ! cmp "$OUTPUT".$$+ "$OUTPUT" >/dev/null
       then
      -	mv "$OUTPUT"+ "$OUTPUT"
     -+	mv "$OUTPUT".$$ "$OUTPUT"
     ++	mv "$OUTPUT".$$+ "$OUTPUT"
       else
      -	rm "$OUTPUT"+
     -+	rm "$OUTPUT".$$
     ++	rm "$OUTPUT".$$+
       fi


 GIT-VERSION-GEN | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/GIT-VERSION-GEN b/GIT-VERSION-GEN
index 6d1cb66d69a..2e2d0811581 100755
--- a/GIT-VERSION-GEN
+++ b/GIT-VERSION-GEN
@@ -86,11 +86,11 @@ sed -e "s|@GIT_VERSION@|$GIT_VERSION|" \
 	-e "s|@GIT_BUILT_FROM_COMMIT@|$GIT_BUILT_FROM_COMMIT|" \
 	-e "s|@GIT_USER_AGENT@|$GIT_USER_AGENT|" \
 	-e "s|@GIT_DATE@|$GIT_DATE|" \
-	"$INPUT" >"$OUTPUT"+
+	"$INPUT" >"$OUTPUT".$$+
 
-if ! test -f "$OUTPUT" || ! cmp "$OUTPUT"+ "$OUTPUT" >/dev/null
+if ! test -f "$OUTPUT" || ! cmp "$OUTPUT".$$+ "$OUTPUT" >/dev/null
 then
-	mv "$OUTPUT"+ "$OUTPUT"
+	mv "$OUTPUT".$$+ "$OUTPUT"
 else
-	rm "$OUTPUT"+
+	rm "$OUTPUT".$$+
 fi

base-commit: a60673e9252b08d4eca90543b3729f4798b9aafd
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] GIT-VERSION-GEN: allow it to be run in parallel
  2025-01-10 11:48 ` [PATCH v2] " Johannes Schindelin via GitGitGadget
@ 2025-01-10 16:55   ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2025-01-10 16:55 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Patrick Steinhardt, Martin Ågren, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

>      * Appended + again, to get the benefit of the .gitignore pattern that
>        prevents the temporary files from being committed.

I had queued an incremental fix-up in my tree on top, which may have
been easier to read, but you did the testing on a system that showed
the problem earlier, so I'll drop it (together with the v1) and
replace them with this one.

Thanks.


--- >8 ---
Subject: [PATCH] Make sure the name of the temporary file ends with "+"

To avoid regression by leaving untracked cruft that is not covered
by the .gitignore file, match the convention to generate into
$name$suffix and then move to $name where $suffix ends with "+",
which is used everywhere else in the system.

diff --git a/GIT-VERSION-GEN b/GIT-VERSION-GEN
index 5b49e2d72f..bc11258d9b 100755
--- a/GIT-VERSION-GEN
+++ b/GIT-VERSION-GEN
@@ -78,6 +78,8 @@ read GIT_MAJOR_VERSION GIT_MINOR_VERSION GIT_MICRO_VERSION GIT_PATCH_LEVEL trail
 $(echo "$GIT_VERSION" 0 0 0 0 | tr '.a-zA-Z-' ' ')
 EOF
 
+OUTPUT_TMP="$OUTPUT.$$+"
+
 sed -e "s|@GIT_VERSION@|$GIT_VERSION|" \
 	-e "s|@GIT_MAJOR_VERSION@|$GIT_MAJOR_VERSION|" \
 	-e "s|@GIT_MINOR_VERSION@|$GIT_MINOR_VERSION|" \
@@ -86,11 +88,11 @@ sed -e "s|@GIT_VERSION@|$GIT_VERSION|" \
 	-e "s|@GIT_BUILT_FROM_COMMIT@|$GIT_BUILT_FROM_COMMIT|" \
 	-e "s|@GIT_USER_AGENT@|$GIT_USER_AGENT|" \
 	-e "s|@GIT_DATE@|$GIT_DATE|" \
-	"$INPUT" >"$OUTPUT".$$
+	"$INPUT" >"$OUTPUT_TMP"
 
-if ! test -f "$OUTPUT" || ! cmp "$OUTPUT".$$ "$OUTPUT" >/dev/null
+if ! test -f "$OUTPUT" || ! cmp "$OUTPUT_TMP" "$OUTPUT" >/dev/null
 then
-	mv "$OUTPUT".$$ "$OUTPUT"
+	mv "$OUTPUT_TMP" "$OUTPUT"
 else
-	rm "$OUTPUT".$$
+	rm "$OUTPUT_TMP"
 fi

--- 8< ---

>  GIT-VERSION-GEN | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/GIT-VERSION-GEN b/GIT-VERSION-GEN
> index 6d1cb66d69a..2e2d0811581 100755
> --- a/GIT-VERSION-GEN
> +++ b/GIT-VERSION-GEN
> @@ -86,11 +86,11 @@ sed -e "s|@GIT_VERSION@|$GIT_VERSION|" \
>  	-e "s|@GIT_BUILT_FROM_COMMIT@|$GIT_BUILT_FROM_COMMIT|" \
>  	-e "s|@GIT_USER_AGENT@|$GIT_USER_AGENT|" \
>  	-e "s|@GIT_DATE@|$GIT_DATE|" \
> -	"$INPUT" >"$OUTPUT"+
> +	"$INPUT" >"$OUTPUT".$$+
>  
> -if ! test -f "$OUTPUT" || ! cmp "$OUTPUT"+ "$OUTPUT" >/dev/null
> +if ! test -f "$OUTPUT" || ! cmp "$OUTPUT".$$+ "$OUTPUT" >/dev/null
>  then
> -	mv "$OUTPUT"+ "$OUTPUT"
> +	mv "$OUTPUT".$$+ "$OUTPUT"
>  else
> -	rm "$OUTPUT"+
> +	rm "$OUTPUT".$$+
>  fi
>
> base-commit: a60673e9252b08d4eca90543b3729f4798b9aafd

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-01-10 16:55 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-09 14:24 [PATCH] GIT-VERSION-GEN: allow it to be run in parallel Johannes Schindelin via GitGitGadget
2025-01-09 18:33 ` Martin Ågren
2025-01-09 20:00   ` Junio C Hamano
2025-01-10 11:48 ` [PATCH v2] " Johannes Schindelin via GitGitGadget
2025-01-10 16:55   ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).