From: "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "René Scharfe" <l.s.r@web.de>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Johannes Schindelin" <johannes.schindelin@gmx.de>
Subject: [PATCH v5 0/2] tests(mingw): avoid super-slow mingw_test_cmp
Date: Tue, 06 Dec 2022 15:07:45 +0000 [thread overview]
Message-ID: <pull.1309.v5.git.1670339267.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1309.v4.git.1668434812.gitgitgadget@gmail.com>
Half a year ago, directly after sending a patch to fix a performance
regression due to a mis-use of test_cmp
[https://lore.kernel.org/git/b9203ea247776332e4b6f519aa27d541207adc2f.1659097724.git.gitgitgadget@gmail.com/],
I got curious to see whether Git for Windows had the same issue. And it did
not: it passed t5351 in 22 seconds, even while using test_cmp to compare
pack files
[https://github.com/git-for-windows/git/blob/3922f62f0d5991e9fe0a0817ebf89a91339c7705/t/t5351-unpack-large-objects.sh#L90].
This motivated me to upstream Git for Windows' changes.
Changes since v4:
* The commit message was rewritten almost completely.
Changes since v3:
* Fixed the subject of the cover letter (which should have been adjusted in
v3)
* Elaborated the paragraph about the historical context of this patch
Changes since v2:
* Dropped the test helper, using diff --no-index instead.
Changes since v1:
* Fixed double "with" in the commit message.
* Renamed the test helper to text-cmp.
* Made the diff --no-index call more robust by using a double-dash
separator.
Johannes Schindelin (2):
t0021: use Windows-friendly `pwd`
tests(mingw): avoid very slow `mingw_test_cmp`
t/t0021-conversion.sh | 4 +--
t/test-lib-functions.sh | 66 -----------------------------------------
t/test-lib.sh | 2 +-
3 files changed, 3 insertions(+), 69 deletions(-)
base-commit: 23b219f8e3f2adfb0441e135f0a880e6124f766c
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1309%2Fdscho%2Fmingw-test-cmp-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1309/dscho/mingw-test-cmp-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/1309
Range-diff vs v4:
1: b38b8fb5a85 = 1: b38b8fb5a85 t0021: use Windows-friendly `pwd`
2: 128b1f348d8 ! 2: 6a80fab7e39 tests(mingw): avoid very slow `mingw_test_cmp`
@@ Metadata
## Commit message ##
tests(mingw): avoid very slow `mingw_test_cmp`
- It is more performant to run `git diff --no-index` than running the
- `mingw_test_cmp` code with MSYS2's Bash, i.e. the Bash that Git for
- Windows uses. And a lot more readable.
+ When Git's test suite uses `test_cmp`, it is not actually trying to
+ compare binary files as the name `cmp` would suggest to users familiar
+ with Unix' tools, but the tests instead verify that actual output
+ matches the expected text.
- The original reason why Git's test suite needs the `mingw_test_cmp`
- function at all (and why `cmp` is not good enough) is that Git's test
- suite is not actually trying to compare binary files when it calls
- `test_cmp`, but it compares text files. And those text files can contain
- CR/LF line endings depending on the circumstances.
+ On Unix, `cmp` works well enough for Git's purposes because only Line
+ Feed characters are used as line endings. However, on Windows, while
+ most tools accept Line Feeds as line endings, many tools produce
+ Carriage Return + Line Feed line endings, including some of the tools
+ used by the test suite (which are therefore provided via Git for Windows
+ SDK). Therefore, `cmp` would frequently fail merely due to different
+ line endings.
- Note: The original fix in the Git for Windows project implemented a test
- helper that avoids the overhead of the diff machinery, in favor of
- implementing a behavior that is more in line with what `mingw_test_cmp`
- does now. This was done to minimize the risk in using something as
- complex as the diff machinery to perform something as simple as
- determining whether text output is identical to the expected output or
- not. This approach has served Git for Windows well for years, but the
- attempt to upstream this saw a lot of backlash and distractions during
- the review, was disliked by the Git maintainer and was therefore
- abandoned. For full details, see the thread at
- https://lore.kernel.org/git/pull.1309.git.1659106382128.gitgitgadget@gmail.com/t
+ To accommodate for that, the `mingw_test_cmp` function was introduced
+ into Git's test suite to perform a line-by-line comparison that ignores
+ line endings. This function is a Bash function that is only used on
+ Windows, everywhere else `cmp` is used.
+
+ This is a double whammy because `cmp` is fast, and `mingw_test_cmp` is
+ slow, even more so on Windows because it is a Bash script function, and
+ Bash scripts are known to run particularly slowly on Windows due to
+ Bash's need for the POSIX emulation layer provided by the MSYS2 runtime.
+
+ The commit message of 32ed3314c104 (t5351: avoid using `test_cmp` for
+ binary data, 2022-07-29) provides an illuminating account of the
+ consequences: On Windows, the platform on which Git could really use all
+ the help it can get to improve its performance, the time spent on one
+ entire test script was reduced from half an hour to less than half a
+ minute merely by avoiding a single call to `mingw_test_cmp` in but a
+ single test case.
+
+ Learning the lesson to avoid shell scripting wherever possible, the Git
+ for Windows project implemented a minimal replacement for
+ `mingw_test_cmp` in the form of a `test-tool` subcommand that parses the
+ input files line by line, ignoring line endings, and compares them.
+ Essentially the same thing as `mingw_test_cmp`, but implemented in
+ C instead of Bash. This solution served the Git for Windows project
+ well, over years.
+
+ However, when this solution was finally upstreamed, the conclusion was
+ reached that a change to use `git diff --no-index` instead of
+ `mingw_test_cmp` was more easily reviewed and hence should be used
+ instead.
+
+ The reason why this approach was not even considered in Git for Windows
+ is that in 2007, there was already a motion on the table to use Git's
+ own diff machinery to perform comparisons in Git's test suite, but it
+ was dismissed in https://lore.kernel.org/git/xmqqbkrpo9or.fsf@gitster.g/
+ as undesirable because tests might potentially succeed due to bugs in
+ the diff machinery when they should not succeed, and those bugs could
+ therefore hide regressions that the tests try to prevent.
+
+ By the time Git for Windows' `mingw-test-cmp` in C was finally
+ contributed to the Git mailing list, reviewers agreed that the diff
+ machinery had matured enough and should be used instead.
+
+ When the concern was raised that the diff machinery, due to its
+ complexity, would perform substantially worse than the test helper
+ originally implemented in the Git for Windows project, a test
+ demonstrated that these performance differences are well lost within the
+ 100+ minutes it takes to run Git's test suite on Windows.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
--
gitgitgadget
next prev parent reply other threads:[~2022-12-06 15:11 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-29 14:53 [PATCH] tests: replace mingw_test_cmp with a helper in C Johannes Schindelin via GitGitGadget
2022-07-29 14:54 ` Johannes Schindelin
2022-07-29 16:44 ` Junio C Hamano
2022-09-06 13:10 ` Johannes Schindelin
2022-09-07 12:09 ` René Scharfe
2022-09-07 16:25 ` Junio C Hamano
2022-09-07 21:45 ` Re* " Junio C Hamano
2022-09-07 22:39 ` René Scharfe
2022-09-08 0:03 ` Junio C Hamano
2022-09-08 8:59 ` René Scharfe
2022-09-08 15:26 ` Ævar Arnfjörð Bjarmason
2022-09-08 20:54 ` Johannes Schindelin
2022-09-08 21:09 ` Junio C Hamano
2022-09-06 13:10 ` [PATCH v2 0/2] " Johannes Schindelin via GitGitGadget
2022-09-06 13:10 ` [PATCH v2 1/2] t0021: use Windows-friendly `pwd` Johannes Schindelin via GitGitGadget
2022-09-06 13:10 ` [PATCH v2 2/2] tests: replace mingw_test_cmp with a helper in C Johannes Schindelin via GitGitGadget
2022-09-07 11:57 ` Ævar Arnfjörð Bjarmason
2022-09-07 12:24 ` Ævar Arnfjörð Bjarmason
2022-09-07 19:45 ` Junio C Hamano
2022-09-07 9:04 ` [PATCH v2 0/2] " Johannes Schindelin
2022-11-12 22:07 ` [PATCH v3 " Johannes Schindelin via GitGitGadget
2022-11-12 22:07 ` [PATCH v3 1/2] t0021: use Windows-friendly `pwd` Johannes Schindelin via GitGitGadget
2022-11-12 22:07 ` [PATCH v3 2/2] tests(mingw): avoid very slow `mingw_test_cmp` Johannes Schindelin via GitGitGadget
2022-11-13 4:51 ` Taylor Blau
2022-11-14 13:34 ` Johannes Schindelin
2022-11-18 23:15 ` Junio C Hamano
2022-11-19 2:53 ` Taylor Blau
2022-11-19 12:03 ` Ævar Arnfjörð Bjarmason
2022-11-19 8:18 ` Johannes Sixt
2022-11-19 17:50 ` René Scharfe
2022-11-20 9:29 ` Torsten Bögershausen
2022-11-21 17:49 ` Johannes Sixt
2022-11-21 3:13 ` Junio C Hamano
2022-11-14 9:53 ` Phillip Wood
2022-11-14 13:47 ` Johannes Schindelin
2022-11-14 11:55 ` Ævar Arnfjörð Bjarmason
2022-11-14 14:02 ` Johannes Schindelin
2022-11-14 15:23 ` Ævar Arnfjörð Bjarmason
2022-11-18 23:19 ` Junio C Hamano
2022-11-19 2:56 ` Taylor Blau
2022-11-19 11:54 ` Ævar Arnfjörð Bjarmason
2022-11-21 3:17 ` Junio C Hamano
2022-11-14 14:06 ` [PATCH v4 0/2] tests(mingw): avoid super-slow mingw_test_cmp Johannes Schindelin via GitGitGadget
2022-11-14 14:06 ` [PATCH v4 1/2] t0021: use Windows-friendly `pwd` Johannes Schindelin via GitGitGadget
2022-11-14 14:06 ` [PATCH v4 2/2] tests(mingw): avoid very slow `mingw_test_cmp` Johannes Schindelin via GitGitGadget
2022-11-14 22:40 ` Taylor Blau
2022-11-18 13:32 ` Johannes Schindelin
2022-11-18 18:14 ` Taylor Blau
2022-11-20 23:36 ` Johannes Schindelin
2022-11-21 0:07 ` Taylor Blau
2022-12-06 15:07 ` Johannes Schindelin via GitGitGadget [this message]
2022-12-06 15:07 ` [PATCH v5 1/2] t0021: use Windows-friendly `pwd` Johannes Schindelin via GitGitGadget
2022-12-06 15:07 ` [PATCH v5 2/2] tests(mingw): avoid very slow `mingw_test_cmp` Johannes Schindelin via GitGitGadget
2022-12-06 18:55 ` Ævar Arnfjörð Bjarmason
2022-12-06 21:52 ` Johannes Sixt
2022-12-06 21:54 ` René Scharfe
2022-12-07 4:33 ` Junio C Hamano
2022-12-07 1:31 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.1309.v5.git.1670339267.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=johannes.schindelin@gmx.de \
--cc=l.s.r@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.