git.vger.kernel.org archive mirror
* [PATCH] tests: handle "funny" exit code 127 produced by MSVC-compiled exes
@ 2023-10-30 15:45 Johannes Schindelin via GitGitGadget
  2023-10-30 17:56 ` Jeff King
  2023-11-01 13:03 ` [PATCH v2] max_tree_depth: lower it for MSVC to avoid stack overflows Johannes Schindelin via GitGitGadget
  0 siblings, 2 replies; 4+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-10-30 15:45 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

The exit code 127 is well-documented to mean: command not found.

Unfortunately, it is also used as fall-back in Cygwin's
`pinfo::status_exit()` method (which maps things like Windows'
`STATUS_ACCESS_VIOLATION` to `128 | SIGSEGV`).

This is particularly unfortunate because there is no explicit mapping
for `STATUS_STACK_OVERFLOW`. Meaning: when MSVC-compiled executables
produce a stack overflow the exit code in the Cygwin Bash will be 127.
Consequently, the same will be true for the MSYS2 Bash that is used by
Git for Windows.
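
As a rough illustration of the mapping described above (this is not
Cygwin's actual code; `map_nt_status` is a made-up name, and only the two
status values mentioned in this message are modeled):

```shell
# Illustrative only: models the behavior described above, not Cygwin's source.
STATUS_ACCESS_VIOLATION=$((0xC0000005))
STATUS_STACK_OVERFLOW=$((0xC00000FD))
SIGSEGV=11

# map_nt_status <ntstatus>: print the exit code the shell would report
map_nt_status () {
	case "$(($1))" in
	"$STATUS_ACCESS_VIOLATION")
		# explicit mapping to "killed by SIGSEGV"
		echo $((128 | SIGSEGV))
		;;
	*)
		# fall-back for anything without an explicit mapping,
		# which includes STATUS_STACK_OVERFLOW
		echo 127
		;;
	esac
}

map_nt_status "$STATUS_ACCESS_VIOLATION"	# prints 139
map_nt_status "$STATUS_STACK_OVERFLOW"	# prints 127
```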

Now, `jk/tree-name-and-depth-limit` introduces a pair of test cases that
expect a command that produces a stack overflow to fail, which it
typically does with exit code 139 (which means SIGSEGV).

But since MSVC-compiled `git.exe` exits with `STATUS_STACK_OVERFLOW`
which the MSYS2 runtime maps to 127, and since 127 is taken to mean
"command not found" by `test_must_fail`, even though everything works as
planned the two new test cases fail when run in `win+VS test`.

Let's work around this by:

1) recording which C compiler was used, and

2) adding an MSVC-only exception to `test_must_fail` to treat 127 as a
   regular failure.

There is a slight downside to this approach in that a truly missing
command could be mistaken for an expected failure. However, this would
be caught on other platforms, and besides, we now use `test_must_fail`
only for `git` and `scalar`, and we can be pretty certain that both are
there.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    Fix t6700.[45] in win+VS test
    
    These two test cases have been failing for a while in Git for Windows'
    shears/* branches. Took a good while to figure out, too.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1604%2Fdscho%2Ffix-vs-win-test-with-new-depth-limit-test-cases-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1604/dscho/fix-vs-win-test-with-new-depth-limit-test-cases-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1604

 contrib/buildsystems/CMakeLists.txt | 3 ++-
 t/test-lib-functions.sh             | 3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/contrib/buildsystems/CMakeLists.txt b/contrib/buildsystems/CMakeLists.txt
index 6b819e2fbdf..e164484be98 100644
--- a/contrib/buildsystems/CMakeLists.txt
+++ b/contrib/buildsystems/CMakeLists.txt
@@ -1057,7 +1057,8 @@ if(NOT PYTHON_TESTS)
 	set(NO_PYTHON 1)
 endif()
 
-file(WRITE ${CMAKE_BINARY_DIR}/GIT-BUILD-OPTIONS "SHELL_PATH='${SHELL_PATH}'\n")
+file(WRITE ${CMAKE_BINARY_DIR}/GIT-BUILD-OPTIONS "CMAKE_C_COMPILER='${CMAKE_C_COMPILER}'\n")
+file(APPEND ${CMAKE_BINARY_DIR}/GIT-BUILD-OPTIONS "SHELL_PATH='${SHELL_PATH}'\n")
 file(APPEND ${CMAKE_BINARY_DIR}/GIT-BUILD-OPTIONS "TEST_SHELL_PATH='${TEST_SHELL_PATH}'\n")
 file(APPEND ${CMAKE_BINARY_DIR}/GIT-BUILD-OPTIONS "PERL_PATH='${PERL_PATH}'\n")
 file(APPEND ${CMAKE_BINARY_DIR}/GIT-BUILD-OPTIONS "DIFF='${DIFF}'\n")
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 2f8868caa17..ee19c748973 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1112,6 +1112,9 @@ test_must_fail () {
 		return 1
 	elif test $exit_code -eq 127
 	then
+		# Work-around for MSVC-compiled executables
+		case "$CMAKE_C_COMPILER" in *MSVC*) return 0;; esac
+
 		echo >&4 "test_must_fail: command not found: $*"
 		return 1
 	elif test $exit_code -eq 126

base-commit: 3130c155df9a65ebccf128b4af5a19af49532580
-- 
gitgitgadget


* Re: [PATCH] tests: handle "funny" exit code 127 produced by MSVC-compiled exes
  2023-10-30 15:45 [PATCH] tests: handle "funny" exit code 127 produced by MSVC-compiled exes Johannes Schindelin via GitGitGadget
@ 2023-10-30 17:56 ` Jeff King
  2023-11-01 13:03 ` [PATCH v2] max_tree_depth: lower it for MSVC to avoid stack overflows Johannes Schindelin via GitGitGadget
  1 sibling, 0 replies; 4+ messages in thread
From: Jeff King @ 2023-10-30 17:56 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Mon, Oct 30, 2023 at 03:45:32PM +0000, Johannes Schindelin via GitGitGadget wrote:

> Now, `jk/tree-name-and-depth-limit` introduces a pair of test cases that
> expect a command that produces a stack overflow to fail, which it
> typically does with exit code 139 (which means SIGSEGV).

I think you're misinterpreting the purpose of the tests from that
series; they're not intended to segfault. Quoting from t6700:

  # We'll test against two depths here: a small one that will let us check the
  # behavior of the config setting easily, and a large one that should be
  # forbidden by default. Testing the default depth will let us know whether our
  # default is enough to prevent segfaults on systems that run the tests.

So for the "big tree" tests in that file, we are looking for a
controlled failure rather than a segfault. And indeed, the end of that
series already lowered the default to accommodate the msys windows
build; see the discussion in 4d5693ba05 (lower core.maxTreeDepth default
to 2048, 2023-08-31).
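
To make "controlled failure" concrete, here is a minimal sketch of a
depth-limited recursion. It is only a stand-in for Git's C code: `walk`
and the exit status 128 are chosen for illustration, and the variable
merely borrows the name and default value discussed in this thread.

```shell
# Hypothetical stand-in for a recursive tree traversal with a depth limit.
max_allowed_tree_depth=${max_allowed_tree_depth:-2048}

# walk <current depth> <levels remaining below this one>
walk () {
	if test "$1" -gt "$max_allowed_tree_depth"
	then
		# Bail out with a message instead of recursing until the
		# stack is exhausted.
		echo "error: exceeded maximum allowed tree depth" >&2
		return 128
	fi
	# Reached the bottom of the tree: nothing left to traverse.
	test "$2" -eq 0 && return 0
	walk $(($1 + 1)) $(($2 - 1))
}
```

With `max_allowed_tree_depth=8`, `walk 0 5` succeeds, while `walk 0 20`
prints the error and returns 128. The limit only helps if it is reached
before the stack runs out, which is what the "big tree" tests check.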

So I think the test is working as designed here: it is showing us that
the default value is not sufficient to protect MSVC builds from running
out of stack space. There are a few options there:

  1. We can lower the default everywhere.

  2. We can lower it just for MSVC builds.

  3. We can accept the situation and skip the tests for that build.

There's a bit more discussion in the commit I referenced above.

> Let's work around this by:
> 
> 1) recording which C compiler was used, and
> 
> 2) adding an MSVC-only exception to `test_must_fail` to treat 127 as a
>    regular failure.
> 
> There is a slight downside to this approach in that a truly missing
> command could be mistaken for an expected failure. However, this would
> be caught on other platforms, and besides, we now use `test_must_fail`
> only for `git` and `scalar`, and we can be pretty certain that both are
> there.

I think there is another much worse downside to your patch: we will stop
noticing when MSVC builds segfault in the tests. The purpose of
test_must_fail is to allow controlled and expected failure returns from
the command, but still report on unexpected situations (signal death,
command not found, and so on).
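
The distinction can be sketched roughly as follows. This is a simplified
model, not the real helper: `classify_exit_code` is a hypothetical name,
and the actual `test_must_fail` handles more cases (such as its `ok=`
options).

```shell
# Simplified model of how an exit code might be sorted into
# "expected failure" versus "unexpected situation".
classify_exit_code () {
	if test "$1" -eq 0
	then
		echo "unexpected success"
	elif test "$1" -gt 129 && test "$1" -le 192
	then
		# e.g. 139 = 128 + 11 (SIGSEGV)
		echo "died by signal $(($1 - 128))"
	elif test "$1" -eq 127
	then
		echo "command not found"
	elif test "$1" -eq 126
	then
		echo "valgrind error"
	else
		echo "expected failure"
	fi
}

classify_exit_code 139	# prints "died by signal 11"
classify_exit_code 127	# prints "command not found"
classify_exit_code 128	# prints "expected failure"
```

Treating 127 as success on MSVC would move a crash that surfaces as 127
into the "expected failure" bucket.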

-Peff


* [PATCH v2] max_tree_depth: lower it for MSVC to avoid stack overflows
  2023-10-30 15:45 [PATCH] tests: handle "funny" exit code 127 produced by MSVC-compiled exes Johannes Schindelin via GitGitGadget
  2023-10-30 17:56 ` Jeff King
@ 2023-11-01 13:03 ` Johannes Schindelin via GitGitGadget
  2023-11-01 20:18   ` Jeff King
  1 sibling, 1 reply; 4+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2023-11-01 13:03 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

There seems to be some internal stack overflow detection in MSVC's
`malloc()` machinery that seems to be independent of the `stack reserve`
and `heap reserve` sizes specified in the executable (editable via
`EDITBIN /STACK:<n> <exe>` and `EDITBIN /HEAP:<n> <exe>`).

In the test cases newly added by `jk/tree-name-and-depth-limit`, this
stack overflow detection is unfortunately triggered before Git can print
out the error message about too-deep trees and exit gracefully. Instead,
it exits with `STATUS_STACK_OVERFLOW`. This corresponds to the numeric
value -1073741571, something the MSYS2 runtime we sadly need to use to
run Git's test suite cannot handle and which it internally maps to the
exit code 127. Git's test suite, in turn, mistakes this to mean that the
command was not found, and fails both test cases.
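
For reference, the numeric value above is simply the 32-bit status code
0xC00000FD read as a signed integer. A small sketch of that conversion
(`ntstatus_as_int32` is a made-up helper, and it assumes a shell whose
arithmetic is at least 64 bits wide, as is the case for Bash):

```shell
# Reinterpret a 32-bit NTSTATUS value as a signed 32-bit integer.
ntstatus_as_int32 () {
	# Keep only the low 32 bits of the argument.
	value=$(($1 & 0xFFFFFFFF))
	if test "$value" -ge $((0x80000000))
	then
		# The high bit is set, so the signed reading is negative.
		value=$((value - 0x100000000))
	fi
	echo "$value"
}

ntstatus_as_int32 0xC00000FD	# prints -1073741571
```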

Here is an example stack trace from an example run:

    [0x0]   ntdll!RtlpAllocateHeap+0x31   0x4212603f50   0x7ff9d6d4cd49
    [0x1]   ntdll!RtlpAllocateHeapInternal+0x6c9   0x42126041b0   0x7ff9d6e14512
    [0x2]   ntdll!RtlDebugAllocateHeap+0x102   0x42126042b0   0x7ff9d6dcd8b0
    [0x3]   ntdll!RtlpAllocateHeap+0x7ec70   0x4212604350   0x7ff9d6d4cd49
    [0x4]   ntdll!RtlpAllocateHeapInternal+0x6c9   0x42126045b0   0x7ff9596ed480
    [0x5]   ucrtbased!heap_alloc_dbg_internal+0x210   0x42126046b0   0x7ff9596ed20d
    [0x6]   ucrtbased!heap_alloc_dbg+0x4d   0x4212604750   0x7ff9596f037f
    [0x7]   ucrtbased!_malloc_dbg+0x2f   0x42126047a0   0x7ff9596f0dee
    [0x8]   ucrtbased!malloc+0x1e   0x42126047d0   0x7ff730fcc1ef
    [0x9]   git!do_xmalloc+0x2f   0x4212604800   0x7ff730fcc2b9
    [0xa]   git!do_xmallocz+0x59   0x4212604840   0x7ff730fca779
    [0xb]   git!xmallocz_gently+0x19   0x4212604880   0x7ff7311b0883
    [0xc]   git!unpack_compressed_entry+0x43   0x42126048b0   0x7ff7311ac9a4
    [0xd]   git!unpack_entry+0x554   0x42126049a0   0x7ff7311b0628
    [0xe]   git!cache_or_unpack_entry+0x58   0x4212605250   0x7ff7311ad3a8
    [0xf]   git!packed_object_info+0x98   0x42126052a0   0x7ff7310a92da
    [0x10]   git!do_oid_object_info_extended+0x3fa   0x42126053b0   0x7ff7310a44e7
    [0x11]   git!oid_object_info_extended+0x37   0x4212605460   0x7ff7310a38ba
    [0x12]   git!repo_read_object_file+0x9a   0x42126054a0   0x7ff7310a6147
    [0x13]   git!read_object_with_reference+0x97   0x4212605560   0x7ff7310b4656
    [0x14]   git!fill_tree_descriptor+0x66   0x4212605620   0x7ff7310dc0a5
    [0x15]   git!traverse_trees_recursive+0x3f5   0x4212605680   0x7ff7310dd831
    [0x16]   git!unpack_callback+0x441   0x4212605790   0x7ff7310b4c95
    [0x17]   git!traverse_trees+0x5d5   0x42126058a0   0x7ff7310dc0f2
    [0x18]   git!traverse_trees_recursive+0x442   0x4212605980   0x7ff7310dd831
    [0x19]   git!unpack_callback+0x441   0x4212605a90   0x7ff7310b4c95
    [0x1a]   git!traverse_trees+0x5d5   0x4212605ba0   0x7ff7310dc0f2
    [0x1b]   git!traverse_trees_recursive+0x442   0x4212605c80   0x7ff7310dd831
    [0x1c]   git!unpack_callback+0x441   0x4212605d90   0x7ff7310b4c95
    [0x1d]   git!traverse_trees+0x5d5   0x4212605ea0   0x7ff7310dc0f2
    [0x1e]   git!traverse_trees_recursive+0x442   0x4212605f80   0x7ff7310dd831
    [0x1f]   git!unpack_callback+0x441   0x4212606090   0x7ff7310b4c95
    [0x20]   git!traverse_trees+0x5d5   0x42126061a0   0x7ff7310dc0f2
    [0x21]   git!traverse_trees_recursive+0x442   0x4212606280   0x7ff7310dd831
    [...]
    [0xfad]   git!cmd_main+0x2a2   0x42126ff740   0x7ff730fb6345
    [0xfae]   git!main+0xe5   0x42126ff7c0   0x7ff730fbff93
    [0xfaf]   git!wmain+0x2a3   0x42126ff830   0x7ff731318859
    [0xfb0]   git!invoke_main+0x39   0x42126ff8a0   0x7ff7313186fe
    [0xfb1]   git!__scrt_common_main_seh+0x12e   0x42126ff8f0   0x7ff7313185be
    [0xfb2]   git!__scrt_common_main+0xe   0x42126ff960   0x7ff7313188ee
    [0xfb3]   git!wmainCRTStartup+0xe   0x42126ff990   0x7ff9d5ed257d
    [0xfb4]   KERNEL32!BaseThreadInitThunk+0x1d   0x42126ff9c0   0x7ff9d6d6aa78
    [0xfb5]   ntdll!RtlUserThreadStart+0x28   0x42126ff9f0   0x0

I verified manually that `traverse_trees_cur_depth` was 562 when that
happened, which is far below the 2048 that were already accepted into
Git as a hard limit.

Despite many attempts to figure out which of the internals trigger this
`STATUS_STACK_OVERFLOW` and how to maybe increase certain sizes to avoid
running into this issue and let Git behave the same way as under Linux,
I failed to find any build-time/runtime knob we could turn to that
effect.

Note: even switching to using a different allocator (I used mimalloc
because that's what Git for Windows uses for its GCC builds) does not
help, as the zlib code used to unpack compressed pack entries _still_
uses the regular `malloc()`. And runs into the same issue.

Note also: switching to using a different allocator _also_ for zlib code
seems _also_ not to help. I tried that, and it still exited with
`STATUS_STACK_OVERFLOW` that seems to have been triggered by a
`mi_assert_internal()`, i.e. an internal assertion of mimalloc...

So the best bet to work around this for now seems to be to lower the
maximum allowed tree depth _even further_ for MSVC builds.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    Fix t6700.[45] in win+VS test
    
    These two test cases have been failing for a while in Git for Windows'
    shears/* branches. Took a good while to figure out, too.
    
    Changes since v1:
    
     * Rewrite the patch to instead lower the max_allowed_tree_depth
       threshold even further for MSVC, side-stepping the stack overflow.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1604%2Fdscho%2Ffix-vs-win-test-with-new-depth-limit-test-cases-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1604/dscho/fix-vs-win-test-with-new-depth-limit-test-cases-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1604

Range-diff vs v1:

 1:  0e6e53bd824 < -:  ----------- tests: handle "funny" exit code 127 produced by MSVC-compiled exes
 -:  ----------- > 1:  5f738a78eb1 max_tree_depth: lower it for MSVC to avoid stack overflows


 environment.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/environment.c b/environment.c
index bb3c2a96a33..9e37bf58c0c 100644
--- a/environment.c
+++ b/environment.c
@@ -81,7 +81,20 @@ int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
 enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET;
-int max_allowed_tree_depth = 2048;
+int max_allowed_tree_depth =
+#ifdef _MSC_VER
+	/*
+	 * When traversing into too-deep trees, Visual C-compiled Git seems to
+	 * run into some internal stack overflow detection in the
+	 * `RtlpAllocateHeap()` function that is called from within
+	 * `git_inflate_init()`'s call tree. The following value seems to be
+	 * low enough to avoid that by letting Git exit with an error before
+	 * the stack overflow can occur.
+	 */
+	512;
+#else
+	2048;
+#endif
 
 #ifndef PROTECT_HFS_DEFAULT
 #define PROTECT_HFS_DEFAULT 0

base-commit: 3130c155df9a65ebccf128b4af5a19af49532580
-- 
gitgitgadget


* Re: [PATCH v2] max_tree_depth: lower it for MSVC to avoid stack overflows
  2023-11-01 13:03 ` [PATCH v2] max_tree_depth: lower it for MSVC to avoid stack overflows Johannes Schindelin via GitGitGadget
@ 2023-11-01 20:18   ` Jeff King
  0 siblings, 0 replies; 4+ messages in thread
From: Jeff King @ 2023-11-01 20:18 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin

On Wed, Nov 01, 2023 at 01:03:30PM +0000, Johannes Schindelin via GitGitGadget wrote:

> So the best bet to work around this for now seems to be to lower the
> maximum allowed tree depth _even further_ for MSVC builds.

Thanks for rewriting this. The resulting patch looks good to me.

Just a few small thoughts:

> There seems to be some internal stack overflow detection in MSVC's
> `malloc()` machinery that seems to be independent of the `stack reserve`
> and `heap reserve` sizes specified in the executable (editable via
> `EDITBIN /STACK:<n> <exe>` and `EDITBIN /HEAP:<n> <exe>`).

Yikes, I'm sure that paragraph sums up a painful debugging journey. :)

> In the test cases newly added by `jk/tree-name-and-depth-limit`, this
> stack overflow detection is unfortunately triggered before Git can print
> out the error message about too-deep trees and exit gracefully. Instead,
> it exits with `STATUS_STACK_OVERFLOW`. This corresponds to the numeric
> value -1073741571, something the MSYS2 runtime we sadly need to use to
> run Git's test suite cannot handle and which it internally maps to the
> exit code 127. Git's test suite, in turn, mistakes this to mean that the
> command was not found, and fails both test cases.

I think this detail is OK, but the bit about mistaking 127 is IMHO kind
of irrelevant to the purpose of the patch. The whole point of those
tests is that they would trigger a segfault to alert us that the
default depth limit was too high, and they did. So it was in fact lucky
that even though the segfault was munged into 127, our test_must_fail
still noticed it.

> Note: even switching to using a different allocator (I used mimalloc
> because that's what Git for Windows uses for its GCC builds) does not
> help, as the zlib code used to unpack compressed pack entries _still_
> uses the regular `malloc()`. And runs into the same issue.

I didn't think zlib ever malloc'd, since we feed it streaming data (and
it will return and ask us to flush if the output buffer is full). But I
admit I haven't dug too far into it, and it sounds like you may have.

What I was wondering specifically is whether you're actually hitting the
raw malloc() (as opposed to xmalloc) calls in diff-delta.c (which would
depend on how you've set up the different allocator).

Either way, changing anything there is well outside the scope of your
patch. I've just always wondered if those raw malloc() calls might cause
headaches, and whether this might be a concrete example of such.

-Peff
