* [PATCH 0/4] robustifying t5504
@ 2016-02-24 7:36 Jeff King
2016-02-24 7:40 ` [PATCH 1/4] write_or_die: handle EPIPE in async threads Jeff King
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Jeff King @ 2016-02-24 7:36 UTC (permalink / raw)
To: git; +Cc: Lars Schneider
I got a spurious test failure on t5504 while running the test suite
today. This is the result of my quest through the SIGPIPE rabbit hole.
Since this is not the first time I've investigated tests failing under
load, I finally broke down and wrote a helper script. It probably needs
some adapting for other people's environments, but I'll share it here in
case anyone is interested:
-- >8 --
#!/bin/sh
test=$1; shift
test=$(cd t && echo $test*.sh)
: ${GIT_STRESS_LOAD:=$(( 2 * $(grep -c ^processor /proc/cpuinfo)))}
: ${GIT_STRESS_ROOT:=/var/ram/git-stress}
mkdir -p "$GIT_STRESS_ROOT" || exit 1
rm -f "$GIT_STRESS_ROOT/fail"
trap 'echo aborted >"$GIT_STRESS_ROOT/fail"' TERM INT HUP
for i in $(seq $GIT_STRESS_LOAD); do
(
cd t &&
while ! test -e "$GIT_STRESS_ROOT/fail"
do
if ./$test \
--root="$GIT_STRESS_ROOT/trash-$i" \
-v -i >"$GIT_STRESS_ROOT/output-$i" 2>&1
then
echo >&2 "OK $i"
else
echo >&2 "FAIL $i"
cp "$GIT_STRESS_ROOT/output-$i" "$GIT_STRESS_ROOT/fail"
fi
done
) &
done
wait
cat "$GIT_STRESS_ROOT/fail"
-- 8< --
You can run it like "./stress t5504", and it will run t5504 over and
over in parallel until one instance fails. Without this patch series,
t5504 generally fails for me within 30 seconds or so. With it, I can run
for several minutes without problems.
[1/4]: write_or_die: handle EPIPE in async threads
[2/4]: fetch-pack: ignore SIGPIPE in sideband demuxer
[3/4]: test_must_fail: report number of unexpected signal
[4/4]: t5504: handle expected output from SIGPIPE death
-Peff
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/4] write_or_die: handle EPIPE in async threads
2016-02-24 7:36 [PATCH 0/4] robustifying t5504 Jeff King
@ 2016-02-24 7:40 ` Jeff King
2016-02-24 7:44 ` [PATCH 2/4] fetch-pack: ignore SIGPIPE in sideband demuxer Jeff King
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Jeff King @ 2016-02-24 7:40 UTC (permalink / raw)
To: git; +Cc: Lars Schneider
When write_or_die() sees EPIPE, it treats it specially by
converting it into a SIGPIPE death. We obviously cannot
ignore it, as the write has failed and the caller expects us
to die. But likewise, we cannot just call die(), because
printing any message at all would be a nuisance during
normal operations.
However, this is a problem if write_or_die() is called from
a thread. Our raised signal ends up killing the whole
process, when logically we just need to kill the thread
(after all, if we are ignoring SIGPIPE, there is good reason
to think that the main thread is expecting to handle it).
Inside an async thread, the die() code already does the
right thing, because we use our custom die_async() routine,
which calls pthread_join(). So ideally we would piggy-back
on that, and simply call:
die_quietly_with_code(141);
or similar. But refactoring the die code to do this is
surprisingly non-trivial. The die_routines themselves handle
both printing and the decision of the exit code. Every one
of them would have to be modified to take new parameters for
the code, and to tell us to be quiet.
Instead, we can just teach write_or_die() to check for the
async case and handle it specially. We do have to build an
interface to abstract the async exit, but it's simple and
self-contained. If we had many call-sites that wanted to do
this die_quietly_with_code(), this approach wouldn't scale
as well, but we don't. This is the only place where do this
weird exit trick.
Signed-off-by: Jeff King <peff@peff.net>
---
This is needed for patch 2, wherein we (surprise!) call write_or_die()
in a thread and want to ignore SIGPIPE.
Obviously another solution is "don't call write_or_die() in a thread",
and that would work for patch 2 here. But it can be hard to know whether
or not write_or_die() is called deep in the call stack. So even though
this solution is slightly more complex, I like that it behaves correctly
in all cases.
run-command.c | 10 ++++++++++
run-command.h | 1 +
write_or_die.c | 4 ++++
3 files changed, 15 insertions(+)
diff --git a/run-command.c b/run-command.c
index cdf0184..426387b 100644
--- a/run-command.c
+++ b/run-command.c
@@ -635,6 +635,11 @@ int in_async(void)
return !pthread_equal(main_thread, pthread_self());
}
+void async_exit(int code)
+{
+ pthread_exit((void *)(intptr_t)code);
+}
+
#else
static struct {
@@ -680,6 +685,11 @@ int in_async(void)
return process_is_async;
}
+int async_exit(int code)
+{
+ exit(code);
+}
+
#endif
int start_async(struct async *async)
diff --git a/run-command.h b/run-command.h
index d5a57f9..42917e8 100644
--- a/run-command.h
+++ b/run-command.h
@@ -121,6 +121,7 @@ struct async {
int start_async(struct async *async);
int finish_async(struct async *async);
int in_async(void);
+void NORETURN async_exit(int code);
/**
* This callback should initialize the child process and preload the
diff --git a/write_or_die.c b/write_or_die.c
index e7afe7a..49e80aa 100644
--- a/write_or_die.c
+++ b/write_or_die.c
@@ -1,8 +1,12 @@
#include "cache.h"
+#include "run-command.h"
static void check_pipe(int err)
{
if (err == EPIPE) {
+ if (in_async())
+ async_exit(141);
+
signal(SIGPIPE, SIG_DFL);
raise(SIGPIPE);
/* Should never happen, but just in case... */
--
2.7.2.645.g4e1306c
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/4] fetch-pack: ignore SIGPIPE in sideband demuxer
2016-02-24 7:36 [PATCH 0/4] robustifying t5504 Jeff King
2016-02-24 7:40 ` [PATCH 1/4] write_or_die: handle EPIPE in async threads Jeff King
@ 2016-02-24 7:44 ` Jeff King
2016-02-24 7:45 ` [PATCH 3/4] test_must_fail: report number of unexpected signal Jeff King
2016-02-24 7:48 ` [PATCH 4/4] t5504: handle expected output from SIGPIPE death Jeff King
3 siblings, 0 replies; 5+ messages in thread
From: Jeff King @ 2016-02-24 7:44 UTC (permalink / raw)
To: git; +Cc: Lars Schneider
If the other side feeds us a bogus pack, index-pack (or
unpack-objects) may die early, before consuming all of its
input. As a result, the sideband demuxer may get SIGPIPE
(racily, depending on whether our data made it into the pipe
buffer or not). If this happens and we are compiled with
pthread support, it will take down the main thread, too.
This isn't the end of the world, as the main process will
just die() anyway when it sees index-pack failed. But it
does mean we don't get a chance to say "fatal: index-pack
failed" or similar. And it also means that we racily fail
t5504, as we sometimes die() and sometimes are killed by
SIGPIPE.
So let's ignore SIGPIPE while demuxing the sideband. We are
already careful to check the return value of write(), so we
won't waste time writing to a broken pipe. The caller will
notice the error return from the async thread, though in
practice we don't even get that far, as we die() as soon as
we see that index-pack failed.
The non-sideband case is already fine; we let index-pack
read straight from the socket, so there is no SIGPIPE at
all. Technically the non-threaded async case is also OK
without this (the forked async process gets SIGPIPE), but
it's not worth distinguishing from the threaded case here.
Signed-off-by: Jeff King <peff@peff.net>
---
It's tempting to just ignore SIGPIPE in all async processes,
as other sites may have similar problems (and we cannot
depend on SIGPIPE taking down the main thread anyway, as
async code may be implemented via fork()).
But that errs in the opposite direction. If an async process
does not check the return value of write(), it may
wastefully keep writing.
It would also mean implementing a pthread_sigmask() wrapper
on Windows. I have no idea how feasible that would be.
fetch-pack.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 01e34b6..f96f6df 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -15,6 +15,7 @@
#include "version.h"
#include "prio-queue.h"
#include "sha1-array.h"
+#include "sigchain.h"
static int transfer_unpack_limit = -1;
static int fetch_unpack_limit = -1;
@@ -671,9 +672,12 @@ static int everything_local(struct fetch_pack_args *args,
static int sideband_demux(int in, int out, void *data)
{
int *xd = data;
+ int ret;
- int ret = recv_sideband("fetch-pack", xd[0], out);
+ sigchain_push(SIGPIPE, SIG_IGN);
+ ret = recv_sideband("fetch-pack", xd[0], out);
close(out);
+ sigchain_pop(SIGPIPE);
return ret;
}
--
2.7.2.645.g4e1306c
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 3/4] test_must_fail: report number of unexpected signal
2016-02-24 7:36 [PATCH 0/4] robustifying t5504 Jeff King
2016-02-24 7:40 ` [PATCH 1/4] write_or_die: handle EPIPE in async threads Jeff King
2016-02-24 7:44 ` [PATCH 2/4] fetch-pack: ignore SIGPIPE in sideband demuxer Jeff King
@ 2016-02-24 7:45 ` Jeff King
2016-02-24 7:48 ` [PATCH 4/4] t5504: handle expected output from SIGPIPE death Jeff King
3 siblings, 0 replies; 5+ messages in thread
From: Jeff King @ 2016-02-24 7:45 UTC (permalink / raw)
To: git; +Cc: Lars Schneider
If a command is marked as test_must_fail but dies with a
signal, we consider that a problem and report the error to
stderr. However, we don't say _which_ signal; knowing that
can make debugging easier. Let's share as much as we know.
Signed-off-by: Jeff King <peff@peff.net>
---
Not necessary for the fix, obviously, but I implemented this while
trying to figure out what in the world was going on with the
write_or_die() thing. :)
t/test-lib-functions.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index c64e5a5..8d99eb3 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -617,7 +617,7 @@ test_must_fail () {
return 0
elif test $exit_code -gt 129 && test $exit_code -le 192
then
- echo >&2 "test_must_fail: died by signal: $*"
+ echo >&2 "test_must_fail: died by signal $(($exit_code - 128)): $*"
return 1
elif test $exit_code -eq 127
then
--
2.7.2.645.g4e1306c
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 4/4] t5504: handle expected output from SIGPIPE death
2016-02-24 7:36 [PATCH 0/4] robustifying t5504 Jeff King
` (2 preceding siblings ...)
2016-02-24 7:45 ` [PATCH 3/4] test_must_fail: report number of unexpected signal Jeff King
@ 2016-02-24 7:48 ` Jeff King
3 siblings, 0 replies; 5+ messages in thread
From: Jeff King @ 2016-02-24 7:48 UTC (permalink / raw)
To: git; +Cc: Lars Schneider
Commit 8bf4bec (add "ok=sigpipe" to test_must_fail and use
it to fix flaky tests, 2015-11-27) taught t5504 to handle
"git push" racily exiting with SIGPIPE rather than failing.
However, one of the tests checks the output of the command,
as well. In the SIGPIPE case, we will not have produced any
output. If we want the test to be truly non-flaky, we have
to accept either output.
Signed-off-by: Jeff King <peff@peff.net>
---
I'm not sure we're really accomplishing anything with this test_cmp
anymore. We'll complain only if we get _some_ output, and it's not the
expected output. I'm not sure if that's actually useful.
It looks like 8bf4bec just dropped the test_cmp completely in one of the
cases. So an alternative here would be to do the same.
We _could_ also tighten this, to make sure the output matches the exit
code we got (right now we can get code=128 with blank output and not
complain, even though that's clearly bogus). But we'd have to abandon
test_must_fail and do things manually.
t/t5504-fetch-receive-strict.sh | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 89224ed..a3e12d2 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -101,7 +101,10 @@ test_expect_success 'push with receive.fsckobjects' '
git config transfer.fsckobjects false
) &&
test_must_fail ok=sigpipe git push --porcelain dst master:refs/heads/test >act &&
- test_cmp exp act
+ {
+ test_cmp exp act ||
+ ! test -s act
+ }
'
test_expect_success 'push with transfer.fsckobjects' '
--
2.7.2.645.g4e1306c
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-02-24 7:48 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-24 7:36 [PATCH 0/4] robustifying t5504 Jeff King
2016-02-24 7:40 ` [PATCH 1/4] write_or_die: handle EPIPE in async threads Jeff King
2016-02-24 7:44 ` [PATCH 2/4] fetch-pack: ignore SIGPIPE in sideband demuxer Jeff King
2016-02-24 7:45 ` [PATCH 3/4] test_must_fail: report number of unexpected signal Jeff King
2016-02-24 7:48 ` [PATCH 4/4] t5504: handle expected output from SIGPIPE death Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).