* [PATCH] t4053: avoid writing to unopened pipe
@ 2023-08-13 16:24 Jeff King
2023-08-14 15:23 ` Phillip Wood
0 siblings, 1 reply; 2+ messages in thread
From: Jeff King @ 2023-08-13 16:24 UTC (permalink / raw)
To: Phillip Wood; +Cc: git, Junio C Hamano
This fixes an occasional hang I see when running t4053 with
--verbose-log using dash.
Commit 1e3f26542a (diff --no-index: support reading from named pipes,
2023-07-05) added a test that "diff --no-index" will complain when
comparing a named pipe and a directory. The minimum we need to test this
is to mkfifo the pipe, and then run "git diff --no-index pipe some_dir".
But the test does one thing more: it spawns a background shell process
that opens the pipe for writing, like this:
{
(>pipe) &
} &&
This extra writer _could_ be useful if Git misbehaves and tries to open
the pipe for reading. Without the writer, Git would block indefinitely
and the test would never end. But since we do not have such a bug, Git
does not open the pipe and it is the writing process which will block
indefinitely, since there are no readers. The test addresses this by
running "kill $!" in a test_when_finished block. Since the writer should
be blocking forever, this kill command will reliably find it waiting.
However, this seems to be somewhat racy, in that the writing process
sometimes hangs around even after the "kill". In a normal run of the
test script without options, this doesn't have any effect; the
main test script completes anyway. But with --verbose-log, we spawn a
"tee" process that reads the script output, and it won't end until all
descriptors pointing to its input pipe are closed. And the background
process that is hanging around still has its stderr, etc, pointed into
that pipe.
You can reproduce the situation like this:
cd t
./t4053-diff-no-index.sh --verbose-log --stress
Let that run for a few minutes, and then you'll find that some of the
runs have hung. For example, at 11:53, I ran:
$ ps xk start o pid,start,command | grep tee | head
713459 11:48:06 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-9.out
713527 11:48:06 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-15.out
719434 11:48:07 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-1.out
728117 11:48:08 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-5.out
738738 11:48:09 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-31.out
739457 11:48:09 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-27.out
744432 11:48:10 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-21.out
744471 11:48:10 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-29.out
761961 11:48:12 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-0.out
812299 11:48:19 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-8.out
All of these have been hung for several minutes. We can investigate one
and see that it's waiting to get EOF on its input:
$ strace -p 713459
strace: Process 713459 attached
read(0,
^C
Who else has that descriptor open?
$ lsof -a -p 713459 -d 0 +E
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
tee 713459 peff 0r FIFO 0,13 0t0 3943636 pipe 719203,sh,5w 719203,sh,7w 719203,sh,12w 719203,sh,13w
sh 719203 peff 5w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,7w 719203,sh,12w 719203,sh,13w
sh 719203 peff 7w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,12w 719203,sh,13w
sh 719203 peff 12w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,7w 719203,sh,13w
sh 719203 peff 13w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,7w 719203,sh,12w
It's a shell, presumably a subshell spawned by the main script. Though
it may seem odd, having the same descriptor open several times is not
unreasonable (they're all basically the original stdout/stderr of the
script that has been copied). And they should all close when the process
exits. So what's it doing? Curiously, it will exit as soon as we strace
it:
$ strace -s 64 -p 719203
strace: Process 719203 attached
openat(AT_FDCWD, "pipe", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 ENOENT (No such file or directory)
write(2, "./t4053-diff-no-index.sh: 7: eval: ", 35) = 35
write(2, "cannot create pipe: Directory nonexistent", 41) = 41
write(2, "\n", 1) = 1
exit_group(2) = ?
+++ exited with 2 +++
I think what happens is this:
- it is blocking in the openat() call for the pipe, as we expect (so
this is definitely the backgrounded subshell mentioned above)
- strace sends signals (probably STOP/CONT); those cause the kernel to
stop blocking, but libc will restart the system call automatically
- by this time, the "pipe" fifo is gone, so we'll actually try to
create a regular file. But of course the surrounding directory is
gone, too! So we get ENOENT, and then exit as normal.
So the blocking is something we expect to happen. But what we didn't
expect is for the process to still exist at all! It should have been
killed earlier when the parent process called "kill", but it wasn't. And
we can't catch the race at this point, because it happened much earlier.
One can guess, though, that there is some race with the shell setting up
the signal handling in the backgrounded subshell, and possibly blocking
or ignoring signals at the time that the "kill" is received. Curiously,
the race does not seem to happen if I use "bash" instead of "dash", so
presumably bash's setup here is more atomic.
One fix might be to try killing the subshell more aggressively, either
using SIGKILL, or looping on kill/wait. But that seems complex and
likely to introduce new problems/races. Instead, we can observe that the
writer is not needed at all. Git will notice the pipe via stat() before
it is ever opened. So we can simply drop the writer subshell entirely.
If we ever changed Git to open the path and fstat() it, this would
result in the test hanging. But we're not likely to do that. After all,
we have to stat() paths to see if they are openable at all (e.g., it
could be a directory), so this seems like a low risk. And anybody who
does make such a change will immediately see the issue, as Git would
hang consistently.
Signed-off-by: Jeff King <peff@peff.net>
---
t/t4053-diff-no-index.sh | 4 ----
1 file changed, 4 deletions(-)
diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh
index 1fb7d33462..6781cc9078 100755
--- a/t/t4053-diff-no-index.sh
+++ b/t/t4053-diff-no-index.sh
@@ -232,10 +232,6 @@ test_expect_success 'diff --no-index refuses to diff stdin and a directory' '
test_expect_success PIPE 'diff --no-index refuses to diff a named pipe and a directory' '
test_when_finished "rm -f pipe" &&
mkfifo pipe &&
- {
- (>pipe) &
- } &&
- test_when_finished "kill $!" &&
test_must_fail git diff --no-index -- pipe a 2>err &&
grep "fatal: cannot compare a named pipe to a directory" err
'
--
2.42.0.rc1.378.g7c0c59538f
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH] t4053: avoid writing to unopened pipe
2023-08-13 16:24 [PATCH] t4053: avoid writing to unopened pipe Jeff King
@ 2023-08-14 15:23 ` Phillip Wood
0 siblings, 0 replies; 2+ messages in thread
From: Phillip Wood @ 2023-08-14 15:23 UTC (permalink / raw)
To: Jeff King, Phillip Wood; +Cc: git, Junio C Hamano
Hi Peff
On 13/08/2023 17:24, Jeff King wrote:
> This fixes an occasional hang I see when running t4053 with
> --verbose-log using dash.
Well done for tracking this down, it is interesting that it only affects
dash. I think your proposed fix is the simplest solution, thanks for the
comprehensive commit message.
Best Wishes
Phillip
> Commit 1e3f26542a (diff --no-index: support reading from named pipes,
> 2023-07-05) added a test that "diff --no-index" will complain when
> comparing a named pipe and a directory. The minimum we need to test this
> is to mkfifo the pipe, and then run "git diff --no-index pipe some_dir".
> But the test does one thing more: it spawns a background shell process
> that opens the pipe for writing, like this:
>
> {
> (>pipe) &
> } &&
>
> This extra writer _could_ be useful if Git misbehaves and tries to open
> the pipe for reading. Without the writer, Git would block indefinitely
> and the test would never end. But since we do not have such a bug, Git
> does not open the pipe and it is the writing process which will block
> indefinitely, since there are no readers. The test addresses this by
> running "kill $!" in a test_when_finished block. Since the writer should
> be blocking forever, this kill command will reliably find it waiting.
>
> However, this seems to be somewhat racy, in that the writing process
> sometimes hangs around even after the "kill". In a normal run of the
> test script without options, this doesn't have any effect; the
> main test script completes anyway. But with --verbose-log, we spawn a
> "tee" process that reads the script output, and it won't end until all
> descriptors pointing to its input pipe are closed. And the background
> process that is hanging around still has its stderr, etc, pointed into
> that pipe.
>
> You can reproduce the situation like this:
>
> cd t
> ./t4053-diff-no-index.sh --verbose-log --stress
>
> Let that run for a few minutes, and then you'll find that some of the
> runs have hung. For example, at 11:53, I ran:
>
> $ ps xk start o pid,start,command | grep tee | head
> 713459 11:48:06 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-9.out
> 713527 11:48:06 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-15.out
> 719434 11:48:07 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-1.out
> 728117 11:48:08 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-5.out
> 738738 11:48:09 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-31.out
> 739457 11:48:09 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-27.out
> 744432 11:48:10 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-21.out
> 744471 11:48:10 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-29.out
> 761961 11:48:12 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-0.out
> 812299 11:48:19 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-8.out
>
> All of these have been hung for several minutes. We can investigate one
> and see that it's waiting to get EOF on its input:
>
> $ strace -p 713459
> strace: Process 713459 attached
> read(0,
> ^C
>
> Who else has that descriptor open?
>
> $ lsof -a -p 713459 -d 0 +E
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> tee 713459 peff 0r FIFO 0,13 0t0 3943636 pipe 719203,sh,5w 719203,sh,7w 719203,sh,12w 719203,sh,13w
> sh 719203 peff 5w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,7w 719203,sh,12w 719203,sh,13w
> sh 719203 peff 7w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,12w 719203,sh,13w
> sh 719203 peff 12w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,7w 719203,sh,13w
> sh 719203 peff 13w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,7w 719203,sh,12w
>
> It's a shell, presumably a subshell spawned by the main script. Though
> it may seem odd, having the same descriptor open several times is not
> unreasonable (they're all basically the original stdout/stderr of the
> script that has been copied). And they should all close when the process
> exits. So what's it doing? Curiously, it will exit as soon as we strace
> it:
>
> $ strace -s 64 -p 719203
> strace: Process 719203 attached
> openat(AT_FDCWD, "pipe", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 ENOENT (No such file or directory)
> write(2, "./t4053-diff-no-index.sh: 7: eval: ", 35) = 35
> write(2, "cannot create pipe: Directory nonexistent", 41) = 41
> write(2, "\n", 1) = 1
> exit_group(2) = ?
> +++ exited with 2 +++
>
> I think what happens is this:
>
> - it is blocking in the openat() call for the pipe, as we expect (so
> this is definitely the backgrounded subshell mentioned above)
>
> - strace sends signals (probably STOP/CONT); those cause the kernel to
> stop blocking, but libc will restart the system call automatically
>
> - by this time, the "pipe" fifo is gone, so we'll actually try to
> create a regular file. But of course the surrounding directory is
> gone, too! So we get ENOENT, and then exit as normal.
>
> So the blocking is something we expect to happen. But what we didn't
> expect is for the process to still exist at all! It should have been
> killed earlier when the parent process called "kill", but it wasn't. And
> we can't catch the race at this point, because it happened much earlier.
>
> One can guess, though, that there is some race with the shell setting up
> the signal handling in the backgrounded subshell, and possibly blocking
> or ignoring signals at the time that the "kill" is received. Curiously,
> the race does not seem to happen if I use "bash" instead of "dash", so
> presumably bash's setup here is more atomic.
>
> One fix might be to try killing the subshell more aggressively, either
> using SIGKILL, or looping on kill/wait. But that seems complex and
> likely to introduce new problems/races. Instead, we can observe that the
> writer is not needed at all. Git will notice the pipe via stat() before
> it is ever opened. So we can simply drop the writer subshell entirely.
>
> If we ever changed Git to open the path and fstat() it, this would
> result in the test hanging. But we're not likely to do that. After all,
> we have to stat() paths to see if they are openable at all (e.g., it
> could be a directory), so this seems like a low risk. And anybody who
> does make such a change will immediately see the issue, as Git would
> hang consistently.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> t/t4053-diff-no-index.sh | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh
> index 1fb7d33462..6781cc9078 100755
> --- a/t/t4053-diff-no-index.sh
> +++ b/t/t4053-diff-no-index.sh
> @@ -232,10 +232,6 @@ test_expect_success 'diff --no-index refuses to diff stdin and a directory' '
> test_expect_success PIPE 'diff --no-index refuses to diff a named pipe and a directory' '
> test_when_finished "rm -f pipe" &&
> mkfifo pipe &&
> - {
> - (>pipe) &
> - } &&
> - test_when_finished "kill $!" &&
> test_must_fail git diff --no-index -- pipe a 2>err &&
> grep "fatal: cannot compare a named pipe to a directory" err
> '
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2023-08-14 15:24 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-13 16:24 [PATCH] t4053: avoid writing to unopened pipe Jeff King
2023-08-14 15:23 ` Phillip Wood
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).