From: Phillip Wood <phillip.wood123@gmail.com>
To: Jeff King <peff@peff.net>, Phillip Wood <phillip.wood@dunelm.org.uk>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] t4053: avoid writing to unopened pipe
Date: Mon, 14 Aug 2023 16:23:35 +0100 [thread overview]
Message-ID: <bc373bcc-2ff0-4902-aa7f-f34c23951173@gmail.com> (raw)
In-Reply-To: <20230813162440.GA1688763@coredump.intra.peff.net>
Hi Peff
On 13/08/2023 17:24, Jeff King wrote:
> This fixes an occasional hang I see when running t4053 with
> --verbose-log using dash.
Well done for tracking this down, it is interesting that it only affects
dash. I think your proposed fix is the simplest solution, thanks for the
comprehensive commit message.
Best Wishes
Phillip
> Commit 1e3f26542a (diff --no-index: support reading from named pipes,
> 2023-07-05) added a test that "diff --no-index" will complain when
> comparing a named pipe and a directory. The minimum we need to test this
> is to mkfifo the pipe, and then run "git diff --no-index pipe some_dir".
> But the test does one thing more: it spawns a background shell process
> that opens the pipe for writing, like this:
>
> {
> (>pipe) &
> } &&
>
> This extra writer _could_ be useful if Git misbehaves and tries to open
> the pipe for reading. Without the writer, Git would block indefinitely
> and the test would never end. But since we do not have such a bug, Git
> does not open the pipe and it is the writing process which will block
> indefinitely, since there are no readers. The test addresses this by
> running "kill $!" in a test_when_finished block. Since the writer should
> be blocking forever, this kill command will reliably find it waiting.
>
> However, this seems to be somewhat racy, in that the writing process
> sometimes hangs around even after the "kill". In a normal run of the
> test script without options, this doesn't have any effect; the
> main test script completes anyway. But with --verbose-log, we spawn a
> "tee" process that reads the script output, and it won't end until all
> descriptors pointing to its input pipe are closed. And the background
> process that is hanging around still has its stderr, etc, pointed into
> that pipe.
>
> You can reproduce the situation like this:
>
> cd t
> ./t4053-diff-no-index.sh --verbose-log --stress
>
> Let that run for a few minutes, and then you'll find that some of the
> runs have hung. For example, at 11:53, I ran:
>
> $ ps xk start o pid,start,command | grep tee | head
> 713459 11:48:06 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-9.out
> 713527 11:48:06 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-15.out
> 719434 11:48:07 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-1.out
> 728117 11:48:08 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-5.out
> 738738 11:48:09 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-31.out
> 739457 11:48:09 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-27.out
> 744432 11:48:10 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-21.out
> 744471 11:48:10 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-29.out
> 761961 11:48:12 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-0.out
> 812299 11:48:19 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-8.out
>
> All of these have been hung for several minutes. We can investigate one
> and see that it's waiting to get EOF on its input:
>
> $ strace -p 713459
> strace: Process 713459 attached
> read(0,
> ^C
>
> Who else has that descriptor open?
>
> $ lsof -a -p 713459 -d 0 +E
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> tee 713459 peff 0r FIFO 0,13 0t0 3943636 pipe 719203,sh,5w 719203,sh,7w 719203,sh,12w 719203,sh,13w
> sh 719203 peff 5w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,7w 719203,sh,12w 719203,sh,13w
> sh 719203 peff 7w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,12w 719203,sh,13w
> sh 719203 peff 12w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,7w 719203,sh,13w
> sh 719203 peff 13w FIFO 0,13 0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,7w 719203,sh,12w
>
> It's a shell, presumably a subshell spawned by the main script. Though
> it may seem odd, having the same descriptor open several times is not
> unreasonable (they're all basically the original stdout/stderr of the
> script that has been copied). And they should all close when the process
> exits. So what's it doing? Curiously, it will exit as soon as we strace
> it:
>
> $ strace -s 64 -p 719203
> strace: Process 719203 attached
> openat(AT_FDCWD, "pipe", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 ENOENT (No such file or directory)
> write(2, "./t4053-diff-no-index.sh: 7: eval: ", 35) = 35
> write(2, "cannot create pipe: Directory nonexistent", 41) = 41
> write(2, "\n", 1) = 1
> exit_group(2) = ?
> +++ exited with 2 +++
>
> I think what happens is this:
>
> - it is blocking in the openat() call for the pipe, as we expect (so
> this is definitely the backgrounded subshell mentioned above)
>
> - strace sends signals (probably STOP/CONT); those cause the kernel to
> stop blocking, but libc will restart the system call automatically
>
> - by this time, the "pipe" fifo is gone, so we'll actually try to
> create a regular file. But of course the surrounding directory is
> gone, too! So we get ENOENT, and then exit as normal.
>
> So the blocking is something we expect to happen. But what we didn't
> expect is for the process to still exist at all! It should have been
> killed earlier when the parent process called "kill", but it wasn't. And
> we can't catch the race at this point, because it happened much earlier.
>
> One can guess, though, that there is some race with the shell setting up
> the signal handling in the backgrounded subshell, and possibly blocking
> or ignoring signals at the time that the "kill" is received. Curiously,
> the race does not seem to happen if I use "bash" instead of "dash", so
> presumably bash's setup here is more atomic.
>
> One fix might be to try killing the subshell more aggressively, either
> using SIGKILL, or looping on kill/wait. But that seems complex and
> likely to introduce new problems/races. Instead, we can observe that the
> writer is not needed at all. Git will notice the pipe via stat() before
> it is ever opened. So we can simply drop the writer subshell entirely.
>
> If we ever changed Git to open the path and fstat() it, this would
> result in the test hanging. But we're not likely to do that. After all,
> we have to stat() paths to see if they are openable at all (e.g., it
> could be a directory), so this seems like a low risk. And anybody who
> does make such a change will immediately see the issue, as Git would
> hang consistently.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> t/t4053-diff-no-index.sh | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh
> index 1fb7d33462..6781cc9078 100755
> --- a/t/t4053-diff-no-index.sh
> +++ b/t/t4053-diff-no-index.sh
> @@ -232,10 +232,6 @@ test_expect_success 'diff --no-index refuses to diff stdin and a directory' '
> test_expect_success PIPE 'diff --no-index refuses to diff a named pipe and a directory' '
> test_when_finished "rm -f pipe" &&
> mkfifo pipe &&
> - {
> - (>pipe) &
> - } &&
> - test_when_finished "kill $!" &&
> test_must_fail git diff --no-index -- pipe a 2>err &&
> grep "fatal: cannot compare a named pipe to a directory" err
> '
prev parent reply other threads:[~2023-08-14 15:24 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-13 16:24 [PATCH] t4053: avoid writing to unopened pipe Jeff King
2023-08-14 15:23 ` Phillip Wood [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bc373bcc-2ff0-4902-aa7f-f34c23951173@gmail.com \
--to=phillip.wood123@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
--cc=phillip.wood@dunelm.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).