git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Phillip Wood <phillip.wood123@gmail.com>
To: Jeff King <peff@peff.net>, Phillip Wood <phillip.wood@dunelm.org.uk>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] t4053: avoid writing to unopened pipe
Date: Mon, 14 Aug 2023 16:23:35 +0100	[thread overview]
Message-ID: <bc373bcc-2ff0-4902-aa7f-f34c23951173@gmail.com> (raw)
In-Reply-To: <20230813162440.GA1688763@coredump.intra.peff.net>

Hi Peff

On 13/08/2023 17:24, Jeff King wrote:
> This fixes an occasional hang I see when running t4053 with
> --verbose-log using dash.

Well done for tracking this down, it is interesting that it only affects 
dash. I think your proposed fix is the simplest solution, thanks for the 
comprehensive commit message.

Best Wishes

Phillip

> Commit 1e3f26542a (diff --no-index: support reading from named pipes,
> 2023-07-05) added a test that "diff --no-index" will complain when
> comparing a named pipe and a directory. The minimum we need to test this
> is to mkfifo the pipe, and then run "git diff --no-index pipe some_dir".
> But the test does one thing more: it spawns a background shell process
> that opens the pipe for writing, like this:
> 
>          {
>                  (>pipe) &
>          } &&
> 
> This extra writer _could_ be useful if Git misbehaves and tries to open
> the pipe for reading. Without the writer, Git would block indefinitely
> and the test would never end. But since we do not have such a bug, Git
> does not open the pipe and it is the writing process which will block
> indefinitely, since there are no readers. The test addresses this by
> running "kill $!" in a test_when_finished block. Since the writer should
> be blocking forever, this kill command will reliably find it waiting.
> 
> However, this seems to be somewhat racy, in that the writing process
> sometimes hangs around even after the "kill". In a normal run of the
> test script without options, this doesn't have any effect; the
> main test script completes anyway. But with --verbose-log, we spawn a
> "tee" process that reads the script output, and it won't end until all
> descriptors pointing to its input pipe are closed. And the background
> process that is hanging around still has its stderr, etc, pointed into
> that pipe.
> 
> You can reproduce the situation like this:
> 
>    cd t
>    ./t4053-diff-no-index.sh --verbose-log --stress
> 
> Let that run for a few minutes, and then you'll find that some of the
> runs have hung. For example, at 11:53, I ran:
> 
>    $ ps xk start o pid,start,command | grep tee | head
>     713459 11:48:06 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-9.out
>     713527 11:48:06 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-15.out
>     719434 11:48:07 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-1.out
>     728117 11:48:08 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-5.out
>     738738 11:48:09 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-31.out
>     739457 11:48:09 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-27.out
>     744432 11:48:10 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-21.out
>     744471 11:48:10 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-29.out
>     761961 11:48:12 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-0.out
>     812299 11:48:19 tee -a /home/peff/compile/git/t/test-results/t4053-diff-no-index.stress-8.out
> 
> All of these have been hung for several minutes. We can investigate one
> and see that it's waiting to get EOF on its input:
> 
>    $ strace -p 713459
>    strace: Process 713459 attached
>    read(0,
>    ^C
> 
> Who else has that descriptor open?
> 
>    $ lsof -a -p 713459 -d 0 +E
>    COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
>    tee     713459 peff    0r  FIFO   0,13      0t0 3943636 pipe 719203,sh,5w 719203,sh,7w 719203,sh,12w 719203,sh,13w
>    sh      719203 peff    5w  FIFO   0,13      0t0 3943636 pipe 713459,tee,0r 719203,sh,7w 719203,sh,12w 719203,sh,13w
>    sh      719203 peff    7w  FIFO   0,13      0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,12w 719203,sh,13w
>    sh      719203 peff   12w  FIFO   0,13      0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,7w 719203,sh,13w
>    sh      719203 peff   13w  FIFO   0,13      0t0 3943636 pipe 713459,tee,0r 719203,sh,5w 719203,sh,7w 719203,sh,12w
> 
> It's a shell, presumably a subshell spawned by the main script. Though
> it may seem odd, having the same descriptor open several times is not
> unreasonable (they're all basically the original stdout/stderr of the
> script that has been copied). And they should all close when the process
> exits. So what's it doing? Curiously, it will exit as soon as we strace
> it:
> 
>    $ strace -s 64 -p 719203
>    strace: Process 719203 attached
>    openat(AT_FDCWD, "pipe", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 ENOENT (No such file or directory)
>    write(2, "./t4053-diff-no-index.sh: 7: eval: ", 35) = 35
>    write(2, "cannot create pipe: Directory nonexistent", 41) = 41
>    write(2, "\n", 1)                       = 1
>    exit_group(2)                           = ?
>    +++ exited with 2 +++
> 
> I think what happens is this:
> 
>    - it is blocking in the openat() call for the pipe, as we expect (so
>      this is definitely the backgrounded subshell mentioned above)
> 
>    - strace sends signals (probably STOP/CONT); those cause the kernel to
>      stop blocking, but libc will restart the system call automatically
> 
>    - by this time, the "pipe" fifo is gone, so we'll actually try to
>      create a regular file. But of course the surrounding directory is
>      gone, too! So we get ENOENT, and then exit as normal.
> 
> So the blocking is something we expect to happen. But what we didn't
> expect is for the process to still exist at all! It should have been
> killed earlier when the parent process called "kill", but it wasn't. And
> we can't catch the race at this point, because it happened much earlier.
> 
> One can guess, though, that there is some race with the shell setting up
> the signal handling in the backgrounded subshell, and possibly blocking
> or ignoring signals at the time that the "kill" is received.  Curiously,
> the race does not seem to happen if I use "bash" instead of "dash", so
> presumably bash's setup here is more atomic.
> 
> One fix might be to try killing the subshell more aggressively, either
> using SIGKILL, or looping on kill/wait. But that seems complex and
> likely to introduce new problems/races. Instead, we can observe that the
> writer is not needed at all. Git will notice the pipe via stat() before
> it is ever opened. So we can simply drop the writer subshell entirely.
> 
> If we ever changed Git to open the path and fstat() it, this would
> result in the test hanging. But we're not likely to do that. After all,
> we have to stat() paths to see if they are openable at all (e.g., it
> could be a directory), so this seems like a low risk. And anybody who
> does make such a change will immediately see the issue, as Git would
> hang consistently.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>   t/t4053-diff-no-index.sh | 4 ----
>   1 file changed, 4 deletions(-)
> 
> diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh
> index 1fb7d33462..6781cc9078 100755
> --- a/t/t4053-diff-no-index.sh
> +++ b/t/t4053-diff-no-index.sh
> @@ -232,10 +232,6 @@ test_expect_success 'diff --no-index refuses to diff stdin and a directory' '
>   test_expect_success PIPE 'diff --no-index refuses to diff a named pipe and a directory' '
>   	test_when_finished "rm -f pipe" &&
>   	mkfifo pipe &&
> -	{
> -		(>pipe) &
> -	} &&
> -	test_when_finished "kill $!" &&
>   	test_must_fail git diff --no-index -- pipe a 2>err &&
>   	grep "fatal: cannot compare a named pipe to a directory" err
>   '

      reply	other threads:[~2023-08-14 15:24 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-13 16:24 [PATCH] t4053: avoid writing to unopened pipe Jeff King
2023-08-14 15:23 ` Phillip Wood [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bc373bcc-2ff0-4902-aa7f-f34c23951173@gmail.com \
    --to=phillip.wood123@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=phillip.wood@dunelm.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).