From: Jeff King <peff@peff.net>
To: Thomas Rast <trast@student.ethz.ch>
Cc: git@vger.kernel.org, Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: t7006 sometimes hangs in cronjobs on OS X
Date: Wed, 9 Feb 2011 15:50:57 -0500
Message-ID: <20110209205056.GA2083@sigill.intra.peff.net>
In-Reply-To: <201102092122.19178.trast@student.ethz.ch>

On Wed, Feb 09, 2011 at 09:22:19PM +0100, Thomas Rast wrote:

> Thomas Rast wrote:
> > I'll have to put off the more involved experimentation somewhat :-(
> 
> Or not.
> 
> Anyway, I managed to load the relevant parts of t7006 in an
> interactive shell, and used
> 
>   i=0
>   while test_terminal git --no-pager log >out.$i 2>&1; do
>     i=$(($i+1))
>     echo $i
>   done

Great, that is a much smaller bit to work with. I of course can't
replicate it on my Linux boxes, though. :(

> Running 'dtruss -n git' instead did hit the issue, and I have the
> output saved away, but I'm not sure what I'd be looking for.  After
> all, the git process becomes a zombie:
> 
>   47606 s006  S+     0:00.13 /usr/bin/perl /Users/trast/git/t/test-terminal.perl git --no-pager log
>   47607 s006  Z+     0:00.00 (git)
>   47608 s006  Z+     0:00.00 (perl5.10.0)

Yeah, I think the problem is going to be in the perl process, but...

> The output for pid 47607 ends in:
> 
>   47607/0x1f7fec5:  write_nocancel(0x1, "Author: A U Thor <author@example.com>\n\0", 0x26)                 = 38 0
>   47607/0x1f7fec5:  write_nocancel(0x1, "Date:   Thu Apr 7 15:13:13 2005 -0700\n\0", 0x26)                 = 38 0
>   47607/0x1f7fec5:  write_nocancel(0x1, "\n\0", 0x1)               = 1 0
>   47607/0x1f7fec5:  write_nocancel(0x1, "    initial\n\0", 0xC)            = 12 0
>   47607/0x1f7fec5:  fstat64(0x1, 0x7FFF5FBFE8A0, 0x0)              = 0 0
>   47607/0x1f7fec5:  fstat64(0x1, 0x7FFF5FBFEF60, 0x0)              = 0 0
>   47607/0x1f7fec5:  close_nocancel(0x1)            = 0 0
> 
> Which seems rather unsurprising really :-(

Doesn't it close fd 2 (stderr) as well? That could be the problem. I
would also expect it to call exit().

> Finally, attaching dtruss after hitting the race gives no output when
> I finally kill it.

Yeah, it is probably stuck in some syscall. You could try instrumenting
your test-terminal.perl like:

diff --git a/t/test-terminal.perl b/t/test-terminal.perl
index ee01eb9..1755017 100755
--- a/t/test-terminal.perl
+++ b/t/test-terminal.perl
@@ -55,11 +55,15 @@ sub copy_stdio {
 	defined $pid or die "fork failed: $!";
 	if (!$pid) {
 		close($out);
+		print STDERR "child: pumping stderr\n";
 		xsendfile(\*STDERR, $err);
+		print STDERR "child: done pumping stderr\n";
 		exit 0;
 	}
 	close($err);
+	print STDERR "parent: pumping stdout\n";
 	xsendfile(\*STDOUT, $out);
+	print STDERR "parent: done pumping stdout\n";
 	finish_child($pid) == 0
 		or exit 1;
 }

and see which of those messages actually gets printed on the stuck case.
My suspicion is that one of the processes is getting stuck in xsendfile.
It may be the fault of File::Copy::copy, which I have seen be flaky
before on non-regular files. We should arguably be using non-buffered
I/O here anyway, so maybe try this:

diff --git a/t/test-terminal.perl b/t/test-terminal.perl
index ee01eb9..0d33da1 100755
--- a/t/test-terminal.perl
+++ b/t/test-terminal.perl
@@ -3,7 +3,6 @@ use 5.008;
 use strict;
 use warnings;
 use IO::Pty;
-use File::Copy;
 
 # Run @$argv in the background with stdio redirected to $out and $err.
 sub start_child {
@@ -46,7 +45,13 @@ sub xsendfile {
 	# from a disconnected terminal will return
 	# EIO (as in AIX 4.x, IRIX, and Linux) or
 	# end-of-file.  Either is fine.
-	copy($in, $out, 4096) or $!{EIO} or die "cannot copy from child: $!";
+	while (1) {
+		my $r = sysread($in, my $buf, 4096);
+		last if !defined $r and $!{EIO};
+		die "cannot copy from child: $!" unless defined $r;
+		last if $r == 0;
+		syswrite($out, $buf);
+	}
 }
 
 sub copy_stdio {
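
For readers who want to try the unbuffered approach outside the test
suite, the loop in the patch above can be sketched as a standalone
script. This is only an illustration under stated assumptions, not the
code that went into t/test-terminal.perl: the helper name pump() is
hypothetical, and the EINTR retry and the partial-write loop are
additions that the patch does not have. It checks for an undefined
return from sysread before comparing against 0, so a read error is
never mistaken for end-of-file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of t/test-terminal.perl): copy all data
# from $in to $out with unbuffered I/O and return the number of bytes
# copied.
sub pump {
	my ($in, $out) = @_;
	my $total = 0;
	while (1) {
		my $r = sysread($in, my $buf, 4096);
		if (!defined $r) {
			next if $!{EINTR};   # interrupted by a signal; retry
			last if $!{EIO};     # disconnected terminal; treat as EOF
			die "cannot read: $!";
		}
		last if $r == 0;         # ordinary end-of-file
		# syswrite may write fewer bytes than requested, so keep
		# going until the whole chunk has been written.
		my $off = 0;
		while ($off < $r) {
			my $w = syswrite($out, $buf, $r - $off, $off);
			if (!defined $w) {
				next if $!{EINTR};
				die "cannot write: $!";
			}
			$off += $w;
		}
		$total += $r;
	}
	return $total;
}

# Usage: feed one line through a pipe and pump it to stdout.
pipe(my $rd, my $wr) or die "pipe failed: $!";
syswrite($wr, "    initial\n");
close($wr);
my $copied = pump($rd, \*STDOUT);
print STDERR "copied $copied bytes\n";
```

A pipe stands in for the pty here, so the EIO branch is not exercised;
on a real pty the read side may report EIO instead of end-of-file once
the child has gone away, as the comment in xsendfile notes.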

Thread overview: 12+ messages
2011-02-09 14:38 t7006 sometimes hangs in cronjobs on OS X Thomas Rast
2011-02-09 18:30 ` Ben Walton
2011-02-09 19:09 ` Jeff King
2011-02-09 19:16   ` Thomas Rast
2011-02-09 20:22     ` Thomas Rast
2011-02-09 20:50       ` Jeff King [this message]
2011-02-12  2:03         ` Jonathan Nieder
2011-02-12  2:05           ` Jonathan Nieder
2011-02-12  5:12           ` Jeff King
2011-02-12  8:37             ` Jonathan Nieder
2011-02-13  5:31 ` Jonathan Nieder
2011-02-17 12:38 ` [PATCH] tests: skip terminal output tests " Jonathan Nieder
