git.vger.kernel.org archive mirror
* [RFC] upload-pack deadlock
@ 2011-04-04  5:36 Jeff King
  2011-04-06 17:20 ` Junio C Hamano
From: Jeff King @ 2011-04-04  5:36 UTC (permalink / raw)
  To: git; +Cc: Erik Faye-Lund, Aman Gupta, Ryan Tomayko

In a pthread-enabled version of upload-pack, there's a race condition
that can cause a deadlock on the fflush(NULL) we call from run-command.

What happens is this:

  1. Upload-pack is informed we are doing a shallow clone.

  2. We call start_async() to spawn a thread that will generate rev-list
     results to feed to pack-objects. It gets a file descriptor to a
     pipe which will eventually hook to pack-objects.

  3. The rev-list thread uses fdopen to create a new output stream
     around the fd we gave it, called pack_pipe.

  4. The thread writes results to pack_pipe. Outside of our control,
     libc is doing locking on the stream. We keep writing until the OS
     pipe buffer is full, and then we block in write(), still holding
     the lock.

  5. The main thread now uses start_command to spawn pack-objects.
     Before forking, it calls fflush(NULL) to flush every stdio output
     buffer. It blocks trying to get the lock on pack_pipe.

And we have a deadlock. The thread will block until somebody starts
reading from the pipe. But nobody will read from the pipe until we
finish flushing to the pipe.

Most of the time it doesn't happen, because the main thread manages to
call fflush(NULL) before the rev-list thread manages to block. But it's
easy to reproduce if you apply this patch:

diff --git a/upload-pack.c b/upload-pack.c
index bba053f..ceaef86 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -189,6 +189,7 @@ static void create_pack_file(void)
 	pack_objects.git_cmd = 1;
 	pack_objects.argv = argv;
 
+	sleep(1);
 	if (start_command(&pack_objects))
 		die("git upload-pack: unable to fork git-pack-objects");
 

which slows the main thread just enough. With that applied, I can
reproduce reliably by running:

  git clone --depth 1 file://$GIT

where $GIT is some repo that is large enough that its rev-list output
will fill the OS's pipe buffer (I used git.git).

The end of the strace looks like this (27315 is the parent, 27316 is the
thread):

[pid 27316] write(4, "0ac7ec43918a5af59f1a305d47cd16ed"..., 4096) = 4096
[pid 27316] write(4, "entation/RelNotes/1.5.3.txt\nd4e4"..., 4096) = 4096
[pid 27316] write(4, "elNotes/1.6.5.3.txt\nd3a2a3e71243"..., 4096) = 4096
[pid 27316] write(4, "cumentation/docbook.xsl\nae413e52"..., 4096) = 4096
[pid 27316] write(4, "5d5c06cc37c7a203f758 Documentati"..., 4096 <unfinished ...>
[pid 27315] <... nanosleep resumed> 0x7fff053d81e0) = 0
[pid 27315] pipe([5, 6])                = 0
[pid 27315] pipe([7, 8])                = 0
[pid 27315] futex(0x82c930, FUTEX_WAIT_PRIVATE, 2, NULL

So you see the thread blocking on write, and the parent process stuck
trying to get the lock.

There are a few possible solutions:

  1. Flip the order of initialization, so that we don't start writing
     anything until the pack-objects reader is already in place. That
     works in this case, and the patch is at the end of this mail. The
     biggest problem is that it doesn't fix the general case.

  2. Rewrite the async rev-list into a separate rev-list invocation.
     There is no deadlock without threads (and this works fine if
     start_async uses fork() on your system). The problem is that I'm
     not sure that what the thread is doing is possible via a rev-list
     invocation. And it's a lot more error-prone than (1), but still
     doesn't fix the general case.

  3. Make run-command fflush just stdout and stderr before calling fork.
     The reasons to flush before forking are:

       a. Avoid duplicated output when the forked process might be
          writing to the stream (and then both sides will end up
          outputting whatever cruft was in the buffer). But the point of
          run-command is to exec something, so the only stream we are
          likely to write to before exec'ing something (and losing those
          buffers anyway) is going to be stderr.

       b. Provide sane ordering of output. If you write something to
          a stream and then run a new command, you expect your output to
          come before the sub-command's.

     Doing (a) is easy; we really just need to flush stderr. Doing (b)
     is harder.  Most of the time the buffer you would want to flush is
     going to be stdout. But it's possible for it to be some other FILE
     stream pointing to a pipe.

     So the best we can do is to flush the common things that are
     probably safe, and then any callers who really care about the
     ordering of other output will have to fflush() their outputs
     manually.

     It would be nice to do the opposite: have everything flushed unless
     the caller says "no, this stream is being taken care of in a
     thread, so don't worry about it". But I don't think there is any
     way to do that. fflush(NULL) flushes everything, and there is no
     way to even get a list of what it _would_ flush so that we could do
     it ourselves.

     But of course that doesn't really fix the problem. It just
     restricts it to stdout and stderr. But once we have the actual list
     of things to flush, we can use ftrylockfile() to just try flushing
     each one (ftrylockfile is probably not available everywhere, but it
     is in POSIX.1-2001, which hopefully means it's available where we
     have pthreads).

I am tempted to go with (1), as it's simple and unlikely to introduce
new bugs. There are a few other start_async callsites with threads that
output to stdio: demuxers which call recv_sideband. They write to
stderr, but since they are just relaying stderr from the remote side, it
is unlikely that a well-behaved remote would send enough to actually
cause stderr to block. OTOH, I know Erik has mentioned some deadlock
problems on Windows when sideband-64k is enabled, so that might be
related (we start_async the demuxer and then spawn pack-objects).

Comments/ideas? Patch for approach (1) is below.

---
diff --git a/upload-pack.c b/upload-pack.c
index bba053f..ce5cbbe 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -157,15 +157,8 @@ static void create_pack_file(void)
 	const char *argv[10];
 	int arg = 0;
 
-	if (shallow_nr) {
-		memset(&rev_list, 0, sizeof(rev_list));
-		rev_list.proc = do_rev_list;
-		rev_list.out = -1;
-		if (start_async(&rev_list))
-			die("git upload-pack: unable to fork git-rev-list");
-		argv[arg++] = "pack-objects";
-	} else {
-		argv[arg++] = "pack-objects";
+	argv[arg++] = "pack-objects";
+	if (!shallow_nr) {
 		argv[arg++] = "--revs";
 		if (create_full_pack)
 			argv[arg++] = "--all";
@@ -183,7 +176,7 @@ static void create_pack_file(void)
 	argv[arg++] = NULL;
 
 	memset(&pack_objects, 0, sizeof(pack_objects));
-	pack_objects.in = shallow_nr ? rev_list.out : -1;
+	pack_objects.in = -1;
 	pack_objects.out = -1;
 	pack_objects.err = -1;
 	pack_objects.git_cmd = 1;
@@ -192,8 +185,14 @@ static void create_pack_file(void)
 	if (start_command(&pack_objects))
 		die("git upload-pack: unable to fork git-pack-objects");
 
-	/* pass on revisions we (don't) want */
-	if (!shallow_nr) {
+	if (shallow_nr) {
+		memset(&rev_list, 0, sizeof(rev_list));
+		rev_list.proc = do_rev_list;
+		rev_list.out = pack_objects.in;
+		if (start_async(&rev_list))
+			die("git upload-pack: unable to fork git-rev-list");
+	}
+	else {
 		FILE *pipe_fd = xfdopen(pack_objects.in, "w");
 		if (!create_full_pack) {
 			int i;


* Re: [RFC] upload-pack deadlock
  2011-04-04  5:36 [RFC] upload-pack deadlock Jeff King
@ 2011-04-06 17:20 ` Junio C Hamano
  2011-04-06 17:54   ` Jeff King
From: Junio C Hamano @ 2011-04-06 17:20 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Erik Faye-Lund, Aman Gupta, Ryan Tomayko

Jeff King <peff@peff.net> writes:

> In a pthread-enabled version of upload-pack, there's a race condition
> that can cause a deadlock on the fflush(NULL) we call from run-command.
>
> What happens is this:
>
>   1. Upload-pack is informed we are doing a shallow clone.
>
>   2. We call start_async() to spawn a thread that will generate rev-list
>      results to feed to pack-objects. It gets a file descriptor to a
>      pipe which will eventually hook to pack-objects.
>
>   3. The rev-list thread uses fdopen to create a new output stream
>      around the fd we gave it, called pack_pipe.
>
>   4. The thread writes results to pack_pipe. Outside of our control,
>      libc is doing locking on the stream. We keep writing until the OS
>      pipe buffer is full, and then we block in write(), still holding
>      the lock.
>
>   5. The main thread now uses start_command to spawn pack-objects.
>      Before forking, it calls fflush(NULL) to flush every stdio output
>      buffer. It blocks trying to get the lock on pack_pipe.
>
> And we have a deadlock. The thread will block until somebody starts
> reading from the pipe. But nobody will read from the pipe until we
> finish flushing to the pipe.

Thanks for a concise summary.

> There are a few possible solutions:
>
>   1. Flip the order of initialization, so that we don't start writing
>      anything until the pack-objects reader is already in place. That
>      works in this case, and the patch is at the end of this mail. The
>      biggest problem is that it doesn't fix the general case.

In what sense are you not fixing "the general case", though?

If a program runs two threads, both of which access the FILE streams, it
should be the responsibility of the program to get these threads
coordinated to avoid such races and deadlocks, no?

While I think you mean that the start_async() API should help its users
(calling programs) fulfil that responsibility better, it really is up to
the thread not to do random things that wreak havoc, like the shallow
codepath that tries to fill the stream while the main codepath expects
nothing to be happening, and I do not see a good abstraction to help us
be more careful.


* Re: [RFC] upload-pack deadlock
  2011-04-06 17:20 ` Junio C Hamano
@ 2011-04-06 17:54   ` Jeff King
  2011-04-06 19:15     ` Erik Faye-Lund
  2011-04-06 21:33     ` Jeff King
From: Jeff King @ 2011-04-06 17:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Erik Faye-Lund, Aman Gupta, Ryan Tomayko

On Wed, Apr 06, 2011 at 10:20:55AM -0700, Junio C Hamano wrote:

> > There are a few possible solutions:
> >
> >   1. Flip the order of initialization, so that we don't start writing
> >      anything until the pack-objects reader is already in place. That
> >      works in this case, and the patch is at the end of this mail. The
> >      biggest problem is that it doesn't fix the general case.
> 
> In what sense are you not fixing "the general case", though?
> 
> If a program runs two threads, both of which access the FILE streams, it
> should be the responsibility of the program to get these threads
> coordinated to avoid such races and deadlocks, no?

Yes, but the problem is that looking at the code of the two threads, you
would never realize there is a deadlock. They never intentionally try to
touch the same stream. The real problem is buried in the run-command
code which calls fflush(NULL). This is inherently thread-unsafe, because
it wants to touch global data. It does do proper locking, at least, but
there is a deadlock issue, as demonstrated here.

We could do something like this:

diff --git a/run-command.c b/run-command.c
index 8619c76..ec8a2e6 100644
--- a/run-command.c
+++ b/run-command.c
@@ -2,6 +2,21 @@
 #include "run-command.h"
 #include "exec_cmd.h"
 
+static void flush_one(FILE *fh) {
+	/* if we can't get the lock, some thread
+	 * is already writing/flushing it */
+	if (ftrylockfile(fh))
+		return;
+
+	fflush(fh);
+	funlockfile(fh);
+}
+
+static void flush_all() {
+	flush_one(stdout);
+	flush_one(stderr);
+}
+
 static inline void close_pair(int fd[2])
 {
 	close(fd[0]);
@@ -199,7 +214,7 @@ fail_pipe:
 	}
 
 	trace_argv_printf(cmd->argv, "trace: run_command:");
-	fflush(NULL);
+	flush_all();
 
 #ifndef WIN32
 {
@@ -530,7 +545,7 @@ int start_async(struct async *async)
 
 #ifdef NO_PTHREADS
 	/* Flush stdio before fork() to avoid cloning buffers */
-	fflush(NULL);
+	flush_all();
 
 	async->pid = fork();
 	if (async->pid < 0) {

but I'm not all that happy with it. It does remove the deadlock, though
it makes the race condition for duplicate output slightly worse.  The
comment in this hunk:

+static void flush_one(FILE *fh) {
+	/* if we can't get the lock, some thread
+	 * is already writing/flushing it */
+	if (ftrylockfile(fh))
+		return;

is a little optimistic. Somebody may be writing to the stream, but not
enough to flush. We fail to flush, and then the forked process has the
duplicated buffer. To be fair, this race condition already exists. If a
thread is writing and not flushing a buffer, it may do a write after the
fflush(NULL) but before the fork, anyway.

Of slightly more concern is this hunk:

+static void flush_all() {
+	flush_one(stdout);
+	flush_one(stderr);
+}

which obviously is not really "all" but just a fixed set of streams.
But AFAIK, there is no portable way to get the list of all streams (even
though it clearly must exist to implement fflush(NULL) properly).

I wonder if that matters, though. Consider why we fflush(NULL); it is to
avoid duplicate output across a fork. But we fork in only two cases:

  1. To start an async process when we don't have pthreads. But for this
     to be a problem, we must be using pthreads already.

  2. To immediately exec a command. In that case, we control the whole
     code path up to the point of exec (at which point we no longer care
     about duplicated buffers), so we know which streams will be written
     to. I assumed it was just stderr, but actually I think it may be
     none at all. We replace the die routine with one that writes
     straight to the stderr descriptor.

So I am wondering if we could simply drop the fflush(NULL) entirely in
the start_command case. And in the start_async case, move it inside the
NO_PTHREADS case.

I guess the fflush does do one other thing; it makes sure that output on
a single descriptor is ordered sensibly. And we would be losing that.

Bleh. I hate parallel programming. Maybe my original fix is the right
thing to do. It's simple and obviously correct. It does mean we might
run into this problem again, but we really don't use threads very much,
so it's probably not worth spending too much up-front effort to prevent
a later coding error that is not very likely to occur.

I do still wonder about the win32 deadlock that Erik mentioned. Does my
patch help at all, or is there another bug hiding somewhere? This
particular deadlock only occurs with shallow clones.

-Peff


* Re: [RFC] upload-pack deadlock
  2011-04-06 17:54   ` Jeff King
@ 2011-04-06 19:15     ` Erik Faye-Lund
  2011-04-06 21:38       ` Jeff King
  2011-04-06 21:33     ` Jeff King
From: Erik Faye-Lund @ 2011-04-06 19:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Aman Gupta, Ryan Tomayko

On Wed, Apr 6, 2011 at 7:54 PM, Jeff King <peff@github.com> wrote:
> I do still wonder about the win32 deadlock that Erik mentioned. Does my
> patch help at all, or is there another bug hiding somewhere?

Nope :(

If you're interested, you can read some more information here on the
msysGit mailing list:
http://groups.google.com/group/msysgit/browse_thread/thread/865318a3e89d9e64/9d475b8ff3cef3c3

It only happens when pushing over the git-protocol. But it happens consistently.


* Re: [RFC] upload-pack deadlock
  2011-04-06 17:54   ` Jeff King
  2011-04-06 19:15     ` Erik Faye-Lund
@ 2011-04-06 21:33     ` Jeff King
  2011-04-18  5:34       ` Jonathan Nieder
  2011-05-26  6:45       ` [1.7.2] Please cherry-pick "upload-pack: start pack-objects before async rev-list" Jonathan Nieder
From: Jeff King @ 2011-04-06 21:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Erik Faye-Lund, Aman Gupta, Ryan Tomayko

On Wed, Apr 06, 2011 at 01:54:13PM -0400, Jeff King wrote:

> So I am wondering if we could simply drop the fflush(NULL) entirely in
> the start_command case. And in the start_async case, move it inside the
> NO_PTHREADS case.
> 
> I guess the fflush does do one other thing; it makes sure that output on
> a single descriptor is ordered sensibly. And we would be losing that.

After reading 13af8cb (start_command: flush buffers in the WIN32 code
path as well, 2011-02-04), I think dropping the fflush is a bad idea.
Let's do the simple and safe fix, and if this type of problem actually
comes up more than once, then I'll think about over-engineering an
abstraction to fix it. :)

Here it is with a nice commit message.

-- >8 --
Subject: [PATCH] upload-pack: start pack-objects before async rev-list

In a pthread-enabled version of upload-pack, there's a race condition
that can cause a deadlock on the fflush(NULL) we call from run-command.

What happens is this:

  1. Upload-pack is informed we are doing a shallow clone.

  2. We call start_async() to spawn a thread that will generate rev-list
     results to feed to pack-objects. It gets a file descriptor to a
     pipe which will eventually hook to pack-objects.

  3. The rev-list thread uses fdopen to create a new output stream
     around the fd we gave it, called pack_pipe.

  4. The thread writes results to pack_pipe. Outside of our control,
     libc is doing locking on the stream. We keep writing until the OS
     pipe buffer is full, and then we block in write(), still holding
     the lock.

  5. The main thread now uses start_command to spawn pack-objects.
     Before forking, it calls fflush(NULL) to flush every stdio output
     buffer. It blocks trying to get the lock on pack_pipe.

And we have a deadlock. The thread will block until somebody starts
reading from the pipe. But nobody will read from the pipe until we
finish flushing to the pipe.

To fix this, we swap the start order: we start the
pack-objects reader first, and then the rev-list writer
after. Thus the problematic fflush(NULL) happens before we
even open the new file descriptor (and even if it didn't,
flushing should no longer block, as the reader at the end of
the pipe is now active).

Signed-off-by: Jeff King <peff@peff.net>
---
And the result is one line shorter, so it _must_ be good.

 upload-pack.c |   23 +++++++++++------------
 1 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index bba053f..ce5cbbe 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -157,15 +157,8 @@ static void create_pack_file(void)
 	const char *argv[10];
 	int arg = 0;
 
-	if (shallow_nr) {
-		memset(&rev_list, 0, sizeof(rev_list));
-		rev_list.proc = do_rev_list;
-		rev_list.out = -1;
-		if (start_async(&rev_list))
-			die("git upload-pack: unable to fork git-rev-list");
-		argv[arg++] = "pack-objects";
-	} else {
-		argv[arg++] = "pack-objects";
+	argv[arg++] = "pack-objects";
+	if (!shallow_nr) {
 		argv[arg++] = "--revs";
 		if (create_full_pack)
 			argv[arg++] = "--all";
@@ -183,7 +176,7 @@ static void create_pack_file(void)
 	argv[arg++] = NULL;
 
 	memset(&pack_objects, 0, sizeof(pack_objects));
-	pack_objects.in = shallow_nr ? rev_list.out : -1;
+	pack_objects.in = -1;
 	pack_objects.out = -1;
 	pack_objects.err = -1;
 	pack_objects.git_cmd = 1;
@@ -192,8 +185,14 @@ static void create_pack_file(void)
 	if (start_command(&pack_objects))
 		die("git upload-pack: unable to fork git-pack-objects");
 
-	/* pass on revisions we (don't) want */
-	if (!shallow_nr) {
+	if (shallow_nr) {
+		memset(&rev_list, 0, sizeof(rev_list));
+		rev_list.proc = do_rev_list;
+		rev_list.out = pack_objects.in;
+		if (start_async(&rev_list))
+			die("git upload-pack: unable to fork git-rev-list");
+	}
+	else {
 		FILE *pipe_fd = xfdopen(pack_objects.in, "w");
 		if (!create_full_pack) {
 			int i;
-- 
1.7.4.3.13.g0b769.dirty


* Re: [RFC] upload-pack deadlock
  2011-04-06 19:15     ` Erik Faye-Lund
@ 2011-04-06 21:38       ` Jeff King
From: Jeff King @ 2011-04-06 21:38 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: Junio C Hamano, git, Aman Gupta, Ryan Tomayko

On Wed, Apr 06, 2011 at 09:15:31PM +0200, Erik Faye-Lund wrote:

> On Wed, Apr 6, 2011 at 7:54 PM, Jeff King <peff@github.com> wrote:
> > I do still wonder about the win32 deadlock that Erik mentioned. Does my
> > patch help at all, or is there another bug hiding somewhere?
> 
> Nope :(
> 
> If you're interested, you can read some more information here on the
> msysGit mailing list:
> http://groups.google.com/group/msysgit/browse_thread/thread/865318a3e89d9e64/9d475b8ff3cef3c3
> 
> It only happens when pushing over the git-protocol. But it happens consistently.

Thanks for an interesting read. Unfortunately, my reading of that thread
is that the bug isn't in git code, but in some weird Windows
interaction, and it isn't worth me trying to reproduce on Linux. So I
guess it is probably unrelated to this.

-Peff


* Re: [RFC] upload-pack deadlock
  2011-04-06 21:33     ` Jeff King
@ 2011-04-18  5:34       ` Jonathan Nieder
  2011-05-26  6:45       ` [1.7.2] Please cherry-pick "upload-pack: start pack-objects before async rev-list" Jonathan Nieder
From: Jonathan Nieder @ 2011-04-18  5:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Erik Faye-Lund, Aman Gupta, Ryan Tomayko

Jeff King wrote:

> Subject: [PATCH] upload-pack: start pack-objects before async rev-list
>
> In a pthread-enabled version of upload-pack, there's a race condition
> that can cause a deadlock on the fflush(NULL) we call from run-command.
[...]
> To fix this, we swap the start order: we start the
> pack-objects reader first, and then the rev-list writer
> after. Thus the problematic fflush(NULL) happens before we
> even open the new file descriptor (and even if it didn't,
> flushing should no longer block, as the reader at the end of
> the pipe is now active).
>
> Signed-off-by: Jeff King <peff@peff.net>

t5500.12 "fetch same depth in shallow repo" reproducibly hangs[1] on
the HURD without this patch and passes with it.  I had just assumed it
was some weird hurd thing.  Thanks for figuring it out.

Tested-by: Jonathan Nieder <jrnieder@gmail.com>

[1] http://bugs.debian.org/607346


* [1.7.2] Please cherry-pick "upload-pack: start pack-objects before async rev-list"
  2011-04-06 21:33     ` Jeff King
  2011-04-18  5:34       ` Jonathan Nieder
@ 2011-05-26  6:45       ` Jonathan Nieder
  2011-05-26 16:58         ` Junio C Hamano
From: Jonathan Nieder @ 2011-05-26  6:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Erik Faye-Lund, Aman Gupta, Ryan Tomayko

Hi Junio,

Jeff King wrote:

> Subject: [PATCH] upload-pack: start pack-objects before async rev-list
>
> In a pthread-enabled version of upload-pack, there's a race condition
> that can cause a deadlock on the fflush(NULL) we call from run-command.

This server-side deadlock started being triggered by shallow clones
when sv.gnu.org upgraded to v1.7.2.5 a couple of months ago[1].  So it
might be worth thinking about how to help upgrade-averse server admins
to fix it.

Luckily, Jeff's patch (v1.7.5.1~4^2, upload-pack: start pack-objects
before async rev-list, 2011-04-06) applies cleanly on top of the
oldest commit with the problem that I can find (v1.7.2-rc0~31^2~3,
Reimplement async procedures using pthreads, 2010-03-06).  Could you
queue it for inclusion in maint-1.7.2?

If I understand correctly, Windows with threaded async rev-list before
then would not have suffered from this deadlock because it did not
fflush(NULL) in start_command until v1.7.4.1~18 (start_command: flush
buffers in the WIN32 code path as well, 2011-02-04).

[1] http://thread.gmane.org/gmane.comp.version-control.git/172042


* Re: [1.7.2] Please cherry-pick "upload-pack: start pack-objects before async rev-list"
  2011-05-26  6:45       ` [1.7.2] Please cherry-pick "upload-pack: start pack-objects before async rev-list" Jonathan Nieder
@ 2011-05-26 16:58         ` Junio C Hamano
  2011-05-26 17:11           ` Jonathan Nieder
From: Junio C Hamano @ 2011-05-26 16:58 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jeff King, git, Erik Faye-Lund, Aman Gupta, Ryan Tomayko

Jonathan Nieder <jrnieder@gmail.com> writes:

> This server-side deadlock started being triggered by shallow clones
> when sv.gnu.org upgraded to v1.7.2.5 a couple of months ago[1].  So it
> might be worth thinking about how to help upgrade-averse server admins
> to fix it.

Upgrade-averse people can be fixed by keeping them closer to the tip, no?
I don't plan to issue any more maintenance release on 1.7.2.X track beyond
what is already released, unless there is a high priority security fix or
something.

Placing it on 1.7.4.X and newer maintenance tracks is a separate matter.


* Re: [1.7.2] Please cherry-pick "upload-pack: start pack-objects before async rev-list"
  2011-05-26 16:58         ` Junio C Hamano
@ 2011-05-26 17:11           ` Jonathan Nieder
From: Jonathan Nieder @ 2011-05-26 17:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Erik Faye-Lund, Aman Gupta, Ryan Tomayko

Junio C Hamano wrote:
> Jonathan Nieder <jrnieder@gmail.com> writes:

>> This server-side deadlock started being triggered by shallow clones
>> when sv.gnu.org upgraded to v1.7.2.5 a couple of months ago[1].  So it
>> might be worth thinking about how to help upgrade-averse server admins
>> to fix it.
>
> Upgrade-averse people can be fixed by keeping them closer to the tip, no?
> I don't plan to issue any more maintenance release on 1.7.2.X track beyond
> what is already released, unless there is a high priority security fix or
> something.
>
> Placing it on 1.7.4.X and newer maintenance tracks is a separate matter.

Ok, that's fine with me.  (To be clear, I was suggesting
cherry-picking to the branches but not making a release, though I
realize that would mean more time wasted the next time it is time to
make a high priority security fix.)

I'll cherry-pick it as a Debian-specific patch, which should be good
enough to get shallow clones working again on the affected servers.

