From: Jeff King <peff@github.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Erik Faye-Lund <kusmabite@gmail.com>,
Aman Gupta <aman@github.com>, Ryan Tomayko <ryan@github.com>
Subject: Re: [RFC] upload-pack deadlock
Date: Wed, 6 Apr 2011 13:54:13 -0400 [thread overview]
Message-ID: <20110406175413.GA8205@sigill.intra.peff.net> (raw)
In-Reply-To: <7v8vvnjnyg.fsf@alter.siamese.dyndns.org>
On Wed, Apr 06, 2011 at 10:20:55AM -0700, Junio C Hamano wrote:
> > There are a few possible solutions:
> >
> > 1. Flip the order of initialization, so that we don't start writing
> > anything until the pack-objects reader is already in place. That
> > works in this case, and the patch is at the end of this mail. The
> > biggest problem is that it doesn't fix the general case.
>
> In what sense are you not fixing "the general case", though?
>
> If a program runs two threads, both of which access the FILE streams, it
> should be the responsibility of the program to get these threads
> coordinated to avoid such races and deadlocks, no?
Yes, but the problem is that looking at the code of the two threads, you
would never realize there is a deadlock. They never intentionally try to
touch the same stream. The real problem is buried in the run-command
code which calls fflush(NULL). This is inherently thread-unsafe, because
it wants to touch global data. It does do proper locking, at least, but
there is a deadlock issue, as demonstrated here.
We could do something like this:
diff --git a/run-command.c b/run-command.c
index 8619c76..ec8a2e6 100644
--- a/run-command.c
+++ b/run-command.c
@@ -2,6 +2,21 @@
#include "run-command.h"
#include "exec_cmd.h"
+static void flush_one(FILE *fh) {
+ /* if we can't get the lock, some thread
+ * is already writing/flushing it */
+ if (ftrylockfile(fh))
+ return;
+
+ fflush(fh);
+ funlockfile(fh);
+}
+
+static void flush_all() {
+ flush_one(stdout);
+ flush_one(stderr);
+}
+
static inline void close_pair(int fd[2])
{
close(fd[0]);
@@ -199,7 +214,7 @@ fail_pipe:
}
trace_argv_printf(cmd->argv, "trace: run_command:");
- fflush(NULL);
+ flush_all();
#ifndef WIN32
{
@@ -530,7 +545,7 @@ int start_async(struct async *async)
#ifdef NO_PTHREADS
/* Flush stdio before fork() to avoid cloning buffers */
- fflush(NULL);
+ flush_all();
async->pid = fork();
if (async->pid < 0) {
but I'm not all that happy with it. It does remove the deadlock, though
it makes the race condition for duplicate output slightly worse. The
comment in this hunk:
+static void flush_one(FILE *fh) {
+ /* if we can't get the lock, some thread
+ * is already writing/flushing it */
+ if (ftrylockfile(fh))
+ return;
is a little optimistic. Somebody may be writing to the stream, but not
enough to flush. We fail to flush, and then the forked process has the
duplicated buffer. To be fair, this race condition already exists. If a
thread is writing and not flushing a buffer, it may do a write after the
fflush(NULL) but before the fork, anyway.
Of slightly more concern is this hunk:
+static void flush_all() {
+ flush_one(stdout);
+ flush_one(stderr);
+}
which obviously is not really "all" but just a fixed set of descriptors.
But AFAIK, there is no portable way to get the list of all streams (even
though it clearly must exist to implement fflush(NULL) properly).
I wonder if that matters, though. Consider why we fflush(NULL); it is to
avoid duplicate output across a fork. But we fork in only two cases:
1. To start an async process when we don't have pthreads. But for this
to be a problem, we must be using pthreads already.
2. To immediately exec a command. In that case, we control the whole
code path up to the point of exec (at which point we no longer care
about duplicated buffers), so we know which streams will be written
to. I assumed it was just stderr, but actually I think it may be
none at all. We replace the die routine with one that writes
straight to the stderr descriptor.
So I am wondering if we could simply drop the fflush(NULL) entirely in
the start_command case. And in the start_async case, move it inside the
NO_PTHREADS case.
I guess the fflush does do one other thing; it makes sure that output on
a single descriptor is ordered sensibly. And we would be losing that.
Bleh. I hate parallel programming. Maybe my original fix is the right
thing to do. It's simple and obviously correct. It does mean we might
run into this problem again, but we really don't use threads very much,
so it's probably not worth spending too much up-front effort to prevent
a later coding error that is not very likely to occur.
I do still wonder about the win32 deadlock that Erik mentioned. Does my
patch help at all, or is there another bug hiding somewhere? This
particular deadlock only occurs with shallow clones.
-Peff
next prev parent reply other threads:[~2011-04-06 17:54 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-04-04 5:36 [RFC] upload-pack deadlock Jeff King
2011-04-06 17:20 ` Junio C Hamano
2011-04-06 17:54 ` Jeff King [this message]
2011-04-06 19:15 ` Erik Faye-Lund
2011-04-06 21:38 ` Jeff King
2011-04-06 21:33 ` Jeff King
2011-04-18 5:34 ` Jonathan Nieder
2011-05-26 6:45 ` [1.7.2] Please cherry-pick "upload-pack: start pack-objects before async rev-list" Jonathan Nieder
2011-05-26 16:58 ` Junio C Hamano
2011-05-26 17:11 ` Jonathan Nieder
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110406175413.GA8205@sigill.intra.peff.net \
--to=peff@github.com \
--cc=aman@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=kusmabite@gmail.com \
--cc=ryan@github.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).