status hangs trying to get submodule summary

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* status hangs trying to get submodule summary
@ 2015-03-22  5:56 Wincent Colaiuta
  2015-03-22  7:44 ` [PATCH] status: read submodule process output before calling wait() Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Wincent Colaiuta @ 2015-03-22  5:56 UTC (permalink / raw)
  To: git

Hi,

I just ran into some odd behavior trying to update a submodule in one
of my projects where it would hang indefinitely trying to run either
`git status` or `git commit`.

Here's the minimal repro recipe:

mkdir demo
cd demo
git init
git submodule add git://github.com/ansible/ansible.git ansible
(cd ansible && git checkout v1.6.10)
git add ansible
git commit -m "initial commit"
(cd ansible && git checkout v1.8.4)
git config status.submodulesummary true
git status # hangs...
git commit # hangs...

At the time `git status` is hanging, these are the Git processes
running on the system:

12431 git status
12462 git submodule summary --files --for-status --summary-limit -1
12463 /bin/sh /usr/libexec/git-core/git-submodule summary --files
--for-status --summary-limit -1
12507 /bin/sh /usr/libexec/git-core/git-submodule summary --files
--for-status --summary-limit -1
12522 git log --pretty=format:  %m %s --first-parent
8959338284f6c6e44890b8911434285848f34859...ebc8d48d34296fe010096f044e2b7591df37a622

And `strace` shows the processes `wait4`-ing, except for the `git log`
one which is doing a `write` but apparently blocked:

# strace -p 12431
Process 12431 attached
wait4(12462, ^CProcess 12431 detached
 <detached ...>
# strace -p 12462
Process 12462 attached
wait4(12463, ^CProcess 12462 detached
 <detached ...>
# strace -p 12463
Process 12463 attached
wait4(-1, ^CProcess 12463 detached
 <detached ...>
# strace -p 12507
Process 12507 attached
wait4(-1, ^CProcess 12507 detached
 <detached ...>
# strace -p 12522
Process 12522 attached
write(1, "\n  > Merge pull request #7814 fr"..., 108^CProcess 12522 detached
 <detached ...>

This repros for me on Mac OS X 10.10.2 with Git 1.9.5 and Git 2.3.3,
and on the Amazon LInux (a RHEL-like OS) with Git 2.1.0. Both of these
with an empty .gitconfig (other than user.email, user.name and the
status.submodulesummary value already mentioned above).

I've never seen this hang before despite frequent use of submodules.
Oddly, I was able to work around the hang by moving the submodule in
two hops (one from Ansible v1.6.10 to v1.7.0, then from v1.7.0 to
v1.8.4). I am not sure if this is specific to the Ansible repo, or
whether the length of the summary is crossing some threshold that
triggers the bug to manifest. If I run the forked commands manually
from an interactive shell, they complete just fine.

-Greg

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH] status: read submodule process output before calling wait()
  2015-03-22  5:56 status hangs trying to get submodule summary Wincent Colaiuta
@ 2015-03-22  7:44 ` Jeff King
  2015-03-22  8:07   ` Jeff King
  2015-03-22  9:59   ` [PATCH 0/7] introduce strbuf_read_cmd to avoid deadlocks Jeff King
  0 siblings, 2 replies; 26+ messages in thread
From: Jeff King @ 2015-03-22  7:44 UTC (permalink / raw)
  To: Wincent Colaiuta; +Cc: git

On Sat, Mar 21, 2015 at 10:56:54PM -0700, Wincent Colaiuta wrote:

> I've never seen this hang before despite frequent use of submodules.
> Oddly, I was able to work around the hang by moving the submodule in
> two hops (one from Ansible v1.6.10 to v1.7.0, then from v1.7.0 to
> v1.8.4). I am not sure if this is specific to the Ansible repo, or
> whether the length of the summary is crossing some threshold that
> triggers the bug to manifest. If I run the forked commands manually
> from an interactive shell, they complete just fine.

It's the length of the summary. The fix is below.

-- >8 --
The status code tries to read the output of "git submodule
summary" over a pipe by waiting for the program to finish
and then reading its output, like this:

  run_command(&sm_summary);
  len = strbuf_read(&cmd_stdout, sm_summary.out, 1024);

Besides being a violation of the run-command API (which
makes no promises about the state of the struct after
run_command returns), this can easily lead to deadlock. The
"submodule status" process may fill up the pipe buffer and
block on write(). Meanwhile, the reading side in the parent
process is blocked in wait(), waiting for the child to
finish.

Instead, we should start the process, read everything it
produces, and only then call wait() to finish it off.

Signed-off-by: Jeff King <peff@peff.net>
---
I notice that we also don't detect when the sub-command fails. I don't
know what we would do in that case (abort the status? print a message?)
and it's orthogonal to this issue, so I left it for somebody more
clueful in the area to think about.

 wt-status.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index 7036fa2..96f0033 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -748,9 +748,9 @@ static void wt_status_print_submodule_summary(struct wt_status *s, int uncommitt
 	fflush(s->fp);
 	sm_summary.out = -1;
 
-	run_command(&sm_summary);
-
+	start_command(&sm_summary);
 	len = strbuf_read(&cmd_stdout, sm_summary.out, 1024);
+	finish_command(&sm_summary);
 
 	/* prepend header, only if there's an actual output */
 	if (len) {
-- 
2.3.3.618.ga041503

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] status: read submodule process output before calling wait()
  2015-03-22  7:44 ` [PATCH] status: read submodule process output before calling wait() Jeff King
@ 2015-03-22  8:07   ` Jeff King
  2015-03-22  9:59   ` [PATCH 0/7] introduce strbuf_read_cmd to avoid deadlocks Jeff King
  1 sibling, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-22  8:07 UTC (permalink / raw)
  To: Wincent Colaiuta; +Cc: git

On Sun, Mar 22, 2015 at 03:44:55AM -0400, Jeff King wrote:

> The status code tries to read the output of "git submodule
> summary" over a pipe by waiting for the program to finish
> and then reading its output, like this:
> 
>   run_command(&sm_summary);
>   len = strbuf_read(&cmd_stdout, sm_summary.out, 1024);

By the way, I spotted this code as bogus immediately upon seeing it
(though certainly it helped to know there was a deadlock in the area,
which had me thinking about such things). So I wondered if it could have
been easy to catch in review, but its introduction was a little bit
subtle.

The original run_command invocation came in ac8d5af (builtin-status:
submodule summary support, 2008-04-12), which just let the sub-process
dump its stdout to the same descriptor that the rest of the status
output was going to. So the use of run_command there was fine. It was
later, in 3ba7407 (submodule summary: ignore --for-status option,
2013-09-06), that we started post-processing the output and it became
buggy. But that's harder to see in review.

> Besides being a violation of the run-command API (which
> makes no promises about the state of the struct after
> run_command returns),

This may be overly harsh of me. Certainly we make no guarantees (and
things like the dynamic "args" and "env_array" are cleaned up
automatically after finish_command returns), but I would not be
surprised if there are other spots that treat "struct child_process" as
transparent rather than as a black box.

It's really the run_command + pipe construct that is really the danger
here. I wonder if we should do something like this:

diff --git a/run-command.c b/run-command.c
index 3afb124..78807de 100644
--- a/run-command.c
+++ b/run-command.c
@@ -557,7 +557,12 @@ int finish_command(struct child_process *cmd)

 int run_command(struct child_process *cmd)
 {
-	int code = start_command(cmd);
+	int code;
+
+	if (cmd->out < 0)
+		die("BUG: run_command with a pipe can cause deadlock");
+
+	code = start_command(cmd);
 	if (code)
 		return code;
 	return finish_command(cmd);

It seems to catch at least one other dubious construct.

-Peff

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 0/7] introduce strbuf_read_cmd to avoid deadlocks
  2015-03-22  7:44 ` [PATCH] status: read submodule process output before calling wait() Jeff King
  2015-03-22  8:07   ` Jeff King
@ 2015-03-22  9:59   ` Jeff King
  2015-03-22 10:00     ` [PATCH 1/7] wt-status: don't flush before running "submodule status" Jeff King
                       ` (6 more replies)
  1 sibling, 7 replies; 26+ messages in thread
From: Jeff King @ 2015-03-22  9:59 UTC (permalink / raw)
  To: Wincent Colaiuta; +Cc: git

On Sun, Mar 22, 2015 at 03:44:55AM -0400, Jeff King wrote:

> diff --git a/wt-status.c b/wt-status.c
> index 7036fa2..96f0033 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -748,9 +748,9 @@ static void wt_status_print_submodule_summary(struct wt_status *s, int uncommitt
>  	fflush(s->fp);
>  	sm_summary.out = -1;
>  
> -	run_command(&sm_summary);
> -
> +	start_command(&sm_summary);
>  	len = strbuf_read(&cmd_stdout, sm_summary.out, 1024);
> +	finish_command(&sm_summary);

This isn't quite right. In both the original and the new one, if
start_command fails, we may call strbuf_read on whatever it happens to
have left in sm_summary.out. Worse, in the new one, we would call
finish_command on something that was never started.  And in both old and
new, we leak the sm_summary.out descriptor.

Furthermore, I found two other potential deadlocks (and one more
descriptor leak). I tried at first to fix them all up like I did with
the above patch, but I found that I was introducing similar problems in
each case. So instead, I pulled this pattern out into its own strbuf
helper.

So here's a replacement series.

  [1/7]: wt-status: don't flush before running "submodule status"
  [2/7]: wt_status: fix signedness mismatch in strbuf_read call
  [3/7]: strbuf: introduce strbuf_read_cmd helper
  [4/7]: wt-status: use strbuf_read_cmd
  [5/7]: submodule: use strbuf_read_cmd
  [6/7]: trailer: use strbuf_read_cmd
  [7/7]: run-command: forbid using run_command with piped output

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/7] wt-status: don't flush before running "submodule status"
  2015-03-22  9:59   ` [PATCH 0/7] introduce strbuf_read_cmd to avoid deadlocks Jeff King
@ 2015-03-22 10:00     ` Jeff King
  2015-03-22 10:00     ` [PATCH 2/7] wt_status: fix signedness mismatch in strbuf_read call Jeff King
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-22 10:00 UTC (permalink / raw)
  To: Wincent Colaiuta; +Cc: git

This is a holdover from the original implementation in
ac8d5af (builtin-status: submodule summary support,
2008-04-12), which just had the sub-process output to our
descriptor; we had to make sure we had flushed any data that
we produced before it started writing.

Since 3ba7407 (submodule summary: ignore --for-status
option, 2013-09-06), however, we pipe the sub-process output
back to ourselves. So there's no longer any need to flush
(it does not hurt, but it may leave readers wondering why we
do it).

Signed-off-by: Jeff King <peff@peff.net>
---
 wt-status.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/wt-status.c b/wt-status.c
index 7036fa2..1712762 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -745,7 +745,6 @@ static void wt_status_print_submodule_summary(struct wt_status *s, int uncommitt

 	sm_summary.git_cmd = 1;
 	sm_summary.no_stdin = 1;
-	fflush(s->fp);
 	sm_summary.out = -1;

 	run_command(&sm_summary);
-- 
2.3.3.618.ga041503

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/7] wt_status: fix signedness mismatch in strbuf_read call
  2015-03-22  9:59   ` [PATCH 0/7] introduce strbuf_read_cmd to avoid deadlocks Jeff King
  2015-03-22 10:00     ` [PATCH 1/7] wt-status: don't flush before running "submodule status" Jeff King
@ 2015-03-22 10:00     ` Jeff King
  2015-03-22 10:07     ` [PATCH 3/7] strbuf: introduce strbuf_read_cmd helper Jeff King
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-22 10:00 UTC (permalink / raw)
  To: Wincent Colaiuta; +Cc: git

We call strbuf_read(), and want to know whether we got any
output. To do so, we assign the result to a size_t, and
check whether it is non-zero.

But strbuf_read returns a signed ssize_t. If it encounters
an error, it will return -1, and we'll end up treating this
the same as if we had gotten output. Instead, we can just
check whether our buffer has anything in it (which is what
we care about anyway, and is the same thing since we know
the buffer was empty to begin with).

Note that the "len" variable actually has two roles in this
function. Now that we've eliminated the first, we can push the
declaration closer to the point of use for the second one.

Signed-off-by: Jeff King <peff@peff.net>
---
 wt-status.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index 1712762..b47f6d9 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -729,7 +729,6 @@ static void wt_status_print_submodule_summary(struct wt_status *s, int uncommitt
 	struct strbuf cmd_stdout = STRBUF_INIT;
 	struct strbuf summary = STRBUF_INIT;
 	char *summary_content;
-	size_t len;
 
 	argv_array_pushf(&sm_summary.env_array, "GIT_INDEX_FILE=%s",
 			 s->index_file);
@@ -749,10 +748,10 @@ static void wt_status_print_submodule_summary(struct wt_status *s, int uncommitt
 
 	run_command(&sm_summary);
 
-	len = strbuf_read(&cmd_stdout, sm_summary.out, 1024);
+	strbuf_read(&cmd_stdout, sm_summary.out, 1024);
 
 	/* prepend header, only if there's an actual output */
-	if (len) {
+	if (cmd_stdout.len) {
 		if (uncommitted)
 			strbuf_addstr(&summary, _("Submodules changed but not updated:"));
 		else
@@ -763,6 +762,7 @@ static void wt_status_print_submodule_summary(struct wt_status *s, int uncommitt
 	strbuf_release(&cmd_stdout);
 
 	if (s->display_comment_prefix) {
+		size_t len;
 		summary_content = strbuf_detach(&summary, &len);
 		strbuf_add_commented_lines(&summary, summary_content, len);
 		free(summary_content);
-- 
2.3.3.618.ga041503

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/7] strbuf: introduce strbuf_read_cmd helper
  2015-03-22  9:59   ` [PATCH 0/7] introduce strbuf_read_cmd to avoid deadlocks Jeff King
  2015-03-22 10:00     ` [PATCH 1/7] wt-status: don't flush before running "submodule status" Jeff King
  2015-03-22 10:00     ` [PATCH 2/7] wt_status: fix signedness mismatch in strbuf_read call Jeff King
@ 2015-03-22 10:07     ` Jeff King
  2015-03-22 19:36       ` Eric Sunshine
  2015-03-22 23:22       ` Junio C Hamano
  2015-03-22 10:08     ` [PATCH 4/7] wt-status: use strbuf_read_cmd Jeff King
                       ` (3 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Jeff King @ 2015-03-22 10:07 UTC (permalink / raw)
  To: Wincent Colaiuta; +Cc: git

Something as simple as reading the stdout from a command
turns out to be rather hard to do right. Doing:

  if (!run_command(&cmd))
        strbuf_read(&buf, cmd.out, 0);

can result in deadlock if the child process produces a large
amount of output. What happens is:

  1. The parent spawns the child with its stdout connected
     to a pipe, of which the parent is the sole reader.

  2. The parent calls wait(), blocking until the child exits.

  3. The child writes to stdout. If it writes more data than
     the OS pipe buffer can hold, the write() call will
     block.

This is a deadlock; the parent is waiting for the child to
exit, and the child is waiting for the parent to call
read().

So we should do instead:

  if (!start_command(&cmd)) {
        strbuf_read(&buf, cmd.out, 0);
        finish_command(&cmd);
  }

But note that this leaks cmd.out (which must be closed). And
there's no error handling for strbuf_read. We probably want
to know whether the operation succeeded, but we must make
sure to always run finish_command even if the read failed
(or else we leave a zombie child process).

Let's introduce a strbuf helper that can make this a bit
simpler for callers to do right.

Signed-off-by: Jeff King <peff@peff.net>
---
This is really at the intersection of the strbuf and
run-command APIs, so you could argue for it being part of
either It is logically quite like the strbuf_read_file()
function, so I put it there.

 strbuf.c | 17 +++++++++++++++++
 strbuf.h | 10 ++++++++++
 2 files changed, 27 insertions(+)

diff --git a/strbuf.c b/strbuf.c
index 88cafd4..9d1d48f 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "refs.h"
 #include "utf8.h"
+#include "run-command.h"
 
 int starts_with(const char *str, const char *prefix)
 {
@@ -414,6 +415,22 @@ int strbuf_readlink(struct strbuf *sb, const char *path, size_t hint)
 	return -1;
 }
 
+int strbuf_read_cmd(struct strbuf *sb, struct child_process *cmd, size_t hint)
+{
+	cmd->out = -1;
+	if (start_command(cmd) < 0)
+		return -1;
+
+	if (strbuf_read(sb, cmd->out, hint) < 0) {
+		close(cmd->out);
+		finish_command(cmd); /* throw away exit code */
+		return -1;
+	}
+
+	close(cmd->out);
+	return finish_command(cmd);
+}
+
 int strbuf_getcwd(struct strbuf *sb)
 {
 	size_t oldalloc = sb->alloc;
diff --git a/strbuf.h b/strbuf.h
index 1883494..93a50da 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -1,6 +1,8 @@
 #ifndef STRBUF_H
 #define STRBUF_H
 
+struct child_process;
+
 /**
  * strbuf's are meant to be used with all the usual C string and memory
  * APIs. Given that the length of the buffer is known, it's often better to
@@ -373,6 +375,14 @@ extern int strbuf_read_file(struct strbuf *sb, const char *path, size_t hint);
 extern int strbuf_readlink(struct strbuf *sb, const char *path, size_t hint);
 
 /**
+ * Execute the given command, capturing its stdout in the given strbuf.
+ * Returns -1 if starting the command fails or reading fails, and otherwise
+ * returns the exit code of the command. The output collected in the
+ * buffer is kept even if the command returns a non-zero exit.
+ */
+int strbuf_read_cmd(struct strbuf *sb, struct child_process *cmd, size_t hint);
+
+/**
  * Read a line from a FILE *, overwriting the existing contents
  * of the strbuf. The second argument specifies the line
  * terminator character, typically `'\n'`.
-- 
2.3.3.618.ga041503

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/7] strbuf: introduce strbuf_read_cmd helper
  2015-03-22 10:07     ` [PATCH 3/7] strbuf: introduce strbuf_read_cmd helper Jeff King
@ 2015-03-22 19:36       ` Eric Sunshine
  2015-03-22 22:54         ` Junio C Hamano
  2015-03-22 23:34         ` [PATCH 3/7] strbuf: introduce strbuf_read_cmd helper Jeff King
  2015-03-22 23:22       ` Junio C Hamano
  1 sibling, 2 replies; 26+ messages in thread
From: Eric Sunshine @ 2015-03-22 19:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Wincent Colaiuta, Git List

On Sun, Mar 22, 2015 at 6:07 AM, Jeff King <peff@peff.net> wrote:
> Something as simple as reading the stdout from a command
> turns out to be rather hard to do right. Doing:
>
>   if (!run_command(&cmd))
>         strbuf_read(&buf, cmd.out, 0);
>
> can result in deadlock if the child process produces a large
> amount of output. [...]
>
> Let's introduce a strbuf helper that can make this a bit
> simpler for callers to do right.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> This is really at the intersection of the strbuf and
> run-command APIs, so you could argue for it being part of
> either It is logically quite like the strbuf_read_file()
> function, so I put it there.

It does feel like a layering violation. If moved to the run-command
API, it could given one of the following names or something better:

    run_command_capture()
    capture_command()
    command_capture()
    run_command_with_output()
    capture_output()

> diff --git a/strbuf.h b/strbuf.h
> index 1883494..93a50da 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -1,6 +1,8 @@
>  #ifndef STRBUF_H
>  #define STRBUF_H
>
> +struct child_process;
> +
>  /**
>   * strbuf's are meant to be used with all the usual C string and memory
>   * APIs. Given that the length of the buffer is known, it's often better to
> @@ -373,6 +375,14 @@ extern int strbuf_read_file(struct strbuf *sb, const char *path, size_t hint);
>  extern int strbuf_readlink(struct strbuf *sb, const char *path, size_t hint);
>
>  /**
> + * Execute the given command, capturing its stdout in the given strbuf.
> + * Returns -1 if starting the command fails or reading fails, and otherwise
> + * returns the exit code of the command. The output collected in the
> + * buffer is kept even if the command returns a non-zero exit.
> + */
> +int strbuf_read_cmd(struct strbuf *sb, struct child_process *cmd, size_t hint);
> +
> +/**
>   * Read a line from a FILE *, overwriting the existing contents
>   * of the strbuf. The second argument specifies the line
>   * terminator character, typically `'\n'`.
> --
> 2.3.3.618.ga041503

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/7] strbuf: introduce strbuf_read_cmd helper
  2015-03-22 19:36       ` Eric Sunshine
@ 2015-03-22 22:54         ` Junio C Hamano
  2015-03-22 23:40           ` Junio C Hamano
  2015-03-22 23:34         ` [PATCH 3/7] strbuf: introduce strbuf_read_cmd helper Jeff King
  1 sibling, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2015-03-22 22:54 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Jeff King, Wincent Colaiuta, Git List

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Sun, Mar 22, 2015 at 6:07 AM, Jeff King <peff@peff.net> wrote:
>> Something as simple as reading the stdout from a command
>> turns out to be rather hard to do right. Doing:
>>
>>   if (!run_command(&cmd))
>>         strbuf_read(&buf, cmd.out, 0);
>>
>> can result in deadlock if the child process produces a large
>> amount of output. [...]
>>
>> Let's introduce a strbuf helper that can make this a bit
>> simpler for callers to do right.
>>
>> Signed-off-by: Jeff King <peff@peff.net>
>> ---
>> This is really at the intersection of the strbuf and
>> run-command APIs, so you could argue for it being part of
>> either It is logically quite like the strbuf_read_file()
>> function, so I put it there.
>
> It does feel like a layering violation. If moved to the run-command
> API, it could given one of the following names or something better:
>
>     run_command_capture()
>     capture_command()
>     command_capture()
>     run_command_with_output()
>     capture_output()

Sound like a good suggestion (but I haven't read the users of the
proposed function, after doing which I might change my mind---I'll
see).

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/7] strbuf: introduce strbuf_read_cmd helper
  2015-03-22 22:54         ` Junio C Hamano
@ 2015-03-22 23:40           ` Junio C Hamano
  2015-03-23  3:53             ` [PATCH v2 0/7] introduce capture_command to avoid deadlocks Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2015-03-22 23:40 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Jeff King, Wincent Colaiuta, Git List

Junio C Hamano <gitster@pobox.com> writes:

> Eric Sunshine <sunshine@sunshineco.com> writes:
>
>> It does feel like a layering violation. If moved to the run-command
>> API, it could given one of the following names or something better:
>>
>>     run_command_capture()
>>     capture_command()
>>     command_capture()
>>     run_command_with_output()
>>     capture_output()
>
> Sound like a good suggestion (but I haven't read the users of the
> proposed function, after doing which I might change my mind---I'll
> see).

Now I read the callers, it does look like this new function better
fits in the run-command suite, essentially allowing us to do what we
would do with $(cmd) or `cmd` in shell and Perl scripts, even though
I do not particularly agree with the phrase "layering violation" to
call its current placement.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 0/7] introduce capture_command to avoid deadlocks
  2015-03-22 23:40           ` Junio C Hamano
@ 2015-03-23  3:53             ` Jeff King
  2015-03-23  3:53               ` [PATCH v2 1/7] wt-status: don't flush before running "submodule status" Jeff King
                                 ` (7 more replies)
  0 siblings, 8 replies; 26+ messages in thread
From: Jeff King @ 2015-03-23  3:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Sunshine, Wincent Colaiuta, Git List

On Sun, Mar 22, 2015 at 04:40:37PM -0700, Junio C Hamano wrote:

> Now I read the callers, it does look like this new function better
> fits in the run-command suite, essentially allowing us to do what we
> would do with $(cmd) or `cmd` in shell and Perl scripts, even though
> I do not particularly agree with the phrase "layering violation" to
> call its current placement.

I was on the fence, and you both seem to prefer it in run-command, so
here is a re-roll in that direction (no other changes).

  [1/7]: wt-status: don't flush before running "submodule status"
  [2/7]: wt_status: fix signedness mismatch in strbuf_read call
  [3/7]: run-command: introduce capture_command helper
  [4/7]: wt-status: use capture_command
  [5/7]: submodule: use capture_command
  [6/7]: trailer: use capture_command
  [7/7]: run-command: forbid using run_command with piped output

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 1/7] wt-status: don't flush before running "submodule status"
  2015-03-23  3:53             ` [PATCH v2 0/7] introduce capture_command to avoid deadlocks Jeff King
@ 2015-03-23  3:53               ` Jeff King
  2015-03-23  3:53               ` [PATCH v2 2/7] wt_status: fix signedness mismatch in strbuf_read call Jeff King
                                 ` (6 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-23  3:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Sunshine, Wincent Colaiuta, Git List

This is a holdover from the original implementation in
ac8d5af (builtin-status: submodule summary support,
2008-04-12), which just had the sub-process output to our
descriptor; we had to make sure we had flushed any data that
we produced before it started writing.

Since 3ba7407 (submodule summary: ignore --for-status
option, 2013-09-06), however, we pipe the sub-process output
back to ourselves. So there's no longer any need to flush
(it does not hurt, but it may leave readers wondering why we
do it).

Signed-off-by: Jeff King <peff@peff.net>
---
 wt-status.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/wt-status.c b/wt-status.c
index 7036fa2..1712762 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -745,7 +745,6 @@ static void wt_status_print_submodule_summary(struct wt_status *s, int uncommitt

 	sm_summary.git_cmd = 1;
 	sm_summary.no_stdin = 1;
-	fflush(s->fp);
 	sm_summary.out = -1;

 	run_command(&sm_summary);
-- 
2.3.3.618.ga041503

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 2/7] wt_status: fix signedness mismatch in strbuf_read call
  2015-03-23  3:53             ` [PATCH v2 0/7] introduce capture_command to avoid deadlocks Jeff King
  2015-03-23  3:53               ` [PATCH v2 1/7] wt-status: don't flush before running "submodule status" Jeff King
@ 2015-03-23  3:53               ` Jeff King
  2015-03-23  3:53               ` [PATCH v2 3/7] run-command: introduce capture_command helper Jeff King
                                 ` (5 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-23  3:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Sunshine, Wincent Colaiuta, Git List

We call strbuf_read(), and want to know whether we got any
output. To do so, we assign the result to a size_t, and
check whether it is non-zero.

But strbuf_read returns a signed ssize_t. If it encounters
an error, it will return -1, and we'll end up treating this
the same as if we had gotten output. Instead, we can just
check whether our buffer has anything in it (which is what
we care about anyway, and is the same thing since we know
the buffer was empty to begin with).

Note that the "len" variable actually has two roles in this
function. Now that we've eliminated the first, we can push the
declaration closer to the point of use for the second one.

Signed-off-by: Jeff King <peff@peff.net>
---
 wt-status.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index 1712762..b47f6d9 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -729,7 +729,6 @@ static void wt_status_print_submodule_summary(struct wt_status *s, int uncommitt
 	struct strbuf cmd_stdout = STRBUF_INIT;
 	struct strbuf summary = STRBUF_INIT;
 	char *summary_content;
-	size_t len;
 
 	argv_array_pushf(&sm_summary.env_array, "GIT_INDEX_FILE=%s",
 			 s->index_file);
@@ -749,10 +748,10 @@ static void wt_status_print_submodule_summary(struct wt_status *s, int uncommitt
 
 	run_command(&sm_summary);
 
-	len = strbuf_read(&cmd_stdout, sm_summary.out, 1024);
+	strbuf_read(&cmd_stdout, sm_summary.out, 1024);
 
 	/* prepend header, only if there's an actual output */
-	if (len) {
+	if (cmd_stdout.len) {
 		if (uncommitted)
 			strbuf_addstr(&summary, _("Submodules changed but not updated:"));
 		else
@@ -763,6 +762,7 @@ static void wt_status_print_submodule_summary(struct wt_status *s, int uncommitt
 	strbuf_release(&cmd_stdout);
 
 	if (s->display_comment_prefix) {
+		size_t len;
 		summary_content = strbuf_detach(&summary, &len);
 		strbuf_add_commented_lines(&summary, summary_content, len);
 		free(summary_content);
-- 
2.3.3.618.ga041503

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 3/7] run-command: introduce capture_command helper
  2015-03-23  3:53             ` [PATCH v2 0/7] introduce capture_command to avoid deadlocks Jeff King
  2015-03-23  3:53               ` [PATCH v2 1/7] wt-status: don't flush before running "submodule status" Jeff King
  2015-03-23  3:53               ` [PATCH v2 2/7] wt_status: fix signedness mismatch in strbuf_read call Jeff King
@ 2015-03-23  3:53               ` Jeff King
  2015-03-23  3:53               ` [PATCH v2 4/7] wt-status: use capture_command Jeff King
                                 ` (4 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-23  3:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Sunshine, Wincent Colaiuta, Git List

Something as simple as reading the stdout from a command
turns out to be rather hard to do right. Doing:

  cmd.out = -1;
  run_command(&cmd);
  strbuf_read(&buf, cmd.out, 0);

can result in deadlock if the child process produces a large
amount of output. What happens is:

  1. The parent spawns the child with its stdout connected
     to a pipe, of which the parent is the sole reader.

  2. The parent calls wait(), blocking until the child exits.

  3. The child writes to stdout. If it writes more data than
     the OS pipe buffer can hold, the write() call will
     block.

This is a deadlock; the parent is waiting for the child to
exit, and the child is waiting for the parent to call
read().

So we might try instead:

  start_command(&cmd);
  strbuf_read(&buf, cmd.out, 0);
  finish_command(&cmd);

But that is not quite right either. We are examining cmd.out
and running finish_command whether start_command succeeded
or not, which is wrong. Moreover, these snippets do not do
any error handling. If our read() fails, we must make sure
to still call finish_command (to reap the child process).
And both snippets failed to close the cmd.out descriptor,
which they must do (provided start_command succeeded).

Let's introduce a run-command helper that can make this a
bit simpler for callers to get right.

Signed-off-by: Jeff King <peff@peff.net>
---
 run-command.c | 16 ++++++++++++++++
 run-command.h | 13 +++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/run-command.c b/run-command.c
index 3afb124..e591d2c 100644
--- a/run-command.c
+++ b/run-command.c
@@ -829,3 +829,19 @@ int run_hook_le(const char *const *env, const char *name, ...)
 
 	return ret;
 }
+
+int capture_command(struct child_process *cmd, struct strbuf *buf, size_t hint)
+{
+	cmd->out = -1;
+	if (start_command(cmd) < 0)
+		return -1;
+
+	if (strbuf_read(buf, cmd->out, hint) < 0) {
+		close(cmd->out);
+		finish_command(cmd); /* throw away exit code */
+		return -1;
+	}
+
+	close(cmd->out);
+	return finish_command(cmd);
+}
diff --git a/run-command.h b/run-command.h
index d6868dc..263b966 100644
--- a/run-command.h
+++ b/run-command.h
@@ -71,6 +71,19 @@ int run_command_v_opt(const char **argv, int opt);
  */
 int run_command_v_opt_cd_env(const char **argv, int opt, const char *dir, const char *const *env);
 
+/**
+ * Execute the given command, capturing its stdout in the given strbuf.
+ * Returns -1 if starting the command fails or reading fails, and otherwise
+ * returns the exit code of the command. The output collected in the
+ * buffer is kept even if the command returns a non-zero exit. The hint field
+ * gives a starting size for the strbuf allocation.
+ *
+ * The fields of "cmd" should be set up as they would for a normal run_command
+ * invocation. But note that there is no need to set cmd->out; the function
+ * sets it up for the caller.
+ */
+int capture_command(struct child_process *cmd, struct strbuf *buf, size_t hint);
+
 /*
  * The purpose of the following functions is to feed a pipe by running
  * a function asynchronously and providing output that the caller reads.
-- 
2.3.3.618.ga041503

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 4/7] wt-status: use capture_command
  2015-03-23  3:53             ` [PATCH v2 0/7] introduce capture_command to avoid deadlocks Jeff King
                                 ` (2 preceding siblings ...)
  2015-03-23  3:53               ` [PATCH v2 3/7] run-command: introduce capture_command helper Jeff King
@ 2015-03-23  3:53               ` Jeff King
  2015-03-23  3:53               ` [PATCH v2 5/7] submodule: " Jeff King
                                 ` (3 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-23  3:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Sunshine, Wincent Colaiuta, Git List

When we spawn "git submodule status" to read its output, we
use run_command() followed by strbuf_read() read from the
pipe. This can deadlock if the subprocess output is larger
than the system pipe buffer.

Furthermore, if start_command() fails, we'll try to read
from a bogus descriptor (probably "-1" or a descriptor we
just closed, but it is a bad idea for us to make assumptions
about how start_command implements its error handling). And
if start_command succeeds, we leak the file descriptor for
the pipe to the child.

All of these can be solved by using the capture_command
helper.

Signed-off-by: Jeff King <peff@peff.net>
---
 wt-status.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index b47f6d9..853419f 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -744,11 +744,8 @@ static void wt_status_print_submodule_summary(struct wt_status *s, int uncommitt
 
 	sm_summary.git_cmd = 1;
 	sm_summary.no_stdin = 1;
-	sm_summary.out = -1;
 
-	run_command(&sm_summary);
-
-	strbuf_read(&cmd_stdout, sm_summary.out, 1024);
+	capture_command(&sm_summary, &cmd_stdout, 1024);
 
 	/* prepend header, only if there's an actual output */
 	if (cmd_stdout.len) {
-- 
2.3.3.618.ga041503

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 5/7] submodule: use capture_command
  2015-03-23  3:53             ` [PATCH v2 0/7] introduce capture_command to avoid deadlocks Jeff King
                                 ` (3 preceding siblings ...)
  2015-03-23  3:53               ` [PATCH v2 4/7] wt-status: use capture_command Jeff King
@ 2015-03-23  3:53               ` Jeff King
  2015-03-23  3:54               ` [PATCH v2 6/7] trailer: " Jeff King
                                 ` (2 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-23  3:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Sunshine, Wincent Colaiuta, Git List

In is_submodule_commit_present, we call run_command followed
by a pipe read, which is prone to deadlock. It is unlikely
to happen in this case, as rev-list should never produce
more than a single line of output, but it does not hurt to
avoid an anti-pattern (and using the helper simplifies the
setup and cleanup).

Signed-off-by: Jeff King <peff@peff.net>
---
 submodule.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index d37d400..c0e6c81 100644
--- a/submodule.c
+++ b/submodule.c
@@ -576,12 +576,10 @@ static int is_submodule_commit_present(const char *path, unsigned char sha1[20])
 		cp.env = local_repo_env;
 		cp.git_cmd = 1;
 		cp.no_stdin = 1;
-		cp.out = -1;
 		cp.dir = path;
-		if (!run_command(&cp) && !strbuf_read(&buf, cp.out, 1024))
+		if (!capture_command(&cp, &buf, 1024) && !buf.len)
 			is_present = 1;
 
-		close(cp.out);
 		strbuf_release(&buf);
 	}
 	return is_present;
-- 
2.3.3.618.ga041503

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 6/7] trailer: use capture_command
  2015-03-23  3:53             ` [PATCH v2 0/7] introduce capture_command to avoid deadlocks Jeff King
                                 ` (4 preceding siblings ...)
  2015-03-23  3:53               ` [PATCH v2 5/7] submodule: " Jeff King
@ 2015-03-23  3:54               ` Jeff King
  2015-03-23  3:54               ` [PATCH v2 7/7] run-command: forbid using run_command with piped output Jeff King
  2015-03-23  4:40               ` [PATCH v2 0/7] introduce capture_command to avoid deadlocks Junio C Hamano
  7 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-23  3:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Sunshine, Wincent Colaiuta, Git List

When we read from a trailer.*.command sub-program, the
current code uses run_command followed by a pipe read, which
can result in deadlock (though in practice you would have to
have a large trailer for this to be a problem). The current
code also leaks the file descriptor for the pipe to the
sub-command.

Instead, let's use capture_command, which makes this simpler
(and we can get rid of our custom helper).

Signed-off-by: Jeff King <peff@peff.net>
---
 trailer.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/trailer.c b/trailer.c
index 05b3859..4b14a56 100644
--- a/trailer.c
+++ b/trailer.c
@@ -214,16 +214,6 @@ static struct trailer_item *remove_first(struct trailer_item **first)
 	return item;
 }
 
-static int read_from_command(struct child_process *cp, struct strbuf *buf)
-{
-	if (run_command(cp))
-		return error("running trailer command '%s' failed", cp->argv[0]);
-	if (strbuf_read(buf, cp->out, 1024) < 1)
-		return error("reading from trailer command '%s' failed", cp->argv[0]);
-	strbuf_trim(buf);
-	return 0;
-}
-
 static const char *apply_command(const char *command, const char *arg)
 {
 	struct strbuf cmd = STRBUF_INIT;
@@ -240,14 +230,16 @@ static const char *apply_command(const char *command, const char *arg)
 	cp.argv = argv;
 	cp.env = local_repo_env;
 	cp.no_stdin = 1;
-	cp.out = -1;
 	cp.use_shell = 1;
 
-	if (read_from_command(&cp, &buf)) {
+	if (capture_command(&cp, &buf, 1024)) {
+		error("running trailer command '%s' failed", cmd.buf);
 		strbuf_release(&buf);
 		result = xstrdup("");
-	} else
+	} else {
+		strbuf_trim(&buf);
 		result = strbuf_detach(&buf, NULL);
+	}
 
 	strbuf_release(&cmd);
 	return result;
-- 
2.3.3.618.ga041503

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 7/7] run-command: forbid using run_command with piped output
  2015-03-23  3:53             ` [PATCH v2 0/7] introduce capture_command to avoid deadlocks Jeff King
                                 ` (5 preceding siblings ...)
  2015-03-23  3:54               ` [PATCH v2 6/7] trailer: " Jeff King
@ 2015-03-23  3:54               ` Jeff King
  2015-03-23  4:40               ` [PATCH v2 0/7] introduce capture_command to avoid deadlocks Junio C Hamano
  7 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-23  3:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Sunshine, Wincent Colaiuta, Git List

Because run_command both spawns and wait()s for the command
before returning control to the caller, any reads from the
pipes we open must necessarily happen after wait() returns.
This can lead to deadlock, as the child process may block
on writing to us while we are blocked waiting for it to
exit.

Worse, it only happens when the child fills the pipe
buffer, which means that the problem may come and go
depending on the platform and the size of the output
produced by the child.

Let's detect and flag this dangerous construct so that we
can catch potential bugs early in the test suite rather than
having them happen in the field.

Signed-off-by: Jeff King <peff@peff.net>
---
 run-command.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/run-command.c b/run-command.c
index e591d2c..aad03ab 100644
--- a/run-command.c
+++ b/run-command.c
@@ -557,7 +557,12 @@ int finish_command(struct child_process *cmd)

 int run_command(struct child_process *cmd)
 {
-	int code = start_command(cmd);
+	int code;
+
+	if (cmd->out < 0 || cmd->err < 0)
+		die("BUG: run_command with a pipe can cause deadlock");
+
+	code = start_command(cmd);
 	if (code)
 		return code;
 	return finish_command(cmd);
-- 
2.3.3.618.ga041503

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/7] introduce capture_command to avoid deadlocks
  2015-03-23  3:53             ` [PATCH v2 0/7] introduce capture_command to avoid deadlocks Jeff King
                                 ` (6 preceding siblings ...)
  2015-03-23  3:54               ` [PATCH v2 7/7] run-command: forbid using run_command with piped output Jeff King
@ 2015-03-23  4:40               ` Junio C Hamano
  7 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2015-03-23  4:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Sunshine, Wincent Colaiuta, Git List

Thanks; will queue.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/7] strbuf: introduce strbuf_read_cmd helper
  2015-03-22 19:36       ` Eric Sunshine
  2015-03-22 22:54         ` Junio C Hamano
@ 2015-03-22 23:34         ` Jeff King
  1 sibling, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-22 23:34 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Wincent Colaiuta, Git List

On Sun, Mar 22, 2015 at 03:36:01PM -0400, Eric Sunshine wrote:

> > This is really at the intersection of the strbuf and
> > run-command APIs, so you could argue for it being part of
> > either It is logically quite like the strbuf_read_file()
> > function, so I put it there.
> 
> It does feel like a layering violation. If moved to the run-command
> API, it could given one of the following names or something better:

A layering violation implies there is an ordering to the APIs. Certainly
we call APIs from other APIs all the time. I guess you could argue that
these are the "same" layer, and should be next to each, and not building
on each other (i.e., that strbuf has dependencies only on system APIs
like stdio.h, and run-command only on system APIs like unistd.h, etc).

But then reversing the order of the dependency does not really solve
that. You would have to introduce a new higher-level API that combines
them. But that seems silly for a single function (and I do not foresee
any other similar functions).

That being said, I'm not opposed to one of the reverse names if people
feel strongly (I also considered making it an option flag to
run_command_v_opt, but it ended up tangling things quite a bit more).

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/7] strbuf: introduce strbuf_read_cmd helper
  2015-03-22 10:07     ` [PATCH 3/7] strbuf: introduce strbuf_read_cmd helper Jeff King
  2015-03-22 19:36       ` Eric Sunshine
@ 2015-03-22 23:22       ` Junio C Hamano
  2015-03-22 23:36         ` Jeff King
  1 sibling, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2015-03-22 23:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Wincent Colaiuta, git

Jeff King <peff@peff.net> writes:

> diff --git a/strbuf.h b/strbuf.h
> index 1883494..93a50da 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -1,6 +1,8 @@
>  #ifndef STRBUF_H
>  #define STRBUF_H
>  
> +struct child_process;
> +
>  /**
>   * strbuf's are meant to be used with all the usual C string and memory
>   * APIs. Given that the length of the buffer is known, it's often better to
> @@ -373,6 +375,14 @@ extern int strbuf_read_file(struct strbuf *sb, const char *path, size_t hint);
>  extern int strbuf_readlink(struct strbuf *sb, const char *path, size_t hint);
>  
>  /**
> + * Execute the given command, capturing its stdout in the given strbuf.
> + * Returns -1 if starting the command fails or reading fails, and otherwise
> + * returns the exit code of the command. The output collected in the
> + * buffer is kept even if the command returns a non-zero exit.
> + */
> +int strbuf_read_cmd(struct strbuf *sb, struct child_process *cmd, size_t hint);
> +
> +/**
>   * Read a line from a FILE *, overwriting the existing contents
>   * of the strbuf. The second argument specifies the line
>   * terminator character, typically `'\n'`.

It is an unfortunate tangent that this is a bugfix that may want to
go to 'maint' and older, but our earlier jk/strbuf-doc-to-header
topic introduces an unnecessary merge conflicts.

I've wiggled this part and moved the doc elsewhere, only to remove
that in the merge, which may not be optimal from the point of view
of what I have to do when merging this topic down from pu to next
to master to maint, but I do not see a good way around it.

Thanks.  The whole series looks very sensible.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/7] strbuf: introduce strbuf_read_cmd helper
  2015-03-22 23:22       ` Junio C Hamano
@ 2015-03-22 23:36         ` Jeff King
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-22 23:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Wincent Colaiuta, git

On Sun, Mar 22, 2015 at 04:22:50PM -0700, Junio C Hamano wrote:

> > +/**
> >   * Read a line from a FILE *, overwriting the existing contents
> >   * of the strbuf. The second argument specifies the line
> >   * terminator character, typically `'\n'`.
> 
> It is an unfortunate tangent that this is a bugfix that may want to
> go to 'maint' and older, but our earlier jk/strbuf-doc-to-header
> topic introduces an unnecessary merge conflicts.

Yeah, that is the worst part of refactoring and cleanup. Even when you
make sure you are not hurting any topics in flight, you cannot know when
a new topic will take off in your general area.

> I've wiggled this part and moved the doc elsewhere, only to remove
> that in the merge, which may not be optimal from the point of view
> of what I have to do when merging this topic down from pu to next
> to master to maint, but I do not see a good way around it.

I'd suggest just dropping the documentation in the "maint" version
(i.e., make it a moral cherry-pick of the function declaration only).

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 4/7] wt-status: use strbuf_read_cmd
  2015-03-22  9:59   ` [PATCH 0/7] introduce strbuf_read_cmd to avoid deadlocks Jeff King
                       ` (2 preceding siblings ...)
  2015-03-22 10:07     ` [PATCH 3/7] strbuf: introduce strbuf_read_cmd helper Jeff King
@ 2015-03-22 10:08     ` Jeff King
  2015-03-22 10:08     ` [PATCH 5/7] submodule: " Jeff King
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-22 10:08 UTC (permalink / raw)
  To: Wincent Colaiuta; +Cc: git

When we spawn "git submodule status" to read its output, we
use run_command() followed by a strbuf_read() from a pipe.
This can deadlock if the subprocess output is larger than
the system pipe buffer.

Furthermore, if start_command() fails, we'll try to read
from a bogus descriptor (probably "-1" or a descriptor we
just closed, but it is a bad idea for us to make assumptions
about how start_command implements its error handling). And
if start_command succeeds, we leak the file descriptor for
the pipe to the child.

All of these can be solved by using the strbuf_read_cmd
helper.

Signed-off-by: Jeff King <peff@peff.net>
---
 wt-status.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index b47f6d9..fd85d61 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -744,11 +744,8 @@ static void wt_status_print_submodule_summary(struct wt_status *s, int uncommitt

 	sm_summary.git_cmd = 1;
 	sm_summary.no_stdin = 1;
-	sm_summary.out = -1;

-	run_command(&sm_summary);
-
-	strbuf_read(&cmd_stdout, sm_summary.out, 1024);
+	strbuf_read_cmd(&cmd_stdout, &sm_summary, 1024);

 	/* prepend header, only if there's an actual output */
 	if (cmd_stdout.len) {
-- 
2.3.3.618.ga041503

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/7] submodule: use strbuf_read_cmd
  2015-03-22  9:59   ` [PATCH 0/7] introduce strbuf_read_cmd to avoid deadlocks Jeff King
                       ` (3 preceding siblings ...)
  2015-03-22 10:08     ` [PATCH 4/7] wt-status: use strbuf_read_cmd Jeff King
@ 2015-03-22 10:08     ` Jeff King
  2015-03-22 10:09     ` [PATCH 6/7] trailer: " Jeff King
  2015-03-22 10:10     ` [PATCH 7/7] run-command: forbid using run_command with piped output Jeff King
  6 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-22 10:08 UTC (permalink / raw)
  To: Wincent Colaiuta; +Cc: git

In is_submodule_commit_present, we call run_command followed
by a pipe read, which is prone to deadlock. It is unlikely
to happen in this case, as rev-list should never produce
more than a single line of output, but it does not hurt to
avoid an anti-pattern (and using the helper simplifies the
setup and cleanup).

Signed-off-by: Jeff King <peff@peff.net>
---
 submodule.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index d37d400..d85f1bb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -576,12 +576,10 @@ static int is_submodule_commit_present(const char *path, unsigned char sha1[20])
 		cp.env = local_repo_env;
 		cp.git_cmd = 1;
 		cp.no_stdin = 1;
-		cp.out = -1;
 		cp.dir = path;
-		if (!run_command(&cp) && !strbuf_read(&buf, cp.out, 1024))
+		if (!strbuf_read_cmd(&buf, &cp, 1024) && !buf.len)
 			is_present = 1;
 
-		close(cp.out);
 		strbuf_release(&buf);
 	}
 	return is_present;
-- 
2.3.3.618.ga041503

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 6/7] trailer: use strbuf_read_cmd
  2015-03-22  9:59   ` [PATCH 0/7] introduce strbuf_read_cmd to avoid deadlocks Jeff King
                       ` (4 preceding siblings ...)
  2015-03-22 10:08     ` [PATCH 5/7] submodule: " Jeff King
@ 2015-03-22 10:09     ` Jeff King
  2015-03-22 10:10     ` [PATCH 7/7] run-command: forbid using run_command with piped output Jeff King
  6 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-22 10:09 UTC (permalink / raw)
  To: Wincent Colaiuta; +Cc: git

When we read from a trailer.*.command sub-program, the
current code uses run_command followed by a pipe read, which
can result in deadlock (though in practice you would have to
have a large trailer for this to be a problem). The current
code also leaks the file descriptor for the pipe to the
sub-command.

Instead, let's use strbuf_read_cmd, which makes this simpler
(and we can get rid of our custom helper).

Signed-off-by: Jeff King <peff@peff.net>
---
 trailer.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/trailer.c b/trailer.c
index 05b3859..b47bc0e 100644
--- a/trailer.c
+++ b/trailer.c
@@ -214,16 +214,6 @@ static struct trailer_item *remove_first(struct trailer_item **first)
 	return item;
 }
 
-static int read_from_command(struct child_process *cp, struct strbuf *buf)
-{
-	if (run_command(cp))
-		return error("running trailer command '%s' failed", cp->argv[0]);
-	if (strbuf_read(buf, cp->out, 1024) < 1)
-		return error("reading from trailer command '%s' failed", cp->argv[0]);
-	strbuf_trim(buf);
-	return 0;
-}
-
 static const char *apply_command(const char *command, const char *arg)
 {
 	struct strbuf cmd = STRBUF_INIT;
@@ -240,14 +230,16 @@ static const char *apply_command(const char *command, const char *arg)
 	cp.argv = argv;
 	cp.env = local_repo_env;
 	cp.no_stdin = 1;
-	cp.out = -1;
 	cp.use_shell = 1;
 
-	if (read_from_command(&cp, &buf)) {
+	if (strbuf_read_cmd(&buf, &cp, 1024)) {
+		error("running trailer command '%s' failed", cmd.buf);
 		strbuf_release(&buf);
 		result = xstrdup("");
-	} else
+	} else {
+		strbuf_trim(&buf);
 		result = strbuf_detach(&buf, NULL);
+	}
 
 	strbuf_release(&cmd);
 	return result;
-- 
2.3.3.618.ga041503

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 7/7] run-command: forbid using run_command with piped output
  2015-03-22  9:59   ` [PATCH 0/7] introduce strbuf_read_cmd to avoid deadlocks Jeff King
                       ` (5 preceding siblings ...)
  2015-03-22 10:09     ` [PATCH 6/7] trailer: " Jeff King
@ 2015-03-22 10:10     ` Jeff King
  6 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2015-03-22 10:10 UTC (permalink / raw)
  To: Wincent Colaiuta; +Cc: git

Because run_command both spawns and wait()s for the command
before returning control to the caller, any reads from the
pipes we open must necessarily happen after wait() returns.
This can lead to deadlock, as the child process may block on
writing to us while we are blocked waiting for it to exit.

Worse, it only happens when the child fills the pipe buffer,
which means that the problem may come and go depending on
the platform and the size of the output produced by the
child.

Let's detect and flag this dangerous construct so that we
can catch potential bugs early in the test suite rather than
having them happen in the field.

Signed-off-by: Jeff King <peff@peff.net>
---
 run-command.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/run-command.c b/run-command.c
index 3afb124..23636e0 100644
--- a/run-command.c
+++ b/run-command.c
@@ -557,7 +557,12 @@ int finish_command(struct child_process *cmd)

 int run_command(struct child_process *cmd)
 {
-	int code = start_command(cmd);
+	int code;
+
+	if (cmd->out < 0 || cmd->err < 0)
+		die("BUG: run_command with a pipe can cause deadlock");
+
+	code = start_command(cmd);
 	if (code)
 		return code;
 	return finish_command(cmd);
-- 
2.3.3.618.ga041503

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-03-23  4:40 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-03-22  5:56 status hangs trying to get submodule summary Wincent Colaiuta
2015-03-22  7:44 ` [PATCH] status: read submodule process output before calling wait() Jeff King
2015-03-22  8:07   ` Jeff King
2015-03-22  9:59   ` [PATCH 0/7] introduce strbuf_read_cmd to avoid deadlocks Jeff King
2015-03-22 10:00     ` [PATCH 1/7] wt-status: don't flush before running "submodule status" Jeff King
2015-03-22 10:00     ` [PATCH 2/7] wt_status: fix signedness mismatch in strbuf_read call Jeff King
2015-03-22 10:07     ` [PATCH 3/7] strbuf: introduce strbuf_read_cmd helper Jeff King
2015-03-22 19:36       ` Eric Sunshine
2015-03-22 22:54         ` Junio C Hamano
2015-03-22 23:40           ` Junio C Hamano
2015-03-23  3:53             ` [PATCH v2 0/7] introduce capture_command to avoid deadlocks Jeff King
2015-03-23  3:53               ` [PATCH v2 1/7] wt-status: don't flush before running "submodule status" Jeff King
2015-03-23  3:53               ` [PATCH v2 2/7] wt_status: fix signedness mismatch in strbuf_read call Jeff King
2015-03-23  3:53               ` [PATCH v2 3/7] run-command: introduce capture_command helper Jeff King
2015-03-23  3:53               ` [PATCH v2 4/7] wt-status: use capture_command Jeff King
2015-03-23  3:53               ` [PATCH v2 5/7] submodule: " Jeff King
2015-03-23  3:54               ` [PATCH v2 6/7] trailer: " Jeff King
2015-03-23  3:54               ` [PATCH v2 7/7] run-command: forbid using run_command with piped output Jeff King
2015-03-23  4:40               ` [PATCH v2 0/7] introduce capture_command to avoid deadlocks Junio C Hamano
2015-03-22 23:34         ` [PATCH 3/7] strbuf: introduce strbuf_read_cmd helper Jeff King
2015-03-22 23:22       ` Junio C Hamano
2015-03-22 23:36         ` Jeff King
2015-03-22 10:08     ` [PATCH 4/7] wt-status: use strbuf_read_cmd Jeff King
2015-03-22 10:08     ` [PATCH 5/7] submodule: " Jeff King
2015-03-22 10:09     ` [PATCH 6/7] trailer: " Jeff King
2015-03-22 10:10     ` [PATCH 7/7] run-command: forbid using run_command with piped output Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).