From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Josh Steadmon <steadmon@google.com>
Cc: git@vger.kernel.org, gitster@pobox.com, git@jeffhostetler.com
Subject: Re: [PATCH v2] run-command: don't spam trace2_child_exit()
Date: Wed, 08 Jun 2022 00:09:14 +0200
Message-ID: <220608.86pmjkt97e.gmgdl@evledraar.gmail.com>
In-Reply-To: <50d872a057a558fa5519856b95abd048ddb514dc.1654625626.git.steadmon@google.com>

On Tue, Jun 07 2022, Josh Steadmon wrote:

> In rare cases[1], wait_or_whine() cannot determine a child process's
> status (and will return -1 in this case). This can cause Git to issue
> trace2 child_exit events despite the fact that the child may still be
> running. In pathological cases, we've seen > 80 million exit events in
> our trace logs for a single child process.
>
> Fix this by only issuing trace2 events in finish_command_in_signal() if
> we get a value other than -1 from wait_or_whine(). This can lead to
> missing child_exit events in such a case, but that is preferable to
> duplicating events on a scale that threatens to fill the user's
> filesystem with invalid trace logs.
>
> [1]: This can happen when:
>
> * waitpid() returns -1 and errno != EINTR
> * waitpid() returns an invalid PID
> * the status set by waitpid() has neither the WIFEXITED() nor
> WIFSIGNALED() flags
>
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
> Updated the commit message with more details about when wait_or_whine()
> can fail.
>
> run-command.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/run-command.c b/run-command.c
> index a8501e38ce..e0fe2418a2 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -983,7 +983,8 @@ int finish_command(struct child_process *cmd)
>  int finish_command_in_signal(struct child_process *cmd)
>  {
>  	int ret = wait_or_whine(cmd->pid, cmd->args.v[0], 1);
> -	trace2_child_exit(cmd, ret);
> +	if (ret != -1)
> +		trace2_child_exit(cmd, ret);
>  	return ret;
>  }
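
To make the three cases in [1] concrete, here is a rough standalone
sketch of how the outcomes of a waitpid() call could be told apart. It
is not the code in run-command.c: the enum and classify_wait() are
hypothetical names invented for illustration, and only waitpid(),
errno/EINTR, WIFEXITED() and WIFSIGNALED() are real interfaces.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical names, for illustration only; not from run-command.c. */
enum wait_outcome {
	WAIT_EXITED,	/* child exited normally; exit code is known */
	WAIT_SIGNALED,	/* child was killed by a signal */
	WAIT_RETRY,	/* waitpid() was interrupted (EINTR); try again */
	WAIT_FAILED,	/* waitpid() returned -1 and errno != EINTR */
	WAIT_WRONG_PID,	/* waitpid() reported a PID we did not ask for */
	WAIT_UNKNOWN	/* status is neither "exited" nor "signaled" */
};

/*
 * Classify the result of one waitpid() call. "waited" is its return
 * value, "status" is what it stored, and "saved_errno" is errno as
 * captured right after the call.
 */
static enum wait_outcome classify_wait(pid_t expected, pid_t waited,
				       int status, int saved_errno)
{
	if (waited < 0)
		return saved_errno == EINTR ? WAIT_RETRY : WAIT_FAILED;
	if (waited != expected)
		return WAIT_WRONG_PID;
	if (WIFEXITED(status))
		return WAIT_EXITED;
	if (WIFSIGNALED(status))
		return WAIT_SIGNALED;
	return WAIT_UNKNOWN;
}
```

In this sketch WAIT_FAILED, WAIT_WRONG_PID and WAIT_UNKNOWN correspond
to the three bullets above, i.e. the cases where the child's status
cannot be determined.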

This seems like a legitimate issue, but I really don't think we should
sweep it under the rug like this.

 * Why can't we check in common_exit() whether we've already logged
   such an event, and call trace2_child_exit() (or similar) if we
   haven't? I.e. never miss an event.

 * Should this really be an "exit" event? Aren't some of these failed
   signal events? See the "should this be an exit event?" question in
   my related "signal on BUG" series.

 * We should have tests for this, e.g. in t0210, checking the exact
   events we emit in these cases. Perhaps we can instrument a simulated
   failure with some GIT_TEST_* variables?
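
To sketch what I mean by the first point: something along these lines
would log the event at most once per child, so a fallback call on the
exit path could not duplicate an event that was already logged. This is
only an illustration under assumptions. struct child_state, the
exit_logged flag, log_child_exit() and log_child_exit_once() are
hypothetical stand-ins, not existing fields or functions in
run-command.c or trace2.

```c
#include <stdio.h>

/* Hypothetical stand-in for per-child state; not struct child_process. */
struct child_state {
	int pid;
	int exit_logged;	/* nonzero once an exit event was emitted */
};

static int events_emitted;	/* counts emitted events, demo only */

/* Hypothetical stand-in for trace2_child_exit(). */
static void log_child_exit(const struct child_state *child, int code)
{
	events_emitted++;
	fprintf(stderr, "child_exit pid:%d code:%d\n", child->pid, code);
}

/* Emit the exit event unless one was already emitted for this child. */
static void log_child_exit_once(struct child_state *child, int code)
{
	if (child->exit_logged)
		return;
	child->exit_logged = 1;
	log_child_exit(child, code);
}
```

With a guard like this, both the signal-handler path and a final check
at exit could call log_child_exit_once() unconditionally: repeated
calls for the same child emit one event instead of millions, and a
child whose status could not be determined still gets one.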