* [PATCH] daemon: handle EINTR failures from waitpid()
From: Carlo Marcelo Arenas Belón @ 2025-06-30 4:13 UTC
To: git
Cc: Stephen R . van den Berg, Erik Faye-Lund,
Carlo Marcelo Arenas Belón
Since 695605b508 (git-daemon: Simplify dead-children reaping logic,
2008-08-14), the logic to check for zombie children was moved out of
the SIGCHLD signal handler, but the checks for a failed waitpid() were
never added, so a badly timed signal could prevent the prompt reaping
of those defunct processes.

After the refactoring in 30e1560230 (daemon: use run-command api for
async serving, 2010-11-04), which reproduced that bug, a single
process could be skipped during reaping. Prevent that by adding the
missing error handling, and while at it, make sure that ECHILD (or
other errors) are correctly reported as a BUG().

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
daemon.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/daemon.c b/daemon.c
index d1be61fd57..16ae66a2da 100644
--- a/daemon.c
+++ b/daemon.c
@@ -864,8 +864,11 @@ static void check_dead_children(void)
 			live_children--;
 			child_process_clear(&blanket->cld);
 			free(blanket);
-		} else
+		} else if (!pid)
 			cradle = &blanket->next;
+		else if (errno != EINTR)
+			BUG("invalid child '%" PRIuMAX "'",
+			    (uintmax_t)blanket->cld.pid);
 }
 
 static struct strvec cld_argv = STRVEC_INIT;
--
2.50.0.132.g32f443f09a.dirty
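The three outcomes the patch distinguishes can be sketched as a standalone C program. This is a minimal illustration, not git's code: `try_reap()`, `reap_pass()`, `enum reap_result` and the `SKETCH_BUG()` macro (a stand-in for git's `BUG()`) are hypothetical names, and a plain array replaces daemon.c's linked list of children. As in the patched loop, a child whose `waitpid()` fails with EINTR is not advanced past, so it is examined again immediately.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for git's BUG(): report and abort. */
#define SKETCH_BUG(...) do { \
	fprintf(stderr, "BUG: " __VA_ARGS__); \
	fputc('\n', stderr); \
	abort(); \
} while (0)

enum reap_result { REAPED, STILL_RUNNING, INTERRUPTED };

/* Poll one child without blocking and classify the result. */
static enum reap_result try_reap(pid_t child, int *status)
{
	pid_t pid;

	assert(child > 0); /* 0 or -1 would mean "any child" to waitpid() */
	pid = waitpid(child, status, WNOHANG);
	if (pid > 0) {
		return REAPED;        /* exited: the caller drops it */
	} else if (!pid) {
		return STILL_RUNNING; /* keep it for a later pass */
	} else if (errno == EINTR) {
		return INTERRUPTED;   /* interrupted: the caller retries it */
	}
	/* ECHILD (or anything else) means our bookkeeping is wrong. */
	SKETCH_BUG("invalid child '%" PRIuMAX "'", (uintmax_t)child);
}

/*
 * One pass over an array of children (standing in for daemon.c's list).
 * Returns how many children were reaped during this pass.
 */
static int reap_pass(pid_t *children, int *count)
{
	int i = 0, reaped = 0, status;

	while (i < *count) {
		switch (try_reap(children[i], &status)) {
		case REAPED:
			/* Move the last child into slot i and re-check slot i. */
			children[i] = children[--*count];
			reaped++;
			break;
		case STILL_RUNNING:
			i++;   /* move on to the next child */
			break;
		case INTERRUPTED:
			break; /* same child again, like the patched loop */
		}
	}
	return reaped;
}
```

With two children that exit immediately, calling `reap_pass()` until the count drops to zero reaps both; a child that is still running is simply left in the array for a later pass, much as check_dead_children() is called again later.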
* Re: [PATCH] daemon: handle EINTR failures from waitpid()
From: Phillip Wood @ 2025-06-30 9:00 UTC
To: Carlo Marcelo Arenas Belón, git
Cc: Stephen R . van den Berg, Erik Faye-Lund
Hi Carlo
On 30/06/2025 05:13, Carlo Marcelo Arenas Belón wrote:
> Since 695605b508 (git-daemon: Simplify dead-children reaping logic,
> 2008-08-14), the logic to check for zombie children was moved out of
> the SIGCHLD signal handler, but the checks for a failed waitpid() were
> never added, so a badly timed signal could prevent the prompt reaping
> of those defunct processes.
>
> After the refactoring in 30e1560230 (daemon: use run-command api for
> async serving, 2010-11-04), which reproduced that bug, a single
> process could be skipped during reaping. Prevent that by adding the
> missing error handling, and while at it, make sure that ECHILD (or
> other errors) are correctly reported as a BUG().
I agree with your analysis; I've left a couple of comments on the fix. I
noticed this when I was reading the code to see how well it handled
EINTR and decided it wasn't worth worrying about, as we still collect the
child the next time we call check_dead_children(), but there is no harm
in checking for EINTR here. It might be worth noting in the commit
message, though, that the Linux man page for waitpid() explicitly says
that EINTR cannot happen when WNOHANG is given. I wonder if that is the
case on other platforms as well, because the calling thread is not
suspended and EINTR is usually associated with calls that block.
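The contrast drawn here, between a non-blocking probe and a blocking wait, can be sketched in C. This is illustrative only: `probe_child()` and `wait_child()` are hypothetical helpers, and the retry-on-EINTR loop is the conventional idiom for a blocking `waitpid()`, not something this patch adds. Whether a `WNOHANG` call can ever fail with EINTR on a given platform is exactly the open question, so the probe assumes nothing and returns the raw result.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Non-blocking probe: returns at once, with 0 if the child is still
 * running. The caller is never suspended, which is why the Linux man
 * page rules out EINTR here; other platforms may document otherwise.
 */
static pid_t probe_child(pid_t child, int *status)
{
	assert(child > 0); /* only ever ask about one specific child */
	return waitpid(child, status, WNOHANG);
}

/*
 * Blocking wait: the caller can be suspended, so a signal handler
 * installed without SA_RESTART may interrupt it with EINTR; retry then.
 */
static pid_t wait_child(pid_t child, int *status)
{
	pid_t pid;

	assert(child > 0);
	do {
		pid = waitpid(child, status, 0);
	} while (pid < 0 && errno == EINTR);
	return pid;
}
```

A caller that cannot rule out EINTR from the probe can treat it like a 0 return and look again later, which is effectively what the loop did before this patch.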
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
> daemon.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/daemon.c b/daemon.c
> index d1be61fd57..16ae66a2da 100644
> --- a/daemon.c
> +++ b/daemon.c
> @@ -864,8 +864,11 @@ static void check_dead_children(void)
> live_children--;
> child_process_clear(&blanket->cld);
> free(blanket);
> - } else
> + } else if (!pid)
Our style guidelines say that if one clause of an if statement needs
braces then all the clauses should be braced.
> cradle = &blanket->next;
> + else if (errno != EINTR)
> + BUG("invalid child '%" PRIuMAX "'",
> + (uintmax_t)blanket->cld.pid);
POSIX says pid_t is signed, so I'm not sure about the unsigned cast here.
Do any of the platforms we support have a pid_t that is wider than a
long integer? I wondered if we should be logging an error instead of
calling BUG(), but I think any error other than EINTR indicates a
programming error, so BUG() seems appropriate.
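The two ways of printing a pid_t under discussion can be compared with a small sketch. The helper names `format_pid_unsigned()`, `format_pid_signed()` and `pid_formats_agree()` are hypothetical; the unsigned variant follows the `(uintmax_t)` plus `PRIuMAX` convention used in the patch, while the signed one uses `intmax_t` and `PRIdMAX`. For any positive PID both produce the same digits; they differ only for negative values such as `(pid_t)-1`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* intmax_t is the widest signed type, so this documents an invariant. */
_Static_assert(sizeof(pid_t) <= sizeof(intmax_t),
	       "pid_t is wider than intmax_t");

/* Convention used in the patch: cast to uintmax_t, print with PRIuMAX. */
static int format_pid_unsigned(char *buf, size_t len, pid_t pid)
{
	return snprintf(buf, len, "%" PRIuMAX, (uintmax_t)pid);
}

/* Signed alternative: cast to intmax_t, print with PRIdMAX. */
static int format_pid_signed(char *buf, size_t len, pid_t pid)
{
	return snprintf(buf, len, "%" PRIdMAX, (intmax_t)pid);
}

/* Nonzero when both conventions print the same text for pid. */
static int pid_formats_agree(pid_t pid)
{
	char u[32], s[32];
	int nu = format_pid_unsigned(u, sizeof(u), pid);
	int ns = format_pid_signed(s, sizeof(s), pid);

	/* 32 bytes hold any 64-bit value, so nothing is truncated. */
	assert(nu > 0 && (size_t)nu < sizeof(u));
	assert(ns > 0 && (size_t)ns < sizeof(s));
	return strcmp(u, s) == 0;
}
```

For example, `pid_formats_agree((pid_t)12345)` is true, while `pid_formats_agree((pid_t)-1)` is false because the unsigned cast turns -1 into `UINTMAX_MAX`.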
Thanks
Phillip
* Re: [PATCH] daemon: handle EINTR failures from waitpid()
From: Carlo Marcelo Arenas Belón @ 2025-06-30 12:18 UTC
To: phillip.wood; +Cc: git, Stephen R . van den Berg, Erik Faye-Lund
On Mon, Jun 30, 2025 at 10:00:09AM -0800, Phillip Wood wrote:
>
> On 30/06/2025 05:13, Carlo Marcelo Arenas Belón wrote:
> > Since 695605b508 (git-daemon: Simplify dead-children reaping logic,
> > 2008-08-14), the logic to check for zombie children was moved out of
> > the SIGCHLD signal handler, but the checks for a failed waitpid() were
> > never added, so a badly timed signal could prevent the prompt reaping
> > of those defunct processes.
> >
> > After the refactoring in 30e1560230 (daemon: use run-command api for
> > async serving, 2010-11-04), which reproduced that bug, a single
> > process could be skipped during reaping. Prevent that by adding the
> > missing error handling, and while at it, make sure that ECHILD (or
> > other errors) are correctly reported as a BUG().
>
> I agree with your analysis; I've left a couple of comments on the fix. I
> noticed this when I was reading the code to see how well it handled EINTR
> and decided it wasn't worth worrying about, as we still collect the child
> the next time we call check_dead_children(), but there is no harm in
> checking for EINTR here. It might be worth noting in the commit message,
> though, that the Linux man page for waitpid() explicitly says that EINTR
> cannot happen when WNOHANG is given. I wonder if that is the case on other
> platforms as well, because the calling thread is not suspended and EINTR
> is usually associated with calls that block.
I wasn't aware of the comment in the Linux man page, and didn't see
anything similar in the other man pages I checked or in the POSIX
specification.

If WNOHANG prevents waitpid() from returning -1 with errno == EINTR,
then my analysis is incorrect, and the last refactoring is the only one
to blame, as it didn't add error handling for ECHILD.

More importantly, if we consider that, regardless of the comment in the
Linux man page (Google found something similar in the one from zVM),
that behaviour is implementation-dependent, it might be worth also
fixing a similar use case in run_command.
> > cradle = &blanket->next;
> > + else if (errno != EINTR)
> > + BUG("invalid child '%" PRIuMAX "'",
> > + (uintmax_t)blanket->cld.pid);
>
> POSIX says pid_t is signed so I'm not sure about the unsigned cast here.
But that is only so that `(pid_t)-1` is valid, AFAIK; all "real" PIDs
are expected to be positive (even on systems where pid_t is 8 bytes
long, like Solaris).

Casting them to unsigned and printing them as uintmax_t is how all PIDs
have been printed since 85e7283069 (cast pid_t's to uintmax_t to
improve portability, 2008-08-31), AFAIK.
> Do
> any of the platforms we support have a pid_t that is wider than a long
> integer?
The ones on AIX are pretty large, but definitely no larger than INT_MAX
(pid_t is 4 bytes long there).
Carlo