public inbox for git@vger.kernel.org
* [PATCH] daemon: handle EINTR failures from waitpid()
@ 2025-06-30  4:13 Carlo Marcelo Arenas Belón
  2025-06-30  9:00 ` Phillip Wood
  0 siblings, 1 reply; 3+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2025-06-30  4:13 UTC (permalink / raw)
  To: git
  Cc: Stephen R . van den Berg, Erik Faye-Lund,
	Carlo Marcelo Arenas Belón

In 695605b508 (git-daemon: Simplify dead-children reaping logic,
2008-08-14), the logic to check for zombie children was moved out of
the SIGCHLD signal handler, but checks for a failed waitpid() were
not added, so a badly timed signal could prevent the prompt reaping
of those defunct processes.

The refactoring in 30e1560230 (daemon: use run-command api for async
serving, 2010-11-04) carried that bug over, so a single process could
be skipped when reaping. Prevent that by adding the missing error
handling and, while at it, make sure that ECHILD (or any other error)
is reported as a BUG().

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 daemon.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/daemon.c b/daemon.c
index d1be61fd57..16ae66a2da 100644
--- a/daemon.c
+++ b/daemon.c
@@ -864,8 +864,11 @@ static void check_dead_children(void)
 			live_children--;
 			child_process_clear(&blanket->cld);
 			free(blanket);
-		} else
+		} else if (!pid)
 			cradle = &blanket->next;
+		else if (errno != EINTR)
+			BUG("invalid child '%" PRIuMAX "'",
+			    (uintmax_t)blanket->cld.pid);
 }
 
 static struct strvec cld_argv = STRVEC_INIT;
-- 
2.50.0.132.g32f443f09a.dirty



* Re: [PATCH] daemon: handle EINTR failures from waitpid()
  2025-06-30  4:13 [PATCH] daemon: handle EINTR failures from waitpid() Carlo Marcelo Arenas Belón
@ 2025-06-30  9:00 ` Phillip Wood
  2025-06-30 12:18   ` Carlo Marcelo Arenas Belón
  0 siblings, 1 reply; 3+ messages in thread
From: Phillip Wood @ 2025-06-30  9:00 UTC (permalink / raw)
  To: Carlo Marcelo Arenas Belón, git
  Cc: Stephen R . van den Berg, Erik Faye-Lund

Hi Carlo

On 30/06/2025 05:13, Carlo Marcelo Arenas Belón wrote:
> In 695605b508 (git-daemon: Simplify dead-children reaping logic,
> 2008-08-14), the logic to check for zombie children was moved out of
> the SIGCHLD signal handler, but checks for a failed waitpid() were
> not added, so a badly timed signal could prevent the prompt reaping
> of those defunct processes.
>
> The refactoring in 30e1560230 (daemon: use run-command api for async
> serving, 2010-11-04) carried that bug over, so a single process could
> be skipped when reaping. Prevent that by adding the missing error
> handling and, while at it, make sure that ECHILD (or any other error)
> is reported as a BUG().

I agree with your analysis; I've left a couple of comments on the fix. I
noticed this when I was reading the code to see how well it handled
EINTR and decided it wasn't worth worrying about, as we still collect the
child the next time we call check_dead_children(), but there is no harm
in checking for EINTR here. It might be worth noting in the commit
message, though, that the Linux man page for waitpid() explicitly says
that EINTR cannot happen when WNOHANG is given. I wonder if that is the
case on other platforms as well, because the calling thread is not
suspended and EINTR is usually associated with calls that block.
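
For reference, the outcomes a non-blocking waitpid() call can have are
sketched below in C. This is an illustrative helper, not code from
daemon.c: the names `try_reap`, `enum reap_result` and `spawn_and_poll`
are invented here, and the EINTR branch is kept only because it is
unclear whether every platform rules it out the way the Linux man page
does.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical classification of a single waitpid(..., WNOHANG) call. */
enum reap_result {
	REAP_DONE,	/* child changed state and was collected */
	REAP_RUNNING,	/* child exists but has not changed state yet */
	REAP_RETRY,	/* call was interrupted (EINTR); poll again later */
	REAP_ERROR	/* ECHILD or any other unexpected failure */
};

/* Poll one child without blocking and classify the result. */
static enum reap_result try_reap(pid_t child)
{
	int status;
	pid_t pid = waitpid(child, &status, WNOHANG);

	if (pid > 0) {
		return REAP_DONE;
	} else if (!pid) {
		return REAP_RUNNING;
	} else if (errno == EINTR) {
		return REAP_RETRY;
	} else {
		return REAP_ERROR;
	}
}

/* Usage demo: fork a short-lived child and busy-poll until it is collected. */
static enum reap_result spawn_and_poll(void)
{
	enum reap_result r;
	pid_t child = fork();

	if (child < 0)
		return REAP_ERROR;
	if (!child)
		_exit(0);
	do {
		r = try_reap(child);
	} while (r == REAP_RUNNING || r == REAP_RETRY);
	return r;
}
```

A caller would treat REAP_RETRY like REAP_RUNNING and simply poll again
later. Asking about a pid that is not a child of the caller fails with
ECHILD and is reported as REAP_ERROR.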

> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
>   daemon.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/daemon.c b/daemon.c
> index d1be61fd57..16ae66a2da 100644
> --- a/daemon.c
> +++ b/daemon.c
> @@ -864,8 +864,11 @@ static void check_dead_children(void)
>   			live_children--;
>   			child_process_clear(&blanket->cld);
>   			free(blanket);
> -		} else
> +		} else if (!pid)

Our style guidelines say that if one clause of an if statement needs 
braces then all the clauses should be braced.
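
A sketch of how a fully braced version of such a chain could look, on a
toy list rather than the daemon's real one. `struct node`, its `state`
field and `prune_exited` are hypothetical stand-ins (`state` plays the
role of waitpid()'s return value), and the final branch only counts the
failure and moves on, where the patch would call BUG().

```c
#include <stdlib.h>

/* Hypothetical stand-in for the daemon's child list; not git's real type. */
struct node {
	struct node *next;
	int state;	/* >0: exited, 0: still running, <0: wait failed */
};

/*
 * Walk the list through a pointer-to-pointer so that entries can be
 * unlinked in place. Returns the number of entries removed and adds
 * the number of failed entries to *failures.
 */
static int prune_exited(struct node **cradle, int *failures)
{
	struct node *blanket;
	int removed = 0;

	while ((blanket = *cradle)) {
		if (blanket->state > 0) {
			/* unlink: *cradle now skips the removed entry */
			*cradle = blanket->next;
			free(blanket);
			removed++;
		} else if (!blanket->state) {
			cradle = &blanket->next;
		} else {
			/* toy behaviour only: count the failure, keep the entry */
			(*failures)++;
			cradle = &blanket->next;
		}
	}
	return removed;
}
```

Because `cradle` always points at the link that leads to the current
entry, removing the first, a middle, or the last entry needs no special
case.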

>   			cradle = &blanket->next;
> +		else if (errno != EINTR)
> +			BUG("invalid child '%" PRIuMAX "'",
> +			    (uintmax_t)blanket->cld.pid);

POSIX says pid_t is signed so I'm not sure about the unsigned cast here.
Do any of the platforms we support have a pid_t that is wider than a
long integer? I wondered if we should be logging an error instead of
calling BUG(), but I think any error other than EINTR indicates a
programming error, so BUG() seems appropriate.

Thanks

Phillip



* Re: [PATCH] daemon: handle EINTR failures from waitpid()
  2025-06-30  9:00 ` Phillip Wood
@ 2025-06-30 12:18   ` Carlo Marcelo Arenas Belón
  0 siblings, 0 replies; 3+ messages in thread
From: Carlo Marcelo Arenas Belón @ 2025-06-30 12:18 UTC (permalink / raw)
  To: phillip.wood; +Cc: git, Stephen R . van den Berg, Erik Faye-Lund

On Mon, Jun 30, 2025 at 10:00:09AM +0100, Phillip Wood wrote:
> 
> On 30/06/2025 05:13, Carlo Marcelo Arenas Belón wrote:
> > In 695605b508 (git-daemon: Simplify dead-children reaping logic,
> > 2008-08-14), the logic to check for zombie children was moved out of
> > the SIGCHLD signal handler, but checks for a failed waitpid() were
> > not added, so a badly timed signal could prevent the prompt reaping
> > of those defunct processes.
> >
> > The refactoring in 30e1560230 (daemon: use run-command api for async
> > serving, 2010-11-04) carried that bug over, so a single process could
> > be skipped when reaping. Prevent that by adding the missing error
> > handling and, while at it, make sure that ECHILD (or any other error)
> > is reported as a BUG().
> 
> I agree with your analysis; I've left a couple of comments on the fix. I
> noticed this when I was reading the code to see how well it handled EINTR
> and decided it wasn't worth worrying about, as we still collect the child
> the next time we call check_dead_children(), but there is no harm in
> checking for EINTR here. It might be worth noting in the commit message,
> though, that the Linux man page for waitpid() explicitly says that EINTR
> cannot happen when WNOHANG is given. I wonder if that is the case on other
> platforms as well, because the calling thread is not suspended and EINTR
> is usually associated with calls that block.

I wasn't aware of the comment in the Linux man page, and didn't see
anything similar in the ones I checked or in the POSIX specification.

If WNOHANG prevents it from returning -1 with errno == EINTR, then my analysis
is incorrect, and the last refactoring is the only one to blame, as it didn't
add error handling for ECHILD.

More importantly, if we consider that behaviour to be implementation
dependent regardless of the comment in the Linux man page (Google found
something similar in the one from zVM), it might be worth also fixing a
similar use case in run_command.
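
If EINTR is treated as possible regardless of the flags, the usual
defensive pattern is to retry the call. A minimal sketch in C, assuming
hypothetical helpers named `waitpid_retry` and `spawn_and_collect`; it is
not taken from run_command and only illustrates the idea:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Call waitpid() again for as long as it fails with EINTR. */
static pid_t waitpid_retry(pid_t child, int *status, int flags)
{
	pid_t pid;

	do {
		pid = waitpid(child, status, flags);
	} while (pid < 0 && errno == EINTR);
	return pid;
}

/*
 * Usage demo: fork a child that exits with `code`, collect it, and
 * return its exit status (or -1 if anything went wrong).
 */
static int spawn_and_collect(int code)
{
	int status;
	pid_t child = fork();

	if (child < 0)
		return -1;
	if (!child)
		_exit(code);
	if (waitpid_retry(child, &status, 0) != child)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Any failure other than EINTR (ECHILD, for instance) still reaches the
caller through the -1 return value and errno, so it can be reported
separately.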

> >   			cradle = &blanket->next;
> > +		else if (errno != EINTR)
> > +			BUG("invalid child '%" PRIuMAX "'",
> > +			    (uintmax_t)blanket->cld.pid);
> 
> POSIX says pid_t is signed so I'm not sure about the unsigned cast here.

but that is only so that a `(pid_t)-1` is valid AFAIK, and all "real" pids
are expected to be positive (even on systems where pid_t is an 8-byte long,
like Solaris).

casting them to unsigned and printing them as uintmax_t is how all pids
have been printed since 85e7283069 (cast pid_t's to uintmax_t to
improve portability, 2008-08-31) AFAIK.
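
To make the effect of that cast concrete, here is a small sketch in C
(the helper name `format_pid` is made up for illustration). A positive
pid prints unchanged, while a negative value such as `(pid_t)-1` is
converted modulo UINTMAX_MAX + 1 and therefore prints as UINTMAX_MAX:

```c
#include <inttypes.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format a pid the way the patch does: cast it to uintmax_t and print
 * it with PRIuMAX. Returns what snprintf() returns.
 */
static int format_pid(char *buf, size_t len, pid_t pid)
{
	return snprintf(buf, len, "%" PRIuMAX, (uintmax_t)pid);
}
```

Since the pids stored for live children are positive, the wrap-around
only matters if an error value like -1 is ever passed to the format.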

> Do
> any of the platforms we support have a pid_t that is wider than a long
> integer?

the ones in AIX are pretty large, but definitely no larger than INT_MAX (with
pid_t being 4 bytes long there).

Carlo

