* [bug] git clone command leaves orphaned ssh process
From: Max Amelchenko @ 2023-09-10  6:38 UTC
To: git

What did you do before the bug happened? (Steps to reproduce your issue)

Run the command:
ps aux
Observe no ssh processes running on system.

Run git clone against a non-existent hostname:
git clone -v --depth=1 -b 3.23.66
ssh://*****@*****lab-prod.server.sim.cloud/terraform/modules/aws-eks
/tmp/dest
Observe the command fails with:

Could not resolve hostname *****lab-prod.server.sim.cloud: Name or
service not known

Run:
ps aux

Observe a defunct ssh process is left behind.

What did you expect to happen? (Expected behavior)

I expected the command to quit without leaving any processes behind.

What happened instead? (Actual behavior)

The command quit and left a defunct ssh process on the system.

What's different between what you expected and what actually happened?

I don't want zombie processes left after any git command (either
failed or not).

Anything else you want to add:

These processes are zombie orphaned, meaning we're stuck with them
until system reboot (which is bad).

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.

[System Info]
git version:
git version 2.40.1
cpu: aarch64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
compiler info: gnuc: 7.3
libc info: glibc: 2.26
$SHELL (typically, interactive shell): <unset>

[Enabled Hooks]
not run from a git repository - no hooks to show

* Re: [bug] git clone command leaves orphaned ssh process
From: Bagas Sanjaya @ 2023-09-10  8:50 UTC
To: Max Amelchenko, git; +Cc: Hideaki Yoshifuji, Junio C Hamano

On Sun, Sep 10, 2023 at 09:38:54AM +0300, Max Amelchenko wrote:
> What did you do before the bug happened? (Steps to reproduce your issue)
>
> Run the command:
> ps aux
> Observe no ssh processes running on system.
>
> Run git clone against a non-existent hostname:
> git clone -v --depth=1 -b 3.23.66
> ssh://*****@*****lab-prod.server.sim.cloud/terraform/modules/aws-eks
> /tmp/dest
> Observe the command fails with:
>
> Could not resolve hostname *****lab-prod.server.sim.cloud: Name or
> service not known
>
> Run:
> ps aux
>
> Observe a defunct ssh process is left behind.

On git current master on my system, I got sshd (server) processes instead:

```
root         835  0.0  0.0  15500  3584 ?        Ss   14:38   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
165536      3865  0.0  0.0   8488  1408 ?        Ss   14:39   0:00 sshd: /usr/sbin/sshd -D -e [listener] 0 of 10-100 startups
165536      4039  0.0  0.0  11308  1920 ?        Ss   14:40   0:00 sshd: /usr/bin/sshd -D [listener] 0 of 10-100 startups
165536      4374  0.0  0.0  15404  1920 ?        Ss   14:40   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
165536      4399  0.0  0.0  15404  1792 ?        Ss   14:40   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
165536      4732  0.0  0.0  15404  2048 ?        Ss   14:41   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
165536      4943  0.0  0.0  18004   848 ?        Ss   14:41   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
bagas       6841  0.0  0.0   7668  1092 ?        Ss   14:43   0:00 /usr/bin/ssh-agent /usr/bin/im-launch /usr/bin/gnome-session
bagas       6908  0.0  0.1 162780  5488 ?        Ssl  14:43   0:00 /usr/libexec/gcr-ssh-agent /run/user/1000/gcr
```

What is your ps output then?

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

* Re: [bug] git clone command leaves orphaned ssh process
From: Max Amelchenko @ 2023-09-10  9:47 UTC
To: Bagas Sanjaya; +Cc: git, Hideaki Yoshifuji, Junio C Hamano

Output of first ps aux command:

bash-4.2# ps aux
USER  PID %CPU %MEM    VSZ  RSS TTY   STAT START TIME COMMAND
root    1  0.0  0.0 715708 5144 pts/0 Ssl+ 09:43 0:00 /usr/local/bin/aws-lambda-rie /var/runtime/bootstrap
root   14  0.1  0.0 114096 3088 pts/1 Ss   09:43 0:00 bash
root  165  0.0  0.0 118296 3392 pts/1 R+   09:45 0:00 ps aux

Output of second ps aux command (after running git clone):

bash-4.2# ps aux
USER  PID %CPU %MEM    VSZ  RSS TTY   STAT START TIME COMMAND
root    1  0.0  0.0 715708 5144 pts/0 Ssl+ 09:43 0:00 /usr/local/bin/aws-lambda-rie /var/runtime/bootstrap
root   14  0.0  0.0 114096 3088 pts/1 Ss   09:43 0:00 bash
root  167  0.5  0.0      0    0 pts/1 Z    09:46 0:00 [ssh] <defunct>
root  168  0.0  0.0 118296 3408 pts/1 R+   09:46 0:00 ps aux

See the added ssh defunct process.

* Re: [bug] git clone command leaves orphaned ssh process
From: Taylor Blau @ 2023-09-10 18:47 UTC
To: Max Amelchenko; +Cc: Bagas Sanjaya, git, Hideaki Yoshifuji, Junio C Hamano

On Sun, Sep 10, 2023 at 12:47:14PM +0300, Max Amelchenko wrote:
> Output of second ps aux command (after running git clone):
>
> bash-4.2# ps aux
>
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>
> root 1 0.0 0.0 715708 5144 pts/0 Ssl+ 09:43 0:00
> /usr/local/bin/aws-lambda-rie /var/runtime/bootstrap
>
> root 14 0.0 0.0 114096 3088 pts/1 Ss 09:43 0:00 bash
>
> root 167 0.5 0.0 0 0 pts/1 Z 09:46 0:00 [ssh] <defunct>
>
> root 168 0.0 0.0 118296 3408 pts/1 R+ 09:46 0:00 ps aux
>
> See the added ssh defunct process.

Hmm... I wasn't quite able to reproduce this locally. Below
`git.compile` points to a Git executable built from the v2.40.1 tag
corresponding to your bug report:

    $ host='ssh://*****@*****lab-prod.server.sim.cloud/terraform/modules/aws-eks'
    $ git.compile clone "$host" /tmp/x
    Cloning into '/tmp/x'...
    ssh: Could not resolve hostname *****lab-prod.server.sim.cloud: Name or service not known
    fatal: Could not read from remote repository.

    Please make sure you have the correct access rights
    and the repository exists.

and then:

    $ ps aux | grep defunct
    ttaylorr 3688844  0.0  0.0   6340  2180 pts/1    S+   14:45   0:00 grep --color defunct

Thanks,
Taylor

* Re: [bug] git clone command leaves orphaned ssh process
From: Max Amelchenko @ 2023-09-11 10:11 UTC
To: Taylor Blau; +Cc: Bagas Sanjaya, git, Hideaki Yoshifuji, Junio C Hamano

Maybe it's connected also to the underlying infrastructure? We are
getting this in AWS lambda jobs and we're hitting a system limit of
max processes because of it.
Can you try running this inside this image public.ecr.aws/lambda/python?

* Re: [bug] git clone command leaves orphaned ssh process
From: Aaron Schrab @ 2023-09-12  0:40 UTC
To: Max Amelchenko
Cc: Taylor Blau, Bagas Sanjaya, git, Hideaki Yoshifuji, Junio C Hamano

At 13:11 +0300 11 Sep 2023, Max Amelchenko <maxamel2002@gmail.com> wrote:
>Maybe it's connected also to the underlying infrastructure? We are
>getting this in AWS lambda jobs and we're hitting a system limit of
>max processes because of it.

Running as a lambda, or in a container, could definitely be why you're
seeing a difference. Normally when a process is orphaned it gets adopted by
`init` (PID 1), and that will take care of cleaning up after orphaned zombie
processes.

But most of the time containers just run the configured process directly,
without an init process. That leaves nothing to clean up orphan processes.

Although for that to really be a problem, it would require hitting that max
process limit inside a single container invocation. Of course since
containers usually aren't meant to be spawning a lot of processes, that
limit might be a lot lower than on a normal system.

I know that Docker provides a way to include an init process in the started
container (`docker run --init`), but I don't think that AWS Lambda does.

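To make the zombie state discussed in this message concrete, here is a minimal Python sketch (Linux only, because it reads `/proc`) of how a `Z` entry appears when a parent does not wait for an exited child, and how a single `waitpid()` call clears it. The helper names `process_state` and `make_zombie` are hypothetical and exist only for this illustration; nothing here is taken from Git or ssh.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Layout is "pid (comm) state ..."; comm may itself contain ')' or spaces,
    # so split on the last closing parenthesis.
    return data.rsplit(")", 1)[1].split()[0]


def make_zombie(timeout=5.0):
    """Fork a child that exits at once; deliberately do not wait for it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # Parent: poll until the kernel reports the exited child as a zombie.
    deadline = time.monotonic() + timeout
    while process_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    return pid
```

After `make_zombie()` returns, `ps aux` shows the child as `<defunct>` until something calls `os.waitpid(pid, 0)`. On a typical system, PID 1 performs that call for orphans it adopts; the suggestion in this thread is that the container's PID 1 may not.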
* Re: [bug] git clone command leaves orphaned ssh process
From: Jeff King @ 2023-09-12  4:33 UTC
To: Aaron Schrab
Cc: Max Amelchenko, Taylor Blau, Bagas Sanjaya, git, Hideaki Yoshifuji, Junio C Hamano

On Mon, Sep 11, 2023 at 08:40:49PM -0400, Aaron Schrab wrote:

> At 13:11 +0300 11 Sep 2023, Max Amelchenko <maxamel2002@gmail.com> wrote:
> > Maybe it's connected also to the underlying infrastructure? We are
> > getting this in AWS lambda jobs and we're hitting a system limit of
> > max processes because of it.
>
> Running as a lambda, or in a container, could definitely be why you're
> seeing a difference. Normally when a process is orphaned it gets adopted by
> `init` (PID 1), and that will take care of cleaning up after orphaned zombie
> processes.
>
> But most of the time containers just run the configured process directly,
> without an init process. That leaves nothing to clean up orphan processes.

Yeah, that seems like the culprit. If the clone finishes successfully,
we do end up in finish_connect(), where we wait() for the process. But
if we exit early (in this case, ssh bails and we get EOF on the pipe
reading from it), then we may call die() and exit immediately.

We _could_ take special care to add every spawned process to a global
list, set up handlers via atexit() and signal(), and then reap the
processes. But traditionally it's not a big deal to exit with un-reaped
children, and this is the responsibility of init. I'm not sure it makes
sense for Git to basically reimplement that catch-all (and of course we
cannot even do it reliably if we are killed by certain signals).

> Although for that to really be a problem, it would require hitting that max
> process limit inside a single container invocation. Of course since
> containers usually aren't meant to be spawning a lot of processes, that
> limit might be a lot lower than on a normal system.
>
> I know that Docker provides a way to include an init process in the started
> container (`docker run --init`), but I don't think that AWS Lambda does.

I don't know anything about Lambda, but if you are running arbitrary
commands, then it seems like you could insert something like this:

  https://github.com/krallin/tini

into the mix. I much prefer that to teaching Git to try to do the same
thing in-process.

-Peff

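As a rough illustration of what a small init wrapper does, the following Python sketch runs one command as a child and reaps whatever children exit while it runs. It is a simplified stand-in written for this thread, not tini's actual implementation: real init tools such as tini also forward signals, which this sketch omits. The names `run_as_init` and `_exit_code` are hypothetical, and whether such a wrapper can be placed in front of the Lambda runtime is an assumption that this thread does not confirm.

```python
import os


def _exit_code(status):
    """Translate a wait() status into a shell-style exit code."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        return 128 + os.WTERMSIG(status)
    return 1


def run_as_init(argv):
    """Run argv as a child, reaping every child that exits in the meantime.

    Returns the exit code of the main child.
    """
    main_pid = os.fork()
    if main_pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed (for example, command not found).
            os._exit(127)

    while True:
        # Blocks until any child exits. When this process is PID 1, that
        # includes orphans the kernel has re-parented to it.
        pid, status = os.wait()
        if pid == main_pid:
            break

    # Final non-blocking sweep for children that have already exited.
    while True:
        try:
            reaped, _ = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if reaped == 0:
            break  # remaining children are still running
    return _exit_code(status)
```

A caller would use it as, for example, `sys.exit(run_as_init(["git", "clone", url, dest]))`. Orphans are only adopted by this process if it really is PID 1 in the container (or a designated subreaper); run from an ordinary shell it reaps only its own direct children.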
* Re: [bug] git clone command leaves orphaned ssh process
From: Max Amelchenko @ 2023-09-24 10:25 UTC
To: Jeff King
Cc: Aaron Schrab, Taylor Blau, Bagas Sanjaya, git, Hideaki Yoshifuji, Junio C Hamano

Thanks,
Just wanted to clarify something. This will not be handled by AWS (we
had a support ticket re. that case), since they do not interfere with
the running processes on their infrastructure, and if there is a
problematic process causing this overflow of orphaned processes, it
needs to be handled by that process.
The question is, doesn't Git want to ensure a clean exit in all cases?
This is a clear example of a non-clean exit.

* Re: [bug] git clone command leaves orphaned ssh process
From: Jeff King @ 2023-09-25 12:29 UTC
To: Max Amelchenko
Cc: Aaron Schrab, Taylor Blau, Bagas Sanjaya, git, Hideaki Yoshifuji, Junio C Hamano

On Sun, Sep 24, 2023 at 01:25:08PM +0300, Max Amelchenko wrote:

> Thanks,
> Just wanted to clarify something. This will not be handled by AWS (we
> had a support ticket re. that case), since they do not interfere with
> the running processes on their infrastructure, and if there is a
> problematic process causing this overflow of orphaned processes, it
> needs to be handled by that process.
> The question is, doesn't Git want to ensure a clean exit in all cases?
> This is a clear example of a non-clean exit.

Git does ensure a clean exit if we run the clone process to completion.
In your case we hit a fatal error midway through and are aborting. At
that point we do not care what the exit code of ssh is.

We _could_ set up a signal/atexit handler combo to call waitpid(), but
we would just be throwing away the result code. And that is a catch-all
I would rather see done by PID 1 than by git. It can serve all
processes, not just git. And it can do so more robustly, since git may
be killed without a chance to run cleanup code (e.g., signal 9).

-Peff
