* Can dependency on /bin/sh be removed? @ 2024-07-15 18:41 Scott Moser 2024-07-15 20:18 ` Junio C Hamano 0 siblings, 1 reply; 12+ messages in thread From: Scott Moser @ 2024-07-15 18:41 UTC (permalink / raw) To: git Hi, I'm looking at putting together minimal images to run git. In order to make gitcredentials to work (git config credential.helper) the image needs a /bin/sh. I realize there are many workflows that are going to need a shell, but it does not seem like it should be required in order to handle a gitconfig like: [credential-helper] helper = /bin/myhelper In that case, the shell is only being used to tokenize 'myhelper get'. Is there a solution that I'm missing here? Would upstream be open to some modifier on the helper value that would indicate "do not pass to shell" ? Like a '@' to indicate "direct invoke" rather than letting shell handle? [credential-helper] helper = @/bin/myhelper Scott ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Can dependency on /bin/sh be removed? 2024-07-15 18:41 Can dependency on /bin/sh be removed? Scott Moser @ 2024-07-15 20:18 ` Junio C Hamano 2024-07-15 21:46 ` brian m. carlson 2024-07-15 23:52 ` Jeff King 0 siblings, 2 replies; 12+ messages in thread From: Junio C Hamano @ 2024-07-15 20:18 UTC (permalink / raw) To: Scott Moser; +Cc: git Scott Moser <scott.moser@chainguard.dev> writes: > [credential-helper] > helper = /bin/myhelper > > In that case, the shell is only being used to tokenize 'myhelper get'. > > Is there a solution that I'm missing here? Isn't the above example faulty? Shouldn't it be more like [credential] helper = /bin/myhelper > Would upstream be open to some modifier on the helper value that would > indicate "do not pass to shell" ? Like a '@' to indicate "direct > invoke" rather than letting shell handle? Absolutely not. My first gut reaction regarding what needs to happen was that somebody needs to teach credential.c:run_credential_helper() the same trick as run-command.c::prepare_shell_cmd(). In the latter codepath, run_command() calls start_command() which in turn calls prepare_cmd(). The prepare_cmd() then dispatches between prepare_shell_cmd() and strvec_pushv(), but even when .use_shell is set and prepare_shell_cmd() is called, prepare_shell_cmd() knows that it can bypass the shell altogether if the command is simple enough (and /bin/myhelper is indeed simple enough). And when that simplification is taken, shell is not involved at all to run that simple command. Even though the code path starting from start_command() is what run_credential_helper() does use, what is run is NOT a simple command "/bin/myhelper". It will receive arguments, like /bin/myhelper erase /bin/myhelper get /bin/myhelper store etc., because the caller appends these operation verbs to the value of the configuration variable. And as you found out, to tokenize them into two, we need shell. We may be able to teach credential.c:credential_do() not to paste the operation verb to the command line so early. Instead you could teach the function to send the command line and operation verb separately down to run_credential_helper() though. That way, we might be able to avoid the shell in this particular case. That is, if we can * Have start_command() -> prepare_cmd() -> prepare_shell_cmd() codepath to take the usual route _without_ the operation verb tucked to the command line, we would get cmd->args.v[] that does not rely on the shell; * Then before the prepared command is executed, if we can somehow _append_ to cmd->args.v[] the operation verb (after all, that wants to become the argv[1] to the spawned command) before start_command() exec's it then we are done. Having said that, I do not think you can avoid /bin/sh if your goal is "minimal image *to run git*", as there are many things we run, starting from the editor and the pager and end-user hooks. The credential helper is probably the least of your problems. What's a minimum /bin/dash image cost these days? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Can dependency on /bin/sh be removed? 2024-07-15 20:18 ` Junio C Hamano @ 2024-07-15 21:46 ` brian m. carlson 2024-07-15 23:52 ` Jeff King 1 sibling, 0 replies; 12+ messages in thread From: brian m. carlson @ 2024-07-15 21:46 UTC (permalink / raw) To: Junio C Hamano; +Cc: Scott Moser, git [-- Attachment #1: Type: text/plain, Size: 1205 bytes --] On 2024-07-15 at 20:18:52, Junio C Hamano wrote: > Having said that, I do not think you can avoid /bin/sh if your goal > is "minimal image *to run git*", as there are many things we run, > starting from the editor and the pager and end-user hooks. The > credential helper is probably the least of your problems. What's a > minimum /bin/dash image cost these days? Debian provides busybox-static, a statically linked, multi-call busybox binary including sh, other POSIX utilities, and quite a lot of other features, all in a 1.9 MiB binary. If you don't need, say, httpd or dpkg emulation, you can build a much more stripped down version for less disk usage. Alpine Linux is based off of busybox (dynamically linked at 808 KiB), so we can assume that that configuration works just fine. We know that Git for Windows also ships a configuration using busybox successfully (although with a much larger size). I am very interested to know what kind of restricted environment needs Git (especially with a credential helper) but doesn't ship with a POSIX sh. I suppose a restricted Windows environment could qualify. -- brian m. carlson (they/them or he/him) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Can dependency on /bin/sh be removed? 2024-07-15 20:18 ` Junio C Hamano 2024-07-15 21:46 ` brian m. carlson @ 2024-07-15 23:52 ` Jeff King 2024-07-16 15:23 ` Scott Moser 2024-07-16 19:23 ` Jeff King 1 sibling, 2 replies; 12+ messages in thread From: Jeff King @ 2024-07-15 23:52 UTC (permalink / raw) To: Junio C Hamano; +Cc: Scott Moser, git On Mon, Jul 15, 2024 at 01:18:52PM -0700, Junio C Hamano wrote: > Even though the code path starting from start_command() is what > run_credential_helper() does use, what is run is NOT a simple > command "/bin/myhelper". It will receive arguments, like > > /bin/myhelper erase > /bin/myhelper get > /bin/myhelper store > > etc., because the caller appends these operation verbs to the value > of the configuration variable. And as you found out, to tokenize them > into two, we need shell. It is also usually "git myhelper get", and so on (though you can configure a shell command). > We may be able to teach credential.c:credential_do() not to paste > the operation verb to the command line so early. Instead you could > teach the function to send the command line and operation verb > separately down to run_credential_helper() though. That way, we > might be able to avoid the shell in this particular case. That is, > if we can > > * Have start_command() -> prepare_cmd() -> prepare_shell_cmd() > codepath to take the usual route _without_ the operation verb > tucked to the command line, we would get cmd->args.v[] that does > not rely on the shell; > > * Then before the prepared command is executed, if we can somehow > _append_ to cmd->args.v[] the operation verb (after all, that > wants to become the argv[1] to the spawned command) before > start_command() exec's it > > then we are done. Yes, I think this is reasonable. You'd also perhaps want to have it set child->git_cmd as appropriate (though really, I do not think that does anything except stick "git" into child.args[0], so we could just do that ourselves). I'm actually a little surprised it was not written this way in the first place. In the non-!, non-absolute-path case we are pasting together a string that will be passed to the shell, and it includes the "helper" argument without further quoting. I don't think you could smuggle a semicolon into there (due to our protocol restrictions), but it does seem like a possible shell injection route. I think it probably goes all the way back to my abca927dbe (introduce credentials API, 2011-12-10). > Having said that, I do not think you can avoid /bin/sh if your goal > is "minimal image *to run git*", as there are many things we run, > starting from the editor and the pager and end-user hooks. The > credential helper is probably the least of your problems. What's a > minimum /bin/dash image cost these days? Right. The bigger issue to me is that the credential helper "!" syntax is defined to the end user as running a shell (and ditto for all of those other spots). So any platform where we can't run a shell would not be fully compatible with git on other platforms. That may be an OK limitation as long as it is advertised clearly, but the use of a shell here is not just an implementation detail, but an intentional user-facing design. -Peff ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Can dependency on /bin/sh be removed? 2024-07-15 23:52 ` Jeff King @ 2024-07-16 15:23 ` Scott Moser 2024-07-16 16:32 ` Junio C Hamano 2024-07-16 19:23 ` Jeff King 1 sibling, 1 reply; 12+ messages in thread From: Scott Moser @ 2024-07-16 15:23 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git On Mon, Jul 15, 2024 at 7:52 PM Jeff King <peff@peff.net> wrote: > > On Mon, Jul 15, 2024 at 01:18:52PM -0700, Junio C Hamano wrote: > Yes, I think this is reasonable. You'd also perhaps want to have it set > child->git_cmd as appropriate (though really, I do not think that does > anything except stick "git" into child.args[0], so we could just do that > ourselves). > > I'm actually a little surprised it was not written this way in the first > place. I was too. It seems odd to combine the arguments into a single string early. I was also surprised / didn't realize that 'use_shell' might be ignored. > > Having said that, I do not think you can avoid /bin/sh if your goal > > is "minimal image *to run git*", as there are many things we run, > > starting from the editor and the pager and end-user hooks. The > > credential helper is probably the least of your problems. What's a > > minimum /bin/dash image cost these days? Adding dash is ultimately what we're going to do at least for now. That won't get us a pager or an editor. That's fine. The image will be used in non-interactive environments such as c-i. Not being able to utilize your custom hook that runs awk and sed is one thing. Not being able to build your private repo from github is another. Thanks for your input. On Mon, Jul 15, 2024 at 7:52 PM Jeff King <peff@peff.net> wrote: > > On Mon, Jul 15, 2024 at 01:18:52PM -0700, Junio C Hamano wrote: > > > Even though the code path starting from start_command() is what > > run_credential_helper() does use, what is run is NOT a simple > > command "/bin/myhelper". It will receive arguments, like > > > > /bin/myhelper erase > > /bin/myhelper get > > /bin/myhelper store > > > > etc., because the caller appends these operation verbs to the value > > of the configuration variable. And as you found out, to tokenize them > > into two, we need shell. > > It is also usually "git myhelper get", and so on (though you can > configure a shell command). > > > We may be able to teach credential.c:credential_do() not to paste > > the operation verb to the command line so early. Instead you could > > teach the function to send the command line and operation verb > > separately down to run_credential_helper() though. That way, we > > might be able to avoid the shell in this particular case. That is, > > if we can > > > > * Have start_command() -> prepare_cmd() -> prepare_shell_cmd() > > codepath to take the usual route _without_ the operation verb > > tucked to the command line, we would get cmd->args.v[] that does > > not rely on the shell; > > > > * Then before the prepared command is executed, if we can somehow > > _append_ to cmd->args.v[] the operation verb (after all, that > > wants to become the argv[1] to the spawned command) before > > start_command() exec's it > > > > then we are done. > > Yes, I think this is reasonable. You'd also perhaps want to have it set > child->git_cmd as appropriate (though really, I do not think that does > anything except stick "git" into child.args[0], so we could just do that > ourselves). > > I'm actually a little surprised it was not written this way in the first > place. In the non-!, non-absolute-path case we are pasting together a > string that will be passed to the shell, and it includes the "helper" > argument without further quoting. I don't think you could smuggle a > semicolon into there (due to our protocol restrictions), but it does > seem like a possible shell injection route. > > I think it probably goes all the way back to my abca927dbe (introduce > credentials API, 2011-12-10). > > > Having said that, I do not think you can avoid /bin/sh if your goal > > is "minimal image *to run git*", as there are many things we run, > > starting from the editor and the pager and end-user hooks. The > > credential helper is probably the least of your problems. What's a > > minimum /bin/dash image cost these days? > > Right. The bigger issue to me is that the credential helper "!" syntax > is defined to the end user as running a shell (and ditto for all of > those other spots). So any platform where we can't run a shell would not > be fully compatible with git on other platforms. > > That may be an OK limitation as long as it is advertised clearly, but > the use of a shell here is not just an implementation detail, but an > intentional user-facing design. > > -Peff ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Can dependency on /bin/sh be removed? 2024-07-16 15:23 ` Scott Moser @ 2024-07-16 16:32 ` Junio C Hamano 0 siblings, 0 replies; 12+ messages in thread From: Junio C Hamano @ 2024-07-16 16:32 UTC (permalink / raw) To: Scott Moser; +Cc: Jeff King, git Scott Moser <scott.moser@chainguard.dev> writes: > I was too. It seems odd to combine the arguments into a single string > early. I was also surprised / didn't realize that 'use_shell' might be > ignored. Call it "optimized away" ;-) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Can dependency on /bin/sh be removed? 2024-07-15 23:52 ` Jeff King 2024-07-16 15:23 ` Scott Moser @ 2024-07-16 19:23 ` Jeff King 2024-07-16 20:02 ` Junio C Hamano 2024-07-16 21:30 ` Andreas Schwab 1 sibling, 2 replies; 12+ messages in thread From: Jeff King @ 2024-07-16 19:23 UTC (permalink / raw) To: Junio C Hamano; +Cc: Scott Moser, git On Mon, Jul 15, 2024 at 07:52:12PM -0400, Jeff King wrote: > > We may be able to teach credential.c:credential_do() not to paste > > the operation verb to the command line so early. Instead you could > > teach the function to send the command line and operation verb > > separately down to run_credential_helper() though. That way, we > > might be able to avoid the shell in this particular case. That is, > > if we can > > > > * Have start_command() -> prepare_cmd() -> prepare_shell_cmd() > > codepath to take the usual route _without_ the operation verb > > tucked to the command line, we would get cmd->args.v[] that does > > not rely on the shell; > > > > * Then before the prepared command is executed, if we can somehow > > _append_ to cmd->args.v[] the operation verb (after all, that > > wants to become the argv[1] to the spawned command) before > > start_command() exec's it > > > > then we are done. > > Yes, I think this is reasonable. You'd also perhaps want to have it set > child->git_cmd as appropriate (though really, I do not think that does > anything except stick "git" into child.args[0], so we could just do that > ourselves). > > I'm actually a little surprised it was not written this way in the first > place. In the non-!, non-absolute-path case we are pasting together a > string that will be passed to the shell, and it includes the "helper" > argument without further quoting. I don't think you could smuggle a > semicolon into there (due to our protocol restrictions), but it does > seem like a possible shell injection route. > > I think it probably goes all the way back to my abca927dbe (introduce > credentials API, 2011-12-10). Ah, having tried to refactor it, I see now why it is written as it is. Even for a regular helper without "!", it is important that we construct a string and pass it to the shell, since it is legal (and even encouraged) to do things like: [credential] helper = cache --socket=/path/to/socket --timeout=123 Arguably we could have gotten away with word-splitting ourselves, sticking the result in child_process.args, and avoided the shell. But the use of the shell is documented in gitcredentials(7): helper The name of an external credential helper, and any associated options. If the helper name is not an absolute path, then the string git credential- is prepended. The resulting string is executed by the shell (so, for example, setting this to foo --option=bar will execute git credential-foo --option=bar via the shell. See the manual of specific helpers for examples of their use. So users may be depending on that to do "--socket=$HOME/.foo", or even more exotic shell constructs. Again, it's possible that we could detect that no shell metacharacters are in play and do the word-splitting ourselves. But at that point I think it should go into run-command's prepare_shell_cmd(). That is, I I think it could take space out of the list of metachars that force us to invoke the shell, and do the word-splitting there. But not having thought very hard about it, there are probably corner cases where that optimization is detectable by the user (presumably unusual IFS, but maybe more?). -Peff ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Can dependency on /bin/sh be removed? 2024-07-16 19:23 ` Jeff King @ 2024-07-16 20:02 ` Junio C Hamano 2024-07-17 5:52 ` Jeff King 2024-07-16 21:30 ` Andreas Schwab 1 sibling, 1 reply; 12+ messages in thread From: Junio C Hamano @ 2024-07-16 20:02 UTC (permalink / raw) To: Jeff King; +Cc: Scott Moser, git Jeff King <peff@peff.net> writes: > [credential] > helper = cache --socket=/path/to/socket --timeout=123 > > Arguably we could have gotten away with word-splitting ourselves, > sticking the result in child_process.args, and avoided the shell. But > the use of the shell is documented in gitcredentials(7): > > helper > The name of an external credential helper, and any associated > options. If the helper name is not an absolute path, then the string > git credential- is prepended. The resulting string is executed by > the shell (so, for example, setting this to foo --option=bar will > execute git credential-foo --option=bar via the shell. See the > manual of specific helpers for examples of their use. > > So users may be depending on that to do "--socket=$HOME/.foo", or even > more exotic shell constructs. > > Again, it's possible that we could detect that no shell metacharacters > are in play and do the word-splitting ourselves. But at that point I > think it should go into run-command's prepare_shell_cmd(). That is, I I > think it could take space out of the list of metachars that force us to > invoke the shell, and do the word-splitting there. But not having > thought very hard about it, there are probably corner cases where that > optimization is detectable by the user (presumably unusual IFS, but > maybe more?). Well, I strongly object to an approach for us to "parse" anything. But even then it would be sensible to formulate: argv[0] = sh argv[1] = -c argv[2] = git-credential-cache --socket=/path/ --timeout=123 "$@" argv[3] = - argv[4] = NULL and if there is an argument say "get", extend it to argv[0] = sh argv[1] = -c argv[2] = git-credential-cache --socket=/path/ --timeout=123 "$@" argv[3] = - argv[4] = get argv[5] = NULL before passing the array to execv(), no? And with the metacharacter optimization to drop .use_shell we already have, a single-token /bin/myhelper case would then become argv[0] = /bin/myhelper argv[1] = get argv[2] = NULL naturally. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Can dependency on /bin/sh be removed? 2024-07-16 20:02 ` Junio C Hamano @ 2024-07-17 5:52 ` Jeff King 0 siblings, 0 replies; 12+ messages in thread From: Jeff King @ 2024-07-17 5:52 UTC (permalink / raw) To: Junio C Hamano; +Cc: Scott Moser, git On Tue, Jul 16, 2024 at 01:02:54PM -0700, Junio C Hamano wrote: > > Again, it's possible that we could detect that no shell metacharacters > > are in play and do the word-splitting ourselves. But at that point I > > think it should go into run-command's prepare_shell_cmd(). That is, I I > > think it could take space out of the list of metachars that force us to > > invoke the shell, and do the word-splitting there. But not having > > thought very hard about it, there are probably corner cases where that > > optimization is detectable by the user (presumably unusual IFS, but > > maybe more?). > > Well, I strongly object to an approach for us to "parse" anything. OK, it makes me nervous, too, so I am happy to leave it be (though it is interesting to hear that "make" does a similar optimization). > But even then it would be sensible to formulate: > > argv[0] = sh > argv[1] = -c > argv[2] = git-credential-cache --socket=/path/ --timeout=123 "$@" > argv[3] = - > argv[4] = NULL > > and if there is an argument say "get", extend it to > > argv[0] = sh > argv[1] = -c > argv[2] = git-credential-cache --socket=/path/ --timeout=123 "$@" > argv[3] = - > argv[4] = get > argv[5] = NULL > > before passing the array to execv(), no? Hmm. I think that is OK. It is a little funny to have some arguments pasted in and some via "$@", but I think in the end it should be indistinguishable to the user. In your example it is "git-credential-cache", whereas we do "git credential-cache" now. I _think_ the two should be equivalent at this point (since the parent process running this code would already have set up GIT_EXEC_PATH to find dashed forms), but it does give me pause. We could keep saying "git credential-cache", but then in most cases we would never trigger the metacharacter optimization, because of the space. And of course: Splitting it out like this: argv[0] = sh argv[1] = -c argv[2] = git "$@" argv[3] = - argv[4] = credential-cache --socket=/path/ --timeout=123 argv[5] = get argv[6] = NULL is wrong, because argv[4] is really a shell snippet, not a single argument (and we cannot split it ourselves without doing shell-like parsing). > And with the metacharacter optimization to drop .use_shell we > already have, a single-token /bin/myhelper case would then become > > argv[0] = /bin/myhelper > argv[1] = get > argv[2] = NULL > > naturally. Yes, I think that is the one case that would benefit. I do wonder how often people point to an absolute path, though, rather than git-credential-* or a more complex shell invocation. -Peff ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Can dependency on /bin/sh be removed? 2024-07-16 19:23 ` Jeff King 2024-07-16 20:02 ` Junio C Hamano @ 2024-07-16 21:30 ` Andreas Schwab 2024-07-16 21:40 ` Paul Smith 2024-07-17 5:53 ` Jeff King 1 sibling, 2 replies; 12+ messages in thread From: Andreas Schwab @ 2024-07-16 21:30 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Scott Moser, git On Jul 16 2024, Jeff King wrote: > Again, it's possible that we could detect that no shell metacharacters > are in play and do the word-splitting ourselves. But at that point I > think it should go into run-command's prepare_shell_cmd(). This is what GNU make does (see construct_command_argv_internal), for performance reason. But run_command is probably not performance critical. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Can dependency on /bin/sh be removed? 2024-07-16 21:30 ` Andreas Schwab @ 2024-07-16 21:40 ` Paul Smith 2024-07-17 5:53 ` Jeff King 1 sibling, 0 replies; 12+ messages in thread From: Paul Smith @ 2024-07-16 21:40 UTC (permalink / raw) To: Andreas Schwab, Jeff King; +Cc: Junio C Hamano, Scott Moser, git On Tue, 2024-07-16 at 23:30 +0200, Andreas Schwab wrote: > On Jul 16 2024, Jeff King wrote: > > > Again, it's possible that we could detect that no shell > > metacharacters are in play and do the word-splitting ourselves. But > > at that point I think it should go into run-command's > > prepare_shell_cmd(). > > This is what GNU make does (see construct_command_argv_internal), for > performance reason. But run_command is probably not performance > critical. Also I would definitely not recommend anyone look at this part of the GNU Make code for inspiration. It's an unholy mess. The concept is very good though. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Can dependency on /bin/sh be removed? 2024-07-16 21:30 ` Andreas Schwab 2024-07-16 21:40 ` Paul Smith @ 2024-07-17 5:53 ` Jeff King 1 sibling, 0 replies; 12+ messages in thread From: Jeff King @ 2024-07-17 5:53 UTC (permalink / raw) To: Andreas Schwab; +Cc: Junio C Hamano, Scott Moser, git On Tue, Jul 16, 2024 at 11:30:59PM +0200, Andreas Schwab wrote: > On Jul 16 2024, Jeff King wrote: > > > Again, it's possible that we could detect that no shell metacharacters > > are in play and do the word-splitting ourselves. But at that point I > > think it should go into run-command's prepare_shell_cmd(). > > This is what GNU make does (see construct_command_argv_internal), for > performance reason. But run_command is probably not performance > critical. Thanks, that's interesting to hear. I agree that run_command is not usually performance critical. Generally if we find ourselves spawning a lot of processes, the right solution is to accomplish the same thing with fewer processes (or even in-process), not micro-optimize out the intermediate shell. -Peff ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2024-07-17 5:53 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-07-15 18:41 Can dependency on /bin/sh be removed? Scott Moser 2024-07-15 20:18 ` Junio C Hamano 2024-07-15 21:46 ` brian m. carlson 2024-07-15 23:52 ` Jeff King 2024-07-16 15:23 ` Scott Moser 2024-07-16 16:32 ` Junio C Hamano 2024-07-16 19:23 ` Jeff King 2024-07-16 20:02 ` Junio C Hamano 2024-07-17 5:52 ` Jeff King 2024-07-16 21:30 ` Andreas Schwab 2024-07-16 21:40 ` Paul Smith 2024-07-17 5:53 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).