* [PATCH] do not require filters to consume stdin @ 2011-08-29 20:31 Joey Hess 2011-08-29 22:53 ` Junio C Hamano 2011-12-05 19:29 ` hooks that do not consume stdin sometimes crash git with SIGPIPE Joey Hess 0 siblings, 2 replies; 7+ messages in thread From: Joey Hess @ 2011-08-29 20:31 UTC (permalink / raw) To: git A clean filter that uses %f to examine a file may not need to consume the entire file content from stdin every time it's run, due to caching or other optimisations to support large files. Ignore the SIGPIPE that may result when writing to the filter if it exits without consuming stdin, and do not check that all content is sent to it. The filter is still required to exit successfully, so a crash in the filter should still be handled correctly. --- There has been discussion before about using clean and smudge filters with %f to handle big files in git, with the file content stored outside git somewhere. A simplistic clean filter for large files could look like this: #!/bin/sh file="$1" ln -f $file ~/.big/$file echo $file But trying to use this will fail on truely large files. For example: $ ls -l sorta.huge -rw-r--r-- 3 joey joey 524288000 Aug 29 15:19 sorta.huge $ git add sorta.huge broken pipe git add sorta.huge $ echo $? 141 The SIGPIPE occurs because git expects the clean filter to read the full file content from stdin. (Although if the content is small enough, a SIGPIPE may not occur.) So the clean filter needs to look like this: #!/bin/sh file="$1" cat >/dev/null ln -f $file ~/.big/$file echo $file But this means much more work has to be done whenever the clean filter is run. Including every time git status is run. So it's currently impractical to use clean/smudge filters like this for large files. This patch should close that gap and allow such filters to be developed. convert.c | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/convert.c b/convert.c index 416bf83..3d528c4 100644 --- a/convert.c +++ b/convert.c @@ -330,7 +330,7 @@ static int filter_buffer(int in, int out, void *data) */ struct child_process child_process; struct filter_params *params = (struct filter_params *)data; - int write_err, status; + int write_err = 0, status; const char *argv[] = { NULL, NULL }; /* apply % substitution to cmd */ @@ -360,9 +360,11 @@ static int filter_buffer(int in, int out, void *data) if (start_command(&child_process)) return error("cannot fork to run external filter %s", params->cmd); - write_err = (write_in_full(child_process.in, params->src, params->size) < 0); + signal(SIGPIPE, SIG_IGN); + write_in_full(child_process.in, params->src, params->size); if (close(child_process.in)) write_err = 1; + signal(SIGPIPE, SIG_DFL); if (write_err) error("cannot feed the input to external filter %s", params->cmd); -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] do not require filters to consume stdin 2011-08-29 20:31 [PATCH] do not require filters to consume stdin Joey Hess @ 2011-08-29 22:53 ` Junio C Hamano 2011-08-30 1:20 ` Joey Hess 2011-12-05 19:29 ` hooks that do not consume stdin sometimes crash git with SIGPIPE Joey Hess 1 sibling, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2011-08-29 22:53 UTC (permalink / raw) To: Joey Hess; +Cc: git Joey Hess <joey@kitenet.net> writes: > There has been discussion before about using clean and smudge filters > with %f to handle big files in git, with the file content stored outside > git somewhere. A simplistic clean filter for large files could look > like this: > > #!/bin/sh > file="$1" > ln -f $file ~/.big/$file > echo $file Isn't this filter already broken if clean request is for a blob contents that is different from what is on the filesystem? The name %f is passed to give the filter a _hint_ on what the path is about (so that the filter can choose to work differently depending on the extension, for example), but the data may or may not come from the filesystem, depending on what is calling the filter, no? Most notably, renormalize_buffer() would call convert_to_git() on a buffer that is internal, possibly quite different from what is in the working tree. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] do not require filters to consume stdin 2011-08-29 22:53 ` Junio C Hamano @ 2011-08-30 1:20 ` Joey Hess 0 siblings, 0 replies; 7+ messages in thread From: Joey Hess @ 2011-08-30 1:20 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 1161 bytes --] Junio C Hamano wrote: > Isn't this filter already broken if clean request is for a blob contents > that is different from what is on the filesystem? The name %f is passed > to give the filter a _hint_ on what the path is about (so that the filter > can choose to work differently depending on the extension, for example), > but the data may or may not come from the filesystem, depending on what is > calling the filter, no? > > Most notably, renormalize_buffer() would call convert_to_git() on a buffer > that is internal, possibly quite different from what is in the working > tree. So during a merge. gitattributes(5) is not very clear about this, it would probably be good to add a caveat there about what %f is not. This seems to make it impractical to build the sort of thing described here: http://lists-archives.org/git/737857-fwd-git-and-large-binaries-a-proposed-solution.html Arguably that thread already reached the same conclusion about using smudge/clean for handling large files, for other reasons. Since I already have something that works without smudge/clean, perhaps I should give up on them. -- see shy jo [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* hooks that do not consume stdin sometimes crash git with SIGPIPE 2011-08-29 20:31 [PATCH] do not require filters to consume stdin Joey Hess 2011-08-29 22:53 ` Junio C Hamano @ 2011-12-05 19:29 ` Joey Hess 2011-12-05 21:43 ` Jeff King 2011-12-06 1:39 ` Junio C Hamano 1 sibling, 2 replies; 7+ messages in thread From: Joey Hess @ 2011-12-05 19:29 UTC (permalink / raw) To: git; +Cc: Lars Wirzenius [-- Attachment #1: Type: text/plain, Size: 1638 bytes --] We had a weird problem where, after moving to a new, faster server, "git push" would sometimes fail like this: Unpacking objects: 100% (3/3), done. fatal: The remote end hung up unexpectedly fatal: The remote end hung up unexpectedly Turns out that git-receive-pack was dying due to an uncaught SIGPIPE. The SIGPIPE occurred when it tried to write to the pre-receive hook's stdin. The pre-receive hook, in this case, was able to do all the checks it needed to do[1] without the input, and so did exit(0) without consuming it. Apparently that causes a race. Most of the time, git forks the hook, writes output to the hook, and then the hook runs, ignores it, and exits. But sometimes, on our new faster (and SMP) server, git forked the hook, and it ran, and exited, before git got around to writing to it, resulting in the SIGPIPE. write(7, "c9f98c67d70a1cfeba382ec27d87644a"..., 100) = -1 EPIPE (Broken pipe) --- SIGPIPE (Broken pipe) @ 0 (0) --- I think git should ignore SIGPIPE when writing to hooks. Otherwise, hooks may have to go out of their way to consume all input, and as I've seen, the races when they fail to do this can lurk undiscovered. Note that I encountered this same sort of problem from another direction (involving smudge filters) not long ago, and sent a patch, in <20110829203107.GA4946@gnu.kitenet.net>. That wasn't applied, and is in different code than the case I outlined above. -- see shy jo [1] If you're wondering, it only needed to check that the push was coming from a trusted UID. With an untrusted UID, it did further checks that consumed the stdin. [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: hooks that do not consume stdin sometimes crash git with SIGPIPE 2011-12-05 19:29 ` hooks that do not consume stdin sometimes crash git with SIGPIPE Joey Hess @ 2011-12-05 21:43 ` Jeff King 2011-12-06 1:39 ` Junio C Hamano 1 sibling, 0 replies; 7+ messages in thread From: Jeff King @ 2011-12-05 21:43 UTC (permalink / raw) To: Joey Hess; +Cc: git, Lars Wirzenius On Mon, Dec 05, 2011 at 03:29:30PM -0400, Joey Hess wrote: > We had a weird problem where, after moving to a new, faster server, > "git push" would sometimes fail like this: > > Unpacking objects: 100% (3/3), done. > fatal: The remote end hung up unexpectedly > fatal: The remote end hung up unexpectedly > > Turns out that git-receive-pack was dying due to an uncaught SIGPIPE. > The SIGPIPE occurred when it tried to write to the pre-receive hook's > stdin. The pre-receive hook, in this case, was able to do all the checks > it needed to do[1] without the input, and so did exit(0) without > consuming it. Ugh. Yeah, receive-pack should probably just be ignoring SIGPIPE entirely. It checks the return from write() properly where it matters, so SIGPIPE is just an annoyance. I would go so far as to say that git should be ignoring SIGPIPE 99% of the time. We have crap like this sprinkled throughout the code base: $ git grep -C1 SIGPIPE builtin/tag.c- /* When the username signingkey is bad, program could be terminated builtin/tag.c: * because gpg exits without reading and then write gets SIGPIPE. */ builtin/tag.c: signal(SIGPIPE, SIG_IGN); [...] upload-pack.c- * If rev-list --stdin encounters an unknown commit, it upload-pack.c: * terminates, which will cause SIGPIPE in the write loop upload-pack.c- */ upload-pack.c: sigchain_push(SIGPIPE, SIG_IGN); but I find it highly unlikely that they are covering all of the cases. You found one already, and these things can quite often be race-condition heisenbugs. The one place where SIGPIPE _is_ useful is for things like "git log" which are just dumping to a pager over stdout. When the pager dies, we can stop bothering to produce output. It would be really nice if we could write a sigpipe handler that knew which fd caused the the signal, and then we could do something like: void sigpipe_handler(int sig) { /* If we're writing to a pager over stdout, then there is * little use in writing more; nobody is interested in our * output. */ if (get_fd_that_caused_sigpipe() == 1 && pager_in_use) exit(1); /* Otherwise, ignore it, as it's a write to some auxiliary * process and we will be careful about checking the return * code from write(). */ } But I don't think such a function exists. We could just check pager_in_use, which would cover most cases (e.g., programs like receive-pack don't use a pager). But it would fail in the case of something like "git log" using an auxiliary process that closes the pipe early. Maybe that would be good enough, though. I dunno. Another option is to just ignore SIGPIPE entirely, and convince programs like log to actually bother checking the result of write(). It would be a slight pain to check every printf() call we make in log-tree.c, but we could do one of: 1. Make a set of stdio wrapper scripts that exit gracefully on EPIPE. Using the wrapper scripts everywhere is a slight pain, but would work pretty well, and adapts easily to other places like printing lists of refs, etc. 2. Don't check every printf. After printing each commit, check ferror(stdout) and exit as appropriate. This is a very small amount of code, but you'd need to do it in several places (i.e., anywhere that produces a lot of output). > I think git should ignore SIGPIPE when writing to hooks. Otherwise, > hooks may have to go out of their way to consume all input, and as I've > seen, the races when they fail to do this can lurk undiscovered. Yeah, certainly. The question to me is whether we should just stick a SIG_IGN in the beginning of receive-pack, or whether we should try to deal with this problem everywhere. For example, I suspect the same problem exists in the credential helpers I wrote recently. Generally they will read all of their input, but you could do something like: [credential "https://github.com"] username = peff helper = "! f() { # if we call this helper, we know we want the # github password, or else this helper config # would never have been triggered. so we # don't even have to bother reading our stdin. # We only handle getting the password. test "$1" = "get" || return # Presumably gpg-agent will ask for and cache # your gpg password. p=`gpg -qd --no-tty <~/.github-password.gpg` echo "password=$p" }; f" This can racily fail if our write happens after the helper has already finished (unlikely, but possible). -Peff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: hooks that do not consume stdin sometimes crash git with SIGPIPE 2011-12-05 19:29 ` hooks that do not consume stdin sometimes crash git with SIGPIPE Joey Hess 2011-12-05 21:43 ` Jeff King @ 2011-12-06 1:39 ` Junio C Hamano 2011-12-06 3:11 ` Joey Hess 1 sibling, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2011-12-06 1:39 UTC (permalink / raw) To: Joey Hess; +Cc: git, Lars Wirzenius Joey Hess <joey@kitenet.net> writes: > We had a weird problem where, after moving to a new, faster server, > "git push" would sometimes fail like this: > > Unpacking objects: 100% (3/3), done. > fatal: The remote end hung up unexpectedly > fatal: The remote end hung up unexpectedly > > Turns out that git-receive-pack was dying due to an uncaught SIGPIPE. Why do you have a hook that is expected to read from receive-pack that does _not_ read anything from it in the first place? If you do not care about the update status given to pre-receive, shouldn't you be using the update hook and ignoring the command line parameters instead? I am not saying this is a user configuration error and there is nothing to fix---Git shouldn't get killed merely because of configuration error. I am wondering if we would want to have a uniform way to tell run_*_hook() functions that the hook writer explicitly declines to get any input. E.g. "hooks/pre-receive-noinput" is called instead of "hooks/pre-receive" and we do not send any input to it, or something like that. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: hooks that do not consume stdin sometimes crash git with SIGPIPE 2011-12-06 1:39 ` Junio C Hamano @ 2011-12-06 3:11 ` Joey Hess 0 siblings, 0 replies; 7+ messages in thread From: Joey Hess @ 2011-12-06 3:11 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Lars Wirzenius [-- Attachment #1: Type: text/plain, Size: 737 bytes --] Junio C Hamano wrote: > Why do you have a hook that is expected to read from receive-pack that > does _not_ read anything from it in the first place? If you do not care > about the update status given to pre-receive, shouldn't you be using the > update hook and ignoring the command line parameters instead? My hook *does* consume the stdin in one case, but in another case it does no checks and so can immediately exit. Also, I didn't want it to be run once per updated ref as the update hook is, since the tests it performs are rather expensive -- loading a perl wiki engine in order to check that the changeset contains only changes to wiki pages that are allowed based on the wiki's configuration. -- see shy jo [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2011-12-06 3:11 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-08-29 20:31 [PATCH] do not require filters to consume stdin Joey Hess 2011-08-29 22:53 ` Junio C Hamano 2011-08-30 1:20 ` Joey Hess 2011-12-05 19:29 ` hooks that do not consume stdin sometimes crash git with SIGPIPE Joey Hess 2011-12-05 21:43 ` Jeff King 2011-12-06 1:39 ` Junio C Hamano 2011-12-06 3:11 ` Joey Hess
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).