git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joey Hess <joey@kitenet.net>
To: git@vger.kernel.org
Subject: [PATCH] do not require filters to consume stdin
Date: Mon, 29 Aug 2011 16:31:07 -0400	[thread overview]
Message-ID: <20110829203107.GA4946@gnu.kitenet.net> (raw)

A clean filter that uses %f to examine a file may not need to consume
the entire file content from stdin every time it's run, due to caching
or other optimisations to support large files.

Ignore the SIGPIPE that may result when writing to the filter
if it exits without consuming stdin, and do not check that all
content is sent to it. The filter is still required to exit
successfully, so a crash in the filter should still be handled
correctly.
---

There has been discussion before about using clean and smudge filters
with %f to handle big files in git, with the file content stored outside
git somewhere.  A simplistic clean filter for large files could look
like this:

#!/bin/sh
file="$1"
ln -f $file ~/.big/$file
echo $file

But trying to use this will fail on truely large files. For example:

$ ls -l sorta.huge 
-rw-r--r-- 3 joey joey 524288000 Aug 29 15:19 sorta.huge
$ git add sorta.huge 
broken pipe  git add sorta.huge
$ echo $?
141

The SIGPIPE occurs because git expects the clean filter to read
the full file content from stdin. (Although if the content is small
enough, a SIGPIPE may not occur.) So the clean filter needs to
look like this:

#!/bin/sh
file="$1"
cat >/dev/null
ln -f $file ~/.big/$file
echo $file

But this means much more work has to be done whenever the clean filter
is run. Including every time git status is run. So it's currently
impractical to use clean/smudge filters like this for large files.
This patch should close that gap and allow such filters to be developed.

 convert.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/convert.c b/convert.c
index 416bf83..3d528c4 100644
--- a/convert.c
+++ b/convert.c
@@ -330,7 +330,7 @@ static int filter_buffer(int in, int out, void *data)
 	 */
 	struct child_process child_process;
 	struct filter_params *params = (struct filter_params *)data;
-	int write_err, status;
+	int write_err = 0, status;
 	const char *argv[] = { NULL, NULL };
 
 	/* apply % substitution to cmd */
@@ -360,9 +360,11 @@ static int filter_buffer(int in, int out, void *data)
 	if (start_command(&child_process))
 		return error("cannot fork to run external filter %s", params->cmd);
 
-	write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
+	signal(SIGPIPE, SIG_IGN);
+	write_in_full(child_process.in, params->src, params->size);
 	if (close(child_process.in))
 		write_err = 1;
+	signal(SIGPIPE, SIG_DFL);
 	if (write_err)
 		error("cannot feed the input to external filter %s", params->cmd);
 
-- 
1.7.5.4

             reply	other threads:[~2011-08-29 20:38 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-29 20:31 Joey Hess [this message]
2011-08-29 22:53 ` [PATCH] do not require filters to consume stdin Junio C Hamano
2011-08-30  1:20   ` Joey Hess
2011-12-05 19:29 ` hooks that do not consume stdin sometimes crash git with SIGPIPE Joey Hess
2011-12-05 21:43   ` Jeff King
2011-12-06  1:39   ` Junio C Hamano
2011-12-06  3:11     ` Joey Hess

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110829203107.GA4946@gnu.kitenet.net \
    --to=joey@kitenet.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).