git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] git-rev-list: give better diagnostic for failed write
@ 2007-06-25 20:32 Jim Meyering
  2007-06-25 20:59 ` Linus Torvalds
  2007-06-25 21:39 ` Jim Meyering
  0 siblings, 2 replies; 33+ messages in thread
From: Jim Meyering @ 2007-06-25 20:32 UTC (permalink / raw)
  To: git

[this patch depends on the one I posted here:
 http://marc.info/?l=git&m=118280134031923&w=2 ]

Without this patch, git-rev-list unnecessarily omits strerror(errno)
from its diagnostic, upon write failure:

    $ ./git-rev-list --max-count=1 HEAD > /dev/full
    fatal: write failure on standard output

With the patch, git reports the desired ENOSPC diagnostic:

    fatal: write failure on standard output: No space left on device

* builtin-rev-list (show_commit): Don't fflush stdout here.
Instead, let the fclose in main do it, so there's a better
chance the underlying cause (errno) will be reported.

Signed-off-by: Jim Meyering <jim@meyering.net>
---
 builtin-rev-list.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 813aadf..62f0ba9 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -100,7 +100,6 @@ static void show_commit(struct commit *commit)
 		printf("%s%c", buf, hdr_termination);
 		free(buf);
 	}
-	fflush(stdout);
 	if (commit->parents) {
 		free_commit_list(commit->parents);
 		commit->parents = NULL;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-25 20:32 [PATCH] git-rev-list: give better diagnostic for failed write Jim Meyering
@ 2007-06-25 20:59 ` Linus Torvalds
  2007-06-25 21:52   ` Jim Meyering
  2007-06-25 21:39 ` Jim Meyering
  1 sibling, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2007-06-25 20:59 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git



On Mon, 25 Jun 2007, Jim Meyering wrote:
>
> [this patch depends on the one I posted here:
>  http://marc.info/?l=git&m=118280134031923&w=2 ]
> 
> Without this patch, git-rev-list unnecessarily omits strerror(errno)
> from its diagnostic, upon write failure:

And this is a perfect example of what's wrong with the whole thing.

Dammit, how many times do I need to say this:

 - If you want reliable errors, don't use stdio!

That fflush is there FOR A REASON. You removed it FOR A MUCH LESS 
IMPORTANT REASON!

That fflush is there exactly because WE DO NOT WANT TO BUFFER the list of 
commits, because that thing is meant very much to be used for pipelines, 
and it's quite common that the receiving end is going to do something 
asynchronous with the result, and can - and does - want the results as 
soon as possible.

IOW, things like "gitk" use git-rev-list exactly to get the list of 
commits, and they want that list *incrementally*. They don't want to wait 
for git-rev-list to have filled up some 8kB buffer of commits. Especially 
since generating those commits can be slow if we're talking about a big 
tree and some path-limited stuff.

So for example, do something like

	git rev-list HEAD -- drivers/char/drm/Makefile

and if you don't see the result scroll a line at a time on a slower 
machine, there's something *wrong*. 

Junio, I'm NAK'ing this very forcefully!

Jim: I don't know what I'm doing wrong, but I'm apparently not reaching 
you. So let me try one more time:

 - stdio really isn't very good with error handling

 - if you use stdio, YOU HAD BETTER ACCEPT THAT

 - don't screw up basic functionality in your *insane* quest to get stdio 
   to give you ENOSPC. It's not going to happen. Not that way. Just face 
   the fact that stdio *will* throw error numbers away.

The whole notion of "buffered IO" and "reliable errors" is simply not 
something that goes well together.

			Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-25 20:32 [PATCH] git-rev-list: give better diagnostic for failed write Jim Meyering
  2007-06-25 20:59 ` Linus Torvalds
@ 2007-06-25 21:39 ` Jim Meyering
  2007-06-25 21:53   ` Linus Torvalds
  1 sibling, 1 reply; 33+ messages in thread
From: Jim Meyering @ 2007-06-25 21:39 UTC (permalink / raw)
  To: git

Jim Meyering <jim@meyering.net> wrote:
> [this patch depends on the one I posted here:
>  http://marc.info/?l=git&m=118280134031923&w=2 ]

Here's a version of that patch that retains the fflush call
and adds a comment explaining why it's needed.

Without this patch, git-rev-list unnecessarily omits strerror(errno)
from its diagnostic, upon write failure:

    $ ./git-rev-list --max-count=1 HEAD > /dev/full
    fatal: write failure on standard output

With the patch, git reports the desired ENOSPC diagnostic:

    fatal: write failure on standard output: No space left on device

* builtin-rev-list (show_commit): Diagnose a failed fflush call.

Signed-off-by: Jim Meyering <jim@meyering.net>
---
 builtin-rev-list.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 813aadf..f13a594 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -100,7 +100,12 @@ static void show_commit(struct commit *commit)
 		printf("%s%c", buf, hdr_termination);
 		free(buf);
 	}
-	fflush(stdout);
+
+	/* Flush regularly.
+	   This is especially important for an asynchronous consumer.  */
+	if (fflush(stdout))
+		die("write failure on standard output: %s", strerror(errno));
+
 	if (commit->parents) {
 		free_commit_list(commit->parents);
 		commit->parents = NULL;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-25 20:59 ` Linus Torvalds
@ 2007-06-25 21:52   ` Jim Meyering
  2007-06-25 22:20     ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Jim Meyering @ 2007-06-25 21:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, 25 Jun 2007, Jim Meyering wrote:
>>
>> [this patch depends on the one I posted here:
>>  http://marc.info/?l=git&m=118280134031923&w=2 ]
>>
>> Without this patch, git-rev-list unnecessarily omits strerror(errno)
>> from its diagnostic, upon write failure:
>
> And this is a perfect example of what's wrong with the whole thing.
>
> Dammit, how many times do I need to say this:
>
>  - If you want reliable errors, don't use stdio!
>
> That fflush is there FOR A REASON. You removed it FOR A MUCH LESS
> IMPORTANT REASON!

Wow.  No need to curse and get into ALL_CAPS_MODE every time you
reply to me.  It does not advance your cause.

Remember: I'm trying to improve existing code here.
You should save some of your ire for the person who wrote that code.

> That fflush is there exactly because WE DO NOT WANT TO BUFFER the list of
> commits, because that thing is meant very much to be used for pipelines,
> and it's quite common that the receiving end is going to do something
> asynchronous with the result, and can - and does - want the results as
> soon as possible.

That's good to know.  I'm glad you pointed it out.  It would have been
nice to have a comment.  However, wouldn't it be better at least to check
for and report fflush failure?  fflush usually does a write, after all.
Most of the rest of the code is careful to diagnose write errors at
the source.  Why not here?

I've posted a revised patch.

> IOW, things like "gitk" use git-rev-list exactly to get the list of
> commits, and they want that list *incrementally*. They don't want to wait
> for git-rev-list to have filled up some 8kB buffer of commits. Especially
> since generating those commits can be slow if we're talking about a big
> tree and some path-limited stuff.
>
> So for example, do something like
>
> 	git rev-list HEAD -- drivers/char/drm/Makefile
>
> and if you don't see the result scroll a line at a time on a slower
> machine, there's something *wrong*.
>
> Junio, I'm NAK'ing this very forcefully!
>
> Jim: I don't know what I'm doing wrong, but I'm apparently not reaching
> you. So let me try one more time:
>
>  - stdio really isn't very good with error handling
>
>  - if you use stdio, YOU HAD BETTER ACCEPT THAT

Using stdio is fine, as long as you know and respect its limitations.

It's a real shame that you have to intersperse your often-valuable
feedback with such vitriol.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-25 21:39 ` Jim Meyering
@ 2007-06-25 21:53   ` Linus Torvalds
  2007-06-25 22:08     ` Jim Meyering
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2007-06-25 21:53 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git



On Mon, 25 Jun 2007, Jim Meyering wrote:
>
> Here's a version of that patch that retains the fflush call
> and adds a comment explaining why it's needed.

Ok. I will hereby just suggest to Junio that he just not take patches from 
you.

You seem to be totally unable to ever really think or worry about your own 
little uninteresting test-case, and have shown yourself totally 
uninterested in anything anybody ever tells you.

In other words, you now screwed up EPIPE.

AGAIN.

And why? All apparently because you want "disk full" rather than just 
"write error".

Jim, you really need to see past your small test, and think about the 
bigger picture.

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-25 21:53   ` Linus Torvalds
@ 2007-06-25 22:08     ` Jim Meyering
  0 siblings, 0 replies; 33+ messages in thread
From: Jim Meyering @ 2007-06-25 22:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, 25 Jun 2007, Jim Meyering wrote:
>>
>> Here's a version of that patch that retains the fflush call
>> and adds a comment explaining why it's needed.
>
> Ok. I will hereby just suggest to Junio that he just not take patches from
> you.
>
> You seem to be totally unable to ever really think or worry about your own
> little uninteresting test-case, and have shown yourself totally
> uninterested in anything anybody ever tells you.
>
> In other words, you now screwed up EPIPE.
>
> AGAIN.
>
> And why? All apparently because you want "disk full" rather than just
> "write error".
>
> Jim, you really need to see past your small test, and think about the
> bigger picture.

No.  I don't keep the "small", git-specific, picture in mind
all the time.  Git is the only project I contribute to with
this no-EPIPE restriction, and it doesn't come naturally yet.

Here's the patch you seem to want.

---------------------------------------------------------
Without this patch, git-rev-list unnecessarily omits strerror(errno)
from its diagnostic, upon write failure:

    $ ./git-rev-list --max-count=1 HEAD > /dev/full
    fatal: write failure on standard output

With the patch, git reports the desired ENOSPC diagnostic:

    fatal: write failure on standard output: No space left on device

* builtin-rev-list (show_commit): Diagnose a failed fflush call.

Signed-off-by: Jim Meyering <jim@meyering.net>
---
 builtin-rev-list.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 813aadf..94f8fca 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -100,7 +100,12 @@ static void show_commit(struct commit *commit)
 		printf("%s%c", buf, hdr_termination);
 		free(buf);
 	}
-	fflush(stdout);
+
+	/* Flush regularly.
+	   This is especially important for an asynchronous consumer.  */
+	if (fflush(stdout) && errno != EPIPE)
+		die("write failure on standard output: %s", strerror(errno));
+
 	if (commit->parents) {
 		free_commit_list(commit->parents);
 		commit->parents = NULL;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-25 21:52   ` Jim Meyering
@ 2007-06-25 22:20     ` Linus Torvalds
  2007-06-25 22:56       ` Linus Torvalds
  2007-06-27  8:59       ` Jim Meyering
  0 siblings, 2 replies; 33+ messages in thread
From: Linus Torvalds @ 2007-06-25 22:20 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git



On Mon, 25 Jun 2007, Jim Meyering wrote:
> 
> Remember: I'm trying to improve existing code here.
> You should save some of your ire for the person who wrote that code.

Ehh. Remind me who I should be pissed at when the old code was _better_ 
before your change?

With the current git.c, we report write errors quite well. We don't give 
the exact output you want, but on a scale of 1-10, how important is that? 
Pretty damn low on the list.

And the reason I'm really really irritated at you is that you ignore me 
when I tell you what your bugs are.

 - I *told* you that EPIPE is special. What did you do? Ignore my advice, 
   and made a broken patch that did exactly the opposite of what I told 
   you.

 - And I *told* you that you shouldn't care about errno for stdio, because 
   stdio was broken. What did you do? You again ignored my advice, and 
   made _another_ broken patch, exactly the _opposite_ of what I told you.

If you really really *must* get that ENOSPC error string output, create a 
helper function like

	void flush_or_die(FILE *f, const char *desc)
	{
		if (ferror(f))
			die("write failure on %s", desc)
		if (fflush(f)) {
			if (errno == EPIPE)
				exit(0);
			die("write failure on %s: %s",
				desc, strerror(errno));
		}
	}

and then you can start adding calls to "flush_or_die()" to appropriate 
places. You could replace the "fflush()" in builtin-rev-list.c with a 
"flush_or_die()".

And then you could add a call to "log_tree_commit()" (in log-tree.c), and 
that would probably be an improvement too (especially if we start having 
things like gitk parse "git log" output, and try to deprecate the old 
really low-level plumbing a bit).

			Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-25 22:20     ` Linus Torvalds
@ 2007-06-25 22:56       ` Linus Torvalds
  2007-06-25 23:01         ` Linus Torvalds
                           ` (2 more replies)
  2007-06-27  8:59       ` Jim Meyering
  1 sibling, 3 replies; 33+ messages in thread
From: Linus Torvalds @ 2007-06-25 22:56 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git



On Mon, 25 Jun 2007, Linus Torvalds wrote:
> 
> If you really really *must* get that ENOSPC error string output, create a 
> helper function like
> 
> 	void flush_or_die(FILE *f, const char *desc)
> 	{
> 		if (ferror(f))
> 			die("write failure on %s", desc)

Actually, even this is really nasty, and it's a case where the current 
"git.c" code can also fail.

It's an example of where EPIPE can actually cause a write failure that you 
can never figure out what the reason for the error was, because the flush 
was caused by something earlier (write too much for the buffer or 
whatever), exactly because stdio throws the error away.

So after thinking about it some more, I would suggest just ignoring ferror 
entirely, and hoping that any errors are caught by the fflush().

I do hate stdio error checking. In my opinion, there really is only *one* 
correct way to use stdio error checking: ignore it. It doesn't work. The 
thing is fundamentally mis-designed for error handling.

So I think the right solution would literally be to either not do this 
broken error checking at all, or to rewrite the code that cares about 
errors to not use stdio.

There's also another issue: regular files really *are* different from 
pipes and sockets and other things. Not because of EPIPE, but because you 
want different buffering behaviour. For a regular file, we really don't 
even care about the line buffering, and we'd actually be better off (from 
a performance angle) without it.

But we don't have any sane way to save that kind of information (and we 
definitely do *not* want to do the "fstat()" thing on every flush). We 
could use the stdio buffering mode, but

 - it's too weak (we want not line buffered or block buffered, we want 
   basically "record buffered")

 - I don't think there are any portable ways to read it (only set it, 
   using "setbuf()" and friends).

Anyway, getting rid of stdio for writes we care about really *would* be a 
nice thing. But it's a lot of boring, nasty work.

So here's a patch that I think is acceptable. IT IS NOT PERFECT. Stdio 
simply cannot do a good job on errors, but it has a comment about the case 
where it just decides to ignore ferror() instead of doing what I suggest 
above.

I'm not saying this is great. But doing this right really does require 
avoiding stdio entirely.

Does this work for you?

(I pass the "desc" string thing in case there is some future use where we 
also want to use stdio, but we use it for something else than just regular 
stdout. Dunno).

		Linus

---
 builtin-rev-list.c |    2 +-
 cache.h            |    2 ++
 log-tree.c         |    1 +
 write_or_die.c     |   20 ++++++++++++++++++++
 4 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 813aadf..3980bf4 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -100,7 +100,7 @@ static void show_commit(struct commit *commit)
 		printf("%s%c", buf, hdr_termination);
 		free(buf);
 	}
-	fflush(stdout);
+	flush_or_die(stdout, "stdout");
 	if (commit->parents) {
 		free_commit_list(commit->parents);
 		commit->parents = NULL;
diff --git a/cache.h b/cache.h
index ed83d92..aae9e2a 100644
--- a/cache.h
+++ b/cache.h
@@ -532,6 +532,8 @@ extern char git_default_name[MAX_GITNAME];
 extern const char *git_commit_encoding;
 extern const char *git_log_output_encoding;
 
+/* IO helper functions */
+extern void flush_or_die(FILE *, const char *);
 extern int copy_fd(int ifd, int ofd);
 extern int read_in_full(int fd, void *buf, size_t count);
 extern int write_in_full(int fd, const void *buf, size_t count);
diff --git a/log-tree.c b/log-tree.c
index 0cf21bc..061ecf7 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -408,5 +408,6 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
 		shown = 1;
 	}
 	opt->loginfo = NULL;
+	flush_or_die(stdout, "stdout");
 	return shown;
 }
diff --git a/write_or_die.c b/write_or_die.c
index 5c4bc85..84f411d 100644
--- a/write_or_die.c
+++ b/write_or_die.c
@@ -1,5 +1,25 @@
 #include "cache.h"
 
+/*
+ * Some cases use stdio, but want to flush after the write
+ * to get error handling (and to get better interactive
+ * behaviour - not buffering excessively).
+ *
+ * Of course, if the flush happened within the write itself,
+ * we've already lost the error code, and cannot report it any
+ * more. So we just ignore that case instead (and hope we get
+ * the right error code on the flush).
+ */
+void flush_or_die(FILE *f, const char *desc)
+{
+	if (fflush(f)) {
+		if (errno == EPIPE)
+			exit(0);
+		die("write failure on %s: %s",
+			desc, strerror(errno));
+	}
+}
+
 int read_in_full(int fd, void *buf, size_t count)
 {
 	char *p = buf;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-25 22:56       ` Linus Torvalds
@ 2007-06-25 23:01         ` Linus Torvalds
  2007-06-27  8:56           ` Jim Meyering
  2007-06-25 23:16         ` Linus Torvalds
  2007-06-26  9:06         ` [PATCH] git-rev-list: give better diagnostic for failed write Jeff King
  2 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2007-06-25 23:01 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git



On Mon, 25 Jun 2007, Linus Torvalds wrote:
> 
> Actually, even this is really nasty, and it's a case where the current 
> "git.c" code can also fail.

Just to clarify: this is why the git.c code obviously does the fstat(), in 
case anybody wondered.

So I didn't mean to imply that the new git.c code in 'next' is wrong, I 
just meant to imply that the "ferror()"+"fflush()" sequence that it uses 
and that I copied for my example is a very unreliable sequence, and it 
basically fails exactly because you can never know what caused the 
ferror() to trigger - if it *ever* triggers, you're basically screwed.

I wonder how many applications actually ever use ferror() and friends. 

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-25 22:56       ` Linus Torvalds
  2007-06-25 23:01         ` Linus Torvalds
@ 2007-06-25 23:16         ` Linus Torvalds
  2007-06-26 17:11           ` Theodore Tso
  2007-06-26  9:06         ` [PATCH] git-rev-list: give better diagnostic for failed write Jeff King
  2 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2007-06-25 23:16 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git



On Mon, 25 Jun 2007, Linus Torvalds wrote:
> 
> There's also another issue: regular files really *are* different from 
> pipes and sockets and other things. Not because of EPIPE, but because you 
> want different buffering behaviour. For a regular file, we really don't 
> even care about the line buffering, and we'd actually be better off (from 
> a performance angle) without it.

Just for fun, I tried this out.

Doing

	time git log > logfile

on the kernel repo with and without the patch I just sent out, I get:

Without:

	real    0m1.361s
	user    0m1.312s
	sys     0m0.040s

With:

	real    0m1.687s
	user    0m1.392s
	sys     0m0.284s

so doing the extra flushing does actually cost us (it's just fundamentally 
more expensive to do disk IO on non-block-boundaries).

It would be much nicer if we only did it for sockets and pipes, which 
don't have the same block-boundary issues anyway (there's still the system 
call cost, but on a pipe/socket, the real costs tend to be elsewhere).

Again, this is something that a non-stdio-based buffering library would 
easily handle. You could just test the file descriptor _once_ at the 
beginning, to see if it's a regular file or not. And then you could have 
the error handling where it belongs (when the IO is actually done, and the 
error actually happens) rather than in the callers using a bad interface 
that sometimes loses 'errno'.

Btw, to balance the above performance comment: doing the flush_or_die() 
obviously *does* mean that you get better performance in the odd cases. 
For example, if you do

	[torvalds@woody linux]$ trap '' SIGPIPE
	[torvalds@woody linux]$ time git log | head

I get get 0.002s, while it used to be:

	real    0m1.382s
	user    0m1.340s
	sys     0m0.028s

just because it did the whole thing regardless of any EPIPE errors.

Of course, that case probably isn't very usual, but I could imagine that 
some users of "git blame -C --incremental" could actually cause situations 
like this (ie just stop listening when they got the part they're 
interested in, and maybe they'd have some strange reason to ignore 
SIGPIPE).

So I'm not opposed to the patch I sent out, I just wanted to point out 
that this is an area we *could* improve upon if we didn't do that stdio 
thing.

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-25 22:56       ` Linus Torvalds
  2007-06-25 23:01         ` Linus Torvalds
  2007-06-25 23:16         ` Linus Torvalds
@ 2007-06-26  9:06         ` Jeff King
  2007-06-26 17:12           ` Linus Torvalds
  2 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2007-06-26  9:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jim Meyering, git

On Mon, Jun 25, 2007 at 03:56:11PM -0700, Linus Torvalds wrote:

> --- a/log-tree.c
> +++ b/log-tree.c
> @@ -408,5 +408,6 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
>  		shown = 1;
>  	}
>  	opt->loginfo = NULL;
> +	flush_or_die(stdout, "stdout");
>  	return shown;
>  }

I think flushing here is a good change regardless of the error checking.
Sometimes, when you are severely limiting commits, the whole output is
smaller than the buffer, and you end up waiting a long time for any
output even though your answer may have been found immediately.

For example, 'git-whatchanged -Sfoo' when 'foo' was introduced in the
last couple of commits (but wasn't referenced before) will have to
calculate diffs on all of history before producing output. Flushing
after every commit restores the illusion that git provides your answer
instaneously. :)

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-25 23:16         ` Linus Torvalds
@ 2007-06-26 17:11           ` Theodore Tso
  2007-06-26 17:32             ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Theodore Tso @ 2007-06-26 17:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jim Meyering, git

On Mon, Jun 25, 2007 at 04:16:56PM -0700, Linus Torvalds wrote:
> Again, this is something that a non-stdio-based buffering library would 
> easily handle. You could just test the file descriptor _once_ at the 
> beginning, to see if it's a regular file or not. And then you could have 
> the error handling where it belongs (when the IO is actually done, and the 
> error actually happens) rather than in the callers using a bad interface 
> that sometimes loses 'errno'.

Is there something obviously wrong with doing something like this?

	if ((fstat(fileno(stdout), &st) < 0) &&
	    !S_ISREG(st.st_mode))
		setbuf(stdout, NULL);

This would change stdout to use completely unbuffered I/O we're not
sending the output to a file.

						- Ted

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-26  9:06         ` [PATCH] git-rev-list: give better diagnostic for failed write Jeff King
@ 2007-06-26 17:12           ` Linus Torvalds
  0 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2007-06-26 17:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Jim Meyering, git



On Tue, 26 Jun 2007, Jeff King wrote:
> 
> I think flushing here is a good change regardless of the error checking.
> Sometimes, when you are severely limiting commits, the whole output is
> smaller than the buffer, and you end up waiting a long time for any
> output even though your answer may have been found immediately.

Exactly. That's why it was done in git-rev-list, and why it's good to do 
in "git log" too.

The slowdown worries me a bit, but it's only noticeable with *lots* of 
output - ie no path limiting (and no diffs - the diff generation becomes 
the bottleneck if you do diffs). So the flushing slows down a case that we 
do ridiculously well right now, so I doubt anybody will really care.

It's more a benchmarking issue: "we can show the whole log of the kernel 
in under two seconds", and it didn't slow down *that* much.

> For example, 'git-whatchanged -Sfoo' when 'foo' was introduced in the
> last couple of commits (but wasn't referenced before) will have to
> calculate diffs on all of history before producing output. Flushing
> after every commit restores the illusion that git provides your answer
> instaneously. :)

On that note, it should probably also do it for the incremental output of 
git blame.

		Linus
---

diff --git a/builtin-blame.c b/builtin-blame.c
index f7e2c13..86ab6c7 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -1459,6 +1459,7 @@ static void found_guilty_entry(struct blame_entry *ent)
 				printf("boundary\n");
 		}
 		write_filename_info(suspect->path);
+		flush_or_die(stdout, "stdout");
 	}
 }
 

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-26 17:11           ` Theodore Tso
@ 2007-06-26 17:32             ` Linus Torvalds
  2007-06-26 22:04               ` Theodore Tso
  2007-06-28 19:04               ` Theodore Tso
  0 siblings, 2 replies; 33+ messages in thread
From: Linus Torvalds @ 2007-06-26 17:32 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jim Meyering, git



On Tue, 26 Jun 2007, Theodore Tso wrote:
> 
> Is there something obviously wrong with doing something like this?
> 
> 	if ((fstat(fileno(stdout), &st) < 0) &&
> 	    !S_ISREG(st.st_mode))
> 		setbuf(stdout, NULL);
> 
> This would change stdout to use completely unbuffered I/O we're not
> sending the output to a file.

Well, we might as well keep it line-buffered, so I'd use setvbuf(_IOLBF) 
instead. Totally unbuffered is bad, since we often do printf's in smaller 
chunks.

But we actually _do_ want fully buffered from a performance angle. 
Especially for the big stuff, which is usually the *diffs*, not the commit 
messages. Not so much an issue with git-rev-list, but with "git log -p" 
you would normally not want it line-buffered, and it's actually much nicer 
to let it be fully buffered and then do a flush at the end.

Even pipes are often used for "throughput" stuff if you end up doing some 
post-processing (ie "git log -p | gather-statistics"), and yes, I actually 
do things like that - it's nice for things like looking at how many lines 
have been added during the last release cycle:

	git log -p v2.6.21.. | grep '^+' | wc -l

and I'd really like thigns like that to be close to optimal. 

How much the system call overhead is, I don't know, though. So it might be 
worth testing out. Under Linux, you'll probably have a fairly hard time 
seeing any difference, but under other OS's you have both system call 
latency issues *and* possible scheduling issues (ie the above kind of 
pipeline can act very differently from a scheduling standpoint if you send 
lots of small things rather than buffer things a bit on the generating 
side)

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-26 17:32             ` Linus Torvalds
@ 2007-06-26 22:04               ` Theodore Tso
  2007-06-26 22:32                 ` Linus Torvalds
  2007-06-28 19:04               ` Theodore Tso
  1 sibling, 1 reply; 33+ messages in thread
From: Theodore Tso @ 2007-06-26 22:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jim Meyering, git

On Tue, Jun 26, 2007 at 10:32:23AM -0700, Linus Torvalds wrote:
> But we actually _do_ want fully buffered from a performance angle. 
> Especially for the big stuff, which is usually the *diffs*, not the commit 
> messages. Not so much an issue with git-rev-list, but with "git log -p" 
> you would normally not want it line-buffered, and it's actually much nicer 
> to let it be fully buffered and then do a flush at the end.

Well, sometimes we do and sometimes we don't right?  Some of it
depends on how large the *stuff* we are dumping out (diffs vs. commit
objects), and what the receiving process on the other bit of the
pipeline is doing with the data --- is it doing some kind of
statistical reduction (git-rev-log .. | wc -l) versus some kind of
asynchronous processing (git-rev-log as used by gitk).

So maybe the answer is that when outputing to a file, we always use
full stdio buffering, and in other cases there should be some
intelligent defaults plus command-line overrides.  So when we emit
lists of commit identifiers, it should probably be line buffered, and
if it is diffs, it should probably be fully buffered, etc.?

						- Ted

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-26 22:04               ` Theodore Tso
@ 2007-06-26 22:32                 ` Linus Torvalds
  0 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2007-06-26 22:32 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jim Meyering, git



On Tue, 26 Jun 2007, Theodore Tso wrote:
> On Tue, Jun 26, 2007 at 10:32:23AM -0700, Linus Torvalds wrote:
> > But we actually _do_ want fully buffered from a performance angle. 
> > Especially for the big stuff, which is usually the *diffs*, not the commit 
> > messages. Not so much an issue with git-rev-list, but with "git log -p" 
> > you would normally not want it line-buffered, and it's actually much nicer 
> > to let it be fully buffered and then do a flush at the end.
> 
> Well, sometimes we do and sometimes we don't right?

No.

We basically _never_ want "line buffered" or "unbuffered", which is what 
stdio knows how to do. That sucks in _all_ cases.

What we want is "fully buffered" for plain files, and "record buffered" 
for anything else (where a "record" is basically the "commit + optional 
diff").

We can get the record buffered by adding the fflush() calls, but the thing 
is, we'd want to _avoid_ that if it was a file. It's just that there is no 
way to set that kind of flag portably with stdio, we'd have to carry it 
around _separately_ from stdio, which is a big pain.

But if we decide that this only matters with stdout (which currently is 
what the patches have done), we could of course just make it a single 
global variable (like "stdout" itself already is). Then we could just make 
git.c start out by testing stdout at startup and setting the global 
variable.

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-25 23:01         ` Linus Torvalds
@ 2007-06-27  8:56           ` Jim Meyering
  0 siblings, 0 replies; 33+ messages in thread
From: Jim Meyering @ 2007-06-27  8:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> wrote:
...
> So I didn't mean to imply that the new git.c code in 'next' is wrong, I
> just meant to imply that the "ferror()"+"fflush()" sequence that it uses
> and that I copied for my example is a very unreliable sequence, and it
> basically fails exactly because you can never know what caused the
> ferror() to trigger - if it *ever* triggers, you're basically screwed.
>
> I wonder how many applications actually ever use ferror() and friends.

At least among GNU programs, many do.
Here are a few:

    gcc, make, diff, grep, tar, find, xargs, gawk

And add to that 89 or 90 of the coreutils programs:

    $ git-grep -l close_stdout|grep 'src/.*\.c'|wc -l
    89

But none of those suppress EPIPE errors, so ferror works fine for them.
That's partly because the ferror-only situation is far less common
than the one in which we have a valid errno value.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-25 22:20     ` Linus Torvalds
  2007-06-25 22:56       ` Linus Torvalds
@ 2007-06-27  8:59       ` Jim Meyering
  2007-06-27 16:06         ` Linus Torvalds
  1 sibling, 1 reply; 33+ messages in thread
From: Jim Meyering @ 2007-06-27  8:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, 25 Jun 2007, Jim Meyering wrote:
>> Remember: I'm trying to improve existing code here.
>> You should save some of your ire for the person who wrote that code.
>
> Ehh. Remind me who I should be pissed at when the old code was _better_
> before your change?

If someone pointed out a bug in coreutils whereby a tool failed
to give an accurate diagnostic, I would thank them (and honestly,
I'd be grateful), even if their attempt at fixing it introduced
a horrible bug.  Of course, if the original bug were mine, I might
be annoyed with myself, but certainly would not shoot the messenger.

> With the current git.c, we report write errors quite well.

Yes, I'm glad (in spite of the tone of much of your feedback)
that my little improvements are making it into git.

> We don't give
> the exact output you want, but on a scale of 1-10, how important is that?
> Pretty damn low on the list.

If you get a report of a git write error, you might well
appreciate knowing right away whether it was due to EIO or to ENOSPC.

> And the reason I'm really really irritated at you is that you ignore me
> when I tell you what your bugs are.

Be careful when throwing stones...
I mentioned (without making a fuss) a regression you introduced,
  http://thread.gmane.org/gmane.comp.version-control.git/50742/focus=50917
and you included similar code in a snippet in the very message to which
I'm replying now.  I do see that you omitted the troublesome ferror call
from your eventual patch.

>  - I *told* you that EPIPE is special. What did you do? Ignore my advice,
>    and made a broken patch that did exactly the opposite of what I told
>    you.

I did not ignore your advice.  I considered it carefully and explained
why I hold the opposing view.  We've already beaten this to death, but
for the record, I'd summarize it like this:

You and I have a difference of opinion.  I firmly believe that it is
unnecessary and counterproductive (it is prohibited by POSIX for many
of the tools I maintain) to suppress an EPIPE diagnostic and to exit
successfully in spite of a write error.  Since EPIPE doesn't even arise
in most normal situations, the only reason to do that is if you want
to pander to the needs of some theoretical, broken-by-design (i.e.,
SIGPIPE-ignoring) git Porcelain program.  You want to add such code to
git, at every point where writing to stdout may fail.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-27  8:59       ` Jim Meyering
@ 2007-06-27 16:06         ` Linus Torvalds
  0 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2007-06-27 16:06 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git



On Wed, 27 Jun 2007, Jim Meyering wrote:
> 
> If someone pointed out a bug in coreutils whereby a tool failed
> to give an accurate diagnostic

The thing is, I don't think stdio output is "important" enough.

In fact, by definition, if the data was really important, it had better 
not use stdio.

So look at where git uses "real" IO, and where it uses stdio, and you'll 
see where the "diagnostics" matter.

In other words, the diagnostics matter for:
 - the actual git database (including refs).
 - doing things like "git checkout" that writes the working tree

Those are the ones where we want to *guarantee* correctness. And that 
means that they don't use stdio (and if any of them do, then that really 
*is* a bug, and should be fixed asap)

In contrast, if you redirect stdout, that's a totally different level of 
correctness, and all bets are off. Exactly because we use stdio. If we 
really cared, we shouldn't use stdio. 

> Be careful when throwing stones...
> I mentioned (without making a fuss) a regression you introduced,
>   http://thread.gmane.org/gmane.comp.version-control.git/50742/focus=50917
> and you included similar code in a snippet in the very message to which
> I'm replying now.  I do see that you omitted the troublesome ferror call
> from your eventual patch.

Right. There's a difference between an email saying "ok, this is how you 
might want to do it" and then actually sitting down and thinking about it 
and coding it.

If you look closer, you'll notice that my initial suggestion wouldn't even 
*compile*. 

By the time I sent out a patch, I had actually thought it through.

And no, I don't guarantee that I do that all the time. I try to, but hey, 
mistakes happen. It's when the mistake is repeated over and over after 
being pointed out that I get irritated.

> You and I have a difference of opinion.  I firmly believe that it is
> unnecessary and counterproductive (it is prohibited by POSIX for many
> of the tools I maintain) to suppress an EPIPE diagnostic and to exit
> successfully in spite of a write error.

Quite frankly, the tools you maintain are mostly filters in the first 
place! In other words, for you, stdout *matters*. That's the real data!

But look at something like "cp" one of these days, and ask yourself: 

 - do you check error returns for the IO operations that actually do the 
   copy (I hope yes). 

   I also hope to high heaven that you would never use stdio for that IO. 
   Because you *will* lose error information!

 - do you check the error returns of informational messages like the ones 
   from the verbose output switch (cp -v).

If you answered "no" to the second question, you should really think about 
that fact. Please realize that there's a *huge* difference in error 
checking between "core data" and "random information".

For something like "cat", the core data is the actual stream. You care a 
lot more about that. For git, the core data is the working tree and the 
database. I care about _that_. That's the stuff that should be protected 
and must never _ever_ fail silently.

The rest is really "statistics".

Let's get it right, but unlike core data, that's mostly a politeness 
issue, much less important than "oops, we just corrupted your archive".

And when we talk about politeness, avoiding the _idiotic_ error messages 
like "Broken pipe" is a big part of it! The "Broken pipe" message means 
either:
 - the receiver already got everything he wanted (and you should damn well 
   shut up about it)
 - the receiver had some problem (and it's damn well *that* which matters, 
   since the sender was fine, and you shouldn't confuse the issue by 
   writing bogus error messages)

The fact that you have some broken POSIX spec to worry about is not git's 
problem. I hold git to higher standards.

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-26 17:32             ` Linus Torvalds
  2007-06-26 22:04               ` Theodore Tso
@ 2007-06-28 19:04               ` Theodore Tso
  2007-06-28 21:34                 ` Jeff King
  1 sibling, 1 reply; 33+ messages in thread
From: Theodore Tso @ 2007-06-28 19:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jim Meyering, git

On Tue, Jun 26, 2007 at 10:32:23AM -0700, Linus Torvalds wrote:
> But we actually _do_ want fully buffered from a performance angle. 
> Especially for the big stuff, which is usually the *diffs*, not the commit 
> messages. Not so much an issue with git-rev-list, but with "git log -p" 
> you would normally not want it line-buffered, and it's actually much nicer 
> to let it be fully buffered and then do a flush at the end.
> 
> Even pipes are often used for "throughput" stuff if you end up doing some 
> post-processing (ie "git log -p | gather-statistics"), and yes, I actually 
> do things like that - it's nice for things like looking at how many lines 
> have been added during the last release cycle
> 
> 	git log -p v2.6.21.. | grep '^+' | wc -l
> 
> and I'd really like thigns like that to be close to optimal. 
> 
> How much the system call overhead is, I don't know, though. So it might be 
> worth testing out....

So just for yuks, I devised the following patch, and did some measurements....

For the above pipeline, the result was hardly worth mentioning:

With flushing:

% time  git log -p v2.6.21.. | grep '^+' | wc -l

real    0m22.330s
user    0m21.512s
sys     0m0.807s

# of write() system calls according to strace -c: 15167

Without flushing:

% time  git log -p v2.6.21.. | grep '^+' | wc -l

real    0m22.367s
user    0m21.355s
sys     0m0.720s

# of write() system calls according to strace -c: 11373

So here's the worst case:

% time   git-rev-list HEAD  > /dev/null
real    0m1.575s
user    0m1.477s
sys     0m0.053s
# of write() system calls according to strace -c: 582

% (export GIT_NEVER_FLUSH_STDOUT=t; time   git-rev-list HEAD  > /dev/null) 
real    0m1.535s
user    0m1.463s
sys     0m0.027s
# of write() system calls according to strace -c: 58055

> Under Linux, you'll probably have a fairly hard time 
> seeing any difference, but under other OS's you have both system call 
> latency issues *and* possible scheduling issues (ie the above kind of 
> pipeline can act very differently from a scheduling standpoint if you send 
> lots of small things rather than buffer things a bit on the generating 
> side)

Indeed.  So it's not at all clear this patch is worth applying, but
maybe it would make a difference on some other OS; of course, this
patch also obviates the original intent of Jim Meyering's patch, since
it means that we won't print a useful error message if stdout has been
redirected to a file and the write returns ENOSPC, since we won't be
fflush() when stdout is redirected to a file.  

The added fflush() calls to the incremental git-blame and the
git-log-*/git-whatchanged might make it worthwhile for tools that
depend on those outputs and want faster user response time.  So maybe
adding the fflush() call might be worthwhile for those programs.

						- Ted


commit 7f483ec6366f62d52199e3edefa292a110fcb5c8
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Thu Jun 28 14:10:58 2007 -0400

    Don't fflush(stdout) when it's not helpful
    
    This patch arose from a discussion started by Jim Meyering's patch
    whose intention was to provide better diagnostics for failed writes.
    Linus proposed a better way to do things, which also had the added
    benefit that adding a fflush() to git-log-* operations and incremental
    git-blame operations could improve interactive respose time feel, at
    the cost of making things a bit slower when we aren't piping the
    output to a downstream program.
    
    This patch skips the fflush() calls when stdout is a regular file, or
    if the environment variable GIT_NEVER_FLUSH_STDOUT is set.  This
    latter can speed up a command such as:
    
    	(export GIT_NEVER_FLUSH_STDOUT=t; git-rev-list HEAD | wc -l)
    
    a tiny amount.
    
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

diff --git a/builtin-blame.c b/builtin-blame.c
index f7e2c13..da23a6f 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -1459,6 +1459,7 @@ static void found_guilty_entry(struct blame_entry *ent)
 				printf("boundary\n");
 		}
 		write_filename_info(suspect->path);
+		maybe_flush_or_die(stdout, "stdout");
 	}
 }
 
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 813aadf..86db8b0 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -100,7 +100,7 @@ static void show_commit(struct commit *commit)
 		printf("%s%c", buf, hdr_termination);
 		free(buf);
 	}
-	fflush(stdout);
+	maybe_flush_or_die(stdout, "stdout");
 	if (commit->parents) {
 		free_commit_list(commit->parents);
 		commit->parents = NULL;
diff --git a/cache.h b/cache.h
index ed83d92..0525c4e 100644
--- a/cache.h
+++ b/cache.h
@@ -532,6 +532,8 @@ extern char git_default_name[MAX_GITNAME];
 extern const char *git_commit_encoding;
 extern const char *git_log_output_encoding;
 
+/* IO helper functions */
+extern void maybe_flush_or_die(FILE *, const char *);
 extern int copy_fd(int ifd, int ofd);
 extern int read_in_full(int fd, void *buf, size_t count);
 extern int write_in_full(int fd, const void *buf, size_t count);
diff --git a/log-tree.c b/log-tree.c
index 0cf21bc..ced3f33 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -408,5 +408,6 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
 		shown = 1;
 	}
 	opt->loginfo = NULL;
+	maybe_flush_or_die(stdout, "stdout");
 	return shown;
 }
diff --git a/write_or_die.c b/write_or_die.c
index 5c4bc85..2cebeb5 100644
--- a/write_or_die.c
+++ b/write_or_die.c
@@ -1,5 +1,43 @@
 #include "cache.h"
 
+/*
+ * Some cases use stdio, but want to flush after the write
+ * to get error handling (and to get better interactive
+ * behaviour - not buffering excessively).
+ *
+ * Of course, if the flush happened within the write itself,
+ * we've already lost the error code, and cannot report it any
+ * more. So we just ignore that case instead (and hope we get
+ * the right error code on the flush).
+ *
+ * If the file handle is stdout, and stdout is a file, then skip the
+ * flush entirely since it's not needed.
+ */
+void maybe_flush_or_die(FILE *f, const char *desc)
+{
+	static int stdout_is_file = -1;
+	struct stat st;
+
+	if (f == stdout) {
+		if (stdout_is_file < 0) {
+			if (getenv("GIT_NEVER_FLUSH_STDOUT") ||
+			    ((fstat(fileno(stdout), &st) == 0) &&
+			     S_ISREG(st.st_mode)))
+				stdout_is_file = 1;
+			else
+				stdout_is_file = 0;
+		}
+		if (stdout_is_file)
+			return;
+	}
+	if (fflush(f)) {
+		if (errno == EPIPE)
+			exit(0);
+		die("write failure on %s: %s",
+			desc, strerror(errno));
+	}
+}
+
 int read_in_full(int fd, void *buf, size_t count)
 {
 	char *p = buf;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] git-rev-list: give better diagnostic for failed write
  2007-06-28 19:04               ` Theodore Tso
@ 2007-06-28 21:34                 ` Jeff King
  2007-06-28 23:53                   ` [PATCH] Don't fflush(stdout) when it's not helpful Theodore Tso
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2007-06-28 21:34 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Linus Torvalds, Jim Meyering, git

On Thu, Jun 28, 2007 at 03:04:06PM -0400, Theodore Tso wrote:

>     This patch skips the fflush() calls when stdout is a regular file, or
>     if the environment variable GIT_NEVER_FLUSH_STDOUT is set.  This
>     latter can speed up a command such as:
>     
>     	(export GIT_NEVER_FLUSH_STDOUT=t; git-rev-list HEAD | wc -l)

I wonder if this would be more natural in the opposite form:

  GIT_FLUSH_STDOUT=1 git-rev-list HEAD

In general, you don't want to do the flushing unless:
  - it's going to the pager
  - some program is reading incrementally

In the first case, we can just turn on GIT_FLUSH_STDOUT when we kick off
the pager. In the second case, that program can just add the variable to
its invocation.

On top of which, in your patch the type of output trumps the environment
variable, which seems backwards. In other words, I can't do this:

GIT_FLUSH_EVEN_THOUGH_ITS_A_FILE=1 git-rev-list HEAD >file
[in another window] tail -f file

I would think an explicit preference from a variable should override any
guesses.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] Don't fflush(stdout) when it's not helpful
  2007-06-28 21:34                 ` Jeff King
@ 2007-06-28 23:53                   ` Theodore Tso
  2007-06-29  1:05                     ` Frank Lichtenheld
  0 siblings, 1 reply; 33+ messages in thread
From: Theodore Tso @ 2007-06-28 23:53 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Jim Meyering, git

On Thu, Jun 28, 2007 at 05:34:51PM -0400, Jeff King wrote:
> On top of which, in your patch the type of output trumps the environment
> variable, which seems backwards. In other words, I can't do this:
> 
> GIT_FLUSH_EVEN_THOUGH_ITS_A_FILE=1 git-rev-list HEAD >file
> [in another window] tail -f file
> 
> I would think an explicit preference from a variable should override any
> guesses.

Good point.  Here's a revised patch where GIT_FLUSH=0 and GIT_FLASH=1
trumps any hueristics.

My comments about this making only trivial differences on Linux still
apply (although it does make things slightly more optimal).  I suspect
it might make more of a difference on MacOS, but I haven't had time to
benchmark it.

							- Ted

---
commit 422becc0f8430d2386ceed92f224a94c9047751e
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Thu Jun 28 14:10:58 2007 -0400

    Don't fflush(stdout) when it's not helpful
    
    This patch arose from a discussion started by Jim Meyering's patch
    whose intention was to provide better diagnostics for failed writes.
    Linus proposed a better way to do things, which also had the added
    benefit that adding a fflush() to git-log-* operations and incremental
    git-blame operations could improve interactive respose time feel, at
    the cost of making things a bit slower when we aren't piping the
    output to a downstream program.
    
    This patch skips the fflush() calls when stdout is a regular file, or
    if the environment variable GIT_FLUSH is set to "0".  This latter can
    speed up a command such as:
    
    GIT_FLUSH=0 strace -c -f -e write time git-rev-list HEAD | wc -l
    
    a tiny amount.
    
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 20b5b7b..8269148 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -396,6 +396,16 @@ other
 'GIT_PAGER'::
 	This environment variable overrides `$PAGER`.
 
+'GIT_FLUSH'::
+	If this environment variable is set to "1", then commands such
+	as git-blame (in incremental mode), git-rev-list, git-log,
+	git-whatchanged, etc., will force a flush of the output stream
+	after each commit-oriented record have been flushed.   If this
+	variable is set to "0", the output of these commands will be done
+	using completely buffered I/O.   If this environment variable is
+	not set, git will choose buffered or record-oriented flushing
+	based on whether stdout appears to be redirected to a file or not.
+
 'GIT_TRACE'::
 	If this variable is set to "1", "2" or "true" (comparison
 	is case insensitive), git will print `trace:` messages on
diff --git a/builtin-blame.c b/builtin-blame.c
index f7e2c13..da23a6f 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -1459,6 +1459,7 @@ static void found_guilty_entry(struct blame_entry *ent)
 				printf("boundary\n");
 		}
 		write_filename_info(suspect->path);
+		maybe_flush_or_die(stdout, "stdout");
 	}
 }
 
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 813aadf..86db8b0 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -100,7 +100,7 @@ static void show_commit(struct commit *commit)
 		printf("%s%c", buf, hdr_termination);
 		free(buf);
 	}
-	fflush(stdout);
+	maybe_flush_or_die(stdout, "stdout");
 	if (commit->parents) {
 		free_commit_list(commit->parents);
 		commit->parents = NULL;
diff --git a/cache.h b/cache.h
index ed83d92..0525c4e 100644
--- a/cache.h
+++ b/cache.h
@@ -532,6 +532,8 @@ extern char git_default_name[MAX_GITNAME];
 extern const char *git_commit_encoding;
 extern const char *git_log_output_encoding;
 
+/* IO helper functions */
+extern void maybe_flush_or_die(FILE *, const char *);
 extern int copy_fd(int ifd, int ofd);
 extern int read_in_full(int fd, void *buf, size_t count);
 extern int write_in_full(int fd, const void *buf, size_t count);
diff --git a/log-tree.c b/log-tree.c
index 0cf21bc..ced3f33 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -408,5 +408,6 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
 		shown = 1;
 	}
 	opt->loginfo = NULL;
+	maybe_flush_or_die(stdout, "stdout");
 	return shown;
 }
diff --git a/write_or_die.c b/write_or_die.c
index 5c4bc85..9929a15 100644
--- a/write_or_die.c
+++ b/write_or_die.c
@@ -1,5 +1,47 @@
 #include "cache.h"
 
+/*
+ * Some cases use stdio, but want to flush after the write
+ * to get error handling (and to get better interactive
+ * behaviour - not buffering excessively).
+ *
+ * Of course, if the flush happened within the write itself,
+ * we've already lost the error code, and cannot report it any
+ * more. So we just ignore that case instead (and hope we get
+ * the right error code on the flush).
+ *
+ * If the file handle is stdout, and stdout is a file, then skip the
+ * flush entirely since it's not needed.
+ */
+void maybe_flush_or_die(FILE *f, const char *desc)
+{
+	static int stdout_is_file = -1;
+	struct stat st;
+	char *cp;
+
+	if (f == stdout) {
+		if (stdout_is_file < 0) {
+			cp = getenv("GIT_FLUSH");
+			if (cp)
+				stdout_is_file = (atoi(cp) == 0);
+			else if (getenv("GIT_NEVER_FLUSH_STDOUT") ||
+				 ((fstat(fileno(stdout), &st) == 0) &&
+				  S_ISREG(st.st_mode)))
+				stdout_is_file = 1;
+			else
+				stdout_is_file = 0;
+		}
+		if (stdout_is_file)
+			return;
+	}
+	if (fflush(f)) {
+		if (errno == EPIPE)
+			exit(0);
+		die("write failure on %s: %s",
+			desc, strerror(errno));
+	}
+}
+
 int read_in_full(int fd, void *buf, size_t count)
 {
 	char *p = buf;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] Don't fflush(stdout) when it's not helpful
  2007-06-28 23:53                   ` [PATCH] Don't fflush(stdout) when it's not helpful Theodore Tso
@ 2007-06-29  1:05                     ` Frank Lichtenheld
  2007-06-29  3:48                       ` Theodore Tso
  0 siblings, 1 reply; 33+ messages in thread
From: Frank Lichtenheld @ 2007-06-29  1:05 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jeff King, Linus Torvalds, Jim Meyering, git

On Thu, Jun 28, 2007 at 07:53:19PM -0400, Theodore Tso wrote:
> Good point.  Here's a revised patch where GIT_FLUSH=0 and GIT_FLASH=1
> trumps any hueristics.

[...]

> +			else if (getenv("GIT_NEVER_FLUSH_STDOUT") ||
> +				 ((fstat(fileno(stdout), &st) == 0) &&
> +				  S_ISREG(st.st_mode)))
> +				stdout_is_file = 1;
> +			else
> +				stdout_is_file = 0;
> +		}

Any particular reason why you still check for GIT_NEVER_FLUSH_STDOUT in
addition to GIT_FLUSH?

Gruesse,
-- 
Frank Lichtenheld <frank@lichtenheld.de>
www: http://www.djpig.de/

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Don't fflush(stdout) when it's not helpful
  2007-06-29  1:05                     ` Frank Lichtenheld
@ 2007-06-29  3:48                       ` Theodore Tso
  2007-06-29  6:38                         ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Theodore Tso @ 2007-06-29  3:48 UTC (permalink / raw)
  To: Frank Lichtenheld; +Cc: Jeff King, Linus Torvalds, Jim Meyering, git

On Fri, Jun 29, 2007 at 03:05:08AM +0200, Frank Lichtenheld wrote:
> Any particular reason why you still check for GIT_NEVER_FLUSH_STDOUT in
> addition to GIT_FLUSH?

Yup, I forgot to remove it.  :-)

					- Ted

---
commit 473065f89f27de476d12d774141009cd4f2600c4
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Thu Jun 28 14:10:58 2007 -0400

    Don't fflush(stdout) when it's not helpful
    
    This patch arose from a discussion started by Jim Meyering's patch
    whose intention was to provide better diagnostics for failed writes.
    Linus proposed a better way to do things, which also had the added
    benefit that adding a fflush() to git-log-* operations and incremental
    git-blame operations could improve interactive respose time feel, at
    the cost of making things a bit slower when we aren't piping the
    output to a downstream program.
    
    This patch skips the fflush() calls when stdout is a regular file, or
    if the environment variable GIT_FLUSH is set to "0".  This latter can
    speed up a command such as:
    
    GIT_FLUSH=0 strace -c -f -e write time git-rev-list HEAD | wc -l
    
    a tiny amount.
    
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 20b5b7b..8269148 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -396,6 +396,16 @@ other
 'GIT_PAGER'::
 	This environment variable overrides `$PAGER`.
 
+'GIT_FLUSH'::
+	If this environment variable is set to "1", then commands such
+	as git-blame (in incremental mode), git-rev-list, git-log,
+	git-whatchanged, etc., will force a flush of the output stream
+	after each commit-oriented record have been flushed.   If this
+	variable is set to "0", the output of these commands will be done
+	using completely buffered I/O.   If this environment variable is
+	not set, git will choose buffered or record-oriented flushing
+	based on whether stdout appears to be redirected to a file or not.
+
 'GIT_TRACE'::
 	If this variable is set to "1", "2" or "true" (comparison
 	is case insensitive), git will print `trace:` messages on
diff --git a/builtin-blame.c b/builtin-blame.c
index f7e2c13..da23a6f 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -1459,6 +1459,7 @@ static void found_guilty_entry(struct blame_entry *ent)
 				printf("boundary\n");
 		}
 		write_filename_info(suspect->path);
+		maybe_flush_or_die(stdout, "stdout");
 	}
 }
 
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 813aadf..86db8b0 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -100,7 +100,7 @@ static void show_commit(struct commit *commit)
 		printf("%s%c", buf, hdr_termination);
 		free(buf);
 	}
-	fflush(stdout);
+	maybe_flush_or_die(stdout, "stdout");
 	if (commit->parents) {
 		free_commit_list(commit->parents);
 		commit->parents = NULL;
diff --git a/cache.h b/cache.h
index ed83d92..0525c4e 100644
--- a/cache.h
+++ b/cache.h
@@ -532,6 +532,8 @@ extern char git_default_name[MAX_GITNAME];
 extern const char *git_commit_encoding;
 extern const char *git_log_output_encoding;
 
+/* IO helper functions */
+extern void maybe_flush_or_die(FILE *, const char *);
 extern int copy_fd(int ifd, int ofd);
 extern int read_in_full(int fd, void *buf, size_t count);
 extern int write_in_full(int fd, const void *buf, size_t count);
diff --git a/log-tree.c b/log-tree.c
index 0cf21bc..ced3f33 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -408,5 +408,6 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
 		shown = 1;
 	}
 	opt->loginfo = NULL;
+	maybe_flush_or_die(stdout, "stdout");
 	return shown;
 }
diff --git a/write_or_die.c b/write_or_die.c
index 5c4bc85..3fd7c5c 100644
--- a/write_or_die.c
+++ b/write_or_die.c
@@ -1,5 +1,46 @@
 #include "cache.h"
 
+/*
+ * Some cases use stdio, but want to flush after the write
+ * to get error handling (and to get better interactive
+ * behaviour - not buffering excessively).
+ *
+ * Of course, if the flush happened within the write itself,
+ * we've already lost the error code, and cannot report it any
+ * more. So we just ignore that case instead (and hope we get
+ * the right error code on the flush).
+ *
+ * If the file handle is stdout, and stdout is a file, then skip the
+ * flush entirely since it's not needed.
+ */
+void maybe_flush_or_die(FILE *f, const char *desc)
+{
+	static int stdout_is_file = -1;
+	struct stat st;
+	char *cp;
+
+	if (f == stdout) {
+		if (stdout_is_file < 0) {
+			cp = getenv("GIT_FLUSH");
+			if (cp)
+				stdout_is_file = (atoi(cp) == 0);
+			else if ((fstat(fileno(stdout), &st) == 0) &&
+				 S_ISREG(st.st_mode))
+				stdout_is_file = 1;
+			else
+				stdout_is_file = 0;
+		}
+		if (stdout_is_file)
+			return;
+	}
+	if (fflush(f)) {
+		if (errno == EPIPE)
+			exit(0);
+		die("write failure on %s: %s",
+			desc, strerror(errno));
+	}
+}
+
 int read_in_full(int fd, void *buf, size_t count)
 {
 	char *p = buf;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] Don't fflush(stdout) when it's not helpful
  2007-06-29  3:48                       ` Theodore Tso
@ 2007-06-29  6:38                         ` Jeff King
  2007-06-29  7:07                           ` Junio C Hamano
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2007-06-29  6:38 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Frank Lichtenheld, Linus Torvalds, Jim Meyering, git

On Thu, Jun 28, 2007 at 11:48:38PM -0400, Theodore Tso wrote:

> +void maybe_flush_or_die(FILE *f, const char *desc)
> +{
> +	static int stdout_is_file = -1;
> +	struct stat st;
> +	char *cp;
> +
> +	if (f == stdout) {
> +		if (stdout_is_file < 0) {
> +			cp = getenv("GIT_FLUSH");
> +			if (cp)
> +				stdout_is_file = (atoi(cp) == 0);
> +			else if ((fstat(fileno(stdout), &st) == 0) &&
> +				 S_ISREG(st.st_mode))
> +				stdout_is_file = 1;
> +			else
> +				stdout_is_file = 0;
> +		}
> +		if (stdout_is_file)
> +			return;
> +	}

Looks much better to me, but I have one minor nit: stdout_is_file is a
poor name, since it can mean either that stdout is a file, or that
flushing was explicitly turned off. Naming it something like
stdout_wants_flush would make much more sense. Though it's not a huge
deal since the function is fairly short, I think it makes things a
little easier to read (I had to double-check the negation on atoi(cp) ==
0 before I convinced myself the code was correct).

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Don't fflush(stdout) when it's not helpful
  2007-06-29  6:38                         ` Jeff King
@ 2007-06-29  7:07                           ` Junio C Hamano
  2007-06-29 16:06                             ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2007-06-29  7:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Theodore Tso, Frank Lichtenheld, Linus Torvalds, Jim Meyering,
	git

Jeff King <peff@peff.net> writes:

> On Thu, Jun 28, 2007 at 11:48:38PM -0400, Theodore Tso wrote:
>
>> +void maybe_flush_or_die(FILE *f, const char *desc)
>> +{
>> +	static int stdout_is_file = -1;
>> +	struct stat st;
>> +	char *cp;
>> +
>> +	if (f == stdout) {
>> +		if (stdout_is_file < 0) {
>> +			cp = getenv("GIT_FLUSH");
>> +			if (cp)
>> +				stdout_is_file = (atoi(cp) == 0);
>> +			else if ((fstat(fileno(stdout), &st) == 0) &&
>> +				 S_ISREG(st.st_mode))
>> +				stdout_is_file = 1;
>> ...
>
> Looks much better to me, but I have one minor nit: stdout_is_file is a
> poor name,...

Thanks for bringing it up, as I had the same "Huh?" moment.
I would probably call that simply "do_not_flush".  Or name the
variable "flush_stdout" and swap all the logic.

	if (f == stdout) {
        	if (flush_stdout < 0) {
                	cp = getenv("GIT_FLUSH_STDOUT");
                        if (cp)
                        	flush_stdout = !!atoi(cp);
			else if ((fstat(fileno(stdout), &st) == 0) &&
				!S_ISREG(st.st_mode))
				flush_stdout = 0;
			else
                        	flush_stdout = 1;
		}
                if (!flush_stdout)
                	return;
	}

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Don't fflush(stdout) when it's not helpful
  2007-06-29  7:07                           ` Junio C Hamano
@ 2007-06-29 16:06                             ` Linus Torvalds
  2007-06-29 17:40                               ` Theodore Tso
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2007-06-29 16:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Theodore Tso, Frank Lichtenheld, Jim Meyering, git



On Fri, 29 Jun 2007, Junio C Hamano wrote:
> 
> Thanks for bringing it up, as I had the same "Huh?" moment.
> I would probably call that simply "do_not_flush".  Or name the
> variable "flush_stdout" and swap all the logic.

I think that patch looks fine, but I also think that there is a more 
fundamental problem with this approach:

 - all these patches basically break the whole _point_ of Jim's original 
   reason for wanting this!

So remember, we had two different reasons for flushing:

 - interactivity over a pipe. 

   Tools like "gitk" and "git gui blame" all run waiting for input, and we 
   want to give them the commit data as soon as possible.

 - error checking on a filesystem.

   Yes, "ferror()" _after_ the write shows an error happened, but doesn't 
   show what error it was. Doing a fflush() is a lot more likely to 
   actually give the right errors.

So non-files want flushing due to latency issues, and files want flushing 
due to error handling. And the patch under discussion basically broke the 
error handling.

(It turns out that it doesn't break it for the trivial "/dev/full" 
testcase, since that doesn't show up as a file, but it would break it for 
the *real* case of a filesystem that becomes full).

One option is to change the git.c thing to do

	if (fflush(stdout))
		die("write failure on standard output: %s", strerror(errno));
	if (ferror(stdout))
		die("unknown write failure on standard output");
	if (fclose(stdout))
		die("close failure on standard output: %s", strerror(errno));

where the "fflush()" is done first (regardless of ferror), just in the 
(probably futile) hope of getting the right errno.

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Don't fflush(stdout) when it's not helpful
  2007-06-29 16:06                             ` Linus Torvalds
@ 2007-06-29 17:40                               ` Theodore Tso
  2007-06-29 23:43                                 ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Theodore Tso @ 2007-06-29 17:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Jeff King, Frank Lichtenheld, Jim Meyering, git

On Fri, Jun 29, 2007 at 09:06:22AM -0700, Linus Torvalds wrote:
> I think that patch looks fine, but I also think that there is a more 
> fundamental problem with this approach:
> 
>  - all these patches basically break the whole _point_ of Jim's original 
>    reason for wanting this!

Yeah, I pointed that out in my first patch.  It had seemed that
interactivity over a pipe was considered more important, though when
we started talking about things.  :-)

It looks like from my reading of the standard that ferror(f) should
not change the state of the file handle f.  So the following patch I
think should work; it checks ferror(f), and if it indicates that there
is an error, we try a flush to get the error message.  I've tested
under Linux and it gives the correct error message in the "git log >
/mnt/full-filesystem" case, and I believe it should DTRT on other
systems.

Comments?

						- Ted

commit 93a96f94028106687412acbb771bb18ee7ec5560
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Thu Jun 28 14:10:58 2007 -0400

    Don't fflush(stdout) when it's not helpful
    
    This patch arose from a discussion started by Jim Meyering's patch
    whose intention was to provide better diagnostics for failed writes.
    Linus proposed a better way to do things, which also had the added
    benefit that adding a fflush() to git-log-* operations and incremental
    git-blame operations could improve interactive respose time feel, at
    the cost of making things a bit slower when we aren't piping the
    output to a downstream program.
    
    This patch skips the fflush() calls when stdout is a regular file, or
    if the environment variable GIT_FLUSH is set to "0".  This latter can
    speed up a command such as:
    
    GIT_FLUSH=0 strace -c -f -e write time git-rev-list HEAD | wc -l
    
    a tiny amount.
    
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 20b5b7b..8269148 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -396,6 +396,16 @@ other
 'GIT_PAGER'::
 	This environment variable overrides `$PAGER`.
 
+'GIT_FLUSH'::
+	If this environment variable is set to "1", then commands such
+	as git-blame (in incremental mode), git-rev-list, git-log,
+	git-whatchanged, etc., will force a flush of the output stream
+	after each commit-oriented record have been flushed.   If this
+	variable is set to "0", the output of these commands will be done
+	using completely buffered I/O.   If this environment variable is
+	not set, git will choose buffered or record-oriented flushing
+	based on whether stdout appears to be redirected to a file or not.
+
 'GIT_TRACE'::
 	If this variable is set to "1", "2" or "true" (comparison
 	is case insensitive), git will print `trace:` messages on
diff --git a/builtin-blame.c b/builtin-blame.c
index f7e2c13..da23a6f 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -1459,6 +1459,7 @@ static void found_guilty_entry(struct blame_entry *ent)
 				printf("boundary\n");
 		}
 		write_filename_info(suspect->path);
+		maybe_flush_or_die(stdout, "stdout");
 	}
 }
 
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 813aadf..86db8b0 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -100,7 +100,7 @@ static void show_commit(struct commit *commit)
 		printf("%s%c", buf, hdr_termination);
 		free(buf);
 	}
-	fflush(stdout);
+	maybe_flush_or_die(stdout, "stdout");
 	if (commit->parents) {
 		free_commit_list(commit->parents);
 		commit->parents = NULL;
diff --git a/cache.h b/cache.h
index ed83d92..0525c4e 100644
--- a/cache.h
+++ b/cache.h
@@ -532,6 +532,8 @@ extern char git_default_name[MAX_GITNAME];
 extern const char *git_commit_encoding;
 extern const char *git_log_output_encoding;
 
+/* IO helper functions */
+extern void maybe_flush_or_die(FILE *, const char *);
 extern int copy_fd(int ifd, int ofd);
 extern int read_in_full(int fd, void *buf, size_t count);
 extern int write_in_full(int fd, const void *buf, size_t count);
diff --git a/log-tree.c b/log-tree.c
index 0cf21bc..ced3f33 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -408,5 +408,6 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
 		shown = 1;
 	}
 	opt->loginfo = NULL;
+	maybe_flush_or_die(stdout, "stdout");
 	return shown;
 }
diff --git a/write_or_die.c b/write_or_die.c
index 5c4bc85..e125e11 100644
--- a/write_or_die.c
+++ b/write_or_die.c
@@ -1,5 +1,45 @@
 #include "cache.h"
 
+/*
+ * Some cases use stdio, but want to flush after the write
+ * to get error handling (and to get better interactive
+ * behaviour - not buffering excessively).
+ *
+ * Of course, if the flush happened within the write itself,
+ * we've already lost the error code, and cannot report it any
+ * more. So we just ignore that case instead (and hope we get
+ * the right error code on the flush).
+ *
+ * If the file handle is stdout, and stdout is a file, then skip the
+ * flush entirely since it's not needed.
+ */
+void maybe_flush_or_die(FILE *f, const char *desc)
+{
+	static int skip_stdout_flush = -1;
+	struct stat st;
+	char *cp;
+
+	if (f == stdout) {
+		if (skip_stdout_flush < 0) {
+			cp = getenv("GIT_FLUSH");
+			if (cp)
+				skip_stdout_flush = (atoi(cp) == 0);
+			else if ((fstat(fileno(stdout), &st) == 0) &&
+				 S_ISREG(st.st_mode))
+				skip_stdout_flush = 1;
+			else
+				skip_stdout_flush = 0;
+		}
+		if (skip_stdout_flush && !ferror(f))
+			return;
+	}
+	if (fflush(f)) {
+		if (errno == EPIPE)
+			exit(0);
+		die("write failure on %s: %s", desc, strerror(errno));
+	}
+}
+
 int read_in_full(int fd, void *buf, size_t count)
 {
 	char *p = buf;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] Don't fflush(stdout) when it's not helpful
  2007-06-29 17:40                               ` Theodore Tso
@ 2007-06-29 23:43                                 ` Linus Torvalds
  2007-06-30  2:15                                   ` Junio C Hamano
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2007-06-29 23:43 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Junio C Hamano, Jeff King, Frank Lichtenheld, Jim Meyering, git



On Fri, 29 Jun 2007, Theodore Tso wrote:
> 
> Comments?

Looks ok to me. 

This should probably be paired up with the change to git.c (in "next") to 
do the "fflush()" before the "ferror()" too, in case the error is pending.

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Don't fflush(stdout) when it's not helpful
  2007-06-29 23:43                                 ` Linus Torvalds
@ 2007-06-30  2:15                                   ` Junio C Hamano
  2007-06-30  4:24                                     ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2007-06-30  2:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Tso, Jeff King, Frank Lichtenheld, Jim Meyering, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, 29 Jun 2007, Theodore Tso wrote:
>> 
>> Comments?
>
> Looks ok to me. 
>
> This should probably be paired up with the change to git.c (in "next") to 
> do the "fflush()" before the "ferror()" too, in case the error is pending.

Do you mean this part?

+	/* Somebody closed stdout? */
+	if (fstat(fileno(stdout), &st))
+		return 0;
+	/* Ignore write errors for pipes and sockets.. */
+	if (S_ISFIFO(st.st_mode) || S_ISSOCK(st.st_mode))
+		return 0;
+
+	/* Check for ENOSPC and EIO errors.. */
+	if (ferror(stdout))
+		die("write failure on standard output");
+	if (fflush(stdout) || fclose(stdout))
+		die("write failure on standard output: %s", strerror(errno));
+
+	return 0;
+}

I was planning to push this out to 'master' this weekend.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Don't fflush(stdout) when it's not helpful
  2007-06-30  2:15                                   ` Junio C Hamano
@ 2007-06-30  4:24                                     ` Linus Torvalds
  2007-06-30 14:27                                       ` Theodore Tso
  2007-06-30 18:42                                       ` Junio C Hamano
  0 siblings, 2 replies; 33+ messages in thread
From: Linus Torvalds @ 2007-06-30  4:24 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Theodore Tso, Jeff King, Frank Lichtenheld, Jim Meyering, git



On Fri, 29 Jun 2007, Junio C Hamano wrote:
> 
> Do you mean this part?
> 
> +	/* Somebody closed stdout? */
> +	if (fstat(fileno(stdout), &st))
> +		return 0;
> +	/* Ignore write errors for pipes and sockets.. */
> +	if (S_ISFIFO(st.st_mode) || S_ISSOCK(st.st_mode))
> +		return 0;
> +
> +	/* Check for ENOSPC and EIO errors.. */
> +	if (ferror(stdout))
> +		die("write failure on standard output");
> +	if (fflush(stdout) || fclose(stdout))
> +		die("write failure on standard output: %s", strerror(errno));
> +
> +	return 0;
> +}
> 
> I was planning to push this out to 'master' this weekend.

I think that code is fine, but switching the order around could probably 
make it less likely that stdio loses the errno for us. 

So doing the last part in a different order, and making it say

	/* Check for ENOSPC and EIO errors.. */
	if (fflush(stdout))
		die("write failure on standard output: %s", strerror(errno));
	if (ferror(stdout))
		die("unknown write failure on standard output");
	if (fclose(stdout))
		die("close failed on standard output: %s", strerror(errno));
	return 0;

may recover at least non-transient errors.

It's still not perfect. As I've been harping on, stdio simply isn't very 
good for error reporting. For example, if an IO error happened, you'd want 
to see EIO, wouldn't you? And yes, that's what the kernel would return. 
However, with buffered stdio (and flushing outside of our control), what 
would likely happen is that some intermediate error return _does_ return 
EIO, but then the kernel might decide to re-mount the filesystem read-only 
due to the error, and the actual *report* for us might be

	"write failure on standard output: read-only filesystem"

which lost the EIO. 

But even worse, if the output happened to be buffer-aligned, stdio will 
have thrown the error out entirely, and the "fflush()" will return 0, and 
then we end up with that "unknown write failure" after all.

Or we might have had a ENOSPC at some point, but removed a temp-file, and 
the final fflush() doesn't error out at all, so now the incomplete write 
got done (with one or more buffer chunks missing), and we get "unknown 
write failure" again, because we again lost the ENOSPC.

So you basically cannot get "perfect" with stdio. It's impossible. But the 
above re-ordering will at least get you _closer_, and *most* of the time 
you'll get exactly the error you'd expect.

(I'm not a huge fan of "most of the time it works", but that's stdio for 
you).

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Don't fflush(stdout) when it's not helpful
  2007-06-30  4:24                                     ` Linus Torvalds
@ 2007-06-30 14:27                                       ` Theodore Tso
  2007-06-30 18:42                                       ` Junio C Hamano
  1 sibling, 0 replies; 33+ messages in thread
From: Theodore Tso @ 2007-06-30 14:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Jeff King, Frank Lichtenheld, Jim Meyering, git

On Fri, Jun 29, 2007 at 09:24:41PM -0700, Linus Torvalds wrote:
> So you basically cannot get "perfect" with stdio. It's impossible. But the 
> above re-ordering will at least get you _closer_, and *most* of the time 
> you'll get exactly the error you'd expect.

Well, we *can* get perfect with stdio, but I can pretty much guarantee
no one will like it.  We could wrap printf, fprintf, purchar, et. al,
and check each of them for an error return.  The spec doesn't
currently specify that errno is supposed to be set, but a defect[1]
was filed last year saying that fprintf et.al, are indeed supposed to
set errno and not throw it away.

It would be a real pain in the *ss, though....

							- Ted

[1] http://www.opengroup.org/austin/mailarchives/ag-review/msg02161.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Don't fflush(stdout) when it's not helpful
  2007-06-30  4:24                                     ` Linus Torvalds
  2007-06-30 14:27                                       ` Theodore Tso
@ 2007-06-30 18:42                                       ` Junio C Hamano
  1 sibling, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2007-06-30 18:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Tso, Jeff King, Frank Lichtenheld, Jim Meyering, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

>> I was planning to push this out to 'master' this weekend.
>
> I think that code is fine, but switching the order around could probably 
> make it less likely that stdio loses the errno for us. 
>
> So doing the last part in a different order, and making it say
>
> 	/* Check for ENOSPC and EIO errors.. */
> 	if (fflush(stdout))
> 		die("write failure on standard output: %s", strerror(errno));
> 	if (ferror(stdout))
> 		die("unknown write failure on standard output");
> 	if (fclose(stdout))
> 		die("close failed on standard output: %s", strerror(errno));
> 	return 0;
>
> may recover at least non-transient errors.

That makes sense, to a certain degree, given that we do not
check every printf().  I'll forge your signature as usual ;-)

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2007-06-30 18:42 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-06-25 20:32 [PATCH] git-rev-list: give better diagnostic for failed write Jim Meyering
2007-06-25 20:59 ` Linus Torvalds
2007-06-25 21:52   ` Jim Meyering
2007-06-25 22:20     ` Linus Torvalds
2007-06-25 22:56       ` Linus Torvalds
2007-06-25 23:01         ` Linus Torvalds
2007-06-27  8:56           ` Jim Meyering
2007-06-25 23:16         ` Linus Torvalds
2007-06-26 17:11           ` Theodore Tso
2007-06-26 17:32             ` Linus Torvalds
2007-06-26 22:04               ` Theodore Tso
2007-06-26 22:32                 ` Linus Torvalds
2007-06-28 19:04               ` Theodore Tso
2007-06-28 21:34                 ` Jeff King
2007-06-28 23:53                   ` [PATCH] Don't fflush(stdout) when it's not helpful Theodore Tso
2007-06-29  1:05                     ` Frank Lichtenheld
2007-06-29  3:48                       ` Theodore Tso
2007-06-29  6:38                         ` Jeff King
2007-06-29  7:07                           ` Junio C Hamano
2007-06-29 16:06                             ` Linus Torvalds
2007-06-29 17:40                               ` Theodore Tso
2007-06-29 23:43                                 ` Linus Torvalds
2007-06-30  2:15                                   ` Junio C Hamano
2007-06-30  4:24                                     ` Linus Torvalds
2007-06-30 14:27                                       ` Theodore Tso
2007-06-30 18:42                                       ` Junio C Hamano
2007-06-26  9:06         ` [PATCH] git-rev-list: give better diagnostic for failed write Jeff King
2007-06-26 17:12           ` Linus Torvalds
2007-06-27  8:59       ` Jim Meyering
2007-06-27 16:06         ` Linus Torvalds
2007-06-25 21:39 ` Jim Meyering
2007-06-25 21:53   ` Linus Torvalds
2007-06-25 22:08     ` Jim Meyering

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).