From: Theodore Tso <tytso@mit.edu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jim Meyering <jim@meyering.net>, git@vger.kernel.org
Subject: Re: [PATCH] git-rev-list: give better diagnostic for failed write
Date: Thu, 28 Jun 2007 15:04:06 -0400
Message-ID: <20070628190406.GC29279@thunk.org>
In-Reply-To: <alpine.LFD.0.98.0706261024210.8675@woody.linux-foundation.org>

On Tue, Jun 26, 2007 at 10:32:23AM -0700, Linus Torvalds wrote:
> But we actually _do_ want fully buffered from a performance angle. 
> Especially for the big stuff, which is usually the *diffs*, not the commit 
> messages. Not so much an issue with git-rev-list, but with "git log -p" 
> you would normally not want it line-buffered, and it's actually much nicer 
> to let it be fully buffered and then do a flush at the end.
> 
> Even pipes are often used for "throughput" stuff if you end up doing some 
> post-processing (ie "git log -p | gather-statistics"), and yes, I actually 
> do things like that - it's nice for things like looking at how many lines 
> have been added during the last release cycle
> 
> 	git log -p v2.6.21.. | grep '^+' | wc -l
> 
> and I'd really like things like that to be close to optimal. 
> 
> How much the system call overhead is, I don't know, though. So it might be 
> worth testing out....

So just for yuks, I devised the following patch, and did some measurements....

For the above pipeline, the result was hardly worth mentioning:

With flushing:

% time  git log -p v2.6.21.. | grep '^+' | wc -l

real    0m22.330s
user    0m21.512s
sys     0m0.807s

# of write() system calls according to strace -c: 15167

Without flushing:

% time  git log -p v2.6.21.. | grep '^+' | wc -l

real    0m22.367s
user    0m21.355s
sys     0m0.720s

# of write() system calls according to strace -c: 11373

So here's the worst case:

% time   git-rev-list HEAD  > /dev/null
real    0m1.575s
user    0m1.477s
sys     0m0.053s
# of write() system calls according to strace -c: 582

% (export GIT_NEVER_FLUSH_STDOUT=t; time   git-rev-list HEAD  > /dev/null) 
real    0m1.535s
user    0m1.463s
sys     0m0.027s
# of write() system calls according to strace -c: 58055

> Under Linux, you'll probably have a fairly hard time 
> seeing any difference, but under other OS's you have both system call 
> latency issues *and* possible scheduling issues (ie the above kind of 
> pipeline can act very differently from a scheduling standpoint if you send 
> lots of small things rather than buffer things a bit on the generating 
> side)

Indeed.  So it's not at all clear this patch is worth applying, but
maybe it would make a difference on some other OS.  Of course, this
patch also defeats the original intent of Jim Meyering's patch: if
stdout has been redirected to a file and the write returns ENOSPC, we
won't print a useful error message, because we no longer call fflush()
when stdout is redirected to a file.
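
One conceivable way to keep a diagnostic in the regular-file case would
be a single checked flush before the program exits.  This is only a
sketch under that assumption: finish_stream is a hypothetical name, and
nothing like it appears in the patch below.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: flush f once and report any failure.
 * Returns 0 on success, -1 if the flush failed or if an earlier
 * implicit flush already set the stream's error indicator.
 */
static int finish_stream(FILE *f, const char *desc)
{
	int saved_errno;

	if (fflush(f) != 0) {
		saved_errno = errno;
		fprintf(stderr, "write failure on %s: %s\n",
			desc, strerror(saved_errno));
		return -1;
	}
	if (ferror(f)) {
		/* errno from the earlier failed write is no longer reliable */
		fprintf(stderr, "write failure on %s\n", desc);
		return -1;
	}
	return 0;
}
```

A caller would invoke finish_stream(stdout, "stdout") once, just before
exiting, and turn a -1 result into a non-zero exit status.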

The fflush() calls added to the incremental git-blame and to
git-log-*/git-whatchanged could still be worthwhile, though, for tools
that depend on those outputs and want faster user response time.

						- Ted


commit 7f483ec6366f62d52199e3edefa292a110fcb5c8
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Thu Jun 28 14:10:58 2007 -0400

    Don't fflush(stdout) when it's not helpful
    
    This patch arose from a discussion started by Jim Meyering's patch
    whose intention was to provide better diagnostics for failed writes.
    Linus proposed a better way to do things, which also had the added
    benefit that adding a fflush() to git-log-* operations and incremental
    git-blame operations could improve interactive response time feel, at
    the cost of making things a bit slower when we aren't piping the
    output to a downstream program.
    
    This patch skips the fflush() calls when stdout is a regular file, or
    if the environment variable GIT_NEVER_FLUSH_STDOUT is set.  The
    latter can speed up a command such as:
    
    	(export GIT_NEVER_FLUSH_STDOUT=t; git-rev-list HEAD | wc -l)
    
    a tiny amount.
    
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

diff --git a/builtin-blame.c b/builtin-blame.c
index f7e2c13..da23a6f 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -1459,6 +1459,7 @@ static void found_guilty_entry(struct blame_entry *ent)
 				printf("boundary\n");
 		}
 		write_filename_info(suspect->path);
+		maybe_flush_or_die(stdout, "stdout");
 	}
 }
 
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 813aadf..86db8b0 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -100,7 +100,7 @@ static void show_commit(struct commit *commit)
 		printf("%s%c", buf, hdr_termination);
 		free(buf);
 	}
-	fflush(stdout);
+	maybe_flush_or_die(stdout, "stdout");
 	if (commit->parents) {
 		free_commit_list(commit->parents);
 		commit->parents = NULL;
diff --git a/cache.h b/cache.h
index ed83d92..0525c4e 100644
--- a/cache.h
+++ b/cache.h
@@ -532,6 +532,8 @@ extern char git_default_name[MAX_GITNAME];
 extern const char *git_commit_encoding;
 extern const char *git_log_output_encoding;
 
+/* IO helper functions */
+extern void maybe_flush_or_die(FILE *, const char *);
 extern int copy_fd(int ifd, int ofd);
 extern int read_in_full(int fd, void *buf, size_t count);
 extern int write_in_full(int fd, const void *buf, size_t count);
diff --git a/log-tree.c b/log-tree.c
index 0cf21bc..ced3f33 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -408,5 +408,6 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
 		shown = 1;
 	}
 	opt->loginfo = NULL;
+	maybe_flush_or_die(stdout, "stdout");
 	return shown;
 }
diff --git a/write_or_die.c b/write_or_die.c
index 5c4bc85..2cebeb5 100644
--- a/write_or_die.c
+++ b/write_or_die.c
@@ -1,5 +1,43 @@
 #include "cache.h"
 
+/*
+ * Some cases use stdio, but want to flush after the write
+ * to get error handling (and to get better interactive
+ * behaviour - not buffering excessively).
+ *
+ * Of course, if the flush happened within the write itself,
+ * we've already lost the error code, and cannot report it any
+ * more. So we just ignore that case instead (and hope we get
+ * the right error code on the flush).
+ *
+ * If the file handle is stdout, and stdout is a file, then skip the
+ * flush entirely since it's not needed.
+ */
+void maybe_flush_or_die(FILE *f, const char *desc)
+{
+	static int stdout_is_file = -1;
+	struct stat st;
+
+	if (f == stdout) {
+		if (stdout_is_file < 0) {
+			if (getenv("GIT_NEVER_FLUSH_STDOUT") ||
+			    ((fstat(fileno(stdout), &st) == 0) &&
+			     S_ISREG(st.st_mode)))
+				stdout_is_file = 1;
+			else
+				stdout_is_file = 0;
+		}
+		if (stdout_is_file)
+			return;
+	}
+	if (fflush(f)) {
+		if (errno == EPIPE)
+			exit(0);
+		die("write failure on %s: %s",
+			desc, strerror(errno));
+	}
+}
+
 int read_in_full(int fd, void *buf, size_t count)
 {
 	char *p = buf;
