From: Theodore Tso <tytso@mit.edu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jim Meyering <jim@meyering.net>, git@vger.kernel.org
Subject: Re: [PATCH] git-rev-list: give better diagnostic for failed write
Date: Thu, 28 Jun 2007 15:04:06 -0400 [thread overview]
Message-ID: <20070628190406.GC29279@thunk.org> (raw)
In-Reply-To: <alpine.LFD.0.98.0706261024210.8675@woody.linux-foundation.org>
On Tue, Jun 26, 2007 at 10:32:23AM -0700, Linus Torvalds wrote:
> But we actually _do_ want fully buffered from a performance angle.
> Especially for the big stuff, which is usually the *diffs*, not the commit
> messages. Not so much an issue with git-rev-list, but with "git log -p"
> you would normally not want it line-buffered, and it's actually much nicer
> to let it be fully buffered and then do a flush at the end.
>
> Even pipes are often used for "throughput" stuff if you end up doing some
> post-processing (ie "git log -p | gather-statistics"), and yes, I actually
> do things like that - it's nice for things like looking at how many lines
> have been added during the last release cycle
>
> git log -p v2.6.21.. | grep '^+' | wc -l
>
> and I'd really like thigns like that to be close to optimal.
>
> How much the system call overhead is, I don't know, though. So it might be
> worth testing out....
So just for yuks, I devised the following patch, and did some measurements....
For the above pipeline, the result was hardly worth mentioning:
With flushing:
% time git log -p v2.6.21.. | grep '^+' | wc -l
real 0m22.330s
user 0m21.512s
sys 0m0.807s
# of write() system calls according to strace -c: 15167
Without flushing:
% time git log -p v2.6.21.. | grep '^+' | wc -l
real 0m22.367s
user 0m21.355s
sys 0m0.720s
# of write() system calls according to strace -c: 11373
So here's the worst case:
% time git-rev-list HEAD > /dev/null
real 0m1.575s
user 0m1.477s
sys 0m0.053s
# of write() system calls according to strace -c: 582
% (export GIT_NEVER_FLUSH_STDOUT=t; time git-rev-list HEAD > /dev/null)
real 0m1.535s
user 0m1.463s
sys 0m0.027s
# of write() system calls according to strace -c: 58055
> Under Linux, you'll probably have a fairly hard time
> seeing any difference, but under other OS's you have both system call
> latency issues *and* possible scheduling issues (ie the above kind of
> pipeline can act very differently from a scheduling standpoint if you send
> lots of small things rather than buffer things a bit on the generating
> side)
Indeed. So it's not at all clear this patch is worth applying, but
maybe it would make a difference on some other OS; of course, this
patch also obviates the original intent of Jim Meyering's patch, since
it means that we won't print a useful error message if stdout has been
redirected to a file and the write returns ENOSPC, since we won't be
fflush() when stdout is redirected to a file.
The added fflush() calls to the incremental git-blame and the
git-log-*/git-whatchanged might make it worthwhile for tools that
depend on those outputs and want faster user response time. So maybe
adding the fflush() call might be worthwhile for those programs.
- Ted
commit 7f483ec6366f62d52199e3edefa292a110fcb5c8
Author: Theodore Ts'o <tytso@mit.edu>
Date: Thu Jun 28 14:10:58 2007 -0400
Don't fflush(stdout) when it's not helpful
This patch arose from a discussion started by Jim Meyering's patch
whose intention was to provide better diagnostics for failed writes.
Linus proposed a better way to do things, which also had the added
benefit that adding a fflush() to git-log-* operations and incremental
git-blame operations could improve interactive respose time feel, at
the cost of making things a bit slower when we aren't piping the
output to a downstream program.
This patch skips the fflush() calls when stdout is a regular file, or
if the environment variable GIT_NEVER_FLUSH_STDOUT is set. This
latter can speed up a command such as:
(export GIT_NEVER_FLUSH_STDOUT=t; git-rev-list HEAD | wc -l)
a tiny amount.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
diff --git a/builtin-blame.c b/builtin-blame.c
index f7e2c13..da23a6f 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -1459,6 +1459,7 @@ static void found_guilty_entry(struct blame_entry *ent)
printf("boundary\n");
}
write_filename_info(suspect->path);
+ maybe_flush_or_die(stdout, "stdout");
}
}
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 813aadf..86db8b0 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -100,7 +100,7 @@ static void show_commit(struct commit *commit)
printf("%s%c", buf, hdr_termination);
free(buf);
}
- fflush(stdout);
+ maybe_flush_or_die(stdout, "stdout");
if (commit->parents) {
free_commit_list(commit->parents);
commit->parents = NULL;
diff --git a/cache.h b/cache.h
index ed83d92..0525c4e 100644
--- a/cache.h
+++ b/cache.h
@@ -532,6 +532,8 @@ extern char git_default_name[MAX_GITNAME];
extern const char *git_commit_encoding;
extern const char *git_log_output_encoding;
+/* IO helper functions */
+extern void maybe_flush_or_die(FILE *, const char *);
extern int copy_fd(int ifd, int ofd);
extern int read_in_full(int fd, void *buf, size_t count);
extern int write_in_full(int fd, const void *buf, size_t count);
diff --git a/log-tree.c b/log-tree.c
index 0cf21bc..ced3f33 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -408,5 +408,6 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
shown = 1;
}
opt->loginfo = NULL;
+ maybe_flush_or_die(stdout, "stdout");
return shown;
}
diff --git a/write_or_die.c b/write_or_die.c
index 5c4bc85..2cebeb5 100644
--- a/write_or_die.c
+++ b/write_or_die.c
@@ -1,5 +1,43 @@
#include "cache.h"
+/*
+ * Some cases use stdio, but want to flush after the write
+ * to get error handling (and to get better interactive
+ * behaviour - not buffering excessively).
+ *
+ * Of course, if the flush happened within the write itself,
+ * we've already lost the error code, and cannot report it any
+ * more. So we just ignore that case instead (and hope we get
+ * the right error code on the flush).
+ *
+ * If the file handle is stdout, and stdout is a file, then skip the
+ * flush entirely since it's not needed.
+ */
+void maybe_flush_or_die(FILE *f, const char *desc)
+{
+ static int stdout_is_file = -1;
+ struct stat st;
+
+ if (f == stdout) {
+ if (stdout_is_file < 0) {
+ if (getenv("GIT_NEVER_FLUSH_STDOUT") ||
+ ((fstat(fileno(stdout), &st) == 0) &&
+ S_ISREG(st.st_mode)))
+ stdout_is_file = 1;
+ else
+ stdout_is_file = 0;
+ }
+ if (stdout_is_file)
+ return;
+ }
+ if (fflush(f)) {
+ if (errno == EPIPE)
+ exit(0);
+ die("write failure on %s: %s",
+ desc, strerror(errno));
+ }
+}
+
int read_in_full(int fd, void *buf, size_t count)
{
char *p = buf;
next prev parent reply other threads:[~2007-06-28 19:04 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-06-25 20:32 [PATCH] git-rev-list: give better diagnostic for failed write Jim Meyering
2007-06-25 20:59 ` Linus Torvalds
2007-06-25 21:52 ` Jim Meyering
2007-06-25 22:20 ` Linus Torvalds
2007-06-25 22:56 ` Linus Torvalds
2007-06-25 23:01 ` Linus Torvalds
2007-06-27 8:56 ` Jim Meyering
2007-06-25 23:16 ` Linus Torvalds
2007-06-26 17:11 ` Theodore Tso
2007-06-26 17:32 ` Linus Torvalds
2007-06-26 22:04 ` Theodore Tso
2007-06-26 22:32 ` Linus Torvalds
2007-06-28 19:04 ` Theodore Tso [this message]
2007-06-28 21:34 ` Jeff King
2007-06-28 23:53 ` [PATCH] Don't fflush(stdout) when it's not helpful Theodore Tso
2007-06-29 1:05 ` Frank Lichtenheld
2007-06-29 3:48 ` Theodore Tso
2007-06-29 6:38 ` Jeff King
2007-06-29 7:07 ` Junio C Hamano
2007-06-29 16:06 ` Linus Torvalds
2007-06-29 17:40 ` Theodore Tso
2007-06-29 23:43 ` Linus Torvalds
2007-06-30 2:15 ` Junio C Hamano
2007-06-30 4:24 ` Linus Torvalds
2007-06-30 14:27 ` Theodore Tso
2007-06-30 18:42 ` Junio C Hamano
2007-06-26 9:06 ` [PATCH] git-rev-list: give better diagnostic for failed write Jeff King
2007-06-26 17:12 ` Linus Torvalds
2007-06-27 8:59 ` Jim Meyering
2007-06-27 16:06 ` Linus Torvalds
2007-06-25 21:39 ` Jim Meyering
2007-06-25 21:53 ` Linus Torvalds
2007-06-25 22:08 ` Jim Meyering
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070628190406.GC29279@thunk.org \
--to=tytso@mit.edu \
--cc=git@vger.kernel.org \
--cc=jim@meyering.net \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).