From: Theodore Tso <tytso@mit.edu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jim Meyering <jim@meyering.net>, git@vger.kernel.org
Subject: Re: [PATCH] git-rev-list: give better diagnostic for failed write
Date: Thu, 28 Jun 2007 15:04:06 -0400 [thread overview]
Message-ID: <20070628190406.GC29279@thunk.org> (raw)
In-Reply-To: <alpine.LFD.0.98.0706261024210.8675@woody.linux-foundation.org>
On Tue, Jun 26, 2007 at 10:32:23AM -0700, Linus Torvalds wrote:
> But we actually _do_ want fully buffered from a performance angle.
> Especially for the big stuff, which is usually the *diffs*, not the commit
> messages. Not so much an issue with git-rev-list, but with "git log -p"
> you would normally not want it line-buffered, and it's actually much nicer
> to let it be fully buffered and then do a flush at the end.
>
> Even pipes are often used for "throughput" stuff if you end up doing some
> post-processing (ie "git log -p | gather-statistics"), and yes, I actually
> do things like that - it's nice for things like looking at how many lines
> have been added during the last release cycle
>
> git log -p v2.6.21.. | grep '^+' | wc -l
>
> and I'd really like thigns like that to be close to optimal.
>
> How much the system call overhead is, I don't know, though. So it might be
> worth testing out....
So just for yuks, I devised the following patch, and did some measurements....
For the above pipeline, the result was hardly worth mentioning:
With flushing:
% time git log -p v2.6.21.. | grep '^+' | wc -l
real 0m22.330s
user 0m21.512s
sys 0m0.807s
# of write() system calls according to strace -c: 15167
Without flushing:
% time git log -p v2.6.21.. | grep '^+' | wc -l
real 0m22.367s
user 0m21.355s
sys 0m0.720s
# of write() system calls according to strace -c: 11373
So here's the worst case:
% time git-rev-list HEAD > /dev/null
real 0m1.575s
user 0m1.477s
sys 0m0.053s
# of write() system calls according to strace -c: 582
% (export GIT_NEVER_FLUSH_STDOUT=t; time git-rev-list HEAD > /dev/null)
real 0m1.535s
user 0m1.463s
sys 0m0.027s
# of write() system calls according to strace -c: 58055
> Under Linux, you'll probably have a fairly hard time
> seeing any difference, but under other OS's you have both system call
> latency issues *and* possible scheduling issues (ie the above kind of
> pipeline can act very differently from a scheduling standpoint if you send
> lots of small things rather than buffer things a bit on the generating
> side)
Indeed. So it's not at all clear this patch is worth applying, but
maybe it would make a difference on some other OS; of course, this
patch also obviates the original intent of Jim Meyering's patch, since
it means that we won't print a useful error message if stdout has been
redirected to a file and the write returns ENOSPC, since we won't be
fflush() when stdout is redirected to a file.
The added fflush() calls to the incremental git-blame and the
git-log-*/git-whatchanged might make it worthwhile for tools that
depend on those outputs and want faster user response time. So maybe
adding the fflush() call might be worthwhile for those programs.
- Ted
commit 7f483ec6366f62d52199e3edefa292a110fcb5c8
Author: Theodore Ts'o <tytso@mit.edu>
Date: Thu Jun 28 14:10:58 2007 -0400
Don't fflush(stdout) when it's not helpful
This patch arose from a discussion started by Jim Meyering's patch
whose intention was to provide better diagnostics for failed writes.
Linus proposed a better way to do things, which also had the added
benefit that adding a fflush() to git-log-* operations and incremental
git-blame operations could improve interactive respose time feel, at
the cost of making things a bit slower when we aren't piping the
output to a downstream program.
This patch skips the fflush() calls when stdout is a regular file, or
if the environment variable GIT_NEVER_FLUSH_STDOUT is set. This
latter can speed up a command such as:
(export GIT_NEVER_FLUSH_STDOUT=t; git-rev-list HEAD | wc -l)
a tiny amount.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
diff --git a/builtin-blame.c b/builtin-blame.c
index f7e2c13..da23a6f 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -1459,6 +1459,7 @@ static void found_guilty_entry(struct blame_entry *ent)
printf("boundary\n");
}
write_filename_info(suspect->path);
+ maybe_flush_or_die(stdout, "stdout");
}
}
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 813aadf..86db8b0 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -100,7 +100,7 @@ static void show_commit(struct commit *commit)
printf("%s%c", buf, hdr_termination);
free(buf);
}
- fflush(stdout);
+ maybe_flush_or_die(stdout, "stdout");
if (commit->parents) {
free_commit_list(commit->parents);
commit->parents = NULL;
diff --git a/cache.h b/cache.h
index ed83d92..0525c4e 100644
--- a/cache.h
+++ b/cache.h
@@ -532,6 +532,8 @@ extern char git_default_name[MAX_GITNAME];
extern const char *git_commit_encoding;
extern const char *git_log_output_encoding;
+/* IO helper functions */
+extern void maybe_flush_or_die(FILE *, const char *);
extern int copy_fd(int ifd, int ofd);
extern int read_in_full(int fd, void *buf, size_t count);
extern int write_in_full(int fd, const void *buf, size_t count);
diff --git a/log-tree.c b/log-tree.c
index 0cf21bc..ced3f33 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -408,5 +408,6 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
shown = 1;
}
opt->loginfo = NULL;
+ maybe_flush_or_die(stdout, "stdout");
return shown;
}
diff --git a/write_or_die.c b/write_or_die.c
index 5c4bc85..2cebeb5 100644
--- a/write_or_die.c
+++ b/write_or_die.c
@@ -1,5 +1,43 @@
#include "cache.h"
+/*
+ * Some cases use stdio, but want to flush after the write
+ * to get error handling (and to get better interactive
+ * behaviour - not buffering excessively).
+ *
+ * Of course, if the flush happened within the write itself,
+ * we've already lost the error code, and cannot report it any
+ * more. So we just ignore that case instead (and hope we get
+ * the right error code on the flush).
+ *
+ * If the file handle is stdout, and stdout is a file, then skip the
+ * flush entirely since it's not needed.
+ */
+void maybe_flush_or_die(FILE *f, const char *desc)
+{
+ static int stdout_is_file = -1;
+ struct stat st;
+
+ if (f == stdout) {
+ if (stdout_is_file < 0) {
+ if (getenv("GIT_NEVER_FLUSH_STDOUT") ||
+ ((fstat(fileno(stdout), &st) == 0) &&
+ S_ISREG(st.st_mode)))
+ stdout_is_file = 1;
+ else
+ stdout_is_file = 0;
+ }
+ if (stdout_is_file)
+ return;
+ }
+ if (fflush(f)) {
+ if (errno == EPIPE)
+ exit(0);
+ die("write failure on %s: %s",
+ desc, strerror(errno));
+ }
+}
+
int read_in_full(int fd, void *buf, size_t count)
{
char *p = buf;
next prev parent reply other threads:[~2007-06-28 19:04 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-06-25 20:32 [PATCH] git-rev-list: give better diagnostic for failed write Jim Meyering
2007-06-25 20:59 ` Linus Torvalds
2007-06-25 21:52 ` Jim Meyering
2007-06-25 22:20 ` Linus Torvalds
2007-06-25 22:56 ` Linus Torvalds
2007-06-25 23:01 ` Linus Torvalds
2007-06-27 8:56 ` Jim Meyering
2007-06-25 23:16 ` Linus Torvalds
2007-06-26 17:11 ` Theodore Tso
2007-06-26 17:32 ` Linus Torvalds
2007-06-26 22:04 ` Theodore Tso
2007-06-26 22:32 ` Linus Torvalds
2007-06-28 19:04 ` Theodore Tso [this message]
2007-06-28 21:34 ` Jeff King
2007-06-28 23:53 ` [PATCH] Don't fflush(stdout) when it's not helpful Theodore Tso
2007-06-29 1:05 ` Frank Lichtenheld
2007-06-29 3:48 ` Theodore Tso
2007-06-29 6:38 ` Jeff King
2007-06-29 7:07 ` Junio C Hamano
2007-06-29 16:06 ` Linus Torvalds
2007-06-29 17:40 ` Theodore Tso
2007-06-29 23:43 ` Linus Torvalds
2007-06-30 2:15 ` Junio C Hamano
2007-06-30 4:24 ` Linus Torvalds
2007-06-30 14:27 ` Theodore Tso
2007-06-30 18:42 ` Junio C Hamano
2007-06-26 9:06 ` [PATCH] git-rev-list: give better diagnostic for failed write Jeff King
2007-06-26 17:12 ` Linus Torvalds
2007-06-27 8:59 ` Jim Meyering
2007-06-27 16:06 ` Linus Torvalds
2007-06-25 21:39 ` Jim Meyering
2007-06-25 21:53 ` Linus Torvalds
2007-06-25 22:08 ` Jim Meyering
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070628190406.GC29279@thunk.org \
--to=tytso@mit.edu \
--cc=git@vger.kernel.org \
--cc=jim@meyering.net \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.