git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Charles Bailey <charles@hashpling.org>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: [PATCH 4/7] cat-file: add --buffer option
Date: Mon, 22 Jun 2015 06:45:17 -0400	[thread overview]
Message-ID: <20150622104517.GD14475@peff.net> (raw)
In-Reply-To: <20150622103321.GB12584@peff.net>

We use a direct write() to output the results of --batch and
--batch-check. This is good for processes feeding the input
and reading the output interactively, but it introduces
measurable overhead if you do not want this feature. For
example, on linux.git:

  $ git rev-list --objects --all | cut -d' ' -f1 >objects
  $ time git cat-file --batch-check='%(objectsize)' \
          <objects >/dev/null
  real    0m5.440s
  user    0m5.060s
  sys     0m0.384s

This patch adds an option to use regular stdio buffering:

  $ time git cat-file --batch-check='%(objectsize)' \
          --buffer <objects >/dev/null
  real    0m4.975s
  user    0m4.888s
  sys     0m0.092s

Signed-off-by: Jeff King <peff@peff.net>
---
This selectively uses fwrite or write_or_die, depending on the buffer
setting. Another option would be to just always use fwrite(), and then
selectively fflush(). It feels kind of wasteful in the non-buffered
case, as it's just another layer to write through. OTOH, the cost of
writing a line into the buffer only to flush is probably dwarfed by the
system call of actually flushing.

If we went that direction, we could probably simplify the code a bit
(both getting rid of the batch_write function I call here, and dropping
a bunch of existing fflush() calls, where we must flush any time we use
printf for its formatting capabilities).

I also considered that the "--buffer" case is likely to be the common
one. We cannot flip the default, though, as it would break any existing
callers (who would need to specify "--no-buffer"). We can do the usual
deprecation dance, but I don't know if it is worth it for a plumbing
command like this.

 Documentation/git-cat-file.txt |  7 +++++++
 builtin/cat-file.c             | 26 +++++++++++++++++++-------
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 319ab4c..0058bd4 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -69,6 +69,13 @@ OPTIONS
 	not be combined with any other options or arguments.  See the
 	section `BATCH OUTPUT` below for details.
 
+--buffer::
+	Normally batch output is flushed after each object is output, so
+	that a process can interactively read and write from
+	`cat-file`. With this option, the output uses normal stdio
+	buffering; this is much more efficient when invoking
+	`--batch-check` on a large number of objects.
+
 --allow-unknown-type::
 	Allow -s or -t to query broken/corrupt objects of unknown type.
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index d4101b7..741e100 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -14,6 +14,7 @@ struct batch_options {
 	int enabled;
 	int follow_symlinks;
 	int print_contents;
+	int buffer_output;
 	const char *format;
 };
 
@@ -211,14 +212,25 @@ static size_t expand_format(struct strbuf *sb, const char *start, void *data)
 	return end - start + 1;
 }
 
-static void print_object_or_die(int fd, struct expand_data *data)
+static void batch_write(struct batch_options *opt, const void *data, int len)
+{
+	if (opt->buffer_output) {
+		if (fwrite(data, 1, len, stdout) != len)
+			die_errno("unable to write to stdout");
+	} else
+		write_or_die(1, data, len);
+}
+
+static void print_object_or_die(struct batch_options *opt, struct expand_data *data)
 {
 	const unsigned char *sha1 = data->sha1;
 
 	assert(data->info.typep);
 
 	if (data->type == OBJ_BLOB) {
-		if (stream_blob_to_fd(fd, sha1, NULL, 0) < 0)
+		if (opt->buffer_output)
+			fflush(stdout);
+		if (stream_blob_to_fd(1, sha1, NULL, 0) < 0)
 			die("unable to stream %s to stdout", sha1_to_hex(sha1));
 	}
 	else {
@@ -234,12 +246,11 @@ static void print_object_or_die(int fd, struct expand_data *data)
 		if (data->info.sizep && size != data->size)
 			die("object %s changed size!?", sha1_to_hex(sha1));
 
-		write_or_die(fd, contents, size);
+		batch_write(opt, contents, size);
 		free(contents);
 	}
 }
 
-
 static int batch_one_object(const char *obj_name, struct batch_options *opt,
 			    struct expand_data *data)
 {
@@ -294,12 +305,12 @@ static int batch_one_object(const char *obj_name, struct batch_options *opt,
 
 	strbuf_expand(&buf, opt->format, expand_format, data);
 	strbuf_addch(&buf, '\n');
-	write_or_die(1, buf.buf, buf.len);
+	batch_write(opt, buf.buf, buf.len);
 	strbuf_release(&buf);
 
 	if (opt->print_contents) {
-		print_object_or_die(1, data);
-		write_or_die(1, "\n", 1);
+		print_object_or_die(opt, data);
+		batch_write(opt, "\n", 1);
 	}
 	return 0;
 }
@@ -415,6 +426,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			    N_("for blob objects, run textconv on object's content"), 'c'),
 		OPT_BOOL(0, "allow-unknown-type", &unknown_type,
 			  N_("allow -s and -t to work with broken/corrupt objects")),
+		OPT_BOOL(0, "buffer", &batch.buffer_output, N_("buffer --batch output")),
 		{ OPTION_CALLBACK, 0, "batch", &batch, "format",
 			N_("show info and content of objects fed from the standard input"),
 			PARSE_OPT_OPTARG, batch_option_callback },
-- 
2.4.4.719.g3984bc6

  parent reply	other threads:[~2015-06-22 10:45 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-19  9:10 Improvements to parse-options and a new filter-objects command Charles Bailey
2015-06-19  9:10 ` [PATCH 1/3] Correct test-parse-options to handle negative ints Charles Bailey
2015-06-19 18:28   ` Junio C Hamano
2015-06-19  9:10 ` [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c Charles Bailey
2015-06-19 11:03   ` Remi Galan Alfonso
2015-06-19 11:06     ` Charles Bailey
2015-06-19 17:58   ` Junio C Hamano
2015-06-19 18:39     ` Junio C Hamano
2015-06-20 15:31       ` Jakub Narębski
2015-06-19 18:47     ` Jakub Narębski
2015-06-20 16:51     ` Charles Bailey
2015-06-20 17:47       ` Junio C Hamano
2015-06-19  9:10 ` [PATCH 3/3] Add filter-objects command Charles Bailey
2015-06-19 10:10   ` Jeff King
2015-06-19 10:33     ` Charles Bailey
2015-06-19 10:52       ` Jeff King
2015-06-19 18:28         ` Junio C Hamano
2015-06-19 10:52       ` John Keeping
2015-06-19 11:04         ` Charles Bailey
2015-06-21 18:25 ` Improvements to integer option parsing Charles Bailey
2015-06-21 18:25   ` [PATCH 1/2] Correct test-parse-options to handle negative ints Charles Bailey
2015-06-21 18:25   ` [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c Charles Bailey
2015-06-21 18:30     ` Charles Bailey
2015-06-22 22:03       ` Junio C Hamano
2015-06-22 22:08     ` Junio C Hamano
2015-06-22 22:09   ` Improvements to integer option parsing Junio C Hamano
2015-06-22 22:42     ` Charles Bailey
2015-06-21 19:20 ` Fast enumeration of objects Charles Bailey
2015-06-21 19:20   ` [PATCH] Add list-all-objects command Charles Bailey
2015-06-22  8:38     ` Jeff King
2015-06-22 10:33       ` Jeff King
2015-06-22 10:40         ` [PATCH 1/7] for_each_packed_object: automatically open pack index Jeff King
2015-06-22 10:40         ` [PATCH 2/7] cat-file: minor style fix in options list Jeff King
2015-06-22 10:41         ` [PATCH 3/7] cat-file: move batch_options definition to top of file Jeff King
2015-06-22 10:45         ` Jeff King [this message]
2015-06-22 10:45         ` [PATCH 5/7] cat-file: stop returning value from batch_one_object Jeff King
2015-06-22 10:45         ` [PATCH 6/7] cat-file: split batch_one_object into two stages Jeff King
2015-06-22 10:45         ` [PATCH 7/7] cat-file: add --batch-all-objects option Jeff King
2015-06-26  6:56           ` Eric Sunshine
2015-06-26 15:48             ` Jeff King
2015-06-22 11:06         ` [PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects Jeff King
2015-06-22 22:03           ` Charles Bailey
2015-06-22 23:46             ` Jeff King
2015-06-22 21:48         ` [PATCH] Add list-all-objects command Charles Bailey
2015-06-22 21:50         ` Junio C Hamano
2015-06-22 23:50           ` Jeff King
2015-06-22 11:38       ` Charles Bailey
2015-06-22  9:57     ` Duy Nguyen
2015-06-22 10:24       ` Jeff King
2015-06-22  8:35   ` Fast enumeration of objects Jeff King
2015-06-22 19:44     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150622104517.GD14475@peff.net \
    --to=peff@peff.net \
    --cc=charles@hashpling.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).