All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Charles Bailey <charles@hashpling.org>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: [PATCH 4/7] cat-file: add --buffer option
Date: Mon, 22 Jun 2015 06:45:17 -0400	[thread overview]
Message-ID: <20150622104517.GD14475@peff.net> (raw)
In-Reply-To: <20150622103321.GB12584@peff.net>

We use a direct write() to output the results of --batch and
--batch-check. This is good for processes feeding the input
and reading the output interactively, but it introduces
measurable overhead if you do not want this feature. For
example, on linux.git:

  $ git rev-list --objects --all | cut -d' ' -f1 >objects
  $ time git cat-file --batch-check='%(objectsize)' \
          <objects >/dev/null
  real    0m5.440s
  user    0m5.060s
  sys     0m0.384s

This patch adds an option to use regular stdio buffering:

  $ time git cat-file --batch-check='%(objectsize)' \
          --buffer <objects >/dev/null
  real    0m4.975s
  user    0m4.888s
  sys     0m0.092s

Signed-off-by: Jeff King <peff@peff.net>
---
This selectively uses fwrite or write_or_die, depending on the buffer
setting. Another option would be to just always use fwrite(), and then
selectively fflush(). It feels kind of wasteful in the non-buffered
case, as it's just another layer to write through. OTOH, the cost of
writing a line into the buffer only to flush is probably dwarfed by the
system call of actually flushing.

If we went that direction, we could probably simplify the code a bit
(both getting rid of the batch_write function I call here, and dropping
a bunch of existing fflush() calls, where we must flush any time we use
printf for its formatting capabilities).

I also considered that the "--buffer" case is likely to be the common
one. We cannot flip the default, though, as it would break any existing
callers (who would need to specify "--no-buffer"). We can do the usual
deprecation dance, but I don't know if it is worth it for a plumbing
command like this.

 Documentation/git-cat-file.txt |  7 +++++++
 builtin/cat-file.c             | 26 +++++++++++++++++++-------
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 319ab4c..0058bd4 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -69,6 +69,13 @@ OPTIONS
 	not be combined with any other options or arguments.  See the
 	section `BATCH OUTPUT` below for details.
 
+--buffer::
+	Normally batch output is flushed after each object is output, so
+	that a process can interactively read and write from
+	`cat-file`. With this option, the output uses normal stdio
+	buffering; this is much more efficient when invoking
+	`--batch-check` on a large number of objects.
+
 --allow-unknown-type::
 	Allow -s or -t to query broken/corrupt objects of unknown type.
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index d4101b7..741e100 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -14,6 +14,7 @@ struct batch_options {
 	int enabled;
 	int follow_symlinks;
 	int print_contents;
+	int buffer_output;
 	const char *format;
 };
 
@@ -211,14 +212,25 @@ static size_t expand_format(struct strbuf *sb, const char *start, void *data)
 	return end - start + 1;
 }
 
-static void print_object_or_die(int fd, struct expand_data *data)
+static void batch_write(struct batch_options *opt, const void *data, int len)
+{
+	if (opt->buffer_output) {
+		if (fwrite(data, 1, len, stdout) != len)
+			die_errno("unable to write to stdout");
+	} else
+		write_or_die(1, data, len);
+}
+
+static void print_object_or_die(struct batch_options *opt, struct expand_data *data)
 {
 	const unsigned char *sha1 = data->sha1;
 
 	assert(data->info.typep);
 
 	if (data->type == OBJ_BLOB) {
-		if (stream_blob_to_fd(fd, sha1, NULL, 0) < 0)
+		if (opt->buffer_output)
+			fflush(stdout);
+		if (stream_blob_to_fd(1, sha1, NULL, 0) < 0)
 			die("unable to stream %s to stdout", sha1_to_hex(sha1));
 	}
 	else {
@@ -234,12 +246,11 @@ static void print_object_or_die(int fd, struct expand_data *data)
 		if (data->info.sizep && size != data->size)
 			die("object %s changed size!?", sha1_to_hex(sha1));
 
-		write_or_die(fd, contents, size);
+		batch_write(opt, contents, size);
 		free(contents);
 	}
 }
 
-
 static int batch_one_object(const char *obj_name, struct batch_options *opt,
 			    struct expand_data *data)
 {
@@ -294,12 +305,12 @@ static int batch_one_object(const char *obj_name, struct batch_options *opt,
 
 	strbuf_expand(&buf, opt->format, expand_format, data);
 	strbuf_addch(&buf, '\n');
-	write_or_die(1, buf.buf, buf.len);
+	batch_write(opt, buf.buf, buf.len);
 	strbuf_release(&buf);
 
 	if (opt->print_contents) {
-		print_object_or_die(1, data);
-		write_or_die(1, "\n", 1);
+		print_object_or_die(opt, data);
+		batch_write(opt, "\n", 1);
 	}
 	return 0;
 }
@@ -415,6 +426,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			    N_("for blob objects, run textconv on object's content"), 'c'),
 		OPT_BOOL(0, "allow-unknown-type", &unknown_type,
 			  N_("allow -s and -t to work with broken/corrupt objects")),
+		OPT_BOOL(0, "buffer", &batch.buffer_output, N_("buffer --batch output")),
 		{ OPTION_CALLBACK, 0, "batch", &batch, "format",
 			N_("show info and content of objects fed from the standard input"),
 			PARSE_OPT_OPTARG, batch_option_callback },
-- 
2.4.4.719.g3984bc6

  parent reply	other threads:[~2015-06-22 10:45 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-19  9:10 Improvements to parse-options and a new filter-objects command Charles Bailey
2015-06-19  9:10 ` [PATCH 1/3] Correct test-parse-options to handle negative ints Charles Bailey
2015-06-19 18:28   ` Junio C Hamano
2015-06-19  9:10 ` [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c Charles Bailey
2015-06-19 11:03   ` Remi Galan Alfonso
2015-06-19 11:06     ` Charles Bailey
2015-06-19 17:58   ` Junio C Hamano
2015-06-19 18:39     ` Junio C Hamano
2015-06-20 15:31       ` Jakub Narębski
2015-06-19 18:47     ` Jakub Narębski
2015-06-20 16:51     ` Charles Bailey
2015-06-20 17:47       ` Junio C Hamano
2015-06-19  9:10 ` [PATCH 3/3] Add filter-objects command Charles Bailey
2015-06-19 10:10   ` Jeff King
2015-06-19 10:33     ` Charles Bailey
2015-06-19 10:52       ` Jeff King
2015-06-19 18:28         ` Junio C Hamano
2015-06-19 10:52       ` John Keeping
2015-06-19 11:04         ` Charles Bailey
2015-06-21 18:25 ` Improvements to integer option parsing Charles Bailey
2015-06-21 18:25   ` [PATCH 1/2] Correct test-parse-options to handle negative ints Charles Bailey
2015-06-21 18:25   ` [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c Charles Bailey
2015-06-21 18:30     ` Charles Bailey
2015-06-22 22:03       ` Junio C Hamano
2015-06-22 22:08     ` Junio C Hamano
2015-06-22 22:09   ` Improvements to integer option parsing Junio C Hamano
2015-06-22 22:42     ` Charles Bailey
2015-06-21 19:20 ` Fast enumeration of objects Charles Bailey
2015-06-21 19:20   ` [PATCH] Add list-all-objects command Charles Bailey
2015-06-22  8:38     ` Jeff King
2015-06-22 10:33       ` Jeff King
2015-06-22 10:40         ` [PATCH 1/7] for_each_packed_object: automatically open pack index Jeff King
2015-06-22 10:40         ` [PATCH 2/7] cat-file: minor style fix in options list Jeff King
2015-06-22 10:41         ` [PATCH 3/7] cat-file: move batch_options definition to top of file Jeff King
2015-06-22 10:45         ` Jeff King [this message]
2015-06-22 10:45         ` [PATCH 5/7] cat-file: stop returning value from batch_one_object Jeff King
2015-06-22 10:45         ` [PATCH 6/7] cat-file: split batch_one_object into two stages Jeff King
2015-06-22 10:45         ` [PATCH 7/7] cat-file: add --batch-all-objects option Jeff King
2015-06-26  6:56           ` Eric Sunshine
2015-06-26 15:48             ` Jeff King
2015-06-22 11:06         ` [PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects Jeff King
2015-06-22 22:03           ` Charles Bailey
2015-06-22 23:46             ` Jeff King
2015-06-22 21:48         ` [PATCH] Add list-all-objects command Charles Bailey
2015-06-22 21:50         ` Junio C Hamano
2015-06-22 23:50           ` Jeff King
2015-06-22 11:38       ` Charles Bailey
2015-06-22  9:57     ` Duy Nguyen
2015-06-22 10:24       ` Jeff King
2015-06-22  8:35   ` Fast enumeration of objects Jeff King
2015-06-22 19:44     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150622104517.GD14475@peff.net \
    --to=peff@peff.net \
    --cc=charles@hashpling.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.