git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Michael Giuffrida" <michaelpg@chromium.org>,
	git@vger.kernel.org, "Jeff King" <peff@peff.net>,
	"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: Re: [BUG] add_again() off-by-one error in custom format
Date: Wed, 14 Jun 2017 20:24:25 +0200	[thread overview]
Message-ID: <dae96f72-761c-3ed1-4567-0933acc7618a@web.de> (raw)
In-Reply-To: <xmqqwp8f4mb2.fsf@gitster.mtv.corp.google.com>

Am 13.06.2017 um 23:20 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
> 
>> The difference is about the same as the one between:
>>
>> 	$ time git log --format="" >/dev/null
>>
>> 	real    0m0.463s
>> 	user    0m0.448s
>> 	sys     0m0.012s
>>
>> and:
>>
>> 	$ time git log --format="%h" >/dev/null
>>
>> 	real    0m1.062s
>> 	user    0m0.636s
>> 	sys     0m0.416s
>>
>> With caching duplicates are basically free and without it short
>> hashes have to be looked up again.  Other placeholders may reduce
>> the relative slowdown, depending on how expensive they are.
> 
> I think the real question is how likely people use more than one
> occurrence of the same thing in their custom format, and how deeply
> they care that --format='%h %h' costs more than --format='%h'.  The
> cost won't of course be double (because the main traversal costs
> without any output), but it would be rather unreasonable to expect
> that --format='%h %h %h %h %h' to cost the same as --format='%h';
> after all, Git is doing more for them ;-)

The answer to the first half is obviously "very likely" -- otherwise
this bug wouldn't have been found, right? :)

Regarding the question of how bad a 50% slowdown for a second %h
would be: No idea.  If ran interactively it may not even be noticeable
because the user can read the first few lines in less while the rest
is prepared in the background.  We don't have a perf test for formats
with duplicate short hashes, so we don't promise anything, right? :)

René

-- >8 --
Subject: [PATCH] pretty: recalculate duplicate short hashes

b9c6232138 (--format=pretty: avoid calculating expensive expansions
twice) optimized adding short hashes multiple times by using the
fact that the output strbuf was only ever simply appended to and
copying the added string from the previous run.  That prerequisite
is no longer given; we now have modfiers like %< and %+ that can
cause the cache to lose track of the correct offsets.  Remove it.

Reported-by: Michael Giuffrida <michaelpg@chromium.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
I'm sending this out in the hope that there might be a simple way
to fix it after all, like Gábor's patch does for %+.  %< and %>
seem to be the only other problematic modifiers for now -- I'm
actually surprised that %w seems to be OK.

 pretty.c | 32 --------------------------------
 strbuf.c |  7 -------
 strbuf.h |  6 ------
 3 files changed, 45 deletions(-)

diff --git a/pretty.c b/pretty.c
index 09701bd2ff..cc099dfdd1 100644
--- a/pretty.c
+++ b/pretty.c
@@ -783,29 +783,9 @@ struct format_commit_context {
 	size_t body_off;
 
 	/* The following ones are relative to the result struct strbuf. */
-	struct chunk abbrev_commit_hash;
-	struct chunk abbrev_tree_hash;
-	struct chunk abbrev_parent_hashes;
 	size_t wrap_start;
 };
 
-static int add_again(struct strbuf *sb, struct chunk *chunk)
-{
-	if (chunk->len) {
-		strbuf_adddup(sb, chunk->off, chunk->len);
-		return 1;
-	}
-
-	/*
-	 * We haven't seen this chunk before.  Our caller is surely
-	 * going to add it the hard way now.  Remember the most likely
-	 * start of the to-be-added chunk: the current end of the
-	 * struct strbuf.
-	 */
-	chunk->off = sb->len;
-	return 0;
-}
-
 static void parse_commit_header(struct format_commit_context *context)
 {
 	const char *msg = context->message;
@@ -1147,24 +1127,16 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
 		return 1;
 	case 'h':		/* abbreviated commit hash */
 		strbuf_addstr(sb, diff_get_color(c->auto_color, DIFF_COMMIT));
-		if (add_again(sb, &c->abbrev_commit_hash)) {
-			strbuf_addstr(sb, diff_get_color(c->auto_color, DIFF_RESET));
-			return 1;
-		}
 		strbuf_add_unique_abbrev(sb, commit->object.oid.hash,
 					 c->pretty_ctx->abbrev);
 		strbuf_addstr(sb, diff_get_color(c->auto_color, DIFF_RESET));
-		c->abbrev_commit_hash.len = sb->len - c->abbrev_commit_hash.off;
 		return 1;
 	case 'T':		/* tree hash */
 		strbuf_addstr(sb, oid_to_hex(&commit->tree->object.oid));
 		return 1;
 	case 't':		/* abbreviated tree hash */
-		if (add_again(sb, &c->abbrev_tree_hash))
-			return 1;
 		strbuf_add_unique_abbrev(sb, commit->tree->object.oid.hash,
 					 c->pretty_ctx->abbrev);
-		c->abbrev_tree_hash.len = sb->len - c->abbrev_tree_hash.off;
 		return 1;
 	case 'P':		/* parent hashes */
 		for (p = commit->parents; p; p = p->next) {
@@ -1174,16 +1146,12 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
 		}
 		return 1;
 	case 'p':		/* abbreviated parent hashes */
-		if (add_again(sb, &c->abbrev_parent_hashes))
-			return 1;
 		for (p = commit->parents; p; p = p->next) {
 			if (p != commit->parents)
 				strbuf_addch(sb, ' ');
 			strbuf_add_unique_abbrev(sb, p->item->object.oid.hash,
 						 c->pretty_ctx->abbrev);
 		}
-		c->abbrev_parent_hashes.len = sb->len -
-		                              c->abbrev_parent_hashes.off;
 		return 1;
 	case 'm':		/* left/right/bottom */
 		strbuf_addstr(sb, get_revision_mark(NULL, commit));
diff --git a/strbuf.c b/strbuf.c
index 00457940cf..9103bc75e4 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -204,13 +204,6 @@ void strbuf_addbuf(struct strbuf *sb, const struct strbuf *sb2)
 	strbuf_setlen(sb, sb->len + sb2->len);
 }
 
-void strbuf_adddup(struct strbuf *sb, size_t pos, size_t len)
-{
-	strbuf_grow(sb, len);
-	memcpy(sb->buf + sb->len, sb->buf + pos, len);
-	strbuf_setlen(sb, sb->len + len);
-}
-
 void strbuf_addchars(struct strbuf *sb, int c, size_t n)
 {
 	strbuf_grow(sb, n);
diff --git a/strbuf.h b/strbuf.h
index 80047b1bb7..d785258649 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -264,12 +264,6 @@ static inline void strbuf_addstr(struct strbuf *sb, const char *s)
 extern void strbuf_addbuf(struct strbuf *sb, const struct strbuf *sb2);
 
 /**
- * Copy part of the buffer from a given position till a given length to the
- * end of the buffer.
- */
-extern void strbuf_adddup(struct strbuf *sb, size_t pos, size_t len);
-
-/**
  * This function can be used to expand a format string containing
  * placeholders. To that end, it parses the string and calls the specified
  * function for every percent sign found.
-- 
2.13.0


  reply	other threads:[~2017-06-14 18:24 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-12  3:13 [BUG] add_again() off-by-one error in custom format Michael Giuffrida
2017-06-12 22:49 ` Junio C Hamano
2017-06-13 18:09   ` René Scharfe
2017-06-13 18:29     ` Junio C Hamano
2017-06-13 20:29       ` René Scharfe
2017-06-13 21:20         ` Junio C Hamano
2017-06-14 18:24           ` René Scharfe [this message]
2017-06-15  5:56             ` Jeff King
2017-06-15 11:33               ` René Scharfe
2017-06-15 13:25                 ` Jeff King
2017-06-18 10:58                   ` René Scharfe
2017-06-18 11:49                     ` Jeff King
2017-06-18 12:59                       ` René Scharfe
2017-06-18 13:56                         ` Jeff King
2017-06-22 18:19                           ` René Scharfe
2017-06-22 23:15                             ` Jeff King
2017-06-18 10:58                   ` René Scharfe
2017-06-18 11:50                     ` Jeff King
2017-06-19  4:46                       ` Junio C Hamano
2017-06-22 18:19                         ` [PATCH] sha1_name: cache readdir(3) results in find_short_object_filename() René Scharfe
2017-06-22 23:10                           ` Jeff King
2017-06-24 12:12                             ` René Scharfe
2017-06-24 12:14                               ` Jeff King
2017-06-24 12:12                             ` René Scharfe
2017-06-24 12:20                               ` Jeff King
2017-06-24 14:09                                 ` René Scharfe
2017-06-24 14:12                                   ` Jeff King
2017-06-15 18:37             ` [BUG] add_again() off-by-one error in custom format Junio C Hamano
2017-06-13 22:24         ` SZEDER Gábor
2017-06-14 17:34           ` René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dae96f72-761c-3ed1-4567-0933acc7618a@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=michaelpg@chromium.org \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).