git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 2/2] cat-file: provide %(deltabase) batch format
Date: Sat, 21 Dec 2013 09:25:22 -0500	[thread overview]
Message-ID: <20131221142522.GB29679@sigill.intra.peff.net> (raw)
In-Reply-To: <20131221142336.GA28649@sigill.intra.peff.net>

It can be useful for debugging or analysis to see which
objects are stored as delta bases on top of others. This
information is available by running `git verify-pack`, but
that is extremely expensive (and is harder than necessary to
parse).

Instead, let's make it available as a cat-file query format,
which makes it fast and simple to get the bases for a subset
of the objects.

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-cat-file.txt | 12 +++++++++---
 builtin/cat-file.c             |  6 ++++++
 t/t1006-cat-file.sh            | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 322f5ed..f6a16f4 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -109,6 +109,11 @@ newline. The available atoms are:
 	The size, in bytes, that the object takes up on disk. See the
 	note about on-disk sizes in the `CAVEATS` section below.
 
+`deltabase`::
+	If the object is stored as a delta on-disk, this expands to the
+	40-hex sha1 of the delta base object. Otherwise, expands to the
+	null sha1 (40 zeroes). See `CAVEATS` below.
+
 `rest`::
 	If this atom is used in the output string, input lines are split
 	at the first whitespace boundary. All characters before that
@@ -152,10 +157,11 @@ should be taken in drawing conclusions about which refs or objects are
 responsible for disk usage. The size of a packed non-delta object may be
 much larger than the size of objects which delta against it, but the
 choice of which object is the base and which is the delta is arbitrary
-and is subject to change during a repack. Note also that multiple copies
-of an object may be present in the object database; in this case, it is
-undefined which copy's size will be reported.
+and is subject to change during a repack.
 
+Note also that multiple copies of an object may be present in the object
+database; in this case, it is undefined which copy's size or delta base
+will be reported.
 
 GIT
 ---
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index b2ca775..2e0af2e 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -118,6 +118,7 @@ struct expand_data {
 	unsigned long size;
 	unsigned long disk_size;
 	const char *rest;
+	unsigned char delta_base_sha1[20];
 
 	/*
 	 * If mark_query is true, we do not expand anything, but rather
@@ -174,6 +175,11 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len,
 			data->split_on_whitespace = 1;
 		else if (data->rest)
 			strbuf_addstr(sb, data->rest);
+	} else if (is_atom("deltabase", atom, len)) {
+		if (data->mark_query)
+			data->info.delta_base_sha1 = data->delta_base_sha1;
+		else
+			strbuf_addstr(sb, sha1_to_hex(data->delta_base_sha1));
 	} else
 		die("unknown format element: %.*s", len, atom);
 }
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 8a1bc5c..633dc82 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -240,4 +240,38 @@ test_expect_success "--batch-check with multiple sha1s gives correct format" '
     "$(echo_without_newline "$batch_check_input" | git cat-file --batch-check)"
 '
 
+test_expect_success 'setup blobs which are likely to delta' '
+	test-genrandom foo 10240 >foo &&
+	{ cat foo; echo plus; } >foo-plus &&
+	git add foo foo-plus &&
+	git commit -m foo &&
+	cat >blobs <<-\EOF
+	HEAD:foo
+	HEAD:foo-plus
+	EOF
+'
+
+test_expect_success 'confirm that neither loose blob is a delta' '
+	cat >expect <<-EOF
+	$_z40
+	$_z40
+	EOF
+	git cat-file --batch-check="%(deltabase)" <blobs >actual &&
+	test_cmp expect actual
+'
+
+# To avoid relying too much on the current delta heuristics,
+# we will check only that one of the two objects is a delta
+# against the other, but not the order. We can do so by just
+# asking for the base of both, and checking whether either
+# sha1 appears in the output.
+test_expect_success '%(deltabase) reports packed delta bases' '
+	git repack -ad &&
+	git cat-file --batch-check="%(deltabase)" <blobs >actual &&
+	{
+		grep "$(git rev-parse HEAD:foo)" actual ||
+		grep "$(git rev-parse HEAD:foo-plus)" actual
+	}
+'
+
 test_done
-- 
1.8.5.1.399.g900e7cd

  parent reply	other threads:[~2013-12-21 14:25 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-21 14:23 [PATCH 0/2] cat-file --batch-check='%(deltabase)' Jeff King
2013-12-21 14:24 ` [PATCH 1/2] sha1_object_info_extended: provide delta base sha1s Jeff King
2013-12-26 19:54   ` Junio C Hamano
2013-12-21 14:25 ` Jeff King [this message]
2013-12-26 20:20 ` [PATCH 0/2] cat-file --batch-check='%(deltabase)' Jonathan Nieder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131221142522.GB29679@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).