Re: [PATCH] t1006: add tests for %(objectsize:disk)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Ondrej Pohorelsky <opohorel@redhat.com>,
	"brian m . carlson" <sandals@crustytoothpaste.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] t1006: add tests for %(objectsize:disk)
Date: Thu, 21 Dec 2023 13:19:53 +0100	[thread overview]
Message-ID: <d44cb8e7-ffce-4184-b9b5-6bb56705dcd1@web.de> (raw)
In-Reply-To: <20231221094722.GA570888@coredump.intra.peff.net>

Am 21.12.23 um 10:47 schrieb Jeff King:
> On Tue, Dec 19, 2023 at 05:42:39PM +0100, René Scharfe wrote:
>
>>> This adds a packed-object function, but I doubt anybody actually calls
>>> it. If we're going to do that, it's probably worth adding some tests for
>>> "cat-file --batch-check" or similar.
>>
>> Yes, and I was assuming that someone else would be eager to add such
>> tests. *ahem*
>
> :P OK, here it is. This can be its own topic, or go on top of the
> rs/t6300-compressed-size-fix branch.

Great, thank you!

> -- >8 --
> Subject: [PATCH] t1006: add tests for %(objectsize:disk)
>
> Back when we added this placeholder in a4ac106178 (cat-file: add
> %(objectsize:disk) format atom, 2013-07-10), there were no tests,
> claiming "[...]the exact numbers returned are volatile and subject to
> zlib and packing decisions".
>
> But we can use a little shell hackery to get the expected numbers
> ourselves. To a certain degree this is just re-implementing what Git is
> doing under the hood, but it is still worth doing. It makes sure we
> exercise the %(objectsize:disk) code at all, and having the two
> implementations agree gives us more confidence.
>
> Note that our shell code assumes that no object appears twice (either in
> two packs, or as both loose and packed), as then the results really are
> undefined. That's OK for our purposes, and the test will notice if that
> assumption is violated (the shell version would produce duplicate lines
> that Git's output does not have).
>
> Helped-by: René Scharfe <l.s.r@web.de>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I stole a bit of your awk. You can tell because I'd have written it in
> perl. ;)

I think we can do it even in shell, especially if...

>
>  t/t1006-cat-file.sh | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 271c5e4fd3..21915be308 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -1100,6 +1100,40 @@ test_expect_success 'cat-file --batch="batman" with --batch-all-objects will wor
>  	cmp expect actual
>  '
>
> +test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
> +	# our state has both loose and packed objects,
> +	# so find both for our expected output
> +	{
> +		find .git/objects/?? -type f |
> +		awk -F/ "{ print \$0, \$3\$4 }" |
> +		while read path oid
> +		do
> +			size=$(test_file_size "$path") &&
> +			echo "$oid $size" ||
> +			return 1
> +		done &&
> +		rawsz=$(test_oid rawsz) &&
> +		find .git/objects/pack -name "*.idx" |
> +		while read idx
> +		do
> +			git show-index <"$idx" >idx.raw &&
> +			sort -n <idx.raw >idx.sorted &&
> +			packsz=$(test_file_size "${idx%.idx}.pack") &&
> +			end=$((packsz - rawsz)) &&
> +			awk -v end="$end" "
> +			  NR > 1 { print oid, \$1 - start }
> +			  { start = \$1; oid = \$2 }
> +			  END { print oid, end - start }
> +			" idx.sorted ||

... we stop slicing the data against the grain.  Let's reverse the order
(sort -r), then we don't need to carry the oid forward:

			sort -nr <idx.raw >idx.sorted &&
			packsz=$(test_file_size "${idx%.idx}.pack") &&
			end=$((packsz - rawsz)) &&
			awk -v end="$end" "
			  { print \$2, end - \$1; end = \$1 }
			" idx.sorted ||

And at that point it should be easy to use a shell loop instead of awk:

			while read start oid rest
			do
				size=$((end - start)) &&
				end=$start &&
				echo "$oid $size" ||
				return 1
			done <idx.sorted

> +			return 1
> +		done
> +	} >expect.raw &&
> +	sort <expect.raw >expect &&

The reversal above becomes irrelevant with that line, so the result in
expect stays the same.

Should we deduplicate here, like cat-file does (i.e. use "sort -u")?
Having the same object in multiple places for whatever reason would not
be a cause for reporting an error in this test, I would think.

> +	git cat-file --batch-all-objects \
> +		--batch-check="%(objectname) %(objectsize:disk)" >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'set up replacement object' '
>  	orig=$(git rev-parse HEAD) &&
>  	git cat-file commit $orig >orig &&

One more thing: We can do the work of the first awk invocation in the
already existing loop as well:

> +test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
> +	# our state has both loose and packed objects,
> +	# so find both for our expected output
> +	{
> +		find .git/objects/?? -type f |
> +		awk -F/ "{ print \$0, \$3\$4 }" |
> +		while read path oid
> +		do
> +			size=$(test_file_size "$path") &&
> +			echo "$oid $size" ||
> +			return 1
> +		done &&

... but the substitutions are a bit awkward:

		find .git/objects/?? -type f |
		while read path
		do
			basename=${path##*/} &&
			dirname=${path%/$basename} &&
			oid="${dirname#.git/objects/}${basename}" &&
			size=$(test_file_size "$path") &&
			echo "$oid $size" ||
			return 1
		done &&

The avoided awk invocation might be worth the trouble, though.

René

next prev parent reply	other threads:[~2023-12-21 12:20 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-12 14:16 Test breakage with zlib-ng Ondrej Pohorelsky
2023-12-12 17:04 ` René Scharfe
2023-12-12 20:01   ` Jeff King
2023-12-12 22:54     ` René Scharfe
2023-12-13 12:28       ` [PATCH 2/1] test-lib-functions: add object size functions René Scharfe
2023-12-14 20:59         ` Jeff King
2023-12-19 16:42           ` René Scharfe
2023-12-21  9:47             ` [PATCH] t1006: add tests for %(objectsize:disk) Jeff King
2023-12-21 12:19               ` René Scharfe [this message]
2023-12-21 21:30                 ` Jeff King
2023-12-21 23:13                   ` René Scharfe
2023-12-23 10:09                     ` [PATCH v2] " Jeff King
2023-12-24  9:30                       ` René Scharfe
2023-12-23 10:18                     ` [PATCH] " Jeff King
2023-12-24  9:30                       ` René Scharfe
2023-12-12 22:18   ` Test breakage with zlib-ng brian m. carlson
2023-12-12 22:30   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d44cb8e7-ffce-4184-b9b5-6bb56705dcd1@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=opohorel@redhat.com \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.