git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Ondrej Pohorelsky <opohorel@redhat.com>,
	"brian m . carlson" <sandals@crustytoothpaste.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] t1006: add tests for %(objectsize:disk)
Date: Thu, 21 Dec 2023 13:19:53 +0100	[thread overview]
Message-ID: <d44cb8e7-ffce-4184-b9b5-6bb56705dcd1@web.de> (raw)
In-Reply-To: <20231221094722.GA570888@coredump.intra.peff.net>

Am 21.12.23 um 10:47 schrieb Jeff King:
> On Tue, Dec 19, 2023 at 05:42:39PM +0100, René Scharfe wrote:
>
>>> This adds a packed-object function, but I doubt anybody actually calls
>>> it. If we're going to do that, it's probably worth adding some tests for
>>> "cat-file --batch-check" or similar.
>>
>> Yes, and I was assuming that someone else would be eager to add such
>> tests. *ahem*
>
> :P OK, here it is. This can be its own topic, or go on top of the
> rs/t6300-compressed-size-fix branch.

Great, thank you!

> -- >8 --
> Subject: [PATCH] t1006: add tests for %(objectsize:disk)
>
> Back when we added this placeholder in a4ac106178 (cat-file: add
> %(objectsize:disk) format atom, 2013-07-10), there were no tests,
> claiming "[...]the exact numbers returned are volatile and subject to
> zlib and packing decisions".
>
> But we can use a little shell hackery to get the expected numbers
> ourselves. To a certain degree this is just re-implementing what Git is
> doing under the hood, but it is still worth doing. It makes sure we
> exercise the %(objectsize:disk) code at all, and having the two
> implementations agree gives us more confidence.
>
> Note that our shell code assumes that no object appears twice (either in
> two packs, or as both loose and packed), as then the results really are
> undefined. That's OK for our purposes, and the test will notice if that
> assumption is violated (the shell version would produce duplicate lines
> that Git's output does not have).
>
> Helped-by: René Scharfe <l.s.r@web.de>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I stole a bit of your awk. You can tell because I'd have written it in
> perl. ;)

I think we can do it even in shell, especially if...

>
>  t/t1006-cat-file.sh | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 271c5e4fd3..21915be308 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -1100,6 +1100,40 @@ test_expect_success 'cat-file --batch="batman" with --batch-all-objects will wor
>  	cmp expect actual
>  '
>
> +test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
> +	# our state has both loose and packed objects,
> +	# so find both for our expected output
> +	{
> +		find .git/objects/?? -type f |
> +		awk -F/ "{ print \$0, \$3\$4 }" |
> +		while read path oid
> +		do
> +			size=$(test_file_size "$path") &&
> +			echo "$oid $size" ||
> +			return 1
> +		done &&
> +		rawsz=$(test_oid rawsz) &&
> +		find .git/objects/pack -name "*.idx" |
> +		while read idx
> +		do
> +			git show-index <"$idx" >idx.raw &&
> +			sort -n <idx.raw >idx.sorted &&
> +			packsz=$(test_file_size "${idx%.idx}.pack") &&
> +			end=$((packsz - rawsz)) &&
> +			awk -v end="$end" "
> +			  NR > 1 { print oid, \$1 - start }
> +			  { start = \$1; oid = \$2 }
> +			  END { print oid, end - start }
> +			" idx.sorted ||

... we stop slicing the data against the grain.  Let's reverse the order
(sort -r), then we don't need to carry the oid forward:

			sort -nr <idx.raw >idx.sorted &&
			packsz=$(test_file_size "${idx%.idx}.pack") &&
			end=$((packsz - rawsz)) &&
			awk -v end="$end" "
			  { print \$2, end - \$1; end = \$1 }
			" idx.sorted ||

And at that point it should be easy to use a shell loop instead of awk:

			while read start oid rest
			do
				size=$((end - start)) &&
				end=$start &&
				echo "$oid $size" ||
				return 1
			done <idx.sorted

> +			return 1
> +		done
> +	} >expect.raw &&
> +	sort <expect.raw >expect &&

The reversal above becomes irrelevant with that line, so the result in
expect stays the same.

Should we deduplicate here, like cat-file does (i.e. use "sort -u")?
Having the same object in multiple places for whatever reason would not
be a cause for reporting an error in this test, I would think.

> +	git cat-file --batch-all-objects \
> +		--batch-check="%(objectname) %(objectsize:disk)" >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'set up replacement object' '
>  	orig=$(git rev-parse HEAD) &&
>  	git cat-file commit $orig >orig &&

One more thing: We can do the work of the first awk invocation in the
already existing loop as well:

> +test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
> +	# our state has both loose and packed objects,
> +	# so find both for our expected output
> +	{
> +		find .git/objects/?? -type f |
> +		awk -F/ "{ print \$0, \$3\$4 }" |
> +		while read path oid
> +		do
> +			size=$(test_file_size "$path") &&
> +			echo "$oid $size" ||
> +			return 1
> +		done &&

... but the substitutions are a bit awkward:

		find .git/objects/?? -type f |
		while read path
		do
			basename=${path##*/} &&
			dirname=${path%/$basename} &&
			oid="${dirname#.git/objects/}${basename}" &&
			size=$(test_file_size "$path") &&
			echo "$oid $size" ||
			return 1
		done &&

The avoided awk invocation might be worth the trouble, though.

René

  reply	other threads:[~2023-12-21 12:20 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-12 14:16 Test breakage with zlib-ng Ondrej Pohorelsky
2023-12-12 17:04 ` René Scharfe
2023-12-12 20:01   ` Jeff King
2023-12-12 22:54     ` René Scharfe
2023-12-13 12:28       ` [PATCH 2/1] test-lib-functions: add object size functions René Scharfe
2023-12-14 20:59         ` Jeff King
2023-12-19 16:42           ` René Scharfe
2023-12-21  9:47             ` [PATCH] t1006: add tests for %(objectsize:disk) Jeff King
2023-12-21 12:19               ` René Scharfe [this message]
2023-12-21 21:30                 ` Jeff King
2023-12-21 23:13                   ` René Scharfe
2023-12-23 10:09                     ` [PATCH v2] " Jeff King
2023-12-24  9:30                       ` René Scharfe
2023-12-23 10:18                     ` [PATCH] " Jeff King
2023-12-24  9:30                       ` René Scharfe
2023-12-12 22:18   ` Test breakage with zlib-ng brian m. carlson
2023-12-12 22:30   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d44cb8e7-ffce-4184-b9b5-6bb56705dcd1@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=opohorel@redhat.com \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).