From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Ondrej Pohorelsky <opohorel@redhat.com>,
"brian m . carlson" <sandals@crustytoothpaste.net>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] t1006: add tests for %(objectsize:disk)
Date: Thu, 21 Dec 2023 13:19:53 +0100 [thread overview]
Message-ID: <d44cb8e7-ffce-4184-b9b5-6bb56705dcd1@web.de> (raw)
In-Reply-To: <20231221094722.GA570888@coredump.intra.peff.net>
Am 21.12.23 um 10:47 schrieb Jeff King:
> On Tue, Dec 19, 2023 at 05:42:39PM +0100, René Scharfe wrote:
>
>>> This adds a packed-object function, but I doubt anybody actually calls
>>> it. If we're going to do that, it's probably worth adding some tests for
>>> "cat-file --batch-check" or similar.
>>
>> Yes, and I was assuming that someone else would be eager to add such
>> tests. *ahem*
>
> :P OK, here it is. This can be its own topic, or go on top of the
> rs/t6300-compressed-size-fix branch.
Great, thank you!
> -- >8 --
> Subject: [PATCH] t1006: add tests for %(objectsize:disk)
>
> Back when we added this placeholder in a4ac106178 (cat-file: add
> %(objectsize:disk) format atom, 2013-07-10), there were no tests,
> claiming "[...]the exact numbers returned are volatile and subject to
> zlib and packing decisions".
>
> But we can use a little shell hackery to get the expected numbers
> ourselves. To a certain degree this is just re-implementing what Git is
> doing under the hood, but it is still worth doing. It makes sure we
> exercise the %(objectsize:disk) code at all, and having the two
> implementations agree gives us more confidence.
>
> Note that our shell code assumes that no object appears twice (either in
> two packs, or as both loose and packed), as then the results really are
> undefined. That's OK for our purposes, and the test will notice if that
> assumption is violated (the shell version would produce duplicate lines
> that Git's output does not have).
>
> Helped-by: René Scharfe <l.s.r@web.de>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I stole a bit of your awk. You can tell because I'd have written it in
> perl. ;)
I think we can do it even in shell, especially if...
>
> t/t1006-cat-file.sh | 34 ++++++++++++++++++++++++++++++++++
> 1 file changed, 34 insertions(+)
>
> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index 271c5e4fd3..21915be308 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -1100,6 +1100,40 @@ test_expect_success 'cat-file --batch="batman" with --batch-all-objects will wor
> cmp expect actual
> '
>
> +test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
> + # our state has both loose and packed objects,
> + # so find both for our expected output
> + {
> + find .git/objects/?? -type f |
> + awk -F/ "{ print \$0, \$3\$4 }" |
> + while read path oid
> + do
> + size=$(test_file_size "$path") &&
> + echo "$oid $size" ||
> + return 1
> + done &&
> + rawsz=$(test_oid rawsz) &&
> + find .git/objects/pack -name "*.idx" |
> + while read idx
> + do
> + git show-index <"$idx" >idx.raw &&
> + sort -n <idx.raw >idx.sorted &&
> + packsz=$(test_file_size "${idx%.idx}.pack") &&
> + end=$((packsz - rawsz)) &&
> + awk -v end="$end" "
> + NR > 1 { print oid, \$1 - start }
> + { start = \$1; oid = \$2 }
> + END { print oid, end - start }
> + " idx.sorted ||
... we stop slicing the data against the grain. Let's reverse the order
(sort -r), then we don't need to carry the oid forward:
sort -nr <idx.raw >idx.sorted &&
packsz=$(test_file_size "${idx%.idx}.pack") &&
end=$((packsz - rawsz)) &&
awk -v end="$end" "
{ print \$2, end - \$1; end = \$1 }
" idx.sorted ||
And at that point it should be easy to use a shell loop instead of awk:
while read start oid rest
do
size=$((end - start)) &&
end=$start &&
echo "$oid $size" ||
return 1
done <idx.sorted
> + return 1
> + done
> + } >expect.raw &&
> + sort <expect.raw >expect &&
The reversal above becomes irrelevant with that line, so the result in
expect stays the same.
Should we deduplicate here, like cat-file does (i.e. use "sort -u")?
Having the same object in multiple places for whatever reason would not
be a cause for reporting an error in this test, I would think.
> + git cat-file --batch-all-objects \
> + --batch-check="%(objectname) %(objectsize:disk)" >actual &&
> + test_cmp expect actual
> +'
> +
> test_expect_success 'set up replacement object' '
> orig=$(git rev-parse HEAD) &&
> git cat-file commit $orig >orig &&
One more thing: We can do the work of the first awk invocation in the
already existing loop as well:
> +test_expect_success 'cat-file %(objectsize:disk) with --batch-all-objects' '
> + # our state has both loose and packed objects,
> + # so find both for our expected output
> + {
> + find .git/objects/?? -type f |
> + awk -F/ "{ print \$0, \$3\$4 }" |
> + while read path oid
> + do
> + size=$(test_file_size "$path") &&
> + echo "$oid $size" ||
> + return 1
> + done &&
... but the substitutions are a bit awkward:
find .git/objects/?? -type f |
while read path
do
basename=${path##*/} &&
dirname=${path%/$basename} &&
oid="${dirname#.git/objects/}${basename}" &&
size=$(test_file_size "$path") &&
echo "$oid $size" ||
return 1
done &&
The avoided awk invocation might be worth the trouble, though.
René
next prev parent reply other threads:[~2023-12-21 12:20 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-12 14:16 Test breakage with zlib-ng Ondrej Pohorelsky
2023-12-12 17:04 ` René Scharfe
2023-12-12 20:01 ` Jeff King
2023-12-12 22:54 ` René Scharfe
2023-12-13 12:28 ` [PATCH 2/1] test-lib-functions: add object size functions René Scharfe
2023-12-14 20:59 ` Jeff King
2023-12-19 16:42 ` René Scharfe
2023-12-21 9:47 ` [PATCH] t1006: add tests for %(objectsize:disk) Jeff King
2023-12-21 12:19 ` René Scharfe [this message]
2023-12-21 21:30 ` Jeff King
2023-12-21 23:13 ` René Scharfe
2023-12-23 10:09 ` [PATCH v2] " Jeff King
2023-12-24 9:30 ` René Scharfe
2023-12-23 10:18 ` [PATCH] " Jeff King
2023-12-24 9:30 ` René Scharfe
2023-12-12 22:18 ` Test breakage with zlib-ng brian m. carlson
2023-12-12 22:30 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d44cb8e7-ffce-4184-b9b5-6bb56705dcd1@web.de \
--to=l.s.r@web.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=opohorel@redhat.com \
--cc=peff@peff.net \
--cc=sandals@crustytoothpaste.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).