git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Ondrej Pohorelsky <opohorel@redhat.com>,
	"brian m . carlson" <sandals@crustytoothpaste.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 2/1] test-lib-functions: add object size functions
Date: Tue, 19 Dec 2023 17:42:39 +0100	[thread overview]
Message-ID: <6750c93c-78d0-46b5-bfc2-0774156ed2ed@web.de> (raw)
In-Reply-To: <20231214205936.GA2272813@coredump.intra.peff.net>

Am 14.12.23 um 21:59 schrieb Jeff King:
> On Wed, Dec 13, 2023 at 01:28:56PM +0100, René Scharfe wrote:
>
>> Add test_object_size and its helpers test_loose_object_size and
>> test_packed_object_size, which allow determining the size of a Git
>> object using only the low-level Git commands rev-parse and show-index.
>>
>> Use it in t6300 to replace the bare-bones function test_object_file_size
>> as a motivating example.  There it provides the expected output of the
>> high-level Git command for-each-ref.
>
> This adds a packed-object function, but I doubt anybody actually calls
> it. If we're going to do that, it's probably worth adding some tests for
> "cat-file --batch-check" or similar.

Yes, and I was assuming that someone else would be eager to add such
tests. *ahem*

> At which point I wonder if rather than having a function for a single
> object, we are better off just testing the result of:
>
>   git cat-file --batch-all-objects --unordered --batch-check='%(objectsize:disk)'
>
> against a single post-processed "show-index" invocation.

Sure, we might want to optimize for bulk-processing and possibly end up
only checking the size of single objects in t6300, making new library
functions unnecessary.

When dumping size information of multiple objects, it's probably a good
idea to include "%(objectname)" as well in the format.

You'd need one show-index call for each .idx file.  A simple test would
only have a single one; a library function might need to handle multiple
packs.

>> So how about this?  I'm a bit nervous about all the rules about output
>> descriptors and error propagation and whatnot in the test library, but
>> this implementation seems simple enough and might be useful in more than
>> one test.  No idea how to add support for alternate object directories,
>> but I doubt we'll ever need it.
>
> I'm not sure that we need to do anything special with output
> redirection. Shouldn't these functions just send errors to stderr as
> usual? If they are run inside a test_expect block, that goes to
> descriptor 4 (which is either /dev/null or the original stderr,
> depending on whether "-v" was used).

Good point.  My bad excuse is that I copied the redirection to fd 4 from
test_grep.

>> +	git show-index <"$idx" |
>> +	awk -v oid="$oid" -v end="$end" '
>> +		$2 == oid {start = $1}
>> +		{offsets[$1] = 1}
>> +		END {
>> +			if (!start || start >= end)
>> +				exit 1
>> +			for (o in offsets)
>> +				if (start < o && o < end)
>> +					end = o
>> +			print end - start
>> +		}
>> +	' && return 0
>
> I was confused at first, because I didn't see any sorting happening. But
> if I understand correctly, you're just looking for the smallest "end"
> that comes after the start of the object we're looking for. Which I
> think works.

Yes, calculating the minimum offset suffices when handling a single
object -- no sorting required.  For bulk mode we'd better sort, of
course:

	git show-index <"$idx" |
	sort -n |
	awk -v end="$end" '
		NR > 1 {print oid, $1 - start}
		{start = $1; oid = $2}
		END {print oid, end - start}
	'

No idea how to make such a thing robust against malformed or truncated
output from show-index, but perhaps that's not necessary, depending on
how the result is used.

René


  reply	other threads:[~2023-12-19 16:43 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-12 14:16 Test breakage with zlib-ng Ondrej Pohorelsky
2023-12-12 17:04 ` René Scharfe
2023-12-12 20:01   ` Jeff King
2023-12-12 22:54     ` René Scharfe
2023-12-13 12:28       ` [PATCH 2/1] test-lib-functions: add object size functions René Scharfe
2023-12-14 20:59         ` Jeff King
2023-12-19 16:42           ` René Scharfe [this message]
2023-12-21  9:47             ` [PATCH] t1006: add tests for %(objectsize:disk) Jeff King
2023-12-21 12:19               ` René Scharfe
2023-12-21 21:30                 ` Jeff King
2023-12-21 23:13                   ` René Scharfe
2023-12-23 10:09                     ` [PATCH v2] " Jeff King
2023-12-24  9:30                       ` René Scharfe
2023-12-23 10:18                     ` [PATCH] " Jeff King
2023-12-24  9:30                       ` René Scharfe
2023-12-12 22:18   ` Test breakage with zlib-ng brian m. carlson
2023-12-12 22:30   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6750c93c-78d0-46b5-bfc2-0774156ed2ed@web.de \
    --to=l.s.r@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=opohorel@redhat.com \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).