public inbox for git@vger.kernel.org
From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Karthik Nayak <karthik.188@gmail.com>,
	Justin Tobler <jltobler@gmail.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v3 11/14] odb: introduce mtime fields for object info requests
Date: Thu, 22 Jan 2026 20:06:40 -0500
Message-ID: <aXLJoDdoEyKXKtBf@nand.local>
In-Reply-To: <20260121-pks-odb-for-each-object-v3-11-12c4dfd24227@pks.im>

On Wed, Jan 21, 2026 at 01:50:27PM +0100, Patrick Steinhardt wrote:
> There are some use cases where we need to figure out the mtime for
> objects. Most importantly, this is the case when we want to prune
> unreachable objects. But getting at that data requires users to manually
> derive the info either via the loose object's mtime, the packfiles'
> mtime or via the ".mtimes" file.
>
> Introduce a new `struct object_info::mtimep` pointer that allows callers
> to request an object's mtime. This new field will be used in a
> subsequent commit.

The goal seems reasonable to me, but I am a little unsure about whether
or not this is the right place to expose this information. I have some
more thoughts below...

> diff --git a/object-file.c b/object-file.c
> index 65e730684b..c0f896673b 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -409,6 +409,7 @@ static int read_object_info_from_path(struct odb_source *source,
>  	char hdr[MAX_HEADER_LEN];
>  	unsigned long size_scratch;
>  	enum object_type type_scratch;
> +	struct stat st;

I was a little confused why we were declaring a stat struct here...

>  	/*
>  	 * If we don't care about type or size, then we don't
> @@ -421,7 +422,7 @@ static int read_object_info_from_path(struct odb_source *source,
>  	if (!oi || (!oi->typep && !oi->sizep && !oi->contentp)) {
>  		struct stat st;
>
> -		if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK)) {
> +		if ((!oi || (!oi->disk_sizep && !oi->mtimep)) && (flags & OBJECT_INFO_QUICK)) {
>  			ret = quick_has_loose(source->loose, oid) ? 0 : -1;
>  			goto out;
>  		}
> @@ -431,8 +432,12 @@ static int read_object_info_from_path(struct odb_source *source,
>  			goto out;
>  		}
>
> -		if (oi && oi->disk_sizep)
> -			*oi->disk_sizep = st.st_size;
> +		if (oi) {
> +			if (oi->disk_sizep)
> +				*oi->disk_sizep = st.st_size;

...and then assigning it here without actually calling lstat() between
the two. But the diff context elides the fact that there is another stat
declaration within this block that we *do* lstat() into before reading
it.

That tripped me up a little while reviewing, but not a huge deal. I do
wonder whether or not there is a clearer way to structure all of these
conditionals. I *think* that what you wrote here is right, but the way
that it has grown organically over time (to be clear, not the fault of
your series) makes it a little difficult to follow.
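
To make that concrete, here is a rough sketch of one way the branching
could be named rather than spelled out inline. Everything in it is
hypothetical (`struct plan_request`, `enum sketch_plan`, `choose_plan()`,
and the plain `int quick` standing in for the OBJECT_INFO_QUICK check);
it only mirrors the three outcomes the quoted conditionals appear to
select between, and is not meant as a drop-in replacement:

```c
#include <stddef.h>

/*
 * Hypothetical, trimmed-down stand-in for the request pointers in
 * `struct object_info`; the field types are simplified for the sketch.
 */
struct plan_request {
	int *typep;
	unsigned long *sizep;
	void **contentp;
	unsigned long *disk_sizep;
	long *mtimep;
};

/* The three outcomes the quoted conditionals choose between. */
enum sketch_plan {
	PLAN_EXISTENCE_ONLY, /* quick check, no stat needed */
	PLAN_STAT_ONLY,      /* lstat() is enough */
	PLAN_READ_OBJECT,    /* must open the file and parse the header */
};

/* Does the caller want anything that requires parsing the object? */
static int wants_header(const struct plan_request *oi)
{
	return oi && (oi->typep || oi->sizep || oi->contentp);
}

/* Does the caller want anything that only a stat call can provide? */
static int wants_stat(const struct plan_request *oi)
{
	return oi && (oi->disk_sizep || oi->mtimep);
}

static enum sketch_plan choose_plan(const struct plan_request *oi, int quick)
{
	if (wants_header(oi))
		return PLAN_READ_OBJECT;
	if (quick && !wants_stat(oi))
		return PLAN_EXISTENCE_ONLY;
	return PLAN_STAT_ONLY;
}
```

With something along these lines, adding a new stat-derived field would
only touch `wants_stat()` instead of each conditional.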

> +			if (oi->mtimep)
> +				*oi->mtimep = st.st_mtime;
> +		}
>
>  		ret = 0;
>  		goto out;
> @@ -446,7 +451,21 @@ static int read_object_info_from_path(struct odb_source *source,
>  		goto out;
>  	}
>
> -	map = map_fd(fd, path, &mapsize);
> +	if (fstat(fd, &st)) {
> +		close(fd);
> +		ret = -1;
> +		goto out;
> +	}

Makes sense. We were previously letting map_fd() take care of stat()-ing
the file to know how large the mmap should be, but now we might need
that information for the mtime as well. So doing what map_fd() is doing
underneath here directly makes sense.
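
For readers following along, the pattern is roughly the following: stat
the open descriptor once, then reuse that one result for both the
mapping length and the mtime. This is a standalone POSIX sketch, not
the code from the patch; `struct mapped_file` and `map_with_mtime()` are
names made up for illustration:

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result type for this sketch; not part of Git. */
struct mapped_file {
	void *map;    /* NULL when the file is empty */
	size_t size;  /* length of the mapping in bytes */
	time_t mtime; /* taken from the same fstat() call as `size` */
};

/*
 * Open `path`, fstat() it once, and use that single result for both
 * the mmap() length and the mtime. Returns 0 on success, -1 on error.
 */
static int map_with_mtime(const char *path, struct mapped_file *out)
{
	struct stat st;
	int fd;

	assert(path && out);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	if (fstat(fd, &st)) {
		close(fd);
		return -1;
	}

	out->size = (size_t)st.st_size;
	out->mtime = st.st_mtime;
	out->map = NULL;

	if (out->size) {
		out->map = mmap(NULL, out->size, PROT_READ, MAP_PRIVATE, fd, 0);
		if (out->map == MAP_FAILED) {
			out->map = NULL;
			close(fd);
			return -1;
		}
	}

	/* The mapping stays valid after the descriptor is closed. */
	close(fd);
	return 0;
}
```

The caller is expected to munmap() a non-NULL `map` when done with it.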

> diff --git a/odb.c b/odb.c
> index 65f0447aa5..67decd3908 100644
> --- a/odb.c
> +++ b/odb.c
> @@ -702,6 +702,8 @@ static int do_oid_object_info_extended(struct object_database *odb,
>  				oidclr(oi->delta_base_oid, odb->repo->hash_algo);
>  			if (oi->contentp)
>  				*oi->contentp = xmemdupz(co->buf, co->size);
> +			if (oi->mtimep)
> +				*oi->mtimep = 0;

Assuming that you do not change the object_info request/response
semantics, I wonder if it might make sense to zero out the entirety of
the response section as a belt-and-suspenders mechanism in case future
contributors forget to assign zero to the new fields themselves.
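
One possible reading of that suggestion, sketched with made-up names
(`struct clear_request` and `clear_requested_outputs()` do not exist in
Git, and the field list is abbreviated): reset every requested output
in one place up front, so that a code path which forgets about a field
still leaves it at a defined zero.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical, abbreviated stand-in for the out-pointers of
 * `struct object_info`. A NULL pointer means "not requested".
 */
struct clear_request {
	unsigned long *sizep;
	unsigned long *disk_sizep;
	time_t *mtimep;
};

/*
 * Zero every output the caller asked for. Calling this once before
 * the per-backend code runs means a forgotten assignment yields 0
 * instead of whatever the caller's variable happened to contain.
 */
static void clear_requested_outputs(const struct clear_request *req)
{
	assert(req);

	if (req->sizep)
		*req->sizep = 0;
	if (req->disk_sizep)
		*req->disk_sizep = 0;
	if (req->mtimep)
		*req->mtimep = 0;
}
```

New fields would still need to be added to the helper, but that is one
spot to remember rather than every code path that fills in a response.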

> @@ -1619,16 +1620,34 @@ int packed_object_info(struct packed_git *p,
>  		}
>  	}
>
> -	if (oi->disk_sizep) {
> -		uint32_t pos;
> -		if (offset_to_pack_pos(p, obj_offset, &pos) < 0) {
> +	if (oi->disk_sizep || (oi->mtimep && p->is_cruft)) {
> +		if (offset_to_pack_pos(p, obj_offset, &pack_pos) < 0) {
>  			error("could not find object at offset %"PRIuMAX" "
>  			      "in pack %s", (uintmax_t)obj_offset, p->pack_name);
>  			ret = -1;
>  			goto out;
>  		}
> +	}
> +
> +	if (oi->disk_sizep)
> +		*oi->disk_sizep = pack_pos_to_offset(p, pack_pos + 1) - obj_offset;
> +
> +	if (oi->mtimep) {
> +		if (p->is_cruft) {
> +			uint32_t index_pos;
> +
> +			if (load_pack_mtimes(p) < 0)
> +				die(_("could not load cruft pack .mtimes"));

Do you think it would be worth doing instead:

    die(_("could not load .mtimes for cruft pack '%s'"), pack_basename(p));

? Most repositories should only ever have one cruft pack in practice
(even so, there should still be some value in identifying it by its
checksum in case someone is repacking underneath us). But some
repositories will have >1 cruft pack, so knowing which one is busted may
be useful in that case.
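
As a standalone illustration of what the resulting message would look
like (the two helpers below are hypothetical and written only for this
sketch; they are not Git's own pack_basename() or die()):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the component after the last '/', or the whole path if none. */
static const char *sketch_basename(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/*
 * Format the proposed error message into `buf`, naming the pack by
 * its basename. Returns the snprintf() result.
 */
static int format_mtimes_error(char *buf, size_t len, const char *pack_path)
{
	assert(buf && pack_path);

	return snprintf(buf, len,
			"could not load .mtimes for cruft pack '%s'",
			sketch_basename(pack_path));
}
```

Given a path such as ".git/objects/pack/pack-1234abcd.pack", the
message names 'pack-1234abcd.pack', which is enough to tell multiple
cruft packs apart.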

> +
> +			if (maybe_index_pos)
> +				index_pos = *maybe_index_pos;
> +			else
> +				index_pos = pack_pos_to_index(p, pack_pos);
>
> -		*oi->disk_sizep = pack_pos_to_offset(p, pos + 1) - obj_offset;
> +			*oi->mtimep = nth_packed_mtime(p, index_pos);
> +		} else {
> +			*oi->mtimep = p->mtime;
> +		}

I am a little stuck here on whether or not this is the right layer to
determine an object's mtime. On the one hand, it makes sense to me that
callers would want to know the mtime of an object, either by the mtime
of the loose object on disk, or the mtime of the containing pack otherwise.

But I'm not sure whether the GC-specific definition of "mtime" is what
the caller would always want. For GC uses, yes, having mtime be aware of
cruft packs makes total sense to me. But for non-GC uses, would there
ever be a scenario where the caller would want to know the mtime of an
object's containing pack, regardless of whether or not that pack is
cruft?

I suppose they could get around that today by doing something like:

    if (oi->whence == OI_PACKED) {
        struct packed_git *p = oi->u.packed.p;
        if (p->is_cruft) {
            /* reinterpret the meaning of mtime... */
            *oi->mtimep = p->mtime;
        }
    }

, but that feels a little clunky. I dunno, maybe this hypothetical
doesn't really exist and I'm overthinking this. But I have this nagging
feeling that we are exposing this information at too low a level, and
in doing so making the object store aware of cruft pack/GC-specific
mechanics.
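
To sketch the layering question (purely hypothetical; `struct
sketch_pack` and both functions are invented for this example and do not
correspond to existing Git APIs): the lower layer could report the plain
pack mtime, and a GC-level helper could layer the cruft-pack
interpretation on top.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the few pack fields this sketch needs. */
struct sketch_pack {
	time_t mtime;                /* mtime of the packfile itself */
	int is_cruft;                /* non-zero for a cruft pack */
	const time_t *object_mtimes; /* per-object mtimes, cruft packs only */
};

/* Generic view: the mtime of the containing pack, cruft or not. */
static time_t containing_pack_mtime(const struct sketch_pack *p)
{
	assert(p);
	return p->mtime;
}

/*
 * GC-specific view: prefer the per-object mtime recorded for a cruft
 * pack, and fall back to the pack's own mtime otherwise.
 */
static time_t gc_object_mtime(const struct sketch_pack *p, size_t index_pos)
{
	assert(p);

	if (p->is_cruft && p->object_mtimes)
		return p->object_mtimes[index_pos];
	return containing_pack_mtime(p);
}
```

Whether a split like this is worth the extra surface depends on
whether the non-GC caller ever materializes.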

Thanks,
Taylor