public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Karthik Nayak <karthik.188@gmail.com>,
	Justin Tobler <jltobler@gmail.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v3 12/14] builtin/pack-objects: use `packfile_store_for_each_object()`
Date: Mon, 26 Jan 2026 09:53:18 +0100	[thread overview]
Message-ID: <aXcrftLpfcG4S5AX@pks.im> (raw)
In-Reply-To: <aXO/YLzRlDXD5IPY@nand.local>

On Fri, Jan 23, 2026 at 01:35:12PM -0500, Taylor Blau wrote:
> On Fri, Jan 23, 2026 at 10:43:16AM +0100, Patrick Steinhardt wrote:
> > On Thu, Jan 22, 2026 at 08:21:55PM -0500, Taylor Blau wrote:
> > > On Wed, Jan 21, 2026 at 01:50:28PM +0100, Patrick Steinhardt wrote:
> > > >  static int add_object_in_unpacked_pack(const struct object_id *oid,
> > > > -				       struct packed_git *pack,
> > > > -				       uint32_t pos,
> > > > +				       struct object_info *oi,
> > > >  				       void *data UNUSED)
> > > >  {
> > > >  	if (cruft) {
> > > > -		off_t offset;
> > > > -		time_t mtime;
> > > > -
> > > > -		if (pack->is_cruft) {
> > > > -			if (load_pack_mtimes(pack) < 0)
> > > > -				die(_("could not load cruft pack .mtimes"));
> > > > -			mtime = nth_packed_mtime(pack, pos);
> > > > -		} else {
> > > > -			mtime = pack->mtime;
> > > > -		}
> > > > -		offset = nth_packed_object_offset(pack, pos);
> > > > -
> > > > -		add_cruft_object_entry(oid, OBJ_NONE, pack, offset,
> > > > -				       NULL, mtime);
> > >
> > > OK, here's where we see the existing logic for determining the mtime of
> > > an object in the GC sense. I see there's a subsequent patch that also
> > > makes use of the object_info->mtimep field, and my guess is (not having
> > > completely read that patch yet) that having the same notion of mtime
> > > between the two callsites is desirable.
> > >
> > > I still wonder whether imposing that notion of mtime at the object_info
> > > layer is the right choice. I wonder if it would make more sense to allow
> > > the caller to have a "statp" pointer filled out (or alternatively stick
> > > a "struct stat" in both the packed union type as well as the loose one,
> > > though the latter doesn't yet exist).
> >
> > The problem with filling out a `struct stat` though is that it will only
> > apply to backends that actually have a path to stat. There may be other
> > backends that don't. You could of course pretend that there was a file
> > and fill in the `st_mtime` field. But I don't really see the benefit
> > over having a standalone mtime field.
> 
> I understand what you're saying, but I don't think that this is unique
> to stat. Looking through the object_info struct, there are a handful of
> fields on the request side that are coupled to the objects themselves,
> not their representation, such as typep, sizep, and contentp.
> 
> But there are a handful of fields in the request section which are *not*
> properties of the objects themselves, but rather properties of the way
> those objects are represented as part of the backend-specific
> implementation.

Yes, the specific interpretation will change for some fields. But in
general, most of the fields still apply to all backends.

> For example, disk_sizep suggests that all objects are stored on disk and
> have a clear notion of how much space they occupy. I could imagine a
> backend implementation where perhaps the contents of objects are divvied
> up into smaller chunks and deduplicated across many objects. I don't
> think there is a clear answer to how much "disk size" an object occupies
> in that case.

This would still apply to other backends though. It's true that "disk"
size is a bit of a misnomer now, and that it should probably rather be
renamed to "storage" size. But overall, no matter the backend, you will
still eventually end up storing the object data somewhere, and that
takes up space.

> delta_base_oid is another field that I'd argue is not a property of the
> object itself, but rather its representation. Of course, objects stored
> in packfiles may or may not be stored as a delta against some other
> object, and thus being able to ask what that object is makes sense. But
> loose objects don't have the same property as a result of how they are
> stored.

Yup. This field is specific to the packed backend indeed and ideally
shouldn't be part of the `struct object_info`. In the best case it could
be lifted into `struct object_info::u`, but I'm not sure whether that's
easily possible.

> To me this seems like an example where implementation-specific details
> are already leaking through the object_info struct. So in that sense I
> don't think that adding a "struct stat" here is meaningfully changing
> anything.

The thing is that I'm trying to clean up all the different messes that
we have. So instead of adding _more_ leakiness, I'd rather prefer to
remove some of it.

> But I think the proposed mtimep field is a special case not only for the
> reasons stated above, but because an object's mtime has multiple
> interpretations already. For example, if I'm asking about an object's
> mtime, and that object happens to be stored in a cruft pack, which mtime
> am I referring to? Packed objects inherit their mtime from the mtime of
> the *.pack itself, but cruft objects have an additional interpretation
> which is read from the *.mtimes file corresponding to the cruft pack.

This is a good question indeed though.

> I don't love bolting another leaky abstraction onto the object_info
> interface, but my broader concern is that the information here is not
> just leaky but ambiguous. By adding a statp pointer, I think the
> information is less ambiguous since the GC-specific interpretation of
> mtime is done at a layer above stat(2).

Fair point. For the current backend, mtime can be ambiguous as the same
object may be stored multiple times: either as a loose object, or as
part of any of the packfiles.

In the context of `odb_for_each_object()` that info is not ambigous
though: we would yield the same object multiple times, and every time we
yield it we will may have a different mtime. And this is working as
expected for the two callsites:

  - In "reachable.c" we use the mtimep field in the context of recent
    objects. So if at least one of the objects has a new-enough mtime we
    would eventually see it.

 - In "builtin/pack-objects.c" we use basically the same logic as we
   use after my patch seires, where we use either the cruft time or the
   pack time. So things work as expected over there, too.

So I'd claim that this is working sensibly for `odb_for_each_object()`,
and there is no ambiguity involved. It's the caller that has to
disambiguite, and that's already happening.

But things are a bit different if you invoke `odb_read_object_info()`
directly, as we have no way to disambiguate there. We only want to yield
_a_ representation of an object, so the mtime will be derived from
whatever data structure the object was found in first. This could be
helped with better documentation.

> > > Then the caller could do something like:
> > >
> > > static time_t object_info_gc_mtime(const struct object_info *oi)
> > > {
> > >     if (!oi->statp)
> > >         BUG("oops!");
> > >
> > >     switch (oi->whence) {
> > >     case OI_CACHED:
> > >         return 0;
> > >     case OI_LOOSE:
> > >         return oi->statp->st_mtime;
> > >     case OI_PACKED:
> > >         struct packed_git *p = oi->u.packed.pack;
> > >         if (p->is_cruft) {
> > >             uint32_t pack_pos;
> > >
> > >             if (load_pack_mtimes(p) < 0)
> > >                 die(_("could not load cruft pack .mtimes for '%s'"),
> > >                     pack_basename(p));
> > >             if (offset_to_pack_pos(p, oi->u.packed.offset, &pack_pos) < 0)
> > >                 die(_("could not find offset for object '%s' in cruft pack '%s'"),
> > >                     oid_to_hex(&oi->oid),
> > >                     pack_basename(p));
> > >
> > >             return nth_packed_mtime(p, pack_pos_to_index(p, pack_pos));
> > >         } else {
> > >             return p->mtime; /* or oi->statp->st_mtime */
> > >         }
> > >     default:
> > >         BUG("unknown oi->whence: %d", oi->whence);
> > >     }
> > > }
> > >
> > > I like the above because it encapsulates the GC-specific interpretation
> > > of an object's mtime outside of the object_info layer, while adding
> > > information (namely statp) that is generic enough to be potentially
> > > useful to other callers who may not be interested in the GC-specific
> > > interpretation.
> >
> > This isn't achieving the goal of making the logic pluggable though, as
> > you now have backend-specific logic outside of the backends. Also, isn't
> > the end result basically the same as what I have proposed, except that
> > my version _is_ fully pluggable because the logic is entirely contained
> > in the backend?
> 
> Yes, the end result is the same, both your patch and what I wrote here
> implement the same GC-specific definition of an object's "mtime". I am
> not following the argument about pluggability, though. The concern I
> have above is that we are pushing domain-specific logic into the object
> storage backend, not the other way around.

To expand on the pluggability bit: every time you add a new backend
you'll have to extend the above logic to understand how it represents
the mtime. That by itself might be doable, but let's for example
consider a backend that is a black box to us (like a shared library that
may plug in arbitrary storage logic). In that case you would not even be
able to derive the information unless you have a generic layer that lets
you convey it to the caller.

So overall I agree with you that there are nuances here, and that the
mtimep pointer _can_ be used incorrectly. But I still think that the
concept is generic enough across backends, and the refactored logic
still works as extended. I'll try to expand the docs and commit message
a bit to cover this discussion.

Thanks!

Patrick

  reply	other threads:[~2026-01-26  8:53 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-15 11:04 [PATCH 00/14] odb: introduce `odb_for_each_object()` Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 01/14] odb: rename `FOR_EACH_OBJECT_*` flags Patrick Steinhardt
2026-01-15 18:00   ` Justin Tobler
2026-01-15 11:04 ` [PATCH 02/14] odb: fix flags parameter to be unsigned Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 03/14] object-file: extract function to read object info from path Patrick Steinhardt
2026-01-15 18:31   ` Justin Tobler
2026-01-16  7:03     ` Patrick Steinhardt
2026-01-20  9:09   ` Karthik Nayak
2026-01-15 11:04 ` [PATCH 04/14] object-file: introduce function to iterate through objects Patrick Steinhardt
2026-01-15 20:54   ` Justin Tobler
2026-01-16  7:03     ` Patrick Steinhardt
2026-01-20  9:16   ` Karthik Nayak
2026-01-15 11:04 ` [PATCH 05/14] packfile: extract function to iterate through objects of a store Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 06/14] packfile: introduce function to iterate through objects Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 07/14] odb: introduce `odb_for_each_object()` Patrick Steinhardt
2026-01-15 21:17   ` Justin Tobler
2026-01-16  7:03     ` Patrick Steinhardt
2026-01-16 17:46   ` Justin Tobler
2026-01-19  7:10     ` Patrick Steinhardt
2026-01-20  9:20   ` Karthik Nayak
2026-01-21  7:39     ` Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 08/14] builtin/fsck: refactor to use `odb_for_each_object()` Patrick Steinhardt
2026-01-15 21:24   ` Justin Tobler
2026-01-15 11:04 ` [PATCH 09/14] treewide: enumerate promisor objects via `odb_for_each_object()` Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 10/14] treewide: drop uses of `for_each_{loose,packed}_object()` Patrick Steinhardt
2026-01-15 21:44   ` Justin Tobler
2026-01-16  7:03     ` Patrick Steinhardt
2026-01-16 17:47       ` Justin Tobler
2026-01-19  7:10         ` Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 11/14] odb: introduce mtime fields for object info requests Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 12/14] builtin/pack-objects: use `packfile_store_for_each_object()` Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 13/14] reachable: convert to use `odb_for_each_object()` Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 14/14] odb: drop unused `for_each_{loose,packed}_object()` functions Patrick Steinhardt
2026-01-15 13:50 ` [PATCH 00/14] odb: introduce `odb_for_each_object()` Junio C Hamano
2026-01-16  7:03   ` Patrick Steinhardt
2026-01-16 16:49     ` Junio C Hamano
2026-01-20 15:25 ` [PATCH v2 " Patrick Steinhardt
2026-01-20 15:25   ` [PATCH v2 01/14] odb: rename `FOR_EACH_OBJECT_*` flags Patrick Steinhardt
2026-01-20 15:25   ` [PATCH v2 02/14] odb: fix flags parameter to be unsigned Patrick Steinhardt
2026-01-20 15:25   ` [PATCH v2 03/14] object-file: extract function to read object info from path Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 04/14] object-file: introduce function to iterate through objects Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 05/14] packfile: extract function to iterate through objects of a store Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 06/14] packfile: introduce function to iterate through objects Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 07/14] odb: introduce `odb_for_each_object()` Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 08/14] builtin/fsck: refactor to use `odb_for_each_object()` Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 09/14] treewide: enumerate promisor objects via `odb_for_each_object()` Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 10/14] treewide: drop uses of `for_each_{loose,packed}_object()` Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 11/14] odb: introduce mtime fields for object info requests Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 12/14] builtin/pack-objects: use `packfile_store_for_each_object()` Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 13/14] reachable: convert to use `odb_for_each_object()` Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 14/14] odb: drop unused `for_each_{loose,packed}_object()` functions Patrick Steinhardt
2026-01-21 12:50 ` [PATCH v3 00/14] odb: introduce `odb_for_each_object()` Patrick Steinhardt
2026-01-21 12:50   ` [PATCH v3 01/14] odb: rename `FOR_EACH_OBJECT_*` flags Patrick Steinhardt
2026-01-21 12:50   ` [PATCH v3 02/14] odb: fix flags parameter to be unsigned Patrick Steinhardt
2026-01-21 21:11     ` Jeff King
2026-01-22  0:00       ` Taylor Blau
2026-01-22 15:41         ` Junio C Hamano
2026-01-22 19:23           ` Jeff King
2026-01-23 10:57             ` Patrick Steinhardt
2026-01-26 22:32             ` Junio C Hamano
2026-01-22  6:50       ` Patrick Steinhardt
2026-01-22 23:44         ` Taylor Blau
2026-01-21 12:50   ` [PATCH v3 03/14] object-file: extract function to read object info from path Patrick Steinhardt
2026-01-22  0:04     ` Taylor Blau
2026-01-22  6:51       ` Patrick Steinhardt
2026-01-22 23:47         ` Taylor Blau
2026-01-21 12:50   ` [PATCH v3 04/14] object-file: introduce function to iterate through objects Patrick Steinhardt
2026-01-22  0:15     ` Taylor Blau
2026-01-22  6:52       ` Patrick Steinhardt
2026-01-23  0:01         ` Taylor Blau
2026-01-21 12:50   ` [PATCH v3 05/14] packfile: extract function to iterate through objects of a store Patrick Steinhardt
2026-01-22  1:37     ` Taylor Blau
2026-01-21 12:50   ` [PATCH v3 06/14] packfile: introduce function to iterate through objects Patrick Steinhardt
2026-01-23  0:06     ` Taylor Blau
2026-01-23  9:42       ` Patrick Steinhardt
2026-01-23  9:52         ` Chris Torek
2026-01-23 16:22           ` Junio C Hamano
2026-01-23 17:45             ` Taylor Blau
2026-01-21 12:50   ` [PATCH v3 07/14] odb: introduce `odb_for_each_object()` Patrick Steinhardt
2026-01-23  0:13     ` Taylor Blau
2026-01-21 12:50   ` [PATCH v3 08/14] builtin/fsck: refactor to use `odb_for_each_object()` Patrick Steinhardt
2026-01-23  0:32     ` Taylor Blau
2026-01-23  9:42       ` Patrick Steinhardt
2026-01-21 12:50   ` [PATCH v3 09/14] treewide: enumerate promisor objects via `odb_for_each_object()` Patrick Steinhardt
2026-01-23  0:33     ` Taylor Blau
2026-01-21 12:50   ` [PATCH v3 10/14] treewide: drop uses of `for_each_{loose,packed}_object()` Patrick Steinhardt
2026-01-23  0:46     ` Taylor Blau
2026-01-23  9:43       ` Patrick Steinhardt
2026-01-21 12:50   ` [PATCH v3 11/14] odb: introduce mtime fields for object info requests Patrick Steinhardt
2026-01-23  1:06     ` Taylor Blau
2026-01-23  9:43       ` Patrick Steinhardt
2026-01-23 17:48         ` Taylor Blau
2026-01-26  8:53           ` Patrick Steinhardt
2026-01-21 12:50   ` [PATCH v3 12/14] builtin/pack-objects: use `packfile_store_for_each_object()` Patrick Steinhardt
2026-01-23  1:21     ` Taylor Blau
2026-01-23  9:43       ` Patrick Steinhardt
2026-01-23 18:35         ` Taylor Blau
2026-01-26  8:53           ` Patrick Steinhardt [this message]
2026-01-29 11:08             ` Jeff King
2026-01-30 12:57               ` Patrick Steinhardt
2026-01-21 12:50   ` [PATCH v3 13/14] reachable: convert to use `odb_for_each_object()` Patrick Steinhardt
2026-01-21 12:50   ` [PATCH v3 14/14] odb: drop unused `for_each_{loose,packed}_object()` functions Patrick Steinhardt
2026-01-22  1:33   ` [PATCH v3 00/14] odb: introduce `odb_for_each_object()` Taylor Blau
2026-01-22 17:02     ` Junio C Hamano
2026-01-26  9:51 ` [PATCH v4 " Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 01/14] odb: rename `FOR_EACH_OBJECT_*` flags Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 02/14] odb: fix flags parameter to be unsigned Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 03/14] object-file: extract function to read object info from path Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 04/14] object-file: introduce function to iterate through objects Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 05/14] packfile: extract function to iterate through objects of a store Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 06/14] packfile: introduce function to iterate through objects Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 07/14] odb: introduce `odb_for_each_object()` Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 08/14] builtin/fsck: refactor to use `odb_for_each_object()` Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 09/14] treewide: enumerate promisor objects via `odb_for_each_object()` Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 10/14] treewide: drop uses of `for_each_{loose,packed}_object()` Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 11/14] odb: introduce mtime fields for object info requests Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 12/14] builtin/pack-objects: use `packfile_store_for_each_object()` Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 13/14] reachable: convert to use `odb_for_each_object()` Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 14/14] odb: drop unused `for_each_{loose,packed}_object()` functions Patrick Steinhardt
2026-02-20 22:59   ` [PATCH v4 00/14] odb: introduce `odb_for_each_object()` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aXcrftLpfcG4S5AX@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jltobler@gmail.com \
    --cc=karthik.188@gmail.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox