public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Karthik Nayak <karthik.188@gmail.com>,
	Justin Tobler <jltobler@gmail.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v3 12/14] builtin/pack-objects: use `packfile_store_for_each_object()`
Date: Fri, 23 Jan 2026 13:35:12 -0500	[thread overview]
Message-ID: <aXO/YLzRlDXD5IPY@nand.local> (raw)
In-Reply-To: <aXNCtCZwP57Tfu60@pks.im>

On Fri, Jan 23, 2026 at 10:43:16AM +0100, Patrick Steinhardt wrote:
> On Thu, Jan 22, 2026 at 08:21:55PM -0500, Taylor Blau wrote:
> > On Wed, Jan 21, 2026 at 01:50:28PM +0100, Patrick Steinhardt wrote:
> > >  static int add_object_in_unpacked_pack(const struct object_id *oid,
> > > -				       struct packed_git *pack,
> > > -				       uint32_t pos,
> > > +				       struct object_info *oi,
> > >  				       void *data UNUSED)
> > >  {
> > >  	if (cruft) {
> > > -		off_t offset;
> > > -		time_t mtime;
> > > -
> > > -		if (pack->is_cruft) {
> > > -			if (load_pack_mtimes(pack) < 0)
> > > -				die(_("could not load cruft pack .mtimes"));
> > > -			mtime = nth_packed_mtime(pack, pos);
> > > -		} else {
> > > -			mtime = pack->mtime;
> > > -		}
> > > -		offset = nth_packed_object_offset(pack, pos);
> > > -
> > > -		add_cruft_object_entry(oid, OBJ_NONE, pack, offset,
> > > -				       NULL, mtime);
> >
> > OK, here's where we see the existing logic for determining the mtime of
> > an object in the GC sense. I see there's a subsequent patch that also
> > makes use of the object_info->mtimep field, and my guess is (not having
> > completely read that patch yet) that having the same notion of mtime
> > between the two callsites is desirable.
> >
> > I still wonder whether imposing that notion of mtime at the object_info
> > layer is the right choice. I wonder if it would make more sense to allow
> > the caller to have a "statp" pointer filled out (or alternatively stick
> > a "struct stat" in both the packed union type as well as the loose one,
> > though the latter doesn't yet exist).
>
> The problem with filling out a `struct stat` though is that it will only
> apply to backends that actually have a path to stat. There may be other
> backends that don't. You could of course pretend that there was a file
> and fill in the `st_mtime` field. But I don't really see the benefit
> over having a standalone mtime field.

I understand what you're saying, but I don't think that this is unique
to stat. Looking through the object_info struct, there are a handful of
fields on the request side that are coupled to the objects themselves,
not their representation, such as typep, sizep, and contentp.

But there are a handful of fields in the request section which are *not*
properties of the objects themselves, but rather properties of the way
those objects are represented as part of the backend-specific
implementation.

For example, disk_sizep suggests that all objects are stored on disk and
have a clear notion of how much space they occupy. I could imagine a
backend implementation where perhaps the contents of objects are divvied
up into smaller chunks and deduplicated across many objects. I don't
think there is a clear answer to how much "disk size" an object occupies
in that case.

delta_base_oid is another field that I'd argue is not a property of the
object itself, but rather its representation. Of course, objects stored
in packfiles may or may not be stored as a delta against some other
object, and thus being able to ask what that object is makes sense. But
loose objects don't have the same property as a result of how they are
stored.

To me this seems like an example where implementation-specific details
are already leaking through the object_info struct. So in that sense I
don't think that adding a "struct stat" here is meaningfully changing
anything.

But I think the proposed mtimep field is a special case not only for the
reasons stated above, but because an object's mtime has multiple
interpretations already. For example, if I'm asking about an object's
mtime, and that object happens to be stored in a cruft pack, which mtime
am I referring to? Packed objects inherit their mtime from the mtime of
the *.pack itself, but cruft objects have an additional interpretation
which is read from the *.mtimes file corresponding to the cruft pack.

I don't love bolting another leaky abstraction onto the object_info
interface, but my broader concern is that the information here is not
just leaky but ambiguous. By adding a statp pointer, I think the
information is less ambiguous since the GC-specific interpretation of
mtime is done at a layer above stat(2).

> > Then the caller could do something like:
> >
> > static time_t object_info_gc_mtime(const struct object_info *oi)
> > {
> >     if (!oi->statp)
> >         BUG("oops!");
> >
> >     switch (oi->whence) {
> >     case OI_CACHED:
> >         return 0;
> >     case OI_LOOSE:
> >         return oi->statp->st_mtime;
> >     case OI_PACKED:
> >         struct packed_git *p = oi->u.packed.pack;
> >         if (p->is_cruft) {
> >             uint32_t pack_pos;
> >
> >             if (load_pack_mtimes(p) < 0)
> >                 die(_("could not load cruft pack .mtimes for '%s'"),
> >                     pack_basename(p));
> >             if (offset_to_pack_pos(p, oi->u.packed.offset, &pack_pos) < 0)
> >                 die(_("could not find offset for object '%s' in cruft pack '%s'"),
> >                     oid_to_hex(&oi->oid),
> >                     pack_basename(p));
> >
> >             return nth_packed_mtime(p, pack_pos_to_index(p, pack_pos));
> >         } else {
> >             return p->mtime; /* or oi->statp->st_mtime */
> >         }
> >     default:
> >         BUG("unknown oi->whence: %d", oi->whence);
> >     }
> > }
> >
> > I like the above because it encapsulates the GC-specific interpretation
> > of an object's mtime outside of the object_info layer, while adding
> > information (namely statp) that is generic enough to be potentially
> > useful to other callers who may not be interested in the GC-specific
> > interpretation.
>
> This isn't achieving the goal of making the logic pluggable though, as
> you now have backend-specific logic outside of the backends. Also, isn't
> the end result basically the same as what I have proposed, except that
> my version _is_ fully pluggable because the logic is entirely contained
> in the backend?

Yes, the end result is the same, both your patch and what I wrote here
implement the same GC-specific definition of an object's "mtime". I am
not following the argument about pluggability, though. The concern I
have above is that we are pushing domain-specific logic into the object
storage backend, not the other way around.

Thanks,
Taylor

  reply	other threads:[~2026-01-23 18:35 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-15 11:04 [PATCH 00/14] odb: introduce `odb_for_each_object()` Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 01/14] odb: rename `FOR_EACH_OBJECT_*` flags Patrick Steinhardt
2026-01-15 18:00   ` Justin Tobler
2026-01-15 11:04 ` [PATCH 02/14] odb: fix flags parameter to be unsigned Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 03/14] object-file: extract function to read object info from path Patrick Steinhardt
2026-01-15 18:31   ` Justin Tobler
2026-01-16  7:03     ` Patrick Steinhardt
2026-01-20  9:09   ` Karthik Nayak
2026-01-15 11:04 ` [PATCH 04/14] object-file: introduce function to iterate through objects Patrick Steinhardt
2026-01-15 20:54   ` Justin Tobler
2026-01-16  7:03     ` Patrick Steinhardt
2026-01-20  9:16   ` Karthik Nayak
2026-01-15 11:04 ` [PATCH 05/14] packfile: extract function to iterate through objects of a store Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 06/14] packfile: introduce function to iterate through objects Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 07/14] odb: introduce `odb_for_each_object()` Patrick Steinhardt
2026-01-15 21:17   ` Justin Tobler
2026-01-16  7:03     ` Patrick Steinhardt
2026-01-16 17:46   ` Justin Tobler
2026-01-19  7:10     ` Patrick Steinhardt
2026-01-20  9:20   ` Karthik Nayak
2026-01-21  7:39     ` Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 08/14] builtin/fsck: refactor to use `odb_for_each_object()` Patrick Steinhardt
2026-01-15 21:24   ` Justin Tobler
2026-01-15 11:04 ` [PATCH 09/14] treewide: enumerate promisor objects via `odb_for_each_object()` Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 10/14] treewide: drop uses of `for_each_{loose,packed}_object()` Patrick Steinhardt
2026-01-15 21:44   ` Justin Tobler
2026-01-16  7:03     ` Patrick Steinhardt
2026-01-16 17:47       ` Justin Tobler
2026-01-19  7:10         ` Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 11/14] odb: introduce mtime fields for object info requests Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 12/14] builtin/pack-objects: use `packfile_store_for_each_object()` Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 13/14] reachable: convert to use `odb_for_each_object()` Patrick Steinhardt
2026-01-15 11:04 ` [PATCH 14/14] odb: drop unused `for_each_{loose,packed}_object()` functions Patrick Steinhardt
2026-01-15 13:50 ` [PATCH 00/14] odb: introduce `odb_for_each_object()` Junio C Hamano
2026-01-16  7:03   ` Patrick Steinhardt
2026-01-16 16:49     ` Junio C Hamano
2026-01-20 15:25 ` [PATCH v2 " Patrick Steinhardt
2026-01-20 15:25   ` [PATCH v2 01/14] odb: rename `FOR_EACH_OBJECT_*` flags Patrick Steinhardt
2026-01-20 15:25   ` [PATCH v2 02/14] odb: fix flags parameter to be unsigned Patrick Steinhardt
2026-01-20 15:25   ` [PATCH v2 03/14] object-file: extract function to read object info from path Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 04/14] object-file: introduce function to iterate through objects Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 05/14] packfile: extract function to iterate through objects of a store Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 06/14] packfile: introduce function to iterate through objects Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 07/14] odb: introduce `odb_for_each_object()` Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 08/14] builtin/fsck: refactor to use `odb_for_each_object()` Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 09/14] treewide: enumerate promisor objects via `odb_for_each_object()` Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 10/14] treewide: drop uses of `for_each_{loose,packed}_object()` Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 11/14] odb: introduce mtime fields for object info requests Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 12/14] builtin/pack-objects: use `packfile_store_for_each_object()` Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 13/14] reachable: convert to use `odb_for_each_object()` Patrick Steinhardt
2026-01-20 15:26   ` [PATCH v2 14/14] odb: drop unused `for_each_{loose,packed}_object()` functions Patrick Steinhardt
2026-01-21 12:50 ` [PATCH v3 00/14] odb: introduce `odb_for_each_object()` Patrick Steinhardt
2026-01-21 12:50   ` [PATCH v3 01/14] odb: rename `FOR_EACH_OBJECT_*` flags Patrick Steinhardt
2026-01-21 12:50   ` [PATCH v3 02/14] odb: fix flags parameter to be unsigned Patrick Steinhardt
2026-01-21 21:11     ` Jeff King
2026-01-22  0:00       ` Taylor Blau
2026-01-22 15:41         ` Junio C Hamano
2026-01-22 19:23           ` Jeff King
2026-01-23 10:57             ` Patrick Steinhardt
2026-01-26 22:32             ` Junio C Hamano
2026-01-22  6:50       ` Patrick Steinhardt
2026-01-22 23:44         ` Taylor Blau
2026-01-21 12:50   ` [PATCH v3 03/14] object-file: extract function to read object info from path Patrick Steinhardt
2026-01-22  0:04     ` Taylor Blau
2026-01-22  6:51       ` Patrick Steinhardt
2026-01-22 23:47         ` Taylor Blau
2026-01-21 12:50   ` [PATCH v3 04/14] object-file: introduce function to iterate through objects Patrick Steinhardt
2026-01-22  0:15     ` Taylor Blau
2026-01-22  6:52       ` Patrick Steinhardt
2026-01-23  0:01         ` Taylor Blau
2026-01-21 12:50   ` [PATCH v3 05/14] packfile: extract function to iterate through objects of a store Patrick Steinhardt
2026-01-22  1:37     ` Taylor Blau
2026-01-21 12:50   ` [PATCH v3 06/14] packfile: introduce function to iterate through objects Patrick Steinhardt
2026-01-23  0:06     ` Taylor Blau
2026-01-23  9:42       ` Patrick Steinhardt
2026-01-23  9:52         ` Chris Torek
2026-01-23 16:22           ` Junio C Hamano
2026-01-23 17:45             ` Taylor Blau
2026-01-21 12:50   ` [PATCH v3 07/14] odb: introduce `odb_for_each_object()` Patrick Steinhardt
2026-01-23  0:13     ` Taylor Blau
2026-01-21 12:50   ` [PATCH v3 08/14] builtin/fsck: refactor to use `odb_for_each_object()` Patrick Steinhardt
2026-01-23  0:32     ` Taylor Blau
2026-01-23  9:42       ` Patrick Steinhardt
2026-01-21 12:50   ` [PATCH v3 09/14] treewide: enumerate promisor objects via `odb_for_each_object()` Patrick Steinhardt
2026-01-23  0:33     ` Taylor Blau
2026-01-21 12:50   ` [PATCH v3 10/14] treewide: drop uses of `for_each_{loose,packed}_object()` Patrick Steinhardt
2026-01-23  0:46     ` Taylor Blau
2026-01-23  9:43       ` Patrick Steinhardt
2026-01-21 12:50   ` [PATCH v3 11/14] odb: introduce mtime fields for object info requests Patrick Steinhardt
2026-01-23  1:06     ` Taylor Blau
2026-01-23  9:43       ` Patrick Steinhardt
2026-01-23 17:48         ` Taylor Blau
2026-01-26  8:53           ` Patrick Steinhardt
2026-01-21 12:50   ` [PATCH v3 12/14] builtin/pack-objects: use `packfile_store_for_each_object()` Patrick Steinhardt
2026-01-23  1:21     ` Taylor Blau
2026-01-23  9:43       ` Patrick Steinhardt
2026-01-23 18:35         ` Taylor Blau [this message]
2026-01-26  8:53           ` Patrick Steinhardt
2026-01-29 11:08             ` Jeff King
2026-01-30 12:57               ` Patrick Steinhardt
2026-01-21 12:50   ` [PATCH v3 13/14] reachable: convert to use `odb_for_each_object()` Patrick Steinhardt
2026-01-21 12:50   ` [PATCH v3 14/14] odb: drop unused `for_each_{loose,packed}_object()` functions Patrick Steinhardt
2026-01-22  1:33   ` [PATCH v3 00/14] odb: introduce `odb_for_each_object()` Taylor Blau
2026-01-22 17:02     ` Junio C Hamano
2026-01-26  9:51 ` [PATCH v4 " Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 01/14] odb: rename `FOR_EACH_OBJECT_*` flags Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 02/14] odb: fix flags parameter to be unsigned Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 03/14] object-file: extract function to read object info from path Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 04/14] object-file: introduce function to iterate through objects Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 05/14] packfile: extract function to iterate through objects of a store Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 06/14] packfile: introduce function to iterate through objects Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 07/14] odb: introduce `odb_for_each_object()` Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 08/14] builtin/fsck: refactor to use `odb_for_each_object()` Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 09/14] treewide: enumerate promisor objects via `odb_for_each_object()` Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 10/14] treewide: drop uses of `for_each_{loose,packed}_object()` Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 11/14] odb: introduce mtime fields for object info requests Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 12/14] builtin/pack-objects: use `packfile_store_for_each_object()` Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 13/14] reachable: convert to use `odb_for_each_object()` Patrick Steinhardt
2026-01-26  9:51   ` [PATCH v4 14/14] odb: drop unused `for_each_{loose,packed}_object()` functions Patrick Steinhardt
2026-02-20 22:59   ` [PATCH v4 00/14] odb: introduce `odb_for_each_object()` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aXO/YLzRlDXD5IPY@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jltobler@gmail.com \
    --cc=karthik.188@gmail.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox