From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-yw1-f182.google.com (mail-yw1-f182.google.com [209.85.128.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D1F142F531F for ; Fri, 23 Jan 2026 01:06:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.182 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769130408; cv=none; b=ioSNlOj9ZVoessyKXaznVSsMQFmbcaQbKOY9xY1T7FvzHgjVrJfsY8+SKSGe4C1kj9muKp6H8nV7XbnAREIM8yB5mGN6+Rr9GUe1RZYpkhqSCYHhpzEOkOar89sf6L+fADc9Gib+6+SBMLuVwV1HGXW9jXiuiXc7pFr6yGtrb7s= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769130408; c=relaxed/simple; bh=qOLNx2dboO5QapMrMna+hg/Sc9gxIAFdP/sg6ESf3XU=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=BfHsB0+G+Qj0dgwz9A+9vA250I4Uc5kYC1lmmQCvEob7cqTr/sRVTLeiT5y3Q9Z289/YFP+l9ucYNAbx4MVLcxveSTwtBuJUmBcru+xUCZ6yEbLLE6sUxlpQm0TtwFVbbz/MsoxTyhhKqPM5pdYJl2jqvWFIuFS9Rvrl1bCcIiU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=ttaylorr.com; spf=pass smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr.com header.i=@ttaylorr.com header.b=UlddWePR; arc=none smtp.client-ip=209.85.128.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr.com header.i=@ttaylorr.com header.b="UlddWePR" Received: by mail-yw1-f182.google.com with SMTP id 00721157ae682-794265fb73bso15692707b3.1 for ; Thu, 22 Jan 2026 17:06:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr.com; s=google; t=1769130402; x=1769735202; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=JfW+U4ba3fnJIrHOallX62Piz/BBF2EhLWlB5fpb6Jw=; b=UlddWePRdTqcDFIYRh0PkoqSYpiQqmTN9v1rtfYr6GzDGRayNKzIDVtDFWVzN03me3 MeYREow+CzQOSLrIVFPbmABFn0/D0JgpdWBMWapUa5fgEsBb0a6qeRgFBZqw3rmVUuc6 gXxVlYFHuhFNz8czip9eHn2KG93OpxK7gKPWMMBJUjR6G5+ivnu8z8zXnT44c2gP5+oY E2sZXfU9nTa+GLfywjo+u1YFgT7sAGxEXRIwUwSDL++0yzhnfYbfA2MX//FWAPvxsq1M Y/QCYQLJnjIXHsJ6GEcQL5entMwfR88DFafemBNGYjtvrjw9teI6kG+KxvtwVuRcethT 6YZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769130402; x=1769735202; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-gg:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=JfW+U4ba3fnJIrHOallX62Piz/BBF2EhLWlB5fpb6Jw=; b=AQWiQEpWuVE/yGrmTVtT+HErdf6WP5EM2sk6ndijW9V6Xzc31+onSUBg0KYqSAtChk ZrJqtKndKQUSvVWfJQgek21/sIedVxUWGpyQlymW/QR/f0o7iS7jzivC+3+I3cIkfnjJ MHyACZSW26tBJJ5WxItdNB3U0w8D4HxB+fgREi5uad75jB+Q1GZ1xq5/sxmck/AD2tlV 2KTtK7GY+y2FATF5hrdNsRWpln7LlFgQ8JeUQzDh55ZJGUhnzI74sSJQZIjH0ZFo7yUG KgtUlwI4/GgMrqBDMQ5P1BbSPy/cvAh1rJlGBnOJgeEpoPNL9zVb6Kl4EZlxv/jz90Ye MtXA== X-Gm-Message-State: AOJu0YxWwH+/IUFQhYUAZHzqvOFLUUhmKu8sYVmgup02H9wV0o9+UDlK 31Rd9qG0KPYX0hi5UM5D84rFTccD6veNdqtLctu4KWxD2hBbZrIA+xLq2fXM87z87QCZ4lXHnHc nf5wio6I= X-Gm-Gg: AZuq6aLB+IJaR7ojnlsPLqCbF0LG/eh2zKdjiTe3i2Q6VCyh7Hdiv6/BaGRzpvpKDYE Rpy0SM24QBjQQpdFENt2jMKQHfX9RaP1GpVPgZCmJ9Bkm3gsHtK3rlgS3eDQ3kjwPKDM3aCI3gL B3ZlC5Dxd1qvh3qjT64sGnB6V1BTY7LI6x7fn9Wr6ddiCW73QbawMpmSzlqKJrSZoSPY4ksieo+ b/frfNRLulJcacskDf+JS4+rJs8sokNvASdht1tw490hFDF7LSCRv2LDTecul7uOj9pehqngtC7 7tOTZ5qrGzBCEWrrPiAvh/S8V7Md/VvNih/dq0BN3yUAR60pl2t6jmFdvqs2V36eSG1EBdcAra8 BwSd+NsG0mHBd9zhBUaENIpuxvEe6dUQsM88j4g2EMDF/2L0L2xCKNkitBX2PW66J53fOoSBW09 nD1BdywKa8FgNWHYDWLn0W0+8RiIyF3Ui7sPv6H0fFj5LeDLsJuibDVmEqZw+zSyGMvv6e5RlVp FzmMweDBp2rZ0xUog== X-Received: by 2002:a05:690c:6809:b0:784:aec5:7042 with SMTP id 00721157ae682-794399d2cd2mr14581537b3.38.1769130401665; Thu, 22 Jan 2026 17:06:41 -0800 (PST) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id 00721157ae682-7943ae2d7f9sm4544437b3.0.2026.01.22.17.06.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 22 Jan 2026 17:06:41 -0800 (PST) Date: Thu, 22 Jan 2026 20:06:40 -0500 From: Taylor Blau To: Patrick Steinhardt Cc: git@vger.kernel.org, Karthik Nayak , Justin Tobler , Junio C Hamano Subject: Re: [PATCH v3 11/14] odb: introduce mtime fields for object info requests Message-ID: References: <20260121-pks-odb-for-each-object-v3-0-12c4dfd24227@pks.im> <20260121-pks-odb-for-each-object-v3-11-12c4dfd24227@pks.im> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20260121-pks-odb-for-each-object-v3-11-12c4dfd24227@pks.im> On Wed, Jan 21, 2026 at 01:50:27PM +0100, Patrick Steinhardt wrote: > There are some use cases where we need to figure out the mtime for > objects. Most importantly, this is the case when we want to prune > unreachable objects. But getting at that data requires users to manually > derive the info either via the loose object's mtime, the packfiles' > mtime or via the ".mtimes" file. > > Introduce a new `struct object_info::mtimep` pointer that allows callers > to request an object's mtime. This new field will be used in a > subsequent commit. The goal seems reasonable to me, but I am a little unsure about whether or not this is the right place to expose this information. I have some more thoughts below... > diff --git a/object-file.c b/object-file.c > index 65e730684b..c0f896673b 100644 > --- a/object-file.c > +++ b/object-file.c > @@ -409,6 +409,7 @@ static int read_object_info_from_path(struct odb_source *source, > char hdr[MAX_HEADER_LEN]; > unsigned long size_scratch; > enum object_type type_scratch; > + struct stat st; I was a little confused why we were declaring a stat struct here... > /* > * If we don't care about type or size, then we don't > @@ -421,7 +422,7 @@ static int read_object_info_from_path(struct odb_source *source, > if (!oi || (!oi->typep && !oi->sizep && !oi->contentp)) { > struct stat st; > > - if ((!oi || !oi->disk_sizep) && (flags & OBJECT_INFO_QUICK)) { > + if ((!oi || (!oi->disk_sizep && !oi->mtimep)) && (flags & OBJECT_INFO_QUICK)) { > ret = quick_has_loose(source->loose, oid) ? 0 : -1; > goto out; > } > @@ -431,8 +432,12 @@ static int read_object_info_from_path(struct odb_source *source, > goto out; > } > > - if (oi && oi->disk_sizep) > - *oi->disk_sizep = st.st_size; > + if (oi) { > + if (oi->disk_sizep) > + *oi->disk_sizep = st.st_size; ...and then assigning it here without actually calling lstat() between the two. But the diff context elides the fact that there is another stat declaration within this block that we *do* lstat() into before reading it. That tripped me up a little while reviewing, but not a huge deal. I do wonder whether or not there is a clearer way to structure all of these conditionals. I *think* that what you wrote here is right, but the way that it has grown organically over time (to be clear, not the fault of your series) makes it a little difficult to follow. > + if (oi->mtimep) > + *oi->mtimep = st.st_mtime; > + } > > ret = 0; > goto out; > @@ -446,7 +451,21 @@ static int read_object_info_from_path(struct odb_source *source, > goto out; > } > > - map = map_fd(fd, path, &mapsize); > + if (fstat(fd, &st)) { > + close(fd); > + ret = -1; > + goto out; > + } Makes sense. We were previously letting map_fd() take care of stat()-ing the file to know how large the mmap should be, but now we might need that information for the mtime as well. So doing what map_fd() is doing underneath here directly makes sense. > diff --git a/odb.c b/odb.c > index 65f0447aa5..67decd3908 100644 > --- a/odb.c > +++ b/odb.c > @@ -702,6 +702,8 @@ static int do_oid_object_info_extended(struct object_database *odb, > oidclr(oi->delta_base_oid, odb->repo->hash_algo); > if (oi->contentp) > *oi->contentp = xmemdupz(co->buf, co->size); > + if (oi->mtimep) > + *oi->mtimep = 0; Assuming that you do not change the object_info request/response semantics, I wonder if it might make sense to zero out the entirety of the response section as a belt-and-suspenders mechanism in case future contributors forget to assign zero to the new fields themselves. > @@ -1619,16 +1620,34 @@ int packed_object_info(struct packed_git *p, > } > } > > - if (oi->disk_sizep) { > - uint32_t pos; > - if (offset_to_pack_pos(p, obj_offset, &pos) < 0) { > + if (oi->disk_sizep || (oi->mtimep && p->is_cruft)) { > + if (offset_to_pack_pos(p, obj_offset, &pack_pos) < 0) { > error("could not find object at offset %"PRIuMAX" " > "in pack %s", (uintmax_t)obj_offset, p->pack_name); > ret = -1; > goto out; > } > + } > + > + if (oi->disk_sizep) > + *oi->disk_sizep = pack_pos_to_offset(p, pack_pos + 1) - obj_offset; > + > + if (oi->mtimep) { > + if (p->is_cruft) { > + uint32_t index_pos; > + > + if (load_pack_mtimes(p) < 0) > + die(_("could not load cruft pack .mtimes")); Do you think it would be worth doing instead: die(_("could not load .mtimes for cruft pack '%s'"), pack_basename(p)); ? Most repositories should only ever have one cruft pack in practice (even so, there should still be some value in identifying it by its checksum in case someone is repacking underneath us). But some repositories will have >1 cruft pack, so knowing which one is busted may be useful in that case. > + > + if (maybe_index_pos) > + index_pos = *maybe_index_pos; > + else > + index_pos = pack_pos_to_index(p, pack_pos); > > - *oi->disk_sizep = pack_pos_to_offset(p, pos + 1) - obj_offset; > + *oi->mtimep = nth_packed_mtime(p, index_pos); > + } else { > + *oi->mtimep = p->mtime; > + } I am a little stuck here on whether or not this is the right layer to determine an object's mtime. On the one hand, it makes sense to me that callers would want to know the mtime of an object, either by the mtime of the loose object on disk, or the mtime of the contain pack otherwise. But I'm not sure whether the GC-specific definition of "mtime" is what the caller would always want. For GC uses, yes, having mtime be aware of cruft packs makes total sense to me. But for non-GC uses, would there ever be a scenario where the caller would want to know the mtime of an object's containing pack, regardless of whether or not that pack is cruft? I suppose they could get around that today by doing something like: if (oi->whence == OI_PACKED) { struct packed_git *p = oi->u.packed.p; if (p->is_cruft) { /* reinterpret the meaning of mtime... */ *oi->mtimep = p->mtime; } } , but that feels a little clunky. I dunno, maybe this hypothetical doesn't really exist and I'm overthinking this. But I have this nagging feeling that we are exposing this information at too low of a level as to make the object store aware of cruft pack/GC-specific mechanics. Thanks, Taylor