From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-qk1-f177.google.com (mail-qk1-f177.google.com [209.85.222.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 06EA91EF372 for ; Fri, 2 May 2025 23:21:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.177 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746228094; cv=none; b=JG3+C37PNlROBhvyE98ym+hFlH3HXwrfUQhTKpDZq8P8COieMJcFDWCcM5UhHNAlUTN4I4H7X2g37dK6nYfiThEhg+5NTFmVLcp7Y3nCKOtu1aTgg73g+Jvyw+lwZzOk/1N2ae9NEIBkkuXpzd3iuMSRh5CIXIfojqv2GUY3VCs= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1746228094; c=relaxed/simple; bh=dru6ILqhMZtWjde1l6/M1C9h6xvEPsZRTnlNAMDPh1A=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=h49VVfwijqrN7sE+ysmTerSQYf6/DTtAoZESSNs+ncsY+UnotQacS4qsqa0rno1TmV7qrt9vcPUbV+UPfI3Bt6PbP6iUzHbcMdwIamZoo5MU+dmyQlgUYnFritrRCGz5ex1IJe9tbKPXf18ggCHOh0sCRVdeuRPqjhUcEI3GYI0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com; spf=pass smtp.mailfrom=ttaylorr.com; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b=tDDC19Y2; arc=none smtp.client-ip=209.85.222.177 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=ttaylorr.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="tDDC19Y2" Received: by mail-qk1-f177.google.com with SMTP id af79cd13be357-7c56a3def84so251567585a.0 for ; Fri, 02 May 2025 16:21:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1746228091; x=1746832891; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=gRdO7akMND3mC2mlDsLhbcQKhR2ko5mvKkAkKoJKwFs=; b=tDDC19Y2Kmv25wynqjBu6NwzCvGee7wfqdqHbpRgWuFZMqe5bQBn7PZXSYRXjF3Zc7 zfrmrwFO80ETQIJRY3UEpXltRBsPg3ZXxdX3tU4ZUZi1SOQNAyEKKM6fUqLrf4Vz5N2t OJfxHtvBTAuHS8IZGZQ9kci4bNozPN0vSaGWMYVwHNWIdAr/dmozngYYzhycFdUMno/H kDLJZmZXVcIIhN7A3JKQwxXMcP0v241QNLzTT9ivTK5KE/X/L5+2whBgNiW+o1GhHjX0 orVGQij4Bo+W5PYDMJzjhw1i6Sr4NEVSgAAacitVF4AlLuNVQa2b2UGnmjXRTLI72Hxz OvRw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1746228091; x=1746832891; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=gRdO7akMND3mC2mlDsLhbcQKhR2ko5mvKkAkKoJKwFs=; b=BnXM5LoHchTkASyFXqrVe+4Hhdkp1ClPY7JBZvlUosJjuUp6M2VlPNbI29sWD6LmYy Sav5u93MtcN0xkJ5SKNO4K757FVBWhHnNgOYV7mezpaJqbzIfosyneYXgUs9R0kU2J0s 6lULarSYurMJWPR+gJRe3I6ZS8L9WIjqWwioyyeRFaharZgXYfw3YRo/VQumHEo1SlpR WFuQesRGa0aAUJSyljTF4qrNsCYs1vOcelwlrZgt/2mjC+DLwLUZ7b/M76t74rvTn0h7 TCRm2WS4wibmTvFC6C9PdSWmkZ3ubxS7iyjuUamkBAmPVbJlLskmZ8uqYcvaw6VQlcll j0UA== X-Gm-Message-State: AOJu0YyXRzYPFPbq2iW6WCAuz2GVyPyE+3KBc3OAI74duFj1Kxm6fh15 Q5TYk5q5VWaXdUxTlsupzj9jrPJxiVb13KIlgFhi7IK2+FFWmDl4aDOqiwrsgQ0= X-Gm-Gg: ASbGncu+LvazoHyO9exxqZxA9BrrgXxl8ruCCtC7zudtT+F0Q/zb7ttYkNdFBl93beE zvMt2gANraIB7656RjfKi3zYUkDa7ObpWGDS2aL+nN0pmx8JA776nY4Z9vYTxjkms0Hwe73dBnx uvZIU1D1kGGjnIogUrNMMUNOC0K44KzQ12Avf4ELjrwsq1fTaUQXdcRE0R7QSDTCStWZUvVg4Ms bregfzx49lFtyFbhf3mLZwNssFvmx5vl5wGfBUPpvMwIGWFx4R24+4QyNRDUbQNiLECiCLeFNDW SgEoIy4nOKB72fwBD7p2ii7SZ0Nj2XGreYTjDBOTtrr37EBsE6tet1Fddrwuge3VPRnOetGV2l9 5d0V0LZ3zAmxe X-Google-Smtp-Source: AGHT+IGcg0mk23H41aIPz9zaDOnSrR9+9LNWecgzqBDTEGPH+O98oohEy4ihv/ZvmDJ+W1eNeYVgWQ== X-Received: by 2002:a05:620a:171f:b0:7c9:2383:afef with SMTP id af79cd13be357-7cadfea91aemr132428785a.37.1746228090626; Fri, 02 May 2025 16:21:30 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with UTF8SMTPSA id af79cd13be357-7cad242b657sm253356685a.69.2025.05.02.16.21.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 02 May 2025 16:21:30 -0700 (PDT) Date: Fri, 2 May 2025 19:21:22 -0400 From: Taylor Blau To: Derrick Stolee via GitGitGadget Cc: git@vger.kernel.org, christian.couder@gmail.com, gitster@pobox.com, johannes.schindelin@gmx.de, johncai86@gmail.com, jonathantanmy@google.com, karthik.188@gmail.com, kristofferhaugsbakk@fastmail.com, newren@gmail.com, peff@peff.net, ps@pks.im, Derrick Stolee Subject: Re: [PATCH v2 02/13] pack-objects: add --path-walk option Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: On Mon, Mar 24, 2025 at 03:22:38PM +0000, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee > > In order to more easily compute delta bases among objects that appear at > the exact same path, add a --path-walk option to 'git pack-objects'. > > This option will use the path-walk API instead of the object walk given > by the revision machinery. Since objects will be provided in batches > representing a common path, those objects can be tested for delta bases > immediately instead of waiting for a sort of the full object list by > name-hash. This has multiple benefits, including avoiding collisions by > name-hash. > > The objects marked as UNINTERESTING are included in these batches, so we > are guaranteeing some locality to find good delta bases. > > After the individual passes are done on a per-path basis, the default > name-hash is used to find other opportunistic delta bases that did not > match exactly by the full path name. Thanks for the clear explanation here. > The current implementation performs delta calculations while walking > objects, which is not ideal for a few reasons. First, this will cause > the "Enumerating objects" phase to be much longer than usual. Second, it > does not take advantage of threading during the path-scoped delta > calculations. Even with this lack of threading, the path-walk option is > sometimes faster than the usual approach. Future changes will refactor > this code to allow for threading, but that complexity is deferred until > later to keep this patch as simple as possible. I can't remember if I called this out in my previous round of review, but I definitely appreciate you separating out the threading implementation. When I first started to read this paragraph, I thought "huh, I bet this is parallelizeable", and I was glad to see that you both (a) called out the same thing, and (b) deferred it to a later patch to reduce the complexity of this one. Thanks. > This new walk is incompatible with some features and is ignored by > others: > > * Object filters are not currently integrated with the path-walk API, > such as sparse-checkout or tree depth. A blobless packfile could be > integrated easily, but that is deferred for later. > > * Server-focused features such as delta islands, shallow packs, and > using a bitmap index are incompatible with the path-walk API. > > * The path walk API is only compatible with the --revs option, not > taking object lists or pack lists over stdin. These alternative ways > to specify the objects currently ignores the --path-walk option > without even a warning. Much appreciated to likewise call out the (current) limitations here as well. > Future changes will create performance tests that demonstrate the power > of this approach. > > Signed-off-by: Derrick Stolee > --- > Documentation/git-pack-objects.adoc | 14 +- > Documentation/technical/api-path-walk.adoc | 1 + > builtin/pack-objects.c | 147 +++++++++++++++++++-- > t/t5300-pack-object.sh | 15 +++ > 4 files changed, 167 insertions(+), 10 deletions(-) > > diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc > index 7f69ae4855f..7065758eddf 100644 > --- a/Documentation/git-pack-objects.adoc > +++ b/Documentation/git-pack-objects.adoc > @@ -16,7 +16,7 @@ SYNOPSIS > [--cruft] [--cruft-expiration=