Date: Thu, 21 Dec 2023 06:13:33 -0500
From: Jeff King
To: Taylor Blau
Cc: git@vger.kernel.org, Patrick Steinhardt, Junio C Hamano
Subject: Re: [PATCH 00/24] pack-objects: multi-pack verbatim reuse
Message-ID: <20231221111333.GE570888@coredump.intra.peff.net>
References: <20231212081238.GD1117953@coredump.intra.peff.net>

On Fri, Dec 15, 2023 at 10:37:57AM -0500, Taylor Blau wrote:

> On Tue, Dec 12, 2023 at 03:12:38AM -0500, Jeff King wrote:
> > So my question is: how much of what you're seeing is from (1) and (2),
> > and how much is from (3)? Because there are other ways to trigger (3),
> > such as lowering the window size. For example, if you try your same
> > packing example with --window=0, how do the CPU and output size compare
> > to the results of your series? (I'd also check peak memory usage).
>
> Interesting question! Here are some preliminary numbers on my machine
> (which runs Debian unstable with an Intel Xeon W-2255 CPU @ 3.70GHz and
> 64GB of RAM).
>
> I ran the following hyperfine command on my testing repository, which
> has the Git repository broken up into ~75 packs or so:

Thanks for running these tests. The results are similar to what I
expected, which is: yes, most of your CPU savings are from skipping
deltas, but not all.

Here's what I see (which I think is mostly redundant with what you've
said, but I just want to lay out my line of thinking). I'll reorder
your quoted sections a bit as I go:

> Benchmark 2: multi-pack reuse, pack.window=0
> [...]
>   Time (mean ± σ):     1.075 s ±  0.005 s    [User: 0.990 s, System: 0.188 s]
>   Range (min … max):   1.071 s …  1.088 s    10 runs
>
> Benchmark 4: multi-pack reuse, pack.window=10
> [...]
>   Time (mean ± σ):     1.028 s ±  0.002 s    [User: 1.150 s, System: 0.184 s]
>   Range (min … max):   1.026 s …  1.032 s    10 runs

OK, so when we're doing more full ("multi") reuse, the pack window
doesn't make a big difference either way.
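(For anybody following along at home: I'd guess the four benchmark rows
boil down to something like the commands below. The repository path and
the exact hyperfine setup are my invention, pack.allowPackReuse=multi
is of course the new knob from your series, and this assumes the midx
bitmap that the reuse code needs has already been generated:

  cd /path/to/testing-repo
  hyperfine -L reuse single,multi -L window 0,10 \
    'git -c pack.allowPackReuse={reuse} -c pack.window={window} \
       pack-objects --all --stdout </dev/null >/dev/null'

But I'm inferring all of that from the row labels, so correct me if
your actual invocation differed.)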
You didn't show the stderr from each, but presumably most of the
objects are hitting the "reuse" path, and only a few are deltas (and
that is backed up by the fact that doing deltas only gives us a slight
improvement in the output size):

> Benchmark 2: multi-pack reuse, pack.window=0
> 268.670 MB
> Benchmark 4: multi-pack reuse, pack.window=10
> 266.473 MB

Comparing the runs with less reuse:

> Benchmark 1: single-pack reuse, pack.window=0
> [...]
>   Time (mean ± σ):     1.248 s ±  0.004 s    [User: 1.160 s, System: 0.188 s]
>   Range (min … max):   1.244 s …  1.259 s    10 runs
>
> Benchmark 3: single-pack reuse, pack.window=10
> [...]
>   Time (mean ± σ):     6.281 s ±  0.024 s    [User: 43.727 s, System: 0.492 s]
>   Range (min … max):   6.252 s …  6.326 s    10 runs

there obviously is a huge amount of time saved by not doing deltas, but
we pay for it with a much bigger pack:

> Benchmark 1: single-pack reuse, pack.window=0
> 264.443 MB
> Benchmark 3: single-pack reuse, pack.window=10
> 194.355 MB

But of course that "much bigger" pack is about the same size as the one
we get from doing multi-pack reuse. Which is not surprising, because
both are avoiding looking for new deltas (and the packs after the
preferred one probably have mediocre deltas).

So I do actually think that disabling the delta search with
pack.window=0 gives you a similar-ish tradeoff to expanding the
pack-reuse code (~6s down to ~1s, and a 36% embiggening of the
resulting pack size). Which implies that one option is to scrap your
entire series and just set pack.window. Basically comparing multi/10
(your patches) to single/0 (a hypothetical config option), which have
similar run-times and pack sizes.

But that's not quite the whole story. There is still a CPU improvement
in your series (1.2s vs 1.0s, a 20% speedup). And as I'd expect, a
memory improvement from avoiding the extra book-keeping (almost 10%):

> Benchmark 1: single-pack reuse, pack.window=0
> 354.224 MB (max RSS)
> Benchmark 4: multi-pack reuse, pack.window=10
> 328.786 MB (max RSS)

So while it's a lot less code to just set the window size, I do think
those improvements are worth it. And really, it's the same tradeoff we
make for the single-pack case (i.e., one could argue that we
could/should rip out the verbatim-reuse code entirely in favor of just
tweaking the window size).

> It's pretty close between multi-pack reuse with a window size of 0 and
> a window size of 10. If you want to optimize for pack size, you could
> trade a ~4% reduction in pack size for a ~1% increase in peak memory
> usage.

I think if you want to optimize for pack size, you should consider
repacking all-into-one to get better on-disk deltas. ;) I know that's
easier said than done when the I/O costs are significant. I do wonder
if storing thin packs on disk would let us more cheaply reach a state
that could serve optimal-ish packs without spending CPU computing
bespoke deltas for each client. But that's a much larger topic.

-Peff
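P.S. Re the stderr I mentioned above: the quickest way to sanity-check
how much of a pack came from the verbatim-reuse path is the final count
that pack-objects prints to stderr, something like:

  Total 267298 (delta 212434), reused 265537 (delta 210677), pack-reused 251120

(those numbers are invented for illustration). The "pack-reused" figure
counts the objects that went through the byte-for-byte reuse code, so
it should jump way up under the multi-pack series.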