From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from cloud.peff.net (cloud.peff.net [217.216.95.84]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6D1D73A7F5D for ; Mon, 8 Jun 2026 22:34:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.216.95.84 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780958080; cv=none; b=rY7NI3l4lnnOdizXvVSM2YIwQUInMY8ImdZbaC7+0VIGjdZdZJHoAJ/F3YcjewIqWP9c3Y9k1DYoUciVjwZKi8sNNeQhJY1+A3Weh2Yfg/7h+ff602iSUX6z1XMCXHR1kvx6PWt9Cvg7/d6xd9vpH2G7vfA1uBocfOxfWaX6a2g= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780958080; c=relaxed/simple; bh=TRuUjfguJZJTm2dJoUH/HfX+3Yffa2+QLsgHq84FkgA=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=dhL36b0tIK/JterDvw0hqAkiDFxdzRkQ7ku/A9AFALZCLwauAV2TaUMra31MY4jBOAo3KGV8Xf+e4B9kdhYjdXVYxmofiiNT7SCOmGGK+N3jXGnxcuOJbCSVQw6JoTAR8CXH7l8B6+80KZQdwX//JQW0fOiKYoFyq+Au1JeJJyo= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net; spf=pass smtp.mailfrom=peff.net; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b=QFcybVSf; arc=none smtp.client-ip=217.216.95.84 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=peff.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=peff.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=peff.net header.i=@peff.net header.b="QFcybVSf" Received: (qmail 83549 invoked by uid 106); 8 Jun 2026 22:34:31 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=peff.net; h=date:from:to:cc:subject:message-id:references:mime-version:content-type:in-reply-to; s=20240930; bh=TRuUjfguJZJTm2dJoUH/HfX+3Yffa2+QLsgHq84FkgA=; b=QFcybVSfHs5pI3dliJeewRbgMcNiQAxzfNDvNm7eehVAUCkya0s2HBxzxjn37a/EaO/YdunSsRMdn0bKoEBiziohQ63KXhfgo119Amvkkby6pA+BCdVs3/Dxs/UWThxyAOWKB/6jbBHrzNcT3lk2jKqHWZa+DrmGCFeMeNwdDlIY0eukzFEUISItqhYlE3+cE0EMVh3IQ1JrUZpniyjT9f4MudNLX3ODdg0TuKds79YlMWd+0D0se2tm6DbULx60CLm0+dMmD/su1eckqNGDpk9kE7w7/g/7Z/xExE1M5wRzsDwvVqdXhLmui8VLuHfRWP/CQmMyFnj/12jClbe5Ow== Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Mon, 08 Jun 2026 22:34:31 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 243805 invoked by uid 111); 8 Jun 2026 22:34:33 -0000 Received: from coredump.intra.peff.net (HELO coredump.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Mon, 08 Jun 2026 18:34:33 -0400 Authentication-Results: peff.net; auth=none Date: Mon, 8 Jun 2026 18:34:30 -0400 From: Jeff King To: Tamir Duberstein Cc: git@vger.kernel.org, Karthik Nayak , Junio C Hamano , Victoria Dye , Derrick Stolee , Elijah Newren , Kristofer Karlsson Subject: Re: [PATCH] ref-filter: reuse --contains traversal results Message-ID: <20260608223430.GA340696@coredump.intra.peff.net> References: <20260607-ref-filter-memoized-contains-v1-1-a1972dde9c76@gmail.com> Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20260607-ref-filter-memoized-contains-v1-1-a1972dde9c76@gmail.com> On Sun, Jun 07, 2026 at 08:33:29PM -0700, Tamir Duberstein wrote: > git branch and git for-each-ref call repo_is_descendant_of() for each > candidate selected by --contains or --no-contains. Each call starts a > new graph walk, so refs with shared history repeatedly traverse the same > commits. > > ffc4b8012d (tag: speed up --contains calculation, 2011-06-11) introduced > the tag traversal that caches positive and negative answers across > candidates. ee2bd06b0f (ref-filter: implement '--contains' option, > 2015-07-07) preserved the branch and tag implementations when ref-filter > learned --contains. 008ed7df930 (tag.c: use the correct algorithm for > the '--contains' option, 2015-10-18) noted that they should be unified. > > Use the memoized traversal for every ref-filter contains check and > remove the implementation selector. The cache records answers for one > fixed target list, so document that callers must clear it before > changing the list. The subject line obfuscated the intent here (at least for me). I think a more clear subject would just be: "ref-filter: always use contains_tag_algo" or something. But more importantly, I think the analysis above is missing a key point about why we didn't make the tag algo the default in the first place: it is depth first, and thus slower when the merge base can be found quickly by the breadth-first traversal. For tags, you tend to have to look at all of history anyway (because you have at least one old tag that requires walking back that far), but that is often not true for branches. We are able to get the best of both worlds if we can cut off the depth-first traversal early using generation numbers. So I think a better rule here is to tweak the selection in commit_contains() to select the depth-first algorithm when we have generation numbers enabled. There's a patch in an old thread, which was revived a week or two ago by Kristofer (cc'd): https://lore.kernel.org/git/20260527070510.3510836-1-krka@spotify.com/ > The memoized depth-first walk assumes acyclic ancestry, but replacement > refs can create cycles. Track commits while they are on the walk. If a > cycle is found, discard partial cache entries and use > repo_is_descendant_of() for that candidate. I can believe that the depth-first code doesn't handle cycles well. But if that's the case, then it's already a problem for "git tag --contains". And we should fix it as a separate patch from enabling that algorithm in more cases. I'm not quite sure how ancestry should be defined in a cycle. How does the algorithm behave now when it sees a cycle? If it loops infinitely, we definitely would want to fix that. If not, then to some degree I don't care too much what answer is provided, since the input is somewhat nonsense in the first place. And if it is expensive to track, it might not be worth inflicting that penalty on the sane cases. But it looks like your solution is just setting an extra flag value in the slab, which should be pretty cheap. > The branch and for-each-ref path passed repo_is_descendant_of() through > a Boolean interface. In configurations where it returned -1 for missing > ancestry, ref-filter treated the error as "contains". The memoized path > instead fails when ancestry cannot be parsed, as git tag already did. > During review of the 2018 reachability series, making parse failures > fatal was explicitly deferred because that series was intended to > preserve behavior. Unifying the implementations now makes all callers > fail consistently instead of preserving that accidental Boolean > interpretation. I think that's a good outcome. > The added p1500 case uses up to 8,192 packed refs along one first-parent > history. It improves from 0.68 to 0.03 seconds. > > On a checkout with 62,174 remote-tracking refs, I ran: > > hyperfine --warmup 0 --runs 3 \ > --command-name parent \ > '"$parent" branch -r --contains c78ae85f3ce7e >/dev/null' \ > --command-name this-commit \ > '"$this" branch -r --contains c78ae85f3ce7e >/dev/null' > > The results were: > > parent this commit > elapsed 104.365 s 467.7 ms > user 93.702 s 220.2 ms > system 0.723 s 182.7 ms I didn't time it, but the probable regression case is something like this: a very deep history with a small number of branches diverging only a few commits away. Without a commit-graph file (or one without generation numbers), that probably makes "git branch --contains" slower. -Peff