Message-ID: <7e5abff7-79c9-41c3-9cfa-2aaf0e69a6a8@gmail.com>
Date: Thu, 14 May 2026 15:57:46 -0400
X-Mailing-List: git@vger.kernel.org
Subject: Re: [PATCH] revision: use priority queue in limit_list()
To: Kristofer Karlsson via GitGitGadget, git@vger.kernel.org
Cc: Kristofer Karlsson
From: Derrick Stolee

On 5/14/2026 12:51 PM, Kristofer Karlsson via GitGitGadget wrote:
> From: Kristofer Karlsson
>
> limit_list() maintains a date-sorted work queue of commits using a
> linked list with commit_list_insert_by_date() for insertion. Each
> insertion walks the list to find the right position — O(n) per insert.
> In repositories with merge-heavy histories, the symmetric difference
> can contain thousands of commits, making this O(n) insertion the
> dominant cost.

Linear operations are bad, especially when multiplied by linear-ish
loops, causing quadratic behavior.

> Replace the sorted linked list with a prio_queue (binary heap). This
> gives O(log n) insertion and O(log n) extraction instead of O(n)
> insertion and O(1) extraction, which is a net win when the queue is
> large.

Yes, much better.
> The still_interesting() and everybody_uninteresting() helpers are
> updated to scan the prio_queue's contiguous array instead of walking a
> linked list. process_parents() already accepts both a commit_list and
> a prio_queue parameter, so the change in limit_list() simply switches
> which one is passed.
>
> Benchmark: git rev-list --left-right --count HEAD~N...HEAD
> Repository: 2.3M commits, merge-heavy DAG (monorepo)
> Best of 5 runs, times in seconds:
>
>   commits in
>   symmetric diff   baseline   patched   speedup
>   --------------   --------   -------   -------
>               10       0.01      0.01      1.0x
>               50       0.01      0.01      1.0x
>             3751      21.23      8.49      2.5x
>             4524      21.70      8.29      2.6x
>            10130      20.10      6.65      3.0x
>
> No change for small traversals; 2.5-3.0x faster when the queue grows
> to thousands of commits.

This is good. Is there any chance that you could demonstrate this with
any commits in the Git repo? It does have some interesting behavior,
especially around point releases that are independent from the 'master'
branch and thus could have lopsided symmetric differences using
well-established tag names.

> Switching to a prio_queue (binary heap) reduces insertion cost to
> O(log w), bringing total cost to O(N·log w). The practical result on
> the same repository:
>
>   commits in
>   symmetric diff   before   after   speedup
>   --------------   ------   -----   -------
>             3751    21.2s    8.5s      2.5x
>             4524    21.7s    8.3s      2.6x
>            10130    20.1s    6.6s      3.0x

Very nice! I notice that this data is in your cover letter, but not the
commit message. Is that intentional?

> This affects any command that triggers limit_list() — i.e., when
> revs->limited is set — including --left-right, --cherry-mark,
> --cherry-pick, --ancestry-path, bisect, and rebase's fork-point
> computation. The practical trigger is git status --ahead-behind on a
> branch that has diverged from upstream in a merge-heavy repository.

This also impacts 'git log --graph' when there is no serialized
commit-graph file. We are still using limit_list() in that case.
> The change is minimal (+21/−17 lines, single file) because
> process_parents() already accepts both a commit_list and a prio_queue
> parameter — limit_list() just switches which one it passes.

The key logic is turning the initial list into the starting points for
the priority queue; everything else is about moving types around, it
seems.

> @@ -1451,6 +1447,7 @@ static int limit_list(struct rev_info *revs)
>  	struct commit_list *newlist = NULL;
>  	struct commit_list **p = &newlist;
>  	struct commit *interesting_cache = NULL;
> +	struct prio_queue queue = { .compare = compare_commits_by_commit_date };

Here, we are _not_ using generation numbers, which is correct for this
case because we are matching the date-based sorting of the previous
list.

>  	while (original_list) {
>  		struct commit *commit = pop_commit(&original_list);
> +		prio_queue_put(&queue, commit);
> +	}
> +
> +	while (queue.nr) {
> +		struct commit *commit = prio_queue_get(&queue);
>  		struct object *obj = &commit->object;

This is a fun reuse of lines: it takes the old "drain the list as it is
being mutated" loop and turns it into "fill the priority queue" followed
by "drain the priority queue as it is being mutated".

This code change looks good.

No new tests are needed, since this is a performance-only change. Do any
of the tests in t/perf/ demonstrate this improvement?

Thanks,
-Stolee