git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Karthik Nayak <karthik.188@gmail.com>
To: git@vger.kernel.org
Cc: jltobler@gmail.com, ps@pks.im, toon@iotcl.com,
	 Karthik Nayak <karthik.188@gmail.com>
Subject: [PATCH v2 0/2] bundle: fix non-linear performance scaling with refs
Date: Tue, 08 Apr 2025 11:00:51 +0200	[thread overview]
Message-ID: <20250408-488-generating-bundles-with-many-references-has-non-linear-performance-v2-0-0802fc36a23d@gmail.com> (raw)
In-Reply-To: <20250401-488-generating-bundles-with-many-references-has-non-linear-performance-v1-0-6d23b2d96557@gmail.com>

Hello, 

At GitLab, we noticed that bundle creation doesn't seem to scale linearly 
the number of references in a repository. The following benchmark demostrates
the issue:

Benchmark 1: bundle (refcount = 100)
  Time (mean ± σ):       4.4 ms ±   0.5 ms    [User: 1.8 ms, System: 2.4 ms]
  Range (min … max):     3.4 ms …   7.7 ms    434 runs

Benchmark 2: bundle (refcount = 1000)
  Time (mean ± σ):      16.5 ms ±   1.7 ms    [User: 9.6 ms, System: 7.2 ms]
  Range (min … max):    14.1 ms …  21.7 ms    176 runs

Benchmark 3: bundle (refcount = 10000)
  Time (mean ± σ):     220.6 ms ±   3.2 ms    [User: 171.6 ms, System: 55.7 ms]
  Range (min … max):   215.8 ms … 224.9 ms    13 runs

Benchmark 4: bundle (refcount = 100000)
  Time (mean ± σ):      9.622 s ±  0.063 s    [User: 9.143 s, System: 0.546 s]
  Range (min … max):    9.563 s …  9.738 s    10 runs

Summary
  bundle (refcount = 100) ran
    3.79 ± 0.61 times faster than bundle (refcount = 1000)
   50.63 ± 6.39 times faster than bundle (refcount = 10000)
 2207.95 ± 277.35 times faster than bundle (refcount = 100000)

Digging into this, the reason for this is because we check for duplicate refnames
added by the user. But this check uses an O(N^2) algorithm, which would not 
scale linearly with the number of refs.

The first commit in this small series adds a bunch of tests for this behavior,
while also discovering a missed edge case. The second commit introduces an 
alternative approach which uses an 'strset' to check for duplicates. The new
approach fixes the performance problems noticed while also fixing the earlier
missed edge case. Overall we see a 6x performance improvement with this series.

I found that there is a conflict with 'ps/object-wo-the-repository' in seen,
the resolution seems simple enough. Happy to support as needed.

---
Changes in v2:
- Use STRSET_INIT macro instead of strset_init(). 
- Link to v1: https://lore.kernel.org/r/20250401-488-generating-bundles-with-many-references-has-non-linear-performance-v1-0-6d23b2d96557@gmail.com

---
 bundle.c               |  8 +++++++-
 object.c               | 33 -------------------------------
 object.h               |  6 ------
 t/t6020-bundle-misc.sh | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 60 insertions(+), 40 deletions(-)

Karthik Nayak (2):
      t6020: test for duplicate refnames in bundle creation
      bundle: fix non-linear performance scaling with refs
---

Range-diff versus v1:

1:  1fd141cd34 = 1:  2b608cb908 t6020: test for duplicate refnames in bundle creation
2:  25b86d1a6c ! 2:  054389419f bundle: fix non-linear performance scaling with refs
    @@ bundle.c: static int write_bundle_refs(int bundle_fd, struct rev_info *revs)
      {
      	int i;
      	int ref_count = 0;
    -+	struct strset objects;
    -+
    -+	strset_init(&objects);
    ++	struct strset objects = STRSET_INIT;
      
      	for (i = 0; i < revs->pending.nr; i++) {
      		struct object_array_entry *e = revs->pending.objects + i;


---

base-commit: 683c54c999c301c2cd6f715c411407c413b1d84e
change-id: 20250322-488-generating-bundles-with-many-references-has-non-linear-performance-64aec8e0cf1d

Thanks
- Karthik


  parent reply	other threads:[~2025-04-08  9:00 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-01 17:00 [PATCH 0/2] bundle: fix non-linear performance scaling with refs Karthik Nayak
2025-04-01 17:00 ` [PATCH 1/2] t6020: test for duplicate refnames in bundle creation Karthik Nayak
2025-04-01 17:00 ` [PATCH 2/2] bundle: fix non-linear performance scaling with refs Karthik Nayak
2025-04-03 19:07   ` Toon Claes
2025-04-06 20:48     ` Karthik Nayak
2025-04-08  9:00 ` Karthik Nayak [this message]
2025-04-08  9:00   ` [PATCH v2 1/2] t6020: test for duplicate refnames in bundle creation Karthik Nayak
2025-04-08  9:00   ` [PATCH v2 2/2] bundle: fix non-linear performance scaling with refs Karthik Nayak
2025-04-10  8:57     ` Toon Claes
2025-04-10  9:04       ` Karthik Nayak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250408-488-generating-bundles-with-many-references-has-non-linear-performance-v2-0-0802fc36a23d@gmail.com \
    --to=karthik.188@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jltobler@gmail.com \
    --cc=ps@pks.im \
    --cc=toon@iotcl.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).