git.vger.kernel.org archive mirror
* [QUESTION] Performance comparison: full clone + sparse-checkout vs partial clone + sparse-checkout
From: Manoraj K @ 2024-10-28 10:30 UTC (permalink / raw)
  To: git; +Cc: stolee, Shubham Kanodia, Ajith Kuttickattu Sakharia

Hi,

We've conducted benchmarks comparing Git operations between a fully
cloned and a partially cloned repository (both using sparse-checkout).
We'd like to understand the technical reasons behind the consistent
performance gains we're seeing in the partial clone setup.

Benchmark Results:

Full Clone + Sparse-checkout:
- .git size: 8.5G
- Git index size: 20MB
- Pack objects: 18,761,646
- Operations (mean ± std dev):
  * git status: 0.634s ± 0.004s
  * git commit: 2.677s ± 0.019s
  * git checkout branch: 0.615s ± 0.005s
  * git pull (no changes): 5.983s ± 0.391s

Partial Clone + Sparse-checkout:
- .git size: 2.0G
- Git index size: 20MB
- Pack objects: 13,560,436
- Operations (mean ± std dev):
  * git status: 0.575s ± 0.012s (9.3% faster)
  * git commit: 2.164s ± 0.032s (19.2% faster)
  * git checkout branch: 0.724s ± 0.154s
  * git pull (no changes): 1.866s ± 0.018s (68.8% faster)
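The relative figures above can be recomputed from the means. Here is a minimal Python sketch. The helper name `pct_faster` is made up for illustration. A positive result means the partial clone is faster, and a negative one means it is slower:

```python
# Mean timings in seconds, copied from the benchmark results above.
full = {"status": 0.634, "commit": 2.677, "checkout": 0.615, "pull": 5.983}
partial = {"status": 0.575, "commit": 2.164, "checkout": 0.724, "pull": 1.866}


def pct_faster(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` is faster than `baseline`
    (negative means slower), rounded to one decimal place."""
    return round((baseline - candidate) / baseline * 100, 1)


for op in full:
    print(f"{op:>8}: {pct_faster(full[op], partial[op]):+.1f}%")
```

On these numbers it also shows checkout as roughly 17.7% slower in the partial clone. That is the one operation the list above leaves unannotated.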

Key Questions:
1. What are the technical factors causing these performance
improvements in the partial clone setup?
2. To be able to get these benefits, is there a way to convert our
existing fully cloned repository to behave like a partial clone
without re-cloning from scratch?

Appreciate any insights here.

Best regards,
Manoraj K

* Re: [QUESTION] Performance comparison: full clone + sparse-checkout vs partial clone + sparse-checkout
From: Manoraj K @ 2024-11-07  4:52 UTC (permalink / raw)
  To: git; +Cc: stolee, Shubham Kanodia, Ajith Kuttickattu Sakharia

Bump

* Re: [QUESTION] Performance comparison: full clone + sparse-checkout vs partial clone + sparse-checkout
From: Elijah Newren @ 2024-11-08 17:24 UTC (permalink / raw)
  To: Manoraj K; +Cc: git, stolee, Shubham Kanodia, Ajith Kuttickattu Sakharia

Taking some wild guesses:

`git pull` will both fetch updates for _all_ branches, as well as
merge (or rebase) the updates for the current branch.  Your "no
changes" probably means there's no merge/rebase needed, but that
doesn't mean there was nothing to fetch.  A partial clone isn't going
to download all the blobs, so it has much less to download and is thus
significantly faster.

`git checkout branch` would likely be slower in a partial clone
because sometimes objects are missing and need to be downloaded.  And
indeed, it shows as being a little slower for you.
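The following is a simplified Python model of the lazy-fetch idea. It is not Git's actual code. It assumes a repository counts as a partial clone when `extensions.partialClone` names a remote, or when a `remote.<name>.promisor` or `remote.<name>.partialclonefilter` key is set. A `--filter` clone records those last two keys. When objects are missing locally, the model requests them from each promisor remote in turn, in one batch, instead of failing immediately. `promisor_remotes`, `read_blobs`, and the `fetch` callback are hypothetical names.

```python
from __future__ import annotations

from typing import Callable

TRUE_VALUES = {"true", "yes", "on", "1"}  # simplified Git boolean parsing


def promisor_remotes(config: dict[str, str]) -> list[str]:
    """Return remotes treated as promisors (simplified model).

    Keys are assumed lowercased, as `git config --list` prints them.
    """
    names: list[str] = []
    partial = config.get("extensions.partialclone")
    if partial:
        names.append(partial)
    for key, value in config.items():
        if not key.startswith("remote."):
            continue
        # Remote names may contain dots, so split off only the last component.
        name, _, subkey = key[len("remote."):].rpartition(".")
        promised = (subkey == "promisor" and value.lower() in TRUE_VALUES) or (
            subkey == "partialclonefilter"
        )
        if promised and name not in names:
            names.append(name)
    return names


Fetch = Callable[[str, list[str]], dict[str, bytes]]


def read_blobs(oids: list[str], local: dict[str, bytes],
               config: dict[str, str], fetch: Fetch) -> list[bytes]:
    """Read objects, lazily fetching missing ones from promisor remotes."""
    missing = [oid for oid in oids if oid not in local]
    for remote in promisor_remotes(config):
        if not missing:
            break
        # One batched request per remote; this round trip is the extra cost.
        local.update(fetch(remote, missing))
        missing = [oid for oid in missing if oid not in local]
    if missing:
        # No promisor remote (or none had them): a genuinely missing object.
        raise KeyError(f"missing objects: {missing}")
    return [local[oid] for oid in oids]
```

In this model, a second read of the same blobs triggers no further fetch because they are now local. The cost of a checkout therefore depends on whether its blobs were fetched earlier. That might contribute to the wider spread in the partial-clone checkout timings, but it is only a guess.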

`git status` is harder to guess at.  The only guess I can come up with
for this case is that fewer objects means faster lookup (I'm not
familiar with the packfile code, but I think object lookups use a
bisect to find the objects in question, and fewer objects to bisect
would make things faster if so); not sure if this could account for a
9% difference, though.  Maybe someone who understands packfiles,
object lookup, and promisor remotes has a better idea here?
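To put a rough number on the bisect guess: a version 2 pack index (`.idx`) begins with a 256-entry fan-out table indexed by the first byte of the object ID. That table narrows the search to one slice of the sorted ID list before bisecting. The Python sketch below models this. It ignores multiple packs, multi-pack indexes, loose objects, and caching, and the function names are hypothetical.

```python
from __future__ import annotations

import bisect
import math


def build_index(oids: list[bytes]) -> tuple[list[int], list[bytes]]:
    """Sort object IDs and build a cumulative 256-entry fan-out table:
    fanout[b] is the number of IDs whose first byte is <= b."""
    names = sorted(oids)
    fanout = [0] * 256
    for oid in names:
        fanout[oid[0]] += 1
    for b in range(1, 256):
        fanout[b] += fanout[b - 1]
    return fanout, names


def lookup(fanout: list[int], names: list[bytes], oid: bytes) -> int | None:
    """Return the position of `oid` in the sorted list, or None if absent."""
    first = oid[0]
    lo = fanout[first - 1] if first > 0 else 0
    hi = fanout[first]
    # Binary search only among IDs that share the same first byte.
    i = bisect.bisect_left(names, oid, lo, hi)
    return i if i < hi and names[i] == oid else None


def expected_steps(num_objects: int) -> float:
    """Approximate bisection steps after the fan-out narrows the range,
    assuming IDs are spread evenly over the 256 first-byte buckets."""
    return math.log2(num_objects / 256)
```

With the two object counts from the benchmark, `expected_steps(18_761_646)` is about 16.2 and `expected_steps(13_560_436)` is about 15.7. That is roughly half a comparison fewer per lookup. Under these assumptions, the difference alone looks too small to explain 9%, which fits the doubt above.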

I'm a bit surprised by the `git commit` case; how can it take so long
on your repo (2-3s)?  Do you have commit hooks in place?  If so, what
are they doing?  (And if you do, I suspect whatever they are doing is
responsible for the differences in timings between the partial clone
and the full clone, so you'd need to dig into them.)

* Re: [QUESTION] Performance comparison: full clone + sparse-checkout vs partial clone + sparse-checkout
From: Manoraj K @ 2024-12-06  3:18 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, stolee, Shubham Kanodia, Ajith Kuttickattu Sakharia

Hi Elijah,

Thanks for your response! Sorry for not responding sooner.

-- `git pull` will both fetch updates for _all_ branches, as well as
merge (or rebase) the updates for the current branch.

The `git pull` here is actually `git pull origin master`. I assume it
only fetches objects for the master branch, so in this case a pull in
the partial clone and a pull in the full clone should perform about
equally.

-- I'm a bit surprised by the `git commit` case; how can it take so
long on your repo (2-3s)?

I run these with `--no-verify`, so hooks don't affect these benchmarks.

How does Git know, during object lookup, that it is working in a
partial clone? How does it decide that a missing object needs to be
fetched, rather than reporting it as not found?

