* Continuous Benchmarking
@ 2025-02-03  9:54 Patrick Steinhardt
  2025-02-03 16:33 ` Junio C Hamano
  2025-02-05 23:14 ` Emily Shaffer
  0 siblings, 2 replies; 4+ messages in thread
From: Patrick Steinhardt @ 2025-02-03  9:54 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Hi,

due to a couple of performance regressions that we have hit over the last
couple of Git releases at GitLab, we have started an effort to
implement continuous benchmarking for the Git project. The intent is to
have regular (daily) benchmarking runs against Git's `master` and `next`
branches to be able to spot any performance regressions before they make
it into the next release.

I have started with a relatively simple setup:

  - I have started collecting the benchmarks that I myself run regularly
    [1]. These benchmarks are built on hyperfine (a sample invocation is
    sketched below) and are thus not part of the Git repository itself.

  - GitLab CI runs on a nightly basis, executing a subset of these
    benchmarks [2].

  - Results are uploaded with a hyperfine adaptor to Bencher and are
    summarized in dashboards [3].
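
To give a concrete idea of what such a run looks like, a single
benchmark boils down to a hyperfine invocation whose JSON output is then
fed to Bencher. The repository and command here are only illustrative,
not the exact ones used in [1]:

    # time one Git command against a prepared clone and export
    # machine-readable results for the upload step
    hyperfine \
        --warmup 3 \
        --export-json rev-list.json \
        'git -C repos/linux.git rev-list --count HEAD'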

This at least gives us some visibility into severe performance outliers,
whether these are improvements or regressions. Some statistics are
applied to this data to automatically generate alerts when things change
significantly.

The setup is of course not perfect. It's built on top of CI jobs, which
by their very nature do not perform consistently. The scripts are hosted
outside of Git. And I'm the only one running this.

So I wonder whether there is wider interest in the Git community to
make this infrastructure part of the Git project itself. This may
include steps like the following:

  - Extending the performance tests we have in "t/perf" to cover more
    benchmarks (a minimal example is sketched after this list).

  - Writing an adaptor that is able to upload the data generated from
    our perf scripts to Bencher.

  - Setting up proper infrastructure to do the benchmarking. We may for
    now also continue to use GitLab CI, but as mentioned, shared runners
    are quite noisy overall. Dedicated servers would help here.

  - Sending alerts to the Git mailing list.
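
As for "t/perf", adding a benchmark there is fairly mechanical: a new
script is an ordinary shell script built on top of perf-lib.sh, roughly
along these lines (the benchmarked command is just a placeholder):

    #!/bin/sh

    test_description='rev-list performance'

    . ./perf-lib.sh

    # run against the repository given via GIT_PERF_REPO, falling back
    # to the Git repository we are building from
    test_perf_default_repo

    test_perf 'rev-list --count --all' '
            git rev-list --count --all
    '

    test_done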

I'm happy to hear your thoughts on this. Any ideas are welcome,
including "we're not interested at all". In that case, we'd simply
continue to maintain the setup ourselves at GitLab.

Thanks!

Patrick

[1]: https://gitlab.com/gitlab-org/data-access/git/benchmarks
[2]: https://gitlab.com/gitlab-org/data-access/git/benchmarks/-/blob/main/.gitlab-ci.yml?ref_type=heads
[3]: https://bencher.dev/console/projects/git/plots


* Re: Continuous Benchmarking
  2025-02-03  9:54 Continuous Benchmarking Patrick Steinhardt
@ 2025-02-03 16:33 ` Junio C Hamano
  2025-02-05 23:14 ` Emily Shaffer
  1 sibling, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2025-02-03 16:33 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Emily Shaffer

Patrick Steinhardt <ps@pks.im> writes:

> ... implement continuous benchmarking for the Git project. The intent is to
> have regular (daily) benchmarking runs against Git's `master` and `next`
> branches to be able to spot any performance regressions before they make
> it into the next release.

This is great.

> I have started with a relatively simple setup:
>
>   - I have started collecting the benchmarks that I myself run regularly [1].
>     These benchmarks are built on hyperfine and are thus not part of the
>     Git repository itself.
>
>   - GitLab CI runs on a nightly basis, executing a subset of these
>     benchmarks [2].
>
>   - Results are uploaded with a hyperfine adaptor to Bencher and are
>     summarized in dashboards.
>
> This at least gives us some visibility into severe performance outliers,
> whether these are improvements or regressions. Some statistics are
> applied to this data to automatically generate alerts when things change
> significantly.
>
> The setup is of course not perfect. It's built on top of CI jobs, which
> by their very nature do not perform consistently. The scripts are hosted
> outside of Git. And I'm the only one running this.
>
> So I wonder whether there is wider interest in the Git community to
> make this infrastructure part of the Git project itself. This may
> include steps like the following:
>
>   - Extending the performance tests we have in "t/perf" to cover more
>     benchmarks.
>
>   - Writing an adaptor that is able to upload the data generated from
>     our perf scripts to Bencher.
>
>   - Setting up proper infrastructure to do the benchmarking. We may for
>     now also continue to use GitLab CI, but as mentioned, shared runners
>     are quite noisy overall. Dedicated servers would help here.
>
>   - Sending alerts to the Git mailing list.
>
> I'm happy to hear your thoughts on this. Any ideas are welcome,
> including "we're not interested at all". In that case, we'd simply
> continue to maintain the setup ourselves at GitLab.

Elsewhere Peff was talking about his adventure with Coverity running
on 'next'.  The more eyes and tools on these topics before they hit
'master', the smaller the chance that we have to scramble just before
the release.




* Re: Continuous Benchmarking
  2025-02-03  9:54 Continuous Benchmarking Patrick Steinhardt
  2025-02-03 16:33 ` Junio C Hamano
@ 2025-02-05 23:14 ` Emily Shaffer
  2025-02-21  8:48   ` Patrick Steinhardt
  1 sibling, 1 reply; 4+ messages in thread
From: Emily Shaffer @ 2025-02-05 23:14 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On Mon, Feb 3, 2025 at 1:55 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> Hi,
>
> due to a couple of performance regressions that we have hit over the last
> couple of Git releases at GitLab, we have started an effort to
> implement continuous benchmarking for the Git project. The intent is to
> have regular (daily) benchmarking runs against Git's `master` and `next`
> branches to be able to spot any performance regressions before they make
> it into the next release.
>
> I have started with a relatively simple setup:
>
>   - I have started collecting the benchmarks that I myself run regularly [1].
>     These benchmarks are built on hyperfine and are thus not part of the
>     Git repository itself.
>
>   - GitLab CI runs on a nightly basis, executing a subset of these
>     benchmarks [2].
>
>   - Results are uploaded with a hyperfine adaptor to Bencher and are
>     summarized in dashboards.
>
> This at least gives us some visibility into severe performance outliers,
> whether these are improvements or regressions. Some statistics are
> applied to this data to automatically generate alerts when things change
> significantly.
>
> The setup is of course not perfect. It's built on top of CI jobs, which
> by their very nature do not perform consistently. The scripts are hosted
> outside of Git. And I'm the only one running this.

For the CI "noisy neighbors" problem at least, it could be an option
to try to host in GCE (or some other compute that isn't shared). I
asked around a little inside Google and it seems like it's possible;
I'll keep pushing on it and see just how hard it would be. I'd even be
happy to trade on-push runs with noisy neighbors for nightly runs with
no neighbors, which makes it not really a CI thing - guess I will find
out if that's easier or harder for us to implement. :)

>
> So I wonder whether there is wider interest in the Git community to
> make this infrastructure part of the Git project itself. This may
> include steps like the following:
>
>   - Extending the performance tests we have in "t/perf" to cover more
>     benchmarks.

Folks may be aware that our biggest (in terms of scale) internal
customer at Google is the Android project. They are the ones who
complain to me and my team the most about performance; they are also
open to setting up a nightly performance regression test. Would it be
appealing to get reports from such a test upstream? I think it's more
compelling to our customer team if we run it against the closed-source
Android repo, which means the Git project doesn't get to see as much
about the shape and content of the repos the performance tests are
running against, but we might be able to publish info about the shape
without the contents. Would that be useful? What would it help to know
(# of commits, size of largest object, distribution of object size,
# of branches, size of worktree, ...)? If not having the specifics of
the repo-under-test is a dealbreaker, we could explore running
performance tests in public with the Android Open Source Project as
the repo-under-test instead, but it's much more manageable than full
Android.
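
For instance, we could probably publish the output of a handful of
read-only commands like these without exposing any repository contents
(just a sketch of the kind of shape metrics I have in mind):

    # history and ref counts
    git rev-list --count --all
    git for-each-ref | wc -l

    # object counts and on-disk size of the object store
    git count-objects -v -H

    # size of the largest object in the repository
    git cat-file --batch-all-objects --batch-check='%(objectsize)' |
            sort -n | tail -n 1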

Maybe in the long term it would be even better to have some toy
repos-under-test, like "sample repo with massive object store", "sample
repo with massive history", etc., to help us pinpoint which ways we're
scaling well and which ways we aren't. But having a ready-made
repo-under-test, and a team who's got a very large stake in Git
performing well with it (so they can invest their time in setting up
tests), might be a good enough place to start.

>
>   - Writing an adaptor that is able to upload the data generated from
>     our perf scripts to Bencher.
>
>   - Setting up proper infrastructure to do the benchmarking. We may for
>     now also continue to use GitLab CI, but as mentioned, shared runners
>     are quite noisy overall. Dedicated servers would help here.
>
>   - Sending alerts to the Git mailing list.

Yeah, I'd love to see reports coming to the Git mailing list, or at
least bad-news reports (maybe we don't need "everything ran great!"
every night, but would appreciate "last night the performance suite ran
50% slower than the last-6-months average"). That seems the easiest to
integrate with the way the project runs now, and I think we are used
to list noise :)

>
> I'm happy to hear your thoughts on this. Any ideas are welcome,
> including "we're not interested at all". In that case, we'd simply
> continue to maintain the setup ourselves at GitLab.

In general, though, yes! I am very interested! Google has had trouble
with performance regressions over the last 3 months or so, and I'd love
to see the community noticing them more. I think we generally have a
sense during code review that performance matters, but we aren't always
sure where it matters most, and a regular performance test that anybody
can see the results of would help a lot.

>
> Thanks!
>
> Patrick
>
> [1]: https://gitlab.com/gitlab-org/data-access/git/benchmarks
> [2]: https://gitlab.com/gitlab-org/data-access/git/benchmarks/-/blob/main/.gitlab-ci.yml?ref_type=heads
> [3]: https://bencher.dev/console/projects/git/plots


* Re: Continuous Benchmarking
  2025-02-05 23:14 ` Emily Shaffer
@ 2025-02-21  8:48   ` Patrick Steinhardt
  0 siblings, 0 replies; 4+ messages in thread
From: Patrick Steinhardt @ 2025-02-21  8:48 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

On Wed, Feb 05, 2025 at 03:14:21PM -0800, Emily Shaffer wrote:
> On Mon, Feb 3, 2025 at 1:55 AM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > Hi,
> >
> > due to a couple of performance regressions that we have hit over the last
> > couple of Git releases at GitLab, we have started an effort to
> > implement continuous benchmarking for the Git project. The intent is to
> > have regular (daily) benchmarking runs against Git's `master` and `next`
> > branches to be able to spot any performance regressions before they make
> > it into the next release.
> >
> > I have started with a relatively simple setup:
> >
> >   - I have started collecting the benchmarks that I myself run regularly [1].
> >     These benchmarks are built on hyperfine and are thus not part of the
> >     Git repository itself.
> >
> >   - GitLab CI runs on a nightly basis, executing a subset of these
> >     benchmarks [2].
> >
> >   - Results are uploaded with a hyperfine adaptor to Bencher and are
> >     summarized in dashboards.
> >
> > This at least gives us some visibility into severe performance outliers,
> > whether these are improvements or regressions. Some statistics are
> > applied to this data to automatically generate alerts when things change
> > significantly.
> >
> > The setup is of course not perfect. It's built on top of CI jobs, which
> > by their very nature do not perform consistently. The scripts are hosted
> > outside of Git. And I'm the only one running this.
> 
> For the CI "noisy neighbors" problem at least, it could be an option
> to try to host in GCE (or some other compute that isn't shared). I
> asked around a little inside Google and it seems like it's possible;
> I'll keep pushing on it and see just how hard it would be. I'd even be
> happy to trade on-push runs with noisy neighbors for nightly runs with
> no neighbors, which makes it not really a CI thing - guess I will find
> out if that's easier or harder for us to implement. :)

That would be awesome.

> > So I wonder whether there is wider interest in the Git community to
> > make this infrastructure part of the Git project itself. This may
> > include steps like the following:
> >
> >   - Extending the performance tests we have in "t/perf" to cover more
> >     benchmarks.
> 
> Folks may be aware that our biggest (in terms of scale) internal
> customer at Google is the Android project. They are the ones who complain
> to me and my team the most about performance; they are also open to
> setting up a nightly performance regression test. Would it be appealing
> to get reports from such a test upstream? I think it's more compelling
> to our customer team if we run it against the closed-source Android
> repo, which means the Git project doesn't get to see as much about the
> shape and content of the repos the performance tests are running
> against, but we might be able to publish info about the shape without
> the contents. Would that be useful? What would it help to know (# of
> commits, size of largest object, distribution of object size, # of
> branches, size of worktree, ...)? If not having the specifics of the
> repo-under-test is a dealbreaker, we could explore running performance
> tests in public with the Android Open Source Project as the
> repo-under-test instead, but it's much more manageable than full
> Android.

The biggest question is whether such regression reports would be
actionable by the Git community. I have often found performance issues
to be very specific to the repository at hand, and reconstructing the
exact situation tends to be extremely tedious or completely infeasible.
Way too often I run into the situation where customers come knocking at
my door with a performance issue, but don't want to provide the
underlying data. More often than not I end up not being able to
reproduce the issue, so I have to push back on such reports.

Ideally, any report should be accompanied by a trivial reproducer that
any developer can execute on their local machine.
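
Something like the following would already go a long way, with
synthetic data standing in for the real repository. This is a made-up
example exercising ref iteration, just to illustrate the shape of
reproducer I have in mind:

    git init repro.git && cd repro.git
    git commit --allow-empty -m seed

    # blow up the number of refs, all pointing at the same commit
    seq 1 50000 |
            sed 's|.*|create refs/heads/branch-& HEAD|' |
            git update-ref --stdin

    # the command whose runtime supposedly regressed
    time git for-each-ref --format='%(refname)' | wc -l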

> Maybe in the long term it would be even better to have some toy
> repos-under-test, like "sample repo with massive object store", "sample
> repo with massive history", etc., to help us pinpoint which ways we're
> scaling well and which ways we aren't. But having a ready-made
> repo-under-test, and a team who's got a very large stake in Git
> performing well with it (so they can invest their time in setting up
> tests), might be a good enough place to start.

That would be great. I guess this wouldn't be a single repository, but a
set of repositories with different characteristics.

> >   - Writing an adaptor that is able to upload the data generated from
> >     our perf scripts to Bencher.
> >
> >   - Setting up proper infrastructure to do the benchmarking. We may for
> >     now also continue to use GitLab CI, but as mentioned, shared runners
> >     are quite noisy overall. Dedicated servers would help here.
> >
> >   - Sending alerts to the Git mailing list.
> 
> Yeah, I'd love to see reports coming to the Git mailing list, or at
> least bad-news reports (maybe we don't need "everything ran great!"
> every night, but would appreciate "last night the performance suite ran
> 50% slower than the last-6-months average"). That seems the easiest to
> integrate with the way the project runs now, and I think we are used
> to list noise :)

Oh, totally, I certainly don't think there's any benefit in reporting
anything when there is no new information. Right now there are still
semi-frequent outliers where an alert is generated only because of a
flake, not a real performance regression. But my hope would be that we can
address this issue once we address the noisy neighbour problem.

> > I'm happy to hear your thoughts on this. Any ideas are welcome,
> > including "we're not interested at all". In that case, we'd simply
> > continue to maintain the setup ourselves at GitLab.
> 
> In general, though, yes! I am very interested! Google has had trouble
> with performance regressions over the last 3 months or so, and I'd love
> to see the community noticing them more. I think we generally have a
> sense during code review that performance matters, but we aren't always
> sure where it matters most, and a regular performance test that anybody
> can see the results of would help a lot.

Thanks for your input!

Patrick
