* Should object repacking only update server-info for packs instead of doing it for refs?
@ 2024-10-14 20:07 Martin Fick
2024-10-14 20:33 ` Luca Milanesio
0 siblings, 1 reply; 7+ messages in thread
From: Martin Fick @ 2024-10-14 20:07 UTC (permalink / raw)
To: git@vger.kernel.org
I have been experimenting with trying to run git geometric repacking after every push on our servers and have noticed that even a NOOP geometric repack takes approximately 30s on one of my large repositories. Further investigation determined that it was the update-server-info call that was taking that 30s, so running repacking with the -n instead only takes 6ms in this case! I dug a bit deeper and found that it was the update_info_refs() call that takes all the time. This seemed a bit off to me. Yes, it makes sense that would be slow, I have almost 2Mrefs, but I had to ask myself, why is this being done from git repack? It seems counterintuitive that an operation designed to repack objects would be performing maintenance of any sort on refs? Should this be improved somehow? Should there be a -packs and a -refs switch to git update-server-info, and should git repack call it with -packs? Should git pack-refs be calling git update-server-info with -refs instead?
Thanks,
-Martin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Should object repacking only update server-info for packs instead of doing it for refs?
2024-10-14 20:07 Should object repacking only update server-info for packs instead of doing it for refs? Martin Fick
@ 2024-10-14 20:33 ` Luca Milanesio
2024-10-14 21:25 ` Taylor Blau
0 siblings, 1 reply; 7+ messages in thread
From: Luca Milanesio @ 2024-10-14 20:33 UTC (permalink / raw)
To: git@vger.kernel.org; +Cc: Luca Milanesio, Martin Fick
> On 14 Oct 2024, at 21:07, Martin Fick <mfick@nvidia.com> wrote:
>
> I have been experimenting with trying to run git geometric repacking after every push on our servers
I had the same sentiment when I studied how geometric repacking works.
When we tried to do geometric repacking every 5 minutes on a busy repo, we ended up with a single pack file most of the time, which isn’t really what you would expect with a geometric progression.
Running it after *every* push would make more sense, as Martin is experimenting.
Please keep us posted on the results :-)
> and have noticed that even a NOOP geometric repack takes approximately 30s on one of my large repositories. Further investigation determined that it was the update-server-info call that was taking that 30s, so running repacking with the -n instead only takes 6ms in this case! I dug a bit deeper and found that it was the update_info_refs() call that takes all the time. This seemed a bit off to me. Yes, it makes sense that would be slow, I have almost 2Mrefs, but I had to ask myself, why is this being done from git repack?
I believe it should be done to update objects/info/packs
> It seems counterintuitive that an operation designed to repack objects would be performing maintenance of any sort on refs?
True, it should not touch info/refs IMHO, as you’re really not changing any refs.
Luca.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Should object repacking only update server-info for packs instead of doing it for refs?
2024-10-14 20:33 ` Luca Milanesio
@ 2024-10-14 21:25 ` Taylor Blau
2024-10-15 11:02 ` Patrick Steinhardt
0 siblings, 1 reply; 7+ messages in thread
From: Taylor Blau @ 2024-10-14 21:25 UTC (permalink / raw)
To: Luca Milanesio; +Cc: git@vger.kernel.org, Martin Fick
On Mon, Oct 14, 2024 at 09:33:17PM +0100, Luca Milanesio wrote:
> > It seems counterintuitive that an operation designed to repack
> > objects would be performing maintenance of any sort on refs?
>
> True, it should not touch info/refs IMHO, as you’re really not
> changing any refs.
Right. I don't think that the current behavior is a bug, but just that
it's doing unnecessary work from within 'git repack' to update the
info/refs file when it's known ahead of time that the refs haven't
changed.
I think it's reasonable to skip this step when repacking, but of course
we would still want to update info/packs (assuming that the repack
wasn't a noop, of course).
Thanks,
Taylor
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Should object repacking only update server-info for packs instead of doing it for refs?
2024-10-14 21:25 ` Taylor Blau
@ 2024-10-15 11:02 ` Patrick Steinhardt
2024-10-15 11:52 ` Luca Milanesio
0 siblings, 1 reply; 7+ messages in thread
From: Patrick Steinhardt @ 2024-10-15 11:02 UTC (permalink / raw)
To: Taylor Blau; +Cc: Luca Milanesio, git@vger.kernel.org, Martin Fick
On Mon, Oct 14, 2024 at 05:25:42PM -0400, Taylor Blau wrote:
> On Mon, Oct 14, 2024 at 09:33:17PM +0100, Luca Milanesio wrote:
> > > It seems counterintuitive that an operation designed to repack
> > > objects would be performing maintenance of any sort on refs?
> >
> > True, it should not touch info/refs IMHO, as you’re really not
> > changing any refs.
>
> Right. I don't think that the current behavior is a bug, but just that
> it's doing unnecessary work from within 'git repack' to update the
> info/refs file when it's known ahead of time that the refs haven't
> changed.
>
> I think it's reasonable to skip this step when repacking, but of course
> we would still want to update info/packs (assuming that the repack
> wasn't a noop, of course).
It certainly may be reasonable. But in my opinion, it would be even more
reasonable to not use the dumb HTTP transport at all. If you don't there
is no reason to run git-update-server-info(1) in the first place, so
you'd neither generate info/refs nor info/packs.
We have been discussing in the past whether the dumb HTTP protocol
should be deprecated, and in the context of that discussion we were also
wondering whether we should start disabling git-update-server-info(1) by
default. Users don't generally need this, and most server operators
don't need this nowadays, either. So why generate data that is useless
in almost all cases?
Patrick
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Should object repacking only update server-info for packs instead of doing it for refs?
2024-10-15 11:02 ` Patrick Steinhardt
@ 2024-10-15 11:52 ` Luca Milanesio
2024-10-15 12:14 ` Patrick Steinhardt
0 siblings, 1 reply; 7+ messages in thread
From: Luca Milanesio @ 2024-10-15 11:52 UTC (permalink / raw)
To: git@vger.kernel.org
Cc: Luca Milanesio, Taylor Blau, Martin Fick, Patrick Steinhardt
> On 15 Oct 2024, at 12:02, Patrick Steinhardt <ps@pks.im> wrote:
>
> On Mon, Oct 14, 2024 at 05:25:42PM -0400, Taylor Blau wrote:
>> On Mon, Oct 14, 2024 at 09:33:17PM +0100, Luca Milanesio wrote:
>>>> It seems counterintuitive that an operation designed to repack
>>>> objects would be performing maintenance of any sort on refs?
>>>
>>> True, it should not touch info/refs IMHO, as you’re really not
>>> changing any refs.
>>
>> Right. I don't think that the current behavior is a bug, but just that
>> it's doing unnecessary work from within 'git repack' to update the
>> info/refs file when it's known ahead of time that the refs haven't
>> changed.
>>
>> I think it's reasonable to skip this step when repacking, but of course
>> we would still want to update info/packs (assuming that the repack
>> wasn't a noop, of course).
>
> It certainly may be reasonable. But in my opinion, it would be even more
> reasonable to not use the dumb HTTP transport at all. If you don't there
> is no reason to run git-update-server-info(1) in the first place, so
> you'd neither generate info/refs nor info/packs.
I don’t believe anyone uses it anymore, but someone *may* still use it, and therefore, Git should update the info/packs during geometric repacking.
However, why not introduce a “kill switch” in git config to disable it so that all tools can automatically skip the expensive info/packs (and info/refs) updates?
> We have been discussing in the past whether the dumb HTTP protocol
> should be deprecated, and in the context of that discussion we were also
> wondering whether we should start disabling git-update-server-info(1) by
> default.
+1 from me.
> Users don't generally need this, and most server operators
> don't need this nowadays, either. So why generate data that is useless
> in almost all cases?
Certainly not IMHO.
Luca.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Should object repacking only update server-info for packs instead of doing it for refs?
2024-10-15 11:52 ` Luca Milanesio
@ 2024-10-15 12:14 ` Patrick Steinhardt
2024-10-15 13:56 ` Taylor Blau
0 siblings, 1 reply; 7+ messages in thread
From: Patrick Steinhardt @ 2024-10-15 12:14 UTC (permalink / raw)
To: Luca Milanesio; +Cc: git@vger.kernel.org, Taylor Blau, Martin Fick
On Tue, Oct 15, 2024 at 12:52:12PM +0100, Luca Milanesio wrote:
>
>
> > On 15 Oct 2024, at 12:02, Patrick Steinhardt <ps@pks.im> wrote:
> >
> > On Mon, Oct 14, 2024 at 05:25:42PM -0400, Taylor Blau wrote:
> >> On Mon, Oct 14, 2024 at 09:33:17PM +0100, Luca Milanesio wrote:
> >>>> It seems counterintuitive that an operation designed to repack
> >>>> objects would be performing maintenance of any sort on refs?
> >>>
> >>> True, it should not touch info/refs IMHO, as you’re really not
> >>> changing any refs.
> >>
> >> Right. I don't think that the current behavior is a bug, but just that
> >> it's doing unnecessary work from within 'git repack' to update the
> >> info/refs file when it's known ahead of time that the refs haven't
> >> changed.
> >>
> >> I think it's reasonable to skip this step when repacking, but of course
> >> we would still want to update info/packs (assuming that the repack
> >> wasn't a noop, of course).
> >
> > It certainly may be reasonable. But in my opinion, it would be even more
> > reasonable to not use the dumb HTTP transport at all. If you don't there
> > is no reason to run git-update-server-info(1) in the first place, so
> > you'd neither generate info/refs nor info/packs.
>
> I don’t believe anyone uses it anymore, but someone *may* still use
> it, and therefore, Git should update the info/packs during geometric
> repacking. However, why not introduce a “kill switch” in git config to
> disable it so that all tools can automatically skip the expensive
> info/packs (and info/refs) updates?
We already have those, see "repack.updateServerInfo" and
"receive.updateServerInfo". Not quite a single kill switch, but it is
possible :)
Patrick
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Should object repacking only update server-info for packs instead of doing it for refs?
2024-10-15 12:14 ` Patrick Steinhardt
@ 2024-10-15 13:56 ` Taylor Blau
0 siblings, 0 replies; 7+ messages in thread
From: Taylor Blau @ 2024-10-15 13:56 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Luca Milanesio, git@vger.kernel.org, Martin Fick
On Tue, Oct 15, 2024 at 02:14:59PM +0200, Patrick Steinhardt wrote:
> > >> I think it's reasonable to skip this step when repacking, but of course
> > >> we would still want to update info/packs (assuming that the repack
> > >> wasn't a noop, of course).
> > >
> > > It certainly may be reasonable. But in my opinion, it would be even more
> > > reasonable to not use the dumb HTTP transport at all. If you don't there
> > > is no reason to run git-update-server-info(1) in the first place, so
> > > you'd neither generate info/refs nor info/packs.
> >
> > I don’t believe anyone uses it anymore, but someone *may* still use
> > it, and therefore, Git should update the info/packs during geometric
> > repacking. However, why not introduce a “kill switch” in git config to
> > disable it so that all tools can automatically skip the expensive
> > info/packs (and info/refs) updates?
>
> We already have those, see "repack.updateServerInfo" and
> "receive.updateServerInfo". Not quite a single kill switch, but it is
> possible :)
There is also 'git repack -n' (which I know GitHub uses) to disable
updating info/refs and info/packs as a command-line flag (if for
whatever reason modifying the 'git repack' invocation is more
convenient than touching the repository configuration).
Thanks,
Taylor
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-10-15 13:56 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-14 20:07 Should object repacking only update server-info for packs instead of doing it for refs? Martin Fick
2024-10-14 20:33 ` Luca Milanesio
2024-10-14 21:25 ` Taylor Blau
2024-10-15 11:02 ` Patrick Steinhardt
2024-10-15 11:52 ` Luca Milanesio
2024-10-15 12:14 ` Patrick Steinhardt
2024-10-15 13:56 ` Taylor Blau
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).