* Should object repacking only update server-info for packs instead of doing it for refs? @ 2024-10-14 20:07 Martin Fick 2024-10-14 20:33 ` Luca Milanesio 0 siblings, 1 reply; 7+ messages in thread From: Martin Fick @ 2024-10-14 20:07 UTC (permalink / raw) To: git@vger.kernel.org I have been experimenting with trying to run git geometric repacking after every push on our servers and have noticed that even a NOOP geometric repack takes approximately 30s on one of my large repositories. Further investigation determined that it was the update-server-info call that was taking that 30s, so running repacking with the -n instead only takes 6ms in this case! I dug a bit deeper and found that it was the update_info_refs() call that takes all the time. This seemed a bit off to me. Yes, it makes sense that would be slow, I have almost 2Mrefs, but I had to ask myself, why is this being done from git repack? It seems counterintuitive that an operation designed to repack objects would be performing maintenance of any sort on refs? Should this be improved somehow? Should there be a -packs and a -refs switch to git update-server-info, and should git repack call it with -packs? Should git pack-refs be calling git update-server-info with -refs instead? Thanks, -Martin ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Should object repacking only update server-info for packs instead of doing it for refs? 2024-10-14 20:07 Should object repacking only update server-info for packs instead of doing it for refs? Martin Fick @ 2024-10-14 20:33 ` Luca Milanesio 2024-10-14 21:25 ` Taylor Blau 0 siblings, 1 reply; 7+ messages in thread From: Luca Milanesio @ 2024-10-14 20:33 UTC (permalink / raw) To: git@vger.kernel.org; +Cc: Luca Milanesio, Martin Fick > On 14 Oct 2024, at 21:07, Martin Fick <mfick@nvidia.com> wrote: > > I have been experimenting with trying to run git geometric repacking after every push on our servers I had the same sentiment when I studied how geometric repacking works. When we tried to do geometric repacking every 5 minutes on a busy repo, we ended up with a single pack file most of the time, which isn’t really what you would expect with a geometric progression. Running it after *every* push would make more sense, as Martin is experimenting. Please keep us posted on the results :-) > and have noticed that even a NOOP geometric repack takes approximately 30s on one of my large repositories. Further investigation determined that it was the update-server-info call that was taking that 30s, so running repacking with the -n instead only takes 6ms in this case! I dug a bit deeper and found that it was the update_info_refs() call that takes all the time. This seemed a bit off to me. Yes, it makes sense that would be slow, I have almost 2Mrefs, but I had to ask myself, why is this being done from git repack? I believe it should be done to update objects/info/packs > It seems counterintuitive that an operation designed to repack objects would be performing maintenance of any sort on refs? True, it should not touch info/refs IMHO, as you’re really not changing any refs. Luca. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Should object repacking only update server-info for packs instead of doing it for refs? 2024-10-14 20:33 ` Luca Milanesio @ 2024-10-14 21:25 ` Taylor Blau 2024-10-15 11:02 ` Patrick Steinhardt 0 siblings, 1 reply; 7+ messages in thread From: Taylor Blau @ 2024-10-14 21:25 UTC (permalink / raw) To: Luca Milanesio; +Cc: git@vger.kernel.org, Martin Fick On Mon, Oct 14, 2024 at 09:33:17PM +0100, Luca Milanesio wrote: > > It seems counterintuitive that an operation designed to repack > > objects would be performing maintenance of any sort on refs? > > True, it should not touch info/refs IMHO, as you’re really not > changing any refs. Right. I don't think that the current behavior is a bug, but just that it's doing unnecessary work from within 'git repack' to update the info/refs file when it's known ahead of time that the refs haven't changed. I think it's reasonable to skip this step when repacking, but of course we would still want to update info/packs (assuming that the repack wasn't a noop, of course). Thanks, Taylor ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Should object repacking only update server-info for packs instead of doing it for refs? 2024-10-14 21:25 ` Taylor Blau @ 2024-10-15 11:02 ` Patrick Steinhardt 2024-10-15 11:52 ` Luca Milanesio 0 siblings, 1 reply; 7+ messages in thread From: Patrick Steinhardt @ 2024-10-15 11:02 UTC (permalink / raw) To: Taylor Blau; +Cc: Luca Milanesio, git@vger.kernel.org, Martin Fick On Mon, Oct 14, 2024 at 05:25:42PM -0400, Taylor Blau wrote: > On Mon, Oct 14, 2024 at 09:33:17PM +0100, Luca Milanesio wrote: > > > It seems counterintuitive that an operation designed to repack > > > objects would be performing maintenance of any sort on refs? > > > > True, it should not touch info/refs IMHO, as you’re really not > > changing any refs. > > Right. I don't think that the current behavior is a bug, but just that > it's doing unnecessary work from within 'git repack' to update the > info/refs file when it's known ahead of time that the refs haven't > changed. > > I think it's reasonable to skip this step when repacking, but of course > we would still want to update info/packs (assuming that the repack > wasn't a noop, of course). It certainly may be reasonable. But in my opinion, it would be even more reasonable to not use the dumb HTTP transport at all. If you don't there is no reason to run git-update-server-info(1) in the first place, so you'd neither generate info/refs nor info/packs. We have been discussing in the past whether the dumb HTTP protocol should be deprecated, and in the context of that discussion we were also wondering whether we should start disabling git-update-server-info(1) by default. Users don't generally need this, and most server operators don't need this nowadays, either. So why generate data that is useless in almost all cases? Patrick ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Should object repacking only update server-info for packs instead of doing it for refs? 2024-10-15 11:02 ` Patrick Steinhardt @ 2024-10-15 11:52 ` Luca Milanesio 2024-10-15 12:14 ` Patrick Steinhardt 0 siblings, 1 reply; 7+ messages in thread From: Luca Milanesio @ 2024-10-15 11:52 UTC (permalink / raw) To: git@vger.kernel.org Cc: Luca Milanesio, Taylor Blau, Martin Fick, Patrick Steinhardt > On 15 Oct 2024, at 12:02, Patrick Steinhardt <ps@pks.im> wrote: > > On Mon, Oct 14, 2024 at 05:25:42PM -0400, Taylor Blau wrote: >> On Mon, Oct 14, 2024 at 09:33:17PM +0100, Luca Milanesio wrote: >>>> It seems counterintuitive that an operation designed to repack >>>> objects would be performing maintenance of any sort on refs? >>> >>> True, it should not touch info/refs IMHO, as you’re really not >>> changing any refs. >> >> Right. I don't think that the current behavior is a bug, but just that >> it's doing unnecessary work from within 'git repack' to update the >> info/refs file when it's known ahead of time that the refs haven't >> changed. >> >> I think it's reasonable to skip this step when repacking, but of course >> we would still want to update info/packs (assuming that the repack >> wasn't a noop, of course). > > It certainly may be reasonable. But in my opinion, it would be even more > reasonable to not use the dumb HTTP transport at all. If you don't there > is no reason to run git-update-server-info(1) in the first place, so > you'd neither generate info/refs nor info/packs. I don’t believe anyone uses it anymore, but someone *may* still use it, and therefore, Git should update the info/packs during geometric repacking. However, why not introduce a “kill switch” in git config to disable it so that all tools can automatically skip the expensive info/packs (and info/refs) updates? > We have been discussing in the past whether the dumb HTTP protocol > should be deprecated, and in the context of that discussion we were also > wondering whether we should start disabling git-update-server-info(1) by > default. +1 from me. > Users don't generally need this, and most server operators > don't need this nowadays, either. So why generate data that is useless > in almost all cases? Certainly not IMHO. Luca. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Should object repacking only update server-info for packs instead of doing it for refs? 2024-10-15 11:52 ` Luca Milanesio @ 2024-10-15 12:14 ` Patrick Steinhardt 2024-10-15 13:56 ` Taylor Blau 0 siblings, 1 reply; 7+ messages in thread From: Patrick Steinhardt @ 2024-10-15 12:14 UTC (permalink / raw) To: Luca Milanesio; +Cc: git@vger.kernel.org, Taylor Blau, Martin Fick On Tue, Oct 15, 2024 at 12:52:12PM +0100, Luca Milanesio wrote: > > > > On 15 Oct 2024, at 12:02, Patrick Steinhardt <ps@pks.im> wrote: > > > > On Mon, Oct 14, 2024 at 05:25:42PM -0400, Taylor Blau wrote: > >> On Mon, Oct 14, 2024 at 09:33:17PM +0100, Luca Milanesio wrote: > >>>> It seems counterintuitive that an operation designed to repack > >>>> objects would be performing maintenance of any sort on refs? > >>> > >>> True, it should not touch info/refs IMHO, as you’re really not > >>> changing any refs. > >> > >> Right. I don't think that the current behavior is a bug, but just that > >> it's doing unnecessary work from within 'git repack' to update the > >> info/refs file when it's known ahead of time that the refs haven't > >> changed. > >> > >> I think it's reasonable to skip this step when repacking, but of course > >> we would still want to update info/packs (assuming that the repack > >> wasn't a noop, of course). > > > > It certainly may be reasonable. But in my opinion, it would be even more > > reasonable to not use the dumb HTTP transport at all. If you don't there > > is no reason to run git-update-server-info(1) in the first place, so > > you'd neither generate info/refs nor info/packs. > > I don’t believe anyone uses it anymore, but someone *may* still use > it, and therefore, Git should update the info/packs during geometric > repacking. However, why not introduce a “kill switch” in git config to > disable it so that all tools can automatically skip the expensive > info/packs (and info/refs) updates? We already have those, see "repack.updateServerInfo" and "receive.updateServerInfo". Not quite a single kill switch, but it is possible :) Patrick ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Should object repacking only update server-info for packs instead of doing it for refs? 2024-10-15 12:14 ` Patrick Steinhardt @ 2024-10-15 13:56 ` Taylor Blau 0 siblings, 0 replies; 7+ messages in thread From: Taylor Blau @ 2024-10-15 13:56 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: Luca Milanesio, git@vger.kernel.org, Martin Fick On Tue, Oct 15, 2024 at 02:14:59PM +0200, Patrick Steinhardt wrote: > > >> I think it's reasonable to skip this step when repacking, but of course > > >> we would still want to update info/packs (assuming that the repack > > >> wasn't a noop, of course). > > > > > > It certainly may be reasonable. But in my opinion, it would be even more > > > reasonable to not use the dumb HTTP transport at all. If you don't there > > > is no reason to run git-update-server-info(1) in the first place, so > > > you'd neither generate info/refs nor info/packs. > > > > I don’t believe anyone uses it anymore, but someone *may* still use > > it, and therefore, Git should update the info/packs during geometric > > repacking. However, why not introduce a “kill switch” in git config to > > disable it so that all tools can automatically skip the expensive > > info/packs (and info/refs) updates? > > We already have those, see "repack.updateServerInfo" and > "receive.updateServerInfo". Not quite a single kill switch, but it is > possible :) There is also 'git repack -n' (which I know GitHub uses) to disable updating info/refs and info/packs as a command-line flag (if for whatever reason modifying the 'git repack' invocation is more convenient than touching the repository configuration). Thanks, Taylor ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-10-15 13:56 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-10-14 20:07 Should object repacking only update server-info for packs instead of doing it for refs? Martin Fick 2024-10-14 20:33 ` Luca Milanesio 2024-10-14 21:25 ` Taylor Blau 2024-10-15 11:02 ` Patrick Steinhardt 2024-10-15 11:52 ` Luca Milanesio 2024-10-15 12:14 ` Patrick Steinhardt 2024-10-15 13:56 ` Taylor Blau
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).