* Usability issue: "Your branch is up to date"
@ 2025-02-03 16:45 Manuel Quiñones
  2025-02-03 16:56 ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Manuel Quiñones @ 2025-02-03 16:45 UTC (permalink / raw)
  To: git

Hi, I've been teaching Git to a group of young learners lately. They
find it odd that commands like `git status` or `git switch main` say
"Your branch is up to date with 'origin/main'" even when there are changes
that can be fetched from the remote. My proposal: Add the timestamp of
the last fetch to the message. For example:

```
$ git switch main
Switched to branch 'main'
Your branch is up to date with 'origin/main'. Last check was 2 hours ago.
```

It looks like the timestamp of file `.git/FETCH_HEAD` would be enough
to implement it.

--
.. manuq ..

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Usability issue: "Your branch is up to date" 2025-02-03 16:45 Usability issue: "Your branch is up to date" Manuel Quiñones @ 2025-02-03 16:56 ` Junio C Hamano 2025-02-04 0:10 ` Junio C Hamano 0 siblings, 1 reply; 14+ messages in thread From: Junio C Hamano @ 2025-02-03 16:56 UTC (permalink / raw) To: Manuel Quiñones; +Cc: git Manuel Quiñones <manuel.por.aca@gmail.com> writes: > that can be fetched from the remote. My proposal: Add the timestamp of > the last fetch to the message. For example: > > ``` > $ git switch main > Switched to branch 'main' > Your branch is up to date with 'origin/main'. Last check was 2 hours ago. > ``` > > It looks like the timestamp of file `.git/FETCH_HEAD` would be enough > to implement it. Not generally. Your last fetch may not have been about origin/main (e.g., "git fetch origin next"), or it may even have been about a totally different remote (e.g., "git fetch elsewhere"). The timestamp of the last entry of the reflog of origin/main may be a lot better place to look for the information, if available. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Usability issue: "Your branch is up to date" 2025-02-03 16:56 ` Junio C Hamano @ 2025-02-04 0:10 ` Junio C Hamano 2025-02-04 0:28 ` Bram van Oosterhout 2025-02-04 12:38 ` Manuel Quiñones 0 siblings, 2 replies; 14+ messages in thread From: Junio C Hamano @ 2025-02-04 0:10 UTC (permalink / raw) To: Manuel Quiñones; +Cc: git Junio C Hamano <gitster@pobox.com> writes: > Manuel Quiñones <manuel.por.aca@gmail.com> writes: > >> that can be fetched from the remote. My proposal: Add the timestamp of >> the last fetch to the message. For example: >> >> ``` >> $ git switch main >> Switched to branch 'main' >> Your branch is up to date with 'origin/main'. Last check was 2 hours ago. >> ``` >> >> It looks like the timestamp of file `.git/FETCH_HEAD` would be enough >> to implement it. > > Not generally. Your last fetch may not have been about origin/main > (e.g., "git fetch origin next"), or it may even have been about a > totally different remote (e.g., "git fetch elsewhere"). > > The timestamp of the last entry of the reflog of origin/main may be > a lot better place to look for the information, if available. Unfortunately, this is not quite enough. I do not think a "git fetch" that noticed that the remote-tracking branch is up-to-date updates the reflog of the remote-tracking branch, so if you observed that their 'main' is at certain value 10 hours ago, and if your more recent fetch done two hours ago found that they haven't made any progress, the reflog says "You observed that their 'main' is at this commit as of 10 hours ago" and not the number you want. However, as I said, the fetch that touched the FETCH_HEAD file may not have been about the ref in question, so while a two-hour old FETCH_HEAD can guarantee that update of any ref by fetching (including a fetch done as part of "git pull") did not happen in the last two hours, it does not really mean what you have in your remote-tracking branch is not stale from reality by more than two hours. 
You could inspect the contents of FETCH_HEAD to see if the source of
the remote-tracking branch is listed there, and when it appears in
the file, can use the timestamp of the file.  If you did this:

    $ git fetch origin main

and it left something like

    f93ff170b... branch 'main' of https://www.kernel.org/...

in the file, you can reverse map the URL and the branch using the
remote.*.URL and the remote.*.fetch configuration variables to
figure out that it must have been stored at our 'origin/main'.
At that point, you know that the timestamp of FETCH_HEAD would be
when we observed that value in the 'origin/main'.

But even then, because the FETCH_HEAD file is not versioned, if you
did

    $ git fetch elsewhere main

then the file gets overwritten, and you would no longer know when
was the last time you observed the value of 'origin/main'.

In short, there is not enough information kept anywhere to compute
the number you want to show reliably.

^ permalink raw reply [flat|nested] 14+ messages in thread
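The reverse mapping described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper name `tracking_ref_for` and the `remotes` dictionary are hypothetical stand-ins for the `remote.<name>.url` and `remote.<name>.fetch` configuration, the FETCH_HEAD line is assumed to be tab-separated (object name, an optional `not-for-merge` marker, then the description), URLs are compared as exact strings, and only a wildcard refspec of the default shape is handled.

```python
import re
from typing import Dict, Optional

# Assumed FETCH_HEAD layout: "<object-name>\t<'not-for-merge' or empty>\t<description>".
DESCRIPTION = re.compile(r"^branch '(?P<branch>[^']+)' of (?P<url>\S+)$")


def tracking_ref_for(line: str, remotes: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return the remote-tracking ref a FETCH_HEAD line maps to, or None.

    `remotes` is a hypothetical stand-in for the remote.<name>.url and
    remote.<name>.fetch configuration values.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return None
    match = DESCRIPTION.match(fields[2])
    if match is None:
        return None
    full_ref = "refs/heads/" + match["branch"]
    for config in remotes.values():
        # Assumption: the URL in FETCH_HEAD matches the configured URL exactly.
        if config["url"] != match["url"]:
            continue
        src, dst = config["fetch"].lstrip("+").split(":", 1)
        # Only wildcard refspecs such as refs/heads/*:refs/remotes/origin/* are handled.
        if src.endswith("/*") and dst.endswith("/*") and full_ref.startswith(src[:-1]):
            return dst[:-1] + full_ref[len(src) - 1:]
    return None
```

A `None` result means the file says nothing about the ref in question, for instance because a later `git fetch elsewhere main` overwrote it. Only when a line does map to `refs/remotes/origin/main` would the modification time of `.git/FETCH_HEAD` be a meaningful "last checked" value.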
* Re: Usability issue: "Your branch is up to date" 2025-02-04 0:10 ` Junio C Hamano @ 2025-02-04 0:28 ` Bram van Oosterhout [not found] ` <CAPx1GveyP4+yn5NMgvO3JpbOwPRT5=tb9YBx7U1Ufvae7gFnHQ@mail.gmail.com> 2025-02-04 2:08 ` D. Ben Knoble 2025-02-04 12:38 ` Manuel Quiñones 1 sibling, 2 replies; 14+ messages in thread From: Bram van Oosterhout @ 2025-02-04 0:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: Manuel Quiñones, git Ahhhh, this thread explains my confusion when, even though git locally tells me my branch is "up to date", a fetch demonstrates the branch is not up to date. Which begs the question: Why does git say: "Your branch is up to date ..." if at best it can say: "Your branch MIGHT BE up to date with ..."? I have learned not to rely on the message and come to expect (sometimes nasty) surprises when I return to a project after a few months, Bram On Tue, Feb 4, 2025 at 11:11 AM Junio C Hamano <gitster@pobox.com> wrote: > > Junio C Hamano <gitster@pobox.com> writes: > > > Manuel Quiñones <manuel.por.aca@gmail.com> writes: > > > >> that can be fetched from the remote. My proposal: Add the timestamp of > >> the last fetch to the message. For example: > >> > >> ``` > >> $ git switch main > >> Switched to branch 'main' > >> Your branch is up to date with 'origin/main'. Last check was 2 hours ago. > >> ``` > >> > >> It looks like the timestamp of file `.git/FETCH_HEAD` would be enough > >> to implement it. > > > > Not generally. Your last fetch may not have been about origin/main > > (e.g., "git fetch origin next"), or it may even have been about a > > totally different remote (e.g., "git fetch elsewhere"). > > > > The timestamp of the last entry of the reflog of origin/main may be > > a lot better place to look for the information, if available. > > Unfortunately, this is not quite enough. 
> > I do not think a "git fetch" that noticed that the remote-tracking > branch is up-to-date updates the reflog of the remote-tracking > branch, so if you observed that their 'main' is at certain value 10 > hours ago, and if your more recent fetch done two hours ago found > that they haven't made any progress, the reflog says "You observed > that their 'main' is at this commit as of 10 hours ago" and not the > number you want. > > However, as I said, the fetch that touched the FETCH_HEAD file may > not have been about the ref in question, so while a two-hour old > FETCH_HEAD can guarantee that update of any ref by fetching > (including a fetch done as part of "git pull") did not happen in the > last two hours, it does not really mean what you have in your > remote-tracking branch is not stale from reality by more than two > hours. > > You could inspect the contents of FETCH_HEAD to see if the source of > the remote-tracking branch is listed there, and when it appears in > the file, can use the timestamp of the file. If you did this: > > $ git fetch origin main > > and it left something like > > f93ff170b... branch 'main' of https://www.kernel.org/... > > in the file, you can reverse map the URL and the branch using the > remote.*.URL and the remote.*.fetch configuration variables to > figure out that it must have been stored at our 'origin/main'. > At that point, you know that the timestamp of FETCH_HEAD would be > when we observed that value in the 'origin/main'. > > But even then, because the FETCH_HEAD file is not versioned, if you > did > > $ git fetch elsewhere main > > then the file gets overwritten, and you would no longer know when > was the last time you observed the value of 'origin/main'. > > In short, there is not enough information kept anywhere to compute > the number you want to show reliably. > ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <CAPx1GveyP4+yn5NMgvO3JpbOwPRT5=tb9YBx7U1Ufvae7gFnHQ@mail.gmail.com>]
[parent not found: <CAMoUM6LstYx3PJcx-Sz3Dfs-1BxF1uP373MO8+eknbO7j-S01Q@mail.gmail.com>]
* Fwd: Usability issue: "Your branch is up to date" [not found] ` <CAMoUM6LstYx3PJcx-Sz3Dfs-1BxF1uP373MO8+eknbO7j-S01Q@mail.gmail.com> @ 2025-02-04 0:51 ` Bram van Oosterhout 0 siblings, 0 replies; 14+ messages in thread From: Bram van Oosterhout @ 2025-02-04 0:51 UTC (permalink / raw) To: git ---------- Forwarded message --------- From: Bram van Oosterhout <adriaanbram0712@gmail.com> Date: Tue, Feb 4, 2025 at 11:47 AM Subject: Re: Usability issue: "Your branch is up to date" To: Chris Torek <chris.torek@gmail.com> On Tue, Feb 4, 2025 at 11:32 AM Chris Torek <chris.torek@gmail.com> wrote: > > On Mon, Feb 3, 2025 at 4:28 PM Bram van Oosterhout > <adriaanbram0712@gmail.com> wrote: > > Ahhhh, this thread explains my confusion when, even though git locally > > tells me my branch is "up to date", a fetch demonstrates the branch is > > not up to date. > > > > Which begs the question: Why does git say: "Your branch is up to date > > ..." if at best it can say: "Your > > branch MIGHT BE up to date with ..."? > (resend: I perpetuated the reply/reply all mistake) > Perhaps a small wording change is in order, to say "your branch is > up to date as of the most recent information I have from git fetch". Or perhaps: "Your local branch is unchanged since your last fetch from ...". That says that I have not made any changes since I last fetched the branch and suggests there could be changes in the remote branch. Bram ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Usability issue: "Your branch is up to date" 2025-02-04 0:28 ` Bram van Oosterhout [not found] ` <CAPx1GveyP4+yn5NMgvO3JpbOwPRT5=tb9YBx7U1Ufvae7gFnHQ@mail.gmail.com> @ 2025-02-04 2:08 ` D. Ben Knoble 2025-02-04 12:53 ` Manuel Quiñones 2025-02-05 3:55 ` Bram van Oosterhout 1 sibling, 2 replies; 14+ messages in thread From: D. Ben Knoble @ 2025-02-04 2:08 UTC (permalink / raw) To: bram; +Cc: Junio C Hamano, Manuel Quiñones, git On Mon, Feb 3, 2025 at 7:28 PM Bram van Oosterhout <adriaanbram0712@gmail.com> wrote: > > Ahhhh, this thread explains my confusion when, even though git locally > tells me my branch is "up to date", a fetch demonstrates the branch is > not up to date. > > Which begs the question: Why does git say: "Your branch is up to date > ..." if at best it can say: "Your > branch MIGHT BE up to date with ..."? Well, the branch _is_ up to date with your remote-tracking branch [1] origin/main; that doesn't mean the tracking branch is up-to-date with the repository origin's branch main! I find it helpful to break the notion for newcomers early on that origin/main somehow is "equal to" the repository named by origin's main branch. Git (mostly) only communicates with remote repos when you fetch, push, or, pull—in other words (and this bit may be more for Manuel), try to reinforce that things Git knows locally are only local and not inherently tied to other repositories. Learning this distributed lesson proves hard in my experience but explains a lot about the reality of how Git operates. 
Exceptions to the "remote communication" rule I can think of that
probably don't need to clutter things for beginners:
- git-maintenance has pre-fetching as a default task
- git ls-remote lists remote refs by communicating with the remote

> I have learned not to rely on the message and come to expect
> (sometimes nasty) surprises when I return to a project after a few
> months,
>
> Bram

And thus `git fetch [--all]` becomes a part of your typical workflow,
or something like `git pull --rebase [origin [main]]` before pushing.

[1]: https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefremotetrackingbrancharemote-trackingbranch

--
D. Ben Knoble

^ permalink raw reply [flat|nested] 14+ messages in thread
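To make the distinction between the local remote-tracking branch and the remote's own branch concrete, here is a minimal sketch that compares the two. It assumes the text of `git ls-remote origin` has already been captured (one `<object-name><TAB><refname>` pair per line) and that the local value comes from something like `git rev-parse origin/main`; the function names are hypothetical and not part of Git.

```python
from typing import Dict


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Parse captured `git ls-remote` output into {refname: object-name}."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        object_name, refname = line.split("\t", 1)
        refs[refname] = object_name
    return refs


def tracking_is_stale(local_object_name: str, ls_remote_output: str,
                      remote_ref: str = "refs/heads/main") -> bool:
    """True when the remote reports a different value than the local copy."""
    remote_object_name = parse_ls_remote(ls_remote_output).get(remote_ref)
    return remote_object_name is not None and remote_object_name != local_object_name
```

When this returns `True`, "up to date with 'origin/main'" still holds locally, but a fetch would move `origin/main`.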
* Re: Usability issue: "Your branch is up to date" 2025-02-04 2:08 ` D. Ben Knoble @ 2025-02-04 12:53 ` Manuel Quiñones 2025-02-05 3:55 ` Bram van Oosterhout 1 sibling, 0 replies; 14+ messages in thread From: Manuel Quiñones @ 2025-02-04 12:53 UTC (permalink / raw) To: D. Ben Knoble; +Cc: bram, Junio C Hamano, git El lun, 3 feb 2025 a la(s) 11:08 p.m., D. Ben Knoble (ben.knoble@gmail.com) escribió: > > On Mon, Feb 3, 2025 at 7:28 PM Bram van Oosterhout > <adriaanbram0712@gmail.com> wrote: > > > > Ahhhh, this thread explains my confusion when, even though git locally > > tells me my branch is "up to date", a fetch demonstrates the branch is > > not up to date. > > > > Which begs the question: Why does git say: "Your branch is up to date > > ..." if at best it can say: "Your > > branch MIGHT BE up to date with ..."? > > > Well, the branch _is_ up to date with your remote-tracking branch [1] > origin/main; that doesn't mean the tracking branch is up-to-date with > the repository origin's branch main! > > I find it helpful to break the notion for newcomers early on that > origin/main somehow is "equal to" the repository named by origin's > main branch. Git (mostly) only communicates with remote repos when you > fetch, push, or, pull—in other words (and this bit may be more for > Manuel), try to reinforce that things Git knows locally are only local > and not inherently tied to other repositories. Learning this > distributed lesson proves hard in my experience but explains a lot > about the reality of how Git operates. Thanks for the advice Ben. Very good point. I will introduce the difference between the origin's main branch and the remote-tracking branch early in lessons. This is a core part of how Git works. Still I suggest improving the usability for new generations with a timestamp of the remote-tracking branch last update. Hopefully in the future it will be possible! -- .. manuq .. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Usability issue: "Your branch is up to date" 2025-02-04 2:08 ` D. Ben Knoble 2025-02-04 12:53 ` Manuel Quiñones @ 2025-02-05 3:55 ` Bram van Oosterhout 1 sibling, 0 replies; 14+ messages in thread From: Bram van Oosterhout @ 2025-02-05 3:55 UTC (permalink / raw) To: D. Ben Knoble; +Cc: bram, Junio C Hamano, Manuel Quiñones, git On Tue, Feb 4, 2025 at 1:08 PM D. Ben Knoble <ben.knoble@gmail.com> wrote: > > On Mon, Feb 3, 2025 at 7:28 PM Bram van Oosterhout > <adriaanbram0712@gmail.com> wrote: > > > > Ahhhh, this thread explains my confusion when, even though git locally > > tells me my branch is "up to date", a fetch demonstrates the branch is > > not up to date. > > > > Which begs the question: Why does git say: "Your branch is up to date > > ..." if at best it can say: "Your > > branch MIGHT BE up to date with ..."? > > > Well, the branch _is_ up to date with your remote-tracking branch [1] > origin/main; that doesn't mean the tracking branch is up-to-date with > the repository origin's branch main! > > I find it helpful to break the notion for newcomers early on that > origin/main somehow is "equal to" the repository named by origin's > main branch. Git (mostly) only communicates with remote repos when you > fetch, push, or, pull—in other words (and this bit may be more for > Manuel), try to reinforce that things Git knows locally are only local > and not inherently tied to other repositories. Learning this > distributed lesson proves hard in my experience but explains a lot > about the reality of how Git operates. 
> > Exceptions to the "remote communication" rule I can think of that > probably don't need to clutter things for beginners: > - git-maintenance has pre-fetching as a default task > - git ls-remote lists remote refs by communicating with the remote > > > I have learned not to rely on the message and come to expect > > (sometimes nasty) surprises when I return to a project after a few > > months, > > > > Bram > > And thus `git fetch [--all]` because a part of your typical workflow, > or something like `git pull --rebase [origin [main]]` before pushing. Thanks all for the education. I have always read the message "Your branch is up to date with 'origin/main'." as "Your branch is up to date with _main_ at _origin_", with _origin_ being the remote repo. I now understand it says: Your branch is up to date _according to_ the information available at .git/refs/remotes/origin/main. Since that is a local file , I can reasonably expect the info to be stale when I return to my repo after 6 months and I should do a git fetch to assess the situation Thanks again. Bram > > [1]: https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefremotetrackingbrancharemote-trackingbranch > > -- > D. Ben Knoble ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Usability issue: "Your branch is up to date" 2025-02-04 0:10 ` Junio C Hamano 2025-02-04 0:28 ` Bram van Oosterhout @ 2025-02-04 12:38 ` Manuel Quiñones 2025-02-04 17:43 ` Junio C Hamano 1 sibling, 1 reply; 14+ messages in thread From: Manuel Quiñones @ 2025-02-04 12:38 UTC (permalink / raw) To: Junio C Hamano; +Cc: git El lun, 3 feb 2025 a la(s) 9:10 p.m., Junio C Hamano (gitster@pobox.com) escribió: > > Junio C Hamano <gitster@pobox.com> writes: > > > Manuel Quiñones <manuel.por.aca@gmail.com> writes: > > > >> that can be fetched from the remote. My proposal: Add the timestamp of > >> the last fetch to the message. For example: > >> > >> ``` > >> $ git switch main > >> Switched to branch 'main' > >> Your branch is up to date with 'origin/main'. Last check was 2 hours ago. > >> ``` > >> > >> It looks like the timestamp of file `.git/FETCH_HEAD` would be enough > >> to implement it. > > > > Not generally. Your last fetch may not have been about origin/main > > (e.g., "git fetch origin next"), or it may even have been about a > > totally different remote (e.g., "git fetch elsewhere"). > > > > The timestamp of the last entry of the reflog of origin/main may be > > a lot better place to look for the information, if available. > > Unfortunately, this is not quite enough. > > I do not think a "git fetch" that noticed that the remote-tracking > branch is up-to-date updates the reflog of the remote-tracking > branch, so if you observed that their 'main' is at certain value 10 > hours ago, and if your more recent fetch done two hours ago found > that they haven't made any progress, the reflog says "You observed > that their 'main' is at this commit as of 10 hours ago" and not the > number you want. 
> > However, as I said, the fetch that touched the FETCH_HEAD file may > not have been about the ref in question, so while a two-hour old > FETCH_HEAD can guarantee that update of any ref by fetching > (including a fetch done as part of "git pull") did not happen in the > last two hours, it does not really mean what you have in your > remote-tracking branch is not stale from reality by more than two > hours. > > You could inspect the contents of FETCH_HEAD to see if the source of > the remote-tracking branch is listed there, and when it appears in > the file, can use the timestamp of the file. If you did this: > > $ git fetch origin main > > and it left something like > > f93ff170b... branch 'main' of https://www.kernel.org/... > > in the file, you can reverse map the URL and the branch using the > remote.*.URL and the remote.*.fetch configuration variables to > figure out that it must have been stored at our 'origin/main'. > At that point, you know that the timestamp of FETCH_HEAD would be > when we observed that value in the 'origin/main'. > > But even then, because the FETCH_HEAD file is not versioned, if you > did > > $ git fetch elsewhere main > > then the file gets overwritten, and you would no longer know when > was the last time you observed the value of 'origin/main'. > > In short, there is not enough information kept anywhere to compute > the number you want to show reliably. Thanks for the insightful explanation Junio! Looking forward, do you think that it could be possible to record the timestamp that the remote-tracking branch has been updated with the remote branch? In order to make such information available to the end user. -- .. manuq .. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Usability issue: "Your branch is up to date" 2025-02-04 12:38 ` Manuel Quiñones @ 2025-02-04 17:43 ` Junio C Hamano 2025-02-05 6:54 ` Patrick Steinhardt 0 siblings, 1 reply; 14+ messages in thread From: Junio C Hamano @ 2025-02-04 17:43 UTC (permalink / raw) To: Manuel Quiñones; +Cc: git Manuel Quiñones <manuel.por.aca@gmail.com> writes: > Thanks for the insightful explanation Junio! Looking forward, do you > think that it could be possible to record the timestamp that the > remote-tracking branch has been updated with the remote branch? In > order to make such information available to the end user. The time at which each remote-tracking branch was updated is already recorded in the reflog. What is missing is the timestamp that a fetch checked if a remote-tracking branch needs updating, found that the branch at the remote hasn't changed, and did not update the remote-tracking branch. You'd need to first design where to store that information and how. It does not have to be in the reflog, but as a thought experiment, let's take how the design would go if we decided to use reflog to store that information. What a reflog entry records, in textual form, looks like <old-object-name> <new-object-name> <user-ident> <timestamp> <comment> We can imagine adding a new reflog entry whenever "git fetch" finds that the branch at the remote hasn't been updated, with the same value in <old-object-name> and <new-object-name>. A reflog file I randomly picked as a sample is ~5k long with 34 entries (it keeps track of my fetching from and pushing to https://git.kernel.org/pub/scm/git/git.git/#master), so a reflog costs around 150 bytes per entry, and if you fetch once every hour that would be like ~3k per branch per day. 
While that is a trivial and insignificant number from a storage cost
point of view, if you are monitoring the progress of the remote with
"git reflog origin/main", I suspect that such a change would make it
unusably noisy, so the "git reflog" command may need to grow an option
that tells it to skip these no-op entries.

As to the required change to "git fetch", this may be a bit tricky.

IIRC (I am writing from memory without looking at the code),
when you say "git fetch [<remote> [<refspec>...]]", what it does is
roughly to:

 - figure out what <remote> and <refspec>... to use from the
   configuration, if omitted on the command line.

 - connect to the remote, and ask the current value of their refs.

 - drop any refspec <src>:<dst> whose <dst> side already has the
   value the remote has.

 - drive the object transfer machinery to receive the pack data from
   the remote and store it locally.

 - update the remote-tracking branches.

And the last step is where the remote-tracking branches are updated,
together with their reflog (if enabled).  Because that step does not
even see the remote-tracking branches whose values do not need to
change (filtered out earlier to help reduce the number of refs fed
to the object transfer machinery), the "drop no-op early" part needs
to be designed differently (e.g. mark them as no-op, so that the
object transfer machinery can notice them and ignore them) and then
the "update refs" step can see these no-op updates.

I do not think writing the "no-op" reflog entries should be done at
a step separate from the step that writes the real ref updates, as I
suspect that such a separate update scheme would have funny
interactions with "git fetch --atomic".

So, do I think it could be possible?  Sure.  Do I think it would be
too hard as a rocket surgery?  No.  Will I jump up and down excited
and start coding?  I am not interested all that much, but I can help
reviewing patches if somebody else works on it.
There may be some other downsides (other than the cost of storage and making the reflog noisy) I haven't thought about, which need to be considered if somebody decides to work on this. Thanks. ^ permalink raw reply [flat|nested] 14+ messages in thread
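The thought experiment above can be sketched as follows. The parser assumes the on-disk reflog line layout of the "files" backend (old and new object names, identity, timestamp, timezone, then a tab and the comment). The no-op entries with identical old and new object names are the hypothetical addition discussed in the message, not something current Git writes, and all function names are illustrative.

```python
from typing import List, NamedTuple, Optional


class ReflogEntry(NamedTuple):
    old: str
    new: str
    ident: str
    timestamp: int
    comment: str


def parse_entry(line: str) -> ReflogEntry:
    """Parse '<old> <new> <ident> <timestamp> <tz>TAB<comment>'."""
    head, _, comment = line.rstrip("\n").partition("\t")
    rest, timestamp, _tz = head.rsplit(" ", 2)
    old, new, ident = rest.split(" ", 2)
    return ReflogEntry(old, new, ident, int(timestamp), comment)


def is_noop(entry: ReflogEntry) -> bool:
    """A hypothetical 'checked, nothing changed' entry has old == new."""
    return entry.old == entry.new


def last_checked(entries: List[ReflogEntry]) -> Optional[int]:
    """Most recent time any fetch looked at the ref, no-ops included."""
    return max((e.timestamp for e in entries), default=None)


def last_changed(entries: List[ReflogEntry]) -> Optional[int]:
    """Most recent real update, which is what the reflog records today."""
    return max((e.timestamp for e in entries if not is_noop(e)), default=None)
```

A "skip no-op entries" option for `git reflog` would amount to filtering with `is_noop`. On the storage estimate: about 5k over 34 entries is roughly 150 bytes per entry, so 24 hourly no-op entries come to roughly 3.6k per branch per day, the same order as the figure quoted above.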
* Re: Usability issue: "Your branch is up to date"
  2025-02-04 17:43 ` Junio C Hamano
@ 2025-02-05 6:54 ` Patrick Steinhardt
  2025-02-05 18:40 ` Junio C Hamano
  0 siblings, 1 reply; 14+ messages in thread
From: Patrick Steinhardt @ 2025-02-05 6:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Manuel Quiñones, git

On Tue, Feb 04, 2025 at 09:43:10AM -0800, Junio C Hamano wrote:
> And the last step is where the remote-tracking branches are updated,
> together with their reflog (if enabled). Because that step does not
> even see the remote-tracking branches whose value do not need to
> change (filtered out earlier to help reduce the number of refs fed
> to the object transfer machinery), the "drop no-op early" part need
> to be designed differently (e.g. mark them as no-op, so that the
> object tranfer machinery can notice them and ignore) and then the
> "update refs" step can see these no-op updates.
>
> I do not think writing the "no-op" reflog entries should be done at
> a step separate from the step that writes the real ref updates, as I
> suspect that such a separate update scheme would have a funny
> interactions with "git fetch --atomic".
>
> So, do I think it could be possible? Sure. Do I think it would be
> too hard as a rocket surgery? No. Will I jump up and down excited
> and start coding? I am not interested all that much, but I can help
> reviewing patches if somebody else works on it.
>
> There may be some other downsides (other than the cost of storage
> and making the reflog noisy) I haven't thought about, which need to
> be considered if somebody decides to work on this.

One thing to consider is that some remotes tend to have many thousands
or even hundreds of thousands of references. Updating timestamps for all
of them could be quite inefficient depending on where exactly that data
is stored. If it was in the form of no-op reflog entries, the "files"
backend would have to touch as many files as the remote has references.
Consequently, even if only a single remote ref changed, we'd potentially have to update metadata on hundreds of thousands of files. So I'm not sure whether such a schema would scale well enough in the general case for large repos. Patrick ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Usability issue: "Your branch is up to date" 2025-02-05 6:54 ` Patrick Steinhardt @ 2025-02-05 18:40 ` Junio C Hamano 2025-02-06 9:53 ` Patrick Steinhardt 0 siblings, 1 reply; 14+ messages in thread From: Junio C Hamano @ 2025-02-05 18:40 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: Manuel Quiñones, git Patrick Steinhardt <ps@pks.im> writes: > One thing to consider is that some remotes tend to have many thousands > or even hundreds of thousands of references. Updating timestamps for all > of them could be quite inefficient depending on where exactly that data > is store. If it was in the form of no-op reflog entries, the "files" > backend would have to touch as many files as the remote has references. > Consequently, even if only a single remote ref changed, we'd potentially > have to update metadata on hundreds of thousands of files. > > So I'm not sure whether such a schema would scale well enough in the > general case for large repos. I actually view that as quite an orthogonal issue. Recording the fact that you checked the state of thousands of refs at the remote and found them unchanged is probably a very small part of a larger problem that checking the state of thousands of refs is already expensive. People have solved it at the protocol level to limit the ref advertisement to only the relevant refs (as opposed to the original protocol where the server end unconditionally advertises the state of all of its refs at the beginning of the conversation), so when you are only pulling a single branch from there, you do not even observe the state of other unrelated refs (like other branches or pull/*/ hierarchy), hence you would not create these no-op reflog entries. If the user, on the other hand, is interested in keeping track of all these thousands of refs, "git fetch" would have to ask and receive advertisement for all these thousands of refs anyway, and at that point, recording the no-op update would be a very small part of the problem, I suspect. 
Besides, we have reftable that would make this kind of problem easier to solve, no? ;-) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Usability issue: "Your branch is up to date" 2025-02-05 18:40 ` Junio C Hamano @ 2025-02-06 9:53 ` Patrick Steinhardt 2025-02-07 8:20 ` Karthik Nayak 0 siblings, 1 reply; 14+ messages in thread From: Patrick Steinhardt @ 2025-02-06 9:53 UTC (permalink / raw) To: Junio C Hamano; +Cc: Manuel Quiñones, git, Karthik Nayak On Wed, Feb 05, 2025 at 10:40:41AM -0800, Junio C Hamano wrote: > If the user, on the other hand, is interested in keeping track of > all these thousands of refs, "git fetch" would have to ask and > receive advertisement for all these thousands of refs anyway, and > at that point, recording the no-op update would be a very small > part of the problem, I suspect. Besides, we have reftable that > would make this kind of problem easier to solve, no? ;-) Yeah, I was pondering whether to bring up reftables or not :) But indeed, with them it would be way more efficient, at least assuming that we write everything in a single transaction and not via multiple transactions. Which we generally don't in git-fetch(1) unless the user asks for `--atomic` because we allow for a subset of the updates to fail. Consequently, even with reftables we'd end up writing N separate updates, where N is the number of advertised refs. This is a known problem that we actually plan to fix. Karthik is working on support for "partial" transactions, where it is allowed that a subset of ref updates fails without impacting other refs where the update would succeed. With this in place we could then refactor git-fetch(1) to write the update with a single transaction, only, even in the non-atomic case. Patrick ^ permalink raw reply [flat|nested] 14+ messages in thread
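The difference between an all-or-nothing transaction and the "partial" transactions mentioned above can be illustrated with a toy model. This is not Git's ref API: the function `apply_updates`, the dictionary-based ref store, and the `allow_partial` flag are all hypothetical and exist only to show the semantics of one batch in which some updates may be rejected.

```python
from typing import Dict, List, Optional, Tuple

# One update: (refname, expected old value, new value).
Update = Tuple[str, Optional[str], str]


def apply_updates(refs: Dict[str, str], updates: List[Update],
                  allow_partial: bool) -> Tuple[Dict[str, str], List[str]]:
    """Apply a batch of ref updates in one pass over a toy ref store.

    Returns the resulting refs and the names of rejected updates.
    With allow_partial=False, a single rejection discards the whole batch.
    """
    staged = dict(refs)
    rejected = []
    for name, expected_old, new in updates:
        if refs.get(name) != expected_old:
            rejected.append(name)  # stale expectation: this update fails
            continue
        staged[name] = new
    if rejected and not allow_partial:
        return dict(refs), rejected  # all-or-nothing: nothing is written
    return staged, rejected
```

In this model the non-atomic fetch behaviour (some refs may fail while others succeed) is still expressed as a single batch, which is the property that would let a backend write it in one go.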
* Re: Usability issue: "Your branch is up to date" 2025-02-06 9:53 ` Patrick Steinhardt @ 2025-02-07 8:20 ` Karthik Nayak 0 siblings, 0 replies; 14+ messages in thread From: Karthik Nayak @ 2025-02-07 8:20 UTC (permalink / raw) To: Patrick Steinhardt, Junio C Hamano; +Cc: Manuel Quiñones, git [-- Attachment #1: Type: text/plain, Size: 1645 bytes --] Patrick Steinhardt <ps@pks.im> writes: > On Wed, Feb 05, 2025 at 10:40:41AM -0800, Junio C Hamano wrote: >> If the user, on the other hand, is interested in keeping track of >> all these thousands of refs, "git fetch" would have to ask and >> receive advertisement for all these thousands of refs anyway, and >> at that point, recording the no-op update would be a very small >> part of the problem, I suspect. Besides, we have reftable that >> would make this kind of problem easier to solve, no? ;-) > > Yeah, I was pondering whether to bring up reftables or not :) But > indeed, with them it would be way more efficient, at least assuming that > we write everything in a single transaction and not via multiple > transactions. Which we generally don't in git-fetch(1) unless the user > asks for `--atomic` because we allow for a subset of the updates to > fail. Consequently, even with reftables we'd end up writing N separate > updates, where N is the number of advertised refs. > > This is a known problem that we actually plan to fix. Karthik is working > on support for "partial" transactions, where it is allowed that a subset > of ref updates fails without impacting other refs where the update would > succeed. With this in place we could then refactor git-fetch(1) to write > the update with a single transaction, only, even in the non-atomic case. > You've played my hand here, I've posted the series now [1] and agree with everything you've said here. It should really help with optimizing reftables. 
[1]: https://lore.kernel.org/git/20250207-245-partially-atomic-ref-updates-v1-0-e6a3690ff23a@gmail.com/T/#t Thanks > Patrick [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 690 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread