git.vger.kernel.org archive mirror
* Usability issue: "Your branch is up to date"
From: Manuel Quiñones @ 2025-02-03 16:45 UTC
  To: git

Hi,
I've been teaching Git to a group of young learners lately. They find
it odd that commands like `git status` or `git switch main` say "Your
branch is up to date with 'origin/main'" even when there are changes
that can be fetched from the remote. My proposal: Add the timestamp of
the last fetch to the message. For example:

```
$ git switch main
Switched to branch 'main'
Your branch is up to date with 'origin/main'. Last check was 2 hours ago.
```

It looks like the timestamp of file `.git/FETCH_HEAD` would be enough
to implement it.
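For illustration only, here is a minimal Python sketch of the naive approach proposed above. It assumes a non-bare repository whose Git directory contains `FETCH_HEAD`. The helpers `humanize_age` and `fetch_head_age` are hypothetical and not part of Git. As the replies in this thread point out, the file's modification time only tells you when some fetch last wrote it. It does not tell you when 'origin/main' itself was last checked.

```python
import os
import time
from typing import Optional


def humanize_age(seconds: float) -> str:
    """Render an age in seconds as a coarse phrase such as '2 hours ago'."""
    seconds = int(seconds)
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds >= size:
            n = seconds // size
            return f"{n} {unit}{'' if n == 1 else 's'} ago"
    return "just now"


def fetch_head_age(git_dir: str, now: Optional[float] = None) -> Optional[str]:
    """Describe how long ago FETCH_HEAD was written, or None if it is missing.

    Caveat: the mtime reflects the most recent fetch of *any* ref from *any*
    remote, so it cannot tell whether origin/main itself was checked.
    """
    try:
        mtime = os.path.getmtime(os.path.join(git_dir, "FETCH_HEAD"))
    except FileNotFoundError:
        return None
    if now is None:
        now = time.time()
    return humanize_age(now - mtime)
```

For example, `fetch_head_age(".git")` could supply the "2 hours ago" part of the proposed message.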


-- 
.. manuq ..

* Re: Usability issue: "Your branch is up to date"
From: Junio C Hamano @ 2025-02-03 16:56 UTC
  To: Manuel Quiñones; +Cc: git

Manuel Quiñones <manuel.por.aca@gmail.com> writes:

> that can be fetched from the remote. My proposal: Add the timestamp of
> the last fetch to the message. For example:
>
> ```
> $ git switch main
> Switched to branch 'main'
> Your branch is up to date with 'origin/main'. Last check was 2 hours ago.
> ```
>
> It looks like the timestamp of file `.git/FETCH_HEAD` would be enough
> to implement it.

Not generally.  Your last fetch may not have been about origin/main
(e.g., "git fetch origin next"), or it may even have been about a
totally different remote (e.g., "git fetch elsewhere").

The timestamp of the last entry of the reflog of origin/main may be
a lot better place to look for the information, if available.
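To make the reflog-based alternative concrete, here is a sketch that reads the timestamp of the newest entry in a remote-tracking branch's reflog. It assumes the "files" ref backend. With that backend, the reflog for `refs/remotes/origin/main` is a text file at `logs/refs/remotes/origin/main` under the Git directory, and each line has the form `<old> <new> <ident> <timestamp> <tz><TAB><message>`. Repositories using reftable store reflogs differently. `parse_reflog_line` and `last_reflog_time` are hypothetical helpers.

```python
import os
from typing import Optional, Tuple


def parse_reflog_line(line: str) -> Tuple[str, str, str, int, str, str]:
    """Split one reflog line into (old, new, ident, timestamp, tz, message)."""
    head, _, message = line.rstrip("\n").partition("\t")
    old, new, rest = head.split(" ", 2)
    # The identity ("Name <email>") may contain spaces, so split from the right.
    ident, timestamp, tz = rest.rsplit(" ", 2)
    return old, new, ident, int(timestamp), tz, message


def last_reflog_time(git_dir: str, ref: str) -> Optional[int]:
    """Unix timestamp of the newest reflog entry for `ref`, or None."""
    path = os.path.join(git_dir, "logs", *ref.split("/"))
    try:
        with open(path, encoding="utf-8") as f:
            lines = [line for line in f if line.strip()]
    except FileNotFoundError:
        return None
    return parse_reflog_line(lines[-1])[3] if lines else None
```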

* Re: Usability issue: "Your branch is up to date"
From: Junio C Hamano @ 2025-02-04  0:10 UTC
  To: Manuel Quiñones; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Manuel Quiñones <manuel.por.aca@gmail.com> writes:
>
>> that can be fetched from the remote. My proposal: Add the timestamp of
>> the last fetch to the message. For example:
>>
>> ```
>> $ git switch main
>> Switched to branch 'main'
>> Your branch is up to date with 'origin/main'. Last check was 2 hours ago.
>> ```
>>
>> It looks like the timestamp of file `.git/FETCH_HEAD` would be enough
>> to implement it.
>
> Not generally.  Your last fetch may not have been about origin/main
> (e.g., "git fetch origin next"), or it may even have been about a
> totally different remote (e.g., "git fetch elsewhere").
>
> The timestamp of the last entry of the reflog of origin/main may be
> a lot better place to look for the information, if available.

Unfortunately, this is not quite enough.

I do not think a "git fetch" that notices that the remote-tracking
branch is already up to date updates the reflog of the remote-tracking
branch.  So if you observed that their 'main' was at a certain value 10
hours ago, and a more recent fetch two hours ago found that they
haven't made any progress, the reflog says "you observed that their
'main' was at this commit as of 10 hours ago", which is not the
number you want.
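The gap described above can be modelled in a few lines. This is a toy simulation, not Git's code. The hypothetical `simulate_fetch` appends a reflog entry only when the fetched value differs from the current one, which is the behaviour assumed in the paragraph above. As a result, an unchanged fetch two hours ago leaves the newest timestamp at ten hours ago.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

HOUR = 3600


@dataclass
class TrackingRef:
    """Toy remote-tracking ref with a reflog of (timestamp, old, new) tuples."""
    value: str = ""
    reflog: List[Tuple[int, str, str]] = field(default_factory=list)


def simulate_fetch(ref: TrackingRef, remote_value: str, now: int) -> None:
    # Only a real change is recorded; a no-op fetch leaves no trace.
    if remote_value != ref.value:
        ref.reflog.append((now, ref.value, remote_value))
        ref.value = remote_value


now = 100 * HOUR
ref = TrackingRef()
simulate_fetch(ref, "f93ff170b", now - 10 * HOUR)  # observed 10 hours ago
simulate_fetch(ref, "f93ff170b", now - 2 * HOUR)   # unchanged 2 hours ago
newest = ref.reflog[-1][0]
print(f"reflog says: {(now - newest) // HOUR} hours ago")  # prints 10, not 2
```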

However, as I said, the fetch that touched the FETCH_HEAD file may
not have been about the ref in question.  A two-hour-old FETCH_HEAD
does guarantee that no ref has been updated by fetching (including a
fetch done as part of "git pull") in the last two hours, but it does
not mean that what you have in your remote-tracking branch is at most
two hours behind reality.

You could inspect the contents of FETCH_HEAD to see whether the
source of the remote-tracking branch is listed there, and if it is,
use the timestamp of the file.  If you did this:

    $ git fetch origin main

and it left something like

	f93ff170b... branch 'main' of https://www.kernel.org/...

in the file, you can reverse-map the URL and the branch using the
remote.*.url and remote.*.fetch configuration variables to figure
out that it must have been stored in our 'origin/main'.  At that
point, you know that the timestamp of FETCH_HEAD is when we observed
that value of 'origin/main'.
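Here is a sketch of that reverse mapping. It handles only the `branch '<name>' of <url>` form of a FETCH_HEAD line (`<oid><TAB><flag><TAB><description>`, where the flag is empty or `not-for-merge`) and only simple refspecs with at most one `*`. The `remote.<name>.url` and `remote.<name>.fetch` values are passed in as a plain dict instead of being read from `git config`. One assumption is that FETCH_HEAD records the URL with trailing slashes and a trailing `.git` trimmed, so both sides are normalized before comparing. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def parse_fetch_head_line(line: str) -> Tuple[str, bool, str, str]:
    """Return (oid, for_merge, branch, url) for a "branch '<b>' of <url>" line."""
    oid, flag, desc = line.rstrip("\n").split("\t", 2)
    prefix = "branch '"
    if not desc.startswith(prefix):
        raise ValueError(f"unsupported FETCH_HEAD description: {desc!r}")
    branch, sep, url = desc[len(prefix):].partition("' of ")
    if not sep:
        raise ValueError(f"malformed description: {desc!r}")
    return oid, flag != "not-for-merge", branch, url


def normalize_url(url: str) -> str:
    # Assumption: compare URLs without trailing "/" and ".git".
    url = url.rstrip("/")
    return url[:-4] if url.endswith(".git") else url


def map_refspec(src_ref: str, refspec: str) -> Optional[str]:
    """Apply one fetch refspec such as "+refs/heads/*:refs/remotes/origin/*"."""
    src, _, dst = refspec.lstrip("+").partition(":")
    if "*" not in src:
        return dst if src == src_ref else None
    head, _, tail = src.partition("*")
    if (src_ref.startswith(head) and src_ref.endswith(tail)
            and len(src_ref) >= len(head) + len(tail)):
        middle = src_ref[len(head):len(src_ref) - len(tail)]
        return dst.replace("*", middle, 1)
    return None


def tracking_ref_for(line: str,
                     remotes: Dict[str, Tuple[str, List[str]]]) -> Optional[str]:
    """Map a FETCH_HEAD line to the remote-tracking ref it was stored in.

    `remotes` maps a remote name to (remote.<name>.url, [remote.<name>.fetch]).
    """
    _, _, branch, url = parse_fetch_head_line(line)
    for conf_url, refspecs in remotes.values():
        if normalize_url(conf_url) != normalize_url(url):
            continue
        for spec in refspecs:
            dst = map_refspec(f"refs/heads/{branch}", spec)
            if dst:
                return dst
    return None
```

Once a line maps to `refs/remotes/origin/main`, the file's timestamp is when that value was observed. That holds only until a later fetch overwrites the file.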

But even then, because the FETCH_HEAD file is not versioned, if you
did

    $ git fetch elsewhere main

then the file gets overwritten, and you would no longer know when
you last observed the value of 'origin/main'.

In short, there is not enough information kept anywhere to compute
the number you want to show reliably.

* Re: Usability issue: "Your branch is up to date"
From: Bram van Oosterhout @ 2025-02-04  0:28 UTC
  To: Junio C Hamano; +Cc: Manuel Quiñones, git

Ahhhh, this thread explains my confusion when, even though git locally
tells me my branch is "up to date", a fetch demonstrates the branch is
not up to date.

Which begs the question: why does git say "Your branch is up to date
..." if at best it can say "Your branch MIGHT BE up to date with ..."?

I have learned not to rely on the message and come to expect
(sometimes nasty) surprises when I return to a project after a few
months.

Bram

* Fwd: Usability issue: "Your branch is up to date"
From: Bram van Oosterhout @ 2025-02-04  0:51 UTC
  To: git

---------- Forwarded message ---------
From: Bram van Oosterhout <adriaanbram0712@gmail.com>
Date: Tue, Feb 4, 2025 at 11:47 AM
Subject: Re: Usability issue: "Your branch is up to date"
To: Chris Torek <chris.torek@gmail.com>


On Tue, Feb 4, 2025 at 11:32 AM Chris Torek <chris.torek@gmail.com> wrote:
>
> On Mon, Feb 3, 2025 at 4:28 PM Bram van Oosterhout
> <adriaanbram0712@gmail.com> wrote:
> > Ahhhh, this thread explains my confusion when, even though git locally
> > tells me my branch is "up to date", a fetch demonstrates the branch is
> > not up to date.
> >
> > Which begs the question: Why does git say: "Your branch is up to date
> > ..." if at best it can say: "Your
> > branch MIGHT BE up to date with ..."?
>

(resend: I perpetuated the reply/reply all mistake)
> Perhaps a small wording change is in order, to say "your branch is
> up to date as of the most recent information I have from git fetch".

Or perhaps: "Your local branch is unchanged since your last fetch from ...".
That says that I have not made any changes since I last fetched the
branch and suggests there could be changes in the remote branch.

Bram

* Re: Usability issue: "Your branch is up to date"
From: D. Ben Knoble @ 2025-02-04  2:08 UTC
  To: bram; +Cc: Junio C Hamano, Manuel Quiñones, git

On Mon, Feb 3, 2025 at 7:28 PM Bram van Oosterhout
<adriaanbram0712@gmail.com> wrote:
>
> Ahhhh, this thread explains my confusion when, even though git locally
> tells me my branch is "up to date", a fetch demonstrates the branch is
> not up to date.
>
> Which begs the question: Why does git say: "Your branch is up to date
> ..." if at best it can say: "Your
> branch MIGHT BE up to date with ..."?


Well, the branch _is_ up to date with your remote-tracking branch [1]
origin/main; that doesn't mean the tracking branch is up-to-date with
the repository origin's branch main!

I find it helpful to dispel early on the newcomer's notion that
origin/main is somehow "equal to" the main branch of the repository
named by origin. Git (mostly) only communicates with remote repos when
you fetch, push, or pull. In other words (and this bit may be more for
Manuel), try to reinforce that things Git knows locally are only local
and not inherently tied to other repositories. Learning this
distributed lesson proves hard in my experience, but it explains a lot
about the reality of how Git operates.
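This point can be made concrete. The "up to date" verdict comes from comparing two commits that are both already in the local repository. The toy sketch below represents the commit graph as a plain dict from commit to parents, not Git's own data structures. The hypothetical `ahead_behind` counts commits reachable from one tip but not the other, which is the same idea behind "ahead N, behind M". Nothing in it contacts the remote.

```python
from typing import Dict, List, Set, Tuple


def reachable(graph: Dict[str, List[str]], tip: str) -> Set[str]:
    """All commits reachable from `tip` by following parent links."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def ahead_behind(graph: Dict[str, List[str]], local: str,
                 upstream: str) -> Tuple[int, int]:
    mine, theirs = reachable(graph, local), reachable(graph, upstream)
    return len(mine - theirs), len(theirs - mine)


# Local history only: origin/main is whatever the last fetch stored (B).
graph = {"A": [], "B": ["A"]}
print(ahead_behind(graph, local="B", upstream="B"))  # (0, 0): "up to date"
```

Suppose the real origin has since gained a commit C on top of B. This still prints (0, 0) until a fetch brings C in and moves origin/main.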

Exceptions to the "remote communication" rule I can think of that
probably don't need to clutter things for beginners:
- git-maintenance has pre-fetching as a default task
- git ls-remote lists remote refs by communicating with the remote
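As an example of the ls-remote exception, here is a sketch that asks the remote for a branch's current value and compares it with the local remote-tracking ref. `git ls-remote` and `git rev-parse` are real commands. `remote_tip` and `tracking_is_stale` are hypothetical helpers. The sketch assumes the default refspec, so the remote's `refs/heads/<branch>` is stored at `refs/remotes/<remote>/<branch>`. The `run` parameter exists only so the git calls can be replaced in tests.

```python
import subprocess
from typing import Callable, List, Optional

Runner = Callable[[List[str]], str]


def run_git(args: List[str]) -> str:
    """Run a git command and return its stdout (raises on failure)."""
    return subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True).stdout


def remote_tip(remote: str, branch: str, run: Runner = run_git) -> Optional[str]:
    """Ask the remote (over the network) for refs/heads/<branch>."""
    ref = f"refs/heads/{branch}"
    for line in run(["ls-remote", remote, ref]).splitlines():
        oid, _, name = line.partition("\t")
        if name == ref:
            return oid
    return None


def tracking_is_stale(remote: str, branch: str, run: Runner = run_git) -> bool:
    """True if refs/remotes/<remote>/<branch> differs from the remote's tip
    (including when the branch no longer exists on the remote)."""
    local = run(["rev-parse", f"refs/remotes/{remote}/{branch}"]).strip()
    return remote_tip(remote, branch, run) != local
```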

> I have learned not to rely on the message and come to expect
> (sometimes nasty) surprises when I return to a project after a few
> months,
>
> Bram

And thus `git fetch [--all]` becomes a part of your typical workflow,
or something like `git pull --rebase [origin [main]]` before pushing.

[1]: https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefremotetrackingbrancharemote-trackingbranch

-- 
D. Ben Knoble

* Re: Usability issue: "Your branch is up to date"
From: Manuel Quiñones @ 2025-02-04 12:38 UTC
  To: Junio C Hamano; +Cc: git

On Mon, Feb 3, 2025 at 9:10 p.m., Junio C Hamano
(gitster@pobox.com) wrote:
> In short, there is not enough information kept anywhere to compute
> the number you want to show reliably.

Thanks for the insightful explanation, Junio! Looking forward, do you
think it could be possible to record the time at which the
remote-tracking branch was last synchronized with the remote branch,
in order to make that information available to the end user?

-- 
.. manuq ..

* Re: Usability issue: "Your branch is up to date"
From: Manuel Quiñones @ 2025-02-04 12:53 UTC
  To: D. Ben Knoble; +Cc: bram, Junio C Hamano, git

On Mon, Feb 3, 2025 at 11:08 p.m., D. Ben Knoble
(ben.knoble@gmail.com) wrote:
>
> On Mon, Feb 3, 2025 at 7:28 PM Bram van Oosterhout
> <adriaanbram0712@gmail.com> wrote:
> >
> > Ahhhh, this thread explains my confusion when, even though git locally
> > tells me my branch is "up to date", a fetch demonstrates the branch is
> > not up to date.
> >
> > Which begs the question: Why does git say: "Your branch is up to date
> > ..." if at best it can say: "Your
> > branch MIGHT BE up to date with ..."?
>
>
> Well, the branch _is_ up to date with your remote-tracking branch [1]
> origin/main; that doesn't mean the tracking branch is up-to-date with
> the repository origin's branch main!
>
> I find it helpful to break the notion for newcomers early on that
> origin/main somehow is "equal to" the repository named by origin's
> main branch. Git (mostly) only communicates with remote repos when you
> fetch, push, or, pull—in other words (and this bit may be more for
> Manuel), try to reinforce that things Git knows locally are only local
> and not inherently tied to other repositories. Learning this
> distributed lesson proves hard in my experience but explains a lot
> about the reality of how Git operates.

Thanks for the advice Ben. Very good point. I will introduce the
difference between the origin's main branch and the remote-tracking
branch early in lessons. This is a core part of how Git works.

Still, I would suggest improving usability for new generations by
showing when the remote-tracking branch was last updated. Hopefully
that will be possible in the future!

--
.. manuq ..

* Re: Usability issue: "Your branch is up to date"
From: Junio C Hamano @ 2025-02-04 17:43 UTC
  To: Manuel Quiñones; +Cc: git

Manuel Quiñones <manuel.por.aca@gmail.com> writes:

> Thanks for the insightful explanation Junio! Looking forward, do you
> think that it could be possible to record the timestamp that the
> remote-tracking branch has been updated with the remote branch? In
> order to make such information available to the end user.

The time at which each remote-tracking branch was updated is already
recorded in the reflog.  What is missing is the timestamp that a
fetch checked if a remote-tracking branch needs updating, found that
the branch at the remote hasn't changed, and did not update the
remote-tracking branch.

You'd need to first design where to store that information and how.

It does not have to be in the reflog, but as a thought experiment,
let's see how the design would go if we decided to use the reflog to
store that information.

What a reflog entry records, in textual form, looks like

<old-object-name> <new-object-name> <user-ident> <timestamp> <comment>

We can imagine adding a new reflog entry whenever "git fetch" finds
that the branch at the remote hasn't been updated, with the same
value in <old-object-name> and <new-object-name>.
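Here is a sketch of what such a no-op entry could look like, following the textual layout above: old and new object names, identity, Unix timestamp and timezone offset, then a tab and the comment. `format_reflog_entry`, `noop_entry` and the comment text "fetch: no change" are hypothetical. Git does not write such entries today.

```python
def format_reflog_entry(old: str, new: str, ident: str, timestamp: int,
                        tz: str, comment: str) -> str:
    """One reflog line: object names, identity, time, tz, tab, comment."""
    return f"{old} {new} {ident} {timestamp} {tz}\t{comment}\n"


def noop_entry(oid: str, ident: str, timestamp: int, tz: str) -> str:
    """Hypothetical 'checked, unchanged' entry: same object name on both sides."""
    return format_reflog_entry(oid, oid, ident, timestamp, tz, "fetch: no change")
```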

A reflog file I randomly picked as a sample is ~5k long with 34
entries (it keeps track of my fetching from and pushing to
https://git.kernel.org/pub/scm/git/git.git/#master), so a reflog
costs around 150 bytes per entry, and if you fetch once every hour
that would be like ~3k per branch per day.
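Here is the back-of-the-envelope estimate above, spelled out. The inputs (a ~5k file with 34 entries, one fetch per hour) come from the text. The result is only as precise as those rounded figures, and it lands at a few kilobytes per branch per day.

```python
def reflog_growth_per_day(file_bytes: int, entries: int,
                          fetches_per_day: int) -> float:
    """Bytes added per branch per day if every fetch writes one entry."""
    bytes_per_entry = file_bytes / entries  # 5000 / 34 is about 147
    return bytes_per_entry * fetches_per_day


print(round(reflog_growth_per_day(5000, 34, 24)))  # hourly fetches: 3529
```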

While that is a trivial and insignificant number from a storage cost
point of view, if you are monitoring the progress of the remote with
"git reflog origin/main", I suspect that such a change would make the
output unusably noisy, so the "git reflog" command may need to grow an
option that tells it to skip these no-op entries.
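Here is a sketch of the filtering such an option would do. No such `git reflog` option exists today, and `skip_noop` is a hypothetical name. It simply drops lines whose old and new object names are equal.

```python
from typing import Iterable, Iterator


def skip_noop(lines: Iterable[str]) -> Iterator[str]:
    """Yield only reflog lines whose old and new object names differ."""
    for line in lines:
        old, new, _rest = line.split(" ", 2)
        if old != new:
            yield line
```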

As to required change to "git fetch", this may be a bit tricky.

IIRC (I am writing from memory without looking at the code), when
you say "git fetch [<remote> [<refspec>...]]", what it does is
roughly this:

 - figure out what <remote> and <refspec>... to use from the
   configuration, if omitted on the command line.

 - connect to the remote, and ask for the current values of their refs.

 - drop any refspec <src>:<dst> whose <dst> side already has the
   value the remote has.

 - drive the object transfer machinery to receive the pack data from
   the remote and store it locally.

 - update the remote-tracking branches.
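The steps above can be modelled as a toy, following the description from memory rather than Git's actual code. Plain dicts stand in for the advertisement, the resolved `<src>:<dst>` mapping and the local remote-tracking refs. `fetch_dropping_noops` is a hypothetical name. The point it shows is that refs dropped in the no-op step never reach the step that writes refs and reflogs.

```python
from typing import Dict, List, Tuple

Reflog = Dict[str, List[Tuple[int, str, str]]]


def fetch_dropping_noops(advertised: Dict[str, str], local: Dict[str, str],
                         mapping: Dict[str, str], reflog: Reflog,
                         now: int) -> List[str]:
    """Toy model of the steps above; returns the <dst> refs that were updated.

    `advertised` is the remote's ref -> object map, `mapping` is the resolved
    <src> -> <dst> refspec map, and `local` holds the remote-tracking refs.
    """
    # Drop <src>:<dst> pairs whose <dst> already has the remote value.
    updates = [(dst, advertised[src]) for src, dst in mapping.items()
               if src in advertised and local.get(dst) != advertised[src]]
    # Receiving the pack data is omitted from this model.
    # Only the surviving updates reach the ref and reflog writer.
    for dst, new in updates:
        reflog.setdefault(dst, []).append((now, local.get(dst, ""), new))
        local[dst] = new
    return [dst for dst, _ in updates]
```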

The last step is where the remote-tracking branches are updated,
together with their reflogs (if enabled).  Because that step does not
even see the remote-tracking branches whose values do not need to
change (they were filtered out earlier to reduce the number of refs
fed to the object transfer machinery), the "drop no-op early" part
needs to be designed differently (e.g. mark them as no-op, so that the
object transfer machinery can notice and ignore them) so that the
"update refs" step can see these no-op updates.

I do not think writing the "no-op" reflog entries should be done in
a step separate from the step that writes the real ref updates, as I
suspect that such a separate update scheme would have funny
interactions with "git fetch --atomic".
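Here is a toy model of the alternative design just described. It is a sketch under the thread's assumptions, not a patch. No-op refspecs are marked instead of dropped, the transfer step skips them, and a single update step writes both real and no-op reflog entries, so there is one place where `--atomic` would apply. `fetch_recording_noops` is a hypothetical name, and plain dicts stand in for Git's data.

```python
from typing import Dict, List, Tuple

Reflog = Dict[str, List[Tuple[int, str, str]]]


def fetch_recording_noops(advertised: Dict[str, str], local: Dict[str, str],
                          mapping: Dict[str, str], reflog: Reflog,
                          now: int) -> List[str]:
    """Toy model: mark no-ops instead of dropping them.

    Returns the object names the transfer step would still need to fetch.
    """
    planned = []
    for src, dst in mapping.items():
        if src in advertised:
            new = advertised[src]
            planned.append((dst, new, local.get(dst) == new))  # (dst, new, is_noop)
    # The object transfer step ignores entries marked as no-op.
    wanted = [new for _, new, is_noop in planned if not is_noop]
    # One update step sees every entry, so real and no-op reflog entries
    # are written together.
    for dst, new, _ in planned:
        reflog.setdefault(dst, []).append((now, local.get(dst, ""), new))
        local[dst] = new
    return wanted
```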

So, do I think it could be possible?  Sure.  Do I think it would be
too hard as a rocket surgery?  No.  Will I jump up and down excited
and start coding?  I am not interested all that much, but I can help
reviewing patches if somebody else works on it.

There may be some other downsides (other than the cost of storage
and making the reflog noisy) I haven't thought about, which need to
be considered if somebody decides to work on this.

Thanks.

* Re: Usability issue: "Your branch is up to date"
From: Bram van Oosterhout @ 2025-02-05  3:55 UTC
  To: D. Ben Knoble; +Cc: bram, Junio C Hamano, Manuel Quiñones, git

On Tue, Feb 4, 2025 at 1:08 PM D. Ben Knoble <ben.knoble@gmail.com> wrote:
>
> And thus `git fetch [--all]` becomes a part of your typical workflow,
> or something like `git pull --rebase [origin [main]]` before pushing.

Thanks all for the education.

I have always read the message "Your branch is up to date with
'origin/main'." as
"Your branch is up to date with _main_ at _origin_", with _origin_
being the remote repo.

I now understand it says:
Your branch is up to date _according to_ the information available at
.git/refs/remotes/origin/main.
Since that is a local file, I can reasonably expect the info to be
stale when I return to my repo after 6 months, and I should do a git
fetch to assess the situation.
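This reading can be checked directly, because the value behind 'origin/main' comes from local storage only. The sketch below assumes the "files" backend, where a ref is either a loose file under `refs/...` in the Git directory or a line in `packed-refs`. The path mentioned above may therefore be absent even when the ref exists. Symbolic refs and reftable repositories are not handled, and `read_ref` is a hypothetical helper. In practice `git rev-parse origin/main` performs this lookup.

```python
import os
from typing import Optional


def read_ref(git_dir: str, ref: str) -> Optional[str]:
    """Return the object name stored for `ref`, reading only local files."""
    loose = os.path.join(git_dir, *ref.split("/"))
    if os.path.isfile(loose):
        with open(loose, encoding="utf-8") as f:
            return f.read().strip()
    packed = os.path.join(git_dir, "packed-refs")
    if os.path.isfile(packed):
        with open(packed, encoding="utf-8") as f:
            for line in f:
                # Skip the header comment and "^<oid>" peeled-tag lines.
                if line.startswith(("#", "^")):
                    continue
                oid, _, name = line.rstrip("\n").partition(" ")
                if name == ref:
                    return oid
    return None
```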

Thanks again. Bram

* Re: Usability issue: "Your branch is up to date"
From: Patrick Steinhardt @ 2025-02-05  6:54 UTC
  To: Junio C Hamano; +Cc: Manuel Quiñones, git

On Tue, Feb 04, 2025 at 09:43:10AM -0800, Junio C Hamano wrote:
> There may be some other downsides (other than the cost of storage
> and making the reflog noisy) I haven't thought about, which need to
> be considered if somebody decides to work on this.

One thing to consider is that some remotes tend to have many thousands
or even hundreds of thousands of references. Updating timestamps for all
of them could be quite inefficient depending on where exactly that data
is stored. If it were in the form of no-op reflog entries, the "files"
backend would have to touch as many files as the remote has references.
Consequently, even if only a single remote ref changed, we'd potentially
have to update metadata on hundreds of thousands of files.

So I'm not sure whether such a scheme would scale well enough in the
general case for large repos.

Patrick

* Re: Usability issue: "Your branch is up to date"
From: Junio C Hamano @ 2025-02-05 18:40 UTC
  To: Patrick Steinhardt; +Cc: Manuel Quiñones, git

Patrick Steinhardt <ps@pks.im> writes:

> One thing to consider is that some remotes tend to have many thousands
> or even hundreds of thousands of references. Updating timestamps for all
> of them could be quite inefficient depending on where exactly that data
> is stored. If it were in the form of no-op reflog entries, the "files"
> backend would have to touch as many files as the remote has references.
> Consequently, even if only a single remote ref changed, we'd potentially
> have to update metadata on hundreds of thousands of files.
>
> So I'm not sure whether such a schema would scale well enough in the
> general case for large repos.

I actually view that as quite an orthogonal issue.

Recording the fact that you checked the state of thousands of refs
at the remote and found them unchanged is probably a very small part
of a larger problem that checking the state of thousands of refs is
already expensive.  People have solved it at the protocol level to
limit the ref advertisement to only the relevant refs (as opposed to
the original protocol where the server end unconditionally
advertises the state of all of its refs at the beginning of the
conversation), so when you are only pulling a single branch from
there, you do not even observe the state of other unrelated refs
(like other branches or the pull/*/ hierarchy), hence you would not
create these no-op reflog entries.
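Here is a toy illustration of the limited advertisement. With protocol v2, the client can send `ref-prefix` arguments to the `ls-refs` command, and the server then lists only refs starting with one of those prefixes. This model replaces the wire protocol with plain dicts. `advertise` is a hypothetical helper, and a single-branch fetch is simplified to one prefix, although a real client sends several prefixes derived from the name. Only advertised refs could ever receive a no-op entry.

```python
from typing import Dict, List


def advertise(server_refs: Dict[str, str], prefixes: List[str]) -> Dict[str, str]:
    """Refs a protocol v2 server would list for the given ref-prefix arguments."""
    if not prefixes:  # no ref-prefix sent: the server lists everything
        return dict(server_refs)
    return {name: oid for name, oid in server_refs.items()
            if any(name.startswith(p) for p in prefixes)}


server = {
    "refs/heads/main": "a1",
    "refs/heads/next": "b2",
    "refs/pull/1/head": "c3",
}
print(sorted(advertise(server, ["refs/heads/main"])))  # ['refs/heads/main']
```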

If the user, on the other hand, is interested in keeping track of
all these thousands of refs, "git fetch" would have to ask and
receive advertisement for all these thousands of refs anyway, and
at that point, recording the no-op update would be a very small
part of the problem, I suspect.  Besides, we have reftable that
would make this kind of problem easier to solve, no? ;-)
* Re: Usability issue: "Your branch is up to date"
From: Patrick Steinhardt @ 2025-02-06  9:53 UTC
  To: Junio C Hamano; +Cc: Manuel Quiñones, git, Karthik Nayak

On Wed, Feb 05, 2025 at 10:40:41AM -0800, Junio C Hamano wrote:
> If the user, on the other hand, is interested in keeping track of
> all these thousands of refs, "git fetch" would have to ask and
> receive advertisement for all these thousands of refs anyway, and
> at that point, recording the no-op update would be a very small
> part of the problem, I suspect.  Besides, we have reftable that
> would make this kind of problem easier to solve, no? ;-)

Yeah, I was pondering whether to bring up reftables or not :) But
indeed, with them it would be way more efficient, at least assuming that
we write everything in a single transaction and not via multiple
transactions, which we generally don't do in git-fetch(1) unless the
user asks for `--atomic`, because we allow a subset of the updates to
fail. Consequently, even with reftables we'd end up writing N separate
updates, where N is the number of advertised refs.

This is a known problem that we actually plan to fix. Karthik is working
on support for "partial" transactions, where a subset of ref updates is
allowed to fail without impacting other refs whose updates would
succeed. With this in place we could then refactor git-fetch(1) to write
all updates in a single transaction, even in the non-atomic case.
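Here is a toy model of the difference between an all-or-nothing transaction and the "partial" transactions described above. In both modes every update is applied in one batch. In partial mode, a rejected update is reported instead of aborting the rest. The `Transaction` class is hypothetical and does not mirror Git's ref-transaction API. Here "rejected" simply means the ref does not currently hold the expected old value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Transaction:
    """Toy ref transaction; not Git's ref-transaction API."""
    refs: Dict[str, str]  # current values, standing in for the ref backend
    queued: List[Tuple[str, str, str]] = field(default_factory=list)

    def update(self, name: str, old: str, new: str) -> None:
        """Queue an update of `name` from `old` to `new`."""
        self.queued.append((name, old, new))

    def commit(self, partial: bool = False) -> List[str]:
        """Apply all queued updates in one batch; return the rejected names.

        partial=False (atomic): any rejection means nothing is written.
        partial=True: rejected updates are skipped and the rest are written.
        """
        rejected = [n for n, old, _ in self.queued if self.refs.get(n, "") != old]
        if rejected and not partial:
            return rejected
        for name, _, new in self.queued:
            if name not in rejected:
                self.refs[name] = new
        self.queued.clear()
        return rejected
```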

Patrick

* Re: Usability issue: "Your branch is up to date"
From: Karthik Nayak @ 2025-02-07  8:20 UTC
  To: Patrick Steinhardt, Junio C Hamano; +Cc: Manuel Quiñones, git

Patrick Steinhardt <ps@pks.im> writes:

> This is a known problem that we actually plan to fix. Karthik is working
> on support for "partial" transactions, where it is allowed that a subset
> of ref updates fails without impacting other refs where the update would
> succeed. With this in place we could then refactor git-fetch(1) to write
> the update with a single transaction, only, even in the non-atomic case.
>

You've beaten me to it here: I've now posted the series [1], and I agree
with everything you've said. It should really help with optimizing
reftables.

[1]: https://lore.kernel.org/git/20250207-245-partially-atomic-ref-updates-v1-0-e6a3690ff23a@gmail.com/T/#t

Thanks

> Patrick
