git.vger.kernel.org archive mirror
* git auto-repack is broken...
@ 2011-12-02 16:22 Linus Torvalds
  2011-12-02 16:27 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2011-12-02 16:22 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List

I actually tend to repack things pretty religiously (ok, not really,
but I do "git gc" reasonably regularly), so I was surprised to see
this:

  Auto packing the repository for optimum performance. You may also
  run "git gc" manually. See "git help gc" for more information.

followed by this pitiful effort:

  Counting objects: 8, done.
  Delta compression using up to 4 threads.
  Compressing objects: 100% (8/8), done.
  Writing objects: 100% (8/8), done.
  Total 8 (delta 0), reused 0 (delta 0)

Ok, those 8 objects will *not* help anything at all, and the
autorepack is broken.

So what's going on? It turns out that I have a fair amount of
unreachable objects in this repository, because I do things like
fetching things without then merging them, etc. So the "git gc --auto"
will happily do "git repack -A" or whatever, and that in turn does
*nothing* what-so-ever (or rather, it packs my latest merge commit
like the above and generates that pack of a whopping 8 objects).

I can fix it with "git gc --prune=now", so it's not like I personally
really care, but since the whole point of "git gc --auto" is to allow
people who don't know what they are doing to ignore the whole issue of
GC and pruning, I do think this is a real UI bug.

I don't really have any suggestions for fixing it, though. Maybe we
should make "git gc --auto" remove any unreachable objects? That would
be potentially dangerous in shared repository situations, though. Or
have an extra option to "git repack -A" to also pack any loose objects
it finds at the end (whether reachable or not)?

                         Linus

* Re: git auto-repack is broken...
  2011-12-02 16:22 git auto-repack is broken Linus Torvalds
@ 2011-12-02 16:27 ` Ævar Arnfjörð Bjarmason
  2011-12-02 16:56   ` Linus Torvalds
  0 siblings, 1 reply; 19+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2011-12-02 16:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

On Fri, Dec 2, 2011 at 17:22, Linus Torvalds
<torvalds@linux-foundation.org> wrote:

> Maybe we should make "git gc --auto" remove any unreachable objects?

Wouldn't that mean that any loose commit objects you have lying around
would be removed by the automatic git gc?

One feature of git that I personally rely on is that I can liberally
move heads around / make commits on detached heads and not have those
commits gc'd for a while unless I explicitly ask for it.

* Re: git auto-repack is broken...
  2011-12-02 16:27 ` Ævar Arnfjörð Bjarmason
@ 2011-12-02 16:56   ` Linus Torvalds
  2011-12-02 17:10     ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2011-12-02 16:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, Git Mailing List

On Fri, Dec 2, 2011 at 8:27 AM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>> Maybe we should make "git gc --auto" remove any unreachable objects?
>
> Wouldn't that mean that any loose commit objects you have lying around
> would be removed by the automatic git gc?
>
> One feature of git that I personally rely on is that I can liberally
> move heads around / make commits on detached heads and not have those
> commits gc'd for a while unless I explicitly ask for it.

Well, with reflogs, you actually do have those objects reachable for
quite a while (90 days by default).

The "unreachable objects" tend to happen when you do fetches without
ever merging the result, or actually remove branches (and/or expire
the reflogs early etc). Not from the normal "use 'git reset' and
friends to move heads around".

That said, I do agree that removing loose objects is the much less
safe approach.

Of course, repacking the objects results in problems too: now you've
entirely lost the age information for that object, so now you cannot
prune it based on age any more.

But leaving the loose objects around and basically failing auto-gc
isn't good either.

                     Linus

* Re: git auto-repack is broken...
  2011-12-02 16:56   ` Linus Torvalds
@ 2011-12-02 17:10     ` Jeff King
  2011-12-02 17:35       ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2011-12-02 17:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Git Mailing List

On Fri, Dec 02, 2011 at 08:56:34AM -0800, Linus Torvalds wrote:

> On Fri, Dec 2, 2011 at 8:27 AM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
> >
> >> Maybe we should make "git gc --auto" remove any unreachable objects?
> >
> > Wouldn't that mean that any loose commit objects you have lying around
> > would be removed by the automatic git gc?
> >
> > One feature of git that I personally rely on is that I can liberally
> > move heads around / make commits on detached heads and not have those
> > commits gc'd for a while unless I explicitly ask for it.
> 
> Well, with reflogs, you actually do have those objects reachable for
> quite a while (90 days by default).
> 
> The "unreachable objects" tend to happen when you do fetches without
> ever merging the result, or actually remove branches (and/or expire
> the reflogs early etc). Not from the normal "use 'git reset' and
> friends to move heads around".
> 
> That said, I do agree that removing loose objects is the much less
> safe approach.

We do remove loose objects that are totally unreferenced, but there is
still a time-delay, because we don't want to prune something like an
in-progress commit operation. The default delay for that is 2 weeks,
which I think is an arbitrary number picked on the theory of "wow, if
your git operation takes longer than this, you're way too patient".

And in general, it works OK because people don't tend to accumulate more
than the auto-gc number of objects within a 2 week period. So perhaps
you're just special in your usage patterns.

One solution is just dropping that "2 weeks" down to something smaller,
but still conservative (say, 3 days?).

If you still have the repo in question, what is the date breakdown on
your loose objects?
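
One way to get that breakdown is a small script rather than eyeballing
"ls -lR" output. The following is only a sketch: the function name
loose_object_ages is made up for illustration, and it assumes the usual
layout where loose objects sit in two-hex-digit fan-out directories
under .git/objects. It buckets loose objects by age in whole days.

```python
import os
import time
from collections import Counter

HEX_DIGITS = set("0123456789abcdef")


def loose_object_ages(objects_dir, now=None):
    """Return a Counter mapping age in whole days -> number of loose objects."""
    now = time.time() if now is None else now
    ages = Counter()
    for fanout in os.listdir(objects_dir):
        # Loose objects live in directories named by two hex digits;
        # this check skips "pack", "info" and anything else.
        if len(fanout) != 2 or not set(fanout) <= HEX_DIGITS:
            continue
        fanout_path = os.path.join(objects_dir, fanout)
        if not os.path.isdir(fanout_path):
            continue
        for name in os.listdir(fanout_path):
            mtime = os.path.getmtime(os.path.join(fanout_path, name))
            ages[int((now - mtime) // 86400)] += 1
    return ages
```

Calling loose_object_ages(".git/objects") from the top of a work tree
and printing sorted(result.items()) would show how many loose objects
are younger than the prune window.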

> Of course, repacking the objects results in problems too: now you've
> entirely lost the age information for that object, so now you cannot
> prune it based on age any more.

When the objects become unreferenced, we eject them from the pack into
loose form again. If they don't become referenced in the 2-week window,
they get pruned then. So yes, you drop the age information, but they do
eventually go away.

-Peff

* Re: git auto-repack is broken...
  2011-12-02 17:10     ` Jeff King
@ 2011-12-02 17:35       ` Junio C Hamano
  2011-12-02 17:45         ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2011-12-02 17:35 UTC (permalink / raw)
  To: Jeff King
  Cc: Linus Torvalds, Ævar Arnfjörð Bjarmason,
	Junio C Hamano, Git Mailing List

Jeff King <peff@peff.net> writes:

> When the objects become unreferenced, we eject them from the pack into
> loose form again. If they don't become referenced in the 2-week window,
> they get pruned then. So yes, you drop the age information, but they do
> eventually go away.

If you update gc/repack -A to put them in a separate pack, then you would
never be able to get rid of them, no? You pack, then eject (which gives
them a fresher timestamp), then notice that you are within the 2-week window
and pack them again,...

* Re: git auto-repack is broken...
  2011-12-02 17:35       ` Junio C Hamano
@ 2011-12-02 17:45         ` Jeff King
  2011-12-02 18:08           ` Junio C Hamano
  2011-12-03 19:42           ` Brandon Casey
  0 siblings, 2 replies; 19+ messages in thread
From: Jeff King @ 2011-12-02 17:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Fri, Dec 02, 2011 at 09:35:52AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > When the objects become unreferenced, we eject them from the pack into
> > loose form again. If they don't become referenced in the 2-week window,
> > they get pruned then. So yes, you drop the age information, but they do
> > eventually go away.
> 
> If you update gc/repack -A to put them in a separate pack, then you would
> never be able to get rid of them, no? You pack, then eject (which gives
> them a fresher timestamp), then notice that you are within the 2-week window
> and pack them again,...

But we shouldn't be packing totally unreferenced objects. Barring bugs,
the life cycle of such an object should be something like:

  1. Object X is created on branch 'foo'.

  2. Branch 'foo' is deleted, but its commits are still in the HEAD
     reflog, referencing X.

  3. 90 days pass (actually, I think this might be the 30-day
     expire-unreachable time)

  4. "git gc" runs "git repack -Ad", which will eject X from the pack
     into a loose form (because it is not becoming part of the new pack
     we are writing).

  5. Two weeks pass.

  6. "git gc" runs "git prune --expire=2.weeks.ago", which removes the
     object.

"gc" runs between (4) and (6) will not re-pack the object, because it
remains unreferenced.
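
The rule behind steps (4) to (6) can be written down as a toy model.
This is a sketch of the decision as described here, not git's actual
code; is_prunable and its parameters are hypothetical names, and the
two-week default mirrors the "git prune --expire=2.weeks.ago" above.

```python
from datetime import datetime, timedelta


def is_prunable(reachable, mtime, now, expire=timedelta(weeks=2)):
    """Model of the prune rule: only unreachable objects whose mtime is
    older than the expire window may be deleted."""
    return (not reachable) and (now - mtime) > expire


now = datetime(2011, 12, 2)
ten_days_old = now - timedelta(days=10)

# Unreachable but still inside the two-week window: kept.
print(is_prunable(False, ten_days_old, now))
# Same object with a three-day window: eligible for pruning.
print(is_prunable(False, ten_days_old, now, expire=timedelta(days=3)))
```

Reachable objects are never candidates in this model, whatever their
age, which is why the reflog expiry in step (3) has to happen first.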

I think things might be slowed somewhat by "gc --auto", which will not
do a "repack -A" until we have too many packs. So steps (3) and (4) are
really more like "gc runs git-repack without -A" 50 times, and then we
finally run "git repack -A".

-Peff

* Re: git auto-repack is broken...
  2011-12-02 17:45         ` Jeff King
@ 2011-12-02 18:08           ` Junio C Hamano
  2011-12-02 18:13             ` Jeff King
  2011-12-03 19:42           ` Brandon Casey
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2011-12-02 18:08 UTC (permalink / raw)
  To: Jeff King
  Cc: Linus Torvalds, Ævar Arnfjörð Bjarmason,
	Git Mailing List

Jeff King <peff@peff.net> writes:

> But we shouldn't be packing totally unreferenced objects.

Everything you said is correct in today's Git and I obviously know it, but
I was taking the "Or have an extra option to..." at the end of the OP's
message in the thread into account, so...

* Re: git auto-repack is broken...
  2011-12-02 18:08           ` Junio C Hamano
@ 2011-12-02 18:13             ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2011-12-02 18:13 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Fri, Dec 02, 2011 at 10:08:15AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > But we shouldn't be packing totally unreferenced objects.
> 
> Everything you said is correct in today's Git and I obviously know it, but
> I was taking the "Or have an extra option to..." at the end of the OP's
> message in the thread into account, so...

Ah, sorry, I missed the subtlety of Linus's "repacking the objects
results in problems..." from his later message and thought he just meant
repacking in general. Yes, it's a bad idea to repack unreachable objects
because then you could never prune anything.

I think just shrinking the --expire window that we already use is a much
more reasonable bet. It's not about preventing the loss of old work
(reflogs are there for that), but about avoiding hurting an actively
running, about-to-reference-the-objects git process. And 2 weeks is
quite conservative for that.

-Peff

* Re: git auto-repack is broken...
@ 2011-12-03  6:55 George Spelvin
  0 siblings, 0 replies; 19+ messages in thread
From: George Spelvin @ 2011-12-03  6:55 UTC (permalink / raw)
  To: git; +Cc: linux, peff

Thanks, Jeff, for the life-cycle chart.

A couple of ideas come to mind:
- When unpacking objects from a pack, it should be fine to set their
  date to that of the pack.  After all, they're at least that old.
- We could put unreferenced objects into packs whose date is the most
  recent of any of the contained objects.
- We could then group unreferenced objects into packs based on age,
  so their ages should not be affected too much by the preceding
  operations.

That still produces a noticeable number of packs, which isn't
good, but maybe it's better than keeping thousands of loose
objects for a month...

* Re: git auto-repack is broken...
  2011-12-02 17:45         ` Jeff King
  2011-12-02 18:08           ` Junio C Hamano
@ 2011-12-03 19:42           ` Brandon Casey
  2011-12-07 22:12             ` Nicolas Pitre
  2011-12-08  0:49             ` Jeff King
  1 sibling, 2 replies; 19+ messages in thread
From: Brandon Casey @ 2011-12-03 19:42 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Linus Torvalds, Ævar Arnfjörð,
	Git Mailing List

On Fri, Dec 2, 2011 at 11:45 AM, Jeff King <peff@peff.net> wrote:
> On Fri, Dec 02, 2011 at 09:35:52AM -0800, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>>
>> > When the objects become unreferenced, we eject them from the pack into
>> > loose form again. If they don't become referenced in the 2-week window,
>> > they get pruned then. So yes, you drop the age information, but they do
>> > eventually go away.
>>
>> If you update gc/repack -A to put them in a separate pack, then you would
>> never be able to get rid of them, no? You pack, then eject (which gives
>> them a fresher timestamp), then notice that you are within the 2-week window
>> and pack them again,...
>
> But we shouldn't be packing totally unreferenced objects. Barring bugs,
> the life cycle of such an object should be something like:
>
>  1. Object X is created on branch 'foo'.
>
>  2. Branch 'foo' is deleted, but its commits are still in the HEAD
>     reflog, referencing X.
>
>  3. 90 days pass (actually, I think this might be the 30-day
>     expire-unreachable time)
>
>  4. "git gc" runs "git repack -Ad", which will eject X from the pack
>     into a loose form (because it is not becoming part of the new pack
>     we are writing).

Actually, it is right here when the newly loosened unreferenced
objects will be deleted.  Objects ejected from a pack _are_ given the
timestamp of the pack they were ejected from.  So, if the pack is
older than two weeks (90 days in your example), then so will be the
loosened objects, and git prune will delete them when called by git
gc.
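
As a worked example of that arithmetic, here is a minimal sketch
(remaining_wait is a hypothetical helper, and it assumes the loosened
object simply inherits the mtime of the pack it came from):

```python
from datetime import timedelta


def remaining_wait(pack_age, expire=timedelta(weeks=2)):
    """Time a freshly loosened object still has to wait before prune may
    delete it, given the age of the pack it was ejected from."""
    return max(expire - pack_age, timedelta(0))


# A 90-day-old pack: the loosened object is already past the window.
print(remaining_wait(timedelta(days=90)))
# A 4-day-old pack: ten more days before prune will touch it.
print(remaining_wait(timedelta(days=4)))
```

So the two-week wait only applies in full to objects that came out of
a pack written just before the gc run.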

>  5. Two weeks pass.
>
>  6. "git gc" runs "git prune --expire=2.weeks.ago", which removes the
>     object.
>
> "gc" runs between (4) and (6) will not re-pack the object, because it
> remains unreferenced.

Correct with the recognition that loose objects get pack mtime, so
step 5 may be less than two weeks.

> I think things might be slowed somewhat by "gc --auto", which will not
> do a "repack -A" until we have too many packs. So steps (3) and (4) are
> really more like "gc runs git-repack without -A" 50 times, and then we
> finally run "git repack -A".

This is correct.  This should have the effect of increasing the age of
unreferenced objects when they are finally loosened and make it more
likely that they are pruned during the same git gc operation that
loosens them.

Linus's scenario of fetching a lot of stuff that never actually makes
it into the reflogs is still a valid problem.  I'm not sure that
people who don't know what they are doing are going to run into this
problem though.  Since he fetches a lot of stuff without ever checking
it out or creating a branch from it, potentially many objects become
unreferenced every time FETCH_HEAD changes.  If he does this many
times in a short period of time, he could reach the gc.autopacklimit
and trigger gc --auto and produce more than gc.auto loose objects that
are younger than gc.pruneExpire.
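
A simplified model shows why this repeats. Both function names below
are made up for illustration. The thresholds are the documented
defaults for gc.auto (6700) and gc.autoPackLimit (50), but real git
estimates the loose-object count rather than counting exactly, so
treat this as a sketch of the logic only.

```python
def auto_gc_wanted(loose_count, pack_count, gc_auto=6700, gc_auto_pack_limit=50):
    """Simplified trigger: too many loose objects or too many packs."""
    return loose_count > gc_auto or pack_count > gc_auto_pack_limit


def loose_left_after_gc(unreachable_ages_days, prune_expire_days=14):
    """Loose objects remaining after one gc pass: reachable objects are
    packed, and unreachable ones survive only if younger than the window."""
    return sum(1 for age in unreachable_ages_days if age < prune_expire_days)


# 7000 unreachable objects, all five days old.
ages = [5] * 7000

left = loose_left_after_gc(ages)                       # two-week window
print(left, auto_gc_wanted(left, pack_count=1))        # gc wants to run again

left = loose_left_after_gc(ages, prune_expire_days=3)  # shorter window
print(left, auto_gc_wanted(left, pack_count=1))
```

With the two-week window every object survives the pass, so the next
command sees the same oversized loose-object count and triggers auto-gc
again.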

Decreasing gc.pruneExpire as you suggested should make it much less
likely to run into this problem.  I wonder if it is worth trying to
limit how often gc --auto is run to not be more often than
gc.pruneExpire or something.  If we modified the timestamp that is
assigned to fetched packs, maybe we could use the pack timestamps as
an indicator of how recently git gc has run.

-Brandon

* Re: git auto-repack is broken...
  2011-12-03 19:42           ` Brandon Casey
@ 2011-12-07 22:12             ` Nicolas Pitre
  2011-12-07 22:53               ` Jeff King
  2011-12-09 17:35               ` Junio C Hamano
  2011-12-08  0:49             ` Jeff King
  1 sibling, 2 replies; 19+ messages in thread
From: Nicolas Pitre @ 2011-12-07 22:12 UTC (permalink / raw)
  To: Brandon Casey
  Cc: Jeff King, Junio C Hamano, Linus Torvalds,
	Ævar Arnfjörð, Git Mailing List

On Sat, 3 Dec 2011, Brandon Casey wrote:

> Linus's scenario of fetching a lot of stuff that never actually makes
> it into the reflogs is still a valid problem.  I'm not sure that
> people who don't know what they are doing are going to run into this
> problem though.  Since he fetches a lot of stuff without ever checking
> it out or creating a branch from it, potentially many objects become
> unreferenced every time FETCH_HEAD changes.

Maybe FETCH_HEAD should have a reflog too?


Nicolas

* Re: git auto-repack is broken...
  2011-12-07 22:12             ` Nicolas Pitre
@ 2011-12-07 22:53               ` Jeff King
  2011-12-08  0:18                 ` Nicolas Pitre
  2011-12-09 17:35               ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff King @ 2011-12-07 22:53 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Brandon Casey, Junio C Hamano, Linus Torvalds,
	Ævar Arnfjörð, Git Mailing List

On Wed, Dec 07, 2011 at 05:12:14PM -0500, Nicolas Pitre wrote:

> Maybe  FETCH_HEAD should have a reflog too?

That might be nice. However, there is a complication, in that FETCH_HEAD
may contain many sha1s, but each reflog entry only has room for a single
sha1 transition. You could obviously encode it as a series of reflog
entries, but then "git show FETCH_HEAD@{1}" is not very meaningful.

-Peff

* Re: git auto-repack is broken...
  2011-12-07 22:53               ` Jeff King
@ 2011-12-08  0:18                 ` Nicolas Pitre
  2011-12-08  0:45                   ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Nicolas Pitre @ 2011-12-08  0:18 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Casey, Junio C Hamano, Linus Torvalds,
	Ævar Arnfjörð, Git Mailing List

On Wed, 7 Dec 2011, Jeff King wrote:

> On Wed, Dec 07, 2011 at 05:12:14PM -0500, Nicolas Pitre wrote:
> 
> > Maybe  FETCH_HEAD should have a reflog too?
> 
> That might be nice. However, there is a complication, in that FETCH_HEAD
> may contain many sha1s, but each reflog entry only has room for a single
> sha1 transition. You could obviously encode it as a series of reflog
> entries, but then "git show FETCH_HEAD@{1}" is not very meaningful.

What does "git show FETCH_HEAD" do now?  If it shows only one 
(presumably the first) SHA1 then its reflog doesn't have to be smarter, 
which would properly cover most cases already.  I certainly never did a 
multi-ref fetch myself.


Nicolas

* Re: git auto-repack is broken...
  2011-12-08  0:18                 ` Nicolas Pitre
@ 2011-12-08  0:45                   ` Jeff King
  2011-12-08  3:35                     ` Nicolas Pitre
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2011-12-08  0:45 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Brandon Casey, Junio C Hamano, Linus Torvalds,
	Ævar Arnfjörð, Git Mailing List

On Wed, Dec 07, 2011 at 07:18:13PM -0500, Nicolas Pitre wrote:

> > > Maybe  FETCH_HEAD should have a reflog too?
> > 
> > That might be nice. However, there is a complication, in that FETCH_HEAD
> > may contain many sha1s, but each reflog entry only has room for a single
> > sha1 transition. You could obviously encode it as a series of reflog
> > entries, but then "git show FETCH_HEAD@{1}" is not very meaningful.
> 
> What does "git show FETCH_HEAD" do now?  If it shows only one
> (presumably the first) SHA1 then its reflog doesn't have to be
> smarter, which would properly cover most cases already.

Are you proposing that it only store the first ref in the reflog, or
that we accept that a single fetch may write lots of reflog entries?

If the former, then you are missing the expiration/connectivity
properties.

If the latter, then it is not just "we only show the first one for
FETCH_HEAD@{1}", but also "the thing that used to be FETCH_HEAD@{1} does
not graduate to FETCH_HEAD@{2}, but rather FETCH_HEAD@{n} for some
unknown n". That may be an acceptable limitation; I just wanted to
mention it in case somebody can think of some clever solution.
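
To make the indexing problem concrete, here is a toy model. It is
purely hypothetical: FETCH_HEAD has no reflog today, and
reflog_after_fetch is an invented name. The reflog is a newest-first
list, so index k stands for FETCH_HEAD@{k}, and one fetch writes one
entry per fetched tip.

```python
def reflog_after_fetch(reflog, fetched_tips):
    """Return a new newest-first reflog after a fetch that writes one
    entry per fetched tip; the last tip written ends up at index 0."""
    return list(reversed(fetched_tips)) + list(reflog)


before = ["A1", "A0"]           # A1 is @{0}, A0 is @{1}
after = reflog_after_fetch(before, ["B0", "B1", "B2"])

print(after)                    # ['B2', 'B1', 'B0', 'A1', 'A0']
print(after.index("A0"))        # old @{1} is now @{4}, not @{2}
```

Each old entry moves back by the number of tips the fetch wrote, which
the user generally cannot know without inspecting the reflog.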

> I certainly never did a multi-ref fetch myself.

Not consciously, perhaps, but you do it all the time without realizing
it:

  $ git clone git://git.kernel.org/pub/scm/git/git.git
  $ cd git
  $ git fetch -v origin
   = [up to date]      maint      -> origin/maint
   = [up to date]      master     -> origin/master
   = [up to date]      next       -> origin/next
   = [up to date]      pu         -> origin/pu
   = [up to date]      todo       -> origin/todo
  $ cat .git/FETCH_HEAD
  b1af9630d758e1728fc0008b3f18d90d8f87f4c5        not-for-merge   branch 'maint' of git://git.kernel.org/pub/scm/git/git
  4cb5d10b14dcbe0155bed9c45ccb94e83bd4c599                branch 'master' of git://git.kernel.org/pub/scm/git/git
  03e5527c5df33d4550ccc1446d861c0aa5689d58        not-for-merge   branch 'next' of git://git.kernel.org/pub/scm/git/git
  cc4e3f01fc6a5e09ae5bbdc464965981fae4cf39        not-for-merge   branch 'pu' of git://git.kernel.org/pub/scm/git/git
  7a02dba15bd28826344f9c14a5e2b5c57eeb7e50        not-for-merge   branch 'todo' of git://git.kernel.org/pub/scm/git/git
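
For anyone who wants to inspect such a file programmatically, a minimal
parsing sketch follows. FetchHeadEntry and parse_fetch_head are names
invented here, and the sketch assumes each line is tab-separated into
sha1, an optional "not-for-merge" marker, and a description (the
listing above shows the tabs as runs of spaces).

```python
from typing import NamedTuple


class FetchHeadEntry(NamedTuple):
    sha1: str
    for_merge: bool
    description: str


def parse_fetch_head(text):
    """Parse FETCH_HEAD content, assuming tab-separated fields per line."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        sha1, marker, description = line.split("\t", 2)
        # An empty middle field means the ref is a merge candidate.
        entries.append(FetchHeadEntry(sha1, marker != "not-for-merge", description))
    return entries


sample = (
    "b1af9630d758e1728fc0008b3f18d90d8f87f4c5\tnot-for-merge\t"
    "branch 'maint' of git://git.kernel.org/pub/scm/git/git\n"
    "4cb5d10b14dcbe0155bed9c45ccb94e83bd4c599\t\t"
    "branch 'master' of git://git.kernel.org/pub/scm/git/git\n"
)
entries = parse_fetch_head(sample)
print([e.sha1[:7] for e in entries if e.for_merge])
```

Every entry is a separate tip, which is why a plain "git fetch origin"
counts as a multi-ref fetch.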

-Peff

* Re: git auto-repack is broken...
  2011-12-03 19:42           ` Brandon Casey
  2011-12-07 22:12             ` Nicolas Pitre
@ 2011-12-08  0:49             ` Jeff King
  1 sibling, 0 replies; 19+ messages in thread
From: Jeff King @ 2011-12-08  0:49 UTC (permalink / raw)
  To: Brandon Casey
  Cc: Junio C Hamano, Linus Torvalds, Ævar Arnfjörð,
	Git Mailing List

On Sat, Dec 03, 2011 at 01:42:22PM -0600, Brandon Casey wrote:

> >  4. "git gc" runs "git repack -Ad", which will eject X from the pack
> >     into a loose form (because it is not becoming part of the new pack
> >     we are writing).
> 
> Actually, it is right here when the newly loosened unreferenced
> objects will be deleted.  Objects ejected from a pack _are_ given the
> timestamp of the pack they were ejected from.  So, if the pack is
> older than two weeks (90 days in your example), then so will be the
> loosened objects, and git prune will delete them when called by git
> gc.

Thanks, I didn't notice that when looking at the code.

> Decreasing gc.pruneExpire as you suggested should make it much less
> likely to run into this problem.

I'd be more comfortable with that solution if we had data on what the
timestamps look like when it actually happens (e.g., an "ls -lR" listing
of a repository that in practice is wanting to auto-gc too often).

> I wonder if it is worth trying to limit how often gc --auto is run to
> not be more often than gc.pruneExpire or something.  If we modified
> the timestamp that is assigned to fetched packs, maybe we could use
> the pack timestamps as an indicator of how recently git gc has run.

I'm worried you run into other corner cases, there. Like a repository
which is generating new, referenced objects at a fast rate (e.g.,
because you're importing something) should trigger auto-gc much sooner
than that, and this rule would prevent it.

-Peff

* Re: git auto-repack is broken...
  2011-12-08  0:45                   ` Jeff King
@ 2011-12-08  3:35                     ` Nicolas Pitre
  2011-12-08  3:40                       ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Nicolas Pitre @ 2011-12-08  3:35 UTC (permalink / raw)
  To: Jeff King
  Cc: Brandon Casey, Junio C Hamano, Linus Torvalds,
	Ævar Arnfjörð, Git Mailing List

On Wed, 7 Dec 2011, Jeff King wrote:

> On Wed, Dec 07, 2011 at 07:18:13PM -0500, Nicolas Pitre wrote:
> 
> > I certainly never did a multi-ref fetch myself.
> 
> Not consciously, perhaps, but you do it all the time without realizing
> it:
> 
>   $ git clone git://git.kernel.org/pub/scm/git/git.git
>   $ cd git
>   $ git fetch -v origin
>    = [up to date]      maint      -> origin/maint
>    = [up to date]      master     -> origin/master
>    = [up to date]      next       -> origin/next
>    = [up to date]      pu         -> origin/pu
>    = [up to date]      todo       -> origin/todo
>   $ cat .git/FETCH_HEAD
>   b1af9630d758e1728fc0008b3f18d90d8f87f4c5        not-for-merge   branch 'maint' of git://git.kernel.org/pub/scm/git/git
>   4cb5d10b14dcbe0155bed9c45ccb94e83bd4c599                branch 'master' of git://git.kernel.org/pub/scm/git/git
>   03e5527c5df33d4550ccc1446d861c0aa5689d58        not-for-merge   branch 'next' of git://git.kernel.org/pub/scm/git/git
>   cc4e3f01fc6a5e09ae5bbdc464965981fae4cf39        not-for-merge   branch 'pu' of git://git.kernel.org/pub/scm/git/git
>   7a02dba15bd28826344f9c14a5e2b5c57eeb7e50        not-for-merge   branch 'todo' of git://git.kernel.org/pub/scm/git/git

OK, never mind.  I admittedly have never been close enough to the related
code.

And I don't think this particular case is interesting anyway as the 
reflogs for the various branches are already involved.  I was thinking
more about the "git fetch git://some.random.repo foobar" case where the 
summary also explicitly shows:

From: git://some.random.repo
  ......  foobar   -> FETCH_HEAD

In that case the only reference to the fetched branch is stored in 
FETCH_HEAD and that is what might be worthwhile for a reflog.


Nicolas

* Re: git auto-repack is broken...
  2011-12-08  3:35                     ` Nicolas Pitre
@ 2011-12-08  3:40                       ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2011-12-08  3:40 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Brandon Casey, Junio C Hamano, Linus Torvalds,
	Ævar Arnfjörð, Git Mailing List

On Wed, Dec 07, 2011 at 10:35:00PM -0500, Nicolas Pitre wrote:

> And I don't think this particular case is interesting anyway as the 
> reflogs for the various branches are already involved.  I was thinking
> more about the "git fetch git://some.random.repo foobar" case where the 
> summary also explicitly shows:
> 
> From: git://some.random.repo
>   ......  foobar   -> FETCH_HEAD
> 
> In that case the only reference to the fetched branch is stored in 
> FETCH_HEAD and that is what might be worthwhile for a reflog.

I agree that is the interesting case. Perhaps we could just not bother
writing the other case into the reflog at all. So the reflog would be
sensible and contain only the set of things they had fetched or pulled
explicitly by URL. If they really want to do a multi-ref one-off fetch
from some URL, then we write multiple reflog entries. But at least the
user is very aware of what they've done, so they're not surprised by the
reflog advancing by more than 1 entry.

-Peff

* Re: git auto-repack is broken...
  2011-12-07 22:12             ` Nicolas Pitre
  2011-12-07 22:53               ` Jeff King
@ 2011-12-09 17:35               ` Junio C Hamano
  2011-12-09 18:34                 ` Nicolas Pitre
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2011-12-09 17:35 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Brandon Casey, Jeff King, Linus Torvalds,
	Ævar Arnfjörð, Git Mailing List

Nicolas Pitre <nico@fluxnic.net> writes:

> On Sat, 3 Dec 2011, Brandon Casey wrote:
>
>> Linus's scenario of fetching a lot of stuff that never actually makes
>> it into the reflogs is still a valid problem.  I'm not sure that
>> people who don't know what they are doing are going to run into this
>> problem though.  Since he fetches a lot of stuff without ever checking
>> it out or creating a branch from it, potentially many objects become
>> unreferenced every time FETCH_HEAD changes.
>
> Maybe  FETCH_HEAD should have a reflog too?

It is a feature that the objects that were fetched for a quick peek become
immediately unreferenced and eligible for early removal unless they are
kept somewhere, e.g. remote tracking refs. What problem are we trying to
solve?

I thought everybody agreed that the current expire window for unreachable
objects is way too conservative, especially given that the only purpose of
that window is to protect live objects from concurrent gcs. Perhaps the
only thing we need to do is to trim that window down to say 2 days or even
8 hours?

* Re: git auto-repack is broken...
  2011-12-09 17:35               ` Junio C Hamano
@ 2011-12-09 18:34                 ` Nicolas Pitre
  0 siblings, 0 replies; 19+ messages in thread
From: Nicolas Pitre @ 2011-12-09 18:34 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Brandon Casey, Jeff King, Linus Torvalds,
	Ævar Arnfjörð, Git Mailing List

On Fri, 9 Dec 2011, Junio C Hamano wrote:

> Nicolas Pitre <nico@fluxnic.net> writes:
> 
> > On Sat, 3 Dec 2011, Brandon Casey wrote:
> >
> >> Linus's scenario of fetching a lot of stuff that never actually makes
> >> it into the reflogs is still a valid problem.  I'm not sure that
> >> people who don't know what they are doing are going to run into this
> >> problem though.  Since he fetches a lot of stuff without ever checking
> >> it out or creating a branch from it, potentially many objects become
> >> unreferenced every time FETCH_HEAD changes.
> >
> > Maybe  FETCH_HEAD should have a reflog too?
> 
> It is a feature that the objects that were fetched for a quick peek become
> immediately unreferenced and eligible for early removal unless they are
> kept somewhere, e.g. remote tracking refs. What problem are we trying to
> solve?

This is indeed a tangential observation to the expiration delay.  I was 
just suggesting that having a reflog for FETCH_HEAD in the case when you 
fetch a branch with an explicit URL might be handy.


Nicolas
