git.vger.kernel.org archive mirror
* dangling blob which is not dangling at all
@ 2007-08-01  1:34 Domenico Andreoli
  2007-08-01  2:22 ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Domenico Andreoli @ 2007-08-01  1:34 UTC
  To: git

Hi,

  first of all, I want to thank Linus and all of you for git; it is
revolutionizing my everyday workflow. Exceptional.

Playing with my central bare git repository (yes, I am a former
CVS/SVN user) and trying to lose data, I discovered something I do not
understand well.

Running git fsck --no-reflogs I found some dangling objects (I have
to say I enjoyed navigating commits, trees and blobs with plumbing
a lot... really!): two were commits and one was a blob.

One of the commits was there because I pushed (forcing) from a working
repository after a git reset HEAD^. I checked it and removed it and
all the other dependent objects, down to the blob which contained the
new version of that file. It seems I even understood what I was doing! ;)
Up to this point, everything had been smooth.

The second commit was something pushed from another repository, but at
the right head it was strangely recorded with a different hash. After
removing it, its tree and another sub-tree, no blob was left dangling.
So the final blob containing the change was still used elsewhere, indeed
by the "right head" above. While I would expect this in a working
repository where merging happens all day, it is not clear how it
happened in my central repository, where nobody does any work. Any idea?

And now, what I think is a bug: the dangling blob. It is reported as
dangling but it is not. Hunting for a commit/tree/blob to compare it to,
in order to understand which modification it was hiding, I found a tree
object which refers to it, which by the definition of "dangling object"
should not exist. So fsck looks f*cked... and I am quite willing to
understand what is going wrong here, but please help me.

$ git fsck --no-reflogs
dangling blob e5d444e61b834c34710ce8fb5cb176e20e5894e1
$ git-ls-tree 70b58535361eb633d44d4f1275af3421ca6a5ed7
...
100644 blob e5d444e61b834c34710ce8fb5cb176e20e5894e1    link_stream.c
...
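
For readers following along, here is a minimal sketch of the distinction at stake. It assumes a scratch repository created with mktemp; the file contents are made up, and link_stream.c is only borrowed from the listing above as a name. As far as I understand fsck, it calls an object "dangling" only when nothing it examined refers to that object. A blob named by a tree should therefore be merely unreachable, with the tree being the dangling one.

```shell
# Scratch repository; the path comes from mktemp and is purely illustrative.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# A blob written straight into the object store: nothing refers to it yet.
blob=$(echo 'example contents' | git hash-object -w --stdin)
before=$(git fsck --no-reflogs 2>/dev/null)
echo "$before"

# A tree that names the blob. The tree itself is not reachable from any ref.
tree=$(printf '100644 blob %s\tlink_stream.c\n' "$blob" | git mktree)
after=$(git fsck --no-reflogs 2>/dev/null)
echo "$after"
```

In this sketch the first fsck run is expected to report the blob as dangling, and the second to report only the tree. A blob reported as dangling while a tree in the same repository lists it is therefore a sign that fsck did not examine that tree.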

If you have read this far, good night! ;)

Cheers,
Domenico

-----[ Domenico Andreoli, aka cavok
 --[ http://www.dandreoli.com/gpgkey.asc
   ---[ 3A0F 2F80 F79C 678A 8936  4FEE 0677 9033 A20E BC50


* Re: dangling blob which is not dangling at all
  2007-08-01  1:34 dangling blob which is not dangling at all Domenico Andreoli
@ 2007-08-01  2:22 ` Linus Torvalds
  2007-08-01  6:32   ` Domenico Andreoli
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2007-08-01  2:22 UTC
  To: Domenico Andreoli; +Cc: git



On Wed, 1 Aug 2007, Domenico Andreoli wrote:
> 
> $ git fsck --no-reflogs
> dangling blob e5d444e61b834c34710ce8fb5cb176e20e5894e1
>
> $ git-ls-tree 70b58535361eb633d44d4f1275af3421ca6a5ed7
> ...
> 100644 blob e5d444e61b834c34710ce8fb5cb176e20e5894e1    link_stream.c

Have you done clones with stupid protocols (rsync and/or http)?

The simplest explanation for this is that since you didn't pass "--full" to 
fsck, your git-fsck never looked into the pack-files you had. And the 
tree might well exist in a pack-file, and thus was never even looked at by fsck.

So try "git fsck --full", and see if that changes the picture.

(Usually, you'd never have a pack-file *and* the loose object it points to 
both at the same time, but especially if you use the dumb transports 
(rsync and/or http), you'll get pack-files from remotes, and thus you 
won't have the normal nice behaviour of pack-files being "old state", and 
loose objects being "new state".)

The easiest fixup is likely to just do "git gc", which will do a nice 
repack, and get rid of loose objects that are duplicates of stuff 
that is also in a pack-file.
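
A minimal sketch of that cleanup, assuming a throwaway repository with a made-up identity and file: "git repack -a" without "-d" is used here only as one way to end up with loose objects duplicated in a pack, and the "prune-packable" line of "git count-objects -v" counts such duplicates.

```shell
# Throwaway repository; names and contents are illustrative only.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git config user.name "Example User"
git config user.email "user@example.com"
echo 'example contents' > file.txt
git add file.txt
git commit -q -m "initial"

# Pack everything but leave the loose copies behind (no -d).
git repack -a -q
dup_before=$(git count-objects -v | sed -n 's/^prune-packable: //p')

# gc repacks and drops loose objects that are already in a pack.
git gc --quiet
dup_after=$(git count-objects -v | sed -n 's/^prune-packable: //p')
echo "loose duplicates before gc: $dup_before, after gc: $dup_after"
```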

		Linus


* Re: dangling blob which is not dangling at all
  2007-08-01  2:22 ` Linus Torvalds
@ 2007-08-01  6:32   ` Domenico Andreoli
  2007-08-01  7:27     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Domenico Andreoli @ 2007-08-01  6:32 UTC
  To: Linus Torvalds; +Cc: git

On Tue, Jul 31, 2007 at 07:22:14PM -0700, Linus Torvalds wrote:
> 
> 
> On Wed, 1 Aug 2007, Domenico Andreoli wrote:
> > 
> > $ git fsck --no-reflogs
> > dangling blob e5d444e61b834c34710ce8fb5cb176e20e5894e1
> >
> > $ git-ls-tree 70b58535361eb633d44d4f1275af3421ca6a5ed7
> > ...
> > 100644 blob e5d444e61b834c34710ce8fb5cb176e20e5894e1    link_stream.c
> 
> Have you done clones with stupid protocols (rsync and/or http)?

I do not remember having used any dumb transport on this repository but
I do recall having tried git-repack with the intent of a git gc.

> So try "git fsck --full", and see if that changes the picture.

This did not change anything.

> The easiest fixup is likely to just do "git gc", which will do a nice 
> repack, and get rid of loose objects that are duplicates of stuff 
> that is also in a pack-file.

This fixed things and also warned about two heads referring to pruned
commits, which may be those two commits I removed by hand (I hope).

Cheers,
Domenico

-----[ Domenico Andreoli, aka cavok
 --[ http://www.dandreoli.com/gpgkey.asc
   ---[ 3A0F 2F80 F79C 678A 8936  4FEE 0677 9033 A20E BC50


* Re: dangling blob which is not dangling at all
  2007-08-01  6:32   ` Domenico Andreoli
@ 2007-08-01  7:27     ` Junio C Hamano
  2007-08-01  7:42       ` Domenico Andreoli
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2007-08-01  7:27 UTC
  To: Domenico Andreoli; +Cc: Linus Torvalds, git

Domenico Andreoli <cavokz@gmail.com> writes:

> This fixed things and also warned about two heads referring to pruned
> commits, which may be those two commits I removed by hand (I hope).

Exactly.

All refs under .git/refs (the special case of this includes the
branch heads in .git/refs/heads) are your _promise_ to git that
everything that is reachable from them is supposed to be
available in your repository.  If you remove specific commits by
hand without adjusting the branch ref, you are breaking that
promise and git-fsck will notice it as a repository breakage.

If you do not need a branch and everything reachable only from
that branch, you can remove that branch (with "git branch -D"),
and run git-gc, which internally does the same reachability
analysis as git-fsck does and gets rid of objects that are no
longer necessary.
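
A minimal sketch of that sequence, under some assumptions: a scratch repository with a made-up identity, a hypothetical branch named "topic" built with plumbing so that no other ref or reflog mentions its commit, and an explicit "--prune=now" because a plain "git gc" only prunes unreachable objects older than its grace period.

```shell
# Scratch repository; identity, file and branch names are illustrative.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git config user.name "Example User"
git config user.email "user@example.com"
echo base > file.txt
git add file.txt
git commit -q -m "base"

# A commit that only the 'topic' branch can reach (never checked out).
side=$(git commit-tree 'HEAD^{tree}' -p HEAD -m "side work")
git branch topic "$side"
if git cat-file -e "$side" 2>/dev/null; then before=present; else before=gone; fi

# Drop the branch, then let gc discard what nothing reaches any more.
git branch -q -D topic
git gc --quiet --prune=now
if git cat-file -e "$side" 2>/dev/null; then after=present; else after=gone; fi
echo "side commit before: $before, after: $after"
```

The refs are adjusted first and the objects go second, which is the opposite order from deleting object files by hand.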


* Re: dangling blob which is not dangling at all
  2007-08-01  7:27     ` Junio C Hamano
@ 2007-08-01  7:42       ` Domenico Andreoli
  2007-08-01  8:35         ` Steven Grimm
  0 siblings, 1 reply; 8+ messages in thread
From: Domenico Andreoli @ 2007-08-01  7:42 UTC
  To: Junio C Hamano; +Cc: git

On Wed, Aug 01, 2007 at 12:27:10AM -0700, Junio C Hamano wrote:
> Domenico Andreoli <cavokz@gmail.com> writes:
> 
> > This fixed things and also warned about two heads referring to pruned
> > commits, which may be those two commits I removed by hand (I hope).
> 
> Exactly.
> 
> All refs under .git/refs (the special case of this includes the
> branch heads in .git/refs/heads) are your _promise_ to git that
> everything that is reachable from them is supposed to be
> available in your repository.  If you remove specific commits by
> hand without adjusting the branch ref, you are breaking that
> promise and git-fsck will notice it as a repository breakage.

If I move any ref by hand (not that I spend my days doing this...), I
understand that some commits may suddenly become unreachable. But
those commits I removed by hand were already unreachable, so no refs
should have been referring to them.

What is this reflog thing and why is it required?

Domenico

-----[ Domenico Andreoli, aka cavok
 --[ http://www.dandreoli.com/gpgkey.asc
   ---[ 3A0F 2F80 F79C 678A 8936  4FEE 0677 9033 A20E BC50


* Re: dangling blob which is not dangling at all
  2007-08-01  7:42       ` Domenico Andreoli
@ 2007-08-01  8:35         ` Steven Grimm
  2007-08-01  9:13           ` Rogan Dawes
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Grimm @ 2007-08-01  8:35 UTC
  To: Domenico Andreoli; +Cc: Junio C Hamano, git

Domenico Andreoli wrote:
> What is this reflog thing and why is it required?
>   

It is a log of where each ref pointed at any given time. Or rather, a 
log of changes to refs, with timestamps. It is not *required* per se 
(you can turn it off and almost all of git will continue to work as 
before) but it's handy in that you can say stuff like

git checkout -b newbranch master@"{4 days ago}"

and git will give you a new branch pointing at the rev that master 
pointed to 4 days ago, even if it's a rev that is no longer reachable 
from any of the existing heads (e.g., because you did a "git rebase" and 
the rev in question was replaced by a new one.) Obviously as soon as you 
do a "git gc" you will lose the ability to go back to unreachable revs 
using the reflog.

I primarily use the reflog to undo rebase operations. Not that I need to 
do that very often, but it's occasionally handy, e.g., if there was a 
conflict and I made a mistake while resolving it.
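
A minimal sketch of the reflog acting as that safety net, assuming a scratch repository with a made-up identity; "git commit --amend" stands in for a rebase here, since both replace a rev with a new one.

```shell
# Scratch repository; identity and file contents are illustrative.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git config user.name "Example User"
git config user.email "user@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two > file.txt
git commit -q -a -m "second"
old=$(git rev-parse HEAD)

# Replace the tip commit; the branch no longer reaches $old.
git commit -q --amend -m "second, reworded"

# The reflog still remembers where HEAD pointed one step ago.
prev=$(git rev-parse 'HEAD@{1}')

# fsck counts reflog entries as roots unless told otherwise.
with_reflogs=$(git fsck 2>/dev/null)
without_reflogs=$(git fsck --no-reflogs 2>/dev/null)
echo "$without_reflogs"
```

In this sketch only the --no-reflogs run should list the replaced commit as dangling, so dangling commits in that mode are not by themselves a sign of damage.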

-Steve


* Re: dangling blob which is not dangling at all
  2007-08-01  8:35         ` Steven Grimm
@ 2007-08-01  9:13           ` Rogan Dawes
  2007-08-01 14:21             ` Domenico Andreoli
  0 siblings, 1 reply; 8+ messages in thread
From: Rogan Dawes @ 2007-08-01  9:13 UTC
  To: Steven Grimm; +Cc: Domenico Andreoli, git

Steven Grimm wrote:
> Domenico Andreoli wrote:
>> What is this reflog thing and why is it required?
>>   
> 
> It is a log of where each ref pointed at any given time. Or rather, a 
> log of changes to refs, with timestamps. It is not *required* per se 
> (you can turn it off and almost all of git will continue to work as 
> before) but it's handy in that you can say stuff like
> 
> git checkout -b newbranch master@"{4 days ago}"
> 
> and git will give you a new branch pointing at the rev that master 
> pointed to 4 days ago, even if it's a rev that is no longer reachable 
> from any of the existing heads (e.g., because you did a "git rebase" and 
> the rev in question was replaced by a new one.) Obviously as soon as you 
> do a "git gc" you will lose the ability to go back to unreachable revs 
> using the reflog.
> 

Not strictly true. "git gc" does take the reflogs into account when 
determining reachability, but it also prunes the reflogs periodically to 
prevent them from growing without bound (and preventing pruning of 
otherwise unreachable objects).

From the git-gc manpage:

CONFIGURATION
  The optional configuration variable gc.reflogExpire can be set to
  indicate how long historical entries within each branch's reflog should
  remain available in this repository. The setting is expressed as a
  length of time, for example 90 days or 3 months. It defaults to 90
  days.
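
A minimal sketch of both halves of this, assuming a scratch repository with a made-up identity. The explicit "git reflog expire --expire=now --all" and "--prune=now" are used only to make the effect visible at once; left alone, entries would age out according to the configured expiry instead.

```shell
# Scratch repository; identity and file contents are illustrative.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git config user.name "Example User"
git config user.email "user@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "first"
old=$(git rev-parse HEAD)
git commit -q --amend -m "first, reworded"

# A plain gc keeps the replaced commit: the reflog still refers to it.
git gc --quiet
if git cat-file -e "$old" 2>/dev/null; then after_gc=kept; else after_gc=pruned; fi

# Once the reflog entries are expired, gc is free to prune it.
git reflog expire --expire=now --all
git gc --quiet --prune=now
if git cat-file -e "$old" 2>/dev/null; then after_expire=kept; else after_expire=pruned; fi
echo "after plain gc: $after_gc, after expiring reflogs: $after_expire"
```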

Regards,

Rogan


* Re: dangling blob which is not dangling at all
  2007-08-01  9:13           ` Rogan Dawes
@ 2007-08-01 14:21             ` Domenico Andreoli
  0 siblings, 0 replies; 8+ messages in thread
From: Domenico Andreoli @ 2007-08-01 14:21 UTC
  To: git

On Wed, Aug 01, 2007 at 11:13:27AM +0200, Rogan Dawes wrote:
> Steven Grimm wrote:
>> Domenico Andreoli wrote:
>>> What is this reflog thing and why is it required?
>>>   
>> It is a log of where each ref pointed at any given time. Or rather, a log 
>> of changes to refs, with timestamps. It is not *required* per se (you can 
>> turn it off and almost all of git will continue to work as before) but 
>> it's handy in that you can say stuff like
>> git checkout -b newbranch master@"{4 days ago}"
>> and git will give you a new branch pointing at the rev that master pointed 
>> to 4 days ago, even if it's a rev that is no longer reachable from any of 
>> the existing heads (e.g., because you did a "git rebase" and the rev in 
>> question was replaced by a new one.) Obviously as soon as you do a "git 
>> gc" you will lose the ability to go back to unreachable revs using the 
>> reflog.
>
> Not strictly true. "git gc" does take the reflogs into account when 
> determining reachability, but it also prunes the reflogs periodically to 
> prevent them from growing without bound (and preventing pruning of 
> otherwise unreachable objects).

So, besides playing with head refs by hand and force-pushing to heads
that are not a strict subset, is having dangling commits normal?

And is the only way to leak commits from heads? Otherwise, does one
have a severely broken repository?

many thanks,
Domenico

-----[ Domenico Andreoli, aka cavok
 --[ http://www.dandreoli.com/gpgkey.asc
   ---[ 3A0F 2F80 F79C 678A 8936  4FEE 0677 9033 A20E BC50

