git.vger.kernel.org archive mirror
* grafts+repack+prune = history at danger
@ 2007-01-25 17:17 Johannes Sixt
  2007-01-25 23:07 ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Sixt @ 2007-01-25 17:17 UTC (permalink / raw)
  To: git

Isn't there a major hole in the logic of how repack works when grafts
are in effect?

I did this (details follow):

1. specify grafts
2. repack
3. prune
4. clone

Result: Broken history in the clone; info/grafts was not copied.
This is with git version 1.5.0.rc2.g18af.


1. I imported a cvs repository into git and "fixed" the history using
grafts. In particular:

      o--B--X   <== this commit should be skipped
          \  \
graft =>   ---A--o

I specified in .git/info/grafts that the parent of A should be B. Of
course, commit A itself still records X as its parent.
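As a minimal sketch of what such a graft record looks like: the grafts file holds one record per line, a commit name followed by the names of the parents it should appear to have, separated by spaces. The function name `parse_grafts` and the single-letter commit names are placeholders for illustration only; real entries are full hexadecimal object names.

```python
def parse_grafts(text):
    """Map each grafted commit to the list of parents the graft assigns it."""
    grafts = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        commit, *parents = fields
        grafts[commit] = parents
    return grafts


# "A B" says: treat B as the only parent of A, whatever A itself records.
example = "A B\n"
print(parse_grafts(example))  # {'A': ['B']}
```

A record with no parents after the commit name would make that commit look like a root.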

2. Then I repacked the repo. But this did not remove all loose objects:

$ git repack -a -d
$ git count-objects -v
count: 5
size: 28
in-pack: 3392
packs: 1
prune-packable: 0
garbage: 0
$ git fsck-objects
dangling commit bb828bfbd213a97817a95506bab4eeaa70538e2e

This commit bb828... is X.

3. Now git prune happily removes the 5 objects.

4. 'git clone First Second' clones the repository without problems.

But now the history in the clone is broken, because commit X is not in
the cloned pack, nor is there any info/grafts file. The original history
is still OK as long as the info/grafts file is present; but if it is
removed, the original repo is also damaged.

IMHO, this is a very serious issue. I think that repack should not walk
the grafted history. Alternatively, the info/grafts file must be copied
by the clone and respected by fsck-objects.
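The four steps can be modelled with a toy commit graph. This is only a sketch under stated assumptions: the names `tip` and `root` and the helper `reachable` are invented for illustration, and "repack + prune" is approximated as "keep exactly what is reachable through the grafted parents".

```python
def reachable(tips, parents):
    """Return every commit reachable from tips by following parents."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


# Parents as recorded in the commit objects: A really descends from X.
real = {"tip": ["A"], "A": ["X"], "X": ["B"], "B": ["root"], "root": []}
# Step 1: the graft makes B the apparent parent of A.
grafts = {"A": ["B"]}
grafted = {**real, **grafts}

# Steps 2 and 3: repack and prune keep only what the grafted view reaches.
kept = reachable(["tip"], grafted)
print(sorted(set(real) - kept))  # ['X']

# Step 4: the clone has no grafts, so it follows the recorded parents,
# but it only received the objects that were kept.
missing = {p for c in kept for p in real[c] if p not in kept}
print(sorted(missing))  # ['X']
```

The second result is the broken link: A is present in the clone, but the parent it records is not.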

-- Hannes


* Re: grafts+repack+prune = history at danger
  2007-01-25 17:17 grafts+repack+prune = history at danger Johannes Sixt
@ 2007-01-25 23:07 ` Junio C Hamano
  2007-01-26  8:13   ` Johannes Sixt
  2007-01-26  9:15   ` Mark Wooding
  0 siblings, 2 replies; 15+ messages in thread
From: Junio C Hamano @ 2007-01-25 23:07 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

Johannes Sixt <J.Sixt@eudaptics.com> writes:

> Isn't there a major hole in the logic how repack works when grafts are
> in effect?
>
> I did this (details follow):
>
> 1. specify grafts
> 2. repack
> 3. prune
> 4. clone
>
> Result: Broken history in the clone; info/grafts was not copied.

That is expected.

If you had a problem in the original repository (i.e. the one with
grafts) that lost objects after step 3., that would be serious
and needs to be fixed, but otherwise the rule of thumb has
always been not to expose repositories with grafts without
telling unsuspecting downstream people for cloning or fetching.
It will give objects they did not even ask for.

grafts are local matter for archaeologist's convenience to glue
two independent histories together, and not much more.  For
example, the history that starts at v2.6.12-rc2 can be grafted
on top of old bkcvs history, but people who clone from you may
not expect to get anything beyond the true origin of the history
at v2.6.12-rc2 (after all that commit object records it as a
parentless commit).

I suspect you could extend fetch-pack protocol to give existing
grafts from upload-pack to trivially fix 'clone', but I do not
know offhand what the ramifications of it are for normal
'fetch'.  You would need to merge potentially conflicting graft
information you obtained from where you fetched from and what
you already had before starting to fetch.


* Re: grafts+repack+prune = history at danger
  2007-01-25 23:07 ` Junio C Hamano
@ 2007-01-26  8:13   ` Johannes Sixt
  2007-01-26  8:54     ` Junio C Hamano
  2007-01-26  9:15   ` Mark Wooding
  1 sibling, 1 reply; 15+ messages in thread
From: Johannes Sixt @ 2007-01-26  8:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> 
> Johannes Sixt <J.Sixt@eudaptics.com> writes:
> 
> > Isn't there a major hole in the logic how repack works when grafts are
> > in effect?
> >
> > I did this (details follow):
> >
> > 1. specify grafts
> > 2. repack
> > 3. prune
> > 4. clone
> >
> > Result: Broken history in the clone; info/grafts was not copied.
> 
> That is expected.
> 
> If you had a problem in the original repository (i.e. the one with
> grafts) that lost objects after step 3., that would be serious
> and needs to be fixed,

Oh, the original repo *does* lose the object after step 3, but you
would not notice it until you remove the grafts file.

> grafts are local matter for archaeologist's convenience to glue
> two independent histories together, and not much more.

Agreed. Then grafts must be disregarded by (almost) all plumbing, most
notably fsck-objects, prune, pack-objects, but also
{fetch,upload,send,receive}-pack. They should be obeyed only by the log
and diff families and certainly also rev-list on request.

-- Hannes


* Re: grafts+repack+prune = history at danger
  2007-01-26  8:13   ` Johannes Sixt
@ 2007-01-26  8:54     ` Junio C Hamano
  2007-01-26  9:21       ` Johannes Sixt
  2007-01-26 15:55       ` Linus Torvalds
  0 siblings, 2 replies; 15+ messages in thread
From: Junio C Hamano @ 2007-01-26  8:54 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

Johannes Sixt <J.Sixt@eudaptics.com> writes:

> Oh, the original repo *does* lose the object after step 3, but you
> would not notice it until you remove the grafts file.

That is Ok -- you did want to lose that object X because you
told git you want to pretend A's parent is not X in _your_ local
repository.

>> grafts are local matter for archaeologist's convenience to glue
>> two independent histories together, and not much more.
>
> Agreed. Then grafts must be disregarded by (almost) all plumbing, most
> notably fsck-objects, prune, pack-objects,...

You are not agreeing.

Graft is a local matter, but that does not mean it should
introduce inconsistencies.  It is a way to _locally_ change the
world view, and to give the consistent world view locally, not
only the commands you listed (fsck, prune, pack-objects) but
also log, rev-list and friends all should take grafts into
account, which is why losing X is the right thing to do if you
repack or prune.  In your altered world, X is not part of any
remaining history.

The problem you noticed is a limitation of fetch/clone.
Exposing the locally modified world view to the other end so
that a cloned repository has exactly the same view by
copying the grafts file would be trivial [*1*].

However, it is rather tricky if you try to extend it to fetching
into an existing repository, which may have its own grafts and
define an altered world view in its own way.  And that altered
world view may conflict (e.g. it may already say the parent of A
is neither X, as the commit records, nor B, as in the repository
you are cloning from, but some other commit Y).
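To make the conflict concrete, here is a small sketch of merging two graft tables. Everything in it is hypothetical: the thread does not define merge semantics, so the function `merge_grafts` and its policy (keep the local entry and report the clash) are assumptions chosen only to show where a decision would be needed.

```python
def merge_grafts(local, remote):
    """Merge remote graft records into local ones, reporting clashes.

    Assumed policy, for illustration only: on a clash the local record
    is kept and the pair (local, remote) is reported to the caller.
    """
    merged = dict(local)
    conflicts = {}
    for commit, parents in remote.items():
        if commit in local and local[commit] != parents:
            conflicts[commit] = (local[commit], parents)
        else:
            merged[commit] = parents
    return merged, conflicts


local = {"A": ["Y"]}               # this repository says A's parent is Y
remote = {"A": ["B"], "C": ["D"]}  # the other end says A's parent is B
merged, conflicts = merge_grafts(local, remote)
print(merged)     # {'A': ['Y'], 'C': ['D']}
print(conflicts)  # {'A': (['Y'], ['B'])}
```

A plain clone never hits the conflict branch because `local` starts empty, which is why copying the file on clone is the easy case.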

That's why traditionally we just punt the whole issue by saying
don't exchange objects between repositories that have grafts
without thinking (primarily because we haven't thought things
through -- we are lazy bastards).

One thing you could do is to take the local-ness of grafts more
literally and enforce it more strictly by dropping grafts while
fetch-pack and receive-pack exchange common objects and spawn
pack-objects to come up with objects needed to be sent.  But
because we currently punt, we do not even do that.

If we were to spend the effort to do that temporary dropping of
grafts (which I would expect to be quite ugly code), I suspect
we are better off thinking things through to define the desired
semantics, what should happen when objects are exchanged between
two repositories that have their world views altered with their
grafts.  The end result would most likely update the info/grafts
in your repository when you fetch from a repository with grafts,
and probably update info/grafts at the remote when you push from
a repository with grafts.

[*1*] It does require fetch-pack protocol update, though, so it
is some work.  It is still trivial in the sense that it is clear
what is needed to realize exactly the same world view -- the
copy should have the exact copy of info/grafts file.


* Re: grafts+repack+prune = history at danger
  2007-01-25 23:07 ` Junio C Hamano
  2007-01-26  8:13   ` Johannes Sixt
@ 2007-01-26  9:15   ` Mark Wooding
  1 sibling, 0 replies; 15+ messages in thread
From: Mark Wooding @ 2007-01-26  9:15 UTC (permalink / raw)
  To: git

Junio C Hamano <junkio@cox.net> wrote:

> grafts are local matter for archaeologist's convenience to glue
> two independent histories together, and not much more. 

I've found them useful for doing imports from CVS.  You run
git-cvsimport, and then manually find the places where merges happened
and record them as grafts.  gitk then correctly displays the history,
which is nice.

What you then do is run cg-admin-rewritehist, which magically transforms
the history into one with the grafts etched in.  All that remains is to
translate the refs, which you can do with a sed script you got
cg-admin-rewritehist to write for you.

-- [mdw]


* Re: grafts+repack+prune = history at danger
  2007-01-26  8:54     ` Junio C Hamano
@ 2007-01-26  9:21       ` Johannes Sixt
  2007-01-26  9:31         ` Junio C Hamano
  2007-01-26 13:08         ` Jakub Narebski
  2007-01-26 15:55       ` Linus Torvalds
  1 sibling, 2 replies; 15+ messages in thread
From: Johannes Sixt @ 2007-01-26  9:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> Graft is a local matter, but that does not mean it should
> introduce inconsistencies.  It is a way to _locally_ change the
> world view, and to give the consistent world view locally, not
> only the commands you listed (fsck, prune, pack-objects) but
> also log, rev-list and friends all should take grafts into
> account, which is why losing X is the right thing to do if you
> repack or prune.  In your altered world, X is not part of any
> remaining history.

Here's my stance on it. Grafts should be a local matter. And they alter
the world view, with an emphasis on *view*. That's why I proposed
that only the log family of commands obey them[*]. And probably rev-list
so that gitk et al. have a way to obey them. And also the ref parser (so
that master~20 is what it looks like). Everything else should disregard
grafts: repack, prune, fetch, <transfer>-pack, push etc. No nasty side
effects anymore. No transfer of the grafts file needed. No clash when
someone else has a different *view* of the world.

Then the location of the file in .git/info/grafts is justified. If
grafts continue to have the radical influence that they have now, then
the grafts file is better located in .git/objects/info/grafts as part of
the objects database.

[*] ok, I originally also proposed the diff family, but that's likely
not necessary.

-- Hannes


* Re: grafts+repack+prune = history at danger
  2007-01-26  9:21       ` Johannes Sixt
@ 2007-01-26  9:31         ` Junio C Hamano
  2007-01-26  9:48           ` Johannes Sixt
  2007-01-26 13:08         ` Jakub Narebski
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2007-01-26  9:31 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

Johannes Sixt <J.Sixt@eudaptics.com> writes:

> Here's my stance on it. Grafts should be a local matter. And they alter
> the world view, with an emphasis on *view*. That's why I proposed
> that only the log family of commands obey them[*]. And probably rev-list
> so that gitk et al. have a way to obey them. And also the ref parser (so
> that master~20 is what it looks like). Everything else should disregard
> grafts: repack, prune, fetch, <transfer>-pack, push etc. No nasty side
> effects anymore.

I said you are not agreeing, but I should have said you are not
understanding.

grafts can bring otherwise disconnected commits into the
altered history, so if you want your log to honor grafts, your
prune and repack need to be aware of them, or you would lose
them.


* Re: grafts+repack+prune = history at danger
  2007-01-26  9:31         ` Junio C Hamano
@ 2007-01-26  9:48           ` Johannes Sixt
  2007-01-26 10:15             ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Sixt @ 2007-01-26  9:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> 
> Johannes Sixt <J.Sixt@eudaptics.com> writes:
> 
> > Here's my stance on it. Grafts should be a local matter. And they alter
> > the world view, with an emphasis on *view*. That's why I proposed
> > that only the log family of commands obey them[*]. And probably rev-list
> > so that gitk et al. have a way to obey them. And also the ref parser (so
> > that master~20 is what it looks like). Everything else should disregard
> > grafts: repack, prune, fetch, <transfer>-pack, push etc. No nasty side
> > effects anymore.
> 
> I said you are not agreeing, but I should have said you are not
> understanding.

Oh, I think I understand very well. It may just be that I cannot express
myself that well ;)

I propose that grafts are only about *view*, not database integrity.

There are no tools for manipulating grafts that would stop the user from
making some blunder; the user has to edit the file *manually*. It is
wrong, wrong, wrong to let such a file dictate database integrity.

> grafts can bring otherwise disconnected commits into the
> altered history, so if you want your log to honor grafts, your
> prune and repack need to be aware of them, or you would lose
> them.

Sure, if I connect my linux repo with a graft to the historical BK tree,
then toss the ref that pointed to the historical tree, then git prune:
- then currently it won't prune the historical tree
- but under my proposal it would. Silly me. Why did I remove that ref?

-- Hannes


* Re: grafts+repack+prune = history at danger
  2007-01-26  9:48           ` Johannes Sixt
@ 2007-01-26 10:15             ` Junio C Hamano
  2007-01-26 10:41               ` Johannes Sixt
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2007-01-26 10:15 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

Johannes Sixt <J.Sixt@eudaptics.com> writes:

> Sure, if I connect my linux repo with a graft to the historical BK tree,
> then toss the ref that pointed to the historical tree, then git prune:
> - then currently it won't prune the historical tree
> - but under my proposal it would. Silly me. Why did I remove that ref?

If your version of git worked the way you describe, the only
reason you removed that ref would be to make sure that your
altered view would be destroyed (you won't be able to do "git
log" across that graft boundary anymore).  Indeed that would be
a silly thing to do.

Thankfully, the real git does not behave that way.  That is why
fsck/prune _must_ honor grafts.  That makes the locally altered
view consistent.  To the altered world view, what is stored in
the object database does not change, but your view of how objects are
connected does.  And if your altered view thinks commit
v2.6.12-rc2 has one of the commits in the bkcvs history as its
parent, you do not want to lose that history merely because you
lost a ref to it -- as long as the commit tagged as v2.6.12-rc2
is reachable, its (imaginary) parent should be as well.

If you want to switch out of an altered universe, you may need
to do more than just remove grafts (objects that were hidden by
grafts were immaterial in the altered universe, but now you may
need to get them back, as in your fixed-up imported repository
example, and objects that existed only because grafts pulled
them in are now made unreachable and become prunable), but that
goes without saying.

If you want to make it easier to switch back and forth between
altered reality and the real world, fsck/prune/repack may need
to be taught to consider both real and grafted parents to be
connected, so that you do not have to lose objects that will
become necessary when you come out of the altered world, but I
am not sure if it is worth it.  If you prune while in the real
world the "both real and grafted" safety would obviously not
kick in when you run prune, so when you reinstall the grafts,
some of the necessary objects would be already gone.
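The "both real and grafted parents" idea in the last paragraph can be sketched on a toy graph. This is an illustration under assumptions, not how fsck, prune, or repack are implemented: the helpers `reachable` and `union_parents` and all commit names are invented, and pruning is reduced to "drop whatever the chosen parent map cannot reach".

```python
def reachable(tips, parents):
    """Return every commit reachable from tips by following parents."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def union_parents(real, grafts):
    """Treat a commit as connected to its real AND its grafted parents."""
    merged = {}
    for commit in set(real) | set(grafts):
        recorded = real.get(commit, [])
        extra = [p for p in grafts.get(commit, []) if p not in recorded]
        merged[commit] = recorded + extra
    return merged


# One graft hides X; the other glues the bkcvs history below rc2.
real = {"tip": ["A"], "A": ["X"], "X": ["B"], "B": ["rc2"], "rc2": [],
        "bk_tip": ["bk_root"], "bk_root": []}
grafts = {"A": ["B"], "rc2": ["bk_tip"]}

grafted = {**real, **grafts}
print(sorted(set(real) - reachable(["tip"], grafted)))  # ['X']
print(sorted(set(real) - reachable(["tip"], real)))     # ['bk_root', 'bk_tip']
print(sorted(set(real) - reachable(["tip"], union_parents(real, grafts))))  # []
```

The second line also shows the caveat raised above: a prune run while the grafts are not installed sees only the real parents, so the union cannot protect the grafted-in history.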


* Re: grafts+repack+prune = history at danger
  2007-01-26 10:15             ` Junio C Hamano
@ 2007-01-26 10:41               ` Johannes Sixt
  2007-01-26 11:29                 ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Sixt @ 2007-01-26 10:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> 
> Johannes Sixt <J.Sixt@eudaptics.com> writes:
> 
> > Sure, if I connect my linux repo with a graft to the historical BK tree,
> > then toss the ref that pointed to the historical tree, then git prune:
> > - then currently it won't prune the historical tree
> > - but under my proposal it would. Silly me. Why did I remove that ref?
> 
> [...]
> 
> Thankfully, the real git does not behave that way.  That is why
> fsck/prune _must_ honor grafts.  That makes the locally altered
> view consistent.  To the altered world view, what is stored in
> the object database does not change, but your view of how objects are
> connected does.  And if your altered view thinks commit
> v2.6.12-rc2 has one of the commits in the bkcvs history as its
> parent, you do not want to lose that history merely because you
> lost a ref to it -- as long as the commit tagged as v2.6.12-rc2
> is reachable, its (imaginary) parent should be as well.

From your argument I deduce that grafts are a very important thing (once
they exist in a repo). But the current implementation does not honor
this:

- the grafts file is not part of the objects database
- it is manipulated manually instead of by tools that check for errors
- it is not transferred across clones/pulls/pushes (it's even possible
to create an inconsistent clone)

The way out that I see is to make grafts much, much less important.
Namely that they are obeyed _only_ by tools that _present_ the database
contents. All manipulators must disregard grafts.

Consequently, if I install grafts, I must make sure that I don't prune
away objects that the grafted history needs (i.e. avoid the silliness
mentioned above). If I happen to make the grafted history inconsistent,
I can make it consistent again by removing the grafts file (it was a
local thingy anyway) - no harm done - just the _presentation_ was
altered.

-- Hannes


* Re: grafts+repack+prune = history at danger
  2007-01-26 10:41               ` Johannes Sixt
@ 2007-01-26 11:29                 ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2007-01-26 11:29 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

Johannes Sixt <J.Sixt@eudaptics.com> writes:

> - the grafts file is not part of the objects database

This is a very conscious design decision from ancient times.
It used to be fashionable to share object store across different
repositories (you literally symlinked .git/objects), and grafts
are local in the sense that they are per-repository, and that is
the reason it lives in .git/info.  There is not much reason
either way and if I were doing this from scratch I would
probably place it in .git/objects/info next to alternates.

> - it is manipulated manually instead of by tools that check for errors

Yes, but that is only because nobody saw a need for such a tool so
far.  In reality, grafts have been pretty much an "install and
forget" thing.  You graft 2.6.12-rc2 on top of the bkcvs tip
once, and then do not think about it after doing so.

When somebody sees a need, you know what will happen ;-).

> - it is not transferred across clones/pulls/pushes (it's even possible
> to create an inconsistent clone)

Yes, as I already said that is where we punted and declared that
the grafts are local matter.

Even though your resulting clone is inconsistent, I do not even
have to say "tough".  You can just tell what the necessary graft
file should look like to the repository owner at the other end,
and the life will be peachy again.

I even outlined the issues you (or somebody else who may be
interested) would need to look into to make it more global.  Do
you need anything more?

> The way out that I see is to make grafts much, much less important.

Breaking what already works does not sound like a way out.  

For local-only, "install and forget" use, what the current setup
does is consistent and works reasonably well.  I would not say
it is perfect, but I do not know of any outstanding bugs (and
what you mentioned in these messages certainly are not).


* Re: grafts+repack+prune = history at danger
  2007-01-26  9:21       ` Johannes Sixt
  2007-01-26  9:31         ` Junio C Hamano
@ 2007-01-26 13:08         ` Jakub Narebski
  1 sibling, 0 replies; 15+ messages in thread
From: Jakub Narebski @ 2007-01-26 13:08 UTC (permalink / raw)
  To: git

[Cc: git@vger.kernel.org]

Johannes Sixt wrote:

> Junio C Hamano wrote:
>> Graft is a local matter, but that does not mean it should
>> introduce inconsistencies.  It is a way to _locally_ change the
>> world view, and to give the consistent world view locally, not
>> only the commands you listed (fsck, prune, pack-objects) but
>> also log, rev-list and friends all should take grafts into
>> account, which is why losing X is the right thing to do if you
>> repack or prune.  In your altered world, X is not part of any
>> remaining history.
> 
> Here's my stance on it. Grafts should be a local matter. And they alter
> the world view, with an emphasis on *view*. That's why I proposed
> that only the log family of commands obey them[*]. And probably rev-list
> so that gitk et al. have a way to obey them. And also the ref parser (so
> that master~20 is what it looks like). Everything else should disregard
> grafts: repack, prune, fetch, <transfer>-pack, push etc. No nasty side
> effects anymore. No transfer of the grafts file needed. No clash when
> someone else has a different *view* of the world.

If I remember correctly, there was a discussion about this topic some
time ago, namely whether connectivity (including prune, repack, etc.)
should take only true parents, only grafts (local view), or both. IIRC
there was no conclusion (besides perhaps that the choice should be
configurable), and no code.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


* Re: grafts+repack+prune = history at danger
  2007-01-26  8:54     ` Junio C Hamano
  2007-01-26  9:21       ` Johannes Sixt
@ 2007-01-26 15:55       ` Linus Torvalds
  2007-01-26 23:46         ` Junio C Hamano
  1 sibling, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2007-01-26 15:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, git



On Fri, 26 Jan 2007, Junio C Hamano wrote:
> 
> One thing you could do is to take the local-ness of grafts more
> literally and enforce it more strictly by dropping grafts while
> fetch-pack and receive-pack exchange common objects and spawn
> pack-objects to come up with objects needed to be sent.  But
> because we currently punt, we do not even do that.

One option might be:

 - add a global flag (like the current "save_commit_buffer") that commands 
   can set to specify whether they want to honor grafts or not.

   The "please_follow_grafts" flag defaults to 1.

 - "git send-pack" would explicitly set it to zero, and thus we'd always 
   send a non-grafted result.

 - "git prune" would *also* explicitly set it to zero, but would also 
   manually look at the grafts file, and mark anything that is set in the 
   grafts file as being reachable (the same way it does for index entries 
   etc).

It might also be an option to then do:

 - "git repack" should probably also set it to zero - I think we might be 
   better off packing any grafted data separately.

The alternative, of course, is to try to transfer the grafts file for 
clones and fetches, but that is likely to be a *bad* idea. It's even a 
potential security issue: grafts can literally be used to short-circuit 
some of the inherent safety in git, in that an attacker can make a graft 
that makes history *look* fine, but hide part of it (you can't "really" 
hide history, but you can make normal git operations like "git log" 
basically ignore it by judicious use of grafts).
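The flag-based option listed above can be sketched as a small model. It is an interpretation, not git's code: the parameter `follow_grafts` stands in for the proposed "please_follow_grafts" flag, the function names are invented, and "mark anything that is set in the grafts file as being reachable" is read here as "use every commit named in the grafts file as an extra starting point".

```python
def reachable(tips, real, grafts, follow_grafts=True):
    """Walk history from tips; grafts override real parents only if asked."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        if follow_grafts and commit in grafts:
            stack.extend(grafts[commit])
        else:
            stack.extend(real.get(commit, []))
    return seen


def send_pack_objects(tips, real, grafts):
    """send-pack with the flag at zero: always the non-grafted history."""
    return reachable(tips, real, grafts, follow_grafts=False)


def prune_keep(tips, real, grafts):
    """prune with the flag at zero, plus graft entries as extra roots."""
    extra_roots = [c for commit, ps in grafts.items() for c in (commit, *ps)]
    return reachable(list(tips) + extra_roots, real, grafts,
                     follow_grafts=False)


real = {"tip": ["A"], "A": ["X"], "X": ["B"], "B": ["rc2"], "rc2": [],
        "bk_tip": ["bk_root"], "bk_root": []}
grafts = {"A": ["B"], "rc2": ["bk_tip"]}

print(sorted(send_pack_objects(["tip"], real, grafts)))
# ['A', 'B', 'X', 'rc2', 'tip']
print(sorted(set(real) - prune_keep(["tip"], real, grafts)))
# []
```

In this model the receiver gets X and a history that is consistent without any grafts file, while the local prune keeps both the hidden commit X and the grafted-in bk history.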

			Linus


* Re: grafts+repack+prune = history at danger
  2007-01-26 15:55       ` Linus Torvalds
@ 2007-01-26 23:46         ` Junio C Hamano
  2007-01-27  0:56           ` Linus Torvalds
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2007-01-26 23:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Sixt, git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, 26 Jan 2007, Junio C Hamano wrote:
>> 
>> One thing you could do is to take the local-ness of grafts more
>> literally and enforce it more strictly by dropping grafts while
>> fetch-pack and receive-pack exchange common objects and spawn
>> pack-objects to come up with objects needed to be sent.  But
>> because we currently punt, we do not even do that.
>
> One option might be:
>
>  - add a global flag (like the current "save_commit_buffer") that commands 
>    can set to specify whether they want to honor grafts or not.
>
>    The "please_follow_grafts" flag defaults to 1.
>
>  - "git send-pack" would explicitly set it to zero, and thus we'd always 
>    send a non-grafted result.
>
>  - "git prune" would *also* explicitly set it to zero, but would also 
>    manually look at the grafts file, and mark anything that is set in the 
>    grafts file as being reachable (the same way it does for index entries 
>    etc).

I am not sure why your "git prune" one does that, but will think
about it for some time first before I ask you to waste your time
explaining it to me.

> It might also be an option to then do:
>
>  - "git repack" should probably also set it to zero - I think we might be 
>    better off packing any grafted data separately.
>
> The alternative, of course, is to try to transfer the grafts file for 
> clones and fetches, but that is likely to be a *bad* idea. It's even a 
> potential security issue: grafts can literally be used to short-circuit 
> some of the inherent safety in git, in that an attacker can make a graft 
> that makes history *look* fine, but hide part of it (you can't "really" 
> hide history, but you can make normal git operations like "git log" 
> basically ignore it by judicious use of grafts).

I agree that transferring, potentially merging, and
automatically installing grafts upon fetch has security
implications.  Thanks for pointing it out [*1*].

But if you are cloning, it would be handy if send-pack followed
the altered world view and the result had identical grafts,
which is why I am not 100% convinced about send-pack always
sending a non-grafted result.


[Footnote]

*1* Running "git fetch some-random-url" is supposed to be a safe
operation.  The only thing it does is to download some objects
that are only reachable from .git/FETCH_HEAD, and it never
overwrites objects that existed in your repository before the
fetch, so after looking at what .git/FETCH_HEAD has, potentially
malicious contents will become cruft and you can gc them away.

Running "git pull some-random-url ref" (without storing refspec)
to merge and then running "git reset --hard ORIG_HEAD" also is,
except that the reflog entry for the current branch would refer
to the merge commit and you could inject bad objects that will
not be immediately pruned in your object database that way.

The moral of the story is you should not pull from suspicious
source without thinking; fetching and immediately discarding
should always be safe.


* Re: grafts+repack+prune = history at danger
  2007-01-26 23:46         ` Junio C Hamano
@ 2007-01-27  0:56           ` Linus Torvalds
  0 siblings, 0 replies; 15+ messages in thread
From: Linus Torvalds @ 2007-01-27  0:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, git



On Fri, 26 Jan 2007, Junio C Hamano wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
> >
> >  - "git prune" would *also* explicitly set it to zero, but would also 
> >    manually look at the grafts file, and mark anything that is set in the 
> >    grafts file as being reachable (the same way it does for index entries 
> >    etc).
> 
> I am not sure why your "git prune" one does that, but will think
> about it for some time first before I ask you to waste your time
> > explaining it to me.

Simple: the grafts may actually _hide_ history too - not just add it.

Sure, commonly, a graft is used to graft two complete trees together (eg, 
you'd graft the old Linux history into the new Linux history tree). 

However, they _can_ also be used to "fix" history - say that you had one 
tree that has a rough history (with all the releases, but not the full 
history between them), and another "fine-grained" historical tree. You 
could use a graft to replace the rough history version with the 
fine-grained one, so the graft may actually hide stuff that is there in 
the rough history.

So in this case, we wouldn't necessarily want to prune stuff that 
"exists", but is hidden by a graft. So in my suggestion, pruning would 
basically only use the grafts file to *add* refs, never to hide them.

		Linus
