Storing state in $GIT

* Storing state in $GIT_DIR
@ 2005-08-25  3:32 Martin Langhoff
  2005-08-25 18:16 ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: Martin Langhoff @ 2005-08-25  3:32 UTC (permalink / raw)
  To: GIT

Is there a convention of where/how it is safe to store additional
(non-git) data in $GIT_DIR?

The arch import needs to keep a cache with arch-commit-id  =
git-commit-id mappings, and some notes about what patch-trading Arch
recorded. It'd be great to be able to store those in
$GIT_DIR/archimport/ . Is that supported?

It does not need to be replicated with push or pull, merely preserved. 

cheers,

martin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Storing state in $GIT_DIR
  2005-08-25  3:32 Storing state in $GIT_DIR Martin Langhoff
@ 2005-08-25 18:16 ` Linus Torvalds
  2005-08-26  1:30   ` Martin Langhoff
                     ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Linus Torvalds @ 2005-08-25 18:16 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: GIT, Junio C Hamano

[ Junio, the fact that you can't script the initial commit with "git 
  commit" is _really_ irritating. ]

On Thu, 25 Aug 2005, Martin Langhoff wrote:
>
> Is there a convention of where/how it is safe to store additional
> (non-git) data in $GIT_DIR?

There's no convention, but I have a suggestion.

> The arch import needs to keep a cache with arch-commit-id  =
> git-commit-id mappings, and some notes about what patch-trading Arch
> recorded. It'd be great to be able to store those in
> $GIT_DIR/archimport/ . Is that supported?

Git won't care, so it will work, but things like clone/pull etc also won't
actually ever look there, so it will only work for that one repo.

Now, I have what I consider a clever idea (I've mentioned variations on it 
before), but it's entirely possible that people hate it.

The thing is, I think you _do_ want to revision-control the git-commit-id
mappings, but at the same time, you do _not_ want to mess up the resulting
git commit history with arch information.

The reason you want to revision-control them is that that way you get them 
on clones, and you can use push/pull to update them. And the reason you 
don't want to mess up the commit history is that it's just wrong and ugly.

The git solution to this (which nobody has ever _used_, but which
technically is wonderful) is to have a "side branch" that does not share
any commits (or files, for that matter) in common with the "real branch",
and which is used to track any metadata. In fact, you can obviously have 
any number of side branches.

So that "metadata branch" is a real git branch in its own right, but it
doesn't share the same root as the "normal" branch, and it's really
totally independent: you can pull just the main branch (ie somebody who
isn't arch-aware and has no reason to want the arch mappings), or you
could pull just the metadata branch (for example, somebody who doesn't
want to use git, but is trying to match up a git commit ID to whatever
ID's arch uses).

The way to maintain a metadata branch is to have not only a different 
branch name (obviously), but also use a totally different index file, so 
that you can index both branches in parallel, and you don't actually need
to check out one or the other.

Now, your arch import tools would then use the raw git commands explicitly 
to maintain the metadata branch. Every time you do an incremental import 
from an arch project, your import scripts would save away the mapping 
information into the metadata branch.

I'll make a _really_ stupid example for you, just to make this a bit more 
concrete:

	mkdir silly-example
	cd silly-example

	#
	# The normal "main branch": use regular git
	# infrastructure
	#
	git init-db
	echo "Hello" > file
	git update-cache --add file
	git commit -m "Main branch"

	#
	# The metadata branch: magic, very special stuff
	#
	echo "initial commit:" $(git-rev-parse HEAD) > .archdata
	GIT_INDEX_FILE=.git/archindex git-update-cache --add .archdata
	arch_index_tree=$(GIT_INDEX_FILE=.git/archindex git-write-tree)
	echo "arch index" | git-commit-tree $arch_index_tree > .git/refs/heads/arch-index

(Btw, the above example shows that the initial "git commit" won't take a 
"-m" flag, which is really irritating for scripts.)

Then do a "gitk --all", see the two different branches, and realize that 
the "arch-index" branch can now contain all the tracking information 
necessary to go back-and-forth. 
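
With current git command names, the same idea might be sketched roughly
as follows. This is an illustration under assumptions, not part of the
original example: "git init", "git add", "git update-index" and
"git update-ref" stand in for the older spellings above, and the scratch
directory, identity and arch_index variable are made up.

```shell
set -e
# Hypothetical scratch repository; every name here is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# Main branch: ordinary index, ordinary commit.
echo "Hello" > file
git add file
git commit -q -m "Main branch"

# Metadata branch: a separate index file and a commit with no parent.
arch_index="$repo/.git/archindex"
echo "initial commit: $(git rev-parse HEAD)" > .archdata
GIT_INDEX_FILE="$arch_index" git update-index --add .archdata
arch_tree=$(GIT_INDEX_FILE="$arch_index" git write-tree)
arch_commit=$(echo "arch index" | git commit-tree "$arch_tree")
git update-ref refs/heads/arch-index "$arch_commit"
```

Because the second index lives in its own file, the main branch's index
never sees .archdata, and neither branch has to be checked out to update
the other.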

		Linus


* Re: Storing state in $GIT_DIR
  2005-08-25 18:16 ` Linus Torvalds
@ 2005-08-26  1:30   ` Martin Langhoff
  2005-08-26  3:54     ` Linus Torvalds
       [not found]   ` <7vwtm9u5jj.fsf@assigned-by-dhcp.cox.net>
  2005-08-26  2:03   ` [PATCH] Accept -m and friends for initial commits and merge commits Junio C Hamano
  2 siblings, 1 reply; 17+ messages in thread
From: Martin Langhoff @ 2005-08-26  1:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: GIT, Junio C Hamano

Linus, 

I like the solution you are suggesting, but I suspect it will create
more problems than it will solve, and while the coolness factor is
drawing me in.... we ain't gonna need it, as the xp people say.

More below...

On 8/26/05, Linus Torvalds <torvalds@osdl.org> wrote:
> Git won't care, so it will work, but things like clone/pull etc also won't
> actually ever look there, so it will only work for that one repo.

Storing things there _works_ in the sense that it will be ignored, and
that is fine with me. So I could just be lazy and have it strictly
tied to the repo. In practice, if you are tracking an external Arch
repo, you really have it scripted, and use a dedicated git repo for
that.

Not using a dedicated repo is quite... messy. If you do other things
in that particular repo, the import script may find it dirty, and mess
things up on import. And after the import, you'll probably run
git-push-script --all because it's bringing a dynamically growing
forest of heads from the arch repo. That's another reason why your
private branches should be elsewhere.

OTOH, storing the metadata in a branch will allow us to run the import
in alternating repositories. But as Junio points out, unless I can
guarantee that the metadata and the tree are in sync, I cannot
trivially resume the import cycle from a new repo.

> The git solution to this (which nobody has ever _used_, but which
> technically is wonderful) is to have a "side branch" that does not share
> any commits (or files, for that matter) in common with the "real branch",
> and which is used to track any metadata. In fact, you can obviously have
> any number of side branches.

A couple of days ago, playing with the import, I realised that the git
repo can hold unrelated projects, too, if you just commit orphan trees
as new heads. I mean - it was a bug in my script but I thought it was
cool. ;)

> The way to maintain a metadata branch is to have not only a different
> branch name (obviously), but also use a totally different index file, so
> that you can index both branches in parallel, and you don't actually need
> to check out one or the other.

Hmmm. Now that's voodoo magic! I was thinking of reading the file by
asking directly for the object by its sha, or doing a checkout in a
tmpdir. Interesting.
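
Reading a metadata file straight from the object database, without any
checkout, could look something like this. It is only a sketch: the
branch name arch-index, the file name .archdata and the mapping line
are placeholders, and the branch is built with plumbing commands so the
block stands on its own.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# Build a one-commit metadata branch holding a single mapping file.
blob=$(echo "arch-id-1 = git-id-1" | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\t.archdata\n' "$blob" | git mktree)
commit=$(echo "arch metadata" | git commit-tree "$tree")
git update-ref refs/heads/arch-index "$commit"

# Read the file by branch and path, or directly by the blob's SHA.
by_path=$(git cat-file -p arch-index:.archdata)
by_sha=$(git cat-file -p "$blob")
echo "$by_path"
```

Nothing is written to the working directory here; both reads go through
the object store only.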

cheers,

martin


* Re: Storing state in $GIT_DIR
       [not found]   ` <7vwtm9u5jj.fsf@assigned-by-dhcp.cox.net>
@ 2005-08-26  1:57     ` Martin Langhoff
  0 siblings, 0 replies; 17+ messages in thread
From: Martin Langhoff @ 2005-08-26  1:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, GIT

On 8/26/05, Junio C Hamano <junkio@cox.net> wrote:
> If I am not mistaken, we have another foreign SCM import
> interface that can repeatedly slurp from the same foreign SCM to
> get updates.  Doesn't cvsimport have the same issue?  

Yes and no. 

cvsimport uses cvsps, which uses ~/.cvsps as a cache. Other than
that, all the info is transient - cvsimport doesn't need to know that
much about past commits -- when it sees BRANCH_A_A1 open from BRANCH_A
it opens a new head BRANCH_A_A1 with the parent in the _latest_
BRANCH_A.

This is a bug/limitation that only hits you when you are tracking an
evolving cvs project, because cvsps will otherwise mark the branching
point in order, right after the 'correct' commit. IOW cvsimport gets
it kind-of-right most of the time due to cvsps behaviour and sheer
luck, but doesn't do it strictly right either.

With Arch, we cannot even fake it. We see the branch in its own time,
and it can branch off any point in the source branch history. We have
the correct parent information -- it'd be silly to drop it. All we
have to do, is map the parents correctly...

cheers,

martin


* [PATCH] Accept -m and friends for initial commits and merge commits.
  2005-08-25 18:16 ` Linus Torvalds
  2005-08-26  1:30   ` Martin Langhoff
       [not found]   ` <7vwtm9u5jj.fsf@assigned-by-dhcp.cox.net>
@ 2005-08-26  2:03   ` Junio C Hamano
  2 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2005-08-26  2:03 UTC (permalink / raw)
  To: GIT; +Cc: Linus Torvalds

Yes it was irritating.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

    Linus Torvalds <torvalds@osdl.org> writes:
    > [ Junio, the fact that you can't script the initial commit with "git 
    >   commit" is _really_ irritating. ]

 git-commit-script |   86 ++++++++++++++++++++++++-----------------------------
 1 files changed, 39 insertions(+), 47 deletions(-)

c038244ac9260c8c895bf791ff587103bacadaba
diff --git a/git-commit-script b/git-commit-script
--- a/git-commit-script
+++ b/git-commit-script
@@ -110,57 +110,51 @@ t)
 	fi
 esac
 
+if [ ! -r "$GIT_DIR/HEAD" ]
+then
+	echo "#"
+	echo "# Initial commit"
+	echo "#"
+	git-ls-files | sed 's/^/# New file: /'
+	echo "#"
+elif [ -f "$GIT_DIR/MERGE_HEAD" ]; then
+	echo "#"
+	echo "# It looks like your may be committing a MERGE."
+	echo "# If this is not correct, please remove the file"
+	echo "#	$GIT_DIR/MERGE_HEAD"
+	echo "# and try again"
+	echo "#"
+fi >.editmsg
+if test "$log_message" != ''
+then
+	echo "$log_message"
+elif test "$logfile" != ""
+then
+	if test "$logfile" = -
+	then
+		test -t 0 &&
+		echo >&2 "(reading log message from standard input)"
+		cat
+	else
+		cat <"$logfile"
+	fi
+elif test "$use_commit" != ""
+then
+	git-cat-file commit "$use_commit" | sed -e '1,/^$/d'
+fi | git-stripspace >>.editmsg
+
 PARENTS="-p HEAD"
 if [ ! -r "$GIT_DIR/HEAD" ]; then
 	if [ -z "$(git-ls-files)" ]; then
 		echo Nothing to commit 1>&2
 		exit 1
 	fi
-	{
-		echo "#"
-		echo "# Initial commit"
-		case "$no_edit" in
-		t) echo "# (ignoring your commit message for initial commit)"
-		   no_edit= 
-		esac
-		echo "#"
-		git-ls-files | sed 's/^/# New file: /'
-		echo "#"
-	} >.editmsg
 	PARENTS=""
-	no_edit=
 else
 	if [ -f "$GIT_DIR/MERGE_HEAD" ]; then
-		{
-		echo "#"
-		echo "# It looks like your may be committing a MERGE."
-		echo "# If this is not correct, please remove the file"
-		echo "#	$GIT_DIR/MERGE_HEAD"
-		echo "# and try again"
-		case "$no_edit" in
-		t) echo "# (ignoring your commit message for merge commit)"
-		   no_edit= 
-		esac
-		echo "#"
-		} |
-		git-stripspace >.editmsg
 		PARENTS="-p HEAD -p MERGE_HEAD"
-	elif test "$log_message" != ''
-	then
-		echo "$log_message" |
-		git-stripspace >.editmsg
-	elif test "$logfile" != ""
-	then
-		if test "$logfile" = -
-		then
-			test -t 0 &&
-			echo >&2 "(reading log message from standard input)"
-			cat
-		else
-			cat <"$logfile"
-		fi |
-		git-stripspace >.editmsg
-	elif test "$use_commit" != ""
+	fi
+	if test "$use_commit" != ""
 	then
 		pick_author_script='
 		/^author /{
@@ -188,22 +182,20 @@ else
 		export GIT_AUTHOR_NAME
 		export GIT_AUTHOR_EMAIL
 		export GIT_AUTHOR_DATE
-		git-cat-file commit "$use_commit" |
-		sed -e '1,/^$/d' |
-		git-stripspace >.editmsg
 	fi
-
 	case "$signoff" in
 	t)
 		git-var GIT_COMMITTER_IDENT | sed -e '
 			s/>.*/>/
-			s/^/Signed-off-by: /' >>.editmsg ;;
+			s/^/Signed-off-by: /
+		' >>.editmsg
+		;;
 	esac
 	git-status-script >>.editmsg
 fi
 if [ "$?" != "0" -a ! -f $GIT_DIR/MERGE_HEAD ]
 then
-	cat .editmsg
+	sed -ne '/^#/p' .editmsg
 	rm .editmsg
 	exit 1
 fi


* Re: Storing state in $GIT_DIR
  2005-08-26  1:30   ` Martin Langhoff
@ 2005-08-26  3:54     ` Linus Torvalds
  2005-08-26  4:15       ` Martin Langhoff
  0 siblings, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2005-08-26  3:54 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: GIT, Junio C Hamano

On Fri, 26 Aug 2005, Martin Langhoff wrote:
> 
> OTOH, storing the metadata in a branch will allow us to run the import
> in alternating repositories. But as Junio points out, unless I can
> guarantee that the metadata and the tree are in sync, I cannot
> trivially resume the import cycle from a new repo.

But you can.

Remember: the metadata is the pointers to the original git conversion, and 
objects are immutable.

In other words, if you just have a "last commit" pointer in your 
meta-data, then git is _by_definition_ in sync. There's never anything to 
get out of sync, because objects aren't going to change.

So you can think of your meta-data as a strange kind of head ref. Or 
rather, a _collection_ of these strange refs.

And it doesn't matter if somebody ends up committing on top of an arch 
import. The metadata by definition doesn't know about it, so the "import" 
head doesn't move anywhere (if you do git and arch work in parallel, you
can then merge the two heads with git, of course).

			Linus


* Re: Storing state in $GIT_DIR
  2005-08-26  3:54     ` Linus Torvalds
@ 2005-08-26  4:15       ` Martin Langhoff
  2005-08-26  4:31         ` Junio C Hamano
  2005-08-26  6:53         ` Eric W. Biederman
  0 siblings, 2 replies; 17+ messages in thread
From: Martin Langhoff @ 2005-08-26  4:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: GIT, Junio C Hamano

On 8/26/05, Linus Torvalds <torvalds@osdl.org> wrote:
> > OTOH, storing the metadata in a branch will allow us to run the import
> > in alternating repositories. But as Junio points out, unless I can
> > guarantee that the metadata and the tree are in sync, I cannot
> > trivially resume the import cycle from a new repo.
> 
> But you can.
> 
> Remember: the metadata is the pointers to the original git conversion, and
> objects are immutable.
> 
> In other words, if you just have a "last commit" pointer in your
> meta-data, then git is _by_definition_ in sync. There's never anything to
> get out of sync, because objects aren't going to change.

Hmmm. That repo is in sync, but there are no guarantees that they will
travel together to a different repo. In fact, the push/pull
infrastructure wants to push/pull one head at a time.

And if they are not in sync, I have no way of knowing. Hmpf. I lie:
the arch metadata could keep track of what it expects the last head
commits to be, and complain bitterly if something smells rotten.

let me think about it ;)


martin


* Re: Storing state in $GIT_DIR
  2005-08-26  4:15       ` Martin Langhoff
@ 2005-08-26  4:31         ` Junio C Hamano
  2005-08-26  5:08           ` Daniel Barkalow
  2005-08-26  6:43           ` Martin Langhoff
  2005-08-26  6:53         ` Eric W. Biederman
  1 sibling, 2 replies; 17+ messages in thread
From: Junio C Hamano @ 2005-08-26  4:31 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Linus Torvalds, GIT

Martin Langhoff <martin.langhoff@gmail.com> writes:

>> In other words, if you just have a "last commit" pointer in your
>> meta-data, then git is _by_definition_ in sync. There's never anything to
>> get out of sync, because objects aren't going to change.
>
> Hmmm. That repo is in sync, but there are no guarantees that they will
> travel together to a different repo. In fact, the push/pull
> infrastructure wants to push/pull one head at a time.

Wrong as of last week ;-), and definitely wrong since this morning.

> And if they are not in sync, I have no way of knowing. Hmpf. I lie:
> the arch metadata could keep track of what it expects the last head
> commits to be, and complain bitterly if something smells rotten.

What Linus suggests is doable by using an object that can hold
a pointer to at least one commit---you used that to record the
head commit of the corresponding git branch that the arch
metainfo represents.

You only pull arch metainfo branch; the objects associated with
the corresponding git branch head will be pulled together when
you pull it.  You do not have to tell git to pull git-part of
the commit chain.  There is no need to worry about version skew
when you use git this way.

Now, among the existing object types, there are only two kinds
of objects you can use for this.  If the only thing you need to
record is some textual information with one pointer to git
branch head, then you can use tag that points at the git head,
and store everything else as the tag comment.  This is doable
but unwieldy.

You could abuse a commit object as well; you store commit
objects (such as the corresponding git branch head) as parent
commits, and put everything else in a tree that is associated
with that commit.
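
A rough sketch of the commit-object variant, using current command
names, might look like this. The branch name arch-meta, the file name
.archdata and the mapping text are made up for illustration. Fetching
only the metadata branch into an empty repository is one way to see
that the imported head travels along with it.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# Stand-in for a commit produced by the import.
echo "Hello" > file
git add file
git commit -q -m "Imported change"
head_commit=$(git rev-parse HEAD)

# Metadata tree, built without touching the normal index.
blob=$(echo "arch-id-1 = $head_commit" | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\t.archdata\n' "$blob" | git mktree)

# The metadata commit names the imported head as its parent.
meta=$(echo "arch metadata" | git commit-tree "$tree" -p "$head_commit")
git update-ref refs/heads/arch-meta "$meta"

# Fetch only arch-meta elsewhere; the imported commit comes with it.
clone=$(mktemp -d)
git init -q "$clone"
git -C "$clone" fetch -q "$repo" arch-meta
git -C "$clone" cat-file -t "$head_commit"
```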


* Re: Storing state in $GIT_DIR
  2005-08-26  4:31         ` Junio C Hamano
@ 2005-08-26  5:08           ` Daniel Barkalow
  2005-08-26  5:31             ` Linus Torvalds
  2005-08-26  5:52             ` Junio C Hamano
  2005-08-26  6:43           ` Martin Langhoff
  1 sibling, 2 replies; 17+ messages in thread
From: Daniel Barkalow @ 2005-08-26  5:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin Langhoff, Linus Torvalds, GIT

On Thu, 25 Aug 2005, Junio C Hamano wrote:

> Now, among the existing object types, there are only two kinds
> of objects you can use for this.  If the only thing you need to
> record is some textual information with one pointer to git
> branch head, then you can use tag that points at the git head,
> and store everything else as the tag comment.  This is doable
> but unwieldy.

I don't think this buys you anything, because then the tag needs to be
accessible from something, which is the same problem you were trying to
solve for the commit.

> You could abuse a commit object as well; you store commit
> objects (such as the corresponding git branch head) as parent
> commits, and put everything else in a tree that is associated
> with that commit.

If you want to go that way, you could add a new field to commits with
minimal effort: you just need to parse it in commit.c, generate it in
git-commit-tree (with an option), and pull it in pull.c, and everything
should work as far as making the git portion follow the metadata around.

	-Daniel
*This .sig left intentionally blank*


* Re: Storing state in $GIT_DIR
  2005-08-26  5:08           ` Daniel Barkalow
@ 2005-08-26  5:31             ` Linus Torvalds
  2005-08-26  5:49               ` Junio C Hamano
  2005-08-26  5:52             ` Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread
From: Linus Torvalds @ 2005-08-26  5:31 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, Martin Langhoff, GIT

On Fri, 26 Aug 2005, Daniel Barkalow wrote:
> 
> I don't think this buys you anything, because then the tag needs to be
> accessible from something, which is the same problem you were trying to
> solve for the commit.

Yes. 

We had an earlier discussion somewhat along these lines, where a 
"collection" object might be useful. The "tree" object is that, of course, 
but the tree object really is very strictly structured (and has to be that 
way). There might be a valid case for an object that can point to an 
arbitrary collection of other objects, and have a free-form tail to it.

Of course, such an object would inevitably look very much like a 
generalized "tag" object, so one possibility might be to just allow a tag 
to have multiple object pointers.

We could easily generalize the tag format: just make it be something like

 - 1 or more lines of "object <sha1>"
 - make the "type " line optional (it used to have an implementation 
   reason: the internal interfaces always used to want to know the type 
   up-front, but we've moved away from that).
 - a single "tag" line to start the free-form section, and to name the 
   collection some way.

That kind of extension shouldn't be too hard, and might make tags much 
more generally usable (ie you could say "I sign these <n> official 
releases" or something).

			Linus


* Re: Storing state in $GIT_DIR
  2005-08-26  5:31             ` Linus Torvalds
@ 2005-08-26  5:49               ` Junio C Hamano
  2005-08-27  0:23                 ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2005-08-26  5:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Daniel Barkalow, Martin Langhoff, GIT

Linus Torvalds <torvalds@osdl.org> writes:

> That kind of extension shouldn't be too hard, and might make tags much 
> more generally usable (ie you could say "I sign these <n> official 
> releases" or something).

Well, I admit that once I advocated changing "tag" to "bag", but
one problem is how you would dereference something like that.

"v0.99.5^0" means "look at the named object v0.99.5, dereference
it repeatedly until you get a non-tag, and take the result,
which had better be a commit".  If a tag can contain more than
one pointer, I do not know what it means.
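
For a tag that points at a single object, the dereferencing can be
observed with a small experiment along these lines. The tag name
v-example is hypothetical, and the commands use present-day spellings.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

echo "Hello" > file
git add file
git commit -q -m "A release"

# An annotated tag is an object of its own, pointing at the commit.
git tag -a -m "example release" v-example HEAD

tag_object=$(git rev-parse v-example)
peeled=$(git rev-parse 'v-example^0')

git cat-file -t "$tag_object"
git cat-file -t "$peeled"
```

The first cat-file reports the tag object itself; the second reports
what "^0" peels it to, which only has one possible answer because the
tag has one pointer.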


* Re: Storing state in $GIT_DIR
  2005-08-26  5:08           ` Daniel Barkalow
  2005-08-26  5:31             ` Linus Torvalds
@ 2005-08-26  5:52             ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2005-08-26  5:52 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Martin Langhoff, Linus Torvalds, GIT

Daniel Barkalow <barkalow@iabervon.org> writes:

> I don't think this buys you anything, because then the tag needs to be
> accessible from something, which is the same problem you were trying to
> solve for the commit.

Actually not.  My suggestion was a qualified one: "If all you
need is a textual information plus a single pointer to a commit
object", and Martin did not say different generations of arch
metainfo needs to be strung together, so you could keep updating
the tag object in "refs/tags/arch-meta-info", which has such a
pointer and textual arch meta info and nothing else.
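
As a sketch of that qualified suggestion, assuming present-day
"git tag" options and invented mapping text, the tag could be replaced
after each import like so:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# First import: tag the head and keep the arch info in the tag message.
echo "one" > file
git add file
git commit -q -m "First imported change"
git tag -f -a -m "arch-id-1 = $(git rev-parse HEAD)" arch-meta-info HEAD

# Second import: overwrite the same tag rather than chaining history.
echo "two" >> file
git commit -q -a -m "Second imported change"
git tag -f -a -m "arch-id-2 = $(git rev-parse HEAD)" arch-meta-info HEAD >/dev/null

# The tag yields both the git head and the free-form text.
tagged_commit=$(git rev-parse 'arch-meta-info^{commit}')
tag_text=$(git cat-file -p arch-meta-info | sed -e '1,/^$/d')
echo "$tag_text"
```

Only the latest generation of metainfo survives this way, which matches
the "nothing else" qualification above.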


* Re: Storing state in $GIT_DIR
  2005-08-26  4:31         ` Junio C Hamano
  2005-08-26  5:08           ` Daniel Barkalow
@ 2005-08-26  6:43           ` Martin Langhoff
  1 sibling, 0 replies; 17+ messages in thread
From: Martin Langhoff @ 2005-08-26  6:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, GIT

On 8/26/05, Junio C Hamano <junkio@cox.net> wrote:
> > Hmmm. That repo is in sync, but there are no guarantees that they will
> > travel together to a different repo. In fact, the push/pull
> > infrastructure wants to push/pull one head at a time.
> 
> Wrong as of last week ;-), and definitely wrong since this morning.

Haven't had time to learn what the new conventions are for push/pull
scenarios. Will try and read up...

> > And if they are not in sync, I have no way of knowing. Hmpf. I lie:
> > the arch metadata could keep track of what it expects the last head
> > commits to be, and complain bitterly if something smells rotten.
> 
> What Linus suggests is doable by using an object that can hold
> a pointer to at least one commit---you used that to record the
> head commit of the corresponding git branch that the arch
> metainfo represents.

Yes. If I have my "arch-metadata" head, I can have several files
there, one of them containing a list of head "names" and the sha we
expect them to correspond to. If the thing doesn't match, we crash and
burn because we are out of sync.

Now, during import I'll have to be extra careful at commit-time, and
update and commit the arch-metadata head immediately after I commit
the head I'm importing, with strong error-handling. This should
minimize the out-of-sync situation.

If we _are_ out-of-sync, I could have a recovery mode that rewinds the
heads to the 'last known good' position and replays things forward. If
my script is stable, the results should be stable too...
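
The consistency check could be sketched along these lines. The file
format (one "name sha" pair per line) and the check_heads helper are
assumptions made for illustration, not something the import script is
known to use.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

echo "one" > file
git add file
git commit -q -m "First imported change"
main=$(git symbolic-ref --short HEAD)

# Hypothetical metadata file: "<head name> <expected sha>" per line.
expected=$(mktemp)
printf '%s %s\n' "$main" "$(git rev-parse HEAD)" > "$expected"

# Hypothetical helper: non-zero if any head differs from its record.
check_heads() {
	status=0
	while read -r name want; do
		have=$(git rev-parse --verify -q "refs/heads/$name" || true)
		if [ "$have" != "$want" ]; then
			echo "out of sync: $name is ${have:-missing}, expected $want" >&2
			status=1
		fi
	done < "$1"
	return $status
}

check_heads "$expected" && echo "in sync"
```

A commit made on the head without updating the metadata file would make
check_heads fail, which is the point where the import would stop or
enter its recovery mode.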

> You only pull arch metainfo branch; the objects associated with
> the corresponding git branch head will be pulled together when
> you pull it.  You do not have to tell git to pull git-part of
> the commit chain.  There is no need to worry about version skew
> when you use git this way.

From here onwards, you lost me, mate ;)

cheers,

martin


* Re: Storing state in $GIT_DIR
  2005-08-26  4:15       ` Martin Langhoff
  2005-08-26  4:31         ` Junio C Hamano
@ 2005-08-26  6:53         ` Eric W. Biederman
  2005-08-26  7:08           ` Martin Langhoff
  1 sibling, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2005-08-26  6:53 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Linus Torvalds, GIT, Junio C Hamano

Martin Langhoff <martin.langhoff@gmail.com> writes:

> Hmmm. That repo is in sync, but there are no guarantees that they will
> travel together to a different repo. In fact, the push/pull
> infrastructure wants to push/pull one head at a time.
>
> And if they are not in sync, I have no way of knowing. Hmpf. I lie:
> the arch metadata could keep track of what it expects the last head
> commits to be, and complain bitterly if something smells rotten.
>
> let me think about it ;)

Thinking about it going from arch to git should be just a matter
of checking sha1 hashes, possibly back to the beginning of the
arch tree.  

Going from git to arch is the trickier mapping, because you
need to know the full repo--category--branch--version--patch
mapping.

Hmm.  Thinking about arch from a git perspective arch tags every
commit.  So the really sane thing to do (I think) is to create
a git tag object for every arch commit.

With that structure you would just need to create a git-arch-rev-list
so you can get a list of which arch branches you already have.
And then a git-arch-push and a git-arch-pull should be just a matter
of finding the common ancestor and continuing along the branch until
you reach the head.  Handling all heads in an arch repository is a
little trickier but should not be too bad.

On the push side you can just treat git as an arch working directory 
and push changesets into the appropriate branch.  For branches that
do not have tla as the ancestor you can do the equivalent of
tla archive-mirror.

Changes can be merged on whichever side makes sense.

With patch trading (Martin I think I know what you are referring to)
arch does seem to have a concept that does not map very well to git,
and this I think is a failing in git.  Arch can record that just the
changes from a single changeset/patch were merged.  This happens all
of the time in the kernel tree when patches are merged.  The
interesting case for merge algorithms is when two maintainers merge
the same patches into separate branches and then the branches are
merged.  Does git have a good way of coping with that case?

On the simple side, for patch trading it might just be worth
treating them as a special git merge with just one parent in
the parents line and the real parent listed in the merge comment,
along with the original parents commit comment.  But that just
might be too ugly to think about.

How does StGit handle this?

Eric


* Re: Storing state in $GIT_DIR
  2005-08-26  6:53         ` Eric W. Biederman
@ 2005-08-26  7:08           ` Martin Langhoff
  2005-08-26 14:26             ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: Martin Langhoff @ 2005-08-26  7:08 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, GIT, Junio C Hamano

On 8/26/05, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Thinking about it going from arch to git should be just a matter
> of checking sha1 hashes, possibly back to the beginning of the
> arch tree.

Yup, though actually replaying the tree to compute the hashes is
something I just _won't_ do ;)

> Going from git to arch is the trickier mapping, because you
> need to know the full repo--category--branch--version--patch
> mapping.

My plan doesn't include git->arch support... yet...

> Hmm.  Thinking about arch from a git perspective arch tags every
> commit.  So the really sane thing to do (I think) is to create
> a git tag object for every arch commit.

Now I like that interesting idea. It doesn't solve all my problems,
but is a reasonable mapping point. Will probably do it.

> With patch trading (Martin I think I know what you are referring to)
> arch does seem to have a concept that does not map very well to git,
> and this I think is a failing in git.

I won't get into _that_ flamewar ;)

My plan for merges is to detect up to what point two branches are
fully merged, and mark that in git -- because that is what git
considers a merge. The rest will be known to the importer, but to
nothing else.

cheers,


martin


* Re: Storing state in $GIT_DIR
  2005-08-26  7:08           ` Martin Langhoff
@ 2005-08-26 14:26             ` Eric W. Biederman
  0 siblings, 0 replies; 17+ messages in thread
From: Eric W. Biederman @ 2005-08-26 14:26 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Linus Torvalds, GIT, Junio C Hamano

Martin Langhoff <martin.langhoff@gmail.com> writes:

> On 8/26/05, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Thinking about it going from arch to git should be just a matter
>> of checking sha1 hashes, possibly back to the beginning of the
>> arch tree.
>
> Yup, though actually replaying the tree to compute the hashes is
> something I just _won't_ do ;)

I guess if you have the tla branch names it won't be necessary.
If you are careful how you do the import you can have two parallel
imports of the same data and produce exactly the same git tree.
That is largely why I care about a stable algorithm for the hashes.

>> Going from git to arch is the trickier mapping, because you
>> need to know the full repo--category--branch--version--patch
>> mapping.
>
> My plan doesn't include git->arch support... yet...

One of my interests, if I get the time to worry about it,
is to get an scm that is a sufficient superset of what other
scms do so it can serve as a bidirectional gateway.

git is fairly close to what is needed to implement that.

Hmm.  I wonder if a git metadata branch in general is sufficient to
store information that does not map to git natively?

>> Hmm.  Thinking about arch from a git perspective arch tags every
>> commit.  So the really sane thing to do (I think) is to create
>> a git tag object for every arch commit.
>
> Now I like that interesting idea. It doesn't solve all my problems,
> but is a reasonable mapping point. Will probably do it.
>
>> With patch trading (Martin I think I know what you are referring to)
>> arch does seem to have a concept that does not map very well to git,
>> and this I think is a failing in git.
>
> I won't get into _that_ flamewar ;)

<pouts> No flamewar </pouts>

> My plan for merges is to detect up to what point two branches are
> fully merged, and mark that in git -- because that is what git
> considers a merge. The rest will be known to the importer, but to
> nothing else.

I looked at least back to the StGit announcement and it helped to
clarify my thinking.  A patch is equivalent to a branch with
just one change. This makes cherry picking a single patch roughly
equivalent to describing that patch as a single commit branch
at the fork point from the common ancestor of the two branches,
and then having the single commit merged.

The original branch that was cherry picked from can really only
be represented as a graft, like the original linux kernel
history.

The shortcoming I see in git-applypatch is that it doesn't attempt
to find the original base of a patch and instead simply assumes it
is against the current tree.

There is a similar shortcoming in git-diff-tree where it reports
the commit that you are on when you take the diff, but it does not
report the commit the diff is against.

......

Thinking a little more there is also a connection with reverting
patches.  Cherry picking changes from a branch may also be thought of
as reverting all of the other changes from a branch and then merging
the branch.

The practical question behind all of these things is whether there
is a form that will allow future merges to realize the same change
has already been applied, so it can be skipped the second time.
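
One way to recognize an already-applied change is to fingerprint the
patch content itself, independent of where it lands. The sketch below is
a simplified stand-in in the spirit of git-patch-id, not its actual
algorithm: it drops the lines that depend on position or blob ids,
strips whitespace, and hashes what is left. The function names and the
sample diffs are hypothetical.

```python
import hashlib
import re


def patch_fingerprint(diff_text: str) -> str:
    """Hash a unified diff while ignoring hunk offsets and whitespace."""
    h = hashlib.sha1()
    for line in diff_text.splitlines():
        # Skip lines that change when the same edit lands elsewhere.
        if line.startswith(("diff ", "index ", "@@")):
            continue
        squeezed = re.sub(r"\s+", "", line)
        if squeezed:
            h.update(squeezed.encode())
            h.update(b"\n")
    return h.hexdigest()


def already_applied(diff_text: str, seen: set) -> bool:
    """True if a patch with the same fingerprint was recorded before."""
    return patch_fingerprint(diff_text) in seen


# The same edit, seen at two different offsets in the file.
diff_a = """--- a/f.c
+++ b/f.c
@@ -10,3 +10,3 @@
 int x;
-int y = 1;
+int y = 2;
"""
diff_b = diff_a.replace("@@ -10,3 +10,3 @@", "@@ -42,3 +42,3 @@")

seen = {patch_fingerprint(diff_a)}
print(already_applied(diff_b, seen))
```

A fingerprint like this only catches textually identical changes; a
change that was adjusted while being ported would still need real
merge history to be recognized.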

Inter-operating with darcs, tla, quilt, and raw diff/patch brings up
these issues.

So my practical questions are:
- What information can the current git merge algorithms and more
  sophisticated merge algorithms use to avoid having conflicts when
  the same changes are merged into the same branch multiple times?

- Is the git meta data sufficient to represent the history
  sophisticated merge algorithms can use?

- Is the git meta data sufficient to represent the result
  of sufficient meta data operations?

- Is the current representation of a reverted change sufficient
  for the merge algorithms, or could they do a better job if
  they knew a change was a revert of a previous change?

I'm just trying to think through the issues that working with patch
based systems bring up.

Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Storing state in $GIT_DIR
  2005-08-26  5:49               ` Junio C Hamano
@ 2005-08-27  0:23                 ` Linus Torvalds
  0 siblings, 0 replies; 17+ messages in thread
From: Linus Torvalds @ 2005-08-27  0:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Daniel Barkalow, Martin Langhoff, GIT



On Thu, 25 Aug 2005, Junio C Hamano wrote:
> 
> "v0.99.5^0" means "look at the named object v0.99.5, dereference
> it repeatedly until you get a non-tag, and take the result,
> which had better be a commit".  If a tag can contain more than
> one pointer, I do not know what it means.

Yeah, we'd have to either just say "I can't do that, Dave", or specify 
that it only looks at the first object in the list.

		Linus
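
The two options above can be sketched as a small peeling loop. This is a
toy model under stated assumptions, not git's implementation: objects
live in a plain dict, a tag holds a list of target names (real tags
hold exactly one), and the object names other than v0.99.5 are made up.
With strict=True a multi-pointer tag is refused; with strict=False only
the first pointer is followed.

```python
def peel_to_commit(objects, name, strict=True):
    """Dereference tags repeatedly until a non-tag; require a commit."""
    kind, targets = objects[name]
    while kind == "tag":
        if not targets:
            raise ValueError(f"tag {name!r} points at nothing")
        if len(targets) > 1 and strict:
            # Option 1: "I can't do that, Dave".
            raise ValueError(f"tag {name!r} has more than one pointer")
        # Option 2 (or the ordinary single-pointer case): first object.
        name = targets[0]
        kind, targets = objects[name]
    if kind != "commit":
        raise ValueError(f"{name!r} is a {kind}, not a commit")
    return name


# Hypothetical object store: name -> (kind, list of targets).
objects = {
    "v0.99.5": ("tag", ["commit-a"]),
    "multi": ("tag", ["commit-b", "commit-a"]),
    "commit-a": ("commit", []),
    "commit-b": ("commit", []),
}

print(peel_to_commit(objects, "v0.99.5"))
print(peel_to_commit(objects, "multi", strict=False))
```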

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2005-08-27  0:23 UTC | newest]

Thread overview: 17+ messages
2005-08-25  3:32 Storing state in $GIT_DIR Martin Langhoff
2005-08-25 18:16 ` Linus Torvalds
2005-08-26  1:30   ` Martin Langhoff
2005-08-26  3:54     ` Linus Torvalds
2005-08-26  4:15       ` Martin Langhoff
2005-08-26  4:31         ` Junio C Hamano
2005-08-26  5:08           ` Daniel Barkalow
2005-08-26  5:31             ` Linus Torvalds
2005-08-26  5:49               ` Junio C Hamano
2005-08-27  0:23                 ` Linus Torvalds
2005-08-26  5:52             ` Junio C Hamano
2005-08-26  6:43           ` Martin Langhoff
2005-08-26  6:53         ` Eric W. Biederman
2005-08-26  7:08           ` Martin Langhoff
2005-08-26 14:26             ` Eric W. Biederman
     [not found]   ` <7vwtm9u5jj.fsf@assigned-by-dhcp.cox.net>
2005-08-26  1:57     ` Martin Langhoff
2005-08-26  2:03   ` [PATCH] Accept -m and friends for initial commits and merge commits Junio C Hamano
