* Re: ALSA official git repository
From: Andrew Morton @ 2005-05-27 22:46 UTC (permalink / raw)
To: Linus Torvalds; +Cc: perex, linux-kernel, git
In-Reply-To: <Pine.LNX.4.58.0505271502240.17402@ppc970.osdl.org>
Linus Torvalds <torvalds@osdl.org> wrote:
>
> > which means that the algorithm for identifying the author is "the final
> > From:".
>
> No, the algorithm is:
> - the email author, _or_ if there is one, the top "From:" in the body.
That all assumes that the tools are smart enough to separate the email
headers from the body :(
* [PATCH 00/12] Diff updates
From: Junio C Hamano @ 2005-05-27 22:43 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505270848220.17402@ppc970.osdl.org>
This series consists of the following 12 patches. Most of them
are bugfixes and cleanups. The last one is somewhat iffy,
although it does not break things, and lies somewhere between a
request for inclusion and a request for comments.
[PATCH 01/12] Fix math thinko in similarity estimator.
[PATCH 02/12] Introduce diff_free_filepair() function.
[PATCH 03/12] Make pathspec only care about the destination tree.
[PATCH 04/12] Remove unused rank field from diff_core structure.
[PATCH 05/12] Do not expose internal scaling to diff-helper.
[PATCH 06/12] Remove final newline from the value of xfrm_msg variable.
[PATCH 07/12] Clean up diff_setup() to make it more extensible.
[PATCH 08/12] Remove a function not used anymore.
[PATCH 09/12] Add --pickaxe-all to diff-* brothers.
[PATCH 10/12] Fix the way diffcore-rename records unremoved source.
[PATCH 11/12] Move pathspec to the beginning of the diffcore chain.
[PATCH 12/12] Optimize diff-tree -[CM] --stdin
* Re: ALSA official git repository
From: Linus Torvalds @ 2005-05-27 22:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: perex, linux-kernel, git
In-Reply-To: <20050527135124.0d98c33e.akpm@osdl.org>
On Fri, 27 May 2005, Andrew Morton wrote:
>
> Yes, I'll occasionally do patches which were written by "A" as:
>
> From: A
> ...
> Signed-off-by: B
>
> And that comes through email as:
>
>
> ...
> From: <akpm@osdl.org>
> ...
> From: A
> ...
> Signed-off-by: B
>
>
> which means that the algorithm for identifying the author is "the final
> From:".
No, the algorithm is:
- the email author, _or_ if there is one, the top "From:" in the body.
And the rule is that you never remove (or add to) an existing From:, since
the author doesn't change from being passed around.
Put another way: authorship is very different from sign-off. The sign-off
gets stacked, the authorship is constant, and thus the rules are
different.
Also, authorship is more important than sign-off-ship, so authorship goes
at the top, while sign-offs go at the bottom.
> I guess the bug here is the use of From: to identify the primary author,
> because transporting the patch via email adds ambiguity.
No it doesn't, the email "from" just ends up being the "default" if no
explicit authorship is noted.
> Maybe we should introduce "^Author:"?
It would still have the same rules, so it wouldn't change anything but the
tag, so I don't think there is any real advantage to it.
Linus
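A minimal sketch of the rule above, written as a POSIX shell function: the author is the first "From:" line in the mail body if there is one, and the mail's own "From:" header otherwise. The name pick_author and the choice to treat the first empty line as the end of the headers are assumptions made for illustration; this is not the code any git tool actually uses.

```shell
# pick_author FILE
# Print the author of the patch mail stored in FILE.
pick_author() {
    awk '
        # The first empty line separates the mail headers from the body.
        !in_body && /^$/ { in_body = 1; next }

        # In the headers, remember From: as the default author.
        !in_body && /^From:/ { sub(/^From:[ \t]*/, ""); header = $0 }

        # In the body, the top From: line wins, so stop at the first one.
        in_body && /^From:/ { sub(/^From:[ \t]*/, ""); body = $0; exit }

        END { if (body != "") print body; else print header }
    ' "$1"
}
```

Because the body "From:" is only read and never rewritten, the author stays the same however often the patch is forwarded, while each forwarder adds a Signed-off-by: line at the bottom.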
* Re: More gitweb queries..
From: Thomas Glanzmann @ 2005-05-27 22:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505271457480.17402@ppc970.osdl.org>
Hello,
> Actually, even that is not actually built into the commit object
> itself, that's just a #define in commit-tree.c. Change the MAXPARENT
> design from 16 to 1024, and nobody will notice any difference at all,
> except "git-commit-tree.c" will use 20kB more memory ;)
That sounds just way too perfect. :-)
> There's no limit in the data structures, although there clearly is a
> "sanity" limit (and I personally suspect it comes before you hit 16 ;)
I like how these 'simple' git concepts just fit into all these usage
scenarios, including this one and the way we can track renames. Name it!
Thanks for giving us this perfect piece of software! :-)
Thomas
* Re: More gitweb queries..
From: Linus Torvalds @ 2005-05-27 22:00 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Thomas Glanzmann, Git Mailing List
In-Reply-To: <7vk6lkwgfl.fsf@assigned-by-dhcp.cox.net>
On Fri, 27 May 2005, Junio C Hamano wrote:
>
> >>>>> "TG" == Thomas Glanzmann <sithglan@stud.uni-erlangen.de> writes:
>
> TG> But I guess 8 is the limit, isn't it? Did you think about making this 8 an
> TG> 'n', or is 8 just enough? :-)
>
> Built-in limit of commit object is 16, not 8.
Actually, even that is not actually built into the commit object itself,
that's just a #define in commit-tree.c.
Change the MAXPARENT design from 16 to 1024, and nobody will notice any
difference at all, except "git-commit-tree.c" will use 20kB more memory ;)
There's no limit in the data structures, although there clearly is a
"sanity" limit (and I personally suspect it comes before you hit 16 ;)
Linus
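The "20kB" figure can be checked with a line of shell arithmetic. This assumes the parent list is a fixed array with one 20-byte SHA-1 per slot, which the message implies but does not spell out; extra_parent_bytes is a made-up helper name.

```shell
# extra_parent_bytes OLD NEW
# Extra bytes used when a fixed array of 20-byte SHA-1 slots (an assumed
# layout) grows from OLD to NEW entries.
extra_parent_bytes() {
    echo $(( ($2 - $1) * 20 ))
}

extra_parent_bytes 16 1024    # prints 20160, i.e. roughly 20kB
```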
* Re: [PATCH] git-tar-tree: small doc update
From: David Greaves @ 2005-05-27 21:32 UTC (permalink / raw)
To: Rene Scharfe; +Cc: Linus Torvalds, git
In-Reply-To: <20050527212032.GB17478@lsrfire.ath.cx>
Rene Scharfe wrote:
>I'll take the blame for
>that contraption, if you don't mind. ;)
>
<snip>
>Author
> ------
>-Written by Linus Torvalds <torvalds@osdl.org>
>+Written by Rene Scharfe.
>
>
Good - this was what I intended to have happen all along :)
Also Junio suggested a more general attribution - which I have now put
into eg git-mkdelta.txt
Git is written by Linus Torvalds <torvalds@osdl.org> and the git-list
<git@vger.kernel.org>.
I'll generalise the rest of the files at some point and respect any more
specific attributions.
David
--
* [PATCH] git-tar-tree: small doc update
From: Rene Scharfe @ 2005-05-27 21:20 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Update git-tar-tree documentation a teensy bit: document where the file
times come from and correct author section. I'll take the blame for
that contraption, if you don't mind. ;)
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Index: Documentation/git-tar-tree.txt
===================================================================
--- fa5c736eeabbead4a4c024051d104930d836092a/Documentation/git-tar-tree.txt (mode:100644)
+++ ff1cce79554723d915cb45315fda2d56a1c5ea04/Documentation/git-tar-tree.txt (mode:100644)
@@ -17,10 +17,14 @@
When <base> is specified it is added as a leading path as the files in the
generated tar archive.
+When the given ID is a tree ID then all the files in the archive get
+their mtime set to the current time. When called with the ID of a commit
+object the commit time recorded therein is used instead.
+
Author
------
-Written by Linus Torvalds <torvalds@osdl.org>
+Written by Rene Scharfe.
Documentation
--------------
* Re: ALSA official git repository
From: Schneelocke @ 2005-05-27 21:19 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linus Torvalds, perex, linux-kernel, git
In-Reply-To: <20050527135124.0d98c33e.akpm@osdl.org>
On 27/05/05, Andrew Morton <akpm@osdl.org> wrote:
> Yes, I'll occasionally do patches which were written by "A" as:
>
> From: A
> ...
> Signed-off-by: B
>
> And that comes through email as:
>
> ...
> From: <akpm@osdl.org>
> ...
> From: A
> ...
> Signed-off-by: B
>
> which means that the algorithm for identifying the author is "the final
> From:".
>
> I guess the bug here is the use of From: to identify the primary author,
> because transporting the patch via email adds ambiguity.
>
> Maybe we should introduce "^Author:"?
How about "^Written-by:"? That seems to fit in much more nicely with
"Signed-off-by:".
--
schnee
* [PATCH] git-tar-tree: cleanup trailer writing
From: Rene Scharfe @ 2005-05-27 21:16 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Replace open-coded variants of get_record().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Index: tar-tree.c
===================================================================
--- ba1de5878d8e0cd1c7c728379e033ea6bf8567e5/tar-tree.c (mode:100644)
+++ fa5c736eeabbead4a4c024051d104930d836092a/tar-tree.c (mode:100644)
@@ -73,16 +73,13 @@
*/
static void write_trailer(void)
{
- memset(block + offset, 0, RECORDSIZE);
- offset += RECORDSIZE;
+ get_record();
write_if_needed();
- memset(block + offset, 0, RECORDSIZE);
- offset += RECORDSIZE;
+ get_record();
write_if_needed();
- if (offset) {
- memset(block + offset, 0, BLOCKSIZE - offset);
- reliable_write(block, BLOCKSIZE);
- offset = 0;
+ while (offset) {
+ get_record();
+ write_if_needed();
}
}
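
To see what the rewritten loop does, here is a small shell simulation of the offset bookkeeping. Neither helper is shown in this patch, so the sketch assumes that get_record() hands out one zeroed RECORDSIZE-byte record and advances offset, and that write_if_needed() flushes the block and resets offset to 0 once it reaches BLOCKSIZE. The sizes 512 and 10240 are likewise assumed values, and trailer_records is a hypothetical name.

```shell
RECORDSIZE=512
BLOCKSIZE=10240    # assumed: 20 records per block

# trailer_records OFFSET
# Print how many zeroed records the trailer adds when the current block
# already holds OFFSET bytes.
trailer_records() {
    offset=$1
    records=0
    # Two zeroed records mark the end of the archive.
    for i in 1 2; do
        offset=$(( (offset + RECORDSIZE) % BLOCKSIZE ))
        records=$((records + 1))
    done
    # Keep adding zeroed records until the last block is full.
    while [ "$offset" -ne 0 ]; do
        offset=$(( (offset + RECORDSIZE) % BLOCKSIZE ))
        records=$((records + 1))
    done
    echo "$records"
}

trailer_records 0       # 20: a whole block of zeros
trailer_records 9216    # 2: the two end records exactly fill the block
```

Under these assumptions the old open-coded version and the new loop write the same bytes; the loop simply reuses the helper until the block boundary is reached.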
* Re: ALSA official git repository
From: Jesper Juhl @ 2005-05-27 21:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linus Torvalds, perex, linux-kernel, git
In-Reply-To: <20050527135124.0d98c33e.akpm@osdl.org>
On Fri, 27 May 2005, Andrew Morton wrote:
> Linus Torvalds <torvalds@osdl.org> wrote:
> >
> >
> >
> > On Fri, 27 May 2005, Jaroslav Kysela wrote:
> > >
> > > Okay, sorry for this small bug. I'll recreate the ALSA git tree with
> > > proper comments again. Also, the author is not correct (should be taken
> > > from the first Signed-off-by:).
> >
> > Hmm.. That's not always true in general, since Sign-off does allow to sign
> > off on other peoples patches (see the "(b)" clause in DCO), but maybe in
> > the ALSA tree it is.
>
> Yes, I'll occasionally do patches which were written by "A" as:
>
> From: A
> ...
> Signed-off-by: B
>
> And that comes through email as:
>
>
> ...
> From: <akpm@osdl.org>
> ...
> From: A
> ...
> Signed-off-by: B
>
>
> which means that the algorithm for identifying the author is "the final
> From:".
>
> I guess the bug here is the use of From: to identify the primary author,
> because transporting the patch via email adds ambiguity.
>
> Maybe we should introduce "^Author:"?
>
That might be good. I honestly don't know what would be the best
solution, but what happens often at the moment is that patches get passed
on as "From" whatever maintainer (or random resender) happened to pass it
on to Andrew/Linus and that person then effectively gets labeled as the
author of the patch in the changelogs/git/whatever. That's not perfect...
Author: might solve it.. worth a shot if you ask me..
--
Jesper Juhl
* Re: ALSA official git repository
From: Junio C Hamano @ 2005-05-27 20:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, perex, linux-kernel, git
In-Reply-To: <20050527135124.0d98c33e.akpm@osdl.org>
>>>>> "AM" == Andrew Morton <akpm@osdl.org> writes:
AM> I guess the bug here is the use of From: to identify the primary author,
AM> because transporting the patch via email adds ambiguity.
AM> Maybe we should introduce "^Author:"?
While we are at it, we probably would want "^Author-Date:" as
well.
* Re: ALSA official git repository
From: Andrew Morton @ 2005-05-27 20:51 UTC (permalink / raw)
To: Linus Torvalds; +Cc: perex, linux-kernel, git
In-Reply-To: <Pine.LNX.4.58.0505271113410.17402@ppc970.osdl.org>
Linus Torvalds <torvalds@osdl.org> wrote:
>
>
>
> On Fri, 27 May 2005, Jaroslav Kysela wrote:
> >
> > Okay, sorry for this small bug. I'll recreate the ALSA git tree with
> > proper comments again. Also, the author is not correct (should be taken
> > from the first Signed-off-by:).
>
> Hmm.. That's not always true in general, since Sign-off does allow to sign
> off on other peoples patches (see the "(b)" clause in DCO), but maybe in
> the ALSA tree it is.
Yes, I'll occasionally do patches which were written by "A" as:
From: A
...
Signed-off-by: B
And that comes through email as:
...
From: <akpm@osdl.org>
...
From: A
...
Signed-off-by: B
which means that the algorithm for identifying the author is "the final
From:".
I guess the bug here is the use of From: to identify the primary author,
because transporting the patch via email adds ambiguity.
Maybe we should introduce "^Author:"?
* Re: More gitweb queries..
From: Junio C Hamano @ 2005-05-27 20:40 UTC (permalink / raw)
To: Thomas Glanzmann; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <20050527203227.GA11139@cip.informatik.uni-erlangen.de>
>>>>> "TG" == Thomas Glanzmann <sithglan@stud.uni-erlangen.de> writes:
TG> But I guess 8 is the limit, isn't it? Did you think about making this 8 an
TG> 'n', or is 8 just enough? :-)
Built-in limit of commit object is 16, not 8.
* [PATCH] testcase for git-tar-tree
From: Rene Scharfe @ 2005-05-27 20:36 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Add testcase for git-tar-tree and git-get-tar-commit-id.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Index: t/t3200-tar-tree.sh
===================================================================
--- /dev/null (tree:f6db9be9d431080d9e7f61edb616b8bac8c9618f)
+++ ba1de5878d8e0cd1c7c728379e033ea6bf8567e5/t/t3200-tar-tree.sh (mode:100755)
@@ -0,0 +1,106 @@
+#!/bin/sh
+#
+# Copyright (C) 2005 Rene Scharfe
+#
+
+test_description='git-tar-tree and git-get-tar-commit-id test
+
+This test covers the topics of long paths, file contents, commit date
+handling and commit id embedding:
+
+ Paths longer than 100 characters require the use of a pax extended
+ header to store them. The test creates files with paths both longer
+ and shorter than 100 chars, and also checks symlinks with long and
+ short paths both as their own name and as target path.
+
+ The contents of the repository is compared to the extracted tar
+ archive. The repository contains simple text files, symlinks and a
+ binary file (/bin/sh).
+
+ git-tar-tree applies the commit date to every file in the archive it
+ creates. The test sets the commit date to a specific value and checks
+ if the tar archive contains that value.
+
+ When giving git-tar-tree a commit id (in contrast to a tree id) it
+ embeds this commit id into the tar archive as a comment. The test
+ checks the ability of git-get-tar-commit-id to figure it out from the
+ tar file.
+
+'
+
+. ./test-lib.sh
+
+test_expect_success \
+ 'populate workdir' \
+ 'mkdir a b c &&
+ p48=1.......10........20........30........40......48 &&
+ p50=1.......10........20........30........40........50 &&
+ p98=${p48}${p50} &&
+ echo simple textfile >a/a &&
+ echo 100 chars in path >a/${p98} &&
+ echo 101 chars in path >a/${p98}x &&
+ echo 102 chars in path >a/${p98}xx &&
+ echo 103 chars in path >a/${p98}xxx &&
+ mkdir a/bin &&
+ cp /bin/sh a/bin/sh &&
+ ln -s a a/l1 &&
+ ln -s ${p98}xx a/l100 &&
+ ln -s ${p98}xxx a/l101 &&
+ ln -s ${p98}xxx a/l${p98} &&
+ (cd a && find .) | sort >a.lst'
+
+test_expect_success \
+ 'add files to repository' \
+ 'find a -type f | xargs git-update-cache --add &&
+ find a -type l | xargs git-update-cache --add &&
+ treeid=`git-write-tree` &&
+ echo $treeid >treeid &&
+ TZ= GIT_COMMITTER_DATE="2005-05-27 22:00:00" \
+ git-commit-tree $treeid </dev/null >.git/HEAD'
+
+test_expect_success \
+ 'git-tar-tree' \
+ 'git-tar-tree HEAD >b.tar'
+
+test_expect_success \
+ 'validate file modification time' \
+ 'tar tvf b.tar a/a | awk \{print\ \$4,\$5\} >b.mtime &&
+ echo "2005-05-27 22:00:00" >expected.mtime &&
+ diff expected.mtime b.mtime'
+
+test_expect_success \
+ 'git-get-tar-commit-id' \
+ 'git-get-tar-commit-id <b.tar >b.commitid &&
+ diff .git/HEAD b.commitid'
+
+test_expect_success \
+ 'extract tar archive' \
+ '(cd b && tar xf -) <b.tar'
+
+test_expect_success \
+ 'validate filenames' \
+ '(cd b/a && find .) | sort >b.lst &&
+ diff a.lst b.lst'
+
+test_expect_success \
+ 'validate file contents' \
+ 'diff -r a b/a'
+
+test_expect_success \
+ 'git-tar-tree with prefix' \
+ 'git-tar-tree HEAD prefix >c.tar'
+
+test_expect_success \
+ 'extract tar archive with prefix' \
+ '(cd c && tar xf -) <c.tar'
+
+test_expect_success \
+ 'validate filenames with prefix' \
+ '(cd c/prefix/a && find .) | sort >c.lst &&
+ diff a.lst c.lst'
+
+test_expect_success \
+ 'validate file contents with prefix' \
+ 'diff -r a c/prefix/a'
+
+test_done
* Re: More gitweb queries..
From: Thomas Glanzmann @ 2005-05-27 20:32 UTC (permalink / raw)
To: Junio C Hamano, Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <7vu0kowho9.fsf@assigned-by-dhcp.cox.net>
Hello,
okay, thanks for the elaboration on the topic. I will now adapt my
scripts to handle it. I think I already have a use for it.
           -- mutt-hcache --
          /-- mutt-imap --\
         /--- mutt-whatever ---\
mutt-cvs ---- ... ----- mutt-tg (my working tree)
         \ ... ----/
          \-- ... -/
Actually, I have already 12 trees with different features which I work on.
 1 mutt-attach-file      5 mutt-hcache          9 mutt-menu-move
 2 mutt-collapse-flags   6 mutt-headers        10 mutt-move-hook
 3 mutt-cstatus          7 mutt-imap           11 mutt-setenv-hack
 4 mutt-edit-threads     8 mutt-maildir-mtime  12 mutt-thread-pattern
But I guess 8 is the limit, isn't it? Did you think about making this 8 an
'n', or is 8 just enough? :-)
Thomas
* Re: More gitweb queries..
From: Junio C Hamano @ 2005-05-27 20:24 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Thomas Glanzmann, Kay Sievers, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505271248450.17402@ppc970.osdl.org>
>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
LT> For example, for somebody like
LT> Jeff, who maintains 50 different branches, and merges 5 of them to send
LT> them to me, an octopus merge in many ways is much more intuitive: it
LT> really says "I took these five branches and combined them", while a
LT> series of four regular merges just gets messy.
This really is a good use case for an Octopus. I hope Jeff is
reading this thread.
* Re: More gitweb queries..
From: Linus Torvalds @ 2005-05-27 20:17 UTC (permalink / raw)
To: Thomas Glanzmann; +Cc: Junio C Hamano, Kay Sievers, Git Mailing List
In-Reply-To: <20050527195552.GA6541@cip.informatik.uni-erlangen.de>
On Fri, 27 May 2005, Thomas Glanzmann wrote:
> > You merge by hand and resolve if they have conflicts, just like
> > what you already do in two head merge case.
>
> I see. Does that mean that 'git-ls-files --unmerged' will report up to 9
> stages per file?
No, you must always merge trees one by one against each other. The
simplest ordering is to merge trees 1-2 first, then the result of that
with 3, then the result of _that_ with 4 etc etc, but you can - if you
really want to - do 1-2 and 3-4 separately and then merge those two
together.
The ordering does actually end up mattering a bit when it comes to
deciding on parenthood, but in the end you will have used the same most
remote common parent for _one_ of the merges anyway, so assuming all the
merges were automatically resolved by the regular 3-way thing, I claim
that it doesn't really matter noticeably (*).
Regardless, you'd end up with seven "git-read-tree -m x y z" invocations
(plus possibly a few git-merge-cache calls), and one final commit.
Linus
(*) I bet you could find some case where the ordering either generates a
create-create conflict or it doesn't, depending on how you pair things up.
But I also claim that you'd be crazy to do an octopus merge for something
like that anyway, and that the reason to do one is that you've had five
totally disjoint things you've been working on - like updating five
different drivers or five different filesystems in different branches, and
there are no conflicts however you turn.
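
The procedure described above can be laid out as a dry-run shell sketch that only prints the commands it would run. It assumes every pairwise merge resolves trivially, so the git-merge-cache step and any manual fix-ups are left out, and it always takes the merge base against the first head, which is a simplification. octopus_plan and the treeN variables are hypothetical names.

```shell
# octopus_plan HEAD BRANCH...
# Print (do not run) one 3-way merge per extra branch, followed by a
# single commit that lists every merged head as a parent.
octopus_plan() {
    head=$1
    shift
    parents="-p $head"
    ours=$head              # what the next merge starts from
    step=0
    for theirs in "$@"; do
        step=$((step + 1))
        echo "git-read-tree -m \$(git-merge-base $head $theirs) $ours $theirs"
        echo "tree$step=\$(git-write-tree)"
        ours="\$tree$step"  # the next merge builds on this result
        parents="$parents -p $theirs"
    done
    echo "git-commit-tree $ours $parents"
}

octopus_plan A B C
```

For eight heads this prints seven git-read-tree lines and one git-commit-tree line, which matches the count given above.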
* Re: More gitweb queries..
From: Junio C Hamano @ 2005-05-27 20:13 UTC (permalink / raw)
To: Thomas Glanzmann; +Cc: Git Mailing List
In-Reply-To: <20050527195552.GA6541@cip.informatik.uni-erlangen.de>
>>>>> "TG" == Thomas Glanzmann <sithglan@stud.uni-erlangen.de> writes:
>> You merge by hand and resolve if they have conflicts, just like
>> what you already do in two head merge case.
TG> I see. Does that mean that 'git-ls-files --unmerged' will report up to 9
TG> stages per file?
No, I think my description was unclear. You still merge two at
a time because that is what git-read-tree -m gives you (3-way
merge is between $(merge-base $A $B) and $A and $B so you are
merging two heads).
To confess, my workflow to merge with Linus is currently
primarily patch based, so I do not even use git-read-tree -m
3-way merge when I make an Octopus (for that matter, I do not
myself do Octopus at all these days). When I have a bunch of
independent changes, I would first prepare and test these:
         -- JC#1
        / - JC#2
       / - JC#3
Linus#1- - JC#4
       \ ...
        \-- JC#7
By the time I am done and happy with them, tip of Linus tree may
have already advanced and he is at Linus#2. I would then apply
diffs between Linus#1 and JC#n (1 <= n <= 7) on top of Linus #2,
and commit the result with parents set to Linus #2 and JC#1,
JC#2, ..., JC#7.
          ------- Linus#2
         /                    \
        / -- JC#1 --------\
       / /-- JC#2 ---------\
      / /--- JC#3 ----------\
Linus#1---- JC#4 ---------- Octopus
       \ ... /
        \-- JC#7 ----------
* Re: More gitweb queries..
From: Linus Torvalds @ 2005-05-27 20:03 UTC (permalink / raw)
To: Thomas Glanzmann; +Cc: Kay Sievers, Git Mailing List
In-Reply-To: <20050527192941.GE7068@cip.informatik.uni-erlangen.de>
On Fri, 27 May 2005, Thomas Glanzmann wrote:
>
> > I get the urge to do octopus-merges in the kernel just because of how
> > good they look in gitk ;) ]
>
> talking about octopus-merges ... I don't understand how they work. What
> happens if one file is touched in every one of the 8 trees? How can that be
> handled?
Automatically? You can do multiple three-way merges, no problem.
In fact, the general algorithm for an n-way merge is to just do the
"git-resolve-script" n-1 times, but _without_ the commit. Then you just
commit the result, and the only thing to keep in mind is to get the
parents right, because if you don't, you're screwed.
This does imply a merge ordering, but since we order the parents anyway,
that's actually also described 100% by the commit, so the end result is
clean and good.
There are two reasons not to do octopus-merges, and neither of them is
huge, but they've kept me from doing them..
- if you screw up half-way through the merge, it's a lot harder to
recover without blowing away all the other merges too and having to
re-do them. You certainly _can_ do it (say, by just recording the trees
in between merges - it's definitely not rocket science), but it
basically means that you need to keep track of things _outside_ of the
normal "what was the last HEAD" model.
More importantly, since an octopus merge has only one commit message
associated with it, you really should never use one for anything that
needs any manual intervention. Otherwise you'll have to start
explaining which merge you needed to fix up manually etc, and it just
gets complex for no actual gain.
IOW, this argument is only against complex merges. The trivial ones can
easily be done as octopuses, and in many ways the resulting history may
actually reflect what you did better. For example, for somebody like
Jeff, who maintains 50 different branches, and merges 5 of them to send
them to me, an octopus merge in many ways is much more intuitive: it
really says "I took these five branches and combined them", while a
series of four regular merges just gets messy.
- Compatibility with other systems.
I don't care one whit about stuff I consider broken (ie CVS), but there
are SCM's out there that I _don't_ think are broken, and that don't do
multi-parent merges for "nrparent > 2". You can always split an
octopus merge that didn't have any manual intervention, so again, this
is not a huge argument if you follow rule #1, but unless you have a
reason for doing an octopus merge, it means that you should probably
avoid it.
So _I_ usually don't have any reason at all, it would be stupid of me
to merge trees from different people as an octopus, but usage like
Jeff's (where the merge is due to "pass these <n> trees upwards") is
different.
So there you have it. Don't do it just because you can, but if you have a
good reason for them and they were done automatically without any human
intervention (apart from having to change the scripts, of course), I won't
argue too much against them either. I already took one such merge from
Junio in the GIT tree, and I actually like having that as a way to make
sure the tools can handle it.
Linus
* Re: More gitweb queries..
From: Thomas Glanzmann @ 2005-05-27 19:58 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <7vd5rcxx5p.fsf@assigned-by-dhcp.cox.net>
Hello,
* Junio C Hamano <junkio@cox.net> [050527 21:54]:
> Thomas, could you please stop setting Mail-Followup-To in your
> headers? I automatically did 'reply all' and ended up
> preaching to Linus (because that was the first mailbox on your
> Mail-Followup-To header) how Octopus works, when he knows what
> it is already.
test without the mft.
Thomas
* Re: More gitweb queries..
From: Thomas Glanzmann @ 2005-05-27 19:55 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, Kay Sievers, Git Mailing List
In-Reply-To: <7vhdgoxx8c.fsf@assigned-by-dhcp.cox.net>
Hello,
> You merge by hand and resolve if they have conflicts, just like
> what you already do in two head merge case.
I see. Does that mean that 'git-ls-files --unmerged' will report up to 9
stages per file?
> Octopus is only about how you record the results. Instead of
> making 7 consecutive "merge from A" "merge from B" to record two
> head merges, you just say "I merged these 8 heads" in a single
> commit.
I got that part. :-)
Thomas
* Re: More gitweb queries..
From: Junio C Hamano @ 2005-05-27 19:54 UTC (permalink / raw)
To: Thomas Glanzmann; +Cc: Kay Sievers, Git Mailing List
In-Reply-To: <20050527192941.GE7068@cip.informatik.uni-erlangen.de>
Thomas, could you please stop setting Mail-Followup-To in your
headers? I automatically did 'reply all' and ended up
preaching to Linus (because that was the first mailbox on your
Mail-Followup-To header) how Octopus works, when he knows what
it is already.
* Re: More gitweb queries..
From: Junio C Hamano @ 2005-05-27 19:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kay Sievers, Git Mailing List
In-Reply-To: <20050527192941.GE7068@cip.informatik.uni-erlangen.de>
>>>>> "TG" == Thomas Glanzmann <sithglan@stud.uni-erlangen.de> writes:
TG> talking about octopus-merges ... I don't understand how they work. What
TG> happens if one file is touched in every one of the 8 trees? How can that be
TG> handled?
You merge by hand and resolve if they have conflicts, just like
what you already do in two head merge case.
Octopus is only about how you record the results. Instead of
making 7 consecutive "merge from A" "merge from B" to record two
head merges, you just say "I merged these 8 heads" in a single
commit.
* Re: More gitweb queries..
From: Linus Torvalds @ 2005-05-27 19:48 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Kay Sievers, Git Mailing List
In-Reply-To: <7voeawxy53.fsf@assigned-by-dhcp.cox.net>
On Fri, 27 May 2005, Junio C Hamano wrote:
>
> >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
>
> LT> Combining some of the features of the two (that über-cool revision
> LT> history graph from gitk rules, for example) might be cool. I get the
> LT> urge to do octopus-merges in the kernel just because of how good they
> LT> look in gitk ;) ]
>
> Hey, Octopus is what you explicitly told me not to do ;-).
I know, I know. I said "I get urges", I didn't say I'll do it.
I'll try to control myself.
Maybe.
Linus
* [PATCH] mkdelta enhancements (take 2)
From: Nicolas Pitre @ 2005-05-27 19:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Although it was described as such, git-mkdelta didn't really attempt to
find the best delta against any previous object in the list but was only
able to create a delta against the preceding object. This patch
reworks the code to fix that limitation and hopefully makes it a bit
clearer than before.
This means that
git-mkdelta sha1 sha2 sha3 sha4 sha5 sha6
will now create a sha2 delta against sha1, a sha3 delta against either
sha2 or sha1 and keep the best one, a sha4 delta against either sha3,
sha2 or sha1, etc. The --max-behind argument limits that search for the
best delta to the specified number of previous objects in the list. If
no limit is specified it is unlimited (note: it might run out of
memory with long object lists).
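
The search order described above can be sketched in shell as follows. The sketch only lists which earlier objects would be tried as a delta base for each target; computing the deltas and keeping the smallest one is the job of git-mkdelta itself and is not reproduced here. delta_candidates is a hypothetical helper, and treating a limit of 0 as "unlimited" follows the note in git-deltafy-script.

```shell
# delta_candidates MAX_BEHIND OBJ...
# For every object after the first, print "target: candidates" with the
# nearest earlier object first. MAX_BEHIND=0 means no limit.
delta_candidates() {
    max=$1
    shift
    seen=""                 # earlier objects, nearest first
    for obj in "$@"; do
        if [ -n "$seen" ]; then
            count=0
            picks=""
            for ref in $seen; do
                # Stop once the look-behind limit is reached.
                [ "$max" -gt 0 ] && [ "$count" -ge "$max" ] && break
                picks="$picks $ref"
                count=$((count + 1))
            done
            echo "$obj:$picks"
        fi
        seen="$obj $seen"
    done
}

delta_candidates 0 sha1 sha2 sha3 sha4
```

With a limit of 2 the last line becomes "sha4: sha3 sha2", which is how --max-behind keeps both the number of delta attempts and the memory use bounded.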
Also added a -q (quiet) switch so it is possible to have 3 levels of
output: -q for nothing, -v for verbose, and if none of -q nor -v is
specified then only actual changes on the object database are shown.
Finally the git-deltafy-script has been updated accordingly, and some
bugs fixed (thanks to Stephen C. Tweedie for spotting them).
This version has been thoroughly tested and I think it might be ready
for public consumption.
Signed-off-by: Nicolas Pitre <nico@cam.org>
diff --git a/git-deltafy-script b/git-deltafy-script
old mode 100644
new mode 100755
--- a/git-deltafy-script
+++ b/git-deltafy-script
@@ -1,40 +1,67 @@
#!/bin/bash
-# Script to deltafy an entire GIT repository based on the commit list.
+# Example script to deltafy an entire GIT repository based on the commit list.
# The most recent version of a file is the reference and previous versions
# are made delta against the best earlier version available. And so on for
-# successive versions going back in time. This way the delta overhead is
-# pushed towards older version of any given file.
-#
-# NOTE: the "best earlier version" is not implemented in mkdelta yet
-# and therefore only the next eariler version is used at this time.
-#
-# TODO: deltafy tree objects as well.
+# successive versions going back in time. This way the increasing delta
+# overhead is pushed towards older versions of any given file.
#
# The -d argument allows to provide a limit on the delta chain depth.
-# If 0 is passed then everything is undeltafied.
+# If 0 is passed then everything is undeltafied. Limiting the delta
+# depth is meaningful for subsequent access performance to old revisions.
+# A value of 16 might be a good compromise between performance and good
+# space saving. Current default is unbounded.
+#
+# The --max-behind=30 argument is passed to git-mkdelta so to keep
+# combinations and memory usage bounded a bit. If you have lots of memory
+# and CPU power you may remove it (or set to 0) to let git-mkdelta find the
+# best delta match regardless of the number of revisions for a given file.
+# You can also make the value smaller to make it faster and less
+# memory hungry. A value of 5 ought to still give pretty good results.
+# When set to 0 or omitted then look behind is unbounded. Note that
+# git-mkdelta might die with a segmentation fault in that case if it
+# runs out of memory. Note that the GIT repository will still be consistent
+# even if git-mkdelta dies unexpectedly.
set -e
depth=
[ "$1" == "-d" ] && depth="--max-depth=$2" && shift 2
+function process_list() {
+ if [ "$list" ]; then
+ echo "Processing $curr_file"
+ echo "$head $list" | xargs git-mkdelta $depth --max-behind=30 -v
+ fi
+}
+
curr_file=""
git-rev-list HEAD |
-git-diff-tree -r --stdin |
-awk '/^:/ { if ($5 == "M" || $5 == "N") print $4, $6 }' |
+git-diff-tree -r -t --stdin |
+awk '/^:/ { if ($5 == "M" || $5 == "N") print $4, $6;
+ if ($5 == "M") print $3, $6 }' |
LC_ALL=C sort -s -k 2 | uniq |
while read sha1 file; do
if [ "$file" == "$curr_file" ]; then
list="$list $sha1"
else
- if [ "$list" ]; then
- echo "Processing $curr_file"
- echo "$head $list" | xargs git-mkdelta $depth -v
- fi
+ process_list
curr_file="$file"
list=""
head="$sha1"
fi
done
+process_list
+
+curr_file="root directory"
+head=""
+list="$(
+ git-rev-list HEAD |
+ while read commit; do
+ git-cat-file commit $commit |
+ sed -n 's/tree //p;Q'
+ done
+ )"
+process_list
+
diff --git a/mkdelta.c b/mkdelta.c
--- a/mkdelta.c
+++ b/mkdelta.c
@@ -98,21 +98,16 @@ static void *create_delta_object(char *b
return create_object(buf, len, hdr, hdrlen, size);
}
-static unsigned long get_object_size(unsigned char *sha1)
-{
- struct stat st;
- if (stat(sha1_file_name(sha1), &st))
- die("%s: %s", sha1_to_hex(sha1), strerror(errno));
- return st.st_size;
-}
-
-static void *get_buffer(unsigned char *sha1, char *type, unsigned long *size)
+static void *get_buffer(unsigned char *sha1, char *type,
+ unsigned long *size, unsigned long *compsize)
{
unsigned long mapsize;
void *map = map_sha1_file(sha1, &mapsize);
if (map) {
void *buffer = unpack_sha1_file(map, mapsize, type, size);
munmap(map, mapsize);
+ if (compsize)
+ *compsize = mapsize;
if (buffer)
return buffer;
}
@@ -120,198 +115,246 @@ static void *get_buffer(unsigned char *s
return NULL;
}
-static void *expand_delta(void *delta, unsigned long delta_size, char *type,
- unsigned long *size, unsigned int *depth, char *head)
+static void *expand_delta(void *delta, unsigned long *size, char *type,
+ unsigned int *depth, unsigned char **links)
{
void *buf = NULL;
- *depth++;
- if (delta_size < 20) {
+ unsigned int level = (*depth)++;
+ if (*size < 20) {
error("delta object is bad");
free(delta);
} else {
unsigned long ref_size;
- void *ref = get_buffer(delta, type, &ref_size);
+ void *ref = get_buffer(delta, type, &ref_size, NULL);
if (ref && !strcmp(type, "delta"))
- ref = expand_delta(ref, ref_size, type, &ref_size,
- depth, head);
- else
- memcpy(head, delta, 20);
- if (ref)
- buf = patch_delta(ref, ref_size, delta+20,
- delta_size-20, size);
- free(ref);
+ ref = expand_delta(ref, &ref_size, type, depth, links);
+ else if (ref)
+{
+ *links = xmalloc(*depth * 20);
+}
+ if (ref) {
+ buf = patch_delta(ref, ref_size, delta+20, *size-20, size);
+ free(ref);
+ if (buf)
+ memcpy(*links + level*20, delta, 20);
+ else
+ free(*links);
+ }
free(delta);
}
return buf;
}
static char *mkdelta_usage =
-"mkdelta [ --max-depth=N ] <reference_sha1> <target_sha1> [ <next_sha1> ... ]";
+"mkdelta [--max-depth=N] [--max-behind=N] <reference_sha1> <target_sha1> [<next_sha1> ...]";
+struct delta {
+ unsigned char sha1[20]; /* object sha1 */
+ unsigned long size; /* object size */
+ void *buf; /* object content */
+ unsigned char *links; /* delta reference links */
+ unsigned int depth; /* delta depth */
+};
+
int main(int argc, char **argv)
{
- unsigned char sha1_ref[20], sha1_trg[20], head_ref[20], head_trg[20];
- char type_ref[20], type_trg[20];
- void *buf_ref, *buf_trg, *buf_delta;
- unsigned long size_ref, size_trg, size_orig, size_delta;
- unsigned int depth_ref, depth_trg, depth_max = -1;
- int i, verbose = 0;
+ struct delta *ref, trg;
+ char ref_type[20], trg_type[20], *skip_reason;
+ void *best_buf;
+ unsigned long best_size, orig_size, orig_compsize;
+ unsigned int r, orig_ref, best_ref, nb_refs, next_ref, max_refs = 0;
+ unsigned int i, duplicate, skip_lvl, verbose = 0, quiet = 0;
+ unsigned int max_depth = -1;
for (i = 1; i < argc; i++) {
if (!strcmp(argv[i], "-v")) {
verbose = 1;
+ quiet = 0;
+ } else if (!strcmp(argv[i], "-q")) {
+ quiet = 1;
+ verbose = 0;
} else if (!strcmp(argv[i], "-d") && i+1 < argc) {
- depth_max = atoi(argv[++i]);
+ max_depth = atoi(argv[++i]);
} else if (!strncmp(argv[i], "--max-depth=", 12)) {
- depth_max = atoi(argv[i]+12);
+ max_depth = atoi(argv[i]+12);
+ } else if (!strcmp(argv[i], "-b") && i+1 < argc) {
+ max_refs = atoi(argv[++i]);
+ } else if (!strncmp(argv[i], "--max-behind=", 13)) {
+ max_refs = atoi(argv[i]+13);
} else
break;
}
- if (i + (depth_max != 0) >= argc)
+ if (i + (max_depth != 0) >= argc)
usage(mkdelta_usage);
- if (get_sha1(argv[i], sha1_ref))
- die("bad sha1 %s", argv[i]);
- depth_ref = 0;
- buf_ref = get_buffer(sha1_ref, type_ref, &size_ref);
- if (buf_ref && !strcmp(type_ref, "delta"))
- buf_ref = expand_delta(buf_ref, size_ref, type_ref,
- &size_ref, &depth_ref, head_ref);
- else
- memcpy(head_ref, sha1_ref, 20);
- if (!buf_ref)
- die("unable to obtain initial object %s", argv[i]);
-
- if (depth_ref > depth_max) {
- if (restore_original_object(buf_ref, size_ref, type_ref, sha1_ref))
- die("unable to restore %s", argv[i]);
- if (verbose)
- printf("undelta %s (depth was %d)\n", argv[i], depth_ref);
- depth_ref = 0;
- }
-
- /*
- * TODO: deltafication should be tried against any early object
- * in the object list and not only the previous object.
- */
+ if (!max_refs || max_refs > argc - i)
+ max_refs = argc - i;
+ ref = xmalloc(max_refs * sizeof(*ref));
+ for (r = 0; r < max_refs; r++)
+ ref[r].buf = ref[r].links = NULL;
+ next_ref = nb_refs = 0;
- while (++i < argc) {
- if (get_sha1(argv[i], sha1_trg))
+ do {
+ if (get_sha1(argv[i], trg.sha1))
die("bad sha1 %s", argv[i]);
- depth_trg = 0;
- buf_trg = get_buffer(sha1_trg, type_trg, &size_trg);
- if (buf_trg && !size_trg) {
+ trg.buf = get_buffer(trg.sha1, trg_type, &trg.size, &orig_compsize);
+ if (trg.buf && !trg.size) {
if (verbose)
printf("skip %s (object is empty)\n", argv[i]);
continue;
}
- size_orig = size_trg;
- if (buf_trg && !strcmp(type_trg, "delta")) {
- if (!memcmp(buf_trg, sha1_ref, 20)) {
- /* delta already in place */
- depth_ref++;
- memcpy(sha1_ref, sha1_trg, 20);
- buf_ref = patch_delta(buf_ref, size_ref,
- buf_trg+20, size_trg-20,
- &size_ref);
- if (!buf_ref)
- die("unable to apply delta %s", argv[i]);
- if (depth_ref > depth_max) {
- if (restore_original_object(buf_ref, size_ref,
- type_ref, sha1_ref))
- die("unable to restore %s", argv[i]);
- if (verbose)
- printf("undelta %s (depth was %d)\n", argv[i], depth_ref);
- depth_ref = 0;
- continue;
- }
- if (verbose)
- printf("skip %s (delta already in place)\n", argv[i]);
- continue;
+ orig_size = trg.size;
+ orig_ref = -1;
+ trg.depth = 0;
+ trg.links = NULL;
+ if (trg.buf && !strcmp(trg_type, "delta")) {
+ for (r = 0; r < nb_refs; r++)
+ if (!memcmp(trg.buf, ref[r].sha1, 20))
+ break;
+ if (r < nb_refs) {
+ /* no need to reload the reference object */
+ trg.depth = ref[r].depth + 1;
+ trg.links = xmalloc(trg.depth*20);
+ memcpy(trg.links, trg.buf, 20);
+ memcpy(trg.links+20, ref[r].links, ref[r].depth*20);
+ trg.buf = patch_delta(ref[r].buf, ref[r].size,
+ trg.buf+20, trg.size-20,
+ &trg.size);
+ strcpy(trg_type, ref_type);
+ orig_ref = r;
+ } else {
+ trg.buf = expand_delta(trg.buf, &trg.size, trg_type,
+ &trg.depth, &trg.links);
}
- buf_trg = expand_delta(buf_trg, size_trg, type_trg,
- &size_trg, &depth_trg, head_trg);
- } else
- memcpy(head_trg, sha1_trg, 20);
- if (!buf_trg)
- die("unable to read target object %s", argv[i]);
-
- if (depth_trg > depth_max) {
- if (restore_original_object(buf_trg, size_trg, type_trg, sha1_trg))
- die("unable to restore %s", argv[i]);
- if (verbose)
- printf("undelta %s (depth was %d)\n", argv[i], depth_trg);
- depth_trg = 0;
- size_orig = size_trg;
}
+ if (!trg.buf)
+ die("unable to read target object %s", argv[i]);
- if (depth_max == 0)
- goto skip;
-
- if (strcmp(type_ref, type_trg))
+ if (!nb_refs) {
+ strcpy(ref_type, trg_type);
+ } else if (max_depth && strcmp(ref_type, trg_type)) {
die("type mismatch for object %s", argv[i]);
-
- if (!size_ref) {
- if (verbose)
- printf("skip %s (initial object is empty)\n", argv[i]);
- goto skip;
- }
-
- if (depth_ref + 1 > depth_max) {
- if (verbose)
- printf("skip %s (exceeding max link depth)\n", argv[i]);
- goto skip;
}
- if (!memcmp(head_ref, sha1_trg, 20)) {
- if (verbose)
- printf("skip %s (would create a loop)\n", argv[i]);
- goto skip;
+ duplicate = 0;
+ best_buf = NULL;
+ best_size = -1;
+ best_ref = -1;
+ skip_lvl = 0;
+ skip_reason = NULL;
+ for (r = 0; max_depth && r < nb_refs; r++) {
+ void *delta_buf, *comp_buf;
+ unsigned long delta_size, comp_size;
+ unsigned int l;
+
+ duplicate = !memcmp(trg.sha1, ref[r].sha1, 20);
+ if (duplicate) {
+ skip_reason = "already seen";
+ break;
+ }
+ if (ref[r].depth >= max_depth) {
+ if (skip_lvl < 1) {
+ skip_reason = "exceeding max link depth";
+ skip_lvl = 1;
+ }
+ continue;
+ }
+ for (l = 0; l < ref[r].depth; l++)
+ if (!memcmp(trg.sha1, ref[r].links + l*20, 20))
+ break;
+ if (l != ref[r].depth) {
+ if (skip_lvl < 2) {
+ skip_reason = "would create a loop";
+ skip_lvl = 2;
+ }
+ continue;
+ }
+ if (trg.depth < max_depth && r == orig_ref) {
+ if (skip_lvl < 3) {
+ skip_reason = "delta already in place";
+ skip_lvl = 3;
+ }
+ continue;
+ }
+ delta_buf = diff_delta(ref[r].buf, ref[r].size,
+ trg.buf, trg.size, &delta_size);
+ if (!delta_buf)
+ die("out of memory");
+ if (trg.depth < max_depth &&
+ delta_size+20 >= orig_size) {
+ /* no need to even try to compress if original
+ object is smaller than this delta */
+ free(delta_buf);
+ if (skip_lvl < 4) {
+ skip_reason = "no size reduction";
+ skip_lvl = 4;
+ }
+ continue;
+ }
+ comp_buf = create_delta_object(delta_buf, delta_size,
+ ref[r].sha1, &comp_size);
+ if (!comp_buf)
+ die("out of memory");
+ free(delta_buf);
+ if (trg.depth < max_depth &&
+ comp_size >= orig_compsize) {
+ free(comp_buf);
+ if (skip_lvl < 5) {
+ skip_reason = "no size reduction";
+ skip_lvl = 5;
+ }
+ continue;
+ }
+ if ((comp_size < best_size) ||
+ (comp_size == best_size &&
+ ref[r].depth < ref[best_ref].depth)) {
+ free(best_buf);
+ best_buf = comp_buf;
+ best_size = comp_size;
+ best_ref = r;
+ }
}
- buf_delta = diff_delta(buf_ref, size_ref, buf_trg, size_trg, &size_delta);
- if (!buf_delta)
- die("out of memory");
-
- /* no need to even try to compress if original
- uncompressed is already smaller */
- if (size_delta+20 < size_orig) {
- void *buf_obj;
- unsigned long size_obj;
- buf_obj = create_delta_object(buf_delta, size_delta,
- sha1_ref, &size_obj);
- free(buf_delta);
- size_orig = get_object_size(sha1_trg);
- if (size_obj >= size_orig) {
- free(buf_obj);
- if (verbose)
- printf("skip %s (original is smaller)\n", argv[i]);
- goto skip;
- }
- if (replace_object(buf_obj, size_obj, sha1_trg))
+ if (best_buf) {
+ if (replace_object(best_buf, best_size, trg.sha1))
die("unable to write delta for %s", argv[i]);
- free(buf_obj);
- depth_ref++;
- if (verbose)
- printf("delta %s (size=%ld.%02ld%%, depth=%d)\n",
- argv[i], size_obj*100 / size_orig,
- (size_obj*10000 / size_orig)%100,
- depth_ref);
- } else {
- free(buf_delta);
- if (verbose)
- printf("skip %s (original is smaller)\n", argv[i]);
- skip:
- depth_ref = depth_trg;
- memcpy(head_ref, head_trg, 20);
+ free(best_buf);
+ free(trg.links);
+ trg.depth = ref[best_ref].depth + 1;
+ trg.links = xmalloc(trg.depth*20);
+ memcpy(trg.links, ref[best_ref].sha1, 20);
+ memcpy(trg.links+20, ref[best_ref].links, ref[best_ref].depth*20);
+ if (!quiet)
+ printf("delta %s (size=%ld.%02ld%% depth=%d dist=%d)\n",
+ argv[i], best_size*100 / orig_compsize,
+ (best_size*10000 / orig_compsize)%100,
+ trg.depth,
+ (next_ref - best_ref + max_refs)
+ % (max_refs + 1) + 1);
+ } else if (trg.depth > max_depth) {
+ if (restore_original_object(trg.buf, trg.size, trg_type, trg.sha1))
+ die("unable to restore %s", argv[i]);
+ if (!quiet)
+ printf("undelta %s (depth was %d)\n",
+ argv[i], trg.depth);
+ trg.depth = 0;
+ free(trg.links);
+ trg.links = NULL;
+ } else if (skip_reason && verbose) {
+ printf("skip %s (%s)\n", argv[i], skip_reason);
}
- free(buf_ref);
- buf_ref = buf_trg;
- size_ref = size_trg;
- memcpy(sha1_ref, sha1_trg, 20);
- }
+ if (!duplicate) {
+ free(ref[next_ref].buf);
+ free(ref[next_ref].links);
+ ref[next_ref] = trg;
+ if (++next_ref > nb_refs)
+ nb_refs = next_ref;
+ if (next_ref == max_refs)
+ next_ref = 0;
+ }
+ } while (++i < argc);
return 0;
}
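
For readers following the reference window logic at the end of the loop
(next_ref, nb_refs, max_refs), here is a minimal standalone sketch of the
same wrap-around bookkeeping. The names struct ref_window, window_init,
window_push and window_distance are hypothetical and do not appear in the
patch. The distance helper is one way to count "how many objects back" a
slot is; it is an illustration, not the expression used in the patch's
printf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the bookkeeping around the ref[] array. */
struct ref_window {
	unsigned int max_refs;  /* capacity of the window (--max-behind) */
	unsigned int nb_refs;   /* slots filled so far, never above max_refs */
	unsigned int next_ref;  /* slot the next object will overwrite */
};

static void window_init(struct ref_window *w, unsigned int max_refs)
{
	assert(max_refs > 0);
	w->max_refs = max_refs;
	w->nb_refs = 0;
	w->next_ref = 0;
}

/*
 * Record one more object and return the slot it landed in.
 * Mirrors the update at the end of the patch's do/while loop:
 * grow nb_refs until the window is full, then wrap next_ref to 0
 * so the oldest entry gets overwritten next.
 */
static unsigned int window_push(struct ref_window *w)
{
	unsigned int slot = w->next_ref;

	if (++w->next_ref > w->nb_refs)
		w->nb_refs = w->next_ref;
	if (w->next_ref == w->max_refs)
		w->next_ref = 0;
	return slot;
}

/*
 * Assumed definition of distance: 1 for the most recently pushed
 * slot, max_refs for the oldest slot in a full window.
 */
static unsigned int window_distance(const struct ref_window *w,
				    unsigned int slot)
{
	return (w->next_ref + w->max_refs - slot - 1) % w->max_refs + 1;
}
```

With a window of 3, the first three pushes fill slots 0, 1 and 2, and the
fourth push reuses slot 0, which is the behaviour that keeps memory usage
bounded when --max-behind is given.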
^ permalink raw reply