Git development


* Re: gitview: Set the default width of graph cell
From: Aneesh Kumar @ 2006-03-01  7:15 UTC
  To: git, Junio C Hamano
In-Reply-To: <440460DC.7080307@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 305 bytes --]

On 2/28/06, Aneesh Kumar K.V <aneesh.kumar@gmail.com> wrote:
>
>
> Subject: gitview: Set the default width  of graph cell
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com>
>
> ---
>

I think this one is better. Please apply this one; it goes on top of
the previous one.

-aneesh

[-- Attachment #2: git.diff --]
[-- Type: text/x-patch; name="git.diff", Size: 810 bytes --]

diff --git a/contrib/gitview/gitview b/contrib/gitview/gitview
index ea05cd4..de9f3f3 100755
--- a/contrib/gitview/gitview
+++ b/contrib/gitview/gitview
@@ -513,7 +513,7 @@ class GitView:
 
 
 		scrollwin = gtk.ScrolledWindow()
-		scrollwin.set_policy(gtk.POLICY_NEVER, gtk.POLICY_AUTOMATIC)
+		scrollwin.set_policy(gtk.POLICY_AUTOMATIC, gtk.POLICY_AUTOMATIC)
 		scrollwin.set_shadow_type(gtk.SHADOW_IN)
 		vbox.pack_start(scrollwin, expand=True, fill=True)
 		scrollwin.show()
@@ -526,9 +526,6 @@ class GitView:
 		self.treeview.show()
 
 		cell = CellRendererGraph()
-		#  Set the default width to 265
-		#  This make sure that we have nice display with large tag names
-		cell.set_property("width", 265)
 		column = gtk.TreeViewColumn()
 		column.set_resizable(True)
 		column.pack_start(cell, expand=True)


* Re: Quick question: end of lines
From: Martin Langhoff @ 2006-03-01  8:31 UTC
  To: Emmanuel Guerin; +Cc: git
In-Reply-To: <f898cca90602281612n777a4f17m@mail.gmail.com>

On 3/1/06, Emmanuel Guerin <emmanuel@guerin.fr.eu.org> wrote:
> What I begin to realize is that the only possibility probably lies in
> using a tool that converts the modified files "on the fly" before
> commits. I just want to make sure that no other solution was found by
> others facing a similar problem.

Perhaps a pre-commit hook? Read the documentation (and search the list
archives). I'm pretty sure you can do newline cleanup before commit or
at least newline checks before commits.

There's always the option of filing a bug in MS's bugzilla ;-)

cheers,


martin


* Re: Quick question: end of lines
From: Junio C Hamano @ 2006-03-01  9:01 UTC
  To: Emmanuel Guerin; +Cc: git, Martin Langhoff
In-Reply-To: <46a038f90603010031g31f8bc33xd3f45f2e19950c78@mail.gmail.com>

"Martin Langhoff" <martin.langhoff@gmail.com> writes:

> On 3/1/06, Emmanuel Guerin <emmanuel@guerin.fr.eu.org> wrote:
>> What I begin to realize is that the only possibility probably lies in
>> using a tool that converts the modified files "on the fly" before
>> commits. I just want to make sure that no other solution was found by
>> others facing a similar problem.
>
> Perhaps a pre-commit hook? Read the documentation (and search the list
> archives). I'm pretty sure you can do newline cleanup before commit or
> at least newline checks before commits.
>
> There's always the option of filing a bug in MS's bugzilla ;-)

You can use the .git/hooks/pre-commit hook, in the repository where
the editor munges line termination, to fix things up.

The hook is called with GIT_INDEX_FILE set to the appropriate
index file, so you could "git-diff-index --cached --name-only
HEAD" to obtain the list of files being committed, sanitize the
working tree files and update-index them again before returning
true from the hook.

Here is a silly example that standardizes the committed files on
uppercase:

        git-diff-index --cached --name-only HEAD |
        xargs sh -c '
                while case "$#" in 0) break ;; esac
                do
                        perl -p -i -e "\$_ = uc(\$_)" "$1"
                        git-update-index "$1"
                        shift
                done
        ' dummy
        exit 0


* Re: [PATCH 3/3] Tie it all together: "git log"
From: Junio C Hamano @ 2006-03-01  9:02 UTC
  To: Martin Langhoff; +Cc: git
In-Reply-To: <46a038f90602281538m90c4d04pbb6f277e3bec89e8@mail.gmail.com>

"Martin Langhoff" <martin.langhoff@gmail.com> writes:

> On 3/1/06, Junio C Hamano <junkio@cox.net> wrote:
>> I would say we should just rip merge-order out.  Who uses it,
>> and why does it not work with topo-order, again?
>
> IIRC archimport uses it, but there's no reason why topo-order wouldn't work.

Thanks.  I'll push out a few patches on top of Linus' series,
with s/merge-order/topo-order/ on archimport.


* Re: git-svn and huge data and modifying the git-svn-HEAD branch directly
From: Andreas Ericsson @ 2006-03-01  9:40 UTC
  To: Eric Wong; +Cc: Linus Torvalds, Martin Langhoff, git
In-Reply-To: <20060301065138.GC21684@hand.yhbt.net>

Eric Wong wrote:
> Linus Torvalds <torvalds@osdl.org> wrote:
> 
>>
>>On Tue, 28 Feb 2006, Martin Langhoff wrote:
>>
>>>git-svn-HEAD "moves" so it's really a bad idea to have it as a tag.
>>>Nothing within core git prevents it from moving, but I think that
>>>porcelains will start breaking. Tags and heads are the same thing,
>>>except that heads are expected to change (specifically, to move
>>>forward), and tags are expected to stand still.
>>
>><snipped>
>>Using a "refs/remotes" subdirectory makes tons of sense for something like 
>>this. Or something even more specific, like "refs/svn-tracking/". Git 
>>shouldn't care - all the tools _should_ work fine with any subdirectory 
>>structure.
> 
> 
> Git tools only work as long as the 'refs/{remotes,svn-tracking,...}/'
> prefix is specified.  git-svn-HEAD (or any $GIT_SVN_ID-HEAD) does get
> specified from the command-line quite often:
> 	
> 	git checkout -b mine git-svn-HEAD
> 	git-log git-svn-HEAD..head
> 	git-svn commit git-svn-HEAD..mine
> 	git-log mine..git-svn-HEAD
> 
> Should rev-parse be taught to be less strict and look for basenames
> that can't be found in heads/ and tags/ in other directories?
> 

It already does. The search order is this, for a ref named 'foo':
	$GIT_DIR/foo
	$GIT_DIR/refs/foo
	$GIT_DIR/refs/tags/foo
	$GIT_DIR/refs/heads/foo

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


* Re: [PATCH] diff-delta: bound hash list length to avoid O(m*n) behavior
From: Junio C Hamano @ 2006-03-01 10:38 UTC
  To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0602281017241.25336@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

>> I tried an experimental patch to cull collided hash buckets
>> very aggressively.  I haven't applied your last "reuse index"
>> patch, though -- I think that is orthogonal and I'd like to
>> leave that to the next round.
>
> It is indeed orthogonal and I think you could apply it to the next 
> branch without the other patches (it should apply with few problems).  
> This is an obvious and indisputable gain, even more if pack-objects is 
> reworked to reduce memory usage by keeping only one live index for 
> multiple consecutive delta attempts.

Umm.  The hash-index is rather huge, isn't it?  I did not
realize it was a two-pointer structure for every byte in the
source material, and we typically delta from larger to smaller,
so we will keep about 10x the unpacked source.  Until we swap
the windowing around, that means about 100x the unpacked source
with the default window size.

Also, I am not sure which one is more costly: hash-index
building or use of that to search inside target.  I somehow got
an impression that the former is relatively cheap, and that is
what is being cached here.

> Let's suppose the reference buffer has:
>  
> ***********************************************************************/
>...
> One improvement might consist of counting the number of consecutive 
> identical bytes when starting a compare, and manage to skip as many hash 
> entries (minus the block size) before looping again with more entries in 
> the same hash bucket.

Umm, again.  Consecutive identical bytes (BTW, I think "* * *"
and "** ** **" patterns have the same collision issues without
being consecutive bytes, so such an optimization may be trickier
and cost more), when emitted as literals, would compress well,
wouldn't they?  At the end of the day, I think what matters is
the size of deflated delta, since going to disk to read it out
is more expensive than deflating and applying.  I think you made
a suggestion along the same lines, capping the max delta used by
try_delta() more precisely by taking the deflated size into
account.


* Re: bug?: stgit creates (unneccessary?) conflicts when pulling
From: Catalin Marinas @ 2006-03-01 10:59 UTC
  To: Shawn Pearce; +Cc: Karl Hasselström, git
In-Reply-To: <20060227222600.GA11797@spearce.org>

Shawn Pearce <spearce@spearce.org> wrote:
> Karl Hasselström <kha@treskal.com> wrote:
>> If I make a patch series where more than one patch touches the same
>> line, I get a lot of merge errors when upstream has accepted them and
>> I try to merge them back.
>
> When pg grabs its (possibly remote) parent ("stg pull" aka pg-rebase)
> we try to push down PatchA.  If PatchA fails to push cleanly we'll
> pop it off and try to push PatchA + PatchB.  If that pushes cleanly
> then we fold the content of PatchA into PatchB, effectively making
> PatchA part of PatchB.  If PatchA + PatchB failed to push down
> cleanly then we pop both and retry pushing PatchA + PatchB + PatchC.

How do you solve the situation where only PatchA, PatchC and PatchE
were merged, with B and D still pending? Trying combinations of patches
is not a good idea.

As I said, if you have a large number of patches this might be pretty
slow. Have a look at my patch for trying the reversed patches in
reverse order. It seems to solve this problem in most cases. The
method can still fail, for example when adjacent changes made by
third-party patches break the context of the git patches so that
git-apply fails. An addition to this would be to try a diff3 merge
with the reversed patch, but I don't think it's worth it
since it would become much slower.

> If that pushes down cleanly then we make PatchA and PatchB officially
> part of PatchC.

I don't agree with this. For example, patches A, B and C change the
same line in file1 but patch A also changes file2 and patch B changes
file3. With your approach, merging A+B+C succeeds and you make A and B
part of C and hence move the changes to file2 and file3 into patch C.

The above can happen when the maintainer only merges part of the patch
or simply decides to merge patch C only and manually solve the
conflict in file1 (since patch C is based on the context from patches
A+B).

-- 
Catalin


* What's in git.git
From: Junio C Hamano @ 2006-03-01 12:24 UTC
  To: git

* The 'master' branch has these since the last announcement.

  - Cygwin related fixes (Alex Riesen) [*]
  - git-rm fixes and docs (Carl Worth)
  - gitview updates (Aneesh Kumar, Pavel Roskin)
  - git-svn updates (Eric Wong)
  - git-cvsserver (Martin Langhoff, Johannes Schindelin)
  - git-annotate (Ryan Anderson)
  - format-patch fix (Alexandre Julliard)
  - fix send-pack to a remote with insanely large number of refs [*]
  - "thin" pack git-push/git-fetch.
  - eye candy for checkout [*].
  - error() formatting fixes [*].
  - git-am empty commit prevention [*].
  - git-mailinfo now is built and installed again.
  - fix two sample hooks [*].
  - diffcore-rename and diffcore-break microfix [*].
  - svnimport enhancements (Karl Hasselström)
  - git-fetch output tweak (Lukas Sandström)
  - start to do more things in git wrapper (Linus)
  - combine-diff fixes (Mark Wooding) [*]
  - ls-files -i -o fix (Shawn Pearce)
  - Darwin related fix (Shawn Pearce)
  - compilation warning fixes (Timo Hirvonen, Tony Luck, Andreas Ericsson)

  The changes marked with [*] will appear in the next
  maintenance release; they are either first applied to 1.2.X
  maintenance branch and pulled into master, or first applied to
  master and then cherry picked to 1.2.X maintenance branch.

* The 'next' branch, in addition, has these.

  I wanted to have this out to "master", but ran out of time.
  The same set of changes are already cherry-picked and waiting
  for inclusion in the next maintenance release.

  - git-apply trailing whitespace warning (Linus and me)

  These are waiting for further progress by authors:

  - git-blame (Fredrik Kuivinen)
  - delta packer updates for tighter packs (Nicolas Pitre)

  These are here only because they are new, not because I have
  any qualms about them:

  - for_each_ref warning (Johannes Schindelin)
  - prepare to make rename/break detection independent from delta packing.
  - checkout-index --stdin (Shawn Pearce)

  These are here because they are rather important and I am
  playing it safe.

  - beginning of rev-list libification (Linus)
  - git-log without shell script (Linus and me) 

  I am almost happy about this.  Now that the author mapping format
  is the same between the cvs and svn importers, would it make sense
  to unify them so that other foreign SCM interfaces can also follow
  suit?  Usually you would not have upstreams in two different
  foreign SCMs feeding a single repository anyway, so this may not be
  an issue, though...

  - git-svnimport save author name mapping to a file (Karl Hasselström)

* The 'pu' branch, in addition, has these.

  This is in preparation for Nico's delta work already in "next".

  - make rename/break detection independent from delta packing.

  These muddy the waters for what is in "next", and improving
  that is more important.

  - diff-delta: cull collided hash bucket more aggressively.
  - diff-delta: allow reusing of the reference buffer index (Nicolas Pitre)

  I am not sure about the command line interface of this.  Would
  it make more sense to checkout three stages in one pass?

  - checkout-index --suffix (Shawn Pearce)


* [PATCH 1/2] git-log (internal): add approxidate.
From: Junio C Hamano @ 2006-03-01 12:24 UTC
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0602281504280.22647@g5.osdl.org>

Next will be the pretty-print format.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---
  Linus Torvalds <torvalds@osdl.org> writes:

  > I didn't add the logic for --before/--after flags, but that should be 
  > pretty trivial, and is independent of this anyway.
  >
  > Perhaps more importantly, I didn't remove the tests that now start 
  > failing, nor did I remove the actual code to do --merge-order ;/

  I've done the janitorial work, and have two more on top.  Here is
  the first one.  I'd appreciate comments on the second one.

 revision.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

2eba658eaffdf4c5c9d0767b49e4c27d7281cda6
diff --git a/revision.c b/revision.c
index c84f146..4885871 100644
--- a/revision.c
+++ b/revision.c
@@ -492,6 +492,26 @@ int setup_revisions(int argc, const char
 				revs->limited = 1;
 				continue;
 			}
+			if (!strncmp(arg, "--since=", 8)) {
+				revs->max_age = approxidate(arg + 8);
+				revs->limited = 1;
+				continue;
+			}
+			if (!strncmp(arg, "--after=", 8)) {
+				revs->max_age = approxidate(arg + 8);
+				revs->limited = 1;
+				continue;
+			}
+			if (!strncmp(arg, "--before=", 9)) {
+				revs->min_age = approxidate(arg + 9);
+				revs->limited = 1;
+				continue;
+			}
+			if (!strncmp(arg, "--until=", 8)) {
+				revs->min_age = approxidate(arg + 8);
+				revs->limited = 1;
+				continue;
+			}
 			if (!strcmp(arg, "--all")) {
 				handle_all(revs, flags);
 				continue;
-- 
1.2.3.g9425


* [PATCH 2/2] git-log (internal): more options.
From: Junio C Hamano @ 2006-03-01 12:24 UTC
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0602281504280.22647@g5.osdl.org>

This ports the following options from rev-list based git-log
implementation:

 * -<n>, -n<n>, and -n <n>.  I am still wondering if we want
    this natively supported by setup_revisions(), which already
    takes --max-count.  We may want to move them in the next
    round.  Also I am not sure if we can get away with not
    setting revs->limited when we set max-count.  The latest
    rev-list.c and revision.c in this series do not, so I left
    them as they are.

 * --pretty and --pretty=<fmt>.

 * --abbrev=<n> and --no-abbrev.

The previous commit already handles time-based limiters
(--since, --until and friends).  The remaining things that
rev-list based git-log happens to do are not useful for pure
log-viewing purposes, and are not ported:

 * --bisect (obviously).

 * --header.  I am actually in favor of doing the NUL
   terminated record format, but rev-list based one always
   passed --pretty, which defeated this option.  Maybe next
   round.

 * --parents.  I cannot think of a reason a log viewer would want
   this.  The flag is primarily for feeding squashed history
   via pipe to downstream tools.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 * comes on top of --since/--until patch which in turn comes on
   top of janitorial "remove merge-order" change.

 git.c      |   70 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 rev-list.c |    5 ++--
 revision.h |    1 +
 3 files changed, 72 insertions(+), 4 deletions(-)

4e365d2558356cd091ebf57f689c477a86822d53
diff --git a/git.c b/git.c
index b0da6b1..bf68dac 100644
--- a/git.c
+++ b/git.c
@@ -256,12 +256,80 @@ static int cmd_log(int argc, char **argv
 	struct rev_info rev;
 	struct commit *commit;
 	char *buf = xmalloc(LOGSIZE);
+	static enum cmit_fmt commit_format = CMIT_FMT_DEFAULT;
+	int abbrev = DEFAULT_ABBREV;
+	int show_parents = 0;
+	const char *commit_prefix = "commit ";
 
 	argc = setup_revisions(argc, argv, &rev, "HEAD");
+	while (1 < argc) {
+		char *arg = argv[1];
+		/* accept -<digit>, like traditional "head" */
+		if ((*arg == '-') && isdigit(arg[1])) {
+			rev.max_count = atoi(arg + 1);
+		}
+		else if (!strcmp(arg, "-n")) {
+			if (argc < 3)
+				die("-n requires an argument");
+			rev.max_count = atoi(argv[2]);
+			argc--; argv++;
+		}
+		else if (!strncmp(arg,"-n",2)) {
+			rev.max_count = atoi(arg + 2);
+		}
+		else if (!strncmp(arg, "--pretty", 8)) {
+			commit_format = get_commit_format(arg + 8);
+			if (commit_format == CMIT_FMT_ONELINE)
+				commit_prefix = "";
+		}
+		else if (!strcmp(arg, "--parents")) {
+			show_parents = 1;
+		}
+		else if (!strcmp(arg, "--no-abbrev")) {
+			abbrev = 0;
+		}
+		else if (!strncmp(arg, "--abbrev=", 9)) {
+			abbrev = strtoul(arg + 9, NULL, 10);
+			if (abbrev && abbrev < MINIMUM_ABBREV)
+				abbrev = MINIMUM_ABBREV;
+			else if (40 < abbrev)
+				abbrev = 40;
+		}
+		else
+			die("unrecognized argument: %s", arg);
+		argc--; argv++;
+	}
+
 	prepare_revision_walk(&rev);
 	setup_pager();
 	while ((commit = get_revision(&rev)) != NULL) {
-		pretty_print_commit(CMIT_FMT_DEFAULT, commit, ~0, buf, LOGSIZE, 18);
+		printf("%s%s", commit_prefix,
+		       sha1_to_hex(commit->object.sha1));
+		if (show_parents) {
+			struct commit_list *parents = commit->parents;
+			while (parents) {
+				struct object *o = &(parents->item->object);
+				parents = parents->next;
+				if (o->flags & TMP_MARK)
+					continue;
+				printf(" %s", sha1_to_hex(o->sha1));
+				o->flags |= TMP_MARK;
+			}
+			/* TMP_MARK is a general purpose flag that can
+			 * be used locally, but the user should clean
+			 * things up after it is done with them.
+			 */
+			for (parents = commit->parents;
+			     parents;
+			     parents = parents->next)
+				parents->item->object.flags &= ~TMP_MARK;
+		}
+		if (commit_format == CMIT_FMT_ONELINE)
+			putchar(' ');
+		else
+			putchar('\n');
+		pretty_print_commit(commit_format, commit, ~0, buf,
+				    LOGSIZE, abbrev);
 		printf("%s\n", buf);
 	}
 	free(buf);
diff --git a/rev-list.c b/rev-list.c
index 6af8d86..8e4d83e 100644
--- a/rev-list.c
+++ b/rev-list.c
@@ -7,10 +7,9 @@
 #include "diff.h"
 #include "revision.h"
 
-/* bits #0-3 in revision.h */
+/* bits #0-4 in revision.h */
 
-#define COUNTED		(1u << 4)
-#define TMP_MARK	(1u << 5) /* for isolated cases; clean after use */
+#define COUNTED		(1u<<5)
 
 static const char rev_list_usage[] =
 "git-rev-list [OPTION] <commit-id>... [ -- paths... ]\n"
diff --git a/revision.h b/revision.h
index 0043c16..31e8f61 100644
--- a/revision.h
+++ b/revision.h
@@ -5,6 +5,7 @@
 #define UNINTERESTING   (1u<<1)
 #define TREECHANGE	(1u<<2)
 #define SHOWN		(1u<<3)
+#define TMP_MARK	(1u<<4) /* for isolated cases; clean after use */
 
 struct rev_info {
 	/* Starting list */
-- 
1.2.3.g9425


* impure renames / history tracking
From: Paul Jakma @ 2006-03-01 14:01 UTC
  To: git list

Hi,

I'm trying to understand git better (so I can explain it better to 
others, with an eye to them considering switching to git), one 
question I have is about renames.

- git obviously detects pure renames perfectly well

- git doesn't however record renames, so 'impure' renames may not be
   detected

My question is:

- why not record rename information explicitly in the commit object?

I.e. so as to be able to follow history information through 'impure' 
renames without having to resort to heuristics.

E.g. imagine a project where development typically occurs through:

o: commit
m: merge

    o---o-m--o-o-o--o----m <- project
   /     /              /
o-o-o-o-o--o-o-o--o-o-o <- main branch

The project merges back to main in one 'big' combined merge 
(collapsing all of the commits on 'project' into one commit). This 
leads to 'impure renames' being not uncommon. The desired end-result 
of merging back to 'main' being to rebase 'project' as one commit 
against 'main', and merge that single commit back, a la:

    o---o-m--o-o-o--o----m <- project
   /     /              /
o-o-o-o-o--o-o-o--o-o-o---m <- main branch
                        \ /
                         o <- project_collapsed

So that 'm' on 'main' is that one commit[1].

The merits or demerits of such merging practice aside, what reason 
would there be /against/ recording explicit rename information in the 
commit object, so as to help browsers follow history (particularly 
impure renames) better in a commit?

I.e. would there be resistance to adding rename meta-info headers to
commit objects, and having diffcore and other tools use those
headers to /augment/ their existing heuristics for detecting renames?

Thanks!

1. Git currently doesn't have 'porcelain' to do this, presumably 
there'd be no objection to one?

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
It is the quality rather than the quantity that matters.
- Lucius Annaeus Seneca (4 B.C. - A.D. 65)


* Re: bug?: stgit creates (unneccessary?) conflicts when pulling
From: Shawn Pearce @ 2006-03-01 14:51 UTC
  To: Catalin Marinas; +Cc: Karl Hasselström, git
In-Reply-To: <tnx1wxmig75.fsf@arm.com>

[Side Note: I've suddenly stopped receiving mail from vger.
 Even majordomo isn't replying to my pleas for help.  Arggh!
 Yet all other incoming email seems to be fine.]

Catalin Marinas <catalin.marinas@arm.com> wrote:
> Shawn Pearce <spearce@spearce.org> wrote:
> > Karl Hasselström <kha@treskal.com> wrote:
> >> If I make a patch series where more than one patch touches the same
> >> line, I get a lot of merge errors when upstream has accepted them and
> >> I try to merge them back.
> >
> > When pg grabs its (possibly remote) parent ("stg pull" aka pg-rebase)
> > we try to push down PatchA.  If PatchA fails to push cleanly we'll
> > pop it off and try to push PatchA + PatchB.  If that pushes cleanly
> > then we fold the content of PatchA into PatchB, effectively making
> > PatchA part of PatchB.  If PatchA + PatchB failed to push down
> > cleanly then we pop both and retry pushing PatchA + PatchB + PatchC.
> 
> How do you solve the situation where only PatchA, PatchC and PatchE
> were merged, B and D still pending? Trying combinations of patches is
> not a good idea.

Yea, ouch.  pg would fold everything into E, destroying the B
and D boundary.  A (not so good) workaround right now would be to
undo the rebase, pop all patches, rebase, then push one by one.
I didn't even consider this case as it's not my workflow style:
at least not right now.

> As I said, if you have a big number of patches this might be pretty
> slow. Have a look at my patch for trying the reversed patches in
> reverse order. It seems to solve this problem for most of the
> cases. There are cases when this method would fail like adjacent
> changes made by third-party patches that break the context of the git
> patches and git-apply would fail. An addition to this would be to try
> a diff3 merge with the reversed patch but I don't think it's worth
> since it would become much slower.

True.  The constant reapplication does really slow it down.  So does
grabbing the reverse patch and seeing if it applies backwards
cleanly.  Neither operation is fast, and neither is really going
to be fast.

BTW - I did read through your patch when it was posted: the reverse
apply idea is pretty slick and should work a large part of the time,
as you said.  Nice addition to StGIT.

> > If that pushes down cleanly then we make PatchA and PatchB officially
> > part of PatchC.
> 
> I don't agree with this. For example, patches A, B and C change the
> same line in file1 but patch A also changes file2 and patch B changes
> file3. With your approach, merging A+B+C succeeds and you make A and B
> part of C and hence move the changes to file2 and file3 into patch C.
> 
> The above can happen when the maintainer only merges part of the patch
> or simply decides to merge patch C only and manually solve the
> conflict in file1 (since patch C is based on the context from patches
> A+B).

Ah, yes.  The upstream maintainer who doesn't take everything.
Shame on them.  :-) Shame on me for also not dealing with this
case in pg; you are completely correct that folding these patches
together in this scenario is a really bad idea.

In the one environment where I really use pg the upstream is forced
to take the entire patch: I have my own foreign SCM interface to PVCS
VM (it's heavily customized crap, so I'm not going to contribute it)
and in this interface the upstream is forced to take the entire patch
every time.  So right now it's not a huge concern to me personally,
but if anyone else is trying to use pg it might be.

-- 
Shawn.


* Re: [PATCH] Teach git-checkout-index to use file suffixes.
From: Shawn Pearce @ 2006-03-01 15:06 UTC
  To: git
In-Reply-To: <20060301044132.GF22894@spearce.org>

Shawn Pearce <spearce@spearce.org> wrote:
> Sometimes it is useful to unpack the unmerged stage entries
> to the same directory as the tracked file itself, but with
> a suffix indicating which stage that version came from.
> In many user interface level scripts this is being done
> by git-unpack-file followed by creating the necessary
> directory structure and then moving the file into the
> directory with the requested name.  It is now possible to
> perform the same action for a larger set of files directly
> through git-checkout-index.

Junio mentioned in his ``What's in git.git'' email that he's not
sure of this command line interface:

Junio C Hamano <junkio@cox.net> wrote:
> I am not sure about the command line interface of this.  Would
> it make more sense to checkout three stages in one pass?
> 
>     - checkout-index --suffix (Shawn Pearce)

I thought about the same thing myself when I submitted the patch
to the list.  I probably should have talked a little bit about that
in the email.  :-)

I thought about using instead:

  --stage=all --suffix1=\#1 --suffix2=\#2 --suffix3=\#3

but then thought that the performance gains achieved by only forking
git-checkout-index once, scanning the index once, etc. were not
that big of a difference compared to the rather horrible looking
command line syntax that produced and required one to use.

If anyone has any suggestions for these options, please pass them
along.  I'll rebuild the patch to pull all available stages if we
can come up with a suitable way of describing such.

-- 
Shawn.


* Re: bug?: stgit creates (unneccessary?) conflicts when pulling
From: Catalin Marinas @ 2006-03-01 15:08 UTC
  To: Shawn Pearce; +Cc: Karl Hasselström, git
In-Reply-To: <20060301145105.GB3313@spearce.org>

On 01/03/06, Shawn Pearce <spearce@spearce.org> wrote:
> [Side Note: I've suddenly stopped receiving mail from vger.
>  Even majordomo isn't replying to my pleas for help.  Arggh!
>  Yet all other incoming email seems to be fine.]

news.gmane.org

> True.  The constant reapplication does really slow it down.  So does
> grabbing the reverse patch and seeing if it applies backwards
> cleanly.  Neither operation is fast, and neither is really going
> to be fast.

I realised that, depending on the number of patches merged upstream,
using this option can make StGIT faster. That's because when pushing a
patch (without the --merged option), StGIT first tries a diff | apply
followed by a three-way merge (even slower) if the former method
fails. This means that for all the patches merged upstream, StGIT
tries both methods since diff | apply fails anyway. With the --merged
option, StGIT would only try the reverse-diff | apply and, if this
succeeds, it will skip the normal push methods.

--
Catalin


* Re: impure renames / history tracking
From: Andreas Ericsson @ 2006-03-01 15:38 UTC
  To: Paul Jakma; +Cc: git list
In-Reply-To: <Pine.LNX.4.64.0603011343170.13612@sheen.jakma.org>

Paul Jakma wrote:
> 
> - git obviously detects pure renames perfectly well
> 
> - git doesn't however record renames, so 'impure' renames may not be
>   detected
> 
> My question is:
> 
> - why not record rename information explicitly in the commit object?
> 

Mainly for two reasons, iirc:
1. Extensive metadata is evil.
2. Backwards compatibility. Old repos should always work with new tools. 
Old tools should work with new repos, at least until a new major-release 
is released.

> I.e. so as to be able to follow history information through 'impure' 
> renames without having to resort to heuristics.
> 
> E.g. imagine a project where development typically occurs through:
> 
> o: commit
> m: merge
> 
>    o---o-m--o-o-o--o----m <- project
>   /     /              /
> o-o-o-o-o--o-o-o--o-o-o <- main branch
> 
> The project merges back to main in one 'big' combined merge (collapsing 
> all of the commits on 'project' into one commit). This leads to 'impure 
> renames' being not uncommon. The desired end-result of merging back to 
> 'main' being to rebase 'project' as one commit against 'main', and merge 
> that single commit back, a la:
> 
>    o---o-m--o-o-o--o----m <- project
>   /     /              /
> o-o-o-o-o--o-o-o--o-o-o---m <- main branch
>                        \ /
>                         o <- project_collapsed
> 
> So that 'm' on 'main' is that one commit[1].
> 

I think you're misunderstanding the git meaning of rebase here. "git 
rebase" moves all commits since "project" forked from "main branch" to 
the tip of "main branch".

Other than that, this is the recommended workflow, and exactly how Linux 
and git both are managed (i.e. topic branches eventually merged into 
'master').

In your drawings, 'main branch' would be 'master' and 'project' would be 
any amount of topic-branches (or just one, if you like that better).

I'm not sure what you mean by 'project_collapsed' though. If I 
understand you correctly, each branch-head represents one 'collapse'. I 
suggest you clone the git repo and do

	$ gitk master
	$ gitk next
	$ gitk pu

gitk is great for visualizing what you've done and what the repo looks 
like. Use and abuse it frequently every time you're unsure what you 
just did. It's the best way to quickly learn what happens, really.

If you just want to distribute snapshots I suggest you take a look at 
git-tar-tree. Junio makes nice use of it in the git Makefile (the dist: 
target).

> The merits or demerits of such merging practice aside, what reason would 
> there be /against/ recording explicit rename information in the commit 
> object, so as to help browsers follow history (particularly impure 
> renames) better in a commit?
> 
> I.e. would there be resistance to adding meta-info rename headers commit 
> objects, and having diffcore and other tools to use those headers to 
> /augment/ their existing heuristics in detecting renames?
> 

Personally I think metadata is evil. Renames will still be auto-detected 
anyway, and with the distributed repo setup the only reason git 
shouldn't be able to detect a rename is if you rename a file and hack it 
up so it doesn't even come close to matching its origin (close in this 
case is 80% by default, I think). In those cases it isn't so much a 
rename as a rewrite. If you find the commit where the file was renamed 
it should be listed in that commit, like so:

	similarity index 92%
	rename from Documentation/git-log-script.txt
	rename to Documentation/git-log.txt

(this is gitk output from the git repo. Search for "Big tool rename")

IMO this is far better than having to tell git "I renamed this file to 
that", since it also detects code-copying with modifications, and it's 
usually quick enough to find those renames as well.
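A toy sketch of the idea behind content-based rename detection, in Python: score how much of an old file's content survives in a new one, and pair a deleted path with an added path only when the score clears a threshold. This is not git's diffcore-rename algorithm (which compares hashed chunks of the files, not difflib line matches); `similarity()` and `detect_renames()` are hypothetical names, and the 0.5 default is an assumption chosen because current git documentation gives 50% as the default for `-M`.

```python
import difflib


def similarity(old: str, new: str) -> float:
    """Toy score: matching lines divided by the longer file's line count."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    denom = max(len(old_lines), len(new_lines))
    if denom == 0:
        return 1.0  # two empty files are trivially identical
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / denom


def detect_renames(deleted: dict, added: dict, threshold: float = 0.5) -> list:
    """Pair each deleted path with its most similar added path above threshold.

    `deleted` and `added` map path -> file contents (hypothetical input shape).
    Returns a list of (old_path, new_path, score) tuples.
    """
    renames = []
    unclaimed = dict(added)
    for old_path, old_text in deleted.items():
        best = None
        for new_path, new_text in unclaimed.items():
            score = similarity(old_text, new_text)
            if score >= threshold and (best is None or score > best[2]):
                best = (old_path, new_path, score)
        if best is not None:
            renames.append(best)
            del unclaimed[best[1]]  # an added file can be the target of one rename only
    return renames
```

A file that is renamed and heavily rewritten falls below the threshold and shows up as a delete plus an add, which is the "rewrite" case described above.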

> Thanks!
> 
> 1. Git currently doesn't have 'porcelain' to do this, presumably there'd 
> be no objection to one?
> 

	$ git checkout master
	$ git pull . project

The dot means "pull from the local repo". "project" is the branch you 
want to merge into master. You can pull an arbitrary number of branches 
in one go ("octopus" merge). The current tested limit is 12 (thanks, Len 
;) ).

If, for some reason, you want to combine lots of commits into a single 
mega-patch (like Linus does for each release of the kernel), you can do:

	$ git diff $(git merge-base main project) project > patch-file

Then you can apply patch-file to whatever branch you want and make the 
commit as if it was a single change-set. I'd recommend against it unless 
you're just toying around though. It's a bad idea to lie in a project's 
history.
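`git merge-base main project` picks a best common ancestor of the two tips, so the diff above covers only what `project` changed relative to that point. A minimal Python sketch of that computation over a toy commit graph (the graph and commit names are hypothetical; real git also has to cope with criss-cross histories that have several merge bases, which this sketch simply returns as a set):

```python
def ancestors(graph: dict, commit: str) -> set:
    """All commits reachable from `commit` by following parents, including itself."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph.get(c, []))
    return seen


def merge_bases(graph: dict, a: str, b: str) -> set:
    """Best common ancestors: common ancestors not reachable from another one."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {
        c for c in common
        if not any(c != other and c in ancestors(graph, other) for other in common)
    }


# Toy history: main is m1..m4; project forks at m2 and later merges m3 from main.
graph = {
    "m1": [], "m2": ["m1"], "m3": ["m2"], "m4": ["m3"],
    "p1": ["m2"], "p2": ["p1", "m3"], "p3": ["p2"],
}
```

Because the toy `project` branch already merged `m3` from main, its merge base with `m4` is `m3` rather than the original fork point `m2`.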

Hope that helps.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

* Re: [PATCH 2/2] git-log (internal): more options.
From: Linus Torvalds @ 2006-03-01 15:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vbqwqgxo8.fsf@assigned-by-dhcp.cox.net>

On Wed, 1 Mar 2006, Junio C Hamano wrote:
>
> This ports the following options from rev-list based git-log
> implementation:
> 
>  * -<n>, -n<n>, and -n <n>.  I am still wondering if we want
>     this natively supported by setup_revisions(), which already
>     takes --max-count.  We may want to move them in the next
>     round.  Also I am not sure if we can get away with not
>     setting revs->limited when we set max-count.  The latest
>     rev-list.c and revision.c in this series do not, so I left
>     them as they are.
> 
>  * --pretty and --pretty=<fmt>.
> 
>  * --abbrev=<n> and --no-abbrev.

Looks good.

I _suspect_ that we want to handle them all in setup_revisions(), but I 
wasn't sure, so I left them in rev-list.c originally.

Most helpers that want a list of commits probably want the printing 
options too, and the ones that do not probably simply don't care (ie if 
they silently pass a "--pretty=raw" without it affecting anything, who 
really cares?)
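The three spellings of the commit limit quoted above (`-<n>`, `-n<n>` and `-n <n>`) all mean the same thing as `--max-count=<n>`. A hypothetical Python sketch of how one shared parser might fold them together, so that every command reusing it accepts all forms; `parse_max_count` is an illustrative name, not git's `setup_revisions()`:

```python
def parse_max_count(args: list) -> tuple:
    """Return (max_count, remaining_args); max_count is -1 when no limit is given."""
    max_count = -1
    rest = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg.startswith("--max-count="):
            max_count = int(arg[len("--max-count="):])
        elif arg == "-n":
            # "-n <n>": the count is the following argument
            i += 1
            if i >= len(args):
                raise ValueError("-n requires an argument")
            max_count = int(args[i])
        elif arg.startswith("-n") and arg[2:].isdigit():
            max_count = int(arg[2:])  # "-n<n>"
        elif arg.startswith("-") and arg[1:].isdigit():
            max_count = int(arg[1:])  # "-<n>"
        else:
            rest.append(arg)  # not a limit option; leave it for other parsers
        i += 1
    return max_count, rest
```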

> The previous commit already handles time-based limiters
> (--since, --until and friends).  The remaining things that
> rev-list based git-log happens to do are not useful for pure
> log-viewing purposes, and are not ported:
> 
>  * --bisect (obviously).
> 
>  * --header.  I am actually in favor of doing the NUL
>    terminated record format, but rev-list based one always
>    passed --pretty, which defeated this option.  Maybe next
>    round.
> 
>  * --parents.  I cannot think of a reason a log viewer wants
>    this.  The flag is primarily for feeding squashed history
>    via pipe to downstream tools.

I can actually imagine using "--parents" as a way of parsing both the 
commit log and the history. Of course, any such use is likely in a script, 
at which point the script probably doesn't actually want "git log", but 
just a raw "git-rev-list".
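As a sketch of the scripted use mentioned here: `git-rev-list --parents` prints each commit followed by its parents on one line, which is enough to rebuild the history graph. The Python below parses that text; the sample ids are made-up placeholders, and in practice the input would come from running the command.

```python
def parse_parents(output: str) -> dict:
    """Map each commit id to the list of its parent ids."""
    graph = {}
    for line in output.splitlines():
        fields = line.split()
        if fields:
            graph[fields[0]] = fields[1:]
    return graph


def merges(graph: dict) -> list:
    """Commits with more than one parent, in input order."""
    return [c for c, parents in graph.items() if len(parents) > 1]


# Placeholder output in the "<commit> <parent>..." shape, newest first.
sample = """\
c3 c2 b1
c2 c1
b1 c1
c1
"""
```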

After all, the only _real_ difference between "git log" and "git-rev-list" 
is the purely syntactic one (things like defaulting to HEAD in "git log" 
and requiring revisions in git-rev-list), and the use of PAGER.

To me, the question whether a flag would be parsed in the "revision.c" 
library or in the "rev-list.c" binary was more a question of whether that 
flag makes sense for other things than just "git log". 

For example, "git whatchanged" and "git diff" could both use 
setup_revisions(), although "git diff" wouldn't actually _walk_ the 
revisions (it would just look at the "revs->commits" list to see what was 
passed in).

"git whatchanged" would obviously take all the same flags "git log" does, 
and "git diff" could take them and just test the values for sanity (ie 
error out if min/max_date is not -1, for example).

"git show" is like a "git-whatchanged" except it wouldn't walk the diffs 
(I considered adding a "--nowalk" option to setup_revisions(), which would 
just suppress the "add_parents_to_list()" entirely)

			Linus

* Re: bug?: stgit creates (unneccessary?) conflicts when pulling
From: Shawn Pearce @ 2006-03-01 15:50 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: git
In-Reply-To: <b0943d9e0603010708l72cb14d1w@mail.gmail.com>

Catalin Marinas <catalin.marinas@gmail.com> wrote:
> On 01/03/06, Shawn Pearce <spearce@spearce.org> wrote:
> > True.  The constant reapplication does really slow it down.  So does
> > grabbing the reverse patch and seeing if it applies backwards
> > cleanly.  Neither operation is fast, and neither is really going
> > to be fast.
> 
> I realised that, depending on the number of patches merged upstream,
> using this option can make StGIT faster. That's because when pushing a
> patch (without the --merged option), StGIT first tries a diff | apply
> followed by a three-way merge (even slower) if the former method
> fails. This means that for all the patches merged upstream, StGIT
> tries both methods since diff | apply fails anyway. With the --merged
> option, StGIT would only try the reverse-diff | apply and, if this
> succeeds, it will skip the normal push methods.
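The push order described above can be sketched as follows. The three callables stand in for StGIT's real operations (reverse-diff | apply, diff | apply, and the three-way merge); their names, the return strings and the Python shape are assumptions for illustration, not StGIT's code.

```python
from typing import Callable


def push_patch(patch: str,
               merged_check: bool,
               reverse_applies: Callable[[str], bool],
               diff_apply: Callable[[str], bool],
               three_way_merge: Callable[[str], bool]) -> str:
    """Return how the patch was handled: 'already-merged', 'applied', 'three-way' or 'conflict'."""
    if merged_check and reverse_applies(patch):
        # The patch is already upstream: skip both normal push methods.
        return "already-merged"
    if diff_apply(patch):  # fast path: diff | apply
        return "applied"
    if three_way_merge(patch):  # slower fallback when diff | apply fails
        return "three-way"
    return "conflict"
```

For a patch that is already upstream, diff | apply always fails, so without the check both normal methods run; with it, a single reverse apply settles the question.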

Speaking of making StGIT faster: earlier we were talking about how
git-diff|git-apply is faster than a 3 way git-read-tree on large
merges when there are many structural changes in the tree due to
the smaller number of process spawns required.

You might want to take a look at pg--merge-all: This is sort of based
on git-merge-recursive, but I've gotten it down to just a handful
of process spawns, aside from the stupidity of git-checkout-index.
(My recent git-checkout-index patches are working to correct that.)

-- 
Shawn.

* Re: [PATCH] Teach git-checkout-index to read filenames from stdin.
From: Christopher Faylor @ 2006-03-01 15:50 UTC (permalink / raw)
  To: git
In-Reply-To: <20060301024333.GB21186@spearce.org>

On Tue, Feb 28, 2006 at 09:43:33PM -0500, Shawn Pearce wrote:
>Since git-checkout-index is often used from scripts which may have a
>stream of filenames they wish to checkout it is more convenient to use
>--stdin than xargs.  On platforms where fork performance is currently
>sub-optimal and the length of a command line is limited (*cough* Cygwin
>*cough*)

AFAIK, the length of the command line for cygwin apps is very large --
if you're using recent versions of Cygwin.  I believe that it is longer
than the linux default.  We bypass the Windows mechanism for setting the
command line when a cygwin program starts a cygwin program.

For native Windows programs, the command line length is ~32K but I don't
think that git uses any native Windows programs, does it?

cgf

* Re: git-svn and huge data and modifying the git-svn-HEAD branch directly
From: Linus Torvalds @ 2006-03-01 15:53 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Eric Wong, Martin Langhoff, git
In-Reply-To: <44056BF1.6000109@op5.se>

On Wed, 1 Mar 2006, Andreas Ericsson wrote:
>
> Eric Wong wrote:
> > 
> > Should rev-parse be taught to be less strict and look for basenames
> > that can't be found in heads/ and tags/ in other directories?
> 
> It already does. The search order is this, for a ref named 'foo':
> 	$GIT_DIR/foo
> 	$GIT_DIR/refs/foo
> 	$GIT_DIR/refs/tags/foo
> 	$GIT_DIR/refs/heads/foo

Yes, but I think Eric wanted to avoid having to write the prefix part, 
which git won't let you do right now.

If you have a ref in .git/refs/svn-tracker/git-svn-HEAD, you would have to 
write out all of "svn-tracker/git-svn-HEAD", because unlike a "real 
branch", get_sha1() won't look into the "svn-tracker" without it being 
explicitly mentioned.

Now, some tools will actually do "for_each_ref()" and check the ref-name 
against each of them (so if you pass in "foo", it will check them against 
_any_ ref-subdirectory that contains "foo"). But get_sha1() won't.

We could fix get_sha1(), but part of the logic was that other 
subdirectories are special, and as such they _should_ be mentioned, so 
that a file in such a special directory isn't ever confused with a real 
branch.

But if you were to use for example .git/refs/git-svn/tracking as the 
svn-tracking reference head, then you'd be perfectly able to use

	git log git-svn/tracking..

to see what you've done since the last svn import?

(or use HEAD, if you prefer that over "tracking")
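A Python sketch of the lookup being discussed, using the search order quoted earlier in this thread and a plain dict in place of files under $GIT_DIR (an assumption for illustration; `resolve_ref` is a hypothetical name, not git's `get_sha1()`, and the object ids are placeholders). It shows why `git-svn-HEAD` alone is not found under refs/svn-tracker/, while `git-svn/tracking` resolves through the `refs/` entry.

```python
SEARCH_ORDER = ["{}", "refs/{}", "refs/tags/{}", "refs/heads/{}"]


def resolve_ref(refs: dict, name: str):
    """Return the object id for `name`, or None if no search-order path matches."""
    for pattern in SEARCH_ORDER:
        path = pattern.format(name)
        if path in refs:
            return refs[path]
    return None  # other subdirectories are never searched implicitly


refs = {
    "HEAD": "1111",
    "refs/heads/master": "2222",
    "refs/svn-tracker/git-svn-HEAD": "3333",
    "refs/git-svn/tracking": "4444",
}
```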

		Linus

* Re: [PATCH] Teach git-checkout-index to use file suffixes.
From: Mark Wooding @ 2006-03-01 15:56 UTC (permalink / raw)
  To: git
In-Reply-To: <20060301150629.GB3456@spearce.org>

Shawn Pearce <spearce@spearce.org> wrote:

> I thought about using instead:
>
>   --stage=all --suffix1=\#1 --suffix2=\#2 --suffix3=\#3

How about something like

  --suffixes=:#1:#2:#3

which uses the first character as a delimiter to separate the suffixes.  A single
--suffix option could plausibly provide the suffix if only one stage is
being checked out, and doesn't have the grim delimiter wart.
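The proposed value is simple to parse: the first character names the delimiter, so a suffix may contain any character except the one chosen. A minimal Python sketch (`parse_suffixes` is a hypothetical helper; the `--suffixes` option itself is only a proposal in this thread):

```python
def parse_suffixes(value: str) -> list:
    """Split e.g. ':#1:#2:#3' using its first character as the delimiter."""
    if not value:
        raise ValueError("empty --suffixes value")
    delimiter, body = value[0], value[1:]
    return body.split(delimiter)
```

Picking a different leading character, such as `|x:1|x:2|x:3`, lets the suffixes themselves contain colons.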

I suppose, though, that if this is going to be wrapped up in a script,
it doesn't really matter that much.

-- [mdw]

* Re: git-svn and huge data and modifying the git-svn-HEAD branch directly
From: Andreas Ericsson @ 2006-03-01 16:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Eric Wong, Martin Langhoff, git
In-Reply-To: <Pine.LNX.4.64.0603010745320.22647@g5.osdl.org>

Linus Torvalds wrote:
> 
> On Wed, 1 Mar 2006, Andreas Ericsson wrote:
> 
>>Eric Wong wrote:
>>
>>>Should rev-parse be taught to be less strict and look for basenames
>>>that can't be found in heads/ and tags/ in other directories?
>>
>>It already does. The search order is this, for a ref named 'foo':
>>	$GIT_DIR/foo
>>	$GIT_DIR/refs/foo
>>	$GIT_DIR/refs/tags/foo
>>	$GIT_DIR/refs/heads/foo
> 
> 
> Yes, but I think Eric wanted to avoid having to write the prefix part, 
> which git won't let you do right now.
> 
> If you have a ref in .git/refs/svn-tracker/git-svn-HEAD, you would have to 
> write out all of "svn-tracker/git-svn-HEAD", because unlike a "real 
> branch", get_sha1() won't look into the "svn-tracker" without it being 
> explicitly mentioned.
> 
> Now, some tools will actually do "for_each_ref()" and check the ref-name 
> against each of them (so if you pass in "foo", it will check them against 
> _any_ ref-subdirectory that contains "foo"). But get_sha1() won't.
> 

Didn't know that. The day is not a complete waste then.


> We could fix get_sha1(), but part of the logic was that other 
> subdirectories are special, and as such they _should_ be mentioned, so 
> that a file in such a special directory isn't ever confused with a real 
> branch.
> 
> But if you were to use for example .git/refs/git-svn/tracking as the 
> svn-tracking reference head, then you'd be perfectly able to use
> 
> 	git log git-svn/tracking..
> 
> to see what you've done since the last svn import?
> 

Personally I'm all for namespace separation. I'm assuming the script has 
the tracker-branch hardcoded anyway, so I don't really understand why it 
would be necessary to keep other refs in a separate directory and, if it 
*is* necessary, why that subdirectory can't be .git/refs/heads/svn.

Eric mentioned earlier that the tracking-branch can't be committed to 
(ever), so the user convenience for searching other directories should 
be nearly non-existent.

Perhaps I'm missing something obvious. Perhaps I'm just stupid. Perhaps 
the pub just opened and I don't feel like reading it twice to make sure 
I understood. ;)

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

* Re: git-svn and huge data and modifying the git-svn-HEAD branch directly
From: Linus Torvalds @ 2006-03-01 16:24 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Eric Wong, Martin Langhoff, git
In-Reply-To: <4405C6BE.2000706@op5.se>

On Wed, 1 Mar 2006, Andreas Ericsson wrote:
> 
> Personally I'm all for namespace separation. I'm assuming the script has the
> tracker-branch hardcoded anyway, so I don't really understand why it would be
> necessary to keep other refs in a separate directory and, if it *is*
> necessary, why that subdirectory can't be .git/refs/heads/svn.
> 
> Eric mentioned earlier that the tracking-branch can't be committed to (ever),
> so the user convenience for searching other directories should be nearly
> non-existent.

The thing about it being .git/refs/heads/svn/xyzzy is that then you can do

	git checkout svn/xyzzy

and start modifying it. Which is exactly against the point: the thing is 
_not_ a branch and you must _not_ commit to it.

It's much more like a tag: it's a pointer to the last point of an 
svn-import.

So I think it should either _be_ a tag (although Dscho worries about some 
broken porcelain being confused by tags changing) or it should be in a 
namespace all its own. Not under .git/refs/heads/ at any point, because 
it is _not_ a head of development.

		Linus

* Re: impure renames / history tracking
From: Paul Jakma @ 2006-03-01 16:27 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: git list
In-Reply-To: <4405C012.6080407@op5.se>

On Wed, 1 Mar 2006, Andreas Ericsson wrote:

> Mainly for two reasons, iirc:

> 1. Extensive metadata is evil.

Only if /required/. I wouldn't argue for rename meta-data to be 
'core', only as an additional hint into the rename-detection process.

FWIW, I think git's rename handling is really nice. It's just I 
suspect, being a heuristic, it won't be able to follow history 
reliably across 'very impure' renames.

> 2. Backwards compatibility. Old repos should always work with new 
> tools. Old tools should work with new repos, at least until a new 
> major-release is released.

Absolutely.

>> o: commit
>> m: merge
>>
>>    o---o-m--o-o-o--o----m <- project
>>   /     /              /
>> o-o-o-o-o--o-o-o--o-o-o <- main branch
>> 
>> The project merges back to main in one 'big' combined merge (collapsing all 
>> of the commits on 'project' into one commit). This leads to 'impure 
>> renames' being not uncommon. The desired end-result of merging back to 
>> 'main' being to rebase 'project' as one commit against 'main', and merge 
>> that single commit back, a la:
>>
>>    o---o-m--o-o-o--o----m <- project
>>   /     /              /
>> o-o-o-o-o--o-o-o--o-o-o---m <- main branch
>>                        \ /
>>                         o <- project_collapsed
>> 
>> So that 'm' on 'main' is that one commit[1].

> I think you're misunderstanding the git meaning of rebase here. 
> "git rebase" moves all commits since "project" forked from "main 
> branch" to the tip of "main branch".

Right, I'm referring to 'rebase' generally, as a concept, not to 
git-rebase specifically. E.g. git diff main..project is another way 
of rebasing I think.

> Other than that, this is the recommended workflow, and exactly how Linux and 
> git both are managed (i.e. topic branches eventually merged into 'master').

They're not rebased though, generally. They're pulled. Ie, in Linux 
and git when 'project' is merged, things look like:

     o---o-m--o-o-o--o----m   <- project
    /     /              / \
o-o-o-o-o--o-o-o--o-o-o----m <- main branch

The rest of the world sees /all/ the individual commits of 'project' 
right? The traditional process for the case I'm thinking of results 
in the 'main' tree seeing only /one/ single commit for the project.

> I'm not sure what you mean by 'project_collapsed' though.

All the commits on the project branch are 'collapsed' into one single 
commit/delta, and then that /single/ commit is merged to 'main'. Rest 
of the world sees:

o-o-o-o-o--o-o-o--o-o-o---m <- main branch
                        \ /
                         o <- project

> If I understand you correctly, each branch-head represents one 'collapse'.

Not quite. It represents a branch with one or more commits. In the 
Linux and git work flow, multiple commits are left as is.

> gitk is great for visualizing what you've done and what the repo 
> looks like. Use and abuse it frequently every time you're unsure 
> what you just did. It's the best way to quickly learn what 
> happens, really.

I do. It rocks! :)

> If you just want to distribute snapshots I suggest you take a 
> look at git-tar-tree. Junio makes nice use of it in the git 
> Makefile (the dist: target).

Neat.

Though, I probably should stay away from the git Makefile for now. 
<cough>.

> Personally I think metadata is evil.

Not sure I agree. Silly/redundant meta-data can be evil alright. But 
I'm talking about meta-data which is not there and potentially not 
reconstructable.

> Renames will still be auto-detected anyway,

Chances are so, yes. Definitely with the git and Linux workflows.

The traditional workflow for the software project I'm thinking of is 
different though. One commit may encompass multiple renames and edits 
of a file (discouraged, but it's possible).

If my understanding is correct, following back history for such cases 
would be difficult.

There is an argument that that 'traditional' process should be 
changed. However, leaving aside that argument, I'd like to know if 
git could accommodate that process.

> be able to detect a rename is if you rename a file and hack it up 
> so it doesn't even come close to matching its origin (close in this 
> case is 80% by default, I think). In those cases it isn't so much a 
> rename as a rewrite.

Exactly - this is the case I'm concerned about. Imagine that you'd 
like to be able to follow the history back through the rewrite and through to 
the original file.

> IMO this is far better than having to tell git "I renamed this file 
> to that", since it also detects code-copying with modifications, 
> and it's usually quick enough to find those renames as well.

I think so too, but that involves arguing that very very 
long-standing workflows should be changed to accommodate git. I intend 
to make that argument to the 'project' concerned; however, I would 
also like to be able to say git could equally well deal with the 
'traditional' workflow, modulo having to explicitly use (say) 
git-mv.

>> 1. Git currently doesn't have 'porcelain' to do this, presumably there'd be 
>> no objection to one?
>> 
>
> 	$ git checkout master
> 	$ git pull . project

Right, but 'pull' isn't what I mean :).

I mean:

 	$ git checkout project
 	$ git pull . master
 	$ git checkout -b tmp project
 	$ git diff project..master | <git apply I think>

> If, for some reason, you want to combine lots of commits into a single 
> mega-patch (like Linus does for each release of the kernel), you can do:
>
> 	$ git diff $(git merge-base main project) project > patch-file

Right.

> Then you can apply patch-file to whatever branch you want and make 
> the commit as if it was a single change-set. I'd recommend against 
> it unless you're just toying around though. It's a bad idea to lie 
> in a projects history.

Presume that 'project' in the workflow is defined as

 	"achieve one goal with one commit to the master"

So by definition, it is always correct that the project only ever has 
one commit.

The trouble is that /sometimes/ projects do indeed 'rename and 
rewrite' a file. At present, chances are git might not notice this, 
and ability to follow history through the rename+rewrite would be 
lost.

I'm wondering whether:

- this could be solved?
- how? (some additional advisory-only meta-data in the
   index-cache and commit?)

If there is consensus on an acceptable way, I'm willing to implement 
it. (I was thinking of just adding 'rename' headers to the commit 
objects, then teaching diffcore to consider them in addition to 
current heuristics).
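The proposal can be sketched as follows: recorded rename hints are consulted first, and every remaining path falls back to a content-similarity heuristic. This is a hypothetical Python illustration, not a git API; the `hints` dict stands in for the proposed 'rename' commit headers, and difflib's `ratio()` is a stand-in for diffcore's real similarity scoring.

```python
import difflib


def detect_renames_with_hints(deleted: dict, added: dict, hints: dict,
                              threshold: float = 0.5) -> dict:
    """Map old path -> new path.

    `deleted`/`added` map path -> contents; `hints` maps old -> new path.
    Hints are advisory: one is used only when both paths occur in the change.
    """
    renames = {}
    free_added = dict(added)
    # 1. Recorded hints win when they are consistent with the change.
    for old, new in hints.items():
        if old in deleted and new in free_added:
            renames[old] = new
            del free_added[new]
    # 2. Everything else falls back to the similarity heuristic.
    for old, old_text in deleted.items():
        if old in renames:
            continue
        best_path, best_score = None, threshold
        for new, new_text in free_added.items():
            score = difflib.SequenceMatcher(None, old_text, new_text).ratio()
            if score >= best_score:
                best_path, best_score = new, score
        if best_path is not None:
            renames[old] = best_path
            del free_added[best_path]
    return renames
```

A rename plus total rewrite is missed by the heuristic alone, but is still reported when a hint records it, while a stale hint naming paths absent from the change is ignored.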

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
Be nice to people on the way up, because you'll meet them on your way down.
 		-- Wilson Mizner

* Merge question
From: Bertrand Jacquin @ 2006-03-01 16:31 UTC (permalink / raw)
  To: git

Hello,

Maybe someone could explain something to me that I can't find in the docs.

I have a repo a, and a repo b.
Repo a's tree is:
            2eme_annee/jacqui_b/C/projects/
Repo b's tree is:
            my_ls-l

How could I merge b into a, placing b's blobs under
2eme_annee/jacqui_b/C/projects/ so that I end up with
2eme_annee/jacqui_b/C/projects/my_ls-l?

--
Beber
#e.fr@freenode

* Re: impure renames / history tracking
From: Linus Torvalds @ 2006-03-01 17:13 UTC (permalink / raw)
  To: Paul Jakma; +Cc: Andreas Ericsson, git list
In-Reply-To: <Pine.LNX.4.64.0603011558390.13612@sheen.jakma.org>

On Wed, 1 Mar 2006, Paul Jakma wrote:
> 
> FWIW, I think git's rename handling is really nice. It's just I suspect, being
> a heuristic, it won't be able to follow history reliably across 'very impure'
> renames.

The thing is, it does better than anything that _tries_ to be "reliable".

I can pretty much _guarantee_ that you can't do it better.

Tracking "inodes" - aka file identities - (which is what BK does, and I 
assume what SVN does) is fundamentally problematic. In particular, it's a 
horrible problem when two inodes "meet" under the same name. You now have 
two identities for the same file, and you're fundamentally screwed.

And don't tell me it doesn't happen. It _does_ happen, and it did happen 
with the kernel under BK.

It doesn't even need renames to be a problem. JUST THE FACT THAT YOU TRY 
TO TRACK FILE "IDENTITY" HISTORY IS BROKEN. For example, take CVS, which 
doesn't actually try to do renames, but _does_ try to track the identity 
of a file, since all the history is tied into that identity: think about 
what happens in Attic when a file is deleted. Completely broken model.

Now, CVS doesn't tend to show the problems very much, because people don't 
actually use branches that much (they are a pain in the neck), and they 
sure as hell try to avoid deleting and creating the same filename under a 
branch and on HEAD. I'm sure you can do it, but I'm also pretty sure 
there's a lot of old projects around that have ended up moving the ,v 
files around to play rename/delete games.

And that's really fundamental. CVS doesn't show the problems so much, 
because CVS actively tries to make it hard to do these things.

With renames-tracking-file-identities, it's _really_ easy to get some 
major confusion going. What happens when one branch creates a file, and 
another one renames a file to that same name, and they merge?

Don't tell me it doesn't happen. It happened under BK. The way BK "solved" 
it was to keep the two separate identities: one of them got resolved to 
the new filename, the other one went into the "deleted" directory. Guess 
what happens when the side that got merged into "deleted" continues to 
edit the file? That's right - their edits happen on the deleted file, and 
never show up in the real tree in a subsequent merge ever again.

And as far as I can tell, BK really did the best you can do. Following 
file identities really _is_ fundamentally broken. It sounds like a nice 
idea, but while you might solve a few problems, you create a whole raft of 
much more fundamental problems.

So next time you think about a merge that might have been improved by 
tracking renames, please also think about a merge where one of the 
filenames came from two or more different sources through an earlier 
merge, and thank your benevolent Gods that they instructed me to make git 
be based purely on file contents.

		Linus
