Git development
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Junio C Hamano @ 2006-04-26  7:50 UTC
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e2n72h$aqe$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> writes:

> Do I understand correctly that toplevel (master project) commits have tree
> which points to combined tree, and "bind" links which points to the
> subprojects commits whose trees make up the overall tree, or does the
> master tree points to tree containing only toplevel files (overall Makefile
> for example, INSTALL or README for the whole project including
> subprojects,...)?

The plan for "bind commit" was to have the toplevel commit
contain:

	tree   -- this covers the whole tree including subprojects
	parent -- list of parents in the toplevel project
	bind   -- commit object name of subproject, plus which
	          directory to graft its tree onto.

And a subproject commit, unless it contains a subsubproject, would
look just like an ordinary commit.  Its tree would match the
entry in the toplevel commit's tree at the path given in the
"bind" line of the top-level commit.
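
As a rough illustration of the header layout described above, the sketch below parses a single "bind" header line. The line syntax `bind <40-hex commit name> <path>` is an assumption made for this example, and `struct bind_entry` and `parse_bind_line` are hypothetical names, not code from the jc/bind branch.

```c
#include <string.h>
#include <ctype.h>

/* Hypothetical in-memory form of one "bind" header line. */
struct bind_entry {
	char commit[41];	/* subproject commit object name, hex */
	char path[256];		/* directory its tree is grafted onto */
};

/*
 * Parse "bind <40-hex> <path>".  This line format is assumed for
 * illustration.  Returns 0 on success, -1 on malformed input.
 */
static int parse_bind_line(const char *line, struct bind_entry *out)
{
	size_t i, len;

	if (strncmp(line, "bind ", 5))
		return -1;
	line += 5;
	/* A short line fails here at its NUL, before we read past it. */
	for (i = 0; i < 40; i++)
		if (!isxdigit((unsigned char)line[i]))
			return -1;
	if (line[40] != ' ')
		return -1;
	len = strlen(line + 41);
	if (!len || len >= sizeof(out->path))
		return -1;
	memcpy(out->commit, line, 40);
	out->commit[40] = '\0';
	memcpy(out->path, line + 41, len + 1);
	return 0;
}
```

A toplevel commit with several subprojects would simply carry several such lines, one per grafted directory.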

Some reading material, from newer to older:

  * http://www.kernel.org/git/?p=git/git.git;a=blob;hb=todo;f=Subpro.txt

  This talks about the overall "vision" of what the user-level
  interaction might look like, with a sketch of how the core level
  would help Porcelain implement that interaction.  Most of the
  core-level support described there is in the "bind commit"
  changes, except "update-index --bind/-unbind" to record the
  information on bound subprojects in the index file.

  * http://thread.gmane.org/gmane.comp.version-control.git/15072

  This was the thread that led to the above proposal.

  * http://thread.gmane.org/gmane.comp.version-control.git/14486

  This is older.  It touches an alternative "gitlink" approach,
  which I meant to prototype but never got around to.

  Surprisingly, these two threads are mostly noise-free and
  literally every message is worth reading.

Some old but working core-side code is available at jc/bind
branch of public git.git repository.

> BTW. I have lately stumbled upon (somewhat Vault and Subversion biased)
>  http://software.ericsink.com/Beyond_CheckOut_and_CheckIn.html
> Read about Share and Pin -- it's about subprojects (when you edit out the
> flawed "branch as folder" approach of author).

Not really.  You can easily do that by checking out another
project in a separate subdirectory.

My private working area for git.git is structured like this:

	/home/junio/git.junio/.git
	                      Makefile
	                      COPYING
	                      Documentation/
	                      ...
	                      Meta/.git
	                      Meta/TODO
	                      Meta/Make
	                      Meta/TO
	                      Meta/WI
	                      ...

Notice two .git directories?  That's right.  

The top-level .git repository has the familiar branches like
"maint", "master", "next", "pu", in addition to various topic
branches.

Meta/.git is a separate repository that is a clone of "todo"
branch of git.git repository.  The top-level .git repository
does not even have "todo" branch.  I just happen to push into
the same public repository git.git at kernel.org from these two
separate repositories.

The Meta/ repository is "pinned" to a specific version, without
having any funky "Pin feature", no thank you, because I have
full control of when I update what is checked out in the Meta/
directory.

What you _might_ want is a reverse of Pinning.  Sometimes, you
would want to make sure subproject part is at least this version
or later to build other parts of the whole.

But for my particular "Meta/" directory, I do not need such a
linkage.  The major reason I do not keep TODO in the main
project is because it is supposed to be a task list for me
across "maint", "master" and "next".  I do not want it to
fluctuate whenever I work on different branches.

-jc


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Jakub Narebski @ 2006-04-26  7:22 UTC
  To: git
In-Reply-To: <7vlktssudl.fsf_-_@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> (On topic again)
> 
> Link from subproject commit back to the toplevel might work for
> some kind of subprojects, but it would not work for the
> subproject support that frequently comes up on this list.  The
> development of an embedded Linux device, where a Linux kernel
> source tree is grafted at kernel/ subdirectory of the toplevel
> project.  The "prior" link would be placed in the commit that
> belong to the kernel subproject, but that would never be merged
> to the Linus kernel (why should he care about one particular
> embedded device's development history).  The link must go from
> the toplevel to generic parts reusable out of the context of the
> combined project.

Yes, I guess subproject support is most needed for the "third-party embedded
(sub)project", when one sometimes has to modify (sub)project files, and
perhaps has to watch for the (sub)project version. Hmmm... if one used
Tailor (to allow for projects not managed under GIT, though I wonder if it
would be possible to link up project without [externally available] SCM)
one could use this approach for managing distribution packages, like RPMS
or debs...

Do I understand correctly that toplevel (master project) commits have tree
which points to combined tree, and "bind" links which points to the
subprojects commits whose trees make up the overall tree, or does the
master tree points to tree containing only toplevel files (overall Makefile
for example, INSTALL or README for the whole project including
subprojects,...)?


BTW. I have lately stumbled upon (somewhat Vault and Subversion biased)
 http://software.ericsink.com/Beyond_CheckOut_and_CheckIn.html
Read about Share and Pin -- it's about subprojects (when you edit out the
flawed "branch as folder" approach of author). I wonder if it could be
easily implemented in "subprojects for GIT" proposal... Of course we can do
better, i.e. original subproject repository doesn't need to be on the same
machine, we can use remote repository.

-- 
Jakub Narebski
Warsaw, Poland


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Sam Vilain @ 2006-04-26  6:51 UTC
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e2mv30$k08$1@sea.gmane.org>

Jakub Narebski wrote:

>>It would still support that. Each commit to the sub-project involves a
>>change to the tree of the "main" commit line (a copy of the commit into
>>a sub-directory of it). The advantage is that the "tree" in the main
>>commit is the combined tree, you don't need to treat the case specially
>>to just get the contents out.
>>    
>>
>
>As far as I understand, for subproject commit "bind" link (and perhaps the
>keyword/name "link" or "ref" would be better than "related") point to other
>subprojects commits (trees), while the Sam's "prior (3)" example link would
>point to the toplevel project (gathering all subprojects) commit, and it
>would probably be named/noted "toplevel", not "prior".
>
>Am I correct?
>  
>

I don't think you quite get my meaning.

What I'm saying is that with the right kind of general purpose relation
between commits, you don't need "bind" at all.

Firstly, you would have your sub-project as its own commit line. That is
a fairly straightforward thing.

Secondly, the project that includes it has a corresponding commit for
each commit on the sub-project. This commit changes the portion of the
outer project's tree where the sub-project is bound.

This means that you don't need to understand this "bind" relation to be
able to extract the tree, and keeps the model simple at the expense of
an extra tree object or three per commit. It also does not restrict the
manner of the "binding", porcelains or users are free to do it
selectively, for instance.

Actually there is a large similarity between this and cherry-picking. In
essence you're cherry picking every single commit from a different
commit hierarchy, except that you are applying the patches into a
sub-directory.

Sam.


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Junio C Hamano @ 2006-04-26  6:50 UTC
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e2n4am$1vn$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> writes:

> Junio C Hamano wrote:
>
>> Jakub Narebski <jnareb@gmail.com> writes:
>> 
>>> Jakub Narebski wrote:
>>>
>>>> [...] Sam's "prior (3)" example
>>>> link would point to the toplevel project (gathering all subprojects)
>>>> commit, and it would probably be named/noted "toplevel", not "prior".
>>>
>>> Or "master" (like "master document" in DTP).
>> 
>> (Offtopic) isn't "master" in DTP more like template?
>
> Well, in (La)TeX "master document" is a document in its own right,
> subdocuments are transcluded using some kind of "include"-like command.

(Offtopic) Ah, the hard-core stuff.  I had something else in
mind ("master page" in "DTP for dummies"), sorry for the
confusion.

(On topic again)

Link from subproject commit back to the toplevel might work for
some kind of subprojects, but it would not work for the
subproject support that frequently comes up on this list.  The
development of an embedded Linux device, where a Linux kernel
source tree is grafted at kernel/ subdirectory of the toplevel
project.  The "prior" link would be placed in the commit that
belong to the kernel subproject, but that would never be merged
to the Linus kernel (why should he care about one particular
embedded device's development history).  The link must go from
the toplevel to generic parts reusable out of the context of the
combined project.


* Re: [OT] Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Jakub Narebski @ 2006-04-26  6:35 UTC
  To: git
In-Reply-To: <7vzmi8sxt1.fsf_-_@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> Jakub Narebski <jnareb@gmail.com> writes:
> 
>> Jakub Narebski wrote:
>>
>>> [...] Sam's "prior (3)" example
>>> link would point to the toplevel project (gathering all subprojects)
>>> commit, and it would probably be named/noted "toplevel", not "prior".
>>
>> Or "master" (like "master document" in DTP).
> 
> (Offtopic) isn't "master" in DTP more like template?

Well, in (La)TeX "master document" is a document in its own right,
subdocuments are transcluded using some kind of "include"-like command.

-- 
Jakub Narebski
Warsaw, Poland


* Re: [PATCH/RFC] reverse the pack-objects delta window logic
From: Junio C Hamano @ 2006-04-26  5:45 UTC
  To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604252330190.18520@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

> Note, this is a RFC particularly to Junio since the resulting pack is 
> larger than without the patch with git-repack -a -f.  However using a 
> subsequent git-repack -a brings the pack size down to expected size.  So 
> I'm not sure I've got everything right.

I haven't tested it seriously yet, but there is nothing that
looks obviously wrong that might cause the inflation problem,
from the cursory look after applying the patch on top of your
last round.

> +	if (nr_objects == nr_result && trg_entry->delta_limit >= max_depth)
> +		return 0;

The older code was loosening this check only for a delta chain
that is already in pack (which is limited to its previous
max_depth).  The end result is almost the same -- a thin pack
recipient would have deeper deltas than it asked for. The difference
is that the earlier code had an implicit 2*max_depth limit, but
this one makes the chain length unbounded, which I do not think
is necessarily a bad change.  In any case it does not explain
why you are getting a larger resulting pack, though.

> +	/* Now some size filtering euristics. */
> +	size = trg_entry->size;
>  	if (size < 50)
> -		return -1;
> -	if (old_entry->depth >= max_depth)
>  		return 0;

This is necessary because you are scanning from smaller to
larger, and I think it is a good change.

> -	/*
> -	 * NOTE!
> -	 *
> -	 * We always delta from the bigger to the smaller, since that's
> -	 * more space-efficient (deletes don't have to say _what_ they
> -	 * delete).
> -	 */

This comment by Linus still applies, even though the scan order
is now reversed; no need to remove it.

> +
> +	if (trg_entry->delta) {
> +		/*
> +		 * The target object already has a delta base but we just
> +		 * found a better one.  Remove it from its former base
> +		 * childhood and redetermine the base delta_limit (if used).
> +		 */

Since you are making the delta chain unbounded for the thin case,
you can probably skip this for thin packs with the same if() check
here; the recomputation seems rather expensive.
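
To make the cost being discussed concrete, here is a minimal standalone sketch of the unlink-and-recompute step. It follows the shape of the loop in the patch but is not the patch itself: `struct node` is a stripped-down stand-in for `struct object_entry`, `unlink_child` is a hypothetical name, and the thin-pack early exit is left out.

```c
#include <stddef.h>

/* Simplified stand-in for struct object_entry; only the link fields. */
struct node {
	unsigned int delta_limit;	/* deepest delta chain below this node */
	struct node *delta_child;	/* first object deltified against us */
	struct node *delta_sibling;	/* next object sharing the same base */
};

/*
 * Remove "child" from "base"'s child list, then recompute
 * base->delta_limit from the children that remain.  Every
 * remaining sibling is visited, which is the expense in question.
 */
static void unlink_child(struct node *base, struct node *child)
{
	struct node **link = &base->delta_child;

	base->delta_limit = 0;
	while (*link) {
		if (*link == child) {
			*link = child->delta_sibling;	/* splice it out */
			continue;
		}
		if (base->delta_limit <= (*link)->delta_limit)
			base->delta_limit = (*link)->delta_limit + 1;
		link = &(*link)->delta_sibling;
	}
	child->delta_sibling = NULL;
}
```

The walk is linear in the number of children of the former base, and it runs every time a better base is found for an already-deltified target.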

> +			die("object %s inconsistent object length (%lu vs %lu)",
> +			    sha1_to_hex(entry->sha1), size, entry->size);
> +		if (!size)
> +			continue;
> +		delta_index = create_delta_index(n->data, size);
> +		if (!delta_index)
> +			die("out of memory");

It might be worth saying "if (size < 50)" here as well; no point
wasting the delta window for small sources.
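
For reference, the size checks in the patched try_delta() can be pulled out into a small predicate. This is a sketch under assumptions: `worth_trying` is a hypothetical name, and a nonzero `best_delta_size` stands in for "the target already has a delta", which the real code tests through the `delta` pointer.

```c
/*
 * Return 1 if a delta of the target against the source is worth
 * attempting, 0 if the size heuristics reject it up front.
 * best_delta_size is the size of the delta the target already has,
 * or 0 if it has none (a simplification for this sketch).
 */
static int worth_trying(unsigned long trg_size, unsigned long src_size,
			unsigned long best_delta_size)
{
	unsigned long max_size, sizediff;

	if (trg_size < 50)
		return 0;		/* too small to bother */
	max_size = trg_size / 2 - 20;	/* delta must save about half */
	if (best_delta_size)
		max_size = best_delta_size - 1;	/* must beat current delta */
	/* Bytes the target has beyond the source must come from the delta. */
	sizediff = src_size < trg_size ? trg_size - src_size : 0;
	return sizediff < max_size;
}
```

For a 1000-byte target with no existing delta the budget is 480 bytes, so a 900-byte source passes while a 400-byte source is rejected without running the delta code at all.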

> -#if 0
> -		/* if we made n a delta, and if n is already at max
> -		 * depth, leaving it in the window is pointless.  we
> -		 * should evict it first.
> -		 * ... in theory only; somehow this makes things worse.
> -		 */
> -		if (entry->delta && depth <= entry->depth)
> -			continue;
> -#endif

I was almost tempted to suggest that the degradation you are
seeing might be related to this mystery I never got around to
solving.  By giving delta a chance against less optimal
candidates, it appeared that we ended up making the
final pack size bigger than otherwise, which suggests that our
choice between plain undeltified and a delta half its size might
be favoring delta too much.  But it does not appear to be
related to the inflation you are seeing.

With object list taken between v1.2.3..v1.3.0 in git.git
repository and without delta reuse, 3054 objects are packed
(delta 1734) with this code.  The "next" makes 1818 deltas (only
5% more), which makes me suspect that this code is making a bad
choice of delta base, because the final pack size is 1.5M for
"next" vs 1.9M for this code.

The chain length distribution is a bit different (run
"git-verify-pack -v" and look at the end of its output).

The "next" version:

chain length = 1: 257 objects
chain length = 2: 189 objects
chain length = 3: 156 objects
chain length = 4: 149 objects
chain length = 5: 113 objects
chain length = 6: 105 objects
chain length = 7: 105 objects
chain length = 8: 102 objects
chain length = 9: 103 objects
chain length = 10: 539 objects

this version:

chain length = 1: 415 objects
chain length = 2: 333 objects
chain length = 3: 259 objects
chain length = 4: 197 objects
chain length = 5: 155 objects
chain length = 6: 134 objects
chain length = 7: 106 objects
chain length = 8: 69 objects
chain length = 9: 47 objects
chain length = 10: 19 objects

The resulting pack would be faster to access (it has much
shorter median chain length).
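
The median claim can be checked directly from the histograms. The helper below is only an illustrative sketch (`median_chain_length` is a hypothetical name, not part of git); it takes the per-length object counts as printed by "git-verify-pack -v" and returns the lower median.

```c
#include <stddef.h>

/*
 * counts[i] is the number of objects whose chain length is i + 1.
 * Returns the (lower) median chain length, or 0 for an empty
 * histogram.
 */
static unsigned median_chain_length(const unsigned *counts, size_t n)
{
	unsigned long total = 0, seen = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += counts[i];
	for (i = 0; i < n; i++) {
		seen += counts[i];
		/* First length at which at least half the objects are covered. */
		if (total && seen * 2 >= total)
			return (unsigned)(i + 1);
	}
	return 0;
}
```

Fed the two histograms above, it gives a median of 6 for "next" (1818 deltas) and 3 for the patched version (1734 deltas).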

BTW, have you tried it without --no-reuse-delta on an object list
that is not thin?  It appears you are busting the depth limit.

Using the same "git rev-list --objects v1.2.3..v1.3.0" as input,
git-pack-objects without --no-reuse-delta gives this
distribution:

chain length = 1: 364 objects
chain length = 2: 269 objects
chain length = 3: 198 objects
chain length = 4: 164 objects
chain length = 5: 148 objects
chain length = 6: 123 objects
chain length = 7: 122 objects
chain length = 8: 103 objects
chain length = 9: 92 objects
chain length = 10: 234 objects
chain length = 11: 12 objects
chain length = 12: 1 object
chain length = 13: 2 objects

So it _might_ be that the depth limiting code is subtly broken,
which is causing you to throw away a perfectly good delta base,
which in turn results in a bad pack.  The distribution from the
"next" version looks like this:

chain length = 1: 358 objects
chain length = 2: 250 objects
chain length = 3: 214 objects
chain length = 4: 169 objects
chain length = 5: 150 objects
chain length = 6: 122 objects
chain length = 7: 126 objects
chain length = 8: 100 objects
chain length = 9: 101 objects
chain length = 10: 232 objects
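
The depth-limit overrun can be quantified the same way. This is a sketch with a hypothetical helper name, assuming the depth limit of 10 at which the "next" histograms top out.

```c
#include <stddef.h>

/*
 * counts[i] is the number of objects whose chain length is i + 1.
 * Returns how many objects sit on chains longer than "limit".
 */
static unsigned long objects_over_depth(const unsigned *counts, size_t n,
					unsigned limit)
{
	unsigned long over = 0;
	size_t i;

	/* Index "limit" holds chain length limit + 1, the first overrun. */
	for (i = limit; i < n; i++)
		over += counts[i];
	return over;
}
```

With delta reuse enabled, the patched histogram (lengths 1 through 13) has 15 objects past depth 10, while the "next" histogram has none.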


-- >8 --

Summary of the experiment.

# test dataset
git rev-list --objects v1.2.3..v1.3.0 >RL-1.2.3--1.3.0

# baseline: "next" version is what is on my $PATH
git-pack-objects --no-reuse-delta test-next-pack-nr <RL-1.2.3--1.3.0
git-verify-pack -v test-next-pack-nr-*.pack | tail -n 20
git-pack-objects test-next-pack <RL-1.2.3--1.3.0
git-verify-pack -v test-next-pack-*.pack | tail -n 20

# freshly compiled version with the patch in question
./git-pack-objects --no-reuse-delta test-nico-pack-nr <RL-1.2.3--1.3.0
git-verify-pack -v test-nico-pack-nr-*.pack | tail -n 20
./git-pack-objects test-nico-pack <RL-1.2.3--1.3.0
git-verify-pack -v test-nico-pack-*.pack | tail -n 20


* [OT] Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Junio C Hamano @ 2006-04-26  5:36 UTC
  To: git; +Cc: jnareb
In-Reply-To: <e2n01t$m8j$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> writes:

> Jakub Narebski wrote:
>
>> [...] Sam's "prior (3)" example
>> link would point to the toplevel project (gathering all subprojects)
>> commit, and it would probably be named/noted "toplevel", not "prior".
>
> Or "master" (like "master document" in DTP).

(Offtopic) isn't "master" in DTP more like template?


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Jakub Narebski @ 2006-04-26  5:22 UTC
  To: git
In-Reply-To: <e2mv30$k08$1@sea.gmane.org>

Jakub Narebski wrote:

> [...] Sam's "prior (3)" example
> link would point to the toplevel project (gathering all subprojects)
> commit, and it would probably be named/noted "toplevel", not "prior".

Or "master" (like "master document" in DTP).

-- 
Jakub Narebski
Warsaw, Poland


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Jakub Narebski @ 2006-04-26  5:06 UTC
  To: git
In-Reply-To: <444EAE7C.5010402@vilain.net>

Sam Vilain wrote:

> Junio C Hamano wrote:

>>> 3. sub-projects
>>>
>>>    In this case, the commit on the "main" commit line would have a
>>>    "prior" link to the commit on the sub-project.  The sub-project
>>>    would effectively be its own head with copied commits objects on
>>>    the main head.
>>>
>>
>>You say you can have only one "prior" per commit, which makes
>>this unsuitable to bind multiple subprojects into a larger
>>project (the earlier "bind" proposal allows zero or more).
> 
> It would still support that. Each commit to the sub-project involves a
> change to the tree of the "main" commit line (a copy of the commit into
> a sub-directory of it). The advantage is that the "tree" in the main
> commit is the combined tree, you don't need to treat the case specially
> to just get the contents out.

As far as I understand, for subproject commit "bind" link (and perhaps the
keyword/name "link" or "ref" would be better than "related") point to other
subprojects commits (trees), while the Sam's "prior (3)" example link would
point to the toplevel project (gathering all subprojects) commit, and it
would probably be named/noted "toplevel", not "prior".

Am I correct?

-- 
Jakub Narebski
Warsaw, Poland


* [PATCH/RFC] reverse the pack-objects delta window logic
From: Nicolas Pitre @ 2006-04-26  3:37 UTC
  To: Junio C Hamano; +Cc: git

This allows for keeping a single delta index constant while delta 
targets are tested against the same base object.

Signed-off-by: Nicolas Pitre <nico@cam.org>

---

Note, this is a RFC particularly to Junio since the resulting pack is 
larger than without the patch with git-repack -a -f.  However using a 
subsequent git-repack -a brings the pack size down to expected size.  So 
I'm not sure I've got everything right.

diff --git a/pack-objects.c b/pack-objects.c
index c0acc46..33027a8 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -19,19 +19,17 @@ struct object_entry {
 	unsigned long offset;	/* offset into the final pack file;
 				 * nonzero if already written.
 				 */
-	unsigned int depth;	/* delta depth */
-	unsigned int delta_limit;	/* base adjustment for in-pack delta */
+	unsigned int delta_limit;	/* deepest delta from this object */
 	unsigned int hash;	/* name hint hash */
 	enum object_type type;
 	enum object_type in_pack_type;	/* could be delta */
 	unsigned long delta_size;	/* delta data size (uncompressed) */
 	struct object_entry *delta;	/* delta base object */
-	struct packed_git *in_pack; 	/* already in pack */
-	unsigned int in_pack_offset;
 	struct object_entry *delta_child; /* delitified objects who bases me */
 	struct object_entry *delta_sibling; /* other deltified objects who
-					     * uses the same base as me
-					     */
+					       uses the same base as me */
+	struct packed_git *in_pack; 	/* already in pack */
+	unsigned int in_pack_offset;
 	int preferred_base;	/* we do not pack this, but is encouraged to
 				 * be used as the base objectto delta huge
 				 * objects against.
@@ -906,11 +904,11 @@ static void get_object_details(void)
 	for (i = 0, entry = objects; i < nr_objects; i++, entry++)
 		check_object(entry);
 
-	if (nr_objects == nr_result) {
+	if (!no_reuse_delta && nr_objects == nr_result) {
 		/*
-		 * Depth of objects that depend on the entry -- this
-		 * is subtracted from depth-max to break too deep
-		 * delta chain because of delta data reusing.
+		 * We must determine the maximum depth of reused deltas
+		 * for those objects used as their base before find_deltas()
+		 * starts considering them as potential delta targets.
 		 * However, we loosen this restriction when we know we
 		 * are creating a thin pack -- it will have to be
 		 * expanded on the other end anyway, so do not
@@ -1004,64 +1002,78 @@ struct unpacked {
  * more importantly, the bigger file is likely the more recent
  * one.
  */
-static int try_delta(struct unpacked *cur, struct unpacked *old, unsigned max_depth)
+static int try_delta(struct unpacked *trg, struct unpacked *src,
+		     struct delta_index *src_index, unsigned max_depth)
 {
-	struct object_entry *cur_entry = cur->entry;
-	struct object_entry *old_entry = old->entry;
-	unsigned long size, oldsize, delta_size, sizediff;
-	long max_size;
+	struct object_entry *trg_entry = trg->entry;
+	struct object_entry *src_entry = src->entry;
+	unsigned long size, src_size, delta_size, sizediff, max_size;
 	void *delta_buf;
 
 	/* Don't bother doing diffs between different types */
-	if (cur_entry->type != old_entry->type)
+	if (trg_entry->type != src_entry->type)
 		return -1;
 
 	/* We do not compute delta to *create* objects we are not
 	 * going to pack.
 	 */
-	if (cur_entry->preferred_base)
-		return -1;
+	if (trg_entry->preferred_base)
+		return 0;
 
-	/* If the current object is at pack edge, take the depth the
-	 * objects that depend on the current object into account --
-	 * otherwise they would become too deep.
+	/*
+	 * Make sure deltifying this object won't make its deepest delta
+	 * too deep, but only when not producing a thin pack.
 	 */
-	if (cur_entry->delta_child) {
-		if (max_depth <= cur_entry->delta_limit)
-			return 0;
-		max_depth -= cur_entry->delta_limit;
-	}
-
-	size = cur_entry->size;
-	oldsize = old_entry->size;
-	sizediff = oldsize > size ? oldsize - size : size - oldsize;
+	if (nr_objects == nr_result && trg_entry->delta_limit >= max_depth)
+		return 0;
 
+	/* Now some size filtering euristics. */
+	size = trg_entry->size;
 	if (size < 50)
-		return -1;
-	if (old_entry->depth >= max_depth)
 		return 0;
-
-	/*
-	 * NOTE!
-	 *
-	 * We always delta from the bigger to the smaller, since that's
-	 * more space-efficient (deletes don't have to say _what_ they
-	 * delete).
-	 */
 	max_size = size / 2 - 20;
-	if (cur_entry->delta)
-		max_size = cur_entry->delta_size-1;
+	if (trg_entry->delta)
+		max_size = trg_entry->delta_size-1;
+	src_size = src_entry->size;
+	sizediff = src_size < size ? size - src_size : 0;
 	if (sizediff >= max_size)
 		return 0;
-	delta_buf = diff_delta(old->data, oldsize,
-			       cur->data, size, &delta_size, max_size);
+
+	delta_buf = create_delta(src_index, trg->data, size, &delta_size, max_size);
 	if (!delta_buf)
 		return 0;
-	cur_entry->delta = old_entry;
-	cur_entry->delta_size = delta_size;
-	cur_entry->depth = old_entry->depth + 1;
+
+	if (trg_entry->delta) {
+		/*
+		 * The target object already has a delta base but we just
+		 * found a better one.  Remove it from its former base
+		 * childhood and redetermine the base delta_limit (if used).
+		 */
+		struct object_entry *base = trg_entry->delta;
+		struct object_entry **child_link = &base->delta_child;
+		base->delta_limit = 0;
+		while (*child_link) {
+			if (*child_link == trg_entry) {
+				*child_link = trg_entry->delta_sibling;
+				if (nr_objects != nr_result)
+					break;
+				continue;
+			}
+			if (base->delta_limit <= (*child_link)->delta_limit)
+				base->delta_limit =
+					(*child_link)->delta_limit + 1;
+			child_link = &(*child_link)->delta_sibling;
+		}
+	}
+
+	trg_entry->delta = src_entry;
+	trg_entry->delta_size = delta_size;
+	trg_entry->delta_sibling = src_entry->delta_child;
+	src_entry->delta_child = trg_entry;
+	if (src_entry->delta_limit <= trg_entry->delta_limit)
+		src_entry->delta_limit = trg_entry->delta_limit + 1;
 	free(delta_buf);
-	return 0;
+	return 1;
 }
 
 static void progress_interval(int signum)
@@ -1078,14 +1090,15 @@ static void find_deltas(struct object_en
 	unsigned last_percent = 999;
 
 	memset(array, 0, array_size);
-	i = nr_objects;
+	i = 0;
 	idx = 0;
 	if (progress)
 		fprintf(stderr, "Deltifying %d objects.\n", nr_result);
 
-	while (--i >= 0) {
-		struct object_entry *entry = list[i];
+	while (i < nr_objects) {
+		struct object_entry *entry = list[i++];
 		struct unpacked *n = array + idx;
+		struct delta_index *delta_index;
 		unsigned long size;
 		char type[10];
 		int j;
@@ -1113,7 +1126,13 @@ static void find_deltas(struct object_en
 		n->entry = entry;
 		n->data = read_sha1_file(entry->sha1, type, &size);
 		if (size != entry->size)
-			die("object %s inconsistent object length (%lu vs %lu)", sha1_to_hex(entry->sha1), size, entry->size);
+			die("object %s inconsistent object length (%lu vs %lu)",
+			    sha1_to_hex(entry->sha1), size, entry->size);
+		if (!size)
+			continue;
+		delta_index = create_delta_index(n->data, size);
+		if (!delta_index)
+			die("out of memory");
 
 		j = window;
 		while (--j > 0) {
@@ -1124,18 +1143,10 @@ static void find_deltas(struct object_en
 			m = array + other_idx;
 			if (!m->entry)
 				break;
-			if (try_delta(n, m, depth) < 0)
+			if (try_delta(m, n, delta_index, depth) < 0)
 				break;
 		}
-#if 0
-		/* if we made n a delta, and if n is already at max
-		 * depth, leaving it in the window is pointless.  we
-		 * should evict it first.
-		 * ... in theory only; somehow this makes things worse.
-		 */
-		if (entry->delta && depth <= entry->depth)
-			continue;
-#endif
+		free_delta_index(delta_index);
 		idx++;
 		if (idx >= window)
 			idx = 0;


* Re: [PATCH] send-email: Change from Mail::Sendmail to Net::SMTP
From: Martin Langhoff @ 2006-04-26  0:45 UTC
  To: Eric Wong; +Cc: Junio C Hamano, git, Ryan Anderson
In-Reply-To: <1143336048205-git-send-email-normalperson@yhbt.net>

On 3/26/06, Eric Wong <normalperson@yhbt.net> wrote:
> Net::SMTP is in the base Perl distribution, so users are more
> likely to have it.  Net::SMTP also allows reusing the SMTP
> connection, so sending multiple emails is faster.

This is causing problems for me on my Debian sarge dev box.

 * If I have to believe strace(), Net::SMTP is trying to look up
"localhost" via DNS. Sketchy workaround: use 127.0.0.1.

 * This box has nothing listening on port 25. It doesn't get email
from the net, being a LAN machine, so I've told the debian config
system that we don't need an smtp daemon. Net::SMTP doesn't know how
to use /usr/bin/sendmail

 * That nasty @@VERSION@@ thing isn't valid perl, so working on this
code is a pain. Something like this (warning! broken diff ahead!)
fixes it for me.

@@ -292,6 +292,11 @@ sub send_message
        @recipients = unique_email_list(@recipients,@cc);
        my $date = strftime('%a, %d %b %Y %H:%M:%S %z', localtime($time++));

+       my $gitversion = '@@GIT_VERSION@@';
+       if ($gitversion eq '@@'.'GIT_VERSION@@') {
+           $gitversion = `git --version`;
+       }
+
        my $header = "From: $from
 To: $to
 Cc: $cc
@@ -299,11 +304,11 @@ Subject: $subject
 Reply-To: $from
 Date: $date
 Message-Id: $message_id
-X-Mailer: git-send-email @@GIT_VERSION@@
+X-Mailer: git-send-email $gitversion
 ";
        $header .= "In-Reply-To: $reply_to\n" if $reply_to;

cheers,


martin


* Proposal: git-based dependency tracking build system
From: Matt McCutchen @ 2006-04-26  0:13 UTC
  To: git

Dear git people,

I have been thinking for some time about how to write a foolproof
general-use build system that automatically tracks dependencies.  (Make
+ depcomp is decent as long as source files aren't added/removed or
generated often.  Cons is good but not general-purpose.)  I know there's
been some work on tracing the compiler to see which files it actually
opens.  Another possibility is to layer a FUSE filesystem over the build
tree and note which files in the virtual filesystem are opened; this has
the advantage of missing most of the boring files (e.g. shared libraries
that make up the compiler).

So I was thinking, why not write a build system that uses git's
excellent hash-based object storage support to store the files in the
virtual build tree?  Hashing the files makes it easy to notice when a
file is rewritten with the same contents, meaning files that depend on
it don't actually have to be rebuilt.  I also envision the build system
automatically marking generated files as git-ignored.
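
The "rewritten with the same contents" check could look roughly like the sketch below. Everything here is an assumption for illustration: git itself names objects by SHA-1, whereas this uses FNV-1a purely to keep the example short, and `struct dep_record` and `needs_rebuild` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, used here only as a short stand-in for a real content hash. */
static uint64_t content_hash(const unsigned char *data, size_t len)
{
	uint64_t h = 14695981039346656037ULL;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= data[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/* Hypothetical record the build system keeps per tracked file. */
struct dep_record {
	uint64_t last_hash;	/* hash seen at the previous build */
	int valid;		/* 0 until the first build */
};

/*
 * Returns 1 if dependents must be rebuilt, 0 if the file was
 * rewritten with identical contents.  Updates the record either way.
 */
static int needs_rebuild(struct dep_record *rec,
			 const unsigned char *data, size_t len)
{
	uint64_t h = content_hash(data, len);
	int changed = !rec->valid || rec->last_hash != h;

	rec->last_hash = h;
	rec->valid = 1;
	return changed;
}
```

Unlike a timestamp comparison, a generator that rewrites its output byte-for-byte identically does not trigger a rebuild of everything downstream.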

Thoughts?

-- 
Matt McCutchen
hashproduct@verizon.net
http://hashproduct.metaesthetics.net/


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Sam Vilain @ 2006-04-25 23:19 UTC
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vwtde2q1z.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

>> 2. revising published commits / re-basing
>>
>>    This is what "stg" et al do.  The tools allow you to commit,
>>    rewind, revise, recommit, fast forward, etc.
>>    
>>
>
>stg wants to have a link to the fork-point commit.  I do not
>know if it is absolutely necessary (you might be able to figure
>it out using merge-base, I dunno).
>  
>

"stg pull" and "stg pick" could conceivably link individual patches in a
patchset to their precedent in a previous series. This would make
looking at the evolution of individual patches over time more feasible.

>>    In this case, the "prior" link would point to the last revision of
>>    a patch.  Tools would probably
>>    
>>
>
>Probably what...???
>  
>

...probably support this as an explicit operation - ie "publish", so
that winding whilst developing is not tracked.

>> 3. sub-projects
>>
>>    In this case, the commit on the "main" commit line would have a
>>    "prior" link to the commit on the sub-project.  The sub-project
>>    would effectively be its own head with copied commits objects on
>>    the main head.
>>    
>>
>
>You say you can have only one "prior" per commit, which makes
>this unsuitable to bind multiple subprojects into a larger
>project (the earlier "bind" proposal allows zero or more).
>  
>

It would still support that. Each commit to the sub-project involves a
change to the tree of the "main" commit line (a copy of the commit into
a sub-directory of it). The advantage is that the "tree" in the main
commit is the combined tree, you don't need to treat the case specially
to just get the contents out.

This is kind of like how SVK works by default - you have one local
repository, inside which you track remote repositories. Each commit on
the upstream repository is copied individually into your own repository.
So your local repository numbers easily reach into tens of thousands
(small numbers in git land, I know) while the upstream revisions are
just in the thousands.

>There may be some narrower concrete use case for which you can
>devise coherent semantics, and teach tools and humans how to
>interpret such inter-commit relationships that are _not_
>parent-child ancestry.  For example, if you have one special
>link to point at a "cherry-picked" commit, rebasing _could_ take
>advantage of it.  When your side branch tip is at D, and commit
>D has "this was cherry-picked from commit E" note, and if you
>are rebasing your work on top of F:
>
>        A---B---C---D
>       /
>  o---o---E---F
>
>the tool can notice that F can reach E and carry forward only A,
>B, and C on top of F, omitting D.  So having such a link might
>be useful.  But if that is what you are going to do, I do not
>think you would want to conflate that with other inter-commit
>relationships, such as "previous hydra cap".
>  
>

Right, I see the problem, a strong argument for a more generic solution
as you presented.
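
The rebase example quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: the side branch is linear (only first parents are followed), the cherry-pick note is available as a plain mapping, and "o1"/"o2" are made-up names for the two unlabeled commits in the diagram.

```python
def ancestors(commit, parents):
    """Set of commits reachable from `commit` through parent edges, itself included."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def commits_to_replay(branch_tip, onto, parents, cherry_picked_from):
    """Branch-only commits, oldest first, minus cherry-picks whose source `onto` already has."""
    upstream = ancestors(onto, parents)
    replay = []
    c = branch_tip
    while c not in upstream:
        source = cherry_picked_from.get(c)
        if source is None or source not in upstream:
            replay.append(c)
        c = parents[c][0]  # assumption: linear side branch, first parent only
    replay.reverse()
    return replay


# The graph from the quoted diagram.
parents = {
    "o1": [], "o2": ["o1"], "E": ["o2"], "F": ["E"],
    "A": ["o2"], "B": ["A"], "C": ["B"], "D": ["C"],
}
# D carries a "this was cherry-picked from E" note.
replay = commits_to_replay("D", "F", parents, {"D": "E"})
```

Since F can reach E, D drops out and only A, B and C are carried forward.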

Sam.

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Sam Vilain @ 2006-04-25 23:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, jnareb
In-Reply-To: <7v7j5e2jv7.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

>Here is a related but not necessarily competing idle thought.
>
>How about an ability to "attach" arbitrary objects to commit
>objects?  The commit object would look like:
>
>    tree 0aaa3fecff73ab428999cb9156f8abc075516abe
>    parent 5a6a8c0e012137a3f0059be40ec7b2f4aa614355
>    parent e1cbc46d12a0524fd5e710cbfaf3f178fc3da504
>    related a0e7d36193b96f552073558acf5fcc1f10528917 key
>    related 0032d548db56eac9ea09b4ba05843365f6325b85 cherrypick
>    author Junio C Hamano <junkio@cox.net> 1145943079 -0700
>    committer Junio C Hamano <junkio@cox.net> 1145943079 -0700
>  
>

I agree with the criticisms of the patchset, and I think this is
probably a more comprehensive and less ambiguous solution. I originally
thought that the use cases were close enough together that they could be
called the same thing, but I see now that they are not.

IMHO one important goal is to stop "parent" from meaning anything other
than:

1. for a regular commit, the base for this change. The change consists
of the differences between the two trees.
2. for a "merge", the merge parents for this change. The change consists
of all differences between the index merges (allowing duplicate blobs at
each location) and the final merged tree.

If you were to, for a moving merge head, just record the previous merge
as a "parent", then it would make it difficult to look at the commit
history to figure out which parent links represent the last merge, and
which represent the merge bases.

This suggestion fixes that problem nicely, while being nice and flexible
for solving the other problems too.

>    Merge branch 'pb/config' into next
>
>    * pb/config:
>      Deprecate usage of git-var -l for getting config vars list
>      git-repo-config --list support
>
>The format of "related" attribute is, keyword "related", SP, 40-byte
>hexadecimal object name, SP, and arbitrary sequence of bytes
>except LF and NUL.  Let's call this arbitrary sequence of bytes
>"the nature of relation".
>
>The semantics I would attach to these "related" links are as
>follows:
>
> * To the "core" level git, they do not mean anything other than
>   "you must have these objects, and objects reachable from
>   them, if you are going to have this commit and claim your
>   repository is without missing objects".
>  
>

This is essentially correct, however you have already described a use
case where you want the behaviour to be to lose the previous commit chain:

>The reason I do not include the previous head when I reconstruct
>"pu" is because I explicitly *want* to drop history -- not
>having to carry forward a failed experiment is what is desired
>there.  Otherwise I would manage "pu" just like I currently do
>"next" and "master".  So this is not a justification to add
>something new.
>  
>

In this case, I think that there are types of relations that are more
along the lines of "don't bother following this link by default, but
warn/fail if it is unavailable depending on the user preferences".

git-fsck could then have options to prune (or archive) certain types of
optional relations. This way people can still record complete history if
they like. And people who want to mark portions of history as bad (such
as, violating copyright law) have a clear way to state that intent.

>That means "git-rev-list --objects" needs to list these objects
>(and if they are tags, commits, and trees, then what are
>reachable from them), and "git-fsck" needs to consider these
>related objects and objects reachable from them are reachable
>from this commit.  NOTHING ELSE NEEDS TO BE DONE by the core
>(obviously, cat-file needs to show them, and commit-tree needs to
>record them, but that goes without saying).
>  
>

Ok, I'll investigate that.

>Then porcelains can agree on what different kinds of nature of
>relation mean and do sensible things.  The earlier "omit the
>cherry-picked ones" example I gave can examine "cherrypick".
>  
>

Sounds good. Let things evolve.
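
For reference, the "related" line format quoted earlier in this message (keyword, SP, 40 hexadecimal characters, SP, then any bytes except LF and NUL) could be checked with something like the sketch below. The function name is hypothetical, and two details are assumptions not stated in the proposal: the object name is lowercase hex, and an empty "nature of relation" is allowed.

```python
import re

# keyword, SP, 40 hex digits, SP, then any bytes except LF and NUL
RELATED_RE = re.compile(rb"related ([0-9a-f]{40}) ([^\n\x00]*)")


def parse_related(line: bytes):
    """Return (object_name, nature) for a header line without its trailing LF, else None."""
    m = RELATED_RE.fullmatch(line)
    if m is None:
        return None
    return m.group(1).decode("ascii"), m.group(2)


parsed = parse_related(b"related a0e7d36193b96f552073558acf5fcc1f10528917 key")
```

The nature is kept as raw bytes because the proposal does not promise any particular encoding for it.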

Sam.

* Re: maintenance of cache-tree data
From: Junio C Hamano @ 2006-04-25 23:05 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds
In-Reply-To: <7vk69e61s4.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> Well, I was blind ;-).  As long as the whole-file SHA1 matches,
> read_cache() does not care if we have extra data after the
> series of active_nr cache entry data in the index file.
>
> I'm working on a patch now.

So I did.

There is one bad thing; so far "write-tree" was a read-only
consumer of the index file, but now it primes the cache-tree
structure and needs to update the index.  But that is minor.

While I was at it, I made this "stuffing extra cruft in the
index" slightly more generic than I needed it for this
particular application.  What I see this _might_ be useful for
are:

 - We would want to store which commit of a subproject a
   particular subdirectory came from.  This was one missing
   piece from the "bind commit" proposal that wasn't implemented
   in the jc/bind branch.

 - We might want to record "at this path there is a directory,
   albeit empty"; this cannot be expressed with a usual index
   entry.

   We might be able to use cache-tree for that, but I think this
   is something different at the logical level.  While
   cache-tree is to be fully populated (by write-tree and
   perhaps read-tree later) and invalidated partially when
   update-index and friends smudge part of the tree, this is not
   something we would want to even invalidate (IOW, it should
   always be up-to-date), so they serve different purposes.
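
The "extra cruft after the cache entries" idea can be illustrated with a toy file layout. This is a sketch only: the layout below (entry count, length-prefixed paths, optional trailing bytes, SHA-1 of everything before it) is invented for the example and is not the real index format.

```python
import hashlib
import struct


def write_index(paths, extra=b""):
    """Serialize paths, optional trailing extension bytes, then a SHA-1 over all of it."""
    body = struct.pack(">I", len(paths))
    for path in paths:
        raw = path.encode()
        body += struct.pack(">H", len(raw)) + raw
    body += extra
    return body + hashlib.sha1(body).digest()


def read_entries(data):
    """Verify the whole-file checksum, read the entries, and return leftover bytes unparsed."""
    body, checksum = data[:-20], data[-20:]
    if hashlib.sha1(body).digest() != checksum:
        raise ValueError("index checksum mismatch")
    (count,) = struct.unpack_from(">I", body, 0)
    offset, entries = 4, []
    for _ in range(count):
        (n,) = struct.unpack_from(">H", body, offset)
        offset += 2
        entries.append(body[offset:offset + n].decode())
        offset += n
    # Bytes between the last entry and the checksum are extension data;
    # a reader that only wants the entries never has to interpret them.
    return entries, body[offset:]


data = write_index(["Makefile", "src/a.c"], extra=b"cache-tree payload")
entries, extra = read_entries(data)
```

Because the checksum covers the trailing bytes as well, a reader that knows nothing about them still accepts the file.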


I still haven't looked at the read-tree yet, but as I outlined
in a previous message, its intra-index merge could take
advantage of cache-tree.  "diff-index", especially "--cached"
kind, also could use it to skip unchanged subtrees altogether.
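
A rough model of the populate-then-invalidate behaviour described above, as a sketch only. The class, its method names and the tree ids are all hypothetical, not the data structure in the patch.

```python
class CacheTree:
    """Maps a directory path ("" is the root) to a cached tree id, or None when stale."""

    def __init__(self):
        self.tree_id = {}

    def prime(self, ids):
        """Record a tree id for every directory, as write-tree would after writing trees."""
        self.tree_id = dict(ids)

    def invalidate(self, path):
        """Mark every directory that contains `path` as stale."""
        parts = path.split("/")[:-1]
        self.tree_id[""] = None
        for i in range(1, len(parts) + 1):
            self.tree_id["/".join(parts[:i])] = None

    def is_valid(self, directory):
        return self.tree_id.get(directory) is not None


ct = CacheTree()
ct.prime({"": "t0", "lib": "t1", "lib/util": "t2", "doc": "t3"})
ct.invalidate("lib/util/str.c")  # what update-index would do after touching this path
```

After the update only "doc" is still valid, so a tree comparison could skip that subtree and descend only into "lib".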

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Jason Riedy @ 2006-04-25 22:17 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e2lrk5$ed5$1@sea.gmane.org>

And Jakub Narebski writes:
 - I don't mean we shouldn't define semantics for each use of "related" or
 - "note" header. Just like email X-* headers have detailed form and semantics
 - (long, long time ago Sender was X-Sender for example ;-). It's just a
 - toolkit.

You just proved Linus's point.  Ever have to parse
archives of old mail?  There are many different ways
of saying the same thing, and many of the same ways
of saying different things.  It's pure hell.

And people expect you to get the X-* headers correct
for whatever definition of correct they happen to have
at the moment.  ugh.  You have many de-facto semantics
for the same headers, and no way to disambiguate them.

People will need to parse and understand git archives
thirty+ years from now.  Don't place this curse on
them.

Jason

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Linus Torvalds @ 2006-04-25 19:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vodypv3gz.fsf@assigned-by-dhcp.cox.net>



On Tue, 25 Apr 2006, Junio C Hamano wrote:
> >
> > Sure it does. It's an integral part of logging: we not only verify the 
> > format, we also have multiple different ways of showing it. So it 
> > definitely changes the way we "act", very fundamentally.
> 
> Unfair ;-).  I'd consider "git log" semi-Porcelain and consider
> rev-list and cat-file the true core level.

Well, "git log" is really just "git-rev-list --pretty", so whichever way 
you turn, it's there.

I come from a slightly different background, where "core git" in many ways 
originally was about "what I use" and the whole "porcelain" side ends up 
being "what people who need hand-holding use" ;)

Of course, it expanded a bit from that original definition ;)

		Linus

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Junio C Hamano @ 2006-04-25 19:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604251233340.3701@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> On Tue, 25 Apr 2006, Junio C Hamano wrote:
>> 
>> Then we should drop the author header and make it part of free
>> form text.  The core does not give any meaning to it.
>
> Sure it does. It's an integral part of logging: we not only verify the 
> format, we also have multiple different ways of showing it. So it 
> definitely changes the way we "act", very fundamentally.

Unfair ;-).  I'd consider "git log" semi-Porcelain and consider
rev-list and cat-file the true core level.

But since you already made it clear that you are not opposed to 'note'
with the clear semantics "we _ignore_ it", the point is moot.

Sorry for the noise.

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Linus Torvalds @ 2006-04-25 19:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vslo1v4zw.fsf@assigned-by-dhcp.cox.net>



On Tue, 25 Apr 2006, Junio C Hamano wrote:
> 
> Then we should drop the author header and make it part of free
> form text.  The core does not give any meaning to it.

Sure it does. It's an integral part of logging: we not only verify the 
format, we also have multiple different ways of showing it. So it 
definitely changes the way we "act", very fundamentally.

		Linus

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Junio C Hamano @ 2006-04-25 19:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604251155530.3701@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> And the rule is: git cares about the commit header, but not about the 
> free-form. Which means that anything it doesn't care about, it goes into 
> the free-form section, not into some "X-header" section.
>
> Whatever you build on TOP of git can have its own rules in that free-form 
> section. For example, the kernel project has this "X-header" thing called 
> the "sign-off", and git itself picked it up. There's even some support to 
> add it automatically to commits (the same way we add the "revert" info 
> automatically to commits), but nobody claims that git should "parse" that 
> information, or that it should be part of the "header".

Then we should drop the author header and make it part of free
form text.  The core does not give any meaning to it.  And the
name <email> part of the commit header as well.  The only thing
used by the core is the timestamp of the commit.

My initial 'related' without 'note' was flawed - it used
cherry-pick as an example of 'related' when it clearly should
have been 'note' (no connectivity required).

Having said what I wanted to say about 'note', let's clarify
what I have in mind about the 'related' that _means_
connectivity.  As I said, I am far less convinced it is a good
thing than I am about 'note' by now, but just for the sake of
completeness of the discussion.

I tend to agree with you that ability to misuse 'related' (I'd
call it 'link' to make it clear that it means connectivity) to
fetch/push "related" objects, with an unclear definition of
related-ness, is a bad thing.  Even if we fetched the objects
that are claimed to be related to the main project, if we do not
know what to do with them, it is not useful.

And for well defined connectivity, we could give separate names,
just like we have 'tree' and 'parent' in the commit header.
That's how "bind commit" was initially proposed.  It was not
'link bind'.

The suggestion of 'link bind' came primarily from the pain I
experienced when I taught rev-list --objects and fsck-objects
about it in the jc/bind branch.  If the only thing asked to the
core by 'link' is to make sure the related objects are made
available, and Porcelains take responsibility after they are
made available, we would be better off teaching the commit
parser how to parse 'link' (regardless of its nature of linkage)
and teach rev-list --objects and fsck-objects to do connectivity
just once, rather than adding 'bind' now and then having to do
the same backward incompatible change when adding something else
that requires connectivity.

There definitely needs to be an ability to specify a list of
"nature of links this repository accepts", if we were to do
'link'.  It probably should default to an empty set.  rev-list
--objects would include objects pointed by 'link' only when the
repository wants such links to be honored.  fsck-objects will
declare an object that is reachable only by a 'link' that is not
accepted by the repository "uninteresting" and let git-prune
remove it.
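
A small sketch of that policy, under the assumption that commits, their parents and their 'link' lines are already available as plain dictionaries. The function names and the `accepted_kinds` parameter are hypothetical.

```python
def reachable(start, parents, links, accepted_kinds=frozenset()):
    """Follow parent edges always, and 'link' edges only when their nature is accepted."""
    seen, stack = set(), [start]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(parents.get(obj, []))
        for kind, target in links.get(obj, []):
            if kind in accepted_kinds:
                stack.append(target)
    return seen


def prunable(all_objects, start, parents, links, accepted_kinds=frozenset()):
    """Objects the repository holds but does not consider reachable from `start`."""
    return set(all_objects) - reachable(start, parents, links, accepted_kinds)


parents = {"c2": ["c1"], "c1": []}
links = {"c2": [("bind", "s1"), ("prior", "p1")]}
objects = {"c1", "c2", "s1", "p1"}

default_prune = prunable(objects, "c2", parents, links)          # accepts no links
bind_prune = prunable(objects, "c2", parents, links, {"bind"})   # accepts 'bind' only
```

With the default empty set, both linked objects are uninteresting. Accepting 'bind' keeps s1 and leaves only p1 for pruning.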

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Linus Torvalds @ 2006-04-25 19:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vr73lwkdt.fsf@assigned-by-dhcp.cox.net>



On Tue, 25 Apr 2006, Junio C Hamano wrote:
> 
> Actually, it does help Porcelain to be able to mark unrelated
> crud as 'note'. 

A "note" header that explicitly has no meaning _what-so-ever_ for git 
would be fine. Then the semantics are well-defined, and they really do 
boil down to: random strings that git will ignore, and that won't normally 
be shown by "git log".

Those are actually real semantics, the same way the current "content" is 
real semantics: we don't care about it at all, and we _guarantee_ that we 
don't care about it.

The problem with the proposed "related" thing was that it was something
that git was supposed to care about, but since it had no sane semantics, 
there was no way to _make_ git care about it sanely. That was the problem.

So I'm not objecting to adding headers. I'm objecting to adding headers 
that have insane or badly defined semantics where we might be asked to do 
something for them and different versions of git might do different
things. 

			Linus

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Jakub Narebski @ 2006-04-25 19:00 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0604251151350.3701@g5.osdl.org>

Linus Torvalds wrote:

> 
> 
> On Tue, 25 Apr 2006, Jakub Narebski wrote:
>> 
>> Additionally, in "related" links we require that object exist (core git),
>> regardless of detailed semantics.

And history browsers (gitk, qgit) can use it, drawing a line, regardless of
semantics.

> And as I've now mentioned a hundred times, that's just unacceptable to me.
> No suggested use of this has actually been useful, that I can tell.

I don't mean we shouldn't define semantics for each use of "related" or
"note" header. Just like email X-* headers have detailed form and semantics
(long, long time ago Sender was X-Sender for example ;-). It's just a
toolkit.

As to suggested "related" (requiring the object to exist) headers: "bind",
"prior", and perhaps "revert".

-- 
Jakub Narebski
Warsaw, Poland

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Junio C Hamano @ 2006-04-25 19:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604251125010.3701@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> On Tue, 25 Apr 2006, sean wrote:
>
>> On Tue, 25 Apr 2006 11:08:31 -0700 (PDT)
>> Linus Torvalds <torvalds@osdl.org> wrote:
>> 
>> > Which is exactly what I told you to do. Just don't make it a git header. 
>> 
>> Well I just don't see how making it a header, or plopping it at the
>> end of a commit message makes an iota of difference to git, while it 
>> can help porcelain.
>
> It can't help porcelain.
>
> If we have undefined or bad semantics for it, the only thing it can do is 
> _hurt_ porcelain, because it will cause confusion down the line.
>
> Semantics for data objects are _the_ most important part of a SCM. Pretty 
> much any project, in fact. 
>
> And bad or weakly defined semantics will invariably cause problems later.
>
>> But that's exactly the point, it's no different than extending git to be
>> able to store more than one comment.
>
> So why argue for it?
>
> Just use the existing comment field.

Actually, it does help Porcelain to be able to mark unrelated
crud as 'note'.  Sane people (including git barebone
Porcelainish) would just ignore it.  Unless --pretty=raw is used
the 'note' headers will not be shown.  It would unclutter
things for us.

If different Porcelains use "the existing comment field" by
defining certain mark-up to embed their own data, it has the
same "weak semantics causing confusion down the line" issue,
_and_ the crud will be shown to the end user by "git log".

So I am starting to be actually in favor of the 'note' header.

Earlier somebody wondered if that has impact on merge semantics.
I think we do _not_ care.  The core level does not track how
things changed (the operation to make preimage to postimage),
but tracks what the results of changes are (the content).

Some "misguided" set of Porcelains may come up with a convention
to record renames and token-replaces in the 'note' header to
say:

        tree 0000000000000000000000000000000000000000
        parent 0000000000000000000000000000000000000000
        author A U Thor <author@example.com> 000000000 +0000
        committer C O Mitter <comitter@example.com> 000000000 +0000
        note rename hello.c world.c
        note token-replace s/cache/index/

        Replaced old nomenclature 'cache' to 'index'.  Oh, while
        at it, I renamed hello.c to world.c.

But unlike systems that record the transformation from preimage
to postimage, we record the postimage (on "tree" header) and
preimage (by the way of "parent" header).  We (as the core and
Porcelain that do not use "note") do not even need to look at
what 'note' says.  The Porcelains that _do_ look at the note may
try to take advantage of it, and if they produce a better result that
would be a good thing.  I suspect such 'note rename' provided by
the end user is not trustworthy at times, so a Porcelain that
relies on that may make a silent mismerge.  You may claim that is
the reason why you do not want to pull from a tree managed with
such a Porcelain.
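
To make the "we do not even need to look at what 'note' says" point concrete, here is a sketch of a reader that opts out of 'note'. It assumes the hypothetical header layout from the example above, with headers separated from the free-form text by one blank line. The function names are made up for illustration.

```python
def split_commit(raw: str):
    """Split a commit's text into ([(keyword, value), ...], free-form message)."""
    head, _, message = raw.partition("\n\n")
    headers = []
    for line in head.split("\n"):
        keyword, _, value = line.partition(" ")
        headers.append((keyword, value))
    return headers, message


def core_view(headers):
    """The headers a tool that opts out of 'note' would ever act on."""
    return [(k, v) for k, v in headers if k != "note"]


raw = "\n".join([
    "tree " + "0" * 40,
    "parent " + "0" * 40,
    "author A U Thor <author@example.com> 0 +0000",
    "committer C O Mitter <committer@example.com> 0 +0000",
    "note rename hello.c world.c",
    "note token-replace s/cache/index/",
    "",
    "Replaced old nomenclature 'cache' to 'index'.",
])
headers, message = split_commit(raw)
kept = [k for k, _ in core_view(headers)]
```

The tree and parent lines are all such a reader needs for a merge, while a Porcelain that cares can still pick out the 'note' lines from the same list.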

But at the end of the day what matters is the content, and
people.

You will not be using such a Porcelain yourself, but when you
fetch the above commit, which records its tree and its parents,
git barebone Porcelainish merge will just do what it has always
done, without even looking at 'note'.  It's not like use of
'note' on the other end is forcing you to take a note on them.

Refusing to merge from a tree that is managed with a Porcelain
that uses the information in 'note rename' for its own operation
(maybe because we believe such Porcelain tends to make silent
mismerges more often) does not make much more sense than
refusing to merge from a tree whose developer uses vi (because
it tends to lose "missing LF at the end of file").  The content
matters, so you would check the merge result; and 'note' thing
is opt-in, which we opt out.

Also you ultimately trust people -- "I will pull from his tree,
because I know he is careful and has good taste".  Now the tool
they use _may_ be part of their taste, but any tool can be
misused (remember you stayed away from pulling things that have
Octopus?)

I am less (a lot less) sure about the 'related' header now,
which will be the topic of a separate message.

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Linus Torvalds @ 2006-04-25 19:00 UTC (permalink / raw)
  To: sean; +Cc: jnareb, git
In-Reply-To: <BAYC1-PASMTP03E0B5376ACFF165B29ED1AEBF0@CEZ.ICE>



On Tue, 25 Apr 2006, sean wrote:
> 
> It's no different for a bug tracker or other 3rd party software that wants
> to interface with git, it's bad design to force them to parse a single
> free form text comment into individual pieces to extract their meta data.
> Especially when git could easily add the ability to add multiple comments
> to each commit.  

Git _does_ make that easy. It's called the "tree". It's where you add any 
arbitrary files to a commit.

The point here is that core git should do one thing, and one thing only. 
You can then build up any policy you want on top of that. But in order for 
core git to be stable, it has to have nice rules about what it cares 
about, and what it does not.

And the rule is: git cares about the commit header, but not about the 
free-form. Which means that anything it doesn't care about, it goes into 
the free-form section, not into some "X-header" section.

Whatever you build on TOP of git can have its own rules in that free-form 
section. For example, the kernel project has this "X-header" thing called 
the "sign-off", and git itself picked it up. There's even some support to 
add it automatically to commits (the same way we add the "revert" info 
automatically to commits), but nobody claims that git should "parse" that 
information, or that it should be part of the "header".
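
As an illustration of a convention that lives entirely in the free-form text, a tool built on top of git could pull the sign-off lines out of a commit message like this. It is a sketch: the function name is hypothetical, and only exact "Signed-off-by: " line prefixes are recognized.

```python
def sign_offs(message: str):
    """Return the identities named on "Signed-off-by: " lines of a commit message."""
    prefix = "Signed-off-by: "
    return [
        line[len(prefix):]
        for line in message.splitlines()
        if line.startswith(prefix)
    ]


message = (
    "Fix off-by-one in the frobnicator\n"
    "\n"
    "The loop ran one element past the end of the array.\n"
    "\n"
    "Signed-off-by: A U Thor <author@example.com>\n"
    "Signed-off-by: C O Mitter <committer@example.com>\n"
)
signers = sign_offs(message)
```

Nothing in the commit header changes for this to work, which is the point being made above.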

		Linus

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Linus Torvalds @ 2006-04-25 18:52 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e2lqf1$a5k$1@sea.gmane.org>



On Tue, 25 Apr 2006, Jakub Narebski wrote:
> 
> Additionally, in "related" links we require that object exist (core git),
> regardless of detailed semantics.

And as I've now mentioned a hundred times, that's just unacceptable to me. 
No suggested use of this has actually been useful, that I can tell.

		Linus
