Git development
* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Jakub Narebski @ 2006-04-26  5:06 UTC (permalink / raw)
  To: git
In-Reply-To: <444EAE7C.5010402@vilain.net>

Sam Vilain wrote:

> Junio C Hamano wrote:

>>> 3. sub-projects
>>>
>>>    In this case, the commit on the "main" commit line would have a
>>>    "prior" link to the commit on the sub-project.  The sub-project
>>>    would effectively be its own head with copied commits objects on
>>>    the main head.
>>>
>>
>>You say you can have only one "prior" per commit, which makes
>>this unsuitable to bind multiple subprojects into a larger
>>project (the earlier "bind" proposal allows zero or more).
> 
> It would still support that. Each commit to the sub-project involves a
> change to the tree of the "main" commit line (a copy of the commit into
> a sub-directory of it). The advantage is that the "tree" in the main
> commit is the combined tree, you don't need to treat the case specially
> to just get the contents out.

As far as I understand, a subproject commit's "bind" link (and perhaps the
keyword "link" or "ref" would be better than "related") points to the commits
(trees) of other subprojects, while the link in Sam's "prior (3)" example would
point to the commit of the toplevel project (the one gathering all
subprojects), and it would probably be named "toplevel", not "prior".

Am I correct?

-- 
Jakub Narebski
Warsaw, Poland


* [PATCH/RFC] reverse the pack-objects delta window logic
From: Nicolas Pitre @ 2006-04-26  3:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

This allows for keeping a single delta index constant while delta 
targets are tested against the same base object.

Signed-off-by: Nicolas Pitre <nico@cam.org>

---

Note, this is an RFC particularly to Junio since the resulting pack is 
larger than without the patch with git-repack -a -f.  However using a 
subsequent git-repack -a brings the pack size down to expected size.  So 
I'm not sure I've got everything right.
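
For illustration only (this is not part of the patch): a minimal C sketch
of the reversed loop, assuming a window of three objects.  `toy_index`,
`build_index` and the two counters are hypothetical stand-ins for
`struct delta_index` and `create_delta_index()`, and the ring-buffer
arithmetic is an assumption about lines the hunks do not show.  No real
delta is computed; the sketch only shows that each object has its index
built once and is then offered as the source for every older object still
in the window.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW 3

static unsigned index_builds;   /* how many source indexes were built */
static unsigned delta_attempts; /* how many (target, source) pairs were tried */

/* Hypothetical stand-in for struct delta_index. */
struct toy_index { const char *src; };

static struct toy_index build_index(const char *src)
{
	struct toy_index idx = { src };
	index_builds++;
	return idx;
}

/* Toy try_delta(): only counts the attempt, computes nothing. */
static void try_delta(const char *trg, const struct toy_index *src_index)
{
	(void)trg;
	(void)src_index;
	delta_attempts++;
}

static void find_deltas(const char *const *list, size_t nr)
{
	const char *window[WINDOW] = { NULL };
	size_t i, idx = 0;

	for (i = 0; i < nr; i++) {
		struct toy_index index;
		int j;

		assert(list[i]);
		window[idx] = list[i];
		index = build_index(list[i]);	/* once per source object */

		/* Offer the new object as base to the older window entries. */
		for (j = WINDOW - 1; j > 0; j--) {
			size_t other = (idx + j) % WINDOW;
			if (!window[other])
				break;
			try_delta(window[other], &index);
		}
		if (++idx >= WINDOW)
			idx = 0;
	}
}
```

With five objects this builds five indexes for seven delta attempts, so
the index cost no longer grows with the number of pairs tried.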

diff --git a/pack-objects.c b/pack-objects.c
index c0acc46..33027a8 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -19,19 +19,17 @@ struct object_entry {
 	unsigned long offset;	/* offset into the final pack file;
 				 * nonzero if already written.
 				 */
-	unsigned int depth;	/* delta depth */
-	unsigned int delta_limit;	/* base adjustment for in-pack delta */
+	unsigned int delta_limit;	/* deepest delta from this object */
 	unsigned int hash;	/* name hint hash */
 	enum object_type type;
 	enum object_type in_pack_type;	/* could be delta */
 	unsigned long delta_size;	/* delta data size (uncompressed) */
 	struct object_entry *delta;	/* delta base object */
-	struct packed_git *in_pack; 	/* already in pack */
-	unsigned int in_pack_offset;
 	struct object_entry *delta_child; /* delitified objects who bases me */
 	struct object_entry *delta_sibling; /* other deltified objects who
-					     * uses the same base as me
-					     */
+					       uses the same base as me */
+	struct packed_git *in_pack; 	/* already in pack */
+	unsigned int in_pack_offset;
 	int preferred_base;	/* we do not pack this, but is encouraged to
 				 * be used as the base objectto delta huge
 				 * objects against.
@@ -906,11 +904,11 @@ static void get_object_details(void)
 	for (i = 0, entry = objects; i < nr_objects; i++, entry++)
 		check_object(entry);
 
-	if (nr_objects == nr_result) {
+	if (!no_reuse_delta && nr_objects == nr_result) {
 		/*
-		 * Depth of objects that depend on the entry -- this
-		 * is subtracted from depth-max to break too deep
-		 * delta chain because of delta data reusing.
+		 * We must determine the maximum depth of reused deltas
+		 * for those objects used as their base before find_deltas()
+		 * starts considering them as potential delta targets.
 		 * However, we loosen this restriction when we know we
 		 * are creating a thin pack -- it will have to be
 		 * expanded on the other end anyway, so do not
@@ -1004,64 +1002,78 @@ struct unpacked {
  * more importantly, the bigger file is likely the more recent
  * one.
  */
-static int try_delta(struct unpacked *cur, struct unpacked *old, unsigned max_depth)
+static int try_delta(struct unpacked *trg, struct unpacked *src,
+		     struct delta_index *src_index, unsigned max_depth)
 {
-	struct object_entry *cur_entry = cur->entry;
-	struct object_entry *old_entry = old->entry;
-	unsigned long size, oldsize, delta_size, sizediff;
-	long max_size;
+	struct object_entry *trg_entry = trg->entry;
+	struct object_entry *src_entry = src->entry;
+	unsigned long size, src_size, delta_size, sizediff, max_size;
 	void *delta_buf;
 
 	/* Don't bother doing diffs between different types */
-	if (cur_entry->type != old_entry->type)
+	if (trg_entry->type != src_entry->type)
 		return -1;
 
 	/* We do not compute delta to *create* objects we are not
 	 * going to pack.
 	 */
-	if (cur_entry->preferred_base)
-		return -1;
+	if (trg_entry->preferred_base)
+		return 0;
 
-	/* If the current object is at pack edge, take the depth the
-	 * objects that depend on the current object into account --
-	 * otherwise they would become too deep.
+	/*
+	 * Make sure deltifying this object won't make its deepest delta
+	 * too deep, but only when not producing a thin pack.
 	 */
-	if (cur_entry->delta_child) {
-		if (max_depth <= cur_entry->delta_limit)
-			return 0;
-		max_depth -= cur_entry->delta_limit;
-	}
-
-	size = cur_entry->size;
-	oldsize = old_entry->size;
-	sizediff = oldsize > size ? oldsize - size : size - oldsize;
+	if (nr_objects == nr_result && trg_entry->delta_limit >= max_depth)
+		return 0;
 
+	/* Now some size filtering heuristics. */
+	size = trg_entry->size;
 	if (size < 50)
-		return -1;
-	if (old_entry->depth >= max_depth)
 		return 0;
-
-	/*
-	 * NOTE!
-	 *
-	 * We always delta from the bigger to the smaller, since that's
-	 * more space-efficient (deletes don't have to say _what_ they
-	 * delete).
-	 */
 	max_size = size / 2 - 20;
-	if (cur_entry->delta)
-		max_size = cur_entry->delta_size-1;
+	if (trg_entry->delta)
+		max_size = trg_entry->delta_size-1;
+	src_size = src_entry->size;
+	sizediff = src_size < size ? size - src_size : 0;
 	if (sizediff >= max_size)
 		return 0;
-	delta_buf = diff_delta(old->data, oldsize,
-			       cur->data, size, &delta_size, max_size);
+
+	delta_buf = create_delta(src_index, trg->data, size, &delta_size, max_size);
 	if (!delta_buf)
 		return 0;
-	cur_entry->delta = old_entry;
-	cur_entry->delta_size = delta_size;
-	cur_entry->depth = old_entry->depth + 1;
+
+	if (trg_entry->delta) {
+		/*
+		 * The target object already has a delta base but we just
+		 * found a better one.  Remove it from its former base
+		 * childhood and redetermine the base delta_limit (if used).
+		 */
+		struct object_entry *base = trg_entry->delta;
+		struct object_entry **child_link = &base->delta_child;
+		base->delta_limit = 0;
+		while (*child_link) {
+			if (*child_link == trg_entry) {
+				*child_link = trg_entry->delta_sibling;
+				if (nr_objects != nr_result)
+					break;
+				continue;
+			}
+			if (base->delta_limit <= (*child_link)->delta_limit)
+				base->delta_limit =
+					(*child_link)->delta_limit + 1;
+			child_link = &(*child_link)->delta_sibling;
+		}
+	}
+
+	trg_entry->delta = src_entry;
+	trg_entry->delta_size = delta_size;
+	trg_entry->delta_sibling = src_entry->delta_child;
+	src_entry->delta_child = trg_entry;
+	if (src_entry->delta_limit <= trg_entry->delta_limit)
+		src_entry->delta_limit = trg_entry->delta_limit + 1;
 	free(delta_buf);
-	return 0;
+	return 1;
 }
 
 static void progress_interval(int signum)
@@ -1078,14 +1090,15 @@ static void find_deltas(struct object_en
 	unsigned last_percent = 999;
 
 	memset(array, 0, array_size);
-	i = nr_objects;
+	i = 0;
 	idx = 0;
 	if (progress)
 		fprintf(stderr, "Deltifying %d objects.\n", nr_result);
 
-	while (--i >= 0) {
-		struct object_entry *entry = list[i];
+	while (i < nr_objects) {
+		struct object_entry *entry = list[i++];
 		struct unpacked *n = array + idx;
+		struct delta_index *delta_index;
 		unsigned long size;
 		char type[10];
 		int j;
@@ -1113,7 +1126,13 @@ static void find_deltas(struct object_en
 		n->entry = entry;
 		n->data = read_sha1_file(entry->sha1, type, &size);
 		if (size != entry->size)
-			die("object %s inconsistent object length (%lu vs %lu)", sha1_to_hex(entry->sha1), size, entry->size);
+			die("object %s inconsistent object length (%lu vs %lu)",
+			    sha1_to_hex(entry->sha1), size, entry->size);
+		if (!size)
+			continue;
+		delta_index = create_delta_index(n->data, size);
+		if (!delta_index)
+			die("out of memory");
 
 		j = window;
 		while (--j > 0) {
@@ -1124,18 +1143,10 @@ static void find_deltas(struct object_en
 			m = array + other_idx;
 			if (!m->entry)
 				break;
-			if (try_delta(n, m, depth) < 0)
+			if (try_delta(m, n, delta_index, depth) < 0)
 				break;
 		}
-#if 0
-		/* if we made n a delta, and if n is already at max
-		 * depth, leaving it in the window is pointless.  we
-		 * should evict it first.
-		 * ... in theory only; somehow this makes things worse.
-		 */
-		if (entry->delta && depth <= entry->depth)
-			continue;
-#endif
+		free_delta_index(delta_index);
 		idx++;
 		if (idx >= window)
 			idx = 0;
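
As a reading aid for the child-list bookkeeping in the new try_delta(),
here is a standalone sketch under simplifying assumptions: `struct entry`
is a hypothetical cut-down version of `struct object_entry`, the thin-pack
shortcut (`nr_objects != nr_result`) is left out, and, like the patch,
only the immediate base has its `delta_limit` recomputed.

```c
#include <assert.h>
#include <stddef.h>

struct entry {
	struct entry *delta;         /* base this entry is a delta against */
	struct entry *delta_child;   /* first entry deltified against us */
	struct entry *delta_sibling; /* next entry sharing our base */
	unsigned int delta_limit;    /* deepest delta chain below us */
};

/* Unlink trg from its current base and recompute that base's delta_limit. */
static void detach_from_base(struct entry *trg)
{
	struct entry *base = trg->delta;
	struct entry **link;

	if (!base)
		return;
	link = &base->delta_child;
	base->delta_limit = 0;
	while (*link) {
		if (*link == trg) {
			*link = trg->delta_sibling;	/* splice trg out */
			continue;
		}
		if (base->delta_limit <= (*link)->delta_limit)
			base->delta_limit = (*link)->delta_limit + 1;
		link = &(*link)->delta_sibling;
	}
	trg->delta = NULL;
	trg->delta_sibling = NULL;
}

/* Make src the delta base of trg, dropping any previous base first. */
static void attach_to_base(struct entry *trg, struct entry *src)
{
	assert(trg != src);
	detach_from_base(trg);
	trg->delta = src;
	trg->delta_sibling = src->delta_child;
	src->delta_child = trg;
	if (src->delta_limit <= trg->delta_limit)
		src->delta_limit = trg->delta_limit + 1;
}
```

A base with a single leaf child ends up with delta_limit 1; moving a child
that itself has a child to a new base gives the new base delta_limit 2 and
lets the old base fall back to the depth of its remaining children.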


* Re: [PATCH] send-email: Change from Mail::Sendmail to Net::SMTP
From: Martin Langhoff @ 2006-04-26  0:45 UTC (permalink / raw)
  To: Eric Wong; +Cc: Junio C Hamano, git, Ryan Anderson
In-Reply-To: <1143336048205-git-send-email-normalperson@yhbt.net>

On 3/26/06, Eric Wong <normalperson@yhbt.net> wrote:
> Net::SMTP is in the base Perl distribution, so users are more
> likely to have it.  Net::SMTP also allows reusing the SMTP
> connection, so sending multiple emails is faster.

This is causing problems for me on my Debian sarge dev box.

 * If I have to believe strace(), Net::SMTP is trying to look up
"localhost" via DNS. Sketchy workaround: use 127.0.0.1.

 * This box has nothing listening on port 25. It doesn't get email
from the net, being a LAN machine, so I've told the Debian config
system that we don't need an SMTP daemon. Net::SMTP doesn't know how
to use /usr/bin/sendmail.

 * That nasty @@VERSION@@ thing isn't valid perl, so working on this
code is a pain. Something like this (warning! broken diff ahead!)
fixes it for me.

@@ -292,6 +292,11 @@ sub send_message
        @recipients = unique_email_list(@recipients,@cc);
        my $date = strftime('%a, %d %b %Y %H:%M:%S %z', localtime($time++));

+       my $gitversion = '@@GIT_VERSION@@';
+       if ($gitversion eq '@@'.'GIT_VERSION@@') {
+           $gitversion = `git --version`;
+       }
+
        my $header = "From: $from
 To: $to
 Cc: $cc
@@ -299,11 +304,11 @@ Subject: $subject
 Reply-To: $from
 Date: $date
 Message-Id: $message_id
-X-Mailer: git-send-email @@GIT_VERSION@@
+X-Mailer: git-send-email $gitversion
 ";
        $header .= "In-Reply-To: $reply_to\n" if $reply_to;

cheers,


martin


* Proposal: git-based dependency tracking build system
From: Matt McCutchen @ 2006-04-26  0:13 UTC (permalink / raw)
  To: git

Dear git people,

I have been thinking for some time about how to write a foolproof
general-use build system that automatically tracks dependencies.  (Make
+ depcomp is decent as long as source files aren't added/removed or
generated often.  Cons is good but not general-purpose.)  I know there's
been some work on tracing the compiler to see which files it actually
opens.  Another possibility is to layer a FUSE filesystem over the build
tree and note which files in the virtual filesystem are opened; this has
the advantage of missing most of the boring files (e.g. shared libraries
that make up the compiler).

So I was thinking, why not write a build system that uses git's
excellent hash-based object storage support to store the files in the
virtual build tree?  Hashing the files makes it easy to notice when a
file is rewritten with the same contents, meaning files that depend on
it don't actually have to be rebuilt.  I also envision the build system
automatically marking generated files as git-ignored.
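
To make the "rewritten with the same contents" idea concrete, here is a
rough C sketch.  It assumes a 64-bit FNV-1a hash purely as a stand-in for
git's SHA-1 object names, and `struct build_input` and `input_changed()`
are hypothetical names, not anything git provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, used here only as a cheap stand-in for a real object hash. */
static uint64_t content_hash(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint64_t h = 14695981039346656037ULL;	/* FNV offset basis */
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 1099511628211ULL;		/* FNV prime */
	}
	return h;
}

struct build_input {
	const char *path;
	uint64_t last_hash;	/* hash recorded when it was last consumed */
	int seen;		/* 0 until a hash has been recorded */
};

/*
 * Returns 1 if dependents must be rebuilt, 0 if the file was rewritten
 * with identical contents.  Records the new hash as a side effect.
 */
static int input_changed(struct build_input *in, const void *data, size_t len)
{
	uint64_t h;

	assert(in && (data || len == 0));
	h = content_hash(data, len);
	if (in->seen && in->last_hash == h)
		return 0;
	in->last_hash = h;
	in->seen = 1;
	return 1;
}
```

A generated header that is regenerated byte-for-byte would report 0 on the
second call, so nothing depending on it would need to be rebuilt.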

Thoughts?

-- 
Matt McCutchen
hashproduct@verizon.net
http://hashproduct.metaesthetics.net/


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links
From: Sam Vilain @ 2006-04-25 23:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vwtde2q1z.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

>> 2. revising published commits / re-basing
>>
>>    This is what "stg" et al do.  The tools allow you to commit,
>>    rewind, revise, recommit, fast forward, etc.
>>    
>>
>
>stg wants to have a link to the fork-point commit.  I do not
>know if it is absolutely necessary (you might be able to figure
>it out using merge-base, I dunno).
>  
>

"stg pull" and "stg pick" could conceivably link individual patches in a
patchset to their precedent in a previous series. This would make
looking at the evolution of individual patches over time more feasible.

>>    In this case, the "prior" link would point to the last revision of
>>    a patch.  Tools would probably
>>    
>>
>
>Probably what...???
>  
>

...probably support this as an explicit operation - ie "publish", so
that winding whilst developing is not tracked.

>> 3. sub-projects
>>
>>    In this case, the commit on the "main" commit line would have a
>>    "prior" link to the commit on the sub-project.  The sub-project
>>    would effectively be its own head with copied commits objects on
>>    the main head.
>>    
>>
>
>You say you can have only one "prior" per commit, which makes
>this unsuitable to bind multiple subprojects into a larger
>project (the earlier "bind" proposal allows zero or more).
>  
>

It would still support that. Each commit to the sub-project involves a
change to the tree of the "main" commit line (a copy of the commit into
a sub-directory of it). The advantage is that the "tree" in the main
commit is the combined tree, you don't need to treat the case specially
to just get the contents out.

This is kind of like how SVK works by default - you have one local
repository, inside which you track remote repositories. Each commit on
the upstream repository is copied individually into your own repository.
So your local repository numbers easily reach into tens of thousands
(small numbers in git land, I know) while the upstream revisions are
just in the thousands.

>There may be some narrower concrete use case for which you can
>devise coherent semantics, and teach tools and humans how to
>interpret such inter-commit relationship that are _not_
>parent-child ancestry.  For example, if you have one special
>link to point at a "cherry-picked" commit, rebasing _could_ take
>advantage of it.  When your side branch tip is at D, and commit
>D has "this was cherry-picked from commit E" note, and if you
>are rebasing your work on top of F:
>
>        A---B---C---D
>       /
>  o---o---E---F
>
>the tool can notice that F can reach E and carry forward only A,
>B, and C on top of F, omitting D.  So having such a link might
>be useful.  But if that is what you are going to do, I do not
>think you would want to conflate that with other inter-commit
>relationships, such as "previous hydra cap".
>  
>

Right, I see the problem, a strong argument for a more generic solution
as you presented.
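
The rebase scenario quoted above can be sketched in C as follows.  This is
only an illustration under stated assumptions: history is linear (one
parent per commit), and the `picked_from` field is a hypothetical
in-memory form of the "this was cherry-picked from" link, not an existing
git structure.

```c
#include <assert.h>
#include <stddef.h>

struct commit {
	const char *name;
	const struct commit *parent;      /* single parent only, for brevity */
	const struct commit *picked_from; /* hypothetical cherry-pick link */
};

/* Can `target` be reached by following parents from `from`? */
static int reaches(const struct commit *from, const struct commit *target)
{
	for (; from; from = from->parent)
		if (from == target)
			return 1;
	return 0;
}

/*
 * Collect the commits between fork (exclusive) and tip (inclusive) that
 * still need replaying onto `onto`, oldest first.  Commits whose
 * picked_from is already reachable from `onto` are skipped.
 */
static size_t commits_to_replay(const struct commit *tip,
				const struct commit *fork,
				const struct commit *onto,
				const struct commit **out, size_t max)
{
	const struct commit *c;
	size_t n = 0, i;

	for (c = tip; c && c != fork; c = c->parent) {
		if (c->picked_from && reaches(onto, c->picked_from))
			continue;
		assert(n < max);
		out[n++] = c;
	}
	for (i = 0; i < n / 2; i++) {	/* reverse into oldest-first order */
		const struct commit *tmp = out[i];
		out[i] = out[n - 1 - i];
		out[n - 1 - i] = tmp;
	}
	return n;
}
```

For the graph in the quoted example, with D marked as picked from E and
the rebase target F, the function yields A, B and C and leaves D out.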

Sam.


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Sam Vilain @ 2006-04-25 23:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, jnareb
In-Reply-To: <7v7j5e2jv7.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

>Here is a related but not necessarily competing idle thought.
>
>How about an ability to "attach" arbitrary objects to commit
>objects?  The commit object would look like:
>
>    tree 0aaa3fecff73ab428999cb9156f8abc075516abe
>    parent 5a6a8c0e012137a3f0059be40ec7b2f4aa614355
>    parent e1cbc46d12a0524fd5e710cbfaf3f178fc3da504
>    related a0e7d36193b96f552073558acf5fcc1f10528917 key
>    related 0032d548db56eac9ea09b4ba05843365f6325b85 cherrypick
>    author Junio C Hamano <junkio@cox.net> 1145943079 -0700
>    committer Junio C Hamano <junkio@cox.net> 1145943079 -0700
>  
>

I agree with the criticisms of the patchset, and I think this is
probably a more comprehensive and less ambiguous solution. I originally
thought that the use cases were close enough together that they could be
called the same thing, but I see now that they are not.

IMHO one important goal is to stop "parent" from meaning anything other
than:

1. for a regular commit, the base for this change. The change consists
of the differences between the two trees.
2. for a "merge", the merge parents for this change. The change consists
of all differences between the index merges (allowing duplicate blobs at
each location) and the final merged tree.

If you were to, for a moving merge head, just record the previous merge
as a "parent", then it would make it difficult to look at the commit
history to figure out which parent links represent the last merge, and
which represent the merge bases.

This suggestion fixes that problem nicely, while being nice and flexible
for solving the other problems too.

>    Merge branch 'pb/config' into next
>
>    * pb/config:
>      Deprecate usage of git-var -l for getting config vars list
>      git-repo-config --list support
>
>The format of "related" attribute is, keyword "related", SP, 40-byte
>hexadecimal object name, SP, and arbitrary sequence of bytes
>except LF and NUL.  Let's call this arbitrary sequence of bytes
>"the nature of relation".
>
>The semantics I would attach to these "related" links are as
>follows:
>
> * To the "core" level git, they do not mean anything other than
>   "you must to have these objects, and objects reachable from
>   them, if you are going to have this commit and claim your
>   repository is without missing objects".
>  
>

This is essentially correct, however you have already described a use
case where you want the behaviour to be to lose the previous commit chain:

>The reason I do not include the previous head when I reconstruct
>"pu" is because I explicitly *want* to drop history -- not
>having to carry forward a failed experiment is what is desired
>there.  Otherwise I would manage "pu" just like I currently do
>"next" and "master".  So this is not a justification to add
>something new.
>  
>

In this case, I think that there are types of relations that are more
along the lines of "don't bother following this link by default, but
warn/fail if it is unavailable depending on the user preferences".

git-fsck could then have options to prune (or archive) certain types of
optional relations. This way people can still record complete history if
they like. And people who want to mark portions of history as bad (such
as, violating copyright law) have a clear way to state that intent.

>That means "git-rev-list --objects" needs to list these objects
>(and if they are tags, commits, and trees, then what are
>reachable from them), and "git-fsck" needs to consider these
>related objects and objects reachable from them are reachable
>from this commit.  NOTHING ELSE NEEDS TO BE DONE by the core
>(obviously, cat-file needs to show them, and commit-tree needs to
>record them, but that goes without saying).
>  
>

Ok, I'll investigate that.
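
As a starting point, the "related" line format quoted earlier (keyword,
SP, 40-byte hexadecimal object name, SP, then the "nature of relation")
could be parsed roughly like this.  `struct related_link` and
`parse_related()` are hypothetical names for this sketch; nothing like
them exists in git today, and the input is assumed to be a NUL-terminated
line without its trailing LF.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

struct related_link {
	char object[41];	/* 40 hex digits plus terminating NUL */
	const char *nature;	/* points into the parsed line */
};

/* Returns 0 on success, -1 if the line is not a well-formed "related". */
static int parse_related(const char *line, struct related_link *out)
{
	static const char keyword[] = "related ";
	const size_t klen = sizeof(keyword) - 1;
	size_t i;

	assert(line && out);
	if (strncmp(line, keyword, klen) != 0)
		return -1;
	line += klen;
	for (i = 0; i < 40; i++) {
		if (!isxdigit((unsigned char)line[i]))
			return -1;	/* also stops at a premature NUL */
		out->object[i] = line[i];
	}
	out->object[40] = '\0';
	if (line[40] != ' ')
		return -1;
	if (strchr(line + 41, '\n'))
		return -1;		/* LF is not allowed in the nature */
	out->nature = line + 41;
	return 0;
}
```

Fed the first "related" line from the quoted commit object, this yields
the object name and the nature "key"; a "parent" line is rejected.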

>Then porcelains can agree on what different kinds of nature of
>relation mean and do sensible things.  The earlier "omit the
>cherry-picked ones" example I gave can examine "cherrypick".
>  
>

Sounds good. Let things evolve.

Sam.


* Re: maintenance of cache-tree data
From: Junio C Hamano @ 2006-04-25 23:05 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds
In-Reply-To: <7vk69e61s4.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> Well, I was blind ;-).  As long as the whole-file SHA1 matches,
> read_cache() does not care if we have extra data after the
> series of active_nr cache entry data in the index file.
>
> I'm working on a patch now.

So I did.

There is one bad thing; so far "write-tree" was a read-only
consumer of the index file, but now it primes the cache-tree
structure and needs to update the index.  But that is minor.

While I was at it, I made this "stuffing extra cruft in the
index" slightly more generic than I needed it for this
particular application.  What I see this _might_ be useful for
are:

 - We would want to store which commit of a subproject a
   particular subdirectory came from.  This was one missing
   piece from the "bind commit" proposal that wasn't implemented
   in the jc/bind branch.

 - We might want to record "at this path there is a directory,
   albeit empty"; this cannot be expressed with a usual index
   entry.

   We might be able to use cache-tree for that, but I think this
   is something different at the logical level.  While
   cache-tree is to be fully populated (by write-tree and
   perhaps read-tree later) and invalidated partially when
   update-index and friends smudge part of the tree, this is not
   something we would want to even invalidate (IOW, it should
   always be up-to-date), so they serve different purposes.


I still haven't looked at the read-tree yet, but as I outlined
in a previous message, its intra-index merge could take
advantage of cache-tree.  "diff-index", especially "--cached"
kind, also could use it to skip unchanged subtrees altogether.


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Jason Riedy @ 2006-04-25 22:17 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e2lrk5$ed5$1@sea.gmane.org>

And Jakub Narebski writes:
 - I don't mean we shouldn't define semantics for each use of the "related" or
 - "note" header, just as email X-* headers have a detailed form and semantics
 - (long, long ago Sender was X-Sender, for example ;-). It's just a
 - toolkit.

You just proved Linus's point.  Ever have to parse
archives of old mail?  There are many different ways
of saying the same thing, and many of the same ways
of saying different things.  It's pure hell.

And people expect you to get the X-* headers correct
for whatever definition of correct they happen to have
at the moment.  ugh.  You have many de-facto semantics
for the same headers, and no way to disambiguate them.

People will need to parse and understand git archives
thirty+ years from now.  Don't place this curse on
them.

Jason


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Linus Torvalds @ 2006-04-25 19:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vodypv3gz.fsf@assigned-by-dhcp.cox.net>



On Tue, 25 Apr 2006, Junio C Hamano wrote:
> >
> > Sure it does. It's an integral part of logging: we not only verify the 
> > format, we also have multiple different ways of showing it. So it 
> > definitely changes the way we "act", very fundamentally.
> 
> Unfair ;-).  I'd consider "git log" semi-Porcelain and consider
> rev-list and cat-file the true core level.

Well, "git log" is really just "git-rev-list --pretty", so whichever way 
you turn, it's there.

I come from a slightly different background, where "core git" in many ways 
originally was about "what I use" and the whole "porcelain" side ends up 
being "what people who need hand-holding use" ;)

Of course, it expanded a bit from that original definition ;)

		Linus


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Junio C Hamano @ 2006-04-25 19:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604251233340.3701@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> On Tue, 25 Apr 2006, Junio C Hamano wrote:
>> 
>> Then we should drop the author header and make it part of free
>> form text.  The core does not give any meaning to it.
>
> Sure it does. It's an integral part of logging: we not only verify the 
> format, we also have multiple different ways of showing it. So it 
> definitely changes the way we "act", very fundamentally.

Unfair ;-).  I'd consider "git log" semi-Porcelain and consider
rev-list and cat-file the true core level.

But you already made it clear that you are not opposed to 'note'
with a clear semantics "we _ignore_ it", the point was moot.

Sorry for the noise.


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Linus Torvalds @ 2006-04-25 19:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vslo1v4zw.fsf@assigned-by-dhcp.cox.net>



On Tue, 25 Apr 2006, Junio C Hamano wrote:
> 
> Then we should drop the author header and make it part of free
> form text.  The core does not give any meaning to it.

Sure it does. It's an integral part of logging: we not only verify the 
format, we also have multiple different ways of showing it. So it 
definitely changes the way we "act", very fundamentally.

		Linus


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Junio C Hamano @ 2006-04-25 19:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604251155530.3701@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> And the rule is: git cares about the commit header, but not about the 
> free-form. Which means that anything it doesn't care about, it goes into 
> the free-form section, not into some "X-header" section.
>
> Whatever you build on TOP of git can have its own rules in that free-form 
> section. For example, the kernel project has this "X-header" thing called 
> the "sign-off", and git itself picked it up. There's even some support to 
> add it automatically to commits (the same way we add the "revert" info 
> automatically to commits), but nobody claims that git should "parse" that 
> information, or that it should be part of the "header".

Then we should drop the author header and make it part of free
form text.  The core does not give any meaning to it.  And the
name <email> part of the commit header as well.  The only thing
used by the core is the timestamp of the commit.

My initial 'related' without 'note' was flawed - it used
cherry-pick as an example of 'related' when it clearly should
have been 'note' (no connectivity required).

Having said what I wanted to say about 'note', let's clarify
what I have in mind about the 'related' that _means_
connectivity.  As I said, I am far less convinced it is a good
thing than I am about 'note' by now, but just for the sake of
completeness of the discussion.

I tend to agree with you that ability to misuse 'related' (I'd
call it 'link' to make it clear that it means connectivity) to
fetch/push "related" objects, with an unclear definition of
related-ness, is a bad thing.  Even if we fetched the objects
that are claimed to be related to the main project, if we do not
know what to do with them, it is not useful.

And for well defined connectivity, we could give separate names,
just like we have 'tree' and 'parent' in the commit header.
That's how "bind commit" was initially proposed.  It was not
'link bind'.

The suggestion of 'link bind' came primarily from the pain I
experienced when I taught rev-list --objects and fsck-objects
about it in the jc/bind branch.  If the only thing asked of the
core by 'link' is to make sure the related objects are made
available, and Porcelains take responsibility after they are
made available, we would be better off teaching the commit
parser how to parse 'link' (regardless of its nature of linkage)
and teach rev-list --objects and fsck-objects to do connectivity
just once, rather than adding 'bind' now and then having to do
the same backward incompatible change when adding something else
that requires connectivity.

There definitely needs to be an ability to specify a list of
"nature of links this repository accepts", if we were to do
'link'.  It probably should default to an empty set.  rev-list
--objects would include objects pointed to by 'link' only when the
repository wants such links to be honored.  fsck-objects will
declare an object that is reachable only by a 'link' that is not
accepted by the repository "uninteresting" and let git-prune
remove it.
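
A minimal sketch of such an accept list, assuming it is held as a plain
array of strings; `struct link_policy` and `link_is_honored()` are
hypothetical names, and how the list would be configured is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct link_policy {
	const char *const *accepted;	/* natures this repository honors */
	size_t nr;			/* 0 by default: honor nothing */
};

/* Should rev-list --objects / fsck-objects follow a link of this nature? */
static int link_is_honored(const struct link_policy *policy,
			   const char *nature)
{
	size_t i;

	assert(policy && nature);
	for (i = 0; i < policy->nr; i++)
		if (!strcmp(policy->accepted[i], nature))
			return 1;
	return 0;
}
```

With the default empty set no link is followed, so an object reachable
only through an unaccepted 'link' would count as uninteresting.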


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Linus Torvalds @ 2006-04-25 19:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vr73lwkdt.fsf@assigned-by-dhcp.cox.net>



On Tue, 25 Apr 2006, Junio C Hamano wrote:
> 
> Actually, it does help Porcelain to be able to mark unrelated
> crud as 'note'. 

A "note" header that explicitly has no meaning _what-so-ever_ for git 
would be fine. Then the semantics are well-defined, and they really do 
boil down to: random strings that git will ignore, and that won't normally 
be shown by "git log".

Those are actually real semantics, the same way the current "content" is 
real semantics: we don't care about it at all, and we _guarantee_ that we 
don't care about it.

The problem with the proposed "related" thing was that it was something 
that git was supposed to care about, but since it had no sane semantics, 
there was no way to _make_ git care about it sanely. That was the problem.

So I'm not objecting to adding headers. I'm objecting to adding headers 
that have insane or badly defined semantics where we might be asked to do 
something for them and different versions of git might do different 
things. 

			Linus


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Jakub Narebski @ 2006-04-25 19:00 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0604251151350.3701@g5.osdl.org>

Linus Torvalds wrote:

> 
> 
> On Tue, 25 Apr 2006, Jakub Narebski wrote:
>> 
>> Additionally, in "related" links we require that object exist (core git),
>> regardless of detailed semantics.

And history browsers (gitk, qgit) can use it, drawing a line, regardless of
semantics.

> And as I've now mentioned a hundred times, that's just unacceptable to me.
> No suggested use of this has actually been useful, that I can tell.

I don't mean we shouldn't define semantics for each use of the "related" or
"note" header, just as email X-* headers have a detailed form and semantics
(long, long ago Sender was X-Sender, for example ;-). It's just a
toolkit.

As for the suggested "related" headers (those requiring the object to
exist): "bind", "prior", and perhaps "revert".

-- 
Jakub Narebski
Warsaw, Poland


* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Junio C Hamano @ 2006-04-25 19:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604251125010.3701@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> On Tue, 25 Apr 2006, sean wrote:
>
>> On Tue, 25 Apr 2006 11:08:31 -0700 (PDT)
>> Linus Torvalds <torvalds@osdl.org> wrote:
>> 
>> > Which is exactly what I told you to do. Just don't make it a git header. 
>> 
>> Well I just don't see how making it a header, or plopping it at the
>> end of a commit message makes an iota of difference to git, while it 
>> can help porcelain.
>
> It can't help porcelain.
>
> If we have undefined or bad semantics for it, the only thing it can do is 
> _hurt_ porcelain, because it will cause confusion down the line.
>
> Semantics for data objects are _the_ most important part of a SCM. Pretty 
> much any project, in fact. 
>
> And bad or weakly defined semantics will invariably cause problems later.
>
>> But that's exactly the point, it's no different than extending git to be
>> able to store more than one comment.
>
> So why argue for it?
>
> Just use the existing comment field.

Actually, it does help Porcelain to be able to mark unrelated
crud as 'note'.  Sane people (including git barebone
Porcelainish) would just ignore it.  Unless --pretty=raw is used
the 'note' headers will not be shown.  It would unclutter
things for us.

If different Porcelains use "the existing comment field" by
defining certain mark-up to embed their own data, it has the
same "weak semantics causing confusion down the line" issue,
_and_ the crud will be shown to the end user by "git log".

So I am starting to be actually in favor of the 'note' header.

Earlier somebody wondered if that has impact on merge semantics.
I think we do _not_ care.  The core level does not track how
things changed (the operation to make preimage to postimage),
but tracks what the results of changes are (the content).

Some "misguided" set of Porcelains may come up with a convention
to record renames and token-replaces in the 'note' header to
say:

	tree 0000000000000000000000000000000000000000
	parent 0000000000000000000000000000000000000000
	author A U Thor <author@example.com> 000000000 +0000
	committer C O Mitter <comitter@example.com> 000000000 +0000
	note rename hello.c world.c
	note token-replace s/cache/index/

	Replaced old nomenclature 'cache' to 'index'.  Oh, while
	at it, I renamed hello.c to world.c.

But unlike systems that record the transformation from preimage
to postimage, we record the postimage (on the "tree" header) and
the preimage (by way of the "parent" header).  We (as the core
and Porcelain that do not use "note") do not even need to look
at what 'note' says.  The Porcelains that _do_ look at the note
may try to take advantage of it, and if they produce a better
result that would be a good thing.  I suspect such a 'note rename'
provided by the end user is not trustworthy at times, so a
Porcelain that relies on it may make silent mismerges.  You may
claim that is the reason why you do not want to pull from a tree
managed with such a Porcelain.
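
To make the "do not even need to look" point concrete, here is a
minimal sketch in Python.  It assumes the proposed 'note' header
existed in the form shown above (it is only a proposal in this
thread); parse_commit and core_view are hypothetical names, not
anything in git.

```python
def parse_commit(raw: str):
    """Split a raw commit body into (headers, message).

    Headers end at the first blank line; everything after it is the
    free-form message.
    """
    head, _, message = raw.partition("\n\n")
    headers = []
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        headers.append((key, value))
    return headers, message


def core_view(headers):
    """Headers a reader that opts out of 'note' acts on.

    'note' lines are skipped without interpreting their values.
    """
    return [(k, v) for k, v in headers if k != "note"]


# The example commit from this message; 'note' is the hypothetical header.
RAW = (
    "tree 0000000000000000000000000000000000000000\n"
    "parent 0000000000000000000000000000000000000000\n"
    "author A U Thor <author@example.com> 000000000 +0000\n"
    "committer C O Mitter <comitter@example.com> 000000000 +0000\n"
    "note rename hello.c world.c\n"
    "note token-replace s/cache/index/\n"
    "\n"
    "Replaced old nomenclature 'cache' to 'index'.\n"
)

headers, message = parse_commit(RAW)
core = core_view(headers)                       # what an opt-out reader uses
notes = [v for k, v in headers if k == "note"]  # what an opt-in Porcelain may read
```

A Porcelain that opts in reads `notes`; everything else works from
`core` and the tree contents alone.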

But at the end of the day what matters is the content, and
people.

You will not be using such a Porcelain yourself, but when you
fetch the above commit, which records its tree and its parents,
git barebone Porcelainish merge will just do what it has always
done, without even looking at 'note'.  It's not like use of
'note' on the other end is forcing you to take a note on them.

Refusing to merge from a tree that is managed with a Porcelain
that uses the information in 'note rename' for its own operation
(maybe because we believe such Porcelain tends to make silent
mismerges more often) does not make much more sense than
refusing to merge from a tree whose developer uses vi (because
it tends to lose "missing LF at the end of file").  The content
matters, so you would check the merge result; and 'note' thing
is opt-in, which we opt out.

Also you ultimately trust people -- "I will pull from his tree,
because I know he is careful and has good taste".  Now the tool
they use _may_ be part of their taste, but any tool can be
misused (remember you stayed away from pulling things that have
Octopus?)

I am less (a lot less) sure about the 'related' header now,
which will be the topic of a separate message.

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Linus Torvalds @ 2006-04-25 19:00 UTC (permalink / raw)
  To: sean; +Cc: jnareb, git
In-Reply-To: <BAYC1-PASMTP03E0B5376ACFF165B29ED1AEBF0@CEZ.ICE>



On Tue, 25 Apr 2006, sean wrote:
> 
> It's no different for a bug tracker or other 3rd party software that wants
> to interface with git, it's bad design to force them to parse a single
> free form text comment into individual pieces to extract their meta data.
> Especially when git could easily add the ability to add multiple comments
> to each commit.  

Git _does_ make that easy. It's called the "tree". It's where you add any 
arbitrary files to a commit.

The point here is that core git should do one thing, and one thing only. 
You can then build up any policy you want on top of that. But in order for 
core git to be stable, it has to have nice rules about what it cares 
about, and what it does not.

And the rule is: git cares about the commit header, but not about the 
free-form. Which means that anything it doesn't care about, it goes into 
the free-form section, not into some "X-header" section.

Whatever you build on TOP of git can have its own rules in that free-form 
section. For example, the kernel project has this "X-header" thing called 
the "sign-off", and git itself picked it up. There's even some support to 
add it automatically to commits (the same way we add the "revert" info 
automatically to commits), but nobody claims that git should "parse" that 
information, or that it should be part of the "header".
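
As a rough illustration of a convention that lives entirely in the
free-form section, the Python sketch below pulls "Signed-off-by:"
lines out of a commit message.  The function name sign_offs is
hypothetical, and this is only one way a tool built on top of git
might read such lines; core git does not need any of it.

```python
def sign_offs(message: str):
    """Return the identities named on 'Signed-off-by:' lines of a message."""
    prefix = "Signed-off-by:"
    return [
        line[len(prefix):].strip()
        for line in message.splitlines()
        if line.startswith(prefix)
    ]


# Example free-form message; the names are placeholders.
MESSAGE = (
    "Replace 'cache' with 'index' in comments\n"
    "\n"
    "Pure nomenclature change, no functional difference.\n"
    "\n"
    "Signed-off-by: A U Thor <author@example.com>\n"
    "Signed-off-by: C O Mitter <committer@example.com>\n"
)

signers = sign_offs(MESSAGE)
```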

		Linus

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Linus Torvalds @ 2006-04-25 18:52 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e2lqf1$a5k$1@sea.gmane.org>



On Tue, 25 Apr 2006, Jakub Narebski wrote:
> 
> Additionally, in "related" links we require that object exist (core git),
> regardless of detailed semantics.

And as I've now mentioned a hundred times, that's just unacceptable to me. 
No suggested use of this has actually been useful, that I can tell.

		Linus

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: sean @ 2006-04-25 18:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: jnareb, git
In-Reply-To: <Pine.LNX.4.64.0604251125010.3701@g5.osdl.org>

On Tue, 25 Apr 2006 11:26:25 -0700 (PDT)
Linus Torvalds <torvalds@osdl.org> wrote:

> It can't help porcelain.
> 
> If we have undefined or bad semantics for it, the only thing it can do is 
> _hurt_ porcelain, because it will cause confusion down the line.
> 
> Semantics for data objects are _the_ most important part of a SCM. Pretty 
> much any project, in fact. 
> 
> And bad or weakly defined semantics will invariably cause problems later.

Take your example of how git-revert works today: it copies the comment from 
the original, thus keeping this semantic-free meta-data intact between
related commits.  However, you'd have to jump through hoops to accomplish
this same simple task with any third party meta data, unless it was 
buried inside the commit message text.
 
> So why argue for it?
> 
> Just use the existing comment field.

The last argument you and I had was me taking the other side, saying that 
it was fine for git to parse the free form text area to extract information; 
you rightfully showed me why that was wrong.

It's no different for a bug tracker or other 3rd party software that wants
to interface with git; it's bad design to force them to parse a single
free form text comment into individual pieces to extract their meta data.
Especially when git could easily add the ability to add multiple comments
to each commit.  

Sean

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Jakub Narebski @ 2006-04-25 18:41 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0604251125010.3701@g5.osdl.org>

Linus Torvalds wrote:

> So why argue for it?
> 
> Just use the existing comment field.

For the same reason there exist X-* _header_ fields in email.

Additionally, in "related" links we require that object exist (core git),
regardless of detailed semantics.

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Jakub Narebski @ 2006-04-25 18:34 UTC (permalink / raw)
  To: git
In-Reply-To: <BAYC1-PASMTP04D82622D9D5DA7E352079AEBF0@CEZ.ICE>

sean wrote:

> On Tue, 25 Apr 2006 11:08:31 -0700 (PDT)
> Linus Torvalds <torvalds@osdl.org> wrote:
> 
>> Which is exactly what I told you to do. Just don't make it a git header.
> 
> Well I just don't see how making it a header, or plopping it at the
> end of a commit message makes an iota of difference to git, while it 
> [storing information in X-* like header] can help porcelain.

And [graphical] history browsers like gitk or qgit.

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Linus Torvalds @ 2006-04-25 18:26 UTC (permalink / raw)
  To: sean; +Cc: jnareb, git
In-Reply-To: <BAYC1-PASMTP04D82622D9D5DA7E352079AEBF0@CEZ.ICE>



On Tue, 25 Apr 2006, sean wrote:

> On Tue, 25 Apr 2006 11:08:31 -0700 (PDT)
> Linus Torvalds <torvalds@osdl.org> wrote:
> 
> > Which is exactly what I told you to do. Just don't make it a git header. 
> 
> Well I just don't see how making it a header, or plopping it at the
> end of a commit message makes an iota of difference to git, while it 
> can help porcelain.

It can't help porcelain.

If we have undefined or bad semantics for it, the only thing it can do is 
_hurt_ porcelain, because it will cause confusion down the line.

Semantics for data objects are _the_ most important part of a SCM. Pretty 
much any project, in fact. 

And bad or weakly defined semantics will invariably cause problems later.

> But that's exactly the point, it's no different than extending git to be
> able to store more than one comment.

So why argue for it?

Just use the existing comment field.

		Linus

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Jakub Narebski @ 2006-04-25 18:24 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0604251058490.3701@g5.osdl.org>

Linus Torvalds wrote:

> On Tue, 25 Apr 2006, Linus Torvalds wrote:
>> 
>> The "track it with pull/push" thing is NOT one such thing, however much
>> you protest. We already _have_ that thing. It's called a "ref", and it's
>> really really easy to create anywhere in .git/refs/, and the tools
>> already know how to use it.

I agree(d) that tracking pull/push with extra commit header fields is not a
good example.
 
> Btw, there are other cases for that. For example, "parent" is a
> well-specified thing that actually has very clear and unambiguous meaning.

In the single-parent case, "parent" means that we modified the tree pointed
to by the parent. The multiple-parent case suggests that we combined the
trees pointed to by the parents, most probably by a merge. I'd rather we not
use parent for anything else.

> And we had a much better proposals (in the sense that it had real
> suggested _meaning_ and semantics) over the last few months for things
> like sub-projects (trees that point to other commits)

Wasn't it commits pointing to other trees (or to commits)? The "bind" field
proposal suggests so. And it could be implemented using 'X-*'-like "related"
headers in the commit.

   related a0e7d36193b96f552073558acf5fcc1f10528917 bind linux-2.6

vs. proposed

   bind f6a8248420395bc9febd66194252fc9957b0052d linux/
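
A small Python sketch of how a reader might treat the two spellings
alike. Both header forms are only proposals from this thread, and
parse_link is a hypothetical helper name:

```python
import re

# A full 40-hex-digit SHA-1 object name.
SHA1 = re.compile(r"^[0-9a-f]{40}$")


def parse_link(line: str):
    """Return (kind, sha, arg) for either proposed spelling, else None.

    'related <sha> <kind> <arg>' is the generic form;
    'bind <sha> <arg>' is the dedicated form.
    """
    fields = line.split()
    if len(fields) == 4 and fields[0] == "related":
        _, sha, kind, arg = fields
    elif len(fields) == 3 and fields[0] == "bind":
        kind, sha, arg = fields
    else:
        return None
    if not SHA1.match(sha):
        return None
    return kind, sha, arg


generic = parse_link("related a0e7d36193b96f552073558acf5fcc1f10528917 bind linux-2.6")
dedicated = parse_link("bind f6a8248420395bc9febd66194252fc9957b0052d linux/")
```

With the generic form, a tool that does not know the kind can still
find the object name in the second field and check that the object
exists.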

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: RFC: New diff-delta.c implementation
From: Rene Scharfe @ 2006-04-25 18:22 UTC (permalink / raw)
  To: Geert Bosch; +Cc: Git Mailing List, Junio C Hamano
In-Reply-To: <20060424025741.GA636@adacore.com>

Geert Bosch schrieb:
> On Sat, Apr 22, 2006 at 02:36:04PM +0200, Rene Scharfe wrote:
>> You can use "indent -npro -kr -i8 -ts8 -l80 -ss -ncs" to reformat your
>> code into a similar style as used in the rest of git (settings taken
>> from Lindent which is shipped with the Linux source).
> Although I cringe at 8-space indenting, and find much of the GIT
> code close to unreadable for lack of design-level comments, I'll
> gladly reformat any code to conform to existing code standards.
> Please let me know if you've got documentation on that, as it would
> be helpful for me to know what the standard is. (No flame intended. :-)

I'm not aware of a document mandating a certain formatting.  The output
of that indent call should come close to a "standard format", because
Linus followed this style from the beginning and Junio didn't go astray.

Don't worry too much about it.  I just wanted to point out an easy way
to reformat your code to use sane indenting. :->

René

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: sean @ 2006-04-25 18:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: jnareb, git
In-Reply-To: <Pine.LNX.4.64.0604251106400.3701@g5.osdl.org>

On Tue, 25 Apr 2006 11:08:31 -0700 (PDT)
Linus Torvalds <torvalds@osdl.org> wrote:

> Which is exactly what I told you to do. Just don't make it a git header. 

Well I just don't see how making it a header, or plopping it at the
end of a commit message makes an iota of difference to git, while it 
can help porcelain.

> We do that already. Look at "git revert". Ooh. Aah. It works today.

Nice.  Gotta love git.
 
> Just don't make it something that changes semantics, and that git parses 
> and "understands". Because git clearly doesn't understand it at all, since 
> you didn't define it to have any meaning that _can_ be understood.

But that's exactly the point, it's no different than extending git to be
able to store more than one comment.   Comment1 Comment2 Comment3.  
Pure content that git need not give any semantic meaning.  Git has a 
limitation of only a single comment today, there's no semantic damage
to extending git to allow multiple comments.   And there are a few 
applications, like bug tracking etc, which could use such a feature 
to good effect.

Sean

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Linus Torvalds @ 2006-04-25 18:08 UTC (permalink / raw)
  To: sean; +Cc: jnareb, git
In-Reply-To: <BAYC1-PASMTP091348C4C33C5A0E83C012AEBF0@CEZ.ICE>



On Tue, 25 Apr 2006, sean wrote:

> On Tue, 25 Apr 2006 10:11:13 -0700 (PDT)
> Linus Torvalds <torvalds@osdl.org> wrote:
> 
> > Once you start adding data that has no clear semantics, you're screwed. At 
> > that point, it's a "track guesses" game, not a "track contents" game.
> 
> Then shouldn't Git stop tracking commit comments; they're just developer
> guesses. ;o)

No, they are pure content, and git doesn't actually give them any semantic 
meaning.

WHICH IS OK. I even suggested that you put this thing into that "pure 
content" part.

> Adding a free-form header is no different than adding a few more lines 
> of free form text at the bottom of the commit message, in neither case 
> does it change the nice clean git semantics.

Which is exactly what I told you to do. Just don't make it a git header. 

We do that already. Look at "git revert". Ooh. Aah. It works today.

Just don't make it something that changes semantics, and that git parses 
and "understands". Because git clearly doesn't understand it at all, since 
you didn't define it to have any meaning that _can_ be understood.

		Linus

^ permalink raw reply

