git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 2/2] Add keyword unexpansion support to convert.c
@ 2007-04-17  9:41 Andy Parkins
       [not found] ` <200704171803.58940.an dyparkins@gmail.com>
                   ` (4 more replies)
  0 siblings, 5 replies; 66+ messages in thread
From: Andy Parkins @ 2007-04-17  9:41 UTC (permalink / raw)
  To: git

This patch adds expansion of keywords support.  The unexpansion is only
performed when the "keywords" attribute it found for a file.  The check
for this attribute is done in the same way as the "crlf" attribute
check.

The actual unexpansion is performed by keyword_unexpand_git() which is
called from convert_to_git() when the "keywords" attribute is found.

keyword_unexpand_git() finds strings of the form

 $KEYWORD: ARBITRARY STRING$

And collapses them into

 $KEYWORD:$

No parsing of the keyword itself is performed, the content is simply
dropped.

Despite the fact that this doesn't do anything useful from the users
perspective, this patch forms the more important half of keyword
expansion support - because it prevents the expansion from entering the
repository.  It effectively creates blind spots that git tools won't
see.

convert_to_git() has also been changed so that it no longer only does
CRLF conversion.  Instead, a flag is kept to say whether any conversion
was done by the CRLF code, and then that converted buffer is passed to
keyword_unexpand_git() and the flag again updated.  It then returns 1 if
either of these conversion functions actually changed anything.

I've also included a test script to show that the keyword unexpansion is
working.  It particular demonstrates that the diff between a file with
keywords and the repository is blind to the expanded keyword.

Signed-off-by: Andy Parkins <andyparkins@gmail.com>
---
I'm not submitting this for application; I've not polished it, and I've not
written the expansion half yet.

However, I did want to show what I've been banging on about with keyword
expansion, and this does a reasonable job.  The test code shows that the idea
is sound - what goes in the repository is stable, and what appears in the
working directory can contain any arbitrary keyword expansion.

Adding expansion is harder, as I have no clue which calls to make to find
even the most basic information about an object; but I thought I'd get
feedback before I expend that effort.

Areas that might cause problems are the git-apply type of commands, I haven't
checked to see if they use convert_to_git() on their input to normalise it
for entry into the repository.  I hope so, as the CRLF support relies on it as
well :-)

 convert.c           |  115 +++++++++++++++++++++++++++++++++++++++++++++++++-
 t/t0030-keywords.sh |   76 +++++++++++++++++++++++++++++++++
 2 files changed, 188 insertions(+), 3 deletions(-)
 create mode 100755 t/t0030-keywords.sh

diff --git a/convert.c b/convert.c
index d0d4b81..0c7b270 100644
--- a/convert.c
+++ b/convert.c
@@ -230,16 +230,125 @@ static int git_path_check_crlf(const char *path)
 	return attr_crlf_check.isset;
 }
 
+/* ------------------ keywords -------------------- */
+
+static void setup_keyword_check(struct git_attr_check *check)
+{
+	static struct git_attr *attr_keyword;
+
+	if (!attr_keyword)
+		attr_keyword = git_attr("keywords", 8);
+	check->attr = attr_keyword;
+}
+
+static int git_path_check_keyword(const char *path)
+{
+	struct git_attr_check attr_keyword_check;
+
+	setup_keyword_check(&attr_keyword_check);
+
+	if (git_checkattr(path, 1, &attr_keyword_check))
+		return -1;
+	return attr_keyword_check.isset;
+}
+
+static int keyword_unexpand_git(const char *path, char **bufp, unsigned long *sizep)
+{
+	char *buffer, *nbuf, *keyword;
+	unsigned long size, keywordlength;
+	int changes = 0;
+	enum {
+		IN_VOID,
+		PRE_KEYWORD,
+		IN_KEYWORD,
+		IN_EXPANSION,
+		END_KEYWORD
+	} parser_state = IN_VOID;
+
+	size = *sizep;
+	if (!size)
+		return 0;
+	buffer = *bufp;
+
+	/*
+	 * Allocate an identically sized buffer, keyword unexpansion can
+	 * only reduce the size so we'll never overflow (although we might
+	 * waste a few bytes
+	 */
+	nbuf = xmalloc(size);
+	*bufp = nbuf;
+
+	while (size) {
+		unsigned char c;
+
+		c = *buffer;
+
+		switch( parser_state ) {
+		case IN_VOID:        /* Normal characters, wait for '$' */
+			if (c == '$')
+				parser_state = PRE_KEYWORD;
+			break;
+		case PRE_KEYWORD:    /* Gap between '$' and keyword */
+			keywordlength = 0;
+			keyword = buffer;
+			if (!isspace(c))
+				parser_state = IN_KEYWORD;
+			else
+				break;
+		case IN_KEYWORD:     /* Keyword itself */
+			if (c == ':')
+				parser_state = IN_EXPANSION;
+			else if (c == '$' || c == '\n' || c == '\r')
+				parser_state = END_KEYWORD;
+			else
+				keywordlength++;
+			break;
+		case IN_EXPANSION:   /* The expansion gets silently removed */
+			if (c == '$' || c == '\n')
+				parser_state = END_KEYWORD;
+			else {
+				changes = 1;
+				/* Every character we skip reduces the overall size */
+				(*sizep)--;
+				buffer++;
+				size--;
+			}
+			continue;
+		case END_KEYWORD:    /* End of keyword */
+			parser_state = IN_VOID;
+			break;
+		}
+
+		*nbuf++ = c;
+		buffer++;
+		size--;
+	}
+
+	return (changes != 0);
+}
+
+
+/* ------------------------------------------------ */
 int convert_to_git(const char *path, char **bufp, unsigned long *sizep)
 {
+	int changes = 0;
+
 	switch (git_path_check_crlf(path)) {
 	case 0:
-		return 0;
+		changes += 0;
 	case 1:
-		return forcecrlf_to_git(path, bufp, sizep);
+		changes += forcecrlf_to_git(path, bufp, sizep);
 	default:
-		return autocrlf_to_git(path, bufp, sizep);
+		changes += autocrlf_to_git(path, bufp, sizep);
+	}
+
+	switch (git_path_check_keyword(path)) {
+	case 0:
+		changes += 0;
+	case 1:
+		changes += keyword_unexpand_git(path, bufp, sizep);
 	}
+	return (changes != 0);
 }
 
 int convert_to_working_tree(const char *path, char **bufp, unsigned long *sizep)
diff --git a/t/t0030-keywords.sh b/t/t0030-keywords.sh
new file mode 100755
index 0000000..375acb8
--- /dev/null
+++ b/t/t0030-keywords.sh
@@ -0,0 +1,76 @@
+#!/bin/sh
+
+cd $(dirname $0)
+
+test_description='Keyword expansion'
+
+. ./test-lib.sh
+
+# Adding the attribute "keywords" turns the keyword expansion on
+# I've used "notkeywords" as an attribute as a placeholder attribute
+# but this is just "somerandomattribute", it has no meaning
+
+# Expect success because the keyword attribute should be found
+test_expect_success 'Keywords attribute present' '
+
+	echo "keywordsfile keywords" >.gitattributes &&
+
+	echo "\$keyword: anythingcangohere\$" > keywordsfile &&
+
+	git add keywordsfile &&
+	git add .gitattributes &&
+	git commit -m test-keywords &&
+
+	git check-attr keywords -- keywordsfile
+'
+
+# Expect failure because the repository version should be different from the
+# working tree version.
+#
+#  In repository : $keyword:$
+#  In working dir: $keyword: anythingcangohere$
+#
+test_expect_failure 'Keywords unexpansion active' '
+
+	git show HEAD:keywordsfile > keywordsfile.cmp &&
+	cmp keywordsfile keywordsfile.cmp
+
+'
+
+# expect success because we want to find the keyword line unexpanded in the
+# and hence appearing unchanged in the output of git-diff
+test_expect_success 'git-diff with keywords present' '
+	echo "Non-keyword containing line" >> keywordsfile &&
+	git diff -- keywordsfile | grep -qs "^ \$keyword:\$$"
+'
+
+# Expect failure because the keywords attribute should NOT be found
+test_expect_failure 'Keywords attribute absent' '
+
+	echo "keywordsfile notkeywords" >.gitattributes &&
+
+	git add .gitattributes &&
+	git commit -m test-not-keywords &&
+
+	git check-attr keywords -- keywordsfile
+
+'
+
+# If keywords are later disabled on that file, then the keyword unexpansion
+# will be ignored, so a diff should now show differences, because git is no
+# longer keyword blind
+test_expect_success 'git-diff with keywords in file but disabled' '
+	git diff -- keywordsfile | grep -qs "^diff"
+'
+
+# Expect success because the repository should be identical to the working tree
+test_expect_success 'Keywords unexpansion inactive' '
+
+	git add keywordsfile &&
+	git commit -m "test-not-keywords"
+
+	git show HEAD:keywordsfile > keywordsfile.cmp &&
+	cmp keywordsfile keywordsfile.cmp
+'
+
+test_done
-- 
1.5.1.1.821.g88bdb

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17  9:41 [PATCH 2/2] Add keyword unexpansion support to convert.c Andy Parkins
       [not found] ` <200704171803.58940.an dyparkins@gmail.com>
@ 2007-04-17 10:09 ` Junio C Hamano
  2007-04-17 11:35   ` Andy Parkins
  2007-04-17 15:46   ` Linus Torvalds
  2007-04-17 10:41 ` Johannes Sixt
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 66+ messages in thread
From: Junio C Hamano @ 2007-04-17 10:09 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

Andy Parkins <andyparkins@gmail.com> writes:

> No parsing of the keyword itself is performed, the content is simply
> dropped.

You are sidestepping the most important problem by doing this.

The only sensible keyword you could have, without destroying
what git is, is blob id.  No commit id, no date, no author.

In http://article.gmane.org/gmane.comp.version-control.git/44654,
Linus said:

    I'll finish off trying to explain the problem in fundamental git terms: 
    say you have a repository with two branches, A and B, and different 
    history  on a file "xyzzy" in those two branches, but because they both 
    ended up applying the same patches, the actual file contents do end up 
    being 100% identical. So they have the same SHA1.

    What is

            git diff A..B -- xyzzy

    supposed to print?

    And *I* claim that if you don't get an immediate and empty diff, your 
    system is TOTALLY BROKEN.

Another thing he could have said is this:

	When you have such two branches, A and B, and you are on
	branch A:

	$ git checkout B

	should be immediate and instantaneous.

If you try to keyword expand commit id, date or anything that is
sensitive to *how* you got there, even though A and B have the
exact same set of blobs, you have to essentially update all of
them.  Computing what to expand to takes (perhaps prohibitively
expensive) time, but more importantly rewriting the whole 20k
(or howmanyever you have in your project) files out becomes
necessary, if your keyword expansion wants to say "oh, this file
was taken from a checkout of branch B", for obvious reasons.

Keyword expanding blob-id, or munging line-endings to CRLF form
on platforms that want it, do not have this problem, as how you
reached to the blob content does not affect the result of
expansion, therefore not just the blobs in commit A and commit B
but the working tree checked out of them must match with each
other.

Having reiterated what Linus already said why keyword expansion
and git are not friendly with each other (perhaps the reason is
because the former is stupid and git is smart), I'd try to be a
bit constructive and point out the areas you _could_ help with
in the nearby codepaths:

 * When 'diff' borrows from the working tree because the
   filesystem data matches the blob we are interested in, we
   already have a call to convert_to_git().  The diff machinery
   operates on the canonicalized representation (i.e. this is an
   area we do not need help from you). 

 * When 'checkout', 'read-tree -u' and 'merge-recursive' write
   things, we already have calls to convert_to_working_tree() to
   munge blob representation to working tree representation
   (i.e. again, this is an area we do not need help from you).

 * We do not do the borrowing from working tree when doing
   grep_sha1(), but when we grep inside a file from working tree
   with grep_file(), we do not currently make it go through
   convert_to_git() to fix line endings.  Maybe we should, if
   only for consistency.

 * We do not currently run convert_to_git() on the patch text
   given to git-apply; we could do so in parse_single_patch().

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17  9:41 [PATCH 2/2] Add keyword unexpansion support to convert.c Andy Parkins
       [not found] ` <200704171803.58940.an dyparkins@gmail.com>
  2007-04-17 10:09 ` Junio C Hamano
@ 2007-04-17 10:41 ` Johannes Sixt
  2007-04-17 15:32 ` Linus Torvalds
  2007-04-17 21:00 ` Matthieu Moy
  4 siblings, 0 replies; 66+ messages in thread
From: Johannes Sixt @ 2007-04-17 10:41 UTC (permalink / raw)
  To: git

Andy Parkins wrote:
>         switch (git_path_check_crlf(path)) {
>         case 0:
> -               return 0;
> +               changes += 0;
>         case 1:
> -               return forcecrlf_to_git(path, bufp, sizep);
> +               changes += forcecrlf_to_git(path, bufp, sizep);
>         default:
> -               return autocrlf_to_git(path, bufp, sizep);
> +               changes += autocrlf_to_git(path, bufp, sizep);
> +       }
> +
> +       switch (git_path_check_keyword(path)) {
> +       case 0:
> +               changes += 0;
> +       case 1:
> +               changes += keyword_unexpand_git(path, bufp, sizep);
>         }

I think there are 'break's missing all along the way.

-- Hannes

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 10:09 ` Junio C Hamano
@ 2007-04-17 11:35   ` Andy Parkins
  2007-04-17 15:53     ` Linus Torvalds
                       ` (2 more replies)
  2007-04-17 15:46   ` Linus Torvalds
  1 sibling, 3 replies; 66+ messages in thread
From: Andy Parkins @ 2007-04-17 11:35 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

On Tuesday 2007 April 17 11:09, Junio C Hamano wrote:

> In http://article.gmane.org/gmane.comp.version-control.git/44654,
> Linus said:
>     And *I* claim that if you don't get an immediate and empty diff, your
>     system is TOTALLY BROKEN.

Well that one is easy - the file is normalised to contain collapsed keywords 
upon checkin, so diff works the same as it ever did.  The output would be 
immediate and empty so is not TOTALLY BROKEN.

> 	$ git checkout B
>
> 	should be immediate and instantaneous.

Now - that's a much better argument.  However, it's not relevant, keywords (in 
other VCSs, and so why not in git) are only updated when a file is checked 
out.  There is no need to touch every file.  It's actually beneficial, 
because the keyword in the file is the state of the file at the time it was 
checked in - which is actually more useful than updating it to the latest 
commit every time.

That means you're only ever expanding in a file that your changing anyway - so 
it's effectively free.  git-checkout would still be immediate and 
instantaneous.

> If you try to keyword expand commit id, date or anything that is
> sensitive to *how* you got there, even though A and B have the
> exact same set of blobs, you have to essentially update all of
> them.  Computing what to expand to takes (perhaps prohibitively
> expensive) time, but more importantly rewriting the whole 20k
> (or howmanyever you have in your project) files out becomes
> necessary, if your keyword expansion wants to say "oh, this file
> was taken from a checkout of branch B", for obvious reasons.

Ignoring the fact that expansion is only when a file is checked out; I'd argue 
that it's your own fault if you enable keyword expansion on twenty thousand 
files.  A lot of the discussion has been about how useless keyword expansion 
is in almost every case.  I only want it for a few files in my repository; so 
am willing to pay the small computing cost.  Obviously keywords would be 
disabled by default - in which case, you get what you deserve if you enable 
them on everything.

Putting my own selfish requirements aside, from a purely "mine is better than 
yours" point of view, git can't do something that CVS (in all it's 
horridness) can.  It's distinctly off-putting to people when they 
say "keyword expansion", that the response is "YOU'RE AN IDIOT - GO AWAY - 
YOU DON'T DESERVE TO USE GIT"; and back they'll scurry to CVS/subversion.

> Keyword expanding blob-id, or munging line-endings to CRLF form
> on platforms that want it, do not have this problem, as how you
> reached to the blob content does not affect the result of
> expansion, therefore not just the blobs in commit A and commit B
> but the working tree checked out of them must match with each
> other.

That's true - however, even if the only keyword git supports is $BlobID$, that 
would address a large proportion of people's needs.  As I said above though, 
the keywords are only expanded on checkout (and checkin to be consistent).

> Having reiterated what Linus already said why keyword expansion
> and git are not friendly with each other (perhaps the reason is
> because the former is stupid and git is smart), I'd try to be a

(This is were my "YOU'RE AN IDIOT - YOU CAN'T USE GIT" alarm goes off).  Git 
is better than CVS/subversion in every respect - save this one.  It's almost 
completely free to do (apart from the initial coding of it of course) because 
of these two factors:
 - The keywords are collapsed in the repository
 - The keywords are only expanded on checkout
It doesn't fundamentally alter anything that git does right now.

>  * We do not do the borrowing from working tree when doing
>    grep_sha1(), but when we grep inside a file from working tree
>    with grep_file(), we do not currently make it go through
>    convert_to_git() to fix line endings.  Maybe we should, if
>    only for consistency.

I'd actually argue not - git-grep searches the working tree.  The expanded 
keywords are in the working tree.  Take the CRLF case - I'm a clueless user, 
who only understands the system I'm working on.  I want to search for all the 
line endings, so I do git-grep "\r\n" - that should work, because I'm 
searching my working tree.

>  * We do not currently run convert_to_git() on the patch text
>    given to git-apply; we could do so in parse_single_patch().

Yep - definitely; the applied patch should certainly be normalised before 
application.  I'd have to add it if I wanted keywords anyway wouldn't I?



Andy
-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17  9:41 [PATCH 2/2] Add keyword unexpansion support to convert.c Andy Parkins
                   ` (2 preceding siblings ...)
  2007-04-17 10:41 ` Johannes Sixt
@ 2007-04-17 15:32 ` Linus Torvalds
  2007-04-17 17:10   ` Andy Parkins
  2007-04-17 17:18   ` Rogan Dawes
  2007-04-17 21:00 ` Matthieu Moy
  4 siblings, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 15:32 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git



On Tue, 17 Apr 2007, Andy Parkins wrote:
> 
> Adding expansion is harder, as I have no clue which calls to make to find
> even the most basic information about an object; but I thought I'd get
> feedback before I expend that effort.

Adding expansion is not just "harder". It's basically impossible to do 
with any kind of performance.

Think "git checkout newbranch".

And think what we do about files (and whole subdirectories!) that haven't 
even changed. And finally, think about how important that optimization is 
in an SCM like git that supports branches.

I think you'll find that keyword expansion simply isn't acceptable.

But hey, you didn't believe me, so I'm happy you are trying to write the 
patches. Either you'll prove me wrong, or you'll realize just *how* broken 
the feature is.

(Yeah, "unexpansion" is easy. It's easy for all the same reasons CRLF is 
easy: it has no state!)

			Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 10:09 ` Junio C Hamano
  2007-04-17 11:35   ` Andy Parkins
@ 2007-04-17 15:46   ` Linus Torvalds
  1 sibling, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 15:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Andy Parkins, git



On Tue, 17 Apr 2007, Junio C Hamano wrote:
>
> Andy Parkins <andyparkins@gmail.com> writes:
> 
> > No parsing of the keyword itself is performed, the content is simply
> > dropped.
> 
> You are sidestepping the most important problem by doing this.

I obviosly agree (and I agree with everything in your email), but:

> The only sensible keyword you could have, without destroying
> what git is, is blob id.  No commit id, no date, no author.

Yes. And I already talked about some of the very fundamental problems that 
keyword expansion has (ie switching branches is basically impossible to do 
without checking out _every_single_file_ with the "keyword" attribute 
set. There are others).

Now, unexpansion is trivial to do (it really *is* the same as the 
"CRLF->LF" translation: that's technically really just an "unexpansion" 
too). And it should work. 

The way this does unexpansion also breaks "git diff" in that it bassically 
always makes diff *ignore* the keywords. In other words, when you do

	git diff A..B

and send the diff to somebody else, they'll never see any keywords at all! 

Now, that obviously fulfills my requirement that the diff be empty if A 
and B are the same, so you should expect me to be happy. But I'm not 
happy, because if the other person also is using git, HE CANNOT EVEN APPLY 
THE DIFF! Even if he's at "A", and thus gets a diff that is supposed to 
apply *exactly*, he'll get rejects if there were other changes around the 
unexpanded keyword (which *he* will have expanded in his working tree, of 
course!)

See? Keywords simply *cannot* work. They're broken. Either you can ignore 
them (and not show them in diffs), in which case the diff is broken, or 
you can not ignore them (and show them in diffs) in which case the diff is 
*also* broken, just differently.

The only sane and workable case is to not have them at all. Any keyword 
expansion will *always* result in problems. You simply cannot do it right. 

As I mentioned originally, it results in problems in CVS too, it's just 
that CVS really has so many other issues that you seldom see the problems.

Ok, after that new rant against keywords, I will say one positive thing:

 - keyword *unexpansion* is certainly easy (exactly because it's 
   stateless)

 - if we want to support a git that only does "unexpansion", you can 
   probably hack around stupid release scripting more easily. You can add 
   your keywords *outside* of git, and git will simply ignore them. 

So I'm actually not against keyword un-expansion. It has none of the 
fundamental problems that actually expanding the keywords has. It's 
literally no different from CRLF->LF translation. It can cause confusion, 
but if it has to be explicitly enabled with an attribute and is never done 
automatically, then having some support for unexpansion and letting the 
user who wants to use keywords use his own "wrapper scripts" around git to 
do his own expansion, be my guest..

You would be unable to do fundamental operations like "git checkout B" to 
jump to another branch, but if you don't support multiple branches and 
want to just act like CVS, maybe git unexpanding the crap will help you: 
you can add your own keywords, happy in the knowledge that git simply 
won't *care* about them, and will never see them.

So I absolutely detest keyword expansion and actually have a lot of 
arguments for why I don't think it *can* work even in theory (except by 
being totally unusable), but I don't have the *un*expansion. 

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 11:35   ` Andy Parkins
@ 2007-04-17 15:53     ` Linus Torvalds
  2007-04-17 17:03       ` Andy Parkins
  2007-04-17 21:24     ` Junio C Hamano
  2007-04-20  0:30     ` Jakub Narebski
  2 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 15:53 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, Junio C Hamano



On Tue, 17 Apr 2007, Andy Parkins wrote:

> On Tuesday 2007 April 17 11:09, Junio C Hamano wrote:
> 
> > In http://article.gmane.org/gmane.comp.version-control.git/44654,
> > Linus said:
> >     And *I* claim that if you don't get an immediate and empty diff, your
> >     system is TOTALLY BROKEN.
> 
> Well that one is easy - the file is normalised to contain collapsed keywords 
> upon checkin, so diff works the same as it ever did.  The output would be 
> immediate and empty so is not TOTALLY BROKEN.

No, it *is* TOTALLY BROKEN, because your keywords guaranteed that it 
doesn't even *apply*.

That's such a fundamental part of a patch that I didn't even _mention_ it, 
but I obviously should have.

If you cannot apply the diff you generate, what the hell is the *point* of 
a diff?

Try this:

 - File-A in revision 1:

	$ID: some random crap about rev1 $
	Line 2

 - same file in revision 2:
	$ID: some other random crap about rev2 $
	Line 2 got modified

and think about it. Your diff will be something like

	@@ -1,2 +1,2 @@
	 $ID:$
	-Line 2
	+Line 2 got modified

and the diff WON'T EVEN APPLY!

What kind of diff is that? Would you call it perhaps "totally broken"?

In other words, there's no way in hell you can make this work. You'll end 
up always having to edit the keywords parts of diffs to make them apply if 
they are part of the context.

(This, btw, is something that a CVS person says "so what?" about. They're 
_used_ to having to do it. It's how you do merges in CVS. Really. How many 
people have actually *worked* with branches in CVS on any complex project 
with any nontrivial work happening on the branch? I have. I hated CVS for 
many reasons. Keywords was just a small small detail in that hate 
relationship, but it was one of them!)

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 15:53     ` Linus Torvalds
@ 2007-04-17 17:03       ` Andy Parkins
  2007-04-17 18:12         ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Andy Parkins @ 2007-04-17 17:03 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Junio C Hamano

On Tuesday 2007, April 17, Linus Torvalds wrote:

> Try this:
>
>  - File-A in revision 1:
>
> 	$ID: some random crap about rev1 $
> 	Line 2
>
>  - same file in revision 2:
> 	$ID: some other random crap about rev2 $
> 	Line 2 got modified
>
> and think about it. Your diff will be something like
>
> 	@@ -1,2 +1,2 @@
> 	 $ID:$
> 	-Line 2
> 	+Line 2 got modified
>
> and the diff WON'T EVEN APPLY!

Why on earth would it not apply?  It's being applied using git-apply, 
which will unexpand the keywords as it goes - as I keep saying.  When 
the apply engine is looking for the context it's going to collapse the 
keyword, so the context will match and the diff WILL EVEN APPLY.

As Junio said in his reply, git-apply doesn't currently call 
convert_to_git(), but that's easily implemented.

> In other words, there's no way in hell you can make this work. You'll

You keep saying these sweepingly general things.  It can be made to 
work.

> end up always having to edit the keywords parts of diffs to make them
> apply if they are part of the context.

No I don't.  If I had to then the keyword code would be broken.  No one 
in their right mind would think that was an acceptable thing to do.

> (This, btw, is something that a CVS person says "so what?" about.
> They're _used_ to having to do it. It's how you do merges in CVS.
> Really. How many people have actually *worked* with branches in CVS

That's because CVS is rubbish.  What has that got to do with it?

> on any complex project with any nontrivial work happening on the
> branch? I have. I hated CVS for many reasons. Keywords was just a
> small small detail in that hate relationship, but it was one of
> them!)

You really can stop trying to persuade me that CVS is no good for 
version control - I agree, a thousand times I agree.  There are a lot 
of things that CVS does in a broken manner, that doesn't mean that git 
does the same thing in a broken manner.



Andy

-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 15:32 ` Linus Torvalds
@ 2007-04-17 17:10   ` Andy Parkins
  2007-04-17 17:18   ` Rogan Dawes
  1 sibling, 0 replies; 66+ messages in thread
From: Andy Parkins @ 2007-04-17 17:10 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds

On Tuesday 2007, April 17, Linus Torvalds wrote:

> And think what we do about files (and whole subdirectories!) that
> haven't even changed. And finally, think about how important that
> optimization is in an SCM like git that supports branches.
>
> I think you'll find that keyword expansion simply isn't acceptable.

As I said in my reply to Junio; other VCSs only expand keywords when the 
file itself is checked out - when git's uber-fast switching decides 
that fileA is the same in the source and target checkouts, then the 
keywords won't be updated - fine.

In this one respect git would be "as good as" instead of "infinitely 
better".  I can live with that.

> But hey, you didn't believe me, so I'm happy you are trying to write
> the patches. Either you'll prove me wrong, or you'll realize just
> *how* broken the feature is.

Could be - I'm happy to disagree.  I'm even happy to accept that I might 
fail miserably.


Andy
-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 15:32 ` Linus Torvalds
  2007-04-17 17:10   ` Andy Parkins
@ 2007-04-17 17:18   ` Rogan Dawes
  2007-04-17 18:23     ` Linus Torvalds
  1 sibling, 1 reply; 66+ messages in thread
From: Rogan Dawes @ 2007-04-17 17:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andy Parkins, git

Linus Torvalds wrote:
> 
> On Tue, 17 Apr 2007, Andy Parkins wrote:
>> Adding expansion is harder, as I have no clue which calls to make to find
>> even the most basic information about an object; but I thought I'd get
>> feedback before I expend that effort.
> 
> Adding expansion is not just "harder". It's basically impossible to do 
> with any kind of performance.
> 
> Think "git checkout newbranch".
> 
> And think what we do about files (and whole subdirectories!) that haven't 
> even changed. And finally, think about how important that optimization is 
> in an SCM like git that supports branches.

Well, if the only keyword we support is $BlobId:$, then if the 
tree/object hasn't changed, then we still don't need to touch the object.

Not so?

Rogan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 17:03       ` Andy Parkins
@ 2007-04-17 18:12         ` Linus Torvalds
  2007-04-17 19:12           ` Andy Parkins
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 18:12 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, Junio C Hamano



On Tue, 17 Apr 2007, Andy Parkins wrote:
> 
> Why on earth would it not apply?  It's being applied using git-apply, 
> which will unexpand the keywords as it goes - as I keep saying.

So you will never work with anybody outside of git?

What about tar-files when you export the tree? Should they have the 
expanded version? 

> You keep saying these sweepingly general things.  It can be made to 
> work.

No, it CANNOT.

Trust me. There's NO WAY IN HELL it will "work" in any other sense than 
"limp along and not be usable".

Yes, you can make it "work" if you:

 - make sure that you never _ever_ leave the git environment

   But why do you want keyword expansion then? The whole point is if you 
   have other tools than the git tools that look at a file. Even your svg 
   example was literally about having non-git tools work with the data. 
   What if you ever email the file to somebody else? 

 - you make all git tools explicitly always strip them.

   Again, what's the point again? You add keyword expansion, and then the 
   only tools that you really allow to touch it (except your "print it 
   out" example) will have to remove the keyword expansion just to work.

That's not "work". That's just stupid. Yes, you can make your "print it 
out" example work, but as alreadyt mentioned, you could have done that 
some other way, with a simple makefile rule, quite independently (and much 
better) than the SCM ever did.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 17:18   ` Rogan Dawes
@ 2007-04-17 18:23     ` Linus Torvalds
  2007-04-17 20:27       ` Rogan Dawes
  2007-04-17 23:56       ` Robin H. Johnson
  0 siblings, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 18:23 UTC (permalink / raw)
  To: Rogan Dawes; +Cc: Andy Parkins, git



On Tue, 17 Apr 2007, Rogan Dawes wrote:
> 
> Well, if the only keyword we support is $BlobId:$, then if the tree/object
> hasn't changed, then we still don't need to touch the object.
> 
> Not so?

Correct. However, is that actually a useful expansion?

Most of the time, I'd expect people to want things like "last committer, 
time, story of their life" etc.. I don't think the SHA1 ID's are pretty 
enough that anybody would ever want to see them. But yes, they are 
certainly stable.

			Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 18:12         ` Linus Torvalds
@ 2007-04-17 19:12           ` Andy Parkins
       [not found]             ` <alpine.LFD. 0.98.0704171530220.4504@xanadu.home>
                               ` (3 more replies)
  0 siblings, 4 replies; 66+ messages in thread
From: Andy Parkins @ 2007-04-17 19:12 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Junio C Hamano

On Tuesday 2007, April 17, Linus Torvalds wrote:

> So you will never work with anybody outside of git?

For my projects - correct; I don't care about the rest of the world.  
For projects that do - don't enable keywords, it's an option, all I 
want is to have that option.

> What about tar-files when you export the tree? Should they have the
> expanded version?

If I have to pick one then: no.  I think out-of-tree keywords are too 
much trouble for exactly the reasons you say; however, I wouldn't like 
to presume what other people think is too much trouble so I suppose it 
would have to be an option.

> > You keep saying these sweepingly general things.  It can be made to
> > work.
>
> No, it CANNOT.
>
> Trust me. There's NO WAY IN HELL it will "work" in any other sense
> than "limp along and not be usable".

Well I'm making progress, "limp along" is a significant step up from 
impossible.  :-)

Look, my primary objection to this is the SHOUTING about how impossible 
it is even though I've tried to address every problem you've thrown at 
me - I'm finding it really difficult to figure out why you're trying so 
hard to dissuade me from even _trying_.  If it all goes wrong (as I 
fully accept it might), so be it, I can live with that; I'll even be 
happy to tell you you're right and I'm wrong.  Why is this such a 
problem?

Keywords are so hated by everyone that I doubt they would ever be 
accepted into git - it's an intellectual exercise for me at this stage 
really. 

> Yes, you can make it "work" if you:
>
>  - make sure that you never _ever_ leave the git environment

As it happens, _I_ never ever leave the git environment.  Can I use 
keywords then?

You don't seem to have such a problem with git's extended diffs for 
renames or subprojects - "make sure that you never _ever_ leave the git 
environment".

>    But why do you want keyword expansion then? The whole point is if
> you have other tools than the git tools that look at a file. Even
> your svg example was literally about having non-git tools work with
> the data. What if you ever email the file to somebody else?

If by "tools" you mean other version control systems, then I don't 
intend them to work.  If by "tools" you mean gcc, inkscape, gv, bash, 
web browsers or any other fileformat that allows comments in the file 
then I expect it to be fine.  If I publish a web page, it'd be nice to 
show the ID on the page - that's all just "nice" not "necessary" 
or "I'm throwing git away if I don't get it".

Emailing to others isn't a problem either: let's say I email them my SVG 
(with keywords expanded), they make some edits and send it me back - 
worse, they send me a diff back.  I'm going to apply that diff using 
git-apply; which will collapse the keywords and apply the diff.

>  - you make all git tools explicitly always strip them.

Well, not "all", so far I've added one call to convert_to_git() in 
builtin-apply.c - it was a one line addition.  It needed doing anyway 
to deal with the CRLF correctly.  I can't see there being that many 
places that this needs doing.  I may well be wrong, if I end up 
scattering calls to convert_to_git() everywhere I'll give up.

>    Again, what's the point again? You add keyword expansion, and then
> the only tools that you really allow to touch it (except your "print
> it out" example) will have to remove the keyword expansion just to
> work.

(I don't see why my tiny "print it out" example isn't enough - it 
matters to me)

However, most tools don't care about the keywords, it's only non-git 
diff and non-git patch that are affected.  As long as the file format 
supports comments, then keyword expansion will be just fine.

> That's not "work". That's just stupid. Yes, you can make your "print
> it out" example work, but as alreadyt mentioned, you could have done
> that some other way, with a simple makefile rule, quite independently
> (and much better) than the SCM ever did.

That's just being obtuse - no other tool cares in the slightest about 
the keywords, there are more "tools" in the world than just the VCS.



Andy

-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 19:12           ` Andy Parkins
       [not found]             ` <alpine.LFD. 0.98.0704171530220.4504@xanadu.home>
@ 2007-04-17 19:41             ` Nicolas Pitre
  2007-04-17 19:45               ` David Lang
  2007-04-17 19:54             ` Linus Torvalds
  2007-04-17 21:18             ` Martin Langhoff
  3 siblings, 1 reply; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-17 19:41 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, Linus Torvalds, Junio C Hamano

On Tue, 17 Apr 2007, Andy Parkins wrote:

> On Tuesday 2007, April 17, Linus Torvalds wrote:
> 
> > Trust me. There's NO WAY IN HELL it will "work" in any other sense
> > than "limp along and not be usable".
> 
> Well I'm making progress, "limp along" is a significant step up from 
> impossible.  :-)
> 
> Look, my primary objection to this is the SHOUTING about how impossible 
> it is even though I've tried to address every problem you've thrown at 
> me - I'm finding it really difficult to figure out why you're trying so 
> hard to dissuade me from even _trying_.  If it all goes wrong (as I 
> fully accept it might), so be it, I can live with that; I'll even be 
> happy to tell you you're right and I'm wrong.  Why is this such a 
> problem?
> 
> Keywords are so hated by everyone that I doubt they would ever be 
> accepted into git - it's an intellectual exercise for me at this stage 
> really. 

I cannot do otherwise than ask at this point in the debate: why isn't 
the makefile rule sufficient for your needs?  Why going through a 
complicated path that no one else will support due to its numerous 
pitfalls?

> > Yes, you can make your "print it out" example work, but as alreadyt 
> > mentioned, you could have done that some other way, with a simple 
> > makefile rule, quite independently (and much better) than the SCM 
> > ever did.
> 
> That's just being obtuse - no other tool cares in the slightest about 
> the keywords, there are more "tools" in the world than just the VCS.

... which reinforces my question: why force a task on the VCS if it 
doesn't fit well with its fundamental design?


Nicolas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 19:41             ` Nicolas Pitre
@ 2007-04-17 19:45               ` David Lang
       [not found]                 ` <alpin e.LFD.0.98.0704171624190.4504@xanadu.home>
  2007-04-17 20:29                 ` Nicolas Pitre
  0 siblings, 2 replies; 66+ messages in thread
From: David Lang @ 2007-04-17 19:45 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andy Parkins, git, Linus Torvalds, Junio C Hamano

On Tue, 17 Apr 2007, Nicolas Pitre wrote:

> 
> On Tue, 17 Apr 2007, Andy Parkins wrote:
>
>> On Tuesday 2007, April 17, Linus Torvalds wrote:
>>
>>> Trust me. There's NO WAY IN HELL it will "work" in any other sense
>>> than "limp along and not be usable".
>>
>> Well I'm making progress, "limp along" is a significant step up from
>> impossible.  :-)
>>
>> Look, my primary objection to this is the SHOUTING about how impossible
>> it is even though I've tried to address every problem you've thrown at
>> me - I'm finding it really difficult to figure out why you're trying so
>> hard to dissuade me from even _trying_.  If it all goes wrong (as I
>> fully accept it might), so be it, I can live with that; I'll even be
>> happy to tell you you're right and I'm wrong.  Why is this such a
>> problem?
>>
>> Keywords are so hated by everyone that I doubt they would ever be
>> accepted into git - it's an intellectual exercise for me at this stage
>> really.
>
> I cannot do otherwise than ask at this point in the debate: why isn't
> the makefile rule sufficient for your needs?  Why going through a
> complicated path that no one else will support due to its numerous
> pitfalls?

not all uses of VCS's involve useing make

>>> Yes, you can make your "print it out" example work, but as alreadyt
>>> mentioned, you could have done that some other way, with a simple
>>> makefile rule, quite independently (and much better) than the SCM
>>> ever did.
>>
>> That's just being obtuse - no other tool cares in the slightest about
>> the keywords, there are more "tools" in the world than just the VCS.
>
> ... which reinforces my question: why force a task on the VCS if it
> doesn't fit well with its fundamental design?

becouse the VCS can do the job better then anything else? even if there are 
limits to what the VCS can do.

David Lang

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 19:12           ` Andy Parkins
       [not found]             ` <alpine.LFD. 0.98.0704171530220.4504@xanadu.home>
  2007-04-17 19:41             ` Nicolas Pitre
@ 2007-04-17 19:54             ` Linus Torvalds
  2007-04-17 20:46               ` Andy Parkins
  2007-04-17 21:18             ` Martin Langhoff
  3 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 19:54 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, Junio C Hamano



On Tue, 17 Apr 2007, Andy Parkins wrote:
> 
> Look, my primary objection to this is the SHOUTING about how impossible 
> it is even though I've tried to address every problem you've thrown at 
> me

No, you haven't. You've "addressed" them by stating they don't matter. It 
doesn't "matter" that a diff won't actually apply to a checked-out tree, 
because you fix it up in another tool.

And it doesn't "matter" that switching branches will just result in the 
wrong keyword expansion, because you don't care about the keywords 
actually being "correct" - they are just random strings, and it apparently 
doesn't really have to "work" as far as you're concerned.

And the "git grep" concern you just dismissed by stating that it should 
use the filesystem copy, never mind that this just means that a clean 
working tree gets different results from doing the same thing based on 
that same revision.

In other words, you simply don't seem to worry about TRUSTING the results. 
It's ok if patches don't apply, or if you get different results on working 
trees than "inside" the revision control.

And the reaon I'm shouting is that "it doesn't matter that it's a bit 
hacky" mentality is what gets you things like CVS in the end. Bit-for-bit 
results actually matter. Guarantees actually matter. And you should not be 
able to see a differece in the working tree just because you happened to 
be on a different branch before.

Those are the kind of nasty surprises that make people go: "I don't know 
what the end result is, because there is an element of 'just how did you 
happen to do that operation' to it".

I want to *trust* the SCM I use.

> I'm finding it really difficult to figure out why you're trying so 
> hard to dissuade me from even _trying_.

You can try, but you are *ignoring* the things that I say. The end result 
will either perform really badly, or you cannot trust it, or *both*. And 
you'll introduce interesting semantics like "diffs won't actually apply to 
the working tree with normal tools".

(And yes, git diffs are extended, but they *do* apply to working trees in 
all cases where normal "patch" can even support the notion in the first 
place.)

And it's not just things like diff and switching branches. If you want 
your keywords to generate things like "last modified by Xyzzy", you 
haven't even explained *how* you'd do that. Yeah, you can do

	git log --pretty=oneline --abbrev-commit -1 -- filename

etc, and you probably think it's instantaneous, but do the timings for a 
big repository with a file that hasn't been modified in months, and then 
imagine doing that for an initial checkout (say, after you set the 
"keyword" attribute for all *.c files).

Whoops. The checkout took an hour. Is that really a path you want to go 
down?

> Keywords are so hated by everyone that I doubt they would ever be 
> accepted into git - it's an intellectual exercise for me at this stage 
> really. 

If that's what it is, fine. But people on the list seem to actually *want* 
it. They must be educated what a *disaster* it would be to actually try to 
really support something like it in real life, and not just as a mental 
exercise.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 20:29                 ` Nicolas Pitre
@ 2007-04-17 20:05                   ` David Lang
  2007-04-17 21:16                     ` Nicolas Pitre
  0 siblings, 1 reply; 66+ messages in thread
From: David Lang @ 2007-04-17 20:05 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andy Parkins, git, Linus Torvalds, Junio C Hamano

On Tue, 17 Apr 2007, Nicolas Pitre wrote:

> On Tue, 17 Apr 2007, David Lang wrote:
>
>> On Tue, 17 Apr 2007, Nicolas Pitre wrote:
>>
>>> I cannot do otherwise than ask at this point in the debate: why isn't
>>> the makefile rule sufficient for your needs?  Why going through a
>>> complicated path that no one else will support due to its numerous
>>> pitfalls?
>>
>> not all uses of VCS's involve useing make
>
> Use perl then.  Or a shell script.  Or even a command.com batch script.
> Or your own tool.

I would like to, however this doesn't currently integrate well with git. I've 
been told in the past that once .gitattributes is in place then the hooks for 
the crlf stuff can be generalized to allow for calls out to custom code to do 
this sort of thing.

however now it sounds as if people are saying that doing this is so evil that it 
shouldn't ever be allowed.

>>>> That's just being obtuse - no other tool cares in the slightest about
>>>> the keywords, there are more "tools" in the world than just the VCS.
>>>
>>> ... which reinforces my question: why force a task on the VCS if it
>>> doesn't fit well with its fundamental design?
>>
>> becouse the VCS can do the job better then anything else?
>
> On what basis?
>
>> even if there are
>> limits to what the VCS can do.
>
> In the context of keyword expansion I don't agree at all with this
> statement.  Git can *not* do better than an external tool and it has
> been demonstrated a few times already.

the VCS can make sure that the appropriate external code is always run when 
things are checked in/out. external tools (unless they are a complete set of 
wrappers for git) can't do that.

David Lang

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 18:23     ` Linus Torvalds
@ 2007-04-17 20:27       ` Rogan Dawes
  2007-04-17 23:56       ` Robin H. Johnson
  1 sibling, 0 replies; 66+ messages in thread
From: Rogan Dawes @ 2007-04-17 20:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andy Parkins, git

Linus Torvalds wrote:
> 
> On Tue, 17 Apr 2007, Rogan Dawes wrote:
>> Well, if the only keyword we support is $BlobId:$, then if the tree/object
>> hasn't changed, then we still don't need to touch the object.
>>
>> Not so?
> 
> Correct. However, is that actually a useful expansion?
> 
> Most of the time, I'd expect people to want things like "last committer, 
> time, story of their life" etc.. I don't think the SHA1 ID's are pretty 
> enough that anybody would ever want to see them. But yes, they are 
> certainly stable.
> 
> 			Linus

Well, one example for wanting a keyword expansion option was where 
people modify the entire file, and just email it back to the maintainer. 
It surely helps to have the SHA1 of the original object when applying 
the changes.

You also stated in another email that doing keyword expansion prevents 
people from using non-git tools. I agree that you'd probably end up with 
diffs that may include the keyword (object id) being mailed to you if 
the submitter is not using git. But when a git maintainer applies those 
diffs using git-apply, the keyword unexpansion could still take place, 
making the diffs usable in practice.

None of what I said necessarily supports the view that it is a good idea 
from the perspective of trusting the results, of course.

Regards,

Rogan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 19:45               ` David Lang
       [not found]                 ` <alpin e.LFD.0.98.0704171624190.4504@xanadu.home>
@ 2007-04-17 20:29                 ` Nicolas Pitre
  2007-04-17 20:05                   ` David Lang
  1 sibling, 1 reply; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-17 20:29 UTC (permalink / raw)
  To: David Lang; +Cc: Andy Parkins, git, Linus Torvalds, Junio C Hamano

On Tue, 17 Apr 2007, David Lang wrote:

> On Tue, 17 Apr 2007, Nicolas Pitre wrote:
> 
> > I cannot do otherwise than ask at this point in the debate: why isn't
> > the makefile rule sufficient for your needs?  Why going through a
> > complicated path that no one else will support due to its numerous
> > pitfalls?
> 
> not all uses of VCS's involve useing make

Use perl then.  Or a shell script.  Or even a command.com batch script.  
Or your own tool.

> > > That's just being obtuse - no other tool cares in the slightest about
> > > the keywords, there are more "tools" in the world than just the VCS.
> > 
> > ... which reinforces my question: why force a task on the VCS if it
> > doesn't fit well with its fundamental design?
> 
> becouse the VCS can do the job better then anything else?

On what basis?

> even if there are
> limits to what the VCS can do.

In the context of keyword expansion I don't agree at all with this 
statement.  Git can *not* do better than an external tool and it has 
been demonstrated a few times already.


Nicolas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 19:54             ` Linus Torvalds
@ 2007-04-17 20:46               ` Andy Parkins
  2007-04-17 20:52                 ` [PATCH] Add keyword collapse " Andy Parkins
                                   ` (2 more replies)
  0 siblings, 3 replies; 66+ messages in thread
From: Andy Parkins @ 2007-04-17 20:46 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Junio C Hamano

On Tuesday 2007, April 17, Linus Torvalds wrote:

> No, you haven't. You've "addressed" them by stating they don't
> matter. It doesn't "matter" that a diff won't actually apply to a
> checked-out tree, because you fix it up in another tool.

Okay.  I think this is a matter of perspective - my perspective is that 
if it supplies what svn/cvs supply then that would please the people 
who want it (of whom I am one); yours is obviously that if it isn't 
perfect, it's not worth doing.  That's a reasonable thing to demand, 
and I'm not going to try and argue you out of it.

> And it doesn't "matter" that switching branches will just result in
> the wrong keyword expansion, because you don't care about the
> keywords actually being "correct" - they are just random strings, and
> it apparently doesn't really have to "work" as far as you're
> concerned.

If you define "work" as "works like cvs/svn does", then I was fine with 
it.  I don't like it when my favourite VCS, that I want everyone to 
use, doesn't have an answer to "but does it do X?".

> And the "git grep" concern you just dismissed by stating that it
> should use the filesystem copy, never mind that this just means that
> a clean working tree gets different results from doing the same thing
> based on that same revision.

As I said at the time, I just picked one of the two options.  If you 
don't like that, pick the other option - collapse the keywords during 
the grep...

> And the reaon I'm shouting is that "it doesn't matter that it's a bit
> hacky" mentality is what gets you things like CVS in the end.
> Bit-for-bit results actually matter. Guarantees actually matter. And
> you should not be able to see a differece in the working tree just
> because you happened to be on a different branch before.

Bit-for-bit as in CRLF is untouched?  No?  Bit-for-bit as in you said 
you were okay with keyword-collapsing but not expansion?  You're just 
as willing to compromise as me, you've just drawn the line in a 
different place.

Incidentally: for future reference, I'll read what you write regardless 
of whether you shout it or not.

> You can try, but you are *ignoring* the things that I say. The end

I've tried very hard to respond to every point you've put to me; I've 
not selectively chopped out bits, and I've tried to give answers that 
make it work as you ask.  Now, none of those things were acceptable to 
you - which is fine - but I certinaly wasn't ignoring what you say - 
_disagreeing with_ is not the same as ignoring.

> If that's what it is, fine. But people on the list seem to actually
> *want* it. They must be educated what a *disaster* it would be to
> actually try to really support something like it in real life, and
> not just as a mental exercise.

People wanting something "wrong" so much is not a sign that they need 
educating, it's a sign that they need a solution.   In every other 
respect git has a solution for them; rather than explaining to them 
that what they want is stupid, I'd offer that it's more appropriate to 
offer something better in exchange.  So my keyword expansion idea is 
wrong - fine - where's the something better?  Writing custom scripts 
and makefiles for every project I ever run is /not/ "something better".

Anyway, it's late, and I'm tired - this has turned into a battle of 
wills, and I'm not that into battling.   Enough antihistamine has been 
poured on my itch that I no longer want to scratch it.  I'll send my 
most recent patch for the sake of history, and then abandon this 
project.

Thanks for your time on this, I appreciate your detailed responses, even 
if we don't agree.



Andy
-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH] Add keyword collapse support to convert.c
  2007-04-17 20:46               ` Andy Parkins
@ 2007-04-17 20:52                 ` Andy Parkins
  2007-04-17 21:10                 ` [PATCH 2/2] Add keyword unexpansion " Linus Torvalds
  2007-04-20 11:32                 ` Nikolai Weibull
  2 siblings, 0 replies; 66+ messages in thread
From: Andy Parkins @ 2007-04-17 20:52 UTC (permalink / raw)
  To: git

This patch adds expansion of keywords support.  The collapse is only
performed when the "keywords" attribute it found for a file.  The check
for this attribute is done in the same way as the "crlf" attribute
check.

The actual collapse is performed by keyword_collapse_git() which is
called from convert_to_git() when the "keywords" attribute is found.

keyword_collapse_git() finds strings of the form

 $KEYWORD: ARBITRARY STRING$

And collapses them into

 $KEYWORD:$

No parsing of the keyword itself is performed, the content is simply
dropped.

Despite the fact that this doesn't do anything useful from the users
perspective, this patch forms the more important half of keyword
expansion support - because it prevents the expansion from entering the
repository.  It effectively creates blind spots that git tools won't
see.

convert_to_git() has also been changed so that it no longer only does
CRLF conversion.  Instead, a flag is kept to say whether any conversion
was done by the CRLF code, and then that converted buffer is passed to
keyword_collapse_git() and the flag again updated.  It then returns 1 if
either of these conversion functions actually changed anything.

I've also included a test script to show that the keyword collapse is
working.  It particular demonstrates that the diff between a file with
keywords and the repository is blind to the expanded keyword.

git-apply is patched to perform the collapse as well on each fragment.

Signed-off-by: Andy Parkins <andyparkins@gmail.com>
---

This is on top of 1ddfc1ad616550764056077b9e12a35533298c89.

I'm positing it for posterity.  It's not going anywhere though, so I'm not
submitting it for inclusion.


 builtin-apply.c     |    2 +
 convert.c           |  123 +++++++++++++++++++++++++++++++++++++++++++++++++-
 t/t0030-keywords.sh |   95 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 217 insertions(+), 3 deletions(-)
 create mode 100755 t/t0030-keywords.sh

diff --git a/builtin-apply.c b/builtin-apply.c
index fd92ef7..212c7d4 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -1056,6 +1056,8 @@ static int parse_single_patch(char *line, unsigned long size, struct patch *patc
 	unsigned long oldlines = 0, newlines = 0, context = 0;
 	struct fragment **fragp = &patch->fragments;
 
+	convert_to_git( patch->new_name, &line, &size );
+
 	while (size > 4 && !memcmp(line, "@@ -", 4)) {
 		struct fragment *fragment;
 		int len;
diff --git a/convert.c b/convert.c
index d0d4b81..a18e7ea 100644
--- a/convert.c
+++ b/convert.c
@@ -230,16 +230,133 @@ static int git_path_check_crlf(const char *path)
 	return attr_crlf_check.isset;
 }
 
+/* ------------------ keywords -------------------- */
+
+static void setup_keyword_check(struct git_attr_check *check)
+{
+	static struct git_attr *attr_keyword;
+
+	if (!attr_keyword)
+		attr_keyword = git_attr("keywords", 8);
+	check->attr = attr_keyword;
+}
+
+static int git_path_check_keyword(const char *path)
+{
+	struct git_attr_check attr_keyword_check;
+
+	setup_keyword_check(&attr_keyword_check);
+
+	if (git_checkattr(path, 1, &attr_keyword_check))
+		return -1;
+	return attr_keyword_check.isset;
+}
+
+static int keyword_collapse_git(const char *path, char **bufp, unsigned long *sizep)
+{
+	char *buffer, *nbuf, *keyword;
+	unsigned long size, keywordlength;
+	int changes = 0;
+	enum {
+		IN_VOID,
+		PRE_KEYWORD,
+		IN_KEYWORD,
+		IN_EXPANSION,
+		END_KEYWORD
+	} parser_state = IN_VOID;
+
+	size = *sizep;
+	if (!size)
+		return 0;
+	buffer = *bufp;
+
+	/*
+	 * Allocate an identically sized buffer, keyword collapse can
+	 * only reduce the size so we'll never overflow (although we might
+	 * waste a few bytes
+	 */
+	nbuf = xmalloc(size);
+	*bufp = nbuf;
+
+	while (size) {
+		unsigned char c;
+
+		c = *buffer;
+
+		switch( parser_state ) {
+		case IN_VOID:        /* Normal characters, wait for '$' */
+			if (c == '$')
+				parser_state = PRE_KEYWORD;
+			break;
+		case PRE_KEYWORD:    /* Gap between '$' and keyword */
+			keywordlength = 0;
+			keyword = buffer;
+			if (!isspace(c))
+				parser_state = IN_KEYWORD;
+			else
+				break;
+		case IN_KEYWORD:     /* Keyword itself */
+			if (c == ':')
+				parser_state = IN_EXPANSION;
+			else if (c == '$' || c == '\n' || c == '\r' || c == '\0' )
+				parser_state = END_KEYWORD;
+			else
+				keywordlength++;
+			break;
+		case IN_EXPANSION:   /* The expansion gets silently removed */
+			if (c == '$' || c == '\n' || c == '\r' || c == '\0' )
+				parser_state = END_KEYWORD;
+			else {
+				changes = 1;
+				/* Every character we skip reduces the overall size */
+				(*sizep)--;
+				buffer++;
+				size--;
+			}
+			continue;
+		case END_KEYWORD:    /* End of keyword */
+			parser_state = IN_VOID;
+			break;
+		}
+
+		*nbuf++ = c;
+		buffer++;
+		size--;
+	}
+
+	return (changes != 0);
+}
+
+
+/* ------------------------------------------------ */
 int convert_to_git(const char *path, char **bufp, unsigned long *sizep)
 {
+	int changes = 0;
+
 	switch (git_path_check_crlf(path)) {
 	case 0:
-		return 0;
+		changes += 0;
+		break;
+	case 1:
+		changes += forcecrlf_to_git(path, bufp, sizep);
+		break;
+	default:
+		changes += autocrlf_to_git(path, bufp, sizep);
+		break;
+	}
+
+	switch (git_path_check_keyword(path)) {
+	case 0:
+		changes += 0;
+		break;
 	case 1:
-		return forcecrlf_to_git(path, bufp, sizep);
+		changes += keyword_collapse_git(path, bufp, sizep);
+		break;
 	default:
-		return autocrlf_to_git(path, bufp, sizep);
+		changes += 0;
+		break;
 	}
+	return (changes != 0);
 }
 
 int convert_to_working_tree(const char *path, char **bufp, unsigned long *sizep)
diff --git a/t/t0030-keywords.sh b/t/t0030-keywords.sh
new file mode 100755
index 0000000..5180b0e
--- /dev/null
+++ b/t/t0030-keywords.sh
@@ -0,0 +1,95 @@
+#!/bin/sh
+
+cd $(dirname $0)
+
+test_description='Keyword expansion'
+
+. ./test-lib.sh
+
+# Adding the attribute "keywords" turns the keyword expansion on
+# I've used "notkeywords" as an attribute as a placeholder attribute
+# but this is just "somerandomattribute", it has no meaning
+
+# Expect success because the keyword attribute should be found
+test_expect_success 'Keywords attribute present' '
+
+	echo "keywordsfile keywords" >.gitattributes &&
+
+	echo "\$keyword: anythingcangohere\$" > keywordsfile &&
+
+	git add keywordsfile &&
+	git add .gitattributes &&
+	git commit -m test-keywords &&
+
+	git check-attr keywords -- keywordsfile
+'
+
+# Expect failure because the repository version should be different from the
+# working tree version.
+#
+#  In repository : $keyword:$
+#  In working dir: $keyword: anythingcangohere$
+#
+test_expect_failure 'Keywords collapse active' '
+
+	git show HEAD:keywordsfile > keywordsfile.cmp &&
+	cmp keywordsfile keywordsfile.cmp
+
+'
+
+# expect success because we want to find the keyword line collapsed in the
+# and hence appearing unchanged in the output of git-diff
+test_expect_success 'git-diff with keywords present' '
+	echo "Non-keyword containing line" >> keywordsfile &&
+	git diff -- keywordsfile | grep -qs "^ \$keyword:\$$"
+'
+
+# Check git-apply blindness
+cat > keyword-patch.diff << EOF
+diff --git a/keywordsfile b/keywordsfile
+--- a/keywordsfile
++++ b/keywordsfile
+@@ -1,2 +1,2 @@
+ \$keyword:\$
+-Non-keyword containing line
++Another non-keyword containing line
+EOF
+
+test_expect_success 'patch application with keywords active' '
+	git-apply --check keyword-patch.diff
+'
+
+# Expect failure because the keywords attribute should NOT be found
+test_expect_failure 'Keywords attribute absent' '
+
+	echo "keywordsfile notkeywords" >.gitattributes &&
+
+	git add .gitattributes &&
+	git commit -m test-not-keywords &&
+
+	git check-attr keywords -- keywordsfile
+
+'
+
+# If keywords are later disabled on that file, then the keyword collapsed
+# will be ignored, so a diff should now show differences, because git is no
+# longer keyword blind
+test_expect_success 'git-diff with keywords in file but disabled' '
+	git diff -- keywordsfile | grep -qs "^diff"
+'
+
+# Expect success because the repository should be identical to the working tree
+test_expect_success 'Keywords collapse inactive' '
+
+	git add keywordsfile &&
+	git commit -m "test-not-keywords"
+
+	git show HEAD:keywordsfile > keywordsfile.cmp &&
+	cmp keywordsfile keywordsfile.cmp
+'
+
+test_expect_failure 'patch application without keywords active' '
+	git-apply --check keyword-patch.diff
+'
+
+test_done
-- 
1.5.1.1.822.g0049

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 21:16                     ` Nicolas Pitre
       [not found]                       ` <7vy7k qlj5r.fsf@assigned-by-dhcp.cox.net>
@ 2007-04-17 20:53                       ` David Lang
  2007-04-17 21:52                         ` Andy Parkins
  2007-04-17 22:40                       ` Junio C Hamano
  2007-04-18  6:24                       ` Rogan Dawes
  3 siblings, 1 reply; 66+ messages in thread
From: David Lang @ 2007-04-17 20:53 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Andy Parkins, git, Linus Torvalds, Junio C Hamano

On Tue, 17 Apr 2007, Nicolas Pitre wrote:

> Subject: Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
> 
> On Tue, 17 Apr 2007, David Lang wrote:
>
>> On Tue, 17 Apr 2007, Nicolas Pitre wrote:
>>
>>> On Tue, 17 Apr 2007, David Lang wrote:
>>>
>>>> On Tue, 17 Apr 2007, Nicolas Pitre wrote:
>>>>
>>>>> I cannot do otherwise than ask at this point in the debate: why isn't
>>>>> the makefile rule sufficient for your needs?  Why going through a
>>>>> complicated path that no one else will support due to its numerous
>>>>> pitfalls?
>>>>
>>>> not all uses of VCS's involve useing make
>>>
>>> Use perl then.  Or a shell script.  Or even a command.com batch script.
>>> Or your own tool.
>>
>> I would like to, however this doesn't currently integrate well with git. I've
>> been told in the past that once .gitattributes is in place then the hooks for
>> the crlf stuff can be generalized to allow for calls out to custom code to do
>> this sort of thing.
>
> And I agree that this is a perfectly sensible thing to do.  The facility
> should be there for you to apply any kind of transformation with
> external tools on data going in or out from Git.  There are good and bad
> things you can do with such a facility, but at least it becomes your
> responsibility to screw^H^H^H^Hfilter your data and not something that
> is enforced by Git itself.

I'm pretty sure that hooks for an external helper would satisfy Andy with his 
keyword expanstion as well.

David Lang

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17  9:41 [PATCH 2/2] Add keyword unexpansion support to convert.c Andy Parkins
                   ` (3 preceding siblings ...)
  2007-04-17 15:32 ` Linus Torvalds
@ 2007-04-17 21:00 ` Matthieu Moy
  4 siblings, 0 replies; 66+ messages in thread
From: Matthieu Moy @ 2007-04-17 21:00 UTC (permalink / raw)
  To: git

Andy Parkins <andyparkins@gmail.com> writes:

> This patch adds expansion of keywords support.

I didn't get time to read the whole thread, but just my 2 cents:

Did you have a look at the way Mercurial deals with this?

They have a system where files can be ran through arbitrary filters
when commited:

  http://www.selenic.com/mercurial/wiki/index.cgi/EncodeDecodeFilter

and it can be somehow abused to do keyword expansion.

  http://www.selenic.com/mercurial/wiki/index.cgi/TipsAndTricks#head-f348f796b9560a1cfbdaf3f3f0f7d9d4339266e9

It surely doesn't solve all the problems mentionned in this thread,
but the experience can be interesting to look at.

-- 
Matthieu

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 20:46               ` Andy Parkins
  2007-04-17 20:52                 ` [PATCH] Add keyword collapse " Andy Parkins
@ 2007-04-17 21:10                 ` Linus Torvalds
  2007-04-17 21:13                   ` Linus Torvalds
  2007-04-20 11:32                 ` Nikolai Weibull
  2 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 21:10 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, Junio C Hamano



On Tue, 17 Apr 2007, Andy Parkins wrote:
> 
> If you define "work" as "works like cvs/svn does", then I was fine with 
> it.

I can't really argue against that. Yes, I agree 100% that we can "work" in 
the sense that "cvs/svn works". There's clearly no fundamental reasons why 
you can't, since svn/cvs obviously do it.

I just do have higher standards. I really dislike CVS, and in many ways I 
actually think that SVN is even worse (not because it's really "worse", 
but because I think it is such a waste - it fixes the _trivial_ things 
about CVS, but doesn't really fix any of the underlying problems).

So I don't actually think that CVS "works". 

> Bit-for-bit as in CRLF is untouched?  No?  Bit-for-bit as in you said 
> you were okay with keyword-collapsing but not expansion?  You're just 
> as willing to compromise as me, you've just drawn the line in a 
> different place.

Bit-for-bit as in "you have to be able to trust every single bit".

And no, I don't actually love CRLF either. But it doesn't have quite the 
same fundamental problems. It has issues too, but they are fundamentally 
smaller, and I think making "git compatible with Windows" is also a lot 
more important than making "git compatible with CVS users".

Windows we cannot change. CVS users we can try to help. 

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 21:10                 ` [PATCH 2/2] Add keyword unexpansion " Linus Torvalds
@ 2007-04-17 21:13                   ` Linus Torvalds
  2007-04-18 11:11                     ` Johannes Schindelin
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 21:13 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, Junio C Hamano



On Tue, 17 Apr 2007, Linus Torvalds wrote:
> 
> Windows we cannot change. CVS users we can try to help. 

.. and if it wasn't clear, "helping" CVS users is not in my opinion to try 
to make git act like CVS, and lettign them do stupid things, but to try to 
help them become *more* than CVS users.

Because they too can become upstanding members of society, and leave their 
dark past behind them. I firmly believe that nobody is past saving.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 20:05                   ` David Lang
@ 2007-04-17 21:16                     ` Nicolas Pitre
       [not found]                       ` <7vy7k qlj5r.fsf@assigned-by-dhcp.cox.net>
                                         ` (3 more replies)
  0 siblings, 4 replies; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-17 21:16 UTC (permalink / raw)
  To: David Lang; +Cc: Andy Parkins, git, Linus Torvalds, Junio C Hamano

On Tue, 17 Apr 2007, David Lang wrote:

> On Tue, 17 Apr 2007, Nicolas Pitre wrote:
> 
> > On Tue, 17 Apr 2007, David Lang wrote:
> > 
> > > On Tue, 17 Apr 2007, Nicolas Pitre wrote:
> > > 
> > > > I cannot do otherwise than ask at this point in the debate: why isn't
> > > > the makefile rule sufficient for your needs?  Why going through a
> > > > complicated path that no one else will support due to its numerous
> > > > pitfalls?
> > > 
> > > not all uses of VCS's involve useing make
> > 
> > Use perl then.  Or a shell script.  Or even a command.com batch script.
> > Or your own tool.
> 
> I would like to, however this doesn't currently integrate well with git. I've
> been told in the past that once .gitattributes is in place then the hooks for
> the crlf stuff can be generalized to allow for calls out to custom code to do
> this sort of thing.

And I agree that this is a perfectly sensible thing to do.  The facility 
should be there for you to apply any kind of transformation with 
external tools on data going in or out from Git.  There are good and bad 
things you can do with such a facility, but at least it becomes your 
responsibility to screw^H^H^H^Hfilter your data and not something that 
is enforced by Git itself.


Nicolas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 19:12           ` Andy Parkins
                               ` (2 preceding siblings ...)
  2007-04-17 19:54             ` Linus Torvalds
@ 2007-04-17 21:18             ` Martin Langhoff
  3 siblings, 0 replies; 66+ messages in thread
From: Martin Langhoff @ 2007-04-17 21:18 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, Linus Torvalds, Junio C Hamano

On 4/18/07, Andy Parkins <andyparkins@gmail.com> wrote:
> On Tuesday 2007, April 17, Linus Torvalds wrote:
> > So you will never work with anybody outside of git?
> For my projects - correct; I don't care about the rest of the world.
> For projects that do - don't enable keywords, it's an option, all I
> want is to have that option.

GIT's fundamental respect for the contents its tracking (in not
munging them with keyword expansion) means that it works great in
contexts where other SCMs tools are used. And being content-centric at
the SCM layer means that it is possible to track a git project with -
say - Mercurial (and vice-versa) with nothing more than a bit of perl
glue and no guessing at all. The content is the content is the
content.

When the SCM has "munge-the-content" options, your "upstream" can make
things completely un-trackable. Tracking CVS with git is a breeze but
there is breakage related to keyword expansion. I should write some
better heuristics for it, but it's impossible to know with 100%
certainty that you are doing the right thing, and getting the correct
content from CVS.

All this talk of breaking non-git-patch goes back to the same. With
the current design, projects that use git are easily trackable if you
just look at the content, and ignore the SCM. That's an outstanding
property and quite central to the design, and I wouldn't include an
option to "turn it off" even if the patch to implement it turns out to
be trivial.

Probably using make will help - you might be able to wire it to work
off a post-update-hook so it's completely transparent.

cheers,


martin

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 11:35   ` Andy Parkins
  2007-04-17 15:53     ` Linus Torvalds
@ 2007-04-17 21:24     ` Junio C Hamano
  2007-04-20  0:30     ` Jakub Narebski
  2 siblings, 0 replies; 66+ messages in thread
From: Junio C Hamano @ 2007-04-17 21:24 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git

Andy Parkins <andyparkins@gmail.com> writes:

> On Tuesday 2007 April 17 11:09, Junio C Hamano wrote:
>
>> 	$ git checkout B
>>
>> 	should be immediate and instantaneous.
>
> Now - that's a much better argument.  However, it's not
> relevant, keywords (in other VCSs, and so why not in git) are
> only updated when a file is checked out.

It _is_ very much relevant.

If you have the keyword in your svg drawing, and if branch A and
branch B happen to have textually the same contents but the way
they got there are different, I do not think not checking it out
upon branch switching is correct.  Otherwise your printed copy
would have information from the version in branch A, even after
switching to B.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 20:53                       ` David Lang
@ 2007-04-17 21:52                         ` Andy Parkins
  0 siblings, 0 replies; 66+ messages in thread
From: Andy Parkins @ 2007-04-17 21:52 UTC (permalink / raw)
  To: git; +Cc: David Lang, Nicolas Pitre, Linus Torvalds, Junio C Hamano

On Tuesday 2007, April 17, David Lang wrote:

> I'm pretty sure that hooks for an external helper would satisfy Andy
> with his keyword expanstion as well.

It would.

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 21:16                     ` Nicolas Pitre
       [not found]                       ` <7vy7k qlj5r.fsf@assigned-by-dhcp.cox.net>
  2007-04-17 20:53                       ` David Lang
@ 2007-04-17 22:40                       ` Junio C Hamano
  2007-04-18  2:39                         ` Nicolas Pitre
  2007-04-21  0:42                         ` David Lang
  2007-04-18  6:24                       ` Rogan Dawes
  3 siblings, 2 replies; 66+ messages in thread
From: Junio C Hamano @ 2007-04-17 22:40 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: David Lang, Andy Parkins, git, Linus Torvalds

Nicolas Pitre <nico@cam.org> writes:

>> I would like to, however this doesn't currently integrate
>> well with git. I've been told in the past that once
>> .gitattributes is in place then the hooks for the crlf stuff
>> can be generalized to allow for calls out to custom code to
>> do this sort of thing.
>
> And I agree that this is a perfectly sensible thing to do.  The facility 
> should be there for you to apply any kind of transformation with 
> external tools on data going in or out from Git.  There are good and bad 
> things you can do with such a facility, but at least it becomes your 
> responsibility to screw^H^H^H^Hfilter your data and not something that 
> is enforced by Git itself.

You have to be careful, though.  Depending on what kind of
transformation you implement with the external tools, you would
end up having to slow down everything we would do.

It boils down to this statement from Andy:

    ..., keywords (in other VCSs, and so why not in git) are
    only updated when a file is checked out.  There is no need
    to touch every file.  It's actually beneficial, because the
    keyword in the file is the state of the file at the time it
    was checked in - which is actually more useful than updating
    it to the latest commit every time.

    That means you're only ever expanding in a file that your
    changing anyway - so it's effectively free.  git-checkout
    would still be immediate and instantaneous.

Back up a bit and think what "when a file is checked out" means.
His argument assumes the current behaviour of not checking out
when the underlying blob objects before munging are the same.

But with keyword expansion and fancier "external tools" whose
semantics are not well defined (iow, defined to be "do whatever
they please"), does it still make sense to consider two blobs
that appear in totally different context "the same" and omit
checking out (and causing the external tools hook not getting
run)?  I already pointed out to Andy that the branch name the
file was taken from, if it were to take part of the keyword
expansion, would come out incorrectly in his printed svg
drawing.

If you want somebody's earlier example of "giving a file with
embedded keyword to somebody, who modifies and sends the result
back in full, now you would want to incorporate the change by
identifying the origin" to work, you would want "$Source$" (I am
looking at CVS documentation, "Keyword substitution/Keyword
List") to identify where that file came from (after all, a
source tree could have duplicated files) so that you can tell
which file the update is about, and this keyword would expand
differently depending on where in the project tree the blob
appears.

It is not just the checkout codepath.  We omit diffs when we
know from SHA-1 that the blobs are the same before decoration.
We even omit diffs when we know from SHA-1 that two trees are
the same without taking possible decorations that can be applied
differently to the blobs they contain into account.  Earlier,
Andy said he wanted to grep for the expanded text if he is
grepping in the working tree, and I think that makes sense, but
that means git-grep cannot do the same "borrow from working tree
when expanding from blob object is more expensive" optimization
we have for diff.  We also need to disable that optimization
from the diff, regardless of what the correct semantics for
grepping in working trees should be.

I suspect that you would have to play safe and say "when
external tools are involved, we need to disable the existing
content SHA-1 based optimization for all paths that ask for
them" to keep your sanity.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 18:23     ` Linus Torvalds
  2007-04-17 20:27       ` Rogan Dawes
@ 2007-04-17 23:56       ` Robin H. Johnson
  2007-04-18  0:02         ` Junio C Hamano
                           ` (2 more replies)
  1 sibling, 3 replies; 66+ messages in thread
From: Robin H. Johnson @ 2007-04-17 23:56 UTC (permalink / raw)
  To: Linus Torvalds, Git Mailing List; +Cc: Rogan Dawes, Andy Parkins

[-- Attachment #1: Type: text/plain, Size: 1856 bytes --]

On Tue, Apr 17, 2007 at 11:23:03AM -0700, Linus Torvalds wrote:
> > Well, if the only keyword we support is $BlobId:$, then if the tree/object
> > hasn't changed, then we still don't need to touch the object.
> > 
> > Not so?
> Correct. However, is that actually a useful expansion?
> 
> Most of the time, I'd expect people to want things like "last committer, 
> time, story of their life" etc.. I don't think the SHA1 ID's are pretty 
> enough that anybody would ever want to see them. But yes, they are 
> certainly stable.
I'd certainly settle for having only $Blobid:$. It fits my requirements
perfectly.

This is perhaps a reasonable wording of my requirement.
"Files from from the VCS should contain a stable machine-usable
identifier that is unique for that revision of the file, without
post-processing to insert the identifier."

In the case of CVS, $Header$ contains the path and revision number of a
file, which serve to identify the content uniquely.

In the case of Git, $BlobId$ fills the same requirement.

As for a usage case:
- J.PEBKAC.User gets a a tree (from a tarball or GIT, we should gain the
  same output)
- Copies some file outside of the tree (the user is NOT smart enough,
  and resists all reasonable attempts at edumacation)
- Modifies said file outside of tree.
- Contacts maintainer with entire changed file.
- User vanishes off the internet.

The entire file he sent if it's CVS, contains a $Header$ that uniquely
identifies the file (path and revision), and the maintainer can simply
drop the file in, and 'cvs diff -r$OLDREV $FILE'.
If it's git, the maintainer drops the file in, and does 'git diff
$OLDSHA1 $FILE'.

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85

[-- Attachment #2: Type: application/pgp-signature, Size: 321 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 23:56       ` Robin H. Johnson
@ 2007-04-18  0:02         ` Junio C Hamano
  2007-04-18  0:26           ` J. Bruce Fields
                             ` (3 more replies)
  2007-04-18  2:50         ` Martin Langhoff
  2007-04-18 10:06         ` David Kågedal
  2 siblings, 4 replies; 66+ messages in thread
From: Junio C Hamano @ 2007-04-18  0:02 UTC (permalink / raw)
  To: Robin H. Johnson
  Cc: Linus Torvalds, Git Mailing List, Rogan Dawes, Andy Parkins

"Robin H. Johnson" <robbat2@gentoo.org> writes:

> As for a usage case:
> - J.PEBKAC.User gets a a tree (from a tarball or GIT, we should gain the
>   same output)
> - Copies some file outside of the tree (the user is NOT smart enough,
>   and resists all reasonable attempts at edumacation)
> - Modifies said file outside of tree.
> - Contacts maintainer with entire changed file.
> - User vanishes off the internet.
>
> The entire file he sent if it's CVS, contains a $Header$ that uniquely
> identifies the file (path and revision), and the maintainer can simply
> drop the file in, and 'cvs diff -r$OLDREV $FILE'.
> If it's git, the maintainer drops the file in, and does 'git diff
> $OLDSHA1 $FILE'.

I personally hope that the maintainer drops such a non-patch
that originates from a PEBKAC.  At least I hope the tools that I
personally use are not maintained by such a maintainer ;-)

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18  0:02         ` Junio C Hamano
@ 2007-04-18  0:26           ` J. Bruce Fields
  2007-04-18  1:19             ` Linus Torvalds
  2007-04-18  1:06           ` Robin H. Johnson
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 66+ messages in thread
From: J. Bruce Fields @ 2007-04-18  0:26 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Robin H. Johnson, Linus Torvalds, Git Mailing List, Rogan Dawes,
	Andy Parkins

On Tue, Apr 17, 2007 at 05:02:43PM -0700, Junio C Hamano wrote:
> "Robin H. Johnson" <robbat2@gentoo.org> writes:
> 
> > As for a usage case:
> > - J.PEBKAC.User gets a a tree (from a tarball or GIT, we should gain the
> >   same output)
> > - Copies some file outside of the tree (the user is NOT smart enough,
> >   and resists all reasonable attempts at edumacation)
> > - Modifies said file outside of tree.
> > - Contacts maintainer with entire changed file.
> > - User vanishes off the internet.
> >
> > The entire file he sent if it's CVS, contains a $Header$ that uniquely
> > identifies the file (path and revision), and the maintainer can simply
> > drop the file in, and 'cvs diff -r$OLDREV $FILE'.
> > If it's git, the maintainer drops the file in, and does 'git diff
> > $OLDSHA1 $FILE'.
> 
> I personally hope that the maintainer drops such a non-patch
> that originates from a PEBKAC.  At least I hope the tools that I
> personally use are not maintained by such a maintainer ;-)

That may not be quite fair--note the 'git diff $OLDSHA1 $FILE'.  So the
$Header$ here is a hint telling the maintainer how to produce a
(hopefully) reviewable patch, not an invitation to blindly drop random
files into the tree.  (Other objections to accepting code from random
non-reachable people aside....)

I've occasionally wondered before whether git could offer any help in
the case where, say, somebody hands me a file, I know it's based on
src/widget/widget.c from somewhere in v0.5..v0.7, and I'd like a guess
at the most likely candidates.

I haven't wondered that often enough that I'd consider it worth
embedding the blob SHA1 in every checked-out file, though!

--b.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18  0:02         ` Junio C Hamano
  2007-04-18  0:26           ` J. Bruce Fields
@ 2007-04-18  1:06           ` Robin H. Johnson
  2007-04-18  1:15             ` Junio C Hamano
  2007-04-18  4:15           ` Daniel Barkalow
  2007-04-18 11:32           ` Johannes Schindelin
  3 siblings, 1 reply; 66+ messages in thread
From: Robin H. Johnson @ 2007-04-18  1:06 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List
  Cc: Robin H. Johnson, Linus Torvalds, Rogan Dawes, Andy Parkins

[-- Attachment #1: Type: text/plain, Size: 1886 bytes --]

On Tue, Apr 17, 2007 at 05:02:43PM -0700, Junio C Hamano wrote:
> > As for a usage case:
> > - J.PEBKAC.User gets a a tree (from a tarball or GIT, we should gain the
> >   same output)
> > - Copies some file outside of the tree (the user is NOT smart enough,
> >   and resists all reasonable attempts at edumacation)
> > - Modifies said file outside of tree.
> > - Contacts maintainer with entire changed file.
> > - User vanishes off the internet.
> >
> > The entire file he sent if it's CVS, contains a $Header$ that uniquely
> > identifies the file (path and revision), and the maintainer can simply
> > drop the file in, and 'cvs diff -r$OLDREV $FILE'.
> > If it's git, the maintainer drops the file in, and does 'git diff
> > $OLDSHA1 $FILE'.
> I personally hope that the maintainer drops such a non-patch
> that originates from a PEBKAC.  At least I hope the tools that I
> personally use are not maintained by such a maintainer ;-)
I certainly wasn't stating blindly commit the file. Any Gentoo developer
doing that should not have made it through the recruitment process.

Do the diff, separate the wheat from the chaff, and then put the useful
(and reviewed) changes back into the tree.

Glancing at the Gentoo bugs I've dealt with over the last 2 months as a
quick survey, there are a few levels: 
A - Able to submit a good diff
B - Able to do a good implementation
C - Able to come up with a good idea for improvement

B are in short supply, and even of those, the number that can do A are
smaller :-(. Category C is vastly bigger than B, and those that don't
make B throw up a lot of chaff of bad implementations.

Being able to extract the good ideas is what's important.

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85

[-- Attachment #2: Type: application/pgp-signature, Size: 321 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18  1:06           ` Robin H. Johnson
@ 2007-04-18  1:15             ` Junio C Hamano
  2007-04-18  1:42               ` Junio C Hamano
  0 siblings, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2007-04-18  1:15 UTC (permalink / raw)
  To: Robin H. Johnson
  Cc: Git Mailing List, Linus Torvalds, Rogan Dawes, Andy Parkins

"Robin H. Johnson" <robbat2@gentoo.org> writes:

> Glancing at the Gentoo bugs I've dealt with over the last 2 months as a
> quick survey, there are a few levels: 
> A - Able to submit a good diff
> B - Able to do a good implementation
> C - Able to come up with a good idea for improvement
>
> B are in short supply, and even of those, the number that can do A are
> smaller :-(. Category C is vastly bigger than B, and those that don't
> make B throw up a lot of chaff of bad implementations.
>
> Being able to extract the good ideas is what's important.

True, to a certain degree.  Maybe you and your fellow Gentoo
people are very much more accomodating, but my fear is that a
maintainer that goes length to sift through chaff himself
quickly runs out of time, becomes exhausted, and ends up being
careless.

Maybe I am spoiled by having only the best people around me, and
on git list.

But we are straying to a tangent.

I do not have much against an optional "only blob id" expansion
myself, as I do not see any more downside than CRLF expansion in
it.  But I suspect that once people see the $id$ expanded to
blob, they would not stop, because simply they do not understand
why blob-id and CRLF are much less evil than other things.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18  0:26           ` J. Bruce Fields
@ 2007-04-18  1:19             ` Linus Torvalds
  2007-04-18  1:28               ` Junio C Hamano
  0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2007-04-18  1:19 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Junio C Hamano, Robin H. Johnson, Git Mailing List, Rogan Dawes,
	Andy Parkins



On Tue, 17 Apr 2007, J. Bruce Fields wrote:
> 
> I've occasionally wondered before whether git could offer any help in
> the case where, say, somebody hands me a file, I know it's based on
> src/widget/widget.c from somewhere in v0.5..v0.7, and I'd like a guess
> at the most likely candidates.

It's actually fairly easy to do.

Get the git hash of the blob: use "git hash-object" to do so (although 
you can do it without git too, see later), then just do

	git whatchanged v0.5..v0.7 -- src/widget/widget.c

and just look for the hash. If it's an exact match, you'd find it there, 
and it will tell you when it changed.

If it's *not* an exact match, you have to come up with some "measure of 
minimality" for the thing (the size of the diff might be a good one), and 
you can do

	git rev-list --no-merges --full-history v0.5..v0.7 -- src/widget/widget.c > rev-list

which will get you a full set of commits that changed that file. Then you 
can just do something like

	best_commit=none
	best=1000000
	while read commit
	do 
		git cat-file blob "$commit:src/widget/widget.c" > tmpfile
		lines=$(diff reference-file tmpfile | wc -l)
		if [ "$lines" -lt "$best" ]
		then
			echo Best so far: $commit $lines
			best=$lines
		fi
	done < rev-list

and you're done!

(Yeah, I'm sure that script could be improved, but it's probably really 
not that bad even as-is! The initial "git rev-list" will have done all 
the heavy lifting, and picked out the commits that matter)

> I haven't wondered that often enough that I'd consider it worth
> embedding the blob SHA1 in every checked-out file, though!

It really doesn't pay.

Besides, if you actually have the file, you can trivially get the SHA1 
_without_ embedding it into the file. Just do

	(echo -e -n "blob <size>\0" ; cat file) | sha1sum

where "size" is just the size in bytes of the file.

So embedding the SHA1 doesn't actually buy you anything: every blob BY 
DEFINITION has their SHA1 embedded into them.

In fact, embedding the SHA1 (or doing any other modifications) just makes 
it harder to do this, since then you have to filter it out again.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18  1:19             ` Linus Torvalds
@ 2007-04-18  1:28               ` Junio C Hamano
  2007-04-18  1:33                 ` Linus Torvalds
  0 siblings, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2007-04-18  1:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: J. Bruce Fields, Robin H. Johnson, Git Mailing List, Rogan Dawes,
	Andy Parkins

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Besides, if you actually have the file, you can trivially get the SHA1 
> _without_ embedding it into the file. Just do
>
> 	(echo -e -n "blob <size>\0" ; cat file) | sha1sum
>
> where "size" is just the size in bytes of the file.
>
> So embedding the SHA1 doesn't actually buy you anything: every blob BY 
> DEFINITION has their SHA1 embedded into them.
>
> In fact, embedding the SHA1 (or doing any other modifications) just makes 
> it harder to do this, since then you have to filter it out again.

The use case the thread you are responding to assumes that you
do *not* have a preimage before the change.

You give a file out, somebody says "here is my updated version"
and returns the whole file.  You may recognize, from the
contents, which path the file was taken from, but you may not
know which revision to diff against, as the "update version" was
not sent to you along with the version before he started
updating.

In my day job, I made a protocol, with a diff-challenged person
in our Japanese subsidiary, for him to send *both* preimage and
postimage of his changes when sending updates to me, and that
has worked reasonably well without embedded ID. I can always do
a file-level 3-way merge to forward port his change to whatever
version I am working on.

But if it is not an option to insist getting the preimage back,
embedded blob ID would be one way to help that exchange.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18  1:28               ` Junio C Hamano
@ 2007-04-18  1:33                 ` Linus Torvalds
  0 siblings, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2007-04-18  1:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: J. Bruce Fields, Robin H. Johnson, Git Mailing List, Rogan Dawes,
	Andy Parkins



On Tue, 17 Apr 2007, Junio C Hamano wrote:
> 
> The use case the thread you are responding to assumes that you
> do *not* have a preimage before the change.

Sure. However, see my other script to actually find the "closest version". 
It's pretty easy.

In fact, even if you don't know which file it is, we could easily first 
have a separate "try to guess filename" (based on the same kind of 
heurstics) and then dig deeper on that filename.

So I think there are better ways to get the same effect without embedding 
any information.

		Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18  1:15             ` Junio C Hamano
@ 2007-04-18  1:42               ` Junio C Hamano
  2007-04-18  2:53                 ` Robin H. Johnson
  0 siblings, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2007-04-18  1:42 UTC (permalink / raw)
  To: Robin H. Johnson
  Cc: Git Mailing List, Linus Torvalds, Rogan Dawes, Andy Parkins

Junio C Hamano <junkio@cox.net> writes:

> I do not have much against an optional "only blob id" expansion
> myself, as I do not see any more downside than CRLF expansion in
> it...

Actually, there is one.  Somebody makes a patch against a file
with $id$ expanded.  Gives it to somebody else who is git
challenged and does not have git-apply.  The patch is useless.

So it is not without more downsides than CRLF.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 22:40                       ` Junio C Hamano
@ 2007-04-18  2:39                         ` Nicolas Pitre
  2007-04-18  5:04                           ` Junio C Hamano
  2007-04-18 11:14                           ` Johannes Schindelin
  2007-04-21  0:42                         ` David Lang
  1 sibling, 2 replies; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-18  2:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Lang, Andy Parkins, git, Linus Torvalds

On Tue, 17 Apr 2007, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> >> I would like to, however this doesn't currently integrate
> >> well with git. I've been told in the past that once
> >> .gitattributes is in place then the hooks for the crlf stuff
> >> can be generalized to allow for calls out to custom code to
> >> do this sort of thing.
> >
> > And I agree that this is a perfectly sensible thing to do.  The facility 
> > should be there for you to apply any kind of transformation with 
> > external tools on data going in or out from Git.  There are good and bad 
> > things you can do with such a facility, but at least it becomes your 
> > responsibility to screw^H^H^H^Hfilter your data and not something that 
> > is enforced by Git itself.
> 
> You have to be careful, though.  Depending on what kind of
> transformation you implement with the external tools, you would
> end up having to slow down everything we would do.

So what?  

We provide a rope with proper caveat emptor.  Up to others to hang 
themselves with it if they so desire.  It is not our problem anymore.

> It boils down to this statement from Andy:
> 
>     ..., keywords (in other VCSs, and so why not in git) are
>     only updated when a file is checked out.  There is no need
>     to touch every file.  It's actually beneficial, because the
>     keyword in the file is the state of the file at the time it
>     was checked in - which is actually more useful than updating
>     it to the latest commit every time.
> 
>     That means you're only ever expanding in a file that your
>     changing anyway - so it's effectively free.  git-checkout
>     would still be immediate and instantaneous.
> 
> Back up a bit and think what "when a file is checked out" means.
> His argument assumes the current behaviour of not checking out
> when the underlying blob objects before munging are the same.

And I think that should remain.  If someone really wants full 
transformation aka keyword expansion or whatever then he'd just need to 
force a full read-tree after switching branch.

> But with keyword expansion and fancier "external tools" whose
> semantics are not well defined (iow, defined to be "do whatever
> they please"), does it still make sense to consider two blobs
> that appear in totally different context "the same" and omit
> checking out (and causing the external tools hook not getting
> run)?

I think so.  At least by default.  And if we really want to be kind we 
could provide a special attribute just for disabling such optimization 
i.e. to force a checkout of everything marked with such an attribute 
everytime.  But we might as well wait to see if someone actually ask 
about that.

But for many other cases having such a facility would be just nice 
especially for those cases that don't depend on the branch/commit but 
only on the content itself.  For example it was pointed out that Open 
Office documents are gzipped XML files.  In that case it would be 
extremely advantageous to have the ability to specify an input filter as
"gzip -d" and an output filter as "gzip -c" so Git has a chance to 
actually perform some kind of delta compression.

I'm sure there might be other type of filters for situation we've not 
thought about yet, and that we might not want to carry as a builtin 
feature.  Who knows, maybe someone might want to port Git to the 
System/360 and will need an EBCDIC filter on checked out text files.  Or 
maybe a byte swapping filter for some kind of binary files when on a 
system with a different endianness.

> I already pointed out to Andy that the branch name the
> file was taken from, if it were to take part of the keyword
> expansion, would come out incorrectly in his printed svg
> drawing.

Tough.

> If you want somebody's earlier example of "giving a file with
> embedded keyword to somebody, who modifies and sends the result
> back in full, now you would want to incorporate the change by
> identifying the origin" to work, you would want "$Source$" (I am
> looking at CVS documentation, "Keyword substitution/Keyword
> List") to identify where that file came from (after all, a
> source tree could have duplicated files) so that you can tell
> which file the update is about, and this keyword would expand
> differently depending on where in the project tree the blob
> appears.

Like we don't record renames, I don't think we should record such thing 
in checked out files either.  Using a search for the closest match (like 
we do for rename detection) is probably a better avenue than trusting 
that the ID embedded in the file wasn't messed with.  Linus already 
provided a small script that would do that with pretty good results.

> It is not just the checkout codepath.  We omit diffs when we
> know from SHA-1 that the blobs are the same before decoration.
> We even omit diffs when we know from SHA-1 that two trees are
> the same without taking possible decorations that can be applied
> differently to the blobs they contain into account.  Earlier,
> Andy said he wanted to grep for the expanded text if he is
> grepping in the working tree, and I think that makes sense, but
> that means git-grep cannot do the same "borrow from working tree
> when expanding from blob object is more expensive" optimization
> we have for diff.  We also need to disable that optimization
> from the diff, regardless of what the correct semantics for
> grepping in working trees should be.
> 
> I suspect that you would have to play safe and say "when
> external tools are involved, we need to disable the existing
> content SHA-1 based optimization for all paths that ask for
> them" to keep your sanity.

Maybe.  If that is what's really needed then so be it.  People who 
really want to do strange things will have the flexibility to do so, but 
they'll have to pay the price in loss of performance.


Nicolas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 23:56       ` Robin H. Johnson
  2007-04-18  0:02         ` Junio C Hamano
@ 2007-04-18  2:50         ` Martin Langhoff
  2007-04-18 10:06         ` David Kågedal
  2 siblings, 0 replies; 66+ messages in thread
From: Martin Langhoff @ 2007-04-18  2:50 UTC (permalink / raw)
  To: Robin H. Johnson
  Cc: Linus Torvalds, Git Mailing List, Rogan Dawes, Andy Parkins

On 4/18/07, Robin H. Johnson <robbat2@gentoo.org> wrote:
> As for a usage case:
> - J.PEBKAC.User gets a a tree (from a tarball or GIT, we should gain the
>   same output)
> - Copies some file outside of the tree (the user is NOT smart enough,
>   and resists all reasonable attempts at edumacation)
> - Modifies said file outside of tree.
> - Contacts maintainer with entire changed file.
> - User vanishes off the internet.

That's a valid case, but as Linus hints, that's a snippet of perl/bash
away to find "closest matches". We could have a

      git-findclosestmatch <head> path/to/scan/ randomfile.c

That would quickly return a few candidates together with the commit
they appear in.

cheers,



martin

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18  1:42               ` Junio C Hamano
@ 2007-04-18  2:53                 ` Robin H. Johnson
  0 siblings, 0 replies; 66+ messages in thread
From: Robin H. Johnson @ 2007-04-18  2:53 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List
  Cc: Robin H. Johnson, Linus Torvalds, Rogan Dawes, Andy Parkins

[-- Attachment #1: Type: text/plain, Size: 2282 bytes --]

On Tue, Apr 17, 2007 at 06:42:38PM -0700, Junio C Hamano wrote:
> Junio C Hamano <junkio@cox.net> writes:
> > I do not have much against an optional "only blob id" expansion
> > myself, as I do not see any more downside than CRLF expansion in
> > it...
> Actually, there is one.  Somebody makes a patch against a file
> with $id$ expanded.  Gives it to somebody else who is git
> challenged and does not have git-apply.  The patch is useless.
So they have diff'd outside of Git, and the recipient is applying outside of
Git:
A - If they are applying the patch on top of the same base revision, it will
    apply fine, because the keywords are identical. 
B - If they are applying the patch on top of a different revision, the keywords
    won't apply, and most probably other content in the patch won't apply either.

Additional with B  the longer your individual files, the more likely that the
diff hunk containing the keyword change does not contain any other changes, and
can be easily discarded. More that the changes are likely to be further away
from the keyword ;-).

Discarding portions of patches is already wide-spread (not just for CVS
keywords, the architecture keywords in Gentoo ebuilds change rapidly as well),
and if git-apply can discard the keyword, it only serves to accelerate the
usage of git.

Some quick stats I hacked together on lengths of Gentoo ebuilds.
23161 ebuilds total.
51% of the Gentoo ebuilds are less than 36 lines long.
76% are less than 56 lines long.
90% are less than 92 lines long.
(thereafter the tail gets VERY long).
0.21% are more than 500 lines long.

> So it is not without more downsides than CRLF.
A cleaner version of my earlier command to find the changes between revisions.
diff -Nuar <(git-cat-file blob $SHA1:$FILE) $TMPFILE
where $TMPFILE is a temporary filename for the file from the user.
This saves having to overwrite the local $FILE and restore it afterwards.
It would be nice if git-diff could handle this case directly.

On a tangent, has any work gone into specialized patch mergers for specific
file formats?

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85

[-- Attachment #2: Type: application/pgp-signature, Size: 321 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18  0:02         ` Junio C Hamano
  2007-04-18  0:26           ` J. Bruce Fields
  2007-04-18  1:06           ` Robin H. Johnson
@ 2007-04-18  4:15           ` Daniel Barkalow
  2007-04-18 11:32           ` Johannes Schindelin
  3 siblings, 0 replies; 66+ messages in thread
From: Daniel Barkalow @ 2007-04-18  4:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Robin H. Johnson, Linus Torvalds, Git Mailing List, Rogan Dawes,
	Andy Parkins

On Tue, 17 Apr 2007, Junio C Hamano wrote:

> "Robin H. Johnson" <robbat2@gentoo.org> writes:
> 
> > As for a usage case:
> > - J.PEBKAC.User gets a a tree (from a tarball or GIT, we should gain the
> >   same output)
> > - Copies some file outside of the tree (the user is NOT smart enough,
> >   and resists all reasonable attempts at edumacation)
> > - Modifies said file outside of tree.
> > - Contacts maintainer with entire changed file.
> > - User vanishes off the internet.
> >
> > The entire file he sent if it's CVS, contains a $Header$ that uniquely
> > identifies the file (path and revision), and the maintainer can simply
> > drop the file in, and 'cvs diff -r$OLDREV $FILE'.
> > If it's git, the maintainer drops the file in, and does 'git diff
> > $OLDSHA1 $FILE'.
> 
> I personally hope that the maintainer drops such a non-patch
> that originates from a PEBKAC.  At least I hope the tools that I
> personally use are not maintained by such a maintainer ;-)

As a concrete example, say I'm not a Gentoo developer at all, but I'm 
trying to get some package to install in a slightly odd situation. (E.g., 
I want to build a version of gcc for ARM microcontrollers, which requires 
flags to be set that aren't normally available for the ARM architecture.) 
In order to do this, I need to make some changes to the gcc ebuild to pass 
those USE flags through to configure. Since I'm not a Gentoo developer, I 
don't have the version-controlled tree, just: (1) the tree that gets 
reverted to the official state every time I sync and (2) my tree of local 
overlays. (2) also contains other packages I've made modifications to 
(adding patches to packages where those patches are only in unreleased 
version, but solve my problems, e.g.), so it's clearly the place to put 
the gcc change.

Now, if I figure out how to get the ebuild working, I'll be happy just to 
have a working compiler for this weird target, and I don't care too much 
further. But then if word gets out that I managed this, or if I notice a 
bug report that other people are failing to get it to build, I may want to 
post my working ebuild. And maybe the maintainer decides that my method 
was good, and wants to use it. But then it could be a lot of work to 
figure out what differences are from me beating on this ebuild, what are 
important, and what are reverts of changes make upstream after I made my 
copy (particularly because the same ebuild for gcc also gets a lot more 
development making it better as the system native compiler).

If the ebuild has the blob ID that the file had when it left Gentoo 
version control and went out into local hack land, it would be relatively 
easy to figure out what patch should be applies to get the useful changes.

(In case you're wondering, I actually eventually gave up and installed a 
gnu-arm binary distribution, because I was in a hurry to get the project 
going, and building from source kept failing to get configured properly; 
but if my first line of attack had worked, I would have ended up with the 
described hacked ebuild.)

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18  2:39                         ` Nicolas Pitre
@ 2007-04-18  5:04                           ` Junio C Hamano
  2007-04-18 14:56                             ` Nicolas Pitre
  2007-04-18 11:14                           ` Johannes Schindelin
  1 sibling, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2007-04-18  5:04 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: David Lang, Andy Parkins, git, Linus Torvalds

Nicolas Pitre <nico@cam.org> writes:

> On Tue, 17 Apr 2007, Junio C Hamano wrote:
>
>> You have to be careful, though.  Depending on what kind of
>> transformation you implement with the external tools, you would
>> end up having to slow down everything we would do.
>
> So what?  
>
> We provide a rope with proper caveat emptor.  Up to others to hang 
> themselves with it if they so desire.  It is not our problem anymore.

I sort-of find it hard to believe hearing this from somebody who
muttered something about importance of perception a few days ago.

>> I suspect that you would have to play safe and say "when
>> external tools are involved, we need to disable the existing
>> content SHA-1 based optimization for all paths that ask for
>> them" to keep your sanity.
>
> Maybe.  If that is what's really needed then so be it.  People who 
> really want to do strange things will have the flexibility to do so, but 
> they'll have to pay the price in loss of performance.

Not just that.  We end up having to pay the price of maintaining
hooks to let them do crazy things.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 21:16                     ` Nicolas Pitre
                                         ` (2 preceding siblings ...)
  2007-04-17 22:40                       ` Junio C Hamano
@ 2007-04-18  6:24                       ` Rogan Dawes
  2007-04-18 15:02                         ` Linus Torvalds
  3 siblings, 1 reply; 66+ messages in thread
From: Rogan Dawes @ 2007-04-18  6:24 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: David Lang, Andy Parkins, git, Linus Torvalds, Junio C Hamano

Nicolas Pitre wrote:
> On Tue, 17 Apr 2007, David Lang wrote:
> 

>> I've
>> been told in the past that once .gitattributes is in place then the hooks for
>> the crlf stuff can be generalized to allow for calls out to custom code to do
>> this sort of thing.
> 
> And I agree that this is a perfectly sensible thing to do.  The facility 
> should be there for you to apply any kind of transformation with 
> external tools on data going in or out from Git.  There are good and bad 
> things you can do with such a facility, but at least it becomes your 
> responsibility to screw^H^H^H^Hfilter your data and not something that 
> is enforced by Git itself.
> 
> 
> Nicolas

One of the examples that has been given in the past has been taking a 
zipped OpenDocumentFormat file, unzipping it to its component parts, and 
then committing the individual files rather than the aggregate.

But I can't figure out how this might work.

One idea is to store the binary ODF file in the index (and in the packs, 
etc) as a directory with the individual text (and other) files as 
entries within that directory. Then, when various git operations want to 
use the directory, the operation is redirected via an attribute match to 
an external script that knows how to checkout an ODF "directory", or 
diff an ODF "directory", etc.

Or similarly, when checking an "ODF" file in, the attribute would lead 
to an appropriate script creating the "tree" of individual files.

Does this sound workable?

Rogan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 23:56       ` Robin H. Johnson
  2007-04-18  0:02         ` Junio C Hamano
  2007-04-18  2:50         ` Martin Langhoff
@ 2007-04-18 10:06         ` David Kågedal
  2007-04-18 11:08           ` Robin H. Johnson
  2 siblings, 1 reply; 66+ messages in thread
From: David Kågedal @ 2007-04-18 10:06 UTC (permalink / raw)
  To: git

"Robin H. Johnson" <robbat2@gentoo.org> writes:

> This is perhaps a reasonable wording of my requirement.
> "Files from from the VCS should contain a stable machine-usable
> identifier that is unique for that revision of the file, without
> post-processing to insert the identifier."

But what is the "revision of the file"?  The blob ID is just a hash of
the contents, and doesn't say anything about where in the history of
the project it appears.  It will usually appear in many "project
revisions", i.e. commits.

If you want to mark the "revision" of a file, the only sensible thing
is to use the commit ID, which will give you all the problems
described in this thread.

-- 
David Kågedal

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18 10:06         ` David Kågedal
@ 2007-04-18 11:08           ` Robin H. Johnson
  0 siblings, 0 replies; 66+ messages in thread
From: Robin H. Johnson @ 2007-04-18 11:08 UTC (permalink / raw)
  To: David K??gedal, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1307 bytes --]

On Wed, Apr 18, 2007 at 12:06:48PM +0200, David K??gedal wrote:
> > This is perhaps a reasonable wording of my requirement.
> > "Files from from the VCS should contain a stable machine-usable
> > identifier that is unique for that revision of the file, without
> > post-processing to insert the identifier."
> But what is the "revision of the file"?  The blob ID is just a hash of
> the contents, and doesn't say anything about where in the history of
> the project it appears.  It will usually appear in many "project
> revisions", i.e. commits.
The location/context in history of the file is not needed by the
requirement I wrote above.

Since the BlobID is the hash of the contents (taken with all keywords
collapsed obviously) - if the contents are identical, then the blobid is
identical.

Since the contents and blobid are the same, it doesn't matter which
commit you take the file from when you don't care about the history of
that point (eg cat-file, diff).

The file goes out, and when a user throws it (modified) back at us, we
just grab the $BlobId$ and use that to identify what it originally
looked like.

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85

[-- Attachment #2: Type: application/pgp-signature, Size: 321 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 21:13                   ` Linus Torvalds
@ 2007-04-18 11:11                     ` Johannes Schindelin
  0 siblings, 0 replies; 66+ messages in thread
From: Johannes Schindelin @ 2007-04-18 11:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andy Parkins, git, Junio C Hamano

Hi,

On Tue, 17 Apr 2007, Linus Torvalds wrote:

> On Tue, 17 Apr 2007, Linus Torvalds wrote:
> > 
> > Windows we cannot change. CVS users we can try to help. 
> 
> .. and if it wasn't clear, "helping" CVS users is not in my opinion to 
> try to make git act like CVS, and lettign them do stupid things, but to 
> try to help them become *more* than CVS users.

I am quite certain that we also can help Windows users see the light. Once 
we have them not only complaining, but actually doing something about it.

> Because they too can become upstanding members of society, and leave their 
> dark past behind them. I firmly believe that nobody is past saving.

Well, it depends. If you clicked on that File Menu button, and then 
clicked on the "Save" item, you are past saving.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18  2:39                         ` Nicolas Pitre
  2007-04-18  5:04                           ` Junio C Hamano
@ 2007-04-18 11:14                           ` Johannes Schindelin
  2007-04-18 15:10                             ` Nicolas Pitre
  1 sibling, 1 reply; 66+ messages in thread
From: Johannes Schindelin @ 2007-04-18 11:14 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Junio C Hamano, David Lang, Andy Parkins, git, Linus Torvalds

Hi,

On Tue, 17 Apr 2007, Nicolas Pitre wrote:

> So what?
> 
> We provide a rope with proper caveat emptor.  Up to others to hang 
> themselves with it if they so desire.  It is not our problem anymore.

The people will complain. On this list. And I have to check the mails 
before deleting, because the Subject: does not say "I just took the rope, 
ignored your caveat emptor, and now I am dead. What should I do now?".

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18  0:02         ` Junio C Hamano
                             ` (2 preceding siblings ...)
  2007-04-18  4:15           ` Daniel Barkalow
@ 2007-04-18 11:32           ` Johannes Schindelin
  3 siblings, 0 replies; 66+ messages in thread
From: Johannes Schindelin @ 2007-04-18 11:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Robin H. Johnson, Linus Torvalds, Git Mailing List, Rogan Dawes,
	Andy Parkins

Hi,

On Tue, 17 Apr 2007, Junio C Hamano wrote:

> "Robin H. Johnson" <robbat2@gentoo.org> writes:
> 
> > As for a usage case:
> > - J.PEBKAC.User gets a a tree (from a tarball or GIT, we should gain the
> >   same output)
> > - Copies some file outside of the tree (the user is NOT smart enough,
> >   and resists all reasonable attempts at edumacation)
> > - Modifies said file outside of tree.
> > - Contacts maintainer with entire changed file.
> > - User vanishes off the internet.
> >
> > The entire file he sent if it's CVS, contains a $Header$ that uniquely
> > identifies the file (path and revision), and the maintainer can simply
> > drop the file in, and 'cvs diff -r$OLDREV $FILE'.
> > If it's git, the maintainer drops the file in, and does 'git diff
> > $OLDSHA1 $FILE'.
> 
> I personally hope that the maintainer drops such a non-patch
> that originates from a PEBKAC.

Me, too. Although people really believe strange things. When I asked such 
a guy on another list, if he could send me a patch instead of a complete 
file, he shouted loudly at me that patches were obsolete. Yes. Really. I 
begged to differ, but I guess he still believes that.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18  5:04                           ` Junio C Hamano
@ 2007-04-18 14:56                             ` Nicolas Pitre
  0 siblings, 0 replies; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-18 14:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Lang, Andy Parkins, git, Linus Torvalds

On Tue, 17 Apr 2007, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > On Tue, 17 Apr 2007, Junio C Hamano wrote:
> >
> >> You have to be careful, though.  Depending on what kind of
> >> transformation you implement with the external tools, you would
> >> end up having to slow down everything we would do.
> >
> > So what?  
> >
> > We provide a rope with proper caveat emptor.  Up to others to hang 
> > themselves with it if they so desire.  It is not our problem anymore.
> 
> I sort-of find it hard to believe hearing this from somebody who
> muttered something about importance of perception a few days ago.

Sure!  And that applies in this case as well.

With such a _generic_ hook, Git will be perceived as much more powerful 
and flexible.  I insist on "generic" because people could experiment 
with their own filters without endless debate on the mailing list and 
pressure to include this or that feature in the core, and we don't have 
to commit to those feature we're not in agreement with.

And let's face it: there are probably legitimate and possibly more 
useful things to do with such a hook than keyword expansion.

If you go to Home Hardware you can buy rope.  Of course you can hang 
yourself with it, but the rope manufacturers won't commit to that I'm 
sure.  But if rope was banned by law because it represents a threath to 
life then governments would be perceived really strangely even if their 
intention are good.

Sure we might have a strong opinion against keyword expansion and that 
is reflected by the fact that Git will most probably never ship with the 
ability to perform keyword expansion.  That doesn't mean we should deny 
all possibilities for external filters _even_ if they can be used for 
keyword expansion.

> >> I suspect that you would have to play safe and say "when
> >> external tools are involved, we need to disable the existing
> >> content SHA-1 based optimization for all paths that ask for
> >> them" to keep your sanity.
> >
> > Maybe.  If that is what's really needed then so be it.  People who 
> > really want to do strange things will have the flexibility to do so, but 
> > they'll have to pay the price in loss of performance.
> 
> Not just that.  We end up having to pay the price of maintaining
> hooks to let them do crazy things.

Weight that against the price of fighting them against the crazy things 
they won't quit wanting to do.  At some point it is just a matter of 
getting out of the way.


Nicolas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18  6:24                       ` Rogan Dawes
@ 2007-04-18 15:02                         ` Linus Torvalds
  2007-04-18 15:34                           ` Nicolas Pitre
  2007-04-18 15:38                           ` Rogan Dawes
  0 siblings, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2007-04-18 15:02 UTC (permalink / raw)
  To: Rogan Dawes; +Cc: Nicolas Pitre, David Lang, Andy Parkins, git, Junio C Hamano



On Wed, 18 Apr 2007, Rogan Dawes wrote:
> 
> Or similarly, when checking an "ODF" file in, the attribute would lead to an
> appropriate script creating the "tree" of individual files.
> 
> Does this sound workable?

I think it sounds very interesting, and I'd much rather do _those_ kinds 
of rewrites than keyword unexpansion. And yes, some kind of generic 
support for rewriting might give people effectively the keywords they want 
(I think the CVS semantics are not likely to be logical, but people can 
probably do something that works for them), and at that point maybe the 
keyword discussion goes away too.

However, I don't know if it is "workable".

The thing is, it's easy enough (although potentially _very_ expensive) to 
run some per-file script at each commit and at each checkout. But there 
are some fundamental operations that are even more common:

 - checking for "file changed", aka the "git status" kind of thing

   Anything we do would have to follow the same "stat" rules, at a 
   minimum. You can *not* afford to have to check the file manually.

   So especially if you combine several pieces into one, or split one file 
   into several pieces, your index would have to contain the entry 
   that matches the _filesystem_ (because that's what the index is all 
   about), but then the *tree* would contain the pieces (or the single 
   entry that matches several filesystem entries).

 - what about diffs (once the stat information says something has 
   potentially changed)? You'd have to script those too, and it really 
   sounds like some very basic operations get a _lot_ more expensive and 
   complex.

   This is also related to the above: one of the most fundamental diffs is 
   the diff of the index and a tree - so if the index matches the 
   "filesystem state" and the trees contain some "combined entry" or 
   "split entry", you'd have to teach some very core diff functionality 
   about that kind of mapping.

In other words, I think it's too complicated. Not necessarily impossible, 
but likely harder and more complex than it's really worth.

Having a 1:1 file mapping (like the CRLF<->LF object mapping is) is a lot 
easier. You just have to make sure that the index has the *stat* 
information from the filesystem, but the *sha1* identity information from 
the git internal format, and things automatically just fall out right. But 
if you have anything but a 1:1 relationship, it gets hugely more complex.

			Linus

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18 11:14                           ` Johannes Schindelin
@ 2007-04-18 15:10                             ` Nicolas Pitre
  2007-04-19  8:19                               ` Johannes Schindelin
  0 siblings, 1 reply; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-18 15:10 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, David Lang, Andy Parkins, git, Linus Torvalds

On Wed, 18 Apr 2007, Johannes Schindelin wrote:

> Hi,
> 
> On Tue, 17 Apr 2007, Nicolas Pitre wrote:
> 
> > So what?
> > 
> > We provide a rope with proper caveat emptor.  Up to others to hang 
> > themselves with it if they so desire.  It is not our problem anymore.
> 
> The people will complain. On this list. And I have to check the mails 
> before deleting, because the Subject: does not say "I just took the rope, 
> ignored your caveat emptor, and now I am dead. What should I do now?".

Well... in the case of keyword expansion (since this is really the 
contentious case here), with such a _generic_ facility to implement 
external filters, people will at least have the opportunity to try it.  
Sure they might complain that it doesn't work well, but 1) it is much 
easier to *understand* why it doesn't work well after experimenting with 
it, and 2) some people *will* be perfectly happy with something that 
doesn't work well but happens to just work in their own particular case.

And because it now becomes a case by case issue it is much easier for 
us to simply provide a generic mechanism and let people figure out by 
themselves what works and what doesn't work instead of having 
philosophical discussions on the merits of keyword expansions on the 
list.

And because people _will_ complain *anyway*, it might lead to more 
productive discussion if those who complain had the chance to realize 
what the issues really are by experience if theoretic demonstrations alone 
doesn't convey the problem fully.


Nicolas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18 15:02                         ` Linus Torvalds
@ 2007-04-18 15:34                           ` Nicolas Pitre
  2007-04-18 15:38                           ` Rogan Dawes
  1 sibling, 0 replies; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-18 15:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Rogan Dawes, David Lang, Andy Parkins, git, Junio C Hamano

On Wed, 18 Apr 2007, Linus Torvalds wrote:

> 
> 
> On Wed, 18 Apr 2007, Rogan Dawes wrote:
> > 
> > Or similarly, when checking an "ODF" file in, the attribute would lead to an
> > appropriate script creating the "tree" of individual files.
> > 
> > Does this sound workable?
> 
> I think it sounds very interesting, and I'd much rather do _those_ kinds 
> of rewrites than keyword unexpansion. And yes, some kind of generic 
> support for rewriting might give people effectively the keywords they want 
> (I think the CVS semantics are not likely to be logical, but people can 
> probably do something that works for them), and at that point maybe the 
> keyword discussion goes away too.

Exactly my point.

> However, I don't know if it is "workable".
> 
> The thing is, it's easy enough (although potentially _very_ expensive) to 
> run some per-file script at each commit and at each checkout. But there 
> are some fundamental operations that are even more common:
> 
>  - checking for "file changed", aka the "git status" kind of thing
> 
>    Anything we do would have to follow the same "stat" rules, at a 
>    minimum. You can *not* afford to have to check the file manually.
> 
>    So especially if you combine several pieces into one, or split one file 
>    into several pieces, your index would have to contain the entry 
>    that matches the _filesystem_ (because that's what the index is all 
>    about), but then the *tree* would contain the pieces (or the single 
>    entry that matches several filesystem entries).

For that the external script would need the ability to alter the index 
itself.  That becomes a bit yucky.  Or maybe something could be made 
with a mechanism like dnotify/inotify to "touch" the single placeholder 
entry referenced by the index whenever one of the split component 
changes.

>  - what about diffs (once the stat information says something has 
>    potentially changed)? You'd have to script those too, and it really 
>    sounds like some very basic operations get a _lot_ more expensive and 
>    complex.

Of course an attribute for external diff script is certainly something 
that could be useful independently of this case, as some particular 
binary formats might have a way of their own to display their 
differences.

The whole idea of having the ability to call external tools is exactly 
to delegate complex/bizarre/unusual tasks to separate and independent 
agents.  The whole checkout operation becomes much more expensive but 
everyone using such facility might expect it.  It just cannot be as bad 
as a straight checkout with CVS from a remote server though (OK I know 
it can but you know what I mean).

>    This is also related to the above: one of the most fundamental diffs is 
>    the diff of the index and a tree - so if the index matches the 
>    "filesystem state" and the trees contain some "combined entry" or 
>    "split entry", you'd have to teach some very core diff functionality 
>    about that kind of mapping.

Well, if the split components are represented by a single placeholder in 
the index and the filesystem, and the filesystem placeholder is 
"touched" whenever a split component is modified, then the mapping can 
as well be limited to the external scripts for checkin/checkout/diff 
only without the Git core having the slightest idea about it.

Sure it might be slow and unusual, but at least not impossible.  And 
again, with an attribute providing a facility for external tools it is 
then not our problem anymore.


Nicolas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18 15:02                         ` Linus Torvalds
  2007-04-18 15:34                           ` Nicolas Pitre
@ 2007-04-18 15:38                           ` Rogan Dawes
  2007-04-18 15:59                             ` Nicolas Pitre
  1 sibling, 1 reply; 66+ messages in thread
From: Rogan Dawes @ 2007-04-18 15:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nicolas Pitre, David Lang, Andy Parkins, git, Junio C Hamano

Linus Torvalds wrote:
> 
> On Wed, 18 Apr 2007, Rogan Dawes wrote:
>> Or similarly, when checking an "ODF" file in, the attribute would lead to an
>> appropriate script creating the "tree" of individual files.
>>
>> Does this sound workable?
> 
> I think it sounds very interesting, and I'd much rather do _those_ kinds 
> of rewrites than keyword unexpansion. And yes, some kind of generic 
> support for rewriting might give people effectively the keywords they want 
> (I think the CVS semantics are not likely to be logical, but people can 
> probably do something that works for them), and at that point maybe the 
> keyword discussion goes away too.
> 
> However, I don't know if it is "workable".
> 
> The thing is, it's easy enough (although potentially _very_ expensive) to 
> run some per-file script at each commit and at each checkout. But there 
> are some fundamental operations that are even more common:
> 
>  - checking for "file changed", aka the "git status" kind of thing
> 
>    Anything we do would have to follow the same "stat" rules, at a 
>    minimum. You can *not* afford to have to check the file manually.
> 
>    So especially if you combine several pieces into one, or split one file 
>    into several pieces, your index would have to contain the entry 
>    that matches the _filesystem_ (because that's what the index is all 
>    about), but then the *tree* would contain the pieces (or the single 
>    entry that matches several filesystem entries).

Right. I would imagine that the script would have to take care of 
setting timestamps in the filesystem appropriately, as well as passing 
them back to git when queried.

e.g. expanding test.odf/: (since we store it as a directory)

git calls "odf.sh checkout test.odf/ <sha1> <perms> <stat>"

odf checkout calls back into git to find out the details of the files 
under test.odf/, and creates a zip file containing the individual files, 
with appropriate timestamps.

User then opens the file using OO.o or whatever, makes some changes and 
saves the file.

The user then runs git status:

git calls "odf.sh stat test.odf/" (again, triggered by an attribute)

odf.sh does the equivalent of "zip -l" to get up to date stat info for 
the component files, and passes it back to git (via stdout?)

User commits his changes:

git calls "odf.sh checkin test.odf/"

odf.sh unpacks the individual files, calls back into git to create 
individual objects (using a fast-import-alike protocol over stdout?)


> 
>  - what about diffs (once the stat information says something has 
>    potentially changed)? You'd have to script those too, and it really 
>    sounds like some very basic operations get a _lot_ more expensive and 
>    complex.
 >
>    This is also related to the above: one of the most fundamental diffs is 
>    the diff of the index and a tree - so if the index matches the 
>    "filesystem state" and the trees contain some "combined entry" or 
>    "split entry", you'd have to teach some very core diff functionality 
>    about that kind of mapping.
> 
> In other words, I think it's too complicated. Not necessarily impossible, 
> but likely harder and more complex than it's really worth.
> 
> Having a 1:1 file mapping (like the CRLF<->LF object mapping is) is a lot 
> easier. You just have to make sure that the index has the *stat* 
> information from the filesystem, but the *sha1* identity information from 
> the git internal format, and things automatically just fall out right. But 
> if you have anything but a 1:1 relationship, it gets hugely more complex.
> 
> 			Linus

Absolutely. I just raised it now since it was originally mentioned quite 
a long time ago as a possible feature of git, and I couldn't see how it 
might work.

Thanks for your time,

Rogan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18 15:38                           ` Rogan Dawes
@ 2007-04-18 15:59                             ` Nicolas Pitre
  2007-04-18 16:09                               ` Rogan Dawes
  2007-04-18 17:58                               ` Alon Ziv
  0 siblings, 2 replies; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-18 15:59 UTC (permalink / raw)
  To: Rogan Dawes; +Cc: Linus Torvalds, David Lang, Andy Parkins, git, Junio C Hamano

On Wed, 18 Apr 2007, Rogan Dawes wrote:

> Right. I would imagine that the script would have to take care of setting
> timestamps in the filesystem appropriately, as well as passing them back to
> git when queried.
> 
> e.g. expanding test.odf/: (since we store it as a directory)
> 
> git calls "odf.sh checkout test.odf/ <sha1> <perms> <stat>"
> 
> odf checkout calls back into git to find out the details of the files under
> test.odf/, and creates a zip file containing the individual files, with
> appropriate timestamps.

Why would you need to store the document as multiple files into Git?

The only reasons I can see for external filters are:

 1) Normalization, e.g. the LF->CRLF thing.

    Some might want to do keyword expansion which would fall into this
    category as well.

 2) Better archiving with Git's deltas.

    That means storing files uncompressed into Git since Git will
    compress them anyway, after significant space reduction due to 
    deltas which cannot occur on already compressed data.

So if your .odf file is actually a zip with multiple files, then all you 
have to do is to convert that zip archive into a non compressed tar 
archive on checkins, and the reverse transformation on checkouts.  The 
non compressed tar content will delta well, the Git archive will be 
small, and no tricks with the index will be needed.

Or am I missing something?


Nicolas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18 15:59                             ` Nicolas Pitre
@ 2007-04-18 16:09                               ` Rogan Dawes
  2007-04-18 17:58                               ` Alon Ziv
  1 sibling, 0 replies; 66+ messages in thread
From: Rogan Dawes @ 2007-04-18 16:09 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Linus Torvalds, David Lang, Andy Parkins, git, Junio C Hamano

Nicolas Pitre wrote:
> On Wed, 18 Apr 2007, Rogan Dawes wrote:
> 
>> Right. I would imagine that the script would have to take care of setting
>> timestamps in the filesystem appropriately, as well as passing them back to
>> git when queried.
>>
>> e.g. expanding test.odf/: (since we store it as a directory)
>>
>> git calls "odf.sh checkout test.odf/ <sha1> <perms> <stat>"
>>
>> odf checkout calls back into git to find out the details of the files under
>> test.odf/, and creates a zip file containing the individual files, with
>> appropriate timestamps.
> 
> Why would you need to store the document as multiple files into Git?
> 
> The only reasons I can see for external filters are:
> 
>  1) Normalization, e.g. the LF->CRLF thing.
> 
>     Some might want to do keyword expansion which would fall into this
>     category as well.
> 
>  2) Better archiving with Git's deltas.
> 
>     That means storing files uncompressed into Git since Git will
>     compress them anyway, after significant space reduction due to 
>     deltas which cannot occur on already compressed data.
> 
> So if your .odf file is actually a zip with multiple files, then all you 
> have to do is to convert that zip archive into a non compressed tar 
> archive on checkins, and the reverse transformation on checkouts.  The 
> non compressed tar content will delta well, the Git archive will be 
> small, and no tricks with the index will be needed.
> 
> Or am I missing something?
> 
> 
> Nicolas

Probably not! ;-)

I was just thinking that it would be easier to see diffs between 
individual files, rather than between entries in a zip. But if we are 
calling out to a specialized handler, the handler can do that just as 
easily, and without the added complexity in the index, etc.

It also means that someone without the attributes and specialized 
handler would not be able to use the file (if it is stored as a directory).

Clearly a bad idea! Just ignore me, I'm used to it! ;-)

Rogan

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18 15:59                             ` Nicolas Pitre
  2007-04-18 16:09                               ` Rogan Dawes
@ 2007-04-18 17:58                               ` Alon Ziv
  1 sibling, 0 replies; 66+ messages in thread
From: Alon Ziv @ 2007-04-18 17:58 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Rogan Dawes, Linus Torvalds, David Lang, Andy Parkins, git,
	Junio C Hamano

On Wed, 2007-04-18 at 11:59 -0400, Nicolas Pitre wrote:
> So if your .odf file is actually a zip with multiple files, then all you 
> have to do is to convert that zip archive into a non compressed tar 
> archive on checkins, and the reverse transformation on checkouts.  The 
> non compressed tar content will delta well, the Git archive will be 
> small, and no tricks with the index will be needed.
> 

In fact, for the specific case of OO.o files, I would claim the proper
transformation is just converting to non-compressed zip on checkin...

(Non-compressed zip is just as good here as tar, and has the added
advantage that there is no need for a reverse transformation on
checkout :))

	-az

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-18 15:10                             ` Nicolas Pitre
@ 2007-04-19  8:19                               ` Johannes Schindelin
  0 siblings, 0 replies; 66+ messages in thread
From: Johannes Schindelin @ 2007-04-19  8:19 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Junio C Hamano, David Lang, Andy Parkins, git, Linus Torvalds

Hi,

On Wed, 18 Apr 2007, Nicolas Pitre wrote:

> And because it now becomes a case by case issue it is much easier for us 
> to simply provide a generic mechanism and let people figure out by 
> themselves what works and what doesn't work instead of having 
> philosophical discussions on the merits of keyword expansions on the 
> list.

That's a very good point.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 11:35   ` Andy Parkins
  2007-04-17 15:53     ` Linus Torvalds
  2007-04-17 21:24     ` Junio C Hamano
@ 2007-04-20  0:30     ` Jakub Narebski
  2007-04-21  0:47       ` David Lang
  2 siblings, 1 reply; 66+ messages in thread
From: Jakub Narebski @ 2007-04-20  0:30 UTC (permalink / raw)
  To: git

Andy Parkins wrote:

>>  * We do not do the borrowing from working tree when doing
>>    grep_sha1(), but when we grep inside a file from working tree
>>    with grep_file(), we do not currently make it go through
>>    convert_to_git() to fix line endings.  Maybe we should, if
>>    only for consistency.
> 
> I'd actually argue not - git-grep searches the working tree.  The expanded 
> keywords are in the working tree.  Take the CRLF case - I'm a clueless user, 
> who only understands the system I'm working on.  I want to search for all the 
> line endings, so I do git-grep "\r\n" - that should work, because I'm 
> searching my working tree.

Actually, "git grep" can search both the working tree (default), but also
an index (--cached), or specified tree (or tree-ish). The same with
"git diff": it can work on tree (repository), index, working tree version,
now I think in [almost] any combination. 

Think what keyword expansion means to all this... Well, you can have -kk
to expand/not expand keywords, but this is avoiding issue, not solving it
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 20:46               ` Andy Parkins
  2007-04-17 20:52                 ` [PATCH] Add keyword collapse " Andy Parkins
  2007-04-17 21:10                 ` [PATCH 2/2] Add keyword unexpansion " Linus Torvalds
@ 2007-04-20 11:32                 ` Nikolai Weibull
  2 siblings, 0 replies; 66+ messages in thread
From: Nikolai Weibull @ 2007-04-20 11:32 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, Linus Torvalds, Junio C Hamano

On 4/17/07, Andy Parkins <andyparkins@gmail.com> wrote:
> On Tuesday 2007, April 17, Linus Torvalds wrote:
>
> > No, you haven't. You've "addressed" them by stating they don't
> > matter. It doesn't "matter" that a diff won't actually apply to a
> > checked-out tree, because you fix it up in another tool.
>
> Okay.  I think this is a matter of perspective - my perspective is that
> if it supplies what svn/cvs supply then that would please the people
> who want it (of whom I am one); yours is obviously that if it isn't
> perfect, it's not worth doing.  That's a reasonable thing to demand,
> and I'm not going to try and argue you out of it.

Loads of people would be pleased if marijuana was legalized.  For some
reason, few governments seem willing to cater to their needs.

  nikolai

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-17 22:40                       ` Junio C Hamano
  2007-04-18  2:39                         ` Nicolas Pitre
@ 2007-04-21  0:42                         ` David Lang
  2007-04-21  1:54                           ` Junio C Hamano
  1 sibling, 1 reply; 66+ messages in thread
From: David Lang @ 2007-04-21  0:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Andy Parkins, git, Linus Torvalds

On Tue, 17 Apr 2007, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
>
>>> I would like to, however this doesn't currently integrate
>>> well with git. I've been told in the past that once
>>> .gitattributes is in place then the hooks for the crlf stuff
>>> can be generalized to allow for calls out to custom code to
>>> do this sort of thing.
>>
>> And I agree that this is a perfectly sensible thing to do.  The facility
>> should be there for you to apply any kind of transformation with
>> external tools on data going in or out from Git.  There are good and bad
>> things you can do with such a facility, but at least it becomes your
>> responsibility to screw^H^H^H^Hfilter your data and not something that
>> is enforced by Git itself.
>
> You have to be careful, though.  Depending on what kind of
> transformation you implement with the external tools, you would
> end up having to slow down everything we would do.

you can slow down everything that you do on your system if you defined too much 
work for external tools, that won't slow other people down who don't define any 
work for external tools.

> It boils down to this statement from Andy:
>
>    ..., keywords (in other VCSs, and so why not in git) are
>    only updated when a file is checked out.  There is no need
>    to touch every file.  It's actually beneficial, because the
>    keyword in the file is the state of the file at the time it
>    was checked in - which is actually more useful than updating
>    it to the latest commit every time.
>
>    That means you're only ever expanding in a file that your
>    changing anyway - so it's effectively free.  git-checkout
>    would still be immediate and instantaneous.
>
> Back up a bit and think what "when a file is checked out" means.
> His argument assumes the current behaviour of not checking out
> when the underlying blob objects before munging are the same.

correct.

> But with keyword expansion and fancier "external tools" whose
> semantics are not well defined (iow, defined to be "do whatever
> they please"), does it still make sense to consider two blobs
> that appear in totally different context "the same" and omit
> checking out (and causing the external tools hook not getting
> run)?  I already pointed out to Andy that the branch name the
> file was taken from, if it were to take part of the keyword
> expansion, would come out incorrectly in his printed svg
> drawing.

this is part of the rope you are handing out. the external tool could do a lot 
of things that don't make sense. you could have the tool include the serial 
number of the cpu you happen to be running on at the moment, it wouldn't make 
sense to do this, but it could be done. the fact that the rope could be used to 
hang someone doesn't mean that you should outlaw rope.

> If you want somebody's earlier example of "giving a file with
> embedded keyword to somebody, who modifies and sends the result
> back in full, now you would want to incorporate the change by
> identifying the origin" to work, you would want "$Source$" (I am
> looking at CVS documentation, "Keyword substitution/Keyword
> List") to identify where that file came from (after all, a
> source tree could have duplicated files) so that you can tell
> which file the update is about, and this keyword would expand
> differently depending on where in the project tree the blob
> appears.
>
> It is not just the checkout codepath.  We omit diffs when we
> know from SHA-1 that the blobs are the same before decoration.
> We even omit diffs when we know from SHA-1 that two trees are
> the same without taking possible decorations that can be applied
> differently to the blobs they contain into account.  Earlier,
> Andy said he wanted to grep for the expanded text if he is
> grepping in the working tree, and I think that makes sense, but
> that means git-grep cannot do the same "borrow from working tree
> when expanding from blob object is more expensive" optimization
> we have for diff.  We also need to disable that optimization
> from the diff, regardless of what the correct semantics for
> grepping in working trees should be.

git would not be able to borrow from the working tree just becouse the index 
thinks that the file is the same (and frankly, I'm not sure this is really a 
safe thing to do in any case, it's just something that works frequently enough 
that we get away with it)

the diff optimizations could (and should) stay.

> I suspect that you would have to play safe and say "when
> external tools are involved, we need to disable the existing
> content SHA-1 based optimization for all paths that ask for
> them" to keep your sanity.

Andy and I are both expecting that if the blobs are the same that none of the 
git tools would flag them as different. this maintains the huge speedups that 
git achieves by doing these checks.

if you want to make an option somewhere that disables this optimization, I guess 
it would be Ok, but I wouldn't do so until someone came up with a situation 
where they really needed it, nothing in what Andy or I have asked for needs 
this.

both of us are treating the keyword expansion as decorations to the file. it's 
useful, but the core meaning of 'what this file is' is the checked in version 
with the keywords unexpanded. all the optmizations that only look at what's 
checked in will remain as valid as they are today, it's only things that look at 
your working directory that would change.

David Lang

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-20  0:30     ` Jakub Narebski
@ 2007-04-21  0:47       ` David Lang
  0 siblings, 0 replies; 66+ messages in thread
From: David Lang @ 2007-04-21  0:47 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1884 bytes --]

On Fri, 20 Apr 2007, Jakub Narebski wrote:

> Andy Parkins wrote:
>
>>>  * We do not do the borrowing from working tree when doing
>>>    grep_sha1(), but when we grep inside a file from working tree
>>>    with grep_file(), we do not currently make it go through
>>>    convert_to_git() to fix line endings.  Maybe we should, if
>>>    only for consistency.
>>
>> I'd actually argue not - git-grep searches the working tree.  The expanded
>> keywords are in the working tree.  Take the CRLF case - I'm a clueless user,
>> who only understands the system I'm working on.  I want to search for all the
>> line endings, so I do git-grep "\r\n" - that should work, because I'm
>> searching my working tree.
>
> Actually, "git grep" can search both the working tree (default), but also
> an index (--cached), or specified tree (or tree-ish). The same with
> "git diff": it can work on tree (repository), index, working tree version,
> now I think in [almost] any combination.
>
> Think what keyword expansion means to all this... Well, you can have -kk
> to expand/not expand keywords, but this is avoiding issue, not solving it

how is git-grep on the working tree any different than just useing grep? the 
value in the git-* versions of system commands are that they work on the 
history, index, etc wher ethe normal system tools don't.

in this particular case, since the git user can do a git-grep of the working 
tree, or a git-grep of HEAD (or of the index), I don't think that it hurts much 
either way.

if git-grep of the working tree converts things to the checked-in version before 
the pattern match, the user can still use grep  to go through the checked-out 
version

if git-grep of the working tree doesn't convert things to the checked-in version 
before the pattern match, the user can stil use git-grep HEAD or --cached to go 
through the checked-in version.

David Lang

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-21  0:42                         ` David Lang
@ 2007-04-21  1:54                           ` Junio C Hamano
  2007-04-21  2:06                             ` Nicolas Pitre
  2007-04-21 23:31                             ` David Lang
  0 siblings, 2 replies; 66+ messages in thread
From: Junio C Hamano @ 2007-04-21  1:54 UTC (permalink / raw)
  To: David Lang; +Cc: Nicolas Pitre, Andy Parkins, git, Linus Torvalds

David Lang <david.lang@digitalinsight.com> writes:

>> But with keyword expansion and fancier "external tools" whose
>> semantics are not well defined (iow, defined to be "do whatever
>> they please"), does it still make sense to consider two blobs
>> that appear in totally different context "the same" and omit
>> checking out (and causing the external tools hook not getting
>> run)?  I already pointed out to Andy that the branch name the
>> file was taken from, if it were to take part of the keyword
>> expansion, would come out incorrectly in his printed svg
>> drawing.
>
> this is part of the rope you are handing out. the external tool could
> do a lot of things that don't make sense. you could have the tool
> include the serial number of the cpu you happen to be running on at
> the moment, it wouldn't make sense to do this, but it could be
> done. the fact that the rope could be used to hang someone doesn't
> mean that you should outlaw rope.

I do not think you understand, especially after reading the part
you say "Andy and I both...".

The point of my comment was that with Andy's definition of when
the "external tools" should trigger, that CPU serial number
embedder would _NOT_ trigger for a path when you switch branches
that have the same contents at that path.  External tools can do
stupid things and that is what you are calling the rope.  But
the case I am talking about is that we deliberately do _not_
call external tools, so even if external tools can do sensible
things if given a chance to, they are not given a chance to do
so, and deciding not to call them in some cases is made by us.
I think that's different from "we gave you rope, you hang
yourself and that is not our problem".

People have every right to say "if you consistently call these
external tools, they behave sensibly, but you only call them
when you choose, and that is where the idiocy is coming from".
How would you respond to that?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-21  1:54                           ` Junio C Hamano
@ 2007-04-21  2:06                             ` Nicolas Pitre
  2007-04-21 23:31                             ` David Lang
  1 sibling, 0 replies; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-21  2:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Lang, Andy Parkins, git, Linus Torvalds

On Fri, 20 Apr 2007, Junio C Hamano wrote:

> The point of my comment was that with Andy's definition of when
> the "external tools" should trigger, that CPU serial number
> embedder would _NOT_ trigger for a path when you switch branches
> that have the same contents at that path.  External tools can do
> stupid things and that is what you are calling the rope.  But
> the case I am talking about is that we deliberately do _not_
> call external tools, so even if external tools can do sensible
> things if given a chance to, they are not given a chance to do
> so, and deciding not to call them in some cases is made by us.

And that's a fairly acceptable limitation IMHO, which doesn't make the 
thing any less useful for many cases.  We just need to document it 
appropriately.


Nicolas

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
  2007-04-21  1:54                           ` Junio C Hamano
  2007-04-21  2:06                             ` Nicolas Pitre
@ 2007-04-21 23:31                             ` David Lang
  1 sibling, 0 replies; 66+ messages in thread
From: David Lang @ 2007-04-21 23:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Andy Parkins, git, Linus Torvalds

On Fri, 20 Apr 2007, Junio C Hamano wrote:

> David Lang <david.lang@digitalinsight.com> writes:
>
>>> But with keyword expansion and fancier "external tools" whose
>>> semantics are not well defined (iow, defined to be "do whatever
>>> they please"), does it still make sense to consider two blobs
>>> that appear in totally different context "the same" and omit
>>> checking out (and causing the external tools hook not getting
>>> run)?  I already pointed out to Andy that the branch name the
>>> file was taken from, if it were to take part of the keyword
>>> expansion, would come out incorrectly in his printed svg
>>> drawing.
>>
>> this is part of the rope you are handing out. the external tool could
>> do a lot of things that don't make sense. you could have the tool
>> include the serial number of the cpu you happen to be running on at
>> the moment, it wouldn't make sense to do this, but it could be
>> done. the fact that the rope could be used to hang someone doesn't
>> mean that you should outlaw rope.
>
> I do not think you understand, especially after reading the part
> you say "Andy and I both...".

sorry for not being clear

> The point of my comment was that with Andy's definition of when
> the "external tools" should trigger, that CPU serial number
> embedder would _NOT_ trigger for a path when you switch branches
> that have the same contents at that path.  External tools can do
> stupid things and that is what you are calling the rope.  But
> the case I am talking about is that we deliberately do _not_
> call external tools, so even if external tools can do sensible
> things if given a chance to, they are not given a chance to do
> so, and deciding not to call them in some cases is made by us.
> I think that's different from "we gave you rope, you hang
> yourself and that is not our problem".

the cpu serial number would be different for each cpu in a system, so you could 
get different answers, even in the same branch (let alone on different systems, 
which is what I was thinking of when I wrote that example)

my point was that while it's possible to define external tools that will cause 
problems (my having effectvly random changes to the files), and when external 
tools are used it will slow things down (how much depends on the tools), it's 
also possible to define external tools that only have well defined, easily 
reversable effects, that only touch a few files, and so don't cause a huge 
performance hit.

> People have every right to say "if you consistently call these
> external tools, they behave sensibly, but you only call them
> when you choose, and that is where the idiocy is coming from".
> How would you respond to that?

consistantly calling the external tools is not the same thing as calling them 
for every possible thing that refrences the file, it's calling them every time a 
particular type of access to the file is made (and it helps to have well defined 
and well documented rules for when they are used)

I think the basic rule of 'git commands work against the checked-in version of 
the file' is a solid basis to work from. This means that you can't optimize by 
sometimes looking at the checked-out version, and there may still be some corner 
cases to explain/clarify/define (like the git-diff against the working tree 
mentioned in other messages)

David Lang

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2007-04-22  0:05 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-17  9:41 [PATCH 2/2] Add keyword unexpansion support to convert.c Andy Parkins
     [not found] ` <200704171803.58940.an dyparkins@gmail.com>
2007-04-17 10:09 ` Junio C Hamano
2007-04-17 11:35   ` Andy Parkins
2007-04-17 15:53     ` Linus Torvalds
2007-04-17 17:03       ` Andy Parkins
2007-04-17 18:12         ` Linus Torvalds
2007-04-17 19:12           ` Andy Parkins
     [not found]             ` <alpine.LFD. 0.98.0704171530220.4504@xanadu.home>
2007-04-17 19:41             ` Nicolas Pitre
2007-04-17 19:45               ` David Lang
     [not found]                 ` <alpin e.LFD.0.98.0704171624190.4504@xanadu.home>
2007-04-17 20:29                 ` Nicolas Pitre
2007-04-17 20:05                   ` David Lang
2007-04-17 21:16                     ` Nicolas Pitre
     [not found]                       ` <7vy7k qlj5r.fsf@assigned-by-dhcp.cox.net>
2007-04-17 20:53                       ` David Lang
2007-04-17 21:52                         ` Andy Parkins
2007-04-17 22:40                       ` Junio C Hamano
2007-04-18  2:39                         ` Nicolas Pitre
2007-04-18  5:04                           ` Junio C Hamano
2007-04-18 14:56                             ` Nicolas Pitre
2007-04-18 11:14                           ` Johannes Schindelin
2007-04-18 15:10                             ` Nicolas Pitre
2007-04-19  8:19                               ` Johannes Schindelin
2007-04-21  0:42                         ` David Lang
2007-04-21  1:54                           ` Junio C Hamano
2007-04-21  2:06                             ` Nicolas Pitre
2007-04-21 23:31                             ` David Lang
2007-04-18  6:24                       ` Rogan Dawes
2007-04-18 15:02                         ` Linus Torvalds
2007-04-18 15:34                           ` Nicolas Pitre
2007-04-18 15:38                           ` Rogan Dawes
2007-04-18 15:59                             ` Nicolas Pitre
2007-04-18 16:09                               ` Rogan Dawes
2007-04-18 17:58                               ` Alon Ziv
2007-04-17 19:54             ` Linus Torvalds
2007-04-17 20:46               ` Andy Parkins
2007-04-17 20:52                 ` [PATCH] Add keyword collapse " Andy Parkins
2007-04-17 21:10                 ` [PATCH 2/2] Add keyword unexpansion " Linus Torvalds
2007-04-17 21:13                   ` Linus Torvalds
2007-04-18 11:11                     ` Johannes Schindelin
2007-04-20 11:32                 ` Nikolai Weibull
2007-04-17 21:18             ` Martin Langhoff
2007-04-17 21:24     ` Junio C Hamano
2007-04-20  0:30     ` Jakub Narebski
2007-04-21  0:47       ` David Lang
2007-04-17 15:46   ` Linus Torvalds
2007-04-17 10:41 ` Johannes Sixt
2007-04-17 15:32 ` Linus Torvalds
2007-04-17 17:10   ` Andy Parkins
2007-04-17 17:18   ` Rogan Dawes
2007-04-17 18:23     ` Linus Torvalds
2007-04-17 20:27       ` Rogan Dawes
2007-04-17 23:56       ` Robin H. Johnson
2007-04-18  0:02         ` Junio C Hamano
2007-04-18  0:26           ` J. Bruce Fields
2007-04-18  1:19             ` Linus Torvalds
2007-04-18  1:28               ` Junio C Hamano
2007-04-18  1:33                 ` Linus Torvalds
2007-04-18  1:06           ` Robin H. Johnson
2007-04-18  1:15             ` Junio C Hamano
2007-04-18  1:42               ` Junio C Hamano
2007-04-18  2:53                 ` Robin H. Johnson
2007-04-18  4:15           ` Daniel Barkalow
2007-04-18 11:32           ` Johannes Schindelin
2007-04-18  2:50         ` Martin Langhoff
2007-04-18 10:06         ` David Kågedal
2007-04-18 11:08           ` Robin H. Johnson
2007-04-17 21:00 ` Matthieu Moy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).