* [PATCH 2/2] Add keyword unexpansion support to convert.c
@ 2007-04-17 9:41 Andy Parkins
[not found] ` <200704171803.58940.an dyparkins@gmail.com>
` (4 more replies)
0 siblings, 5 replies; 66+ messages in thread
From: Andy Parkins @ 2007-04-17 9:41 UTC (permalink / raw)
To: git
This patch adds expansion of keywords support. The unexpansion is only
performed when the "keywords" attribute it found for a file. The check
for this attribute is done in the same way as the "crlf" attribute
check.
The actual unexpansion is performed by keyword_unexpand_git() which is
called from convert_to_git() when the "keywords" attribute is found.
keyword_unexpand_git() finds strings of the form
$KEYWORD: ARBITRARY STRING$
And collapses them into
$KEYWORD:$
No parsing of the keyword itself is performed, the content is simply
dropped.
Despite the fact that this doesn't do anything useful from the users
perspective, this patch forms the more important half of keyword
expansion support - because it prevents the expansion from entering the
repository. It effectively creates blind spots that git tools won't
see.
convert_to_git() has also been changed so that it no longer only does
CRLF conversion. Instead, a flag is kept to say whether any conversion
was done by the CRLF code, and then that converted buffer is passed to
keyword_unexpand_git() and the flag again updated. It then returns 1 if
either of these conversion functions actually changed anything.
I've also included a test script to show that the keyword unexpansion is
working. It particular demonstrates that the diff between a file with
keywords and the repository is blind to the expanded keyword.
Signed-off-by: Andy Parkins <andyparkins@gmail.com>
---
I'm not submitting this for application; I've not polished it, and I've not
written the expansion half yet.
However, I did want to show what I've been banging on about with keyword
expansion, and this does a reasonable job. The test code shows that the idea
is sound - what goes in the repository is stable, and what appears in the
working directory can contain any arbitrary keyword expansion.
Adding expansion is harder, as I have no clue which calls to make to find
even the most basic information about an object; but I thought I'd get
feedback before I expend that effort.
Areas that might cause problems are the git-apply type of commands, I haven't
checked to see if they use convert_to_git() on their input to normalise it
for entry into the repository. I hope so, as the CRLF support relies on it as
well :-)
convert.c | 115 +++++++++++++++++++++++++++++++++++++++++++++++++-
t/t0030-keywords.sh | 76 +++++++++++++++++++++++++++++++++
2 files changed, 188 insertions(+), 3 deletions(-)
create mode 100755 t/t0030-keywords.sh
diff --git a/convert.c b/convert.c
index d0d4b81..0c7b270 100644
--- a/convert.c
+++ b/convert.c
@@ -230,16 +230,125 @@ static int git_path_check_crlf(const char *path)
return attr_crlf_check.isset;
}
+/* ------------------ keywords -------------------- */
+
+static void setup_keyword_check(struct git_attr_check *check)
+{
+ static struct git_attr *attr_keyword;
+
+ if (!attr_keyword)
+ attr_keyword = git_attr("keywords", 8);
+ check->attr = attr_keyword;
+}
+
+static int git_path_check_keyword(const char *path)
+{
+ struct git_attr_check attr_keyword_check;
+
+ setup_keyword_check(&attr_keyword_check);
+
+ if (git_checkattr(path, 1, &attr_keyword_check))
+ return -1;
+ return attr_keyword_check.isset;
+}
+
+static int keyword_unexpand_git(const char *path, char **bufp, unsigned long *sizep)
+{
+ char *buffer, *nbuf, *keyword;
+ unsigned long size, keywordlength;
+ int changes = 0;
+ enum {
+ IN_VOID,
+ PRE_KEYWORD,
+ IN_KEYWORD,
+ IN_EXPANSION,
+ END_KEYWORD
+ } parser_state = IN_VOID;
+
+ size = *sizep;
+ if (!size)
+ return 0;
+ buffer = *bufp;
+
+ /*
+ * Allocate an identically sized buffer, keyword unexpansion can
+ * only reduce the size so we'll never overflow (although we might
+ * waste a few bytes
+ */
+ nbuf = xmalloc(size);
+ *bufp = nbuf;
+
+ while (size) {
+ unsigned char c;
+
+ c = *buffer;
+
+ switch( parser_state ) {
+ case IN_VOID: /* Normal characters, wait for '$' */
+ if (c == '$')
+ parser_state = PRE_KEYWORD;
+ break;
+ case PRE_KEYWORD: /* Gap between '$' and keyword */
+ keywordlength = 0;
+ keyword = buffer;
+ if (!isspace(c))
+ parser_state = IN_KEYWORD;
+ else
+ break;
+ case IN_KEYWORD: /* Keyword itself */
+ if (c == ':')
+ parser_state = IN_EXPANSION;
+ else if (c == '$' || c == '\n' || c == '\r')
+ parser_state = END_KEYWORD;
+ else
+ keywordlength++;
+ break;
+ case IN_EXPANSION: /* The expansion gets silently removed */
+ if (c == '$' || c == '\n')
+ parser_state = END_KEYWORD;
+ else {
+ changes = 1;
+ /* Every character we skip reduces the overall size */
+ (*sizep)--;
+ buffer++;
+ size--;
+ }
+ continue;
+ case END_KEYWORD: /* End of keyword */
+ parser_state = IN_VOID;
+ break;
+ }
+
+ *nbuf++ = c;
+ buffer++;
+ size--;
+ }
+
+ return (changes != 0);
+}
+
+
+/* ------------------------------------------------ */
int convert_to_git(const char *path, char **bufp, unsigned long *sizep)
{
+ int changes = 0;
+
switch (git_path_check_crlf(path)) {
case 0:
- return 0;
+ changes += 0;
case 1:
- return forcecrlf_to_git(path, bufp, sizep);
+ changes += forcecrlf_to_git(path, bufp, sizep);
default:
- return autocrlf_to_git(path, bufp, sizep);
+ changes += autocrlf_to_git(path, bufp, sizep);
+ }
+
+ switch (git_path_check_keyword(path)) {
+ case 0:
+ changes += 0;
+ case 1:
+ changes += keyword_unexpand_git(path, bufp, sizep);
}
+ return (changes != 0);
}
int convert_to_working_tree(const char *path, char **bufp, unsigned long *sizep)
diff --git a/t/t0030-keywords.sh b/t/t0030-keywords.sh
new file mode 100755
index 0000000..375acb8
--- /dev/null
+++ b/t/t0030-keywords.sh
@@ -0,0 +1,76 @@
+#!/bin/sh
+
+cd $(dirname $0)
+
+test_description='Keyword expansion'
+
+. ./test-lib.sh
+
+# Adding the attribute "keywords" turns the keyword expansion on
+# I've used "notkeywords" as an attribute as a placeholder attribute
+# but this is just "somerandomattribute", it has no meaning
+
+# Expect success because the keyword attribute should be found
+test_expect_success 'Keywords attribute present' '
+
+ echo "keywordsfile keywords" >.gitattributes &&
+
+ echo "\$keyword: anythingcangohere\$" > keywordsfile &&
+
+ git add keywordsfile &&
+ git add .gitattributes &&
+ git commit -m test-keywords &&
+
+ git check-attr keywords -- keywordsfile
+'
+
+# Expect failure because the repository version should be different from the
+# working tree version.
+#
+# In repository : $keyword:$
+# In working dir: $keyword: anythingcangohere$
+#
+test_expect_failure 'Keywords unexpansion active' '
+
+ git show HEAD:keywordsfile > keywordsfile.cmp &&
+ cmp keywordsfile keywordsfile.cmp
+
+'
+
+# expect success because we want to find the keyword line unexpanded in the
+# and hence appearing unchanged in the output of git-diff
+test_expect_success 'git-diff with keywords present' '
+ echo "Non-keyword containing line" >> keywordsfile &&
+ git diff -- keywordsfile | grep -qs "^ \$keyword:\$$"
+'
+
+# Expect failure because the keywords attribute should NOT be found
+test_expect_failure 'Keywords attribute absent' '
+
+ echo "keywordsfile notkeywords" >.gitattributes &&
+
+ git add .gitattributes &&
+ git commit -m test-not-keywords &&
+
+ git check-attr keywords -- keywordsfile
+
+'
+
+# If keywords are later disabled on that file, then the keyword unexpansion
+# will be ignored, so a diff should now show differences, because git is no
+# longer keyword blind
+test_expect_success 'git-diff with keywords in file but disabled' '
+ git diff -- keywordsfile | grep -qs "^diff"
+'
+
+# Expect success because the repository should be identical to the working tree
+test_expect_success 'Keywords unexpansion inactive' '
+
+ git add keywordsfile &&
+ git commit -m "test-not-keywords"
+
+ git show HEAD:keywordsfile > keywordsfile.cmp &&
+ cmp keywordsfile keywordsfile.cmp
+'
+
+test_done
--
1.5.1.1.821.g88bdb
^ permalink raw reply related [flat|nested] 66+ messages in thread
[parent not found: <200704171803.58940.an dyparkins@gmail.com>]
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 9:41 [PATCH 2/2] Add keyword unexpansion support to convert.c Andy Parkins
[not found] ` <200704171803.58940.an dyparkins@gmail.com>
@ 2007-04-17 10:09 ` Junio C Hamano
2007-04-17 11:35 ` Andy Parkins
2007-04-17 15:46 ` Linus Torvalds
2007-04-17 10:41 ` Johannes Sixt
` (2 subsequent siblings)
4 siblings, 2 replies; 66+ messages in thread
From: Junio C Hamano @ 2007-04-17 10:09 UTC (permalink / raw)
To: Andy Parkins; +Cc: git
Andy Parkins <andyparkins@gmail.com> writes:
> No parsing of the keyword itself is performed, the content is simply
> dropped.
You are sidestepping the most important problem by doing this.
The only sensible keyword you could have, without destroying
what git is, is blob id. No commit id, no date, no author.
In http://article.gmane.org/gmane.comp.version-control.git/44654,
Linus said:
I'll finish off trying to explain the problem in fundamental git terms:
say you have a repository with two branches, A and B, and different
history on a file "xyzzy" in those two branches, but because they both
ended up applying the same patches, the actual file contents do end up
being 100% identical. So they have the same SHA1.
What is
git diff A..B -- xyzzy
supposed to print?
And *I* claim that if you don't get an immediate and empty diff, your
system is TOTALLY BROKEN.
Another thing he could have said is this:
When you have such two branches, A and B, and you are on
branch A:
$ git checkout B
should be immediate and instantaneous.
If you try to keyword expand commit id, date or anything that is
sensitive to *how* you got there, even though A and B have the
exact same set of blobs, you have to essentially update all of
them. Computing what to expand to takes (perhaps prohibitively
expensive) time, but more importantly rewriting the whole 20k
(or howmanyever you have in your project) files out becomes
necessary, if your keyword expansion wants to say "oh, this file
was taken from a checkout of branch B", for obvious reasons.
Keyword expanding blob-id, or munging line-endings to CRLF form
on platforms that want it, do not have this problem, as how you
reached to the blob content does not affect the result of
expansion, therefore not just the blobs in commit A and commit B
but the working tree checked out of them must match with each
other.
Having reiterated what Linus already said why keyword expansion
and git are not friendly with each other (perhaps the reason is
because the former is stupid and git is smart), I'd try to be a
bit constructive and point out the areas you _could_ help with
in the nearby codepaths:
* When 'diff' borrows from the working tree because the
filesystem data matches the blob we are interested in, we
already have a call to convert_to_git(). The diff machinery
operates on the canonicalized representation (i.e. this is an
area we do not need help from you).
* When 'checkout', 'read-tree -u' and 'merge-recursive' write
things, we already have calls to convert_to_working_tree() to
munge blob representation to working tree representation
(i.e. again, this is an area we do not need help from you).
* We do not do the borrowing from working tree when doing
grep_sha1(), but when we grep inside a file from working tree
with grep_file(), we do not currently make it go through
convert_to_git() to fix line endings. Maybe we should, if
only for consistency.
* We do not currently run convert_to_git() on the patch text
given to git-apply; we could do so in parse_single_patch().
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 10:09 ` Junio C Hamano
@ 2007-04-17 11:35 ` Andy Parkins
2007-04-17 15:53 ` Linus Torvalds
` (2 more replies)
2007-04-17 15:46 ` Linus Torvalds
1 sibling, 3 replies; 66+ messages in thread
From: Andy Parkins @ 2007-04-17 11:35 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano
On Tuesday 2007 April 17 11:09, Junio C Hamano wrote:
> In http://article.gmane.org/gmane.comp.version-control.git/44654,
> Linus said:
> And *I* claim that if you don't get an immediate and empty diff, your
> system is TOTALLY BROKEN.
Well that one is easy - the file is normalised to contain collapsed keywords
upon checkin, so diff works the same as it ever did. The output would be
immediate and empty so is not TOTALLY BROKEN.
> $ git checkout B
>
> should be immediate and instantaneous.
Now - that's a much better argument. However, it's not relevant, keywords (in
other VCSs, and so why not in git) are only updated when a file is checked
out. There is no need to touch every file. It's actually beneficial,
because the keyword in the file is the state of the file at the time it was
checked in - which is actually more useful than updating it to the latest
commit every time.
That means you're only ever expanding in a file that your changing anyway - so
it's effectively free. git-checkout would still be immediate and
instantaneous.
> If you try to keyword expand commit id, date or anything that is
> sensitive to *how* you got there, even though A and B have the
> exact same set of blobs, you have to essentially update all of
> them. Computing what to expand to takes (perhaps prohibitively
> expensive) time, but more importantly rewriting the whole 20k
> (or howmanyever you have in your project) files out becomes
> necessary, if your keyword expansion wants to say "oh, this file
> was taken from a checkout of branch B", for obvious reasons.
Ignoring the fact that expansion is only when a file is checked out; I'd argue
that it's your own fault if you enable keyword expansion on twenty thousand
files. A lot of the discussion has been about how useless keyword expansion
is in almost every case. I only want it for a few files in my repository; so
am willing to pay the small computing cost. Obviously keywords would be
disabled by default - in which case, you get what you deserve if you enable
them on everything.
Putting my own selfish requirements aside, from a purely "mine is better than
yours" point of view, git can't do something that CVS (in all it's
horridness) can. It's distinctly off-putting to people when they
say "keyword expansion", that the response is "YOU'RE AN IDIOT - GO AWAY -
YOU DON'T DESERVE TO USE GIT"; and back they'll scurry to CVS/subversion.
> Keyword expanding blob-id, or munging line-endings to CRLF form
> on platforms that want it, do not have this problem, as how you
> reached to the blob content does not affect the result of
> expansion, therefore not just the blobs in commit A and commit B
> but the working tree checked out of them must match with each
> other.
That's true - however, even if the only keyword git supports is $BlobID$, that
would address a large proportion of people's needs. As I said above though,
the keywords are only expanded on checkout (and checkin to be consistent).
> Having reiterated what Linus already said why keyword expansion
> and git are not friendly with each other (perhaps the reason is
> because the former is stupid and git is smart), I'd try to be a
(This is were my "YOU'RE AN IDIOT - YOU CAN'T USE GIT" alarm goes off). Git
is better than CVS/subversion in every respect - save this one. It's almost
completely free to do (apart from the initial coding of it of course) because
of these two factors:
- The keywords are collapsed in the repository
- The keywords are only expanded on checkout
It doesn't fundamentally alter anything that git does right now.
> * We do not do the borrowing from working tree when doing
> grep_sha1(), but when we grep inside a file from working tree
> with grep_file(), we do not currently make it go through
> convert_to_git() to fix line endings. Maybe we should, if
> only for consistency.
I'd actually argue not - git-grep searches the working tree. The expanded
keywords are in the working tree. Take the CRLF case - I'm a clueless user,
who only understands the system I'm working on. I want to search for all the
line endings, so I do git-grep "\r\n" - that should work, because I'm
searching my working tree.
> * We do not currently run convert_to_git() on the patch text
> given to git-apply; we could do so in parse_single_patch().
Yep - definitely; the applied patch should certainly be normalised before
application. I'd have to add it if I wanted keywords anyway wouldn't I?
Andy
--
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 11:35 ` Andy Parkins
@ 2007-04-17 15:53 ` Linus Torvalds
2007-04-17 17:03 ` Andy Parkins
2007-04-17 21:24 ` Junio C Hamano
2007-04-20 0:30 ` Jakub Narebski
2 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 15:53 UTC (permalink / raw)
To: Andy Parkins; +Cc: git, Junio C Hamano
On Tue, 17 Apr 2007, Andy Parkins wrote:
> On Tuesday 2007 April 17 11:09, Junio C Hamano wrote:
>
> > In http://article.gmane.org/gmane.comp.version-control.git/44654,
> > Linus said:
> > And *I* claim that if you don't get an immediate and empty diff, your
> > system is TOTALLY BROKEN.
>
> Well that one is easy - the file is normalised to contain collapsed keywords
> upon checkin, so diff works the same as it ever did. The output would be
> immediate and empty so is not TOTALLY BROKEN.
No, it *is* TOTALLY BROKEN, because your keywords guaranteed that it
doesn't even *apply*.
That's such a fundamental part of a patch that I didn't even _mention_ it,
but I obviously should have.
If you cannot apply the diff you generate, what the hell is the *point* of
a diff?
Try this:
- File-A in revision 1:
$ID: some random crap about rev1 $
Line 2
- same file in revision 2:
$ID: some other random crap about rev2 $
Line 2 got modified
and think about it. Your diff will be something like
@@ -1,2 +1,2 @@
$ID:$
-Line 2
+Line 2 got modified
and the diff WON'T EVEN APPLY!
What kind of diff is that? Would you call it perhaps "totally broken"?
In other words, there's no way in hell you can make this work. You'll end
up always having to edit the keywords parts of diffs to make them apply if
they are part of the context.
(This, btw, is something that a CVS person says "so what?" about. They're
_used_ to having to do it. It's how you do merges in CVS. Really. How many
people have actually *worked* with branches in CVS on any complex project
with any nontrivial work happening on the branch? I have. I hated CVS for
many reasons. Keywords was just a small small detail in that hate
relationship, but it was one of them!)
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 15:53 ` Linus Torvalds
@ 2007-04-17 17:03 ` Andy Parkins
2007-04-17 18:12 ` Linus Torvalds
0 siblings, 1 reply; 66+ messages in thread
From: Andy Parkins @ 2007-04-17 17:03 UTC (permalink / raw)
To: git; +Cc: Linus Torvalds, Junio C Hamano
On Tuesday 2007, April 17, Linus Torvalds wrote:
> Try this:
>
> - File-A in revision 1:
>
> $ID: some random crap about rev1 $
> Line 2
>
> - same file in revision 2:
> $ID: some other random crap about rev2 $
> Line 2 got modified
>
> and think about it. Your diff will be something like
>
> @@ -1,2 +1,2 @@
> $ID:$
> -Line 2
> +Line 2 got modified
>
> and the diff WON'T EVEN APPLY!
Why on earth would it not apply? It's being applied using git-apply,
which will unexpand the keywords as it goes - as I keep saying. When
the apply engine is looking for the context it's going to collapse the
keyword, so the context will match and the diff WILL EVEN APPLY.
As Junio said in his reply, git-apply doesn't currently call
convert_to_git(), but that's easily implemented.
> In other words, there's no way in hell you can make this work. You'll
You keep saying these sweepingly general things. It can be made to
work.
> end up always having to edit the keywords parts of diffs to make them
> apply if they are part of the context.
No I don't. If I had to then the keyword code would be broken. No one
in their right mind would think that was an acceptable thing to do.
> (This, btw, is something that a CVS person says "so what?" about.
> They're _used_ to having to do it. It's how you do merges in CVS.
> Really. How many people have actually *worked* with branches in CVS
That's because CVS is rubbish. What has that got to do with it?
> on any complex project with any nontrivial work happening on the
> branch? I have. I hated CVS for many reasons. Keywords was just a
> small small detail in that hate relationship, but it was one of
> them!)
You really can stop trying to persuade me that CVS is no good for
version control - I agree, a thousand times I agree. There are a lot
of things that CVS does in a broken manner, that doesn't mean that git
does the same thing in a broken manner.
Andy
--
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 17:03 ` Andy Parkins
@ 2007-04-17 18:12 ` Linus Torvalds
2007-04-17 19:12 ` Andy Parkins
0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 18:12 UTC (permalink / raw)
To: Andy Parkins; +Cc: git, Junio C Hamano
On Tue, 17 Apr 2007, Andy Parkins wrote:
>
> Why on earth would it not apply? It's being applied using git-apply,
> which will unexpand the keywords as it goes - as I keep saying.
So you will never work with anybody outside of git?
What about tar-files when you export the tree? Should they have the
expanded version?
> You keep saying these sweepingly general things. It can be made to
> work.
No, it CANNOT.
Trust me. There's NO WAY IN HELL it will "work" in any other sense than
"limp along and not be usable".
Yes, you can make it "work" if you:
- make sure that you never _ever_ leave the git environment
But why do you want keyword expansion then? The whole point is if you
have other tools than the git tools that look at a file. Even your svg
example was literally about having non-git tools work with the data.
What if you ever email the file to somebody else?
- you make all git tools explicitly always strip them.
Again, what's the point again? You add keyword expansion, and then the
only tools that you really allow to touch it (except your "print it
out" example) will have to remove the keyword expansion just to work.
That's not "work". That's just stupid. Yes, you can make your "print it
out" example work, but as alreadyt mentioned, you could have done that
some other way, with a simple makefile rule, quite independently (and much
better) than the SCM ever did.
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 18:12 ` Linus Torvalds
@ 2007-04-17 19:12 ` Andy Parkins
[not found] ` <alpine.LFD. 0.98.0704171530220.4504@xanadu.home>
` (3 more replies)
0 siblings, 4 replies; 66+ messages in thread
From: Andy Parkins @ 2007-04-17 19:12 UTC (permalink / raw)
To: git; +Cc: Linus Torvalds, Junio C Hamano
On Tuesday 2007, April 17, Linus Torvalds wrote:
> So you will never work with anybody outside of git?
For my projects - correct; I don't care about the rest of the world.
For projects that do - don't enable keywords, it's an option, all I
want is to have that option.
> What about tar-files when you export the tree? Should they have the
> expanded version?
If I have to pick one then: no. I think out-of-tree keywords are too
much trouble for exactly the reasons you say; however, I wouldn't like
to presume what other people think is too much trouble so I suppose it
would have to be an option.
> > You keep saying these sweepingly general things. It can be made to
> > work.
>
> No, it CANNOT.
>
> Trust me. There's NO WAY IN HELL it will "work" in any other sense
> than "limp along and not be usable".
Well I'm making progress, "limp along" is a significant step up from
impossible. :-)
Look, my primary objection to this is the SHOUTING about how impossible
it is even though I've tried to address every problem you've thrown at
me - I'm finding it really difficult to figure out why you're trying so
hard to dissuade me from even _trying_. If it all goes wrong (as I
fully accept it might), so be it, I can live with that; I'll even be
happy to tell you you're right and I'm wrong. Why is this such a
problem?
Keywords are so hated by everyone that I doubt they would ever be
accepted into git - it's an intellectual exercise for me at this stage
really.
> Yes, you can make it "work" if you:
>
> - make sure that you never _ever_ leave the git environment
As it happens, _I_ never ever leave the git environment. Can I use
keywords then?
You don't seem to have such a problem with git's extended diffs for
renames or subprojects - "make sure that you never _ever_ leave the git
environment".
> But why do you want keyword expansion then? The whole point is if
> you have other tools than the git tools that look at a file. Even
> your svg example was literally about having non-git tools work with
> the data. What if you ever email the file to somebody else?
If by "tools" you mean other version control systems, then I don't
intend them to work. If by "tools" you mean gcc, inkscape, gv, bash,
web browsers or any other fileformat that allows comments in the file
then I expect it to be fine. If I publish a web page, it'd be nice to
show the ID on the page - that's all just "nice" not "necessary"
or "I'm throwing git away if I don't get it".
Emailing to others isn't a problem either: let's say I email them my SVG
(with keywords expanded), they make some edits and send it me back -
worse, they send me a diff back. I'm going to apply that diff using
git-apply; which will collapse the keywords and apply the diff.
> - you make all git tools explicitly always strip them.
Well, not "all", so far I've added one call to convert_to_git() in
builtin-apply.c - it was a one line addition. It needed doing anyway
to deal with the CRLF correctly. I can't see there being that many
places that this needs doing. I may well be wrong, if I end up
scattering calls to convert_to_git() everywhere I'll give up.
> Again, what's the point again? You add keyword expansion, and then
> the only tools that you really allow to touch it (except your "print
> it out" example) will have to remove the keyword expansion just to
> work.
(I don't see why my tiny "print it out" example isn't enough - it
matters to me)
However, most tools don't care about the keywords, it's only non-git
diff and non-git patch that are affected. As long as the file format
supports comments, then keyword expansion will be just fine.
> That's not "work". That's just stupid. Yes, you can make your "print
> it out" example work, but as alreadyt mentioned, you could have done
> that some other way, with a simple makefile rule, quite independently
> (and much better) than the SCM ever did.
That's just being obtuse - no other tool cares in the slightest about
the keywords, there are more "tools" in the world than just the VCS.
Andy
--
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
^ permalink raw reply [flat|nested] 66+ messages in thread
[parent not found: <alpine.LFD. 0.98.0704171530220.4504@xanadu.home>]
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 19:12 ` Andy Parkins
[not found] ` <alpine.LFD. 0.98.0704171530220.4504@xanadu.home>
@ 2007-04-17 19:41 ` Nicolas Pitre
2007-04-17 19:45 ` David Lang
2007-04-17 19:54 ` Linus Torvalds
2007-04-17 21:18 ` Martin Langhoff
3 siblings, 1 reply; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-17 19:41 UTC (permalink / raw)
To: Andy Parkins; +Cc: git, Linus Torvalds, Junio C Hamano
On Tue, 17 Apr 2007, Andy Parkins wrote:
> On Tuesday 2007, April 17, Linus Torvalds wrote:
>
> > Trust me. There's NO WAY IN HELL it will "work" in any other sense
> > than "limp along and not be usable".
>
> Well I'm making progress, "limp along" is a significant step up from
> impossible. :-)
>
> Look, my primary objection to this is the SHOUTING about how impossible
> it is even though I've tried to address every problem you've thrown at
> me - I'm finding it really difficult to figure out why you're trying so
> hard to dissuade me from even _trying_. If it all goes wrong (as I
> fully accept it might), so be it, I can live with that; I'll even be
> happy to tell you you're right and I'm wrong. Why is this such a
> problem?
>
> Keywords are so hated by everyone that I doubt they would ever be
> accepted into git - it's an intellectual exercise for me at this stage
> really.
I cannot do otherwise than ask at this point in the debate: why isn't
the makefile rule sufficient for your needs? Why going through a
complicated path that no one else will support due to its numerous
pitfalls?
> > Yes, you can make your "print it out" example work, but as alreadyt
> > mentioned, you could have done that some other way, with a simple
> > makefile rule, quite independently (and much better) than the SCM
> > ever did.
>
> That's just being obtuse - no other tool cares in the slightest about
> the keywords, there are more "tools" in the world than just the VCS.
... which reinforces my question: why force a task on the VCS if it
doesn't fit well with its fundamental design?
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 19:41 ` Nicolas Pitre
@ 2007-04-17 19:45 ` David Lang
[not found] ` <alpin e.LFD.0.98.0704171624190.4504@xanadu.home>
2007-04-17 20:29 ` Nicolas Pitre
0 siblings, 2 replies; 66+ messages in thread
From: David Lang @ 2007-04-17 19:45 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Andy Parkins, git, Linus Torvalds, Junio C Hamano
On Tue, 17 Apr 2007, Nicolas Pitre wrote:
>
> On Tue, 17 Apr 2007, Andy Parkins wrote:
>
>> On Tuesday 2007, April 17, Linus Torvalds wrote:
>>
>>> Trust me. There's NO WAY IN HELL it will "work" in any other sense
>>> than "limp along and not be usable".
>>
>> Well I'm making progress, "limp along" is a significant step up from
>> impossible. :-)
>>
>> Look, my primary objection to this is the SHOUTING about how impossible
>> it is even though I've tried to address every problem you've thrown at
>> me - I'm finding it really difficult to figure out why you're trying so
>> hard to dissuade me from even _trying_. If it all goes wrong (as I
>> fully accept it might), so be it, I can live with that; I'll even be
>> happy to tell you you're right and I'm wrong. Why is this such a
>> problem?
>>
>> Keywords are so hated by everyone that I doubt they would ever be
>> accepted into git - it's an intellectual exercise for me at this stage
>> really.
>
> I cannot do otherwise than ask at this point in the debate: why isn't
> the makefile rule sufficient for your needs? Why going through a
> complicated path that no one else will support due to its numerous
> pitfalls?
not all uses of VCS's involve useing make
>>> Yes, you can make your "print it out" example work, but as alreadyt
>>> mentioned, you could have done that some other way, with a simple
>>> makefile rule, quite independently (and much better) than the SCM
>>> ever did.
>>
>> That's just being obtuse - no other tool cares in the slightest about
>> the keywords, there are more "tools" in the world than just the VCS.
>
> ... which reinforces my question: why force a task on the VCS if it
> doesn't fit well with its fundamental design?
becouse the VCS can do the job better then anything else? even if there are
limits to what the VCS can do.
David Lang
^ permalink raw reply [flat|nested] 66+ messages in thread
[parent not found: <alpin e.LFD.0.98.0704171624190.4504@xanadu.home>]
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 19:45 ` David Lang
[not found] ` <alpin e.LFD.0.98.0704171624190.4504@xanadu.home>
@ 2007-04-17 20:29 ` Nicolas Pitre
2007-04-17 20:05 ` David Lang
1 sibling, 1 reply; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-17 20:29 UTC (permalink / raw)
To: David Lang; +Cc: Andy Parkins, git, Linus Torvalds, Junio C Hamano
On Tue, 17 Apr 2007, David Lang wrote:
> On Tue, 17 Apr 2007, Nicolas Pitre wrote:
>
> > I cannot do otherwise than ask at this point in the debate: why isn't
> > the makefile rule sufficient for your needs? Why going through a
> > complicated path that no one else will support due to its numerous
> > pitfalls?
>
> not all uses of VCS's involve useing make
Use perl then. Or a shell script. Or even a command.com batch script.
Or your own tool.
> > > That's just being obtuse - no other tool cares in the slightest about
> > > the keywords, there are more "tools" in the world than just the VCS.
> >
> > ... which reinforces my question: why force a task on the VCS if it
> > doesn't fit well with its fundamental design?
>
> becouse the VCS can do the job better then anything else?
On what basis?
> even if there are
> limits to what the VCS can do.
In the context of keyword expansion I don't agree at all with this
statement. Git can *not* do better than an external tool and it has
been demonstrated a few times already.
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 20:29 ` Nicolas Pitre
@ 2007-04-17 20:05 ` David Lang
2007-04-17 21:16 ` Nicolas Pitre
0 siblings, 1 reply; 66+ messages in thread
From: David Lang @ 2007-04-17 20:05 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Andy Parkins, git, Linus Torvalds, Junio C Hamano
On Tue, 17 Apr 2007, Nicolas Pitre wrote:
> On Tue, 17 Apr 2007, David Lang wrote:
>
>> On Tue, 17 Apr 2007, Nicolas Pitre wrote:
>>
>>> I cannot do otherwise than ask at this point in the debate: why isn't
>>> the makefile rule sufficient for your needs? Why going through a
>>> complicated path that no one else will support due to its numerous
>>> pitfalls?
>>
>> not all uses of VCS's involve useing make
>
> Use perl then. Or a shell script. Or even a command.com batch script.
> Or your own tool.
I would like to, however this doesn't currently integrate well with git. I've
been told in the past that once .gitattributes is in place then the hooks for
the crlf stuff can be generalized to allow for calls out to custom code to do
this sort of thing.
however now it sounds as if people are saying that doing this is so evil that it
shouldn't ever be allowed.
>>>> That's just being obtuse - no other tool cares in the slightest about
>>>> the keywords, there are more "tools" in the world than just the VCS.
>>>
>>> ... which reinforces my question: why force a task on the VCS if it
>>> doesn't fit well with its fundamental design?
>>
>> becouse the VCS can do the job better then anything else?
>
> On what basis?
>
>> even if there are
>> limits to what the VCS can do.
>
> In the context of keyword expansion I don't agree at all with this
> statement. Git can *not* do better than an external tool and it has
> been demonstrated a few times already.
the VCS can make sure that the appropriate external code is always run when
things are checked in/out. external tools (unless they are a complete set of
wrappers for git) can't do that.
David Lang
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 20:05 ` David Lang
@ 2007-04-17 21:16 ` Nicolas Pitre
[not found] ` <7vy7k qlj5r.fsf@assigned-by-dhcp.cox.net>
` (3 more replies)
0 siblings, 4 replies; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-17 21:16 UTC (permalink / raw)
To: David Lang; +Cc: Andy Parkins, git, Linus Torvalds, Junio C Hamano
On Tue, 17 Apr 2007, David Lang wrote:
> On Tue, 17 Apr 2007, Nicolas Pitre wrote:
>
> > On Tue, 17 Apr 2007, David Lang wrote:
> >
> > > On Tue, 17 Apr 2007, Nicolas Pitre wrote:
> > >
> > > > I cannot do otherwise than ask at this point in the debate: why isn't
> > > > the makefile rule sufficient for your needs? Why going through a
> > > > complicated path that no one else will support due to its numerous
> > > > pitfalls?
> > >
> > > not all uses of VCS's involve useing make
> >
> > Use perl then. Or a shell script. Or even a command.com batch script.
> > Or your own tool.
>
> I would like to, however this doesn't currently integrate well with git. I've
> been told in the past that once .gitattributes is in place then the hooks for
> the crlf stuff can be generalized to allow for calls out to custom code to do
> this sort of thing.
And I agree that this is a perfectly sensible thing to do. The facility
should be there for you to apply any kind of transformation with
external tools on data going in or out from Git. There are good and bad
things you can do with such a facility, but at least it becomes your
responsibility to screw^H^H^H^Hfilter your data and not something that
is enforced by Git itself.
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
[parent not found: <7vy7k qlj5r.fsf@assigned-by-dhcp.cox.net>]
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 21:16 ` Nicolas Pitre
[not found] ` <7vy7k qlj5r.fsf@assigned-by-dhcp.cox.net>
@ 2007-04-17 20:53 ` David Lang
2007-04-17 21:52 ` Andy Parkins
2007-04-17 22:40 ` Junio C Hamano
2007-04-18 6:24 ` Rogan Dawes
3 siblings, 1 reply; 66+ messages in thread
From: David Lang @ 2007-04-17 20:53 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Andy Parkins, git, Linus Torvalds, Junio C Hamano
On Tue, 17 Apr 2007, Nicolas Pitre wrote:
> Subject: Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
>
> On Tue, 17 Apr 2007, David Lang wrote:
>
>> On Tue, 17 Apr 2007, Nicolas Pitre wrote:
>>
>>> On Tue, 17 Apr 2007, David Lang wrote:
>>>
>>>> On Tue, 17 Apr 2007, Nicolas Pitre wrote:
>>>>
>>>>> I cannot do otherwise than ask at this point in the debate: why isn't
>>>>> the makefile rule sufficient for your needs? Why going through a
>>>>> complicated path that no one else will support due to its numerous
>>>>> pitfalls?
>>>>
>>>> not all uses of VCS's involve useing make
>>>
>>> Use perl then. Or a shell script. Or even a command.com batch script.
>>> Or your own tool.
>>
>> I would like to, however this doesn't currently integrate well with git. I've
>> been told in the past that once .gitattributes is in place then the hooks for
>> the crlf stuff can be generalized to allow for calls out to custom code to do
>> this sort of thing.
>
> And I agree that this is a perfectly sensible thing to do. The facility
> should be there for you to apply any kind of transformation with
> external tools on data going in or out from Git. There are good and bad
> things you can do with such a facility, but at least it becomes your
> responsibility to screw^H^H^H^Hfilter your data and not something that
> is enforced by Git itself.
I'm pretty sure that hooks for an external helper would satisfy Andy with his
keyword expanstion as well.
David Lang
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 21:16 ` Nicolas Pitre
[not found] ` <7vy7k qlj5r.fsf@assigned-by-dhcp.cox.net>
2007-04-17 20:53 ` David Lang
@ 2007-04-17 22:40 ` Junio C Hamano
2007-04-18 2:39 ` Nicolas Pitre
2007-04-21 0:42 ` David Lang
2007-04-18 6:24 ` Rogan Dawes
3 siblings, 2 replies; 66+ messages in thread
From: Junio C Hamano @ 2007-04-17 22:40 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: David Lang, Andy Parkins, git, Linus Torvalds
Nicolas Pitre <nico@cam.org> writes:
>> I would like to, however this doesn't currently integrate
>> well with git. I've been told in the past that once
>> .gitattributes is in place then the hooks for the crlf stuff
>> can be generalized to allow for calls out to custom code to
>> do this sort of thing.
>
> And I agree that this is a perfectly sensible thing to do. The facility
> should be there for you to apply any kind of transformation with
> external tools on data going in or out from Git. There are good and bad
> things you can do with such a facility, but at least it becomes your
> responsibility to screw^H^H^H^Hfilter your data and not something that
> is enforced by Git itself.
You have to be careful, though. Depending on what kind of
transformation you implement with the external tools, you would
end up having to slow down everything we would do.
It boils down to this statement from Andy:
..., keywords (in other VCSs, and so why not in git) are
only updated when a file is checked out. There is no need
to touch every file. It's actually beneficial, because the
keyword in the file is the state of the file at the time it
was checked in - which is actually more useful than updating
it to the latest commit every time.
That means you're only ever expanding in a file that your
changing anyway - so it's effectively free. git-checkout
would still be immediate and instantaneous.
Back up a bit and think what "when a file is checked out" means.
His argument assumes the current behaviour of not checking out
when the underlying blob objects before munging are the same.
But with keyword expansion and fancier "external tools" whose
semantics are not well defined (iow, defined to be "do whatever
they please"), does it still make sense to consider two blobs
that appear in totally different context "the same" and omit
checking out (and causing the external tools hook not getting
run)? I already pointed out to Andy that the branch name the
file was taken from, if it were to take part of the keyword
expansion, would come out incorrectly in his printed svg
drawing.
If you want somebody's earlier example of "giving a file with
embedded keyword to somebody, who modifies and sends the result
back in full, now you would want to incorporate the change by
identifying the origin" to work, you would want "$Source$" (I am
looking at CVS documentation, "Keyword substitution/Keyword
List") to identify where that file came from (after all, a
source tree could have duplicated files) so that you can tell
which file the update is about, and this keyword would expand
differently depending on where in the project tree the blob
appears.
It is not just the checkout codepath. We omit diffs when we
know from SHA-1 that the blobs are the same before decoration.
We even omit diffs when we know from SHA-1 that two trees are
the same without taking possible decorations that can be applied
differently to the blobs they contain into account. Earlier,
Andy said he wanted to grep for the expanded text if he is
grepping in the working tree, and I think that makes sense, but
that means git-grep cannot do the same "borrow from working tree
when expanding from blob object is more expensive" optimization
we have for diff. We also need to disable that optimization
from the diff, regardless of what the correct semantics for
grepping in working trees should be.
I suspect that you would have to play safe and say "when
external tools are involved, we need to disable the existing
content SHA-1 based optimization for all paths that ask for
them" to keep your sanity.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 22:40 ` Junio C Hamano
@ 2007-04-18 2:39 ` Nicolas Pitre
2007-04-18 5:04 ` Junio C Hamano
2007-04-18 11:14 ` Johannes Schindelin
2007-04-21 0:42 ` David Lang
1 sibling, 2 replies; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-18 2:39 UTC (permalink / raw)
To: Junio C Hamano; +Cc: David Lang, Andy Parkins, git, Linus Torvalds
On Tue, 17 Apr 2007, Junio C Hamano wrote:
> Nicolas Pitre <nico@cam.org> writes:
>
> >> I would like to, however this doesn't currently integrate
> >> well with git. I've been told in the past that once
> >> .gitattributes is in place then the hooks for the crlf stuff
> >> can be generalized to allow for calls out to custom code to
> >> do this sort of thing.
> >
> > And I agree that this is a perfectly sensible thing to do. The facility
> > should be there for you to apply any kind of transformation with
> > external tools on data going in or out from Git. There are good and bad
> > things you can do with such a facility, but at least it becomes your
> > responsibility to screw^H^H^H^Hfilter your data and not something that
> > is enforced by Git itself.
>
> You have to be careful, though. Depending on what kind of
> transformation you implement with the external tools, you would
> end up having to slow down everything we would do.
So what?
We provide a rope with proper caveat emptor. Up to others to hang
themselves with it if they so desire. It is not our problem anymore.
> It boils down to this statement from Andy:
>
> ..., keywords (in other VCSs, and so why not in git) are
> only updated when a file is checked out. There is no need
> to touch every file. It's actually beneficial, because the
> keyword in the file is the state of the file at the time it
> was checked in - which is actually more useful than updating
> it to the latest commit every time.
>
> That means you're only ever expanding in a file that your
> changing anyway - so it's effectively free. git-checkout
> would still be immediate and instantaneous.
>
> Back up a bit and think what "when a file is checked out" means.
> His argument assumes the current behaviour of not checking out
> when the underlying blob objects before munging are the same.
And I think that should remain. If someone really wants full
transformation aka keyword expansion or whatever then he'd just need to
force a full read-tree after switching branch.
> But with keyword expansion and fancier "external tools" whose
> semantics are not well defined (iow, defined to be "do whatever
> they please"), does it still make sense to consider two blobs
> that appear in totally different context "the same" and omit
> checking out (and causing the external tools hook not getting
> run)?
I think so. At least by default. And if we really want to be kind we
could provide a special attribute just for disabling such optimization
i.e. to force a checkout of everything marked with such an attribute
everytime. But we might as well wait to see if someone actually ask
about that.
But for many other cases having such a facility would be just nice
especially for those cases that don't depend on the branch/commit but
only on the content itself. For example it was pointed out that Open
Office documents are gzipped XML files. In that case it would be
extremely advantageous to have the ability to specify an input filter as
"gzip -d" and an output filter as "gzip -c" so Git has a chance to
actually perform some kind of delta compression.
I'm sure there might be other type of filters for situation we've not
thought about yet, and that we might not want to carry as a builtin
feature. Who knows, maybe someone might want to port Git to the
System/360 and will need an EBCDIC filter on checked out text files. Or
maybe a byte swapping filter for some kind of binary files when on a
system with a different endianness.
> I already pointed out to Andy that the branch name the
> file was taken from, if it were to take part of the keyword
> expansion, would come out incorrectly in his printed svg
> drawing.
Tough.
> If you want somebody's earlier example of "giving a file with
> embedded keyword to somebody, who modifies and sends the result
> back in full, now you would want to incorporate the change by
> identifying the origin" to work, you would want "$Source$" (I am
> looking at CVS documentation, "Keyword substitution/Keyword
> List") to identify where that file came from (after all, a
> source tree could have duplicated files) so that you can tell
> which file the update is about, and this keyword would expand
> differently depending on where in the project tree the blob
> appears.
Like we don't record renames, I don't think we should record such thing
in checked out files either. Using a search for the closest match (like
we do for rename detection) is probably a better avenue than trusting
that the ID embedded in the file wasn't messed with. Linus already
provided a small script that would do that with pretty good results.
> It is not just the checkout codepath. We omit diffs when we
> know from SHA-1 that the blobs are the same before decoration.
> We even omit diffs when we know from SHA-1 that two trees are
> the same without taking possible decorations that can be applied
> differently to the blobs they contain into account. Earlier,
> Andy said he wanted to grep for the expanded text if he is
> grepping in the working tree, and I think that makes sense, but
> that means git-grep cannot do the same "borrow from working tree
> when expanding from blob object is more expensive" optimization
> we have for diff. We also need to disable that optimization
> from the diff, regardless of what the correct semantics for
> grepping in working trees should be.
>
> I suspect that you would have to play safe and say "when
> external tools are involved, we need to disable the existing
> content SHA-1 based optimization for all paths that ask for
> them" to keep your sanity.
Maybe. If that is what's really needed then so be it. People who
really want to do strange things will have the flexibility to do so, but
they'll have to pay the price in loss of performance.
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 2:39 ` Nicolas Pitre
@ 2007-04-18 5:04 ` Junio C Hamano
2007-04-18 14:56 ` Nicolas Pitre
2007-04-18 11:14 ` Johannes Schindelin
1 sibling, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2007-04-18 5:04 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: David Lang, Andy Parkins, git, Linus Torvalds
Nicolas Pitre <nico@cam.org> writes:
> On Tue, 17 Apr 2007, Junio C Hamano wrote:
>
>> You have to be careful, though. Depending on what kind of
>> transformation you implement with the external tools, you would
>> end up having to slow down everything we would do.
>
> So what?
>
> We provide a rope with proper caveat emptor. Up to others to hang
> themselves with it if they so desire. It is not our problem anymore.
I sort-of find it hard to believe hearing this from somebody who
muttered something about importance of perception a few days ago.
>> I suspect that you would have to play safe and say "when
>> external tools are involved, we need to disable the existing
>> content SHA-1 based optimization for all paths that ask for
>> them" to keep your sanity.
>
> Maybe. If that is what's really needed then so be it. People who
> really want to do strange things will have the flexibility to do so, but
> they'll have to pay the price in loss of performance.
Not just that. We end up having to pay the price of maintaining
hooks to let them do crazy things.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 5:04 ` Junio C Hamano
@ 2007-04-18 14:56 ` Nicolas Pitre
0 siblings, 0 replies; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-18 14:56 UTC (permalink / raw)
To: Junio C Hamano; +Cc: David Lang, Andy Parkins, git, Linus Torvalds
On Tue, 17 Apr 2007, Junio C Hamano wrote:
> Nicolas Pitre <nico@cam.org> writes:
>
> > On Tue, 17 Apr 2007, Junio C Hamano wrote:
> >
> >> You have to be careful, though. Depending on what kind of
> >> transformation you implement with the external tools, you would
> >> end up having to slow down everything we would do.
> >
> > So what?
> >
> > We provide a rope with proper caveat emptor. Up to others to hang
> > themselves with it if they so desire. It is not our problem anymore.
>
> I sort-of find it hard to believe hearing this from somebody who
> muttered something about importance of perception a few days ago.
Sure! And that applies in this case as well.
With such a _generic_ hook, Git will be perceived as much more powerful
and flexible. I insist on "generic" because people could experiment
with their own filters without endless debate on the mailing list and
pressure to include this or that feature in the core, and we don't have
to commit to those feature we're not in agreement with.
And let's face it: there are probably legitimate and possibly more
useful things to do with such a hook than keyword expansion.
If you go to Home Hardware you can buy rope. Of course you can hang
yourself with it, but the rope manufacturers won't commit to that I'm
sure. But if rope was banned by law because it represents a threath to
life then governments would be perceived really strangely even if their
intention are good.
Sure we might have a strong opinion against keyword expansion and that
is reflected by the fact that Git will most probably never ship with the
ability to perform keyword expansion. That doesn't mean we should deny
all possibilities for external filters _even_ if they can be used for
keyword expansion.
> >> I suspect that you would have to play safe and say "when
> >> external tools are involved, we need to disable the existing
> >> content SHA-1 based optimization for all paths that ask for
> >> them" to keep your sanity.
> >
> > Maybe. If that is what's really needed then so be it. People who
> > really want to do strange things will have the flexibility to do so, but
> > they'll have to pay the price in loss of performance.
>
> Not just that. We end up having to pay the price of maintaining
> hooks to let them do crazy things.
Weight that against the price of fighting them against the crazy things
they won't quit wanting to do. At some point it is just a matter of
getting out of the way.
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 2:39 ` Nicolas Pitre
2007-04-18 5:04 ` Junio C Hamano
@ 2007-04-18 11:14 ` Johannes Schindelin
2007-04-18 15:10 ` Nicolas Pitre
1 sibling, 1 reply; 66+ messages in thread
From: Johannes Schindelin @ 2007-04-18 11:14 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Junio C Hamano, David Lang, Andy Parkins, git, Linus Torvalds
Hi,
On Tue, 17 Apr 2007, Nicolas Pitre wrote:
> So what?
>
> We provide a rope with proper caveat emptor. Up to others to hang
> themselves with it if they so desire. It is not our problem anymore.
The people will complain. On this list. And I have to check the mails
before deleting, because the Subject: does not say "I just took the rope,
ignored your caveat emptor, and now I am dead. What should I do now?".
Ciao,
Dscho
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 11:14 ` Johannes Schindelin
@ 2007-04-18 15:10 ` Nicolas Pitre
2007-04-19 8:19 ` Johannes Schindelin
0 siblings, 1 reply; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-18 15:10 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Junio C Hamano, David Lang, Andy Parkins, git, Linus Torvalds
On Wed, 18 Apr 2007, Johannes Schindelin wrote:
> Hi,
>
> On Tue, 17 Apr 2007, Nicolas Pitre wrote:
>
> > So what?
> >
> > We provide a rope with proper caveat emptor. Up to others to hang
> > themselves with it if they so desire. It is not our problem anymore.
>
> The people will complain. On this list. And I have to check the mails
> before deleting, because the Subject: does not say "I just took the rope,
> ignored your caveat emptor, and now I am dead. What should I do now?".
Well... in the case of keyword expansion (since this is really the
contentious case here), with such a _generic_ facility to implement
external filters, people will at least have the opportunity to try it.
Sure they might complain that it doesn't work well, but 1) it is much
easier to *understand* why it doesn't work well after experimenting with
it, and 2) some people *will* be perfectly happy with something that
doesn't work well but happens to just work in their own particular case.
And because it now becomes a case by case issue it is much easier for
us to simply provide a generic mechanism and let people figure out by
themselves what works and what doesn't work instead of having
philosophical discussions on the merits of keyword expansions on the
list.
And because people _will_ complain *anyway*, it might lead to more
productive discussion if those who complain had the chance to realize
what the issues really are by experience if theoretic demonstrations alone
doesn't convey the problem fully.
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 15:10 ` Nicolas Pitre
@ 2007-04-19 8:19 ` Johannes Schindelin
0 siblings, 0 replies; 66+ messages in thread
From: Johannes Schindelin @ 2007-04-19 8:19 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Junio C Hamano, David Lang, Andy Parkins, git, Linus Torvalds
Hi,
On Wed, 18 Apr 2007, Nicolas Pitre wrote:
> And because it now becomes a case by case issue it is much easier for us
> to simply provide a generic mechanism and let people figure out by
> themselves what works and what doesn't work instead of having
> philosophical discussions on the merits of keyword expansions on the
> list.
That's a very good point.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 22:40 ` Junio C Hamano
2007-04-18 2:39 ` Nicolas Pitre
@ 2007-04-21 0:42 ` David Lang
2007-04-21 1:54 ` Junio C Hamano
1 sibling, 1 reply; 66+ messages in thread
From: David Lang @ 2007-04-21 0:42 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nicolas Pitre, Andy Parkins, git, Linus Torvalds
On Tue, 17 Apr 2007, Junio C Hamano wrote:
> Nicolas Pitre <nico@cam.org> writes:
>
>>> I would like to, however this doesn't currently integrate
>>> well with git. I've been told in the past that once
>>> .gitattributes is in place then the hooks for the crlf stuff
>>> can be generalized to allow for calls out to custom code to
>>> do this sort of thing.
>>
>> And I agree that this is a perfectly sensible thing to do. The facility
>> should be there for you to apply any kind of transformation with
>> external tools on data going in or out from Git. There are good and bad
>> things you can do with such a facility, but at least it becomes your
>> responsibility to screw^H^H^H^Hfilter your data and not something that
>> is enforced by Git itself.
>
> You have to be careful, though. Depending on what kind of
> transformation you implement with the external tools, you would
> end up having to slow down everything we would do.
you can slow down everything that you do on your system if you defined too much
work for external tools, that won't slow other people down who don't define any
work for external tools.
> It boils down to this statement from Andy:
>
> ..., keywords (in other VCSs, and so why not in git) are
> only updated when a file is checked out. There is no need
> to touch every file. It's actually beneficial, because the
> keyword in the file is the state of the file at the time it
> was checked in - which is actually more useful than updating
> it to the latest commit every time.
>
> That means you're only ever expanding in a file that your
> changing anyway - so it's effectively free. git-checkout
> would still be immediate and instantaneous.
>
> Back up a bit and think what "when a file is checked out" means.
> His argument assumes the current behaviour of not checking out
> when the underlying blob objects before munging are the same.
correct.
> But with keyword expansion and fancier "external tools" whose
> semantics are not well defined (iow, defined to be "do whatever
> they please"), does it still make sense to consider two blobs
> that appear in totally different context "the same" and omit
> checking out (and causing the external tools hook not getting
> run)? I already pointed out to Andy that the branch name the
> file was taken from, if it were to take part of the keyword
> expansion, would come out incorrectly in his printed svg
> drawing.
this is part of the rope you are handing out. the external tool could do a lot
of things that don't make sense. you could have the tool include the serial
number of the cpu you happen to be running on at the moment, it wouldn't make
sense to do this, but it could be done. the fact that the rope could be used to
hang someone doesn't mean that you should outlaw rope.
> If you want somebody's earlier example of "giving a file with
> embedded keyword to somebody, who modifies and sends the result
> back in full, now you would want to incorporate the change by
> identifying the origin" to work, you would want "$Source$" (I am
> looking at CVS documentation, "Keyword substitution/Keyword
> List") to identify where that file came from (after all, a
> source tree could have duplicated files) so that you can tell
> which file the update is about, and this keyword would expand
> differently depending on where in the project tree the blob
> appears.
>
> It is not just the checkout codepath. We omit diffs when we
> know from SHA-1 that the blobs are the same before decoration.
> We even omit diffs when we know from SHA-1 that two trees are
> the same without taking possible decorations that can be applied
> differently to the blobs they contain into account. Earlier,
> Andy said he wanted to grep for the expanded text if he is
> grepping in the working tree, and I think that makes sense, but
> that means git-grep cannot do the same "borrow from working tree
> when expanding from blob object is more expensive" optimization
> we have for diff. We also need to disable that optimization
> from the diff, regardless of what the correct semantics for
> grepping in working trees should be.
git would not be able to borrow from the working tree just becouse the index
thinks that the file is the same (and frankly, I'm not sure this is really a
safe thing to do in any case, it's just something that works frequently enough
that we get away with it)
the diff optimizations could (and should) stay.
> I suspect that you would have to play safe and say "when
> external tools are involved, we need to disable the existing
> content SHA-1 based optimization for all paths that ask for
> them" to keep your sanity.
Andy and I are both expecting that if the blobs are the same that none of the
git tools would flag them as different. this maintains the huge speedups that
git achieves by doing these checks.
if you want to make an option somewhere that disables this optimization, I guess
it would be Ok, but I wouldn't do so until someone came up with a situation
where they really needed it, nothing in what Andy or I have asked for needs
this.
both of us are treating the keyword expansion as decorations to the file. it's
useful, but the core meaning of 'what this file is' is the checked in version
with the keywords unexpanded. all the optmizations that only look at what's
checked in will remain as valid as they are today, it's only things that look at
your working directory that would change.
David Lang
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-21 0:42 ` David Lang
@ 2007-04-21 1:54 ` Junio C Hamano
2007-04-21 2:06 ` Nicolas Pitre
2007-04-21 23:31 ` David Lang
0 siblings, 2 replies; 66+ messages in thread
From: Junio C Hamano @ 2007-04-21 1:54 UTC (permalink / raw)
To: David Lang; +Cc: Nicolas Pitre, Andy Parkins, git, Linus Torvalds
David Lang <david.lang@digitalinsight.com> writes:
>> But with keyword expansion and fancier "external tools" whose
>> semantics are not well defined (iow, defined to be "do whatever
>> they please"), does it still make sense to consider two blobs
>> that appear in totally different context "the same" and omit
>> checking out (and causing the external tools hook not getting
>> run)? I already pointed out to Andy that the branch name the
>> file was taken from, if it were to take part of the keyword
>> expansion, would come out incorrectly in his printed svg
>> drawing.
>
> this is part of the rope you are handing out. the external tool could
> do a lot of things that don't make sense. you could have the tool
> include the serial number of the cpu you happen to be running on at
> the moment, it wouldn't make sense to do this, but it could be
> done. the fact that the rope could be used to hang someone doesn't
> mean that you should outlaw rope.
I do not think you understand, especially after reading the part
you say "Andy and I both...".
The point of my comment was that with Andy's definition of when
the "external tools" should trigger, that CPU serial number
embedder would _NOT_ trigger for a path when you switch branches
that have the same contents at that path. External tools can do
stupid things and that is what you are calling the rope. But
the case I am talking about is that we deliberately do _not_
call external tools, so even if external tools can do sensible
things if given a chance to, they are not given a chance to do
so, and deciding not to call them in some cases is made by us.
I think that's different from "we gave you rope, you hang
yourself and that is not our problem".
People have every right to say "if you consistently call these
external tools, they behave sensibly, but you only call them
when you choose, and that is where the idiocy is coming from".
How would you respond to that?
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-21 1:54 ` Junio C Hamano
@ 2007-04-21 2:06 ` Nicolas Pitre
2007-04-21 23:31 ` David Lang
1 sibling, 0 replies; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-21 2:06 UTC (permalink / raw)
To: Junio C Hamano; +Cc: David Lang, Andy Parkins, git, Linus Torvalds
On Fri, 20 Apr 2007, Junio C Hamano wrote:
> The point of my comment was that with Andy's definition of when
> the "external tools" should trigger, that CPU serial number
> embedder would _NOT_ trigger for a path when you switch branches
> that have the same contents at that path. External tools can do
> stupid things and that is what you are calling the rope. But
> the case I am talking about is that we deliberately do _not_
> call external tools, so even if external tools can do sensible
> things if given a chance to, they are not given a chance to do
> so, and deciding not to call them in some cases is made by us.
And that's a fairly acceptable limitation IMHO, which doesn't make the
thing any less useful for many cases. We just need to document it
appropriately.
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-21 1:54 ` Junio C Hamano
2007-04-21 2:06 ` Nicolas Pitre
@ 2007-04-21 23:31 ` David Lang
1 sibling, 0 replies; 66+ messages in thread
From: David Lang @ 2007-04-21 23:31 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nicolas Pitre, Andy Parkins, git, Linus Torvalds
On Fri, 20 Apr 2007, Junio C Hamano wrote:
> David Lang <david.lang@digitalinsight.com> writes:
>
>>> But with keyword expansion and fancier "external tools" whose
>>> semantics are not well defined (iow, defined to be "do whatever
>>> they please"), does it still make sense to consider two blobs
>>> that appear in totally different context "the same" and omit
>>> checking out (and causing the external tools hook not getting
>>> run)? I already pointed out to Andy that the branch name the
>>> file was taken from, if it were to take part of the keyword
>>> expansion, would come out incorrectly in his printed svg
>>> drawing.
>>
>> this is part of the rope you are handing out. the external tool could
>> do a lot of things that don't make sense. you could have the tool
>> include the serial number of the cpu you happen to be running on at
>> the moment, it wouldn't make sense to do this, but it could be
>> done. the fact that the rope could be used to hang someone doesn't
>> mean that you should outlaw rope.
>
> I do not think you understand, especially after reading the part
> you say "Andy and I both...".
sorry for not being clear
> The point of my comment was that with Andy's definition of when
> the "external tools" should trigger, that CPU serial number
> embedder would _NOT_ trigger for a path when you switch branches
> that have the same contents at that path. External tools can do
> stupid things and that is what you are calling the rope. But
> the case I am talking about is that we deliberately do _not_
> call external tools, so even if external tools can do sensible
> things if given a chance to, they are not given a chance to do
> so, and deciding not to call them in some cases is made by us.
> I think that's different from "we gave you rope, you hang
> yourself and that is not our problem".
the cpu serial number would be different for each cpu in a system, so you could
get different answers, even in the same branch (let alone on different systems,
which is what I was thinking of when I wrote that example)
my point was that while it's possible to define external tools that will cause
problems (my having effectvly random changes to the files), and when external
tools are used it will slow things down (how much depends on the tools), it's
also possible to define external tools that only have well defined, easily
reversable effects, that only touch a few files, and so don't cause a huge
performance hit.
> People have every right to say "if you consistently call these
> external tools, they behave sensibly, but you only call them
> when you choose, and that is where the idiocy is coming from".
> How would you respond to that?
consistantly calling the external tools is not the same thing as calling them
for every possible thing that refrences the file, it's calling them every time a
particular type of access to the file is made (and it helps to have well defined
and well documented rules for when they are used)
I think the basic rule of 'git commands work against the checked-in version of
the file' is a solid basis to work from. This means that you can't optimize by
sometimes looking at the checked-out version, and there may still be some corner
cases to explain/clarify/define (like the git-diff against the working tree
mentioned in other messages)
David Lang
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 21:16 ` Nicolas Pitre
` (2 preceding siblings ...)
2007-04-17 22:40 ` Junio C Hamano
@ 2007-04-18 6:24 ` Rogan Dawes
2007-04-18 15:02 ` Linus Torvalds
3 siblings, 1 reply; 66+ messages in thread
From: Rogan Dawes @ 2007-04-18 6:24 UTC (permalink / raw)
To: Nicolas Pitre
Cc: David Lang, Andy Parkins, git, Linus Torvalds, Junio C Hamano
Nicolas Pitre wrote:
> On Tue, 17 Apr 2007, David Lang wrote:
>
>> I've
>> been told in the past that once .gitattributes is in place then the hooks for
>> the crlf stuff can be generalized to allow for calls out to custom code to do
>> this sort of thing.
>
> And I agree that this is a perfectly sensible thing to do. The facility
> should be there for you to apply any kind of transformation with
> external tools on data going in or out from Git. There are good and bad
> things you can do with such a facility, but at least it becomes your
> responsibility to screw^H^H^H^Hfilter your data and not something that
> is enforced by Git itself.
>
>
> Nicolas
One of the examples that has been given in the past has been taking a
zipped OpenDocumentFormat file, unzipping it to its component parts, and
then committing the individual files rather than the aggregate.
But I can't figure out how this might work.
One idea is to store the binary ODF file in the index (and in the packs,
etc) as a directory with the individual text (and other) files as
entries within that directory. Then, when various git operations want to
use the directory, the operation is redirected via an attribute match to
an external script that knows how to checkout an ODF "directory", or
diff an ODF "directory", etc.
Or similarly, when checking an "ODF" file in, the attribute would lead
to an appropriate script creating the "tree" of individual files.
Does this sound workable?
Rogan
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 6:24 ` Rogan Dawes
@ 2007-04-18 15:02 ` Linus Torvalds
2007-04-18 15:34 ` Nicolas Pitre
2007-04-18 15:38 ` Rogan Dawes
0 siblings, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2007-04-18 15:02 UTC (permalink / raw)
To: Rogan Dawes; +Cc: Nicolas Pitre, David Lang, Andy Parkins, git, Junio C Hamano
On Wed, 18 Apr 2007, Rogan Dawes wrote:
>
> Or similarly, when checking an "ODF" file in, the attribute would lead to an
> appropriate script creating the "tree" of individual files.
>
> Does this sound workable?
I think it sounds very interesting, and I'd much rather do _those_ kinds
of rewrites than keyword unexpansion. And yes, some kind of generic
support for rewriting might give people effectively the keywords they want
(I think the CVS semantics are not likely to be logical, but people can
probably do something that works for them), and at that point maybe the
keyword discussion goes away too.
However, I don't know if it is "workable".
The thing is, it's easy enough (although potentially _very_ expensive) to
run some per-file script at each commit and at each checkout. But there
are some fundamental operations that are even more common:
- checking for "file changed", aka the "git status" kind of thing
Anything we do would have to follow the same "stat" rules, at a
minimum. You can *not* afford to have to check the file manually.
So especially if you combine several pieces into one, or split one file
into several pieces, your index would have to contain the entry
that matches the _filesystem_ (because that's what the index is all
about), but then the *tree* would contain the pieces (or the single
entry that matches several filesystem entries).
- what about diffs (once the stat information says something has
potentially changed)? You'd have to script those too, and it really
sounds like some very basic operations get a _lot_ more expensive and
complex.
This is also related to the above: one of the most fundamental diffs is
the diff of the index and a tree - so if the index matches the
"filesystem state" and the trees contain some "combined entry" or
"split entry", you'd have to teach some very core diff functionality
about that kind of mapping.
In other words, I think it's too complicated. Not necessarily impossible,
but likely harder and more complex than it's really worth.
Having a 1:1 file mapping (like the CRLF<->LF object mapping is) is a lot
easier. You just have to make sure that the index has the *stat*
information from the filesystem, but the *sha1* identity information from
the git internal format, and things automatically just fall out right. But
if you have anything but a 1:1 relationship, it gets hugely more complex.
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 15:02 ` Linus Torvalds
@ 2007-04-18 15:34 ` Nicolas Pitre
2007-04-18 15:38 ` Rogan Dawes
1 sibling, 0 replies; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-18 15:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Rogan Dawes, David Lang, Andy Parkins, git, Junio C Hamano
On Wed, 18 Apr 2007, Linus Torvalds wrote:
>
>
> On Wed, 18 Apr 2007, Rogan Dawes wrote:
> >
> > Or similarly, when checking an "ODF" file in, the attribute would lead to an
> > appropriate script creating the "tree" of individual files.
> >
> > Does this sound workable?
>
> I think it sounds very interesting, and I'd much rather do _those_ kinds
> of rewrites than keyword unexpansion. And yes, some kind of generic
> support for rewriting might give people effectively the keywords they want
> (I think the CVS semantics are not likely to be logical, but people can
> probably do something that works for them), and at that point maybe the
> keyword discussion goes away too.
Exactly my point.
> However, I don't know if it is "workable".
>
> The thing is, it's easy enough (although potentially _very_ expensive) to
> run some per-file script at each commit and at each checkout. But there
> are some fundamental operations that are even more common:
>
> - checking for "file changed", aka the "git status" kind of thing
>
> Anything we do would have to follow the same "stat" rules, at a
> minimum. You can *not* afford to have to check the file manually.
>
> So especially if you combine several pieces into one, or split one file
> into several pieces, your index would have to contain the entry
> that matches the _filesystem_ (because that's what the index is all
> about), but then the *tree* would contain the pieces (or the single
> entry that matches several filesystem entries).
For that the external script would need the ability to alter the index
itself. That becomes a bit yucky. Or maybe something could be made
with a mechanism like dnotify/inotify to "touch" the single placeholder
entry referenced by the index whenever one of the split component
changes.
> - what about diffs (once the stat information says something has
> potentially changed)? You'd have to script those too, and it really
> sounds like some very basic operations get a _lot_ more expensive and
> complex.
Of course an attribute for external diff script is certainly something
that could be useful independently of this case, as some particular
binary formats might have a way of their own to display their
differences.
The whole idea of having the ability to call external tools is exactly
to delegate complex/bizarre/unusual tasks to separate and independent
agents. The whole checkout operation becomes much more expensive but
everyone using such facility might expect it. It just cannot be as bad
as a straight checkout with CVS from a remote server though (OK I know
it can but you know what I mean).
> This is also related to the above: one of the most fundamental diffs is
> the diff of the index and a tree - so if the index matches the
> "filesystem state" and the trees contain some "combined entry" or
> "split entry", you'd have to teach some very core diff functionality
> about that kind of mapping.
Well, if the split components are represented by a single placeholder in
the index and the filesystem, and the filesystem placeholder is
"touched" whenever a split component is modified, then the mapping can
as well be limited to the external scripts for checkin/checkout/diff
only without the Git core having the slightest idea about it.
Sure it might be slow and unusual, but at least not impossible. And
again, with an attribute providing a facility for external tools it is
then not our problem anymore.
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 15:02 ` Linus Torvalds
2007-04-18 15:34 ` Nicolas Pitre
@ 2007-04-18 15:38 ` Rogan Dawes
2007-04-18 15:59 ` Nicolas Pitre
1 sibling, 1 reply; 66+ messages in thread
From: Rogan Dawes @ 2007-04-18 15:38 UTC (permalink / raw)
To: Linus Torvalds
Cc: Nicolas Pitre, David Lang, Andy Parkins, git, Junio C Hamano
Linus Torvalds wrote:
>
> On Wed, 18 Apr 2007, Rogan Dawes wrote:
>> Or similarly, when checking an "ODF" file in, the attribute would lead to an
>> appropriate script creating the "tree" of individual files.
>>
>> Does this sound workable?
>
> I think it sounds very interesting, and I'd much rather do _those_ kinds
> of rewrites than keyword unexpansion. And yes, some kind of generic
> support for rewriting might give people effectively the keywords they want
> (I think the CVS semantics are not likely to be logical, but people can
> probably do something that works for them), and at that point maybe the
> keyword discussion goes away too.
>
> However, I don't know if it is "workable".
>
> The thing is, it's easy enough (although potentially _very_ expensive) to
> run some per-file script at each commit and at each checkout. But there
> are some fundamental operations that are even more common:
>
> - checking for "file changed", aka the "git status" kind of thing
>
> Anything we do would have to follow the same "stat" rules, at a
> minimum. You can *not* afford to have to check the file manually.
>
> So especially if you combine several pieces into one, or split one file
> into several pieces, your index would have to contain the entry
> that matches the _filesystem_ (because that's what the index is all
> about), but then the *tree* would contain the pieces (or the single
> entry that matches several filesystem entries).
Right. I would imagine that the script would have to take care of
setting timestamps in the filesystem appropriately, as well as passing
them back to git when queried.
e.g. expanding test.odf/: (since we store it as a directory)
git calls "odf.sh checkout test.odf/ <sha1> <perms> <stat>"
odf checkout calls back into git to find out the details of the files
under test.odf/, and creates a zip file containing the individual files,
with appropriate timestamps.
User then opens the file using OO.o or whatever, makes some changes and
saves the file.
The user then runs git status:
git calls "odf.sh stat test.odf/" (again, triggered by an attribute)
odf.sh does the equivalent of "zip -l" to get up to date stat info for
the component files, and passes it back to git (via stdout?)
User commits his changes:
git calls "odf.sh checkin test.odf/"
odf.sh unpacks the individual files, calls back into git to create
individual objects (using a fast-import-alike protocol over stdout?)
>
> - what about diffs (once the stat information says something has
> potentially changed)? You'd have to script those too, and it really
> sounds like some very basic operations get a _lot_ more expensive and
> complex.
>
> This is also related to the above: one of the most fundamental diffs is
> the diff of the index and a tree - so if the index matches the
> "filesystem state" and the trees contain some "combined entry" or
> "split entry", you'd have to teach some very core diff functionality
> about that kind of mapping.
>
> In other words, I think it's too complicated. Not necessarily impossible,
> but likely harder and more complex than it's really worth.
>
> Having a 1:1 file mapping (like the CRLF<->LF object mapping is) is a lot
> easier. You just have to make sure that the index has the *stat*
> information from the filesystem, but the *sha1* identity information from
> the git internal format, and things automatically just fall out right. But
> if you have anything but a 1:1 relationship, it gets hugely more complex.
>
> Linus
Absolutely. I just raised it now since it was originally mentioned quite
a long time ago as a possible feature of git, and I couldn't see how it
might work.
Thanks for your time,
Rogan
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 15:38 ` Rogan Dawes
@ 2007-04-18 15:59 ` Nicolas Pitre
2007-04-18 16:09 ` Rogan Dawes
2007-04-18 17:58 ` Alon Ziv
0 siblings, 2 replies; 66+ messages in thread
From: Nicolas Pitre @ 2007-04-18 15:59 UTC (permalink / raw)
To: Rogan Dawes; +Cc: Linus Torvalds, David Lang, Andy Parkins, git, Junio C Hamano
On Wed, 18 Apr 2007, Rogan Dawes wrote:
> Right. I would imagine that the script would have to take care of setting
> timestamps in the filesystem appropriately, as well as passing them back to
> git when queried.
>
> e.g. expanding test.odf/: (since we store it as a directory)
>
> git calls "odf.sh checkout test.odf/ <sha1> <perms> <stat>"
>
> odf checkout calls back into git to find out the details of the files under
> test.odf/, and creates a zip file containing the individual files, with
> appropriate timestamps.
Why would you need to store the document as multiple files into Git?
The only reasons I can see for external filters are:
1) Normalization, e.g. the LF->CRLF thing.
Some might want to do keyword expansion which would fall into this
category as well.
2) Better archiving with Git's deltas.
That means storing files uncompressed into Git since Git will
compress them anyway, after significant space reduction due to
deltas which cannot occur on already compressed data.
So if your .odf file is actually a zip with multiple files, then all you
have to do is to convert that zip archive into a non compressed tar
archive on checkins, and the reverse transformation on checkouts. The
non compressed tar content will delta well, the Git archive will be
small, and no tricks with the index will be needed.
Or am I missing something?
Nicolas
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 15:59 ` Nicolas Pitre
@ 2007-04-18 16:09 ` Rogan Dawes
2007-04-18 17:58 ` Alon Ziv
1 sibling, 0 replies; 66+ messages in thread
From: Rogan Dawes @ 2007-04-18 16:09 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Linus Torvalds, David Lang, Andy Parkins, git, Junio C Hamano
Nicolas Pitre wrote:
> On Wed, 18 Apr 2007, Rogan Dawes wrote:
>
>> Right. I would imagine that the script would have to take care of setting
>> timestamps in the filesystem appropriately, as well as passing them back to
>> git when queried.
>>
>> e.g. expanding test.odf/: (since we store it as a directory)
>>
>> git calls "odf.sh checkout test.odf/ <sha1> <perms> <stat>"
>>
>> odf checkout calls back into git to find out the details of the files under
>> test.odf/, and creates a zip file containing the individual files, with
>> appropriate timestamps.
>
> Why would you need to store the document as multiple files into Git?
>
> The only reasons I can see for external filters are:
>
> 1) Normalization, e.g. the LF->CRLF thing.
>
> Some might want to do keyword expansion which would fall into this
> category as well.
>
> 2) Better archiving with Git's deltas.
>
> That means storing files uncompressed into Git since Git will
> compress them anyway, after significant space reduction due to
> deltas which cannot occur on already compressed data.
>
> So if your .odf file is actually a zip with multiple files, then all you
> have to do is to convert that zip archive into a non compressed tar
> archive on checkins, and the reverse transformation on checkouts. The
> non compressed tar content will delta well, the Git archive will be
> small, and no tricks with the index will be needed.
>
> Or am I missing something?
>
>
> Nicolas
Probably not! ;-)
I was just thinking that it would be easier to see diffs between
individual files, rather than between entries in a zip. But if we are
calling out to a specialized handler, the handler can do that just as
easily, and without the added complexity in the index, etc.
It also means that someone without the attributes and specialized
handler would not be able to use the file (if it is stored as a directory).
Clearly a bad idea! Just ignore me, I'm used to it! ;-)
Rogan
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 15:59 ` Nicolas Pitre
2007-04-18 16:09 ` Rogan Dawes
@ 2007-04-18 17:58 ` Alon Ziv
1 sibling, 0 replies; 66+ messages in thread
From: Alon Ziv @ 2007-04-18 17:58 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Rogan Dawes, Linus Torvalds, David Lang, Andy Parkins, git,
Junio C Hamano
On Wed, 2007-04-18 at 11:59 -0400, Nicolas Pitre wrote:
> So if your .odf file is actually a zip with multiple files, then all you
> have to do is to convert that zip archive into a non compressed tar
> archive on checkins, and the reverse transformation on checkouts. The
> non compressed tar content will delta well, the Git archive will be
> small, and no tricks with the index will be needed.
>
In fact, for the specific case of OO.o files, I would claim the proper
transformation is just converting to non-compressed zip on checkin...
(Non-compressed zip is just as good here as tar, and has the added
advantage that there is no need for a reverse transformation on
checkout :))
-az
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 19:12 ` Andy Parkins
[not found] ` <alpine.LFD. 0.98.0704171530220.4504@xanadu.home>
2007-04-17 19:41 ` Nicolas Pitre
@ 2007-04-17 19:54 ` Linus Torvalds
2007-04-17 20:46 ` Andy Parkins
2007-04-17 21:18 ` Martin Langhoff
3 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 19:54 UTC (permalink / raw)
To: Andy Parkins; +Cc: git, Junio C Hamano
On Tue, 17 Apr 2007, Andy Parkins wrote:
>
> Look, my primary objection to this is the SHOUTING about how impossible
> it is even though I've tried to address every problem you've thrown at
> me
No, you haven't. You've "addressed" them by stating they don't matter. It
doesn't "matter" that a diff won't actually apply to a checked-out tree,
because you fix it up in another tool.
And it doesn't "matter" that switching branches will just result in the
wrong keyword expansion, because you don't care about the keywords
actually being "correct" - they are just random strings, and it apparently
doesn't really have to "work" as far as you're concerned.
And the "git grep" concern you just dismissed by stating that it should
use the filesystem copy, never mind that this just means that a clean
working tree gets different results from doing the same thing based on
that same revision.
In other words, you simply don't seem to worry about TRUSTING the results.
It's ok if patches don't apply, or if you get different results on working
trees than "inside" the revision control.
And the reaon I'm shouting is that "it doesn't matter that it's a bit
hacky" mentality is what gets you things like CVS in the end. Bit-for-bit
results actually matter. Guarantees actually matter. And you should not be
able to see a differece in the working tree just because you happened to
be on a different branch before.
Those are the kind of nasty surprises that make people go: "I don't know
what the end result is, because there is an element of 'just how did you
happen to do that operation' to it".
I want to *trust* the SCM I use.
> I'm finding it really difficult to figure out why you're trying so
> hard to dissuade me from even _trying_.
You can try, but you are *ignoring* the things that I say. The end result
will either perform really badly, or you cannot trust it, or *both*. And
you'll introduce interesting semantics like "diffs won't actually apply to
the working tree with normal tools".
(And yes, git diffs are extended, but they *do* apply to working trees in
all cases where normal "patch" can even support the notion in the first
place.)
And it's not just things like diff and switching branches. If you want
your keywords to generate things like "last modified by Xyzzy", you
haven't even explained *how* you'd do that. Yeah, you can do
git log --pretty=oneline --abbrev-commit -1 -- filename
etc, and you probably think it's instantaneous, but do the timings for a
big repository with a file that hasn't been modified in months, and then
imagine doing that for an initial checkout (say, after you set the
"keyword" attribute for all *.c files).
Whoops. The checkout took an hour. Is that really a path you want to go
down?
> Keywords are so hated by everyone that I doubt they would ever be
> accepted into git - it's an intellectual exercise for me at this stage
> really.
If that's what it is, fine. But people on the list seem to actually *want*
it. They must be educated what a *disaster* it would be to actually try to
really support something like it in real life, and not just as a mental
exercise.
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 19:54 ` Linus Torvalds
@ 2007-04-17 20:46 ` Andy Parkins
2007-04-17 20:52 ` [PATCH] Add keyword collapse " Andy Parkins
` (2 more replies)
0 siblings, 3 replies; 66+ messages in thread
From: Andy Parkins @ 2007-04-17 20:46 UTC (permalink / raw)
To: git; +Cc: Linus Torvalds, Junio C Hamano
On Tuesday 2007, April 17, Linus Torvalds wrote:
> No, you haven't. You've "addressed" them by stating they don't
> matter. It doesn't "matter" that a diff won't actually apply to a
> checked-out tree, because you fix it up in another tool.
Okay. I think this is a matter of perspective - my perspective is that
if it supplies what svn/cvs supply then that would please the people
who want it (of whom I am one); yours is obviously that if it isn't
perfect, it's not worth doing. That's a reasonable thing to demand,
and I'm not going to try and argue you out of it.
> And it doesn't "matter" that switching branches will just result in
> the wrong keyword expansion, because you don't care about the
> keywords actually being "correct" - they are just random strings, and
> it apparently doesn't really have to "work" as far as you're
> concerned.
If you define "work" as "works like cvs/svn does", then I was fine with
it. I don't like it when my favourite VCS, that I want everyone to
use, doesn't have an answer to "but does it do X?".
> And the "git grep" concern you just dismissed by stating that it
> should use the filesystem copy, never mind that this just means that
> a clean working tree gets different results from doing the same thing
> based on that same revision.
As I said at the time, I just picked one of the two options. If you
don't like that, pick the other option - collapse the keywords during
the grep...
> And the reaon I'm shouting is that "it doesn't matter that it's a bit
> hacky" mentality is what gets you things like CVS in the end.
> Bit-for-bit results actually matter. Guarantees actually matter. And
> you should not be able to see a differece in the working tree just
> because you happened to be on a different branch before.
Bit-for-bit as in CRLF is untouched? No? Bit-for-bit as in you said
you were okay with keyword-collapsing but not expansion? You're just
as willing to compromise as me, you've just drawn the line in a
different place.
Incidentally: for future reference, I'll read what you write regardless
of whether you shout it or not.
> You can try, but you are *ignoring* the things that I say. The end
I've tried very hard to respond to every point you've put to me; I've
not selectively chopped out bits, and I've tried to give answers that
make it work as you ask. Now, none of those things were acceptable to
you - which is fine - but I certinaly wasn't ignoring what you say -
_disagreeing with_ is not the same as ignoring.
> If that's what it is, fine. But people on the list seem to actually
> *want* it. They must be educated what a *disaster* it would be to
> actually try to really support something like it in real life, and
> not just as a mental exercise.
People wanting something "wrong" so much is not a sign that they need
educating, it's a sign that they need a solution. In every other
respect git has a solution for them; rather than explaining to them
that what they want is stupid, I'd offer that it's more appropriate to
offer something better in exchange. So my keyword expansion idea is
wrong - fine - where's the something better? Writing custom scripts
and makefiles for every project I ever run is /not/ "something better".
Anyway, it's late, and I'm tired - this has turned into a battle of
wills, and I'm not that into battling. Enough antihistamine has been
poured on my itch that I no longer want to scratch it. I'll send my
most recent patch for the sake of history, and then abandon this
project.
Thanks for your time on this, I appreciate your detailed responses, even
if we don't agree.
Andy
--
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
^ permalink raw reply [flat|nested] 66+ messages in thread
* [PATCH] Add keyword collapse support to convert.c
2007-04-17 20:46 ` Andy Parkins
@ 2007-04-17 20:52 ` Andy Parkins
2007-04-17 21:10 ` [PATCH 2/2] Add keyword unexpansion " Linus Torvalds
2007-04-20 11:32 ` Nikolai Weibull
2 siblings, 0 replies; 66+ messages in thread
From: Andy Parkins @ 2007-04-17 20:52 UTC (permalink / raw)
To: git
This patch adds expansion of keywords support. The collapse is only
performed when the "keywords" attribute it found for a file. The check
for this attribute is done in the same way as the "crlf" attribute
check.
The actual collapse is performed by keyword_collapse_git() which is
called from convert_to_git() when the "keywords" attribute is found.
keyword_collapse_git() finds strings of the form
$KEYWORD: ARBITRARY STRING$
And collapses them into
$KEYWORD:$
No parsing of the keyword itself is performed, the content is simply
dropped.
Despite the fact that this doesn't do anything useful from the users
perspective, this patch forms the more important half of keyword
expansion support - because it prevents the expansion from entering the
repository. It effectively creates blind spots that git tools won't
see.
convert_to_git() has also been changed so that it no longer only does
CRLF conversion. Instead, a flag is kept to say whether any conversion
was done by the CRLF code, and then that converted buffer is passed to
keyword_collapse_git() and the flag again updated. It then returns 1 if
either of these conversion functions actually changed anything.
I've also included a test script to show that the keyword collapse is
working. It particular demonstrates that the diff between a file with
keywords and the repository is blind to the expanded keyword.
git-apply is patched to perform the collapse as well on each fragment.
Signed-off-by: Andy Parkins <andyparkins@gmail.com>
---
This is on top of 1ddfc1ad616550764056077b9e12a35533298c89.
I'm positing it for posterity. It's not going anywhere though, so I'm not
submitting it for inclusion.
builtin-apply.c | 2 +
convert.c | 123 +++++++++++++++++++++++++++++++++++++++++++++++++-
t/t0030-keywords.sh | 95 +++++++++++++++++++++++++++++++++++++++
3 files changed, 217 insertions(+), 3 deletions(-)
create mode 100755 t/t0030-keywords.sh
diff --git a/builtin-apply.c b/builtin-apply.c
index fd92ef7..212c7d4 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -1056,6 +1056,8 @@ static int parse_single_patch(char *line, unsigned long size, struct patch *patc
unsigned long oldlines = 0, newlines = 0, context = 0;
struct fragment **fragp = &patch->fragments;
+ convert_to_git( patch->new_name, &line, &size );
+
while (size > 4 && !memcmp(line, "@@ -", 4)) {
struct fragment *fragment;
int len;
diff --git a/convert.c b/convert.c
index d0d4b81..a18e7ea 100644
--- a/convert.c
+++ b/convert.c
@@ -230,16 +230,133 @@ static int git_path_check_crlf(const char *path)
return attr_crlf_check.isset;
}
+/* ------------------ keywords -------------------- */
+
+static void setup_keyword_check(struct git_attr_check *check)
+{
+ static struct git_attr *attr_keyword;
+
+ if (!attr_keyword)
+ attr_keyword = git_attr("keywords", 8);
+ check->attr = attr_keyword;
+}
+
+static int git_path_check_keyword(const char *path)
+{
+ struct git_attr_check attr_keyword_check;
+
+ setup_keyword_check(&attr_keyword_check);
+
+ if (git_checkattr(path, 1, &attr_keyword_check))
+ return -1;
+ return attr_keyword_check.isset;
+}
+
+static int keyword_collapse_git(const char *path, char **bufp, unsigned long *sizep)
+{
+ char *buffer, *nbuf, *keyword;
+ unsigned long size, keywordlength;
+ int changes = 0;
+ enum {
+ IN_VOID,
+ PRE_KEYWORD,
+ IN_KEYWORD,
+ IN_EXPANSION,
+ END_KEYWORD
+ } parser_state = IN_VOID;
+
+ size = *sizep;
+ if (!size)
+ return 0;
+ buffer = *bufp;
+
+ /*
+ * Allocate an identically sized buffer, keyword collapse can
+ * only reduce the size so we'll never overflow (although we might
+ * waste a few bytes
+ */
+ nbuf = xmalloc(size);
+ *bufp = nbuf;
+
+ while (size) {
+ unsigned char c;
+
+ c = *buffer;
+
+ switch( parser_state ) {
+ case IN_VOID: /* Normal characters, wait for '$' */
+ if (c == '$')
+ parser_state = PRE_KEYWORD;
+ break;
+ case PRE_KEYWORD: /* Gap between '$' and keyword */
+ keywordlength = 0;
+ keyword = buffer;
+ if (!isspace(c))
+ parser_state = IN_KEYWORD;
+ else
+ break;
+ case IN_KEYWORD: /* Keyword itself */
+ if (c == ':')
+ parser_state = IN_EXPANSION;
+ else if (c == '$' || c == '\n' || c == '\r' || c == '\0' )
+ parser_state = END_KEYWORD;
+ else
+ keywordlength++;
+ break;
+ case IN_EXPANSION: /* The expansion gets silently removed */
+ if (c == '$' || c == '\n' || c == '\r' || c == '\0' )
+ parser_state = END_KEYWORD;
+ else {
+ changes = 1;
+ /* Every character we skip reduces the overall size */
+ (*sizep)--;
+ buffer++;
+ size--;
+ }
+ continue;
+ case END_KEYWORD: /* End of keyword */
+ parser_state = IN_VOID;
+ break;
+ }
+
+ *nbuf++ = c;
+ buffer++;
+ size--;
+ }
+
+ return (changes != 0);
+}
+
+
+/* ------------------------------------------------ */
int convert_to_git(const char *path, char **bufp, unsigned long *sizep)
{
+ int changes = 0;
+
switch (git_path_check_crlf(path)) {
case 0:
- return 0;
+ changes += 0;
+ break;
+ case 1:
+ changes += forcecrlf_to_git(path, bufp, sizep);
+ break;
+ default:
+ changes += autocrlf_to_git(path, bufp, sizep);
+ break;
+ }
+
+ switch (git_path_check_keyword(path)) {
+ case 0:
+ changes += 0;
+ break;
case 1:
- return forcecrlf_to_git(path, bufp, sizep);
+ changes += keyword_collapse_git(path, bufp, sizep);
+ break;
default:
- return autocrlf_to_git(path, bufp, sizep);
+ changes += 0;
+ break;
}
+ return (changes != 0);
}
int convert_to_working_tree(const char *path, char **bufp, unsigned long *sizep)
diff --git a/t/t0030-keywords.sh b/t/t0030-keywords.sh
new file mode 100755
index 0000000..5180b0e
--- /dev/null
+++ b/t/t0030-keywords.sh
@@ -0,0 +1,95 @@
+#!/bin/sh
+
+cd $(dirname $0)
+
+test_description='Keyword expansion'
+
+. ./test-lib.sh
+
+# Adding the attribute "keywords" turns the keyword expansion on
+# I've used "notkeywords" as an attribute as a placeholder attribute
+# but this is just "somerandomattribute", it has no meaning
+
+# Expect success because the keyword attribute should be found
+test_expect_success 'Keywords attribute present' '
+
+ echo "keywordsfile keywords" >.gitattributes &&
+
+ echo "\$keyword: anythingcangohere\$" > keywordsfile &&
+
+ git add keywordsfile &&
+ git add .gitattributes &&
+ git commit -m test-keywords &&
+
+ git check-attr keywords -- keywordsfile
+'
+
+# Expect failure because the repository version should be different from the
+# working tree version.
+#
+# In repository : $keyword:$
+# In working dir: $keyword: anythingcangohere$
+#
+test_expect_failure 'Keywords collapse active' '
+
+ git show HEAD:keywordsfile > keywordsfile.cmp &&
+ cmp keywordsfile keywordsfile.cmp
+
+'
+
+# expect success because we want to find the keyword line collapsed in the
+# and hence appearing unchanged in the output of git-diff
+test_expect_success 'git-diff with keywords present' '
+ echo "Non-keyword containing line" >> keywordsfile &&
+ git diff -- keywordsfile | grep -qs "^ \$keyword:\$$"
+'
+
+# Check git-apply blindness
+cat > keyword-patch.diff << EOF
+diff --git a/keywordsfile b/keywordsfile
+--- a/keywordsfile
++++ b/keywordsfile
+@@ -1,2 +1,2 @@
+ \$keyword:\$
+-Non-keyword containing line
++Another non-keyword containing line
+EOF
+
+test_expect_success 'patch application with keywords active' '
+ git-apply --check keyword-patch.diff
+'
+
+# Expect failure because the keywords attribute should NOT be found
+test_expect_failure 'Keywords attribute absent' '
+
+ echo "keywordsfile notkeywords" >.gitattributes &&
+
+ git add .gitattributes &&
+ git commit -m test-not-keywords &&
+
+ git check-attr keywords -- keywordsfile
+
+'
+
+# If keywords are later disabled on that file, then the keyword collapsed
+# will be ignored, so a diff should now show differences, because git is no
+# longer keyword blind
+test_expect_success 'git-diff with keywords in file but disabled' '
+ git diff -- keywordsfile | grep -qs "^diff"
+'
+
+# Expect success because the repository should be identical to the working tree
+test_expect_success 'Keywords collapse inactive' '
+
+ git add keywordsfile &&
+ git commit -m "test-not-keywords"
+
+ git show HEAD:keywordsfile > keywordsfile.cmp &&
+ cmp keywordsfile keywordsfile.cmp
+'
+
+test_expect_failure 'patch application without keywords active' '
+ git-apply --check keyword-patch.diff
+'
+
+test_done
--
1.5.1.1.822.g0049
^ permalink raw reply related [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 20:46 ` Andy Parkins
2007-04-17 20:52 ` [PATCH] Add keyword collapse " Andy Parkins
@ 2007-04-17 21:10 ` Linus Torvalds
2007-04-17 21:13 ` Linus Torvalds
2007-04-20 11:32 ` Nikolai Weibull
2 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 21:10 UTC (permalink / raw)
To: Andy Parkins; +Cc: git, Junio C Hamano
On Tue, 17 Apr 2007, Andy Parkins wrote:
>
> If you define "work" as "works like cvs/svn does", then I was fine with
> it.
I can't really argue against that. Yes, I agree 100% that we can "work" in
the sense that "cvs/svn works". There's clearly no fundamental reasons why
you can't, since svn/cvs obviously do it.
I just do have higher standards. I really dislike CVS, and in many ways I
actually think that SVN is even worse (not because it's really "worse",
but because I think it is such a waste - it fixes the _trivial_ things
about CVS, but doesn't really fix any of the underlying problems).
So I don't actually think that CVS "works".
> Bit-for-bit as in CRLF is untouched? No? Bit-for-bit as in you said
> you were okay with keyword-collapsing but not expansion? You're just
> as willing to compromise as me, you've just drawn the line in a
> different place.
Bit-for-bit as in "you have to be able to trust every single bit".
And no, I don't actually love CRLF either. But it doesn't have quite the
same fundamental problems. It has issues too, but they are fundamentally
smaller, and I think making "git compatible with Windows" is also a lot
more important than making "git compatible with CVS users".
Windows we cannot change. CVS users we can try to help.
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 21:10 ` [PATCH 2/2] Add keyword unexpansion " Linus Torvalds
@ 2007-04-17 21:13 ` Linus Torvalds
2007-04-18 11:11 ` Johannes Schindelin
0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 21:13 UTC (permalink / raw)
To: Andy Parkins; +Cc: git, Junio C Hamano
On Tue, 17 Apr 2007, Linus Torvalds wrote:
>
> Windows we cannot change. CVS users we can try to help.
.. and if it wasn't clear, "helping" CVS users is not in my opinion to try
to make git act like CVS, and lettign them do stupid things, but to try to
help them become *more* than CVS users.
Because they too can become upstanding members of society, and leave their
dark past behind them. I firmly believe that nobody is past saving.
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 21:13 ` Linus Torvalds
@ 2007-04-18 11:11 ` Johannes Schindelin
0 siblings, 0 replies; 66+ messages in thread
From: Johannes Schindelin @ 2007-04-18 11:11 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andy Parkins, git, Junio C Hamano
Hi,
On Tue, 17 Apr 2007, Linus Torvalds wrote:
> On Tue, 17 Apr 2007, Linus Torvalds wrote:
> >
> > Windows we cannot change. CVS users we can try to help.
>
> .. and if it wasn't clear, "helping" CVS users is not in my opinion to
> try to make git act like CVS, and lettign them do stupid things, but to
> try to help them become *more* than CVS users.
I am quite certain that we also can help Windows users see the light. Once
we have them not only complaining, but actually doing something about it.
> Because they too can become upstanding members of society, and leave their
> dark past behind them. I firmly believe that nobody is past saving.
Well, it depends. If you clicked on that File Menu button, and then
clicked on the "Save" item, you are past saving.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 20:46 ` Andy Parkins
2007-04-17 20:52 ` [PATCH] Add keyword collapse " Andy Parkins
2007-04-17 21:10 ` [PATCH 2/2] Add keyword unexpansion " Linus Torvalds
@ 2007-04-20 11:32 ` Nikolai Weibull
2 siblings, 0 replies; 66+ messages in thread
From: Nikolai Weibull @ 2007-04-20 11:32 UTC (permalink / raw)
To: Andy Parkins; +Cc: git, Linus Torvalds, Junio C Hamano
On 4/17/07, Andy Parkins <andyparkins@gmail.com> wrote:
> On Tuesday 2007, April 17, Linus Torvalds wrote:
>
> > No, you haven't. You've "addressed" them by stating they don't
> > matter. It doesn't "matter" that a diff won't actually apply to a
> > checked-out tree, because you fix it up in another tool.
>
> Okay. I think this is a matter of perspective - my perspective is that
> if it supplies what svn/cvs supply then that would please the people
> who want it (of whom I am one); yours is obviously that if it isn't
> perfect, it's not worth doing. That's a reasonable thing to demand,
> and I'm not going to try and argue you out of it.
Loads of people would be pleased if marijuana was legalized. For some
reason, few governments seem willing to cater to their needs.
nikolai
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 19:12 ` Andy Parkins
` (2 preceding siblings ...)
2007-04-17 19:54 ` Linus Torvalds
@ 2007-04-17 21:18 ` Martin Langhoff
3 siblings, 0 replies; 66+ messages in thread
From: Martin Langhoff @ 2007-04-17 21:18 UTC (permalink / raw)
To: Andy Parkins; +Cc: git, Linus Torvalds, Junio C Hamano
On 4/18/07, Andy Parkins <andyparkins@gmail.com> wrote:
> On Tuesday 2007, April 17, Linus Torvalds wrote:
> > So you will never work with anybody outside of git?
> For my projects - correct; I don't care about the rest of the world.
> For projects that do - don't enable keywords, it's an option, all I
> want is to have that option.
GIT's fundamental respect for the contents its tracking (in not
munging them with keyword expansion) means that it works great in
contexts where other SCMs tools are used. And being content-centric at
the SCM layer means that it is possible to track a git project with -
say - Mercurial (and vice-versa) with nothing more than a bit of perl
glue and no guessing at all. The content is the content is the
content.
When the SCM has "munge-the-content" options, your "upstream" can make
things completely un-trackable. Tracking CVS with git is a breeze but
there is breakage related to keyword expansion. I should write some
better heuristics for it, but it's impossible to know with 100%
certainty that you are doing the right thing, and getting the correct
content from CVS.
All this talk of breaking non-git-patch goes back to the same. With
the current design, projects that use git are easily trackable if you
just look at the content, and ignore the SCM. That's an outstanding
property and quite central to the design, and I wouldn't include an
option to "turn it off" even if the patch to implement it turns out to
be trivial.
Probably using make will help - you might be able to wire it to work
off a post-update-hook so it's completely transparent.
cheers,
martin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 11:35 ` Andy Parkins
2007-04-17 15:53 ` Linus Torvalds
@ 2007-04-17 21:24 ` Junio C Hamano
2007-04-20 0:30 ` Jakub Narebski
2 siblings, 0 replies; 66+ messages in thread
From: Junio C Hamano @ 2007-04-17 21:24 UTC (permalink / raw)
To: Andy Parkins; +Cc: git
Andy Parkins <andyparkins@gmail.com> writes:
> On Tuesday 2007 April 17 11:09, Junio C Hamano wrote:
>
>> $ git checkout B
>>
>> should be immediate and instantaneous.
>
> Now - that's a much better argument. However, it's not
> relevant, keywords (in other VCSs, and so why not in git) are
> only updated when a file is checked out.
It _is_ very much relevant.
If you have the keyword in your svg drawing, and if branch A and
branch B happen to have textually the same contents but the way
they got there are different, I do not think not checking it out
upon branch switching is correct. Otherwise your printed copy
would have information from the version in branch A, even after
switching to B.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 11:35 ` Andy Parkins
2007-04-17 15:53 ` Linus Torvalds
2007-04-17 21:24 ` Junio C Hamano
@ 2007-04-20 0:30 ` Jakub Narebski
2007-04-21 0:47 ` David Lang
2 siblings, 1 reply; 66+ messages in thread
From: Jakub Narebski @ 2007-04-20 0:30 UTC (permalink / raw)
To: git
Andy Parkins wrote:
>> * We do not do the borrowing from working tree when doing
>> grep_sha1(), but when we grep inside a file from working tree
>> with grep_file(), we do not currently make it go through
>> convert_to_git() to fix line endings. Maybe we should, if
>> only for consistency.
>
> I'd actually argue not - git-grep searches the working tree. The expanded
> keywords are in the working tree. Take the CRLF case - I'm a clueless user,
> who only understands the system I'm working on. I want to search for all the
> line endings, so I do git-grep "\r\n" - that should work, because I'm
> searching my working tree.
Actually, "git grep" can search both the working tree (default), but also
an index (--cached), or specified tree (or tree-ish). The same with
"git diff": it can work on tree (repository), index, working tree version,
now I think in [almost] any combination.
Think what keyword expansion means to all this... Well, you can have -kk
to expand/not expand keywords, but this is avoiding issue, not solving it
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-20 0:30 ` Jakub Narebski
@ 2007-04-21 0:47 ` David Lang
0 siblings, 0 replies; 66+ messages in thread
From: David Lang @ 2007-04-21 0:47 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1884 bytes --]
On Fri, 20 Apr 2007, Jakub Narebski wrote:
> Andy Parkins wrote:
>
>>> * We do not do the borrowing from working tree when doing
>>> grep_sha1(), but when we grep inside a file from working tree
>>> with grep_file(), we do not currently make it go through
>>> convert_to_git() to fix line endings. Maybe we should, if
>>> only for consistency.
>>
>> I'd actually argue not - git-grep searches the working tree. The expanded
>> keywords are in the working tree. Take the CRLF case - I'm a clueless user,
>> who only understands the system I'm working on. I want to search for all the
>> line endings, so I do git-grep "\r\n" - that should work, because I'm
>> searching my working tree.
>
> Actually, "git grep" can search both the working tree (default), but also
> an index (--cached), or specified tree (or tree-ish). The same with
> "git diff": it can work on tree (repository), index, working tree version,
> now I think in [almost] any combination.
>
> Think what keyword expansion means to all this... Well, you can have -kk
> to expand/not expand keywords, but this is avoiding issue, not solving it
how is git-grep on the working tree any different than just useing grep? the
value in the git-* versions of system commands are that they work on the
history, index, etc wher ethe normal system tools don't.
in this particular case, since the git user can do a git-grep of the working
tree, or a git-grep of HEAD (or of the index), I don't think that it hurts much
either way.
if git-grep of the working tree converts things to the checked-in version before
the pattern match, the user can still use grep to go through the checked-out
version
if git-grep of the working tree doesn't convert things to the checked-in version
before the pattern match, the user can stil use git-grep HEAD or --cached to go
through the checked-in version.
David Lang
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 10:09 ` Junio C Hamano
2007-04-17 11:35 ` Andy Parkins
@ 2007-04-17 15:46 ` Linus Torvalds
1 sibling, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 15:46 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Andy Parkins, git
On Tue, 17 Apr 2007, Junio C Hamano wrote:
>
> Andy Parkins <andyparkins@gmail.com> writes:
>
> > No parsing of the keyword itself is performed, the content is simply
> > dropped.
>
> You are sidestepping the most important problem by doing this.
I obviosly agree (and I agree with everything in your email), but:
> The only sensible keyword you could have, without destroying
> what git is, is blob id. No commit id, no date, no author.
Yes. And I already talked about some of the very fundamental problems that
keyword expansion has (ie switching branches is basically impossible to do
without checking out _every_single_file_ with the "keyword" attribute
set. There are others).
Now, unexpansion is trivial to do (it really *is* the same as the
"CRLF->LF" translation: that's technically really just an "unexpansion"
too). And it should work.
The way this does unexpansion also breaks "git diff" in that it bassically
always makes diff *ignore* the keywords. In other words, when you do
git diff A..B
and send the diff to somebody else, they'll never see any keywords at all!
Now, that obviously fulfills my requirement that the diff be empty if A
and B are the same, so you should expect me to be happy. But I'm not
happy, because if the other person also is using git, HE CANNOT EVEN APPLY
THE DIFF! Even if he's at "A", and thus gets a diff that is supposed to
apply *exactly*, he'll get rejects if there were other changes around the
unexpanded keyword (which *he* will have expanded in his working tree, of
course!)
See? Keywords simply *cannot* work. They're broken. Either you can ignore
them (and not show them in diffs), in which case the diff is broken, or
you can not ignore them (and show them in diffs) in which case the diff is
*also* broken, just differently.
The only sane and workable case is to not have them at all. Any keyword
expansion will *always* result in problems. You simply cannot do it right.
As I mentioned originally, it results in problems in CVS too, it's just
that CVS really has so many other issues that you seldom see the problems.
Ok, after that new rant against keywords, I will say one positive thing:
- keyword *unexpansion* is certainly easy (exactly because it's
stateless)
- if we want to support a git that only does "unexpansion", you can
probably hack around stupid release scripting more easily. You can add
your keywords *outside* of git, and git will simply ignore them.
So I'm actually not against keyword un-expansion. It has none of the
fundamental problems that actually expanding the keywords has. It's
literally no different from CRLF->LF translation. It can cause confusion,
but if it has to be explicitly enabled with an attribute and is never done
automatically, then having some support for unexpansion and letting the
user who wants to use keywords use his own "wrapper scripts" around git to
do his own expansion, be my guest..
You would be unable to do fundamental operations like "git checkout B" to
jump to another branch, but if you don't support multiple branches and
want to just act like CVS, maybe git unexpanding the crap will help you:
you can add your own keywords, happy in the knowledge that git simply
won't *care* about them, and will never see them.
So I absolutely detest keyword expansion and actually have a lot of
arguments for why I don't think it *can* work even in theory (except by
being totally unusable), but I don't have the *un*expansion.
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 9:41 [PATCH 2/2] Add keyword unexpansion support to convert.c Andy Parkins
[not found] ` <200704171803.58940.an dyparkins@gmail.com>
2007-04-17 10:09 ` Junio C Hamano
@ 2007-04-17 10:41 ` Johannes Sixt
2007-04-17 15:32 ` Linus Torvalds
2007-04-17 21:00 ` Matthieu Moy
4 siblings, 0 replies; 66+ messages in thread
From: Johannes Sixt @ 2007-04-17 10:41 UTC (permalink / raw)
To: git
Andy Parkins wrote:
> switch (git_path_check_crlf(path)) {
> case 0:
> - return 0;
> + changes += 0;
> case 1:
> - return forcecrlf_to_git(path, bufp, sizep);
> + changes += forcecrlf_to_git(path, bufp, sizep);
> default:
> - return autocrlf_to_git(path, bufp, sizep);
> + changes += autocrlf_to_git(path, bufp, sizep);
> + }
> +
> + switch (git_path_check_keyword(path)) {
> + case 0:
> + changes += 0;
> + case 1:
> + changes += keyword_unexpand_git(path, bufp, sizep);
> }
I think there are 'break's missing all along the way.
-- Hannes
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 9:41 [PATCH 2/2] Add keyword unexpansion support to convert.c Andy Parkins
` (2 preceding siblings ...)
2007-04-17 10:41 ` Johannes Sixt
@ 2007-04-17 15:32 ` Linus Torvalds
2007-04-17 17:10 ` Andy Parkins
2007-04-17 17:18 ` Rogan Dawes
2007-04-17 21:00 ` Matthieu Moy
4 siblings, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 15:32 UTC (permalink / raw)
To: Andy Parkins; +Cc: git
On Tue, 17 Apr 2007, Andy Parkins wrote:
>
> Adding expansion is harder, as I have no clue which calls to make to find
> even the most basic information about an object; but I thought I'd get
> feedback before I expend that effort.
Adding expansion is not just "harder". It's basically impossible to do
with any kind of performance.
Think "git checkout newbranch".
And think what we do about files (and whole subdirectories!) that haven't
even changed. And finally, think about how important that optimization is
in an SCM like git that supports branches.
I think you'll find that keyword expansion simply isn't acceptable.
But hey, you didn't believe me, so I'm happy you are trying to write the
patches. Either you'll prove me wrong, or you'll realize just *how* broken
the feature is.
(Yeah, "unexpansion" is easy. It's easy for all the same reasons CRLF is
easy: it has no state!)
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 15:32 ` Linus Torvalds
@ 2007-04-17 17:10 ` Andy Parkins
2007-04-17 17:18 ` Rogan Dawes
1 sibling, 0 replies; 66+ messages in thread
From: Andy Parkins @ 2007-04-17 17:10 UTC (permalink / raw)
To: git; +Cc: Linus Torvalds
On Tuesday 2007, April 17, Linus Torvalds wrote:
> And think what we do about files (and whole subdirectories!) that
> haven't even changed. And finally, think about how important that
> optimization is in an SCM like git that supports branches.
>
> I think you'll find that keyword expansion simply isn't acceptable.
As I said in my reply to Junio; other VCSs only expand keywords when the
file itself is checked out - when git's uber-fast switching decides
that fileA is the same in the source and target checkouts, then the
keywords won't be updated - fine.
In this one respect git would be "as good as" instead of "infinitely
better". I can live with that.
> But hey, you didn't believe me, so I'm happy you are trying to write
> the patches. Either you'll prove me wrong, or you'll realize just
> *how* broken the feature is.
Could be - I'm happy to disagree. I'm even happy to accept that I might
fail miserably.
Andy
--
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 15:32 ` Linus Torvalds
2007-04-17 17:10 ` Andy Parkins
@ 2007-04-17 17:18 ` Rogan Dawes
2007-04-17 18:23 ` Linus Torvalds
1 sibling, 1 reply; 66+ messages in thread
From: Rogan Dawes @ 2007-04-17 17:18 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andy Parkins, git
Linus Torvalds wrote:
>
> On Tue, 17 Apr 2007, Andy Parkins wrote:
>> Adding expansion is harder, as I have no clue which calls to make to find
>> even the most basic information about an object; but I thought I'd get
>> feedback before I expend that effort.
>
> Adding expansion is not just "harder". It's basically impossible to do
> with any kind of performance.
>
> Think "git checkout newbranch".
>
> And think what we do about files (and whole subdirectories!) that haven't
> even changed. And finally, think about how important that optimization is
> in an SCM like git that supports branches.
Well, if the only keyword we support is $BlobId:$, then if the
tree/object hasn't changed, then we still don't need to touch the object.
Not so?
Rogan
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 17:18 ` Rogan Dawes
@ 2007-04-17 18:23 ` Linus Torvalds
2007-04-17 20:27 ` Rogan Dawes
2007-04-17 23:56 ` Robin H. Johnson
0 siblings, 2 replies; 66+ messages in thread
From: Linus Torvalds @ 2007-04-17 18:23 UTC (permalink / raw)
To: Rogan Dawes; +Cc: Andy Parkins, git
On Tue, 17 Apr 2007, Rogan Dawes wrote:
>
> Well, if the only keyword we support is $BlobId:$, then if the tree/object
> hasn't changed, then we still don't need to touch the object.
>
> Not so?
Correct. However, is that actually a useful expansion?
Most of the time, I'd expect people to want things like "last committer,
time, story of their life" etc.. I don't think the SHA1 ID's are pretty
enough that anybody would ever want to see them. But yes, they are
certainly stable.
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 18:23 ` Linus Torvalds
@ 2007-04-17 20:27 ` Rogan Dawes
2007-04-17 23:56 ` Robin H. Johnson
1 sibling, 0 replies; 66+ messages in thread
From: Rogan Dawes @ 2007-04-17 20:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andy Parkins, git
Linus Torvalds wrote:
>
> On Tue, 17 Apr 2007, Rogan Dawes wrote:
>> Well, if the only keyword we support is $BlobId:$, then if the tree/object
>> hasn't changed, then we still don't need to touch the object.
>>
>> Not so?
>
> Correct. However, is that actually a useful expansion?
>
> Most of the time, I'd expect people to want things like "last committer,
> time, story of their life" etc.. I don't think the SHA1 ID's are pretty
> enough that anybody would ever want to see them. But yes, they are
> certainly stable.
>
> Linus
Well, one example for wanting a keyword expansion option was where
people modify the entire file, and just email it back to the maintainer.
It surely helps to have the SHA1 of the original object when applying
the changes.
You also stated in another email that doing keyword expansion prevents
people from using non-git tools. I agree that you'd probably end up with
diffs that may include the keyword (object id) being mailed to you if
the submitter is not using git. But when a git maintainer applies those
diffs using git-apply, the keyword unexpansion could still take place,
making the diffs usable in practice.
None of what I said necessarily supports the view that it is a good idea
from the perspective of trusting the results, of course.
Regards,
Rogan
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 18:23 ` Linus Torvalds
2007-04-17 20:27 ` Rogan Dawes
@ 2007-04-17 23:56 ` Robin H. Johnson
2007-04-18 0:02 ` Junio C Hamano
` (2 more replies)
1 sibling, 3 replies; 66+ messages in thread
From: Robin H. Johnson @ 2007-04-17 23:56 UTC (permalink / raw)
To: Linus Torvalds, Git Mailing List; +Cc: Rogan Dawes, Andy Parkins
[-- Attachment #1: Type: text/plain, Size: 1856 bytes --]
On Tue, Apr 17, 2007 at 11:23:03AM -0700, Linus Torvalds wrote:
> > Well, if the only keyword we support is $BlobId:$, then if the tree/object
> > hasn't changed, then we still don't need to touch the object.
> >
> > Not so?
> Correct. However, is that actually a useful expansion?
>
> Most of the time, I'd expect people to want things like "last committer,
> time, story of their life" etc.. I don't think the SHA1 ID's are pretty
> enough that anybody would ever want to see them. But yes, they are
> certainly stable.
I'd certainly settle for having only $Blobid:$. It fits my requirements
perfectly.
This is perhaps a reasonable wording of my requirement.
"Files from from the VCS should contain a stable machine-usable
identifier that is unique for that revision of the file, without
post-processing to insert the identifier."
In the case of CVS, $Header$ contains the path and revision number of a
file, which serve to identify the content uniquely.
In the case of Git, $BlobId$ fills the same requirement.
As for a usage case:
- J.PEBKAC.User gets a a tree (from a tarball or GIT, we should gain the
same output)
- Copies some file outside of the tree (the user is NOT smart enough,
and resists all reasonable attempts at edumacation)
- Modifies said file outside of tree.
- Contacts maintainer with entire changed file.
- User vanishes off the internet.
The entire file he sent if it's CVS, contains a $Header$ that uniquely
identifies the file (path and revision), and the maintainer can simply
drop the file in, and 'cvs diff -r$OLDREV $FILE'.
If it's git, the maintainer drops the file in, and does 'git diff
$OLDSHA1 $FILE'.
--
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
[-- Attachment #2: Type: application/pgp-signature, Size: 321 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 23:56 ` Robin H. Johnson
@ 2007-04-18 0:02 ` Junio C Hamano
2007-04-18 0:26 ` J. Bruce Fields
` (3 more replies)
2007-04-18 2:50 ` Martin Langhoff
2007-04-18 10:06 ` David Kågedal
2 siblings, 4 replies; 66+ messages in thread
From: Junio C Hamano @ 2007-04-18 0:02 UTC (permalink / raw)
To: Robin H. Johnson
Cc: Linus Torvalds, Git Mailing List, Rogan Dawes, Andy Parkins
"Robin H. Johnson" <robbat2@gentoo.org> writes:
> As for a usage case:
> - J.PEBKAC.User gets a a tree (from a tarball or GIT, we should gain the
> same output)
> - Copies some file outside of the tree (the user is NOT smart enough,
> and resists all reasonable attempts at edumacation)
> - Modifies said file outside of tree.
> - Contacts maintainer with entire changed file.
> - User vanishes off the internet.
>
> The entire file he sent if it's CVS, contains a $Header$ that uniquely
> identifies the file (path and revision), and the maintainer can simply
> drop the file in, and 'cvs diff -r$OLDREV $FILE'.
> If it's git, the maintainer drops the file in, and does 'git diff
> $OLDSHA1 $FILE'.
I personally hope that the maintainer drops such a non-patch
that originates from a PEBKAC. At least I hope the tools that I
personally use are not maintained by such a maintainer ;-)
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 0:02 ` Junio C Hamano
@ 2007-04-18 0:26 ` J. Bruce Fields
2007-04-18 1:19 ` Linus Torvalds
2007-04-18 1:06 ` Robin H. Johnson
` (2 subsequent siblings)
3 siblings, 1 reply; 66+ messages in thread
From: J. Bruce Fields @ 2007-04-18 0:26 UTC (permalink / raw)
To: Junio C Hamano
Cc: Robin H. Johnson, Linus Torvalds, Git Mailing List, Rogan Dawes,
Andy Parkins
On Tue, Apr 17, 2007 at 05:02:43PM -0700, Junio C Hamano wrote:
> "Robin H. Johnson" <robbat2@gentoo.org> writes:
>
> > As for a usage case:
> > - J.PEBKAC.User gets a a tree (from a tarball or GIT, we should gain the
> > same output)
> > - Copies some file outside of the tree (the user is NOT smart enough,
> > and resists all reasonable attempts at edumacation)
> > - Modifies said file outside of tree.
> > - Contacts maintainer with entire changed file.
> > - User vanishes off the internet.
> >
> > The entire file he sent if it's CVS, contains a $Header$ that uniquely
> > identifies the file (path and revision), and the maintainer can simply
> > drop the file in, and 'cvs diff -r$OLDREV $FILE'.
> > If it's git, the maintainer drops the file in, and does 'git diff
> > $OLDSHA1 $FILE'.
>
> I personally hope that the maintainer drops such a non-patch
> that originates from a PEBKAC. At least I hope the tools that I
> personally use are not maintained by such a maintainer ;-)
That may not be quite fair--note the 'git diff $OLDSHA1 $FILE'. So the
$Header$ here is a hint telling the maintainer how to produce a
(hopefully) reviewable patch, not an invitation to blindly drop random
files into the tree. (Other objections to accepting code from random
non-reachable people aside....)
I've occasionally wondered before whether git could offer any help in
the case where, say, somebody hands me a file, I know it's based on
src/widget/widget.c from somewhere in v0.5..v0.7, and I'd like a guess
at the most likely candidates.
I haven't wondered that often enough that I'd consider it worth
embedding the blob SHA1 in every checked-out file, though!
--b.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 0:26 ` J. Bruce Fields
@ 2007-04-18 1:19 ` Linus Torvalds
2007-04-18 1:28 ` Junio C Hamano
0 siblings, 1 reply; 66+ messages in thread
From: Linus Torvalds @ 2007-04-18 1:19 UTC (permalink / raw)
To: J. Bruce Fields
Cc: Junio C Hamano, Robin H. Johnson, Git Mailing List, Rogan Dawes,
Andy Parkins
On Tue, 17 Apr 2007, J. Bruce Fields wrote:
>
> I've occasionally wondered before whether git could offer any help in
> the case where, say, somebody hands me a file, I know it's based on
> src/widget/widget.c from somewhere in v0.5..v0.7, and I'd like a guess
> at the most likely candidates.
It's actually fairly easy to do.
Get the git hash of the blob: use "git hash-object" to do so (although
you can do it without git too, see later), then just do
git whatchanged v0.5..v0.7 -- src/widget/widget.c
and just look for the hash. If it's an exact match, you'd find it there,
and it will tell you when it changed.
If it's *not* an exact match, you have to come up with some "measure of
minimality" for the thing (the size of the diff might be a good one), and
you can do
git rev-list --no-merges --full-history v0.5..v0.7 -- src/widget/widget.c > rev-list
which will get you a full set of commits that changed that file. Then you
can just do something like
best_commit=none
best=1000000
while read commit
do
git cat-file blob "$commit:src/widget/widget.c" > tmpfile
lines=$(diff reference-file tmpfile | wc -l)
if [ "$lines" -lt "$best" ]
then
echo Best so far: $commit $lines
best=$lines
fi
done < rev-list
and you're done!
(Yeah, I'm sure that script could be improved, but it's probably really
not that bad even as-is! The initial "git rev-list" will have done all
the heavy lifting, and picked out the commits that matter)
> I haven't wondered that often enough that I'd consider it worth
> embedding the blob SHA1 in every checked-out file, though!
It really doesn't pay.
Besides, if you actually have the file, you can trivially get the SHA1
_without_ embedding it into the file. Just do
(echo -e -n "blob <size>\0" ; cat file) | sha1sum
where "size" is just the size in bytes of the file.
So embedding the SHA1 doesn't actually buy you anything: every blob BY
DEFINITION has their SHA1 embedded into them.
In fact, embedding the SHA1 (or doing any other modifications) just makes
it harder to do this, since then you have to filter it out again.
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 1:19 ` Linus Torvalds
@ 2007-04-18 1:28 ` Junio C Hamano
2007-04-18 1:33 ` Linus Torvalds
0 siblings, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2007-04-18 1:28 UTC (permalink / raw)
To: Linus Torvalds
Cc: J. Bruce Fields, Robin H. Johnson, Git Mailing List, Rogan Dawes,
Andy Parkins
Linus Torvalds <torvalds@linux-foundation.org> writes:
> Besides, if you actually have the file, you can trivially get the SHA1
> _without_ embedding it into the file. Just do
>
> (echo -e -n "blob <size>\0" ; cat file) | sha1sum
>
> where "size" is just the size in bytes of the file.
>
> So embedding the SHA1 doesn't actually buy you anything: every blob BY
> DEFINITION has their SHA1 embedded into them.
>
> In fact, embedding the SHA1 (or doing any other modifications) just makes
> it harder to do this, since then you have to filter it out again.
The use case the thread you are responding to assumes that you
do *not* have a preimage before the change.
You give a file out, somebody says "here is my updated version"
and returns the whole file. You may recognize, from the
contents, which path the file was taken from, but you may not
know which revision to diff against, as the "update version" was
not sent to you along with the version before he started
updating.
In my day job, I made a protocol, with a diff-challenged person
in our Japanese subsidiary, for him to send *both* preimage and
postimage of his changes when sending updates to me, and that
has worked reasonably well without embedded ID. I can always do
a file-level 3-way merge to forward port his change to whatever
version I am working on.
But if it is not an option to insist getting the preimage back,
embedded blob ID would be one way to help that exchange.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 1:28 ` Junio C Hamano
@ 2007-04-18 1:33 ` Linus Torvalds
0 siblings, 0 replies; 66+ messages in thread
From: Linus Torvalds @ 2007-04-18 1:33 UTC (permalink / raw)
To: Junio C Hamano
Cc: J. Bruce Fields, Robin H. Johnson, Git Mailing List, Rogan Dawes,
Andy Parkins
On Tue, 17 Apr 2007, Junio C Hamano wrote:
>
> The use case the thread you are responding to assumes that you
> do *not* have a preimage before the change.
Sure. However, see my other script to actually find the "closest version".
It's pretty easy.
In fact, even if you don't know which file it is, we could easily first
have a separate "try to guess filename" (based on the same kind of
heurstics) and then dig deeper on that filename.
So I think there are better ways to get the same effect without embedding
any information.
Linus
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 0:02 ` Junio C Hamano
2007-04-18 0:26 ` J. Bruce Fields
@ 2007-04-18 1:06 ` Robin H. Johnson
2007-04-18 1:15 ` Junio C Hamano
2007-04-18 4:15 ` Daniel Barkalow
2007-04-18 11:32 ` Johannes Schindelin
3 siblings, 1 reply; 66+ messages in thread
From: Robin H. Johnson @ 2007-04-18 1:06 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
Cc: Robin H. Johnson, Linus Torvalds, Rogan Dawes, Andy Parkins
[-- Attachment #1: Type: text/plain, Size: 1886 bytes --]
On Tue, Apr 17, 2007 at 05:02:43PM -0700, Junio C Hamano wrote:
> > As for a usage case:
> > - J.PEBKAC.User gets a a tree (from a tarball or GIT, we should gain the
> > same output)
> > - Copies some file outside of the tree (the user is NOT smart enough,
> > and resists all reasonable attempts at edumacation)
> > - Modifies said file outside of tree.
> > - Contacts maintainer with entire changed file.
> > - User vanishes off the internet.
> >
> > The entire file he sent if it's CVS, contains a $Header$ that uniquely
> > identifies the file (path and revision), and the maintainer can simply
> > drop the file in, and 'cvs diff -r$OLDREV $FILE'.
> > If it's git, the maintainer drops the file in, and does 'git diff
> > $OLDSHA1 $FILE'.
> I personally hope that the maintainer drops such a non-patch
> that originates from a PEBKAC. At least I hope the tools that I
> personally use are not maintained by such a maintainer ;-)
I certainly wasn't stating blindly commit the file. Any Gentoo developer
doing that should not have made it through the recruitment process.
Do the diff, separate the wheat from the chaff, and then put the useful
(and reviewed) changes back into the tree.
Glancing at the Gentoo bugs I've dealt with over the last 2 months as a
quick survey, there are a few levels:
A - Able to submit a good diff
B - Able to do a good implementation
C - Able to come up with a good idea for improvement
B are in short supply, and even of those, the number that can do A are
smaller :-(. Category C is vastly bigger than B, and those that don't
make B throw up a lot of chaff of bad implementations.
Being able to extract the good ideas is what's important.
--
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
[-- Attachment #2: Type: application/pgp-signature, Size: 321 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 1:06 ` Robin H. Johnson
@ 2007-04-18 1:15 ` Junio C Hamano
2007-04-18 1:42 ` Junio C Hamano
0 siblings, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2007-04-18 1:15 UTC (permalink / raw)
To: Robin H. Johnson
Cc: Git Mailing List, Linus Torvalds, Rogan Dawes, Andy Parkins
"Robin H. Johnson" <robbat2@gentoo.org> writes:
> Glancing at the Gentoo bugs I've dealt with over the last 2 months as a
> quick survey, there are a few levels:
> A - Able to submit a good diff
> B - Able to do a good implementation
> C - Able to come up with a good idea for improvement
>
> B are in short supply, and even of those, the number that can do A are
> smaller :-(. Category C is vastly bigger than B, and those that don't
> make B throw up a lot of chaff of bad implementations.
>
> Being able to extract the good ideas is what's important.
True, to a certain degree. Maybe you and your fellow Gentoo
people are very much more accomodating, but my fear is that a
maintainer that goes length to sift through chaff himself
quickly runs out of time, becomes exhausted, and ends up being
careless.
Maybe I am spoiled by having only the best people around me, and
on git list.
But we are straying to a tangent.
I do not have much against an optional "only blob id" expansion
myself, as I do not see any more downside than CRLF expansion in
it. But I suspect that once people see the $id$ expanded to
blob, they would not stop, because simply they do not understand
why blob-id and CRLF are much less evil than other things.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 1:15 ` Junio C Hamano
@ 2007-04-18 1:42 ` Junio C Hamano
2007-04-18 2:53 ` Robin H. Johnson
0 siblings, 1 reply; 66+ messages in thread
From: Junio C Hamano @ 2007-04-18 1:42 UTC (permalink / raw)
To: Robin H. Johnson
Cc: Git Mailing List, Linus Torvalds, Rogan Dawes, Andy Parkins
Junio C Hamano <junkio@cox.net> writes:
> I do not have much against an optional "only blob id" expansion
> myself, as I do not see any more downside than CRLF expansion in
> it...
Actually, there is one. Somebody makes a patch against a file
with $id$ expanded. Gives it to somebody else who is git
challenged and does not have git-apply. The patch is useless.
So it is not without more downsides than CRLF.
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 1:42 ` Junio C Hamano
@ 2007-04-18 2:53 ` Robin H. Johnson
0 siblings, 0 replies; 66+ messages in thread
From: Robin H. Johnson @ 2007-04-18 2:53 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
Cc: Robin H. Johnson, Linus Torvalds, Rogan Dawes, Andy Parkins
[-- Attachment #1: Type: text/plain, Size: 2282 bytes --]
On Tue, Apr 17, 2007 at 06:42:38PM -0700, Junio C Hamano wrote:
> Junio C Hamano <junkio@cox.net> writes:
> > I do not have much against an optional "only blob id" expansion
> > myself, as I do not see any more downside than CRLF expansion in
> > it...
> Actually, there is one. Somebody makes a patch against a file
> with $id$ expanded. Gives it to somebody else who is git
> challenged and does not have git-apply. The patch is useless.
So they have diff'd outside of Git, and the recipient is applying outside of
Git:
A - If they are applying the patch on top of the same base revision, it will
apply fine, because the keywords are identical.
B - If they are applying the patch on top of a different revision, the keywords
won't apply, and most probably other content in the patch won't apply either.
Additional with B the longer your individual files, the more likely that the
diff hunk containing the keyword change does not contain any other changes, and
can be easily discarded. More that the changes are likely to be further away
from the keyword ;-).
Discarding portions of patches is already wide-spread (not just for CVS
keywords, the architecture keywords in Gentoo ebuilds change rapidly as well),
and if git-apply can discard the keyword, it only serves to accelerate the
usage of git.
Some quick stats I hacked together on lengths of Gentoo ebuilds.
23161 ebuilds total.
51% of the Gentoo ebuilds are less than 36 lines long.
76% are less than 56 lines long.
90% are less than 92 lines long.
(thereafter the tail gets VERY long).
0.21% are more than 500 lines long.
> So it is not without more downsides than CRLF.
A cleaner version of my earlier command to find the changes between revisions.
diff -Nuar <(git-cat-file blob $SHA1:$FILE) $TMPFILE
where $TMPFILE is a temporary filename for the file from the user.
This saves having to overwrite the local $FILE and restore it afterwards.
It would be nice if git-diff could handle this case directly.
On a tangent, has any work gone into specialized patch mergers for specific
file formats?
--
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
[-- Attachment #2: Type: application/pgp-signature, Size: 321 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 0:02 ` Junio C Hamano
2007-04-18 0:26 ` J. Bruce Fields
2007-04-18 1:06 ` Robin H. Johnson
@ 2007-04-18 4:15 ` Daniel Barkalow
2007-04-18 11:32 ` Johannes Schindelin
3 siblings, 0 replies; 66+ messages in thread
From: Daniel Barkalow @ 2007-04-18 4:15 UTC (permalink / raw)
To: Junio C Hamano
Cc: Robin H. Johnson, Linus Torvalds, Git Mailing List, Rogan Dawes,
Andy Parkins
On Tue, 17 Apr 2007, Junio C Hamano wrote:
> "Robin H. Johnson" <robbat2@gentoo.org> writes:
>
> > As for a usage case:
> > - J.PEBKAC.User gets a a tree (from a tarball or GIT, we should gain the
> > same output)
> > - Copies some file outside of the tree (the user is NOT smart enough,
> > and resists all reasonable attempts at edumacation)
> > - Modifies said file outside of tree.
> > - Contacts maintainer with entire changed file.
> > - User vanishes off the internet.
> >
> > The entire file he sent if it's CVS, contains a $Header$ that uniquely
> > identifies the file (path and revision), and the maintainer can simply
> > drop the file in, and 'cvs diff -r$OLDREV $FILE'.
> > If it's git, the maintainer drops the file in, and does 'git diff
> > $OLDSHA1 $FILE'.
>
> I personally hope that the maintainer drops such a non-patch
> that originates from a PEBKAC. At least I hope the tools that I
> personally use are not maintained by such a maintainer ;-)
As a concrete example, say I'm not a Gentoo developer at all, but I'm
trying to get some package to install in a slightly odd situation. (E.g.,
I want to build a version of gcc for ARM microcontrollers, which requires
flags to be set that aren't normally available for the ARM architecture.)
In order to do this, I need to make some changes to the gcc ebuild to pass
those USE flags through to configure. Since I'm not a Gentoo developer, I
don't have the version-controlled tree, just: (1) the tree that gets
reverted to the official state every time I sync and (2) my tree of local
overlays. (2) also contains other packages I've made modifications to
(adding patches to packages where those patches are only in unreleased
version, but solve my problems, e.g.), so it's clearly the place to put
the gcc change.
Now, if I figure out how to get the ebuild working, I'll be happy just to
have a working compiler for this weird target, and I don't care too much
further. But then if word gets out that I managed this, or if I notice a
bug report that other people are failing to get it to build, I may want to
post my working ebuild. And maybe the maintainer decides that my method
was good, and wants to use it. But then it could be a lot of work to
figure out what differences are from me beating on this ebuild, what are
important, and what are reverts of changes make upstream after I made my
copy (particularly because the same ebuild for gcc also gets a lot more
development making it better as the system native compiler).
If the ebuild has the blob ID that the file had when it left Gentoo
version control and went out into local hack land, it would be relatively
easy to figure out what patch should be applies to get the useful changes.
(In case you're wondering, I actually eventually gave up and installed a
gnu-arm binary distribution, because I was in a hurry to get the project
going, and building from source kept failing to get configured properly;
but if my first line of attack had worked, I would have ended up with the
described hacked ebuild.)
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 0:02 ` Junio C Hamano
` (2 preceding siblings ...)
2007-04-18 4:15 ` Daniel Barkalow
@ 2007-04-18 11:32 ` Johannes Schindelin
3 siblings, 0 replies; 66+ messages in thread
From: Johannes Schindelin @ 2007-04-18 11:32 UTC (permalink / raw)
To: Junio C Hamano
Cc: Robin H. Johnson, Linus Torvalds, Git Mailing List, Rogan Dawes,
Andy Parkins
Hi,
On Tue, 17 Apr 2007, Junio C Hamano wrote:
> "Robin H. Johnson" <robbat2@gentoo.org> writes:
>
> > As for a usage case:
> > - J.PEBKAC.User gets a a tree (from a tarball or GIT, we should gain the
> > same output)
> > - Copies some file outside of the tree (the user is NOT smart enough,
> > and resists all reasonable attempts at edumacation)
> > - Modifies said file outside of tree.
> > - Contacts maintainer with entire changed file.
> > - User vanishes off the internet.
> >
> > The entire file he sent if it's CVS, contains a $Header$ that uniquely
> > identifies the file (path and revision), and the maintainer can simply
> > drop the file in, and 'cvs diff -r$OLDREV $FILE'.
> > If it's git, the maintainer drops the file in, and does 'git diff
> > $OLDSHA1 $FILE'.
>
> I personally hope that the maintainer drops such a non-patch
> that originates from a PEBKAC.
Me, too. Although people really believe strange things. When I asked such
a guy on another list, if he could send me a patch instead of a complete
file, he shouted loudly at me that patches were obsolete. Yes. Really. I
begged to differ, but I guess he still believes that.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 23:56 ` Robin H. Johnson
2007-04-18 0:02 ` Junio C Hamano
@ 2007-04-18 2:50 ` Martin Langhoff
2007-04-18 10:06 ` David Kågedal
2 siblings, 0 replies; 66+ messages in thread
From: Martin Langhoff @ 2007-04-18 2:50 UTC (permalink / raw)
To: Robin H. Johnson
Cc: Linus Torvalds, Git Mailing List, Rogan Dawes, Andy Parkins
On 4/18/07, Robin H. Johnson <robbat2@gentoo.org> wrote:
> As for a usage case:
> - J.PEBKAC.User gets a a tree (from a tarball or GIT, we should gain the
> same output)
> - Copies some file outside of the tree (the user is NOT smart enough,
> and resists all reasonable attempts at edumacation)
> - Modifies said file outside of tree.
> - Contacts maintainer with entire changed file.
> - User vanishes off the internet.
That's a valid case, but as Linus hints, that's a snippet of perl/bash
away to find "closest matches". We could have a
git-findclosestmatch <head> path/to/scan/ randomfile.c
That would quickly return a few candidates together with the commit
they appear in.
cheers,
martin
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 23:56 ` Robin H. Johnson
2007-04-18 0:02 ` Junio C Hamano
2007-04-18 2:50 ` Martin Langhoff
@ 2007-04-18 10:06 ` David Kågedal
2007-04-18 11:08 ` Robin H. Johnson
2 siblings, 1 reply; 66+ messages in thread
From: David Kågedal @ 2007-04-18 10:06 UTC (permalink / raw)
To: git
"Robin H. Johnson" <robbat2@gentoo.org> writes:
> This is perhaps a reasonable wording of my requirement.
> "Files from from the VCS should contain a stable machine-usable
> identifier that is unique for that revision of the file, without
> post-processing to insert the identifier."
But what is the "revision of the file"? The blob ID is just a hash of
the contents, and doesn't say anything about where in the history of
the project it appears. It will usually appear in many "project
revisions", i.e. commits.
If you want to mark the "revision" of a file, the only sensible thing
is to use the commit ID, which will give you all the problems
described in this thread.
--
David Kågedal
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-18 10:06 ` David Kågedal
@ 2007-04-18 11:08 ` Robin H. Johnson
0 siblings, 0 replies; 66+ messages in thread
From: Robin H. Johnson @ 2007-04-18 11:08 UTC (permalink / raw)
To: David K??gedal, Git Mailing List
[-- Attachment #1: Type: text/plain, Size: 1307 bytes --]
On Wed, Apr 18, 2007 at 12:06:48PM +0200, David K??gedal wrote:
> > This is perhaps a reasonable wording of my requirement.
> > "Files from from the VCS should contain a stable machine-usable
> > identifier that is unique for that revision of the file, without
> > post-processing to insert the identifier."
> But what is the "revision of the file"? The blob ID is just a hash of
> the contents, and doesn't say anything about where in the history of
> the project it appears. It will usually appear in many "project
> revisions", i.e. commits.
The location/context in history of the file is not needed by the
requirement I wrote above.
Since the BlobID is the hash of the contents (taken with all keywords
collapsed obviously) - if the contents are identical, then the blobid is
identical.
Since the contents and blobid are the same, it doesn't matter which
commit you take the file from when you don't care about the history of
that point (eg cat-file, diff).
The file goes out, and when a user throws it (modified) back at us, we
just grab the $BlobId$ and use that to identify what it originally
looked like.
--
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
[-- Attachment #2: Type: application/pgp-signature, Size: 321 bytes --]
^ permalink raw reply [flat|nested] 66+ messages in thread
* Re: [PATCH 2/2] Add keyword unexpansion support to convert.c
2007-04-17 9:41 [PATCH 2/2] Add keyword unexpansion support to convert.c Andy Parkins
` (3 preceding siblings ...)
2007-04-17 15:32 ` Linus Torvalds
@ 2007-04-17 21:00 ` Matthieu Moy
4 siblings, 0 replies; 66+ messages in thread
From: Matthieu Moy @ 2007-04-17 21:00 UTC (permalink / raw)
To: git
Andy Parkins <andyparkins@gmail.com> writes:
> This patch adds expansion of keywords support.
I didn't get time to read the whole thread, but just my 2 cents:
Did you have a look at the way Mercurial deals with this?
They have a system where files can be ran through arbitrary filters
when commited:
http://www.selenic.com/mercurial/wiki/index.cgi/EncodeDecodeFilter
and it can be somehow abused to do keyword expansion.
http://www.selenic.com/mercurial/wiki/index.cgi/TipsAndTricks#head-f348f796b9560a1cfbdaf3f3f0f7d9d4339266e9
It surely doesn't solve all the problems mentionned in this thread,
but the experience can be interesting to look at.
--
Matthieu
^ permalink raw reply [flat|nested] 66+ messages in thread
end of thread, other threads:[~2007-04-22 0:05 UTC | newest]
Thread overview: 66+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-17 9:41 [PATCH 2/2] Add keyword unexpansion support to convert.c Andy Parkins
[not found] ` <200704171803.58940.an dyparkins@gmail.com>
2007-04-17 10:09 ` Junio C Hamano
2007-04-17 11:35 ` Andy Parkins
2007-04-17 15:53 ` Linus Torvalds
2007-04-17 17:03 ` Andy Parkins
2007-04-17 18:12 ` Linus Torvalds
2007-04-17 19:12 ` Andy Parkins
[not found] ` <alpine.LFD. 0.98.0704171530220.4504@xanadu.home>
2007-04-17 19:41 ` Nicolas Pitre
2007-04-17 19:45 ` David Lang
[not found] ` <alpin e.LFD.0.98.0704171624190.4504@xanadu.home>
2007-04-17 20:29 ` Nicolas Pitre
2007-04-17 20:05 ` David Lang
2007-04-17 21:16 ` Nicolas Pitre
[not found] ` <7vy7k qlj5r.fsf@assigned-by-dhcp.cox.net>
2007-04-17 20:53 ` David Lang
2007-04-17 21:52 ` Andy Parkins
2007-04-17 22:40 ` Junio C Hamano
2007-04-18 2:39 ` Nicolas Pitre
2007-04-18 5:04 ` Junio C Hamano
2007-04-18 14:56 ` Nicolas Pitre
2007-04-18 11:14 ` Johannes Schindelin
2007-04-18 15:10 ` Nicolas Pitre
2007-04-19 8:19 ` Johannes Schindelin
2007-04-21 0:42 ` David Lang
2007-04-21 1:54 ` Junio C Hamano
2007-04-21 2:06 ` Nicolas Pitre
2007-04-21 23:31 ` David Lang
2007-04-18 6:24 ` Rogan Dawes
2007-04-18 15:02 ` Linus Torvalds
2007-04-18 15:34 ` Nicolas Pitre
2007-04-18 15:38 ` Rogan Dawes
2007-04-18 15:59 ` Nicolas Pitre
2007-04-18 16:09 ` Rogan Dawes
2007-04-18 17:58 ` Alon Ziv
2007-04-17 19:54 ` Linus Torvalds
2007-04-17 20:46 ` Andy Parkins
2007-04-17 20:52 ` [PATCH] Add keyword collapse " Andy Parkins
2007-04-17 21:10 ` [PATCH 2/2] Add keyword unexpansion " Linus Torvalds
2007-04-17 21:13 ` Linus Torvalds
2007-04-18 11:11 ` Johannes Schindelin
2007-04-20 11:32 ` Nikolai Weibull
2007-04-17 21:18 ` Martin Langhoff
2007-04-17 21:24 ` Junio C Hamano
2007-04-20 0:30 ` Jakub Narebski
2007-04-21 0:47 ` David Lang
2007-04-17 15:46 ` Linus Torvalds
2007-04-17 10:41 ` Johannes Sixt
2007-04-17 15:32 ` Linus Torvalds
2007-04-17 17:10 ` Andy Parkins
2007-04-17 17:18 ` Rogan Dawes
2007-04-17 18:23 ` Linus Torvalds
2007-04-17 20:27 ` Rogan Dawes
2007-04-17 23:56 ` Robin H. Johnson
2007-04-18 0:02 ` Junio C Hamano
2007-04-18 0:26 ` J. Bruce Fields
2007-04-18 1:19 ` Linus Torvalds
2007-04-18 1:28 ` Junio C Hamano
2007-04-18 1:33 ` Linus Torvalds
2007-04-18 1:06 ` Robin H. Johnson
2007-04-18 1:15 ` Junio C Hamano
2007-04-18 1:42 ` Junio C Hamano
2007-04-18 2:53 ` Robin H. Johnson
2007-04-18 4:15 ` Daniel Barkalow
2007-04-18 11:32 ` Johannes Schindelin
2007-04-18 2:50 ` Martin Langhoff
2007-04-18 10:06 ` David Kågedal
2007-04-18 11:08 ` Robin H. Johnson
2007-04-17 21:00 ` Matthieu Moy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).