Weird shallow-tree conversion state, and branches of shallow trees

* Weird shallow-tree conversion state, and branches of shallow trees
@ 2007-04-12  0:53 Robin H. Johnson
  2007-04-14  8:56 ` Johannes Schindelin
  0 siblings, 1 reply; 34+ messages in thread
From: Robin H. Johnson @ 2007-04-12  0:53 UTC (permalink / raw)
  To: Git Mailing List

I was doing some random tests with shallow trees, and ran into two
issues - the first is a shallow tree that doesn't extend anymore when it
should, and the second is some branched shallow tree trouble.

1.
(I was using the kernel.org Git repo for my testing here)
> git clone --depth 1 git://GIT-REMOTE-URL
> # do some local commits
> git pull --depth 1000000 # some very large number, to try and add all the history

At this point, I noticed that my tree still seemed to be shallow, and no
matter what I tried, I couldn't un-shallow it.

.git/shallow contained a single line:
> 9c405082d96ed7a7ed830f9861dbad9a32e4d268

After moving the shallow file out of the way, git fsck --full reports:
> broken link from  commit 9c405082d96ed7a7ed830f9861dbad9a32e4d268
>               to  commit bb3e781d7f6259eb414cbecd8bad74cd4a188b41
> broken link from  commit 9c405082d96ed7a7ed830f9861dbad9a32e4d268
>               to  commit 9bfbe261923f4e9d89f65e6755fa6501aa6531b0
> missing commit bb3e781d7f6259eb414cbecd8bad74cd4a188b41
> missing commit 9bfbe261923f4e9d89f65e6755fa6501aa6531b0

Any ideas on why it's not going to full depth?
I don't have a reliable test case for this yet; sometimes it does go deep
properly, sometimes it doesn't.
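
For anyone trying to reproduce this, here is a minimal Python sketch of
checking whether a repository is still marked shallow. It relies only on what
is described above: a .git/shallow file holding one commit ID per line. The
function name and the 40-hex-digit filter are assumptions made for
illustration, not anything Git itself provides.

```python
import os
import re
import tempfile

# Assumption: entries are full 40-character hex commit IDs, as in the
# single line quoted above.
SHA1_RE = re.compile(r"^[0-9a-f]{40}$")

def read_shallow_commits(git_dir):
    """Return the commit IDs listed in <git_dir>/shallow.

    An empty list means there is no shallow file, i.e. the repository
    is not marked as shallow.
    """
    path = os.path.join(git_dir, "shallow")
    if not os.path.exists(path):
        return []
    with open(path) as fh:
        lines = [line.strip() for line in fh]
    # Keep only well-formed commit IDs; blank or odd lines are skipped.
    return [line for line in lines if SHA1_RE.match(line)]
```

Calling read_shallow_commits(".git") before and after the deepening pull
would show whether the boundary commit actually moved or disappeared.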

2.
Again about shallow repos, a development problem I ran into.
> git clone --depth 1 git://GIT-REMOTE-URL
> git checkout -b working-branch
> # do various work, and git-commit the changes
> git checkout master 
> git pull
> # some time goes by, and you want the latest upstream changes
> git checkout working-branch
> git pull . master

The last pull from the local master fails. This seems weird, because if
working-branch development is done on the master instead, the earlier pull
never complains. So in this case, the working-branch should be able to pull
from the local master branch fine.

This bug basically stops people from being able to take a shallow clone of a
repository with a lot of history, and have multiple working branches on it.

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85

* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-12  0:53 Weird shallow-tree conversion state, and branches of shallow trees Robin H. Johnson
@ 2007-04-14  8:56 ` Johannes Schindelin
  2007-04-15  0:03   ` Robin H. Johnson
  0 siblings, 1 reply; 34+ messages in thread
From: Johannes Schindelin @ 2007-04-14  8:56 UTC (permalink / raw)
  To: Robin H. Johnson; +Cc: Git Mailing List

Hi,

On Wed, 11 Apr 2007, Robin H. Johnson wrote:

> I was doing some random tests with shallow trees, and ran into two 
> issues - the first is a shallow tree that doesn't extend anymore when it 
> should, and the second is some branched shallow tree trouble.

Ah! Seems we finally have a user for shallow clones! ;-)

Seriously again: I am at fault for putting the shallow support into Git, 
failing to provide sensible test cases. This was partly due to my 
laziness, and partly due to the overwhelming lack of demand.

I am in the middle of moving (haven't reached my destination yet), so I 
will take a couple more days until I can look into your problems. If you 
find out in the meantime what is happening, please share the information 
with us.

Ciao,
Dscho

* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-14  8:56 ` Johannes Schindelin
@ 2007-04-15  0:03   ` Robin H. Johnson
  2007-04-15  0:02     ` David Lang
  0 siblings, 1 reply; 34+ messages in thread
From: Robin H. Johnson @ 2007-04-15  0:03 UTC (permalink / raw)
  To: Git Mailing List

On Sat, Apr 14, 2007 at 10:56:10AM +0200, Johannes Schindelin wrote:
> Ah! Seems we finally have a user for shallow clones! ;-)
Heh. I'm specifically looking at git, trying to resolve the deficiencies
that were identified by one of our (Gentoo) SoC2006 projects, on
the potential migration of the Gentoo CVS. Git has matured tremendously
since then.

The primary Gentoo CVS module (gentoo-x86), has 234672 files tracked,
and 1309603 CVS revisions. Between 350k and 500k changesets, depending
on how you merge those revisions.

A couple of the things that were identified either in the SoC project, or
since then.
- Shallow history checkouts are important to our low-bandwidth
  ebuild-tree developers (people in places with 33.6k modems, because
  the phone lines don't work well enough for 56k), or other high latency
  setups.
- Shallow tree (subtree) checkouts, for the developers that focus on
  specific portions of large modules and have no interest in the rest of
  that tree. E.g. Releng does their work in gentoo/src/releng.
- ACLs specific to subtree commits. Something similar to the cvs_acls.pl
  that FreeBSD uses would be great. Eg gentoo-x86/sec-policy/ is
  restricted to members of the security team (SELinux policies).
- CVS Keyword-like behavior, to specifically place the path and revision
  of certain files into the file directly, for ease of tracking when the
  file is removed from its original surroundings. I know this one is
  going to draw some flak, but it's a very common practice for a user
  to copy a file out of the CVS tree, make some modifications, and then
  post the entire changed version up, esp. when the size of the changes
  exceeds the size of diff.

> Seriously again: I am at fault for putting the shallow support into Git, 
> failing to provide sensible test cases. This was partly due to my 
> laziness, and partly due to the overwhelming lack of demand.
I still haven't figured out a decent test case for this; I need to dig
harder.

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85

* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-15  0:03   ` Robin H. Johnson
@ 2007-04-15  0:02     ` David Lang
  2007-04-15  2:01       ` Robin H. Johnson
  0 siblings, 1 reply; 34+ messages in thread
From: David Lang @ 2007-04-15  0:02 UTC (permalink / raw)
  To: Robin H. Johnson; +Cc: Git Mailing List

On Sat, 14 Apr 2007, Robin H. Johnson wrote:

> On Sat, Apr 14, 2007 at 10:56:10AM +0200, Johannes Schindelin wrote:
>> Ah! Seems we finally have a user for shallow clones! ;-)
> Heh. I'm specifically looking at git, trying to resolve the deficiencies
> that were identified by one of our (Gentoo) SoC2006 projects, on
> the potential migration of the Gentoo CVS. Git has matured tremendously
> since then.
>
> The primary Gentoo CVS module (gentoo-x86), has 234672 files tracked,
> and 1309603 CVS revisions. Between 350k and 500k changesets, depending
> on how you merge those revisions.
>
> Couple of the things that were identified either in the SoC project, or
> since then.
> - Shallow history checkouts are important to our low-bandwidth
>  ebuild-tree developers (people in places with 33.6k modems, because
>  the phone lines don't work well enough for 56k), or other high latency
>  setups.

note that for people on low-bandwidth lines, making too shallow a checkout can 
actually end up costing more over time (they will have to pull full revisions 
since they don't have the earlier versions to just pull a diff against)

> - Shallow tree (subtree) checkouts, for the developers that focus on
>  specific portions of large modules and have no interest in the rest of
>  that tree. E.g. Releng does their work in gentoo/src/releng.

this could either be shallow tree or subproject, depending on how you end up 
organizing things.

> - ACLs specific to subtree commits. Something similar to the cvs_acls.pl
>  that FreeBSD uses would be great. Eg gentoo-x86/sec-policy/ is
>  restricted to members of the security team (SELinux policies).

since git isn't designed with a single repository, it also doesn't need to worry 
about ACLs (in fact, I don't think it has the concept of permissions at all). 
this is up to the people maintaining the 'master' repository to pull from the 
right people

> - CVS Keyword-like behavior, to specifically place the path and revision
>  of certain files into the file directly, for ease of tracking when the
>  file is removed from its original surroundings. I know this one is
>  going to draw some flak, but it's a very common practice for a user
>  to copy a file out of the CVS tree, make some modifications, and then
>  post the entire changed version up, esp. when the size of the changes
>  exceeds the size of diff.

I'm not understanding why you need this. git tracks the file content, not the 
diffs between files. a developer does their work and git figures out when you do 
a pull if it's better to send the file or a diff (and if you are sending a diff, 
what you are doing the diff against, it may not be the file that had that name 
before)

there's no need to place the path and revision in the file itself.

David Lang

* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-15  0:02     ` David Lang
@ 2007-04-15  2:01       ` Robin H. Johnson
  2007-04-15  4:31         ` Shawn O. Pearce
  0 siblings, 1 reply; 34+ messages in thread
From: Robin H. Johnson @ 2007-04-15  2:01 UTC (permalink / raw)
  To: Git Mailing List

On Sat, Apr 14, 2007 at 05:02:47PM -0700, David Lang wrote:
> > - Shallow history checkouts are important to our low-bandwidth
> >  ebuild-tree developers (people in places with 33.6k modems, because
> >  the phone lines don't work well enough for 56k), or other high latency
> >  setups.
>  note that for people on low-bandwidth lines, making too shallow a checkout 
>  can actually end up costing more over time (they will have to pull full 
>  revisions since they don't have the earlier versions to just pull a diff 
>  against)
Yes, I'm aware that it may be more efficient over the long term for them
to pull given blocks, and I'm going to recommend that developers have a
full history anyway, but I suspect that they will still make heavy use
of shallow trees, esp. as some do throwaway trees often.
(This one is a moot point anyway; the shallow history support in Git is
pretty much done, barring the bugs I posted about previously.)

> > - Shallow tree (subtree) checkouts, for the developers that focus on
> >  specific portions of large modules and have no interest in the rest of
> >  that tree. E.g. Releng does their work in gentoo/src/releng.
>  this could either be shallow tree or subproject, depending on how you end up 
>  organizing things.
shallow tree, because we really do have people that check out arbitrary
sub-divisions (the web translation teams come to mind, they just have
checkouts of English and their own language), and going sub-project
would be insane for that.

> > - ACLs specific to subtree commits. Something similar to the cvs_acls.pl
> >  that FreeBSD uses would be great. Eg gentoo-x86/sec-policy/ is
> >  restricted to members of the security team (SELinux policies).
>  since git isn't designed with a single repository, it also doesn't need to 
>  worry about ACLs (in fact, I don't think it has the concept of permissions 
>  at all). this is up to the people maintaining the 'master' repository to 
>  pull from the right people
I should have mentioned that we aren't following the kernel model here.
All of the developers will have git+ssh access to the central tree, to
push their own changes to it. On a similar tangent, in some subtrees
(our documentation mainly) we have server-side validation tests before
the commit is accepted. The 'update' hook documentation suggests that
ACLs should be possible and implemented via that.

> > - CVS Keyword-like behavior, to specifically place the path and revision
> >  of certain files into the file directly, for ease of tracking when the
> >  file is removed from its original surroundings. I know this one is
> >  going to draw some flak, but it's a very common practice for a user
> >  to copy a file out of the CVS tree, make some modifications, and then
> >  post the entire changed version up, esp. when the size of the changes
> >  exceeds the size of diff.
>  I'm not understanding why you need this. git tracks the file content, not 
>  the diffs between files. a developer does their work and git figures out when 
>  you do a pull if it's better to send the file or a diff (and if you are 
>  sending a diff, what you are doing the diff against, it may not be the file 
>  that had that name before)
The tree that goes out to users is NOT git or CVS. What you point to
here is impossible unless we forced all of the users to migrate to git
(a truly herculean task if there was ever one).
It's a tarball or an rsync of an automatically managed CVS checkout.
(Tarballs go onto the release media, and are also widely used by those
that sneaker-net their trees to machines for security reasons).
Alternatively, the users browse the viewcvs, and pull something from the
Attic. Regardless of where they get the file from, the problem is that
the file doesn't contain any markers to help the developers merge it
back again.

A frequent occurrence of this is where the user takes rev X of a file
(because it was the latest one at the time), makes a local (non
version-controlled) copy, and submits it back to our Bugzilla some months
down the line. Thanks to the $Header$ in the file he submits, we can
produce a diff against the original revision, and figure out how best to
merge it with the latest revision.

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85

* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-15  2:01       ` Robin H. Johnson
@ 2007-04-15  4:31         ` Shawn O. Pearce
  2007-04-15  5:57           ` Nguyen Thai Ngoc Duy
  2007-04-15  9:44           ` Robin H. Johnson
  0 siblings, 2 replies; 34+ messages in thread
From: Shawn O. Pearce @ 2007-04-15  4:31 UTC (permalink / raw)
  To: Robin H. Johnson; +Cc: Git Mailing List

"Robin H. Johnson" <robbat2@gentoo.org> wrote:
> On Sat, Apr 14, 2007 at 05:02:47PM -0700, David Lang wrote:
> > > - Shallow history checkouts are important to our low-bandwidth
> > >  ebuild-tree developers (people in places with 33.6k modems, because
> > >  the phone lines don't work well enough for 56k), or other high latency
> > >  setups.
> >  note that for people on low-bandwidth lines, making too shallow a checkout 
> >  can actually end up costing more over time (they will have to pull full 
> >  revisions since they don't have the earlier versions to just pull a diff 
> >  against)

Mail them a DVD of the Git import, have them load it locally,
and use --reference for all future clones.  With Git it's possible
to build fast throwaway trees from any random URL, so long as you
keep at least one repository available locally to act as a reference.

The speed at which a DVD (or small box of CDs) travels through the
various postal systems might very well be faster than 33.6k modem.
:-)
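
The arithmetic behind that remark can be sketched in a few lines of Python.
This is only a back-of-the-envelope estimate under stated assumptions: the
4.7 GB figure is the nominal capacity of a single-layer DVD (the actual size
of the Git import is not given in this thread), and the modem is assumed to
run at its nominal 33.6 kbit/s with no protocol overhead.

```python
def transfer_days(size_bytes, bits_per_second):
    """Days needed to move size_bytes over a link at its nominal bit rate."""
    seconds = size_bytes * 8 / bits_per_second
    return seconds / 86400  # 86400 seconds per day

# Assumptions for illustration only, not figures from the thread.
DVD_BYTES = 4.7e9    # nominal single-layer DVD capacity
MODEM_BPS = 33600    # 33.6k modem, ignoring all overhead

days = transfer_days(DVD_BYTES, MODEM_BPS)
print("A full DVD over the modem takes about %.0f days" % days)
```

Under these assumptions a full disc works out to just under two weeks of
continuous transfer, so the post only has to beat that.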

> I should have mentioned that we aren't following the kernel model here.
> All of the developers will have git+ssh access to the central tree, to
> push their own changes to it. On a similar tangent, in some subtrees
> (our documentation mainly) we have server-side validation tests before
> the commit is accepted. The 'update' hook documentation suggests that
> ACLs should be possible and implemented via that.

Yes.  I run probably the most paranoid update hook in existence.
If you want a copy let me know, I'll send it to you.  It's a Perl
script that verifies the 'committer ' line matches the UNIX uid (by
doing a table lookup) for every new commit or tag being introduced
to the repository.  It also verifies that the user can update that
branch, create it, delete it, or rewind it.

It sounds like you would need to add some additional rules about
specific paths being modified only by certain people in certain
branches (for the SELinux stuff), and running other validations in
the documentation (whatever that is).
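
A path-based rule of the kind described above could look roughly like the
Python sketch below. It is not the Perl hook mentioned in this message, which
is not shown in the thread; the function name, the ACL table layout and the
user names are all hypothetical. In a real update hook the list of changed
paths would have to be computed from the old and new revisions the hook is
given, for example with git diff-tree -r --name-only, and the hook would exit
non-zero to reject the push.

```python
def denied_paths(user, changed_paths, acl):
    """Return the changed paths that `user` is not allowed to modify.

    acl maps a directory prefix to the set of users allowed to touch it.
    Paths that fall under no listed prefix are open to everyone.
    """
    denied = []
    for path in changed_paths:
        for prefix, allowed in acl.items():
            if path.startswith(prefix) and user not in allowed:
                denied.append(path)
                break  # one violated rule is enough for this path
    return denied

# Hypothetical table: only the security team may touch sec-policy/.
EXAMPLE_ACL = {"sec-policy/": {"alice"}}
```

The hook would then refuse the update whenever denied_paths() returns a
non-empty list for the pushing user.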

> The tree that goes out to users is NOT git or CVS. What you point to
> here is impossible unless we forced all of the users to migrate to git
> (a truly herculean task if there was ever one).
> It's a tarball or an rsync of an automatically managed CVS checkout.
> (Tarballs go onto the release media, and are also widely used by those
> that sneaker-net their trees to machines for security reasons).
> Alternatively, the users browse the viewcvs, and pull something from the
> Attic. Regardless of where they get the file from, the problem is that
> the file doesn't contain any markers to help the developers merge it
> back again.

Git won't do this for you.  We specifically don't mangle source[*1*].

What you could do is create a program that mangles the files before
delivery.  You would probably want to do something like:

  $Id: 7fbf239:path/to/file$

where 7fbf239 is the earliest commit that introduced that particular
version of path/to/file, even if that is months old.  That would
be most like what CVS would do.  8 char abbreviated commits should
be reasonably stable, and not too long to read or copy and paste.
A format like the above would also be easy to grab and copy into
a Git command line.

If we had a Git library that could access the repository, this would
be a pretty easy program to write.  You are basically blaming each path
in the current HEAD commit on the parent, until you cannot blame
anyone else for that path.  You do this blame on the entire tree,
and then output the munged structure (or only the files you want
munged).
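
To show why a marker in that format would be easy to grab, here is a small
Python sketch that pulls the commit and path back out of a submitted file.
The regular expression and the function name are assumptions for
illustration; the only thing taken from the text above is the
$Id: <commit>:<path>$ layout.

```python
import re

# Assumption: the commit is a 7 to 40 digit hex abbreviation and the path
# contains neither '$' nor a newline.
ID_RE = re.compile(r"\$Id: ([0-9a-f]{7,40}):([^$\n]+)\$")

def find_id_marker(text):
    """Return (commit, path) from the first '$Id: <commit>:<path>$' marker.

    Returns None when the text carries no expanded marker.
    """
    match = ID_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

The resulting pair can be pasted straight into a command such as
git show <commit>:<path> to recover the original version and diff the
submission against it.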

It's good we have a GSoC project working on libification!  ;-)

[*1*] Yes, I'm ignoring the nutso crlf support that's now in...  Even
      though I work on Windows, the only true line ending is LF.  ;-)

-- 
Shawn.

* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-15  4:31         ` Shawn O. Pearce
@ 2007-04-15  5:57           ` Nguyen Thai Ngoc Duy
  2007-04-15  8:54             ` Jakub Narebski
  2007-04-15 18:18             ` Linus Torvalds
  2007-04-15  9:44           ` Robin H. Johnson
  1 sibling, 2 replies; 34+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2007-04-15  5:57 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Robin H. Johnson, Git Mailing List

On 4/15/07, Shawn O. Pearce <spearce@spearce.org> wrote:
> > The tree that goes out to users is NOT git or CVS. What you point to
> > here is impossible unless we forced all of the users to migrate to git
> > (a truly herculean task if there was ever one).
> > It's a tarball or an rsync of an automatically managed CVS checkout.
> > (Tarballs go onto the release media, and are also widely used by those
> > that sneaker-net their trees to machines for security reasons).
> > Alternatively, the users browse the viewcvs, and pull something from the
> > Attic. Regardless of where they get the file from, the problem is that
> > the file doesn't contain any markers to help the developers merge it
> > back again.
>
> Git won't do this for you.  We specifically don't mangle source[*1*].
>
> What you could do is create a program that mangles the files before
> delivery.  You would probably want to do something like:
>
>   $Id: 7fbf239:path/to/file$
>
> where 7fbf239 is the earliest commit that introduced that particular
> version of path/to/file, even if that is months old.  That would
> be most like what CVS would do.  8 char abbreviated commits should
> be reasonably stable, and not too long to read or copy and paste.
> A format like the above would also be easy to grab and copy into
> a Git command line.
>
> If we had a Git library that could access the repository, this would
> be a pretty easy program to write.  You are basically blaming each path
> in the current HEAD commit on the parent, until you cannot blame
> anyone else for that path.  You do this blame on the entire tree,
> and then output the munged structure (or only the files you want
> munged).
>
> It's good we have a GSoC project working on libification!  ;-)
>
> [*1*] Yes, I'm ignoring the nutso crlf support that's now in...  Even
>       though I work on Windows, the only true line ending is LF.  ;-)

Can we add an attribute like Subversion's svn:keywords? If the
attribute is set, we expand keywords on checkout and remove
expansion in memory before doing any git operations. It's some kind of
I/O filter for working directory access.

-- 
Duy

* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-15  5:57           ` Nguyen Thai Ngoc Duy
@ 2007-04-15  8:54             ` Jakub Narebski
  2007-04-15 18:18             ` Linus Torvalds
  1 sibling, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2007-04-15  8:54 UTC (permalink / raw)
  To: git

Nguyen Thai Ngoc Duy wrote:

> On 4/15/07, Shawn O. Pearce <spearce@spearce.org> wrote:
>> > The tree that goes out to users is NOT git or CVS. What you point to
>> > here is impossible unless we forced all of the users to migrate to git
>> > (a truly herculean task if there was ever one).
>> > It's a tarball or an rsync of an automatically managed CVS checkout.
>> > (Tarballs go onto the release media, and are also widely used by those
>> > that sneaker-net their trees to machines for security reasons).
>> > Alternatively, the users browse the viewcvs, and pull something from the
>> > Attic. Regardless of where they get the file from, the problem is that
>> > the file doesn't contain any markers to help the developers merge it
>> > back again.
>>
>> Git won't do this for you.  We specifically don't mangle source[*1*].
>>
>> What you could do is create a program that mangles the files before
>> delivery.  You would probably want to do something like:
>>
>>   $Id: 7fbf239:path/to/file$
>>
>> where 7fbf239 is the earliest commit that introduced that particular
>> version of path/to/file, even if that is months old.  That would
>> be most like what CVS would do.  8 char abbreviated commits should
>> be reasonably stable, and not too long to read or copy and paste.
>> A format like the above would also be easy to grab and copy into
>> a Git command line.
>>
>> If we had a Git library that could access the repository, this would
>> be a pretty easy program to write.  You are basically blaming each path
>> in the current HEAD commit on the parent, until you cannot blame
>> anyone else for that path.  You do this blame on the entire tree,
>> and then output the munged structure (or only the files you want
>> munged).
>>
>> It's good we have a GSoC project working on libification!  ;-)
>>
>> [*1*] Yes, I'm ignoring the nutso crlf support that's now in...  Even
>>       though I work on Windows, the only true line ending is LF.  ;-)
> 
> Can we add an attribute like Subversion's svn:keywords? If the
> attribute is set, we expand keywords on checkout and remove
> expansion in memory before doing any git operations. It's some kind of
> I/O filter for working directory access.

There was some talk about keyword expansion, and it is doable IIRC.
Check out threads containing:
  Message-ID: <20070301175200.GA21433@informatik.uni-freiburg.de>
  http://permalink.gmane.org/gmane.comp.version-control.git/41108
(with some inane totally irrelevant subject)  

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-15  5:57           ` Nguyen Thai Ngoc Duy
  2007-04-15  8:54             ` Jakub Narebski
@ 2007-04-15 18:18             ` Linus Torvalds
  2007-04-15 19:51               ` Andy Parkins
  1 sibling, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2007-04-15 18:18 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Shawn O. Pearce, Robin H. Johnson, Git Mailing List

On Sun, 15 Apr 2007, Nguyen Thai Ngoc Duy wrote:
> 
> Can we add an attribute like Subversion's svn:keywords? If the
> attribute is set, we expand keywords on checkout and remove
> expansion in memory before doing any git operations. It's some kind of
> I/O filter for working directory access.

NNOOo-oooo...

Keyword substitution is just *stupid*. It's an inexcusable braindamage. 
Don't do it. It leads to all kinds of idiotic problems downstream, and it 
really doesn't help *anything* except for "but I'm used to it". There are 
absolutely no valid uses for it.

If you want to tag your files somehow, do it in "git archive" when 
exporting it, but not in the working tree. And realize that once you 
export it with the stupid keyword expansion, diffs etc will all be 
corrupted, and will not - AND MUST NOT - apply to the uncorrupted working 
tree.

			Linus

* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-15 18:18             ` Linus Torvalds
@ 2007-04-15 19:51               ` Andy Parkins
  2007-04-15 20:51                 ` Linus Torvalds
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Parkins @ 2007-04-15 19:51 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce,
	Robin H. Johnson

On Sunday 2007, April 15, Linus Torvalds wrote:

> Keyword substitution is just *stupid*. It's an inexcusable
> braindamage. Don't do it. It leads to all kinds of idiotic problems
> downstream, and it really doesn't help *anything* except for "but I'm
> used to it". There are absolutely no valid uses for it.

You're right that it can cause problems, but it is certainly not the 
case that there are no valid uses for it.  I've mentioned it before but 
I'll say it again, because it is the only feature I miss from 
subversion and I can't see why it is invalid.

I keep diagrams for a project in SVG format in the repository, this 
works very well because SVG is so nicely ASCII.  In the title block of 
the diagram I put "$Id$", then in subversion, after checking in and 
updating it got expanded to

 $Id: diagram.svg 148 2002-07-28 21:30:43Z andyp $

Now, I print out that diagram and pin it to my wall - sometimes copies 
of it are given to others.  I do this on a regular basis.  The diagram 
is big and complicated and all versions of it look very similar.  In 
short it is very convenient to have the version of the file actually 
printed on the piece of paper.  This is a piece of paper remember, 
there is no way to hash the diagram, or even look at the underlying 
source.  When someone comes to me with a random version of the diagram, 
I can use that ID to checkout exactly the revision that that diagram 
refers to.

Please explain to me why that is not a valid use.

> If you want to tag your files somehow, do it in "git archive" when
> exporting it, but not in the working tree. And realize that once you
> export it with the stupid keyword expansion, diffs etc will all be
> corrupted, and will not - AND MUST NOT - apply to the uncorrupted
> working tree.

All of the problems you describe apply equally to CRLF conversion, and 
yet there seems to be no problem with implementing that.  In fact the 
problem there is significantly worse, as it changes every line of the 
file.

Now, solving the keyword problem is not simple, obviously, but it's 
certainly not impossible.  On git-add the expanded tags get unexpanded 
so $Tag: blah blah blah$ becomes $Tag$; on checkout they get expanded. 
Similarly while calculating diffs - the diff engine unexpands as it 
goes so the lines with the keywords in them are not seen as different 
regardless of the expanded part.

Applying diffs from some external source doesn't corrupt anything - 
because the diff engine is, by definition, going to unexpand the 
keywords when it compares.

So, someone sends you a diff that has this:

- /* $Id: diagram.svg 148 2002-07-28 21:30:43Z andyp $ */
+ /* $Id: diagram.svg 149 2002-07-29 20:32:47Z andyp $ */

And you apply it to the working tree - well, that line will be seen as 
this by the diff engine:

- /* $Id$ */
+ /* $Id$ */

No change.  Obviously this is entirely optional and would be activated 
on a per-file basis.  For git it would be even more useful because of 
all the information actually available.  I'd love to have git-keywords 
like these:

 $Commit: 2bfe3cec92be4f5e3bfc0e71ed560df4a726c07b$
 $Object: b1bd9e46c2bd64e00b671ff5ed512d9c12b53309$
 $Describe: v1.5.1.1-83-g2bfe3ce$
 $Id: cache.h v1.5.1.1-83-g2bfe3ce $
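
The unexpand-before-comparing step described above can be sketched in Python
as follows. This is only an illustration of the idea, not anything Git
implements; the regular expression (any $Word: ...$ on one line) and the
function names are assumptions.

```python
import re

# Assumption: a keyword is '$Name: anything$' on a single line, where
# 'anything' contains no '$'.
KEYWORD_RE = re.compile(r"\$(\w+):[^$\n]*\$")

def unexpand(line):
    """Collapse '$Keyword: expanded text$' back to '$Keyword$'."""
    return KEYWORD_RE.sub(r"$\1$", line)

def same_ignoring_keywords(old_line, new_line):
    """True when two lines differ only in their expanded keyword text."""
    return unexpand(old_line) == unexpand(new_line)
```

Applied to the two $Id$ lines of the diff shown earlier, both sides collapse
to the same text, so the comparison reports no change.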

Feelings seem very strong about this; I've seen comments again and again 
about how braindamaged it is and I just can't see it - please, help me 
see - what is it that is so utterly broken about it?  I can see that it 
adds a complication to many parts, but I can't see why it is seen as so 
evil.

Andy

-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com

* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-15 19:51               ` Andy Parkins
@ 2007-04-15 20:51                 ` Linus Torvalds
  2007-04-16  0:11                   ` Bill Lear
                                     ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Linus Torvalds @ 2007-04-15 20:51 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson

On Sun, 15 Apr 2007, Andy Parkins wrote:
> 
> You're right that it can cause problems, but it is certainly not the 
> case that there are no valid uses for it.

I'm sorry, but you're just wrong.

There are no valid uses for it in the working tree. Full stop.

There are valid uses to tag sources with some revision information WHEN IT 
LEAVES THE REVISION CONTROLLED ENVIRONMENT, but not one second before 
that.

> I keep diagrams for a project in SVG format in the repository, this 
> works very well because SVG is so nicely ASCII.  In the title block of 
> the diagram I put "$Id$", then in subversion, after checking in and 
> updating it got expanded to
> 
>  $Id: diagram.svg 148 2002-07-28 21:30:43Z andyp $
> 
> Now, I print out that diagram and pin it to my wall - sometimes copies 
> of it are given to others.  I do this on a regular basis.

And is there *any* reason why you don't just do that as an "export" 
option, when it's very clear that people won't send diffs that include it 
and that will cause all the endless problems that keyword expansion 
causes?

Why would you ever have the pain and suffering of using it within the 
source control issue? Especially since you would be a *lot* better off 
using just an export script that can do a lot better than CVS/SVN keyword 
expansion could ever do (ie you can add all sorts of more relevant 
information than just a date and user name!)

> Please explain to me why that is not a valid use.

It's not a valid use because there are many SO MUCH BETTER WAYS to get the 
same thing, that have none of the downsides of keyword expansion.

Your argument is akin to saying that "Why isn't it a valid use to replace 
the steering wheel in my car with a mouth-operated joystick under the 
passenger side seat?"

Sure, you *can* steer a car by mouthing at it while having your head under 
the passenger side seat, and your butt sticking out through the moonroof 
("We could add a periscope so that I can see where I'm going!")

But that's not an argument *for* doing it, when there are ways that are 
obviously much better, and don't _need_ the periscope!

See?

The fact that you *can* do something is not a valid argument for it being 
a valid use. You *can* do stupid things, but if you can get to the same 
end result by not doing stupid things, wouldn't you prefer that instead?

Here's a small makefile snippet for you:

	%.prt: %.svg
		sed 's/\$$Id\$$/\$$ $(shell git log --pretty=format:"%h: %s (%an)" --abbrev-commit -1 $<) \$$/g' < $< > $@

which would need some work (it doesn't quote things right - in reality 
you'd write a simple script to do this properly).
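
A minimal sketch of what such a script could look like, assuming a POSIX
shell and sed. The function name expand_id and the file names are made up
for illustration; the only git call is the same "git log" as in the
makefile snippet, and it appears only in the usage comment.

```shell
#!/bin/sh
# Hypothetical helper: replace every literal $Id$ on stdin with
# "$Id: <text> $", where <text> is the first argument.
expand_id() {
	# Escape '\', '/' and '&' so that a commit subject containing them
	# cannot be misread by sed as part of the substitution.
	escaped=$(printf '%s' "$1" | sed 's|[\/&]|\\&|g')
	sed 's/\$Id\$/$Id: '"$escaped"' $/g'
}

# Assumed usage for one tracked file (needs git and a repository):
#   expand_id "$(git log -1 --pretty=format:'%h: %s (%an)' -- diagram.svg)" \
#       < diagram.svg > diagram.prt
```

The output goes to a separate file, so the tracked original is never
modified and diffs against it stay clean.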

See? No need for a periscope, and your butt can be toasty warm too if you 
just add a seat heater option...

		Linus


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-15 20:51                 ` Linus Torvalds
@ 2007-04-16  0:11                   ` Bill Lear
  2007-04-16  9:10                     ` Andy Parkins
  2007-04-16  2:17                   ` Robin H. Johnson
  2007-04-16  9:03                   ` Andy Parkins
  2 siblings, 1 reply; 34+ messages in thread
From: Bill Lear @ 2007-04-16  0:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Parkins, git, Nguyen Thai Ngoc Duy, Shawn O. Pearce,
	Robin H. Johnson

On Sunday, April 15, 2007 at 13:51:42 (-0700) Linus Torvalds writes:
>On Sun, 15 Apr 2007, Andy Parkins wrote:
>> 
>> You're right that it can cause problems, but it is certainly not the 
>> case that there are no valid uses for it.
>
>I'm sorry, but you're just wrong.
>
>There are no valid uses for it in the working tree. Full stop.
>
>There are valid uses to tag sources with some revision information WHEN IT 
>LEAVES THE REVISION CONTROLLED ENVIRONMENT, but not one second before 
>that. ...

Not that Linus needs any back-up from me, but I second this, very
strongly.  Decorating source code with release information is a proper
function of release management tools, not the SCM system.  We had a
similar argument in our company about this, sparked by a criticism of
git for not having keyword (version number) substitution, and I argued
that having such substitution functions in the SCM was out-of-place
and a crutch for weak release procedures.  It's easy with a proper
make system to put whatever information you want from the SCM into the
release product.

This would probably be as crazy as asking for saving and restoring
timestamps in the working tree on checkout of branches, and we know
how insane that is...

Bill


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16  0:11                   ` Bill Lear
@ 2007-04-16  9:10                     ` Andy Parkins
  2007-04-16 15:17                       ` Julian Phillips
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Parkins @ 2007-04-16  9:10 UTC (permalink / raw)
  To: git
  Cc: Bill Lear, Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce,
	Robin H. Johnson

On Monday 2007 April 16 01:11, Bill Lear wrote:

> Not that Linus needs any back-up from me, but I second this, very
> strongly.  Decorating source code with release information is a proper
> function of release management tools, not the SCM system.  We had a
> similar argument in our company about this, sparked by a criticism of
> git for not having keyword (version number) substitution, and I argued
> that having such substitution functions in the SCM was out-of-place
> and a crutch for weak release procedures.  It's easy with a proper
> make system to put whatever information you want from the SCM into the
> release product.

I'm not disagreeing with any of this - there are certainly cases when 
expansion is completely the wrong tool.  That doesn't mean there are no cases 
where it would be useful.

The case I keep banging on about is that where nothing is made and this is not 
a release.  I don't want to make a release, I just want to print out the 
current version of a file and have something that appears on the printout 
that would allow me to identify what version of the file that printout was 
from.  Are you seriously suggesting I should run release scripts just for 
that?

It's not something you want - fine - not a problem for me that you wouldn't 
use it.  The thing that is bothering me is that everyone keeps waving their 
hands while chanting "keyword expansion evil", while not giving an example of 
what problem it causes.  By this I mean "problem for the end user", 
not "problem in writing the support" - if it's impractical to implement then 
that's fine, say that.

Andy

-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16  9:10                     ` Andy Parkins
@ 2007-04-16 15:17                       ` Julian Phillips
  0 siblings, 0 replies; 34+ messages in thread
From: Julian Phillips @ 2007-04-16 15:17 UTC (permalink / raw)
  To: Andy Parkins
  Cc: git, Bill Lear, Linus Torvalds, Nguyen Thai Ngoc Duy,
	Shawn O. Pearce, Robin H. Johnson

On Mon, 16 Apr 2007, Andy Parkins wrote:

> On Monday 2007 April 16 01:11, Bill Lear wrote:
>
>> Not that Linus needs any back-up from me, but I second this, very
>> strongly.  Decorating source code with release information is a proper
>> function of release management tools, not the SCM system.  We had a
>> similar argument in our company about this, sparked by a criticism of
>> git for not having keyword (version number) substitution, and I argued
>> that having such substitution functions in the SCM was out-of-place
>> and a crutch for weak release procedures.  It's easy with a proper
>> make system to put whatever information you want from the SCM into the
>> release product.
>
> I'm not disagreeing with any of this - there are certainly cases when
> expansion is completely the wrong tool.  That doesn't mean there are no cases
> where it would be useful.
>
> The case I keep banging on about is that where nothing is made and this is not
> a release.  I don't want to make a release, I just want to print out the
> current version of a file and have something that appears on the printout
> that would allow me to identify what version of the file that printout was
> from.  Are you seriously suggesting I should run release scripts just for
> that?
>
> It's not something you want - fine - not a problem for me that you wouldn't
> use it.  The thing that is bothering me is that everyone keeps waving their
> hands while chanting "keyword expansion evil", while not giving an example of
> what problem it causes.  By this I mean "problem for the end user",
> not "problem in writing the support" - if it's impractical to implement then
> that's fine, say that.
>

What I don't understand is why the people who want keyword expansion don't 
simply write a little wrapper script, a keyworded git as it were (you 
could even call it gitk for maximum confusion :P).

In the script you simply:

1) collapse all keywords
2) call appropriate git function
3) expand keywords again

wouldn't that do what people want without having to change the git code at 
all?  You could probably even get it into contrib ..
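
The three steps above could be sketched roughly as follows, assuming a
POSIX shell, sed, and an SVN-style "$Id: ... $" marker. All function names
here (collapse_keywords, expand_keywords, with_keywords) are hypothetical;
nothing like them ships with git.

```shell
#!/bin/sh
# Hypothetical keyword-aware wrapper around git.

# Step 1: turn "$Id: anything $" back into the bare "$Id$" marker.
collapse_keywords() {
	sed 's/\$Id:[^$]*\$/$Id$/g'
}

# Step 3: turn the bare "$Id$" marker into "$Id: <text> $".
# Assumes <text> contains no '/', '&' or '\'; a real script would
# have to escape them.
expand_keywords() {
	sed 's/\$Id\$/$Id: '"$1"' $/g'
}

# Steps 1-3 for a single file, with the real git run in between.
# Usage: with_keywords <file> <git arguments...>
with_keywords() {
	file=$1; shift
	collapse_keywords < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
	git "$@" || return
	id=$(git log -1 --pretty=format:%h -- "$file")
	expand_keywords "$id" < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Note that once step 3 has run, git sees the file as modified until the
keywords are collapsed again, which is why every git command would have to
go through the wrapper.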

(In the case of gentoo, you could even change the ebuild so that the real 
git is installed as raw_git or something, and the wrapper is installed as 
git - though personally I wouldn't want to do that)

-- 
Julian

  ---
You may get an opportunity for advancement today.  Watch it!


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-15 20:51                 ` Linus Torvalds
  2007-04-16  0:11                   ` Bill Lear
@ 2007-04-16  2:17                   ` Robin H. Johnson
  2007-04-16  3:01                     ` Theodore Tso
  2007-04-16 14:59                     ` Linus Torvalds
  2007-04-16  9:03                   ` Andy Parkins
  2 siblings, 2 replies; 34+ messages in thread
From: Robin H. Johnson @ 2007-04-16  2:17 UTC (permalink / raw)
  To: Linus Torvalds, Git Mailing List
  Cc: Andy Parkins, Nguyen Thai Ngoc Duy, Shawn O. Pearce,
	Robin H. Johnson


On Sun, Apr 15, 2007 at 01:51:42PM -0700, Linus Torvalds wrote:
> There are valid uses to tag sources with some revision information WHEN IT 
> LEAVES THE REVISION CONTROLLED ENVIRONMENT, but not one second before 
> that.
Nobody has addressed the single problem that I have with adding it when
it's leaving the environment, and that's still of paramount concern to
me. Simply put, there is a conflict between being able to add revision
information of stuff leaving the environment, and those additions
breaking previous checksums (which may be digitally signed, and thus
breaking the signatures).

I'll reduce it further from my previous example.

1. Developer commits some change to file A.
2. The checksum file is updated because A changed (the checksum file
   explicitly does not contain keywords).
3. Developer signs the checksum file, and commits it.

If during the export process (which is undertaken elsewhere, by a
different person or script), file A now has an expansion applied to it,
you break the checksum file, which you CANNOT redo, because you lose the
developer's digital signature on the checksum file!

Using the existing git-verify-tag mechanisms are not suitable, because
it is the exported information that must be verifiable.

There's FOUR possible solutions here:
1. The commit to file A does the keywords - Which Linus is against.
2. An ADDITIONAL commit to file A, after the initial commit, as a
   scripted addition of the keywords, but before the checksum is
   updated. I think this is messy myself, as you'd have to insert the
   data from the N-1 commit always.
3. Lose the ability to tag the files leaving the environment.
4. Stop digitally signing the checksum file (which then leaves the
   possibility for other attacks).

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85



* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16  2:17                   ` Robin H. Johnson
@ 2007-04-16  3:01                     ` Theodore Tso
  2007-04-16  3:23                       ` Nguyen Thai Ngoc Duy
  2007-04-16  3:32                       ` Robin H. Johnson
  2007-04-16 14:59                     ` Linus Torvalds
  1 sibling, 2 replies; 34+ messages in thread
From: Theodore Tso @ 2007-04-16  3:01 UTC (permalink / raw)
  To: Linus Torvalds, Git Mailing List, Andy Parkins,
	Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson

On Sun, Apr 15, 2007 at 07:17:29PM -0700, Robin H. Johnson wrote:
> Nobody has addressed the single problem that I have with adding it when
> it's leaving the environment, and that's still of paramount concern to
> me. Simply put, there is a conflict between being able to add revision
> information of stuff leaving the environment, and those additions
> breaking previous checksums (which may be digitally signed, and thus
> breaking the signatures).
> 
> I'll reduce it further from my previous example.
> 
> 1. Developer commits some change to file A.
> 2. The checksum file is updated because A changed (the checksum file
>    explicitly does not contain keywords).
> 3. Developer signs the checksum file, and commits it.
> 
> If during the export process (which is undertaken elsewhere, by a
> different person or script), file A now has an expansion applied to it,
> you break the checksum file, which you CANNOT redo, because you lose the
> developer's digital signature on the checksum file!

Simple, the release engineer runs a script which exports the tree,
expanding any keywords and updating the checksum file as necessary,
and then the release engineer signs the checksum file!  As has already
been stated, if this doesn't work, you probably don't have a well
defined and formal release process. 

Just because a developer has signed a checksum doesn't mean that the
tree is suitable for release; that's the job of the release engineer
to confirm, probably after running a set of regression test suites.
And in fact, with git, it's pointless for the developer to sign a
checksum file and then commit it, since git is already maintaining
checksums as an integral part of how revisions are named.  

					- Ted


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16  3:01                     ` Theodore Tso
@ 2007-04-16  3:23                       ` Nguyen Thai Ngoc Duy
  2007-04-16 15:08                         ` Linus Torvalds
  2007-04-16  3:32                       ` Robin H. Johnson
  1 sibling, 1 reply; 34+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2007-04-16  3:23 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Linus Torvalds, Git Mailing List, Andy Parkins, Shawn O. Pearce,
	Robin H. Johnson

On 4/16/07, Theodore Tso <tytso@mit.edu> wrote:
> Simple, the release engineer runs a script which exports the tree,
> expanding any keywords and updating the checksum file as necessary,
> and then the release engineer signs the checksum file!  As has already
> been stated, if this doesn't work, you probably don't have a well
> defined and formal release process.
>
> Just because a developer has signed a checksum doesn't mean that the
> tree is suitable for release; that's the job of the release engineer
> to confirm, probably after running a set of regression test suites.
> And in fact, with git, it's pointless for the developer to sign a
> checksum file and then commit it, since git is already maintaining
> checksums as an integral part of how revisions are named.

Changing Gentoo release process won't make Git the best choice while
other SCM candidates can provide the same functionalities that Gentoo
needs without changing the process.
-- 
Duy


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16  3:23                       ` Nguyen Thai Ngoc Duy
@ 2007-04-16 15:08                         ` Linus Torvalds
  2007-04-16 16:06                           ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2007-04-16 15:08 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Theodore Tso, Git Mailing List, Andy Parkins, Shawn O. Pearce,
	Robin H. Johnson

On Mon, 16 Apr 2007, Nguyen Thai Ngoc Duy wrote:
> 
> Changing Gentoo release process won't make Git the best choice while
> other SCM candidates can provide the same functionalities that Gentoo
> needs without changing the process.

Ahh, the old "argument by blackmail" approach.

You know what? Nobody really cares. Arguing by blackmail ("we'll use 
something else then") just means that you should go somewhere else. If you 
cannot respond intelligently to intelligent arguments, you really *are* 
better off using SVN. 

A billion flies aren't exactly wrong: crap really *is* good. If you're a 
fly or a maggot.

But if you ever actually want to be something *more* than a crap eater, 
come back then.

			Linus


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16 15:08                         ` Linus Torvalds
@ 2007-04-16 16:06                           ` Nguyen Thai Ngoc Duy
  0 siblings, 0 replies; 34+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2007-04-16 16:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Theodore Tso, Git Mailing List, Andy Parkins, Shawn O. Pearce,
	Robin H. Johnson

On 4/16/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> On Mon, 16 Apr 2007, Nguyen Thai Ngoc Duy wrote:
> >
> > Changing Gentoo release process won't make Git the best choice while
> > other SCM candidates can provide the same functionalities that Gentoo
> > needs without changing the process.
>
> Ahh, the old "argument by blackmail" approach.
>
> You know what? Nobody really cares. Arguing by blackmail ("we'll use
> something else then") just means that you should go somewhere else. If you
> cannot respond intelligently to intelligent arguments, you really *are*
> better off using SVN.

All right. I didn't mean to blackmail you or any Git developer. What I
wanted to say is that Gentoo is currently using an old, brain-damaged
SCM called CVS. I would like it to use Git but Git in its current
state cannot fully replace CVS for Gentoo's usage. To do that
Gentoo needs some changes itself but Gentoo repositories are big ones
and it's just hard to change such beasts from the bottom up. So I would
like to see a compromise from Git (which, I think, does not harm other
projects from using Git) to ease the migration.

>
> A billion flies aren't exactly wrong: crap really *is* good. If you're a
> fly or a maggot.
>
> But if you ever actually want to be something *more* than a crap eater,
> come back then.
>

I would want to _slowly_ evolve from a crap eater to something better
because I couldn't become a non-crap eater in a flash :)

>                         Linus
>


-- 
Duy


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16  3:01                     ` Theodore Tso
  2007-04-16  3:23                       ` Nguyen Thai Ngoc Duy
@ 2007-04-16  3:32                       ` Robin H. Johnson
  2007-04-16 17:00                         ` Linus Torvalds
  2007-04-17  4:16                         ` Daniel Barkalow
  1 sibling, 2 replies; 34+ messages in thread
From: Robin H. Johnson @ 2007-04-16  3:32 UTC (permalink / raw)
  To: Theodore Tso, Git Mailing List
  Cc: Linus Torvalds, Andy Parkins, Nguyen Thai Ngoc Duy,
	Shawn O. Pearce, Robin H. Johnson


On Sun, Apr 15, 2007 at 11:01:03PM -0400, Theodore Tso wrote:
> On Sun, Apr 15, 2007 at 07:17:29PM -0700, Robin H. Johnson wrote:
> > Nobody has addressed the single problem that I have with adding it when
> > it's leaving the environment, and that's still of paramount concern to
> > me. Simply put, there is a conflict between being able to add revision
> > information of stuff leaving the environment, and those additions
> > breaking previous checksums (which may be digitally signed, and thus
> > breaking the signatures).
> > 
> > I'll reduce it further from my previous example.
> > 
> > 1. Developer commits some change to file A.
> > 2. The checksum file is updated because A changed (the checksum file
> >    explicitly does not contain keywords).
> > 3. Developer signs the checksum file, and commits it.
> > 
> > If during the export process (which is undertaken elsewhere, by a
> > different person or script), file A now has an expansion applied to it,
> > you break the checksum file, which you CANNOT redo, because you lose the
> > developer's digital signature on the checksum file!
> 
> Simple, the release engineer runs a script which exports the tree,
> expanding any keywords and updating the checksum file as necessary,
> and then the release engineer signs the checksum file!  As has already
> been stated, if this doesn't work, you probably don't have a well
> defined and formal release process. 
The checksum file (named Manifest) we are talking about is for a single
subdirectory, and is signed as proof that it was not modified between
the developer and submission to the tree. 

As I wrote originally, this is the Gentoo distribution tree, it's NOT
delineated by well-defined releases in the conventional sense.

There are presently 11571 Manifest files in the tree. Our tools will
not allow commits to each package of things that radically break the
package (semantic correctness and some automatic validation, but thinkos
can still get through the checks).

The 'release' process for the tree runs automatically every 30 minutes,
and consists of more validation checks, updating a cache directory,
producing a signed master Manifest [1] and publishing everything to the
rsync servers.

> Just because a developer has signed a checksum doesn't mean that the
> tree is suitable for release; that's the job of the release engineer
> to confirm, probably after running a set of regression test suites.
> And in fact, with git, it's pointless for the developer to sign a
> checksum file and then commit it, since git is already maintaining
> checksums as an integral part of how revisions are named.  
The entire point of the checksums is to allow end users to validate
content that has been exported, with only minimal tools.

[1] The master Manifest stage is only in production for the tree
tarballs, and NOT in the rsync production at the moment, but will be
within the next month. It exists solely to allow the detection of
compromised mirrors.

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85



* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16  3:32                       ` Robin H. Johnson
@ 2007-04-16 17:00                         ` Linus Torvalds
  2007-04-17  4:16                         ` Daniel Barkalow
  1 sibling, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2007-04-16 17:00 UTC (permalink / raw)
  To: Robin H. Johnson
  Cc: Theodore Tso, Git Mailing List, Andy Parkins,
	Nguyen Thai Ngoc Duy, Shawn O. Pearce

On Sun, 15 Apr 2007, Robin H. Johnson wrote:
>
> The checksum file (named Manifest) we are talking about is for a single
> subdirectory, and is signed as proof that it was not modified between
> the developer and submission to the tree. 

Well, in git, you can actually just take the tree entry for that 
subdirectory, and it already is cryptographic proof that two 
subdirectories match.

(It's not signed, but if you actually want to sign it, you can do so, 
either inside git - by using a tag object that points to that 
subdirectory - or outside git by just creating a Manifest that contains a 
list of subdirectories and their tree SHA1's, and signing that).

In fact, in git, there's an explicit command to generate that "Manifest of 
directories in the top level", and it's called

	git ls-tree HEAD

and it will give you cryptographic hashes of each file/directory in the 
top level of a repository. So just sign that, ie do

	git ls-tree HEAD > Manifest
	gpg -sa -u "$username" Manifest 

or something like that. And you're done. Add the "-r" flag to get the 
recursive manifest containing *all* files, rather than just the SHA1's of 
the directories themselves.

Of course, you could just sign and tag the HEAD itself, which is what the 
kernel does, since one signature will guarantee everything under it.

> As I wrote originally, this is the Gentoo distribution tree, it's NOT
> delineated by well-defined releases in the conventional sense.

We do that for the daily (or rather, nightly) snapshots for the kernel. 
There's no "Manifest", but look at

	http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/

and you'll see files like

	patch-2.6.21-rc6-git8.bz2       15-Apr-2007 07:01   38K 	 
	patch-2.6.21-rc6-git8.bz2.sign  15-Apr-2007 07:01  248   
	patch-2.6.21-rc6-git8.gz        15-Apr-2007 07:01   42K  
	patch-2.6.21-rc6-git8.gz.sign   15-Apr-2007 07:01  248   
	patch-2.6.21-rc6-git8.id        15-Apr-2007 07:01   41   
	patch-2.6.21-rc6-git8.log       15-Apr-2007 07:01   63K  
	patch-2.6.21-rc6-git8.sign      15-Apr-2007 07:01  248  

where only the patches are signed, but the system *could* have signed the 
ID file too (the 41-byte "patch-2.6.21-rc6-git8.id" contains the 40-byte 
HEX representation of the SHA of the HEAD of the snapshot, and a newline).

That 41-byte ID file really is sufficient to describe the whole thing, 
after all (although you then need to have the git tree in question to 
actually get the list of files, aka the "Manifest", so if you want that 
list, you'd have to do the "git ls-tree" thing).

> There are presently 11571 Manifest files in the tree. Our tools will
> not allow commits to each package of things that radically break the
> package (semantic correctness and some automatic validation, but thinkos
> can still get through the checks).

Sure. And every single Manifest file is pointless *inside* git, since git 
maintains its own cryptographically secure manifest file anyway. But it's 
trivial to generate them for external use, if you want to.

> The 'release' process for the tree runs automatically every 30 minutes,
> and consists of more validation checks, updating a cache directory,
> producing a signed master Manifest [1] and publishing everything to the
> rsync servers.

That sounds like the nightly snapshots the kernel does, except we only do 
them nightly, and we don't actually validate anything at all, we just sign 
things as being from the "master.kernel.org" site (so the signature does 
mean something, but only that *that* site thinks it is valid).

> The entire point of the checksums is to allow end users to validate
> content that has been exported, with only minimal tools.

If you do a single 41-byte thing, you could use git itself to validate the 
whole tree. But if you want to have people able to validate any random 
single file in a tar-file without having git installed, you'd have to:

 - have the "full manifest" (aka "git ls-tree -r HEAD")

 - have a trivial script that generates "git ID's" of files, which looks 
   something like this:

	#!/bin/sh
	# generate a "git ID" for one or more files
	while test -n "$1"
	do
		file="$1"
		# file size in bytes (GNU stat syntax)
		len=$(stat --format "%s" "$file")
		printf ' %s (blob %s): ' "$file" "$len"
		# Generate the "git ID" for a blob: SHA-1 of the
		# header "blob <len>\0" followed by the file contents
		( printf 'blob %s\0' "$len" ; cat "$file" ) | sha1sum
		shift
	done

and now you can check each file in the Manifest even without having git 
installed.
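
Taking that one step further, here is a sketch that checks every file
listed in such a Manifest, again without git. It assumes the Manifest is
the unmodified output of "git ls-tree -r HEAD" (one "<mode> <type>
<id><TAB><path>" entry per line), that sha1sum is available, and that it is
run from the top of the exported tree. The names git_blob_id and
check_manifest are made up for illustration.

```shell
#!/bin/sh
# Compute the git blob ID of one file without git:
# SHA-1 over the header "blob <size>\0" followed by the file contents.
git_blob_id() {
	len=$(wc -c < "$1" | tr -d ' ')
	{ printf 'blob %s\0' "$len"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# Check every blob entry of a manifest against the files under the
# current directory. Returns non-zero if any file is missing or differs.
# Not handled: symlinks (git hashes the link target, not the file it
# points to) and paths that git prints in quoted form.
check_manifest() {
	manifest=$1
	status=0
	tab=$(printf '\t')
	while IFS="$tab" read -r meta path
	do
		set -- $meta	# split "<mode> <type> <id>" on spaces
		[ "$2" = blob ] || continue
		if [ -f "$path" ] && [ "$(git_blob_id "$path")" = "$3" ]
		then
			echo "OK       $path"
		else
			echo "MISMATCH $path"
			status=1
		fi
	done < "$manifest"
	return $status
}
```

An end user would first verify the signature on the Manifest itself with
gpg and then run check_manifest on it from the top of the unpacked tree.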

			Linus


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16  3:32                       ` Robin H. Johnson
  2007-04-16 17:00                         ` Linus Torvalds
@ 2007-04-17  4:16                         ` Daniel Barkalow
  1 sibling, 0 replies; 34+ messages in thread
From: Daniel Barkalow @ 2007-04-17  4:16 UTC (permalink / raw)
  To: Robin H. Johnson
  Cc: Theodore Tso, Git Mailing List, Linus Torvalds, Andy Parkins,
	Nguyen Thai Ngoc Duy, Shawn O. Pearce

On Sun, 15 Apr 2007, Robin H. Johnson wrote:

> The checksum file (named Manifest) we are talking about is for a single
> subdirectory, and is signed as proof that it was not modified between
> the developer and submission to the tree. 

So the process has to be:

1. Developer commits changes to files.
2. Checksum utility finds the checksums of the files with IDs added where 
   the master site updater will add them.
3. Developer signs checksums.
4. Developer commits checksums.
5. Developer pushes changes to master site.
6. Master site checks out files, adds IDs, and updates live tree.
7. End user fetches tree.
8. End user checks checksums, which match, because the master site and the 
   developer checksum scripts agree on what the end user will see.

The only difference is that developers working out of the version control 
have to generate the checksums with a tool that knows how the IDs will be 
added, and check the checksums with this tool as well, because working 
directories don't have IDs in them.

Really, it's approximately the same as having the version control system 
do it, except that it's in the project-specific development tools instead 
of the version control system.

	-Daniel
*This .sig left intentionally blank*


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16  2:17                   ` Robin H. Johnson
  2007-04-16  3:01                     ` Theodore Tso
@ 2007-04-16 14:59                     ` Linus Torvalds
  1 sibling, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2007-04-16 14:59 UTC (permalink / raw)
  To: Robin H. Johnson
  Cc: Git Mailing List, Andy Parkins, Nguyen Thai Ngoc Duy,
	Shawn O. Pearce

On Sun, 15 Apr 2007, Robin H. Johnson wrote:
>
> Nobody has addressed the single problem that I have with adding it when
> it's leaving the environment, and that's still of paramount concern to
> me. Simply put, there is a conflict between being able to add revision
> information of stuff leaving the environment, and those additions
> breaking previous checksums (which may be digitally signed, and thus
> breaking the signatures).

Don't be silly. 

You can just checksum without the ID. Which you have to do with git 
anyway, since any expanded ID *itself* would be part of any ID, which 
means that under git, you *physically*cannot* make an ID string be part of 
the source control environment anyway, unless you did the SHA1 while 
ignoring the $Id$ expansion.

In other words, the problem you talk about exists *regardless*. You 
suggest pushing that problem into the SCM layer, and de-stabilizing the 
SCM and causing EVERYBODY ELSE problems.

And I'm telling you that if you want the idiocy of keyword expansion, you 
can have it, BUT YOU CANNOT HAVE IT IN THE SCM.

Because *every* *single* problem you have with keyword expansion (whether 
it be checksums or anything else) will be MUCH MUCH worse if you do it at 
the SCM level!

Really. 

When you talk about your "single problem", why the HELL do you think that 
problem goes away just because you try to deal with it inside the SCM? 
Trust me, the problem does *not* go away, it gets *bigger*.

You're trying to push it into the SCM, because _you_ don't want to deal 
with the inevitable problems that keywords cause. But face it, the SCM 
wants to deal with them *even*less*, because they are much worse there, 
and more importantly, you'd be trying to push them into a level where most 
users have gotten over the braindamage and no longer want it!

So you're trying to make *everybody* suffer, just because you cannot do it 
right. 

And suffer people do. There's a reason people are so negative about 
keyword expansion: we've _seen_ those problems first-hand. 

So the proper solution is:
 - don't do keyword expansion on the "originals".
 - add release information when you do a release. 
 - if you want to sign releases, do so *after* the release. That's what a 
   release process is all about.
 - if you're so damn lazy that you can't be bothered to do the signing of 
   the release, don't ask others to do stupid things because *you* do 
   something stupid - just make sure that whatever release information you 
   add can be *removed*, so that you can verify an exact match.
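
The last point can be sketched like this, assuming the release information
is embedded as an SVN-style "$Id: ... $" marker and that sha1sum is
available. The marker format and all three function names are assumptions
made for illustration, not anything git provides.

```shell
#!/bin/sh
# Undo the release-time expansion: "$Id: anything $" becomes "$Id$" again.
strip_release_info() {
	sed 's/\$Id:[^$]*\$/$Id$/g'
}

# Checksum of a file as it looked before release information was added.
pristine_sha1() {
	strip_release_info < "$1" | sha1sum | cut -d' ' -f1
}

# Succeeds when two files are identical once the marker is collapsed,
# e.g. an exported file and the original it was generated from.
same_as_original() {
	[ "$(pristine_sha1 "$1")" = "$(pristine_sha1 "$2")" ]
}
```

Because the checksum is always taken over the collapsed form, a checksum
signed before the export still matches the exported file.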

For example, look at how "git archive" does this. It actually adds release 
information to the tar-file. It's hidden as a magic header, but that also 
means that since it's *separate* from the source code, it avoids all the 
problems with keyword expansion, and now you can (for example) diff the 
tar-ball source tree with the git tree, and you will not get spurious AND 
INCORRECT differences! And any checksums would still be valid!

And the same kind of thing can be done even if you absolutely have to 
embed the information on a file-by-file basis. Just make sure that you do 
it in some reversible manner. But preferably you generate a separate file 
(eg my hypothetical Makefile example that actually generates a "prt" file 
from a "svg" file) so that you have the original and can do any diff or 
validation efforts on *that*.
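A minimal sketch of the "reversible" idea, in Python. The stamp format and the helper names (`STAMP_PREFIX`, `add_stamp`, `strip_stamp`) are made up for illustration and are not anything git provides; the point is only that the release information can be removed again, so the checksum of the original still verifies.

```python
import hashlib

STAMP_PREFIX = "# release: "  # hypothetical marker, not a git convention

def add_stamp(text: str, release: str) -> str:
    """Prepend one release line that can be removed again later."""
    return f"{STAMP_PREFIX}{release}\n{text}"

def strip_stamp(text: str) -> str:
    """Undo add_stamp; text without a stamp is returned unchanged."""
    first, sep, rest = text.partition("\n")
    if sep and first.startswith(STAMP_PREFIX):
        return rest
    return text

def digest(text: str) -> str:
    """SHA-1 of the text, standing in for whatever checksum you publish."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

original = "int main(void) { return 0; }\n"
released = add_stamp(original, "v1.0")

# The stamped file has a different checksum, but stripping the stamp
# gives back exactly the bytes that were checked in.
print(digest(released) == digest(original))               # False
print(digest(strip_stamp(released)) == digest(original))  # True
```

This sketch assumes the original never starts with the marker itself; a real scheme would need a marker that cannot collide with file content.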

			Linus

* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-15 20:51                 ` Linus Torvalds
  2007-04-16  0:11                   ` Bill Lear
  2007-04-16  2:17                   ` Robin H. Johnson
@ 2007-04-16  9:03                   ` Andy Parkins
  2007-04-16 15:54                     ` Sven Verdoolaege
                                       ` (2 more replies)
  2 siblings, 3 replies; 34+ messages in thread
From: Andy Parkins @ 2007-04-16  9:03 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce,
	Robin H. Johnson

On Sunday 2007 April 15 21:51, Linus Torvalds wrote:

> > Now, I print out that diagram and pin it to my wall - sometimes copies
> > of it are given to others.  I do this on a regular basis.
>
> And is there *any* reason why you don't just do that as an "export"
> option, when it's very clear that people won't send diffs that include it

Of course there is a reason - the file I edit is the SVG itself, in Inkscape; 
while editing that file I press "print" to get a printout.  Why on earth 
would I want to jump through hoops by closing the file I'm editing, running 
some export script to a temporary file that I don't want, then open up 
Inkscape again, check the export looks okay and then print - on what planet 
is /that/ simpler?  Worse, there is more chance that I'll lose changes once 
there are two copies of the same file floating around.  Which one am I 
editing and which one am I printing?  Have I run the script yet?  When I 
accidentally make changes to the wrong one, I've now got to merge those 
changes by hand back to the file they should have been in in the first place.

> It's not a valid use because there are many SO MUCH BETTER WAYS to get the
> same thing, that have none of the downsides of keyword expansion?

I'm sorry, but we have different definitions of SO MUCH BETTER; it is _more_ 
trouble for me the user to have to run scripts just to print the file that is 
already on my screen, than not.

> Your argument is akin to saying that "Why isn't it a valid use to replace
> the steering wheel in my car with a mouth-operated joystick under the
> passenger side seat?"

I'd actually say that that is your argument - you want me to add steps to a 
process to get the same result.  I just want the steering wheel, you want the 
steering wheel plus script that I run first to install the steering wheel and 
correctly adapt it for the current car.  In my version the process is "I 
press print"; the fact that this is hard for the version control system is 
irrelevant - the whole point of tools like git is to do work for me, not the 
other way around.

> The fact that you *can* do something is not a valid argument for it being
> a valid use. You *can* do stupid things, but if you can get to the same
> end result by not doing stupid things, wouldn't you prefer that instead?

It's not an accurate analogy at all.  Your conclusion is your supposition - 
it's stupid because it's stupid.  I don't understand what the huge problems 
are - all you've done is say again that it's a problem to have keyword 
expansion.  Why?  What problem does it actually cause?

I'm not just being argumentative - I still have not understood what terrible 
evil it is that keyword expansion causes but crlf conversion does not.

Andy

-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com

* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16  9:03                   ` Andy Parkins
@ 2007-04-16 15:54                     ` Sven Verdoolaege
  2007-04-16 15:58                     ` Linus Torvalds
  2007-04-16 19:41                     ` Junio C Hamano
  2 siblings, 0 replies; 34+ messages in thread
From: Sven Verdoolaege @ 2007-04-16 15:54 UTC (permalink / raw)
  To: Andy Parkins
  Cc: git, Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce,
	Robin H. Johnson

On Mon, Apr 16, 2007 at 10:03:05AM +0100, Andy Parkins wrote:
> there are two copies of the same file floating around.  Which one am I 
> editing and which one am I printing?

Turn off write permissions on the generated file.

> Have I run the script yet?  When I 

Use a post-commit hook.

> I'm not just being argumentative - I still have not understood what terrible 
> evil it is that keyword expansion causes but crlf conversion does not.

For one thing, this keyword expansion thing requires the SCM to modify
the file during commit.  (Hey, my editor says something changed the file.
Do I have the file opened in another session?  Oh, it's the stupid
keyword expansion!)  AFAIU, crlf conversion will not change the working
tree copy of your file on commit.

skimo

* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16  9:03                   ` Andy Parkins
  2007-04-16 15:54                     ` Sven Verdoolaege
@ 2007-04-16 15:58                     ` Linus Torvalds
  2007-04-16 23:25                       ` Weird shallow-tree conversion state, and branches of shallowtrees David Lang
  2007-04-17  9:45                       ` Weird shallow-tree conversion state, and branches of shallow trees Andy Parkins
  2007-04-16 19:41                     ` Junio C Hamano
  2 siblings, 2 replies; 34+ messages in thread
From: Linus Torvalds @ 2007-04-16 15:58 UTC (permalink / raw)
  To: Andy Parkins; +Cc: git, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson

On Mon, 16 Apr 2007, Andy Parkins wrote:
> 
> It's not an accurate analogy at all.  Your conclusion is your supposition - 
> it's stupid because it's stupid.  I don't understand what the huge problems 
> are - all you've done is say again that it's a problem to have keyword 
> expansion.  Why?  What problem does it actually cause?

The easiest way to explain it is that keyword expansion is like crlf, just 
a million times worse (but if you were to do it in git, you'd literally 
do it in the same path that does crlf expansion).

Like crlf:
 - it requires you to be careful about binary vs non-binary, and corrupts 
   binary files subtly.
 - it never appears to be a problem as long as you stay inside the "same 
   system", because everybody just agrees.

But why did I actually implement auto-CRLF, if I'm so against it? Because 
keyword expansion has a lot of problems that CRLF does *not* have:

 - pretty much every single tool out there actually handles CRLF 
   automatically. When you send emails from a CRLF system to a non-CRLF 
   system, the CRLF will just be removed. Why? Because tools *outside* the 
   SCM already know about "text vs binary", and while you can certainly 
   screw it up (use a CRLF system to generate a kernel patch and send it 
   as a binary attachment, and it won't apply for me, for example), you 
   actually have to work at it a bit.

 - A transformation like LF<->CRLF is "stateless". Anybody can translate a 
   file between CRLF and LF without having to know anything at all, so 
   even *if* somebody sends me a patch with CRLF (and it actually happens: 
   the amount of whitespace damage that people can do with email is just 
   surprisingly high, and people occasionally use Windows machines to send 
   me kernel patches, probably because they send email from some other 
   machine than the one they did development on), it can simply be 
   converted back.

 - Related to the statelessness: CRLF is a "global" operation, and doesn't 
   depend on file history or placement. Keyword expansion explicitly does
   *not* work that way, since the whole *point* of keywords is to make it 
   depend on its place in history!
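The stateless/stateful distinction can be sketched in a few lines of Python. This is an illustration only: `expand_id` is a hypothetical stand-in for CVS-style `$Id$` expansion, and none of these helpers are git's actual conversion code.

```python
def to_lf(data: bytes) -> bytes:
    """CRLF -> LF: needs nothing but the bytes themselves."""
    return data.replace(b"\r\n", b"\n")

def to_crlf(data: bytes) -> bytes:
    """LF -> CRLF: normalise first so existing CRLF is not doubled."""
    return to_lf(data).replace(b"\n", b"\r\n")

def expand_id(data: bytes, last_commit: str) -> bytes:
    """Keyword expansion: the result depends on history, passed in here
    as an extra argument that the CRLF functions never needed."""
    return data.replace(b"$Id$", b"$Id: " + last_commit.encode() + b" $")

text = b"first line\n$Id$\nlast line\n"

# Anybody can round-trip line endings without knowing where the file came from.
print(to_lf(to_crlf(text)) == text)  # True

# The same stored bytes expand differently depending on their place in history.
print(expand_id(text, "aaaa") == expand_id(text, "bbbb"))  # False
```

The extra `last_commit` parameter is the whole problem in miniature: whoever performs the transformation has to know something that is not in the file.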

An example of real-world problems with that lack of statelessness of 
keywords is something as simple as "git rebase". Think about what it does: 
it moves a commit around in history. But then think about *how* it does 
that.

[ Ok, take a break here, and think about why "keyword expansion" might be 
  a problem for "git rebase" in a way that CRLF is not, before you read on ]

Hint: the reason statefulness is broken for things like "git rebase" is 
that the natural operation for something like that is to generate a patch, 
and carry it forward. Now, what is in the patch? Keywords. Will the patch 
apply to the target? Yes? No?

See? Keywords means that you suddenly have merge problems with something 
as simple as patches. Does this matter in CVS? Not often. CVS is so 
limited that you cannot much do those operations anyway, but if you've 
ever done a merge in CVS, keyword expansion tends to be one of the things 
that just make it more complicated. So now you have to remember flags 
like "-kk" that disable keywords.
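A rough Python sketch of the patch problem, using the standard `difflib` module. The `expand` helper and the revision strings are hypothetical, and `removals_present` is a deliberately crude stand-in for "does this patch apply", not how git or patch(1) actually decide.

```python
import difflib

def expand(lines, rev):
    """Stand-in for CVS-style expansion: $Id$ becomes $Id: <rev> $."""
    return [line.replace("$Id$", f"$Id: {rev} $") for line in lines]

def changed_lines(a, b):
    """The +/- body lines of a unified diff, without the file headers."""
    diff = difflib.unified_diff(a, b, "a/file.c", "b/file.c")
    return [l for l in diff
            if l[0] in "+-" and not l.startswith(("+++", "---"))]

def removals_present(patch_lines, target):
    """Crude applicability check: every removed line must exist in the target."""
    return all(l[1:] in target for l in patch_lines if l.startswith("-"))

base = ["/* $Id$ */\n", "int x = 1;\n"]
fixed = ["/* $Id$ */\n", "int x = 2;\n"]

# Patch made from unexpanded content vs. one made from expanded checkouts.
clean = changed_lines(base, fixed)
noisy = changed_lines(expand(base, "1.4"), expand(fixed, "1.5"))

# The target has moved on: one more line, and a different revision.
target = ["/* $Id$ */\n", "int x = 1;\n", "int y = 0;\n"]

print(removals_present(clean, target))                 # True
print(removals_present(noisy, expand(target, "1.7")))  # False
```

The one-line change carries forward when keywords are left alone; with expansion, the patch also tries to remove an `$Id: 1.4 $` line that the target never had.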

(Not a lot of people actually do merges in CVS - branches are hard to use 
to begin with, so the only people who do branches tend to be pretty 
hardcore CVS people, and once you've learnt enough to do a branch, keyword 
expansion is the least of your problems. But it's *one* reason - however 
small compared to the other reasons - that doing things like merging in 
CVS is just more painful than it should be)

Or what about generating a diff between two branches? Keywords are a total 
*nightmare*. Do you realize just how *fast* git is in diff generation? 
Have you ever done "cvs diff"? Have you ever *thought* about how git can 
be so fast? Hint: we don't even *look* at the contents for most files. But 
if the content is "generated" depending on history, you just screwed that 
up too.
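Why the diff can skip most files is easy to sketch in Python. The blob naming below (SHA-1 over a `blob <size>` header, a NUL byte, and the content) is how git names blobs; the flat path-to-id dictionaries and `changed_paths` are a simplification for illustration, since real git compares nested tree objects and can skip whole subtrees whose ids match.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Object name of a blob: SHA-1 of 'blob <size>', a NUL byte, then content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def changed_paths(tree_a: dict, tree_b: dict) -> list:
    """Paths whose blob ids differ. File contents are never read here."""
    paths = sorted(set(tree_a) | set(tree_b))
    return [p for p in paths if tree_a.get(p) != tree_b.get(p)]

tree_a = {"Makefile": blob_id(b"all:\n"), "xyzzy": blob_id(b"hello\n")}
tree_b = {"Makefile": blob_id(b"all: prog\n"), "xyzzy": blob_id(b"hello\n")}

# Only Makefile needs an actual content diff; xyzzy is skipped on id alone.
print(changed_paths(tree_a, tree_b))
```

If the content of "xyzzy" were generated from history at checkout time, equal ids would no longer mean equal files, and this shortcut would stop being valid.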

Or what about something as seemingly unrelated as "git grep". You may not 
even *realize* how nasty a problem it is when you have two different 
representations of the same data: one that has keywords in it and is 
checked out, and one that does not. Which one should you choose? Which one 
is the right one? What about the git optimization of using the checked-out 
data because it doesn't need any unpacking?

Again, none of these things are problems with CRLF: CRLF is an issue that 
is pretty much *defined* to not matter for text-files. If you do a "grep", 
it doesn't matter if lines end in LF or CRLF. If you do a diff, line 
ending differences (a) shouldn't exist in the first place because they are 
stateless and (b) even if they were to exist, they shouldn't change the 
diff, because LF and CRLF are the same in text.

And the whole keyword issue gets *worse* when you move between 
repositories. If you stay "inside" the SCM, you can generally teach it to 
ignore them. For example, going back to the "git rebase" example (or the 
"git grep" one, for that matter), you can just define that it's done 
without keyword expansion.

But when you move the data between people? That's exactly where keyword 
expansion is enabled, and now you not only make things like "git diff" 
fundamentally broken and much much slower (in fact, it *cannot*work* in 
the git model, because we don't even *have* tree history, so you cannot 
add keywords to a tree!), you also guarantee that the end result is much 
less useful, because now when you send the patch to others, they'll have 
all the same issues that you had to work around locally.

I don't know if I can convince you, but take it from me, keyword expansion 
is fundamentally broken in the first place, but it's *more* so with git 
than with CVS, for example.

In CVS, the reason you can do keyword expansion in the first place is:

 - it's file-based to begin with. A file actually *has* history in CVS, in 
   a way it fundamentally does *not* have in git. So when you generate a 
   diff on a file, the revision information is "just there". That's simply 
   not true in git. There *is* no per-file revision information. You 
   cannot know who touched the file last, for example, without starting 
   from a commit, and doing very expensive things.

 - it's slow to begin with. This is related to the above thing: exactly 
   because CVS is file-based and not content-based, when you do things 
   like "cvs diff" you will walk files individually anyway. People 
   *accept* (and I cannot imagine why) that an empty "cvs diff" on some 
   big project will take minutes. And the problems aren't even about 
   keyword expansion - keyword expansion is just a small detail.

 - it's centralized in more ways than one. You are simply not expected to 
   work by applying patches between two unrelated CVS trees. It's not 
   done. It cannot work. The closest you get is 
	(a) merging. Which is *hell*. Again, keyword expansion is just a
	    small detail in why it's hell, and people don't generally pick 
	    it up exactly because the merge problems are so much bigger.
	(b) applying patches from the outside from people who do *not* use 
	    CVS, and thus don't generally touch things around the 
	    keywords (but even here, you actually end up having problems
	    occasionally).

 - CVS really fundamentally has so many other problems that keyword
   expansion just isn't on people's radar. Yeah, it can corrupt data, but 
   you're more likely to corrupt data with binary files other ways, so 
   it's just not an issue.

So basically, other (more fundamental) design mistakes in CVS make 
keywords seem like a better idea there, but all the keyword problems are 
just magnified ten-fold by the fact that git doesn't make those _other_ 
mistakes that CVS does.

And don't get me wrong: I think RCS was a great step forward, and CVS was 
too. A few decades ago. But in git, we sometimes have to teach people to 
*not* make the mistakes they did with CVS. Keyword expansion is a small 
detail, and happily few enough people used it in CVS that it's so far not 
been a huge problem to teach people not to do it.

We had to teach people that there's a difference between doing a local 
repository commit, and pushing that commit to a shared central point. 
That's a much more fundamental difference, and it's a lot harder to get 
your brain to accept that kind of change. In contrast, keywords look 
"trivial", but they really aren't. It's a fundamentally broken notion, 
even if it *sounds* like a small detail.

I'll finish off trying to explain the problem in fundamental git terms: 
say you have a repository with two branches, A and B, and different 
history  on a file "xyzzy" in those two branches, but because they both 
ended up applying the same patches, the actual file contents do end up 
being 100% identical. So they have the same SHA1.

What is

	git diff A..B -- xyzzy

supposed to print?

And *I* claim that if you don't get an immediate and empty diff, your 
system is TOTALLY BROKEN.

And now think about what keywords do. And realize that keywords are 
TOTALLY BROKEN!

			Linus

* Re: Weird shallow-tree conversion state, and branches of  shallowtrees
  2007-04-16 15:58                     ` Linus Torvalds
@ 2007-04-16 23:25                       ` David Lang
  2007-04-17 19:50                         ` David Lang
  2007-04-17  9:45                       ` Weird shallow-tree conversion state, and branches of shallow trees Andy Parkins
  1 sibling, 1 reply; 34+ messages in thread
From: David Lang @ 2007-04-16 23:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Parkins, git, Nguyen Thai Ngoc Duy, Shawn O. Pearce,
	Robin H. Johnson

I have a different situation where I'm interested in keyword expansions, and am 
waiting for the appropriate hooks to be added to git to allow me to use it.

I have a bunch of config files on different servers that are logically equivalent; 
even though they have different values in some fields, there is a translation 
table in my software that tells it what to do.

I'd really like to have a version control repository that I can 
share/move/replicate across the machines. to do this on checkin the software 
would need to run my helper to create a 'generic' version and check that in. on 
checkout it would need to run my helper to take the generic version and make the 
host specific version.
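a rough sketch of what such a helper pair might look like, in Python. the table contents, the `@...@` placeholder syntax and the function names are all invented for illustration; this is not an existing git hook interface.

```python
# Hypothetical translation table: host -> {placeholder: host-specific value}
TABLE = {
    "web1": {"@LISTEN_ADDR@": "10.0.0.11", "@LOG_DIR@": "/var/log/web1"},
    "web2": {"@LISTEN_ADDR@": "10.0.0.12", "@LOG_DIR@": "/srv/logs"},
}

def to_generic(text: str, host: str) -> str:
    """Check-in direction: replace host-specific values with placeholders."""
    # Longest values first, so a short value cannot clobber part of a longer one.
    pairs = sorted(TABLE[host].items(), key=lambda kv: -len(kv[1]))
    for placeholder, value in pairs:
        text = text.replace(value, placeholder)
    return text

def to_host(text: str, host: str) -> str:
    """Check-out direction: fill the placeholders in for one host."""
    for placeholder, value in TABLE[host].items():
        text = text.replace(placeholder, value)
    return text

web1_conf = "listen 10.0.0.11\nlogdir /var/log/web1\n"
generic = to_generic(web1_conf, "web1")

print(generic)
print(to_host(generic, "web2"))
```

the round trip only holds while the host-specific values are unambiguous inside the file; if a value also appears somewhere it should not be translated, the helper will silently change it.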

a lot of the problems that you refer to in your message apply to most of the 
things that have been discussed related to gitattributes.

if improperly used it can corrupt the data (either by the checkin/checkout 
munging or inappropriately merging things)

it breaks the 1-1 correspondence between the packed version and the checked out 
version.

On Mon, 16 Apr 2007, Linus Torvalds wrote:

> On Mon, 16 Apr 2007, Andy Parkins wrote:
>
> [ Ok, take a break here, and think about why "keyword expansion" might be
>  a problem for "git rebase" in a way that CRLF is not, before you read on ]
>
> Hint: the reason statefulness is broken for things like "git rebase" is
> that the natural operation for something like that is to generate a patch,
> and carry it forward. Now, what is in the patch? Keywords. Will the patch
> apply to the target? Yes? No?

if you send a patch, that patch needs to be relative to the canonical version, 
namely what's checked into the SCM. if your patch includes keywords it won't 
apply cleanly to a checked-out copy of the file. any merging and merge resolution 
needs to be based on the canonical version (i.e. one that doesn't go through 
the checkin/out conversion)

> See? Keywords means that you suddenly have merge problems with something
> as simple as patches. Does this matter in CVS? Not often. CVS is so
> limited that you cannot much do those operations anyway, but if you've
> ever done a merge in CVS, keyword expansion tends to be one of the things
> that just make it more complicated. So now you have to remember flags
> like "-kk" that disable keywords.

I don't think the problems with patches are insurmountable. if everyone in the 
project is using git then you don't have to worry about anything, things will 
just work (except for manually fixing failed merges)

I would definitely agree that sprinkling a little of this into a large project 
is going to massively confuse people

> Or what about generating a diff between two branches? Keywords are a total
> *nightmare*. Do you realize just how *fast* git is in diff generation?
> Have you ever done "cvs diff"? Have you ever *thought* about how git can
> be so fast? Hint: we don't even *look* at the contents for most files. But
> if the content is "generated" depending on history, you just screwed that
> up too.

you do a diff of the canonical files in the repository, the same way you do 
today.

> Or what about something as seemingly unrelated as "git grep". You may not
> even *realize* how nasty a problem it is when you have two different
> representations of the same data: one that has keywords in it and is
> checked out, and one that does not. Which one should you choose? Which one
> is the right one? What about the git optimization of using the checked-out
> data because it doesn't need any unpacking?

the one with the keywords is the one to choose. and you suffer a performance hit 
because you can't use the checked-out version (without running it through the 
conversion, which is a performance hit itself)

> And the whole keyword issue gets *worse* when you move between
> repositories. If you stay "inside" the SCM, you can generally teach it to
> ignore them. For example, going back to the "git rebase" example (or the
> "git grep" one, for that matter), you can just define that it's done
> without keyword expansion.

right, this would avoid most of the problems

> But when you move the data between people? That's exactly where keyword
> expansion is enabled, and now you not only make things like "git diff"
> fundamentally broken and much much slower (in fact, it *cannot*work* in
> the git model, because we don't even *have* tree history, so you cannot
> add keywords to a tree!), you also guarantee that the end result is much
> less useful, because now when you send the patch to others, they'll have
> all the same issues that you had to work around locally.

why would you do keyword expansion when moving the files between different 
people's repositories? or is that still considered 'inside the SCM'?

> I don't know if I can convince you, but take it from me, keyword expansion
> is fundamentally broken in the first place, but it's *more* so with git
> than with CVS, for example.
>
> In CVS, the reason you can do keyword expansion in the first place is:
>
> - it's file-based to begin with. A file actually *has* history in CVS, in
>   a way it fundamentally does *not* have in git. So when you generate a
>   diff on a file, the revision information is "just there". That's simply
>   not true in git. There *is* no per-file revision information. You
>   cannot know who touched the file last, for example, without starting
>   from a commit, and doing very expensive things.

this is a valid argument against the keyword being a version string. it's not 
necessarily relevant to other uses.

> - it's slow to begin with. This is related to the above thing: exactly
>   because CVS is file-based and not content-based, when you do things
>   like "cvs diff" you will walk files individually anyway. People
>   *accept* (and I cannot imagine why) that an empty "cvs diff" on some
>   big project will take minutes. And the problems aren't even about
>   keyword expansion - keyword expansion is just a small detail.

if you define the keyword to be equivalent there is no need to look at the 
content of all the files.

> - it's centralized in more ways than one. You are simply not expected to
>   work by applying patches between two unrelated CVS trees. It's not
>   done. It cannot work. The closest you get is
> 	(a) merging. Which is *hell*. Again, keyword expansion is just a
> 	    small detail in why it's hell, and people don't generally pick
> 	    it up exactly because the merge problems are so much bigger.
> 	(b) applying patches from the outside from people who do *not* use
> 	    CVS, and thus don't generally touch things around the
> 	    keywords (but even here, you actually end up having problems
> 	    occasionally).

external patches could be a problem, but there are two ways to deal with them.

1. have the patch be against the version of the file with the keywords expanded, 
and have the result checked in (collapsing the keywords)

2. have the patch be against the version of the file with the keywords 
collapsed. this _would_ require the ability to bypass the expansion of the 
keywords and is not something you would want to do very much.

of these two, I suspect that #1 would make sense in most cases, and should be 
the default.

> I'll finish off trying to explain the problem in fundamental git terms:
> say you have a repository with two branches, A and B, and different
> history  on a file "xyzzy" in those two branches, but because they both
> ended up applying the same patches, the actual file contents do end up
> being 100% identical. So they have the same SHA1.
>
> What is
>
> 	git diff A..B -- xyzzy
>
> supposed to print?
>
> And *I* claim that if you don't get an immediate and empty diff, your
> system is TOTALLY BROKEN.

I agree, and what I've been talking about above would produce exactly this.

> And now think about what keywords do. And realize that keywords are
> TOTALLY BROKEN!

it may be that we are thinking of different things when we use the term 
'keywords', and that may be why we are seeing different levels of problems.

David Lang

* Re: Weird shallow-tree conversion state, and branches of  shallowtrees
  2007-04-16 23:25                       ` Weird shallow-tree conversion state, and branches of shallowtrees David Lang
@ 2007-04-17 19:50                         ` David Lang
  0 siblings, 0 replies; 34+ messages in thread
From: David Lang @ 2007-04-17 19:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Parkins, git, Nguyen Thai Ngoc Duy, Shawn O. Pearce,
	Robin H. Johnson

sorry for the re-send, I didn't see this go through the list and it's relevant 
to the current discussion

On Mon, 16 Apr 2007, David Lang wrote:

> Date: Mon, 16 Apr 2007 16:25:37 -0700 (PDT)
> 
> I have a different situation where I'm interested in keyword expansions, and 
> am waiting for the appropriate hooks to be added to git to allow be to use 
> it.
>
> I have a bunch of config files on different servers that are logicly 
> equivalent, even though they have different values in some fields there is a 
> translation table in my software that tells it what to do.
>
> I'd really like to have a version control repository that I can 
> share/more/replicate across the machines. to do this on checkin the software 
> would need to run my helper to create a 'generic' version and check that in. 
> on checkout it would need to run my helper to take the generic version and 
> make the host specific version.
>
> a lot of the problems taht you refer to in your message apply to most of the 
> things that have been discussed related to gitattributes.
>
> if improperly used it can corrupt the data (either by the checkin/checkout 
> munging or inappropriately merging things)
>
> it breaks the 1-1 coorespondance between the packed version and the checked 
> out version.
>
> On Mon, 16 Apr 2007, Linus Torvalds wrote:
>
>> On Mon, 16 Apr 2007, Andy Parkins wrote:
>> 
>> [ Ok, take a break here, and think about why "keyword expansion" might be
>>  a problem for "git rebase" in a way that CRLF is not, before you read on ]
>> 
>> Hint: the reason statefulness is broken for things like "git rebase" is
>> that the natural operation for something like that is to generate a patch,
>> and carry it forward. Now, what is in the patch? Keywords. Will the patch
>> apply to the target? Yes? No?
>
> if you send a patch, that patch needs to be relative to the connonical 
> version, namely what's checked into the SCM. if your patch includes keywords 
> it won't apply cleanly to a checked-out of the file. any mergeing and merge 
> resolution needs to be based on the connonical version (i.e. one that doesn't 
> go through the checkin/out conversion)
>
>> See? Keywords means that you suddenly have merge problems with something
>> as simple as patches. Does this matter in CVS? Not often. CVS is so
>> limited that you cannot much do those operations anyway, but if you've
>> ever done a merge in CVS, keyword expansion tends to be one of the things
>> that just make it more complicated. So now you have to remember flags like
>> like "-kk" that disable keywords.
>
> I don't think the problems with patches are insurmountable. if everyone in 
> the project is useing git then you don't have to worry about anything, things 
> will just work (except for manually fixing failed merges)
>
> I would definantly agree that sprinkling a little of this into a large 
> project is going to massivly confuse people
>
>> Or what about generating a diff between two branches? Keywords are a total
>> *nightmare*. Do you realize just how *fast* git is in diff generation.
>> Have you ever done "cvs diff"? Have you ever *thought* about how git can
>> be so fast? Hint: we don't even *look* at the contents for most files. But
>> if the content is "generated" depending on history, you just screwed that
>> up too.
>
> you do a diff of the connonical files in the repository, the same way you do 
> today.
>
>> Or what about something as seemingly unrelated as "git grep". You may not
>> even *realize* how nasty a problem it is when you have two different
>> representations of the same data: one that has keywords in it and is
>> checked out, and one that does not. Which one should you choose? Which one
>> is the right one? What about the git optimization of using the checked-out
>> data because it doesn't need any unpacking?
>
> the one with the keywords is the one to choose. and you suffer a performance 
> hit becouse you can't use the checked-out version (without running it through 
> the conversion, which is a performance hit itslef)
>
>> And the whole keyword issue gets *worse* when you move between
>> repositories. If you stay "inside" the SCM, you can generally teach it to
>> ignore them. For example, going back to the "git rebase" example (or the
>> "git grep" one, for that matter), you can just define that it's done
>> without keyword expansion.
>
> right, this would avoid most of the problems
>
>> But when you move the data between people? That's exactly where keyword
>> expansion is enabled, and now you not only make things like "git diff"
>> fundamentally broken and much much slower (in fact, it *cannot*work* in
>> the git model, because we don't even *have* tree history, so you cannot
>> add keywords to a tree!), you also guarantee that the end result is much
>> less useful, because now when you send the patch to others, they'll have
>> all the same issues that you had to work around locally.
>
> why would you do keyword expansion when moving the files between different 
> people's repositories? or is that still considered 'inside the SCM'?
>
>> I don't know if I can convince you, but take it from me, keyword expansion
>> is fundamentally broken in the first place, but it's *more* so with git
>> than with CVS, for example.
>> 
>> In CVS, the reason you can do keyword expansion in the first place is:
>> 
>> - it's file-based to begin with. A file actually *has* history in CVS, in
>>   a way it fundmanentally does *not* have in git. So when you generate a
>>   diff on a file, the revision information is "just there". That's simply
>>   not true in git. There *is* no per-file revision information. You
>>   cannot know who touched the file last, for example, without starting
>>   from a commit, and doing very expensive things.
>
> this is a valid argument against the keyword being a version string. it's not 
> nessasarily relavent to other uses.
>
>> - it's slow to begin with. This is related to the above thing: exactly
>>   because CVS is file-based and not content-based, when you do things
>>   like "cvs diff" you will walk files individually anyway. People
>>   *accept* (and I cannot imagine why) that an empty "cvs diff" on some
>>   big project will take minutes. And the problems aren't even about
>>   keyword expansion - keyword expansion is just a small detail.
>
> if you define the keyword to be equivalent there is no need to look at the 
> content of all the files.
>
>> - it's centralized in more ways than one. You are simply not expected to
>>   work by applying patches between two unrelated CVS trees. It's not
>>   done. It cannot work. The closest you get is
>> 	(a) merging. Which is *hell*. Again, keyword expansion is just a
>> 	    small detail in why it's hell, and people don't generally pick
>> 	    it up exactly because the merge problems are so much bigger.
>> 	(b) applying patches from the outside from people who do *not* use
>> 	    CVS, and thus don't generally touch things around the
>> 	    keywords (but even here, you actually end up having problems
>> 	    occasionally).
>
> external patches could be a problem, but there are two ways to deal with 
> them.
>
> 1. have the patch be against the version of the file with the keywords 
> expanded, and have the result checked in (collapsing the keywords)
>
> 2. have the patch be against the version of the file with the keywords 
> collapsed. this _would_ require the ability to bypass the expansion of the 
> keywords and is not something you would want to do very much.
>
> of these two, I suspect that #1 would make sense in most cases, and should be 
> the default.
>
>> I'll finish off trying to explain the problem in fundamental git terms:
>> say you have a repository with two branches, A and B, and different
>> history  on a file "xyzzy" in those two branches, but because they both
>> ended up applying the same patches, the actual file contents do end up
>> being 100% identical. So they have the same SHA1.
>> 
>> What is
>>
>> 	git diff A..B -- xyzzy
>> 
>> supposed to print?
>> 
>> And *I* claim that if you don't get an immediate and empty diff, your
>> system is TOTALLY BROKEN.
>
> I agree, and what I've been talking about above would produce exactly this.
>
>> And now think about what keywords do. And realize that keywords are
>> TOTALLY BROKEN!
>
> it may be that we are thinking of different things when we use the term 
> 'keywords', and that may be why we are seeing different levels of problems.
>
> David Lang
>
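
The two points quoted above (check in the collapsed form of the keyword, and
identical content must give an immediate, empty diff) fit together, and a small
sketch may make that concrete. This is only an illustration, not how git or any
proposed keyword patch implements it: `collapse_id` and `git_blob_id` are
made-up helper names, `sha1sum` and `mktemp` are assumed to be available, and
the blob id is computed by hand as SHA-1 over "blob <size>\0" followed by the
content, which is the format git uses for blob objects.

```shell
#!/bin/sh
# collapse_id: turn any expanded "$Id: ... $" on stdin back into "$Id$",
# so the stored content does not depend on what the keyword expanded to.
collapse_id() {
    sed -e 's/\$Id:[^$]*\$/$Id$/g'
}

# git_blob_id: print the id git would give a blob holding stdin,
# i.e. SHA-1 of "blob <size>\0<content>". Assumes sha1sum and mktemp exist.
git_blob_id() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    size=$(wc -c < "$tmp" | tr -d ' ')
    { printf 'blob %s\0' "$size"; cat "$tmp"; } | sha1sum | cut -d' ' -f1
    rm -f "$tmp"
}

# Two working copies of "xyzzy" whose keywords were expanded differently
# on branches A and B (the expansions themselves are invented here).
id_a=$(printf 'rev: $Id: A 1.4 $\nbody\n' | collapse_id | git_blob_id)
id_b=$(printf 'rev: $Id: B 2.1 $\nbody\n' | collapse_id | git_blob_id)

# Same collapsed content means same blob id, so a diff between the two
# can be declared empty by comparing ids, without reading either file.
if [ "$id_a" = "$id_b" ]; then
    echo "same blob: empty diff"
else
    echo "different blobs"
fi
```

If the expanded text were stored instead, the two ids would differ even though
the same patches were applied on both branches, which is the breakage being
argued about.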


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16 15:58                     ` Linus Torvalds
  2007-04-16 23:25                       ` Weird shallow-tree conversion state, and branches of shallowtrees David Lang
@ 2007-04-17  9:45                       ` Andy Parkins
  1 sibling, 0 replies; 34+ messages in thread
From: Andy Parkins @ 2007-04-17  9:45 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce,
	Robin H. Johnson

On Monday 2007 April 16 16:58, Linus Torvalds wrote:

Thank you for the detailed response.  My apologies for the delay in replying; 
I did write and send a response, but it has gone missing somewhere in 
Google's SMTP server.  I'll try to resend it when I return home.


Andy
-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16  9:03                   ` Andy Parkins
  2007-04-16 15:54                     ` Sven Verdoolaege
  2007-04-16 15:58                     ` Linus Torvalds
@ 2007-04-16 19:41                     ` Junio C Hamano
  2007-04-16 20:55                       ` Andy Parkins
  2 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2007-04-16 19:41 UTC (permalink / raw)
  To: Andy Parkins
  Cc: git, Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce,
	Robin H. Johnson

Andy Parkins <andyparkins@gmail.com> writes:

> On Sunday 2007 April 15 21:51, Linus Torvalds wrote:
>
>> > Now, I print out that diagram and pin it to my wall - sometimes copies
>> > of it are given to others.  I do this on a regular basis.
>>
>> And is there *any* reason why you don't just do that as an "export"
>> option, when it's very clear that people won't send diffs that include it
>
> Of course there is a reason - the file I edit is the SVG itself, in inkscape 
> while editing that file I press "print" to get a print out.  Why on earth 
> would I want to jump through hoops by closing the file I'm editing, running 
> some export script to a temporary file that I don't want, then open up 
> Inkscape again, check the export looks okay and then print - on what planet 
> is /that/ simpler?

I have one question.

In your workflow, when do you "print"?

If you did this:

	$ cvs update draw.svg
        $ inkscape draw.svg
        ... do more editing
        ... press "PRINT"
	$ cvs diff draw.svg

the final "cvs diff" would say you have such and such changes to
the drawing file you just printed since the checked-in version.
However, doesn't "$Id: ... $" embedded in the printed copy say
it is from the last checked-in version?

Is inkscape aware of the "$Id: ... $" keyword and modifies such
string by munging it to "$Id: ..., modified $", once you make a
local modification to the document?  Otherwise you cannot tell
if the printed copy is pristine and matches what the $Id$ keyword
claims it is.

Or maybe in your workflow, such a local modification may not
actually matter because you made a habit of not making a drastic
edit before printing.

Or perhaps you never print a locally modified copy.

Does Inkscape have a batch mode operation?  It might be an
option to have something like this in the Makefile if it does (I
do not know if it does, and if so what the syntax is, so this is
totally made up):

        print:: draw.svg
                described=$$(git describe HEAD) && \
                git cat-file -p HEAD:draw.svg | \
                sed -e 's/\$$Id\$$/$$Id: '"$$described"' $$/g' | \
                inkscape --print --stdin
	.PHONY: print
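
The sed step in the rule above can also be tried on its own, without Inkscape
or a repository. The following is a minimal sketch under stated assumptions:
`expand_id` is a made-up name, and the version string is simply passed in as an
argument (in the Makefile it would be the output of "git describe HEAD").

```shell
#!/bin/sh
# expand_id VERSION: replace every collapsed "$Id$" on stdin with
# "$Id: VERSION $". The caller supplies VERSION; git is not invoked here.
expand_id() {
    # Escape the characters that are special in a sed replacement
    # (backslash, "&", and the "/" delimiter), e.g. for tags like "topic/x".
    version=$(printf '%s' "$1" | sed -e 's|[\\/&]|\\&|g')
    # \$ matches a literal dollar sign; an unescaped trailing $ would be
    # treated by sed as an end-of-line anchor and never match "$Id$".
    sed -e 's/\$Id\$/$Id: '"$version"' $/g'
}

# Example: expand the keyword in one line of an SVG-like file.
printf '<desc>$Id$</desc>\n' | expand_id 'v1.5.1-34-g7fbf239'
```

The escaping of the dollar signs is the part that is easy to get wrong once the
command also has to pass through make, where each "$" meant for the shell must
be written "$$".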


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16 19:41                     ` Junio C Hamano
@ 2007-04-16 20:55                       ` Andy Parkins
  2007-04-17 21:24                         ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Parkins @ 2007-04-16 20:55 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Linus Torvalds, Nguyen Thai Ngoc Duy,
	Shawn O. Pearce, Robin H. Johnson

On Monday 2007, April 16, Junio C Hamano wrote:

> In your workflow, when do you "print"?

After a save and commit.  Otherwise - as you point out, the id is wrong.

> the final "cvs diff" would say you have such and such changes to
> the drawing file you just printed since the checked-in version.
> However, doesn't "$Id: ... $" embedded in the printed copy say
> it is from the last checked-in version?

Yep.  You will get no argument from me that keywords are by no means 
definitive.

> Is inkscape aware of the "$Id: ... $" keyword and modifies such
> string by munging it to "$Id: ..., modified $", once you make a

Nope.  Inkscape knows nothing about the expansion.  However, even if I 
wasn't careful to only print out checked in files, it would still 
narrow down the possible versions to one of two.

> local modification to the document?  Otherwise you cannot tell
> if the printed copy is pristine and matches what the $Id$ keyword
> claims it is.

Correct.  Every user of keywords is aware that the keyword doesn't 
update all the time - in fact there's nothing to stop you changing the 
keyword yourself to an utter lie.  I think the assumption is that you 
aren't fighting your own tools though.

> Or maybe in your workflow, such a local modification may not
> actually matter because you made a habit of not making a drastic
> edit before printing.

Yep.

> Or perhaps you never print a locally modified copy.

Yep.  In fact, for me, most of the time I'm printing a diagram that was 
checked in a number of revisions ago.  It's not the case that I 
modify-print.  However, that's just me.

> Does Inkscape have a batch mode operation?  It might be an
> option to have something like this in the Makefile if it does (I
> do not know if it does, and if so what the syntax is, so this is
> totally made up):

I think it does as it happens; and your little script is just the sort 
of thing I will use when I get around to fixing this hole.

However, it's missing the point to take my example as an unsolved 
problem - there are plenty of ways I can get what I want; I brought it 
up merely as a counter to the statement that there were no valid 
situations for wanting keyword expansion.

Andy

-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16 20:55                       ` Andy Parkins
@ 2007-04-17 21:24                         ` Junio C Hamano
  2007-04-17 21:51                           ` Andy Parkins
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2007-04-17 21:24 UTC (permalink / raw)
  To: Andy Parkins
  Cc: git, Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce,
	Robin H. Johnson

Andy Parkins <andyparkins@gmail.com> writes:

> However, it's missing the point to take my example as an unsolved 
> problem - there are plenty of ways I can get what I want; I brought it 
> up merely as a counter to the statement that there were no valid 
> situations for wanting keyword expansion.

That's actually quite different from what you said.

Andy Parkins <andyparkins@gmail.com> writes:

> Of course there is a reason - the file I edit is the SVG
> itself, in inkscape while editing that file I press "print" to
> get a print out.  Why on earth would I want to jump through
> hoops by closing the file I'm editing, running some export
> script to a temporary file that I don't want, then open up
> Inkscape again, check the export looks okay and then print -
> on what planet is /that/ simpler?

You were claiming that with built-in keyword expansion what you
want becomes /simpler/.  I questioned that.

Maybe it's just me, who is not a GUI person [*1*], but to me,
having to start inkscape, mouse around to find the "Print"
button and print feels much more cumbersome than simply typing
"make print".

[Footnote]

*1* Not in the sense I do not program GUIy applications, but in
the sense that I do not usually _use_ GUI applications.


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-17 21:24                         ` Junio C Hamano
@ 2007-04-17 21:51                           ` Andy Parkins
  0 siblings, 0 replies; 34+ messages in thread
From: Andy Parkins @ 2007-04-17 21:51 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Linus Torvalds, Nguyen Thai Ngoc Duy,
	Shawn O. Pearce, Robin H. Johnson

On Tuesday 2007, April 17, Junio C Hamano wrote:
> Andy Parkins <andyparkins@gmail.com> writes:
> > However, it's missing the point to take my example as an unsolved
> > problem - there are plenty of ways I can get what I want; I brought
> > it up merely as a counter to the statement that there were no valid
> > situations for wanting keyword expansion.
>
> That's actually quite different from what you said.

Sorry; I didn't express it very well - the thing that started all this 
was the statement that there was no valid use case for keywords.  I 
just gave an example.  I felt that the thread was moving away from 
keywords and towards solving my particular problem - which is all 
appreciated, but wasn't the point.  Running makefile recipes or extra 
scripts are all valid methods and pragmatic 
working-with-what-git-does-now solutions.  I wanted to distinguish 
between what I could do now and what I could do with keyword support.

> You were claiming that with built-in keyword expansion what you
> want becomes /simpler/.  I questioned that.

Well it does from the point of view of pressing "print".

> Maybe it's just me, who is not a GUI person [*1*], but to me,
> having to start inkscape, mouse around to find the "Print"
> button and print feels much more cumbersome than simply typing
> "make print".

Again, that was addressing my particular problem - good stuff.  However, 
it's just luck that inkscape has a batch mode - there's no guarantee 
for that.

I could just swap the example around a bit: what if it were an 
OpenOffice document for which I want transparent 
compression/decompression, and whose properties tag I've set to 
contain "$Id$"?  No amount of scripting will enable batch 
printing of that.

Anyway - I've wasted enough of your time with this foolishness now.  
It's dropped, consider me silenced on this subject ;-)

Andy

-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com


* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-15  4:31         ` Shawn O. Pearce
  2007-04-15  5:57           ` Nguyen Thai Ngoc Duy
@ 2007-04-15  9:44           ` Robin H. Johnson
  1 sibling, 0 replies; 34+ messages in thread
From: Robin H. Johnson @ 2007-04-15  9:44 UTC (permalink / raw)
  To: Git Mailing List


On Sun, Apr 15, 2007 at 12:31:46AM -0400, Shawn O. Pearce wrote:
> Mail them a DVD of the Git import, have them load it locally,
> and use --reference for all future clones.  With Git it's possible
> to build fast throwaway trees from any random URL, so long as you
> keep at least one repository available locally to act as a reference.
Ok, that makes it even more worthwhile for them to keep one tree
locally, I didn't think of that :-).

> > the commit is accepted. The 'update' hook documentation suggests that
> > ACLs should be possible and implemented via that.
> Yes.  I run probably the most paranoid update hook in existence.
> If you want a copy let me know, I'll send it to you.  It's a Perl
> script that verifies the 'committer ' line matches the UNIX uid (by
> doing a table lookup) for every new commit or tag being introduced
> to the repository.  It also verifies that the user can update that
> branch, create it, delete it, or rewind it.
> 
> It sounds like you would need to add some additional rules about
> specific paths being modified only by certain people in certain
> branches (for the SELinux stuff), and running other validations in
> the documentation (whatever that is).
Yes please, it would be greatly appreciated. I'll hack path ACLs into
it, and feed it back to contrib/? (CVS and SVN ship ACL stuff in their
contrib/, so we could probably follow suit safely).
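
As a rough idea of the branch-level part of such a check (not the Perl hook
offered above, which is not shown in this thread), here is a minimal sketch.
Everything in it is an assumption for illustration: the `ACL_TABLE` format of
"user ref-pattern" lines, the user names, and the `may_update` helper. It does
not verify the committer line and does not implement path ACLs.

```shell
#!/bin/sh
# Hypothetical ACL table: one "user ref-pattern" pair per line.
# Patterns use shell glob syntax as understood by `case`.
ACL_TABLE='alice refs/heads/*
bob refs/heads/docs
bob refs/heads/selinux/*'

# may_update USER REF: succeed if some ACL line for USER matches REF.
may_update() {
    user=$1
    ref=$2
    while read -r acl_user pattern; do
        [ "$acl_user" = "$user" ] || continue
        # $pattern is left unquoted on purpose so `case` treats it as a glob.
        case $ref in
            $pattern) return 0 ;;
        esac
    done <<EOF
$ACL_TABLE
EOF
    return 1
}

# Example decisions for the invented table above.
for pair in 'alice refs/heads/master' 'bob refs/heads/master'; do
    set -- $pair
    if may_update "$1" "$2"; then
        echo "$1 may update $2"
    else
        echo "$1 may NOT update $2"
    fi
done
```

The update hook is run with the ref name, the old object name and the new
object name as its three arguments, so inside a real hook a call along the
lines of `may_update "$(id -un)" "$1" || exit 1` would reject the push; mapping
the UNIX user to a committer identity would need the table lookup described
above.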

> What you could do is create a program that mangles the files before
> delivery.  You would probably want to do something like:
> 
>   $Id: 7fbf239:path/to/file$
There's one core problem with mangling the files after the fact:
it's going to break checksum/gpg verification later.
Here's the existing CVS process as a comparison.
1. Developer creates/changes foo-1.2.ebuild. (cvs add, but not cvs ci).
2. Runs the local verify+commit tool (repoman).
(the remaining steps are done by repoman)
3. Generates the initial Manifest (contains SHA256/MD5/RIPEMD160 etc.).
4. Commits the initial Manifest AND the files from the developer.
5. Regenerates the Manifest, because any keywords in the files have now been expanded.
6. Manifest is clear-signed with gpg.
7. Signed Manifest is committed.

We can't require the re-processing of the files before they can be
verified, as that removes the ability for users to easily verify them
with standard tools (md5sum,sha256sum).

The direct conversion of such a process to insert the $Id$ and then
re-commit that $Id$ runs into chicken-and-egg problems as well, so
either git needs to insert the keyword, or the file can't be changed.
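
To illustrate the verification problem (a sketch only: the file content is
invented, `digest` is a made-up helper, and `sha256sum` is assumed to be
installed), compare the checksum recorded at commit time with the checksum of
the same file after a delivery tool has expanded the keyword:

```shell
#!/bin/sh
# digest: print the SHA-256 of stdin as a bare hex string.
digest() {
    sha256sum | cut -d' ' -f1
}

# The file as committed, with the keyword still collapsed (invented content).
committed='# $Id$
DESCRIPTION="example"'

# The file as a user would receive it if a tool expanded the keyword
# after the Manifest had already been generated and signed.
delivered=$(printf '%s\n' "$committed" |
    sed -e 's/\$Id\$/$Id: 7fbf239:path\/to\/file $/')

manifest_sum=$(printf '%s\n' "$committed" | digest)
delivered_sum=$(printf '%s\n' "$delivered" | digest)

# A user running sha256sum on the delivered file compares against the
# signed Manifest entry; any post-signing rewrite makes the two differ.
if [ "$manifest_sum" = "$delivered_sum" ]; then
    echo "Manifest verifies"
else
    echo "Manifest mismatch: file changed after signing"
fi
```

The mismatch goes away only if users first undo the expansion, which is
exactly the re-processing step ruled out above.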

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail     : robbat2@gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85



end of thread (last message 2007-04-17 21:51 UTC)

Thread overview: 34+ messages
2007-04-12  0:53 Weird shallow-tree conversion state, and branches of shallow trees Robin H. Johnson
2007-04-14  8:56 ` Johannes Schindelin
2007-04-15  0:03   ` Robin H. Johnson
2007-04-15  0:02     ` David Lang
2007-04-15  2:01       ` Robin H. Johnson
2007-04-15  4:31         ` Shawn O. Pearce
2007-04-15  5:57           ` Nguyen Thai Ngoc Duy
2007-04-15  8:54             ` Jakub Narebski
2007-04-15 18:18             ` Linus Torvalds
2007-04-15 19:51               ` Andy Parkins
2007-04-15 20:51                 ` Linus Torvalds
2007-04-16  0:11                   ` Bill Lear
2007-04-16  9:10                     ` Andy Parkins
2007-04-16 15:17                       ` Julian Phillips
2007-04-16  2:17                   ` Robin H. Johnson
2007-04-16  3:01                     ` Theodore Tso
2007-04-16  3:23                       ` Nguyen Thai Ngoc Duy
2007-04-16 15:08                         ` Linus Torvalds
2007-04-16 16:06                           ` Nguyen Thai Ngoc Duy
2007-04-16  3:32                       ` Robin H. Johnson
2007-04-16 17:00                         ` Linus Torvalds
2007-04-17  4:16                         ` Daniel Barkalow
2007-04-16 14:59                     ` Linus Torvalds
2007-04-16  9:03                   ` Andy Parkins
2007-04-16 15:54                     ` Sven Verdoolaege
2007-04-16 15:58                     ` Linus Torvalds
2007-04-16 23:25                       ` Weird shallow-tree conversion state, and branches of shallowtrees David Lang
2007-04-17 19:50                         ` David Lang
2007-04-17  9:45                       ` Weird shallow-tree conversion state, and branches of shallow trees Andy Parkins
2007-04-16 19:41                     ` Junio C Hamano
2007-04-16 20:55                       ` Andy Parkins
2007-04-17 21:24                         ` Junio C Hamano
2007-04-17 21:51                           ` Andy Parkins
2007-04-15  9:44           ` Robin H. Johnson
