* Weird shallow-tree conversion state, and branches of shallow trees
From: Robin H. Johnson @ 2007-04-12 0:53 UTC
To: Git Mailing List

I was doing some random tests with shallow trees, and ran into two
issues - the first is a shallow tree that doesn't extend anymore when it
should, and the second is some branched shallow tree trouble.

1.
(I was using the kernel.org Git repo for my testing here)
> git clone --depth 1 git://GIT-REMOTE-URL
> # do some local commits
> git pull --depth 1000000 # some very large number, to try and add all the history

At this point, I noticed that my tree still seemed to be shallow, and no
matter what I tried, I couldn't un-shallow it.
.git/shallow contained a single line:
> 9c405082d96ed7a7ed830f9861dbad9a32e4d268

And moving the shallow file out of the way, fsck --full gets me:
> broken link from commit 9c405082d96ed7a7ed830f9861dbad9a32e4d268
>               to commit bb3e781d7f6259eb414cbecd8bad74cd4a188b41
> broken link from commit 9c405082d96ed7a7ed830f9861dbad9a32e4d268
>               to commit 9bfbe261923f4e9d89f65e6755fa6501aa6531b0
> missing commit bb3e781d7f6259eb414cbecd8bad74cd4a188b41
> missing commit 9bfbe261923f4e9d89f65e6755fa6501aa6531b0

Any ideas on why it's not going to full depth? I don't have a reliable
test case for this yet, sometimes it does go deep properly, sometimes it
doesn't.

2.
Again about shallow repos, a development problem I ran into.
> git clone --depth 1 git://GIT-REMOTE-URL
> git checkout -b working-branch
> # do various work, and git-commit the changes
> git checkout master
> git pull
> # some time goes by, and you want the latest upstream changes
> git checkout working-branch
> git pull . master

The last pull from the local master fails.
This seems weird, because if working-branch development is done on the
master instead, the earlier pull never complains. So in this case, the
working-branch should be able to pull from the local master branch fine.

This bug basically stops people from being able to take a shallow clone
of a repository with a lot of history, and have multiple working
branches on it.

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
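
The check described in the first report (looking at what .git/shallow
lists) could be scripted roughly as below. This is only a sketch: the
function names are made up for illustration, and it assumes a non-bare
repository whose shallow file holds one full commit ID per line, as in
the single-line example quoted in the message.

```python
import re
from pathlib import Path

# A full SHA-1 object name: exactly 40 lowercase hex digits.
_SHA1 = re.compile(r"[0-9a-f]{40}")


def parse_shallow(text):
    """Return the boundary commit IDs listed in the text of a shallow file."""
    commits = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if not _SHA1.fullmatch(line):
            raise ValueError(f"not a commit ID: {line!r}")
        commits.append(line)
    return commits


def shallow_boundaries(repo_dir):
    """Read <repo_dir>/.git/shallow; a missing file means no boundary."""
    # Hypothetical helper: assumes the usual non-bare layout.
    shallow_file = Path(repo_dir) / ".git" / "shallow"
    if not shallow_file.exists():
        return []
    return parse_shallow(shallow_file.read_text())
```

An empty result after the deep pull would suggest the history was
extended fully; a remaining ID is the commit whose parents are still
missing, which matches the "broken link" pairs fsck prints.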
* Re: Weird shallow-tree conversion state, and branches of shallow trees
From: Johannes Schindelin @ 2007-04-14 8:56 UTC
To: Robin H. Johnson; +Cc: Git Mailing List

Hi,

On Wed, 11 Apr 2007, Robin H. Johnson wrote:

> I was doing some random tests with shallow trees, and ran into two
> issues - the first is a shallow tree that doesn't extend anymore when it
> should, and the second is some branched shallow tree trouble.

Ah! Seems we finally have a user for shallow clones! ;-)

Seriously again: I am at fault for putting the shallow support into Git,
failing to provide sensible test cases. This was partly due to my
laziness, and partly due to the overwhelming lack of demand.

I am in the middle of moving (haven't reached my destination yet), so I
will take a couple more days until I can look into your problems. If you
find out in the meantime what is happening, please share the information
with us.

Ciao,
Dscho
* Re: Weird shallow-tree conversion state, and branches of shallow trees
From: Robin H. Johnson @ 2007-04-15 0:03 UTC
To: Git Mailing List

On Sat, Apr 14, 2007 at 10:56:10AM +0200, Johannes Schindelin wrote:
> Ah! Seems we finally have a user for shallow clones! ;-)
Heh. I'm specifically looking at git, trying to resolve the deficiencies
that were identified by one of our (Gentoo) SoC2006 projects, on
the potential migration of the Gentoo CVS. Git has matured tremendously
since then.

The primary Gentoo CVS module (gentoo-x86), has 234672 files tracked,
and 1309603 CVS revisions. Between 350k and 500k changesets, depending
on how you merge those revisions.

Couple of the things that were identified either in the SoC project, or
since then.
- Shallow history checkouts are important to our low-bandwidth
  ebuild-tree developers (people in places with 33.6k modems, because
  the phone lines don't work well enough for 56k), or other high latency
  setups.
- Shallow tree (subtree) checkouts, for the developers that focus on
  specific portions of large modules and have no interest in the rest of
  that tree. Eg. Releng does their work in gentoo/src/releng.
- ACLs specific to subtree commits. Something similar to the cvs_acls.pl
  that FreeBSD uses would be great. Eg gentoo-x86/sec-policy/ is
  restricted to members of the security team (SELinux policies).
- CVS Keyword-like behavior, to specifically place the path and revision
  of certain files into the file directly, for ease of tracking when the
  file is removed from its original surrounding. I know this one is
  going to draw some flak, but it's a very common practice for a user
  to copy a file out of the CVS tree, make some modifications, and then
  post the entire changed version up, esp. when the size of the changes
  exceeds the size of diff.

> Seriously again: I am at fault for putting the shallow support into Git,
> failing to provide sensible test cases. This was partly due to my
> laziness, and partly due to the overwhelming lack of demand.
I still haven't figured out a decent testcase for this, I need to dig
harder.

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
* Re: Weird shallow-tree conversion state, and branches of shallow trees
From: David Lang @ 2007-04-15 0:02 UTC
To: Robin H. Johnson; +Cc: Git Mailing List

On Sat, 14 Apr 2007, Robin H. Johnson wrote:

> On Sat, Apr 14, 2007 at 10:56:10AM +0200, Johannes Schindelin wrote:
>> Ah! Seems we finally have a user for shallow clones! ;-)
> Heh. I'm specifically looking at git, trying to resolve the deficiencies
> that were identified by one of our (Gentoo) SoC2006 projects, on
> the potential migration of the Gentoo CVS. Git has matured tremendously
> since then.
>
> The primary Gentoo CVS module (gentoo-x86), has 234672 files tracked,
> and 1309603 CVS revisions. Between 350k and 500k changesets, depending
> on how you merge those revisions.
>
> Couple of the things that were identified either in the SoC project, or
> since then.
> - Shallow history checkouts are important to our low-bandwidth
>  ebuild-tree developers (people in places with 33.6k modems, because
>  the phone lines don't work well enough for 56k), or other high latency
>  setups.

note that for people on low-bandwidth lines, making too shallow a
checkout can actually end up costing more over time (they will have to
pull full revisions since they don't have the earlier versions to just
pull a diff against)

> - Shallow tree (subtree) checkouts, for the developers that focus on
>  specific portions of large modules and have no interest in the rest of
>  that tree. Eg. Releng does their work in gentoo/src/releng.

this could either be shallow tree or subproject, depending on how you
end up organizing things.

> - ACLs specific to subtree commits. Something similar to the cvs_acls.pl
>  that FreeBSD uses would be great. Eg gentoo-x86/sec-policy/ is
>  restricted to members of the security team (SELinux policies).

since git isn't designed with a single repository, it also doesn't need
to worry about ACLs (in fact, I don't think it has the concept of
permissions at all). this is up to the people maintaining the 'master'
repository to pull from the right people

> - CVS Keyword-like behavior, to specifically place the path and revision
>  of certain files into the file directly, for ease of tracking when the
>  file is removed from its original surrounding. I know this one is
>  going to draw some flak, but it's a very common practice for a user
>  to copy a file out of the CVS tree, make some modifications, and then
>  post the entire changed version up, esp. when the size of the changes
>  exceeds the size of diff.

I'm not understanding why you need this. git tracks the file content,
not the diffs between files. a developer does their work and git figures
out when you do a pull if it's better to send the file or a diff (and if
you are sending a diff, what you are doing the diff against, it may not
be the file that had that name before)

there's no need to place the path and revision in the file itself.

David Lang
* Re: Weird shallow-tree conversion state, and branches of shallow trees
From: Robin H. Johnson @ 2007-04-15 2:01 UTC
To: Git Mailing List

On Sat, Apr 14, 2007 at 05:02:47PM -0700, David Lang wrote:
> > - Shallow history checkouts are important to our low-bandwidth
> >  ebuild-tree developers (people in places with 33.6k modems, because
> >  the phone lines don't work well enough for 56k), or other high latency
> >  setups.
> note that for people on low-bandwidth lines, making too shallow a checkout
> can actually end up costing more over time (they will have to pull full
> revisions since they don't have the earlier versions to just pull a diff
> against)
Yes, I'm aware that it may be more efficient over the long term for them
to pull given blocks, and I'm going to recommend that developers have a
full history anyway, but I suspect that they will still make heavy use
of shallow trees, esp. as some do throwaway trees often. (This one is a
moot point anyway, the shallow history support in Git is pretty much
done barring the bugs I posted about previously).

> > - Shallow tree (subtree) checkouts, for the developers that focus on
> >  specific portions of large modules and have no interest in the rest of
> >  that tree. Eg. Releng does their work in gentoo/src/releng.
> this could either be shallow tree or subproject, depending on how you end up
> organizing things.
shallow tree, because we really do have people that check out arbitrary
sub-divisions (the web translation teams come to mind, they just have
checkouts of English and their own language), and going sub-project
would be insane for that.

> > - ACLs specific to subtree commits. Something similar to the cvs_acls.pl
> >  that FreeBSD uses would be great. Eg gentoo-x86/sec-policy/ is
> >  restricted to members of the security team (SELinux policies).
> since git isn't designed with a single repository, it also doesn't need to
> worry about ACLs (in fact, I don't think it has the concept of permissions
> at all). this is up to the people maintaining the 'master' repository to
> pull from the right people
I should have mentioned that we aren't following the kernel model here.
All of the developers will have git+ssh access to the central tree, to
push their own changes to it. On a similar tangent, in some subtrees
(our documentation mainly) we have server-side validation tests before
the commit is accepted. The 'update' hook documentation suggests that
ACLs should be possible and implemented via that.

> > - CVS Keyword-like behavior, to specifically place the path and revision
> >  of certain files into the file directly, for ease of tracking when the
> >  file is removed from its original surrounding. I know this one is
> >  going to draw some flak, but it's a very common practice for a user
> >  to copy a file out of the CVS tree, make some modifications, and then
> >  post the entire changed version up, esp. when the size of the changes
> >  exceeds the size of diff.
> I'm not understanding why you need this. git tracks the file content, not
> the diffs between files. a developer does their work and git figures out when
> you do a pull if it's better to send the file or a diff (and if you are
> sending a diff, what you are doing the diff against, it may not be the file
> that had that name before)
The tree that goes out to users is NOT git or CVS. What you point to
here is impossible unless we forced all of the users to migrate to git
(a truly herculean task if there was ever one).
It's a tarball or an rsync of an automatically managed CVS checkout.
(Tarballs go onto the release media, and are also widely used by those
that sneaker-net their trees to machines for security reasons).
Alternatively, the users browse the viewcvs, and pull something from the
Attic. Regardless of where they get the file from, the problem is that
the file doesn't contain any markers to help the developers merge it
back again.

A frequent occurrence of this is where the user takes rev X of a file
(because it was the latest one at the time), makes a local (non
version-controlled) copy, and submits it back to our Bugzilla some
months down the line. Thanks to the $Header$ in the file he submits, we
can produce a diff against the original revision, and figure out how
best to merge it with the latest revision.

-- 
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
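
The path-restricted ACL mentioned above (sec-policy/ limited to the
security team) might be checked inside an 'update' hook along these
lines. Git invokes that hook with the ref name plus the old and new
object names, and a non-zero exit rejects the push. Everything else
here is an assumption for illustration: the function names, the user
names, the shape of the ACL table, and the idea that the caller has
already collected the changed paths (for example from
`git diff --name-only <old> <new>`).

```python
def path_is_under(path, prefix):
    """True if path equals prefix or lies below it as a directory."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def denied_paths(user, changed_paths, restricted):
    """Return the changed paths that `user` may not modify.

    `restricted` (a hypothetical table) maps a directory prefix to the
    set of users allowed to commit below it. Paths that fall under no
    restricted prefix are open to everyone.
    """
    denied = []
    for path in changed_paths:
        for prefix, allowed_users in restricted.items():
            if path_is_under(path, prefix) and user not in allowed_users:
                denied.append(path)
                break  # one failing rule is enough to reject this path
    return denied
```

A hook built on this would exit non-zero whenever the returned list is
not empty. Matching on whole path components keeps a rule for
"sec-policy" from also catching an unrelated "sec-policy-extra".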
* Re: Weird shallow-tree conversion state, and branches of shallow trees
From: Shawn O. Pearce @ 2007-04-15 4:31 UTC
To: Robin H. Johnson; +Cc: Git Mailing List

"Robin H. Johnson" <robbat2@gentoo.org> wrote:
> On Sat, Apr 14, 2007 at 05:02:47PM -0700, David Lang wrote:
> > > - Shallow history checkouts are important to our low-bandwidth
> > >  ebuild-tree developers (people in places with 33.6k modems, because
> > >  the phone lines don't work well enough for 56k), or other high latency
> > >  setups.
> > note that for people on low-bandwidth lines, making too shallow a checkout
> > can actually end up costing more over time (they will have to pull full
> > revisions since they don't have the earlier versions to just pull a diff
> > against)

Mail them a DVD of the Git import, have them load it locally, and
use --reference for all future clones. With Git it's possible to
build fast throwaway trees from any random URL, so long as you keep
at least one repository available locally to act as a reference.

The speed at which a DVD (or small box of CDs) travels through the
various postal systems might very well be faster than 33.6k modem. :-)

> I should have mentioned that we aren't following the kernel model here.
> All of the developers will have git+ssh access to the central tree, to
> push their own changes to it. On a similar tangent, in some subtrees
> (our documentation mainly) we have server-side validation tests before
> the commit is accepted. The 'update' hook documentation suggests that
> ACLs should be possible and implemented via that.

Yes. I run probably the most paranoid update hook in existence.
If you want a copy let me know, I'll send it to you. It's a Perl
script that verifies the 'committer ' line matches the UNIX uid
(by doing a table lookup) for every new commit or tag being
introduced to the repository. It also verifies that the user can
update that branch, create it, delete it, or rewind it.

It sounds like you would need to add some additional rules about
specific paths being modified only by certain people in certain
branches (for the SELinux stuff), and running other validations in
the documentation (whatever that is).

> The tree that goes out to users is NOT git or CVS. What you point to
> here is impossible unless we forced all of the users to migrate to git
> (a truly herculean task if there was ever one).
> It's a tarball or an rsync of an automatically managed CVS checkout.
> (Tarballs go onto the release media, and are also widely used by those
> that sneaker-net their trees to machines for security reasons).
> Alternatively, the users browse the viewcvs, and pull something from the
> Attic. Regardless of where they get the file from, the problem is that
> the file doesn't contain any markers to help the developers merge it
> back again.

Git won't do this for you. We specifically don't mangle source[*1*].

What you could do is create a program that mangles the files before
delivery. You would probably want to do something like:

	$Id: 7fbf239:path/to/file$

where 7fbf239 is the earliest commit that introduced that particular
version of path/to/file, even if that is months old. That would
be most like what CVS would do. 8 char abbreviated commits should
be reasonably stable, and not too long to read or copy and paste.
A format like the above would also be easy to grab and copy into
a Git command line.

If we had a Git library that could access the repository, this would
be a pretty easy program to write. You are basically blaming each path
in the current HEAD commit on the parent, until you cannot blame
anyone else for that path. You do this blame on the entire tree,
and then output the munged structure (or only the files you want
munged).

It's good we have a GSoC project working on libification! ;-)

[*1*] Yes, I'm ignoring the nutso crlf support that's now in... Even
      though I work on Windows, the only true line ending is LF. ;-)

-- 
Shawn.
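
On the receiving side, a marker in the format suggested above could be
read back out of a submitted file roughly as follows. This is a sketch
under assumptions: the function names are hypothetical, the regular
expression accepts any abbreviation of 7 to 40 hex digits, and it
relies only on the `<commit>:<path>` object syntax that `git show`
already accepts.

```python
import re

# Matches the marker format from the message: $Id: <commit>:<path>$
_ID_MARKER = re.compile(r"\$Id: ([0-9a-f]{7,40}):([^$\n]+?)\s*\$")


def find_id_marker(text):
    """Return (commit, path) from the first $Id: ...$ marker, or None."""
    match = _ID_MARKER.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def git_show_args(commit, path):
    """Build the argument list for `git show <commit>:<path>`."""
    return ["git", "show", f"{commit}:{path}"]
```

Running the resulting command would print the original version of the
file, which can then be diffed against the copy the user submitted.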
* Re: Weird shallow-tree conversion state, and branches of shallow trees
From: Nguyen Thai Ngoc Duy @ 2007-04-15 5:57 UTC
To: Shawn O. Pearce; +Cc: Robin H. Johnson, Git Mailing List

On 4/15/07, Shawn O. Pearce <spearce@spearce.org> wrote:
> > The tree that goes out to users is NOT git or CVS. What you point to
> > here is impossible unless we forced all of the users to migrate to git
> > (a truly herculean task if there was ever one).
> > It's a tarball or an rsync of an automatically managed CVS checkout.
> > (Tarballs go onto the release media, and are also widely used by those
> > that sneaker-net their trees to machines for security reasons).
> > Alternatively, the users browse the viewcvs, and pull something from the
> > Attic. Regardless of where they get the file from, the problem is that
> > the file doesn't contain any markers to help the developers merge it
> > back again.
>
> Git won't do this for you. We specifically don't mangle source[*1*].
>
> What you could do is create a program that mangles the files before
> delivery. You would probably want to do something like:
>
>         $Id: 7fbf239:path/to/file$
>
> where 7fbf239 is the earliest commit that introduced that particular
> version of path/to/file, even if that is months old. That would
> be most like what CVS would do. 8 char abbreviated commits should
> be reasonably stable, and not too long to read or copy and paste.
> A format like the above would also be easy to grab and copy into
> a Git command line.
>
> If we had a Git library that could access the repository, this would
> be a pretty easy program to write. You are basically blaming each path
> in the current HEAD commit on the parent, until you cannot blame
> anyone else for that path. You do this blame on the entire tree,
> and then output the munged structure (or only the files you want
> munged).
>
> It's good we have a GSoC project working on libification! ;-)
>
> [*1*] Yes, I'm ignoring the nutso crlf support that's now in... Even
>       though I work on Windows, the only true line ending is LF. ;-)

Can we add an attribute like Subversion's svn:keywords? If the
attribute is set, we expand keywords on checkout and remove the
expansion in memory before doing any git operations. It's some kind of
I/O filter for working directory access.

-- 
Duy
* Re: Weird shallow-tree conversion state, and branches of shallow trees
From: Jakub Narebski @ 2007-04-15 8:54 UTC
To: git

Nguyen Thai Ngoc Duy wrote:

> On 4/15/07, Shawn O. Pearce <spearce@spearce.org> wrote:
>> > The tree that goes out to users is NOT git or CVS. What you point to
>> > here is impossible unless we forced all of the users to migrate to git
>> > (a truly herculean task if there was ever one).
>> > It's a tarball or an rsync of an automatically managed CVS checkout.
>> > (Tarballs go onto the release media, and are also widely used by those
>> > that sneaker-net their trees to machines for security reasons).
>> > Alternatively, the users browse the viewcvs, and pull something from the
>> > Attic. Regardless of where they get the file from, the problem is that
>> > the file doesn't contain any markers to help the developers merge it
>> > back again.
>>
>> Git won't do this for you. We specifically don't mangle source[*1*].
>>
>> What you could do is create a program that mangles the files before
>> delivery. You would probably want to do something like:
>>
>>         $Id: 7fbf239:path/to/file$
>>
>> where 7fbf239 is the earliest commit that introduced that particular
>> version of path/to/file, even if that is months old. That would
>> be most like what CVS would do. 8 char abbreviated commits should
>> be reasonably stable, and not too long to read or copy and paste.
>> A format like the above would also be easy to grab and copy into
>> a Git command line.
>>
>> If we had a Git library that could access the repository, this would
>> be a pretty easy program to write. You are basically blaming each path
>> in the current HEAD commit on the parent, until you cannot blame
>> anyone else for that path. You do this blame on the entire tree,
>> and then output the munged structure (or only the files you want
>> munged).
>>
>> It's good we have a GSoC project working on libification! ;-)
>>
>> [*1*] Yes, I'm ignoring the nutso crlf support that's now in... Even
>>       though I work on Windows, the only true line ending is LF. ;-)
>
> Can we add an attribute like Subversion's svn:keywords? If the
> attribute is set, we expand keywords on checkout and remove the
> expansion in memory before doing any git operations. It's some kind of
> I/O filter for working directory access.

There was some talk about keyword expansion, and it is doable IIRC.
Check out threads containing:
  Message-ID: <20070301175200.GA21433@informatik.uni-freiburg.de>
  http://permalink.gmane.org/gmane.comp.version-control.git/41108
(with some inane totally irrelevant subject)

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: Weird shallow-tree conversion state, and branches of shallow trees
From: Linus Torvalds @ 2007-04-15 18:18 UTC
To: Nguyen Thai Ngoc Duy; +Cc: Shawn O. Pearce, Robin H. Johnson, Git Mailing List

On Sun, 15 Apr 2007, Nguyen Thai Ngoc Duy wrote:
>
> Can we add an attribute like Subversion's svn:keywords? If the
> attribute is set, we expand keywords when checkout and remove
> expansion in memory before doing any git operations. It's some kind of
> I/O filter for working directory access.

NNOOo-oooo...

Keyword substitution is just *stupid*. It's an inexcusable braindamage.
Don't do it. It leads to all kinds of idiotic problems downstream, and it
really doesn't help *anything* except for "but I'm used to it". There are
absolutely no valid uses for it.

If you want to tag your files somehow, do it in "git archive" when
exporting it, but not in the working tree. And realize that once you
export it with the stupid keyword expansion, diffs etc will all be
corrupted, and will not - AND MUST NOT - apply to the uncorrupted working
tree.

		Linus
* Re: Weird shallow-tree conversion state, and branches of shallow trees
From: Andy Parkins @ 2007-04-15 19:51 UTC
To: git
Cc: Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson

On Sunday 2007, April 15, Linus Torvalds wrote:

> Keyword substitution is just *stupid*. It's an inexcusable
> braindamage. Don't do it. It leads to all kinds of idiotic problems
> downstream, and it really doesn't help *anything* except for "but I'm
> used to it". There are absolutely no valid uses for it.

You're right that it can cause problems, but it is certainly not the
case that there are no valid uses for it. I've mentioned it before but
I'll say it again, because it is the only feature I miss from subversion
and I can't see why it is invalid.

I keep diagrams for a project in SVG format in the repository, this
works very well because SVG is so nicely ASCII. In the title block of
the diagram I put "$Id$", then in subversion, after checking in and
updating it got expanded to

 $Id: diagram.svg 148 2002-07-28 21:30:43Z andyp $

Now, I print out that diagram and pin it to my wall - sometimes copies
of it are given to others. I do this on a regular basis. The diagram
is big and complicated and all versions of it look very similar. In
short it is very convenient to have the version of the file actually
printed on the piece of paper. This is a piece of paper remember, there
is no way to hash the diagram, or even look at the underlying source.
When someone comes to me with a random version of the diagram, I can use
that ID to checkout exactly the revision that that diagram refers to.

Please explain to me why that is not a valid use.

> If you want to tag your files somehow, do it in "git archive" when
> exporting it, but not in the working tree. And realize that once you
> export it with the stupid keyword expansion, diffs etc will all be
> corrupted, and will not - AND MUST NOT - apply to the uncorrupted
> working tree.

All of the problems you describe apply equally to CRLF conversion, and
yet there seems to be no problem with implementing that. In fact the
problem there is significantly worse, as it changes every line of the
file.

Now, solving the keyword problem is not simple, obviously, but it's
certainly not impossible. On git-add the expanded tags get unexpanded
so $Tag: blah blah blah$ becomes $Tag$; on checkout they get expanded.
Similarly while calculating diffs - the diff engine unexpands as it goes
so the lines with the keywords in them are not seen as different
regardless of the expanded part. Applying diffs from some external
source doesn't corrupt anything - because the diff engine is, by
definition, going to unexpand the keywords when it compares. So,
someone sends you a diff that has this:

 - /* $Id: diagram.svg 148 2002-07-28 21:30:43Z andyp $ */
 + /* $Id: diagram.svg 149 2002-07-29 20:32:47Z andyp $ */

And you apply it to the working tree - well, that line will be seen as
this by the diff engine:

 - /* $Id$ */
 + /* $Id$ */

No change.

Obviously this is entirely optional and would be activated on a per-file
basis. For git it would be even more useful because of all the
information actually available. I'd love to have git-keywords like
these:

 $Commit: 2bfe3cec92be4f5e3bfc0e71ed560df4a726c07b$
 $Object: b1bd9e46c2bd64e00b671ff5ed512d9c12b53309$
 $Describe: v1.5.1.1-83-g2bfe3ce$
 $Id: cache.h v1.5.1.1-83-g2bfe3ce $

Feelings seem very strong about this; I've seen comments again and again
about how braindamaged it is and I just can't see it - please, help me
see - what is it that is so utterly broken about it? I can see that it
adds a complication to many parts, but I can't see why it is seen as so
evil.

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
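
The "unexpand before comparing" step proposed above can be illustrated
with a small sketch. It is not how any Git version implements keyword
handling; the regular expression and the function name are assumptions
made for the example, and keywords are taken to be a letters-only name
between dollar signs on a single line.

```python
import re

# $Keyword$ or $Keyword: expanded text$, confined to one line.
_KEYWORD = re.compile(r"\$([A-Za-z]+)(?::[^$\n]*)?\$")


def collapse_keywords(text):
    """Replace every expanded keyword with its bare $Keyword$ form."""
    return _KEYWORD.sub(r"$\1$", text)


# The two sides of the diff quoted in the message.
old_line = "/* $Id: diagram.svg 148 2002-07-28 21:30:43Z andyp $ */"
new_line = "/* $Id: diagram.svg 149 2002-07-29 20:32:47Z andyp $ */"

# After collapsing, both sides are identical, so a comparison that
# collapses first would report no change for this line.
same_after_collapse = collapse_keywords(old_line) == collapse_keywords(new_line)
```

Dollar signs that do not enclose a keyword name, such as prices in
prose, are left alone by this pattern.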
* Re: Weird shallow-tree conversion state, and branches of shallow trees
From: Linus Torvalds @ 2007-04-15 20:51 UTC
To: Andy Parkins; +Cc: git, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson

On Sun, 15 Apr 2007, Andy Parkins wrote:
>
> You're right that it can cause problems, but it is certainly not the
> case that there are no valid uses for it.

I'm sorry, but you're just wrong.

There are no valid uses for it in the working tree. Full stop.

There are valid uses to tag sources with some revision information WHEN IT
LEAVES THE REVISION CONTROLLED ENVIRONMENT, but not one second before
that.

> I keep diagrams for a project in SVG format in the repository, this
> works very well because SVG is so nicely ASCII. In the title block of
> the diagram I put "$Id$", then in subversion, after checking in and
> updating it got expanded to
>
> $Id: diagram.svg 148 2002-07-28 21:30:43Z andyp $
>
> Now, I print out that diagram and pin it to my wall - sometimes copies
> of it are given to others. I do this on a regular basis.

And is there *any* reason why you don't just do that as an "export"
option, when it's very clear that people won't send diffs that include it
and that will cause all the endless problems that keyword expansion
causes?

Why would you ever have the pain and suffering of using it within the
source control issue? Especially since you would be a *lot* better off
using just an export script that can do a lot better than CVS/SVN keyword
expansion could ever do (ie you can add all sorts of more relevant
information than just a date and user name!)

> Please explain to me why that is not a valid use.

It's not a valid use because there are many SO MUCH BETTER WAYS to get the
same thing, that have none of the downsides of keyword expansion?

Your argument is akin to saying that "Why isn't it a valid use to replace
the steering wheel in my car with a mouth-operated joystick under the
passenger side seat?"

Sure, you *can* steer a car by mouthing at it while having your head under
the passenger side seat, and your butt sticking out through the moonroof
("We could add a periscope so that I can see where I'm going!")

But that's not an argument *for* doing it, when there are ways that are
obviously much better, and don't _need_ the periscope!

See? The fact that you *can* do something is not a valid argument for it
being a valid use. You *can* do stupid things, but if you can get to the
same end result by not doing stupid things, wouldn't you prefer that
instead?

Here's a small makefile snippet for you:

	%.prt: %.svg
		sed 's/\$$Id\$$/\$$ $(shell git log --pretty=format:"%h: %s (%an)" --abbrev-commit -1 file.svg) \$$/g' < $< > $@

which would need some work (it doesn't quote things right - in reality
you'd write a simple script to do this properly).

See? No need for a periscope, and your butt can be toasty warm too if you
just add a seat heater option...

		Linus
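
The "simple script" alluded to above might look something like the
following. It is a sketch, not a tested release tool: the function
names are invented for illustration, and it assumes the same one-line
`git log` format used in the makefile snippet. Passing the path as a
separate argument avoids the shell quoting problem noted in the
message.

```python
import re
import subprocess

# Only the bare, unexpanded marker is replaced.
_BARE_ID = re.compile(r"\$Id\$")


def stamp(text, description):
    """Return text with each bare $Id$ replaced by `$ <description> $`."""
    # A function replacement keeps backslashes in `description` literal.
    return _BARE_ID.sub(lambda _match: f"$ {description} $", text)


def describe_last_commit(path):
    """Ask git for a one-line summary of the last commit touching `path`."""
    result = subprocess.run(
        ["git", "log", "-1", "--abbrev-commit",
         "--pretty=format:%h: %s (%an)", "--", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

An export step would read the tracked file, call
`stamp(contents, describe_last_commit(path))`, and write the result to
a separate output file, leaving the working tree copy untouched.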
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-15 20:51 ` Linus Torvalds @ 2007-04-16 0:11 ` Bill Lear 2007-04-16 9:10 ` Andy Parkins 2007-04-16 2:17 ` Robin H. Johnson 2007-04-16 9:03 ` Andy Parkins 2 siblings, 1 reply; 34+ messages in thread From: Bill Lear @ 2007-04-16 0:11 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Parkins, git, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson On Sunday, April 15, 2007 at 13:51:42 (-0700) Linus Torvalds writes: >On Sun, 15 Apr 2007, Andy Parkins wrote: >> >> You're right that it can cause problems, but it is certainly not the >> case that there are no valid uses for it. > >I'm sorry, but you're just wrong. > >There are no valid uses for it in the working tree. Full stop. > >There are valid uses to tag sources with some revision information WHEN IT >LEAVES THE REVISION CONTROLLED ENVIRONMENT, but not one second before >that. ... Not that Linus needs any back-up from me, but I second this, very strongly. Decorating source code with release information is a proper function of release management tools, not the SCM system. We had a similar argument in our company about this, sparked by a criticism of git for not having keyword (version number) substitution, and I argued that having such substitution functions in the SCM was out-of-place and a crutch for weak release procedures. It's easy with a proper make system to put whatever information you want from the SCM into the release product. This would probably be as crazy as asking for saving and restoring timestamps in the working tree on checkout of branches, and we know how insane that is... Bill ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-16 0:11 ` Bill Lear @ 2007-04-16 9:10 ` Andy Parkins 2007-04-16 15:17 ` Julian Phillips 0 siblings, 1 reply; 34+ messages in thread From: Andy Parkins @ 2007-04-16 9:10 UTC (permalink / raw) To: git Cc: Bill Lear, Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson On Monday 2007 April 16 01:11, Bill Lear wrote: > Not that Linus needs any back-up from me, but I second this, very > strongly. Decorating source code with release information is a proper > function of release management tools, not the SCM system. We had a > similar argument in our company about this, sparked by a criticism of > git for not having keyword (version number) substitution, and I argued > that having such substitution functions in the SCM was out-of-place > and a crutch for weak release procedures. It's easy with a proper > make system to put whatever information you want from the SCM into the > release product. I'm not disagreeing with any of this - there are certainly cases when expansion is completely the wrong tool. That doesn't mean there are no cases where it would be useful. The case I keep banging on about is that where nothing is made and this is not a release. I don't want to make a release, I just want to print out the current version of a file and have something that appears on the printout that would allow me to identify what version of the file that printout was from. Are you seriously suggesting I should run release scripts just for that? It's not something you want - fine - not a problem for me that you wouldn't use it. The thing that is bothering me is that everyone keeps waving their hands while chanting "keyword expansion evil", while not giving an example of what problem it causes. By this I mean "problem for the end user", not "problem in writing the support" - if it's impractical to implement then that's fine, say that. 
Andy -- Dr Andy Parkins, M Eng (hons), MIET andyparkins@gmail.com ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-16 9:10 ` Andy Parkins @ 2007-04-16 15:17 ` Julian Phillips 0 siblings, 0 replies; 34+ messages in thread From: Julian Phillips @ 2007-04-16 15:17 UTC (permalink / raw) To: Andy Parkins Cc: git, Bill Lear, Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson On Mon, 16 Apr 2007, Andy Parkins wrote: > On Monday 2007 April 16 01:11, Bill Lear wrote: > >> Not that Linus needs any back-up from me, but I second this, very >> strongly. Decorating source code with release information is a proper >> function of release management tools, not the SCM system. We had a >> similar argument in our company about this, sparked by a criticism of >> git for not having keyword (version number) substitution, and I argued >> that having such substitution functions in the SCM was out-of-place >> and a crutch for weak release procedures. It's easy with a proper >> make system to put whatever information you want from the SCM into the >> release product. > > I'm not disagreeing with any of this - there are certainly cases when > expansion is completely the wrong tool. That doesn't mean there are no cases > where it would be useful. > > The case I keep banging on about is that where nothing is made and this is not > a release. I don't want to make a release, I just want to print out the > current version of a file and have something that appears on the printout > that would allow me to identify what version of the file that printout was > from. Are you seriously suggesting I should run release scripts just for > that? > > It's not something you want - fine - not a problem for me that you wouldn't > use it. The thing that is bothering me is that everyone keeps waving their > hands while chanting "keyword expansion evil", while not giving an example of > what problem it causes. 
By this I mean "problem for the end user", > not "problem in writing the support" - if it's impractical to implement then > that's fine, say that. > What I don't understand is why the people who want keyword expansion don't simply write a little wrapper script, a keyworded git as it were (you could even call it gitk for maximum confusion :P). In the script you simply: 1) collapse all keywords 2) call appropriate git function 3) expand keywords again wouldn't that do what people want without having to change the git code at all? You could probably even get it into contrib .. (In the case of gentoo, you could even change the ebuild so that the real git is installed as raw_git or something, and the wrapper is installed as git - though personally I wouldn't want to do that) -- Julian --- You may get an opportunity for advancement today. Watch it! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-15 20:51 ` Linus Torvalds 2007-04-16 0:11 ` Bill Lear @ 2007-04-16 2:17 ` Robin H. Johnson 2007-04-16 3:01 ` Theodore Tso 2007-04-16 14:59 ` Linus Torvalds 2007-04-16 9:03 ` Andy Parkins 2 siblings, 2 replies; 34+ messages in thread From: Robin H. Johnson @ 2007-04-16 2:17 UTC (permalink / raw) To: Linus Torvalds, Git Mailing List Cc: Andy Parkins, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson [-- Attachment #1: Type: text/plain, Size: 1959 bytes --] On Sun, Apr 15, 2007 at 01:51:42PM -0700, Linus Torvalds wrote: > There are valid uses to tag sources with some revision information WHEN IT > LEAVES THE REVISION CONTROLLED ENVIRONMENT, but not one second before > that. Nobody has addressed the single problem that I have with adding it when it's leaving the environment, and that's still of paramount concern to me. Simply put, there is a conflict between being able to add revision information of stuff leaving the environment, and those additions breaking previous checksums (which may be digitally signed, and thus breaking the signatures). I'll reduce it further from my previous example. 1. Developer commits some change to file A. 2. The checksum file is updated because A changed (the checksum file explicitly does not contain keywords). 3. Developer signs the checksum file, and commits it. If during the export process (which is undertaken elsewhere, by a different person or script), file A now has an expansion applied to it, you break the checksum file, which you CANNOT redo, because you lose the developer's digital signature on the checksum file! Using the existing git-verify-tag mechanisms are not suitable, because it is the exported information that must be verifiable. There's FOUR possible solutions here: 1. The commit to file A does the keywords - Which Linus is against. 2. 
An ADDITIONAL commit to file A, after the initial commit, as a scripted addition of the keywords, but before the checksum is updated. I think this is messy myself, as you'd have to insert the data from the N-1 commit always. 3. Lose the ability to tag the files leaving the environment. 4. Stop digitally signing the checksum file (which then leaves the possibility for other attacks). -- Robin Hugh Johnson Gentoo Linux Developer & Council Member E-Mail : robbat2@gentoo.org GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85 [-- Attachment #2: Type: application/pgp-signature, Size: 321 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-16 2:17 ` Robin H. Johnson @ 2007-04-16 3:01 ` Theodore Tso 2007-04-16 3:23 ` Nguyen Thai Ngoc Duy 2007-04-16 3:32 ` Robin H. Johnson 2007-04-16 14:59 ` Linus Torvalds 1 sibling, 2 replies; 34+ messages in thread From: Theodore Tso @ 2007-04-16 3:01 UTC (permalink / raw) To: Linus Torvalds, Git Mailing List, Andy Parkins, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson On Sun, Apr 15, 2007 at 07:17:29PM -0700, Robin H. Johnson wrote: > Nobody has addressed the single problem that I have with adding it when > it's leaving the environment, and that's still of paramount concern to > me. Simply put, there is a conflict between being able to add revision > information of stuff leaving the environment, and those additions > breaking previous checksums (which may be digitally signed, and thus > breaking the signatures). > > I'll reduce it further from my previous example. > > 1. Developer commits some change to file A. > 2. The checksum file is updated because A changed (the checksum file > explicitly does not contain keywords). > 3. Developer signs the checksum file, and commits it. > > If during the export process (which is undertaken elsewhere, by a > different person or script), file A now has an expansion applied to it, > you break the checksum file, which you CANNOT redo, because you lose the > developer's digital signature on the checksum file! Simple, the release engineer runs a script which exports the tree, expanding any keywords and updating the checksum file as necessary, and then the release engineer signs the checksum file! As has already been stated, if this doesn't work, you probably don't have a well defined and formal release process. Just because a developer has signed a checksum doesn't mean that the tree is suitable for release; that's the job of the release engineer to confirm, probably after running a set of regression test suites. 
And in fact, with git, it's pointless for the developer to sign a checksum file and then commit it, since git is already maintaining checksums as an integral part of how revisions are named. - Ted ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-16 3:01 ` Theodore Tso @ 2007-04-16 3:23 ` Nguyen Thai Ngoc Duy 2007-04-16 15:08 ` Linus Torvalds 2007-04-16 3:32 ` Robin H. Johnson 1 sibling, 1 reply; 34+ messages in thread From: Nguyen Thai Ngoc Duy @ 2007-04-16 3:23 UTC (permalink / raw) To: Theodore Tso Cc: Linus Torvalds, Git Mailing List, Andy Parkins, Shawn O. Pearce, Robin H. Johnson On 4/16/07, Theodore Tso <tytso@mit.edu> wrote: > Simple, the release engineer runs a script which exports the tree, > expanding any keywords and updating the checksum file as necessary, > and then the release engineer signs the checksum file! As has already > been stated, if this doesn't work, you probably don't have a well > defined and formal release process. > > Just because a developer has signed a checksum doesn't mean that the > tree is suitable for release; that's the job of the release engineer > to confirm, probably after running a set of regression test suites. > And in fact, with git, it's pointless for the developer to sign a > checksum file and then commit it, since git is already maintaining > checksums as an integral part of how revisions are named. Changing Gentoo release process won't make Git the best choice while other SCM candidates can provide the same functionalities that Gentoo needs without changing the process. -- Duy ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-16 3:23 ` Nguyen Thai Ngoc Duy @ 2007-04-16 15:08 ` Linus Torvalds 2007-04-16 16:06 ` Nguyen Thai Ngoc Duy 0 siblings, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2007-04-16 15:08 UTC (permalink / raw) To: Nguyen Thai Ngoc Duy Cc: Theodore Tso, Git Mailing List, Andy Parkins, Shawn O. Pearce, Robin H. Johnson On Mon, 16 Apr 2007, Nguyen Thai Ngoc Duy wrote: > > Changing Gentoo release process won't make Git the best choice while > other SCM candidates can provide the same functionalities that Gentoo > needs without changing the process. Ahh, the old "argument by blackmail" approach. You know what? Nobody really cares. Arguing by blackmail ("we'll use something else then") just means that you should go somewhere else. If you cannot respond intelligently to intelligent arguments, you really *are* better off using SVN. A billion flies aren't exactly wrong: crap really *is* good. If you're a fly or a maggot. But if you ever actually want to be something *more* than a crap eater, come back then. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-16 15:08 ` Linus Torvalds @ 2007-04-16 16:06 ` Nguyen Thai Ngoc Duy 0 siblings, 0 replies; 34+ messages in thread From: Nguyen Thai Ngoc Duy @ 2007-04-16 16:06 UTC (permalink / raw) To: Linus Torvalds Cc: Theodore Tso, Git Mailing List, Andy Parkins, Shawn O. Pearce, Robin H. Johnson On 4/16/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Mon, 16 Apr 2007, Nguyen Thai Ngoc Duy wrote: > > > > Changing Gentoo release process won't make Git the best choice while > > other SCM candidates can provide the same functionalities that Gentoo > > needs without changing the process. > > Ahh, the old "argument by blackmail" approach. > > You know what? Nobody really cares. Arguing by blackmail ("we'll use > something else then") just means that you should go somewhere else. If you > cannot respond intelligently to intelligent arguments, you really *are* > better off using SVN. All right. I didn't mean to blackmail you or any Git developer. What I wanted to say is that Gentoo is currently using an old, brain-damaged SCM called CVS. I would like it to use Git but Git in its current state can not fully replace CVS regarding to Gentoo usage. To do that Gentoo needs some changes itself but Gentoo repositories are big ones and it's just hard to change such beasts from bottom up. So I would like to see a compromise from Git (which, I think, does not harm other projects from using Git) to ease the migration. > > A billion flies aren't exactly wrong: crap really *is* good. If you're a > fly or a maggot. > > But if you ever actually want to be something *more* than a crap eater, > come back then. > I would want to _slowly_ evolve from a crap eater to something better because I couldn't become a non-crap eater in a flash :) > Linus > -- Duy ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-16 3:01 ` Theodore Tso 2007-04-16 3:23 ` Nguyen Thai Ngoc Duy @ 2007-04-16 3:32 ` Robin H. Johnson 2007-04-16 17:00 ` Linus Torvalds 2007-04-17 4:16 ` Daniel Barkalow 1 sibling, 2 replies; 34+ messages in thread From: Robin H. Johnson @ 2007-04-16 3:32 UTC (permalink / raw) To: Theodore Tso, Git Mailing List Cc: Linus Torvalds, Andy Parkins, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson [-- Attachment #1: Type: text/plain, Size: 3198 bytes --] On Sun, Apr 15, 2007 at 11:01:03PM -0400, Theodore Tso wrote: > On Sun, Apr 15, 2007 at 07:17:29PM -0700, Robin H. Johnson wrote: > > Nobody has addressed the single problem that I have with adding it when > > it's leaving the environment, and that's still of paramount concern to > > me. Simply put, there is a conflict between being able to add revision > > information of stuff leaving the environment, and those additions > > breaking previous checksums (which may be digitally signed, and thus > > breaking the signatures). > > > > I'll reduce it further from my previous example. > > > > 1. Developer commits some change to file A. > > 2. The checksum file is updated because A changed (the checksum file > > explicitly does not contain keywords). > > 3. Developer signs the checksum file, and commits it. > > > > If during the export process (which is undertaken elsewhere, by a > > different person or script), file A now has an expansion applied to it, > > you break the checksum file, which you CANNOT redo, because you lose the > > developer's digital signature on the checksum file! > > Simple, the release engineer runs a script which exports the tree, > expanding any keywords and updating the checksum file as necessary, > and then the release engineer signs the checksum file! As has already > been stated, if this doesn't work, you probably don't have a well > defined and formal release process. 
The checksum file (named Manifest) we are talking about is for a single subdirectory, and is signed as proof that it was not modified between the developer and submission to the tree. As I wrote originally, this is the Gentoo distribution tree, it's NOT delineated by well-defined releases in the conventional sense. There are presently 11571 Manifest files in the tree. Our tools will not allow commits to each package of things that radically break the package (semantic correctness and some automatic validation, but thinkos can still get through the checks). The 'release' process for the tree runs automatically every 30 minutes, and consists of more validation checks, updating a cache directory, producing a signed master Manifest [1] and publishing everything to the rsync servers. > Just because a developer has signed a checksum doesn't mean that the > tree is suitable for release; that's the job of the release engineer > to confirm, probably after running a set of regression test suites. > And in fact, with git, it's pointless for the developer to sign a > checksum file and then commit it, since git is already maintaining > checksums as an integral part of how revisions are named. The entire point of the checksums is to allow end users to validate content that has been exported, with only minimal tools. [1] The master Manifest stage is only in production for the tree tarballs, and NOT in the rsync production at the moment, but will be within the next month. It exists solely to allow the detection of compromised mirrors. -- Robin Hugh Johnson Gentoo Linux Developer & Council Member E-Mail : robbat2@gentoo.org GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85 [-- Attachment #2: Type: application/pgp-signature, Size: 321 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-16 3:32 ` Robin H. Johnson @ 2007-04-16 17:00 ` Linus Torvalds 2007-04-17 4:16 ` Daniel Barkalow 1 sibling, 0 replies; 34+ messages in thread From: Linus Torvalds @ 2007-04-16 17:00 UTC (permalink / raw) To: Robin H. Johnson Cc: Theodore Tso, Git Mailing List, Andy Parkins, Nguyen Thai Ngoc Duy, Shawn O. Pearce On Sun, 15 Apr 2007, Robin H. Johnson wrote: > > The checksum file (named Manifest) we are talking about is for a single > subdirectory, and is signed as proof that it was not modified between > the developer and submission to the tree. Well, in git, you can actually just take the tree entry for that subdirectory, and it already is cryptographic proof that two subdirectories match. (It's not signed, but if you actually want to sign it, you can do so, either inside git - by using a tag object that points to that subdirectory - or outside git by just creating a Manifest that contains a list of subdirectories and their tree SHA1's, and signing that). In fact, in git, there's an explicit command to generate that "Manifest of directories in the top level", and it's called

    git ls-tree HEAD

and it will give you cryptographic hashes of each file/directory in the top level of a repository. So just sign that, ie do

    git ls-tree HEAD > Manifest
    gpg -sa -u "$username" Manifest

or something like that. And you're done. Add the "-r" flag to get the recursive manifest containing *all* files, rather than just the SHA1's of the directories themselves. Of course, you could just sign and tag the HEAD itself, which is what the kernel does, since one signature will guarantee everything under it. > As I wrote originally, this is the Gentoo distribution tree, it's NOT > delineated by well-defined releases in the conventional sense. We do that for the daily (or rather, nightly) snapshots for the kernel.
There's no "Manifest", but look at http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/ and you'll see files like

    patch-2.6.21-rc6-git8.bz2       15-Apr-2007 07:01   38K
    patch-2.6.21-rc6-git8.bz2.sign  15-Apr-2007 07:01   248
    patch-2.6.21-rc6-git8.gz        15-Apr-2007 07:01   42K
    patch-2.6.21-rc6-git8.gz.sign   15-Apr-2007 07:01   248
    patch-2.6.21-rc6-git8.id        15-Apr-2007 07:01   41
    patch-2.6.21-rc6-git8.log       15-Apr-2007 07:01   63K
    patch-2.6.21-rc6-git8.sign      15-Apr-2007 07:01   248

where only the patches are signed, but the system *could* have signed the ID file too (the 41-byte "patch-2.6.21-rc6-git8.id" contains the 40-byte HEX representation of the SHA of the HEAD of the snapshot, and a newline). That 41-byte ID file really is sufficient to describe the whole thing, after all (although you then need to have the git tree in question to actually get the list of files, aka the "Manifest", so if you want that list, you'd have to do the "git ls-tree" thing). > There are presently 11571 Manifest files in the tree. Our tools will > not allow commits to each package of things that radically break the > package (semantic correctness and some automatic validation, but thinkos > can still get through the checks). Sure. And every single Manifest file is pointless *inside* git, since git maintains its own cryptographically secure manifest file anyway. But it's trivial to generate them for external use, if you want to. > The 'release' process for the tree runs automatically every 30 minutes, > and consists of more validation checks, updating a cache directory, > producing a signed master Manifest [1] and publishing everything to the > rsync servers. That sounds like the nightly snapshots the kernel does, except we only do them nightly, and we don't actually validate anything at all, we just sign things as being from the "master.kernel.org" site (so the signature does mean something, but only that *that* site thinks it is valid).
> The entire point of the checksums is to allow end users to validate > content that has been exported, with only minimal tools. If you do a single 41-byte thing, you could use git itself to validate the whole tree. But if you want to have people able to validate any random single file in a tar-file without having git installed, you'd have to:

 - have the "full manifest" (aka "git ls-tree -r HEAD")
 - have a trivial script that generates "git ID's" of files, which looks something like this:

	#!/bin/sh
	# generate a "git ID" for one or more files
	while test -n "$1"
	do
		file="$1"
		# size in bytes ("stat --format" is the GNU coreutils spelling)
		len=$(stat --format "%s" "$file")
		# printf rather than "echo -e -n": not every /bin/sh echo
		# understands those flags or emits the NUL byte
		printf ' %s (blob %s): ' "$file" "$len"
		# Generate the "git ID" for a blob:
		# SHA1 over the header "blob <len>\0" followed by the contents
		( printf 'blob %s\0' "$len" ; cat "$file" ) | sha1sum
		shift
	done

and now you can check each file in the Manifest even without having git installed. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
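Taking the two ingredients listed in the message above (a recursive manifest plus a blob-ID script), an end-user check without git installed might be sketched as follows. The function names `git_blob_id` and `verify_manifest` are hypothetical, and the sketch assumes the Manifest holds plain `git ls-tree -r HEAD` output, that is, lines of the form `<mode> <type> <sha1><TAB><path>`, with no paths that git would have had to quote.

```shell
#!/bin/sh
# git_blob_id FILE
# Print the SHA-1 git assigns to FILE's contents as a blob:
# the hash of "blob <size>\0" followed by the file data.
git_blob_id() {
    len=$(wc -c < "$1" | tr -d ' ')
    { printf 'blob %s\0' "$len"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# verify_manifest MANIFEST
# Check every blob entry of an "ls-tree -r" style manifest against the
# files on disk; returns non-zero if any file does not match.
# (Hypothetical helper; paths are taken relative to the current directory.)
verify_manifest() {
    manifest=$1
    rc=0
    while read -r mode type sha path; do
        [ "$type" = blob ] || continue      # skip tree/commit entries
        if [ "$(git_blob_id "$path")" = "$sha" ]; then
            echo "OK   $path"
        else
            echo "FAIL $path"
            rc=1
        fi
    done < "$manifest"
    return $rc
}
```

Run from the top of the unpacked tree, `verify_manifest Manifest` would print one OK/FAIL line per file. Checking the signature on the Manifest itself (for instance with gpg) is a separate step that this sketch leaves out.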
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-16 3:32 ` Robin H. Johnson 2007-04-16 17:00 ` Linus Torvalds @ 2007-04-17 4:16 ` Daniel Barkalow 1 sibling, 0 replies; 34+ messages in thread From: Daniel Barkalow @ 2007-04-17 4:16 UTC (permalink / raw) To: Robin H. Johnson Cc: Theodore Tso, Git Mailing List, Linus Torvalds, Andy Parkins, Nguyen Thai Ngoc Duy, Shawn O. Pearce On Sun, 15 Apr 2007, Robin H. Johnson wrote: > The checksum file (named Manifest) we are talking about is for a single > subdirectory, and is signed as proof that it was not modified between > the developer and submission to the tree. So the process has to be: 1. Developer commits changes to files. 2. Checksum utility finds the checksums of the files with IDs added where the master site updater will add them. 3. Developer signs checksums. 4. Developer commits checksums. 5. Developer pushes changes to master site. 6. Master site checks out files, adds IDs, and updates live tree. 7. End user fetches tree. 8. End user checks checksums, which match, because the master site and the developer checksum scripts agree on what the end user will see. The only difference is that developers working out of the version control have to generate the checksums with a tool that knows how the IDs will be added, and check the checksums with this tool as well, because working directories don't have IDs in them. Really, it's approximately the same as having the version control system do it, except that it's in the project-specific development tools instead of the version control system. -Daniel *This .sig left intentionally blank* ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-16 2:17 ` Robin H. Johnson 2007-04-16 3:01 ` Theodore Tso @ 2007-04-16 14:59 ` Linus Torvalds 1 sibling, 0 replies; 34+ messages in thread From: Linus Torvalds @ 2007-04-16 14:59 UTC (permalink / raw) To: Robin H. Johnson Cc: Git Mailing List, Andy Parkins, Nguyen Thai Ngoc Duy, Shawn O. Pearce On Sun, 15 Apr 2007, Robin H. Johnson wrote: > > Nobody has addressed the single problem that I have with adding it when > it's leaving the environment, and that's still of paramount concern to > me. Simply put, there is a conflict between being able to add revision > information of stuff leaving the environment, and those additions > breaking previous checksums (which may be digitally signed, and thus > breaking the signatures). Don't be silly. You can just checksum without the ID. Which you have to do with git anyway, since any expanded ID *itself* would be part of any ID, which means that under git, you *physically*cannot* make an ID string be part of the source control environment anyway, unless you did the SHA1 while ignoring the $Id$ expansion. In other words, the problem you talk about exists *regardless*. You suggest pushing that problem into the SCM layer, and de-stabilizing the SCM and causing EVERYBODY ELSE problems. And I'm telling you that if you want the idiocy of keyword expansion, you can have it, BUT YOU CANNOT HAVE IT IN THE SCM. Because *every* *single* problem you have with keyword expansion (whether it be checksums or anything else) will be MUCH MUCH worse if you do it at the SCM level! Really. When you talk about your "single problem", why the HELL do you think that problem goes away just because you try to deal with it inside the SCM? Trust me, the problem does *not* go away, it gets *bigger*. You're trying to push it into the SCM, because _you_ don't want to deal with the inevitable problems that keywords cause.
But face it, the SCM wants to deal with them *even*less*, because they are much worse there, and more importantly, you'd be trying to push them into a level where most users have gotten over the braindamage and no longer want it! So you're trying to make *everybody* suffer, just because you cannot do it right. And suffer people do. There's a reason people are so negative about keyword expansion: we've _seen_ those problems first-hand. So the proper solution is: - don't do keyword expansion on the "originals". - add release information when you do a release. - if you want to sign releases, do so *after* the release. That's what a release process is all about. - if you're so damn lazy that you can't be bothered to do the signing of the release, don't ask others to do stupid things because *you* do something stupid - just make sure that whatever release information you add can be *removed*, so that you can verify an exact match. For example, look at how "git archive" does this. It actually adds release information to the tar-file. It's hidden as a magic header, but that also means that since it's *separate* from the source code, it avoids all the problems with keyword expansion, and now you can (for example) diff the tar-ball source tree with the git tree, and you will not get spurious AND INCORRECT differences! And any checksums would still be valid! And the same kind of thing can be done even if you absolutely have to embed the information on a file-by-file basis. Just make sure that you do it in some reversible manner. But preferably you generate a separate file (eg my hypothetical Makefile example that actually generates a "prt" file from a "svg" file) so that you have the original and can do any diff or validation efforts on *that*. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
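The "reversible manner" asked for above can be illustrated with a small sketch. It assumes the exported files carry keywords in the CVS/SVN-style form `$Id: ... $` and that the tracked originals contain the bare `$Id$`; the helper names `collapse_id` and `same_after_collapse` are hypothetical.

```shell
#!/bin/sh
# collapse_id
# Copy stdin to stdout, turning every "$Id: ... $" back into "$Id$",
# so an exported file can be compared or checksummed in its original form.
collapse_id() {
    sed 's/\$Id:[^$]*\$/$Id$/g'
}

# same_after_collapse EXPORTED ORIGINAL
# Succeeds if EXPORTED, with its keywords collapsed, is byte-identical
# to ORIGINAL.
same_after_collapse() {
    collapse_id < "$1" | cmp -s - "$2"
}
```

A checksum taken over the collapsed stream stays valid no matter what revision string the export step inserted, which is one way to read the "checksum without the ID" suggestion.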
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-15 20:51 ` Linus Torvalds 2007-04-16 0:11 ` Bill Lear 2007-04-16 2:17 ` Robin H. Johnson @ 2007-04-16 9:03 ` Andy Parkins 2007-04-16 15:54 ` Sven Verdoolaege ` (2 more replies) 2 siblings, 3 replies; 34+ messages in thread From: Andy Parkins @ 2007-04-16 9:03 UTC (permalink / raw) To: git; +Cc: Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson On Sunday 2007 April 15 21:51, Linus Torvalds wrote: > > Now, I print out that diagram and pin it to my wall - sometimes copies > > of it are given to others. I do this on a regular basis. > > And is there *any* reason why you don't just do that as an "export" > option, when it's very clear that people won't send diffs that include it Of course there is a reason - the file I edit is the SVG itself, in inkscape while editing that file I press "print" to get a print out. Why on earth would I want to jump through hoops by closing the file I'm editing, running some export script to a temporary file that I don't want, then open up Inkscape again, check the export looks okay and then print - on what planet is /that/ simpler? Worse, there is more chance that I'll lose changes once there are two copies of the same file floating around. Which one am I editing and which one am I printing? Have I run the script yet? When I accidentally make changes to the wrong one, I've now got to merge those changes by hand back to the file they should have been in in the first place. > It's not a valid use because there are many SO MUCH BETTER WAYS to get the > same thing, that have none of the downsides of keyword expansion? I'm sorry, but we have different definitions of SO MUCH BETTER; it is _more_ trouble for me the user to have to run scripts just to print the file that is already on my screen, than not. 
> Your argument is akin to saying that "Why isn't it a valid use to replace > the steering wheel in my car with a mouth-operated joystick under the > passenger side seat?" I'd actually say that that is your argument - you want me to add steps to a process to get the same result. I just want the steering wheel, you want the steering wheel plus script that I run first to install the steering wheel and correctly adapt it for the current car. In my version the process is "I press print"; the fact that is hard for the version control system is irrelevant - the whole point of tools like git is to do work for me, not the other way around. > The fact that you *can* do something is not a valid argument for it being > a valid use. You *can* do stupid things, but if you can get to the same > end result by not doing stupid things, wouldn't you prefer that instead? It's not an accurate analogy at all. Your conclusion is your supposition - it's stupid because it's stupid. I don't understand what the huge problems are - all you've done is say again that it's a problem to have keyword expansion. Why? What problem does it actually cause? I'm not just being argumentative - I still have not understood what terrible evil it is that keyword expansion causes but crlf conversion does not. Andy -- Dr Andy Parkins, M Eng (hons), MIET andyparkins@gmail.com ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-16 9:03 ` Andy Parkins @ 2007-04-16 15:54 ` Sven Verdoolaege 2007-04-16 15:58 ` Linus Torvalds 2007-04-16 19:41 ` Junio C Hamano 2 siblings, 0 replies; 34+ messages in thread From: Sven Verdoolaege @ 2007-04-16 15:54 UTC (permalink / raw) To: Andy Parkins Cc: git, Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson On Mon, Apr 16, 2007 at 10:03:05AM +0100, Andy Parkins wrote: > there are two copies of the same file floating around. Which one am I > editing and which one am I printing? Turn off write permissions on the generated file. > Have I run the script yet? When I Use a post-commit hook. > I'm not just being argumentative - I still have not understood what terrible > evil it is that keyword expansion causes but crlf conversion does not. For one thing, this keyword expansion thing requires the SCM to modify the file during commit. (Hey, my editor says something changed the file. Do I have the file opened in another session? Oh, it's the stupid keyword expansion!) AFAIU, crlf conversion will not change the working tree copy of your file on commit. skimo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-16 9:03 ` Andy Parkins 2007-04-16 15:54 ` Sven Verdoolaege @ 2007-04-16 15:58 ` Linus Torvalds 2007-04-16 23:25 ` Weird shallow-tree conversion state, and branches of shallowtrees David Lang 2007-04-17 9:45 ` Weird shallow-tree conversion state, and branches of shallow trees Andy Parkins 2007-04-16 19:41 ` Junio C Hamano 2 siblings, 2 replies; 34+ messages in thread From: Linus Torvalds @ 2007-04-16 15:58 UTC (permalink / raw) To: Andy Parkins; +Cc: git, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson On Mon, 16 Apr 2007, Andy Parkins wrote: > > It's not an accurate analogy at all. Your conclusion is your supposition - > it's stupid because it's stupid. I don't understand what the huge problems > are - all you've done is say again that it's a problem to have keyword > expansion. Why? What problem does it actually cause? The easiest way to explain it is that keyword expansion is like crlf, just a million times worse (but if you were to do it in git, you'd literally do it in the same path that does crlf expansion). Like crlf: - it requires you to be careful about binary vs non-binary, and corrupts binary files subtly. - it never appears to be a problem as long as you stay inside the "same system", because everybody just agrees. But why did I actually implement auto-CRLF, if I'm so against it? Because keyword expansion has a lot of problems that CRLF does *not* have: - pretty much every single tool out there actually handles CRLF automatically. When you send emails from a CRLF system to a non-CRLF system, the CRLF will just be removed. Why? Because tools *outside* the SCM already know about "text vs binary", and while you can certainly screw it up (use a CRLF system to generate a kernel patch and send it as a binary attachment, and it won't apply for me, for example), you actually have to work at it a bit. - A transformation like LF<->CRLF is "stateless". 
Anybody can translate a file between CRLF and LF without having to know anything at all, so even *if* somebody sends me a patch with CRLF (and it actually happens: the amount of whitespace damage that people can do with email is just surprisingly high, and people occasionally use Windows machines to send me kernel patches, probably because they send email from some other machine than the one they did development on). - Related to the statelessness: CRLF is a "global" operation, and doesn't depend on file history or placement. Keyword expansion explicitly does *not* work that way, since the whole *point* of keywords is to make it depend on its place in history! An example of real-world problems with that lack of statelessness of keywords is something as simple as "git rebase". Think about what it does: it moves a commit around in history. But then think about *how* it does that. [ Ok, take a break here, and think about why "keyword expansion" might be a problem for "git rebase" in a way that CRLF is not, before you read on ] Hint: the reason statefulness is broken for things like "git rebase" is that the natural operation for something like that is to generate a patch, and carry it forward. Now, what is in the patch? Keywords. Will the patch apply to the target? Yes? No? See? Keywords means that you suddenly have merge problems with something as simple as patches. Does this matter in CVS? Not often. CVS is so limited that you cannot much do those operations anyway, but if you've ever done a merge in CVS, keyword expansion tends to be one of the things that just make it more complicated. So now you have to remember flags like "-kk" that disable keywords. (Not a lot of people actually do merges in CVS - branches are hard to use to begin with, so the only people who do branches tend to be pretty hardcore CVS people, and once you've learnt enough to do a branch, keyword expansion is the least of your problems.
But it's *one* reason - however small compared to the other reasons - that doing things like merging in CVS is just more painful than it should be) Or what about generating a diff between two branches? Keywords are a total *nightmare*. Do you realize just how *fast* git is in diff generation. Have you ever done "cvs diff"? Have you ever *thought* about how git can be so fast? Hint: we don't even *look* at the contents for most files. But if the content is "generated" depending on history, you just screwed that up too. Or what about something as seemingly unrelated as "git grep". You may not even *realize* how nasty a problem it is when you have two different representations of the same data: one that has keywords in it and is checked out, and one that does not. Which one should you choose? Which one is the right one? What about the git optimization of using the checked-out data because it doesn't need any unpacking? Again, none of these things are problems with CRLF: CRLF is an issue that is pretty much *defined* to not matter for text-files. If you do a "grep", it doesn't matter if lines end in LF or CRLF. If you do a diff, line ending differences (a) shouldn't exist in the first place because they are stateless and (b) even if they were to exist, they shouldn't change the diff, because LF and CRLF are the same in text. And the whole keyword issue gets *worse* when you move between repositories. If you stay "inside" the SCM, you can generally teach it to ignore them. For example, going back to the "git rebase" example (or the "git grep" one, for that matter), you can just define that it's done without keyword expansion. But when you move the data between people? 
That's exactly where keyword expansion is enabled, and now you not only make things like "git diff" fundamentally broken and much much slower (in fact, it *cannot*work* in the git model, because we don't even *have* tree history, so you cannot add keywords to a tree!), you also guarantee that the end result is much less useful, because now when you send the patch to others, they'll have all the same issues that you had to work around locally. I don't know if I can convince you, but take it from me, keyword expansion is fundamentally broken in the first place, but it's *more* so with git than with CVS, for example. In CVS, the reason you can do keyword expansion in the first place is: - it's file-based to begin with. A file actually *has* history in CVS, in a way it fundamentally does *not* have in git. So when you generate a diff on a file, the revision information is "just there". That's simply not true in git. There *is* no per-file revision information. You cannot know who touched the file last, for example, without starting from a commit, and doing very expensive things. - it's slow to begin with. This is related to the above thing: exactly because CVS is file-based and not content-based, when you do things like "cvs diff" you will walk files individually anyway. People *accept* (and I cannot imagine why) that an empty "cvs diff" on some big project will take minutes. And the problems aren't even about keyword expansion - keyword expansion is just a small detail. - it's centralized in more ways than one. You are simply not expected to work by applying patches between two unrelated CVS trees. It's not done. It cannot work. The closest you get is (a) merging. Which is *hell*. Again, keyword expansion is just a small detail in why it's hell, and people don't generally pick it up exactly because the merge problems are so much bigger.
(b) applying patches from the outside from people who do *not* use CVS, and thus don't generally touch things around the keywords (but even here, you actually end up having problems occasionally). - CVS really fundamentally has so many other problems that keyword expansion just isn't on people's radar. Yeah, it can corrupt data, but you're more likely to corrupt data with binary files other ways, so it's just not an issue. So basically, other (more fundamental) design mistakes in CVS make keywords seem like a better idea there, but all the keyword problems are just magnified ten-fold by the fact that git doesn't make those _other_ mistakes that CVS does. And don't get me wrong: I think RCS was a great step forward, and CVS was too. A few decades ago. But in git, we sometimes have to teach people to *not* make the mistakes they did with CVS. Keyword expansion is a small detail, and happily few enough people used it in CVS that it's so far not been a huge problem to teach people not to do it. We had to teach people that there's a difference between doing a local repository commit, and pushing that commit to a shared central point. That's a much more fundamental difference, and it's a lot harder to get your brain to accept that kind of change. In contrast, keywords look "trivial", but they really aren't. It's a fundamentally broken notion, even if it *sounds* like a small detail. I'll finish off trying to explain the problem in fundamental git terms: say you have a repository with two branches, A and B, and different history on a file "xyzzy" in those two branches, but because they both ended up applying the same patches, the actual file contents do end up being 100% identical. So they have the same SHA1. What is git diff A..B -- xyzzy supposed to print? And *I* claim that if you don't get an immediate and empty diff, your system is TOTALLY BROKEN. And now think about what keywords do. And realize that keywords are TOTALLY BROKEN!
Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
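The contrast drawn in the message above (CRLF conversion is stateless, keyword expansion depends on history) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the thread or from git: `expand_id` and the commit names are hypothetical, and the only git fact relied on is that a blob's name is the SHA-1 of a `blob <size>` header, a NUL byte, and the content.

```python
import hashlib
import re


def git_blob_id(data: bytes) -> str:
    # git names a blob by SHA-1 over "blob <size>", a NUL byte, then the content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def to_lf(data: bytes) -> bytes:
    # Stateless: only the bytes themselves are needed.
    return data.replace(b"\r\n", b"\n")


def to_crlf(data: bytes) -> bytes:
    # Stateless as well; normalise first so existing CRLF is not doubled.
    return to_lf(data).replace(b"\n", b"\r\n")


def expand_id(data: bytes, commit: str) -> bytes:
    # Hypothetical $Id$ expansion: the output depends on the place in history.
    return re.sub(rb"\$Id[^$]*\$", b"$Id: " + commit.encode("ascii") + b" $", data)


# The same stored content reached through two different histories.
stored = b"# $Id$\nanswer = 42\n"
on_a = stored
on_b = stored

# Identical content means identical blob names, so "git diff A..B -- xyzzy"
# can report "no change" without reading the file contents at all.
same_blob = git_blob_id(on_a) == git_blob_id(on_b)

# Line-ending conversion round-trips with no knowledge of history.
round_trip_ok = to_lf(to_crlf(stored)) == stored

# With keyword expansion the checked-out bytes differ although the blobs match.
checkout_a = expand_id(on_a, "commit-on-A")
checkout_b = expand_id(on_b, "commit-on-B")
expanded_differ = checkout_a != checkout_b

print(same_blob, round_trip_ok, expanded_differ)
```

The last comparison is the point of the "xyzzy" example: once expansion depends on the commit, two files that are byte-identical in the repository no longer compare equal in the working tree.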
* Re: Weird shallow-tree conversion state, and branches of shallowtrees 2007-04-16 15:58 ` Linus Torvalds @ 2007-04-16 23:25 ` David Lang 2007-04-17 19:50 ` David Lang 2007-04-17 9:45 ` Weird shallow-tree conversion state, and branches of shallow trees Andy Parkins 1 sibling, 1 reply; 34+ messages in thread From: David Lang @ 2007-04-16 23:25 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Parkins, git, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson I have a different situation where I'm interested in keyword expansions, and am waiting for the appropriate hooks to be added to git to allow me to use it. I have a bunch of config files on different servers that are logically equivalent, even though they have different values in some fields; there is a translation table in my software that tells it what to do. I'd really like to have a version control repository that I can share/move/replicate across the machines. to do this on checkin the software would need to run my helper to create a 'generic' version and check that in. on checkout it would need to run my helper to take the generic version and make the host specific version. a lot of the problems that you refer to in your message apply to most of the things that have been discussed related to gitattributes. if improperly used it can corrupt the data (either by the checkin/checkout munging or inappropriately merging things) it breaks the 1-1 correspondence between the packed version and the checked out version. On Mon, 16 Apr 2007, Linus Torvalds wrote: > On Mon, 16 Apr 2007, Andy Parkins wrote: > > [ Ok, take a break here, and think about why "keyword expansion" might be > a problem for "git rebase" in a way that CRLF is not, before you read on ] > > Hint: the reason statefulness is broken for things like "git rebase" is > that the natural operation for something like that is to generate a patch, > and carry it forward. Now, what is in the patch? Keywords. Will the patch > apply to the target? Yes? No?
if you send a patch, that patch needs to be relative to the canonical version, namely what's checked into the SCM. if your patch includes keywords it won't apply cleanly to a checked-out copy of the file. any merging and merge resolution needs to be based on the canonical version (i.e. one that doesn't go through the checkin/out conversion) > See? Keywords means that you suddenly have merge problems with something > as simple as patches. Does this matter in CVS? Not often. CVS is so > limited that you cannot much do those operations anyway, but if you've > ever done a merge in CVS, keyword expansion tends to be one of the things > that just make it more complicated. So now you have to remember flags > like "-kk" that disable keywords. I don't think the problems with patches are insurmountable. if everyone in the project is using git then you don't have to worry about anything, things will just work (except for manually fixing failed merges) I would definitely agree that sprinkling a little of this into a large project is going to massively confuse people > Or what about generating a diff between two branches? Keywords are a total > *nightmare*. Do you realize just how *fast* git is in diff generation. > Have you ever done "cvs diff"? Have you ever *thought* about how git can > be so fast? Hint: we don't even *look* at the contents for most files. But > if the content is "generated" depending on history, you just screwed that > up too. you do a diff of the canonical files in the repository, the same way you do today. > Or what about something as seemingly unrelated as "git grep". You may not > even *realize* how nasty a problem it is when you have two different > representations of the same data: one that has keywords in it and is > checked out, and one that does not. Which one should you choose? Which one > is the right one? What about the git optimization of using the checked-out > data because it doesn't need any unpacking?
the one with the keywords is the one to choose. and you suffer a performance hit because you can't use the checked-out version (without running it through the conversion, which is a performance hit itself) > And the whole keyword issue gets *worse* when you move between > repositories. If you stay "inside" the SCM, you can generally teach it to > ignore them. For example, going back to the "git rebase" example (or the > "git grep" one, for that matter), you can just define that it's done > without keyword expansion. right, this would avoid most of the problems > But when you move the data between people? That's exactly where keyword > expansion is enabled, and now you not only make things like "git diff" > fundamentally broken and much much slower (in fact, it *cannot*work* in > the git model, because we don't even *have* tree history, so you cannot > add keywords to a tree!), you also guarantee that the end result is much > less useful, because now when you send the patch to others, they'll have > all the same issues that you had to work around locally. why would you do keyword expansion when moving the files between different people's repositories? or is that still considered 'inside the SCM'? > I don't know if I can convince you, but take it from me, keyword expansion > is fundamentally broken in the first place, but it's *more* so with git > than with CVS, for example. > > In CVS, the reason you can do keyword expansion in the first place is: > > - it's file-based to begin with. A file actually *has* history in CVS, in > a way it fundamentally does *not* have in git. So when you generate a > diff on a file, the revision information is "just there". That's simply > not true in git. There *is* no per-file revision information. You > cannot know who touched the file last, for example, without starting > from a commit, and doing very expensive things. this is a valid argument against the keyword being a version string. it's not necessarily relevant to other uses.
> - it's slow to begin with. This is related to the above thing: exactly > because CVS is file-based and not content-based, when you do things > like "cvs diff" you will walk files individually anyway. People > *accept* (and I cannot imagine why) that an empty "cvs diff" on some > big project will take minutes. And the problems aren't even about > keyword expansion - keyword expansion is just a small detail. if you define the keyword to be equivalent there is no need to look at the content of all the files. > - it's centralized in more ways than one. You are simply not expected to > work by applying patches between two unrelated CVS trees. It's not > done. It cannot work. The closest you get is > (a) merging. Which is *hell*. Again, keyword expansion is just a > small detail in why it's hell, and people don't generally pick > it up exactly because the merge problems are so much bigger. > (b) applying patches from the outside from people who do *not* use > CVS, and thus don't generally touch things around the > keywords (but even here, you actually end up having problems > occasionally). external patches could be a problem, but there are two ways to deal with them. 1. have the patch be against the version of the file with the keywords expanded, and have the result checked in (collapsing the keywords) 2. have the patch be against the version of the file with the keywords collapsed. this _would_ require the ability to bypass the expansion of the keywords and is not something you would want to do very much. of these two, I suspect that #1 would make sense in most cases, and should be the default. > I'll finish off trying to explain the problem in fundamental git terms: > say you have a repository with two branches, A and B, and different > history on a file "xyzzy" in those two branches, but because they both > ended up applying the same patches, the actual file contents do end up > being 100% identical. So they have the same SHA1. 
> > What is > > git diff A..B -- xyzzy > > supposed to print? > > And *I* claim that if you don't get an immediate and empty diff, your > system is TOTALLY BROKEN. I agree, and what I've been talking about above would produce exactly this. > And now think about what keywords do. And realize that keywords are > TOTALLY BROKEN! it may be that we are thinking of different things when we use the term 'keywords', and that may be why we are seeing different levels of problems. David Lang ^ permalink raw reply [flat|nested] 34+ messages in thread
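The check-in/check-out helper described at the top of the message above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the host names, the `@...@` placeholder syntax, the `HOST_VALUES` table and both function names are hypothetical, and the message does not say how such a helper would be wired into git.

```python
# Hypothetical per-host translation table: placeholder -> host-specific value.
HOST_VALUES = {
    "web1": {"@LISTEN_ADDR@": "10.0.0.11", "@LOG_DIR@": "/var/log/web1"},
    "web2": {"@LISTEN_ADDR@": "10.0.0.12", "@LOG_DIR@": "/srv/logs"},
}


def to_generic(text: str, host: str) -> str:
    # Check-in direction: replace this host's values with placeholders.
    for placeholder, value in HOST_VALUES[host].items():
        text = text.replace(value, placeholder)
    return text


def to_host(text: str, host: str) -> str:
    # Check-out direction: replace placeholders with this host's values.
    for placeholder, value in HOST_VALUES[host].items():
        text = text.replace(placeholder, value)
    return text


web1_conf = "listen 10.0.0.11\nlogdir /var/log/web1\n"
web2_conf = "listen 10.0.0.12\nlogdir /srv/logs\n"

# Both hosts' files collapse to one generic version, which is what gets stored.
generic = to_generic(web1_conf, "web1")
same_generic = generic == to_generic(web2_conf, "web2")

# Checking out on either host reproduces that host's file.
round_trip = to_host(generic, "web1") == web1_conf
cross_host = to_host(generic, "web2") == web2_conf

print(generic, same_generic, round_trip, cross_host)
```

Unlike a `$Id$` keyword, this translation depends only on the host, not on history, so identical stored content stays identical across branches. The plain substring replacement only round-trips while the host values are unambiguous in the file, which is one concrete form of the "if improperly used it can corrupt the data" caveat in the message.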
* Re: Weird shallow-tree conversion state, and branches of shallowtrees 2007-04-16 23:25 ` Weird shallow-tree conversion state, and branches of shallowtrees David Lang @ 2007-04-17 19:50 ` David Lang 0 siblings, 0 replies; 34+ messages in thread From: David Lang @ 2007-04-17 19:50 UTC (permalink / raw) To: Linus Torvalds Cc: Andy Parkins, git, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson sorry for the re-send, I didn't see this go through the list and it's relevant to the current discussion On Mon, 16 Apr 2007, David Lang wrote: > Date: Mon, 16 Apr 2007 16:25:37 -0700 (PDT) ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-16 15:58 ` Linus Torvalds 2007-04-16 23:25 ` Weird shallow-tree conversion state, and branches of shallowtrees David Lang @ 2007-04-17 9:45 ` Andy Parkins 1 sibling, 0 replies; 34+ messages in thread From: Andy Parkins @ 2007-04-17 9:45 UTC (permalink / raw) To: git; +Cc: Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson On Monday 2007 April 16 16:58, Linus Torvalds wrote: Thank you for the detailed response. My apologies for the delay in replying, I did write and send a response, but it's gone missing in the world of google's SMTP server. I'll try and resend when I return home. Andy -- Dr Andy Parkins, M Eng (hons), MIET andyparkins@gmail.com ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Weird shallow-tree conversion state, and branches of shallow trees 2007-04-16 9:03 ` Andy Parkins 2007-04-16 15:54 ` Sven Verdoolaege 2007-04-16 15:58 ` Linus Torvalds @ 2007-04-16 19:41 ` Junio C Hamano 2007-04-16 20:55 ` Andy Parkins 2 siblings, 1 reply; 34+ messages in thread From: Junio C Hamano @ 2007-04-16 19:41 UTC (permalink / raw) To: Andy Parkins Cc: git, Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson Andy Parkins <andyparkins@gmail.com> writes: > On Sunday 2007 April 15 21:51, Linus Torvalds wrote: > >> > Now, I print out that diagram and pin it to my wall - sometimes copies >> > of it are given to others. I do this on a regular basis. >> >> And is there *any* reason why you don't just do that as an "export" >> option, when it's very clear that people won't send diffs that include it > > Of course there is a reason - the file I edit is the SVG itself, in inkscape > while editing that file I press "print" to get a print out. Why on earth > would I want to jump through hoops by closing the file I'm editing, running > some export script to a temporary file that I don't want, then open up > Inkscape again, check the export looks okay and then print - on what planet > is /that/ simpler? I have one question. In your workflow, when do you "print"? If you did this: $ cvs update draw.svg $ inkscape draw.svg ... do more editing ... press "PRINT" $ cvs diff draw.svg the final "cvs diff" would say you have such and such changes to the drawing file you just printed since the checked-in version. However, doesn't "$Id: ... $" embedded in the printed copy say it is from the last checked-in version? Is inkscape aware of the "$Id: ... $" keyword and modifies such string by munging it to "$Id: ..., modified $", once you make a local modification to the document? Otherwise you cannot tell if the printed copy is pristine and match what the $Id$ keyword claims it is. 
Or maybe in your workflow, such a local modification may not actually matter because you made a habit of not making a drastic edit before printing. Or perhaps maybe you never print a locally modified copy.

Does Inkscape have a batch mode operation? It might be an option to have something like this in the Makefile if it does (I do not know if it does, and if so what the syntax is, so this is totally made up):

print:: draw.svg
	describe=$$(git describe HEAD) && \
	git cat-file -p HEAD:draw.svg | \
	sed -e 's/\$$Id\$$/$$Id: '"$$describe"' $$/g' | \
	inkscape --print --stdin
.PHONY: print

^ permalink raw reply [flat|nested] 34+ messages in thread
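The export step suggested above, combined with the earlier advice in the thread to turn off write permissions on the generated file, can be sketched in Python. This is a hedged illustration rather than anything proposed verbatim in the thread: `stamp`, `export_stamped`, the output file name and the version string are hypothetical, and obtaining the version (for example from `git describe`) is deliberately left out so the sketch runs without a repository.

```python
import os
import re
import stat
import tempfile

# Matches both the collapsed "$Id$" and an already expanded "$Id: ... $".
ID_PATTERN = re.compile(r"\$Id[^$]*\$")


def stamp(text: str, version: str) -> str:
    # Return a copy of text with every Id keyword set to the given version.
    return ID_PATTERN.sub(lambda _match: f"$Id: {version} $", text)


def export_stamped(source_text: str, version: str, out_path: str) -> None:
    # Write a stamped copy for printing; the tracked file is never modified.
    if os.path.exists(out_path):
        # A previous export is read-only, so make it writable before replacing it.
        os.chmod(out_path, stat.S_IRUSR | stat.S_IWUSR)
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(stamp(source_text, version))
    # Read-only, so the export is not edited by mistake instead of the original.
    os.chmod(out_path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)


svg = "<svg><text>$Id$</text></svg>\n"
out_path = os.path.join(tempfile.mkdtemp(), "draw-print.svg")

# "v1.0-3-gabc1234" is a made-up string in the style of "git describe" output.
export_stamped(svg, "v1.0-3-gabc1234", out_path)

with open(out_path, encoding="utf-8") as exported_file:
    exported = exported_file.read()
print(exported)
```

Because the stamped copy lives outside the tracked file, the repository content stays free of history-dependent text, while the printout still carries a version.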
* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16 19:41 ` Junio C Hamano
@ 2007-04-16 20:55 ` Andy Parkins
  2007-04-17 21:24 ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Parkins @ 2007-04-16 20:55 UTC
To: git
Cc: Junio C Hamano, Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson

On Monday 2007, April 16, Junio C Hamano wrote:

> In your workflow, when do you "print"?

After a save and commit.  Otherwise - as you point out, the id is wrong.

> the final "cvs diff" would say you have such and such changes to
> the drawing file you just printed since the checked-in version.
> However, doesn't "$Id: ... $" embedded in the printed copy say
> it is from the last checked-in version?

Yep.  You will get no argument from me that keywords are by no means
definitive.

> Is inkscape aware of the "$Id: ... $" keyword and modifies such
> string by munging it to "$Id: ..., modified $", once you make a

Nope.  Inkscape knows nothing about the expansion.  However, even if I
wasn't careful to only print out checked in files, it would still
narrow down the possible versions to one of two.

> local modification to the document?  Otherwise you cannot tell
> if the printed copy is pristine and match what the $Id$ keyword
> claims it is.

Correct.  Every user of keywords is aware that the keyword doesn't
update all the time - in fact there's nothing to stop you changing the
keyword yourself to an utter lie.  I think the assumption is that you
aren't fighting your own tools though.

> Or maybe in your workflow, such a local modification may not
> actually matter because you made a habit of not making a drastic
> edit before printing.

Yep.

> Or perhaps maybe you never print a locally modified copy.

Yep.  In fact, for me, most of the time I'm printing a diagram that was
checked in a number of revisions ago.  It's not the case that I
modify-print.  However, that's just me.

> Does Inkscape have a batch mode operation?  It might be an
> option to have something like this in the Makefile if it does (I
> do not know if it does, and if so what the syntax is, so this is
> totally made up):

I think it does as it happens; and your little script is just the sort
of thing I will use when I get around to fixing this hole.

However, it's missing the point to take my example as an unsolved
problem - there are plenty of ways I can get what I want; I brought it
up merely as a counter to the statement that there were no valid
situations for wanting keyword expansion.

Andy
--
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
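The "pristine or modified" question raised in the quoted text can be decided by a wrapper script even though Inkscape itself knows nothing about keywords. The following is only a sketch under stated assumptions: `id_keyword` is a hypothetical helper, the committed copy is assumed to have been written to a file beforehand (for instance with the `git cat-file -p HEAD:draw.svg` command from the quoted recipe), and the `, modified` wording simply follows the form suggested in the quote.

```shell
#!/bin/sh
# id_keyword VERSION COMMITTED WORKING
# Print the keyword text to stamp on a print-out:
#   "$Id: VERSION $"            if WORKING is byte-identical to COMMITTED
#   "$Id: VERSION, modified $"  otherwise
# Hypothetical helper, not part of Git or Inkscape.
id_keyword() {
	if cmp -s "$2" "$3"; then
		printf '$Id: %s $\n' "$1"
	else
		printf '$Id: %s, modified $\n' "$1"
	fi
}
```

Comparing whole files with `cmp` keeps the sketch independent of any repository state; a real wrapper would more likely ask Git directly whether the working copy differs from the last commit.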
* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-16 20:55 ` Andy Parkins
@ 2007-04-17 21:24 ` Junio C Hamano
  2007-04-17 21:51 ` Andy Parkins
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2007-04-17 21:24 UTC
To: Andy Parkins
Cc: git, Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson

Andy Parkins <andyparkins@gmail.com> writes:

> However, it's missing the point to take my example as an unsolved
> problem - there are plenty of ways I can get what I want; I brought it
> up merely as a counter to the statement that there were no valid
> situations for wanting keyword expansion.

That's actually quite different from what you said.

Andy Parkins <andyparkins@gmail.com> writes:

> Of course there is a reason - the file I edit is the SVG
> itself, in inkscape while editing that file I press "print" to
> get a print out.  Why on earth would I want to jump through
> hoops by closing the file I'm editing, running some export
> script to a temporary file that I don't want, then open up
> Inkscape again, check the export looks okay and then print -
> on what planet is /that/ simpler?

You were claiming that with built-in keyword expansion what you
want becomes /simpler/.  I questioned that.

Maybe it's just me, who is not a GUI person [*1*], but to me,
having to start inkscape, mouse around to find the "Print"
button and print feels much more cumbersome than simply typing
"make print".

[Footnote]

*1* Not in the sense I do not program GUIy applications, but in
the sense that I do not usually _use_ GUI applications.
* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-17 21:24 ` Junio C Hamano
@ 2007-04-17 21:51 ` Andy Parkins
  0 siblings, 0 replies; 34+ messages in thread
From: Andy Parkins @ 2007-04-17 21:51 UTC
To: git
Cc: Junio C Hamano, Linus Torvalds, Nguyen Thai Ngoc Duy, Shawn O. Pearce, Robin H. Johnson

On Tuesday 2007, April 17, Junio C Hamano wrote:

> Andy Parkins <andyparkins@gmail.com> writes:
> > However, it's missing the point to take my example as an unsolved
> > problem - there are plenty of ways I can get what I want; I brought
> > it up merely as a counter to the statement that there were no valid
> > situations for wanting keyword expansion.
>
> That's actually quite different from what you said.

Sorry; I didn't express it very well - the thing that started all this
was the statement that there was no valid use case for keywords.  I
just gave an example.  I felt that the thread was moving away from
keywords and towards solving my particular problem - which is all
appreciated, but wasn't the point.

Running makefile recipes or extra scripts are all valid methods and
pragmatic working-with-what-git-does-now solutions.  I wanted to
distinguish between what I could do now and what I could do with
keyword support.

> You were claiming that with built-in keyword expansion what you
> want becomes /simpler/.  I questioned that.

Well it does from the point of view of pressing "print".

> Maybe it's just me, who is not a GUI person [*1*], but to me,
> having to start inkscape, mouse around to find the "Print"
> button and print feels much more cumbersome than simply typing
> "make print".

Again, that was addressing my particular problem - good stuff.
However, it's just luck that inkscape has a batch mode - there's no
guarantee for that.  I could just swap the example around a bit, what
about if it was an OpenOffice document that I want to have transparent
compression/decompression and I've set the properties tag to contain
"$Id$".  There is no amount of scripting that will enable batch
printing of that.

Anyway - I've wasted enough of your time with this foolishness now.
It's dropped, consider me silenced on this subject ;-)

Andy
--
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
* Re: Weird shallow-tree conversion state, and branches of shallow trees
  2007-04-15  4:31 ` Shawn O. Pearce
  2007-04-15  5:57 ` Nguyen Thai Ngoc Duy
@ 2007-04-15  9:44 ` Robin H. Johnson
  1 sibling, 0 replies; 34+ messages in thread
From: Robin H. Johnson @ 2007-04-15 9:44 UTC
To: Git Mailing List

On Sun, Apr 15, 2007 at 12:31:46AM -0400, Shawn O. Pearce wrote:

> Mail them a DVD of the Git import, have them load it locally,
> and use --reference for all future clones.  With Git it's possible
> to build fast throwaway trees from any random URL, so long as you
> keep at least one repository available locally to act as a reference.

Ok, that makes it even more worthwhile for them to keep one tree
locally, I didn't think of that :-).

> > the commit is accepted.  The 'update' hook documentation suggests that
> > ACLs should be possible and implemented via that.
> Yes.  I run probably the most paranoid update hook in existence.
> If you want a copy let me know, I'll send it to you.  It's a Perl
> script that verifies the 'committer ' line matches the UNIX uid (by
> doing a table lookup) for every new commit or tag being introduced
> to the repository.  It also verifies that the user can update that
> branch, create it, delete it, or rewind it.
>
> It sounds like you would need to add some additional rules about
> specific paths being modified only by certain people in certain
> branches (for the SELinux stuff), and running other validations in
> the documentation (whatever that is).

Yes please, it would be greatly appreciated.  I'll hack path ACLs into
it, and feed it back to contrib/?  (CVS and SVN ship ACL stuff in their
contrib/, so we could probably follow suit safely).

> What you could do is create a program that mangles the files before
> delivery.  You would probably want to do something like:
>
> $Id: 7fbf239:path/to/file$

There's one core problem with mangling after the fact: it's going to
break checksum/gpg verification later.

Here's the existing CVS process as a comparison.

1. Developer creates/changes foo-1.2.ebuild (cvs add, but not cvs ci).
2. Runs the local verify+commit tool (repoman).
   (The remaining steps are done by repoman now.)
3. Generates the initial Manifest (contains SHA256/MD5/RIPEMD160 etc.).
4. Commits the initial Manifest AND the files from the developer.
5. Regenerates the Manifest because of any keywords in the files.
6. Manifest is clear-signed with gpg.
7. Signed Manifest is committed.

We can't require the re-processing of the files before they can be
verified, as that removes the ability for users to easily verify them
with standard tools (md5sum, sha256sum).

The direct conversion of such a process to insert the $Id$ and then
re-commit that $Id$ runs into chicken-and-egg problems as well, so
either git needs to insert the keyword, or the file can't be changed.

--
Robin Hugh Johnson
Gentoo Linux Developer & Council Member
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
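The verification problem described above is easy to reproduce at small scale. The sketch below is illustrative only: the file contents are made up, `digest` is a hypothetical helper that picks whichever checksum tool is installed, and the expanded keyword uses the `$Id: 7fbf239:path/to/file$` form from the quoted suggestion. It records a checksum of the file as committed, rewrites the keyword the way a delivery-time mangling step would, and compares.

```shell
#!/bin/sh
# digest: print a checksum of stdin.  Prefers sha256sum, then shasum,
# and falls back to POSIX cksum (a CRC, good enough for illustration).
digest() {
	if command -v sha256sum >/dev/null 2>&1; then
		sha256sum | cut -d' ' -f1
	elif command -v shasum >/dev/null 2>&1; then
		shasum -a 256 | cut -d' ' -f1
	else
		cksum | cut -d' ' -f1
	fi
}

# The file as committed, keyword unexpanded (made-up example content).
committed='# $Id$
DESCRIPTION="example package"'

# The same file after a delivery step rewrote the keyword.
delivered=$(printf '%s\n' "$committed" |
	sed -e 's/\$Id\$/$Id: 7fbf239:path\/to\/file$/')

recorded=$(printf '%s\n' "$committed" | digest)  # what the Manifest holds
seen=$(printf '%s\n' "$delivered" | digest)      # what the user computes

if [ "$recorded" = "$seen" ]; then
	echo "checksum matches"
else
	echo "checksum mismatch: delivered file no longer verifies"
fi
```

Because the two byte streams differ, the comparison takes the mismatch branch. Any checksum recorded before expansion fails against the delivered file unless the user first undoes the expansion, which is the re-processing step the message rules out.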
end of thread, other threads:[~2007-04-17 21:51 UTC | newest]

Thread overview: 34+ messages
2007-04-12  0:53 Weird shallow-tree conversion state, and branches of shallow trees Robin H. Johnson
2007-04-14  8:56 ` Johannes Schindelin
2007-04-15  0:03 ` Robin H. Johnson
2007-04-15  0:02 ` David Lang
2007-04-15  2:01 ` Robin H. Johnson
2007-04-15  4:31 ` Shawn O. Pearce
2007-04-15  5:57 ` Nguyen Thai Ngoc Duy
2007-04-15  8:54 ` Jakub Narebski
2007-04-15 18:18 ` Linus Torvalds
2007-04-15 19:51 ` Andy Parkins
2007-04-15 20:51 ` Linus Torvalds
2007-04-16  0:11 ` Bill Lear
2007-04-16  9:10 ` Andy Parkins
2007-04-16 15:17 ` Julian Phillips
2007-04-16  2:17 ` Robin H. Johnson
2007-04-16  3:01 ` Theodore Tso
2007-04-16  3:23 ` Nguyen Thai Ngoc Duy
2007-04-16 15:08 ` Linus Torvalds
2007-04-16 16:06 ` Nguyen Thai Ngoc Duy
2007-04-16  3:32 ` Robin H. Johnson
2007-04-16 17:00 ` Linus Torvalds
2007-04-17  4:16 ` Daniel Barkalow
2007-04-16 14:59 ` Linus Torvalds
2007-04-16  9:03 ` Andy Parkins
2007-04-16 15:54 ` Sven Verdoolaege
2007-04-16 15:58 ` Linus Torvalds
2007-04-16 23:25 ` Weird shallow-tree conversion state, and branches of shallowtrees David Lang
2007-04-17 19:50 ` David Lang
2007-04-17  9:45 ` Weird shallow-tree conversion state, and branches of shallow trees Andy Parkins
2007-04-16 19:41 ` Junio C Hamano
2007-04-16 20:55 ` Andy Parkins
2007-04-17 21:24 ` Junio C Hamano
2007-04-17 21:51 ` Andy Parkins
2007-04-15  9:44 ` Robin H. Johnson