* more git updates..
@ 2005-04-09 19:45 Linus Torvalds
2005-04-09 19:56 ` Linus Torvalds
` (5 more replies)
0 siblings, 6 replies; 179+ messages in thread
From: Linus Torvalds @ 2005-04-09 19:45 UTC (permalink / raw)
To: Petr Baudis, Randy.Dunlap, Ross Vandegrift, Ingo Molnar,
Dave Jones
Cc: Kernel Mailing List
Sorry guys,
several of you have sent me small fixes and scripts to "git", but I've
been busy on breaking/changing the core infrastructure, so I didn't get
around to looking at the scripts yet.
The good news is, the data structures/indexes haven't changed, but many of
the tools to interface with them have new (and improved!) semantics:
In particular, I changed how "read-tree" works, so that it now mirrors
"write-tree", in that instead of actually changing the working directory,
it only updates the index file (aka the "current directory cache" file)
from the tree.
To actually change the working directory, you'd first get the index file
setup, and then you do a "checkout-cache -a" to update the files in your
working directory with the files from the sha1 database.
Also, I wrote the "diff-tree" thing I talked about:
torvalds@ppc970:~/git> ./diff-tree 8fd07d4b7778cd0233ea0a17acd3fe9d710af035 8c6d29d6a496d12f1c224db945c0c56fd60ce941 | tr '\0' '\n'
<100664 4870bcf91f8666fc788b07578fb7473eda795587 Makefile
>100664 5493a649bb33b9264e8ed26cc1f832989a307d3b Makefile
<100664 9e1bee21e17c134a2fb008db62679048fc819528 cache.h
>100664 56ef561e590fd99e938bd47fd1f2c7ed46126ff0 cache.h
<100664 fd690acc02ef9c06d7c4c3541f69b10ca4b4f8c9 cat-file.c
>100664 6e6d89291ced17a406e64b97fe8bb96a22eefc9d cat-file.c
+100664 fd00e5603dcc4a93acceda0b8cb914fabc8645d5 checkout-cache.c
<100664 a4a8c3d9ef0c4cc6c82b96b5d1a91ac6d3bed466 commit-tree.c
>100664 236ceb7646e3f5d110fd83f815b82e94cc5b2927 commit-tree.c
+100664 01c92f2620a8e13e7cb7fd98ee644c6b65eeccb7 fsck-cache.c
<100664 0eaa053919e0cc400ab9bc40d9272360117e6978 init-db.c
>100664 815743e92dad7e451c65bab01448ee8ae9deeb56 init-db.c
<100664 e7bfaadd5d2331123663a8f14a26604a3cdcb678 read-cache.c
>100664 71d0cb6fe9b7ff79e3b2c5a61e288ac9f62b39dc read-cache.c
<100664 ec0f167a6a505659e5af6911c97f465506534c34 read-tree.c
>100664 f5c50ba79d02f002b9675fd8f129fa388e3282c6 read-tree.c
<100664 00a29c403e751c2a2a61eb24fa2249c8956d1c80 show-diff.c
>100664 b963dd738989bc92bf02352bbedad13a74e66a7d show-diff.c
<100664 aff074c63ac827801a7d02ff92781365957f1430 update-cache.c
>100664 3a672397164d5ff27a19a6888b578af96824ede7 update-cache.c
<100664 7abeeba116b2b251c12ae32c7b38cb048199b574 write-tree.c
>100664 9525c6fc975888a394477339db86216cd5bd5d7c write-tree.c
(ie the output of "diff-tree" has the same NUL-termination, but if you
insist on getting ASCII output, you can just use "tr" to change the NUL
into a NL).
The format of the "diff-tree" output is that the first character is "-"
for "remove file", "+" for "add file" and "<"/">" for "change file" (where
the "<" shows the old state, and ">" shows the new state).
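As a rough illustration of the record format just described, here is a minimal sketch in POSIX sh. The helper name `classify_records` is hypothetical, the sample records are copied from the output above rather than produced by a real "diff-tree" run, and it assumes no filename contains a newline (it converts NUL to NL the same way the "tr" example does).

```shell
#!/bin/sh
# Hypothetical helper: read NUL-terminated diff-tree records on stdin and
# print one line per record saying what the leading character means.
# Assumption: filenames contain no newline, since NUL is turned into NL.
classify_records() {
    tr '\0' '\n' | while IFS= read -r rec; do
        rest=${rec#?}           # everything after the first character
        kind=${rec%"$rest"}     # the first character itself
        case "$kind" in
            "+") echo "add: $rest" ;;
            "-") echo "remove: $rest" ;;
            "<") echo "old: $rest" ;;
            ">") echo "new: $rest" ;;
            *)   echo "unknown: $rec" ;;
        esac
    done
}

# Sample input built with printf instead of a real diff-tree invocation.
printf '%s\0' \
    '+100664 fd00e5603dcc4a93acceda0b8cb914fabc8645d5 checkout-cache.c' \
    '<100664 4870bcf91f8666fc788b07578fb7473eda795587 Makefile' \
    '>100664 5493a649bb33b9264e8ed26cc1f832989a307d3b Makefile' |
    classify_records
```

A "<" record is always followed by its ">" partner, so a consumer that wants old/new pairs has to look at two records at a time.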
Btw, the NUL-termination makes this really easy to use even in shell
scripts, ie you can do
diff-tree <sha1> <sha1> | xargs -0 do_something
and you'll get each line as one nice argument to your "do_something"
script. So a do_diff could be based on something like
#!/bin/sh
# Each argument is one diff-tree record: "<c><mode> <sha1> <filename>"
while [ "$1" != "" ]; do
	filename="$(echo "$1" | cut -d' ' -f3-)"
	first_sha="$(echo "$1" | cut -d' ' -f2)"
	second_sha="$(echo "$2" | cut -d' ' -f2)"
	c="$(echo "$1" | cut -c1)"
	case "$c" in
	"+")
		echo diff -u /dev/null "$filename($first_sha)";;
	"-")
		echo diff -u "$filename($first_sha)" /dev/null;;
	"<")
		# a change is two records: "<" (old) followed by ">" (new)
		echo diff -u "$filename($first_sha)" "$filename($second_sha)"
		shift;;
	*)
		echo WHAT?
		exit 1;;
	esac
	shift
done
which really shows what a horrid shell-person I am (I still use the old
tools I learnt to use fifteen years ago. I bet you can do it trivially in
perl or something sane, and I'm just stuck in the stone age of UNIX).
That makes it _very_ easy to parse. The example above is the diff between
the initial commit and one of the more recent trees, so it has changes to
everything, but a more normal thing would be
torvalds@ppc970:~/git> diff-tree 787763499dc4f8cc345bc6ed8ee1e0ae31adedd6 5b0c2695634b5bab2f5d63c7bb30f7e5815af470 | tr '\0' '\n'
<100664 01c92f2620a8e13e7cb7fd98ee644c6b65eeccb7 fsck-cache.c
>100664 81aa7bee003264ea302db835158e725eefa4012d fsck-cache.c
which tells you that the last commit changed just one file (it's from this
one:
torvalds@ppc970:~/git> cat-file commit `cat .dircache/HEAD`
tree 5b0c2695634b5bab2f5d63c7bb30f7e5815af470
parent 81c53a1d3551f358860731481bb2d87179d221e6
author Linus Torvalds <torvalds@ppc970.osdl.org> Sat Apr 9 12:02:30 2005
committer Linus Torvalds <torvalds@ppc970.osdl.org> Sat Apr 9 12:02:30 2005
Make "fsck-cache" print out all the root commits it finds.
Once I do the reference tracking, I'll also make it print out all the
HEAD commits it finds, which is even more interesting.
in case you care).
I've rsync'ed the new git repository to kernel.org, it should all be there
in /pub/linux/kernel/people/torvalds/git.git/ (and it looks like the
mirror scripts already picked it up on the public side too).
Can you guys re-send the scripts you wrote? They probably need some
updating for the new semantics. Sorry about that ;(
Linus
^ permalink raw reply [flat|nested] 179+ messages in thread

* Re: more git updates..
From: Linus Torvalds @ 2005-04-09 19:56 UTC (permalink / raw)
To: Petr Baudis, Randy.Dunlap, Ross Vandegrift, Ingo Molnar,
Dave Jones
Cc: Kernel Mailing List

On Sat, 9 Apr 2005, Linus Torvalds wrote:
>
> To actually change the working directory, you'd first get the index file
> setup, and then you do a "checkout-cache -a" to update the files in your
> working directory with the files from the sha1 database.

Btw, this will not overwrite any old files, so if you have an old version
of something, you'd need to do "checkout-cache -f -a" (and order matters:
the "-f" must come first).

This time I actually have a big comment at the top of the checkout-cache.c
file trying to explain the logic.

Linus
* Re: more git updates..
From: Petr Baudis @ 2005-04-09 20:07 UTC (permalink / raw)
To: Linus Torvalds
Cc: Randy.Dunlap, Ross Vandegrift, Kernel Mailing List

Hello,

Dear diary, on Sat, Apr 09, 2005 at 09:45:52PM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> The good news is, the data structures/indexes haven't changed, but many of
> the tools to interface with them have new (and improved!) semantics:
>
> In particular, I changed how "read-tree" works, so that it now mirrors
> "write-tree", in that instead of actually changing the working directory,
> it only updates the index file (aka "current directory cache" file from
> the tree).
>
> To actually change the working directory, you'd first get the index file
> setup, and then you do a "checkout-cache -a" to update the files in your
> working directory with the files from the sha1 database.

that's great. I was planning to do something with this since currently
it really annoyed me. I think I will like this, even though I didn't
look at the code itself yet (just on my way).

> Also, I wrote the "diff-tree" thing I talked about:
..snip..

Hmm, I wonder, is this better done in C instead of a simple shell
script, like my gitdiff.sh? I'd say it is more flexible and probably
hardly performance-critical to have this scripted, and not difficult at
all provided you have ls-tree. But maybe I'm just too fond of my
script... ;-) (Ok, there's some trouble when you want to have newlines
and spaces in file names, and join appears to be awfully ignorant about
this... :[ )

BTW, do we care about changed modes? If so, they should probably have
their place in the diff-tree output.

BTW#2, I hope you will merge my ls-tree anyway, even though there is no
user for it currently... I should quickly figure out some. :-)

> Can you guys re-send the scripts you wrote? They probably need some
> updating for the new semantics. Sorry about that ;(

I'll try to merge ASAP.

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
* Re: more git updates..
From: Linus Torvalds @ 2005-04-09 21:00 UTC (permalink / raw)
To: Petr Baudis
Cc: Randy.Dunlap, Ross Vandegrift, Kernel Mailing List

On Sat, 9 Apr 2005, Petr Baudis wrote:
>
> > Also, I wrote the "diff-tree" thing I talked about:
> ..snip..
>
> Hmm, I wonder, is this better done in C instead of a simple shell
> script, like my gitdiff.sh?

With 17,000 files in the kernel, and most commits just changing a small
number of them, I actually think "diff-tree" matters.

You use "join" (which is quite reasonable), but let's put it this way:
just the list of files in the current kernel is about half a megabyte of
data. Ie your temporary files that you use in the "ls-tree + ls-tree +
join" is actually going to be quite sizeable.

My goal here is that the speed of "git" really should be almost totally
independent of the size of the project. You clearly cannot avoid _some_
size-dependency: my "diff-tree" clearly also has to work through the same
1MB of data, but I think it's worth making the constant factor be as small
as humanly possible.

I just tried checking in a kernel tree tar-file, and the initial checkin
(which is all the compression and the sha1 calculations for every single
file) took about 1:35 (minutes, not hours ;).

Doing a commit (trivial change to the top-level Makefile) and then doing a
"diff-tree" between those two things took 0.05 seconds using my C thing.
Ie we're talking so fast that we really don't care.

Doing a "show-diff" takes 0.15 secs or so (that's all the "stat" calls),
and now that I test it out I realize that the most expensive operation is
actually _writing_ the "index" file out. These are the two most expensive
steps:

	torvalds@ppc970:~/lx-test/linux-2.6.12-rc2> time update-cache Makefile
	real 0m0.283s
	user 0m0.171s
	sys 0m0.113s

	torvalds@ppc970:~/lx-test/linux-2.6.12-rc2> time write-tree
	5ca21c9d808fa4bee1eb6948a59dfb9c7d73f36a
	real 0m0.441s
	user 0m0.354s
	sys 0m0.087s

ie with the current infrastructure it looks like I can do a "patch +
commit" in less than one second on the kernel, and 0.75 secs of that is
because the "tree" file actually grows pretty large:

	cat-file tree 5ca21c9d808fa4bee1eb6948a59dfb9c7d73f36a | wc -c

says that the uncompressed tree-file is 950,874 bytes. Compressing it
means that the archival version of it is "just" 462,546 bytes, but this is
really the part that is going to eat _tons_ of disk-space.

In other words, each "commit" file is very small and cheap, but since
almost every commit will also imply a totally new tree-file, "git" is
going to have an overhead of half a megabyte per commit. Oops.

Damn, that's painful. I suspect I will have to change the format somehow.

One option (which I haven't tested yet) is that since the tree-file is
already sorted, I could always write it out with the common subdirectory
part "collapsed", ie instead of writing

	...
	include/asm-i386/mach-default/bios_ebda.h
	include/asm-i386/mach-default/do_timer.h
	...

I'd write just

	...
	///bios_ebda.h
	///do_timer.h
	...

since the directory names are implied by the predecessor.

However, that doesn't help with the 20-byte sha1 associated with each
file, which is also obviously uncompressible, so with 17,000+ files, we
have a minimum overhead of about 350kB per tree-file.

So even if I did the pathname compression, it wouldn't help all that much.
I'd only be removing the only part of the file that _is_ very
compressible, and I'd probably end up with something that isn't all that
far away from the 450kB+ it is now.

I suspect that I have to change the file format. Maybe make the "tree"
object a two-level thing, and have a "directory" object.

Then a "tree" object would point to a "directory" object, which would in
turn point to the individual files (and other "directory" objects, of
course). That way a commit that only changes a few files will only need to
create a few new "directory" objects, instead of creating one huge "tree"
object.

Sadly, that will make "diff-tree" potentially more expensive. On the other
hand, maybe not: it will also speed it _up_, since directories that are
totally shared will be trivially seen as such and need no further
operation.

Thoughts? That would break the current repository formats, and I'd have to
create a converter thing (which shouldn't be that bad, of course).

I don't have to do it right now. In fact, I'd almost prefer for the
current thing to become good enough that it's not painful to work with,
since right now I'm using it to develop itself. Then I can convert the
format with an automated script later, before I actually start working on
the kernel...

> BTW, do we care about changed modes? If so, they should probably have
> their place in the diff-tree output.

They're there. If you want to ignore them, you can just notice that the
sha1 matches between two lines, and then you don't even have to diff them.

Linus
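The size estimates in the message above can be checked with shell arithmetic. This is only a sketch of the reasoning: the file count, sha1 width and tree sizes are the figures quoted in the mail, while the commit count of 1000 is a made-up number chosen purely for illustration.

```shell
#!/bin/sh
# Figures quoted in the mail above.
files=17000            # approximate number of files in the kernel tree
sha1_bytes=20          # one binary sha1 per file, essentially incompressible
uncompressed=950874    # size of the monolithic tree object
compressed=462546      # size of the same object after compression

# Lower bound for a monolithic tree: the sha1s alone.
min_tree_bytes=$((files * sha1_bytes))
echo "raw sha1 bytes in one monolithic tree: $min_tree_bytes"

# How much compression actually buys.
ratio=$((compressed * 100 / uncompressed))
echo "compressed tree is about ${ratio}% of the uncompressed size"

# Hypothetical: cost of writing a whole new tree object on every commit.
commits=1000
total_mb=$((compressed * commits / 1000000))
echo "tree overhead after $commits commits: about ${total_mb} MB"
```

The first figure (340,000 bytes for exactly 17,000 files) is where the "about 350kB" minimum for 17,000+ files comes from.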
* Re: more git updates..
From: tony.luck @ 2005-04-09 21:00 UTC (permalink / raw)
To: Linus Torvalds
Cc: Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List

>In other words, each "commit" file is very small and cheap, but since
>almost every commit will also imply a totally new tree-file, "git" is
>going to have an overhead of half a megabyte per commit. Oops.
>
>Damn, that's painful. I suspect I will have to change the format somehow.

Having dodged that bullet with the change to make tree files point at
other tree files ... here's another (potential) issue.

A changeset that touches just one file a few levels down from the top of
the tree (say arch/i386/kernel/setup.c) will make six new files in the
git repository (one for the changeset, four tree files, and a new blob
for the new version of the file). More complex changes make more files
... but say the average is ten new files per changeset since most changes
touch few files.

With 60,000 changesets in the current tree, we will start out our git
repository with about 600,000 files. Assuming the first byte of the
SHA1 hash is random, that means an average of 2343 files in each of the
objects/xx directories. Give it a few more years at the current pace,
and we'll have over 10,000 files per directory. This sounds like a lot
to me ... but perhaps filesystems now handle large directories enough
better than they used to for this to not be a problem?

Or maybe the files should be named objects/xx/yy/zzzzzzzzzzzzzzzz?

-Tony
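The two counts in Tony's message can be reproduced with a short sketch. The function name `objects_for_change` is hypothetical; it only models the counting rule stated in the mail (one changeset, one tree per directory on the path including the top level, one blob) and does not touch a real repository.

```shell
#!/bin/sh
# Hypothetical helper: number of new objects created when a single file at
# the given path changes, with one tree object per directory.
objects_for_change() {
    rest=$1
    dirs=1                          # the top-level tree always changes
    while [ "${rest#*/}" != "$rest" ]; do
        dirs=$((dirs + 1))          # one more directory on the path
        rest=${rest#*/}             # strip the leading path component
    done
    echo $((1 + dirs + 1))          # changeset + trees + blob
}

objects_for_change arch/i386/kernel/setup.c

# Average files per objects/xx directory: 60,000 changesets at ten objects
# each, spread over 256 possible first bytes (integer division).
echo $((60000 * 10 / 256))
```

The first command prints 6 and the second prints 2343, matching the numbers in the mail.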
* Re: more git updates..
From: Linus Torvalds @ 2005-04-10 16:01 UTC (permalink / raw)
To: tony.luck
Cc: Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List

On Sat, 9 Apr 2005 tony.luck@intel.com wrote:
>
> With 60,000 changesets in the current tree, we will start out our git
> repository with about 600,000 files. Assuming the first byte of the
> SHA1 hash is random, that means an average of 2343 files in each of the
> objects/xx directories. Give it a few more years at the current pace,
> and we'll have over 10,000 files per directory. This sounds like a lot
> to me ... but perhaps filesystems now handle large directories enough
> better than they used to for this to not be a problem?

The good news is that git itself doesn't really care. I think it's
literally _one_ function ("get_sha1_filename()") that you need to change,
and then you need to write a small script that moves files around, and
you're pretty much done.

Also, I did actually debate that issue with myself, and decided that even
if we do have tons of files per directory, git doesn't much care. The
reason? Git never _searches_ for them. Assuming you have enough memory to
cache the tree, you just end up doing a "lookup", and inside the kernel
that's done using an efficient hash, which doesn't actually care _at_all_
about how many files there are per directory.

So I was for a while debating having a totally flat directory space, but
since there are _some_ downsides (linear lookup for cold-cache, and just
that "ls -l" ends up being O(n**2) and things), I decided that a single
fan-out is probably a good idea.

> Or maybe the files should be named objects/xx/yy/zzzzzzzzzzzzzzzz?

Hey, I may end up being wrong, and yes, maybe I should have done a
two-level one. The good news is that we can trivially fix it later (even
dynamically - we can make the "sha1 object tree layout" be a per-tree
config option, and there would be no real issue, so you could make small
projects use a flat version and big projects use a very deep structure
etc). You'd just have to script some renames to move the files around..

Linus
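To make the layout options concrete, here is a minimal sketch of the name-to-path mapping with a configurable number of fan-out levels. `object_path` is a hypothetical shell helper written for this illustration; it is not git's "get_sha1_filename()", and the "objects" prefix is simply the directory name used in the discussion above.

```shell
#!/bin/sh
# Hypothetical helper: map a 40-hex-digit object name to a file path, using
# "levels" two-character fan-out directories (0 = flat, 1 = objects/xx/...,
# 2 = objects/xx/yy/...).
object_path() {
    sha=$1
    levels=$2
    out=objects
    i=0
    while [ "$i" -lt "$levels" ]; do
        rest=${sha#??}               # name without its first two hex digits
        out=$out/${sha%"$rest"}      # those two digits become a directory
        sha=$rest
        i=$((i + 1))
    done
    echo "$out/$sha"
}

object_path 5ca21c9d808fa4bee1eb6948a59dfb9c7d73f36a 0
object_path 5ca21c9d808fa4bee1eb6948a59dfb9c7d73f36a 1
object_path 5ca21c9d808fa4bee1eb6948a59dfb9c7d73f36a 2
```

Because the mapping is a pure function of the object name, switching layouts only needs this one function changed plus a script that renames existing files to their new paths.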
* Re: more git updates..
From: Helge Hafting @ 2005-04-12 17:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: tony.luck, Petr Baudis, Randy.Dunlap, Ross Vandegrift,
Kernel Mailing List

On Sun, Apr 10, 2005 at 09:01:22AM -0700, Linus Torvalds wrote:
>
> So I was for a while debating having a totally flat directory space, but
> since there are _some_ downsides (linear lookup for cold-cache, and just
> that "ls -l" ends up being O(n**2) and things), I decided that a single
> fan-out is probably a good idea.
>
Isn't that fixed even in ext2/ext3 these days?

man mke2fs:

	dir_index
		Use hashed b-trees to speed up lookups in large directories.

Also, the popular reiserfs was designed with this in mind from the start.

> > Or maybe the files should be named objects/xx/yy/zzzzzzzzzzzzzzzz?
>
> Hey, I may end up being wrong, and yes, maybe I should have done a
> two-level one.

Unless there still are performance issues, please don't. A directory
structure with extra levels is necessarily harder to use if one ever
has to use it manually somehow.

Helge Hafting
* Re: more git updates..
From: Paul Jackson @ 2005-04-10 18:19 UTC (permalink / raw)
To: tony.luck
Cc: torvalds, pasky, rddunlap, ross, linux-kernel

Tony wrote:
> Or maybe the files should be named objects/xx/yy/zzzzzzzzzzzzzzzz?

I tend to size these things with the square root of the number of leaf
nodes. If I have 2,560,000 leaves (your 10,000 files in each of 16*16
directories), then I will aim for 1600 directories of 1600 leaves each.

My backup is sized for about this number of leaves, and it uses:

	xxx/xxxzzzzzzzzzzzzzzzz

(I repeat the xxx in the leaf name - easier to code.)

I don't think there is any need for two levels. There are 4096
different values of three digit hex numbers. That's ok in one
directory.

The only question would be 'xx' or 'xxx' - two or three digits.
This one is on the cusp in my view - either works.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
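Paul's square-root rule is easy to check numerically. The sketch below assumes an awk with the standard `sqrt()` function is available; `leaves_per_dir` is a hypothetical helper name used only here.

```shell
#!/bin/sh
# Hypothetical helper: with N leaves, aim for sqrt(N) directories holding
# sqrt(N) leaves each (truncated to an integer).
leaves_per_dir() {
    awk -v n="$1" 'BEGIN { printf "%d\n", sqrt(n) }'
}

leaves=$((10000 * 16 * 16))        # 10,000 files in each of 16*16 directories
echo "total leaves: $leaves"
echo "target per directory: $(leaves_per_dir "$leaves")"

# Directory counts offered by two or three hex digits of fan-out.
echo "xx  gives $((16 * 16)) directories"
echo "xxx gives $((16 * 16 * 16)) directories"
```

The 1600 target falls between the 256 directories of 'xx' and the 4096 of 'xxx', which is why the choice is described as being on the cusp.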
* Re: more git updates..
From: Bernd Eckenfels @ 2005-04-10 23:04 UTC (permalink / raw)
To: linux-kernel

In article <20050410111905.53a2f6a1.pj@engr.sgi.com> you wrote:
> (I repeat the xxx in the leaf name - easier to code.)

It is a bit OT, but just a note: there are file systems (hash functions) out
there that don't like a lot of files named the same way. For example NTFS
with the 8.3 short names.

Greetings
Bernd
* Re: more git updates..
From: Anton Altaparmakov @ 2005-04-11 9:27 UTC (permalink / raw)
To: Bernd Eckenfels
Cc: linux-kernel

On Mon, 2005-04-11 at 01:04 +0200, Bernd Eckenfels wrote:
> In article <20050410111905.53a2f6a1.pj@engr.sgi.com> you wrote:
> > (I repeat the xxx in the leaf name - easier to code.)
>
> It is a bit OT, but just a note: there are file systems (hash functions) out
> there who dont like a lot of files named the same way. For example NTFS with
> the 8.3 short names.

Since you mention NTFS, there is no need to worry about that for Linux.
Certainly the Linux kernel NTFS driver is never going to create 8.3 short
names. (It doesn't create names at all at the moment but my grand plan
is that it will only ever create file names in the Win32 and/or POSIX
name spaces. The DOS name space is a thing of the past IMO.)

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
* Re: more git updates..
From: Linus Torvalds @ 2005-04-09 21:08 UTC (permalink / raw)
To: Petr Baudis
Cc: Randy.Dunlap, Ross Vandegrift, Kernel Mailing List

On Sat, 9 Apr 2005, Linus Torvalds wrote:
>
> I suspect that I have to change the file format. Maybe make the "tree"
> object a two-level thing, and have a "directory" object.
>
> Then a "tree" object would point to a "directory" object, which would in
> turn point to the individual files (and other "directory" objects, of
> course). That way a commit that only changes a few files will only need to
> create a few new "directory" objects, instead of creating one huge "tree"
> object.

Actually, I guess I wouldn't have to change the format. I could just
extend the existing "tree" object to be able to point to other trees, and
that's it.

The downside of that is that then a tree wouldn't have a canonical format
any more: you could have two trees that have the exact same content, but
they'd have different names. They should obviously merge very easily (and
thus you could create a new merge that _does_ have a common name), but
it's ugly.

I'll have to think about it. It's good to notice these issues early, this
was the first time I had actually tried to check in a kernel-sized tree
for real.

Linus
* Re: more git updates..
From: Linus Torvalds @ 2005-04-09 23:31 UTC (permalink / raw)
To: Petr Baudis
Cc: Randy.Dunlap, Ross Vandegrift, Kernel Mailing List

On Sat, 9 Apr 2005, Linus Torvalds wrote:
>
> Actually, I guess I wouldn't have to change the format. I could just
> extend the existing "tree" object to be able to point to other trees, and
> that's it.

Done, and pushed out. The current git.git repository seems to do all of
this correctly.

NOTE! This means that each "tree" file basically tracks just a single
directory. The old style of "every file in one tree file" still works, but
fsck-cache will warn about it. Happily, the git archive itself doesn't
have any subdirectories, so git itself is not impacted by it.

Now, this means that I should add a "recursive" option to "diff-tree", but
I haven't done so yet. So right now if I change the top-level Makefile,
_and_ change kernel/exit.c, then the "tree diff" between the two commit
trees ends up looking like:

	torvalds@ppc970:~/lx-test/linux-2.6.12-rc2> diff-tree 7bec1223736d7e02c755e9a365984b3cbfa1e6e9 d64817f809a60cd960d3078ae91b4d19cb649501 | tr '\0' '\n'
	<100644 e1e7f7430c0297f22042cff58da5ca73ef121b95 Makefile
	>100644 8ee21134577e98fb642dffc5b797a0121645c543 Makefile
	<40000 2239383d00ae746f5e79ceccf8ac3fbca62f949d kernel
	>40000 a8fad219cb78a6b6a05a10f8643d615fefc8160f kernel

ie it shows that the Makefile blob has changed, and the kernel directory
has changed. You then need to recurse into the kernel tree to see what the
changes were there:

	torvalds@ppc970:~/lx-test/linux-2.6.12-rc2> diff-tree 2239383d00ae746f5e79ceccf8ac3fbca62f949d a8fad219cb78a6b6a05a10f8643d615fefc8160f | tr '\0' '\n'
	<100644 1a50b58453679b6fee8de4f744f4befc39397bb1 exit.c
	>100644 e8df1325bf25816827a1a64404ad533a97bfdae2 exit.c

but it clearly all seems to work. And it means that a subdirectory that
didn't change at all (the common case) will be able to re-use the old sha1
file when you create a tree (this may in fact make "diff-tree" much less
important, since now it tends to handle objects that are just a few kB in
size, rather than almost a megabyte).

So in this case, the "commit cost" of changing two files was two small
tree files (1468 and 679 bytes respectively for the kernel/ and top-level
directory) and the commit file itself (251 bytes). In addition to the
actual data files that were changed, of course.

Goodie. Big difference between that and the 460kB of the old monolithic
tree file.

Linus
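The manual recursion step described above (spot a changed directory, then diff its two tree objects) can be sketched as a small filter. `changed_subtrees` is a hypothetical helper, not an existing git tool: it reads diff-tree records that have already been passed through "tr '\0' '\n'" and prints the old and new sha1 of every changed directory, which are the arguments a follow-up "diff-tree" call would take. The sample input is copied from the mail rather than generated.

```shell
#!/bin/sh
# Hypothetical helper: from newline-separated diff-tree records, print
# "old-sha1 new-sha1 name" for each directory (mode 40000) that changed.
changed_subtrees() {
    old_sha=
    old_name=
    while read -r mode sha name; do
        case "$mode" in
            "<40000")
                old_sha=$sha
                old_name=$name ;;
            ">40000")
                # only pair up with the "<" record for the same name
                if [ "$name" = "$old_name" ]; then
                    echo "$old_sha $sha $name"
                fi ;;
        esac
    done
}

printf '%s\n' \
    '<100644 e1e7f7430c0297f22042cff58da5ca73ef121b95 Makefile' \
    '>100644 8ee21134577e98fb642dffc5b797a0121645c543 Makefile' \
    '<40000 2239383d00ae746f5e79ceccf8ac3fbca62f949d kernel' \
    '>40000 a8fad219cb78a6b6a05a10f8643d615fefc8160f kernel' |
    changed_subtrees
```

The single output line carries the same two sha1s that are passed to the second "diff-tree" invocation in the mail. Directories that were added or removed outright ("+" or "-" records) are ignored by this sketch.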
* Re: Re: more git updates..
From: Petr Baudis @ 2005-04-10 2:41 UTC (permalink / raw)
To: Linus Torvalds
Cc: Randy.Dunlap, Ross Vandegrift, Kernel Mailing List

Dear diary, on Sun, Apr 10, 2005 at 01:31:10AM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> On Sat, 9 Apr 2005, Linus Torvalds wrote:
> >
> > Actually, I guess I wouldn't have to change the format. I could just
> > extend the existing "tree" object to be able to point to other trees, and
> > that's it.
>
> Done, and pushed out. The current git.git repository seems to do all of
> this correctly.
..snip..

Ok, so now I can dare announce it, I hope. I hacked my branch of git
somewhat, kept in sync with Linus, and now I have something to show.

Please see it at

	http://pasky.or.cz/~pasky/dev/git/

It is basically a set of (still rather crude) shell scripts upon Linus'
git, which make it sanely usable by mere humans for actual version
tracking. Its usage _is_ going to change, so don't get too used to it
(that'd be hard anyway, I suspect), but it should be working nicely.

I have described most of the interesting parts and some basic usage in
the README at that page. It wraps commits, supports log retrieval and
comfortable diffing between any two trees. And on top of that, it can do
some basic remote repositories - it will pull (rsync) from them and it
can make the local copy track them - on pull, it will be updated
accordingly (and your local commits on the tracked branch will get
orphaned).

I didn't attach a patch against Linus since I think it's pretty much
useless now. It's available as against-linus.patch on the web, and you
can apply it to the latest git tree (NOT 0.03). But it's probably a
better idea to wget my tree. You can then watch us making progress by

	gitpull.sh linus
	gitpull.sh pasky

and see where we differ by:

	gitdiff.sh linus pasky

(This is how the against-linus.patch was generated. I'd easily generate
even 0.03 patch this way, but I forgot to merge the fsck at that time,
so it would suck.)

(Note that the tree you wget is set up to track my branch. If you want
to stop tracking it (basically necessary now if you want to do local
commits), do:

	cp .dircache/HEAD .dircache/HEAD.local
	gittrack.sh

The cp says something like "I want to pick up where the tracked branch
left off". Otherwise, untracking would return you to your "local"
branch, which is just some ancient predecessor of the pasky branch here
anyway.)

Note that I didn't really test it on anything but git itself yet, so I'm
not sure how it will cope especially with directories - I tried to make
it aware of them though. I will do some more practical testing tomorrow.

Otherwise, I will probably try to consolidate the usage and
documentation now, and beautify the scripts. I might start pondering
some merging too. Oh, and gitpatch.sh. :-)

Have fun and please share your opinions,

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
* [ANNOUNCE] git-pasky-0.1
From: Petr Baudis @ 2005-04-10 16:27 UTC (permalink / raw)
To: Kernel Mailing List
Cc: Linus Torvalds, Randy.Dunlap, Ross Vandegrift

Hello,

so I "released" git-pasky-0.1, my set of patches and scripts upon
Linus' git, aimed at human usability and to an extent a SCM-like usage.

You can get it at

	http://pasky.or.cz/~pasky/dev/git/git-pasky-base.tar.bz2

and after unpacking and building (make) do

	git pull pasky

to get the latest changes from my branch. If you already have some git
from my branch which can do pulling, you can bring yourself up to date
by doing just

	gitpull.sh pasky

(but this style of usage is deprecated now).

Please see the README for some details regarding usage etc. You can
find the changes from the last announcement in the ChangeLog (the
previous announcement corresponds to commit id
5125d089ad862f16a306b4942155092e1dce1c2d). The most important change is
probably recursive diff addition, and making git ignore the nsec of
ctime and mtime, since it is totally unreliable and likes to taint
random files as modified.

My near future plans include especially some merge support; I think it
should be rather easy, actually. I'll also add some simple tagging
mechanism. I've decided to postpone the file moving detection, since
there's no big demand for it now. ;-)

I will also need to do more testing on the linux kernel tree.
Committing patch-2.6.7 on 2.6.6 kernel and then diffing results in

	$ time gitdiff.sh `parent-id` `tree-id` >p
	real 5m37.434s
	user 1m27.113s
	sys 2m41.036s

which is pretty horrible, it seems to me. Any benchmarking help is of
course welcome, as well as any other feedback.

BTW, what would be the best (most complete) source for the BK tree
metadata? Should I dig it from the BKCVS gateway, or is there a better
source? Where did you get the sparse git database from, Linus? (BTW, it
would be nice to get sparse.git with the directories as separate.)

Have fun,

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
* Re: [ANNOUNCE] git-pasky-0.1
2005-04-10 16:27 ` [ANNOUNCE] git-pasky-0.1 Petr Baudis
@ 2005-04-10 16:55 ` Linus Torvalds
2005-04-10 19:49 ` Sean
2005-04-10 17:33 ` Ingo Molnar
2005-04-11 1:58 ` [ANNOUNCE] git-pasky-0.2 Petr Baudis
2 siblings, 1 reply; 179+ messages in thread
From: Linus Torvalds @ 2005-04-10 16:55 UTC (permalink / raw)
To: Petr Baudis; +Cc: Kernel Mailing List, Randy.Dunlap, Ross Vandegrift

On Sun, 10 Apr 2005, Petr Baudis wrote:
>
> Where did you get the sparse git database from, Linus? (BTW, it
> would be nice to get sparse.git with the directories as separate.)

When we were trying to figure out how to avert the BK disaster, one of
Tridge's concerns (and, in my opinion, the only really valid one) was
that you couldn't get the BK data in some SCM-independent way.

So I wrote some very preliminary scripts (on top of BK itself) to
extract the data, to show that BK could generate a SCM-neutral file
format (a very stupid one and horribly useless for anything but
interoperability, but still...). I was hoping that that would convince
Tridge that trying to muck around with the internal BK file format was
not worth it, and avert the BK trainwreck.

Larry was ok with the idea to make my export format actually be natively
supported by BK (ie the same way you have "bk export -tpatch"), but
Tridge wanted to instead get at the native data and be difficult about
it. As a result, not only can I not use BK any more, but we also don't
have a nice export format from BK.

Yeah, I'm a bit bitter about it.

Anyway, the sparse data came out of my hack. It's very inefficient, and
I estimated that doing the same for the kernel would have taken ten
solid days of conversion, mainly because my hack was really just that: a
quick hack to show that BK could do it. Larry could have done it a lot
better.

I'll re-generate the sparse git-database at some point (and I'll
probably do so from the old GIT database itself, rather than
re-generating it from my old BK data).

Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [ANNOUNCE] git-pasky-0.1
2005-04-10 16:55 ` Linus Torvalds
@ 2005-04-10 19:49 ` Sean
0 siblings, 0 replies; 179+ messages in thread
From: Sean @ 2005-04-10 19:49 UTC (permalink / raw)
To: Linus Torvalds
Cc: Petr Baudis, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift

On Sun, April 10, 2005 12:55 pm, Linus Torvalds said:

> Larry was ok with the idea to make my export format actually be natively
> supported by BK (ie the same way you have "bk export -tpatch"), but
> Tridge wanted to instead get at the native data and be difficult about
> it. As a result, I can now not only use BK any more, but we also don't
> have a nice export format from BK.
>
> Yeah, I'm a bit bitter about it.
>

Linus,

With all due respect, Larry could have dealt with this years ago and
removed the motivation for Tridge and others to pursue reverse
engineering. Instead he chose to insult and question the motives of
everyone that wanted open-source access to the Linux history data. The
blame for the current situation falls firmly on the choice to use a
closed-source SCM for Linux and the actions of the company that owned
it.

Sean
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [ANNOUNCE] git-pasky-0.1
2005-04-10 16:27 ` [ANNOUNCE] git-pasky-0.1 Petr Baudis
2005-04-10 16:55 ` Linus Torvalds
@ 2005-04-10 17:33 ` Ingo Molnar
2005-04-10 17:42 ` Willy Tarreau
2005-04-11 1:58 ` [ANNOUNCE] git-pasky-0.2 Petr Baudis
2 siblings, 1 reply; 179+ messages in thread
From: Ingo Molnar @ 2005-04-10 17:33 UTC (permalink / raw)
To: Petr Baudis
Cc: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift

* Petr Baudis <pasky@ucw.cz> wrote:

> I will also need to do more testing on the linux kernel tree.
> Committing patch-2.6.7 on 2.6.6 kernel and then diffing results in
>
> $ time gitdiff.sh `parent-id` `tree-id` >p
> real 5m37.434s
> user 1m27.113s
> sys 2m41.036s
>
> which is pretty horrible, it seems to me. Any benchmarking help is of
> course welcomed, as well as any other feedback.

it seems from the numbers that your system doesn't have enough RAM for
this and is getting IO-bound?

Ingo
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [ANNOUNCE] git-pasky-0.1
2005-04-10 17:33 ` Ingo Molnar
@ 2005-04-10 17:42 ` Willy Tarreau
2005-04-10 17:45 ` Ingo Molnar
0 siblings, 1 reply; 179+ messages in thread
From: Willy Tarreau @ 2005-04-10 17:42 UTC (permalink / raw)
To: Ingo Molnar
Cc: Petr Baudis, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift

On Sun, Apr 10, 2005 at 07:33:49PM +0200, Ingo Molnar wrote:
>
> * Petr Baudis <pasky@ucw.cz> wrote:
>
> > I will also need to do more testing on the linux kernel tree.
> > Committing patch-2.6.7 on 2.6.6 kernel and then diffing results in
> >
> > $ time gitdiff.sh `parent-id` `tree-id` >p
> > real 5m37.434s
> > user 1m27.113s
> > sys 2m41.036s
> >
> > which is pretty horrible, it seems to me. Any benchmarking help is of
> > course welcomed, as well as any other feedback.
>
> it seems from the numbers that your system doesn't have enough RAM for
> this and is getting IO-bound?

That's not the only problem: even without the I/O, he would only go
down to 4m8s (u+s), which is still in the same order of magnitude.

willy
^ permalink raw reply [flat|nested] 179+ messages in thread
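The 4m8s figure above is simply user plus sys from the quoted `time` output. A minimal sketch of that arithmetic, assuming a POSIX shell with awk; the helper name `to_seconds` is made up for illustration and is not part of git-pasky:

```sh
#!/bin/sh
LC_ALL=C; export LC_ALL      # keep awk's decimal point a plain "."

# to_seconds: convert a time(1)-style value such as "5m37.434s" to seconds.
# (Hypothetical helper, only for this illustration.)
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

real=$(to_seconds 5m37.434s)
user=$(to_seconds 1m27.113s)
sys=$(to_seconds 2m41.036s)

# CPU time actually consumed (user + sys), and the wall-clock time that
# is not accounted for by CPU, which is mostly time spent waiting.
cpu=$(awk -v u="$user" -v s="$sys" 'BEGIN { printf "%.3f\n", u + s }')
idle=$(awk -v r="$real" -v c="$cpu" 'BEGIN { printf "%.3f\n", r - c }')

echo "cpu=${cpu}s idle=${idle}s"
```

The CPU total comes out at 248.149 seconds, i.e. 4m8.149s, so only about a minute and a half of the 5m37s run can be blamed on waiting for the disk.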
* Re: [ANNOUNCE] git-pasky-0.1
2005-04-10 17:42 ` Willy Tarreau
@ 2005-04-10 17:45 ` Ingo Molnar
2005-04-10 18:45 ` Petr Baudis
0 siblings, 1 reply; 179+ messages in thread
From: Ingo Molnar @ 2005-04-10 17:45 UTC (permalink / raw)
To: Willy Tarreau
Cc: Petr Baudis, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift

* Willy Tarreau <willy@w.ods.org> wrote:

> > > I will also need to do more testing on the linux kernel tree.
> > > Committing patch-2.6.7 on 2.6.6 kernel and then diffing results in
> > >
> > > $ time gitdiff.sh `parent-id` `tree-id` >p
> > > real 5m37.434s
> > > user 1m27.113s
> > > sys 2m41.036s
> > >
> > > which is pretty horrible, it seems to me. Any benchmarking help is of
> > > course welcomed, as well as any other feedback.
> >
> > it seems from the numbers that your system doesnt have enough RAM for
> > this and is getting IO-bound?
>
> Not the only problem, without I/O, he will go down to 4m8s (u+s) which
> is still in the same order of magnitude.

probably not the only problem - but if we are lucky then his system was
just thrashing within the kernel repository and then most of the overhead
is the _unnecessary_ IO that happened due to that (which causes CPU
overhead just as much). The dominant system time suggests so, to a
certain degree. Maybe this is wishful thinking.

Ingo
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.1
2005-04-10 17:45 ` Ingo Molnar
@ 2005-04-10 18:45 ` Petr Baudis
2005-04-10 19:13 ` Willy Tarreau
` (2 more replies)
0 siblings, 3 replies; 179+ messages in thread
From: Petr Baudis @ 2005-04-10 18:45 UTC (permalink / raw)
To: Ingo Molnar
Cc: Willy Tarreau, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift

Dear diary, on Sun, Apr 10, 2005 at 07:45:12PM CEST, I got a letter
where Ingo Molnar <mingo@elte.hu> told me that...
>
> * Willy Tarreau <willy@w.ods.org> wrote:
>
> > > > I will also need to do more testing on the linux kernel tree.
> > > > Committing patch-2.6.7 on 2.6.6 kernel and then diffing results in
> > > >
> > > > $ time gitdiff.sh `parent-id` `tree-id` >p
> > > > real 5m37.434s
> > > > user 1m27.113s
> > > > sys 2m41.036s
> > > >
> > > > which is pretty horrible, it seems to me. Any benchmarking help is of
> > > > course welcomed, as well as any other feedback.
> > >
> > > it seems from the numbers that your system doesnt have enough RAM for
> > > this and is getting IO-bound?
> >
> > Not the only problem, without I/O, he will go down to 4m8s (u+s) which
> > is still in the same order of magnitude.
>
> probably not the only problem - but if we are lucky then his system was
> just trashing within the kernel repository and then most of the overhead
> is the _unnecessary_ IO that happened due to that (which causes CPU
> overhead just as much). The dominant system time suggests so, to a
> certain degree. Maybe this is wishful thinking.

It turns out to be the forks for doing all the cuts and such that are
bogging it down so awfully (doing diff-tree takes 0.48s ;-). I do about
15 forks per change, I guess, and for some reason cut takes a lot of
time on its own.

I've rewritten the cuts with the use of bash arrays and other smart
stuff. I somehow don't feel comfortable using this and prefer the
old-fashioned ways, but it would be plain unusable without this.

Now I'm down to

    real 1m21.440s
    user 0m32.374s
    sys 0m42.200s

and I kinda doubt if it is possible to cut this much down. Almost no
disk activity, I have almost everything cached by now, apparently.

Anyway, you can git pull to get the optimized version.

Thanks for the help,

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
^ permalink raw reply [flat|nested] 179+ messages in thread
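The rewritten gitdiff.sh is not quoted in this thread, so the following is only a sketch of the general idea: replace one `cut` process per field with a single `read` that splits the record inside the shell. It assumes the `<mode sha1 name` record layout shown at the top of the thread and uses plain POSIX constructs rather than the bash arrays mentioned above; the function and variable names are hypothetical.

```sh
#!/bin/sh
# Sample record in the diff-tree output format shown earlier in the thread.
line='<100664 4870bcf91f8666fc788b07578fb7473eda795587 Makefile'

# Fork-heavy style: every $(... | cut ...) starts a subshell plus cut.
slow_parse() {
    s_modeop=$(echo "$1" | cut -d ' ' -f 1)
    s_sha=$(echo "$1" | cut -d ' ' -f 2)
    s_name=$(echo "$1" | cut -d ' ' -f 3-)
}

# Builtin style: one `read` splits the record without running any
# external command; the results land in the f_* variables.
fast_parse() {
    IFS=' ' read -r f_modeop f_sha f_name <<EOF
$1
EOF
    f_op=${f_modeop%"${f_modeop#?}"}    # first character: <, > or +
    f_mode=${f_modeop#?}                # the rest: the file mode
}

slow_parse "$line"
fast_parse "$line"
echo "op=$f_op mode=$f_mode sha=$f_sha name=$f_name"
```

Both functions extract the same fields; the difference is that `slow_parse` pays for three pipelines per record, which adds up quickly over thousands of changed files.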
* Re: Re: [ANNOUNCE] git-pasky-0.1
2005-04-10 18:45 ` Petr Baudis
@ 2005-04-10 19:13 ` Willy Tarreau
2005-04-10 21:27 ` Petr Baudis
2005-04-10 20:38 ` Linus Torvalds
2005-04-10 20:41 ` Paul Jackson
2 siblings, 1 reply; 179+ messages in thread
From: Willy Tarreau @ 2005-04-10 19:13 UTC (permalink / raw)
To: Petr Baudis
Cc: Ingo Molnar, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift

On Sun, Apr 10, 2005 at 08:45:22PM +0200, Petr Baudis wrote:

> It turns out to be the forks for doing all the cuts and such what is
> bogging it down so awfully (doing diff-tree takes 0.48s ;-). I do about
> 15 forks per change, I guess, and for some reason cut takes a long of
> time on its own.
>
> I've rewritten the cuts with the use of bash arrays and other smart
> stuff. I somehow don't feel comfortable using this and prefer the
> old-fashioned ways, but it would be plain unusable without this.

I've encountered the same problem in a config-generation script a while
ago. Fortunately, bash provides enough ways to remove most of the forks,
but the result is less portable.

I've downloaded your code, but it does not compile here because of the
tv_nsec fields in struct stat (2.4, glibc 2.2), so I cannot use it to
get the most up to date version to take a look at the script. Basically,
all the 'cut' and 'sed' can be removed, as well as the 'dirname'. You
can also call mkdir only if the dirs don't exist. I really think you
should end up with only one fork in the loop to call 'diff'.

> Now I'm down to
>
> real 1m21.440s
> user 0m32.374s
> sys 0m42.200s
>
> and I kinda doubt if it is possible to cut this much down. Almost no
> disk activity, I have almost everything cached by now, apparently.

It is very common to cut times by a factor of 10 or more when replacing
common unix tools by pure shell. Dynamic library initialization also
takes a lot of time nowadays, and probably you have localisation which
is big too. Sometimes, just wiping a few variables at the top of the
shell might remove some useless overhead.

> Anyway, you can git pull to get the optimized version.
>
> Thanks for the help,

Willy
^ permalink raw reply [flat|nested] 179+ messages in thread
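A rough sketch of the suggestions above, in POSIX shell: a `dirname` replacement built on parameter expansion, a `mkdir` that is only started when the directory is missing, and locale variables cleared at the top of the script. The names `dir_of` and `ensure_dir` and the scratch path are hypothetical, and treating `LC_ALL` as one of the variables worth wiping is an assumption, not something stated in the thread.

```sh
#!/bin/sh
# Assumption: locale variables are among the ones worth "wiping"; with
# LC_ALL=C the child processes can skip loading locale data.
LC_ALL=C
export LC_ALL

# dir_of PATH: set $dir to the directory part of a relative PATH using
# parameter expansion only, so no dirname(1) process is started.
dir_of() {
    case $1 in
        */*) dir=${1%/*} ;;
        *)   dir=. ;;
    esac
}

# ensure_dir DIR: start mkdir only when DIR does not exist yet.
ensure_dir() {
    [ -d "$1" ] || mkdir -p "$1"
}

# Example: three files, two of them sharing a directory. Only the first
# iteration starts mkdir; the other two find their directory in place.
scratch=${TMPDIR:-/tmp}/gitdiff-sketch.$$
for path in arch/i386/kernel/entry.S arch/i386/kernel/head.S Makefile; do
    dir_of "$path"
    ensure_dir "$scratch/$dir"
done
rm -rf "$scratch"
```

`dir_of` returns its result in a variable on purpose: calling it as `$(dir_of ...)` would bring back the subshell fork that the rewrite is trying to avoid.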
* Re: Re: Re: [ANNOUNCE] git-pasky-0.1
2005-04-10 19:13 ` Willy Tarreau
@ 2005-04-10 21:27 ` Petr Baudis
0 siblings, 0 replies; 179+ messages in thread
From: Petr Baudis @ 2005-04-10 21:27 UTC (permalink / raw)
To: Willy Tarreau
Cc: Ingo Molnar, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift

Dear diary, on Sun, Apr 10, 2005 at 09:13:19PM CEST, I got a letter
where Willy Tarreau <willy@w.ods.org> told me that...
> On Sun, Apr 10, 2005 at 08:45:22PM +0200, Petr Baudis wrote:
>
> > It turns out to be the forks for doing all the cuts and such what is
> > bogging it down so awfully (doing diff-tree takes 0.48s ;-). I do about
> > 15 forks per change, I guess, and for some reason cut takes a long of
> > time on its own.
> >
> > I've rewritten the cuts with the use of bash arrays and other smart
> > stuff. I somehow don't feel comfortable using this and prefer the
> > old-fashioned ways, but it would be plain unusable without this.
>
> I've encountered the same problem in a config-generation script a while
> ago. Fortunately, bash provides enough ways to remove most of the forks,
> but the result is less portable.
>
> I've downloaded your code, but it does not compile here because of the
> tv_nsec fields in struct stat (2.4, glibc 2.2), so I cannot use it to
> get the most up to date version to take a look at the script. Basically,

Ok, I decided to stop this nsec madness (since it broke show-diff anyway
at least on my ext3), and you get it only if you pass -DNSEC to CFLAGS
now. Hope this fixes things for you. :-)

BTW, I regularly update the public copy as accessible on the web.

> all the 'cut' and 'sed' can be removed, as well as the 'dirname'. You
> can also call mkdir only if the dirs don't exist. I really think you
> should end up with only one fork in the loop to call 'diff'.

You still need to extract the file by cat-file too. ;-) And rm the files
after it compares them (so that we don't fill /tmp with crap like
certain awful programs like to do).

But I will conditionalize the mkdir calls, thanks for the suggestion - I
think that's the last bit to be squeezed from this loop (I'll yet check
on the read proposal - I considered it before and turned it down for
some reason, can't remember why anymore, though).

Thanks,

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.1
2005-04-10 18:45 ` Petr Baudis
2005-04-10 19:13 ` Willy Tarreau
@ 2005-04-10 20:38 ` Linus Torvalds
2005-04-10 21:39 ` Linus Torvalds
` (2 more replies)
2005-04-10 20:41 ` Paul Jackson
2 siblings, 3 replies; 179+ messages in thread
From: Linus Torvalds @ 2005-04-10 20:38 UTC (permalink / raw)
To: Petr Baudis
Cc: Ingo Molnar, Willy Tarreau, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift

On Sun, 10 Apr 2005, Petr Baudis wrote:
>
> It turns out to be the forks for doing all the cuts and such what is
> bogging it down so awfully (doing diff-tree takes 0.48s ;-). I do about
> 15 forks per change, I guess, and for some reason cut takes a long of
> time on its own.

Heh.

Can you pull my current repo, which has "diff-tree -R" that does what the
name suggests, and which should be faster than the 0.48 sec you see..

It may not matter a lot, since actually generating the diff from the file
contents is what is expensive, but remember my goal: I want the expense of
a diff-tree to be relative to the size of the diff, so that implies that
small diffs have to be basically instantaneous. So I care.

So I just tried the 2.6.7->2.6.8 diff, and for me the new recursive
"diff-tree" can generate the _list_ of files changed in zero time:

    real 0m0.079s
    user 0m0.067s
    sys 0m0.024s

but then _doing_ the diff is pretty expensive (in this case 3800+ files
changed, so you have to unpack 7600+ objects - and even unpacking isn't
the expensive part, the expense is literally in the diff operation
itself).

Me, the stuff I automate is the small steps. Doing a single checkin. So
that's the case I care about going fast, when a "diff-tree" will likely
have maybe five files or something. That's why I want the small
incremental cases to go fast - if it takes me a minute to generate a diff
for a _release_, that's not a big deal. I make one release every other
month, but I work with lots of small patches all the time.

Anyway, with a fast diff-tree, you should be able to generate the list of
objects for a fast "merge". That's next.

(And by "merge", I of course mean "suck". I'm talking about the old CVS
three-way merge, and you have to specify the common parent explicitly and
it won't handle any renames or any other crud. But it would get us to
something that might actually be useful for simple things. Which is why
"diff-tree" is important - it gives the information about what to tell
merge).

Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.1
2005-04-10 20:38 ` Linus Torvalds
@ 2005-04-10 21:39 ` Linus Torvalds
2005-04-10 23:49 ` Petr Baudis
2005-04-10 22:27 ` Petr Baudis
2005-04-11 0:30 ` Re: " Petr Baudis
2 siblings, 1 reply; 179+ messages in thread
From: Linus Torvalds @ 2005-04-10 21:39 UTC (permalink / raw)
To: Petr Baudis
Cc: Ingo Molnar, Willy Tarreau, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift

On Sun, 10 Apr 2005, Linus Torvalds wrote:
>
> Can you pull my current repo, which has "diff-tree -R" that does what the
> name suggests, and which should be faster than the 0.48 sec you see..

Actually, I changed things around. Everybody hated the "<" ">" lines, so I
put a changed thing on a line of its own with a "*" instead.

So you'd now see lines like

    *100644->100644 1874e031abf6631ea51cf6177b82a1e662f6183e->e8181df8499f165cacc6a0d8783be7143013d410 CREDITS

which means that the CREDITS file has changed, and it shows you the mode
-> mode transition (that didn't change in this case) and the sha1 -> sha1
transition.

So now it's always just one line per change. Furthermore, the filename is
always field 3, if you use spaces as delimiters, regardless of whether
it's a +/-/* field.

So let's say you want to merge two trees (dst1 and dst2) from a common
parent (src), what you would do is:

 - get the list of files to merge:

    diff-tree -R <dst1> <dst2> | tr '\0' '\n' > merge-files

 - Which of those were changed by <src> -> <dstX>?

    diff-tree -R <src> <dst1> | tr '\0' '\n' | join -j 3 - merge-files > dst1-change
    diff-tree -R <src> <dst2> | tr '\0' '\n' | join -j 3 - merge-files > dst2-change

 - Which of those are common to both? Let's see what the merge list is:

    join dst1-change dst2-change > merge-list

and hopefully you'd usually be working on a very small list of files by
then (everything else you'd just pick from one of the destination trees
directly - you've got the name, the sha-file, everything: no need to even
look at the data).

Does this sound sane? Pasky? Wanna try a "git merge" thing? Starting off
with the user having to tell what the common parent tree is - we can try
to do the "automatically find best common parent" crud later. THAT may be
expensive.

(Btw, this is why I think "diff-tree" is more important than actually
generating the real diff itself - the above uses diff-tree three times
just to cut down to the point where _hopefully_ you don't actually need to
generate very much diffs at all. So I want "diff-tree" to be really fast,
even if it then can take a minute to actually generate a big diff between
releases etc).

Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
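The three-step narrowing described above can be tried out on canned data. The sketch below is not the merge tool itself: it skips the diff-tree calls, uses made-up object ids (`m0`, `c1`, ...) in place of real sha1 values, and reduces every record to its file name (field 3) before calling `join`, so that the join always runs on a single sorted field. The names `names`, `merge_list` and `demo` are hypothetical, and `mktemp -d` is assumed to be available.

```sh
#!/bin/sh
LC_ALL=C; export LC_ALL      # keep sort and join collation consistent

# names FILE: sorted file names (field 3) from diff-tree-style records.
names() {
    awk '{ print $3 }' "$1" | sort
}

# merge_list D1_D2 SRC_D1 SRC_D2: names that differ between the two
# destination trees AND were changed on both sides since the common parent.
merge_list() {
    names "$1" > "$work/merge-files"
    names "$2" | join - "$work/merge-files" > "$work/dst1-change"
    names "$3" | join - "$work/merge-files" > "$work/dst2-change"
    join "$work/dst1-change" "$work/dst2-change"
}

# demo: dst1 touched Makefile and cache.h, dst2 touched cache.h and
# read-cache.c. The object ids are placeholders, not real sha1 values.
demo() {
    work=$(mktemp -d) || return 1
    printf '%s\n' \
        '*100644->100644 m1->m0 Makefile' \
        '*100644->100644 c1->c2 cache.h' \
        '*100644->100644 r0->r2 read-cache.c' > "$work/d1-d2"
    printf '%s\n' \
        '*100644->100644 m0->m1 Makefile' \
        '*100644->100644 c0->c1 cache.h' > "$work/src-d1"
    printf '%s\n' \
        '*100644->100644 c0->c2 cache.h' \
        '*100644->100644 r0->r2 read-cache.c' > "$work/src-d2"
    merge_list "$work/d1-d2" "$work/src-d1" "$work/src-d2"
    rm -rf "$work"
}

demo
```

With this sample data the only name printed is cache.h: Makefile and read-cache.c were each changed on one side only, so they can be taken from that side without looking at their contents.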
* Re: Re: Re: [ANNOUNCE] git-pasky-0.1
2005-04-10 21:39 ` Linus Torvalds
@ 2005-04-10 23:49 ` Petr Baudis
0 siblings, 0 replies; 179+ messages in thread
From: Petr Baudis @ 2005-04-10 23:49 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Willy Tarreau, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift

Dear diary, on Sun, Apr 10, 2005 at 11:39:02PM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> On Sun, 10 Apr 2005, Linus Torvalds wrote:
> >
> > Can you pull my current repo, which has "diff-tree -R" that does what the
> > name suggests, and which should be faster than the 0.48 sec you see..
>
> Actually, I changed things around. Everybody hated the "<" ">" lines, so I
> put a changed thing on a line of its own with a "*" instead.
>
> So you'd now see lines like
>
> *100644->100644 1874e031abf6631ea51cf6177b82a1e662f6183e->e8181df8499f165cacc6a0d8783be7143013d410 CREDITS
>
> which means that the CREDITS file has changed, and it shows you the mode
> -> mode transition (that didn't change in this case) and the sha1 -> sha1
> transition.
>
> So now it's always just one line per change. Firthermore, the filename is
> always field 3, if you use spaces as delimeters, regardless of whether
> it's a +/-/* field.

That's great, just when I finally managed to properly fix the xargs
boundary case in gitdiff-do (without throwing away the NUL-termination).
You know how to please people! ;-)

(Not that I'd have *anything* against the change. The logic is simpler
and you'll be actually able to work with diff-tree a little sanely.)

BTW, it is quite handy to have the entry type in the listing (guessing
that from mode in the script just doesn't feel right and doing explicit
cat-file kills the performance).

I would also really prefer the fields separated by tabs. It looks nicer
on the screen (aligned, e.g. modes and type are varsized), and is also
easier to parse (cut defaults to tabs as delimiters, for example).

> So let's say you want to merge two trees (dst1 and dst2) from a common
> parent (src), what you would do is:
>
> - get the list of files to merge:
>
> diff-tree -R <dst1> <dst2> | tr '\0' '\n' > merge-files

...oh, I probably forgot to ask - why did you choose -R instead of -r?
It looks rather alien to me; if it starts by 'diff', my hand writes -r
without thinking.

> - Which of those were changed by <src> -> <dstX>?
>
> diff-tree -R <src> <dst1> | tr '\0' '\n' | join -j 3 - merge-files > dst1-change
> diff-tree -R <src> <dst2> | tr '\0' '\n' | join -j 3 - merge-files > dst2-change
>
> - Which of those are common to both? Let's see what the merge list is:
>
> join dst1-change dst2-change > merge-list
>
> and hopefully you'd usually be working on a very small list of files by
> then (everything else you'd just pick from one of the destination trees
> directly - you've got the name, the sha-file, everything: no need to even
> look at the data).

Ok, this looks reasonable. (Provided that I DWYM regarding the joins.)

> Does this sound sane? Pasky? Wanna try a "git merge" thing? Starting off
> with the user having to tell what the common parent tree is - we can try
> to do the "automatically find best common parent" crud later. THAT may be
> expensive.

I will definitely try "git merge", but maybe not this night anymore
(it's already 1:32 here now).

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
^ permalink raw reply [flat|nested] 179+ messages in thread
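For scripts consuming the new one-line-per-change records, the `->` pairs can be taken apart with parameter expansion alone. This is only a sketch under assumptions: the `*` layout is the one quoted above, while the exact layout of `+` and `-` records (assumed here to be `+mode sha1 name` and `-mode sha1 name`) is a guess, and `parse_change` with its `c_*` variables is a hypothetical helper.

```sh
#!/bin/sh
# parse_change RECORD: split one diff-tree-style record into the c_*
# variables. Returns non-zero for a record type it does not know.
parse_change() {
    IFS=' ' read -r c_modes c_shas c_name <<EOF
$1
EOF
    c_op=${c_modes%"${c_modes#?}"}      # leading *, + or -
    c_modes=${c_modes#?}
    case $c_op in
        '*')    # changed: old->new pairs for both mode and sha1
            c_old_mode=${c_modes%%"->"*}; c_new_mode=${c_modes##*"->"}
            c_old_sha=${c_shas%%"->"*};   c_new_sha=${c_shas##*"->"} ;;
        '+')    # added (assumed layout): only a new side exists
            c_old_mode=; c_new_mode=$c_modes
            c_old_sha=;  c_new_sha=$c_shas ;;
        '-')    # removed (assumed layout): only an old side exists
            c_old_mode=$c_modes; c_new_mode=
            c_old_sha=$c_shas;   c_new_sha= ;;
        *)  return 1 ;;
    esac
}

parse_change '*100644->100644 1874e031abf6631ea51cf6177b82a1e662f6183e->e8181df8499f165cacc6a0d8783be7143013d410 CREDITS'
echo "$c_name: $c_old_sha -> $c_new_sha"
```

Because the file name is the last variable given to `read`, a name containing spaces would still end up whole in `c_name`, which is one small argument for keeping the name as the final field whatever delimiter is chosen.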
* Re: Re: Re: [ANNOUNCE] git-pasky-0.1
2005-04-10 20:38 ` Linus Torvalds
2005-04-10 21:39 ` Linus Torvalds
@ 2005-04-10 22:27 ` Petr Baudis
2005-04-10 23:10 ` Linus Torvalds
2005-04-10 23:23 ` [ANNOUNCE] git-pasky-0.1 Paul Jackson
2005-04-11 0:30 ` Re: " Petr Baudis
2 siblings, 2 replies; 179+ messages in thread
From: Petr Baudis @ 2005-04-10 22:27 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Willy Tarreau, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift

Dear diary, on Sun, Apr 10, 2005 at 10:38:11PM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> On Sun, 10 Apr 2005, Petr Baudis wrote:
> >
> > It turns out to be the forks for doing all the cuts and such what is
> > bogging it down so awfully (doing diff-tree takes 0.48s ;-). I do about
> > 15 forks per change, I guess, and for some reason cut takes a long of
> > time on its own.
>
> Heh.
>
> Can you pull my current repo, which has "diff-tree -R" that does what the
> name suggests, and which should be faster than the 0.48 sec you see..

Funnily enough, now after some more cache teasing it's ~0.185. Your one
still ~0.17, though. :/ (That might be because of the format changes,
though, since you do less printing now.)

(BTW, all those measurements are done on my AMD K6 walking on 1600MHz,
512M RAM, about 200M available for caches.)

Just out of interest, did you have a look at my diff-tree -r
implementation and decided that you don't like it, or you weren't aware
of it? I will probably take most of your diff-tree change, but I'd
prefer to do the sha1->tree mapping directly in diff_tree().

> It may not matter a lot, since actually generating the diff from the file
> contents is what is expensive, but remember my goal: I want the expense of
> a diff-tree to be relative to the size of the diff, so that implies that
> small diffs haev to be basically instantaenous. So I care.

Me too, of course.

> So I just tried the 2.6.7->2.6.8 diff, and for me the new recursive
> "diff-tree" can generate the _list_ of files changed in zero time:
>
> real 0m0.079s
> user 0m0.067s
> sys 0m0.024s
>
> but then _doing_ the diff is pretty expensive (in this case 3800+ files
> changed, so you have to unpack 7600+ objects - and even unpacking isn't
> the expensive part, the expense is literally in the diff operation
> itself).
>
> Me, the stuff I automate is the small steps. Doing a single checkin. So
> that's the case I care about going fast, when a "diff-tree" will likely
> have maybe five files or something. That's why I want the small
> incremental cases to go fast - it it takes me a minute to generate a diff
> for a _release_, that's not a big deal. I make one release every other
> month, but I work with lots of small patches all the time.

I see.

> Anyway, with a fast diff-tree, you should be able to generate the list of
> objects for a fast "merge". That's next.
>
> (And by "merge", I of course mean "suck". I'm talking about the old CVS
> three-way merge, and you have to specify the common parent explicitly and
> it won't handle any renames or any other crud. But it would get us to
> something that might actually be useful for simple things. Which is why
> "diff-tree" is important - it gives the information about what to tell
> merge).

I currently already do a merge when you track someone's source - it will
throw away your previous HEAD record though, so if you committed some
local changes after the previous pull, you will get orphaned commits and
the changes will turn to uncommitted ones. I have some ideas regarding
how to do it properly (and do any arbitrary merging, for that matter), I
hope to get to it as soon as I catch up with you. :-)

BTW, the three-way merge comes from RCS.

That reminds me, is there any tool which will take .rej files and throw
them into the file to create rcsmerge-like conflicts? Perhaps it's the
fault of my bad tools, but I prefer to work with the inline rejects much
more to .rej files (except to actually notice the rejects).

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: Re: [ANNOUNCE] git-pasky-0.1
2005-04-10 22:27 ` Petr Baudis
@ 2005-04-10 23:10 ` Linus Torvalds
2005-04-10 23:26 ` Petr Baudis
2005-04-10 23:23 ` [ANNOUNCE] git-pasky-0.1 Paul Jackson
1 sibling, 1 reply; 179+ messages in thread
From: Linus Torvalds @ 2005-04-10 23:10 UTC (permalink / raw)
To: Petr Baudis
Cc: Ingo Molnar, Willy Tarreau, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift

On Mon, 11 Apr 2005, Petr Baudis wrote:
>
> I currently already do a merge when you track someone's source - it will
> throw away your previous HEAD record though

Not only that, it doesn't do what I consider a "merge".

A real merge should have two or more parents. The "commit-tree" command
already allows that: just add any arbitrary number of "-p xxxxxxxxx"
switches (well, I think I limited it to 16 parents, but that's just a
totally random number, there's nothing in the file format or anything
else that limits it).

So while you've merged my "data", you've not actually merged my
revision history in your tree.

And the reason a real merge _has_ to show both parents properly is that
unless you do that, you can never merge sanely another time without
getting lots of clashes from the previous merge. So it's important that a
merge really shows both trees it got data from.

This is, btw, also the reason I haven't merged with your tree - I want to
get to the point where I really _can_ merge without throwing away the
information. In fact, at this point I'd rather not merge with your tree at
all, because I consider your tree to be "corrupt" thanks to lacking the
merge history.

So you've done the data merge, but not the history merge.

And because you didn't do the history merge, there's no way to
automatically find out what point of my tree you merged _with_. See?

And since I have no way to see what point in time you merged with me, now
I can't generate a nice 3-way diff against the last common ancestor of
both of our trees.

So now I can't do a three-way merge with you based on any sane ancestor,
unless I start guessing which ancestor of mine you merged with. Now, that
"guess" is easy enough to do with a project like "git" which currently has
just a few tens of commits and effectively only two parallel development
trees, but the whole point is to get to a system where that isn't true..

Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.1
2005-04-10 23:10 ` Linus Torvalds
@ 2005-04-10 23:26 ` Petr Baudis
2005-04-10 23:46 ` Linus Torvalds
0 siblings, 1 reply; 179+ messages in thread
From: Petr Baudis @ 2005-04-10 23:26 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Willy Tarreau, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift

Dear diary, on Mon, Apr 11, 2005 at 01:10:58AM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
>
>
> On Mon, 11 Apr 2005, Petr Baudis wrote:
> >
> > I currently already do a merge when you track someone's source - it will
> > throw away your previous HEAD record though
>
> Not only that, it doesn't do what I consider a "merge".
>
> A real merge should have two or more parents. The "commit-tree" command
> already allows that: just add any arbitrary number of "-p xxxxxxxxx"
> switches (well, I think I limited it to 16 parents, but that's just a
> totally random number, there's nothing in the file format or anything
> else that limits it).
>
> So while you've merged my "data", but you've not actually merged my
> revision history in your tree.

Well, that's exactly what I was (am) going to do. :-) That's also why I
said that I (virtually) throw the local commits away now. Instead, if
there were any local commits, I will do git merge:

    commit-tree $(write-tree) -p $local_head -p $tracked_tree

Note that I will need to make this two-phase - first applying the
changes, then doing the commit; between those two phases, the user
should resolve potential conflicts and check if the merge went right.

I think I will name the first phase git merge and the second phase will
be just git commit, and I will store the merge information in
.dircache/. (BTW, I think the directory name is pretty awful; what about
.git/ ?)

> And the reason a real merge _has_ to show both parents properly is that
> unless you do that, you can never merge sanely another time without
> getting lots of clashes from the previous merge. So it's important that a
> merge really shows both trees it got data from.
>
> This is, btw, also the reason I haven't merged with your tree - I want to
> get to the point where I really _can_ merge without throwing away the
> information. In fact, at this point I'd rather not merge with your tree at
> all, because I consider your tree to be "corrupt" thanks to lacking the
> merge history.
>
> So you've done the data merge, but not the history merge.
>
> And because you didn't do the history merge, there's no way to
> automatically find out what point of my tree you merged _with_. See?
>
> And since I have no way to see what point in time you merged with me, now
> I can't generate a nice 3-way diff against the last common ancestor of
> both of our trees.
>
> So now I can't do a three-way merge with you based on any sane ancestor,
> unless I start guessing which ancestor of mine you merged with. Now, that
> "guess" is easy enough to do with a project like "git" which currently has
> just a few tens of commits and effectively only two parallell development
> trees, but the whole point is to get to a system where that isn't true..

Well, I've wanted to get the basic things working first before doing git
merge. (Especially since until recently, diff-tree was PITA to work
with, and before that it didn't even exist.)

If you want, I can rebuild my tree with doing the merging properly,
after I have git merge working.

(BTW, it would be useful to have a tool which just blindly takes what
you give it on input and throws it to an object of given type; I will
need to construct arbitrary commits during the rebuild if I'm to keep
the correct dates.)

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.1 2005-04-10 23:26 ` Petr Baudis @ 2005-04-10 23:46 ` Linus Torvalds 2005-04-10 23:56 ` Petr Baudis 0 siblings, 1 reply; 179+ messages in thread From: Linus Torvalds @ 2005-04-10 23:46 UTC (permalink / raw) To: Petr Baudis Cc: Ingo Molnar, Willy Tarreau, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift On Mon, 11 Apr 2005, Petr Baudis wrote: > > (BTW, it would be useful to have a tool which just blindly takes what > you give it on input and throws it to an object of given type; I will > need to construct arbitrary commits during the rebuild if I'm to keep > the correct dates.) Hah. That's what "COMMITTER_NAME" "COMMITTER_EMAIL" and "COMMITTER_DATE" are there for. There's two things to commits: when (and by whom) it was committed to a tree, and when the changes were really done. So set the COMMITTER_xxx things to the person/time you want to consider the _original_ one, and let "commit-tree" author you as the creator of the commit itself. The regular "ChangeLog" thing should only show the author and original time, but it's nice to see who created the commit itself. I did this very much on purpose: see how I always try to attribute authorship in BK to the person who actually wrote the code. At the same time, I think it's interesting from a tracking standpoint to also see when/where that change got introduced into a tree. I _tried_ to get this right in the sparse tree conversion. I won't guarantee that it's all correct, but the top commit in the sparse tree looks like this: tree 67607f05a66e36b2f038c77cfb61350d2110f7e8 parent 9c59995fef9b52386e5f7242f44720a7aca287d7 author Christopher Li <sparse@chrisli.org> Sat Apr 2 09:30:09 PST 2005 committer Linus Torvalds <torvalds@ppc970.osdl.org> Thu Apr 7 20:06:31 2005 ... exactly because I tracked when I committed it to the sparse tree _separately_ from tracking when it was created. So when I re-create the sparse-tree, I'll also end up re-writing the "committer" information. 
And that's proper. That's really saying "this sha1 object was created by Xxxx at time Xxxx". Btw, the "COMMITTER_xxxx" environment variables are very confusingly named. They actually go into the _author_ line in the commit object. I'm a total retard, and I really don't know why I called it "COMMITTER_xxx" instead of "AUTHOR_xxx". Linus "retard" Torvalds ^ permalink raw reply [flat|nested] 179+ messages in thread
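Editorial sketch of the author/committer split described above. The field order follows the sparse commit quoted in the message; the blank line before the log text, the function name `format_commit`, and the log message itself are assumptions for illustration, not a statement of what commit-tree does internally.

```python
def format_commit(tree, parents, author, author_date,
                  committer, commit_date, message):
    """Build commit text with separate author and committer lines."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]       # one line per parent
    lines.append("author %s %s" % (author, author_date))
    lines.append("committer %s %s" % (committer, commit_date))
    lines.append("")                                # assumed header/body separator
    lines.append(message)
    return "\n".join(lines) + "\n"


text = format_commit(
    tree="67607f05a66e36b2f038c77cfb61350d2110f7e8",
    parents=["9c59995fef9b52386e5f7242f44720a7aca287d7"],
    author="Christopher Li <sparse@chrisli.org>",
    author_date="Sat Apr 2 09:30:09 PST 2005",
    committer="Linus Torvalds <torvalds@ppc970.osdl.org>",
    commit_date="Thu Apr 7 20:06:31 2005",
    message="(hypothetical log message)",
)
print(text)
```

Re-creating a tree changes only the committer line; the author line keeps pointing at whoever wrote the change and when.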
* Re: Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.1 2005-04-10 23:46 ` Linus Torvalds @ 2005-04-10 23:56 ` Petr Baudis 2005-04-11 0:20 ` GIT license (Re: Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.1) Linus Torvalds 0 siblings, 1 reply; 179+ messages in thread From: Petr Baudis @ 2005-04-10 23:56 UTC (permalink / raw) To: Linus Torvalds Cc: Ingo Molnar, Willy Tarreau, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift Dear diary, on Mon, Apr 11, 2005 at 01:46:50AM CEST, I got a letter where Linus Torvalds <torvalds@osdl.org> told me that... > > > On Mon, 11 Apr 2005, Petr Baudis wrote: > > > > (BTW, it would be useful to have a tool which just blindly takes what > > you give it on input and throws it to an object of given type; I will > > need to construct arbitrary commits during the rebuild if I'm to keep > > the correct dates.) > > Hah. That's what "COMMITTER_NAME" "COMMITTER_EMAIL" and "COMMITTER_DATE" > are there for. > > There's two things to commits: when (and by whom) it was committed to a > tree, and when the changes were really done. > > So set the COMMITTER_xxx things to the person/time you want to consider > the _original_ one, and let "commit-tree" author you as the creator of the > commit itself. The regular "ChangeLog" thing should only show the author > and original time, but it's nice to see who created the commit itself. I already use those - look at my ChangeLog. (That's because for certain reasons I'm working on git in a half-broken chrooted environment.) When rebuilding the tree from scratch, I wanted like to do it transparently - that is, so that noone could notice that I rebuilt it, since it effectively still _is_ the original tree from the data standpoint, just the history flow is actually correct this time. > Btw, the "COMMITTER_xxxx" environment variables are very confusingly > named. They actually go into the _author_ line in the commit object. I'm a > total retard, and I really don't know why I called it "COMMITTER_xxx" > instead of "AUTHOR_xxx". 
So, who will fix it in his tree first! ;-) -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. ^ permalink raw reply [flat|nested] 179+ messages in thread
* GIT license (Re: Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.1) 2005-04-10 23:56 ` Petr Baudis @ 2005-04-11 0:20 ` Linus Torvalds 2005-04-11 0:27 ` Petr Baudis 2005-04-11 7:45 ` Ingo Molnar 0 siblings, 2 replies; 179+ messages in thread From: Linus Torvalds @ 2005-04-11 0:20 UTC (permalink / raw) To: Petr Baudis Cc: Ingo Molnar, Willy Tarreau, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift Btw, does anybody have strong opinions on the license? I didn't put in a COPYING file exactly because I was torn between GPLv2 and OSL2.1. I'm inclined to go with GPLv2 just because it's the most common one, but I was wondering if anybody really had strong opinions. For example, I'd really make it "v2 by default" like the kernel, since I'm sure v3 will be fine, but regardless of how sure I am, I'm _not_ a gambling man. Linus ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: GIT license (Re: Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.1) 2005-04-11 0:20 ` GIT license (Re: Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.1) Linus Torvalds @ 2005-04-11 0:27 ` Petr Baudis 2005-04-11 7:45 ` Ingo Molnar 1 sibling, 0 replies; 179+ messages in thread From: Petr Baudis @ 2005-04-11 0:27 UTC (permalink / raw) To: Linus Torvalds Cc: Ingo Molnar, Willy Tarreau, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift Dear diary, on Mon, Apr 11, 2005 at 02:20:52AM CEST, I got a letter where Linus Torvalds <torvalds@osdl.org> told me that... > Btw, does anybody have strong opinions on the license? I didn't put in a > COPYING file exactly because I was torn between GPLv2 and OSL2.1. > > I'm inclined to go with GPLv2 just because it's the most common one, but I > was wondering if anybody really had strong opinions. For example, I'd > really make it "v2 by default" like the kernel, since I'm sure v3 will be > fine, but regardless of how sure I am, I'm _not_ a gambling man. Oh, I wanted to ask about this too. I'd mostly prefer GPLv2 (I have no problem with the version restriction, I usually do it too), it's the one I'm mostly familiar with and OSL appears to be incompatible with GPL (at least FSF says so about OSL1.0), which might create various annoying issues. I hate when licenses get in my way and prevent me to possibly include some useful code. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: GIT license (Re: Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.1) 2005-04-11 0:20 ` GIT license (Re: Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.1) Linus Torvalds 2005-04-11 0:27 ` Petr Baudis @ 2005-04-11 7:45 ` Ingo Molnar 2005-04-11 8:40 ` Florian Weimer 1 sibling, 1 reply; 179+ messages in thread From: Ingo Molnar @ 2005-04-11 7:45 UTC (permalink / raw) To: Linus Torvalds Cc: Petr Baudis, Willy Tarreau, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift * Linus Torvalds <torvalds@osdl.org> wrote: > Btw, does anybody have strong opinions on the license? I didn't put in > a COPYING file exactly because I was torn between GPLv2 and OSL2.1. > > I'm inclined to go with GPLv2 just because it's the most common one, > but I was wondering if anybody really had strong opinions. For > example, I'd really make it "v2 by default" like the kernel, since I'm > sure v3 will be fine, but regardless of how sure I am, I'm _not_ a > gambling man. is there any fundamental problem with going with v2 right now, and then once v3 is out and assuming it looks ok, all newly copyrightable bits (new files, rewrites, substantial contributions, etc.) get a v3 copyright? (and the collection itself would be v3 too) That method wouldnt make it fully v3 automatically once v3 is out, but with time there would be enough v3 bits in it to make it essentially v3. This way we wouldnt have to blanket trust v3 before having seen it, and wouldnt be stuck at v2 either. Ingo ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: GIT license (Re: Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.1) 2005-04-11 7:45 ` Ingo Molnar @ 2005-04-11 8:40 ` Florian Weimer 2005-04-11 10:52 ` Petr Baudis 0 siblings, 1 reply; 179+ messages in thread From: Florian Weimer @ 2005-04-11 8:40 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, Petr Baudis, Willy Tarreau, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift * Ingo Molnar: > is there any fundamental problem with going with v2 right now, and then > once v3 is out and assuming it looks ok, all newly copyrightable bits > (new files, rewrites, substantial contributions, etc.) get a v3 > copyright? (and the collection itself would be v3 too) That method > wouldnt make it fully v3 automatically once v3 is out, but with time > there would be enough v3 bits in it to make it essentially v3. Almost certainly, v3 will be incompatible with v2 because it adds further restrictions. This means that your proposal would result in software which is not redistributable by third parties. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: GIT license (Re: Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.1)
2005-04-11 8:40 ` Florian Weimer
@ 2005-04-11 10:52 ` Petr Baudis
2005-04-11 16:05 ` Florian Weimer
0 siblings, 1 reply; 179+ messages in thread
From: Petr Baudis @ 2005-04-11 10:52 UTC (permalink / raw)
To: Florian Weimer
Cc: Ingo Molnar, Linus Torvalds, Willy Tarreau, Kernel Mailing List,
Randy.Dunlap, Ross Vandegrift

Dear diary, on Mon, Apr 11, 2005 at 10:40:00AM CEST, I got a letter
where Florian Weimer <fw@deneb.enyo.de> told me that...
> * Ingo Molnar:
>
> > is there any fundamental problem with going with v2 right now, and then
> > once v3 is out and assuming it looks ok, all newly copyrightable bits
> > (new files, rewrites, substantial contributions, etc.) get a v3
> > copyright? (and the collection itself would be v3 too) That method
> > wouldnt make it fully v3 automatically once v3 is out, but with time
> > there would be enough v3 bits in it to make it essentially v3.
>
> Almost certainly, v3 will be incompatible with v2 because it adds
> further restrictions. This means that your proposal would result in
> software which is not redistributable by third parties.

Hmm, what would actually be the point in introducing further
restrictions? Anyone who then wants to get around them will just
distribute the software with the "any later version" provision under
GPLv2, and GPLv3 will have no impact except for new software with "v3 or
any later version" provision. What am I missing?

I've been doing a lot of LKML catching up, and I remember someone
suggesting using GPLv2 (for kernel, but should apply to git too), with a
provision to let someone trusted (Linus) decide when GPLv3 is out
whether you can use GPLv3 for the kernel too.

Does it make sense? And is it even legally doable without sending signed
written documents to Linus' tropical hacienda?

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.

^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: GIT license (Re: Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.1) 2005-04-11 10:52 ` Petr Baudis @ 2005-04-11 16:05 ` Florian Weimer 0 siblings, 0 replies; 179+ messages in thread From: Florian Weimer @ 2005-04-11 16:05 UTC (permalink / raw) To: Petr Baudis Cc: Ingo Molnar, Linus Torvalds, Willy Tarreau, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift * Petr Baudis: >> Almost certainly, v3 will be incompatible with v2 because it adds >> further restrictions. This means that your proposal would result in >> software which is not redistributable by third parties. > > Hmm, what would be actually the point in introducing further > restrictions? Anyone who then wants to get around them will just > distribute the software with the "any later version" provision under > GPLv2, and GPLv3 will have no impact expect for new software with "v3 or > any later version" provision. What am I missing? Software continues to evolve. The copyright owners can relicense the code base under v3, and use v3 for all subsequent changes to the software. The trouble with relicensing is that you have to contact all copyright holders (or remove their code). This tends to be impossible in long-running projects without contractual agreements between the developers. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [ANNOUNCE] git-pasky-0.1 2005-04-10 22:27 ` Petr Baudis 2005-04-10 23:10 ` Linus Torvalds @ 2005-04-10 23:23 ` Paul Jackson 2005-04-11 0:15 ` Randy.Dunlap 1 sibling, 1 reply; 179+ messages in thread From: Paul Jackson @ 2005-04-10 23:23 UTC (permalink / raw) To: Petr Baudis; +Cc: torvalds, mingo, willy, linux-kernel, rddunlap, ross Petr wrote: > That reminds me, is there any > tool which will take .rej files and throw them into the file to create > rcsmerge-like conflicts? Check out 'wiggle' http://www.cse.unsw.edu.au/~neilb/source/wiggle/ -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [ANNOUNCE] git-pasky-0.1 2005-04-10 23:23 ` [ANNOUNCE] git-pasky-0.1 Paul Jackson @ 2005-04-11 0:15 ` Randy.Dunlap 0 siblings, 0 replies; 179+ messages in thread From: Randy.Dunlap @ 2005-04-11 0:15 UTC (permalink / raw) To: Paul Jackson; +Cc: pasky, torvalds, mingo, willy, linux-kernel, ross On Sun, 10 Apr 2005 16:23:11 -0700 Paul Jackson wrote: | Petr wrote: | > That reminds me, is there any | > tool which will take .rej files and throw them into the file to create | > rcsmerge-like conflicts? | | Check out 'wiggle' | http://www.cse.unsw.edu.au/~neilb/source/wiggle/ or Chris Mason's 'rej' program: ftp://ftp.suse.com/pub/people/mason/rej/ --- ~Randy ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: Re: [ANNOUNCE] git-pasky-0.1 2005-04-10 20:38 ` Linus Torvalds 2005-04-10 21:39 ` Linus Torvalds 2005-04-10 22:27 ` Petr Baudis @ 2005-04-11 0:30 ` Petr Baudis 2005-04-11 1:11 ` Linus Torvalds 2 siblings, 1 reply; 179+ messages in thread From: Petr Baudis @ 2005-04-11 0:30 UTC (permalink / raw) To: Linus Torvalds Cc: Ingo Molnar, Willy Tarreau, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift Dear diary, on Sun, Apr 10, 2005 at 10:38:11PM CEST, I got a letter where Linus Torvalds <torvalds@osdl.org> told me that... ..snip.. > Can you pull my current repo, which has "diff-tree -R" that does what the > name suggests, and which should be faster than the 0.48 sec you see.. Am I just missing something, or your diff-tree doesn't handle added/removed directories? (Mine does! *hint* *hint* It also doesn't bother with dynamic allocation, but someone might consider the static path buffer ugly. Anyway, I hacked it with a plan to do a massive cleanup of the file later.) -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: Re: [ANNOUNCE] git-pasky-0.1 2005-04-11 0:30 ` Re: " Petr Baudis @ 2005-04-11 1:11 ` Linus Torvalds 0 siblings, 0 replies; 179+ messages in thread From: Linus Torvalds @ 2005-04-11 1:11 UTC (permalink / raw) To: Petr Baudis Cc: Ingo Molnar, Willy Tarreau, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift On Mon, 11 Apr 2005, Petr Baudis wrote: > > Dear diary, on Sun, Apr 10, 2005 at 10:38:11PM CEST, I got a letter > where Linus Torvalds <torvalds@osdl.org> told me that... > ..snip.. > > Can you pull my current repo, which has "diff-tree -R" that does what the > > name suggests, and which should be faster than the 0.48 sec you see.. > > Am I just missing something, or your diff-tree doesn't handle > added/removed directories? You're not missing anything, I did it that way on purpose. I thought it would be easier to do the expansion in the caller (who knows what it is they want to do with the end result). But now that I look at merging, I realize that was actually the wrong thing to do. A merge algorithm definitely wants to see the expanded tree, since it will compare/join several of the diff-tree output things. So I'll either fix it or decide to just go with your version instead. I'm not overly proud. Linus ^ permalink raw reply [flat|nested] 179+ messages in thread
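Editorial sketch of the "expanded" behaviour discussed above: when a whole directory is added or removed, every file under it is reported, so a merge step can compare the results path by path. Trees are modelled here as nested dicts mapping names to either a blob id string or a sub-dict; that model, the function name `diff_tree`, and the short ids are assumptions for illustration and do not reflect either diff-tree implementation.

```python
def diff_tree(old, new, prefix=""):
    """Yield (op, path, id) records; ops are '+', '-', and '<' / '>' pairs for changes."""
    for name in sorted(set(old) | set(new)):
        path = prefix + name
        a = old.get(name)
        b = new.get(name)
        if a == b:
            continue  # identical entry (or identical subtree): nothing to report
        a_dir = isinstance(a, dict)
        b_dir = isinstance(b, dict)
        if a_dir or b_dir:
            # Expand directories: a missing (or non-directory) side acts as an empty tree.
            yield from diff_tree(a if a_dir else {}, b if b_dir else {}, path + "/")
            # A file replaced by a directory (or the reverse) is reported too.
            if a is not None and not a_dir:
                yield ("-", path, a)
            if b is not None and not b_dir:
                yield ("+", path, b)
        elif a is None:
            yield ("+", path, b)
        elif b is None:
            yield ("-", path, a)
        else:
            yield ("<", path, a)
            yield (">", path, b)


old = {"Makefile": "aaa", "doc": {"README": "bbb"}}
new = {"Makefile": "ccc", "lib": {"x.c": "ddd", "sub": {"y.c": "eee"}}}
for record in diff_tree(old, new):
    print(record)
```

Doing the expansion inside the diff keeps callers simple, at the cost of walking subtrees a caller might not care about; that is the trade-off the two messages above are weighing.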
* Re: [ANNOUNCE] git-pasky-0.1
2005-04-10 18:45 ` Petr Baudis
2005-04-10 19:13 ` Willy Tarreau
2005-04-10 20:38 ` Linus Torvalds
@ 2005-04-10 20:41 ` Paul Jackson
2 siblings, 0 replies; 179+ messages in thread
From: Paul Jackson @ 2005-04-10 20:41 UTC (permalink / raw)
To: Petr Baudis; +Cc: mingo, willy, linux-kernel, torvalds, rddunlap, ross

Good lord - you don't need to use arrays for this.

The old-fashioned ways have their ways. Both the 'set' command
and the 'read' command can split args and assign to distinct
variable names.

Try something like the following:

    diff-tree -r $id1 $id2 |
        sed -e '/^</ { N; s/\n>/ / }' -e 's/./& /' |
        while read op mode1 sha1 name1 mode2 sha2 name2
        do
            ... various common stuff ...
            case "$op" in
            "+")
                ...
                ;;
            "-")
                ...
                ;;
            "<")
                test $name1 = $name2 || die mismatched names
                label1=$(mkbanner "$loc1" $id1 "$name1" $mode1 $sha1)
                label2=$(mkbanner "$loc2" $id2 "$name1" $mode2 $sha2)
                diff -L "$label1" -L "$label2" -u "$loc1" "$loc2"
                ;;
            esac
        done

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply [flat|nested] 179+ messages in thread
* [ANNOUNCE] git-pasky-0.2
2005-04-10 16:27 ` [ANNOUNCE] git-pasky-0.1 Petr Baudis
2005-04-10 16:55 ` Linus Torvalds
2005-04-10 17:33 ` Ingo Molnar
@ 2005-04-11 1:58 ` Petr Baudis
2005-04-11 2:46 ` Daniel Barkalow
` (2 more replies)
2 siblings, 3 replies; 179+ messages in thread
From: Petr Baudis @ 2005-04-11 1:58 UTC (permalink / raw)
To: Kernel Mailing List; +Cc: Linus Torvalds, Randy.Dunlap, Ross Vandegrift

Hello,

here goes git-pasky-0.2, my set of patches and scripts upon
Linus' git, aimed at human usability and to an extent a SCM-like usage.

If you already have a previous git-pasky version, just git pull pasky
to get it. Otherwise, you can get it from:

	http://pasky.or.cz/~pasky/dev/git/

Please see the README there and/or the parent post for detailed
instructions. You can find the changes from the last announcement in the
ChangeLog (releases have separate commits so you can find them easily;
they are also tagged for purpose of diffing etc).

This release contains mostly bugfixes, performance enhancements
(especially w.r.t. git diff), and some merges with Linus (except for
diff-tree, where I merged only the new output format).

New features are trivial - support for tagging and short SHA1 ids; you
can use only the start of the SHA1 hash long enough to be unambiguous.

My immediate plan is implementing git merge, which I will do tomorrow,
if no one does it before then, that is. ;-)

Any feedback/opinions/suggestions/patches (especially patches) are
welcome.

Have fun,

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.

^ permalink raw reply [flat|nested] 179+ messages in thread
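Editorial sketch of the short-id rule announced above: a prefix is accepted only if exactly one known id starts with it. How git-pasky actually implements this is not shown in the thread; the function name `resolve_short_id` and the list-of-ids input are assumptions for illustration. The two full ids are the ones from the diff-tree example at the top of the thread.

```python
def resolve_short_id(prefix, known_ids):
    """Return the single full id starting with `prefix`, or raise if none or several match."""
    matches = [full for full in known_ids if full.startswith(prefix)]
    if not matches:
        raise KeyError("no object id starts with %r" % prefix)
    if len(matches) > 1:
        raise ValueError("ambiguous prefix %r: %d candidates" % (prefix, len(matches)))
    return matches[0]


ids = [
    "8fd07d4b7778cd0233ea0a17acd3fe9d710af035",
    "8c6d29d6a496d12f1c224db945c0c56fd60ce941",
]
print(resolve_short_id("8fd0", ids))   # unique prefix: resolves to the first id
try:
    resolve_short_id("8", ids)         # both ids start with "8"
except ValueError as err:
    print(err)
```

Refusing ambiguous prefixes instead of picking the first match means a short id never silently names the wrong object as the database grows.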
* Re: [ANNOUNCE] git-pasky-0.2 2005-04-11 1:58 ` [ANNOUNCE] git-pasky-0.2 Petr Baudis @ 2005-04-11 2:46 ` Daniel Barkalow 2005-04-11 10:17 ` Petr Baudis 2005-04-11 8:50 ` Ingo Molnar 2005-04-11 13:57 ` [ANNOUNCE] git-pasky-0.3 Petr Baudis 2 siblings, 1 reply; 179+ messages in thread From: Daniel Barkalow @ 2005-04-11 2:46 UTC (permalink / raw) To: Petr Baudis; +Cc: Kernel Mailing List On Mon, 11 Apr 2005, Petr Baudis wrote: > Hello, > > here goes git-pasky-0.2, my set of patches and scripts upon > Linus' git, aimed at human usability and to an extent a SCM-like usage. Incidentally, the git-pasky-base tarball you have up has its checked-out tree partway between 0.1 and 0.2, and doesn't compile. (The included HEAD version in .dircache is fine, if the user has some way to bootstrap) -Daniel *This .sig left intentionally blank* ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.2 2005-04-11 2:46 ` Daniel Barkalow @ 2005-04-11 10:17 ` Petr Baudis 0 siblings, 0 replies; 179+ messages in thread From: Petr Baudis @ 2005-04-11 10:17 UTC (permalink / raw) To: Daniel Barkalow; +Cc: Kernel Mailing List Dear diary, on Mon, Apr 11, 2005 at 04:46:42AM CEST, I got a letter where Daniel Barkalow <barkalow@iabervon.org> told me that... > On Mon, 11 Apr 2005, Petr Baudis wrote: > > > Hello, > > > > here goes git-pasky-0.2, my set of patches and scripts upon > > Linus' git, aimed at human usability and to an extent a SCM-like usage. > > Incidentally, the git-pasky-base tarball you have up has its checked-out > tree partway between 0.1 and 0.2, and doesn't compile. (The included HEAD > version in .dircache is fine, if the user has some way to bootstrap) Oops, I'm sorry. It appears some diffs just slipped out from the tracked tree, perhaps I was pulling once when git diff was broken and I didn't notice it. Now there is a newer tarball there, it is not a pure 0.2 anymore though - if you use the COMMITTER_* env variables, they are now AUTHOR_*. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [ANNOUNCE] git-pasky-0.2 2005-04-11 1:58 ` [ANNOUNCE] git-pasky-0.2 Petr Baudis 2005-04-11 2:46 ` Daniel Barkalow @ 2005-04-11 8:50 ` Ingo Molnar 2005-04-11 10:16 ` Petr Baudis 2005-04-11 13:57 ` [ANNOUNCE] git-pasky-0.3 Petr Baudis 2 siblings, 1 reply; 179+ messages in thread From: Ingo Molnar @ 2005-04-11 8:50 UTC (permalink / raw) To: Petr Baudis Cc: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift * Petr Baudis <pasky@ucw.cz> wrote: > Hello, > > here goes git-pasky-0.2, my set of patches and scripts upon Linus' > git, aimed at human usability and to an extent a SCM-like usage. works fine on FC4, i only minor issues: 'git' in the tarball didnt have the x permission. Also, your scripts assume they are in $PATH. When trying out a tarball one doesnt usually do a 'make install' but tries stuff locally. Also, 'make install' doesnt seem to install the git script itself, is that intentional? Ingo ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.2
2005-04-11 8:50 ` Ingo Molnar
@ 2005-04-11 10:16 ` Petr Baudis
0 siblings, 0 replies; 179+ messages in thread
From: Petr Baudis @ 2005-04-11 10:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift

Dear diary, on Mon, Apr 11, 2005 at 10:50:51AM CEST, I got a letter
where Ingo Molnar <mingo@elte.hu> told me that...
>
> * Petr Baudis <pasky@ucw.cz> wrote:
>
> > Hello,
> >
> > here goes git-pasky-0.2, my set of patches and scripts upon Linus'
> > git, aimed at human usability and to an extent a SCM-like usage.
>
> works fine on FC4, i only minor issues: 'git' in the tarball didnt have
> the x permission.

Sorry, fixed in the tarball. It is in the diffs but I have no git patch
yet to apply the mode changes.

> Also, your scripts assume they are in $PATH. When
> trying out a tarball one doesnt usually do a 'make install' but tries
> stuff locally.

Hmm, I think I will need to make something like

	exedir=$(dirname $0)

on the top of each script and then do all the git calls with ${exedir}
prepended. That should fix the issue, right?

> Also, 'make install' doesnt seem to install the git script itself, is
> that intentional?

Oops, I actually didn't even notice that there _is_ any install target
in the Makefile already. ;-) I will add the relevant stuff to it.

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.

^ permalink raw reply [flat|nested] 179+ messages in thread
* [ANNOUNCE] git-pasky-0.3
2005-04-11 1:58 ` [ANNOUNCE] git-pasky-0.2 Petr Baudis
2005-04-11 2:46 ` Daniel Barkalow
2005-04-11 8:50 ` Ingo Molnar
@ 2005-04-11 13:57 ` Petr Baudis
2005-04-12 12:47 ` Martin Schlemmer
` (2 more replies)
2 siblings, 3 replies; 179+ messages in thread
From: Petr Baudis @ 2005-04-11 13:57 UTC (permalink / raw)
To: Kernel Mailing List; +Cc: Linus Torvalds, Randy.Dunlap, Ross Vandegrift

Hello,

here goes git-pasky-0.3, my set of patches and scripts upon
Linus' git, aimed at human usability and to an extent a SCM-like usage.

If you already have a previous git-pasky version, just git pull pasky
to get it (but see below!!!). Otherwise, you can get it from:

	http://pasky.or.cz/~pasky/dev/git/

Please see the README there and/or the parent post for detailed
instructions. You can find the changes from the last announcement in the
ChangeLog (releases have separate commits so you can find them easily;
they are also tagged for purpose of diffing etc).

This release is mainly focused on bugfixes. Especially, it fixes git
diff, which was totally broken in the previous release and would only
diff every other file (forgot to remove one shift from the times when
changes were reported two-line from diff-tree). Very sorry about that.

This implies that git pull was broken too, though - if you pulled a
tracked branch, git diff wouldn't produce the complete diff for patch to
apply. If you didn't do any local changes, it is fortunately easy to
repair:

	git diff | patch -p0 -R

(The unapplied changes appear as reverted in your local tree when
compared with the cache.) You will need to edit the diff if you did some
local changes.

Another change breaking some compatibility is regarding commit
environment variables - s/COMMITTER_*/AUTHOR_*/.

Otherwise it is the usual bunch of merges with Linus and some really
minor stuff. Oh, and make install works.

One annoying thing is an rsync error when pulling from Linus - it tries
to sync the tags/ directory and I don't know how to safely silence it
except throwing away all stderr. I will probably make it fetch the list
of .dircache and rsync only things which are really there.

Any feedback/opinions/suggestions/patches (especially patches) are
welcome. You can also stop by at #git either on FreeNode or on OFTC (I
will be around only from 20:00 CET on, though).

Have fun,

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.

^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [ANNOUNCE] git-pasky-0.3
2005-04-11 13:57 ` [ANNOUNCE] git-pasky-0.3 Petr Baudis
@ 2005-04-12 12:47 ` Martin Schlemmer
2005-04-12 13:02 ` Petr Baudis
2005-04-12 13:07 ` David Woodhouse
2005-04-13 9:35 ` Russell King
2 siblings, 1 reply; 179+ messages in thread
From: Martin Schlemmer @ 2005-04-12 12:47 UTC (permalink / raw)
To: Petr Baudis
Cc: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift

[-- Attachment #1.1: Type: text/plain, Size: 438 bytes --]

On Mon, 2005-04-11 at 15:57 +0200, Petr Baudis wrote:
> Hello,
>
> here goes git-pasky-0.3, my set of patches and scripts upon
> Linus' git, aimed at human usability and to an extent a SCM-like usage.
>

It's pretty dependent on where VERSION is located. This patch fixes
that. (PS, I left the output of 'git diff' as is to ask about the
following stuff after the proper diff ...)

Regards,

--
Martin Schlemmer

[-- Attachment #1.2: add_version.patch --]
[-- Type: text/x-patch, Size: 3297 bytes --]

--- -	2005-04-12 14:36:44.384822000 +0200
+++ Makefile	2005-04-12 14:33:14.000000000 +0200
@@ -19,10 +19,14 @@
 	gitcommit.sh gitdiff-do gitdiff.sh gitlog.sh gitls.sh gitlsobj.sh \
 	gitmerge.sh gitpull.sh gitrm.sh gittag.sh gittrack.sh
 
-all: $(PROG)
+GEN_SCRIPT= gitversion.sh
 
-install: $(PROG)
-	install $(PROG) $(SCRIPT) $(HOME)/bin/
+VERSION= VERSION
+
+all: $(PROG) $(GEN_SCRIPT)
+
+install: $(PROG) $(GEN_SCRIPT)
+	install $(PROG) $(SCRIPT) $(GEN_SCRIPT) $(HOME)/bin/
 
 LIBS= -lssl -lz
 
@@ -67,8 +71,14 @@
 read-cache.o: cache.h
 show-diff.o: cache.h
 
+gitversion.sh: $(VERSION)
+	@rm -f $@
+	@echo "#!/bin/sh" > $@
+	@echo "echo \"$(shell cat $(VERSION))\"" >> $@
+	@chmod +x $@
+
 clean:
-	rm -f *.o $(PROG) temp_git_file_*
+	rm -f *.o $(PROG) temp_git_file_* $(GEN_SCRIPT)
 
 backup: clean
 	cd .. ; tar czvf dircache.tar.gz dir-cache
--- -	2005-04-12 14:36:44.417284000 +0200
+++ git	2005-04-12 14:31:38.000000000 +0200
@@ -20,7 +20,7 @@
 
 help () {
 	cat <<__END__
-The GIT scripted toolkit  $(cat VERSION)
+The GIT scripted toolkit  $(gitversion.sh)
 
 Usage: git COMMAND [ARG]...
 
COPYING: fe2a4177a760fd110e78788734f167bd633be8de 33
Makefile: b514dc5cc62bc9d2b2cf0f81dcce15ff7de83eee 33
README: fa9b676d62f8ac5c1ff36e7742dc6db8f6cdf97f 33
VERSION: d71f8ea875f9fbd86de7b1457924492473cd1718 33
cache.h: d3e9a21b7d9a2ac32abacf5cc40ee1a4d83f9fe8 33
cat-file.c: 45be1badaa8517d4e3a69e0bf1cac2e90191e475 37
checkout-cache.c: a87b31e3787c312364d7295b782d6c22d1577f5c 33
commit-id: 65c81756c8f10d513d073ecbd741a3244663c4c9 3b
commit-tree.c: 2e25f72ddb66bd8ebd448405f6df76e15cc9d030 33
diff-tree.c: 317339fc9c1169b886fdfc22863e9451109b88c7 33
fsck-cache.c: 7a2f36aa0bc8677adfbc8542338e16d5188dee4a 33
git: 2f1cc7f80079b9c2feec8e7310d30e57b6e4b2aa 33
gitXnormid.sh: 619a89875c4ccd6f380c4be33274a71bb2a1b7f2 33
gitadd.sh: 3ed93ea0fcb995673ba9ee1982e0e7abdbe35982 33
gitaddremote.sh: ab075628b0b4b16aa05382955b8607700f96101f 33
gitcommit.sh: 5e98e3b5fe501a196a1030c11d1ad6ac87532e6a 33
gitdiff-do: d6174abceab34d22010c36a8453a6c3f3f184fe0 33
gitdiff.sh: 9f558422003160f0d006f7948702e22d5c90254c 33
gitlog.sh: d6b33fb0c47369be7b6af3b21f2188e226bf2feb 33
gitls.sh: b6f15d82f16c1e9982c5031f3be22eb5430273af 33
gitlsobj.sh: 128461d3de6a42cfaaa989fc6401bebdfa885b3f 33
gitmerge.sh: e25d42dde7c9b929476b0967b2a60d9b342b2e79 3b
gitpull.sh: f29bb37c5eef416ed65e46aa3c52493d07619cd8 33
gitrm.sh: 5c18c38a890c9fd9ad2b866ee7b529539d2f3f8f 33
gittag.sh: 0cd3188a442a367db327f70aba14ff2a0d69e927 3b
gittrack.sh: ae34f5c5e1e9969619dde8d0621fd9c212208694 33
init-db.c: 3296763cdb4bd242a9ec01933ac8d3d5320d20e4 33
ls-tree.c: 3e2a6c7d183a42e41f1073dfec6794e8f8a5e75c 37
parent-id: 1801c6fe426592832e7250f8b760fb9d2e65220f 33
read-cache.c: 95d0ec6e95ab054da4ef9673641c9f809eebef2b 33
read-tree.c: eb548148aa6d212f05c2c622ffbe62a06cd072f9 33
rev-tree.c: 7429b9c4d0aab2e4a494eb4b65129a59da138106 33
show-diff.c: 043772cb08b6795008474316f38a326c4196edd6 37
show-files.c: 347894d6360e5ef56140a9a70d2a0b000a268a33 33
tree-id: cb70e2c508a18107abe305633612ed702aa3ee4f 37
update-cache.c: 3d49a1cbd20c7fcf1010b0f3affaf896310c6797 33
write-tree.c: eed7c02123c6c6458597726ef4f8b2208aefa5bb 33

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-12 12:47 ` Martin Schlemmer @ 2005-04-12 13:02 ` Petr Baudis 2005-04-12 13:13 ` Martin Schlemmer 0 siblings, 1 reply; 179+ messages in thread From: Petr Baudis @ 2005-04-12 13:02 UTC (permalink / raw) To: Martin Schlemmer Cc: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift Dear diary, on Tue, Apr 12, 2005 at 02:47:25PM CEST, I got a letter where Martin Schlemmer <azarah@nosferatu.za.org> told me that... > On Mon, 2005-04-11 at 15:57 +0200, Petr Baudis wrote: > > Hello, > > > > here goes git-pasky-0.3, my set of patches and scripts upon > > Linus' git, aimed at human usability and to an extent a SCM-like usage. > > > > Its pretty dependant on where VERSION is located. This patch fixes > that. (PS, I left the output of 'git diff' as is to ask about the > following stuff after the proper diff ...) Thanks, applied. I don't understand your PS, though. :-) -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-12 13:02 ` Petr Baudis @ 2005-04-12 13:13 ` Martin Schlemmer 2005-04-12 13:23 ` Petr Baudis 0 siblings, 1 reply; 179+ messages in thread From: Martin Schlemmer @ 2005-04-12 13:13 UTC (permalink / raw) To: Petr Baudis Cc: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift [-- Attachment #1: Type: text/plain, Size: 915 bytes --] On Tue, 2005-04-12 at 15:02 +0200, Petr Baudis wrote: > Dear diary, on Tue, Apr 12, 2005 at 02:47:25PM CEST, I got a letter > where Martin Schlemmer <azarah@nosferatu.za.org> told me that... > > On Mon, 2005-04-11 at 15:57 +0200, Petr Baudis wrote: > > > Hello, > > > > > > here goes git-pasky-0.3, my set of patches and scripts upon > > > Linus' git, aimed at human usability and to an extent a SCM-like usage. > > > > > > > Its pretty dependant on where VERSION is located. This patch fixes > > that. (PS, I left the output of 'git diff' as is to ask about the > > following stuff after the proper diff ...) > > Thanks, applied. I don't understand your PS, though. :-) > Heh, yeah I do that sometimes. Basically should 'git diff' output anything (besides maybe not added files like cvs ... sorry, do not know after what you fashion it) like it does now? -- Martin Schlemmer [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-12 13:13 ` Martin Schlemmer @ 2005-04-12 13:23 ` Petr Baudis 0 siblings, 0 replies; 179+ messages in thread From: Petr Baudis @ 2005-04-12 13:23 UTC (permalink / raw) To: Martin Schlemmer Cc: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift Dear diary, on Tue, Apr 12, 2005 at 03:13:15PM CEST, I got a letter where Martin Schlemmer <azarah@nosferatu.za.org> told me that... > On Tue, 2005-04-12 at 15:02 +0200, Petr Baudis wrote: > > Dear diary, on Tue, Apr 12, 2005 at 02:47:25PM CEST, I got a letter > > where Martin Schlemmer <azarah@nosferatu.za.org> told me that... > > > On Mon, 2005-04-11 at 15:57 +0200, Petr Baudis wrote: > > > > Hello, > > > > > > > > here goes git-pasky-0.3, my set of patches and scripts upon > > > > Linus' git, aimed at human usability and to an extent a SCM-like usage. > > > > > > > > > > Its pretty dependant on where VERSION is located. This patch fixes > > > that. (PS, I left the output of 'git diff' as is to ask about the > > > following stuff after the proper diff ...) > > > > Thanks, applied. I don't understand your PS, though. :-) > > > > Heh, yeah I do that sometimes. Basically should 'git diff' output > anything (besides maybe not added files like cvs ... sorry, do not know > after what you fashion it) like it does now? Huh. Well, git diff without any arguments should just call show-diff. That is show your local uncommitted changes. It doesn't show the locally added/removed files yet for several reasons, but it's being worked on. :-) -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [ANNOUNCE] git-pasky-0.3
2005-04-11 13:57 ` [ANNOUNCE] git-pasky-0.3 Petr Baudis
2005-04-12 12:47 ` Martin Schlemmer
@ 2005-04-12 13:07 ` David Woodhouse
2005-04-13 8:47 ` Russell King
2005-04-13 9:35 ` Russell King
2 siblings, 1 reply; 179+ messages in thread
From: David Woodhouse @ 2005-04-12 13:07 UTC (permalink / raw)
To: Petr Baudis
Cc: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift
On Mon, 2005-04-11 at 15:57 +0200, Petr Baudis wrote:
> Hello,
>
> here goes git-pasky-0.3, my set of patches and scripts upon
> Linus' git, aimed at human usability and to an extent a SCM-like
> usage.
Untar, make, add to path, pull, 'git diff' fails on PPC:
peach /home/dwmw2/git-pasky-base $ git diff
error: bad signature
error: verify header failed
read_cache: Invalid argument
A little extra debugging shows the problem:
error: bad signature 0x43524944 should be 0x44495243
The cache.h header file suggests that the cache is host-endian on
purpose, because it's local-only. So why am I seeing a cache from
another host? Is that comment no longer true?
Either way, the original decision is probably bogus -- with trees as
large as the kernel tree it makes a lot of sense to keep them somewhere
NFS-accessible and use them from different hosts, and byteswapping
really isn't going to slow it down that much.
We should just pick an endianness and stick to it. I'd suggest making it
big-endian to make sure the LE weenies don't forget to byteswap
properly.
--
dwmw2
^ permalink raw reply [flat|nested] 179+ messages in thread
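The two values in the error message above are the same four bytes read in opposite byte order: 0x44495243 spells "DIRC" in ASCII and 0x43524944 spells "CRID". A minimal sketch in C of a signature check that does not depend on the host's byte order, assuming the signature is stored big-endian on disk. The helper names swap32(), read_be32() and signature_ok() are hypothetical and not part of git.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signature value taken from the error message ("DIRC" in ASCII). */
#define CACHE_SIGNATURE 0x44495243u

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/*
 * Hypothetical helper: assemble four bytes as a big-endian 32-bit value.
 * Working byte by byte gives the same result on BE and LE hosts.
 */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return 1 if the buffer starts with the signature stored big-endian. */
static int signature_ok(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	return len >= 4 && read_be32(buf) == CACHE_SIGNATURE;
}
```

With a host-endian file, a little-endian writer puts the bytes "CRID" on disk, and a big-endian reader such as the PPC box then sees swap32(CACHE_SIGNATURE) instead of CACHE_SIGNATURE.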
* Re: [ANNOUNCE] git-pasky-0.3 2005-04-12 13:07 ` David Woodhouse @ 2005-04-13 8:47 ` Russell King 2005-04-13 8:59 ` Petr Baudis 0 siblings, 1 reply; 179+ messages in thread From: Russell King @ 2005-04-13 8:47 UTC (permalink / raw) To: David Woodhouse Cc: Petr Baudis, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift On Tue, Apr 12, 2005 at 02:07:36PM +0100, David Woodhouse wrote: > I'd suggest making it big-endian to make sure the LE weenies don't > forget to byteswap properly. That's not a bad argument actually - especially as networking uses BE. (and git is about networking, right?) 8) -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.3
2005-04-13 8:47 ` Russell King
@ 2005-04-13 8:59 ` Petr Baudis
2005-04-13 9:06 ` H. Peter Anvin
2005-04-13 9:25 ` David Woodhouse
0 siblings, 2 replies; 179+ messages in thread
From: Petr Baudis @ 2005-04-13 8:59 UTC (permalink / raw)
To: David Woodhouse, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git
Dear diary, on Wed, Apr 13, 2005 at 10:47:05AM CEST, I got a letter
where Russell King <rmk+lkml@arm.linux.org.uk> told me that...
> On Tue, Apr 12, 2005 at 02:07:36PM +0100, David Woodhouse wrote:
> > I'd suggest making it [index] big-endian to make sure the LE weenies don't
> > forget to byteswap properly.
>
> That's not a bad argument actually - especially as networking uses BE.
> (and git is about networking, right?) 8)
Theoretically, you are never supposed to share your index if you work in
fully git environment. However, I offer some "base tarballs" which have
the unpacked source as well as the .git directory, and I think you want
the index there. Of course you can always regenerate it by
read-tree $(tree-id)
but I really don't want to (hey, dwmw got away with that too! ;-). It
forces an additional out-of-order step you need to do before making use
of your git for the first time.
The NFS argument obviously seems perfectly valid to me too. So, FWIW,
I'm personally all for it, if someone gives me a patch.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [ANNOUNCE] git-pasky-0.3
2005-04-13 8:59 ` Petr Baudis
@ 2005-04-13 9:06 ` H. Peter Anvin
2005-04-13 9:09 ` David Woodhouse
2005-04-13 9:25 ` David Woodhouse
1 sibling, 1 reply; 179+ messages in thread
From: H. Peter Anvin @ 2005-04-13 9:06 UTC (permalink / raw)
To: Petr Baudis
Cc: David Woodhouse, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git
Petr Baudis wrote:
> Dear diary, on Wed, Apr 13, 2005 at 10:47:05AM CEST, I got a letter
> where Russell King <rmk+lkml@arm.linux.org.uk> told me that...
>
>>On Tue, Apr 12, 2005 at 02:07:36PM +0100, David Woodhouse wrote:
>>
>>>I'd suggest making it [index] big-endian to make sure the LE weenies don't
>>>forget to byteswap properly.
>>
>>That's not a bad argument actually - especially as networking uses BE.
>>(and git is about networking, right?) 8)
>
> Theoretically, you are never supposed to share your index if you work in
> fully git environment. However, I offer some "base tarballs" which have
> the unpacked source as well as the .git directory, and I think you want
> the index there. Of course you can always regenerate it by
>
> read-tree $(tree-id)
>
> but I really don't want to (hey, dwmw got away with that too! ;-). It
> forces an additional out-of-order step you need to do before making use
> of your git for the first time.
>
> The NFS argument obviously seems perfectly valid to me too. So, FWIW,
> I'm personally all for it, if someone gives me a patch.
>
In userspace, it's definitely easier to stick with BE for a standard
byte order, simply because it's the one byteorder one can rely on there
being macros available to deal with on all platforms.
However, then I would also like to suggest replacing "unsigned int"
and "unsigned short" with uint32_t and uint16_t, even though they're
consistent on all *current* Linux platforms.
-hpa
^ permalink raw reply [flat|nested] 179+ messages in thread
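The two suggestions above (one fixed byte order, explicitly sized types) can be sketched together in C using the standard htonl()/ntohl() helpers. This is an illustration under stated assumptions, not git's actual header: the struct name be_cache_header, its three fields, and both function names are hypothetical.

```c
#include <arpa/inet.h>	/* htonl(), ntohl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIGNATURE 0x44495243u	/* "DIRC" */

/*
 * Assumed, simplified on-disk header: fixed-width fields, always stored
 * big-endian. The real header layout may differ.
 */
struct be_cache_header {
	uint32_t signature;
	uint32_t version;
	uint32_t entries;
};

/* Fill in a header in on-disk (big-endian) form. */
static void header_to_disk(struct be_cache_header *out, uint32_t entries)
{
	assert(out != NULL);
	out->signature = htonl(CACHE_SIGNATURE);
	out->version = htonl(1);
	out->entries = htonl(entries);
}

/* Return the entry count, or -1 if signature or version do not match. */
static int64_t header_entries(const struct be_cache_header *hdr)
{
	if (ntohl(hdr->signature) != CACHE_SIGNATURE)
		return -1;
	if (ntohl(hdr->version) != 1)
		return -1;
	return (int64_t)ntohl(hdr->entries);
}
```

On a big-endian host both conversions are no-ops, so the cost of the fixed byte order falls only on little-endian machines, and the file bytes are identical whichever machine wrote them.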
* Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 9:06 ` H. Peter Anvin @ 2005-04-13 9:09 ` David Woodhouse 0 siblings, 0 replies; 179+ messages in thread From: David Woodhouse @ 2005-04-13 9:09 UTC (permalink / raw) To: H. Peter Anvin Cc: Petr Baudis, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git On Wed, 2005-04-13 at 02:06 -0700, H. Peter Anvin wrote: > However, then I would also like to suggest replacing "unsigned int" > and "unsigned short" with uint32_t and uint16_t, even though they're > consistent on all *current* Linux platforms. Agreed. -- dwmw2 ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 8:59 ` Petr Baudis 2005-04-13 9:06 ` H. Peter Anvin @ 2005-04-13 9:25 ` David Woodhouse 2005-04-13 9:42 ` Petr Baudis ` (2 more replies) 1 sibling, 3 replies; 179+ messages in thread From: David Woodhouse @ 2005-04-13 9:25 UTC (permalink / raw) To: Petr Baudis Cc: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git On Wed, 2005-04-13 at 10:59 +0200, Petr Baudis wrote: > Theoretically, you are never supposed to share your index if you work > in fully git environment. Maybe -- if we are prepared to propagate the BK myth that network bandwidth and disk space are free. Meanwhile, in the real world, it'd be really useful to support sharing. I'd even like to see support for using multiple branches checked out of the same .git/ repository. We already cope with having multiple branches _in_ the repository -- all we need to do is cope with multiple indices too, so we can have different versions checked out. -- dwmw2 ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: Re: [ANNOUNCE] git-pasky-0.3
2005-04-13 9:25 ` David Woodhouse
@ 2005-04-13 9:42 ` Petr Baudis
2005-04-13 10:24 ` David Woodhouse
2005-04-13 17:01 ` Daniel Barkalow
2005-04-13 12:43 ` Xavier Bestel
2005-04-13 14:38 ` Linus Torvalds
2 siblings, 2 replies; 179+ messages in thread
From: Petr Baudis @ 2005-04-13 9:42 UTC (permalink / raw)
To: David Woodhouse
Cc: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git
Dear diary, on Wed, Apr 13, 2005 at 11:25:04AM CEST, I got a letter
where David Woodhouse <dwmw2@infradead.org> told me that...
> On Wed, 2005-04-13 at 10:59 +0200, Petr Baudis wrote:
> > Theoretically, you are never supposed to share your index if you work
> > in fully git environment.
>
> Maybe -- if we are prepared to propagate the BK myth that network
> bandwidth and disk space are free.
>
> Meanwhile, in the real world, it'd be really useful to support sharing.
It's fine to share the objects database. If you want to share the
directory cache, you are doing something wrong, though. What do you need
it for?
> I'd even like to see support for using multiple branches checked out of
> the same .git/ repository. We already cope with having multiple branches
> _in_ the repository -- all we need to do is cope with multiple indices
> too, so we can have different versions checked out.
I'm working on that right now. (Well, I wish I would, if other things
didn't keep distracting me.)
The idea is to have a command which will do something like:
mkdir .git
ln -s $origtree/heads $origtree/objects $origtree/tags .git
cp $origtree/HEAD .git
cd ..
read-tree $(tree-id)
Voila. Now you have a new tree with almost no current neither future
overhead.
This will be used to do the out-tree merges. Command for user to do this
will likely also make it a regular branch, doing
ln -s $(realpath git/HEAD) .git/heads/branchname
so that you can reference to it easily from your other branches.
Would this do what you want?
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 9:42 ` Petr Baudis @ 2005-04-13 10:24 ` David Woodhouse 2005-04-13 17:01 ` Daniel Barkalow 1 sibling, 0 replies; 179+ messages in thread From: David Woodhouse @ 2005-04-13 10:24 UTC (permalink / raw) To: Petr Baudis Cc: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git On Wed, 2005-04-13 at 11:42 +0200, Petr Baudis wrote: > It's fine to share the objects database. If you want to share the > directory cache, you are doing something wrong, though. What do you > need it for? I want to _not_ care which machine I happen to be on when I use git repositories which live in my home directory. I want all operations to just work, regardless of whether the shell I'm looking at happens to be on a BE or a LE box. > <...> Would this do what you want? Sounds ideal. -- dwmw2 ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: Re: [ANNOUNCE] git-pasky-0.3
2005-04-13 9:42 ` Petr Baudis
2005-04-13 10:24 ` David Woodhouse
@ 2005-04-13 17:01 ` Daniel Barkalow
2005-04-13 18:07 ` Petr Baudis
1 sibling, 1 reply; 179+ messages in thread
From: Daniel Barkalow @ 2005-04-13 17:01 UTC (permalink / raw)
To: Petr Baudis
Cc: David Woodhouse, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git
On Wed, 13 Apr 2005, Petr Baudis wrote:
> Dear diary, on Wed, Apr 13, 2005 at 11:25:04AM CEST, I got a letter
> where David Woodhouse <dwmw2@infradead.org> told me that...
> > On Wed, 2005-04-13 at 10:59 +0200, Petr Baudis wrote:
> > > Theoretically, you are never supposed to share your index if you work
> > > in fully git environment.
> >
> > Maybe -- if we are prepared to propagate the BK myth that network
> > bandwidth and disk space are free.
> >
> > Meanwhile, in the real world, it'd be really useful to support sharing.
>
> It's fine to share the objects database. If you want to share the
> directory cache, you are doing something wrong, though. What do you need
> it for?
>
> > I'd even like to see support for using multiple branches checked out of
> > the same .git/ repository. We already cope with having multiple branches
> > _in_ the repository -- all we need to do is cope with multiple indices
> > too, so we can have different versions checked out.
>
> I'm working on that right now. (Well, I wish I would, if other things
> didn't keep distracting me.)
>
> The idea is to have a command which will do something like:
>
> mkdir .git
> ln -s $origtree/heads $origtree/objects $origtree/tags .git
> cp $origtree/HEAD .git
> cd ..
> read-tree $(tree-id)
>
> Voila. Now you have a new tree with almost no current neither future
> overhead.
For future reference, git is unhappy if you actually do this, because your
HEAD won't match the (empty) contents of the new directory. The easiest
thing is to cp -r your original, replace the shared stuff with links, and
go from there.
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 17:01 ` Daniel Barkalow @ 2005-04-13 18:07 ` Petr Baudis 2005-04-13 18:22 ` git mailing list (Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.3) Linus Torvalds 2005-04-13 18:38 ` Re: Re: Re: [ANNOUNCE] git-pasky-0.3 Daniel Barkalow 0 siblings, 2 replies; 179+ messages in thread From: Petr Baudis @ 2005-04-13 18:07 UTC (permalink / raw) To: Daniel Barkalow Cc: David Woodhouse, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git Dear diary, on Wed, Apr 13, 2005 at 07:01:34PM CEST, I got a letter where Daniel Barkalow <barkalow@iabervon.org> told me that... > For future reference, git is unhappy if you actually do this, because your > HEAD won't match the (empty) contents of the new directory. The easiest > thing is to cp -r your original, replace the shared stuff with links, and > go from there. How is it unhappy? That would likely be a bug, unless you do something which really *needs* the tree populated and doesn't make sense otherwise (show-diff aka git diff w/o arguments, for example). Given that what would you copy with cp -r and wipe shortly after (objects db) is likely to be significantly larger than the working tree itself, checkout-cache would be wiser anyway. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. ^ permalink raw reply [flat|nested] 179+ messages in thread
* git mailing list (Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.3)
2005-04-13 18:07 ` Petr Baudis
@ 2005-04-13 18:22 ` Linus Torvalds
2005-04-13 18:38 ` Re: Re: Re: [ANNOUNCE] git-pasky-0.3 Daniel Barkalow
1 sibling, 0 replies; 179+ messages in thread
From: Linus Torvalds @ 2005-04-13 18:22 UTC (permalink / raw)
To: Petr Baudis
Cc: Daniel Barkalow, David Woodhouse, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift, git
On Wed, 13 Apr 2005, Petr Baudis wrote:
>
> Dear diary, on Wed, Apr 13, 2005 at 07:01:34PM CEST, I got a letter
> where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > For future reference, git is unhappy if you actually do this, because your
> > HEAD won't match the (empty) contents of the new directory. The easiest
> > thing is to cp -r your original, replace the shared stuff with links, and
> > go from there.
>
> How is it unhappy?
I think it's just Daniel being unhappy because he didn't do the read-tree
+ checkout-cache + update-cache steps ;)
Btw, I'm going to stop cc'ing linux-kernel on git issues (after this
email, which also acts as an announcement for people who haven't noticed
already), since anybody who is interested in git can just use the
"git@vger.kernel.org" mailing list:
echo 'subscribe git' | mail majordomo@vger.kernel.org
to get you subscribed (and you'll get a message back asking you to
authorize it to avoid spam - if you don't get anything back, it failed).
Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 18:07 ` Petr Baudis 2005-04-13 18:22 ` git mailing list (Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.3) Linus Torvalds @ 2005-04-13 18:38 ` Daniel Barkalow 1 sibling, 0 replies; 179+ messages in thread From: Daniel Barkalow @ 2005-04-13 18:38 UTC (permalink / raw) To: Petr Baudis Cc: David Woodhouse, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git On Wed, 13 Apr 2005, Petr Baudis wrote: > Dear diary, on Wed, Apr 13, 2005 at 07:01:34PM CEST, I got a letter > where Daniel Barkalow <barkalow@iabervon.org> told me that... > > For future reference, git is unhappy if you actually do this, because your > > HEAD won't match the (empty) contents of the new directory. The easiest > > thing is to cp -r your original, replace the shared stuff with links, and > > go from there. > > How is it unhappy? That would likely be a bug, unless you do something > which really *needs* the tree populated and doesn't make sense otherwise > (show-diff aka git diff w/o arguments, for example). If you copy HEAD without copying the files, it will then try to apply the patches which would apply to your previous directory to the empty directory, which will just give a lot of errors about missing files. If you don't copy HEAD, it tries to compare against nothing. Upon further consideration, a "checkout-cache -a" at the end of your list makes things generally happy. The next problem is that rsync is replacing the .git/objects symlink with the remote system's directory, which makes this not actually helpful. -Daniel *This .sig left intentionally blank* ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 9:25 ` David Woodhouse 2005-04-13 9:42 ` Petr Baudis @ 2005-04-13 12:43 ` Xavier Bestel 2005-04-13 16:48 ` H. Peter Anvin 2005-04-13 23:05 ` bd 2005-04-13 14:38 ` Linus Torvalds 2 siblings, 2 replies; 179+ messages in thread From: Xavier Bestel @ 2005-04-13 12:43 UTC (permalink / raw) To: David Woodhouse Cc: Petr Baudis, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git Le mercredi 13 avril 2005 à 10:25 +0100, David Woodhouse a écrit : > On Wed, 2005-04-13 at 10:59 +0200, Petr Baudis wrote: > > Theoretically, you are never supposed to share your index if you work > > in fully git environment. > > Maybe -- if we are prepared to propagate the BK myth that network > bandwidth and disk space are free. On a related note, maybe kernel.org should host .torrent files (and serve them) for the kernel git repository. That would ease the pain. Xav ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 12:43 ` Xavier Bestel @ 2005-04-13 16:48 ` H. Peter Anvin 2005-04-13 18:15 ` Xavier Bestel 2005-04-13 23:05 ` bd 1 sibling, 1 reply; 179+ messages in thread From: H. Peter Anvin @ 2005-04-13 16:48 UTC (permalink / raw) To: Xavier Bestel Cc: David Woodhouse, Petr Baudis, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git Xavier Bestel wrote: > Le mercredi 13 avril 2005 à 10:25 +0100, David Woodhouse a écrit : > >>On Wed, 2005-04-13 at 10:59 +0200, Petr Baudis wrote: >> >>>Theoretically, you are never supposed to share your index if you work >>>in fully git environment. >> >>Maybe -- if we are prepared to propagate the BK myth that network >>bandwidth and disk space are free. > > > On a related note, maybe kernel.org should host .torrent files (and > serve them) for the kernel git repository. That would ease the pain. > /me inflicts major bodily harm on Xav. There is a reason we (kernel.org) doesn't touch Bittorrent: for a variety of reasons, Bittorrent doesn't lend itself very well to automation. Jeff Garzik and I have been sketching on a sane replacement for Bittorrent with the working name "Software Distribution Protocol", but it's not even vaporware so far. -hpa ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 16:48 ` H. Peter Anvin @ 2005-04-13 18:15 ` Xavier Bestel 0 siblings, 0 replies; 179+ messages in thread From: Xavier Bestel @ 2005-04-13 18:15 UTC (permalink / raw) To: H. Peter Anvin Cc: David Woodhouse, Petr Baudis, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git Le mercredi 13 avril 2005 à 09:48 -0700, H. Peter Anvin a écrit : > Xavier Bestel wrote: > > On a related note, maybe kernel.org should host .torrent files (and > > serve them) for the kernel git repository. That would ease the pain. > > > > /me inflicts major bodily harm on Xav. > > There is a reason we (kernel.org) doesn't touch Bittorrent: for a > variety of reasons, Bittorrent doesn't lend itself very well to > automation. Jeff Garzik and I have been sketching on a sane replacement > for Bittorrent with the working name "Software Distribution Protocol", > but it's not even vaporware so far. Aah, technical details ... glad to hear that, though. Xav ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [ANNOUNCE] git-pasky-0.3
2005-04-13 12:43 ` Xavier Bestel
2005-04-13 16:48 ` H. Peter Anvin
@ 2005-04-13 23:05 ` bd
1 sibling, 0 replies; 179+ messages in thread
From: bd @ 2005-04-13 23:05 UTC (permalink / raw)
To: linux-kernel
Xavier Bestel wrote:
> Le mercredi 13 avril 2005 à 10:25 +0100, David Woodhouse a écrit :
>
>>On Wed, 2005-04-13 at 10:59 +0200, Petr Baudis wrote:
>>
>>>Theoretically, you are never supposed to share your index if you work
>>>in fully git environment.
>>
>>Maybe -- if we are prepared to propagate the BK myth that network
>>bandwidth and disk space are free.
>
>
> On a related note, maybe kernel.org should host .torrent files (and
> serve them) for the kernel git repository. That would ease the pain.
Bittorrent does not lend itself well to frequently-changing files or
collections thereof - each time the git repository is updated, a new
metadata file would need to be created, and distributed, and you'd lose
all the seeds who don't bother to get the new one every time it changes.
Moreover, I imagine some clients would have problems with more than 900
or so files due to open file limits.
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.3
2005-04-13 9:25 ` David Woodhouse
2005-04-13 9:42 ` Petr Baudis
2005-04-13 12:43 ` Xavier Bestel
@ 2005-04-13 14:38 ` Linus Torvalds
2005-04-13 14:47 ` David Woodhouse
2 siblings, 1 reply; 179+ messages in thread
From: Linus Torvalds @ 2005-04-13 14:38 UTC (permalink / raw)
To: David Woodhouse
Cc: Petr Baudis, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift, git
On Wed, 13 Apr 2005, David Woodhouse wrote:
>
> I'd even like to see support for using multiple branches checked out of
> the same .git/ repository.
David, we already can. The objects are _designed_ to be shared.
However, that is the ".git/objects" subdirectory. Not the per-view stuff.
For each _view_ you do need to have view-specific data, and the view index
very much is that. That's ".git/index".
The index file isn't small - it's about 1.6MB for a kernel tree, because
it needs to list every single file we know about, its "stat" information,
and its sha1 backing store. So multiply 17,000 by ~40, and add in the size
of the name of each file, and avoid compression because this is read and
written _all_ the time, and you end up with 1.6MB.
But you _need_ one per checked-out tree. And it really _is_ private. It's
not supposed to be shared. In fact, it _cannot_ be shared, because it
doesn't have sufficient locking (it has some, but that's just to catch
_errors_ when somebody tries to do two operations that update the index
file at the same time in the same view). But even ignoring the locking
issues, it just isn't appropriate, it's not how that file works.
In other words, that index file simply _cannot_ be shared. Don't even
think about it. Only madness will ensue.
Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
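The size estimate above can be worked through in C. This is a rough sketch under loud assumptions: the struct below is a guess at the entry layout built from the field names that appear elsewhere in this thread, the rounding follows the cache_entry_size() macro from cache.h, and the average path length of 30 characters is an assumption chosen for illustration, not a measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of one index entry; the real struct may differ. */
struct cache_time {
	uint32_t sec;
	uint32_t nsec;
};

struct cache_entry {
	struct cache_time ctime;
	struct cache_time mtime;
	uint32_t st_dev, st_ino, st_mode, st_uid, st_gid, st_size;
	unsigned char sha1[20];
	uint16_t namelen;
	unsigned char name[];	/* path, stored inline after the fixed part */
};

/* Pad each entry to a multiple of 8 bytes, as cache_entry_size() does. */
static size_t entry_size(size_t namelen)
{
	return (offsetof(struct cache_entry, name) + namelen + 8) & ~(size_t)7;
}

/* Estimated index body size for nr_files paths of avg_namelen bytes each. */
static size_t index_size_estimate(size_t nr_files, size_t avg_namelen)
{
	/* 40 bytes of stat data + 20 bytes of sha1 + 2 bytes of namelen. */
	assert(offsetof(struct cache_entry, name) == 62);
	return nr_files * entry_size(avg_namelen);
}
```

Under these assumptions the "~40" in the message matches the 40 bytes of stat data, each entry with a 30-character path takes 96 bytes, and 17,000 such entries come to 1,632,000 bytes, which is in line with the 1.6MB quoted.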
* Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 14:38 ` Linus Torvalds @ 2005-04-13 14:47 ` David Woodhouse 2005-04-13 14:59 ` Linus Torvalds 0 siblings, 1 reply; 179+ messages in thread From: David Woodhouse @ 2005-04-13 14:47 UTC (permalink / raw) To: Linus Torvalds Cc: Petr Baudis, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift, git On Wed, 2005-04-13 at 07:38 -0700, Linus Torvalds wrote: > David, we already can. The objects are _designed_ to be shared. > > However, that is the ".git/objects" subdirectory. Not the per-view stuff. > For each _view_ you do need to have view-specific data, and the view index > very much is that. That's ".git/index". Yep, it takes very little to achieve that -- to allow multiple checked- out trees from a single object database. Petr's already outlined what it takes. > In other words, that index file simply _cannot_ be shared. Don't even > think about it. Only madness will ensue. If I use git in my home directory I cannot _help_ but share it. Sometimes I'm using it from a BE box, sometimes from a LE box. Should I really be forced to use separate checkouts for each type of machine? It's bad enough having to do that for ~/bin :) Seriously, it shouldn't have a significantly detrimental effect on the performance if we just use explicitly sized types and fixed byte-order. It's just not worth the pain of being gratuitously non-portable. -- dwmw2 ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 14:47 ` David Woodhouse @ 2005-04-13 14:59 ` Linus Torvalds 0 siblings, 0 replies; 179+ messages in thread From: Linus Torvalds @ 2005-04-13 14:59 UTC (permalink / raw) To: David Woodhouse Cc: Petr Baudis, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift, git On Wed, 13 Apr 2005, David Woodhouse wrote: > > > In other words, that index file simply _cannot_ be shared. Don't even > > think about it. Only madness will ensue. > > If I use git in my home directory I cannot _help_ but share it. > Sometimes I'm using it from a BE box, sometimes from a LE box. Should I > really be forced to use separate checkouts for each type of machine? Now, that kind of "private sharing" should certainly be ok. In fact, the only locking there is (doing the ".git/index.lock" thing around any updates and erroring out if it already exists) was somewhat designed for that. So making it use BE data (preferred just because then you can use the existing htonl() etc helpers in user space) would work. As long as people don't think this means anything else... It really is a private file. Linus ^ permalink raw reply [flat|nested] 179+ messages in thread
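The ".git/index.lock" scheme mentioned above (create the lock file around an update, error out if it already exists) can be sketched in C with an exclusive create. This is an illustration only, not git's code: take_index_lock() and drop_index_lock() are hypothetical names, and the caller chooses the lock path.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to take the lock by creating the lock file exclusively.
 * Returns an open file descriptor on success. Returns -1 with errno set
 * to EEXIST if another updater already holds the lock.
 */
static int take_index_lock(const char *lockfile)
{
	assert(lockfile != NULL);
	return open(lockfile, O_WRONLY | O_CREAT | O_EXCL, 0600);
}

/* Release the lock: close the descriptor and remove the lock file. */
static int drop_index_lock(int fd, const char *lockfile)
{
	close(fd);
	return unlink(lockfile);
}
```

O_CREAT combined with O_EXCL makes the existence check and the creation a single step, so two updaters in the same view cannot both succeed. As the message says, this only catches errors; it does not make the index safe to share between users.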
* Re: [ANNOUNCE] git-pasky-0.3
2005-04-11 13:57 ` [ANNOUNCE] git-pasky-0.3 Petr Baudis
2005-04-12 12:47 ` Martin Schlemmer
2005-04-12 13:07 ` David Woodhouse
@ 2005-04-13 9:35 ` Russell King
2005-04-13 9:38 ` Russell King
` (2 more replies)
2 siblings, 3 replies; 179+ messages in thread
From: Russell King @ 2005-04-13 9:35 UTC (permalink / raw)
To: Petr Baudis
Cc: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift
On Mon, Apr 11, 2005 at 03:57:58PM +0200, Petr Baudis wrote:
> here goes git-pasky-0.3, my set of patches and scripts upon
> Linus' git, aimed at human usability and to an extent a SCM-like usage.
I tried this today, applied my patch for BE<->LE conversions and
glibc-2.2 compatibility (attached, still requires cleaning though),
and then tried git pull. Umm, whoops.
Here's just a small sample of what happened:
diff: /9a30ec42a6c4860d3f11ad90c1052823a020de32/show-files.c: No such file or directory
diff: /85bf824bd24f034896f5820a2628148a246f8fd1/show-files.c: No such file or directory
mkdir: cannot create directory `/9a30ec42a6c4860d3f11ad90c1052823a020de32': Permission denied
mkdir: cannot create directory `/85bf824bd24f034896f5820a2628148a246f8fd1': Permission denied
./gitdiff-do: /9a30ec42a6c4860d3f11ad90c1052823a020de32/update-cache.c: No such file or directory
./gitdiff-do: /85bf824bd24f034896f5820a2628148a246f8fd1/update-cache.c: No such file or directory
diff: /9a30ec42a6c4860d3f11ad90c1052823a020de32/update-cache.c: No such file or directory
diff: /85bf824bd24f034896f5820a2628148a246f8fd1/update-cache.c: No such file or directory
patch: **** Only garbage was found in the patch input.
--- - Wed Apr 13 09:49:43 2005
+++ cache.h Fri Apr 8 11:15:08 2005
@@ -14,6 +14,12 @@
#include <openssl/sha.h>
#include <zlib.h>
+#include <netinet/in.h>
+#define cpu_to_beuint(x) (htonl(x))
+#define beuint_to_cpu(x) (ntohl(x))
+#define cpu_to_beushort(x) (htons(x))
+#define beushort_to_cpu(x) (ntohs(x))
+
/*
* Basic data structures for the directory cache
*
@@ -67,7 +73,7 @@
#define DEFAULT_DB_ENVIRONMENT ".git/objects"
#define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7)
-#define ce_size(ce) cache_entry_size((ce)->namelen)
+#define ce_size(ce) cache_entry_size(beushort_to_cpu((ce)->namelen))
#define alloc_nr(x) (((x)+16)*3/2)
--- - Wed Apr 13 09:49:43 2005
+++ read-cache.c Fri Apr 8 11:05:34 2005
@@ -271,6 +271,7 @@
/* nsec seems unreliable - not all filesystems support it, so
* as long as it is in the inode cache you get right nsec
* but after it gets flushed, you get zero nsec. */
+#if 0
if (ce->mtime.sec != (unsigned int)st->st_mtim.tv_sec
#ifdef NSEC
|| ce->mtime.nsec != (unsigned int)st->st_mtim.tv_nsec
@@ -283,15 +284,21 @@
#endif
)
changed |= CTIME_CHANGED;
- if (ce->st_uid != (unsigned int)st->st_uid ||
- ce->st_gid != (unsigned int)st->st_gid)
+#else
+ if (beuint_to_cpu(ce->mtime.sec) != (unsigned int)st->st_mtime)
+ changed |= MTIME_CHANGED;
+ if (beuint_to_cpu(ce->ctime.sec) != (unsigned int)st->st_ctime)
+ changed |= CTIME_CHANGED;
+#endif
+ if (beuint_to_cpu(ce->st_uid) != (unsigned int)st->st_uid ||
+ beuint_to_cpu(ce->st_gid) != (unsigned int)st->st_gid)
changed |= OWNER_CHANGED;
- if (ce->st_mode != (unsigned int)st->st_mode)
+ if (beuint_to_cpu(ce->st_mode) != (unsigned int)st->st_mode)
changed |= MODE_CHANGED;
- if (ce->st_dev != (unsigned int)st->st_dev ||
- ce->st_ino != (unsigned int)st->st_ino)
+ if (beuint_to_cpu(ce->st_dev) != (unsigned int)st->st_dev ||
+ beuint_to_cpu(ce->st_ino) != (unsigned int)st->st_ino)
changed |= INODE_CHANGED;
- if (ce->st_size != (unsigned int)st->st_size)
+ if (beuint_to_cpu(ce->st_size) != (unsigned int)st->st_size)
changed |= DATA_CHANGED;
return changed;
}
@@ -378,9 +378,9 @@
SHA_CTX c;
unsigned char sha1[20];
- if (hdr->signature != CACHE_SIGNATURE)
+ if (hdr->signature != cpu_to_beuint(CACHE_SIGNATURE))
return error("bad signature");
- if (hdr->version != 1)
+ if (hdr->version != cpu_to_beuint(1))
return error("bad version");
SHA1_Init(&c);
SHA1_Update(&c, hdr, offsetof(struct cache_header, sha1));
@@ -428,12 +428,12 @@
if (verify_hdr(hdr, size) < 0)
goto unmap;
- active_nr = hdr->entries;
+ active_nr = beuint_to_cpu(hdr->entries);
active_alloc = alloc_nr(active_nr);
active_cache = calloc(active_alloc, sizeof(struct cache_entry *));
offset = sizeof(*hdr);
- for (i = 0; i < hdr->entries; i++) {
+ for (i = 0; i < beuint_to_cpu(hdr->entries); i++) {
struct cache_entry *ce = map + offset;
offset = offset + ce_size(ce);
active_cache[i] = ce;
@@ -452,9 +452,9 @@
struct cache_header hdr;
int i;
- hdr.signature = CACHE_SIGNATURE;
- hdr.version = 1;
- hdr.entries = entries;
+ hdr.signature = cpu_to_beuint(CACHE_SIGNATURE);
+ hdr.version = cpu_to_beuint(1);
+ hdr.entries = cpu_to_beuint(entries);
SHA1_Init(&c);
SHA1_Update(&c, &hdr, offsetof(struct cache_header, sha1));
--- - Wed Apr 13 09:49:43 2005
+++ show-diff.c Wed Apr 13 09:49:43 2005
@@ -51,7 +51,7 @@
printf("%s: ok\n", ce->name);
continue;
}
- printf("%.*s: ", ce->namelen, ce->name);
+ printf("%.*s: ", beushort_to_cpu(ce->namelen), ce->name);
for (n = 0; n < 20; n++)
printf("%02x", ce->sha1[n]);
printf(" %02x\n", changed);
--- - Wed Apr 13 09:49:43 2005
+++ update-cache.c Fri Apr 8 11:06:17 2005
@@ -108,11 +108,11 @@
memcpy(ce->name, path, namelen);
ce->ctime.sec = st.st_ctime;
#ifdef NSEC
- ce->ctime.nsec = st.st_ctim.tv_nsec;
+ ce->ctime.nsec = 0; //st.st_ctim.tv_nsec;
#endif
ce->mtime.sec = st.st_mtime;
#ifdef NSEC
- ce->mtime.nsec = st.st_mtim.tv_nsec;
+ ce->mtime.nsec = 0; //st.st_mtim.tv_nsec;
#endif
ce->st_dev = st.st_dev;
ce->st_ino = st.st_ino;
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 9:35 ` Russell King @ 2005-04-13 9:38 ` Russell King 2005-04-13 9:49 ` Petr Baudis 2005-04-13 9:46 ` Petr Baudis 2005-04-13 19:03 ` Russell King 2 siblings, 1 reply; 179+ messages in thread From: Russell King @ 2005-04-13 9:38 UTC (permalink / raw) To: Petr Baudis, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift On Wed, Apr 13, 2005 at 10:35:21AM +0100, Russell King wrote: > On Mon, Apr 11, 2005 at 03:57:58PM +0200, Petr Baudis wrote: > > here goes git-pasky-0.3, my set of patches and scripts upon > > Linus' git, aimed at human usability and to an extent a SCM-like usage. > > I tried this today, applied my patch for BE<->LE conversions and > glibc-2.2 compatibility (attached, still requires cleaning though), > and then tried git pull. Umm, whoops. Oh, and the other thing is: $ git pull GNU Interactive Tools 4.3.20 (armv4l-rmk-linux-gnu), 20:02:38 Mar 7 2001 GIT is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. Copyright (C) 1993-1999 Free Software Foundation, Inc. Written by Tudor Hulubei and Andrei Pitis, Bucharest, Romania git: fatal error: `chdir' failed: permission denied. "git" already exists as a command from about 4 years ago. Can we have less TLAs for commands please? That namespace is rather over-used and collision-prone. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 9:38 ` Russell King @ 2005-04-13 9:49 ` Petr Baudis 2005-04-13 11:02 ` Ingo Molnar 0 siblings, 1 reply; 179+ messages in thread From: Petr Baudis @ 2005-04-13 9:49 UTC (permalink / raw) To: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git Dear diary, on Wed, Apr 13, 2005 at 11:38:52AM CEST, I got a letter where Russell King <rmk+lkml@arm.linux.org.uk> told me that... > On Wed, Apr 13, 2005 at 10:35:21AM +0100, Russell King wrote: > > On Mon, Apr 11, 2005 at 03:57:58PM +0200, Petr Baudis wrote: > > > here goes git-pasky-0.3, my set of patches and scripts upon > > > Linus' git, aimed at human usability and to an extent a SCM-like usage. > > > > I tried this today, applied my patch for BE<->LE conversions and > > glibc-2.2 compatibility (attached, still requires cleaning though), > > and then tried git pull. Umm, whoops. > > Oh, and the other thing is: > > $ git pull > > GNU Interactive Tools 4.3.20 (armv4l-rmk-linux-gnu), 20:02:38 Mar 7 2001 > GIT is free software; you can redistribute it and/or modify it under the > terms of the GNU General Public License as published by the Free Software > Foundation; either version 2, or (at your option) any later version. > Copyright (C) 1993-1999 Free Software Foundation, Inc. > Written by Tudor Hulubei and Andrei Pitis, Bucharest, Romania > > git: fatal error: `chdir' failed: permission denied. > > "git" already exists as a command from about 4 years ago. Can we have > less TLAs for commands please? That namespace is rather over-used and > collision-prone. I've already noticed GNU interactive tools (googling for git), but it's Linus' choice of name. Alternative suggestions welcomed. What about 'gt'? ;-) -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 9:49 ` Petr Baudis @ 2005-04-13 11:02 ` Ingo Molnar 2005-04-13 14:50 ` Linus Torvalds 0 siblings, 1 reply; 179+ messages in thread From: Ingo Molnar @ 2005-04-13 11:02 UTC (permalink / raw) To: Petr Baudis Cc: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git * Petr Baudis <pasky@ucw.cz> wrote: > > Oh, and the other thing is: > > > > $ git pull > > > > GNU Interactive Tools 4.3.20 (armv4l-rmk-linux-gnu), 20:02:38 Mar 7 2001 > > GIT is free software; you can redistribute it and/or modify it under the > > terms of the GNU General Public License as published by the Free Software > > Foundation; either version 2, or (at your option) any later version. > > Copyright (C) 1993-1999 Free Software Foundation, Inc. > > Written by Tudor Hulubei and Andrei Pitis, Bucharest, Romania > > > > git: fatal error: `chdir' failed: permission denied. > > > > "git" already exists as a command from about 4 years ago. Can we have > > less TLAs for commands please? That namespace is rather over-used and > > collision-prone. > > I've already noticed GNU interactive tools (googling for git), but > it's Linus' choice of name. Alternative suggestions welcomed. What > about 'gt'? ;-) 'gt' or 'gi' both sound fine - 'gi' being a bit faster to type ;-). (Even 'get' seems to be unused in the command namespace.) Ingo ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 11:02 ` Ingo Molnar @ 2005-04-13 14:50 ` Linus Torvalds 0 siblings, 0 replies; 179+ messages in thread From: Linus Torvalds @ 2005-04-13 14:50 UTC (permalink / raw) To: Ingo Molnar Cc: Petr Baudis, Kernel Mailing List, Randy.Dunlap, Ross Vandegrift, git On Wed, 13 Apr 2005, Ingo Molnar wrote: > > > > I've already noticed GNU interactive tools (googling for git), but > > it's Linus' choice of name. Alternative suggestions welcomed. What > > about 'gt'? ;-) > > 'gt' or 'gi' both sound fine - 'gi' being a bit faster to type ;-). > (Even 'get' seems to be unused in the command namespace.) Let's be realistic here. "git" as in "gnu interactive tools" was last actively developed in 1996, and had even its last maintenance release over five years ago. Let it go, people. Linus ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 9:35 ` Russell King 2005-04-13 9:38 ` Russell King @ 2005-04-13 9:46 ` Petr Baudis 2005-04-13 10:28 ` Russell King 2005-04-13 19:03 ` Russell King 2 siblings, 1 reply; 179+ messages in thread From: Petr Baudis @ 2005-04-13 9:46 UTC (permalink / raw) To: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift Dear diary, on Wed, Apr 13, 2005 at 11:35:21AM CEST, I got a letter where Russell King <rmk+lkml@arm.linux.org.uk> told me that... > On Mon, Apr 11, 2005 at 03:57:58PM +0200, Petr Baudis wrote: > > here goes git-pasky-0.3, my set of patches and scripts upon > > Linus' git, aimed at human usability and to an extent a SCM-like usage. > > I tried this today, applied my patch for BE<->LE conversions and > glibc-2.2 compatibility (attached, still requires cleaning though), > and then tried git pull. Umm, whoops. > > Here's just a small sample of what happened: > > diff: /9a30ec42a6c4860d3f11ad90c1052823a020de32/show-files.c: No such file or directory > diff: /85bf824bd24f034896f5820a2628148a246f8fd1/show-files.c: No such file or directory > mkdir: cannot create directory `/9a30ec42a6c4860d3f11ad90c1052823a020de32': Permission denied > mkdir: cannot create directory `/85bf824bd24f034896f5820a2628148a246f8fd1': Permission denied > ./gitdiff-do: /9a30ec42a6c4860d3f11ad90c1052823a020de32/update-cache.c: No such > file or directory > ./gitdiff-do: /85bf824bd24f034896f5820a2628148a246f8fd1/update-cache.c: No such > file or directory > diff: /9a30ec42a6c4860d3f11ad90c1052823a020de32/update-cache.c: No such file or > directory > diff: /85bf824bd24f034896f5820a2628148a246f8fd1/update-cache.c: No such file or > directory > patch: **** Only garbage was found in the patch input. I'll bet at the top of this you have a mktemp error. mktemp turns out to be a PITA to use - on some older systems (e.g. Mandrake 10 stock install) it has incompatible usage to the rest of the world. 
When I get a convenient infrastructure for making a shell library, I will probably add a test for this to it. Try to upgrade your mktemp. That Mandrake 10 user said that urpmi should have a newer (correct) version. I will make a patch which will refer to ?time instead of ?tim.sec for seconds. That should fix your problem. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 9:46 ` Petr Baudis @ 2005-04-13 10:28 ` Russell King 0 siblings, 0 replies; 179+ messages in thread From: Russell King @ 2005-04-13 10:28 UTC (permalink / raw) To: Petr Baudis Cc: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift On Wed, Apr 13, 2005 at 11:46:19AM +0200, Petr Baudis wrote: > I'll bet at the top of this you have a mktemp error. Indeed, thanks. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 9:35 ` Russell King 2005-04-13 9:38 ` Russell King 2005-04-13 9:46 ` Petr Baudis @ 2005-04-13 19:03 ` Russell King 2005-04-13 19:13 ` Petr Baudis 2 siblings, 1 reply; 179+ messages in thread From: Russell King @ 2005-04-13 19:03 UTC (permalink / raw) To: Petr Baudis, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift On Wed, Apr 13, 2005 at 10:35:21AM +0100, Russell King wrote: > I tried this today, applied my patch for BE<->LE conversions and > glibc-2.2 compatibility (attached, still requires cleaning though), > and then tried git pull. Umm, whoops. Here's an updated patch which allows me to work with a BE-based cache. I've just used this to grab and checkout sparse.git. Note: it also fixes my glibc-2.2 build problem with the nsec stat64 structures (see read-cache.c). --- cache.h +++ cache.h Wed Apr 13 11:23:39 2005 @@ -14,6 +14,12 @@ #include <openssl/sha.h> #include <zlib.h> +#include <netinet/in.h> +#define cpu_to_beuint(x) (htonl(x)) +#define beuint_to_cpu(x) (ntohl(x)) +#define cpu_to_beushort(x) (htons(x)) +#define beushort_to_cpu(x) (ntohs(x)) + /* * Basic data structures for the directory cache * @@ -67,7 +73,7 @@ #define DEFAULT_DB_ENVIRONMENT ".git/objects" #define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7) -#define ce_size(ce) cache_entry_size((ce)->namelen) +#define ce_size(ce) cache_entry_size(beushort_to_cpu((ce)->namelen)) #define alloc_nr(x) (((x)+16)*3/2) --- checkout-cache.c +++ checkout-cache.c Wed Apr 13 19:52:08 2005 @@ -77,7 +77,7 @@ return error("checkout-cache: unable to read sha1 file of %s (%s)", ce->name, sha1_to_hex(ce->sha1)); } - fd = create_file(ce->name, ce->st_mode); + fd = create_file(ce->name, beuint_to_cpu(ce->st_mode)); if (fd < 0) { free(new); return error("checkout-cache: unable to create %s (%s)", --- read-cache.c +++ read-cache.c Wed Apr 13 19:37:00 2005 @@ -271,27 +271,34 @@ /* nsec seems unreliable - not all filesystems support 
it, so * as long as it is in the inode cache you get right nsec * but after it gets flushed, you get zero nsec. */ - if (ce->mtime.sec != (unsigned int)st->st_mtim.tv_sec +#if 0 + if (beuint_to_cpu(ce->mtime.sec) != (unsigned int)st->st_mtim.tv_sec #ifdef NSEC - || ce->mtime.nsec != (unsigned int)st->st_mtim.tv_nsec + || beuint_to_cpu(ce->mtime.nsec) != (unsigned int)st->st_mtim.tv_nsec #endif ) changed |= MTIME_CHANGED; - if (ce->ctime.sec != (unsigned int)st->st_ctim.tv_sec + if (beuint_to_cpu(ce->ctime.sec) != (unsigned int)st->st_ctim.tv_sec #ifdef NSEC - || ce->ctime.nsec != (unsigned int)st->st_ctim.tv_nsec + || beuint_to_cpu(ce->ctime.nsec) != (unsigned int)st->st_ctim.tv_nsec #endif ) changed |= CTIME_CHANGED; - if (ce->st_uid != (unsigned int)st->st_uid || - ce->st_gid != (unsigned int)st->st_gid) +#else + if (beuint_to_cpu(ce->mtime.sec) != (unsigned int)st->st_mtime) + changed |= MTIME_CHANGED; + if (beuint_to_cpu(ce->ctime.sec) != (unsigned int)st->st_ctime) + changed |= CTIME_CHANGED; +#endif + if (beuint_to_cpu(ce->st_uid) != (unsigned int)st->st_uid || + beuint_to_cpu(ce->st_gid) != (unsigned int)st->st_gid) changed |= OWNER_CHANGED; - if (ce->st_mode != (unsigned int)st->st_mode) + if (beuint_to_cpu(ce->st_mode) != (unsigned int)st->st_mode) changed |= MODE_CHANGED; - if (ce->st_dev != (unsigned int)st->st_dev || - ce->st_ino != (unsigned int)st->st_ino) + if (beuint_to_cpu(ce->st_dev) != (unsigned int)st->st_dev || + beuint_to_cpu(ce->st_ino) != (unsigned int)st->st_ino) changed |= INODE_CHANGED; - if (ce->st_size != (unsigned int)st->st_size) + if (beuint_to_cpu(ce->st_size) != (unsigned int)st->st_size) changed |= DATA_CHANGED; return changed; } @@ -320,7 +327,7 @@ while (last > first) { int next = (last + first) >> 1; struct cache_entry *ce = active_cache[next]; - int cmp = cache_name_compare(name, namelen, ce->name, ce->namelen); + int cmp = cache_name_compare(name, namelen, ce->name, beushort_to_cpu(ce->namelen)); if (!cmp) return next; if 
(cmp < 0) { @@ -347,7 +354,7 @@ { int pos; - pos = cache_name_pos(ce->name, ce->namelen); + pos = cache_name_pos(ce->name, beushort_to_cpu(ce->namelen)); /* existing match? Just replace it */ if (pos >= 0) { @@ -378,9 +385,9 @@ SHA_CTX c; unsigned char sha1[20]; - if (hdr->signature != CACHE_SIGNATURE) + if (hdr->signature != cpu_to_beuint(CACHE_SIGNATURE)) return error("bad signature"); - if (hdr->version != 1) + if (hdr->version != cpu_to_beuint(1)) return error("bad version"); SHA1_Init(&c); SHA1_Update(&c, hdr, offsetof(struct cache_header, sha1)); @@ -428,12 +435,12 @@ if (verify_hdr(hdr, size) < 0) goto unmap; - active_nr = hdr->entries; + active_nr = beuint_to_cpu(hdr->entries); active_alloc = alloc_nr(active_nr); active_cache = calloc(active_alloc, sizeof(struct cache_entry *)); offset = sizeof(*hdr); - for (i = 0; i < hdr->entries; i++) { + for (i = 0; i < beuint_to_cpu(hdr->entries); i++) { struct cache_entry *ce = map + offset; offset = offset + ce_size(ce); active_cache[i] = ce; @@ -452,9 +459,9 @@ struct cache_header hdr; int i; - hdr.signature = CACHE_SIGNATURE; - hdr.version = 1; - hdr.entries = entries; + hdr.signature = cpu_to_beuint(CACHE_SIGNATURE); + hdr.version = cpu_to_beuint(1); + hdr.entries = cpu_to_beuint(entries); SHA1_Init(&c); SHA1_Update(&c, &hdr, offsetof(struct cache_header, sha1)); --- read-tree.c +++ read-tree.c Wed Apr 13 19:56:52 2005 @@ -13,8 +13,8 @@ memset(ce, 0, size); - ce->st_mode = mode; - ce->namelen = baselen + len; + ce->st_mode = cpu_to_beuint(mode); + ce->namelen = cpu_to_beushort(baselen + len); memcpy(ce->name, base, baselen); memcpy(ce->name + baselen, pathname, len+1); memcpy(ce->sha1, sha1, 20); --- show-diff.c +++ show-diff.c Wed Apr 13 11:27:34 2005 @@ -89,7 +89,7 @@ changed = cache_match_stat(ce, &st); if (!changed) continue; - printf("%.*s: ", ce->namelen, ce->name); + printf("%.*s: ", beushort_to_cpu(ce->namelen), ce->name); for (n = 0; n < 20; n++) printf("%02x", ce->sha1[n]); printf(" %02x\n", changed); 
--- update-cache.c +++ update-cache.c Wed Apr 13 19:55:16 2005 @@ -68,18 +68,18 @@ */ static void fill_stat_cache_info(struct cache_entry *ce, struct stat *st) { - ce->ctime.sec = st->st_ctime; + ce->ctime.sec = cpu_to_beuint(st->st_ctime); #ifdef NSEC - ce->ctime.nsec = st->st_ctim.tv_nsec; + ce->ctime.nsec = cpu_to_beuint(st->st_ctim.tv_nsec); #endif - ce->mtime.sec = st->st_mtime; + ce->mtime.sec = cpu_to_beuint(st->st_mtime); #ifdef NSEC - ce->mtime.nsec = st->st_mtim.tv_nsec; + ce->mtime.nsec = cpu_to_beuint(st->st_mtim.tv_nsec); #endif - ce->st_dev = st->st_dev; - ce->st_ino = st->st_ino; - ce->st_uid = st->st_uid; - ce->st_gid = st->st_gid; + ce->st_dev = cpu_to_beuint(st->st_dev); + ce->st_ino = cpu_to_beuint(st->st_ino); + ce->st_uid = cpu_to_beuint(st->st_uid); + ce->st_gid = cpu_to_beuint(st->st_gid); } static int add_file_to_cache(char *path) @@ -106,21 +106,21 @@ ce = malloc(size); memset(ce, 0, size); memcpy(ce->name, path, namelen); - ce->ctime.sec = st.st_ctime; + ce->ctime.sec = cpu_to_beuint(st.st_ctime); #ifdef NSEC - ce->ctime.nsec = st.st_ctim.tv_nsec; + ce->ctime.nsec = cpu_to_beuint(st.st_ctim.tv_nsec); #endif - ce->mtime.sec = st.st_mtime; + ce->mtime.sec = cpu_to_beuint(st.st_mtime); #ifdef NSEC - ce->mtime.nsec = st.st_mtim.tv_nsec; + ce->mtime.nsec = cpu_to_beuint(st.st_mtim.tv_nsec); #endif - ce->st_dev = st.st_dev; - ce->st_ino = st.st_ino; - ce->st_mode = st.st_mode; - ce->st_uid = st.st_uid; - ce->st_gid = st.st_gid; - ce->st_size = st.st_size; - ce->namelen = namelen; + ce->st_dev = cpu_to_beuint(st.st_dev); + ce->st_ino = cpu_to_beuint(st.st_ino); + ce->st_mode = cpu_to_beuint(st.st_mode); + ce->st_uid = cpu_to_beuint(st.st_uid); + ce->st_gid = cpu_to_beuint(st.st_gid); + ce->st_size = cpu_to_beuint(st.st_size); + ce->namelen = cpu_to_beushort(namelen); if (index_fd(path, namelen, ce, fd, &st) < 0) return -1; @@ -201,7 +201,7 @@ updated = malloc(size); memcpy(updated, ce, size); fill_stat_cache_info(updated, &st); - updated->st_size 
= st.st_size; + updated->st_size = cpu_to_beuint(st.st_size); return updated; } --- write-tree.c +++ write-tree.c Wed Apr 13 19:30:45 2005 @@ -45,7 +45,7 @@ do { struct cache_entry *ce = cachep[nr]; const char *pathname = ce->name, *filename, *dirname; - int pathlen = ce->namelen, entrylen; + int pathlen = beushort_to_cpu(ce->namelen), entrylen; unsigned char *sha1; unsigned int mode; @@ -54,7 +54,7 @@ break; sha1 = ce->sha1; - mode = ce->st_mode; + mode = beuint_to_cpu(ce->st_mode); /* Do we have _further_ subdirectories? */ filename = pathname + baselen; -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 19:03 ` Russell King @ 2005-04-13 19:13 ` Petr Baudis 2005-04-13 19:21 ` Russell King 0 siblings, 1 reply; 179+ messages in thread From: Petr Baudis @ 2005-04-13 19:13 UTC (permalink / raw) To: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git Dear diary, on Wed, Apr 13, 2005 at 09:03:07PM CEST, I got a letter where Russell King <rmk+lkml@arm.linux.org.uk> told me that... > On Wed, Apr 13, 2005 at 10:35:21AM +0100, Russell King wrote: > > I tried this today, applied my patch for BE<->LE conversions and > > glibc-2.2 compatibility (attached, still requires cleaning though), > > and then tried git pull. Umm, whoops. > > Here's an updated patch which allows me to work with a BE-based > cache. I've just used this to grab and checkout sparse.git. > > Note: it also fixes my glibc-2.2 build problem with the nsec > stat64 structures (see read-cache.c). > > --- cache.h > +++ cache.h Wed Apr 13 11:23:39 2005 > @@ -14,6 +14,12 @@ > #include <openssl/sha.h> > #include <zlib.h> > > +#include <netinet/in.h> > +#define cpu_to_beuint(x) (htonl(x)) > +#define beuint_to_cpu(x) (ntohl(x)) > +#define cpu_to_beushort(x) (htons(x)) > +#define beushort_to_cpu(x) (ntohs(x)) > + > /* > * Basic data structures for the directory cache > * What do the wrapper macros gain us? -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 19:13 ` Petr Baudis @ 2005-04-13 19:21 ` Russell King 2005-04-13 19:23 ` H. Peter Anvin 0 siblings, 1 reply; 179+ messages in thread From: Russell King @ 2005-04-13 19:21 UTC (permalink / raw) To: Petr Baudis Cc: Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git On Wed, Apr 13, 2005 at 09:13:39PM +0200, Petr Baudis wrote: > Dear diary, on Wed, Apr 13, 2005 at 09:03:07PM CEST, I got a letter > where Russell King <rmk+lkml@arm.linux.org.uk> told me that... > > On Wed, Apr 13, 2005 at 10:35:21AM +0100, Russell King wrote: > > > I tried this today, applied my patch for BE<->LE conversions and > > > glibc-2.2 compatibility (attached, still requires cleaning though), > > > and then tried git pull. Umm, whoops. > > > > Here's an updated patch which allows me to work with a BE-based > > cache. I've just used this to grab and checkout sparse.git. > > > > Note: it also fixes my glibc-2.2 build problem with the nsec > > stat64 structures (see read-cache.c). > > > > --- cache.h > > +++ cache.h Wed Apr 13 11:23:39 2005 > > @@ -14,6 +14,12 @@ > > #include <openssl/sha.h> > > #include <zlib.h> > > > > +#include <netinet/in.h> > > +#define cpu_to_beuint(x) (htonl(x)) > > +#define beuint_to_cpu(x) (ntohl(x)) > > +#define cpu_to_beushort(x) (htons(x)) > > +#define beushort_to_cpu(x) (ntohs(x)) > > + > > /* > > * Basic data structures for the directory cache > > * > > What do the wrapper macros gain us? Nothing much - I don't particularly care about them. I thought someone might object to using htonl/ntohl directly. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [ANNOUNCE] git-pasky-0.3 2005-04-13 19:21 ` Russell King @ 2005-04-13 19:23 ` H. Peter Anvin 0 siblings, 0 replies; 179+ messages in thread From: H. Peter Anvin @ 2005-04-13 19:23 UTC (permalink / raw) To: Russell King Cc: Petr Baudis, Kernel Mailing List, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, git Russell King wrote: > > Nothing much - I don't particularly care about them. I thought someone > might object to using htonl/ntohl directly. > Why would they? -hpa ^ permalink raw reply [flat|nested] 179+ messages in thread
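Editorial note on the byte-order question above: the patch stores every multi-byte cache field in big-endian ("network") order, so a cache written on one machine can be read on a machine of the other endianness. A minimal sketch of that round trip using htonl/ntohl directly is shown here. The struct demo_header, its field set, and the two function names are hypothetical stand-ins for illustration, not the actual git cache layout.

```c
#include <arpa/inet.h>	/* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header, loosely modelled on the signature/version/entries
 * fields that the patch converts. Values are kept in host order in memory. */
struct demo_header {
	uint32_t signature;
	uint32_t version;
	uint32_t entries;
};

#define DEMO_HEADER_BYTES 12

/* Serialize host-order fields into a big-endian byte buffer. */
void demo_header_encode(const struct demo_header *h, unsigned char *out)
{
	uint32_t be;

	assert(h != NULL && out != NULL);
	be = htonl(h->signature);
	memcpy(out, &be, 4);
	be = htonl(h->version);
	memcpy(out + 4, &be, 4);
	be = htonl(h->entries);
	memcpy(out + 8, &be, 4);
}

/* Parse a big-endian byte buffer back into host-order fields. */
void demo_header_decode(const unsigned char *in, struct demo_header *h)
{
	uint32_t be;

	assert(in != NULL && h != NULL);
	memcpy(&be, in, 4);
	h->signature = ntohl(be);
	memcpy(&be, in + 4, 4);
	h->version = ntohl(be);
	memcpy(&be, in + 8, 4);
	h->entries = ntohl(be);
}
```

Converting once at the encode/decode boundary, as sketched, keeps in-memory values in host order; the patch in this thread instead keeps the mapped structures big-endian and converts at each use, which avoids copying the mmap'd cache.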
* Re: more git updates.. 2005-04-09 23:31 ` Linus Torvalds 2005-04-10 2:41 ` Petr Baudis @ 2005-04-10 6:53 ` Christopher Li 2005-04-10 11:48 ` Ralph Corderoy ` (2 more replies) 2005-04-11 11:35 ` [rfc] git: combo-blobs Ingo Molnar 2005-04-12 4:05 ` more git updates David Eger 3 siblings, 3 replies; 179+ messages in thread From: Christopher Li @ 2005-04-10 6:53 UTC (permalink / raw) To: Linus Torvalds Cc: Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List On Sat, Apr 09, 2005 at 04:31:10PM -0700, Linus Torvalds wrote: > > Done, and pushed out. The current git.git repository seems to do all of > this correctly. > > NOTE! This means that each "tree" file basically tracks just a single > directory. The old style of "every file in one tree file" still works, but > fsck-cache will warn about it. Happily, the git archive itself doesn't > have any subdirectories, so git itself is not impacted by it. That is really cool stuff. The way I read it, correct me if I am wrong, git is a user-space versioned file system: "tree" <--> directory and "blob" <--> file, with "commit" describing the version history. Git always writes out a full new version of a blob when there is any update to it. At first I thought that wastes a lot of space, especially when there is only a tiny change. But the more I think about it, the more sense it makes. Kernel source files are usually small objects, and files are stored compressed anyway. A very useful thing to gain from it is that we can truncate the older history, e.g. we can have an option not to sync the pre-2.4 change sets and only grab them if we need to. Most of the time we are only interested in the recent change sets. There is one problem though. How about SHA1 hash collisions? Even though the chance is very remote, you don't want to lose data due to a "software" error. I think it is OK not to handle that case right now. On the other hand, it would be nice to detect it and give out a big error message if it really happens. 
Something like the following patch, maybe with an option to turn it off. Chris Index: git-0.03/read-cache.c =================================================================== --- git-0.03.orig/read-cache.c 2005-04-09 18:42:16.000000000 -0400 +++ git-0.03/read-cache.c 2005-04-10 02:48:36.000000000 -0400 @@ -210,8 +210,22 @@ int fd; fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666); - if (fd < 0) - return (errno == EEXIST) ? 0 : -1; + if (fd < 0) { + void *map; + static int error(const char * string); + + if (errno != EEXIST) + return -1; + fd = open(filename, O_RDONLY); + if (fd < 0) + return -1; + map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); + if (map == MAP_FAILED) + return -1; + if (memcmp(buf, map, size)) + return error("Ouch, Strike by lighting!\n"); + return 0; + } write(fd, buf, size); close(fd); return 0; ^ permalink raw reply [flat|nested] 179+ messages in thread
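Editorial note: the idea in the patch above (when the object file already exists, compare its bytes with what was about to be written) can be sketched as a standalone function. This is an illustration under stated assumptions, not the code that was merged anywhere: the name store_or_verify and its return convention are hypothetical, and it uses read() rather than mmap() so that the existing file's length is checked and the descriptor is always closed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns:
 *    0  the buffer was written, or an identical file already existed
 *    1  a file exists under this name with different contents
 *   -1  I/O error
 */
int store_or_verify(const char *filename, const void *buf, size_t size)
{
	const char *data = buf;
	char chunk[4096];
	size_t off = 0;
	int result = 0;
	int fd;

	assert(filename != NULL && (buf != NULL || size == 0));

	fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
	if (fd >= 0) {
		/* New object: write it out completely. */
		while (off < size) {
			ssize_t n = write(fd, data + off, size - off);
			if (n <= 0) {
				if (n < 0 && errno == EINTR)
					continue;
				close(fd);
				return -1;
			}
			off += (size_t)n;
		}
		return close(fd) < 0 ? -1 : 0;
	}
	if (errno != EEXIST)
		return -1;

	/* Name already taken: verify the stored bytes match ours. */
	fd = open(filename, O_RDONLY);
	if (fd < 0)
		return -1;
	for (;;) {
		ssize_t n = read(fd, chunk, sizeof(chunk));
		if (n < 0) {
			if (errno == EINTR)
				continue;
			result = -1;
			break;
		}
		if (n == 0) {
			/* End of file: a shorter file is a mismatch. */
			if (off != size)
				result = 1;
			break;
		}
		/* A longer file or differing bytes is a mismatch. */
		if ((size_t)n > size - off ||
		    memcmp(chunk, data + off, (size_t)n) != 0) {
			result = 1;
			break;
		}
		off += (size_t)n;
	}
	close(fd);
	return result;
}
```

One caveat: if the stored bytes are compressed, a byte-wise comparison like this assumes the compressor produces the same output for the same input every time; otherwise identical content could be reported as a mismatch.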
* Re: more git updates.. 2005-04-10 6:53 ` more git updates Christopher Li @ 2005-04-10 11:48 ` Ralph Corderoy 2005-04-10 19:23 ` Paul Jackson 2005-04-11 13:58 ` H. Peter Anvin 2 siblings, 0 replies; 179+ messages in thread From: Ralph Corderoy @ 2005-04-10 11:48 UTC (permalink / raw) To: Christopher Li Cc: Linus Torvalds, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List Hi, Christopher Li wrote: > On Sat, Apr 09, 2005 at 04:31:10PM -0700, Linus Torvalds wrote: > > NOTE! This means that each "tree" file basically tracks just a > > single directory. The old style of "every file in one tree file" > > still works, but fsck-cache will warn about it. Happily, the git > > archive itself doesn't have any subdirectories, so git itself is not > > impacted by it. > > That is really cool stuff. The way I read it, correct me if I am > wrong, git is a user-space versioned file system: "tree" <--> directory > and "blob" <--> file, with "commit" describing the version history. See the Venti filesystem in Bell Labs's Plan 9 OS. It too uses SHA-1. http://www.cs.bell-labs.com/sys/doc/venti/venti.pdf Abstract This paper describes a network storage system, called Venti, intended for archival data. In this system, a unique hash of a block's contents acts as the block identifier for read and write operations. This approach enforces a write-once policy, preventing accidental or malicious destruction of data. In addition, duplicate copies of a block can be coalesced, reducing the consumption of storage and simplifying the implementation of clients. Venti is a building block for constructing a variety of storage applications such as logical backup, physical backup, and snapshot file systems. We have built a prototype of the system and present some preliminary performance results. The system uses magnetic disks as the storage technology, resulting in an access time for archival data that is comparable to non-archival data. 
The feasibility of the write-once model for storage is demonstrated using data from over a decade's use of two Plan 9 file systems. Cheers, Ralph. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates.. 2005-04-10 6:53 ` more git updates Christopher Li 2005-04-10 11:48 ` Ralph Corderoy @ 2005-04-10 19:23 ` Paul Jackson 2005-04-10 18:42 ` Christopher Li 2005-04-11 13:58 ` H. Peter Anvin 2 siblings, 1 reply; 179+ messages in thread From: Paul Jackson @ 2005-04-10 19:23 UTC (permalink / raw) To: Christopher Li; +Cc: torvalds, pasky, rddunlap, ross, linux-kernel > Something like the following patch, maybe with an option to turn it off. Take out an old envelope and compute on it the odds of this happening. Say we have 10,000 kernel hackers, each producing one new file every minute, for 100 hours a week. And we've cloned a small army of Andrew Mortons to integrate the resulting tsunami of patches. And Linus is well cared for in the state funny farm. What is the probability that this check will fire even once, between now and 10 billion years from now, when the Sun has become a red giant destroying all life on planet Earth? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 179+ messages in thread
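Editorial note: the back-of-the-envelope computation asked for above can be sketched with the usual birthday bound, P(collision) <= n(n-1)/2 / 2^bits, for n distinct objects and a hash of the given width. The sketch assumes SHA-1 behaves like a uniformly random 160-bit function, that nobody is deliberately constructing collisions, and that every new file is distinct; the function names and the 52-weeks-per-year figure are illustrative choices, not part of the thread.

```c
#include <assert.h>

/*
 * Upper bound on the probability that at least two of n_objects distinct
 * inputs share a hash value, for an ideal hash of hash_bits bits.
 * Uses the union (birthday) bound n(n-1)/2 / 2^hash_bits, capped at 1.
 */
double collision_bound(double n_objects, int hash_bits)
{
	double p;
	int i;

	assert(n_objects >= 0.0 && hash_bits >= 0);
	if (n_objects < 2.0)
		return 0.0;
	p = n_objects * (n_objects - 1.0) / 2.0;	/* number of pairs */
	for (i = 0; i < hash_bits; i++)
		p /= 2.0;				/* divide by 2^hash_bits */
	return p < 1.0 ? p : 1.0;
}

/* Object count for the scenario in the message above. */
double envelope_objects(void)
{
	double hackers = 10000.0;
	double files_per_week = 60.0 * 100.0;	/* one per minute, 100 hours */
	double weeks = 52.0 * 1e10;		/* 10 billion years */

	return hackers * files_per_week * weeks;
}
```

Under these assumptions, collision_bound(envelope_objects(), 160) works out to a few times 1e-10 for roughly 3e19 objects, so the check would be expected to stay silent for the entire period.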
* Re: more git updates.. 2005-04-10 19:23 ` Paul Jackson @ 2005-04-10 18:42 ` Christopher Li 2005-04-10 22:30 ` Petr Baudis 0 siblings, 1 reply; 179+ messages in thread From: Christopher Li @ 2005-04-10 18:42 UTC (permalink / raw) To: Paul Jackson; +Cc: torvalds, pasky, rddunlap, ross, linux-kernel I totally agree that the odds are really, really small. That is why it is not worth handling the case. People who hit it can just add a new line or something to avoid it, if it happens at all. It is a little peace of mind to know for sure that it did not happen. I am just paranoid. Chris On Sun, Apr 10, 2005 at 12:23:52PM -0700, Paul Jackson wrote: > > Something like the following patch, maybe with an option to turn it off. > > Take out an old envelope and compute on it the odds of this > happening. > > Say we have 10,000 kernel hackers, each producing one > new file every minute, for 100 hours a week. And we've > cloned a small army of Andrew Mortons to integrate > the resulting tsunami of patches. And Linus is well > cared for in the state funny farm. > > What is the probability that this check will fire even > once, between now and 10 billion years from now, when > the Sun has become a red giant destroying all life on > planet Earth? > > -- > I won't rest till it's the best ... > Programmer, Linux Scalability > Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: more git updates.. 2005-04-10 18:42 ` Christopher Li @ 2005-04-10 22:30 ` Petr Baudis 0 siblings, 0 replies; 179+ messages in thread From: Petr Baudis @ 2005-04-10 22:30 UTC (permalink / raw) To: Christopher Li; +Cc: Paul Jackson, torvalds, rddunlap, ross, linux-kernel Dear diary, on Sun, Apr 10, 2005 at 08:42:53PM CEST, I got a letter where Christopher Li <lkml@chrisli.org> told me that... > I totally agree that the odds are really, really small. > That is why it is not worth handling the case. People who hit it > can just add a new line or something to avoid it, if > it happens at all. > > It is a little peace of mind to know for sure that it did > not happen. I am just paranoid. BTW, I merged the check into git-pasky some time ago; you can disable it in the Makefile. It is on by default for now, until someone convinces me it actually affects performance measurably. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates.. 2005-04-10 6:53 ` more git updates Christopher Li 2005-04-10 11:48 ` Ralph Corderoy 2005-04-10 19:23 ` Paul Jackson @ 2005-04-11 13:58 ` H. Peter Anvin 2005-04-20 20:29 ` Kai Henningsen 2 siblings, 1 reply; 179+ messages in thread From: H. Peter Anvin @ 2005-04-11 13:58 UTC (permalink / raw) To: linux-kernel Followup to: <20050410065307.GC13853@64m.dyndns.org> By author: Christopher Li <lkml@chrisli.org> In newsgroup: linux.dev.kernel > > There is one problem though. How about the SHA1 hash collision? > Even the chance is very remote, you don't want to lose some data do due > to "software" error. I think it is OK that no handle that > case right now. On the other hand, it will be nice to detect that > and give out a big error message if it really happens. > If you're actually worried about it, it'd be better to just use a different hash, like one of the SHA-2's (probably a better choice anyway), instead of SHA-1. -hpa ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates.. 2005-04-11 13:58 ` H. Peter Anvin @ 2005-04-20 20:29 ` Kai Henningsen 2005-04-24 0:42 ` Paul Jackson [not found] ` <6f6293f10504210220744af114@mail.gmail.com> 0 siblings, 2 replies; 179+ messages in thread From: Kai Henningsen @ 2005-04-20 20:29 UTC (permalink / raw) To: linux-kernel hpa@zytor.com (H. Peter Anvin) wrote on 11.04.05 in <d3dvps$347$1@terminus.zytor.com>: > Followup to: <20050410065307.GC13853@64m.dyndns.org> > By author: Christopher Li <lkml@chrisli.org> > In newsgroup: linux.dev.kernel > > > > There is one problem though. How about the SHA1 hash collision? > > Even the chance is very remote, you don't want to lose some data do due > > to "software" error. I think it is OK that no handle that > > case right now. On the other hand, it will be nice to detect that > > and give out a big error message if it really happens. > > > > If you're actually worried about it, it'd be better to just use a > different hash, like one of the SHA-2's (probably a better choice > anyway), instead of SHA-1. How could that help? *Every* hash has hash collisions. It's an unavoidable result of using less bits than the original data has. MfG Kai ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates.. 2005-04-20 20:29 ` Kai Henningsen @ 2005-04-24 0:42 ` Paul Jackson 2005-04-24 1:29 ` Bernd Eckenfels 2005-04-24 8:00 ` Kai Henningsen [not found] ` <6f6293f10504210220744af114@mail.gmail.com> 1 sibling, 2 replies; 179+ messages in thread From: Paul Jackson @ 2005-04-24 0:42 UTC (permalink / raw) To: Kai Henningsen; +Cc: linux-kernel > It's an unavoidable > result of using less bits than the original data has. Even _not_ using a hash will have collisions - copy different globs of data around enough, and sooner or later, two globs that started out different will end up the same, due to errors in our computers. Even ECC on all the buses, channels, and memory will just reduce this chance. There is no mathematical perfection obtainable here. Deal with it. Computers are about engineering, not philosophical perfection. If something is likely to happen less than once in a billion years, then for all practical purposes, it won't happen. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-24 0:42 ` Paul Jackson
@ 2005-04-24 1:29 ` Bernd Eckenfels
2005-04-24 4:13 ` Paul Jackson
2005-04-24 16:52 ` Horst von Brand
2005-04-24 8:00 ` Kai Henningsen
1 sibling, 2 replies; 179+ messages in thread
From: Bernd Eckenfels @ 2005-04-24 1:29 UTC (permalink / raw)
To: linux-kernel
In article <20050423174227.51360d63.pj@sgi.com> you wrote:
> If something is likely to happen less than once in a billion years,
> then for all practical purposes, it won't happen.
Of course there are colliding files already available and easily
generated. So a malicious attack is already possible. That is especially
nasty because one can then prove that the GIT object file system is broken.
However, I don't think it is a problem for Linux source control purposes,
ever.
Using a combined hash might be a good idea here, though. That would silence
the critics, since they would have no exploit samples handy. :)
Or at least go with FIPS 180-2.
Greetings
Bernd
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-24 1:29 ` Bernd Eckenfels
@ 2005-04-24 4:13 ` Paul Jackson
2005-04-24 4:38 ` Bernd Eckenfels
2005-04-24 16:52 ` Horst von Brand
1 sibling, 1 reply; 179+ messages in thread
From: Paul Jackson @ 2005-04-24 4:13 UTC (permalink / raw)
To: Bernd Eckenfels; +Cc: linux-kernel
Bernd wrote:
> Of course there are colliding files already available and easily
> generated. So a malicious attack is already possible.
I don't believe you. Reference?
> Or at least go with FIPS 180-2.
FIPS 180-2 specifies four secure hash algorithms - SHA-1, SHA-256,
SHA-384, and SHA-512. We're using SHA-1.
I think you meant go with SHA-256, which is new in FIPS 180-2.
FIPS 180-1 only had SHA-1. FIPS 180-2 superseded FIPS 180-1,
adding the three algorithms SHA-256, SHA-384, and SHA-512.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-24 4:13 ` Paul Jackson
@ 2005-04-24 4:38 ` Bernd Eckenfels
2005-04-24 4:53 ` Paul Jackson
2005-04-25 11:57 ` Theodore Ts'o
0 siblings, 2 replies; 179+ messages in thread
From: Bernd Eckenfels @ 2005-04-24 4:38 UTC (permalink / raw)
To: Paul Jackson; +Cc: linux-kernel
On Sat, Apr 23, 2005 at 09:13:26PM -0700, Paul Jackson wrote:
> I don't believe you. Reference?
I had MD5 in mind, sorry. I haven't seen the SHA-1 collision samples yet.
However, they are likely to be available soon. (A simple pair of two files
will be enough to cause "theoretical" problems.) However, I think it would be
possible to detect collisions on add and append sequence numbers... ugly.
> > Or at least go with FIPS 180-2.
>
> FIPS 180-2 specifies four secure hash algorithms - SHA-1, SHA-256,
> SHA-384, and SHA-512. We're using SHA-1.
Yes, I was referring to the longer versions (aka SHA-2), since FIPS tries to
phase out the 160-bit version (by 2010).
Anyway, I know we don't need to discuss this. I just wanted to point out that
in practical usage as a source repository we might not see problems, but that
does not mean there aren't some that can already be provoked.
Personally, I see the advantage of the "stateless" hash approach over more
correct stateful approaches like BK.
Greetings
Bernd
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-24 4:38 ` Bernd Eckenfels
@ 2005-04-24 4:53 ` Paul Jackson
2005-04-25 11:57 ` Theodore Ts'o
1 sibling, 0 replies; 179+ messages in thread
From: Paul Jackson @ 2005-04-24 4:53 UTC (permalink / raw)
To: Bernd Eckenfels; +Cc: linux-kernel
> I had MD5 in mind, sorry.
That's what I suspected.
> Anyway I know we don't need to discuss this,
Agreed.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-24 4:38 ` Bernd Eckenfels
2005-04-24 4:53 ` Paul Jackson
@ 2005-04-25 11:57 ` Theodore Ts'o
2005-04-25 16:40 ` David Wagner
2005-04-25 20:35 ` Bernd Eckenfels
1 sibling, 2 replies; 179+ messages in thread
From: Theodore Ts'o @ 2005-04-25 11:57 UTC (permalink / raw)
To: Bernd Eckenfels; +Cc: Paul Jackson, linux-kernel
On Sun, Apr 24, 2005 at 06:38:13AM +0200, Bernd Eckenfels wrote:
> On Sat, Apr 23, 2005 at 09:13:26PM -0700, Paul Jackson wrote:
> > I don't believe you. Reference?
>
> I had MD5 in mind, sorry. I haven't seen the SHA-1 collision samples yet.
> However, they are likely to be available soon. (A simple pair of two files
> will be enough to cause "theoretical" problems.) However, I think it would be
> possible to detect collisions on add and append sequence numbers... ugly.
The MD5 collision samples are for two 16 byte inputs which when run
through the MD5 algorithm, result in the same 128-bit hash. The SHA-1
collision samples are for two 20 byte inputs which when run through
the SHA algorithm create the same 160-bit hash. In neither case will
the inputs be valid git objects, nor anything approaching ASCII text,
let alone valid C files.
So what theoretical problems will be caused by this? Sure, an attacker
can check a garbage file containing (apparently) random bytes into git,
and then produce another garbage file containing some completely other
(apparently) random bytes which will collide with the first garbage file.
You want to explain how this is going to cause problems in the git
systems? And even if you can describe any problems, you want to explain
why any such theoretical problems couldn't be trivially detected and fixed?
- Ted
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-25 11:57 ` Theodore Ts'o
@ 2005-04-25 16:40 ` David Wagner
2005-04-25 20:35 ` Bernd Eckenfels
1 sibling, 0 replies; 179+ messages in thread
From: David Wagner @ 2005-04-25 16:40 UTC (permalink / raw)
To: linux-kernel
Theodore Ts'o wrote:
>The MD5 collision samples are for two 16 byte inputs which when run
>through the MD5 algorithm, result in the same 128-bit hash. The SHA-1
>collision samples are for two 20 byte inputs which when run through
>the SHA algorithm create the same 160-bit hash.
There are no known SHA-1 collision samples.
(There are collision samples for MD5, and for SHA-0, but not for SHA-1.)
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates.. 2005-04-25 11:57 ` Theodore Ts'o 2005-04-25 16:40 ` David Wagner @ 2005-04-25 20:35 ` Bernd Eckenfels 1 sibling, 0 replies; 179+ messages in thread From: Bernd Eckenfels @ 2005-04-25 20:35 UTC (permalink / raw) To: Theodore Ts'o, linux-kernel On Mon, Apr 25, 2005 at 07:57:50AM -0400, Theodore Ts'o wrote: > You want to explain how this is going to cause problems in the git > systems? No because I explained it does not cause Problems. Greetings Bernd BTW: do you have an link to the SHA-1 collisions? -- (OO) -- Bernd_Eckenfels@Mörscher_Strasse_8.76185Karlsruhe.de -- ( .. ) ecki@{inka.de,linux.de,debian.org} http://www.eckes.org/ o--o 1024D/E383CD7E eckes@IRCNet v:+497211603874 f:+497211606754 (O____O) When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl! ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-24 1:29 ` Bernd Eckenfels
2005-04-24 4:13 ` Paul Jackson
@ 2005-04-24 16:52 ` Horst von Brand
1 sibling, 0 replies; 179+ messages in thread
From: Horst von Brand @ 2005-04-24 16:52 UTC (permalink / raw)
To: Bernd Eckenfels; +Cc: linux-kernel
Bernd Eckenfels <ecki@lina.inka.de> said:
> In article <20050423174227.51360d63.pj@sgi.com> you wrote:
> > If something is likely to happen less than once in a billion years,
> > then for all practical purposes, it won't happen.
> Of course there are colliding files already available and easily
> generated. So a malicious attack is already possible.
Care to share some? Of what you are smoking, that is... pretty potent stuff.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates.. 2005-04-24 0:42 ` Paul Jackson 2005-04-24 1:29 ` Bernd Eckenfels @ 2005-04-24 8:00 ` Kai Henningsen 1 sibling, 0 replies; 179+ messages in thread From: Kai Henningsen @ 2005-04-24 8:00 UTC (permalink / raw) To: linux-kernel pj@sgi.com (Paul Jackson) wrote on 23.04.05 in <20050423174227.51360d63.pj@sgi.com>: > > It's an unavoidable > > result of using less bits than the original data has. > > Even _not_ using a hash will have collisions - copy different globs of > data around enough, and sooner or later, two globs that started out > different will end up the same, due to errors in our computers. Even > ECC on all the buses, channels, and memory will just reduce this chance. Umm, the whole point of using a digest for the name is to catch these things as they happen. So if you'd use the whole original bit sequence as a name, you'd need to have exactly the same bit errors in the data, in the name, and in the reference to the object, to miss nopticing the problem early. And it *still* isn't a collision - the data behind name X is exactly X, always, or it's easily recognizable as broken. Whereas a hash collision means that both X and Y should be behind name Z. Both are *correct* behind name Z. Entirely different situations. > There is no mathematical perfection obtainable here. Deal with it. Actually, there is, and your non-hashed name system achieves it. > If something is likely to happen less than once in a billion years, > then for all practical purposes, it won't happen. If that was a truely random thing, then you might have been right. But it isn't. All possible blobs to a given digest are NOT equally probably (or of a probability only depending on their size). We really, really don't know how likely a collision is for the data we want to store there - just for truely random data. MfG Kai ^ permalink raw reply [flat|nested] 179+ messages in thread
[parent not found: <6f6293f10504210220744af114@mail.gmail.com>]
* Re: more git updates.. [not found] ` <6f6293f10504210220744af114@mail.gmail.com> @ 2005-04-24 8:01 ` Kai Henningsen 0 siblings, 0 replies; 179+ messages in thread From: Kai Henningsen @ 2005-04-24 8:01 UTC (permalink / raw) To: linux-kernel felipe.alfaro@gmail.com (Felipe Alfaro Solana) wrote on 21.04.05 in <6f6293f10504210220744af114@mail.gmail.com>: > On 20 Apr 2005 22:29:00 +0200, Kai Henningsen <kaih@khms.westfalen.de> > > > wrote: If you're actually worried about it, it'd be better to just use a > > > different hash, like one of the SHA-2's (probably a better choice > > > anyway), instead of SHA-1. > > > > How could that help? *Every* hash has hash collisions. It's an unavoidable > > result of using less bits than the original data has. > > SHA-2 allows for 256, 384 and 512-bit hashes, which provides greater > resistance to collisions. So? It's still finite. MfG Kai ^ permalink raw reply [flat|nested] 179+ messages in thread
* [rfc] git: combo-blobs 2005-04-09 23:31 ` Linus Torvalds 2005-04-10 2:41 ` Petr Baudis 2005-04-10 6:53 ` more git updates Christopher Li @ 2005-04-11 11:35 ` Ingo Molnar 2005-04-11 14:45 ` Paul Jackson 2005-04-12 4:05 ` more git updates David Eger 3 siblings, 1 reply; 179+ messages in thread From: Ingo Molnar @ 2005-04-11 11:35 UTC (permalink / raw) To: Linus Torvalds Cc: Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List, git i think all of the 'repository size' and 'bandwidth' concerns could be solved via a new (and pretty much simple and transparent) object type: the 'combo-blob'. Summary: -------- This is a space/bandwidth-efficient blob that 'includes' arbitrary portions of (one, two, or more) simple blobs by reference [1], with byte granularity, plus an optional followup portion that includes the full constructed state, uncompressed. [2] It can also conserve more RAM compared to the current repository format. Representation: --------------- A combo-blob would have the 'simplest possible' and thus most obvious representation: a list (the 'include-table') of "include X bytes at offset Y from parent Z" operations: <parent-blob-ID> <offset> <size> [optional full constructed state] e.g.: 6d11b2dd7f169c29664ac0553090865b7b020973 0 64444 6d374c972c04a0b1894cc6898dffa8ab0b273fcb 0 100 6d11b2dd7f169c29664ac0553090865b7b020973 64545 163656 'punches' 100 bytes out of blob 6d1* at offset 64444, and replaces it with blob 6d3*'s 100 bytes. [offset/size would be stored in a binary form to have constant record sizes.] in OS terms it's similar to an iovec representation. [3] The hash of a combo-blob is calculated off the include-table alone: i.e. it's _not_ equivalent to the hash of the included contents. I.e. you cannot 'collapse' a combo-blob after the fact, it's an immutable part of the history of the repository, similar to other stored objects. You can freely cache/uncache (blow-up/collapse) it on the other hand. 
[ NOTE: further below you can find a 'Notes' section as well, which might address some of the issues/ideas you might have at this point. ] Cons: ----- there are a number of disadvantages: - performance hit. Linus is perfectly right, in terms of performance, nothing beats having full objects. Hence i kept the option to include the full constructed blob [4] (uncompressed) as well in the combo-blob. When all combo-blobs are 'blown up' then they can be better in terms of performance than the current repository format. [they still carry the small slice & dice information as well] the performance hit can be reduced in a finegrained way by introducing occasional full objects in the history. E.g. after every 8 steps one would include a full blob, to limit the number of blobs necessary to construct a previously unconstructed combo-blob. This would still cut the overhead of the current format substantially. clearly, the most important cache is the current directory cache, which this abstraction does not hurt. - complexity. It's all pretty straightforward, but checking the consistency of a combo object is not as simple as checking the consistency of a simple object, as it would have to recursively check all parent IDs as well. I think it's worth the price though. - repository has optional components: the 'blown up' (cached) portion of a combo-blob can be freely destructed. This means that two repositories can now not only differ in their directory-cache, but also in their objects/ hierarchy. I dont think this is a big issue, BYMMV. Pros: ----- - the main advantage is space/bandwith: it's pretty much as efficient as it gets: it can be used to represent compressed binary deltas. A fully trimmed (uncached) repository is very efficient. - the optional 'fully constructed' portion is not compressed, so once a repository is 'cached', it is faster to process (in areas outside the current directory cache) than the current repository format. 
(In fact, when a previously unused portion of a repository is accessed _first_, it is IO-bound by nature - so we can very well spend the extra CPU cycles on uncompressing things.) - a 'combo' blob will be more memory-efficient as well. So with given amount of RAM one could access more history, with a small CPU cost - as long as the level of 'history recursion' is kept in check (e.g. via the previously mentioned 'at most 8-deep combinations'). Straightforward iovecs could be passed to Linux system-calls, when constructing a 'view' of a file, without having to cache every step of the file's history. - a combo-blob directly represents the way humans code: combining pre-existing pieces of information and adding relatively low amount of new stuff. Having a natural representation for the type of activity that a tool supports cannot hurt. ( - combo-blobs enable a per-chunk (or per-line) edit history. It's not an important feature though. ) Notes: ------ [1] the combo-blob is not a 'delta' thing. It combines pre-existing parents. One of the parents may of course be a 'delta' that acts upon the other parent - but the combo-blob does not know and does not care. (A combo-blob might as well represent an act of someone consolidating multiple small files into a big file, or splitting up a big file into smaller files. Or a combo-blob might represent the trimming of a preexisting file.) [2]: a combo-blob is conceptually still a simple object with blob data in it, nothing more. It can be referenced in other object types equivalently to other blobs. It just happens to be a combination of existing blobs, and hence the 'git filesystem' has to work harder (but still quite efficiently) to get to the contents. [3]: a combo-blob might reference any parent blob, including combo blobs. This means that e.g. 
multiple small deltas can be represented via: <blob-#1> | |-----<blob-#2> | <combo-blob-#1> | |-----<blob-#3> | <combo-blob-#2> where combo-blob-#2 is thus a combination of blob-#1,blob-#2,blob-#3. [4] alternatively, it might also make sense to extend the simple combo-blob concept with the concept of a 'cache-blob': a cache-blob 'blows up' combo blobs in that it fully constructs the blob contents, but it is otherwise identical to the blob it caches. Simple (non-combo) blob types are a cache of themselves. Ingo ^ permalink raw reply [flat|nested] 179+ messages in thread
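The include-table described in this RFC can be sketched roughly as follows. This is a minimal illustration, not code from git: the names `blob_id` and `construct`, the in-memory `store` dictionary, and the textual encoding of the include-table are all assumptions (the RFC itself suggests a binary record format), and objects are named by a plain SHA-1 of their bytes as a simplification.

```python
import hashlib
from typing import Dict, List, Tuple, Union

Include = Tuple[str, int, int]      # (parent blob ID, offset, size)
Blob = Union[bytes, List[Include]]  # simple blob, or a combo-blob's include-table


def blob_id(blob: Blob) -> str:
    """Name a blob by SHA-1; a combo-blob is hashed over its include-table only."""
    if isinstance(blob, bytes):
        data = blob
    else:
        # Hypothetical text encoding of the table, one record per line.
        data = "".join(f"{p} {o} {s}\n" for p, o, s in blob).encode()
    return hashlib.sha1(data).hexdigest()


def construct(store: Dict[str, Blob], oid: str) -> bytes:
    """Build the full contents, recursing through parents that are combo-blobs."""
    blob = store[oid]
    if isinstance(blob, bytes):
        return blob
    out = bytearray()
    for parent, offset, size in blob:
        data = construct(store, parent)
        if offset + size > len(data):
            raise ValueError(f"include past end of parent {parent}")
        out += data[offset:offset + size]
    return bytes(out)


# Replace the middle line of a small file by referencing the old blob twice.
store: Dict[str, Blob] = {}
old = b"int a;\nint b;\nint c;\n"
new_piece = b"long b;\n"
old_id, new_id = blob_id(old), blob_id(new_piece)
store[old_id], store[new_id] = old, new_piece

combo: List[Include] = [(old_id, 0, 7), (new_id, 0, 8), (old_id, 14, 7)]
combo_id = blob_id(combo)
store[combo_id] = combo

result = construct(store, combo_id)
print(result.decode(), end="")
```

Because the ID is computed from the include-table, `combo_id` differs from the ID that the constructed contents would get as a simple blob, which matches the point that a combo-blob cannot be collapsed after the fact.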
* Re: [rfc] git: combo-blobs 2005-04-11 11:35 ` [rfc] git: combo-blobs Ingo Molnar @ 2005-04-11 14:45 ` Paul Jackson 2005-04-11 15:12 ` Ingo Molnar 2005-04-11 15:28 ` Ingo Molnar 0 siblings, 2 replies; 179+ messages in thread From: Paul Jackson @ 2005-04-11 14:45 UTC (permalink / raw) To: Ingo Molnar; +Cc: torvalds, pasky, rddunlap, ross, linux-kernel, git Hmmm ... I have this strong sense that I am about 2 hours away from smacking my forehead and groaning "Duh - so that's what Ingo meant!" However, one must play out one's destiny. Could you provide an example scenario, which results in the creation of a combo-blob? The best I can come up with is the following. Let's say Nick changes one line in the middle of kernel/sched.c (yeah - I know - unlikely scenario - he usually changes more than that - nevermind that detail.) In the days Before Combo Blobs (BCB), git would have been told that kernel/sched.c was to be picked up, and would have wrapped it up in a zlib'd blob, sha1summed it, seen it was a new sum, and added that blob to its objects (or something like this -- I'm still a little fuzzy on these git details.) But Nick just downloaded the latest git 1.5.11.1 which has added support for combo blobs, so now, guessing here, instead of wrapping up the new sched.c, git instead unwraps the old one, diff's with the new, notices a couple of long sequences that are unchanged, wraps up both of those sequences as a couple of relatively large blobs, and wraps up the new lines that Nick just coded in the middle as a small blob, and puts all three in the object store, along with another small combo-blob, tying them all together. So far, not too bad. 
Haven't gained anything, and required the unpacking of a zlib blog we didn't require before, and the running and analyzing of a diff we didn't require before, but the end result is only moderately worse - four object blobs instead of one, but of total size not much larger (well, total size typically 3 disk blocks worse, due to a slight increase in fragmentation from using 4 blocks to store what used to be in one.) But now I get stuck. Unless I throw in something like the interleaved delta compression that's at the heart of Marc Rochind's old SCCS code (and Larry's rewrite thereof), I don't see how we ever come to the practical realization that any of these four new blobs can ever be reused. So explain to me again how we ever gain anything with these combo blobs, while I take a prophylactic aspirin, so the forehead whack won't hurt as much. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [rfc] git: combo-blobs 2005-04-11 14:45 ` Paul Jackson @ 2005-04-11 15:12 ` Ingo Molnar 2005-04-11 15:32 ` Linus Torvalds 2005-04-11 17:50 ` Paul Jackson 2005-04-11 15:28 ` Ingo Molnar 1 sibling, 2 replies; 179+ messages in thread From: Ingo Molnar @ 2005-04-11 15:12 UTC (permalink / raw) To: Paul Jackson; +Cc: torvalds, pasky, rddunlap, ross, linux-kernel, git * Paul Jackson <pj@engr.sgi.com> wrote: > Hmmm ... I have this strong sense that I am about 2 hours away from > smacking my forehead and groaning "Duh - so that's what Ingo meant!" > > However, one must play out one's destiny. > > Could you provide an example scenario, which results in the creation > of a combo-blob? > > The best I can come up with is the following. > > Let's say Nick changes one line in the middle of kernel/sched.c (yeah > - I know - unlikely scenario - he usually changes more than that - > nevermind that detail.) > > In the days Before Combo Blobs (BCB), git would have been told that > kernel/sched.c was to be picked up, and would have wrapped it up in a > zlib'd blob, sha1summed it, seen it was a new sum, and added that blob > to its objects (or something like this -- I'm still a little fuzzy on > these git details.) > > But Nick just downloaded the latest git 1.5.11.1 which has added > support for combo blobs, so now, guessing here, instead of wrapping up > the new sched.c, git instead unwraps the old one, diff's with the new, > notices a couple of long sequences that are unchanged, wraps up both > of those sequences as a couple of relatively large blobs, and wraps up > the new lines that Nick just coded in the middle as a small blob, and > puts all three in the object store, along with another small > combo-blob, tying them all together. actually, git would just include by reference the previous blob. lets say we had the previous version of sched.c in a blob, ID cc4ee6107d19f89898a8c89d45810f01710f2ff4. 
We have the new edit (which is small, lets say 20 bytes) in blob e010fab710092b19be6e26de1721e249dff2d141. We'd create the combo-blob representing the new version of sched.c, the following way: include cc4ee6107d19f89898a8c89d45810f01710f2ff4 0 54010 include e010fab710092b19be6e26de1721e249dff2d141 0 20 include cc4ee6107d19f89898a8c89d45810f01710f2ff4 54030 73061 so we'd include (by reference) most of the previous version, with a small blob for the extras. Since sched.c compresses down to 36K, we saved ~32K of bandwidth, and somewhere on the order of 20K of storage. to construct the combo blob later on, we do have to unpack sched.c (and if it's already a combo-blob that is not cached then we'd have to unpack all parents until we arrive at some full blob). > So far, not too bad. Haven't gained anything, and required the > unpacking of a zlib blog we didn't require before, and the running and > analyzing of a diff we didn't require before, but the end result is > only moderately worse - four object blobs instead of one, but of total > size not much larger (well, total size typically 3 disk blocks worse, > due to a slight increase in fragmentation from using 4 blocks to store > what used to be in one.) we'd have 2 new objects (the 'delta' and the 'combo' blob). (if # of objects is an issue then we could include new data in the combo blob itself too, but that's getting too complex i think.) Ingo ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [rfc] git: combo-blobs 2005-04-11 15:12 ` Ingo Molnar @ 2005-04-11 15:32 ` Linus Torvalds 2005-04-11 15:39 ` Ingo Molnar 2005-04-11 17:50 ` Paul Jackson 1 sibling, 1 reply; 179+ messages in thread From: Linus Torvalds @ 2005-04-11 15:32 UTC (permalink / raw) To: Ingo Molnar; +Cc: Paul Jackson, pasky, rddunlap, ross, linux-kernel, git On Mon, 11 Apr 2005, Ingo Molnar wrote: > > to construct the combo blob later on, we do have to unpack sched.c (and > if it's already a combo-blob that is not cached then we'd have to unpack > all parents until we arrive at some full blob). I really don't want to have this. Having chains of dependencies is really painful, and now if _any_ of them gets corrupted, you're screwed. Yes, GIT already has chains, but they are the minimal possible (ie we have the path-name-dependent tree chain, which I tried to avoid but really couldn't). The "commit" chain can grow to arbitrary sizes, but losing any entry but the top one really doesn't lose any data - you lost your place in history, but at least you're not totally screwed. You still have your data, you just can't find your way to the root (but you can, for example, effectively re-create the whole commit chain if you want to without having to touch any of the data blobs). So I would very strongly suggest that we do not have dependent combo blobs, but that if you want to, a better "network protocol" might be quite possible. Ie send diffs over the network, and re-create the blobs on the other side. You can trivially check that you got it right, because if you didn't, the name of the result won't match ;) Please? Linus ^ permalink raw reply [flat|nested] 179+ messages in thread
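The alternative Linus suggests, sending a diff over the network and re-creating the full blob on the receiving side, can be sketched as below. This is an assumption-laden illustration, not an actual git protocol: the copy/insert operation format and the names `apply_delta` and `receive` are hypothetical, and the object name is taken to be a plain SHA-1 of the contents for simplicity.

```python
import hashlib
from typing import List, Tuple, Union

# A delta is a list of ("copy", offset, size) or ("insert", data) operations.
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def object_name(data: bytes) -> str:
    """Simplified content-derived name: SHA-1 of the raw bytes."""
    return hashlib.sha1(data).hexdigest()


def apply_delta(base: bytes, ops: List[Op]) -> bytes:
    """Rebuild the new contents from a base blob plus delta operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, size = op
            out += base[offset:offset + size]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError(f"unknown delta operation {op[0]!r}")
    return bytes(out)


def receive(base: bytes, ops: List[Op], expected_name: str) -> bytes:
    """Re-create the blob and refuse it unless its name matches."""
    data = apply_delta(base, ops)
    if object_name(data) != expected_name:
        raise ValueError("reconstructed blob does not match its name")
    return data


# Sender side: knows both versions and ships only the delta plus the name.
old = b"int a;\nint b;\nint c;\n"
new = b"int a;\nlong b;\nint c;\n"
delta: List[Op] = [("copy", 0, 7), ("insert", b"long b;\n"), ("copy", 14, 7)]

# Receiver side: stores the full blob, so no dependency chain is kept on disk.
stored = receive(old, delta, object_name(new))
print(stored == new)
```

The stored result is a complete object, so a later corruption of the old blob does not affect it; the delta exists only in transit.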
* Re: [rfc] git: combo-blobs 2005-04-11 15:32 ` Linus Torvalds @ 2005-04-11 15:39 ` Ingo Molnar 2005-04-11 15:57 ` Ingo Molnar 2005-04-11 16:01 ` Linus Torvalds 0 siblings, 2 replies; 179+ messages in thread From: Ingo Molnar @ 2005-04-11 15:39 UTC (permalink / raw) To: Linus Torvalds; +Cc: Paul Jackson, pasky, rddunlap, ross, linux-kernel, git * Linus Torvalds <torvalds@osdl.org> wrote: > > to construct the combo blob later on, we do have to unpack sched.c (and > > if it's already a combo-blob that is not cached then we'd have to unpack > > all parents until we arrive at some full blob). > > I really don't want to have this. Having chains of dependencies is > really painful, and now if _any_ of them gets corrupted, you're > screwed. if a repository is corrupted then it pretty much needs to be dropped anyway. Also, with a 'replicate the full object on every 8th commit' rule the risk would be somewhat mitigated as well. but yeah, i can very much see the point of trying to avoid that complexity. (Also, it's not like delta blobs couldnt be introduced later on, if there's enough (if any) pressure to reduce storage overhead.) Ingo ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [rfc] git: combo-blobs 2005-04-11 15:39 ` Ingo Molnar @ 2005-04-11 15:57 ` Ingo Molnar 2005-04-11 16:01 ` Linus Torvalds 1 sibling, 0 replies; 179+ messages in thread From: Ingo Molnar @ 2005-04-11 15:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: Paul Jackson, pasky, rddunlap, ross, linux-kernel, git * Ingo Molnar <mingo@elte.hu> wrote: > > * Linus Torvalds <torvalds@osdl.org> wrote: > > > > to construct the combo blob later on, we do have to unpack sched.c (and > > > if it's already a combo-blob that is not cached then we'd have to unpack > > > all parents until we arrive at some full blob). > > > > I really don't want to have this. Having chains of dependencies is > > really painful, and now if _any_ of them gets corrupted, you're > > screwed. > > if a repository is corrupted then it pretty much needs to be dropped > anyway. Also, with a 'replicate the full object on every 8th commit' > rule the risk would be somewhat mitigated as well. another thing is that if the repository is 'cached' (which would normally be the case for work files), then it would be more resilient against corruption as the full uncompressed file would be included at the end of the combo-blob. Ingo ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [rfc] git: combo-blobs
2005-04-11 15:39 ` Ingo Molnar
2005-04-11 15:57 ` Ingo Molnar
@ 2005-04-11 16:01 ` Linus Torvalds
2005-04-11 16:33 ` Ingo Molnar
2005-04-11 18:13 ` Chris Wedgwood
1 sibling, 2 replies; 179+ messages in thread
From: Linus Torvalds @ 2005-04-11 16:01 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Paul Jackson, pasky, rddunlap, ross, linux-kernel, git

On Mon, 11 Apr 2005, Ingo Molnar wrote:
>
> if a repository is corrupted then it pretty much needs to be dropped
> anyway.

I disagree. Yes, the thing is designed to be replicated, so most of the
time the easiest thing to do is to just rsync with another copy. But
dammit, I don't want to just depend on that.

I wrote "fsck" for a reason. Right now it only finds errors, which is
sufficient if you do the rsync thing, but I think it's _wrong_ to

- be slower
- be more complex
- be less safe

to save some diskspace.

If you want to save disk-space, the current setup has a great way of
doing that: just drop old history. Exactly because a GIT repo doesn't do
the dependency chain thing, you can do that, and have a minimal GIT
repository that is still perfectly valid (and is basically the size of a
single checked-out tree, except it's also compressed).

I don't think many people will do that, considering how cheap disk is,
but the fact is, GIT allows it just fine. "fsck" will complain right now,
but I'm actually going to make the "commit->commit" link be a "weaker"
thing, and have fsck not complain about missing history unless you do the
"-v" thing.

(Right now, for development, I _do_ want fsck to complain about missing
history, but that's a different thing. Right now it's there to make sure
I don't do stupid things, not for "users").

> Also, with a 'replicate the full object on every 8th commit'
> rule the risk would be somewhat mitigated as well.

..but not the complexity.

The fact is, I want to trust this thing. Dammit, one reason I like GIT is
that I can mentally visualize the whole damn tree, and each step is so
_simple_. That's extra important when the object database itself is so
inscrutable - unlike CVS or SCCS or formats like that, it's damn hard to
visualize from looking at a directory listing.

So this really is a very important point for me: I want a demented
chimpanzee to be able to understand the GIT linkages, and I do not want
_any_ partial results anywhere. The recursive tree is already more
complexity than I wanted, but at least that seemed inescapable.

Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [rfc] git: combo-blobs
2005-04-11 16:01 ` Linus Torvalds
@ 2005-04-11 16:33 ` Ingo Molnar
2005-04-12 5:42 ` Barry K. Nathan
2005-04-11 18:13 ` Chris Wedgwood
1 sibling, 1 reply; 179+ messages in thread
From: Ingo Molnar @ 2005-04-11 16:33 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Paul Jackson, pasky, rddunlap, ross, linux-kernel, git

* Linus Torvalds <torvalds@osdl.org> wrote:

> > Also, with a 'replicate the full object on every 8th commit'
> > rule the risk would be somewhat mitigated as well.
>
> ..but not the complexity.
>
> The fact is, I want to trust this thing. Dammit, one reason I like GIT
> is that I can mentally visualize the whole damn tree, and each step is
> so _simple_. That's extra important when the object database itself is
> so inscrutable - unlike CVS or SCCS or formats like that, it's damn
> hard to visualize from looking at a directory listing.

ok. Meanwhile i found another counter-argument: the average committed
file size is 36K, which with gzip -9 would compress down to roughly 8K,
with the commit message being another block. That's 2+1 blocks used per
commit, while with deltas one could at most cut this down to 1+1+1
blocks - just as much space! So we would be almost even with the more
complex delta approach, just by increasing the default compression ratio
from 6 to 9. (but even with the default we are not that bad.)

case closed i guess. (The network bandwidth issue can/could indeed be
solved independently, without any impact to the fundamentals, as you
suggested.)

Ingo
^ permalink raw reply [flat|nested] 179+ messages in thread
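The block counting in the message above can be checked with a few lines of Python. This is only a sketch: the 4 KiB block size and the 400-byte commit message are assumptions (the thread names neither), and `blocks` is a hypothetical helper, not part of git.

```python
import math

BLOCK_SIZE = 4096  # assumed filesystem block size; not stated in the thread


def blocks(nbytes, block_size=BLOCK_SIZE):
    """Number of whole disk blocks a file of nbytes occupies (at least one)."""
    return max(1, math.ceil(nbytes / block_size))


# A 36K source file compressed to roughly 8K, as estimated in the message.
full_blob_blocks = blocks(8 * 1024)
# A small commit message (400 bytes is an illustrative size) still costs a block.
commit_blocks = blocks(400)
total_full = full_blob_blocks + commit_blocks

print(f"full blob: {full_blob_blocks} + {commit_blocks} = {total_full} blocks")
```

Under these assumptions the full-blob layout costs three blocks per single-file commit, the same total as the 1+1+1 quoted for the delta layout.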
* Re: [rfc] git: combo-blobs 2005-04-11 16:33 ` Ingo Molnar @ 2005-04-12 5:42 ` Barry K. Nathan 0 siblings, 0 replies; 179+ messages in thread From: Barry K. Nathan @ 2005-04-12 5:42 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, Paul Jackson, pasky, rddunlap, ross, linux-kernel, git On Mon, Apr 11, 2005 at 06:33:58PM +0200, Ingo Molnar wrote: > ok. Meanwhile i found another counter-argument: the average committed > file size is 36K, which with gzip -9 would compress down to roughly 8K, > with the commit message being another block. That's 2+1 blocks used per > commit, while with deltas one could at most cut this down to 1+1+1 > blocks - just as much space! So we would be almost even with the more > complex delta approach, just by increasing the default compression ratio > from 6 to 9. (but even with the default we are not that bad.) I think you forgot about reiserfs/reiser4 tails. (At least, I *think* reiser4 has tails. I know reiserfs 3.x does.) BTW, I happen to agree completely with Linus on this issue, but I still figured I'd mention this for the sake of completeness. -Barry K. Nathan <barryn@pobox.com> ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [rfc] git: combo-blobs 2005-04-11 16:01 ` Linus Torvalds 2005-04-11 16:33 ` Ingo Molnar @ 2005-04-11 18:13 ` Chris Wedgwood 2005-04-11 18:30 ` Linus Torvalds 2005-04-11 18:40 ` Petr Baudis 1 sibling, 2 replies; 179+ messages in thread From: Chris Wedgwood @ 2005-04-11 18:13 UTC (permalink / raw) To: Linus Torvalds Cc: Ingo Molnar, Paul Jackson, pasky, rddunlap, ross, linux-kernel, git On Mon, Apr 11, 2005 at 09:01:51AM -0700, Linus Torvalds wrote: > I disagree. Yes, the thing is designed to be replicated, so most of > the time the easiest thing to do is to just rsync with another copy. It's not clear how any of this is going to give me something like bk changes -R or bk changes -L functionality. I'm guessing I will have to sync locally and check between two trees in those cases? Or at least sync enough metadata as to make this possible... but not the entire tree right? ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [rfc] git: combo-blobs
2005-04-11 18:13 ` Chris Wedgwood
@ 2005-04-11 18:30 ` Linus Torvalds
2005-04-11 20:18 ` Linus Torvalds
2005-04-11 18:40 ` Petr Baudis
1 sibling, 1 reply; 179+ messages in thread
From: Linus Torvalds @ 2005-04-11 18:30 UTC (permalink / raw)
To: Chris Wedgwood
Cc: Ingo Molnar, Paul Jackson, pasky, rddunlap, ross, linux-kernel, git

On Mon, 11 Apr 2005, Chris Wedgwood wrote:
>
> On Mon, Apr 11, 2005 at 09:01:51AM -0700, Linus Torvalds wrote:
>
> > I disagree. Yes, the thing is designed to be replicated, so most of
> > the time the easiest thing to do is to just rsync with another copy.
>
> It's not clear how any of this is going to give me something like
>
> bk changes -R
>
> or
> bk changes -L
>
> functionality.

You'd download all the sha1 objects (they don't actually do anything to
_your_ state - they only show the possible other states), and then it's a
"simple thing" to generate a full tree of your local HEAD commit and
compare it to a full tree of the remote HEAD commit.

If you then want to merge, you already have all the data. If you don't,
you can then prune your object tree from the stuff you don't use (fsck
already effectively does all the connectivity work, it just never removes
unreferenced files).

Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [rfc] git: combo-blobs
2005-04-11 18:30 ` Linus Torvalds
@ 2005-04-11 20:18 ` Linus Torvalds
0 siblings, 0 replies; 179+ messages in thread
From: Linus Torvalds @ 2005-04-11 20:18 UTC (permalink / raw)
To: Chris Wedgwood
Cc: Ingo Molnar, Paul Jackson, pasky, rddunlap, ross, linux-kernel, git

On Mon, 11 Apr 2005, Linus Torvalds wrote:
> > bk changes -R
> >
> > bk changes -L
>
> You'd download all the sha1 objects (they don't actually do anything to
> _your_ state - they only show the possible other states), and then it's a
> "simple thing" to generate a full tree of your local HEAD commit and
> compare it to a full tree of the remote HEAD commit.

Ok, there's a "rev-tree" program there now to generate these things.

If you control both ends, or have some other means of a "smart"
communications protocol, you don't actually have to download the blobs
themselves. Just download the "rev-tree" from the other side, and you can
generate the differences by comparing your rev-tree against theirs. (And
since they are sorted, the compare is very cheap).

The downside? A revtree can get quite large. My "rev-tree" program allows
you to cache previous state so that you don't have to follow the whole
thing down, though, so it's possible to just send incrementals (since a
"commit" _uniquely_ generates the whole rev-tree, you really can do
reasonably smart things and create "superset revtrees" etc).

So the change difference between two commits is literally

	rev-tree [commit-id1] > commit1-revtree
	rev-tree [commit-id2] > commit2-revtree
	join -t : commit1-revtree commit2-revtree > common-revisions

(this is also how to find the most common parent - you'd look at just the
head revisions - the ones that aren't referred to by other revisions - in
"common-revisions", and figure out the best one. I think.)

Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
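The rev-tree comparison described above amounts to intersecting two ancestor sets and then keeping the revisions that no other common revision points at. The Python below is a rough sketch of that idea, not the rev-tree program itself: the `parents` mapping, the commit names, and both helper functions are made up for illustration.

```python
def ancestors(parents, commit):
    """Return every commit reachable from `commit`, the commit itself included."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def common_heads(parents, a, b):
    """Common revisions of a and b that no other common revision refers to."""
    common = ancestors(parents, a) & ancestors(parents, b)
    referenced = {p for c in common for p in parents.get(c, ())}
    return common - referenced


# Hypothetical history: two lines of development forking at "base".
parents = {
    "root": [],
    "base": ["root"],
    "mine": ["base"],
    "theirs1": ["base"],
    "theirs2": ["theirs1"],
}

common = ancestors(parents, "mine") & ancestors(parents, "theirs2")
heads = common_heads(parents, "mine", "theirs2")
print(sorted(common), sorted(heads))
```

With a single fork point the head set contains one commit. After criss-cross merges it may contain several, which is presumably where "figure out the best one" comes in.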
* Re: Re: [rfc] git: combo-blobs 2005-04-11 18:13 ` Chris Wedgwood 2005-04-11 18:30 ` Linus Torvalds @ 2005-04-11 18:40 ` Petr Baudis 1 sibling, 0 replies; 179+ messages in thread From: Petr Baudis @ 2005-04-11 18:40 UTC (permalink / raw) To: Chris Wedgwood Cc: Linus Torvalds, Ingo Molnar, Paul Jackson, rddunlap, ross, linux-kernel, git Dear diary, on Mon, Apr 11, 2005 at 08:13:19PM CEST, I got a letter where Chris Wedgwood <cw@f00f.org> told me that... > On Mon, Apr 11, 2005 at 09:01:51AM -0700, Linus Torvalds wrote: > > > I disagree. Yes, the thing is designed to be replicated, so most of > > the time the easiest thing to do is to just rsync with another copy. > > It's not clear how any of this is going to give me something like > > bk changes -R > > or > bk changes -L > > functionality. I'm guessing I will have to sync locally and check > between two trees in those cases? Or at least sync enough metadata as > to make this possible... but not the entire tree right? Checking "what will be transferred when I push" doesn't sound hard - the push itself is not too trivial, but solvable. Perhaps even by pure rsync, if you won't support updating tracked trees (does not sound overwhelmingly useful anyway). Checking "what will be transferred if I pull" is much worse. Perhaps you could make a parallel objects repository, fetch all the newer commit and tree metadata there, and then do diff-tree. I think you need something smarter than rsync for that, though. [git-pasky] As long as you are not pulling from a tracked branch, the worst what can happen is that the enemy will trick you to pulling some terabytes of data. Or overwrite existing objects with garbage, but --ignore-existing would solve that trivially. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [rfc] git: combo-blobs 2005-04-11 15:12 ` Ingo Molnar 2005-04-11 15:32 ` Linus Torvalds @ 2005-04-11 17:50 ` Paul Jackson 1 sibling, 0 replies; 179+ messages in thread From: Paul Jackson @ 2005-04-11 17:50 UTC (permalink / raw) To: Ingo Molnar; +Cc: torvalds, pasky, rddunlap, ross, linux-kernel, git Ingo wrote: > actually, git would just include by reference the previous blob. Ok - kind of like a patch blob. I can see now where under some conditions this saves space. I agree with conclusion this thread has already reached. Keep it simple. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [rfc] git: combo-blobs 2005-04-11 14:45 ` Paul Jackson 2005-04-11 15:12 ` Ingo Molnar @ 2005-04-11 15:28 ` Ingo Molnar 2005-04-11 15:31 ` Ingo Molnar 1 sibling, 1 reply; 179+ messages in thread From: Ingo Molnar @ 2005-04-11 15:28 UTC (permalink / raw) To: Paul Jackson; +Cc: torvalds, pasky, rddunlap, ross, linux-kernel, git here are some stats: of the last 34160 files modified in the Linux kernel tree in the past 1 year, the file sizes total to 1 GB, and the average file-size per file committed is 31220 bytes. The changes themselves amount to: 22404 files changed, 1996494 insertions(+), 1396644 deletions(-) (the # of files changed is lower because one file can be modified multiple times) the Linux kernel has an average line-length of 36 bytes, so even without analyzing the commits themselves, the actual size of changes is around 70 MB content added, 50 MB content removed. The patches (plus commit comments, and email headers) add up to 250 MB. So the combo-blob representation would have an uncompressed content somewhere between 130MB and 250MB: 200 MB would be a good guess i think. That's 20% of the 1+ GB the full-blob representation would give, and it would be nearly as compressible. Ingo ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: [rfc] git: combo-blobs 2005-04-11 15:28 ` Ingo Molnar @ 2005-04-11 15:31 ` Ingo Molnar 0 siblings, 0 replies; 179+ messages in thread From: Ingo Molnar @ 2005-04-11 15:31 UTC (permalink / raw) To: Paul Jackson; +Cc: torvalds, pasky, rddunlap, ross, linux-kernel, git * Ingo Molnar <mingo@elte.hu> wrote: > here are some stats: of the last 34160 files modified in the Linux > kernel tree in the past 1 year, the file sizes total to 1 GB, and the > average file-size per file committed is 31220 bytes. The changes > themselves amount to: > > 22404 files changed, 1996494 insertions(+), 1396644 deletions(-) > > (the # of files changed is lower because one file can be modified > multiple times) one more number: thus the average commit size is 3575 bytes, i.e. less than a block. Ingo ^ permalink raw reply [flat|nested] 179+ messages in thread
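The figures in the two messages above follow from the diffstat and the 36-byte average line length. As a quick check, here is the arithmetic in Python, using only numbers quoted in the thread (the variable names are mine):

```python
AVG_LINE_BYTES = 36        # average kernel source line length, per the message
insertions = 1_996_494     # lines added over the year
deletions = 1_396_644      # lines removed over the year
files_committed = 34_160   # file modifications counted in the message

added_bytes = insertions * AVG_LINE_BYTES
removed_bytes = deletions * AVG_LINE_BYTES
avg_change_bytes = (added_bytes + removed_bytes) // files_committed

print(f"added:   {added_bytes / 1e6:.1f} MB")
print(f"removed: {removed_bytes / 1e6:.1f} MB")
print(f"average change per committed file: {avg_change_bytes} bytes")
```

This gives about 72 MB added and 50 MB removed, and an average of 3575 bytes of changed content per committed file, matching the "less than a block" remark.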
* Re: more git updates.. 2005-04-09 23:31 ` Linus Torvalds ` (2 preceding siblings ...) 2005-04-11 11:35 ` [rfc] git: combo-blobs Ingo Molnar @ 2005-04-12 4:05 ` David Eger 2005-04-12 8:16 ` Petr Baudis 3 siblings, 1 reply; 179+ messages in thread From: David Eger @ 2005-04-12 4:05 UTC (permalink / raw) To: Linus Torvalds Cc: Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List So with git, *every* changeset is an entire (compressed) copy of the kernel. Really? Every patch you accept adds 37 MB to your hard disk? Am I missing something here? -dte ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: more git updates..
2005-04-12 4:05 ` more git updates David Eger
@ 2005-04-12 8:16 ` Petr Baudis
2005-04-12 20:44 ` David Eger
0 siblings, 1 reply; 179+ messages in thread
From: Petr Baudis @ 2005-04-12 8:16 UTC (permalink / raw)
To: David Eger
Cc: Linus Torvalds, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List

Dear diary, on Tue, Apr 12, 2005 at 06:05:19AM CEST, I got a letter
where David Eger <eger@havoc.gtf.org> told me that...
> So with git, *every* changeset is an entire (compressed) copy of the
> kernel. Really? Every patch you accept adds 37 MB to your hard disk?
>
> Am I missing something here?

Yes. Only changed files re-appear. The unchanged files keep the same
SHA1 hash, therefore they don't re-appear in the repository.

So, if Linus gets a patch which sanitizes drivers/char/selection.c,
only these new objects appear in the repository:

	drivers/char/selection.c
	drivers/char
	drivers
	. (project root)
	commit message

Kind regards,

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
^ permalink raw reply [flat|nested] 179+ messages in thread
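The list of new objects above can be reproduced with a toy content-addressed store. This is a sketch only: the tree encoding here (`kind sha name` lines) is invented for illustration and is not git's on-disk format, and `store_tree` and `make_tree` are hypothetical helpers.

```python
import hashlib


def store_tree(tree, objects):
    """Store a nested dict (name -> bytes or dict) and return its SHA1 name."""
    entries = []
    for name in sorted(tree):
        node = tree[name]
        if isinstance(node, dict):
            entries.append(f"tree {store_tree(node, objects)} {name}")
        else:
            sha = hashlib.sha1(node).hexdigest()
            objects[sha] = node
            entries.append(f"blob {sha} {name}")
    data = "\n".join(entries).encode()
    sha = hashlib.sha1(data).hexdigest()
    objects[sha] = data
    return sha


def make_tree(selection_source):
    """Build a tiny stand-in for the kernel tree with one variable file."""
    return {
        "Makefile": b"all:\n",
        "drivers": {
            "char": {"selection.c": selection_source, "tty.c": b"tty\n"},
            "net": {"e100.c": b"net\n"},
        },
    }


objects = {}
root_before = store_tree(make_tree(b"old selection code\n"), objects)
known = set(objects)

root_after = store_tree(make_tree(b"sanitized selection code\n"), objects)
new_objects = set(objects) - known
print(len(known), "objects before,", len(new_objects), "new after the patch")
```

Changing one file adds four objects here: the new blob plus the three trees on the path to the root. The unchanged `drivers/net` tree and the other blobs hash to the same names and are stored once. The commit object would be the fifth new object; it is not modelled in this sketch.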
* Re: Re: more git updates.. 2005-04-12 8:16 ` Petr Baudis @ 2005-04-12 20:44 ` David Eger 2005-04-12 21:21 ` Linus Torvalds 0 siblings, 1 reply; 179+ messages in thread From: David Eger @ 2005-04-12 20:44 UTC (permalink / raw) To: Petr Baudis Cc: Linus Torvalds, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List The reason I am questioning this point is the GIT README file. Linus makes explicit that a "blob" is just the "file contents," and that really, a "blob" is not just the SHA1 of the "blob": > In particular, the "current directory cache" certainly does not need to > be consistent with the current directory contents, but it has two very > important attributes: > > (a) it can re-generate the full state it caches (not just the directory > structure: through the "blob" object it can regenerate the data too) And he defines "TREE" with the same name: blob > TREE: The next hierarchical object type is the "tree" object. A tree > object is a list of permission/name/blob data, sorted by name. Therefore, "TREE" must be the *full* data, and since we have the following definition for CHANGESET: > A "changeset" is defined by the tree-object that it results in, the > parent changesets (zero, one or more) that led up to that point, and a > comment on what happened. That each changeset remembers *everything* for *each point in the tree*. Linus, if you actually mean to differentiate between the full data and a SHA1 of the data, *please please please* say "blob" in one place and "SHA1 of the blob" elsewhere. It's quite confusing, to me at least. Also, the details of just what data constitutes a 'changeset' would be lovely... i.e. a precise spec of what Pat is describing below... -dte > where David Eger <eger@havoc.gtf.org> told me that... > > So with git, *every* changeset is an entire (compressed) copy of the > > kernel. Really? Every patch you accept adds 37 MB to your hard disk? > > > > Am I missing something here? > > Yes. Only changes files re-appear. 
The unchanged files keep the same > SHA1 hash, therefore they don't re-appear in the repository. > > So, if Linus gets a patch which sanitizes drivers/char/selection.c, > only these new objects appear in the repository: > > drivers/char/selection.c > drivers/char > drivers > . (project root) > commit message > ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: more git updates..
2005-04-12 20:44 ` David Eger
@ 2005-04-12 21:21 ` Linus Torvalds
2005-04-12 22:29 ` Krzysztof Halasa
` (2 more replies)
0 siblings, 3 replies; 179+ messages in thread
From: Linus Torvalds @ 2005-04-12 21:21 UTC (permalink / raw)
To: David Eger
Cc: Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List

On Tue, 12 Apr 2005, David Eger wrote:
>
> The reason I am questioning this point is the GIT README file.
>
> Linus makes explicit that a "blob" is just the "file contents," and that
> really, a "blob" is not just the SHA1 of the "blob":
>
> > In particular, the "current directory cache" certainly does not need to
> > be consistent with the current directory contents, but it has two very
> > important attributes:
> >
> > (a) it can re-generate the full state it caches (not just the directory
> > structure: through the "blob" object it can regenerate the data too)
>
> And he defines "TREE" with the same name: blob

Yes. A tree is defined by the blobs it references (and the subtrees) but
it doesn't _contain_ them. It just contains a pointer to them.

> Therefore, "TREE" must be the *full* data, and since we have the following
> definition for CHANGESET:

No. A tree is not the full data. A tree contains enough information to
_recreate_ the full data, but the tree itself just tells you _how_ to do
that. It doesn't contain very much of the data itself at all.

> That each changeset remembers *everything* for *each point in the tree*.

But only BY REFERENCE.

A "commit" is usually very small. For example, the top-of-tree
commit-file for my current kernel test is literally 401 _bytes_ in size.
Because it just references a tree (20 bytes of _reference_).

> Linus, if you actually mean to differentiate between the full data
> and a SHA1 of the data

There is no differentiation. The sha1 _is_ the data as far as git is
concerned. It's only confusing if you think they are different.

> Also, the details of just what data constitutes a 'changeset' would be
> lovely... i.e. a precise spec of what Pat is describing below...

	torvalds@ppc970:~/test-tools/linux-2.6.12-rc2> cat-file commit `cat .git/HEAD `
	tree cf9fd295d3048cd84c65d5e1a5a6b606bf4fddc6
	parent c7a1a189dd0fe2c6ecd0aa33f2bd2f414c7892a0
	author NeilBrown <neilb@cse.unsw.edu.au> Tue Apr 12 08:27:08 2005
	committer Linus Torvalds <torvalds@ppc970.osdl.org> Tue Apr 12 08:27:08 2005

	[PATCH] md: remove a number of misleading calls to MD_BUG

	The conditions that cause these calls to MD_BUG are not kernel bugs, just
	oddities in what userspace is asking for.

	Also convert analyze_sbs to return void, and the value it returned was
	always 0.

	Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
	Signed-off-by: Andrew Morton <akpm@osdl.org>
	Signed-off-by: Linus Torvalds <torvalds@osdl.org>

That's it. In all its glory. Compressed and tagged it's 401 bytes.

The tree it references is 677 bytes in size. That in turn references a
number of subtrees, but almost all of the sub-trees are shared with
_other_ tree commits, so their size is spread out over all the commits.

The full archive of the 2.6.12-rc2 kernel that I used for testing (only
_one_ version) is 102MB in size. That's about half of what the kernel is
uncompressed.

The full .git archive for 199 versions of the kernel (the 2.6.12-rc2 one
and a test-run of 198 patches from Andrew) is 111MB. In other words,
adding 198 "full" new kernels only grew the archive by 9MB (that's all
"actual disk usage" btw - the files themselves are smaller, but since they
all end up taking up a full disk block..)

Basically, the whole point of git is that objects are equated with their
sha1 name, and that you can thus "include" an object by just referring to
its name. The two are equivalent.

Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-12 21:21 ` Linus Torvalds
@ 2005-04-12 22:29 ` Krzysztof Halasa
2005-04-12 22:49 ` Linus Torvalds
2005-04-12 22:36 ` David Eger
2005-04-12 23:40 ` Andrea Arcangeli
2 siblings, 1 reply; 179+ messages in thread
From: Krzysztof Halasa @ 2005-04-12 22:29 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Eger, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List

Linus Torvalds <torvalds@osdl.org> writes:

> The full .git archive for 199 versions of the kernel (the 2.6.12-rc2 one
> and a test-run of 198 patches from Andrew) is 111MB. In other words,
> adding 198 "full" new kernels only grew the archive by 9MB (that's all
> "actual disk usage" btw - the files themselves are smaller, but since they
> all end up taking up a full disk block..)

Does that mean that the 64 K changes imported from bk would take ~ 3 GB?
Is that real? Have you tried to import it?

I'm going to import the CVS data (with cvsps) - as the CVS "misses" half
the changes, the resulting archive should be half in size too?

I don't know how much space did bk use, but 3 GB for the full history
is reasonable for most people, isn't it? Especially that one can purge
older data.
--
Krzysztof Halasa
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-12 22:29 ` Krzysztof Halasa
@ 2005-04-12 22:49 ` Linus Torvalds
2005-04-13 4:32 ` Matthias Urlichs
0 siblings, 1 reply; 179+ messages in thread
From: Linus Torvalds @ 2005-04-12 22:49 UTC (permalink / raw)
To: Krzysztof Halasa
Cc: David Eger, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List

On Wed, 13 Apr 2005, Krzysztof Halasa wrote:
>
> Does that mean that the 64 K changes imported from bk would take ~ 3 GB?
> Is that real?

That's a _guess_.

> Have you tried to import it?

It would take days.

> I'm going to import the CVS data (with cvsps) - as the CVS "misses" half
> the changes, the resulting archive should be half in size too?

No. The CVS archive is going to be almost the same size. BKCVS gets about
98% of all the data. It just doesn't show the complex merge graphs, but
those are "small" in comparison.

> I don't know how much space did bk use, but 3 GB for the full history
> is reasonable for most people, isn't it? Especially that one can purge
> older data.

I think it's entirely reasonable, yes. But I may be off by an order of
magnitude. I based the 3GB on estimating from the sparse tree, but I
wasn't being too careful. Andrew estimated 2GB per year (at our current
historical rate of changes) based on my merge with him. So it's in that
general range of 3-6GB, I think.

Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
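The "~3 GB" guess is a straight-line extrapolation of the 9MB that 198 test commits added. A small Python sketch of that extrapolation follows; the assumption that growth stays linear in the number of commits is mine and, as the reply above says, the result may be well off.

```python
GROWTH_MB = 9         # archive growth quoted for the test run
TEST_COMMITS = 198    # patches applied in that test run

mb_per_commit = GROWTH_MB / TEST_COMMITS


def projected_mb(commits):
    """Linear extrapolation of archive growth; ignores the 102MB base tree."""
    return commits * mb_per_commit


estimate_64k = projected_mb(64_000)   # the "64 K changes" asked about above
estimate_60k = projected_mb(60_000)   # a slightly smaller history, for comparison

print(f"{mb_per_commit * 1024:.1f} KB per commit")
print(f"64000 commits: about {estimate_64k / 1000:.1f} GB")
print(f"60000 commits: about {estimate_60k / 1000:.1f} GB")
```

That is roughly 46 KB of disk per commit, about 2.9 GB for 64000 commits and about 2.7 GB for 60000.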
* Re: more git updates..
2005-04-12 22:49 ` Linus Torvalds
@ 2005-04-13 4:32 ` Matthias Urlichs
0 siblings, 0 replies; 179+ messages in thread
From: Matthias Urlichs @ 2005-04-13 4:32 UTC (permalink / raw)
To: linux-kernel

Hi, Linus Torvalds wrote on Tue, 12 Apr 2005 15:49:07 -0700:

>> Have you tried to import it?
>
> It would take days.

You can always import it later and then graft it into the commit tree.

That would of course change *every* commit node, but so what? They're
small, and you can delete the old ones when you're done.

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: more git updates..
2005-04-12 21:21 ` Linus Torvalds
2005-04-12 22:29 ` Krzysztof Halasa
@ 2005-04-12 22:36 ` David Eger
2005-04-12 23:48 ` Panagiotis Issaris
2005-04-12 23:40 ` Andrea Arcangeli
2 siblings, 1 reply; 179+ messages in thread
From: David Eger @ 2005-04-12 22:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List

On Tue, Apr 12, 2005 at 02:21:58PM -0700, Linus Torvalds wrote:
>
> Yes. A tree is defined by the blobs it references (and the subtrees) but
> it doesn't _contain_ them. It just contains a pointer to them.

A pointer to them? You mean a SHA1 hash of them? or what?
Where is the *real* data stored? The real files, the real patches?
Are these somewhere completely outside of git?

> > Therefore, "TREE" must be the *full* data, and since we have the following
> > definition for CHANGESET:
>
> No. A tree is not the full data. A tree contains enough information to
> _recreate_ the full data, but the tree itself just tells you _how_ to do
> that. It doesn't contain very much of the data itself at all.

Perhaps I'd understand this if you tell me what "recreate" means.
If I have a SHA1 hash of a file, and I have the file, I can verify that said
file has the SHA1 hash it's supposed to have, but I can't generate the file
from its hash...

Sorry for being stubbornly dumb, but you'll have a couple of us puzzling at
the README ;-)

-dte
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: more git updates.. 2005-04-12 22:36 ` David Eger @ 2005-04-12 23:48 ` Panagiotis Issaris 0 siblings, 0 replies; 179+ messages in thread From: Panagiotis Issaris @ 2005-04-12 23:48 UTC (permalink / raw) To: David Eger Cc: Linus Torvalds, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List Hi David, On Tue, Apr 12, 2005 at 06:36:23PM -0400, David Eger wrote: > > No. A tree is not the full data. A tree contains enough information > > to > > _recreate_ the full data, but the tree itself just tells you _how_ > > to do > > that. It doesn't contain very much of the data itself at all. > > Perhaps I'd understand this if you tell me what "recreate" means. > If a have a SHA1 hash of a file, and I have the file, I can verify > that said > file has the SHA1 hash it's supposed to have, but I can't generate the > file > from it's hash... But, but if you have that hexified SHA1 hash of a particular file you want to access, there would be a file with a filename equal to that hexified SHA1 hash which contained the compressed contents of the file you're looking for. At least, that's how I understood it... With friendly regards, Takis -- OpenPGP key: http://lumumba.luc.ac.be/takis/takis_public_key.txt fingerprint: 6571 13A3 33D9 3726 F728 AA98 F643 B12E ECF3 E029 ^ permalink raw reply [flat|nested] 179+ messages in thread
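The lookup described above (a file whose name is the hex SHA1 and whose contents are the compressed data) can be sketched in a few lines of Python. This is not git's code: the flat directory layout, hashing the uncompressed bytes, and the `write_object` / `read_object` names are all assumptions made for the example.

```python
import hashlib
import os
import tempfile
import zlib


def write_object(objdir, data):
    """Store data compressed, in a file named after its hex SHA1; return the name."""
    sha = hashlib.sha1(data).hexdigest()
    path = os.path.join(objdir, sha)
    if not os.path.exists(path):        # identical content is stored only once
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
    return sha


def read_object(objdir, sha):
    """Look an object up by name and verify it still matches that name."""
    with open(os.path.join(objdir, sha), "rb") as f:
        data = zlib.decompress(f.read())
    if hashlib.sha1(data).hexdigest() != sha:
        raise ValueError(f"corrupt object {sha}")
    return data


with tempfile.TemporaryDirectory() as objdir:
    name = write_object(objdir, b"int main(void) { return 0; }\n")
    again = write_object(objdir, b"int main(void) { return 0; }\n")
    restored = read_object(objdir, name)
    stored_files = len(os.listdir(objdir))

print(name, stored_files)
```

So the hash is never "reversed". It is used as a filename, and the file under that name holds the real contents, which is what lets a tree "recreate" data it only refers to.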
* Re: Re: more git updates.. 2005-04-12 21:21 ` Linus Torvalds 2005-04-12 22:29 ` Krzysztof Halasa 2005-04-12 22:36 ` David Eger @ 2005-04-12 23:40 ` Andrea Arcangeli 2005-04-12 23:45 ` Linus Torvalds 2 siblings, 1 reply; 179+ messages in thread From: Andrea Arcangeli @ 2005-04-12 23:40 UTC (permalink / raw) To: Linus Torvalds Cc: David Eger, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List On Tue, Apr 12, 2005 at 02:21:58PM -0700, Linus Torvalds wrote: > The full .git archive for 199 versions of the kernel (the 2.6.12-rc2 one > and a test-run of 198 patches from Andrew) is 111MB. In other words, > adding 198 "full" new kernels only grew the archive by 9MB (that's all > "actual disk usage" btw - the files themselves are smaller, but since they > all end up taking up a full disk block..) reiserfs can do tail packing, plus the disk block is meaningless when fetching the data from the network which is the real cost to worry about when synchronizing and downloading (disk cost isn't a big deal). The pagecache cost sounds a very minor one too, since you don't need the whole data in ram, not even all dentries need to be in cache. This is one of the reasons why you don't need to run readdir, and why you can discard the old trees anytime. At the rate of 9M for every 198 changeset checkins, that means I'll have to download 2.7G _uncompressible_ (i.e. already compressed with a bad per-file ratio due the too-small files) for a whole pack including all changesets without accounting the original 111MB of the original tree, with rsync -z of git. That compares with 514M _compressible_ with CVS format on-disk, and with ~79M of the CVS-network download with rsync -z of the CVS repository (assuming default gzip compression level). What BKCVS provided with 79M of rsync -z, now is provided with 2.8G of rsync -z, with a network-bound slowdown of -97.2%. Similar slowdowns should be expected for synchronizations over time while fetching new blobs etc... 
Ok, BKCVS has less than 60000 checkins due the linearization and coalescing of pulls that couldn't be represented losslessy in CVS, so the network-bound slowdown is less than -97.2%, my math is approximative, but the order of magnitude should remain the same. Clearly one can write an ad-hoc network protocol instead of using rsync/wget, but the server will need quite a bit of cpu and ram to do a checkout/update/sync efficiently to unpack all data and create all changesets to gzip and transfer. Anyway git simplicity and immutable hashes robustness certainly makes it an ideal interim format (and it may even be a very pratical local live format on-disk, except for the backups), I'm only unsure if it's a wise idea to build an SCM on top of the current git format or if it's better to use something like SCCS or CVS to coalesce all diffs of a single file together and to save space and make rsync -z very efficient too (or an approach like arch and darcs that stores changesets per file, i.e. patches). ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: more git updates.. 2005-04-12 23:40 ` Andrea Arcangeli @ 2005-04-12 23:45 ` Linus Torvalds 2005-04-13 0:14 ` Andrea Arcangeli 2005-04-13 9:30 ` Russell King 0 siblings, 2 replies; 179+ messages in thread From: Linus Torvalds @ 2005-04-12 23:45 UTC (permalink / raw) To: Andrea Arcangeli Cc: David Eger, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List On Wed, 13 Apr 2005, Andrea Arcangeli wrote: > > At the rate of 9M for every 198 changeset checkins, that means I'll have > to download 2.7G _uncompressible_ (i.e. already compressed with a bad > per-file ratio due the too-small files) for a whole pack including all > changesets without accounting the original 111MB of the original tree, > with rsync -z of git. That compares with 514M _compressible_ with CVS > format on-disk, and with ~79M of the CVS-network download with rsync -z of > the CVS repository (assuming default gzip compression level). Yes. CVS is much denser. CVS is also total crap. So your point is? Linus ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: more git updates..
2005-04-12 23:45 ` Linus Torvalds
@ 2005-04-13 0:14 ` Andrea Arcangeli
2005-04-13 1:10 ` Linus Torvalds
2005-04-13 9:30 ` Russell King
1 sibling, 1 reply; 179+ messages in thread
From: Andrea Arcangeli @ 2005-04-13 0:14 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Eger, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List

On Tue, Apr 12, 2005 at 04:45:07PM -0700, Linus Torvalds wrote:
> Yes. CVS is much denser.
>
> CVS is also total crap. So your point is?

I wasn't suggesting to use CVS. I meant that for a newly developed SCM,
the CVS/SCCS format as storage may be more appealing than the current
git format. I guess I should have said RCS instead of CVS, sorry if that
created any confusion.

The arch/darcs approach of practically storing patches would also be much
denser but it has no efficient way of doing "rcs up -p 1.x" on a file,
that doesn't involve potentially unpacking tons of unrelated changesets.
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: more git updates.. 2005-04-13 0:14 ` Andrea Arcangeli @ 2005-04-13 1:10 ` Linus Torvalds 2005-04-13 10:59 ` Andrea Arcangeli 2005-04-13 20:44 ` Matt Mackall 0 siblings, 2 replies; 179+ messages in thread From: Linus Torvalds @ 2005-04-13 1:10 UTC (permalink / raw) To: Andrea Arcangeli Cc: David Eger, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List On Wed, 13 Apr 2005, Andrea Arcangeli wrote: > > I wasn't suggesting to use CVS. I meant that for a newly developed SCM, > the CVS/SCCS format as storage may be more appealing than the current > git format. Go wild. I did mine in six days, and you've been whining about other peoples SCM's for three years. In other words - go and _do_ something instead of whining. I'm not interested. Linus ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: more git updates.. 2005-04-13 1:10 ` Linus Torvalds @ 2005-04-13 10:59 ` Andrea Arcangeli 2005-04-13 20:44 ` Matt Mackall 1 sibling, 0 replies; 179+ messages in thread From: Andrea Arcangeli @ 2005-04-13 10:59 UTC (permalink / raw) To: Linus Torvalds Cc: David Eger, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List On Tue, Apr 12, 2005 at 06:10:27PM -0700, Linus Torvalds wrote: > Go wild. I did mine in six days, and you've been whining about other > peoples SCM's for three years. Even if I spend 6 days doing git, you'd never have thrown away BK in exchange for git. > In other words - go and _do_ something instead of whining. I'm not > interested. CVS and SVN are already an order of magnitude more efficient than git at storing and exporting the data and they shouldn't annoy you during the checkins either, they have a backend much more efficient than git too, and yet you seem not to care about them. My suggestion was simply to at least change git to coalesce the diffs like CVS/SCCS, I'm only making a suggestion to give git a chance to have a backend at least as efficient as the one that CVS uses and to avoid running rsync on a 2.8G uncompressible blob. I don't have enough spare time to do something myself, my spare time would be too short anyway to make a difference in SCM space, so I'd rather spend it all in more innovative space where it might have a slight chance to make a difference. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: more git updates.. 2005-04-13 1:10 ` Linus Torvalds 2005-04-13 10:59 ` Andrea Arcangeli @ 2005-04-13 20:44 ` Matt Mackall 2005-04-13 23:42 ` Krzysztof Halasa 1 sibling, 1 reply; 179+ messages in thread From: Matt Mackall @ 2005-04-13 20:44 UTC (permalink / raw) To: Linus Torvalds Cc: Andrea Arcangeli, David Eger, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List On Tue, Apr 12, 2005 at 06:10:27PM -0700, Linus Torvalds wrote: > > > On Wed, 13 Apr 2005, Andrea Arcangeli wrote: > > > > I wasn't suggesting to use CVS. I meant that for a newly developed SCM, > > the CVS/SCCS format as storage may be more appealing than the current > > git format. > > Go wild. I did mine in six days, and you've been whining about other > peoples SCM's for three years. I wrote a hack to do efficient delta storage with O(1) seeks for lookup and append last week, I believe it's been integrated into the latest Bazaar-NG. I expect it'll give better compression and performance than BK. Of course it ends up being O(revisions) for modifications or insertions (but that is probably a non-issue for the SCM models we're looking at). The git model is obviously very different, but I worry about the slop space implied. With 200k file revision and an average of 2k slop per file, that's 400MB of slop, or almost the size of an equivalent delta compressed kernel repo. Now if you can assume that blobs never change and are never deleted, you can simply append them all onto a log, and then index them with a separate file containing an htree of (sha1, offset, length) or the like. Since the key is already a strong hash, this is an excellent match and avoids rehashing in the kernel's directory lookup. And it'll save an inode, a directory entry, and about half a data block per entry. "Open" will also be cheaper as there's no per-revision inode to grab. I could hack on this if you think it fits with the git model, otherwise I'll go back to my other experiments.. 
-- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 179+ messages in thread
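A minimal sketch of the layout Matt describes above: every blob is appended to one log, and a separate index maps its SHA-1 to (offset, length). Several things here are assumptions made for illustration and are not in the original mail: the class name BlobLog, the in-memory BytesIO standing in for the log file, a plain dict standing in for the htree, and hashing the raw bytes without any object header or compression.

```python
import hashlib
import io


class BlobLog:
    """Toy append-only object store: one log plus a sha1 -> (offset, length) index."""

    def __init__(self):
        self.log = io.BytesIO()  # stands in for the single on-disk log file
        self.index = {}          # stands in for the htree of (sha1, offset, length)

    def put(self, data: bytes) -> str:
        """Append data to the log unless it is already stored; return its key."""
        key = hashlib.sha1(data).hexdigest()
        if key not in self.index:  # blobs never change, so each is stored once
            offset = self.log.seek(0, io.SEEK_END)
            self.log.write(data)
            self.index[key] = (offset, len(data))
        return key

    def get(self, key: str) -> bytes:
        """Look the key up in the index, then read that slice of the log."""
        offset, length = self.index[key]
        self.log.seek(offset)
        return self.log.read(length)
```

Because nothing is ever rewritten or deleted, a lookup is one index probe plus one read, and no per-object inode or directory entry is needed, which is the saving the mail argues for.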
* Re: more git updates.. 2005-04-13 20:44 ` Matt Mackall @ 2005-04-13 23:42 ` Krzysztof Halasa 2005-04-14 0:13 ` Matt Mackall 0 siblings, 1 reply; 179+ messages in thread From: Krzysztof Halasa @ 2005-04-13 23:42 UTC (permalink / raw) To: Matt Mackall Cc: Linus Torvalds, Andrea Arcangeli, David Eger, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List Matt Mackall <mpm@selenic.com> writes: > Now if you can assume that blobs never change and are never deleted, > you can simply append them all onto a log, and then index them with a > separate file containing an htree of (sha1, offset, length) or the > like. That means a problem with rsync, though. BTW: I think the bandwidth increase compared to bkcvs isn't that obvious. After a file is modified with git, it has to be transmitted (plus small additional things). If a file is modified with bkcvs, it has to be transmitted (the whole RCS file) as well. Only the initial rsync would be much smaller with bkcvs. -- Krzysztof Halasa ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates.. 2005-04-13 23:42 ` Krzysztof Halasa @ 2005-04-14 0:13 ` Matt Mackall 0 siblings, 0 replies; 179+ messages in thread From: Matt Mackall @ 2005-04-14 0:13 UTC (permalink / raw) To: Krzysztof Halasa Cc: Linus Torvalds, Andrea Arcangeli, David Eger, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List On Thu, Apr 14, 2005 at 01:42:11AM +0200, Krzysztof Halasa wrote: > Matt Mackall <mpm@selenic.com> writes: > > > Now if you can assume that blobs never change and are never deleted, > > you can simply append them all onto a log, and then index them with a > > separate file containing an htree of (sha1, offset, length) or the > > like. > > That mean a problem with rsync, though. I believe 200k inodes is a problem for rsync too. But we can simply grab the remote htree, do a tree compare, find the ranges of the remote file we need, sort and merge the ranges, and then pull them. That will surely trounce rsync. -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 179+ messages in thread
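The pull Matt outlines above (compare indexes, find the needed ranges of the remote file, sort and merge them, then fetch) can be sketched as follows. This is an illustration under stated assumptions, not his implementation: the function name ranges_to_fetch is hypothetical, both indexes are plain dicts of sha1 -> (offset, length), and a simple key lookup replaces the htree comparison.

```python
def ranges_to_fetch(local_index, remote_index):
    """Return merged (start, end) byte ranges of the remote log that we lack."""
    # Collect the byte range of every remote object missing locally, sorted by offset.
    missing = sorted(
        (offset, offset + length)
        for key, (offset, length) in remote_index.items()
        if key not in local_index
    )
    merged = []
    for start, end in missing:
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it instead of adding one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Adjacent missing objects collapse into one contiguous request, so the number of round trips depends on how fragmented the missing data is rather than on the number of objects.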
* Re: Re: more git updates.. 2005-04-12 23:45 ` Linus Torvalds 2005-04-13 0:14 ` Andrea Arcangeli @ 2005-04-13 9:30 ` Russell King 2005-04-13 10:20 ` Andrea Arcangeli 2005-04-13 14:43 ` Linus Torvalds 1 sibling, 2 replies; 179+ messages in thread From: Russell King @ 2005-04-13 9:30 UTC (permalink / raw) To: Linus Torvalds Cc: Andrea Arcangeli, David Eger, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List On Tue, Apr 12, 2005 at 04:45:07PM -0700, Linus Torvalds wrote: > On Wed, 13 Apr 2005, Andrea Arcangeli wrote: > > At the rate of 9M for every 198 changeset checkins, that means I'll have > > to download 2.7G _uncompressible_ (i.e. already compressed with a bad > > per-file ratio due the too-small files) for a whole pack including all > > changesets without accounting the original 111MB of the original tree, > > with rsync -z of git. That compares with 514M _compressible_ with CVS > > format on-disk, and with ~79M of the CVS-network download with rsync -z of > > the CVS repository (assuming default gzip compression level). > > Yes. CVS is much denser. > > CVS is also total crap. So your point is? And my entire 2.6.12-rc2 BK tree, unchecked out, is about 220MB, which is more dense than CVS. BK is also a lot better than CVS. So _your_ point is? 8) Note: I'm _not_ arguing with your sentiments towards CVS. However, I think the space usage point still stands. What is the space usage behaviour when you have multiple git trees? Do we need a git relink command in git-pasky? 8) -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: more git updates.. 2005-04-13 9:30 ` Russell King @ 2005-04-13 10:20 ` Andrea Arcangeli 2005-04-13 14:43 ` Linus Torvalds 1 sibling, 0 replies; 179+ messages in thread From: Andrea Arcangeli @ 2005-04-13 10:20 UTC (permalink / raw) To: Linus Torvalds, David Eger, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List On Wed, Apr 13, 2005 at 10:30:52AM +0100, Russell King wrote: > And my entire 2.6.12-rc2 BK tree, unchecked out, is about 220MB, which > is more dense than CVS. Yep, this is why I mentioned SCCS format too, I didn't know it was even smaller, but I expected a similar density from SCCS. > Note: I'm _not_ arguing with your sentiments towards CVS. However, I > think the space usage point still stands. If it wasn't for network synchronization it almost wouldn't matter, but fetching 2.8G uncompressible when I could simply fetch 220MB compressible (that will compress with zlib at little cost during rsync to less than 78M), sounds a bit overkill. > What is the space usage behaviour when you have multiple git trees? Multiple trees in the sense of pulls from multiple developers aren't more costly than a normal checkin, due the "soft hardlink" property of the hashes. It's just every checkin taking lots of space, and generating a new uncompressible blobs every time a changeset touches one file. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: more git updates.. 2005-04-13 9:30 ` Russell King 2005-04-13 10:20 ` Andrea Arcangeli @ 2005-04-13 14:43 ` Linus Torvalds 1 sibling, 0 replies; 179+ messages in thread From: Linus Torvalds @ 2005-04-13 14:43 UTC (permalink / raw) To: Russell King Cc: Andrea Arcangeli, David Eger, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List On Wed, 13 Apr 2005, Russell King wrote: > > And my entire 2.6.12-rc2 BK tree, unchecked out, is about 220MB, which > is more dense than CVS. > > BK is also a lot better than CVS. So _your_ point is? Hey, anybody who wants to argue that BK is better than GIT won't be getting any counter-arguments from me. The fact is, I have constraints. Like needing something to work within a few days. If somebody comes up with an ultra-fast, replicatable, space efficient SCM in three days, I'm all over it. In the meantime, I'd suggest people who worry about network bandwidth try to work out a synchronization protocol that allows you to send "diff updates" between git repositories. The git model doesn't preclude looking at the objects and sending diffs instead (and re-creating the objects on the other side). But my time-constraints _do_. Linus ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates.. 2005-04-09 21:00 ` Linus Torvalds 2005-04-09 21:00 ` tony.luck 2005-04-09 21:08 ` Linus Torvalds @ 2005-04-10 2:07 ` Paul Jackson 2005-04-10 2:20 ` Paul Jackson 2005-04-10 2:09 ` Paul Jackson 2005-04-10 7:51 ` Junio C Hamano 4 siblings, 1 reply; 179+ messages in thread From: Paul Jackson @ 2005-04-10 2:07 UTC (permalink / raw) To: Linus Torvalds; +Cc: pasky, rddunlap, ross, linux-kernel Linus wrote: > Damn, that's painful. I suspect I will have to change the format somehow. The sha1 (ascii) digests for 16817 files take: 689497 bytes before compression 397475 bytes after minigzip The pathnames, relative to top of tree, for these 16817 files take: 503983 bytes before compression 85786 bytes after minigzip compression I doubt any fancifying up of the pathname storage will gain much. However going from binary to ascii sha1 digest might help (compresses better, I suspect - I'll have to write a few lines of code to see). -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates.. 2005-04-10 2:07 ` Paul Jackson @ 2005-04-10 2:20 ` Paul Jackson 0 siblings, 0 replies; 179+ messages in thread From: Paul Jackson @ 2005-04-10 2:20 UTC (permalink / raw) To: Paul Jackson; +Cc: torvalds, pasky, rddunlap, ross, linux-kernel >From before: The sha1 (ascii) digests for 16817 files take: 689497 bytes before compression 397475 bytes after minigzip New numbers: The sha1 (binary) digests for 16817 files take: 336340 bytes before compression 334943 bytes after minigzip So compressing binary digests isn't worth a darn, and compressing ascii digests gets them down to within 18% of binary digests in size. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 179+ messages in thread
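The raw sizes Paul reports follow from simple arithmetic: 16817 x 41 = 689497 bytes (40 hex characters plus one separator per digest) and 16817 x 20 = 336340 bytes for binary digests. A rough way to repeat the experiment is sketched below. It uses synthetic SHA-1 digests and Python's zlib rather than the real kernel tree and minigzip, so only the raw sizes are expected to match his numbers exactly; the function name digest_sizes is an assumption.

```python
import hashlib
import zlib


def digest_sizes(n_files):
    """Return (ascii_raw, ascii_zlib, binary_raw, binary_zlib) sizes in bytes."""
    # Synthetic digests: hash the counter, since the real file contents are not available.
    digests = [hashlib.sha1(str(i).encode()).digest() for i in range(n_files)]
    ascii_form = b"".join(d.hex().encode() + b"\n" for d in digests)  # 41 bytes each
    binary_form = b"".join(digests)                                   # 20 bytes each
    return (
        len(ascii_form), len(zlib.compress(ascii_form)),
        len(binary_form), len(zlib.compress(binary_form)),
    )
```

A hex digest carries only 4 bits of information per byte, so compression can recover most of the 2x expansion, while the binary form is already close to random and barely shrinks. That is the effect behind the figures in the mail.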
* Re: more git updates.. 2005-04-09 21:00 ` Linus Torvalds ` (2 preceding siblings ...) 2005-04-10 2:07 ` Paul Jackson @ 2005-04-10 2:09 ` Paul Jackson 2005-04-10 7:51 ` Junio C Hamano 4 siblings, 0 replies; 179+ messages in thread From: Paul Jackson @ 2005-04-10 2:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: pasky, rddunlap, ross, linux-kernel > Then a "tree" object would point to a "directory" object, Ah - light bulb flickers - in _separate_ files. Yes, that obviously makes a difference. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401 ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates.. 2005-04-09 21:00 ` Linus Torvalds ` (3 preceding siblings ...) 2005-04-10 2:09 ` Paul Jackson @ 2005-04-10 7:51 ` Junio C Hamano 2005-04-10 5:53 ` Christopher Li ` (2 more replies) 4 siblings, 3 replies; 179+ messages in thread From: Junio C Hamano @ 2005-04-10 7:51 UTC (permalink / raw) To: Linus Torvalds; +Cc: Randy.Dunlap, Ross Vandegrift, Kernel Mailing List Listing the file paths and their sigs included in a tree to make a snapshot of a tree state sounds fine, and diffing two trees by looking at the sigs between two such files sounds fine as well. But I am wondering what your plans are to handle renames---or does git already represent them? ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates.. 2005-04-10 7:51 ` Junio C Hamano @ 2005-04-10 5:53 ` Christopher Li 2005-04-10 9:28 ` Junio C Hamano ` (2 more replies) 2005-04-10 11:21 ` Proposal for shell-patch-format [was: Re: more git updates..] Rutger Nijlunsing 2005-04-10 15:44 ` more git updates Linus Torvalds 2 siblings, 3 replies; 179+ messages in thread From: Christopher Li @ 2005-04-10 5:53 UTC (permalink / raw) To: Junio C Hamano Cc: Linus Torvalds, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote: > > But I am wondering what your plans are to handle renames---or > does git already represent them? > Rename should just work. It will create a new tree object and you will notice that in the entry that changed, the hash for the blob object is the same. Chris ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates.. 2005-04-10 5:53 ` Christopher Li @ 2005-04-10 9:28 ` Junio C Hamano 2005-04-10 7:06 ` Christopher Li 2005-04-10 9:48 ` Petr Baudis 2005-04-10 9:40 ` Wichert Akkerman 2005-04-10 9:41 ` Petr Baudis 2 siblings, 2 replies; 179+ messages in thread From: Junio C Hamano @ 2005-04-10 9:28 UTC (permalink / raw) To: Christopher Li Cc: Linus Torvalds, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List >>>>> "CL" == Christopher Li <lkml@chrisli.org> writes: CL> On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote: >> >> But I am wondering what your plans are to handle renames---or >> does git already represent them? >> CL> Rename should just work. It will create a new tree object and you CL> will notice that in the entry that changed, the hash for the blob CL> object is the same. Sorry, I was unclear. But doesn't that imply that a SCM built on top of git storage needs to read all the commit and tree records up to the common ancestor to show tree diffs between two forked tree? I suspect that another problem is that noticing the move of the same SHA1 hash from one pathname to another and recognizing that as a rename would not always work in the real world, because sometimes people move files *and* make small changes at the same time. If git is meant to be an intermediate format to suck existing kernel history out of BK so that the history can be converted for the next SCM chosen for the kernel work, I would imagine that there needs to be a way to represent such a case. Maybe convert a file rename as two git trees (one tree for pure move which immediately followed by another tree for edit) if it is not a pure move? ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates.. 2005-04-10 9:28 ` Junio C Hamano @ 2005-04-10 7:06 ` Christopher Li 2005-04-10 11:38 ` tony.luck 2005-04-10 9:48 ` Petr Baudis 1 sibling, 1 reply; 179+ messages in thread From: Christopher Li @ 2005-04-10 7:06 UTC (permalink / raw) To: Junio C Hamano Cc: Linus Torvalds, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List On Sun, Apr 10, 2005 at 02:28:54AM -0700, Junio C Hamano wrote: > >>>>> "CL" == Christopher Li <lkml@chrisli.org> writes: > > CL> On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote: > >> > >> But I am wondering what your plans are to handle renames---or > >> does git already represent them? > >> > > CL> Rename should just work. It will create a new tree object and you > CL> will notice that in the entry that changed, the hash for the blob > CL> object is the same. > > Sorry, I was unclear. But doesn't that imply that a SCM built > on top of git storage needs to read all the commit and tree > records up to the common ancestor to show tree diffs between two > forked tree? > > I suspect that another problem is that noticing the move of the > same SHA1 hash from one pathname to another and recognizing that > as a rename would not always work in the real world, because > sometimes people move files *and* make small changes at the same > time. If git is meant to be an intermediate format to suck > existing kernel history out of BK so that the history can be > converted for the next SCM chosen for the kernel work, I would > imagine that there needs to be a way to represent such a case. > Maybe convert a file rename as two git trees (one tree for pure > move which immediately followed by another tree for edit) if it > is not a pure move? > Git is not a SCM yet. For the rename + change set it should internally handle by pure rename only plus the extra delta. The current git don't have per file change history. From git's point of view some file deleted and the other file appeared with same content. 
It is the top level SCM to handle that correctly. Rename a directory will be even more fun. Chris ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates.. 2005-04-10 7:06 ` Christopher Li @ 2005-04-10 11:38 ` tony.luck 0 siblings, 0 replies; 179+ messages in thread From: tony.luck @ 2005-04-10 11:38 UTC (permalink / raw) To: Christopher Li Cc: Junio C Hamano, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List >handle by pure rename only plus the extra delta. The current git don't >have per file change history. From git's point of view some file deleted >and the other file appeared with same content. > >It is the top level SCM to handle that correctly. >Rename a directory will be even more fun. But from a git perspective it will be very efficient. Imagine that Linus decides to rename arch/i386 as arch/x86 ... at the git repository level this just requires a changeset, a new top level tree, and a new tree for the arch directory showing that i386 changed to x86. That's all ... every file below that didn't change, so the blobs for the files are all the same. -Tony ^ permalink raw reply [flat|nested] 179+ messages in thread
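Tony's point can be checked with a toy content-addressed store. The sketch below is not git's real object format: the function names store and write_tree, the "blob"/"tree" prefixes and the text listing are all assumptions chosen for illustration, and the changeset object is left out. What it shows is that a tree's hash depends only on its entries, so a renamed directory keeps its hash and only its ancestors change.

```python
import hashlib


def store(objects, payload: bytes) -> str:
    """Save payload under its SHA-1 and return the key."""
    key = hashlib.sha1(payload).hexdigest()
    objects[key] = payload
    return key


def write_tree(objects, node) -> str:
    """node is bytes (a file) or a dict of name -> node (a directory)."""
    if isinstance(node, bytes):
        return store(objects, b"blob\0" + node)
    # A directory is stored as a sorted listing of (child hash, child name).
    entries = sorted((name, write_tree(objects, child)) for name, child in node.items())
    listing = "".join(f"{sha} {name}\n" for name, sha in entries)
    return store(objects, b"tree\0" + listing.encode())
```

Writing a tree with arch/i386, then the same tree with that directory renamed to arch/x86, adds exactly two objects to the store: a new tree for arch and a new top-level tree.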
* Re: Re: more git updates.. 2005-04-10 9:28 ` Junio C Hamano 2005-04-10 7:06 ` Christopher Li @ 2005-04-10 9:48 ` Petr Baudis 1 sibling, 0 replies; 179+ messages in thread From: Petr Baudis @ 2005-04-10 9:48 UTC (permalink / raw) To: Junio C Hamano Cc: Christopher Li, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List Dear diary, on Sun, Apr 10, 2005 at 11:28:54AM CEST, I got a letter where Junio C Hamano <junkio@cox.net> told me that... > >>>>> "CL" == Christopher Li <lkml@chrisli.org> writes: > > CL> On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote: > >> > >> But I am wondering what your plans are to handle renames---or > >> does git already represent them? > >> > > CL> Rename should just work. It will create a new tree object and you > CL> will notice that in the entry that changed, the hash for the blob > CL> object is the same. > > Sorry, I was unclear. But doesn't that imply that a SCM built > on top of git storage needs to read all the commit and tree > records up to the common ancestor to show tree diffs between two > forked tree? No. See diff-tree output and http://pasky.or.cz/~pasky/dev/git/gitdiff-do for how it's done. Basically, you just take the two trees and compare them linearly (do a normal diff on them, essentially). Then the differences you spot this way are everything that needs to appear in the patch. > I suspect that another problem is that noticing the move of the > same SHA1 hash from one pathname to another and recognizing that > as a rename would not always work in the real world, because > sometimes people move files *and* make small changes at the same > time. If git is meant to be an intermediate format to suck > existing kernel history out of BK so that the history can be > converted for the next SCM chosen for the kernel work, I would > imagine that there needs to be a way to represent such a case. 
> Maybe convert a file rename as two git trees (one tree for pure > move which immediately followed by another tree for edit) if it > is not a pure move? Actually, this could be possible too I think. We will have to make diff-tree two-pass, but it is already so blinding fast that I guess that doesn't hurt too much. I might try to get my hands on that. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. ^ permalink raw reply [flat|nested] 179+ messages in thread
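The linear comparison Petr refers to can be sketched as a merge-style walk over two path-sorted entry lists. This is only an illustration and not the diff-tree source: the function name diff_tree is hypothetical, file modes are omitted, and the "-" marker for a removed path is assumed by symmetry with the "+", "<" and ">" markers visible in the diff-tree output earlier in the thread.

```python
def diff_tree(old, new):
    """Compare two path-sorted lists of (path, sha1) in a single linear pass."""
    out, i, j = [], 0, 0
    while i < len(old) or j < len(new):
        if j == len(new) or (i < len(old) and old[i][0] < new[j][0]):
            out.append(f"-{old[i][1]} {old[i][0]}")  # path exists only in the old tree
            i += 1
        elif i == len(old) or new[j][0] < old[i][0]:
            out.append(f"+{new[j][1]} {new[j][0]}")  # path exists only in the new tree
            j += 1
        else:
            if old[i][1] != new[j][1]:               # same path, different content
                out.append(f"<{old[i][1]} {old[i][0]}")
                out.append(f">{new[j][1]} {new[j][0]}")
            i += 1
            j += 1
    return out
```

No history is read at all: the cost is proportional to the number of entries in the two trees, which is why no walk back to a common ancestor is needed to show a tree diff.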
* Re: more git updates.. 2005-04-10 5:53 ` Christopher Li 2005-04-10 9:28 ` Junio C Hamano @ 2005-04-10 9:40 ` Wichert Akkerman 2005-04-10 9:41 ` Petr Baudis 2 siblings, 0 replies; 179+ messages in thread From: Wichert Akkerman @ 2005-04-10 9:40 UTC (permalink / raw) To: Christopher Li Cc: Junio C Hamano, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List Previously Christopher Li wrote: > Rename should just work. It will create a new tree object and you > will notice that in the entry that changed, the hash for the blob > object is the same. What if you rename and change a file within a changeset? Wichert. -- Wichert Akkerman <wichert@wiggy.net> It is simple to make things. http://www.wiggy.net/ It is hard to make things simple. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: more git updates.. 2005-04-10 5:53 ` Christopher Li 2005-04-10 9:28 ` Junio C Hamano 2005-04-10 9:40 ` Wichert Akkerman @ 2005-04-10 9:41 ` Petr Baudis 2005-04-10 7:09 ` Christopher Li 2 siblings, 1 reply; 179+ messages in thread From: Petr Baudis @ 2005-04-10 9:41 UTC (permalink / raw) To: Christopher Li Cc: Junio C Hamano, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List Dear diary, on Sun, Apr 10, 2005 at 07:53:40AM CEST, I got a letter where Christopher Li <lkml@chrisli.org> told me that... > On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote: > > > > But I am wondering what your plans are to handle renames---or > > does git already represent them? > > > > Rename should just work. It will create a new tree object and you > will notice that in the entry that changed, the hash for the blob > object is the same. Which is of course wrong when you want to do proper merging, examine per-file history, etc. One solution which springs to my mind is to have a UUID accompany each blob and tree; that will take relatively lot of space though, and I'm not sure it is really worth it. How many renames were there in the 64k commits so far anyway? -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. ^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: more git updates.. 2005-04-10 9:41 ` Petr Baudis @ 2005-04-10 7:09 ` Christopher Li 0 siblings, 0 replies; 179+ messages in thread From: Christopher Li @ 2005-04-10 7:09 UTC (permalink / raw) To: Petr Baudis Cc: Junio C Hamano, Linus Torvalds, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List On Sun, Apr 10, 2005 at 11:41:53AM +0200, Petr Baudis wrote: > Dear diary, on Sun, Apr 10, 2005 at 07:53:40AM CEST, I got a letter > where Christopher Li <lkml@chrisli.org> told me that... > > On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote: > > > > > > But I am wondering what your plans are to handle renames---or > > > does git already represent them? > > > > > > > Rename should just work. It will create a new tree object and you > > will notice that in the entry that changed, the hash for the blob > > object is the same. > > Which is of course wrong when you want to do proper merging, examine > per-file history, etc. One solution which springs to my mind is to have > a UUID accompany each blob and tree; that will take relatively lot of > space though, and I'm not sure it is really worth it. It should just use the rename + change two step then it is tractable with git now. Chris ^ permalink raw reply [flat|nested] 179+ messages in thread
* Proposal for shell-patch-format [was: Re: more git updates..] 2005-04-10 7:51 ` Junio C Hamano 2005-04-10 5:53 ` Christopher Li @ 2005-04-10 11:21 ` Rutger Nijlunsing 2005-04-10 15:44 ` more git updates Linus Torvalds 2 siblings, 0 replies; 179+ messages in thread From: Rutger Nijlunsing @ 2005-04-10 11:21 UTC (permalink / raw) To: Junio C Hamano, Linus Torvalds Cc: Randy.Dunlap, Ross Vandegrift, Kernel Mailing List On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote: > Listing the file paths and their sigs included in a tree to make > a snapshot of a tree state sounds fine, and diffing two trees by > looking at the sigs between two such files sounds fine as well. > > But I am wondering what your plans are to handle renames---or > does git already represent them? git doesn't represent transitions (or deltas), but only state. So it's not (much) more than a .tar file from version-management perspective; the only difference being that a git-tree has a comment field and a predecessor-reference, which are currently not used in determining the 'patch' between two trees. Deltas are derived by comparing different versions and determining the difference by reverse-engineering the differences which got us from version A to version B. Deltas are currently described as patch(1)es. Patches don't have the concept of 'renaming', so even after determining that file X has been renamed to Y, we have no container for this fact. A patch(1) only contains local-file-edits: substitute lines by other lines. Deltas are not needed to follow a tree; deltas are useful for merging branches of versions, and for reviewing purposes. This is comparable to using tar for version-management: it is very common to weekly tar your current version of your project as a poor-mans-version management for one-person one-project. So what is needed is a way to represent deltas which can contain more than only traditional patches. 
I would propose a simple format: the shell-script in a fixed-format. 
Shell-patch format in EBNF:

<shellpatch> ::= ( <comment>? <command>* )*

<comment> ::= <commentline>+
    The comment contains the text describing the function of the patch
    following it.

<commentline> ::= "# " <text>

<command> ::= "mv " <pathname> " " <pathname> "\n"
            | "cp " <filename> " " <filename> "\n"
            | "chmod " <mode> <pathname> "\n"
            | "patch <<__UNIQUE_STRING__\n" <patch> "__UNIQUE_STRING__\n"
    (where UNIQUE_STRING must not be contained in patch)

<filename> ::= <pathname> (but pointing to a file)

<pathname> ::= a pathname relative to '.'; escaping special characters
    the shell-way; may not contain '..'.

Example:

# Rename file b to a1, and change a line.
mv b a1
patch <<__END__
*** a1 Sun Apr 10 11:43:37 2005
--- a2 Sun Apr 10 11:43:41 2005
***************
*** 1,4 ****
  1
  2
! from
  3
--- 1,4 ----
  1
  2
! to
  3
__END__

Advantages:
- ASCII!
- a shell-patch is executable without extra tooling
- a shell-patch is readable and therefore reviewable
- a shell-patch is forward-compatible: a shell-patch acts like a patch (since patch(1) ignores garbage around patch :), but not backwards-compatible.
- extensible
- the heavy-lifting is done by 'patch'

Disadvantages:
- no deltas for binary files

Open issues:
- <comment> could be made more structured; maybe containing fields like Subject:, Author:, Signed-By:, certificates, ... (BitKeeper seems to be using "# " <field> ":" <value> "\n" lines)
- patch(1) doesn't know any directories. Should shell-patch know directories? This implies commands working on directories too (like directory renaming, mode changing, ...). Otherwise directories are implicit (a file in a directory implies the existence of that directory). Also implies mkdir and rmdir as shell-patch commands.
- extra commands might be useful to conserve more state(changes):
    ln -s   -- for symbolic links;
    ln      -- for hard links;
    chown   -- for permissions;
    chattr  -- for storing extended attributes
    touch   -- for setting timestamps (probably creation time only, since mtime is something git relies on)
  ...and for the really adventurous:
    sed 's,<fromstring>,<tostring>,' -- for substitutions (this is something darcs supports, but which I think is too bothersome to use since it is difficult to reverse engineer from two random trees)

Why a fixed format at all?
- This way, the executable shell-patch can be proven to be harmless to the machine: 'rm -rf /' is a valid shell-script, but not a valid shell-patch (since 'rm' is not a valid command, random flags like '-rf' are not supported, and '/' is an absolute pathname).
- A fixed format enables tooling to support such a patch format; for example creating the reverse-patch, merging patches (yeah, 'cat' also merges patches...).

...what has this to do with git? Not much and everything, depending on how you look at it. 'git' is 'tar', and 'shell-patch' is 'patch'; both orthogonal concepts but very usable in combination.

We'll look at getting from two git trees to a shell-patch. Diffing the trees would not only look at the file and per file at the hashes, but also the other way around: which hash values are used more than once. For files with the same hash value, compare the contents (and rest of attributes); this is needed since the mapping from file contents to sha1 is one-way. When the contents is the same, the shell-patch-command to generate is obviously a 'cp'.
For example, we have got two trees in git (pathname -> hash value):

    tree1/file1 -> 1234
    tree1/file2 -> 4567

and

    tree2/file1 -> 3456
    tree2/file3 -> 4567
    tree2/file4 -> 4567

..this could generate shell-patch:

    # Comments-go-here
    mv tree2/file2 tree2/file3
    cp tree2/file3 tree2/file4
    patch tree1/file1 <<__FILE_PATCH__
    (patch-goes-here)
    __FILE_PATCH__

...by an algorithm which starts by determining all renames, then all copies, and finally all patches.

Comments?

--
Rutger Nijlunsing ---------------------- linux-kernel at tux.tmfweb.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------

^ permalink raw reply [flat|nested] 179+ messages in thread
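The three passes Rutger names (renames first, then copies, then patches) can be sketched over two pathname -> hash maps. This is a hypothetical illustration, not part of his proposal: the function name shell_patch_commands is made up, paths are written relative to the tree root, equal hashes are trusted to mean equal contents (he suggests comparing contents as well), the patch bodies are omitted, and plain additions and deletions are ignored because the proposed grammar has no command for them.

```python
def shell_patch_commands(old, new):
    """old and new map pathname -> content hash; return shell-patch command lines."""
    commands = []
    by_hash = {}
    for path in sorted(new):
        by_hash.setdefault(new[path], []).append(path)
    placed = set()  # new paths already produced by an earlier mv or cp

    # Pass 1, renames: an old path that vanished while its hash appears at a new path.
    for path in sorted(old):
        if path in new:
            continue
        targets = [p for p in by_hash.get(old[path], [])
                   if p not in old and p not in placed]
        if targets:
            commands.append(f"mv {path} {targets[0]}")
            placed.add(targets[0])

    # Pass 2, copies: further new paths sharing a hash with an already available file.
    for content_hash, paths in sorted(by_hash.items()):
        sources = [p for p in paths if p in placed or old.get(p) == content_hash]
        for p in paths:
            if p not in old and p not in placed and sources:
                commands.append(f"cp {sources[0]} {p}")
                placed.add(p)

    # Pass 3, patches: the same path with a different hash in the two trees.
    for path in sorted(old):
        if path in new and old[path] != new[path]:
            commands.append(f"patch {path}")
    return commands
```

Run on the two example trees from the mail, this yields the same three commands in the same order: a mv from file2 to file3, a cp from file3 to file4, and a patch of file1.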
* Re: more git updates..
2005-04-10 7:51 ` Junio C Hamano
2005-04-10 5:53 ` Christopher Li
2005-04-10 11:21 ` Proposal for shell-patch-format [was: Re: more git updates..] Rutger Nijlunsing
@ 2005-04-10 15:44 ` Linus Torvalds
2005-04-10 17:00 ` Rutger Nijlunsing
2005-04-10 18:50 ` Paul Jackson
2 siblings, 2 replies; 179+ messages in thread
From: Linus Torvalds @ 2005-04-10 15:44 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Randy.Dunlap, Ross Vandegrift, Kernel Mailing List

On Sun, 10 Apr 2005, Junio C Hamano wrote:
>
> But I am wondering what your plans are to handle renames---or
> does git already represent them?

You can represent renames on top of git - git itself really doesn't care.
In many ways you can just see git as a filesystem - it's
content-addressable, and it has a notion of versioning, but I really really
designed it coming at the problem from the viewpoint of a _filesystem_
person (hey, kernels is what I do), and I actually have absolutely _zero_
interest in creating a traditional SCM system.

So to take renaming a file as an example - why do you actually want to
track renames? In traditional SCM's, you do it for two reasons:

 - space efficiency. Most SCM's are based on describing changes to a file,
   and compress the data by doing revisions on the same file. In order to
   continue that process past a rename, such an SCM _has_ to track
   renames, or lose the delta-based approach.

   The most trivial example of this is "diff", ie a rename ends up
   generating a _huge_ diff unless you track the rename explicitly.

   GIT doesn't care. There is _zero_ space efficiency in trying to track
   renames. In fact, it would add overhead to the system, not lessen it.
   That's because GIT fundamentally doesn't do the "delta-within-a-file"
   model.

 - annotate/blame. This is a valid concern, but the fact is, I never use
   it. It may be a deficiency of mine, but I simply don't do the per-line
   thing when I debug or try to find who was responsible.

   I do "blame" on a much bigger-picture level, and I personally believe
   (pretty strongly) that per-line annotations are not actually a good
   thing - they come not because people _want_ to do things at that low
   level, but because historically, you didn't _have_ the bigger-picture
   thing.

   In other words, pretty much every SCM out there is based on SCCS
   "mentally", even if not in any other model. That's why people think
   per-line blame is important - you have that mental model.

So consider me deficient, or consider me radical. It boils down to the
same thing. Renames don't matter.

That said, if somebody wants to create a _real_ SCM (rather than my notion
of a pure content tracker) on top of GIT, you probably could fairly easily
do so by imposing a few limitations on a higher level. For example, most
SCM's that track renames require that the user _tell_ them about the
renames: you do a "bk mv" or a "svn rename" or something.

If you want to do the same on top of GIT, then you should think of GIT as
what it is: GIT just tracks contents. It's a filesystem - although a
fairly strange one. How would you track renames on top of that? Easy: add
your own fields to the GIT revision messages: GIT enforces the header, but
you can add anything you want to the "free-form" part that follows it.

Same goes for any other information where you care about what happens
"within" a file. GIT simply doesn't track it. You can build things on top
of GIT if you want to, though. They may not be as efficient as they would
be if they were built _into_ GIT, but on the other hand GIT does a lot of
other things a hell of a lot faster thanks to its design.

So whether you agree with the things that _I_ consider important probably
depends on how you work. The real downside of GIT may be that _my_ way of
doing things is quite possibly very rare.

But it clearly is the only right way. The fact that everybody else does it
some other way only means that they are wrong.

Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
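One way a tool layered on top might use the "free-form" part of the revision message for renames is sketched below. The "rename: old => new" line format and both function names are invented here purely for illustration; git itself defines no such convention.

```python
# Hypothetical convention for a higher-level tool; git does not define it.
RENAME_PREFIX = "rename: "
ARROW = " => "


def add_rename_notes(changelog, renames):
    """Append one 'rename: old => new' line per (old, new) pair."""
    lines = [changelog.rstrip("\n"), ""]
    for old, new in renames:
        lines.append(f"{RENAME_PREFIX}{old}{ARROW}{new}")
    return "\n".join(lines) + "\n"


def parse_rename_notes(message):
    """Recover the (old, new) pairs recorded by add_rename_notes()."""
    renames = []
    for line in message.splitlines():
        if line.startswith(RENAME_PREFIX):
            old, sep, new = line[len(RENAME_PREFIX):].partition(ARROW)
            if sep:
                renames.append((old, new))
    return renames
```

A wrapper in the spirit of "bk mv" would call add_rename_notes() when building the changelog, and a blame or merge tool would call parse_rename_notes() while walking history. Paths that themselves contain " => " would need escaping, which this sketch ignores.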
* Re: more git updates..
2005-04-10 15:44 ` more git updates Linus Torvalds
@ 2005-04-10 17:00 ` Rutger Nijlunsing
2005-04-10 18:50 ` Paul Jackson
1 sibling, 0 replies; 179+ messages in thread
From: Rutger Nijlunsing @ 2005-04-10 17:00 UTC (permalink / raw)
To: Linus Torvalds
Cc: Junio C Hamano, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List

On Sun, Apr 10, 2005 at 08:44:56AM -0700, Linus Torvalds wrote:
>
>
> On Sun, 10 Apr 2005, Junio C Hamano wrote:
> >
> > But I am wondering what your plans are to handle renames---or
> > does git already represent them?
>
> You can represent renames on top of git - git itself really doesn't care.
> In many ways you can just see git as a filesystem - it's content-
> addressable, and it has a notion of versioning, but I really really
> designed it coming at the problem from the viewpoint of a _filesystem_
> person (hey, kernels is what I do), and I actually have absolutely _zero_
> interest in creating a traditional SCM system.
>
> So to take renaming a file as an example - why do you actually want to
> track renames? In traditional SCM's, you do it for two reasons:
>
> - space efficiency. Most SCM's are based on describing changes to a file,
[snip]
> - annotate/blame. This is a valid concern, but the fact is, I never use
[snip]

 - merging. When the parent tree renames a file, it's easier for an
   out-of-tree patch to get up-to-date.

 - reviewing. A huge patch with 2000 added lines and 1990 removed lines
   is more difficult to review than a rename + 10 lines patch.

> So consider me deficient, or consider me radical. It boils down to the
> same thing. Renames don't matter.

When you've got no out-of-tree patches since you've got the
parent-of-all-trees, then they don't matter, that's true :)

> So whether you agree with the things that _I_ consider important probably
> depends on how you work. The real downside of GIT may be that _my_ way of
> doing things is quite possibly very rare.

--
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-10 15:44 ` more git updates Linus Torvalds
2005-04-10 17:00 ` Rutger Nijlunsing
@ 2005-04-10 18:50 ` Paul Jackson
2005-04-10 20:57 ` Linus Torvalds
1 sibling, 1 reply; 179+ messages in thread
From: Paul Jackson @ 2005-04-10 18:50 UTC (permalink / raw)
To: Linus Torvalds; +Cc: junkio, rddunlap, ross, linux-kernel

Linus wrote:
> It's a filesystem - although a
> fairly strange one.

Ah ha - that explains the read-tree and write-tree names.

The read-tree pulls stuff out of this file system into
your working files, clobbering local edits. This is like
the read(2) system call, which clobbers stuff in your
read buffer.

The write-tree pushes stuff down into the file system,
just like write(2) pushes data into the kernel.

I was getting all kinds of frustrated yesterday trying to
use Linus's git commands, coming at these names with my
SCM hat on.

That way of thinking really doesn't work well here.

I will have to look more closely at pasky's GIT toolkit
if I want to see an SCM style interface.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-10 18:50 ` Paul Jackson
@ 2005-04-10 20:57 ` Linus Torvalds
2005-04-10 19:03 ` Christopher Li
2005-04-10 23:14 ` Paul Jackson
0 siblings, 2 replies; 179+ messages in thread
From: Linus Torvalds @ 2005-04-10 20:57 UTC (permalink / raw)
To: Paul Jackson; +Cc: junkio, rddunlap, ross, linux-kernel

On Sun, 10 Apr 2005, Paul Jackson wrote:
>
> Ah ha - that explains the read-tree and write-tree names.
>
> The read-tree pulls stuff out of this file system into
> your working files, clobbering local edits. This is like
> the read(2) system call, which clobbers stuff in your
> read buffer.

Yes. Except it's a two-stage thing, where the staging area is always the
"current directory cache".

So a "read-tree" always reads the tree information into the directory
cache, but does not actually _update_ any of the files it "caches". To do
that, you need to do a "checkout-cache" phase.

Similarly, "write-tree" writes the current directory cache contents into a
set of tree files. But in order to have that match what is actually in
your directory right now, you need to have done a "update-cache" phase
before you did the "write-tree".

So there is always a staging area between the "real contents" and the
"written tree".

> That way of thinking really doesn't work well here.
>
> I will have to look more closely at pasky's GIT toolkit
> if I want to see an SCM style interface.

Yes. You really should think of GIT as a filesystem, and of me as a
_systems_ person, not an SCM person. In fact, I tend to detest SCM's. I
think the reason I worked so well with BitKeeper is that Larry used to do
operating systems. He's also a systems person, not really an SCM person.
Or at least he's in between the two.

My operations are like the "system calls". Useless on their own: they're
not real applications, they're just how you read and write files in this
really strange filesystem. You need to wrap them up to make them do
anything sane.

For example, take "commit-tree" - it really just says that "this is the
new tree, and these other trees were its parents". It doesn't do any of
the actual work to _get_ those trees written.

So to actually do the high-level operation of a real commit, you need to
first update the current directory cache to match what you want to commit
(the "update-cache" phase).

Then, when your directory cache matches what you want to commit (which is
NOT necessarily the same thing as your actual current working area - if
you don't want to commit some of the changes you have in your tree, you
should avoid updating the cache with those changes), you do stage 2, ie
"write-tree". That writes a tree node that describes what you want to
commit.

Only THEN, as phase three, do you do the "commit-tree". Now you give it
the tree you want to commit (remember - that may not even match your
current directory contents), and the history of how you got here (ie you
tell commit what the previous commit(s) were), and the changelog.

So a "commit" in SCM-speak is actually three totally separate phases in my
filesystem thing, and each of the phases (except for the last
"commit-tree" which is the thing that brings it all together) is actually
in turn many smaller parts (ie "update-cache" may have been called
hundreds of times, and "write-tree" will write several tree objects that
point to each other).

Similarly, a "checkout" really is about first finding the tree ID you want
to check out, and then bringing it into the "directory cache" by doing a
"read-tree" on it. You can then actually update the directory cache
further: you might "read-tree" _another_ project, or you could decide that
you want to keep one of the files you already had.

So in that scenario, after doing the read-tree you'd do an "update-cache"
on the file you want to keep in your current directory structure, which
updates your directory cache to be a _mix_ of the original tree you now
want to check out _and_ of the file you want to use from your current
directory. Then doing a "checkout-cache -a" will actually do the actual
checkout, and only at that point does your working directory really get
changed.

Btw, you don't even have to have any working directory files at all. Let's
say that you have two independent trees, and you want to create a new
commit that is the join of those two trees (where one of the trees takes
precedence). You'd do a "read-tree <a> <b>", which will create a directory
cache (but not check out) that is the union of the <a> and <b> trees (<b>
will override). And then you can do a "write-tree" and commit the
resulting tree - without ever having _any_ of those files checked out.

Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
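The staging model described in this message can be mimicked with a small in-memory toy. Everything here is an assumption made for illustration: the class name `ToyRepo`, flat (non-recursive) trees, and object ids computed from a Python repr rather than from git's real on-disk format.

```python
import hashlib


class ToyRepo:
    """In-memory stand-in for the working directory, cache and object store."""

    def __init__(self):
        self.objects = {}   # object id -> (type, payload)
        self.index = {}     # the "directory cache": path -> blob id
        self.workdir = {}   # working files: path -> bytes

    def _store(self, kind, payload):
        # Toy naming scheme, NOT git's real object encoding.
        oid = hashlib.sha1(repr((kind, payload)).encode()).hexdigest()
        self.objects[oid] = (kind, payload)
        return oid

    def update_cache(self, path):
        """Phase 1: stage one working file into the cache."""
        self.index[path] = self._store("blob", self.workdir[path])

    def write_tree(self):
        """Phase 2: write the cache (not the working dir) as a tree."""
        return self._store("tree", tuple(sorted(self.index.items())))

    def commit_tree(self, tree, parents, changelog):
        """Phase 3: tie a tree to its history and a changelog."""
        return self._store("commit", (tree, tuple(parents), changelog))

    def read_tree(self, *trees):
        """Load trees into the cache only; later trees override earlier."""
        self.index = {}
        for tree in trees:
            kind, entries = self.objects[tree]
            if kind != "tree":
                raise ValueError("not a tree: " + tree)
            self.index.update(dict(entries))

    def checkout_cache(self):
        """Only now do working files change to match the cache."""
        for path, blob in self.index.items():
            self.workdir[path] = self.objects[blob][1]
```

With two working files of which only one has been passed to update_cache(), write_tree() records just that one file, which is the "tree need not match your working area" point. Calling read_tree(a, b) followed by write_tree() gives the union case without checkout_cache() ever being called.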
* Re: more git updates..
2005-04-10 20:57 ` Linus Torvalds
@ 2005-04-10 19:03 ` Christopher Li
2005-04-10 22:38 ` Linus Torvalds
2005-04-10 23:14 ` Paul Jackson
1 sibling, 1 reply; 179+ messages in thread
From: Christopher Li @ 2005-04-10 19:03 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Paul Jackson, junkio, rddunlap, ross, linux-kernel

On Sun, Apr 10, 2005 at 01:57:33PM -0700, Linus Torvalds wrote:
>
> > That way of thinking really doesn't work well here.
> >
> > I will have to look more closely at pasky's GIT toolkit
> > if I want to see an SCM style interface.
>
> Yes. You really should think of GIT as a filesystem, and of me as a
> _systems_ person, not an SCM person. In fact, I tend to detest SCM's. I
> think the reason I worked so well with BitKeeper is that Larry used to do
> operating systems. He's also a systems person, not really an SCM person.
> Or at least he's in between the two.
>

Yes, I was puzzled for a while about how to use git until I realized
that it is a versioned file system.

BTW, one thing I learned from ext3 is that it is very useful to have some
compatibility flags for future development. I think if we want to reserve
some room in the file format for further development of git, it is the
right time to do it before it gets big. e.g. an optional variable size
header in "tree" including format version and capability etc.

I can see the counter argument that it is not as important as in a real
file system, because it is a lot easier to bring it offline to upgrade
the whole tree. On the other hand, it costs almost nothing in terms of
space and CPU time: most directories do not reach a file system block
boundary, so a few extra bytes are almost free. If carefully planned, it
will make future upgrades of git a lot smoother.

What do you think?

Chris
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-10 19:03 ` Christopher Li
@ 2005-04-10 22:38 ` Linus Torvalds
2005-04-10 19:53 ` Christopher Li
2005-04-11 6:57 ` bert hubert
0 siblings, 2 replies; 179+ messages in thread
From: Linus Torvalds @ 2005-04-10 22:38 UTC (permalink / raw)
To: Christopher Li; +Cc: Paul Jackson, junkio, rddunlap, ross, linux-kernel

On Sun, 10 Apr 2005, Christopher Li wrote:
>
> BTW, one thing I learn from ext3 is that it is very useful to have some
> compatible flag for future development. I think if we want to reserve some
> room in the file format for further development of git

Way ahead of you.

This is (one reason) why all git objects have the type embedded inside of
them. The format of all objects is totally regular: they are all
compressed with zlib, they are all named by the sha1 file, and they all
start out with a magic header of "<typename> <typesize><nul byte>".

So if I want to create a new kind of tree object that does the same thing
as the old one but has some other layout, I'd just call it something else.
Like "dir". That was what I initially planned to do about the change to
recursive tree objects, but it turned out to actually be a lot easier to
just encode it in the old type (that way the routines that read it don't
even have to care about old/new types - it's all the same to them).

Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
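The regular object layout described above ("<typename> <typesize><nul byte>" followed by the content, zlib-compressed) can be sketched as below. One point is an assumption rather than something the message states: the name here is the SHA-1 of the uncompressed header plus body, which is how current git computes it. The function names are made up for the example.

```python
import hashlib
import zlib


def encode_object(kind, body):
    """Return (name, stored_bytes) for an object of the given type."""
    header = kind.encode("ascii") + b" " + str(len(body)).encode("ascii")
    raw = header + b"\0" + body
    # Assumption: hash the uncompressed bytes, as current git does.
    name = hashlib.sha1(raw).hexdigest()
    return name, zlib.compress(raw)


def decode_object(stored):
    """Inverse of encode_object(): return (kind, body)."""
    raw = zlib.decompress(stored)
    header, _, body = raw.partition(b"\0")
    kind, _, size = header.partition(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return kind.decode("ascii"), body
```

Because the type travels inside the object, a reader can dispatch on the decoded type string, and a new layout such as the "dir" mentioned above would just be another value of `kind` that old readers do not recognise.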
* Re: more git updates..
2005-04-10 22:38 ` Linus Torvalds
@ 2005-04-10 19:53 ` Christopher Li
2005-04-10 23:21 ` Linus Torvalds
2005-04-11 6:57 ` bert hubert
1 sibling, 1 reply; 179+ messages in thread
From: Christopher Li @ 2005-04-10 19:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Paul Jackson, junkio, rddunlap, ross, linux-kernel

On Sun, Apr 10, 2005 at 03:38:39PM -0700, Linus Torvalds wrote:
>
>
> On Sun, 10 Apr 2005, Christopher Li wrote:
> >
> > BTW, one thing I learn from ext3 is that it is very useful to have some
> > compatible flag for future development. I think if we want to reserve some
> > room in the file format for further development of git
>
> Way ahead of you.
>
> This is (one reason) why all git objects have the type embedded inside of
> them. The format of all objects is totally regular: they are all
> compressed with zlib, they are all named by the sha1 file, and they all
> start out with a magic header of "<typename> <typesize><nul byte>".
>
> So if I want to create a new kind of tree object that does the same thing
> as the old one but has some other layout, I'd just call it something else.
> Like "dir". That was what I initially planned to do about the change to
> recursive tree objects, but it turned out to actually be a lot easier to
> just encode it in the old type (that way the routines that read it don't
> even have to care about old/new types - it's all the same to them).

Ha, that is right. Putting the new layout into the same object type
tricked me into thinking I had to do it the same way. I totally forgot
that I can introduce new types of objects. It is even cleaner. Cool.

How about deleting trees from the caches? I don't need to delete stuff from
the official tree. It is more for my local version control.

Here is the use case:

 - I check out the git.git.
 - using quilt to build my series of patches, git-hack1, git-hack2..
   git-hack6. Let's say those are stored in the git cache as well.
 - I pick some of them and come up with a clean one, "submit.patch".
 - submit.patch gets merged into the official git tree.
 - Now I want to get rid of hack1 to hack6, but how?

One way to do it is to never commit hack1 to hack6 into git or the cache.
They stay as quilt patches only. But it is very tempting to let quilt use
git instead of the .pc/ directory; quilt could then be simplified into a
use case of patch and git.

Chris
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-10 19:53 ` Christopher Li
@ 2005-04-10 23:21 ` Linus Torvalds
2005-04-10 21:28 ` Christopher Li
0 siblings, 1 reply; 179+ messages in thread
From: Linus Torvalds @ 2005-04-10 23:21 UTC (permalink / raw)
To: Christopher Li; +Cc: Paul Jackson, junkio, rddunlap, ross, linux-kernel

On Sun, 10 Apr 2005, Christopher Li wrote:
>
> How about deleting trees from the caches? I don't need to delete stuff from
> the official tree. It is more for my local version control.

I have a plan. Namely to have a "list-needed" command, which you give one
commit, and a flag implying how much "history" you want (*), and then it
spits out all the sha1 files it needs for that history.

Then you delete all the other ones from your SHA1 archive (easy enough to
do efficiently by just sorting the two lists: the list of "needed" files
and the list of "available" files).

Script that, and call the command "prune-tree" or something like that, and
you're all done.

(*) The amount of history you want might be "none", which is to say that
you don't want to go back in time, so you want _just_ the list of tree and
blob objects associated with that commit.

Or you might want a "linear" history, which would be the longest path
through the parent changesets to the root.

Or you might want "all", which would follow all parents and all trees.

Or you might want to prune the history tree by date - "give me all
history, but cut it off when you hit a parent that was done more than 6
months ago".

This "list-needed" thing is not just for pruning history either. If you
have a local tree "x", and you want to figure out how much of it you need
to send to somebody else who has an older tree "y", then what you'd do is
basically "list-needed x" and remove the set of "list-needed y". That
gives you the answer to the question "what's the minimum set of sha1 files
I need to send to the other guy so that he can re-create my top-of-tree".

My second plan is to make somebody else so fired up about the problem that
I can just sit back and take patches. That's really what I'm best at.
Sitting here, in the (rain) on the patio, drinking a foofy tropical drink,
and pressing the "apply" button. Then I take all the credit for my
incredible work.

Hint, hint.

Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
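The "list-needed" idea boils down to a reachability walk plus set differences. The sketch below works on a toy object graph; the dictionary layout and the names `list_needed`, `prunable` and `to_send` are assumptions for the example, and only the "none" and "all" history modes from the footnote are covered (no "linear" mode, no date cutoff).

```python
def list_needed(objects, commit, history="all"):
    """Return the ids of every object needed for `commit`.

    `objects` maps id -> ("commit", tree_id, [parent ids])
                       | ("tree", [child ids])
                       | ("blob",)
    history="none" stays on the given commit; "all" follows every parent.
    """
    needed = set()
    stack = [commit]
    while stack:
        oid = stack.pop()
        if oid in needed:
            continue
        needed.add(oid)
        entry = objects[oid]
        if entry[0] == "commit":
            stack.append(entry[1])
            if history == "all":
                stack.extend(entry[2])
        elif entry[0] == "tree":
            stack.extend(entry[1])
    return needed


def prunable(objects, commit, history="all"):
    """Everything in the archive that `commit` does not need."""
    return set(objects) - list_needed(objects, commit, history)


def to_send(objects, mine, theirs):
    """Objects someone holding `theirs` lacks in order to recreate `mine`."""
    return list_needed(objects, mine) - list_needed(objects, theirs)
```

The "prune-tree" script would delete whatever prunable() returns, and the transfer question from the last paragraph of the plan is exactly to_send(x, y).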
* Re: more git updates..
2005-04-10 23:21 ` Linus Torvalds
@ 2005-04-10 21:28 ` Christopher Li
2005-04-12 5:14 ` David Lang
0 siblings, 1 reply; 179+ messages in thread
From: Christopher Li @ 2005-04-10 21:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Paul Jackson, junkio, rddunlap, ross, linux-kernel

I see. It just needs some basic set operations (+, -, and)
and some way to select a set:

              sha5--->
             /
            /
sha1-->sha2-->sha3--
           \      /
            \    /
             >sha4

list sha1          # all the file list in changeset sha1
                   # {sha1}
list sha1,sha1     # same as above
list sha1,sha2     # all the file list in between changeset sha1
                   # and changeset sha2
                   # {sha1, sha2} in example
list sha1,sha3     # {sha1, sha2, sha3, sha4}

list sha1,any      # all the change set reachable from sha1.
                   # {sha1, ... sha5, ...}

new sha1,sha2      # all the new file add between in sha1, sha2 (+)
changed sha1,sha2  # add the changed file between sha1, sha2 (>) (<)
deleted sha1,sha2  # add the deleted file between sha1, sha2 (-)

before time        # all the file before time
after time         # all the file after time

So in my example, the files I want to delete are:

{list hack1, base}+ {list hack2, base} ... {list hack6, base} \
    - [list official_merge, base ]

On Sun, Apr 10, 2005 at 04:21:08PM -0700, Linus Torvalds wrote:
>
> > the official tree. It is more for my local version control.
>
> I have a plan. Namely to have a "list-needed" command, which you give one
> commit, and a flag implying how much "history" you want (*), and then it
> spits out all the sha1 files it needs for that history.
>
> Then you delete all the other ones from your SHA1 archive (easy enough to
> do efficiently by just sorting the two lists: the list of "needed" files
> and the list of "available" files).
>
> Script that, and call the command "prune-tree" or something like that, and
> you're all done.
>
> (*) The amount of history you want might be "none", which is to say that
> you don't want to go back in time, so you want _just_ the list of tree and
> blob objects associated with that commit.

That will be {list head}

>
> Or you might want a "linear" history, which would be the longest path
> through the parent changesets to the root.

That will be {list head,root}

>
> Or you might want "all", which would follow all parents and all trees.

That will be {list any, root}

>
> Or you might want to prune the history tree by date - "give me all
> history, but cut it off when you hit a parent that was done more than 6
> months ago".

That is {after -6month }

>
> This "list-needed" thing is not just for pruning history either. If you
> have a local tree "x", and you want to figure out how much of it you need
> to send to somebody else who has an older tree "y", then what you'd do is
> basically "list-needed x" and remove the set of "list-needed y". That
> gives you the answer to the question "what's the minimum set of sha1 files
> I need to send to the other guy so that he can re-create my top-of-tree".
>

That is {list x, any} - {list y, any}

> My second plan is to make somebody else so fired up about the problem that
> I can just sit back and take patches. That's really what I'm best at.
> Sitting here, in the (rain) on the patio, drinking a foofy tropical drink,
> and pressing the "apply" button. Then I take all the credit for my
> incredible work.

Sounds like a good plan.

Chris
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-10 21:28 ` Christopher Li
@ 2005-04-12 5:14 ` David Lang
2005-04-12 6:00 ` Paul Jackson
2005-04-12 7:05 ` Barry K. Nathan
0 siblings, 2 replies; 179+ messages in thread
From: David Lang @ 2005-04-12 5:14 UTC (permalink / raw)
To: Christopher Li
Cc: Linus Torvalds, Paul Jackson, junkio, rddunlap, ross, linux-kernel

I've been reading this and have another thought for you guys to keep in
mind for this tool.

version control of system config files on linux systems.

it sounds like you could put the / filesystem under the control of git
(after teaching it to not cross filesystem boundaries so you can have
another filesystem to work with) and version control your entire system.
if this was done it would be nice to add an item type that would reference
a file in a distro package to save space. it sounds like you could run a
git checkin daily (as part of the updatedb run for example) at very little
cost.

for that matter by comparing the git data between servers (or between a
server and an archive) you could easily use it to detect tampering.

sounds very interesting, but I'm going to let things settle down a bit
before I try to tackle this (but you guys who are working on it should
feel free to add the couple options necessary to implement this if you
want ;-)

David Lang

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
 -- C.A.R. Hoare
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-12 5:14 ` David Lang
@ 2005-04-12 6:00 ` Paul Jackson
2005-04-12 7:05 ` Barry K. Nathan
1 sibling, 0 replies; 179+ messages in thread
From: Paul Jackson @ 2005-04-12 6:00 UTC (permalink / raw)
To: David Lang; +Cc: lkml, torvalds, junkio, rddunlap, ross, linux-kernel

David wrote:
> and version control your entire system

Yeah - that works. That's how I back up my system. Not git actually,
but a similar sort of store (no compression, a line oriented ascii
'index' file).

See my post on "Kernel SCM saga..", Sat, 9 Apr 2005 08:15:53 -0700,
Message-Id: <20050409081553.744bbb55.pj@engr.sgi.com>

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-12 5:14 ` David Lang
2005-04-12 6:00 ` Paul Jackson
@ 2005-04-12 7:05 ` Barry K. Nathan
1 sibling, 0 replies; 179+ messages in thread
From: Barry K. Nathan @ 2005-04-12 7:05 UTC (permalink / raw)
To: David Lang
Cc: Christopher Li, Linus Torvalds, Paul Jackson, junkio, rddunlap, ross, linux-kernel

On Mon, Apr 11, 2005 at 10:14:13PM -0700, David Lang wrote:
> I've been reading this and have another thought for you guys to keep in
> mind for this tool.
>
> version control of system config files on linux systems.

I've been thinking about this too. (I won't have time to implement this
however. If I do have time in the near future to do anything involving
git, it probably won't have anything to do with version control of config
files.)

> it sounds like you could put the / filesystem under the control of git
> (after teaching it to not cross filesystem boundaries so you can have
> another filesystem to work with) and version control your entire system.
> if this was done it would be nice to add an item type that would reference
> a file in a distro package to save space. it sounds like you could run a
> git checkin daily (as part of the updatedb run for example) at very little
> cost.

I was thinking that the GIT checkin should actually be done by the distro
configuration tools, and not as a cronjob. And maybe the config tools
could do two checkins if there were any manual changes since the last
checkin, or something. (That is, one checkin to check in the manual
changes since the last checkin, and another to check in whatever the
config tool just did.)

Now that I think about it, it would be really good to have a simple tool
for doing a manual checkin after manual editing of config files, but I
think something like the dual-checkin scheme would be needed as a safety
net in case root forgets to do the checkin.

-Barry K. Nathan <barryn@pobox.com>
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-10 22:38 ` Linus Torvalds
2005-04-10 19:53 ` Christopher Li
@ 2005-04-11 6:57 ` bert hubert
2005-04-11 7:20 ` Christer Weinigel
1 sibling, 1 reply; 179+ messages in thread
From: bert hubert @ 2005-04-11 6:57 UTC (permalink / raw)
To: Linus Torvalds
Cc: Christopher Li, Paul Jackson, junkio, rddunlap, ross, linux-kernel

On Sun, Apr 10, 2005 at 03:38:39PM -0700, Linus Torvalds wrote:

> compressed with zlib, they are all named by the sha1 file, and they all

Now I know this is a conscious decision, but recent zlib allows you to write
out gzip content, at a cost of 14 bytes I think per file, by adding 32 to
the window size. This in turn would allow users to zcat your objects at
ease.

You get confirmation of completeness of the file for free, as gzip encodes
the length of the file at the end.

Perhaps something to consider.

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services
^ permalink raw reply [flat|nested] 179+ messages in thread
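The windowBits trick mentioned here can be tried out from Python's zlib binding. As far as the zlib manual describes it, adding 16 to windowBits when compressing selects the gzip wrapper, while adding 32 when decompressing enables automatic zlib/gzip detection; the helper names below are made up for the example.

```python
import gzip
import zlib


def deflate(data, gzip_wrapper=False):
    """Compress with the zlib wrapper, or the gzip wrapper if requested."""
    # +16 on windowBits asks zlib to emit a gzip header and trailer.
    wbits = zlib.MAX_WBITS + (16 if gzip_wrapper else 0)
    packer = zlib.compressobj(wbits=wbits)
    return packer.compress(data) + packer.flush()


def inflate_any(stored):
    """Decompress either wrapper; +32 enables header auto-detection."""
    return zlib.decompress(stored, wbits=zlib.MAX_WBITS + 32)


if __name__ == "__main__":
    payload = b"blob 6\0hello\n"
    as_zlib = deflate(payload)
    as_gzip = deflate(payload, gzip_wrapper=True)
    # The gzip form is what zcat-style tools understand.
    print(gzip.decompress(as_gzip) == payload)
    print(inflate_any(as_zlib) == inflate_any(as_gzip) == payload)
    print(len(as_gzip) - len(as_zlib), "bytes of extra framing")
```

A reader built on inflate_any() would accept both old zlib-wrapped objects and new gzip-wrapped ones, so such a switch would not have to happen all at once.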
* Re: more git updates..
2005-04-11 6:57 ` bert hubert
@ 2005-04-11 7:20 ` Christer Weinigel
0 siblings, 0 replies; 179+ messages in thread
From: Christer Weinigel @ 2005-04-11 7:20 UTC (permalink / raw)
To: bert hubert
Cc: Linus Torvalds, Christopher Li, Paul Jackson, junkio, rddunlap, ross, linux-kernel

bert hubert <ahu@ds9a.nl> writes:

> On Sun, Apr 10, 2005 at 03:38:39PM -0700, Linus Torvalds wrote:
>
> > compressed with zlib, they are all named by the sha1 file, and they all
>
> Now I know this is a conscious decision, but recent zlib allows you to write
> out gzip content, at a cost of 14 bytes I think per file, by adding 32 to
> the window size. This in turn would allow users to zcat your objects at
> ease.
>
> You get confirmation of completeness of the file for free, as gzip encodes
> the length of the file at the end.

I would very much like it if git used normal gzip files with a .gz
extension. Doing it this way means that the compression methods can
be extended in the future. I.e:

ab/1234567890.gz gzip compressed
ab/1234567890.xd xdelta compressed

I find the xdelta encoding very attractive since it can probably
reduce the size of the repository drastically. A compression script
could run nightly and xdelta compress everything that's older than
a few months (to figure out what files to create the delta from, just
look at the commit files and compare the parent tree to the current
tree).

Of course, this means that a dumb wget won't work all that well to
synchronize two trees, but it might be worthwhile anyways.

/Christer

--
"Just how much can I get away with and still go to heaven?"

Freelance consultant specializing in device driver programming for Linux
Christer Weinigel <christer@weinigel.se> http://www.weinigel.se
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-10 20:57 ` Linus Torvalds
2005-04-10 19:03 ` Christopher Li
@ 2005-04-10 23:14 ` Paul Jackson
2005-04-10 23:38 ` Linus Torvalds
2005-04-11 0:10 ` Petr Baudis
1 sibling, 2 replies; 179+ messages in thread
From: Paul Jackson @ 2005-04-10 23:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: junkio, rddunlap, ross, linux-kernel
Useful explanation - thanks, Linus.

Is this picture and description accurate:

==============================================================

        < working directory files (foo.c) >
    ^
    ^                   |
    |  upward ops       |  downward ops   |
    |  ----------       |  ------------   |
    |  checkout-cache   |  update-cache   |
    |  show-diff        |                 v
                                          v
     < current directory cache (".dircache/index") >
    ^
    ^                   |
    |  upward ops       |  downward ops   |
    |  ----------       |  ------------   |
    |  read-tree        |  write-tree     |
    |                   |  commit-tree    |
    |                                     v
                                          v
< git filesystem (blobs, trees, commits: .dircache/{HEAD,objects}) >

==============================================================

The checkout-cache and show-diff ops read their meta-data from the
cache, and the actual file contents from the git filesystem. Similarly,
the update-cache op writes meta-data into the cache, and may create new
files in the git filesystem.

The cache (but not the git filesystem) stores transient information
(ctime, mtime, dev, ino, uid, gid, and size) about each working file
update-cache has copied into the git filesystem so that checkout-cache
and show-diff can detect changes in the contents of working files just
from a stat, without actually rereading the file.

In some sense, the cache holds the git filesystem inodes, and the git
filesystem holds the data blocks.

Except that:
 (1) the cache just holds the current "view" into the git filesystem,
 (2) objects in the filesystem have an "inode" number (their <sha1>
     value) that is persistent whether in view or not,
 (3) objects in the filesystem are not removed just because nothing
     in the cache references them,
 (4) objects in the filesystem can reference other objects, that are
     typically also in the filesystem, but that can still be reliably
     self-identified even if found in the wild of say one's email
     inbox, and
 (5) the view in the directory cache can itself be made into a
     filesystem object - using commit-tree.

==============================================================

Minor question:

    I must have an old version - I got 'git-0.03', but
    it doesn't have 'checkout-cache', and its 'read-tree'
    directly writes my working files.

    How do I get a current version? Well, one way I see,
    and that's to pick up Pasky's:

        http://pasky.or.cz/~pasky/dev/git/git-pasky-base.tar.bz2

    Perhaps that's the best way?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
^ permalink raw reply [flat|nested] 179+ messages in thread
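The "detect changes just from a stat" idea in the message above can be sketched roughly as follows. This is a minimal illustration in Python, not git's actual index format: `CacheEntry`, `snapshot` and `looks_unchanged` are hypothetical names, and the record simply holds the fields listed in the message (ctime, mtime, dev, ino, uid, gid, size).

```python
import os
import tempfile
from typing import NamedTuple


class CacheEntry(NamedTuple):
    """Hypothetical per-file record holding the stat fields named above."""
    ctime_ns: int
    mtime_ns: int
    dev: int
    ino: int
    uid: int
    gid: int
    size: int


def snapshot(path: str) -> CacheEntry:
    """Record the stat information for path without reading its contents."""
    st = os.stat(path)
    return CacheEntry(st.st_ctime_ns, st.st_mtime_ns, st.st_dev, st.st_ino,
                      st.st_uid, st.st_gid, st.st_size)


def looks_unchanged(path: str, cached: CacheEntry) -> bool:
    """True when a fresh stat matches the cached record."""
    return snapshot(path) == cached


# Small demonstration on a throwaway working file.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "foo.c")
    with open(path, "w") as f:
        f.write("int main(void) { return 0; }\n")
    cached = snapshot(path)

    clean_before = looks_unchanged(path, cached)   # nothing touched yet

    with open(path, "a") as f:
        f.write("/* edited */\n")                  # size (and mtime) now differ
    clean_after = looks_unchanged(path, cached)
```

The comparison never opens the file, which is the point of keeping this transient information in the cache rather than in the object store.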
* Re: more git updates..
2005-04-10 23:14 ` Paul Jackson
@ 2005-04-10 23:38 ` Linus Torvalds
2005-04-11 0:19 ` Paul Jackson
2005-04-11 15:49 ` Randy.Dunlap
2005-04-11 0:10 ` Petr Baudis
1 sibling, 2 replies; 179+ messages in thread
From: Linus Torvalds @ 2005-04-10 23:38 UTC (permalink / raw)
To: Paul Jackson; +Cc: junkio, rddunlap, ross, linux-kernel
On Sun, 10 Apr 2005, Paul Jackson wrote:
>
> Useful explanation - thanks, Linus.

Hey. You're welcome. Especially when you create good documentation for
this thing.

Because:

> Is this picture and description accurate:

[ deleted, but I'll probably try to put it in an explanation file
  somewhere ]

Yes. Excellent.

> Minor question:
>
>     I must have an old version - I got 'git-0.03', but
>     it doesn't have 'checkout-cache', and its 'read-tree'
>     directly writes my working files.

Yes. Crappy old tree, but it can still read my git.git directory, so you
can use it to update to my current source base.

However, from a usability angle, my source-base really has been
concentrating _entirely_ on just the plumbing, and if you actually want a
faucet or a toilet _connected_ to the plumbing, you're better off with
Pasky's tree, methinks:

>     How do I get a current version? Well, one way I see,
>     and that's to pick up Pasky's:
>
>         http://pasky.or.cz/~pasky/dev/git/git-pasky-base.tar.bz2
>
>     Perhaps that's the best way?

Indeed. He's got a number of shell scripts etc to automate the boring
parts.

Linus
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-10 23:38 ` Linus Torvalds
@ 2005-04-11 0:19 ` Paul Jackson
2005-04-11 15:49 ` Randy.Dunlap
1 sibling, 0 replies; 179+ messages in thread
From: Paul Jackson @ 2005-04-11 0:19 UTC (permalink / raw)
To: Linus Torvalds; +Cc: junkio, rddunlap, ross, linux-kernel
Linus writes:
> Hey. You're welcome. Especially when you create good documentation for
> this thing.

Glad to be of service.

Sounds like the umbrella in your foofy drink will come in handy -
keeping off the rain.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-10 23:38 ` Linus Torvalds
2005-04-11 0:19 ` Paul Jackson
@ 2005-04-11 15:49 ` Randy.Dunlap
2005-04-11 18:30 ` Petr Baudis
1 sibling, 1 reply; 179+ messages in thread
From: Randy.Dunlap @ 2005-04-11 15:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: pj, junkio, ross, linux-kernel
On Sun, 10 Apr 2005 16:38:00 -0700 (PDT) Linus Torvalds wrote:

|
|
| On Sun, 10 Apr 2005, Paul Jackson wrote:
| >
| > Useful explanation - thanks, Linus.
|
| Hey. You're welcome. Especially when you create good documentation for
| this thing.
|
| Because:
|
| > Is this picture and description accurate:
|
| [ deleted, but I'll probably try to put it in an explanation file
|   somewhere ]
|
| Yes. Excellent.
|
| > Minor question:
| >
| >     I must have an old version - I got 'git-0.03', but
| >     it doesn't have 'checkout-cache', and its 'read-tree'
| >     directly writes my working files.
|
| Yes. Crappy old tree, but it can still read my git.git directory, so you
| can use it to update to my current source base.

Please go into a little more detail about how to do this step...
that seems to be the most basic concept that I am missing.
i.e., how to find the "latest/current" tree (version/commit)
and check it out (read-tree, checkout-cache, etc.).

Even if I use Pasky's tools, I'd like to understand this step.

| However, from a usability angle, my source-base really has been
| concentrating _entirely_ on just the plumbing, and if you actually want a
| faucet or a toilet _connected_ to the plumbing, you're better off with
| Pasky's tree, methinks:
|
| >     How do I get a current version? Well, one way I see,
| >     and that's to pick up Pasky's:
| >
| >         http://pasky.or.cz/~pasky/dev/git/git-pasky-base.tar.bz2
| >
| >     Perhaps that's the best way?
|
| Indeed. He's got a number of shell scripts etc to automate the boring
| parts.

---
~Randy
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: Re: more git updates..
2005-04-11 15:49 ` Randy.Dunlap
@ 2005-04-11 18:30 ` Petr Baudis
0 siblings, 0 replies; 179+ messages in thread
From: Petr Baudis @ 2005-04-11 18:30 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: Linus Torvalds, pj, junkio, ross, linux-kernel
Dear diary, on Mon, Apr 11, 2005 at 05:49:31PM CEST, I got a letter
where "Randy.Dunlap" <rddunlap@osdl.org> told me that...
> On Sun, 10 Apr 2005 16:38:00 -0700 (PDT) Linus Torvalds wrote:
..snip..
> | Yes. Crappy old tree, but it can still read my git.git directory, so you
> | can use it to update to my current source base.
>
> Please go into a little more detail about how to do this step...
> that seems to be the most basic concept that I am missing.
> i.e., how to find the "latest/current" tree (version/commit)
> and check it out (read-tree, checkout-cache, etc.).

Well, its ID is by convention kept in .dircache/HEAD. But that is really
only a convention, no "core git" tool reads it directly, and you need to
update it manually after you do commit-tree.

First, you need to get the accompanying tree's id. git-pasky's shortcut
is $(tree-id), but manually you can do it by

    $(cat-file commit $(cat .dircache/HEAD)) | egrep '^tree'

Note that if you ever forgot to update HEAD or if you have multiple
branches in your repository, you can list all "head commits" (that is,
commits which have no other commits referencing them as parents) by
doing fsck-cache.

Now, you need to populate the directory cache by the tree (see Paul
Jackson's diagram):

    read-tree $tree_id

And now you want to update your working tree from the cache:

    checkout-cache -a -f

This will bring your tree in sync with the cache (it won't remove any
stale files, though). That means it will overwrite your local changes
too - turn that off by omitting the "-f". If you want to update only
some files, omit the "-a" and list them.

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
^ permalink raw reply [flat|nested] 179+ messages in thread
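The "find the tree line of the HEAD commit" step in the message above can be sketched as follows. This is a hedged illustration in Python that works on the text of a commit object; the sample commit text and its hashes are made up, and the only format detail relied on is the one the egrep in the message implies, namely a header line starting with "tree" followed by the tree's id. The helper name `tree_id_of` is hypothetical.

```python
def tree_id_of(commit_text: str) -> str:
    """Return the id found on the 'tree' header line of a commit's text."""
    for line in commit_text.splitlines():
        if not line:
            # Assumption: a blank line separates the headers from the log message.
            break
        if line.startswith("tree "):
            return line.split()[1]
    raise ValueError("no tree line found in commit headers")


# Made-up commit text, for illustration only.
sample_commit = (
    "tree 0123456789abcdef0123456789abcdef01234567\n"
    "parent 89abcdef0123456789abcdef0123456789abcdef\n"
    "author Example Author\n"
    "\n"
    "tree planting is mentioned here, but this is the log message\n"
)

tree_id = tree_id_of(sample_commit)
```

The resulting id is what would then be handed to read-tree, followed by checkout-cache to bring the working files in line with the cache.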
* Re: Re: more git updates..
2005-04-10 23:14 ` Paul Jackson
2005-04-10 23:38 ` Linus Torvalds
@ 2005-04-11 0:10 ` Petr Baudis
1 sibling, 0 replies; 179+ messages in thread
From: Petr Baudis @ 2005-04-11 0:10 UTC (permalink / raw)
To: Paul Jackson; +Cc: Linus Torvalds, junkio, rddunlap, ross, linux-kernel
Dear diary, on Mon, Apr 11, 2005 at 01:14:57AM CEST, I got a letter
where Paul Jackson <pj@engr.sgi.com> told me that...
> Useful explanation - thanks, Linus.
>
> Is this picture and description accurate:
>
> ==============================================================
>
>
>         < working directory files (foo.c) >
>     ^
>     ^                   |
>     |  upward ops       |  downward ops   |
>     |  ----------       |  ------------   |
>     |  checkout-cache   |  update-cache   |
>     |  show-diff        |                 v
>                                           v
>      < current directory cache (".dircache/index") >
>     ^
>     ^                   |
>     |  upward ops       |  downward ops   |
>     |  ----------       |  ------------   |
>     |  read-tree        |  write-tree     |
>     |                   |  commit-tree    |
>     |                                     v
>                                           v
> < git filesystem (blobs, trees, commits: .dircache/{HEAD,objects}) >

Well, except that from a purely technical standpoint commit-tree has
nothing to do in this picture - it creates a new object in the git
filesystem based on its input data, but regardless of the directory
cache or current tree. It probably still belongs where it is from the
workflow standpoint, though.

..snip..
> Minor question:
>
>     I must have an old version - I got 'git-0.03', but
>     it doesn't have 'checkout-cache', and its 'read-tree'
>     directly writes my working files.
>
>     How do I get a current version? Well, one way I see,
>     and that's to pick up Pasky's:
>
>         http://pasky.or.cz/~pasky/dev/git/git-pasky-base.tar.bz2
>
>     Perhaps that's the best way?

You can take mine, and do:

    git pull pasky
    git pull linus
    cp .dircache/HEAD .dircache/HEAD.local

Now, your tree and git filesystem is up to date.

    git track local

Now, when you do git pull pasky, your working tree will not be updated
automatically anymore.

    git track linus

Now, you start tracking Linus' tree instead.

Note that the initial update will blow away the scripts in your current
tree, so before you do the last two steps you will probably want to
clone the tree and set PATH to the one still tracking me, so you get all
the comfort. ;-)

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-09 19:45 more git updates Linus Torvalds
2005-04-09 19:56 ` Linus Torvalds
2005-04-09 20:07 ` Petr Baudis
@ 2005-04-09 22:00 ` Paul Jackson
2005-04-09 23:21 ` Ralph Corderoy
` (2 subsequent siblings)
5 siblings, 0 replies; 179+ messages in thread
From: Paul Jackson @ 2005-04-09 22:00 UTC (permalink / raw)
To: Linus Torvalds; +Cc: pasky, rddunlap, ross, mingo, davej, linux-kernel
Linus wrote:
> the NUL-termination makes this really easy to use even in shell

grumble ...

> I still use the old tools I learnt to use fifteen years ago

new comer ;)

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-09 19:45 more git updates Linus Torvalds
` (2 preceding siblings ...)
2005-04-09 22:00 ` Paul Jackson
@ 2005-04-09 23:21 ` Ralph Corderoy
2005-04-10 0:39 ` Paul Jackson
2005-04-10 17:31 ` Rik van Riel
2005-04-11 16:46 ` ross
5 siblings, 1 reply; 179+ messages in thread
From: Ralph Corderoy @ 2005-04-09 23:21 UTC (permalink / raw)
To: Linus Torvalds
Cc: Petr Baudis, Randy.Dunlap, Ross Vandegrift, Ingo Molnar, Dave Jones, Kernel Mailing List
Hi Linus,

> Btw, the NUL-termination makes this really easy to use even in shell
> scripts, ie you can do
>
>     diff-tree <sha1> <sha1> | xargs -0 do_something
>
> and you'll get each line as one nice argument to your "do_something"
> script. So a do_diff could be based on something like
>
>     #!/bin/sh

Watch out for when xargs invokes do_something more than once and the `<'
is parsed by a different one than the `>'. A `while read ...; do ...
done' would avoid that, but wouldn't like the NULs instead of LFs.

Cheers,

Ralph.
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-09 23:21 ` Ralph Corderoy
@ 2005-04-10 0:39 ` Paul Jackson
2005-04-10 1:14 ` Bernd Eckenfels
2005-04-10 10:22 ` Ralph Corderoy
0 siblings, 2 replies; 179+ messages in thread
From: Paul Jackson @ 2005-04-10 0:39 UTC (permalink / raw)
To: Ralph Corderoy
Cc: torvalds, pasky, rddunlap, ross, mingo, davej, linux-kernel
Ralph wrote:
> Watch out for when xargs invokes do_something more than once and the `<'
> is parsed by a different one than the `>'.

It will take a pretty long list to do that. It seems that
GNU xargs on top of a Linux kernel has a 128 KByte ARG_MAX.

In the old days, with 4 KByte ARG_MAX limits, this would have
bitten us pretty quickly.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-10 0:39 ` Paul Jackson
@ 2005-04-10 1:14 ` Bernd Eckenfels
2005-04-10 1:33 ` Paul Jackson
2005-04-10 10:22 ` Ralph Corderoy
1 sibling, 1 reply; 179+ messages in thread
From: Bernd Eckenfels @ 2005-04-10 1:14 UTC (permalink / raw)
To: linux-kernel
In article <20050409173944.247252eb.pj@engr.sgi.com> you wrote:
> Ralph wrote:
>> Watch out for when xargs invokes do_something more than once and the `<'
>> is parsed by a different one than the `>'.
> It will take a pretty long list to do that. It seems that
> GNU xargs on top of a Linux kernel has a 128 KByte ARG_MAX.
> In the old days, with 4 KByte ARG_MAX limits, this would have
> bitten us pretty quickly.

Nevertheless I think it is more parser friendly to have single records for
diffs.

Greetings
Bernd
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-10 1:14 ` Bernd Eckenfels
@ 2005-04-10 1:33 ` Paul Jackson
0 siblings, 0 replies; 179+ messages in thread
From: Paul Jackson @ 2005-04-10 1:33 UTC (permalink / raw)
To: Bernd Eckenfels; +Cc: linux-kernel
Bernd wrote:
> more parser friendly to have single records for diffs.

good point

[looks like you trimmed the cc list - folks around here don't like that ;)]

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-10 0:39 ` Paul Jackson
2005-04-10 1:14 ` Bernd Eckenfels
@ 2005-04-10 10:22 ` Ralph Corderoy
2005-04-10 17:30 ` Paul Jackson
1 sibling, 1 reply; 179+ messages in thread
From: Ralph Corderoy @ 2005-04-10 10:22 UTC (permalink / raw)
To: Paul Jackson; +Cc: torvalds, pasky, rddunlap, ross, mingo, davej, linux-kernel
Hi Paul,

> Ralph wrote:
> > Watch out for when xargs invokes do_something more than once and the
> > `<' is parsed by a different one than the `>'.
>
> It will take a pretty long list to do that. It seems that GNU xargs
> on top of a Linux kernel has a 128 KByte ARG_MAX.

I didn't realise it was that long, but one pair of files to diff takes
128 bytes of that.

    $ wc -c <<\E
    > <100664 aff074c63ac827801a7d02ff92781365957f1430 update-cache.c
    > >100664 3a672397164d5ff27a19a6888b578af96824ede7 update-cache.c
    > E
    128

So that's space for 1024 pairs. (Doesn't envp take some up too?) That
doesn't seem enough for diffs between revisions, but good enough for
most uses that people will get caught out when it fails.

    $ bzip2 -dc patch-2.6.10.bz2 | grep -c '^diff '
    5384

Cheers,

Ralph.
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-10 10:22 ` Ralph Corderoy
@ 2005-04-10 17:30 ` Paul Jackson
0 siblings, 0 replies; 179+ messages in thread
From: Paul Jackson @ 2005-04-10 17:30 UTC (permalink / raw)
To: Ralph Corderoy
Cc: torvalds, pasky, rddunlap, ross, mingo, davej, linux-kernel
Ralph wrote:
> but good enough for
> most uses that people will get caught out when it fails.

Exactly.

If Linus persists in this diff-tree output format, using two lines for
changed files, then I will have to add the following sed script to my
arsenal:

    sed '/^</ { N; s/\n>/ / }'

It collapses pairs of lines:

    <100664 4870bcf91f8666fc788b07578fb7473eda795587 Makefile
    >100664 5493a649bb33b9264e8ed26cc1f832989a307d3b Makefile

to the single line:

    <100664 4870bcf91f8666fc788b07578fb7473eda795587 Makefile 100664 5493a649bb33b9264e8ed26cc1f832989a307d3b Makefile

However, more people will get bit by this git glitch than know sed.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
^ permalink raw reply [flat|nested] 179+ messages in thread
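The pairing problem discussed in the last few messages (a `<' record and its `>' partner ending up in different xargs batches) goes away if the whole NUL-separated stream is read in one place. Below is a minimal sketch in Python under stated assumptions: the record layout is taken from the diff-tree sample earlier in the thread (marker, mode, sha1, path), the `-' marker for removed files is assumed rather than shown in the thread, and `Change` and `parse_diff_tree` are hypothetical names.

```python
from typing import Iterator, NamedTuple, Optional


class Change(NamedTuple):
    path: str
    old: Optional[str]  # "<mode> <sha1>" before the change, None if added
    new: Optional[str]  # "<mode> <sha1>" after the change, None if removed


def parse_diff_tree(raw: bytes) -> Iterator[Change]:
    """Turn NUL-separated diff-tree records into one Change per path."""
    pending = None  # the '<' half waiting for its '>' partner
    for record in (r.decode() for r in raw.split(b"\0") if r):
        kind = record[0]
        mode, sha1, path = record[1:].split(None, 2)
        entry = f"{mode} {sha1}"
        if kind == "<":
            pending = (path, entry)
        elif kind == ">":
            if pending is None or pending[0] != path:
                raise ValueError(f"'>' record without matching '<': {path}")
            yield Change(path, pending[1], entry)
            pending = None
        elif kind == "+":
            yield Change(path, None, entry)
        elif kind == "-":  # assumed marker for removed files
            yield Change(path, entry, None)
        else:
            raise ValueError(f"unknown record marker: {kind!r}")
    if pending is not None:
        raise ValueError(f"'<' record without matching '>': {pending[0]}")


# Records copied from the samples quoted in the thread.
raw = (
    b"<100664 4870bcf91f8666fc788b07578fb7473eda795587 Makefile\0"
    b">100664 5493a649bb33b9264e8ed26cc1f832989a307d3b Makefile\0"
    b"+100664 fd00e5603dcc4a93acceda0b8cb914fabc8645d5 checkout-cache.c\0"
)
changes = list(parse_diff_tree(raw))

# The arithmetic from the thread: a 128 KByte ARG_MAX at 128 bytes per pair.
pairs_per_batch = (128 * 1024) // 128
```

This produces the "single record per diff" shape asked for above, and the last line reproduces the 1024-pair estimate (ignoring whatever the environment takes up).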
* Re: more git updates..
2005-04-09 19:45 more git updates Linus Torvalds
` (3 preceding siblings ...)
2005-04-09 23:21 ` Ralph Corderoy
@ 2005-04-10 17:31 ` Rik van Riel
2005-04-10 17:35 ` Ingo Molnar
2005-04-11 16:46 ` ross
5 siblings, 1 reply; 179+ messages in thread
From: Rik van Riel @ 2005-04-10 17:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: Petr Baudis, Randy.Dunlap, Ross Vandegrift, Ingo Molnar, Dave Jones, Kernel Mailing List
On Sat, 9 Apr 2005, Linus Torvalds wrote:

> I've rsync'ed the new git repository to kernel.org, it should all be there
> in /pub/linux/kernel/people/torvalds/git.git/ (and it looks like the
> mirror scripts already picked it up on the public side too).

GCC 4 isn't very happy. Mostly sign changes, but also something
that looks like a real error:

    gcc -g -O3 -Wall -c -o fsck-cache.o fsck-cache.c
    fsck-cache.c: In function 'main':
    fsck-cache.c:59: warning: control may reach end of non-void function 'fsck_tree' being inlined
    fsck-cache.c:62: warning: control may reach end of non-void function 'fsck_commit' being inlined

I assume that fsck_tree and fsck_commit should complain loudly
if they ever get to that point - but since I'm not quite sure
there's no patch, sorry.

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-10 17:31 ` Rik van Riel
@ 2005-04-10 17:35 ` Ingo Molnar
0 siblings, 0 replies; 179+ messages in thread
From: Ingo Molnar @ 2005-04-10 17:35 UTC (permalink / raw)
To: Rik van Riel
Cc: Linus Torvalds, Petr Baudis, Randy.Dunlap, Ross Vandegrift, Dave Jones, Kernel Mailing List
* Rik van Riel <riel@redhat.com> wrote:

> GCC 4 isn't very happy. Mostly sign changes, but also something that
> looks like a real error:
>
>     gcc -g -O3 -Wall -c -o fsck-cache.o fsck-cache.c
>     fsck-cache.c: In function 'main':
>     fsck-cache.c:59: warning: control may reach end of non-void function 'fsck_tree' being inlined
>     fsck-cache.c:62: warning: control may reach end of non-void function 'fsck_commit' being inlined
>
> I assume that fsck_tree and fsck_commit should complain loudly if they
> ever get to that point - but since I'm not quite sure there's no
> patch, sorry.

i sent a patch for most of the sign errors, but the above is a case of
gcc not noticing that the function can never ever exit the loop, and
thus cannot get to the 'return' point.

Ingo
^ permalink raw reply [flat|nested] 179+ messages in thread
* Re: more git updates..
2005-04-09 19:45 more git updates Linus Torvalds
` (4 preceding siblings ...)
2005-04-10 17:31 ` Rik van Riel
@ 2005-04-11 16:46 ` ross
5 siblings, 0 replies; 179+ messages in thread
From: ross @ 2005-04-11 16:46 UTC (permalink / raw)
To: Linus Torvalds
Cc: Petr Baudis, Randy.Dunlap, Ingo Molnar, Dave Jones, Kernel Mailing List
On Sat, Apr 09, 2005 at 12:45:52PM -0700, Linus Torvalds wrote:
> Can you guys re-send the scripts you wrote? They probably need some
> updating for the new semantics. Sorry about that ;(

I've been off email this weekend, so have fallen a bit behind here.
I'll forgo updating my stuff, since it looks like there's superior
work.

Looks cool! I must say, the git as a filesystem thing is really neat.
This has been one of the more fun projects I've toyed around with.

--
Ross Vandegrift
ross@lug.udel.edu

"The good Christian should beware of mathematicians, and all those who
make empty prophecies. The danger already exists that the mathematicians
have made a covenant with the devil to darken the spirit and to confine
man in the bonds of Hell."
    --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37
^ permalink raw reply [flat|nested] 179+ messages in thread
* RE: more git updates..
@ 2005-04-10 22:07 Luck, Tony
2005-04-10 22:11 ` Petr Baudis
0 siblings, 1 reply; 179+ messages in thread
From: Luck, Tony @ 2005-04-10 22:07 UTC (permalink / raw)
To: Linus Torvalds
Cc: Petr Baudis, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List
>Also, I did actually debate that issue with myself, and decided that even
>if we do have tons of files per directory, git doesn't much care. The
>reason? Git never _searches_ for them. Assuming you have enough memory to
>cache the tree, you just end up doing a "lookup", and inside the kernel
>that's done using an efficient hash, which doesn't actually care _at_all_
>about how many files there are per directory.

So long as the hash *is* efficient when the directory is packed full of
38 character filenames made only of [0-9a-f] ... which might not match
the test cases under which the hash was picked :-) When there are some
full-sized kernel git images, someone should do a sanity check.

>Hey, I may end up being wrong, and yes, maybe I should have done a
>two-level one. The good news is that we can trivially fix it later (even
>dynamically - we can make the "sha1 object tree layout" be a per-tree
>config option, and there would be no real issue, so you could make small
>projects use a flat version and big projects use a very deep structure
>etc). You'd just have to script some renames to move the files around.

It depends on how many eco-system shell scripts get built that need to
know about the layout ... if some shell/perl "libraries" encode this
filename layout (and people use them) ... then switching later would
indeed be painless.

-Tony
^ permalink raw reply [flat|nested] 179+ messages in thread
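The "library that encodes the filename layout" point above can be illustrated with a small sketch. This is a hedged example in Python, not code from git or git-pasky: `object_path` is a hypothetical helper, the `.dircache/objects` root comes from the diagram earlier in the thread, and the `levels` parameter is an illustration of the flat / one-level / deeper layouts being discussed (one level gives a 2-character directory plus the 38-character filename mentioned above).

```python
HEX_DIGITS = set("0123456789abcdef")


def object_path(sha1_hex: str, levels: int = 1,
                root: str = ".dircache/objects") -> str:
    """Map a 40-character sha1 to a path, peeling off two hex digits per level."""
    if len(sha1_hex) != 40 or not set(sha1_hex) <= HEX_DIGITS:
        raise ValueError("expected 40 lowercase hex characters")
    if not 0 <= levels < 20:
        raise ValueError("levels must leave at least two characters for the filename")
    parts = [sha1_hex[2 * i:2 * i + 2] for i in range(levels)]
    parts.append(sha1_hex[2 * levels:])
    return "/".join([root, *parts])


# A sha1 taken from the diff-tree sample earlier in the thread.
sha1 = "4870bcf91f8666fc788b07578fb7473eda795587"

flat = object_path(sha1, levels=0)
one_level = object_path(sha1)            # the 2 + 38 split
two_level = object_path(sha1, levels=2)
```

If every script went through one such helper, changing the layout later would mean changing one function and renaming files, which is the painless case described above.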
* Re: RE: more git updates..
2005-04-10 22:07 Luck, Tony
@ 2005-04-10 22:11 ` Petr Baudis
0 siblings, 0 replies; 179+ messages in thread
From: Petr Baudis @ 2005-04-10 22:11 UTC (permalink / raw)
To: Luck, Tony
Cc: Linus Torvalds, Randy.Dunlap, Ross Vandegrift, Kernel Mailing List
Dear diary, on Mon, Apr 11, 2005 at 12:07:37AM CEST, I got a letter
where "Luck, Tony" <tony.luck@intel.com> told me that...
..snip..
> >Hey, I may end up being wrong, and yes, maybe I should have done a
> >two-level one. The good news is that we can trivially fix it later (even
> >dynamically - we can make the "sha1 object tree layout" be a per-tree
> >config option, and there would be no real issue, so you could make small
> >projects use a flat version and big projects use a very deep structure
> >etc). You'd just have to script some renames to move the files around.
>
> It depends on how many eco-system shell scripts get built that need to
> know about the layout ... if some shell/perl "libraries" encode this
> filename layout (and people use them) ... then switching later would
> indeed be painless.

FWIW, my short-term plans include support for monotone-like hash ID
shortening - it's enough to use the shortest leading unique part of the
ID to identify the revision. I will poke to the object repository for
that. I also already have Randy Dunlap's git lsobj, which will list all
objects of a specified type (very useful especially when looking for
orphaned commits and such rather lowlevel work).

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
^ permalink raw reply [flat|nested] 179+ messages in thread
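The "shortest leading unique part of the ID" idea from the message above can be sketched as follows. This is a minimal illustration in Python and makes no claim about how git-pasky implements it: `shortest_unique_prefixes` and `resolve` are hypothetical names, and the list of known ids stands in for whatever a scan of the object repository would return.

```python
import os
from typing import Dict, Iterable, List


def shortest_unique_prefixes(ids: Iterable[str]) -> Dict[str, str]:
    """Map each id to its shortest leading part that no other id shares."""
    ordered: List[str] = sorted(set(ids))
    result = {}
    for i, current in enumerate(ordered):
        needed = 1
        # In sorted order, the longest shared prefix is always with a neighbour.
        for j in (i - 1, i + 1):
            if 0 <= j < len(ordered):
                shared = len(os.path.commonprefix([current, ordered[j]]))
                needed = max(needed, shared + 1)
        result[current] = current[:needed]
    return result


def resolve(prefix: str, ids: Iterable[str]) -> str:
    """Expand an abbreviated id, refusing ambiguous or unknown prefixes."""
    matches = [full for full in set(ids) if full.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"{prefix!r} matches {len(matches)} objects")
    return matches[0]


# Ids taken from the diff-tree sample earlier in the thread.
known = [
    "4870bcf91f8666fc788b07578fb7473eda795587",
    "5493a649bb33b9264e8ed26cc1f832989a307d3b",
    "56ef561e590fd99e938bd47fd1f2c7ed46126ff0",
]
short = shortest_unique_prefixes(known)
expanded = resolve("54", known)
```

Note that a prefix that is unique today can become ambiguous as more objects are added, so stored references would still want the full id.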
end of thread, other threads:[~2005-04-25 20:35 UTC | newest]
Thread overview: 179+ messages
2005-04-09 19:45 more git updates Linus Torvalds
2005-04-09 19:56 ` Linus Torvalds
2005-04-09 20:07 ` Petr Baudis
2005-04-09 21:00 ` Linus Torvalds
2005-04-09 21:00 ` tony.luck
2005-04-10 16:01 ` Linus Torvalds
2005-04-12 17:34 ` Helge Hafting
2005-04-10 18:19 ` Paul Jackson
2005-04-10 23:04 ` Bernd Eckenfels
2005-04-11 9:27 ` Anton Altaparmakov
2005-04-09 21:08 ` Linus Torvalds
2005-04-09 23:31 ` Linus Torvalds
2005-04-10 2:41 ` Petr Baudis
2005-04-10 16:27 ` [ANNOUNCE] git-pasky-0.1 Petr Baudis
2005-04-10 16:55 ` Linus Torvalds
2005-04-10 19:49 ` Sean
2005-04-10 17:33 ` Ingo Molnar
2005-04-10 17:42 ` Willy Tarreau
2005-04-10 17:45 ` Ingo Molnar
2005-04-10 18:45 ` Petr Baudis
2005-04-10 19:13 ` Willy Tarreau
2005-04-10 21:27 ` Petr Baudis
2005-04-10 20:38 ` Linus Torvalds
2005-04-10 21:39 ` Linus Torvalds
2005-04-10 23:49 ` Petr Baudis
2005-04-10 22:27 ` Petr Baudis
2005-04-10 23:10 ` Linus Torvalds
2005-04-10 23:26 ` Petr Baudis
2005-04-10 23:46 ` Linus Torvalds
2005-04-10 23:56 ` Petr Baudis
2005-04-11 0:20 ` GIT license (Re: Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.1) Linus Torvalds
2005-04-11 0:27 ` Petr Baudis
2005-04-11 7:45 ` Ingo Molnar
2005-04-11 8:40 ` Florian Weimer
2005-04-11 10:52 ` Petr Baudis
2005-04-11 16:05 ` Florian Weimer
2005-04-10 23:23 ` [ANNOUNCE] git-pasky-0.1 Paul Jackson
2005-04-11 0:15 ` Randy.Dunlap
2005-04-11 0:30 ` Re: " Petr Baudis
2005-04-11 1:11 ` Linus Torvalds
2005-04-10 20:41 ` Paul Jackson
2005-04-11 1:58 ` [ANNOUNCE] git-pasky-0.2 Petr Baudis
2005-04-11 2:46 ` Daniel Barkalow
2005-04-11 10:17 ` Petr Baudis
2005-04-11 8:50 ` Ingo Molnar
2005-04-11 10:16 ` Petr Baudis
2005-04-11 13:57 ` [ANNOUNCE] git-pasky-0.3 Petr Baudis
2005-04-12 12:47 ` Martin Schlemmer
2005-04-12 13:02 ` Petr Baudis
2005-04-12 13:13 ` Martin Schlemmer
2005-04-12 13:23 ` Petr Baudis
2005-04-12 13:07 ` David Woodhouse
2005-04-13 8:47 ` Russell King
2005-04-13 8:59 ` Petr Baudis
2005-04-13 9:06 ` H. Peter Anvin
2005-04-13 9:09 ` David Woodhouse
2005-04-13 9:25 ` David Woodhouse
2005-04-13 9:42 ` Petr Baudis
2005-04-13 10:24 ` David Woodhouse
2005-04-13 17:01 ` Daniel Barkalow
2005-04-13 18:07 ` Petr Baudis
2005-04-13 18:22 ` git mailing list (Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.3) Linus Torvalds
2005-04-13 18:38 ` Re: Re: Re: [ANNOUNCE] git-pasky-0.3 Daniel Barkalow
2005-04-13 12:43 ` Xavier Bestel
2005-04-13 16:48 ` H. Peter Anvin
2005-04-13 18:15 ` Xavier Bestel
2005-04-13 23:05 ` bd
2005-04-13 14:38 ` Linus Torvalds
2005-04-13 14:47 ` David Woodhouse
2005-04-13 14:59 ` Linus Torvalds
2005-04-13 9:35 ` Russell King
2005-04-13 9:38 ` Russell King
2005-04-13 9:49 ` Petr Baudis
2005-04-13 11:02 ` Ingo Molnar
2005-04-13 14:50 ` Linus Torvalds
2005-04-13 9:46 ` Petr Baudis
2005-04-13 10:28 ` Russell King
2005-04-13 19:03 ` Russell King
2005-04-13 19:13 ` Petr Baudis
2005-04-13 19:21 ` Russell King
2005-04-13 19:23 ` H. Peter Anvin
2005-04-10 6:53 ` more git updates Christopher Li
2005-04-10 11:48 ` Ralph Corderoy
2005-04-10 19:23 ` Paul Jackson
2005-04-10 18:42 ` Christopher Li
2005-04-10 22:30 ` Petr Baudis
2005-04-11 13:58 ` H. Peter Anvin
2005-04-20 20:29 ` Kai Henningsen
2005-04-24 0:42 ` Paul Jackson
2005-04-24 1:29 ` Bernd Eckenfels
2005-04-24 4:13 ` Paul Jackson
2005-04-24 4:38 ` Bernd Eckenfels
2005-04-24 4:53 ` Paul Jackson
2005-04-25 11:57 ` Theodore Ts'o
2005-04-25 16:40 ` David Wagner
2005-04-25 20:35 ` Bernd Eckenfels
2005-04-24 16:52 ` Horst von Brand
2005-04-24 8:00 ` Kai Henningsen
[not found] ` <6f6293f10504210220744af114@mail.gmail.com>
2005-04-24 8:01 ` Kai Henningsen
2005-04-11 11:35 ` [rfc] git: combo-blobs Ingo Molnar
2005-04-11 14:45 ` Paul Jackson
2005-04-11 15:12 ` Ingo Molnar
2005-04-11 15:32 ` Linus Torvalds
2005-04-11 15:39 ` Ingo Molnar
2005-04-11 15:57 ` Ingo Molnar
2005-04-11 16:01 ` Linus Torvalds
2005-04-11 16:33 ` Ingo Molnar
2005-04-12 5:42 ` Barry K. Nathan
2005-04-11 18:13 ` Chris Wedgwood
2005-04-11 18:30 ` Linus Torvalds
2005-04-11 20:18 ` Linus Torvalds
2005-04-11 18:40 ` Petr Baudis
2005-04-11 17:50 ` Paul Jackson
2005-04-11 15:28 ` Ingo Molnar
2005-04-11 15:31 ` Ingo Molnar
2005-04-12 4:05 ` more git updates David Eger
2005-04-12 8:16 ` Petr Baudis
2005-04-12 20:44 ` David Eger
2005-04-12 21:21 ` Linus Torvalds
2005-04-12 22:29 ` Krzysztof Halasa
2005-04-12 22:49 ` Linus Torvalds
2005-04-13 4:32 ` Matthias Urlichs
2005-04-12 22:36 ` David Eger
2005-04-12 23:48 ` Panagiotis Issaris
2005-04-12 23:40 ` Andrea Arcangeli
2005-04-12 23:45 ` Linus Torvalds
2005-04-13 0:14 ` Andrea Arcangeli
2005-04-13 1:10 ` Linus Torvalds
2005-04-13 10:59 ` Andrea Arcangeli
2005-04-13 20:44 ` Matt Mackall
2005-04-13 23:42 ` Krzysztof Halasa
2005-04-14 0:13 ` Matt Mackall
2005-04-13 9:30 ` Russell King
2005-04-13 10:20 ` Andrea Arcangeli
2005-04-13 14:43 ` Linus Torvalds
2005-04-10 2:07 ` Paul Jackson
2005-04-10 2:20 ` Paul Jackson
2005-04-10 2:09 ` Paul Jackson
2005-04-10 7:51 ` Junio C Hamano
2005-04-10 5:53 ` Christopher Li
2005-04-10 9:28 ` Junio C Hamano
2005-04-10 7:06 ` Christopher Li
2005-04-10 11:38 ` tony.luck
2005-04-10 9:48 ` Petr Baudis
2005-04-10 9:40 ` Wichert Akkerman
2005-04-10 9:41 ` Petr Baudis
2005-04-10 7:09 ` Christopher Li
2005-04-10 11:21 ` Proposal for shell-patch-format [was: Re: more git updates..] Rutger Nijlunsing
2005-04-10 15:44 ` more git updates Linus Torvalds
2005-04-10 17:00 ` Rutger Nijlunsing
2005-04-10 18:50 ` Paul Jackson
2005-04-10 20:57 ` Linus Torvalds
2005-04-10 19:03 ` Christopher Li
2005-04-10 22:38 ` Linus Torvalds
2005-04-10 19:53 ` Christopher Li
2005-04-10 23:21 ` Linus Torvalds
2005-04-10 21:28 ` Christopher Li
2005-04-12 5:14 ` David Lang
2005-04-12 6:00 ` Paul Jackson
2005-04-12 7:05 ` Barry K. Nathan
2005-04-11 6:57 ` bert hubert
2005-04-11 7:20 ` Christer Weinigel
2005-04-10 23:14 ` Paul Jackson
2005-04-10 23:38 ` Linus Torvalds
2005-04-11 0:19 ` Paul Jackson
2005-04-11 15:49 ` Randy.Dunlap
2005-04-11 18:30 ` Petr Baudis
2005-04-11 0:10 ` Petr Baudis
2005-04-09 22:00 ` Paul Jackson
2005-04-09 23:21 ` Ralph Corderoy
2005-04-10 0:39 ` Paul Jackson
2005-04-10 1:14 ` Bernd Eckenfels
2005-04-10 1:33 ` Paul Jackson
2005-04-10 10:22 ` Ralph Corderoy
2005-04-10 17:30 ` Paul Jackson
2005-04-10 17:31 ` Rik van Riel
2005-04-10 17:35 ` Ingo Molnar
2005-04-11 16:46 ` ross
-- strict thread matches above, loose matches on Subject: below --
2005-04-10 22:07 Luck, Tony
2005-04-10 22:11 ` Petr Baudis