* Re: Index/hash order
  [not found] ` <Pine.LNX.4.58.0504131144160.4501@ppc970.osdl.org>
@ 2005-04-13 20:02 ` Ingo Molnar
  2005-04-13 20:07   ` H. Peter Anvin
  0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2005-04-13 20:02 UTC (permalink / raw)
To: Linus Torvalds; +Cc: H. Peter Anvin, git

* Linus Torvalds <torvalds@osdl.org> wrote:

> > with a plaintext repository we could do the 'hardlink trick' (which
> > brings in other manageability problems and limitations but is at
> > least a partially good idea), which would make the working tree and
> > the repository share the same inode in most cases.
>
> However, the real issue is that you're really asking for trouble.
> There are tons of tools that modify files without breaking the
> hardlink. Even some editors do. So you just use the wrong tool on the
> tree by mistake, and not only is your archive corrupt, you've
> corrupted all other archives that might have shared the same object
> directory.

that's what i loosely meant by 'manageability problems'.

I mentioned one solution earlier: make the repository object an
immutable file (the +i flag on the inode) - it really wants to be
immutable, after all. That would solve a whole range of 'accidental
corruption' issues. Another solution (suggested by Christer Weinigel)
was to enforce immutability by making the object owned by another
user/group (git:git or whatever).

but having a binary compressed format is 'soft immutability', done
cleverly.

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: Index/hash order
From: H. Peter Anvin @ 2005-04-13 20:07 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, git

Ingo Molnar wrote:
>
> that's what i loosely meant by 'manageability problems'.
>
> I mentioned one solution earlier: make the repository object an
> immutable file (the +i flag on the inode) - it really wants to be
> immutable, after all. That would solve a whole range of 'accidental
> corruption' issues.

I think abusing the immutable bit will quickly descend into the same
rathole which makes u-w often useless. u-w will actually be preserved
by more tools -- simply because they know about it -- than +i.

Either way, it feels to me that this idea has already been ruled out,
so it's probably pointless to keep debating just exactly what we're
not actually going to do.

	-hpa
* Re: Index/hash order
From: Ingo Molnar @ 2005-04-13 20:15 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Linus Torvalds, git

* H. Peter Anvin <hpa@zytor.com> wrote:

> > that's what i loosely meant by 'manageability problems'.
> >
> > I mentioned one solution earlier: make the repository object an
> > immutable file (the +i flag on the inode) - it really wants to be
> > immutable, after all. That would solve a whole range of 'accidental
> > corruption' issues.
>
> I think abusing the immutable bit will quickly descend into the same
> rathole which makes u-w often useless. u-w will actually be preserved
> by more tools -- simply because they know about it -- than +i.

well, the 'owned by another user' solution is valid though, and
doesn't have this particular problem. (We've got a secure multiuser
OS, so we might as well use it to protect the DB against corruption.)

> Either way, it feels to me that this idea has already been ruled
> out, so it's probably pointless to keep debating just exactly what
> we're not actually going to do.

(even if it sounds stupid, i keep discussing decisions that were made
for reasons i cannot fully agree with (yet), even if i happen to agree
with the net decision. It's all technological arguments, so it's not
like there's anything fuzzy about any of these issues.)

	Ingo
* Re: Index/hash order
From: Ingo Molnar @ 2005-04-13 20:18 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Linus Torvalds, git

* Ingo Molnar <mingo@elte.hu> wrote:

> > I think abusing the immutable bit will quickly descend into the
> > same rathole which makes u-w often useless. u-w will actually be
> > preserved by more tools -- simply because they know about it --
> > than +i.
>
> well, the 'owned by another user' solution is valid though, and
> doesn't have this particular problem. (We've got a secure multiuser
> OS, so we might as well use it to protect the DB against corruption.)

but ... this variant doesn't have any 'wow' feeling to it either, and
it clearly brings in a number of other limitations. I might as well
shut up until i can suggest something obviously superior :)

	Ingo
* Re: Index/hash order
From: Ingo Molnar @ 2005-04-13 20:21 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Linus Torvalds, git

* Ingo Molnar <mingo@elte.hu> wrote:

> > > I think abusing the immutable bit will quickly descend into the
> > > same rathole which makes u-w often useless. u-w will actually be
> > > preserved by more tools -- simply because they know about it --
> > > than +i.
> >
> > well, the 'owned by another user' solution is valid though, and
> > doesn't have this particular problem. (We've got a secure multiuser
> > OS, so we might as well use it to protect the DB against
> > corruption.)
>
> but ... this variant doesn't have any 'wow' feeling to it either, and
> it clearly brings in a number of other limitations. I might as well
> shut up until i can suggest something obviously superior :)

i think the killer argument is compression. A 2 GB compressed
repository will be a hard sell already; 4 GB is pretty much out of the
question. And once we accept that we have to have _some_ form of
compression, it's Linus' scheme that wins.

	Ingo
* Updated base64 patches
From: H. Peter Anvin @ 2005-04-13 20:26 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, git

I have uploaded two new base64 patches, one which uses the flat
repository and one which doesn't:

ftp://ftp.kernel.org/pub/linux/kernel/people/hpa/git-0.04-base64-3.diff
ftp://ftp.kernel.org/pub/linux/kernel/people/hpa/git-0.04-base64-flat-3.diff

... both are still against the git-0.04 tarball.

The only differences are changing "char" to "signed char" in places
where it actually matters (since plain char is unsigned on some
platforms), and, for the non-flat version, allowing the cache
subdirectories to be lazily created (if ENOENT is returned, try mkdir
before giving up).

	-=hpa
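The lazy-creation logic hpa describes amounts to a retry around ENOENT;
a rough sketch (in Python for brevity, with made-up names - the actual
patch is C against the git-0.04 tarball):

```python
import os
import tempfile

def open_object_file(path):
    """Open a new object file for writing; if the fan-out subdirectory
    is missing (ENOENT), create it and retry once before giving up."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    try:
        return os.open(path, flags, 0o444)
    except FileNotFoundError:
        # Lazily create the missing subdirectory, then retry the open.
        os.makedirs(os.path.dirname(path), exist_ok=True)
        return os.open(path, flags, 0o444)

# Demo: the 'ab' fan-out directory does not exist yet.
root = tempfile.mkdtemp()
obj = os.path.join(root, "objects", "ab", "cdef")
fd = open_object_file(obj)
os.close(fd)
created = os.path.exists(obj)
```

The first open fails because neither `objects` nor `objects/ab` exists;
the retry after mkdir succeeds, so a fresh repository needs no
pre-created subdirectories.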
* Re: Index/hash order
From: Linus Torvalds @ 2005-04-13 21:04 UTC (permalink / raw)
To: Ingo Molnar; +Cc: H. Peter Anvin, git

On Wed, 13 Apr 2005, Ingo Molnar wrote:
>
> well, the 'owned by another user' solution is valid though, and
> doesn't have this particular problem. (We've got a secure multiuser
> OS, so we might as well use it to protect the DB against corruption.)

So now you need root to set up new repositories? No thanks.

		Linus
* enforcing DB immutability
From: Ingo Molnar @ 2005-04-20 7:40 UTC (permalink / raw)
To: Linus Torvalds; +Cc: H. Peter Anvin, git

* Linus Torvalds <torvalds@osdl.org> wrote:

> On Wed, 13 Apr 2005, Ingo Molnar wrote:
> >
> > well, the 'owned by another user' solution is valid though, and
> > doesn't have this particular problem. (We've got a secure multiuser
> > OS, so we might as well use it to protect the DB against
> > corruption.)
>
> So now you need root to set up new repositories? No thanks.

yeah, it's a bit awkward to protect uncompressed repositories - but it
will need some sort of kernel enforcement. (if userspace finds out the
DB contains uncompressed blobs, it _will_ try to use them.)

(perhaps having an in-kernel GIT-alike versioned filesystem would help
- but that brings up the same 'i have to be root' issues. The FS would
enforce the true immutability of objects.)

perhaps having a new 'immutable hardlink' feature in the Linux VFS
would help? I.e. a hardlink that can only be followed read-only, and
can be removed, but cannot be chmod-ed into a writable hardlink. That
i think would be a large enough barrier for editors/build-tools not to
play the tricks they already play that make 'readonly' files virtually
meaningless.

	Ingo
* Re: enforcing DB immutability
From: Ingo Molnar @ 2005-04-20 7:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: H. Peter Anvin, git

* Ingo Molnar <mingo@elte.hu> wrote:

> perhaps having a new 'immutable hardlink' feature in the Linux VFS
> would help? I.e. a hardlink that can only be followed read-only, and
> can be removed, but cannot be chmod-ed into a writable hardlink. That
> i think would be a large enough barrier for editors/build-tools not
> to play the tricks they already play that make 'readonly' files
> virtually meaningless.

immutable hardlinks have the following advantage: a hardlink by design
hides the information about where the link comes from. So even if an
editor wanted to play stupid games and override the immutability - it
doesn't know where the DB object is. (sure, it could find it if it
wanted to, but that needs real messing around - editors won't do
_that_)

i think this might work.

(the current chattr +i flag isn't quite what we need though, because
it works on the inode, and it's also a root-only feature, so it puts
us back to square one. What would be needed is an immutability flag on
hardlinks, settable by unprivileged users.)

	Ingo
* Re: enforcing DB immutability
From: Ingo Molnar @ 2005-04-20 7:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: H. Peter Anvin, git

* Ingo Molnar <mingo@elte.hu> wrote:

> > perhaps having a new 'immutable hardlink' feature in the Linux VFS
> > would help? I.e. a hardlink that can only be followed read-only,
> > and can be removed, but cannot be chmod-ed into a writable
> > hardlink. That i think would be a large enough barrier for
> > editors/build-tools not to play the tricks they already play that
> > make 'readonly' files virtually meaningless.
>
> immutable hardlinks have the following advantage: a hardlink by
> design hides the information about where the link comes from. So even
> if an editor wanted to play stupid games and override the
> immutability - it doesn't know where the DB object is. (sure, it
> could find it if it wanted to, but that needs real messing around -
> editors won't do _that_)

so the only sensible thing the editor/tool can do when it wants to
change the file is precisely what we want: it will copy the hardlinked
file's contents to a new file, and will replace the old file with the
new file - a copy on write. No accidental corruption of the DB's
contents.

(another in-kernel VFS solution would be to enforce that the file's
name always matches the sha1 hash. So if someone edits a DB object it
will automatically change its name. But this is complex, probably
cannot be done atomically, and brings up other problems as well.)

	Ingo
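The copy-on-write dance Ingo describes - write the new contents to a
separate file, then rename it over the old name, leaving the hardlinked
DB object untouched - can be sketched like this (a hypothetical helper
for illustration, not git code):

```python
import os
import tempfile

def replace_file(path, new_data):
    """Replace 'path' by writing a new file and renaming it into place.
    If 'path' was a hardlink into the object database, the DB copy is
    untouched: rename() only repoints the directory entry."""
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname)
    with os.fdopen(fd, "wb") as f:
        f.write(new_data)
    os.rename(tmp, path)  # atomic replacement on POSIX

# Demo: 'work' and 'db' share an inode; editing 'work' this way
# leaves the 'db' object alone.
root = tempfile.mkdtemp()
db = os.path.join(root, "db-object")
work = os.path.join(root, "work-copy")
with open(db, "wb") as f:
    f.write(b"original")
os.link(db, work)           # the 'hardlink trick'
replace_file(work, b"edited")
db_data = open(db, "rb").read()
work_data = open(work, "rb").read()
```

A tool that instead opens the shared file for writing in place would
corrupt both names at once; the rename approach is what well-behaved
editors already do, which is exactly the behaviour the immutable
hardlink would force.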
* Re: enforcing DB immutability
From: Chris Wedgwood @ 2005-04-20 8:58 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, H. Peter Anvin, git

On Wed, Apr 20, 2005 at 09:53:20AM +0200, Ingo Molnar wrote:

> so the only sensible thing the editor/tool can do when it wants to
> change the file is precisely what we want: it will copy the
> hardlinked file's contents to a new file, and will replace the old
> file with the new file - a copy on write. No accidental corruption
> of the DB's contents.

editors that have SCM smarts and know about a file's different states
can do this.

i really like the way this works under BK, btw --- files are RO until
i do the magic thing which does a 'bk edit', and i can then do
checkins or similar as needed (this assumes you can do per-file
deltas)
* Re: enforcing DB immutability
From: Nick Craig-Wood @ 2005-04-20 14:57 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, H. Peter Anvin, git

On Wed, Apr 20, 2005 at 09:49:48AM +0200, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
>
> > perhaps having a new 'immutable hardlink' feature in the Linux VFS
> > would help? I.e. a hardlink that can only be followed read-only,
> > and can be removed, but cannot be chmod-ed into a writable
> > hardlink. That i think would be a large enough barrier for
> > editors/build-tools not to play the tricks they already play that
> > make 'readonly' files virtually meaningless.
>
> immutable hardlinks have the following advantage: a hardlink by
> design hides the information about where the link comes from. So even
> if an editor wanted to play stupid games and override the
> immutability - it doesn't know where the DB object is. (sure, it
> could find it if it wanted to, but that needs real messing around -
> editors won't do _that_)

This has already been implemented for the Linux vserver project. Take
a look in the patch here:

http://vserver.13thfloor.at/Experimental/patch-2.6.11.7-vs1.9.5.x5.diff.bz2

(It's not split out, but search for IMMUTABLE and you'll see what I
mean.)

It implements immutable linkage invert, which basically allows people
to delete hardlinks to immutable files, but not do anything else to
them. It uses another bit out of the attributes to "invert" the
immutability of the linkage of immutable files.

It's used in the vserver project so that individual vservers (which
are basically just fancy chroots) can share libraries, binaries and
hence memory, can't muck each other up, but can still upgrade their
libs/binaries.

-- 
Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick
* Re: enforcing DB immutability
From: Wout @ 2005-04-27 8:15 UTC (permalink / raw)
To: Ingo Molnar; +Cc: git

On Wed, Apr 20, 2005 at 09:49:48AM +0200, Ingo Molnar wrote:
>
> immutable hardlinks have the following advantage: a hardlink by
> design hides the information about where the link comes from. So even
> if an editor wanted to play stupid games and override the
> immutability - it doesn't know where the DB object is. (sure, it
> could find it if it wanted to, but that needs real messing around -
> editors won't do _that_)
>
> i think this might work.
>
> (the current chattr +i flag isn't quite what we need though, because
> it works on the inode, and it's also a root-only feature, so it puts
> us back to square one. What would be needed is an immutability flag
> on hardlinks, settable by unprivileged users.)

Slightly off-topic for this list. Apologies to those offended.

Would a filesystem that allows sharing of blocks between inodes be
useful here? Each block would need a reference count (refco). Writing
a block would be impossible once refco > 1. If someone attempts to
write to such a block, a new block is allocated for that particular
inode and the refco of the original is decreased.

Next to this there would have to be a clone_file() function:

	clone_file(src-file, dst-file, mode)

This function would create file dst-file with a new inode that
references the blocks belonging to src-file (increasing the blocks'
reference counts). The owner/group of dst-file are the caller, not the
owner of src-file. Things to check for are:

 - read permissions for src-file
 - write permissions for dst-file
 - are src-file and dst-file in the same filesystem (if not, one could
   implement copy)
 - ...?

Suppose I have a file foo:

	foo -> inode1(blk1[1], blk2[1], blk3[1], blk4[1])

The [n] value on the blocks is the reference count. I now call
clone_file("foo", "bar", 0644):

	foo -> inode1(blk1[2], blk2[2], blk3[2], blk4[2])
	bar -> inode2(blk1[2], blk2[2], blk3[2], blk4[2])

Next I modify blk2 of bar (write):

	foo -> inode1(blk1[2], blk2[1], blk3[2], blk4[2])
	bar -> inode2(blk1[2], blk5[1], blk3[2], blk4[2])

I see the following uses:

 - Checking out a tree of (uncompressed) files with git could be done
   using the clone_file() call on each file. This means no extra disk
   space is used unless files are edited later.
 - An easy way to freeze files for backups. A database (mysql, ...)
   could bring its files into an acceptable state, call clone_file()
   on them and proceed with its work.
 - It could be used to protect user files from external tampering.
   Someone mentioned the problems with malware killing his files. The
   impact of this could be reduced by having a script that did a
   clone_file() on everything as root periodically. If files are
   deleted, root would have a backup.

Notes:

 - Small changes to files would probably cause all the blocks to be
   copied, as programs (editors) usually write out the complete file.
 - I don't know anything about implementing filesystems, so all of the
   above could be complete nonsense.
 - The idea isn't mine; I've come across this before under the name of
   'snapshot filesystems', and I think it was patented. I've never
   heard of anyone doing this for individual files though.

	Wout
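Wout's block-sharing scheme can be modelled in a few lines (a toy
in-memory sketch, not a filesystem implementation; the name clone_file
mirrors the proposal):

```python
class BlockFS:
    """Toy model of the proposal: a file is a list of block IDs,
    blocks carry reference counts, and a write to a shared block
    allocates a fresh one (copy-on-write)."""
    def __init__(self):
        self.blocks = {}   # block id -> [data, refcount]
        self.files = {}    # name -> list of block ids
        self.next_id = 1

    def _alloc(self, data):
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = [data, 1]
        return bid

    def create(self, name, chunks):
        self.files[name] = [self._alloc(c) for c in chunks]

    def clone_file(self, src, dst):
        # New inode referencing the same blocks; bump each refcount.
        self.files[dst] = list(self.files[src])
        for bid in self.files[dst]:
            self.blocks[bid][1] += 1

    def write_block(self, name, idx, data):
        bid = self.files[name][idx]
        if self.blocks[bid][1] > 1:
            # Block is shared: drop our reference and copy on write.
            self.blocks[bid][1] -= 1
            self.files[name][idx] = self._alloc(data)
        else:
            self.blocks[bid][0] = data

# The foo/bar example from the mail above.
fs = BlockFS()
fs.create("foo", [b"blk1", b"blk2", b"blk3", b"blk4"])
fs.clone_file("foo", "bar")
fs.write_block("bar", 1, b"blk5")
```

After the write, foo still sees its original second block, bar sees the
new one, and the untouched blocks remain shared with refcount 2 - the
same before/after state as the inode diagrams above.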
* Re: Index/hash order
From: Linus Torvalds @ 2005-04-13 20:15 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Ingo Molnar, git

On Wed, 13 Apr 2005, H. Peter Anvin wrote:
>
> Either way, it feels to me that this idea has already been ruled out,
> so it's probably pointless to keep debating just exactly what we're
> not actually going to do.

Hey, isn't that how most discussions progress? ;)

I don't mind alternatives per se. I'm just lazy. I came up with one
solution to the issues I perceived, and I like that one. But dammit,
if somebody comes up with something _clearly_ superior, I'll just bow
down in your general direction, and promptly implement that.

		Linus
* Re: Index/hash order
  [not found] ` <20050413182909.GA25221@elte.hu>
From: Baruch Even @ 2005-04-13 20:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, H. Peter Anvin, git

Ingo Molnar wrote:

> with a plaintext repository we could do the 'hardlink trick' (which
> brings in other manageability problems and limitations but is at
> least a partially good idea), which would make the working tree and
> the repository share the same inode in most cases.
>
> While in the compressed case we'd have a separate compressed inode
> (taking up RAM with all its contents) and the working directory inode
> (taking up RAM) - summing up to more RAM than if we only had a single
> inode per object.
>
> furthermore, when generating/destroying large trees (which is a quite
> common thing), a hardlinked solution is faster, as it doesn't create
> 250MB+ of dirty RAM. In some cases (e.g. handling dozens of 'merge
> trees') it's dramatically faster.

You could still have the hardlink trick by way of a .git/cache that
keeps uncompressed files: keep the files with their hash names, but
uncompressed. It will be easy to find and fully hard-linkable, and
only the needed files are kept uncompressed, with the three-year-old
files staying compressed.

You can even save some CPU time by checking whether the file is in the
cache before decompressing it, though it does cost you an extra disk
access to see whether it's there or not. If you repeat the operation
often enough, you'll have the uncompressed version in the cache most
of the time anyway. Clear the cache weekly or so to avoid stale files
from an ancient version.

	Baruch
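The .git/cache idea Baruch sketches - a read-through cache of
uncompressed objects keyed by the same hash name - might look roughly
like this (illustrative layout and names, not actual git behaviour):

```python
import hashlib
import os
import tempfile
import zlib

def read_object(objects, cache, sha1):
    """Read an object, keeping an uncompressed copy in a cache
    directory under the same hash name. Repeated reads are served
    from the cache without decompression."""
    cached = os.path.join(cache, sha1)
    if os.path.exists(cached):
        return open(cached, "rb").read()
    data = zlib.decompress(open(os.path.join(objects, sha1), "rb").read())
    with open(cached, "wb") as f:   # populate the cache for next time
        f.write(data)
    return data

# Demo: one compressed object in the store, read twice.
root = tempfile.mkdtemp()
objects = os.path.join(root, "objects"); os.mkdir(objects)
cache = os.path.join(root, "cache"); os.mkdir(cache)
body = b"hello, git\n"
sha1 = hashlib.sha1(body).hexdigest()
with open(os.path.join(objects, sha1), "wb") as f:
    f.write(zlib.compress(body))
first = read_object(objects, cache, sha1)   # decompresses, fills cache
second = read_object(objects, cache, sha1)  # served from the cache
```

Since the cached copy carries the hash name, a checkout could hardlink
against it exactly as with a flat repository, and blowing the cache
away is always safe - it can be regenerated from the compressed store.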
* Re: Index/hash order
  [not found] ` <Pine.LNX.4.58.0504131008500.4501@ppc970.osdl.org>
From: Florian Weimer @ 2005-04-13 21:40 UTC (permalink / raw)
To: Linus Torvalds; +Cc: H. Peter Anvin, Ingo Molnar, git

* Linus Torvalds:

> - I want things to distribute well. This means that it has to be
>   based on an "append data" model, where historical data never
>   changes, and you only append on top of it (either by adding totally
>   new files, or by just letting the files grow).

Yes, I think this is something which can easily dominate the choice of
data structure.

> This works in a forward-delta environment (which is fundamentally
> based on the notion of "we know the old version, we're adding new
> stuff on top of it"), but does _not_ work in the backwards model of
> "we keep the old history as a delta against the new" model.

Forward deltas don't have to be terribly inefficient. You can get
O(log n) access to revision n fairly easily, using the trick described
here:

<http://svn.collab.net/repos/svn/trunk/notes/skip-deltas>

I've run a few tests, just to get some numbers on the overhead
involved. I used the last ~8,000 changesets from the BKCVS kernel
repository. A cold-cache checkout takes about 250 seconds on my
laptop. I don't have git numbers, but a mere copy of the kernel tree
needs 40 seconds.

For the hot-cache case, the difference is 140 seconds vs. 2.5 seconds
(or 6 seconds with decompression).

Uh-oh. I wouldn't have imagined the difference would be *that*
dramatic. The file system layer is *fast*. Subversion's delta
implementation is not a speed daemon (it handles arbitrarily large
files, which increases complexity significantly and slows things down,
compared to simpler in-memory algorithms), but it will be very hard to
come even close to the 2.5 seconds.

On the storage front, we have 220 MB for the skip deltas vs. 106 MB
for pure deltas-to-previous vs. 1.1 GB for uncompressed files
(directories are always delta-compressed, so to speak[1]). In the
first two cases, the first revision in the repository is deltaed
against /dev/null and is thus itself compressed, in case you think the
numbers are suspiciously low.

[1] AFAICS, you can't really avoid that if you want to track file
identity information without introducing arbitrary file IDs.
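The skip-delta trick in the note Florian links to boils down to
choosing each revision's delta base by clearing the lowest set bit of
the revision number, which bounds every delta chain by the number of
set bits in n; a sketch (a simplified model of the scheme, not
Subversion code):

```python
def skip_delta_base(rev):
    """Base revision against which revision 'rev' is stored as a delta
    in the skip-delta scheme: rev with its lowest set bit cleared."""
    return rev & (rev - 1)

def chain(rev):
    """Revisions touched when reconstructing 'rev' from revision 0."""
    out = [rev]
    while rev:
        rev = skip_delta_base(rev)
        out.append(rev)
    return out

# Reconstructing revision 8000 walks only a handful of deltas
# (one per set bit of 8000), not 8000 of them.
depth_8000 = len(chain(8000))
```

With deltas-to-previous, reconstructing revision n costs n delta
applications; here it costs popcount(n), at the price of each delta
spanning a wider range (hence Florian's 220 MB vs. 106 MB figures).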
* Re: Index/hash order
From: Linus Torvalds @ 2005-04-13 22:11 UTC (permalink / raw)
To: Florian Weimer; +Cc: H. Peter Anvin, Ingo Molnar, git

On Wed, 13 Apr 2005, Florian Weimer wrote:
>
> I've run a few tests, just to get some numbers on the overhead
> involved. I used the last ~8,000 changesets from the BKCVS kernel
> repository. A cold-cache checkout takes about 250 seconds on my
> laptop. I don't have git numbers, but a mere copy of the kernel tree
> needs 40 seconds.

I will bet you that a git checkout is _faster_ than a kernel source
tree copy. The time will be dominated by the IO costs (in particular
the read costs), and the IO costs are lower thanks to compression. So
I think that the cold-cache case will beat your 40 seconds by a clear
margin. It generally compresses to half the size, so 20 seconds is not
impossible (although seek costs would tend to stay constant, so I'd
expect it to be somewhere in between the two).

> For the hot-cache case, the difference is 140 seconds vs. 2.5 seconds
> (or 6 seconds with decompression).
>
> Uh-oh. I wouldn't have imagined the difference would be *that*
> dramatic. The file system layer is *fast*.

Did I mention that I designed git for speed? Yes. The whole damn
design is really about performance, distribution, and built-in
integrity checking.

> On the storage front, we have 220 MB for the skip deltas vs. 106 MB
> for pure deltas-to-previous vs. 1.1 GB for uncompressed files
> (directories are always delta-compressed, so to speak[1]).

That's actually pretty encouraging. Your 1.1 GB number implies to me
that a compressed file setup should be about half that, which in turn
says that the cost of full-file storage is not at all outrageous.

Sure, it's 2-3 times larger than your skip deltas, but considering
that the performance is about fifty times faster (and I can do
distributed stuff without any locking synchronization and you can't),
that's a tradeoff I'm more than happy with.

Or maybe I misunderstood what you were comparing?

Of course, the numbers will all depend on how the history looks etc.,
so this is all pretty much just guidelines.

		Linus
* Re: Index/hash order
From: Florian Weimer @ 2005-04-13 22:48 UTC (permalink / raw)
To: Linus Torvalds; +Cc: H. Peter Anvin, Ingo Molnar, git

* Linus Torvalds:

> I will bet you that a git checkout is _faster_ than a kernel source
> tree copy. The time will be dominated by the IO costs (in particular
> the read costs), and the IO costs are lower thanks to compression. So
> I think that the cold-cache case will beat your 40 seconds by a clear
> margin. It generally compresses to half the size, so 20 seconds is
> not impossible (although seek costs would tend to stay constant, so
> I'd expect it to be somewhere in between the two).

It's indeed slightly faster (34 seconds). The hot-cache case is about
6 seconds. Still okay.

However, I should redo these tests with a real git. The numbers could
be quite different because seek overhead is a bit hard to predict.
Which version should I try?

> That's actually pretty encouraging. Your 1.1 GB number implies to me
> that a compressed file setup should be about half that, which in turn
> says that the cost of full-file storage is not at all outrageous.

I usually try to avoid the typical O(f(n)) fallacy because constant
factors do matter in practice. But the way you put it -- maybe delta
compression isn't worth the complexity after all. At least I'm
beginning to have doubts. Especially since the same Subversion
repository, stored by the Berkeley DB backend, requires a whopping
1.3 GB of disk space.

> Or maybe I misunderstood what you were comparing?

My estimates only cover file data, not metadata. Based on the
Subversion dumps, it might be possible to get some rough estimates for
the cost of storing directory information. What is the average size of
a directory blob? Is it true that for each tree revision, you need to
store a new directory blob for each directory which indirectly
contains a modified file?

Does your 50% estimate include wasted space due to the file system
block size?
* Re: Index/hash order
From: Ingo Molnar @ 2005-04-14 7:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Florian Weimer, H. Peter Anvin, git

* Linus Torvalds <torvalds@osdl.org> wrote:

> > I've run a few tests, just to get some numbers on the overhead
> > involved. I used the last ~8,000 changesets from the BKCVS kernel
> > repository. A cold-cache checkout takes about 250 seconds on my
> > laptop. I don't have git numbers, but a mere copy of the kernel
> > tree needs 40 seconds.
>
> I will bet you that a git checkout is _faster_ than a kernel source
> tree copy. The time will be dominated by the IO costs (in particular
> the read costs), and the IO costs are lower thanks to compression. So
> I think that the cold-cache case will beat your 40 seconds by a clear
> margin. It generally compresses to half the size, so 20 seconds is
> not impossible (although seek costs would tend to stay constant, so
> I'd expect it to be somewhere in between the two).

i'd be surprised if it was twice as fast - cache-cold linear checkouts
are _seek_ limited, and it doesn't matter whether, after a 1-2 msec
track-to-track disk seek, the DMA engine spends another 30
microseconds DMA-ing 60K of uncompressed data instead of 30K
compressed... (there are other factors, but this is the main thing.)

	Ingo
* cache-cold repository performance 2005-04-14 7:04 ` Ingo Molnar @ 2005-04-14 10:50 ` Ingo Molnar 0 siblings, 0 replies; 20+ messages in thread From: Ingo Molnar @ 2005-04-14 10:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: Florian Weimer, H. Peter Anvin, git * Ingo Molnar <mingo@elte.hu> wrote: > i'd be surprised if it was twice as fast - cache-cold linear checkouts > are _seek_ limited, and it doesnt matter whether after a 1-2 msec > track-to-track disk seek the DMA engine spends another 30 microseconds > DMA-ing 60K uncompressed data instead of 30K compressed... (there are > other factors, but this is the main thing.) i've benchmarked cache-cold compressed vs. uncompressed performance, to shed some more light on the performance differences between flat and compressed repositories. i did alot of testing, and i primarily concentrated on being able to _trust_ the benchmark results, not to generate some quick numbers. The major problem was that the timing of the reads associated with 'checking out a large tree' is very unstable, even on a completely isolated testsystem with very common (and predictable) IO hardware. the content i tested was a vanilla 2.6.10 kernel tree, with 19042 files in it, taking 246 MB uncompressed, and 110 MB compressed (via gzip -9). Average file size is 13.2 KB uncompressed, 5.9 KB compressed. Firstly, the timings are very sensitive to the way the tree was created. To have a 'fair' on-disk layout the trees have to be created in an identical fashion: e.g. it is not valid to copy the uncompressed tree and run gzip over it - that will create a 'sparse' on-disk layout penalizing the compressed layout and making it 30% slower than the uncompressed layout! I first created the two trees, then i "cp -a"-ed them over into a new directory one after each other, so that they get on similar on-disk positions as well. I also created 2 more pairs of such trees to make sure disk layout is fair. 
all timings were taken fresh after a reboot, on a UP 1 GB RAM Athlon64
3200+, using a large, top-of-the-line IDE disk. The kernel was
2.6.12-rc2, the filesystem was ext3 with enough free space not to be
fragmented, and both noatime and nodiratime were specified so that no
write activity whatsoever occurs during the 'checkout'.

the operation timed was a simple:

	time find . -type f | xargs cat > /dev/null

done in the root of the given tree. This generates the very same
read-only IO pattern for each test. I've run the tests 10 times (i.e.
have done 10 fresh reboots), but after every reboot i permutated the
order of the trees tested - to make sure there is no interaction
between trees. (there was no interaction.)

here are the raw numbers, elapsed real time in seconds:

  flat-1: 29.7 29.5 29.4 29.4 29.5 29.5 29.7 29.6 29.4 29.6 29.5 29.4:  29.5
  gzip-1: 41.2 40.9 40.7 40.7 40.5 41.7 41.0 40.3 40.6 40.8 40.8 40.9:  40.8
  flat-2: 28.0 28.2 27.7 27.9 27.8 27.9 27.7 27.9 27.9 28.1 27.9 28.0:  27.9
  gzip-2: 27.2 27.4 27.4 27.2 27.2 27.2 27.2 27.2 27.1 27.3 27.2 27.4:  27.2
  flat-3: 27.0 27.8 27.6 27.7 27.8 27.8 27.8 27.7 27.8 27.6 27.8 27.8:  27.6
  gzip-3: 25.8 26.8 26.6 26.5 26.5 26.5 26.6 26.4 26.5 26.7 26.6 26.7:  26.5

The final column is the average. (Standard deviation is below 0.1 sec,
less than 0.3%.)

flat-1 is the original tree, created via tar. gzip-1 is a "cp -a" copy
of it, per-file compressed afterwards. flat-2 is a "cp -a" copy of
flat-1; gzip-2 is a "cp -a" copy of gzip-1. flat-3/gzip-3 are "cp -a"
copies of flat-2/gzip-2.

note that gzip-1 is ~40% slower due to its 'sparse layout', so its
results approximate a repository with a 'bad' file layout. I'd not
expect GIT repositories to have such a layout normally, so we can
disregard it. flat-2/3 and gzip-2/3 can be directly compared.
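[Editorial note: the averages in the final column and the relative differences discussed in the follow-up can be reproduced from the raw rows (taking the directly comparable pairs only); a quick check:]

```python
# Recompute means and relative differences from the raw timing rows
# above (flat-2/3 and gzip-2/3, the directly comparable trees).
from statistics import mean, pstdev

runs = {
    "flat-2": [28.0, 28.2, 27.7, 27.9, 27.8, 27.9, 27.7, 27.9, 27.9, 28.1, 27.9, 28.0],
    "gzip-2": [27.2, 27.4, 27.4, 27.2, 27.2, 27.2, 27.2, 27.2, 27.1, 27.3, 27.2, 27.4],
    "flat-3": [27.0, 27.8, 27.6, 27.7, 27.8, 27.8, 27.8, 27.7, 27.8, 27.6, 27.8, 27.8],
    "gzip-3": [25.8, 26.8, 26.6, 26.5, 26.5, 26.5, 26.6, 26.4, 26.5, 26.7, 26.6, 26.7],
}

for name, xs in runs.items():
    print(f"{name}: mean {mean(xs):.2f}s, stddev {pstdev(xs):.2f}s")

# relative speedup of the compressed tree over its flat counterpart
speedup_2 = (mean(runs["flat-2"]) - mean(runs["gzip-2"])) / mean(runs["flat-2"])
speedup_3 = (mean(runs["flat-3"]) - mean(runs["gzip-3"])) / mean(runs["flat-3"])
print(f"gzip-2 vs flat-2: {speedup_2:.1%} faster")
print(f"gzip-3 vs flat-3: {speedup_3:.1%} faster")
```

The recomputed speedups come out at roughly 2.4% and 4.2%, matching the ~2.5% and ~4% figures in the analysis.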
Firstly, the results show that the on-disk layout cannot be constructed
fully reliably - there's a 1% systematic difference between flat-2 and
flat-3, and a 3% systematic difference between gzip-2 and gzip-3 - both
systematic errors are larger than the 0.5% standard deviation, so they
are not measurement errors but real layout properties of these trees.

the most interesting result is that gzip-2 is 2.5% faster than flat-2,
and gzip-3 is 4% faster than flat-3. These differences are close to the
layout-related systematic error, but slightly above it, so i'd conclude
that a compressed repository is ~3% faster on this hardware. (since
these results were in line with my expectations, i double-checked
everything again and did another 10 reboot tests - same results.)

conclusion [*]: there's a negligible cache-cold performance hit from
using an uncompressed repository, because cache-cold performance is
dominated by the number of seeks, which is almost identical in the two
cases.

	Ingo

[*] lots of conditionals apply: these weren't flat/compressed GIT
repositories (although they were quite similar to them), nor was the
GIT workload measured (although the workload that was measured should
be quite close to it).

^ permalink raw reply	[flat|nested] 20+ messages in thread
end of thread, other threads:[~2005-04-27 8:10 UTC | newest]
Thread overview: 20+ messages
-- links below jump to the message on this page --
[not found] <425C3F12.9070606@zytor.com>
[not found] ` <Pine.LNX.4.58.0504121452330.4501@ppc970.osdl.org>
[not found] ` <20050412224027.GB20821@elte.hu>
[not found] ` <Pine.LNX.4.58.0504121554140.4501@ppc970.osdl.org>
[not found] ` <20050412230027.GA21759@elte.hu>
[not found] ` <20050412230729.GA22179@elte.hu>
[not found] ` <20050413111355.GB13865@elte.hu>
[not found] ` <425D4E1D.4040108@zytor.com>
[not found] ` <20050413165310.GA22428@elte.hu>
[not found] ` <425D4FB1.9040207@zytor.com>
[not found] ` <20050413171052.GA22711@elte.hu>
[not found] ` <Pine.LNX.4.58.0504131027210.4501@ppc970.osdl.org>
[not found] ` <20050413182909.GA25221@elte.hu>
[not found] ` <Pine.LNX.4.58.0504131144160.4501@ppc970.osdl.org>
2005-04-13 20:02 ` Index/hash order Ingo Molnar
2005-04-13 20:07 ` H. Peter Anvin
2005-04-13 20:15 ` Ingo Molnar
2005-04-13 20:18 ` Ingo Molnar
2005-04-13 20:21 ` Ingo Molnar
2005-04-13 20:26 ` Updated base64 patches H. Peter Anvin
2005-04-13 21:04 ` Index/hash order Linus Torvalds
2005-04-20 7:40 ` enforcing DB immutability Ingo Molnar
2005-04-20 7:49 ` Ingo Molnar
2005-04-20 7:53 ` Ingo Molnar
2005-04-20 8:58 ` Chris Wedgwood
2005-04-20 14:57 ` Nick Craig-Wood
2005-04-27 8:15 ` Wout
2005-04-13 20:15 ` Index/hash order Linus Torvalds
2005-04-13 20:28 ` Baruch Even
[not found] ` <Pine.LNX.4.58.0504131008500.4501@ppc970.osdl.org>
2005-04-13 21:40 ` Florian Weimer
2005-04-13 22:11 ` Linus Torvalds
2005-04-13 22:48 ` Florian Weimer
2005-04-14 7:04 ` Ingo Molnar
2005-04-14 10:50 ` cache-cold repository performance Ingo Molnar