* Re: Index/hash order
[not found] ` <Pine.LNX.4.58.0504131144160.4501@ppc970.osdl.org>
@ 2005-04-13 20:02 ` Ingo Molnar
2005-04-13 20:07 ` H. Peter Anvin
0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2005-04-13 20:02 UTC (permalink / raw)
To: Linus Torvalds; +Cc: H. Peter Anvin, git
* Linus Torvalds <torvalds@osdl.org> wrote:
> > with a plaintext repository we could do the 'hardlink trick' (which
> > brings in other manageability problems and limitations but is at least a
> > partially good idea), which would make the working tree and the
> > repository share the same inode in most cases.
> However, the real issue is that you're really asking for trouble.
> There are tons of tools that modify files without breaking the
> hardlink. Even some editors do. So you just use the wrong tool on the
> tree by mistake, and not only is your archive corrupt, you've
> corrupted all other archives that might have shared the same object
> directory.
that's what i loosely meant by 'manageability problems'.
I mentioned one solution earlier: to make the repository object an
immutable file (the +i flag on the inode) - it really wants to be
immutable after all. That would solve a whole range of 'accidental
corruption' issues.
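for reference, a minimal C sketch of what setting +i amounts to - it is
the same flags ioctl that chattr(1) uses, and flipping FS_IMMUTABLE_FL
requires CAP_LINUX_IMMUTABLE (i.e. effectively root), which is exactly
the catch that comes up later in the thread:

	/* make_immutable.c - sketch only; error handling kept minimal */
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <linux/fs.h>

	int main(int argc, char **argv)
	{
		int flags, fd;
		if (argc != 2)
			return 1;
		fd = open(argv[1], O_RDONLY);
		if (fd < 0 || ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
			return perror("open/getflags"), 1;
		flags |= FS_IMMUTABLE_FL;	/* the chattr +i bit */
		if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
			return perror("setflags"), 1;
		return close(fd) ? 1 : 0;
	}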
Another solution (suggested by Christer Weinigel) was to enforce
immutability by making it owned by another user/group (git:git or
whatever).
but having a binary compressed format is 'soft immutability', done
cleverly.
Ingo
* Re: Index/hash order
2005-04-13 20:02 ` Index/hash order Ingo Molnar
@ 2005-04-13 20:07 ` H. Peter Anvin
2005-04-13 20:15 ` Ingo Molnar
2005-04-13 20:15 ` Index/hash order Linus Torvalds
0 siblings, 2 replies; 23+ messages in thread
From: H. Peter Anvin @ 2005-04-13 20:07 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, git
Ingo Molnar wrote:
>
> that's what i loosely meant by 'manageability problems'.
>
> I mentioned one solution earlier: to make the repository object an
> immutable file (the +i flag on the inode) - it really wants to be
> immutable after all. That would solve a whole range of 'accidental
> corruption' issues.
>
I think abusing the immutable bit will quickly descend into the same
rathole that makes u-w often useless. u-w will actually be preserved
by more tools -- simply because they know about it -- than +i.
Either which way, it feels to me that this idea has already been ruled
out, so it's probably pointless to keep debating just exactly what we're
not actually going to do.
-hpa
* Re: Index/hash order
2005-04-13 20:07 ` H. Peter Anvin
@ 2005-04-13 20:15 ` Ingo Molnar
2005-04-13 20:18 ` Ingo Molnar
2005-04-13 21:04 ` Index/hash order Linus Torvalds
2005-04-13 20:15 ` Index/hash order Linus Torvalds
1 sibling, 2 replies; 23+ messages in thread
From: Ingo Molnar @ 2005-04-13 20:15 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Linus Torvalds, git
* H. Peter Anvin <hpa@zytor.com> wrote:
> >that's what i loosely meant by 'manageability problems'.
> >
> >I mentioned one solution earlier: to make the repository object an
> >immutable file (the +i flag on the inode) - it really wants to be
> >immutable after all. That would solve a whole range of 'accidental
> >corruption' issues.
> >
>
> I think abusing the immutable bit will quickly descend into the same
> rathole that makes u-w often useless. u-w will actually be preserved
> by more tools -- simply because they know about it -- than +i.
well, the 'owned by another user' solution is valid though, and doesn't
have this particular problem. (We've got a secure multiuser OS, so we
might as well use it to protect the DB against corruption.)
> Either which way, it feels to me that this idea has already been ruled
> out, so it's probably pointless to keep debating just exactly what
> we're not actually going to do.
(even if it sounds stupid, i keep discussing decisions that were made for
reasons i cannot fully agree with (yet), even if i happen to agree with
the net decision. It's all technological arguments, so it's not like
there's anything fuzzy about any of these issues.)
Ingo
* Re: Index/hash order
2005-04-13 20:15 ` Ingo Molnar
@ 2005-04-13 20:18 ` Ingo Molnar
2005-04-13 20:21 ` Ingo Molnar
2005-04-13 21:04 ` Index/hash order Linus Torvalds
1 sibling, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2005-04-13 20:18 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Linus Torvalds, git
* Ingo Molnar <mingo@elte.hu> wrote:
> > I think abusing the immutable bit will quickly descend into the same
> > rathole that makes u-w often useless. u-w will actually be preserved
> > by more tools -- simply because they know about it -- than +i.
>
> well, the 'owned by another user' solution is valid though, and doesn't
> have this particular problem. (We've got a secure multiuser OS, so we
> might as well use it to protect the DB against corruption.)
but ... this variant doesn't have any 'wow' feeling to it either, and it
clearly brings in a number of other limitations. I might as well shut up
until i can suggest something obviously superior :)
Ingo
* Re: Index/hash order
2005-04-13 20:18 ` Ingo Molnar
@ 2005-04-13 20:21 ` Ingo Molnar
2005-04-13 20:26 ` Updated base64 patches H. Peter Anvin
0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2005-04-13 20:21 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Linus Torvalds, git
* Ingo Molnar <mingo@elte.hu> wrote:
> > > I think abusing the immutable bit will quickly descend into the same
> > > rathole that makes u-w often useless. u-w will actually be preserved
> > > by more tools -- simply because they know about it -- than +i.
> >
> > well, the 'owned by another user' solution is valid though, and doesn't
> > have this particular problem. (We've got a secure multiuser OS, so we
> > might as well use it to protect the DB against corruption.)
>
> but ... this variant doesn't have any 'wow' feeling to it either, and
> it clearly brings in a number of other limitations. I might as well
> shut up until i can suggest something obviously superior :)
i think the killer argument is compression. A 2 GB compressed repository
will be a hard sell already; 4 GB is pretty much out of the question. And
once we accept that we have to have _some_ form of compression, it's
Linus' scheme that wins.
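(to make the 'soft immutability' part concrete: a minimal sketch of a
zlib-deflating object writer - the function name and error handling are
made up, and the real object format also puts a small header in front
of the data; link with -lz:)

	#include <stdio.h>
	#include <stdlib.h>
	#include <zlib.h>

	/* store an object deflated, so no tool can edit it in place */
	int write_compressed(const char *path, const void *buf, uLong len)
	{
		uLongf zlen = compressBound(len);
		unsigned char *zbuf = malloc(zlen);
		if (!zbuf)
			return -1;
		/* level 9 - the same compression level as 'gzip -9' */
		if (compress2(zbuf, &zlen, buf, len, Z_BEST_COMPRESSION) != Z_OK)
			return free(zbuf), -1;
		FILE *f = fopen(path, "wb");
		int ok = f && fwrite(zbuf, 1, zlen, f) == zlen;
		if (f)
			fclose(f);
		free(zbuf);
		return ok ? 0 : -1;
	}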
Ingo
* Re: Index/hash order
2005-04-13 20:15 ` Ingo Molnar
2005-04-13 20:18 ` Ingo Molnar
@ 2005-04-13 21:04 ` Linus Torvalds
2005-04-20 7:40 ` enforcing DB immutability Ingo Molnar
1 sibling, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2005-04-13 21:04 UTC (permalink / raw)
To: Ingo Molnar; +Cc: H. Peter Anvin, git
On Wed, 13 Apr 2005, Ingo Molnar wrote:
>
> well, the 'owned by another user' solution is valid though, and doesn't
> have this particular problem. (We've got a secure multiuser OS, so we
> might as well use it to protect the DB against corruption.)
So now you need root to set up new repositories? No thanks.
Linus
* enforcing DB immutability
2005-04-13 21:04 ` Index/hash order Linus Torvalds
@ 2005-04-20 7:40 ` Ingo Molnar
2005-04-20 7:49 ` Ingo Molnar
0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2005-04-20 7:40 UTC (permalink / raw)
To: Linus Torvalds; +Cc: H. Peter Anvin, git
* Linus Torvalds <torvalds@osdl.org> wrote:
> On Wed, 13 Apr 2005, Ingo Molnar wrote:
> >
> > well, the 'owned by another user' solution is valid though, and doesn't
> > have this particular problem. (We've got a secure multiuser OS, so we
> > might as well use it to protect the DB against corruption.)
>
> So now you need root to set up new repositories? No thanks.
yeah, it's a bit awkward to protect uncompressed repositories - but it
will need some sort of kernel enforcement. (if userspace finds out the
DB contains uncompressed blobs, it _will_ try to use them.)
(perhaps having an in-kernel GIT-like versioned filesystem would help -
but that brings up the same 'I have to be root' issues. The FS will
enforce the true immutability of objects.)
perhaps having a new 'immutable hardlink' feature in the Linux VFS would
help? I.e. a hardlink that can only be followed read-only, and can be
removed, but cannot be chmod-ed into a writable hardlink. That i think
would be a large enough barrier for editors/build-tools not to play the
tricks they already do that make 'readonly' files virtually
meaningless.
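to see why plain hardlinks are not enough today: permissions live on
the shared inode, so a chmod through the working-tree name unprotects
the DB object as well. A demo sketch (the paths are made up and their
directories assumed to exist):

	#include <stdio.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <sys/stat.h>

	int main(void)
	{
		const char *db = "objects/obj";	  /* made-up DB object */
		const char *wt = "worktree/file"; /* made-up checkout  */
		int fd = open(db, O_WRONLY | O_CREAT | O_TRUNC, 0444);
		if (fd < 0 || write(fd, "data\n", 5) != 5 || close(fd) ||
		    link(db, wt))
			return perror("setup"), 1;
		chmod(wt, 0644);  /* "just make my checkout writable" */
		struct stat st;
		stat(db, &st);	  /* now reports mode 0644, nlink 2:
				     the DB object is writable too  */
		printf("db mode %04o, nlink %u\n",
		       (unsigned)(st.st_mode & 07777),
		       (unsigned)st.st_nlink);
		return 0;
	}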
Ingo
* Re: enforcing DB immutability
2005-04-20 7:40 ` enforcing DB immutability Ingo Molnar
@ 2005-04-20 7:49 ` Ingo Molnar
2005-04-20 7:53 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Ingo Molnar @ 2005-04-20 7:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: H. Peter Anvin, git
* Ingo Molnar <mingo@elte.hu> wrote:
> perhaps having a new 'immutable hardlink' feature in the Linux VFS
> would help? I.e. a hardlink that can only be followed read-only, and
> can be removed, but cannot be chmod-ed into a writable hardlink. That i
> think would be a large enough barrier for editors/build-tools not to
> play the tricks they already do that make 'readonly' files virtually
> meaningless.
immutable hardlinks have the following advantage: a hardlink by design
hides the information about where the link comes from. So even if an
editor wanted to play stupid games and override the immutability - it
doesn't know where the DB object is. (sure, it could find it if it
wanted to, but that needs real messing around - editors won't do _that_)
i think this might work.
(the current chattr +i flag isn't quite what we need though because it
works on the inode, and it's also a root-only feature so it puts us back
to square one. What would be needed is an immutability flag on
hardlinks, settable by unprivileged users.)
Ingo
* Re: enforcing DB immutability
2005-04-20 7:49 ` Ingo Molnar
@ 2005-04-20 7:53 ` Ingo Molnar
2005-04-20 8:58 ` Chris Wedgwood
2005-04-20 14:57 ` Nick Craig-Wood
2005-04-27 8:15 ` Wout
2 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2005-04-20 7:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: H. Peter Anvin, git
* Ingo Molnar <mingo@elte.hu> wrote:
> > perhaps having a new 'immutable hardlink' feature in the Linux VFS
> > would help? I.e. a hardlink that can only be followed read-only, and
> > can be removed, but cannot be chmod-ed into a writable hardlink. That i
> > think would be a large enough barrier for editors/build-tools not to
> > play the tricks they already do that make 'readonly' files virtually
> > meaningless.
>
> immutable hardlinks have the following advantage: a hardlink by design
> hides the information about where the link comes from. So even if an
> editor wanted to play stupid games and override the immutability - it
> doesn't know where the DB object is. (sure, it could find it if it
> wanted to, but that needs real messing around - editors won't do _that_)
so the only sensible thing the editor/tool can do when it wants to
change the file is precisely what we want: it will copy the hardlinked
file's contents to a new file, and will replace the old file with the
new file - a copy on write. No accidental corruption of the DB's
contents.
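in C the dance looks like this - rename() just swaps the directory
entry, so the hardlinked DB object keeps its inode and contents (a
sketch; the '.new' temp name and error handling are simplified):

	#include <stdio.h>
	#include <unistd.h>
	#include <fcntl.h>

	int replace_file(const char *path, const void *buf, size_t len)
	{
		char tmp[4096];
		snprintf(tmp, sizeof(tmp), "%s.new", path);
		int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0644);
		if (fd < 0)
			return -1;
		if (write(fd, buf, len) != (ssize_t)len || close(fd) ||
		    rename(tmp, path)) {   /* old inode's nlink drops;
					      the DB copy is untouched */
			unlink(tmp);
			return -1;
		}
		return 0;
	}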
(another in-kernel VFS solution would be to enforce that the file's
name always matches the sha1 hash. So if someone edits a DB object it
will automatically change its name. But this is complex, probably cannot
be done atomically, and brings up other problems as well.)
Ingo
* Re: enforcing DB immutability
2005-04-20 7:53 ` Ingo Molnar
@ 2005-04-20 8:58 ` Chris Wedgwood
0 siblings, 0 replies; 23+ messages in thread
From: Chris Wedgwood @ 2005-04-20 8:58 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, H. Peter Anvin, git
On Wed, Apr 20, 2005 at 09:53:20AM +0200, Ingo Molnar wrote:
> so the only sensible thing the editor/tool can do when it wants to
> change the file is precisely what we want: it will copy the
> hardlinked file's contents to a new file, and will replace the old
> file with the new file - a copy on write. No accidental corruption
> of the DB's contents.
editors that have SCM smarts and know about a file's different states
can do this.
i really like the way this works under BK, btw --- files are RO until i
do the magic thing, which does a 'bk edit', and i can then do
checkins or similar as needed (this assumes you can do per-file
deltas).
* Re: enforcing DB immutability
2005-04-20 7:49 ` Ingo Molnar
2005-04-20 7:53 ` Ingo Molnar
@ 2005-04-20 14:57 ` Nick Craig-Wood
2005-04-27 8:15 ` Wout
2 siblings, 0 replies; 23+ messages in thread
From: Nick Craig-Wood @ 2005-04-20 14:57 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, H. Peter Anvin, git
On Wed, Apr 20, 2005 at 09:49:48AM +0200, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
>
> > perhaps having a new 'immutable hardlink' feature in the Linux VFS
> > would help? I.e. a hardlink that can only be followed read-only, and
> > can be removed, but cannot be chmod-ed into a writable hardlink. That i
> > think would be a large enough barrier for editors/build-tools not to
> > play the tricks they already do that make 'readonly' files virtually
> > meaningless.
>
> immutable hardlinks have the following advantage: a hardlink by design
> > hides the information about where the link comes from. So even if an
> > editor wanted to play stupid games and override the immutability - it
> > doesn't know where the DB object is. (sure, it could find it if it
> > wanted to, but that needs real messing around - editors won't do _that_)
This has already been implemented for the Linux vserver project. Take
a look at the patch here:
http://vserver.13thfloor.at/Experimental/patch-2.6.11.7-vs1.9.5.x5.diff.bz2
(It's not split out, but search for IMMUTABLE and you'll see what I mean.)
It implements immutable linkage invert, which basically allows people
to delete hardlinks to immutable files, but not do anything else to
them. It uses another bit out of the attributes to "invert" the
immutability of the linkage of immutable files.
It's used in the vserver project so that individual vservers (which are
basically just fancy chroots) can share libraries, binaries and hence
memory, can't muck each other up, but can upgrade their libs/binaries.
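Conceptually the check is simple; a sketch of the semantics described
above (the flag names are made up here - this is an illustration, not
the actual vserver code):

	#include <stdbool.h>

	#define FL_IMMUTABLE 0x1  /* contents may not change          */
	#define FL_IUNLINK   0x2  /* made-up name: "linkage inverted" */

	/* writes are always refused for immutable files */
	static bool may_write(unsigned flags)
	{
		return !(flags & FL_IMMUTABLE);
	}

	/* unlink is refused too, unless the invert bit says the
	   *linkage* is exempted from the immutability */
	static bool may_unlink(unsigned flags)
	{
		if (flags & FL_IMMUTABLE)
			return (flags & FL_IUNLINK) != 0;
		return true;
	}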
--
Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick
* Re: enforcing DB immutability
2005-04-20 7:49 ` Ingo Molnar
2005-04-20 7:53 ` Ingo Molnar
2005-04-20 14:57 ` Nick Craig-Wood
@ 2005-04-27 8:15 ` Wout
2 siblings, 0 replies; 23+ messages in thread
From: Wout @ 2005-04-27 8:15 UTC (permalink / raw)
To: Ingo Molnar; +Cc: git
On Wed, Apr 20, 2005 at 09:49:48AM +0200, Ingo Molnar wrote:
>
> * Ingo Molnar <mingo@elte.hu> wrote:
>
> > perhaps having a new 'immutable hardlink' feature in the Linux VFS
> > would help? I.e. a hardlink that can only be followed read-only, and
> > can be removed, but cannot be chmod-ed into a writable hardlink. That i
> > think would be a large enough barrier for editors/build-tools not to
> > play the tricks they already do that make 'readonly' files virtually
> > meaningless.
>
> immutable hardlinks have the following advantage: a hardlink by design
> > hides the information about where the link comes from. So even if an
> > editor wanted to play stupid games and override the immutability - it
> > doesn't know where the DB object is. (sure, it could find it if it
> > wanted to, but that needs real messing around - editors won't do _that_)
>
> i think this might work.
>
> (the current chattr +i flag isn't quite what we need though because it
> works on the inode, and it's also a root-only feature so it puts us back
> to square one. What would be needed is an immutability flag on
> hardlinks, settable by unprivileged users.)
>
> Ingo
Slightly off-topic for this list. Apologies to those offended.
Would a filesystem that allows sharing of blocks between inodes
be useful here? Each block would need a reference count (refco).
Writing a block would be impossible once refco > 1. If someone
attempts to write to such a block, a new block is allocated for
that particular inode and the refco of the original is decreased.
Next to this there would have to be a clone_file() function:
clone_file(src-file, dst-file, mode)
This function would create file dst-file with a new inode that
references the blocks belonging to src-file (increasing the
blocks' reference counts). The owner/group of dst-file are the
caller, not the owner of src-file.
Things to check for are:
- read permissions for src-file
- write permissions for dst-file
- are src-file and dst-file in the same filesystem (if not,
one could implement copy)
- ...?
Suppose I have a file foo:
foo -> inode1(blk1[1], blk2[1], blk3[1], blk4[1])
The [n] value on the blocks is the reference count.
I now call clone_file("foo", "bar", 0644):
foo -> inode1(blk1[2], blk2[2], blk3[2], blk4[2])
bar -> inode2(blk1[2], blk2[2], blk3[2], blk4[2])
Next I modify blk2 of bar (write):
foo -> inode1(blk1[2], blk2[1], blk3[2], blk4[2])
bar -> inode2(blk1[2], blk5[1], blk3[2], blk4[2])
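A toy user-space model of those transitions (purely illustrative,
nothing here touches a real filesystem):

	#include <stdio.h>

	#define NBLOCKS 16
	static int refco[NBLOCKS];  /* per-block reference counts */
	static int next_free;

	static int alloc_block(void)
	{
		refco[next_free] = 1;
		return next_free++;
	}

	/* clone_file(): dst shares src's blocks, bumping each refco */
	static void clone_blocks(const int *src, int *dst, int n)
	{
		for (int i = 0; i < n; i++) {
			dst[i] = src[i];
			refco[dst[i]]++;
		}
	}

	/* write block i of a file: copy-on-write if it is shared */
	static void write_block(int *file, int i)
	{
		if (refco[file[i]] > 1) {
			refco[file[i]]--;	  /* leave shared copy */
			file[i] = alloc_block();  /* private new block */
		}
	}

	int main(void)
	{
		int foo[4], bar[4];
		for (int i = 0; i < 4; i++)
			foo[i] = alloc_block();
		clone_blocks(foo, bar, 4);  /* every refco is now 2  */
		write_block(bar, 1);	    /* bar gets a fresh blk5 */
		printf("foo: %d %d %d %d\n", foo[0], foo[1], foo[2], foo[3]);
		printf("bar: %d %d %d %d\n", bar[0], bar[1], bar[2], bar[3]);
		return 0;
	}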
I see the following uses:
- Checking out a tree of (uncompressed) files with git could be
done using the clone_file() call on each file. This means no
extra disk space is used unless files are edited later.
- Easy way to freeze files for backups. A database (mysql, ...)
could bring its files into an acceptable state, call clone_file()
on them and proceed with its work.
- It could be used to protect user files from external tampering.
Someone mentioned the problems with malware killing his files.
The impact of this could be reduced by having a script that did
a clone_file() on everything as root periodically. If files are
deleted, root would have a backup.
Notes:
- Small changes to files would probably cause all the blocks to
be copied as programs (editors) usually write out the complete
file.
- I don't know anything about implementing filesystems so all of
the above could be complete nonsense.
- The idea isn't mine, I've come across this before under the name
of 'snapshot filesystems' and I think it was patented. I've never
heard of anyone doing this for individual files though.
Wout
* Re: Index/hash order
2005-04-13 20:07 ` H. Peter Anvin
2005-04-13 20:15 ` Ingo Molnar
@ 2005-04-13 20:15 ` Linus Torvalds
1 sibling, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2005-04-13 20:15 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Ingo Molnar, git
On Wed, 13 Apr 2005, H. Peter Anvin wrote:
>
> Either which way, it feels to me that this idea has already been ruled
> out, so it's probably pointless to keep debating just exactly what we're
> not actually going to do.
Hey, isn't that how most discussions progress? ;)
I don't mind alternatives per se. I'm just lazy. I came up with one
solution to the issues I perceived, and I like that one. But dammit, if
somebody comes up with something _clearly_ superior, I'll just bow down in
your general direction, and promptly implement that.
Linus
* Re: Index/hash order
[not found] ` <20050413182909.GA25221@elte.hu>
[not found] ` <Pine.LNX.4.58.0504131144160.4501@ppc970.osdl.org>
@ 2005-04-13 20:28 ` Baruch Even
1 sibling, 0 replies; 23+ messages in thread
From: Baruch Even @ 2005-04-13 20:28 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, H. Peter Anvin, git
Ingo Molnar wrote:
> with a plaintext repository we could do the 'hardlink trick' (which
> brings in other manageability problems and limitations but is at least a
> partially good idea), which would make the working tree and the
> repository share the same inode in most cases.
>
> While in the compressed case we'd have a separate compressed inode
> (taking up RAM with all its contents) and the working directory inode
> (taking up RAM) - adding up to more RAM than if we only had a single
> inode per object.
>
> furthermore, when generating/destroying large trees (which is quite a
> common thing), a hardlinked solution is faster, as it doesn't create
> 250MB+ of dirty RAM. In some cases (e.g. handling dozens of 'merge
> trees') it's dramatically faster.
You could still get the hardlink benefits by way of a .git/cache that
keeps uncompressed files: keep the files under their hash names, but
uncompressed. It would be easy to find, fully hard-linkable, and it
would keep only the needed files uncompressed while three-year-old
files stay compressed.
You can even save some CPU time by checking whether the file is in the
cache before decompressing it, though it does cost you an extra disk
access to see if it's there or not. If you repeat the operation often
enough you'll have the uncompressed version in the cache most of the
time anyway.
Clear the cache weekly or so to avoid stale files from an ancient version.
Baruch
* Re: Index/hash order
[not found] ` <Pine.LNX.4.58.0504131008500.4501@ppc970.osdl.org>
@ 2005-04-13 21:40 ` Florian Weimer
2005-04-13 22:11 ` Linus Torvalds
0 siblings, 1 reply; 23+ messages in thread
From: Florian Weimer @ 2005-04-13 21:40 UTC (permalink / raw)
To: Linus Torvalds; +Cc: H. Peter Anvin, Ingo Molnar, git
* Linus Torvalds:
> - I want things to distribute well. This means that it has to be based
> on a "append data" model, where historical data never changes, and you
> only append on top of it (either by adding totally new files, or by
> just letting the files grow).
Yes, I think this is something which can easily dominate the choice of
data structure.
> This works in a forward-delta environment (which is fundamentally based
> on the notion of "we know the old version, we're adding new stuff on
> top of it"), but does _not_ work in the backwards "we keep the
> old history as a delta against the new" model.
Forward deltas don't have to be terribly inefficient. You can get
O(log n) access to revision n fairly easily, using the trick described
here:
<http://svn.collab.net/repos/svn/trunk/notes/skip-deltas>
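One way to see the O(log n) bound: if revision r is stored as a delta
against r with its lowest set bit cleared (a Fenwick-tree-style base
choice - my reading of the scheme, not necessarily Subversion's exact
rule), then reconstructing r touches at most popcount(r) deltas:

	#include <stdio.h>

	static void print_chain(unsigned r)
	{
		printf("rev %u reconstructs via:", r);
		while (r) {
			printf(" %u", r);
			r &= r - 1;  /* clear lowest set bit = base */
		}
		printf(" 0\n");      /* rev 0 is the empty file */
	}

	int main(void)
	{
		print_chain(8000);   /* popcount(8000) = 6 deltas */
		return 0;
	}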
I've run a few tests, just to get some numbers on the overhead
involved. I used the last ~8,000 changesets from the BKCVS kernel
repository. A cold-cache checkout takes about
250 seconds on my laptop. I don't have git numbers, but a mere copy
of the kernel tree needs 40 seconds.
For the hot-cache case, the difference is 140 seconds vs. 2.5 seconds
(or 6 seconds with decompression).
Uh-oh. I wouldn't have imagined the difference would be *that*
dramatic. The file system layer is *fast*.
Subversion's delta implementation is not a speed demon (it handles
arbitrarily large files, which increases complexity significantly and
slows things down, compared to simpler in-memory algorithms), but it
will be very hard to come even close to the 2.5 seconds.
On the storage front, we have 220 MB for the skip deltas vs. 106 MB
for pure deltas-to-previous vs. 1.1 GB for uncompressed files
(directories are always delta-compressed, so to speak[1]). In the
first two cases, the first revision in the repository is deltaed
against /dev/null and is thus itself compressed, in case you think
the numbers are suspiciously low.
1. AFAICS, you can't really avoid that if you want to track file
identity information without introducing arbitrary file IDs.
* Re: Index/hash order
2005-04-13 21:40 ` Florian Weimer
@ 2005-04-13 22:11 ` Linus Torvalds
2005-04-13 22:48 ` Florian Weimer
2005-04-14 7:04 ` Ingo Molnar
0 siblings, 2 replies; 23+ messages in thread
From: Linus Torvalds @ 2005-04-13 22:11 UTC (permalink / raw)
To: Florian Weimer; +Cc: H. Peter Anvin, Ingo Molnar, git
On Wed, 13 Apr 2005, Florian Weimer wrote:
>
> I've run a few tests, just to get some numbers on the overhead
> involved. I used the last ~8,000 changesets from the BKCVS kernel
> repository. A cold-cache checkout takes about
> 250 seconds on my laptop. I don't have git numbers, but a mere copy
> of the kernel tree needs 40 seconds.
I will bet you that a git checkout is _faster_ than a kernel source tree
copy. The time will be dominated by the IO costs (in particular the read
costs), and the IO costs are lower thanks to compression. So I think that
the cold-cache case will beat your 40 seconds by a clear margin. It
generally compresses to half the size, so 20 seconds is not impossible
(although seek costs would tend to stay constant, so I'd expect it to be
somewhere in between the two).
> For the hot-cache case, the difference is 140 seconds vs. 2.5 seconds
> (or 6 seconds with decompression).
>
> Uh-oh. I wouldn't have imagined the difference would be *that*
> dramatic. The file system layer is *fast*.
Did I mention that I designed git for speed?
Yes. The whole damn design is really about performance, distribution, and
built-in integrity checking.
> On the storage front, we have 220 MB for the skip deltas vs. 106 MB
> for pure deltas-to-previous vs. 1.1 GB for uncompressed files
> (directories are always delta-compressed, so to speak[1]).
That's actually pretty encouraging. Your 1.1GB number implies to me that a
compressed file setup should be about half that, which in turn says that
the cost of full-file is not at all outrageous. Sure, it's 2-3 times
larger than your skip deltas, but considering that the performance is
about fifty times faster (and I can do distributed stuff without any
locking synchronization and you can't), that's a tradeoff I'm more than
happy with.
Or maybe I misunderstood what you were comparing?
Of course, the numbers will all depend on how the history looks etc, so
this is all pretty much just guidelines.
Linus
* Re: Index/hash order
2005-04-13 22:11 ` Linus Torvalds
@ 2005-04-13 22:48 ` Florian Weimer
2005-04-14 7:04 ` Ingo Molnar
1 sibling, 0 replies; 23+ messages in thread
From: Florian Weimer @ 2005-04-13 22:48 UTC (permalink / raw)
To: Linus Torvalds; +Cc: H. Peter Anvin, Ingo Molnar, git
* Linus Torvalds:
> I will bet you that a git checkout is _faster_ than a kernel source tree
> copy. The time will be dominated by the IO costs (in particular the read
> costs), and the IO costs are lower thanks to compression. So I think that
> the cold-cache case will beat your 40 seconds by a clear margin. It
> generally compresses to half the size, so 20 seconds is not impossible
> (although seek costs would tend to stay constant, so I'd expect it to be
> somewhere in between the two).
It's indeed slightly faster (34 seconds). The hot-cache case is about
6 seconds. Still okay.
However, I should redo these tests with a real git. The numbers could
be quite different because seek overhead is a bit hard to predict.
Which version should I try?
> That's actually pretty encouraging. Your 1.1GB number implies to me that a
> compressed file setup should be about half that, which in turn says that
> the cost of full-file is not at all outrageous.
I usually try to avoid the typical O(f(n)) fallacy because constant
factors do matter in practice. But the way you put it -- maybe delta
compression isn't worth the complexity after all. At least I'm
beginning to have doubts.
Especially since the same Subversion repository, stored by the
Berkeley DB backend, requires a whopping 1.3 GB of disk space.
> Or maybe I misunderstood what you were comparing?
My estimates only cover file data, not metadata. Based on the
Subversion dumps, it might be possible to get some rough estimates for
the cost of storing directory information. What is the average size
of a directory blob? Is it true that for each tree revision, you need
to store a new directory blob for each directory which indirectly
contains a modified file?
Does your 50% estimate include wasted space due to the file system
block size?
* Re: Index/hash order
2005-04-13 22:11 ` Linus Torvalds
2005-04-13 22:48 ` Florian Weimer
@ 2005-04-14 7:04 ` Ingo Molnar
2005-04-14 10:50 ` cache-cold repository performance Ingo Molnar
1 sibling, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2005-04-14 7:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Florian Weimer, H. Peter Anvin, git
* Linus Torvalds <torvalds@osdl.org> wrote:
> > I've run a few tests, just to get some numbers on the overhead
> > involved. I used the last ~8,000 changesets from the BKCVS kernel
> > repository. A cold-cache checkout takes about
> > 250 seconds on my laptop. I don't have git numbers, but a mere copy
> > of the kernel tree needs 40 seconds.
>
> I will bet you that a git checkout is _faster_ than a kernel source
> tree copy. The time will be dominated by the IO costs (in particular
> the read costs), and the IO costs are lower thanks to compression. So
> I think that the cold-cache case will beat your 40 seconds by a clear
> margin. It generally compresses to half the size, so 20 seconds is not
> impossible (although seek costs would tend to stay constant, so I'd
> expect it to be somewhere in between the two).
i'd be surprised if it was twice as fast - cache-cold linear checkouts
are _seek_ limited, and it doesn't matter whether, after a 1-2 msec
track-to-track disk seek, the DMA engine spends another 30 microseconds
DMA-ing 60K of uncompressed data instead of 30K compressed... (there are
other factors, but this is the main thing.)
Ingo
* cache-cold repository performance
2005-04-14 7:04 ` Ingo Molnar
@ 2005-04-14 10:50 ` Ingo Molnar
0 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2005-04-14 10:50 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Florian Weimer, H. Peter Anvin, git
* Ingo Molnar <mingo@elte.hu> wrote:
> i'd be surprised if it was twice as fast - cache-cold linear checkouts
> are _seek_ limited, and it doesn't matter whether, after a 1-2 msec
> track-to-track disk seek, the DMA engine spends another 30 microseconds
> DMA-ing 60K of uncompressed data instead of 30K compressed... (there are
> other factors, but this is the main thing.)
i've benchmarked cache-cold compressed vs. uncompressed performance, to
shed some more light on the performance differences between flat and
compressed repositories.
i did a lot of testing, and i primarily concentrated on being able to
_trust_ the benchmark results, not to generate some quick numbers. The
major problem was that the timing of the reads associated with 'checking
out a large tree' is very unstable, even on a completely isolated
test system with very common (and predictable) IO hardware.
the content i tested was a vanilla 2.6.10 kernel tree, with 19042 files
in it, taking 246 MB uncompressed, and 110 MB compressed (via gzip -9).
Average file size is 13.2 KB uncompressed, 5.9 KB compressed.
Firstly, the timings are very sensitive to the way the tree was created.
To have a 'fair' on-disk layout the trees have to be created in an
identical fashion: e.g. it is not valid to copy the uncompressed tree
and run gzip over it - that will create a 'sparse' on-disk layout
penalizing the compressed layout and making it 30% slower than the
uncompressed layout! I first created the two trees, then i "cp -a"-ed
them over into a new directory one after the other, so that they end up
in similar on-disk positions as well. I also created 2 more pairs of such
trees to make sure disk layout is fair.
all timings were taken fresh after reboot, on a UP 1 GB RAM Athlon64
3200+, using a large, top of the line IDE disk. The kernel was
2.6.12-rc2, the filesystem was ext3 with enough free space to not be
fragmented, and both noatime and nodiratime were specified so that no
write activity whatsoever occurs during the 'checkout'.
the operation timed was a simple:
time find . -type f | xargs cat > /dev/null
done in the root of the given tree. This generates the very same
readonly IO pattern for each test. I've run the tests 10 times (i.e.
have done 10 fresh reboots), but after every reboot i permutated the
order of trees tested - to make sure there is no interaction between
trees. (there was no interaction)
here are the raw numbers, elapsed real time in seconds:
flat-1: 29.7 29.5 29.4 29.4 29.5 29.5 29.7 29.6 29.4 29.6 29.5 29.4: 29.5
gzip-1: 41.2 40.9 40.7 40.7 40.5 41.7 41.0 40.3 40.6 40.8 40.8 40.9: 40.8
flat-2: 28.0 28.2 27.7 27.9 27.8 27.9 27.7 27.9 27.9 28.1 27.9 28.0: 27.9
gzip-2: 27.2 27.4 27.4 27.2 27.2 27.2 27.2 27.2 27.1 27.3 27.2 27.4: 27.2
flat-3: 27.0 27.8 27.6 27.7 27.8 27.8 27.8 27.7 27.8 27.6 27.8 27.8: 27.6
gzip-3: 25.8 26.8 26.6 26.5 26.5 26.5 26.6 26.4 26.5 26.7 26.6 26.7: 26.5
The final column is the average. (Standard deviation is below 0.1 sec,
less than 0.3%.)
flat-1 is the original tree, created via tar. gzip-1 is a cp -a copy of
it, per-file compressed afterwards. flat-2 is a cp -a copy of flat-1,
gzip-2 is a cp -a copy of gzip-1. flat-3/gzip-3 are cp -a copies of
flat-2/gzip-2.
note that gzip-1 is ~40% slower due to the 'sparse layout', so its
results approximate a repository with 'bad' file layout. I'd not expect
GIT repositories to have such a layout normally, so we can disregard it.
flat-2/3 and gzip-2/3 can be directly compared. Firstly, the results
show that the on-disk layout cannot be constructed reliably - there's a
1% systematic difference between flat-2 and flat-3, and a 3% systematic
difference between gzip-2 and gzip-3 - both systematic errors are larger
than the 0.5% standard deviation, so they are not measurement errors but
real layout properties of these trees.
the most interesting result is that gzip-2 is 2.5% faster than flat-2,
and gzip-3 is 4% faster than flat-3. These differences are close to the
layout-related systematic error, but slightly above it, so i'd conclude
that a compressed repository is 3% faster on this hardware.
(since these results were in line with my expectations i double-checked
everything again and did another 10 reboot tests - same results.)
conclusion [*]: there's a negligible cache-cold performance hit from
using an uncompressed repository, because cache-cold performance is
dominated by number of seeks, which is almost identical in the two
cases.
Ingo
[*] lots of caveats apply: these weren't flat/compressed GIT
repositories (although they were quite similar), nor was the GIT
workload measured (although the one measured should be quite close to
it).
* Re: enforcing DB immutability
@ 2005-04-20 8:41 linux
2005-04-20 15:57 ` Erik Mouw
2005-04-22 16:10 ` Bill Davidsen
0 siblings, 2 replies; 23+ messages in thread
From: linux @ 2005-04-20 8:41 UTC (permalink / raw)
To: git, linux-kernel; +Cc: mingo
[A discussion on the git list about how to provide a hardlinked file
that *cannot* be modified by an editor, but must be replaced by
a new copy.]
mingo@elte.hu wrote all of:
>>> perhaps having a new 'immutable hardlink' feature in the Linux VFS
>>> would help? I.e. a hardlink that can only be followed read-only, and
>>> can be removed, but cannot be chmod-ed into a writable hardlink. That i
>>> think would be a large enough barrier for editors/build-tools not to
>>> play the tricks they already do that make 'readonly' files virtually
>>> meaningless.
>>
>> immutable hardlinks have the following advantage: a hardlink by design
>> hides the information about where the link comes from. So even if an
>> editor wanted to play stupid games and override the immutability - it
>> doesn't know where the DB object is. (sure, it could find it if it
>> wanted to, but that needs real messing around - editors won't do _that_)
>
> so the only sensible thing the editor/tool can do when it wants to
> change the file is precisely what we want: it will copy the hardlinked
> file's contents to a new file, and will replace the old file with the
> new file - a copy on write. No accidental corruption of the DB's
> contents.
This is not a horrible idea, but it touches on another sore point I've
worried about for a while.
The obvious way to do the above *without* changing anything is just to
remove all write permission from the file. But because I'm the owner, some
piece of software running with my permissions can just decide to change
the permissions back and modify the file anyway. Good old 7th edition
let you give files away, which could have addressed that (chmod a-w; chown
phantom_user), but BSD took that ability away to make accounting work.
The upshot is that, while separate users keep malware from harming the
*system*, if I run a piece of malware, it can blow away every file I
own and make me unhappy. When (notice I'm not saying "if") commercial
spyware for Linux becomes common, it can also read every file I own.
Unless I have root access, Linux is no safer *for me* than Redmondware!
Since I *do* have root access, I often set up sandbox users and try
commercial binaries in that environment, but it's a pain and laziness
often wins. I want a feature that I can wrap in a script, so that I
can run a commercial binary in a nicely restricted environment.
Or maybe I even want to set up a "personal root" level, and run
my normal interactive shells in a slightly restricted environment
(within which I could make a more-restricted world to run untrusted
binaries). Then I could solve the immutable DB issue by having a
"setuid" binary that would make checked-in files unwriteable at my
normal permission level.
Obviously, a fundamental change to the Unix permissions model won't
be available to solve short-term problems, but I thought I'd raise
the issue to get people thinking about longer-term solutions.
* Re: enforcing DB immutability
2005-04-20 8:41 enforcing DB immutability linux
@ 2005-04-20 15:57 ` Erik Mouw
2005-04-22 16:10 ` Bill Davidsen
1 sibling, 0 replies; 23+ messages in thread
From: Erik Mouw @ 2005-04-20 15:57 UTC (permalink / raw)
To: linux; +Cc: git, linux-kernel, mingo
On Wed, Apr 20, 2005 at 08:41:15AM -0000, linux@horizon.com wrote:
> [A discussion on the git list about how to provide a hardlinked file
> that *cannot* be modified by an editor, but must be replaced by
> a new copy.]
Some time ago there was somebody working on copy-on-write links: once
you modify a cow-linked file, the file contents are copied, the file is
unlinked and you can safely work on the new file. It has some horrible
semantics in that the inode number of the opened file changes; I don't
know whether applications are, or should be, aware of that.
Erik
--
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
* Re: enforcing DB immutability
2005-04-20 8:41 enforcing DB immutability linux
2005-04-20 15:57 ` Erik Mouw
@ 2005-04-22 16:10 ` Bill Davidsen
1 sibling, 0 replies; 23+ messages in thread
From: Bill Davidsen @ 2005-04-22 16:10 UTC (permalink / raw)
To: linux; +Cc: git, linux-kernel, mingo
linux@horizon.com wrote:
> [A discussion on the git list about how to provide a hardlinked file
> that *cannot* be modified by an editor, but must be replaced by
> a new copy.]
>
> mingo@elte.hu wrote all of:
>
>>>>perhaps having a new 'immutable hardlink' feature in the Linux VFS
>>>>would help? I.e. a hardlink that can only be followed read-only, and
>>>>can be removed, but cannot be chmod-ed into a writable hardlink. That i
>>>>think would be a large enough barrier for editors/build-tools not to
>>>>play the tricks they already do that make 'readonly' files virtually
>>>>meaningless.
>>>
>>>immutable hardlinks have the following advantage: a hardlink by design
>>>hides the information about where the link comes from. So even if an
>>>editor wanted to play stupid games and override the immutability - it
>>>doesn't know where the DB object is. (sure, it could find it if it
>>>wanted to, but that needs real messing around - editors won't do _that_)
>>
>>so the only sensible thing the editor/tool can do when it wants to
>>change the file is precisely what we want: it will copy the hardlinked
>>file's contents to a new file, and will replace the old file with the
>>new file - a copy on write. No accidental corruption of the DB's
>>contents.
>
>
> This is not a horrible idea, but it touches on another sore point I've
> worried about for a while.
>
> The obvious way to do the above *without* changing anything is just to
> remove all write permission from the file. But because I'm the owner, some
> piece of software running with my permissions can just decide to change
> the permissions back and modify the file anyway. Good old 7th edition
> let you give files away, which could have addressed that (chmod a-w; chown
> phantom_user), but BSD took that ability away to make accounting work.
>
> The upshot is that, while separate users keep malware from harming the
> *system*, if I run a piece of malware, it can blow away every file I
> own and make me unhappy. When (notice I'm not saying "if") commercial
> spyware for Linux becomes common, it can also read every file I own.
>
> Unless I have root access, Linux is no safer *for me* than Redmondware!
>
> Since I *do* have root access, I often set up sandbox users and try
> commercial binaries in that environment, but it's a pain and laziness
> often wins. I want a feature that I can wrap in a script, so that I
> can run a commercial binary in a nicely restricted environment.
>
> Or maybe I even want to set up a "personal root" level, and run
> my normal interactive shells in a slightly restricted environment
> (within which I could make a more-restricted world to run untrusted
> binaries). Then I could solve the immutable DB issue by having a
> "setuid" binary that would make checked-in files unwriteable at my
> normal permission level.
>
> Obviously, a fundamental change to the Unix permissions model won't
> be available to solve short-term problems, but I thought I'd raise
> the issue to get people thinking about longer-term solutions.
chattr +i file
But the real problem is that you expect your editor to be smart enough
to diddle permissions (some aren't) or to create a new file (some don't
do that either).
It sounds as if you're kind of using the wrong tool here, frankly.
You also don't understand hard links: they don't hide anything; the
inode number is there, which is exactly as much information as is in the
original link. And they are lots safer, since you can't wind up with
them pointing to a non-existent file, get them into circular loops, etc.
Okay, YOU probably wouldn't, but believe me, semi-competent users
regularly do these things.
--
-bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me