* Re: Security problem [not found] <200606151709.22752.lan@academsoft.ru> @ 2006-06-16 0:12 ` Junio C Hamano 2006-06-16 2:28 ` Linus Torvalds 0 siblings, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2006-06-16 0:12 UTC (permalink / raw) To: Alexander Litvinov; +Cc: git Alexander Litvinov <lan@academsoft.ru> writes: > Why does not git-checkout check if file content match name of the object ? Good point. We could do a few things: - entry.c:write_entry() could validate after read_sha1_file(). - read_sha1_file() could do the checking; this has performance implications, though. Cloning over git aware protocols validate the objects coming over the wire, so it may make sense to cheat and do the former, so that we do not have to pay the validation cost every time we access any object. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Security problem 2006-06-16 0:12 ` Security problem Junio C Hamano @ 2006-06-16 2:28 ` Linus Torvalds [not found] ` <200606160931.29553.lan@academsoft.ru> 0 siblings, 1 reply; 8+ messages in thread From: Linus Torvalds @ 2006-06-16 2:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: Alexander Litvinov, git On Thu, 15 Jun 2006, Junio C Hamano wrote: > > Alexander Litvinov <lan@academsoft.ru> writes: > > > Why does not git-checkout check if file content match name of the object ? > > Good point. We could do a few things: I missed the original mail. What's the problem? If this is about the remote end lying about the SHA1 name, it's a total non-issue for any of the native protocols, since the native protocols don't actually send SHA1 names at all, they just send the data (and we re-create the SHA1 name on receipt). So there's no way to have the name of an object not match its content, unless you have actual corruption (which is for git-fsck-object to find, not somethign that should slow down any normal operation), or if you use one of the dumb protocols. And if you use the dumb protocols, the data should probably be validated _there_ (by fetch(), rather than anywhere else). And for "rsync", you really don't have much choice apart from doing a full fsck, I suspect. So I don't see the security issue, unless you don't trust the local filesystem, in which case nothing git can do matters at all.. Linus ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <200606160931.29553.lan@academsoft.ru>]
* Re: Security problem [not found] ` <200606160931.29553.lan@academsoft.ru> @ 2006-06-16 2:56 ` Linus Torvalds 2006-06-16 3:54 ` Alexander Litvinov 0 siblings, 1 reply; 8+ messages in thread From: Linus Torvalds @ 2006-06-16 2:56 UTC (permalink / raw) To: Alexander Litvinov; +Cc: Junio C Hamano, git On Fri, 16 Jun 2006, Alexander Litvinov wrote: > > I have found the ability to hack git repo. After this hacking people will > checkout hacked files from the "trusted" commit. Only git-fsck-objects will > complain at this. Right. If you can't trust your local filesystem, you are screwed. git-fsck-objects will notice when somebody has done something bad, but > Why does not git-checkout check if file content match name of the object ? Why would it? It really just slows things down, and if you don't trust your local repo, people can "hack" you much more easily by just generating a _proper_ tree with the _proper_ data, and git checkout checking the SHA1 wouldn't help at all. The way to security lies in using git-fsck-objects, together with an _external_ source of trust. For example, that external source of trust may be a signed tag, or, perhaps even more simply, just by saving off the top commit name on some trusted medium. But you do need a "point of trust" to start with. Without that, it's a lot easier to "hack" a git repo by doing echo 'Hacked file' > a git commit --amend a git prune and now the file "a" has changed to "Hacked file", and even git-fsck-objects can't tell that anything bad happened. (Btw, if you want to _hide_ the fact that "a" now contains "Hacked file", you do so by faking it in the index. You can have the checked-out copy say what it should say - ie "Usual file" - and if you don't want git to show you the difference to HEAD, you edit the .git/index file by hand so that the timestamp, size and inode matches the real SHA1, even though the _contents_ match "Usual file"). See? You do need to trust something. Normally you'd trust your own filesystem, but git certainly supports other forms of trust through either the native support for signed certificates in the form of tags, or any other form of external trust. Linus ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Security problem 2006-06-16 2:56 ` Linus Torvalds @ 2006-06-16 3:54 ` Alexander Litvinov 2006-06-16 5:00 ` Linus Torvalds 0 siblings, 1 reply; 8+ messages in thread From: Alexander Litvinov @ 2006-06-16 3:54 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, git > If you can't trust your local filesystem, you are screwed. You are right, I trust my file system. But if our team had central repo with ssh access to that machine, every developer can hack central repo. Whould git-clone/git-fetch warn me about this ? My own test with (another) local repo says: lan@lan:~/tmp/git/test> git clone 1 2 Generating pack... Done counting 3 objects. Deltifying 3 objects. 100% (3/3) done Total 3, written 3 (delta 0), reused 0 (delta 0) error: git-checkout-index: unable to read sha1 file of a (3609f20ebd357679b111783e8afaf36ec46427f3) It can't checkout object (3609f20ebd357679b111783e8afaf36ec46427f3 is the original file). It seems packed repos are safe from this point. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Security problem 2006-06-16 3:54 ` Alexander Litvinov @ 2006-06-16 5:00 ` Linus Torvalds 2006-06-16 5:37 ` Alexander Litvinov 0 siblings, 1 reply; 8+ messages in thread From: Linus Torvalds @ 2006-06-16 5:00 UTC (permalink / raw) To: Alexander Litvinov; +Cc: Junio C Hamano, git On Fri, 16 Jun 2006, Alexander Litvinov wrote: > > You are right, I trust my file system. But if our team had central repo with > ssh access to that machine, every developer can hack central repo. > > Whould git-clone/git-fetch warn me about this ? Using the native protocol, yes. Using rsync, unless you explicitly fsck the result, no. > It can't checkout object (3609f20ebd357679b111783e8afaf36ec46427f3 is the > original file). It seems packed repos are safe from this point. Well, they may not be "safe" - you just need to work a _lot_ harder to corrupt a pack-file in any interesting manner. And again, git-fsck-objects would pick up any such thing going on. Anyway, what it boils down to is that anybody who has write access to a particular repository can certainly change the repo in "interesting" ways. However, there are various inherent safety valves in place that make it really hard to corrupt on a bigger scale. The first is that git-fsck-objects will definitely find any repository inconsistency, and to get around that, you either have to get around the basic properties of SHA-1 (ie break the hash) _or_ you have to actually change the repository so that it's still a valid repo, just with different content. So let's take a look at those two cases: - if you corrupt the repository, subsequent clones (or even pulls) from the corrupt repository simply won't work if you use the native protocol, because the native protocol doesn't actually trust anything but the actual contents (so if the contents won't match, then neither will the SHA1 names). So the corruption is pretty strictly limited to the _one_ repository that the attacker had write access to. So there's a pretty fundamental "corruption containment" part there. (Side note: there's no question that we might well be able to do better. A _malicious_ server could actually send a corrupt pack, and it's possible that a properly corrupted remote archive could cause even a "good" git-send-pack to just silently send a corrupt pack, so that you'd need to use "git-fsck-objects" on the receiving side to notice that you are missing objects, for example) - if the repository is good (ie fsck is fine), then obviously a "git pull" will also succeed. However, you can't _hide_ the data the way you tried to do: when the receiver checks out the most recent version, it will definitely use the data in the object, there's no way to get the server to serve different data in objects and in the working tree (because the server literally doesn't even send the working tree at all). So you can always convince somebody to pull from an "evil repository", and that's no different from committing a bug by mistake. But at least you can't try to hide the bug just in the object store and have it not show up in diffs and in checked-out copies. The latter case is true even with http and rsync, the actual pull event always pulls just the database, never any checked-out state (in fact, the common case is obviously to pull from a bare repository that doesn't even _have_ checked-out state). So you can't hide things in the index or in the checked-out state except in the filesystem that you have direct write access to. But yeah, I actually still personally do a fair number of "git-fsck-objects". I've never found anything that way since very early on (and back then, the real problem was rsync getting objects that weren't reachable), but I still do it. It makes me feel happier. Of course, bugs always happen. But I can pretty much guarantee that git is fundamentally harder to corrupt than most things. We've had git-fsck-cache since April 8th last year (or, put another way, literally since "Day 2" in git terms - it's the eight commit in the whole git history). Git also has an almost total lack of redundant information. There's basically no "duplicate" information in the repository format itself where you could hide something so that it wouldn't be noticed. In a checked-out project, the checked-out state itself is "duplicate information" (and that was where your "attack" tried to hide things), and there's the index (which is actually a much better and subtle place to hide things ;). But neither of them have any life outside of that particular repository. Linus ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Security problem 2006-06-16 5:00 ` Linus Torvalds @ 2006-06-16 5:37 ` Alexander Litvinov 2006-06-16 6:27 ` Linus Torvalds 0 siblings, 1 reply; 8+ messages in thread From: Alexander Litvinov @ 2006-06-16 5:37 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, git > Well, they may not be "safe" - you just need to work a _lot_ harder to > corrupt a pack-file in any interesting manner. And again, git-fsck-objects > would pick up any such thing going on. As it shown in pack-objects.c, each object have stored sha1, almost the same as file rename. > The first is that git-fsck-objects will definitely find any repository > inconsistency, and to get around that, you either have to get around the > basic properties of SHA-1 (ie break the hash) _or_ you have to actually > change the repository so that it's still a valid repo, just with different > content. I still belive SHA-1 is good enouth to hash files - I did not hear about generation reasonable duplicate that can compile and work :-) > - if you corrupt the repository, subsequent clones (or even pulls) from > the corrupt repository simply won't work if you use the native > protocol, because the native protocol doesn't actually trust anything > but the actual contents (so if the contents won't match, then neither > will the SHA1 names). So the corruption is pretty strictly limited to > the _one_ repository that the attacker had write access to. As I understand sent pack file will contains actial SHA-1 of objects. And any hack will be cleary visible. > So there's a pretty fundamental "corruption containment" part there. ... Situation with evil repo is clear to me: you can turst only to trusted commit identified by SHA-1 > But yeah, I actually still personally do a fair number of > "git-fsck-objects". I've never found anything that way since very early on > (and back then, the real problem was rsync getting objects that weren't > reachable), but I still do it. It makes me feel happier. As the result: Always fsck repo after pull/clone ! ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Security problem 2006-06-16 5:37 ` Alexander Litvinov @ 2006-06-16 6:27 ` Linus Torvalds 2006-06-16 8:18 ` Alexander Litvinov 0 siblings, 1 reply; 8+ messages in thread From: Linus Torvalds @ 2006-06-16 6:27 UTC (permalink / raw) To: Alexander Litvinov; +Cc: Junio C Hamano, git On Fri, 16 Jun 2006, Alexander Litvinov wrote: > > > Well, they may not be "safe" - you just need to work a _lot_ harder to > > corrupt a pack-file in any interesting manner. And again, git-fsck-objects > > would pick up any such thing going on. > > As it shown in pack-objects.c, each object have stored sha1, almost the same > as file rename. Yes and no. The index file has the stored sha1 (and in that sense you can do almost the same thing as a file rename by just modifying the index file). But when we actually transfer a pack over from one place to another (ie a clone or a push), we don't even transfer the index file. Instead, the index file gets re-generated at the other end. That's pretty much an on-going theme in most of git - trying to avoid having metadata, if that can instead of calculated directly. So again, a "rsync" or a "http" thing that just gets the index and pack-files directly _as_files_, will actually also download a corrupt file. The git native protocol is much harder to fool. git-fsck-objects actually verifies the pack-files and index files in several ways: - both the pack-file and the index-file actually contain a SHA1 checksum of themselves, so any accidental corruption will be picked up (but if somebody is able to get at the filesystem, they can obviously re-calculate the SHA1 and update the checksum too) - the index file also contains the SHA-1 of the pack-file (and that is then part of the checksum of the index file), again to avoid accidental corruption or mixing of index and pack-files. - fsck checks all of these internal SHA-1 checksums, and verifies basic information (ie number of objects must match etc) - each object in the index file is unpacked, and its SHA-1 is re-calculated and checked against what the index file claimed. So exactly as with individual objects, the pack-files are actually verified, and on (native-mode) transfer, the names of individual files are never actually transferred, rather they are re-calculated from the raw contents at the receiving end. The pack-files then have a few additional sanity-checks of their own that should help pinpoint at least the accidental kind of corruption. But no, the SHA1 checksums of the pack-files are not checked by normal operations. That would be deadly - trying to check the SHA1 hash of a pack-file obviously would involve reading it all in, something normal operations actually try to avoid (normal ops use the index exactly in order to only read the parts they need). Perhaps most importantly, after fsck has checked the SHA-1's of each individual object, it will also do a full reachability check. That, in many ways, is even more important than checking that each object name matches its contents (ie there's no missing history either, and the "tips" of the repository end up basically validating all the rest). So again, the thing is set up so that doing a full fsck actually does a _lot_ of integrity checking. But in the absense of explicit fsck, we do trust the data, even if the actual _transfer_ of data will recalculate SHA-1's. > > - if you corrupt the repository, subsequent clones (or even pulls) from > > the corrupt repository simply won't work if you use the native > > protocol, because the native protocol doesn't actually trust anything > > but the actual contents (so if the contents won't match, then neither > > will the SHA1 names). So the corruption is pretty strictly limited to > > the _one_ repository that the attacker had write access to. > > As I understand sent pack file will contains actial SHA-1 of objects. And any > hack will be cleary visible. No, as mentioned, the actual SHA-1's won't ever be sent, so what happens is that if the repository on the sending side was hacked, the _sending_ side may never even realize it (since it's not necessarily checking the SHA-1's), but the receiving side will only ever see the raw data, and as such, it won't ever even _see_ the "false hidden names", because it will generate a whole new index that purely depends on the data. And maybe that's exactly what you meant - yes, the hack will be clearly visible, because the names will now be the "real" ones. You can't hide things by using a false name. > > So there's a pretty fundamental "corruption containment" part there. > ... > Situation with evil repo is clear to me: you can turst only to trusted commit > identified by SHA-1 Yes. Exactly. And once you have a reason to trust a commit, everything you can reach from that commit is also trustworthy, assuming it passes fsck. IOW, you only really need to trust the head(s) in your repository. > > But yeah, I actually still personally do a fair number of > > "git-fsck-objects". I've never found anything that way since very early on > > (and back then, the real problem was rsync getting objects that weren't > > reachable), but I still do it. It makes me feel happier. > > As the result: Always fsck repo after pull/clone ! Well, even better, try to avoid pulling from untrusted sources in the first place ;) But yes, fsck is actually fairly fast if you do incremental pulls and repack your repository. To help you do this, there's two modes to fsck: there's the "full mode", which goes through _everything_, including pack-files, and there's the "fsck only lose objects", which is the common one. So for example, let's say that you only ever repack your repository locally when it's been "known good" (in fact, repacking in itself will generally find almost all of the problems that fsck can find, since a full repack will obviously do the reachability analysis as part of just the preparatory work). That means that you only ever need to do the quick default "light fsck" after a pull, since an incremental pull (with the native protocol) will have unpacked all the pulled objects. So "fsck after each pull" is not something we do by default, but if you keep your repo fairly packed, doing so manually (or by just scripting things) won't even really slow you down, because it will only ever need to check incrementally - the stuff you've re-packed it doesn't need to check (assuming you can now trust your local filesystem). So git certainly gives you the option to be really anal, and doesn't even make it needlessly hard or expensive, even with large repositories. Linus ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Security problem 2006-06-16 6:27 ` Linus Torvalds @ 2006-06-16 8:18 ` Alexander Litvinov 0 siblings, 0 replies; 8+ messages in thread From: Alexander Litvinov @ 2006-06-16 8:18 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, git > So git certainly gives you the option to be really anal, and doesn't even > make it needlessly hard or expensive, even with large repositories. Thanks for detailed description. Now I can sleep without any worry about my repo :-) ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2006-06-16 8:19 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200606151709.22752.lan@academsoft.ru>
2006-06-16 0:12 ` Security problem Junio C Hamano
2006-06-16 2:28 ` Linus Torvalds
[not found] ` <200606160931.29553.lan@academsoft.ru>
2006-06-16 2:56 ` Linus Torvalds
2006-06-16 3:54 ` Alexander Litvinov
2006-06-16 5:00 ` Linus Torvalds
2006-06-16 5:37 ` Alexander Litvinov
2006-06-16 6:27 ` Linus Torvalds
2006-06-16 8:18 ` Alexander Litvinov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox