* Careful object writing..
@ 2005-05-03 19:15 Linus Torvalds
2005-05-03 19:27 ` Chris Wedgwood
` (3 more replies)
0 siblings, 4 replies; 19+ messages in thread
From: Linus Torvalds @ 2005-05-03 19:15 UTC (permalink / raw)
To: Git Mailing List
I just pushed out the commit that tries to finally actually write the sha1
objects the right way in a shared object directory environment.
I used to be lazy, and just do "O_CREAT | O_EXCL" on the final name, but
that obviously is not very nice when it can result in other people seeing
objects that haven't been fully finalized yet.
So now I do it "right", and create a temporary file in the "top" object
directory, and then when it's all done, I do a "link()" to the final place
and unlink the original.
I also change the permission to 0444 before it gets its final name.
Two notes:
- because the objects all get created initially in .git/objects rather
than in the subdirectory they get moved to, you can't use symlinks
to other filesystems for the 256 object subdirectories. The object
directory has to be one filesystem (but it doesn't have to be the same
one as you actually keep your working ddirectories on, of course)
- The upside of this is that filesystem block allocators should do the
right thing. Instead of spreading the objects out (because they are in
different directories), they should be created together.
Anyway, somebody should double-check the thing. It _should_ now work
correctly over NFS etc too, and everything should be nice and atomic (and
with any half-way decent filesystem, it also means that even if you have a
system crash in the middle, you'll never see half-created objects).
NOTE NOTE NOTE! I have _not_ updated all the helper stuff that also write
objects. So things like "git-http-pull" etc will still write objects
directly into the object directory, and that can cause problems with
shared usage. Same goes for "write_sha1_from_fd()" that rpull.c uses. I
hope somebody will take a look at those issues..
Anyway, at least the really core operations should now really be
"thread-safe" in a shared object directory environment.
Linus
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: Careful object writing.. 2005-05-03 19:15 Careful object writing Linus Torvalds @ 2005-05-03 19:27 ` Chris Wedgwood 2005-05-03 19:47 ` Linus Torvalds 2005-05-03 20:02 ` Daniel Barkalow 2005-05-03 20:00 ` Jan Harkes ` (2 subsequent siblings) 3 siblings, 2 replies; 19+ messages in thread From: Chris Wedgwood @ 2005-05-03 19:27 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List On Tue, May 03, 2005 at 12:15:08PM -0700, Linus Torvalds wrote: > So now I do it "right", and create a temporary file in the "top" > object directory, and then when it's all done, I do a "link()" to > the final place and unlink the original. how is this better than a single rename? i take it there is something fundamental from clue.101 i slept though here? also, if you are *really* paranoid you want to fsync *before* you do the link/unklink or rename --- which is what MTAs do[1] however, that said it *kills* performance and if it's not critical it's really a terrible idea also, shouldn't HEAD (and similar)[2] be updated with a temporary and a rename too? > I also change the permission to 0444 before it gets its final name. cool > NOTE NOTE NOTE! I have _not_ updated all the helper stuff that also > write objects. i thought this was all common code? if it's not maybe now is the time to change that? [1] yes, i know this depends on the fs used and various things and ext3 should be fine, blah blah blah, but not everyone uses ext3 and quite probably not everyone will use git under Linux [2] i didn't check the code as i'm still using BK in places ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Careful object writing.. 2005-05-03 19:27 ` Chris Wedgwood @ 2005-05-03 19:47 ` Linus Torvalds 2005-05-03 19:47 ` Chris Wedgwood 2005-05-03 20:02 ` Daniel Barkalow 1 sibling, 1 reply; 19+ messages in thread From: Linus Torvalds @ 2005-05-03 19:47 UTC (permalink / raw) To: Chris Wedgwood; +Cc: Git Mailing List On Tue, 3 May 2005, Chris Wedgwood wrote: > > how is this better than a single rename? i take it there is something > fundamental from clue.101 i slept though here? A rename will overwrite any old object, which means that you cannot do any collision checks. In contrast, a "link()" will return EEXIST if somebody else raced with you and created a new object, and you can do collision checks instead of overwriting another persons object. > also, if you are *really* paranoid you want to fsync *before* you do > the link/unklink or rename --- which is what MTAs do[1] Me, I refuse to slow down my habits for old filesystems. You can either fsck, or use a logging filesystem. I don't see anybody not using logging filesystems these days, so.. > also, shouldn't HEAD (and similar)[2] be updated with a temporary and > a rename too? Maybe. Much less important, though. > > NOTE NOTE NOTE! I have _not_ updated all the helper stuff that also > > write objects. > > i thought this was all common code? if it's not maybe now is the time > to change that? It is all common code, except: - things like "fetch from another host" will use rsync/wget/xxx to actually get the files. To those programs, we're not talking about git objects, we're just talking "regular files" - rpull.c has a special different routine to write its objects. I don't use it, so.. Anyway, it should be reasonably easily fixable. Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Careful object writing.. 2005-05-03 19:47 ` Linus Torvalds @ 2005-05-03 19:47 ` Chris Wedgwood 2005-05-03 19:56 ` Linus Torvalds 0 siblings, 1 reply; 19+ messages in thread From: Chris Wedgwood @ 2005-05-03 19:47 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List On Tue, May 03, 2005 at 12:47:36PM -0700, Linus Torvalds wrote: > Me, I refuse to slow down my habits for old filesystems. You can > either fsck, or use a logging filesystem. ok, so you're saying everyone use linux ext3 or similar more or less... how about we drop all the objects in one directory then? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Careful object writing.. 2005-05-03 19:47 ` Chris Wedgwood @ 2005-05-03 19:56 ` Linus Torvalds 0 siblings, 0 replies; 19+ messages in thread From: Linus Torvalds @ 2005-05-03 19:56 UTC (permalink / raw) To: Chris Wedgwood; +Cc: Git Mailing List On Tue, 3 May 2005, Chris Wedgwood wrote: > > On Tue, May 03, 2005 at 12:47:36PM -0700, Linus Torvalds wrote: > > > Me, I refuse to slow down my habits for old filesystems. You can > > either fsck, or use a logging filesystem. > > ok, so you're saying everyone use linux ext3 or similar more or > less... No. I'm saying - you can use git-fsck-cache - or you can use a logging filesystem. I happen to use both. > how about we drop all the objects in one directory then? I don't even have directory hashing on, and as mentioned, the logging filesystem is _not_ a requirement. It's just a reality for most of us. Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Careful object writing.. 2005-05-03 19:27 ` Chris Wedgwood 2005-05-03 19:47 ` Linus Torvalds @ 2005-05-03 20:02 ` Daniel Barkalow 1 sibling, 0 replies; 19+ messages in thread From: Daniel Barkalow @ 2005-05-03 20:02 UTC (permalink / raw) To: Chris Wedgwood; +Cc: Linus Torvalds, Git Mailing List On Tue, 3 May 2005, Chris Wedgwood wrote: > i thought this was all common code? if it's not maybe now is the time > to change that? The other versions are getting the tagged compressed object along with the intended hash from an external source. They both verify the result as they go (as opposed to the normal case which takes the uncompressed data and finds the hash, and therefore has to be right). The common part is really "open the file" and "close (and place) the file", which weren't previously sufficiently complex to justify sharing the code. -Daniel *This .sig left intentionally blank* ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Careful object writing.. 2005-05-03 19:15 Careful object writing Linus Torvalds 2005-05-03 19:27 ` Chris Wedgwood @ 2005-05-03 20:00 ` Jan Harkes 2005-05-03 20:11 ` Linus Torvalds 2005-05-03 22:37 ` Linus Torvalds 2005-05-03 23:04 ` Alex Riesen 2005-05-04 4:07 ` [PATCH] Careful object pulling Daniel Barkalow 3 siblings, 2 replies; 19+ messages in thread From: Jan Harkes @ 2005-05-03 20:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List On Tue, May 03, 2005 at 12:15:08PM -0700, Linus Torvalds wrote: > I just pushed out the commit that tries to finally actually write the sha1 > objects the right way in a shared object directory environment. > > I used to be lazy, and just do "O_CREAT | O_EXCL" on the final name, but > that obviously is not very nice when it can result in other people seeing > objects that haven't been fully finalized yet. > > So now I do it "right", and create a temporary file in the "top" object > directory, and then when it's all done, I do a "link()" to the final place > and unlink the original. Annoyingly until this commit, git has been just about the perfect SCM system to run on top of Coda. Almost every other SCM can and will get conflicts on it's repository files, which are pretty much impossible to resolve (just try merging two diverging copies of an RCS archive..) But the only conflicts we ever see with git are when two people create the same SHA1 object. And if the contents are in fact identical this conflict will be trivially resolved. I tried to pull in the latest version of your tree, but it doesn't look like this commit has propagated to rsync.kernel.org yet. Hopefully you will accept a small patch (should be < 5 lines) that makes git work nicely when Coda complains about the cross-directory hardlink without affecting the reliability of using link/unlink on normal filesystems. Jan > > I also change the permission to 0444 before it gets its final name. > > Two notes: > > - because the objects all get created initially in .git/objects rather > than in the subdirectory they get moved to, you can't use symlinks > to other filesystems for the 256 object subdirectories. The object > directory has to be one filesystem (but it doesn't have to be the same > one as you actually keep your working ddirectories on, of course) > > - The upside of this is that filesystem block allocators should do the > right thing. Instead of spreading the objects out (because they are in > different directories), they should be created together. > > Anyway, somebody should double-check the thing. It _should_ now work > correctly over NFS etc too, and everything should be nice and atomic (and > with any half-way decent filesystem, it also means that even if you have a > system crash in the middle, you'll never see half-created objects). > > NOTE NOTE NOTE! I have _not_ updated all the helper stuff that also write > objects. So things like "git-http-pull" etc will still write objects > directly into the object directory, and that can cause problems with > shared usage. Same goes for "write_sha1_from_fd()" that rpull.c uses. I > hope somebody will take a look at those issues.. > > Anyway, at least the really core operations should now really be > "thread-safe" in a shared object directory environment. > > Linus > - > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Careful object writing.. 2005-05-03 20:00 ` Jan Harkes @ 2005-05-03 20:11 ` Linus Torvalds 2005-05-03 20:59 ` Jan Harkes 2005-05-03 22:37 ` Linus Torvalds 1 sibling, 1 reply; 19+ messages in thread From: Linus Torvalds @ 2005-05-03 20:11 UTC (permalink / raw) To: Jan Harkes; +Cc: Git Mailing List On Tue, 3 May 2005, Jan Harkes wrote: > > I tried to pull in the latest version of your tree, but it doesn't look > like this commit has propagated to rsync.kernel.org yet. Hopefully you > will accept a small patch (should be < 5 lines) that makes git work > nicely when Coda complains about the cross-directory hardlink without > affecting the reliability of using link/unlink on normal filesystems. What is it that coda wants to do, and is there some portable way to get there? Is it just that you want to stay within the directory? Or is it any link action that is nasty? What makes resolving renames hard when the file contents are the same? Maybe Coda could just do a trivial resolve of that "conflict" too? Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Careful object writing.. 2005-05-03 20:11 ` Linus Torvalds @ 2005-05-03 20:59 ` Jan Harkes 2005-05-03 22:13 ` Linus Torvalds 0 siblings, 1 reply; 19+ messages in thread From: Jan Harkes @ 2005-05-03 20:59 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List On Tue, May 03, 2005 at 01:11:47PM -0700, Linus Torvalds wrote: > On Tue, 3 May 2005, Jan Harkes wrote: > > I tried to pull in the latest version of your tree, but it doesn't look > > like this commit has propagated to rsync.kernel.org yet. Hopefully you > > will accept a small patch (should be < 5 lines) that makes git work > > nicely when Coda complains about the cross-directory hardlink without > > affecting the reliability of using link/unlink on normal filesystems. > > What is it that coda wants to do, and is there some portable way to get > there? Short summary: rc = link(old, new); if (rc == -1 && errno == EXDEV) rc = rename(old, new); On Coda, the cross-directory link fails, the following cross-directory rename will work fine. On a normal filesystem, if the link fails with EXDEV, the rename will fail with the same. Because our cache consistency model is fairly optimistic, we already have to deal with potential problems with a rename removing an unwanted target. So if we are logging write operations, and the link operation did not return EEXISTS, then the rename will be marked as not having removed any target file. If the target did happen to exist on the server by the time we reintegrate the operations we end up with a reintegration conflict. Longer version: When a server performs conflict resolution it happens on a per-directory basis. So any cross-directory operation already a special case. We cannot guarantee which directory will be resolved first, so if it is the destination of a link or rename the object itself might not exist yet. The advantage of a rename operation is that it contains a reference to both the source and the destination directories. If we don't yet know the renamed object we resolve the source first. That creates the object and allows us to complete the rename operation. However with a link we only have a reference to an object and the directory where the link should be added. But again, the object might not yet exist on all servers. At this point things get a bit more complicated because we don't enforce access based on per object UNIX mode bits, but rely on directory ACLs. So we can't just add a reference to an unknown object in the destination directory because until we know where else this object is located, we can't tell if the user actually is allowed to access the object. > Is it just that you want to stay within the directory? Or is it any link > action that is nasty? We do allow links within the same directory, mostly because that often happens in places like /usr/bin and we know that whenever we encounter the link operation in the resolution log, that the object creation has already been processed. We also know that the new link can't give a user any rights he didn't already have. > What makes resolving renames hard when the file contents are the same? Renames mostly work, there are only a few corner cases left. One is where something is moved up in the directory tree and the source directory is then removed. We end up screwing ourselves because we append the childs logs on removal and the create operation ends up behind the rename operation. But that is a dumb implementation problem. Another issue is of course when someone (validly) hardlinks a file in the same directory and then moves one of the links to another directory. We definitely are not a typical filesystem with UNIX semantics, which is why it is unusual to find an application that seems so well suited for disconnected and weakly connected operation. Jan ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Careful object writing.. 2005-05-03 20:59 ` Jan Harkes @ 2005-05-03 22:13 ` Linus Torvalds 0 siblings, 0 replies; 19+ messages in thread From: Linus Torvalds @ 2005-05-03 22:13 UTC (permalink / raw) To: Jan Harkes; +Cc: Git Mailing List On Tue, 3 May 2005, Jan Harkes wrote: > > Short summary: > > rc = link(old, new); > if (rc == -1 && errno == EXDEV) > rc = rename(old, new); Ok, that is safe enough. Will do. > On Coda, the cross-directory link fails, the following cross-directory > rename will work fine. On a normal filesystem, if the link fails with > EXDEV, the rename will fail with the same. Yup. I do suspect that since you handle the rename anyway, you probably could handle the git link/unlink patterns too, but it's easy enough to just do the rename fallback in git itself. The only reason not to use rename in the first place is literally just to be able to check for collisions. Which we don't actually _do_ right now, but I like to be able to do so in theory. Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Careful object writing.. 2005-05-03 20:00 ` Jan Harkes 2005-05-03 20:11 ` Linus Torvalds @ 2005-05-03 22:37 ` Linus Torvalds 2005-05-03 22:40 ` H. Peter Anvin 1 sibling, 1 reply; 19+ messages in thread From: Linus Torvalds @ 2005-05-03 22:37 UTC (permalink / raw) To: Jan Harkes, Peter Anvin; +Cc: Git Mailing List On Tue, 3 May 2005, Jan Harkes wrote: > > I tried to pull in the latest version of your tree, but it doesn't look > like this commit has propagated to rsync.kernel.org yet. Hmm.. It's still not there a few hours later. I wonder what the mirroring rules are. Or maybe mirroring is just broken right now. Peter? One change introduced by me is that the new objects changed from 0664 (-rw-rw-r--) to (0444) -r--r--r-- due to the object writing rules. Maybe the mirroring decides that such objects shouldn't be mirrored, since they are "private"? Or maybe it's just that Peter has shut down mirroring in preparation for the imminent memory upgrade on master.kernel.org. Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Careful object writing.. 2005-05-03 22:37 ` Linus Torvalds @ 2005-05-03 22:40 ` H. Peter Anvin 0 siblings, 0 replies; 19+ messages in thread From: H. Peter Anvin @ 2005-05-03 22:40 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jan Harkes, Git Mailing List Linus Torvalds wrote: > > On Tue, 3 May 2005, Jan Harkes wrote: > >>I tried to pull in the latest version of your tree, but it doesn't look >>like this commit has propagated to rsync.kernel.org yet. > > > Hmm.. It's still not there a few hours later. I wonder what the mirroring > rules are. Or maybe mirroring is just broken right now. Peter? > > One change introduced by me is that the new objects changed from 0664 > (-rw-rw-r--) to (0444) -r--r--r-- due to the object writing rules. Maybe > the mirroring decides that such objects shouldn't be mirrored, since they > are "private"? > > Or maybe it's just that Peter has shut down mirroring in preparation for > the imminent memory upgrade on master.kernel.org. > No, I had stopped the cron job to fix a script bug and forgot to turn it back on. It's pushing now. -hpa ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Careful object writing.. 2005-05-03 19:15 Careful object writing Linus Torvalds 2005-05-03 19:27 ` Chris Wedgwood 2005-05-03 20:00 ` Jan Harkes @ 2005-05-03 23:04 ` Alex Riesen 2005-05-03 23:22 ` Linus Torvalds 2005-05-03 23:22 ` Junio C Hamano 2005-05-04 4:07 ` [PATCH] Careful object pulling Daniel Barkalow 3 siblings, 2 replies; 19+ messages in thread From: Alex Riesen @ 2005-05-03 23:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List On 5/3/05, Linus Torvalds <torvalds@osdl.org> wrote: > I also change the permission to 0444 before it gets its final name. Maybe umask it first? Just in case. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Careful object writing.. 2005-05-03 23:04 ` Alex Riesen @ 2005-05-03 23:22 ` Linus Torvalds 2005-05-03 23:25 ` Alex Riesen 2005-05-03 23:22 ` Junio C Hamano 1 sibling, 1 reply; 19+ messages in thread From: Linus Torvalds @ 2005-05-03 23:22 UTC (permalink / raw) To: Alex Riesen; +Cc: Git Mailing List On Wed, 4 May 2005, Alex Riesen wrote: > > On 5/3/05, Linus Torvalds <torvalds@osdl.org> wrote: > > I also change the permission to 0444 before it gets its final name. > > Maybe umask it first? Just in case. I considered it, but it's not worth it. If you don't want somebody else to see your objects, you should just disable execute permission on your .git directory. "umask" is actually a fairly nasty interface, since it takes effect on create(), and we do want to fchmod _after_ the create (some filesystems don't like it when you create a non-writable object and then write to it, but more importantly, since we use "mkstemp()" for the temp-file handling, we don't even have control of the initial umask. So to make it (0444 & umask) git would actually have to jump through silly hoops. Without actually buying you anything new, since the access permissions for git objects really are about the _directory_ anyway. Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Careful object writing.. 2005-05-03 23:22 ` Linus Torvalds @ 2005-05-03 23:25 ` Alex Riesen 0 siblings, 0 replies; 19+ messages in thread From: Alex Riesen @ 2005-05-03 23:25 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List On 5/4/05, Linus Torvalds <torvalds@osdl.org> wrote: > > > I also change the permission to 0444 before it gets its final name. > > Maybe umask it first? Just in case. > > I considered it, but it's not worth it. > > If you don't want somebody else to see your objects, you should just > disable execute permission on your .git directory. ... Of course. It's leaves even more control to the user. Forget I asked :) ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: Careful object writing.. 2005-05-03 23:04 ` Alex Riesen 2005-05-03 23:22 ` Linus Torvalds @ 2005-05-03 23:22 ` Junio C Hamano 1 sibling, 0 replies; 19+ messages in thread From: Junio C Hamano @ 2005-05-03 23:22 UTC (permalink / raw) To: Alex Riesen; +Cc: Linus Torvalds, Git Mailing List >>>>> "AR" == Alex Riesen <raa.lkml@gmail.com> writes: AR> On 5/3/05, Linus Torvalds <torvalds@osdl.org> wrote: >> I also change the permission to 0444 before it gets its final name. AR> Maybe umask it first? Just in case. In general worrying about umask when you see chmod is a good practice, but it probably is not applicable to this particular case. A person with umask 077 should still get 0444 if the SHA1_FILE_DIRECTORY is shared with other people, and if it is not shared, his initial git-init-db would have made it 0700, so it does not matter it the files files underneath it have 0444. ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH] Careful object pulling 2005-05-03 19:15 Careful object writing Linus Torvalds ` (2 preceding siblings ...) 2005-05-03 23:04 ` Alex Riesen @ 2005-05-04 4:07 ` Daniel Barkalow 2005-05-04 9:35 ` Morten Welinder 3 siblings, 1 reply; 19+ messages in thread From: Daniel Barkalow @ 2005-05-04 4:07 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List This splits out the careful methods for writing a file and placing it in the correct location from the function to write a buffer to the file and makes the various functions used by git-*-pull programs use those functions to write their files. Signed-off-by: Daniel Barkalow <barkalow@iabervon.org> Index: cache.h =================================================================== --- 51a882a2dc62e0d3cdc79e0badc61559fb723481/cache.h (mode:100644 sha1:8dd812827604d510038f5d93e3718c43f9d12c30) +++ 4e31436bacfff09ce673665a1061b41e37ffd661/cache.h (mode:100644 sha1:0d7411c3b86a899cee45627997f4bb7ba0df2ea7) @@ -134,6 +134,12 @@ extern void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size); extern int write_sha1_file(char *buf, unsigned long len, const char *type, unsigned char *return_sha1); +/* Open a file for writing a sha1 file. */ +extern int open_sha1_tmpfile(char tmpfile[PATH_MAX]); + +/* Put a sha1 file in the correct place. */ +extern int place_sha1_file(char tmpfile[PATH_MAX], const unsigned char *sha1); + extern int check_sha1_signature(unsigned char *sha1, void *buf, unsigned long size, const char *type); /* Read a tree into the cache */ Index: http-pull.c =================================================================== --- 51a882a2dc62e0d3cdc79e0badc61559fb723481/http-pull.c (mode:100644 sha1:f693aba61b4dcb4b738faf1335ec74aa1545e45d) +++ 4e31436bacfff09ce673665a1061b41e37ffd661/http-pull.c (mode:100644 sha1:793d8b9e125c1f164f774e2888d26beaa75e1df0) @@ -52,8 +52,9 @@ char real_sha1[20]; char *url; char *posn; + char tmpname[PATH_MAX]; - local = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666); + local = open_sha1_tmpfile(tmpname); if (local < 0) return error("Couldn't open %s\n", filename); @@ -88,15 +89,14 @@ inflateEnd(&stream); SHA1_Final(real_sha1, &c); if (zret != Z_STREAM_END) { - unlink(filename); + unlink(tmpname); return error("File %s (%s) corrupt\n", hex, url); } if (memcmp(sha1, real_sha1, 20)) { - unlink(filename); + unlink(tmpname); return error("File %s has bad hash\n", hex); } - - return 0; + return place_sha1_file(tmpname, sha1); } int main(int argc, char **argv) Index: sha1_file.c =================================================================== --- 51a882a2dc62e0d3cdc79e0badc61559fb723481/sha1_file.c (mode:100644 sha1:e6ce455ae90bd430f2128f454bdb6e0575412486) +++ 4e31436bacfff09ce673665a1061b41e37ffd661/sha1_file.c (mode:100644 sha1:85daa0b0045c3f19e697d1a7aa8ab15ff54eab99) @@ -276,6 +276,55 @@ } } +int open_sha1_tmpfile(char tmpfile[PATH_MAX]) +{ + int fd; + + snprintf(tmpfile, sizeof(tmpfile), "%s/obj_XXXXXX", get_object_directory()); + fd = mkstemp(tmpfile); + if (fd < 0) { + fprintf(stderr, + "unable to create temporary sha1 filename %s: %s", + tmpfile, strerror(errno)); + return -1; + } + return fd; +} + +int place_sha1_file(char tmpfile[PATH_MAX], const unsigned char *sha1) +{ + char *filename = sha1_file_name(sha1); + int ret; + + ret = link(tmpfile, filename); + if (ret < 0) { + ret = errno; + + /* + * Coda hack - coda doesn't like cross-directory links, + * so we fall back to a rename, which will mean that it + * won't be able to check collisions, but that's not a + * big deal. + * + * When this succeeds, we just return 0. We have nothing + * left to unlink. + */ + if (ret == EXDEV && !rename(tmpfile, filename)) + return 0; + } + unlink(tmpfile); + if (ret) { + if (ret != EEXIST) { + fprintf(stderr, + "unable to write sha1 filename %s: %s", + filename, strerror(ret)); + return -1; + } + /* FIXME!!! Collision check here ? */ + } + return 0; +} + int write_sha1_file(char *buf, unsigned long len, const char *type, unsigned char *returnsha1) { int size; @@ -286,7 +335,7 @@ char *filename; static char tmpfile[PATH_MAX]; char hdr[50]; - int fd, hdrlen, ret; + int fd, hdrlen; /* Generate the header */ hdrlen = sprintf(hdr, "%s %lu", type, len)+1; @@ -316,12 +365,9 @@ return -1; } - snprintf(tmpfile, sizeof(tmpfile), "%s/obj_XXXXXX", get_object_directory()); - fd = mkstemp(tmpfile); - if (fd < 0) { - fprintf(stderr, "unable to create temporary sha1 filename %s: %s", tmpfile, strerror(errno)); - return -1; - } + fd = open_sha1_tmpfile(tmpfile); + if (fd < 0) + return fd; /* Set it up */ memset(&stream, 0, sizeof(stream)); @@ -349,53 +395,28 @@ if (write(fd, compressed, size) != size) die("unable to write file"); + fchmod(fd, 0444); close(fd); - ret = link(tmpfile, filename); - if (ret < 0) { - ret = errno; - - /* - * Coda hack - coda doesn't like cross-directory links, - * so we fall back to a rename, which will mean that it - * won't be able to check collisions, but that's not a - * big deal. - * - * When this succeeds, we just return 0. We have nothing - * left to unlink. - */ - if (ret == EXDEV && !rename(tmpfile, filename)) - return 0; - } - unlink(tmpfile); - if (ret) { - if (ret != EEXIST) { - fprintf(stderr, "unable to write sha1 filename %s: %s", filename, strerror(ret)); - return -1; - } - /* FIXME!!! Collision check here ? */ - } - - return 0; + return place_sha1_file(tmpfile, sha1); } int write_sha1_from_fd(const unsigned char *sha1, int fd) { - char *filename = sha1_file_name(sha1); - int local; z_stream stream; unsigned char real_sha1[20]; char buf[4096]; char discard[4096]; + char tmpname[PATH_MAX]; int ret; SHA_CTX c; - local = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666); + local = open_sha1_tmpfile(tmpname); if (local < 0) - return error("Couldn't open %s\n", filename); + return -1; memset(&stream, 0, sizeof(stream)); @@ -408,7 +429,7 @@ size = read(fd, buf, 4096); if (size <= 0) { close(local); - unlink(filename); + unlink(tmpname); if (!size) return error("Connection closed?"); perror("Reading from connection"); @@ -431,15 +452,15 @@ close(local); SHA1_Final(real_sha1, &c); if (ret != Z_STREAM_END) { - unlink(filename); + unlink(tmpname); return error("File %s corrupted", sha1_to_hex(sha1)); } if (memcmp(sha1, real_sha1, 20)) { - unlink(filename); + unlink(tmpname); return error("File %s has bad hash\n", sha1_to_hex(sha1)); } - - return 0; + + return place_sha1_file(tmpname, sha1); } int has_sha1_file(const unsigned char *sha1) ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Careful object pulling 2005-05-04 4:07 ` [PATCH] Careful object pulling Daniel Barkalow @ 2005-05-04 9:35 ` Morten Welinder 2005-05-04 16:16 ` Daniel Barkalow 0 siblings, 1 reply; 19+ messages in thread From: Morten Welinder @ 2005-05-04 9:35 UTC (permalink / raw) To: Daniel Barkalow; +Cc: Linus Torvalds, Git Mailing List Something's fishy there. You are comparing the result from link with EEXIST. Morten ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Careful object pulling 2005-05-04 9:35 ` Morten Welinder @ 2005-05-04 16:16 ` Daniel Barkalow 0 siblings, 0 replies; 19+ messages in thread From: Daniel Barkalow @ 2005-05-04 16:16 UTC (permalink / raw) To: Morten Welinder; +Cc: Linus Torvalds, Git Mailing List On Wed, 4 May 2005, Morten Welinder wrote: > Something's fishy there. You are comparing the result from link with EEXIST. No, it just looks that way. If ret is negative, errno gets written to it. If it's zero, we don't do anything with it. It can't be positive. So, at the point where we test it, it must be the errno from link, which would be EEXIST if the case we're worried about. -Daniel *This .sig left intentionally blank* ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2005-05-04 16:10 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-05-03 19:15 Careful object writing Linus Torvalds 2005-05-03 19:27 ` Chris Wedgwood 2005-05-03 19:47 ` Linus Torvalds 2005-05-03 19:47 ` Chris Wedgwood 2005-05-03 19:56 ` Linus Torvalds 2005-05-03 20:02 ` Daniel Barkalow 2005-05-03 20:00 ` Jan Harkes 2005-05-03 20:11 ` Linus Torvalds 2005-05-03 20:59 ` Jan Harkes 2005-05-03 22:13 ` Linus Torvalds 2005-05-03 22:37 ` Linus Torvalds 2005-05-03 22:40 ` H. Peter Anvin 2005-05-03 23:04 ` Alex Riesen 2005-05-03 23:22 ` Linus Torvalds 2005-05-03 23:25 ` Alex Riesen 2005-05-03 23:22 ` Junio C Hamano 2005-05-04 4:07 ` [PATCH] Careful object pulling Daniel Barkalow 2005-05-04 9:35 ` Morten Welinder 2005-05-04 16:16 ` Daniel Barkalow
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).