Careful object writing..

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Careful object writing..
@ 2005-05-03 19:15 Linus Torvalds
  2005-05-03 19:27 ` Chris Wedgwood
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Linus Torvalds @ 2005-05-03 19:15 UTC (permalink / raw)
  To: Git Mailing List

I just pushed out the commit that tries to finally actually write the sha1
objects the right way in a shared object directory environment.

I used to be lazy, and just do "O_CREAT | O_EXCL" on the final name, but
that obviously is not very nice when it can result in other people seeing
objects that haven't been fully finalized yet.

So now I do it "right", and create a temporary file in the "top" object
directory, and then when it's all done, I do a "link()" to the final place
and unlink the original.

I also change the permission to 0444 before it gets its final name.

Two notes:

 - because the objects all get created initially in .git/objects rather 
   than in the subdirectory they get moved to, you can't use symlinks 
   to other filesystems for the 256 object subdirectories. The object 
   directory has to be one filesystem (but it doesn't have to be the same 
   one as you actually keep your working ddirectories on, of course)

 - The upside of this is that filesystem block allocators should do the 
   right thing. Instead of spreading the objects out (because they are in 
   different directories), they should be created together.

Anyway, somebody should double-check the thing. It _should_ now work
correctly over NFS etc too, and everything should be nice and atomic (and
with any half-way decent filesystem, it also means that even if you have a
system crash in the middle, you'll never see half-created objects).

NOTE NOTE NOTE! I have _not_ updated all the helper stuff that also write 
objects. So things like "git-http-pull" etc will still write objects 
directly into the object directory, and that can cause problems with 
shared usage. Same goes for "write_sha1_from_fd()" that rpull.c uses. I 
hope somebody will take a look at those issues..

Anyway, at least the really core operations should now really be
"thread-safe" in a shared object directory environment.

		Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Careful object writing..
  2005-05-03 19:15 Careful object writing Linus Torvalds
@ 2005-05-03 19:27 ` Chris Wedgwood
  2005-05-03 19:47   ` Linus Torvalds
  2005-05-03 20:02   ` Daniel Barkalow
  2005-05-03 20:00 ` Jan Harkes
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 19+ messages in thread
From: Chris Wedgwood @ 2005-05-03 19:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

On Tue, May 03, 2005 at 12:15:08PM -0700, Linus Torvalds wrote:

> So now I do it "right", and create a temporary file in the "top"
> object directory, and then when it's all done, I do a "link()" to
> the final place and unlink the original.

how is this better than a single rename?  i take it there is something
fundamental from clue.101 i slept though here?

also, if you are *really* paranoid you want to fsync *before* you do
the link/unklink or rename --- which is what MTAs do[1]

however, that said it *kills* performance and if it's not critical
it's really a terrible idea

also, shouldn't HEAD (and similar)[2] be updated with a temporary and
a rename too?

> I also change the permission to 0444 before it gets its final name.

cool

> NOTE NOTE NOTE! I have _not_ updated all the helper stuff that also
> write objects.

i thought this was all common code?  if it's not maybe now is the time
to change that?



[1] yes, i know this depends on the fs used and various things and
    ext3 should be fine, blah blah blah, but not everyone uses ext3
    and quite probably not everyone will use git under Linux

[2] i didn't check the code as i'm still using BK in places

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Careful object writing..
  2005-05-03 19:27 ` Chris Wedgwood
@ 2005-05-03 19:47   ` Linus Torvalds
  2005-05-03 19:47     ` Chris Wedgwood
  2005-05-03 20:02   ` Daniel Barkalow
  1 sibling, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2005-05-03 19:47 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Git Mailing List



On Tue, 3 May 2005, Chris Wedgwood wrote:
> 
> how is this better than a single rename?  i take it there is something
> fundamental from clue.101 i slept though here?

A rename will overwrite any old object, which means that you cannot do any 
collision checks. In contrast, a "link()" will return EEXIST if somebody 
else raced with you and created a new object, and you can do collision 
checks instead of overwriting another persons object.

> also, if you are *really* paranoid you want to fsync *before* you do
> the link/unklink or rename --- which is what MTAs do[1]

Me, I refuse to slow down my habits for old filesystems. You can either 
fsck, or use a logging filesystem. 

I don't see anybody not using logging filesystems these days, so..

> also, shouldn't HEAD (and similar)[2] be updated with a temporary and
> a rename too?

Maybe. Much less important, though.

> > NOTE NOTE NOTE! I have _not_ updated all the helper stuff that also
> > write objects.
> 
> i thought this was all common code?  if it's not maybe now is the time
> to change that?

It is all common code, except:
 - things like "fetch from another host" will use rsync/wget/xxx to 
   actually get the files. To those programs, we're not talking about git 
   objects, we're just talking "regular files"
 - rpull.c has a special different routine to write its objects. I don't 
   use it, so..

Anyway, it should be reasonably easily fixable.

		Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Careful object writing..
  2005-05-03 19:47   ` Linus Torvalds
@ 2005-05-03 19:47     ` Chris Wedgwood
  2005-05-03 19:56       ` Linus Torvalds
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Wedgwood @ 2005-05-03 19:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

On Tue, May 03, 2005 at 12:47:36PM -0700, Linus Torvalds wrote:

> Me, I refuse to slow down my habits for old filesystems. You can
> either fsck, or use a logging filesystem.

ok, so you're saying everyone use linux ext3 or similar more or
less...

how about we drop all the objects in one directory then?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Careful object writing..
  2005-05-03 19:47     ` Chris Wedgwood
@ 2005-05-03 19:56       ` Linus Torvalds
  0 siblings, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2005-05-03 19:56 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Git Mailing List



On Tue, 3 May 2005, Chris Wedgwood wrote:
>
> On Tue, May 03, 2005 at 12:47:36PM -0700, Linus Torvalds wrote:
> 
> > Me, I refuse to slow down my habits for old filesystems. You can
> > either fsck, or use a logging filesystem.
> 
> ok, so you're saying everyone use linux ext3 or similar more or
> less...

No. I'm saying
 - you can use git-fsck-cache
 - or you can use a logging filesystem.

I happen to use both.

> how about we drop all the objects in one directory then?

I don't even have directory hashing on, and as mentioned, the logging 
filesystem is _not_ a requirement. It's just a reality for most of us.

		Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Careful object writing..
  2005-05-03 19:27 ` Chris Wedgwood
  2005-05-03 19:47   ` Linus Torvalds
@ 2005-05-03 20:02   ` Daniel Barkalow
  1 sibling, 0 replies; 19+ messages in thread
From: Daniel Barkalow @ 2005-05-03 20:02 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, Git Mailing List

On Tue, 3 May 2005, Chris Wedgwood wrote:

> i thought this was all common code?  if it's not maybe now is the time
> to change that?

The other versions are getting the tagged compressed object along with the
intended hash from an external source. They both verify the result as they
go (as opposed to the normal case which takes the uncompressed data
and finds the hash, and therefore has to be right). The common part is
really "open the file" and "close (and place) the file", which weren't
previously sufficiently complex to justify sharing the code.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Careful object writing..
  2005-05-03 19:15 Careful object writing Linus Torvalds
  2005-05-03 19:27 ` Chris Wedgwood
@ 2005-05-03 20:00 ` Jan Harkes
  2005-05-03 20:11   ` Linus Torvalds
  2005-05-03 22:37   ` Linus Torvalds
  2005-05-03 23:04 ` Alex Riesen
  2005-05-04  4:07 ` [PATCH] Careful object pulling Daniel Barkalow
  3 siblings, 2 replies; 19+ messages in thread
From: Jan Harkes @ 2005-05-03 20:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

On Tue, May 03, 2005 at 12:15:08PM -0700, Linus Torvalds wrote:
> I just pushed out the commit that tries to finally actually write the sha1
> objects the right way in a shared object directory environment.
> 
> I used to be lazy, and just do "O_CREAT | O_EXCL" on the final name, but
> that obviously is not very nice when it can result in other people seeing
> objects that haven't been fully finalized yet.
> 
> So now I do it "right", and create a temporary file in the "top" object
> directory, and then when it's all done, I do a "link()" to the final place
> and unlink the original.

Annoyingly until this commit, git has been just about the perfect SCM
system to run on top of Coda. Almost every other SCM can and will get
conflicts on it's repository files, which are pretty much impossible to
resolve (just try merging two diverging copies of an RCS archive..)

But the only conflicts we ever see with git are when two people create
the same SHA1 object. And if the contents are in fact identical this
conflict will be trivially resolved.

I tried to pull in the latest version of your tree, but it doesn't look
like this commit has propagated to rsync.kernel.org yet. Hopefully you
will accept a small patch (should be < 5 lines) that makes git work
nicely when Coda complains about the cross-directory hardlink without
affecting the reliability of using link/unlink on normal filesystems.

Jan

> 
> I also change the permission to 0444 before it gets its final name.
> 
> Two notes:
> 
>  - because the objects all get created initially in .git/objects rather 
>    than in the subdirectory they get moved to, you can't use symlinks 
>    to other filesystems for the 256 object subdirectories. The object 
>    directory has to be one filesystem (but it doesn't have to be the same 
>    one as you actually keep your working ddirectories on, of course)
> 
>  - The upside of this is that filesystem block allocators should do the 
>    right thing. Instead of spreading the objects out (because they are in 
>    different directories), they should be created together.
> 
> Anyway, somebody should double-check the thing. It _should_ now work
> correctly over NFS etc too, and everything should be nice and atomic (and
> with any half-way decent filesystem, it also means that even if you have a
> system crash in the middle, you'll never see half-created objects).
> 
> NOTE NOTE NOTE! I have _not_ updated all the helper stuff that also write 
> objects. So things like "git-http-pull" etc will still write objects 
> directly into the object directory, and that can cause problems with 
> shared usage. Same goes for "write_sha1_from_fd()" that rpull.c uses. I 
> hope somebody will take a look at those issues..
> 
> Anyway, at least the really core operations should now really be
> "thread-safe" in a shared object directory environment.
> 
> 		Linus
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Careful object writing..
  2005-05-03 20:00 ` Jan Harkes
@ 2005-05-03 20:11   ` Linus Torvalds
  2005-05-03 20:59     ` Jan Harkes
  2005-05-03 22:37   ` Linus Torvalds
  1 sibling, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2005-05-03 20:11 UTC (permalink / raw)
  To: Jan Harkes; +Cc: Git Mailing List



On Tue, 3 May 2005, Jan Harkes wrote:
> 
> I tried to pull in the latest version of your tree, but it doesn't look
> like this commit has propagated to rsync.kernel.org yet. Hopefully you
> will accept a small patch (should be < 5 lines) that makes git work
> nicely when Coda complains about the cross-directory hardlink without
> affecting the reliability of using link/unlink on normal filesystems.

What is it that coda wants to do, and is there some portable way to get 
there? 

Is it just that you want to stay within the directory? Or is it any link 
action that is nasty?

What makes resolving renames hard when the file contents are the same? 
Maybe Coda could just do a trivial resolve of that "conflict" too?

			Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Careful object writing..
  2005-05-03 20:11   ` Linus Torvalds
@ 2005-05-03 20:59     ` Jan Harkes
  2005-05-03 22:13       ` Linus Torvalds
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Harkes @ 2005-05-03 20:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

On Tue, May 03, 2005 at 01:11:47PM -0700, Linus Torvalds wrote:
> On Tue, 3 May 2005, Jan Harkes wrote:
> > I tried to pull in the latest version of your tree, but it doesn't look
> > like this commit has propagated to rsync.kernel.org yet. Hopefully you
> > will accept a small patch (should be < 5 lines) that makes git work
> > nicely when Coda complains about the cross-directory hardlink without
> > affecting the reliability of using link/unlink on normal filesystems.
> 
> What is it that coda wants to do, and is there some portable way to get 
> there? 

Short summary:

    rc = link(old, new);
    if (rc == -1 && errno == EXDEV)
	rc = rename(old, new);

On Coda, the cross-directory link fails, the following cross-directory
rename will work fine.  On a normal filesystem, if the link fails with
EXDEV, the rename will fail with the same.

Because our cache consistency model is fairly optimistic, we already
have to deal with potential problems with a rename removing an unwanted
target. So if we are logging write operations, and the link operation
did not return EEXISTS, then the rename will be marked as not having
removed any target file. If the target did happen to exist on the server
by the time we reintegrate the operations we end up with a reintegration
conflict.

Longer version:

When a server performs conflict resolution it happens on a per-directory
basis. So any cross-directory operation already a special case.

We cannot guarantee which directory will be resolved first, so if it is
the destination of a link or rename the object itself might not exist
yet. The advantage of a rename operation is that it contains a reference
to both the source and the destination directories. If we don't yet know
the renamed object we resolve the source first. That creates the object
and allows us to complete the rename operation.

However with a link we only have a reference to an object and the
directory where the link should be added. But again, the object might
not yet exist on all servers. At this point things get a bit more
complicated because we don't enforce access based on per object UNIX
mode bits, but rely on directory ACLs. So we can't just add a reference
to an unknown object in the destination directory because until we know
where else this object is located, we can't tell if the user actually is
allowed to access the object.

> Is it just that you want to stay within the directory? Or is it any link 
> action that is nasty?

We do allow links within the same directory, mostly because that often
happens in places like /usr/bin and we know that whenever we encounter
the link operation in the resolution log, that the object creation has
already been processed. We also know that the new link can't give a user
any rights he didn't already have.

> What makes resolving renames hard when the file contents are the same? 

Renames mostly work, there are only a few corner cases left. One is
where something is moved up in the directory tree and the source
directory is then removed. We end up screwing ourselves because we
append the childs logs on removal and the create operation ends up
behind the rename operation. But that is a dumb implementation problem.
Another issue is of course when someone (validly) hardlinks a file in
the same directory and then moves one of the links to another directory.

We definitely are not a typical filesystem with UNIX semantics, which is
why it is unusual to find an application that seems so well suited for
disconnected and weakly connected operation.

Jan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Careful object writing..
  2005-05-03 20:59     ` Jan Harkes
@ 2005-05-03 22:13       ` Linus Torvalds
  0 siblings, 0 replies; 19+ messages in thread
From: Linus Torvalds @ 2005-05-03 22:13 UTC (permalink / raw)
  To: Jan Harkes; +Cc: Git Mailing List

On Tue, 3 May 2005, Jan Harkes wrote:
> 
> Short summary:
> 
>     rc = link(old, new);
>     if (rc == -1 && errno == EXDEV)
> 	rc = rename(old, new);

Ok, that is safe enough. Will do.

> On Coda, the cross-directory link fails, the following cross-directory
> rename will work fine.  On a normal filesystem, if the link fails with
> EXDEV, the rename will fail with the same.

Yup. I do suspect that since you handle the rename anyway, you probably 
could handle the git link/unlink patterns too, but it's easy enough to 
just do the rename fallback in git itself.

The only reason not to use rename in the first place is literally just to 
be able to check for collisions. Which we don't actually _do_ right now, 
but I like to be able to do so in theory.

		Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Careful object writing..
  2005-05-03 20:00 ` Jan Harkes
  2005-05-03 20:11   ` Linus Torvalds
@ 2005-05-03 22:37   ` Linus Torvalds
  2005-05-03 22:40     ` H. Peter Anvin
  1 sibling, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2005-05-03 22:37 UTC (permalink / raw)
  To: Jan Harkes, Peter Anvin; +Cc: Git Mailing List

On Tue, 3 May 2005, Jan Harkes wrote:
> 
> I tried to pull in the latest version of your tree, but it doesn't look
> like this commit has propagated to rsync.kernel.org yet.

Hmm.. It's still not there a few hours later. I wonder what the mirroring
rules are. Or maybe mirroring is just broken right now. Peter?

One change introduced by me is that the new objects changed from 0664
(-rw-rw-r--) to (0444) -r--r--r-- due to the object writing rules. Maybe
the mirroring decides that such objects shouldn't be mirrored, since they
are "private"?

Or maybe it's just that Peter has shut down mirroring in preparation for 
the imminent memory upgrade on master.kernel.org. 

			Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Careful object writing..
  2005-05-03 22:37   ` Linus Torvalds
@ 2005-05-03 22:40     ` H. Peter Anvin
  0 siblings, 0 replies; 19+ messages in thread
From: H. Peter Anvin @ 2005-05-03 22:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jan Harkes, Git Mailing List

Linus Torvalds wrote:
> 
> On Tue, 3 May 2005, Jan Harkes wrote:
> 
>>I tried to pull in the latest version of your tree, but it doesn't look
>>like this commit has propagated to rsync.kernel.org yet.
> 
> 
> Hmm.. It's still not there a few hours later. I wonder what the mirroring
> rules are. Or maybe mirroring is just broken right now. Peter?
> 
> One change introduced by me is that the new objects changed from 0664
> (-rw-rw-r--) to (0444) -r--r--r-- due to the object writing rules. Maybe
> the mirroring decides that such objects shouldn't be mirrored, since they
> are "private"?
> 
> Or maybe it's just that Peter has shut down mirroring in preparation for 
> the imminent memory upgrade on master.kernel.org. 
> 

No, I had stopped the cron job to fix a script bug and forgot to turn it 
back on.  It's pushing now.

	-hpa


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Careful object writing..
  2005-05-03 19:15 Careful object writing Linus Torvalds
  2005-05-03 19:27 ` Chris Wedgwood
  2005-05-03 20:00 ` Jan Harkes
@ 2005-05-03 23:04 ` Alex Riesen
  2005-05-03 23:22   ` Linus Torvalds
  2005-05-03 23:22   ` Junio C Hamano
  2005-05-04  4:07 ` [PATCH] Careful object pulling Daniel Barkalow
  3 siblings, 2 replies; 19+ messages in thread
From: Alex Riesen @ 2005-05-03 23:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

On 5/3/05, Linus Torvalds <torvalds@osdl.org> wrote:
> I also change the permission to 0444 before it gets its final name.

Maybe umask it first? Just in case.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Careful object writing..
  2005-05-03 23:04 ` Alex Riesen
@ 2005-05-03 23:22   ` Linus Torvalds
  2005-05-03 23:25     ` Alex Riesen
  2005-05-03 23:22   ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2005-05-03 23:22 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Git Mailing List

On Wed, 4 May 2005, Alex Riesen wrote:
>
> On 5/3/05, Linus Torvalds <torvalds@osdl.org> wrote:
> > I also change the permission to 0444 before it gets its final name.
> 
> Maybe umask it first? Just in case.

I considered it, but it's not worth it.

If you don't want somebody else to see your objects, you should just 
disable execute permission on your .git directory. "umask" is actually a 
fairly nasty interface, since it takes effect on create(), and we do want 
to fchmod _after_ the create (some filesystems don't like it when you 
create a non-writable object and then write to it, but more importantly, 
since we use "mkstemp()" for the temp-file handling, we don't even have 
control of the initial umask.

So to make it (0444 & umask) git would actually have to jump through silly
hoops. Without actually buying you anything new, since the access
permissions for git objects really are about the _directory_ anyway.

		Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Careful object writing..
  2005-05-03 23:22   ` Linus Torvalds
@ 2005-05-03 23:25     ` Alex Riesen
  0 siblings, 0 replies; 19+ messages in thread
From: Alex Riesen @ 2005-05-03 23:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

On 5/4/05, Linus Torvalds <torvalds@osdl.org> wrote:
> > > I also change the permission to 0444 before it gets its final name.
> > Maybe umask it first? Just in case.
> 
> I considered it, but it's not worth it.
> 
> If you don't want somebody else to see your objects, you should just
> disable execute permission on your .git directory. ...

Of course. It's leaves even more control to the user. Forget I asked :)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Careful object writing..
  2005-05-03 23:04 ` Alex Riesen
  2005-05-03 23:22   ` Linus Torvalds
@ 2005-05-03 23:22   ` Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2005-05-03 23:22 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Linus Torvalds, Git Mailing List

>>>>> "AR" == Alex Riesen <raa.lkml@gmail.com> writes:

AR> On 5/3/05, Linus Torvalds <torvalds@osdl.org> wrote:
>> I also change the permission to 0444 before it gets its final name.

AR> Maybe umask it first? Just in case.

In general worrying about umask when you see chmod is a good
practice, but it probably is not applicable to this particular
case.  A person with umask 077 should still get 0444 if the
SHA1_FILE_DIRECTORY is shared with other people, and if it is
not shared, his initial git-init-db would have made it 0700, so
it does not matter it the files files underneath it have 0444.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH] Careful object pulling
  2005-05-03 19:15 Careful object writing Linus Torvalds
                   ` (2 preceding siblings ...)
  2005-05-03 23:04 ` Alex Riesen
@ 2005-05-04  4:07 ` Daniel Barkalow
  2005-05-04  9:35   ` Morten Welinder
  3 siblings, 1 reply; 19+ messages in thread
From: Daniel Barkalow @ 2005-05-04  4:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

This splits out the careful methods for writing a file and placing it in
the correct location from the function to write a buffer to the file and
makes the various functions used by git-*-pull programs use those
functions to write their files.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Index: cache.h
===================================================================
--- 51a882a2dc62e0d3cdc79e0badc61559fb723481/cache.h  (mode:100644 sha1:8dd812827604d510038f5d93e3718c43f9d12c30)
+++ 4e31436bacfff09ce673665a1061b41e37ffd661/cache.h  (mode:100644 sha1:0d7411c3b86a899cee45627997f4bb7ba0df2ea7)
@@ -134,6 +134,12 @@
 extern void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size);
 extern int write_sha1_file(char *buf, unsigned long len, const char *type, unsigned char *return_sha1);
 
+/* Open a file for writing a sha1 file. */
+extern int open_sha1_tmpfile(char tmpfile[PATH_MAX]);
+
+/* Put a sha1 file in the correct place. */
+extern int place_sha1_file(char tmpfile[PATH_MAX], const unsigned char *sha1);
+
 extern int check_sha1_signature(unsigned char *sha1, void *buf, unsigned long size, const char *type);
 
 /* Read a tree into the cache */
Index: http-pull.c
===================================================================
--- 51a882a2dc62e0d3cdc79e0badc61559fb723481/http-pull.c  (mode:100644 sha1:f693aba61b4dcb4b738faf1335ec74aa1545e45d)
+++ 4e31436bacfff09ce673665a1061b41e37ffd661/http-pull.c  (mode:100644 sha1:793d8b9e125c1f164f774e2888d26beaa75e1df0)
@@ -52,8 +52,9 @@
 	char real_sha1[20];
 	char *url;
 	char *posn;
+	char tmpname[PATH_MAX];
 
-	local = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
+	local = open_sha1_tmpfile(tmpname);
 
 	if (local < 0)
 		return error("Couldn't open %s\n", filename);
@@ -88,15 +89,14 @@
 	inflateEnd(&stream);
 	SHA1_Final(real_sha1, &c);
 	if (zret != Z_STREAM_END) {
-		unlink(filename);
+		unlink(tmpname);
 		return error("File %s (%s) corrupt\n", hex, url);
 	}
 	if (memcmp(sha1, real_sha1, 20)) {
-		unlink(filename);
+		unlink(tmpname);
 		return error("File %s has bad hash\n", hex);
 	}
-	
-	return 0;
+	return place_sha1_file(tmpname, sha1);
 }
 
 int main(int argc, char **argv)
Index: sha1_file.c
===================================================================
--- 51a882a2dc62e0d3cdc79e0badc61559fb723481/sha1_file.c  (mode:100644 sha1:e6ce455ae90bd430f2128f454bdb6e0575412486)
+++ 4e31436bacfff09ce673665a1061b41e37ffd661/sha1_file.c  (mode:100644 sha1:85daa0b0045c3f19e697d1a7aa8ab15ff54eab99)
@@ -276,6 +276,55 @@
 	}
 }
 
+int open_sha1_tmpfile(char tmpfile[PATH_MAX])
+{
+	int fd;
+
+	snprintf(tmpfile, sizeof(tmpfile), "%s/obj_XXXXXX", get_object_directory());
+	fd = mkstemp(tmpfile);
+	if (fd < 0) {
+		fprintf(stderr,
+			"unable to create temporary sha1 filename %s: %s",
+			tmpfile, strerror(errno));
+		return -1;
+	}
+	return fd;
+}
+
+int place_sha1_file(char tmpfile[PATH_MAX], const unsigned char *sha1)
+{
+	char *filename = sha1_file_name(sha1);
+	int ret;
+
+	ret = link(tmpfile, filename);
+	if (ret < 0) {
+		ret = errno;
+
+		/*
+		 * Coda hack - coda doesn't like cross-directory links,
+		 * so we fall back to a rename, which will mean that it
+		 * won't be able to check collisions, but that's not a
+		 * big deal.
+		 *
+		 * When this succeeds, we just return 0. We have nothing
+		 * left to unlink.
+		 */
+		if (ret == EXDEV && !rename(tmpfile, filename))
+			return 0;
+	}
+	unlink(tmpfile);
+	if (ret) {
+		if (ret != EEXIST) {
+			fprintf(stderr,
+				"unable to write sha1 filename %s: %s", 
+				filename, strerror(ret));
+			return -1;
+		}
+		/* FIXME!!! Collision check here ? */
+	}
+	return 0;
+}
+
 int write_sha1_file(char *buf, unsigned long len, const char *type, unsigned char *returnsha1)
 {
 	int size;
@@ -286,7 +335,7 @@
 	char *filename;
 	static char tmpfile[PATH_MAX];
 	char hdr[50];
-	int fd, hdrlen, ret;
+	int fd, hdrlen;
 
 	/* Generate the header */
 	hdrlen = sprintf(hdr, "%s %lu", type, len)+1;
@@ -316,12 +365,9 @@
 		return -1;
 	}
 
-	snprintf(tmpfile, sizeof(tmpfile), "%s/obj_XXXXXX", get_object_directory());
-	fd = mkstemp(tmpfile);
-	if (fd < 0) {
-		fprintf(stderr, "unable to create temporary sha1 filename %s: %s", tmpfile, strerror(errno));
-		return -1;
-	}
+	fd = open_sha1_tmpfile(tmpfile);
+	if (fd < 0)
+		return fd;
 
 	/* Set it up */
 	memset(&stream, 0, sizeof(stream));
@@ -349,53 +395,28 @@
 
 	if (write(fd, compressed, size) != size)
 		die("unable to write file");
+
 	fchmod(fd, 0444);
 	close(fd);
 
-	ret = link(tmpfile, filename);
-	if (ret < 0) {
-		ret = errno;
-
-		/*
-		 * Coda hack - coda doesn't like cross-directory links,
-		 * so we fall back to a rename, which will mean that it
-		 * won't be able to check collisions, but that's not a
-		 * big deal.
-		 *
-		 * When this succeeds, we just return 0. We have nothing
-		 * left to unlink.
-		 */
-		if (ret == EXDEV && !rename(tmpfile, filename))
-			return 0;
-	}
-	unlink(tmpfile);
-	if (ret) {
-		if (ret != EEXIST) {
-			fprintf(stderr, "unable to write sha1 filename %s: %s", filename, strerror(ret));
-			return -1;
-		}
-		/* FIXME!!! Collision check here ? */
-	}
-
-	return 0;
+	return place_sha1_file(tmpfile, sha1);
 }
 
 int write_sha1_from_fd(const unsigned char *sha1, int fd)
 {
-	char *filename = sha1_file_name(sha1);
-
 	int local;
 	z_stream stream;
 	unsigned char real_sha1[20];
 	char buf[4096];
 	char discard[4096];
+	char tmpname[PATH_MAX];
 	int ret;
 	SHA_CTX c;
 
-	local = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
+	local = open_sha1_tmpfile(tmpname);
 
 	if (local < 0)
-		return error("Couldn't open %s\n", filename);
+		return -1;
 
 	memset(&stream, 0, sizeof(stream));
 
@@ -408,7 +429,7 @@
 		size = read(fd, buf, 4096);
 		if (size <= 0) {
 			close(local);
-			unlink(filename);
+			unlink(tmpname);
 			if (!size)
 				return error("Connection closed?");
 			perror("Reading from connection");
@@ -431,15 +452,15 @@
 	close(local);
 	SHA1_Final(real_sha1, &c);
 	if (ret != Z_STREAM_END) {
-		unlink(filename);
+		unlink(tmpname);
 		return error("File %s corrupted", sha1_to_hex(sha1));
 	}
 	if (memcmp(sha1, real_sha1, 20)) {
-		unlink(filename);
+		unlink(tmpname);
 		return error("File %s has bad hash\n", sha1_to_hex(sha1));
 	}
-	
-	return 0;
+
+	return place_sha1_file(tmpname, sha1);
 }
 
 int has_sha1_file(const unsigned char *sha1)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] Careful object pulling
  2005-05-04  4:07 ` [PATCH] Careful object pulling Daniel Barkalow
@ 2005-05-04  9:35   ` Morten Welinder
  2005-05-04 16:16     ` Daniel Barkalow
  0 siblings, 1 reply; 19+ messages in thread
From: Morten Welinder @ 2005-05-04  9:35 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Linus Torvalds, Git Mailing List

Something's fishy there.  You are comparing the result from link with EEXIST.

Morten

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] Careful object pulling
  2005-05-04  9:35   ` Morten Welinder
@ 2005-05-04 16:16     ` Daniel Barkalow
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Barkalow @ 2005-05-04 16:16 UTC (permalink / raw)
  To: Morten Welinder; +Cc: Linus Torvalds, Git Mailing List

On Wed, 4 May 2005, Morten Welinder wrote:

> Something's fishy there.  You are comparing the result from link with EEXIST.

No, it just looks that way. If ret is negative, errno gets written to
it. If it's zero, we don't do anything with it. It can't be positive. So,
at the point where we test it, it must be the errno from link, which would
be EEXIST if the case we're worried about.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2005-05-04 16:10 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-05-03 19:15 Careful object writing Linus Torvalds
2005-05-03 19:27 ` Chris Wedgwood
2005-05-03 19:47   ` Linus Torvalds
2005-05-03 19:47     ` Chris Wedgwood
2005-05-03 19:56       ` Linus Torvalds
2005-05-03 20:02   ` Daniel Barkalow
2005-05-03 20:00 ` Jan Harkes
2005-05-03 20:11   ` Linus Torvalds
2005-05-03 20:59     ` Jan Harkes
2005-05-03 22:13       ` Linus Torvalds
2005-05-03 22:37   ` Linus Torvalds
2005-05-03 22:40     ` H. Peter Anvin
2005-05-03 23:04 ` Alex Riesen
2005-05-03 23:22   ` Linus Torvalds
2005-05-03 23:25     ` Alex Riesen
2005-05-03 23:22   ` Junio C Hamano
2005-05-04  4:07 ` [PATCH] Careful object pulling Daniel Barkalow
2005-05-04  9:35   ` Morten Welinder
2005-05-04 16:16     ` Daniel Barkalow

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).