git.vger.kernel.org archive mirror
* fatal: unable to write sha1 file git 1.6.2.1
@ 2009-03-24 18:20 Peter
  2009-03-24 19:30 ` Nicolas Pitre
  2009-03-24 19:31 ` Linus Torvalds
  0 siblings, 2 replies; 13+ messages in thread
From: Peter @ 2009-03-24 18:20 UTC (permalink / raw)
  To: git

Hi
I am trying to add a directory with lots of binary files to a git repository.
I get this error message:

fatal: unable to write sha1 file

This is git 1.6.2.1.

Are there limits concerning binary files (such as executables or images)
for inclusion in a git repo?

Thanks a lot for your help

Peter

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fatal: unable to write sha1 file git 1.6.2.1
  2009-03-24 18:20 fatal: unable to write sha1 file git 1.6.2.1 Peter
@ 2009-03-24 19:30 ` Nicolas Pitre
  2009-03-24 19:31 ` Linus Torvalds
  1 sibling, 0 replies; 13+ messages in thread
From: Nicolas Pitre @ 2009-03-24 19:30 UTC (permalink / raw)
  To: Peter; +Cc: git

On Tue, 24 Mar 2009, Peter wrote:

> Hi
> I try to add a directory with lots of binary files to a git repository.
> I receive the error message:
> fatal: unable to write sha1 file
> 
> This is git 1.6.2.1.
> 
> Are there limits concerning binary files ( like executables , images ) for the
> inclusion in a git repo ?

Currently, Git is unable to deal with files that it cannot load entirely 
in memory.

What is the size of your largest file?
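
A rough illustration of what that limit means in practice (not git's actual
code): the loose-object writer of this era effectively needs the whole file
contents in one buffer before it can deflate and hash them, so the largest
file you can add is bounded by available memory. A minimal sketch, assuming
POSIX:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read a whole file into a single heap buffer; a file that cannot fit in
 * memory fails at the malloc (or later, when the pages cannot be backed). */
static void *slurp(const char *path, size_t *len)
{
	struct stat st;
	char *buf = NULL;
	size_t done = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0 || fstat(fd, &st) < 0)
		goto fail;
	buf = malloc(st.st_size);
	if (!buf)
		goto fail;		/* this is the limit being hit */
	while (done < (size_t)st.st_size) {
		ssize_t n = read(fd, buf + done, st.st_size - done);
		if (n <= 0)
			goto fail;
		done += (size_t)n;
	}
	close(fd);
	*len = done;
	return buf;
fail:
	if (fd >= 0)
		close(fd);
	free(buf);
	return NULL;
}

int main(int argc, char **argv)
{
	size_t len;
	void *buf = argc > 1 ? slurp(argv[1], &len) : NULL;

	if (!buf)
		return 1;
	printf("loaded %zu bytes in one buffer\n", len);
	free(buf);
	return 0;
}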


Nicolas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fatal: unable to write sha1 file git 1.6.2.1
  2009-03-24 18:20 fatal: unable to write sha1 file git 1.6.2.1 Peter
  2009-03-24 19:30 ` Nicolas Pitre
@ 2009-03-24 19:31 ` Linus Torvalds
  2009-03-24 21:05   ` Peter
  1 sibling, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2009-03-24 19:31 UTC (permalink / raw)
  To: Peter, Junio C Hamano; +Cc: Git Mailing List



On Tue, 24 Mar 2009, Peter wrote:
>
> I try to add a directory with lots of binary files to a git repository.
> I receive the error message:
> fatal: unable to write sha1 file
> 
> This is git 1.6.2.1.
> 
> Are there limits concerning binary files ( like executables , images ) for the
> inclusion in a git repo ?

If that is the only error message you got, then afaik the only way that 
can happen is if "close(fd)" returns an error.

The only other "unable to write sha1 file" message happens if 
write_buffer() fails, but if that happens then you should also have gotten 
a

	file write error (<error message goes here>)

message in addition to the "unable to write sha1 file".

What OS? What filesystem? Are you perhaps running out of space?

You could also try to apply this patch to get more information (Junio, 
maybe worth doing regardless? We should try to avoid ambiguous error 
messages that don't give full error information)

		Linus
---
 sha1_file.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 4563173..54972f9 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2287,7 +2287,7 @@ static void close_sha1_file(int fd)
 		fsync_or_die(fd, "sha1 file");
 	fchmod(fd, 0444);
 	if (close(fd) != 0)
-		die("unable to write sha1 file");
+		die("error when closing sha1 file (%s)", strerror(errno));
 }
 
 /* Size of directory component, including the ending '/' */

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: fatal: unable to write sha1 file git 1.6.2.1
  2009-03-24 19:31 ` Linus Torvalds
@ 2009-03-24 21:05   ` Peter
  2009-03-24 22:30     ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: Peter @ 2009-03-24 21:05 UTC (permalink / raw)
  To: git

Consolidating the replies from Nicolas and Linus; thanks for your help:

> Currently, Git is unable to deal with files that it cannot load entirely 
> in memory.
> 
> What is the size of your largest file?

The biggest is 51 MB, followed by one that is 23 MB. I was able to add 
and commit a part of the directory tree where the biggest file was 14 MB, 
so there seems to be a threshold somewhere between 14 MB and 23 MB.

> If that is the only error message you got, then afaik the only way that 
> can happen is if "close(fd)" returns an error.
>   
Yes, that's the only message I got.
> The only other "unable to write sha1 file" message happens if 
> write_buffer() fails, but if that happens then you should also have gotten 
> a
>
> 	file write error (<error message goes here>)
>
> message in addition to the "unable to write sha1 file".
>
> What OS? What filesystem? Are you perhaps running out of space?
>   
It's Debian Lenny 5.0.0, and disk space is ample. The filesystem is CIFS 
(a Windows 2000 share mounted via Samba in a Debian guest running under 
VMware Workstation (yes, I know ...)). Memory usage, according to htop, 
stays constant at 140/504 MB during the whole process until git fails.
> You could also try to apply this patch to get more information (Junio, 
> maybe worth doing regardless? We should try to avoid ambiguous error 
> messages that don't give full error information)
>
> 		Linus
> ---
>  sha1_file.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 4563173..54972f9 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2287,7 +2287,7 @@ static void close_sha1_file(int fd)
>  		fsync_or_die(fd, "sha1 file");
>  	fchmod(fd, 0444);
>  	if (close(fd) != 0)
> -		die("unable to write sha1 file");
> +		die("error when closing sha1 file (%s)", strerror(errno));
>  }
>  
>  /* Size of directory component, including the ending '/' */
> --
>   
Will try it, thanks
Peter

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fatal: unable to write sha1 file git 1.6.2.1
  2009-03-24 21:05   ` Peter
@ 2009-03-24 22:30     ` Linus Torvalds
  2009-03-24 22:42       ` Peter
                         ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Linus Torvalds @ 2009-03-24 22:30 UTC (permalink / raw)
  To: Peter; +Cc: Git Mailing List, Steve French, Jeff Layton



On Tue, 24 Mar 2009, Peter wrote:
> > 
> > What OS? What filesystem? Are you perhaps running out of space?
>   
> Its Lenny Debian 5.0.0, Diskspace is ample . Filesystem is cifs ( this is a
> windows 2000 share mounted with samba in a VMware Workstation Debian client (
> yes, I know ... )). Memory usage, according to htop, is constant = 140/504 MB
> during the whole process until git fails.

Ok, it sounds very much like a possible CIFS problem. 

Getting the exact error code for the "close()" will be interesting, 
because the only thing that can return an error under Linux in close() is 
if the filesystem "->flush()" function fails with an error.

In cifs, the cifs_flush() thing does a filemap_fdatawrite(), forcing the 
data out, and that in turn calls do_writepages() which in turn calls 
either generic_writepages() or cifs_writepages() depending on random cifs 
crap.

I'm not seeing any obvious errors there. But I would _not_ be surprised if 
the fchmod(fd, 0444) that we did before the close could be causing this: 
cifs gets confused and thinks that it must not write to the file because 
the file has been turned read-only.

Here's an idea: if this is reproducible for you, does the behavior change 
if you do

	[core]
		fsyncobjectfiles = true

in your .git/config file? That causes git to always fsync() the written 
data _before_ it does that fchmod(), which in turn means that by the time 
the close() rolls around, there should be no data to write, and thus no 
possibility that anybody gets confused when there is still dirty data on a 
file that has been marked read-only.

Anyway, I'm cc'ing Steve French and Jeff Layton, as the major CIFS go-to 
guys. It does seem like a CIFS bug is likely.

Steve, Jeff: git does basically

	open(".git/objects/xy/tmp_obj_xyzzy", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
	write(5, ".."..., len)
	fchmod(5, 0444)
	close(5)
	link(".git/objects/xy/tmp_obj_xyzzy", ".git/objects/xy/xyzzy");
	unlink(".git/objects/xy/tmp_obj_xyzzy");

to write a new datafile. Under CIFS, that "close()" apparently sometimes 
returns with an error, and fails.
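
To make this easier to poke at outside of git, here is a minimal standalone
sketch of that same sequence (not git's code, and the filenames are just
placeholders) that reports the errno from close() so the filesystem-level
error is visible:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	const char *tmp = "tmp_obj_test";	/* placeholder names */
	const char *obj = "obj_test";
	static char buf[64 * 1024];		/* dummy payload */
	ssize_t len = sizeof(buf);
	int fd = open(tmp, O_RDWR | O_CREAT | O_EXCL, 0600);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, buf, len) != len) {
		perror("write");
		return 1;
	}
	if (fchmod(fd, 0444) < 0)
		perror("fchmod");		/* keep going, as git does */
	if (close(fd) != 0) {
		/* this is where the CIFS failure would show up */
		fprintf(stderr, "close failed: %s\n", strerror(errno));
		return 1;
	}
	if (link(tmp, obj) < 0 || unlink(tmp) < 0)
		perror("link/unlink");
	return 0;
}

Running it on the affected CIFS mount should show whether the bare
open/write/fchmod/close sequence alone is enough to trigger the close()
error.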

I guess we could change the "fchmod()" to happen after the close(), just 
to make it easier for filesystems to get this right. And yes, as outlined 
above, there's a config option to make git use fdatasync() before it does 
that fchmod() too. But it does seem like CIFS is just buggy.

If CIFS has problems with the above sequence (say, if some timeout 
refreshes the inode data or causes a re-connect with the server or 
whatever), then maybe cifs should always do an implicit fdatasync() when 
doing fchmod(), just to make sure that the fchmod won't screw up any 
cached dirty data?

		Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fatal: unable to write sha1 file git 1.6.2.1
  2009-03-24 22:30     ` Linus Torvalds
@ 2009-03-24 22:42       ` Peter
  2009-03-25  0:03         ` Linus Torvalds
  2009-03-24 23:35       ` Jeff Layton
  2009-03-25 10:52       ` Peter
  2 siblings, 1 reply; 13+ messages in thread
From: Peter @ 2009-03-24 22:42 UTC (permalink / raw)
  To: Linus Torvalds, git

Linus Torvalds wrote:
> On Tue, 24 Mar 2009, Peter wrote:
>   
>>> What OS? What filesystem? Are you perhaps running out of space?
>>>       
>>   
>> Its Lenny Debian 5.0.0, Diskspace is ample . Filesystem is cifs ( this is a
>> windows 2000 share mounted with samba in a VMware Workstation Debian client (
>> yes, I know ... )). Memory usage, according to htop, is constant = 140/504 MB
>> during the whole process until git fails.
>>     
>
> Ok, it sounds very much like a possible CIFS problem. 
>
> Getting the exact error code for the "close()" will be interesting, 
> because the only thing that can return an error under Linux in close() is 
> if the filesystem "->flush()" function fails with an error.
>
> In cifs, the cifs_flush() thing does a filemap_fdatawrite(), forcing the 
> data out, and that in turn calls do_writepages() which in turn calls 
> either generic_writepages() or cifs_writepages() depending on random cifs 
> crap.
>
> I'm not seeing any obvious errors there. But I would _not_ be surprised if 
> the fchmod(fd, 0444) that we did before the close could be causing this: 
> cifs gets confused and thinks that it must not write to the file because 
> the file has been turned read-only.
>
> Here's an idea: if this is reproducible for you, does the behavior change 
> if you do
>
> 	[core]
> 		fsyncobjectfiles = true
>
> in your .git/config file? That causes git to always fsync() the written 
> data _before_ it does that fchmod(), which in turn means that by the time 
> the close() rolls around, there should be no data to write, and thus no 
> possibility that anybody gets confused when there is still dirty data on a 
> file that has been marked read-only.
>
> Anyway, I'm cc'ing Steve French and Jeff Layton, as the major CIFS go-to 
> guys. It does seem like a CIFS bug is likely.
>
> Steve, Jeff: git does basically
>
> 	open(".git/objects/xy/tmp_obj_xyzzy", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
> 	write(5, ".."..., len)
> 	fchmod(5, 0444)
> 	close(5)
> 	link(".git/objects/xy/tmp_obj_xyzzy", ".git/objects/xy/xyzzy");
> 	unlink(".git/objects/xy/tmp_obj_xyzzy");
>
> to write a new datafile. Under CIFS, that "close()" apparently sometimes 
> returns with an error, and fails.
>
> I guess we could change the "fchmod()" to happen after the close(), just 
> to make it easier for filesystems to get this right. And yes, as outlined 
> above, there's a config option to make git use fdatasync() before it does 
> that fchmod() too. But it does seem like CIFS is just buggy.
>
> If CIFS has problems with the above sequence (say, if some timeout 
> refreshes the inode data or causes a re-connect with the server or 
> whatever), then maybe cifs should always do an implicit fdatasync() when 
> doing fchmod(), just to make sure that the fchmod won't screw up any 
> cached dirty data?
>
> 		Linus
Hi
Thanks a lot, I will check that out tomorrow. In the meantime, this is 
the result with your patch applied:

$ git add <big stuff>

fatal: error when closing sha1 file (Bad file descriptor)

Peter

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fatal: unable to write sha1 file git 1.6.2.1
  2009-03-24 22:30     ` Linus Torvalds
  2009-03-24 22:42       ` Peter
@ 2009-03-24 23:35       ` Jeff Layton
  2009-03-25  0:11         ` Linus Torvalds
  2009-03-25  0:17         ` Steven French
  2009-03-25 10:52       ` Peter
  2 siblings, 2 replies; 13+ messages in thread
From: Jeff Layton @ 2009-03-24 23:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Peter, Git Mailing List, Steve French

On Tue, 24 Mar 2009 15:30:38 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Tue, 24 Mar 2009, Peter wrote:
> > > 
> > > What OS? What filesystem? Are you perhaps running out of space?
> >   
> > Its Lenny Debian 5.0.0, Diskspace is ample . Filesystem is cifs ( this is a
> > windows 2000 share mounted with samba in a VMware Workstation Debian client (
> > yes, I know ... )). Memory usage, according to htop, is constant = 140/504 MB
> > during the whole process until git fails.
> 
> Ok, it sounds very much like a possible CIFS problem. 
> 
> Getting the exact error code for the "close()" will be interesting, 
> because the only thing that can return an error under Linux in close() is 
> if the filesystem "->flush()" function fails with an error.
> 
> In cifs, the cifs_flush() thing does a filemap_fdatawrite(), forcing the 
> data out, and that in turn calls do_writepages() which in turn calls 
> either generic_writepages() or cifs_writepages() depending on random cifs 
> crap.
> 
> I'm not seeing any obvious errors there. But I would _not_ be surprised if 
> the fchmod(fd, 0444) that we did before the close could be causing this: 
> cifs gets confused and thinks that it must not write to the file because 
> the file has been turned read-only.
> 
> Here's an idea: if this is reproducible for you, does the behavior change 
> if you do
> 
> 	[core]
> 		fsyncobjectfiles = true
> 
> in your .git/config file? That causes git to always fsync() the written 
> data _before_ it does that fchmod(), which in turn means that by the time 
> the close() rolls around, there should be no data to write, and thus no 
> possibility that anybody gets confused when there is still dirty data on a 
> file that has been marked read-only.
> 
> Anyway, I'm cc'ing Steve French and Jeff Layton, as the major CIFS go-to 
> guys. It does seem like a CIFS bug is likely.
> 
> Steve, Jeff: git does basically
> 
> 	open(".git/objects/xy/tmp_obj_xyzzy", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
> 	write(5, ".."..., len)
> 	fchmod(5, 0444)
> 	close(5)
> 	link(".git/objects/xy/tmp_obj_xyzzy", ".git/objects/xy/xyzzy");
> 	unlink(".git/objects/xy/tmp_obj_xyzzy");
> 
> to write a new datafile. Under CIFS, that "close()" apparently sometimes 
> returns with an error, and fails.
> 
> I guess we could change the "fchmod()" to happen after the close(), just 
> to make it easier for filesystems to get this right. And yes, as outlined 
> above, there's a config option to make git use fdatasync() before it does 
> that fchmod() too. But it does seem like CIFS is just buggy.
> 
> If CIFS has problems with the above sequence (say, if some timeout 
> refreshes the inode data or causes a re-connect with the server or 
> whatever), then maybe cifs should always do an implicit fdatasync() when 
> doing fchmod(), just to make sure that the fchmod won't screw up any 
> cached dirty data?
> 

Yes. That's probably the right thing to do here. This looks like a
regression that I introduced some time ago in
cea218054ad277d6c126890213afde07b4eb1602. Before that delta we always
flushed all the data before doing any setattr. After that delta, we
just did it on size changes. In a later commit, Steve fixed it so that
it got done on ATTR_MTIME too.

We can easily change the cifs_setattr variants to flush all the dirty
data on ATTR_MODE changes, and maybe even change it to only flush on
certain ATTR_MODE changes, like when we're clearing all the write bits.
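
A rough sketch of that condition, purely illustrative and not the patch that
was actually posted (the DEMO_ATTR_* values are stand-ins for the kernel's
iattr flags):

#include <stdbool.h>

/* Illustrative only: flush before a setattr when the size or mtime changes,
 * or when a mode change clears every write bit (file becomes read-only). */
#define DEMO_ATTR_MODE	(1 << 0)
#define DEMO_ATTR_SIZE	(1 << 3)
#define DEMO_ATTR_MTIME	(1 << 5)

static bool setattr_needs_flush(unsigned int ia_valid, unsigned int ia_mode)
{
	if (ia_valid & (DEMO_ATTR_SIZE | DEMO_ATTR_MTIME))
		return true;
	if ((ia_valid & DEMO_ATTR_MODE) && !(ia_mode & 0222))
		return true;	/* clearing all write bits */
	return false;
}

(The patch later in the thread ends up taking the simpler route of flushing
on every setattr instead.)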

I wonder though whether that's sufficient. If we're changing ownership
for instance, will we hit the same issue? Maybe the safest approach is
just to go back to flushing on any setattr call.

Steve, thoughts?

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fatal: unable to write sha1 file git 1.6.2.1
  2009-03-24 22:42       ` Peter
@ 2009-03-25  0:03         ` Linus Torvalds
  2009-03-25  0:24           ` Jeff Layton
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2009-03-25  0:03 UTC (permalink / raw)
  To: Peter; +Cc: Git Mailing List, Jeff Layton, Steve French



On Tue, 24 Mar 2009, Peter wrote:
>
> Thanks a lot , I will check that out tomorrow, in the meantime, this is the
> result of your patch being applied:
> 
> $ git add <big stuff>
> 
> fatal: error when closing sha1 file (Bad file descriptor)

Ok, that's probably cifs_writepages() doing

                        open_file = find_writable_file(CIFS_I(mapping->host));
                        if (!open_file) {
                                cERROR(1, ("No writable handles for inode"));
                                rc = -EBADF;
			} else {
				..

so yeah, looks like it's the fchmod() that triggers it.

I suspect this would be a safer - if slightly slower - way to make sure 
the file is read-only. It's slower, because it is going to look up the 
filename once more, but I bet it is going to avoid this particular CIFS 
bug.

			Linus
---
 http-push.c |    2 +-
 sha1_file.c |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/http-push.c b/http-push.c
index 48e5f38..ba4fa4d 100644
--- a/http-push.c
+++ b/http-push.c
@@ -748,8 +748,8 @@ static void finish_request(struct transfer_request *request)
 			aborted = 1;
 		}
 	} else if (request->state == RUN_FETCH_LOOSE) {
-		fchmod(request->local_fileno, 0444);
 		close(request->local_fileno); request->local_fileno = -1;
+		chmod(request->tmpfile, 0444);
 
 		if (request->curl_result != CURLE_OK &&
 		    request->http_code != 416) {
diff --git a/sha1_file.c b/sha1_file.c
index 4563173..8268da7 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2285,9 +2285,8 @@ static void close_sha1_file(int fd)
 {
 	if (fsync_object_files)
 		fsync_or_die(fd, "sha1 file");
-	fchmod(fd, 0444);
 	if (close(fd) != 0)
-		die("unable to write sha1 file");
+		die("error when closing sha1 file (%s)", strerror(errno));
 }
 
 /* Size of directory component, including the ending '/' */
@@ -2384,6 +2383,7 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	close_sha1_file(fd);
 	free(compressed);
 
+	chmod(tmpfile, 0444);
 	if (mtime) {
 		struct utimbuf utb;
 		utb.actime = mtime;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: fatal: unable to write sha1 file git 1.6.2.1
  2009-03-24 23:35       ` Jeff Layton
@ 2009-03-25  0:11         ` Linus Torvalds
  2009-03-25  0:17         ` Steven French
  1 sibling, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2009-03-25  0:11 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Peter, Git Mailing List, Steve French



On Tue, 24 Mar 2009, Jeff Layton wrote:
> 
> Yes. That's probably the right thing to do here. This is looks like a
> regression that I introduced some time ago in
> cea218054ad277d6c126890213afde07b4eb1602. Before that delta we always
> flushed all the data before doing any setattr. After that delta, we
> just did it on size changes. In a later commit, Steve fixed it so that
> it got done on ATTR_MTIME too.

Ahh. Yes. You do need to flush on at least ATTR_MODE and ATTR_UID/GID 
changes too.

> I wonder though whether that's sufficient. If we're changing ownership
> for instance, will we hit the same issue? Maybe the safest approach is
> just to go back to flushing on any setattr call.
> 
> Steve, thoughts?

Is there anything relevant left that would be reason enough to keep it 
conditional? Once you're doing it on MODE/SIZE/UID/GID/MTIME changes, that 
pretty much covers all setattr calls. 

			Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fatal: unable to write sha1 file git 1.6.2.1
  2009-03-24 23:35       ` Jeff Layton
  2009-03-25  0:11         ` Linus Torvalds
@ 2009-03-25  0:17         ` Steven French
  2009-03-25  0:49           ` Jeff Layton
  1 sibling, 1 reply; 13+ messages in thread
From: Steven French @ 2009-03-25  0:17 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Git Mailing List, Linus Torvalds, Peter

Jeff Layton <jlayton@redhat.com> wrote on 03/24/2009 06:35:06 PM:

> On Tue, 24 Mar 2009 15:30:38 -0700 (PDT)
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > [...]
> 
> Yes. That's probably the right thing to do here. This is looks like a
> regression that I introduced some time ago in
> cea218054ad277d6c126890213afde07b4eb1602. Before that delta we always
> flushed all the data before doing any setattr. After that delta, we
> just did it on size changes. In a later commit, Steve fixed it so that
> it got done on ATTR_MTIME too.
> 
> We can easily change the cifs_setattr variants to flush all the dirty
> data on ATTR_MODE changes, and maybe even change it to only flush on
> certain ATTR_MODE changes, like when we're clearing all the write bits.
> 
> I wonder though whether that's sufficient. If we're changing ownership
> for instance, will we hit the same issue? Maybe the safest approach is
> just to go back to flushing on any setattr call.
> 
> Steve, thoughts?

Flushing on all setattr calls seems excessive, but the case of
fchmod(0444) followed by close should not require it, and it should
work over cifs unless the TCP socket drops between the chmod
and the close/flush (requiring reconnection and reopening
the file handle).  For the normal case, the chmod should not
matter, since for cifs, unlike NFSv3, there is a network
open operation and access checks are done on the open call, and
you already have the file open for write.  If /proc/fs/cifs/Stats
shows the number of reconnects as 0, then we don't have the case
of dropping the network between chmod and flush/close.  Since certain
writebehind errors (in writepages) cannot be returned to the application
until flush/close, it is just as likely that what you are seeing is
an error from a previous failed write.  The logic for handling
errors in cifs_writepages is much better in recent kernels.
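
To illustrate the "writebehind error reported later" point: a background
write failure cannot reach the application at the moment it happens, so it
has to be remembered and handed back at flush/close time. A simplified,
hypothetical sketch of that bookkeeping (not the actual CIFS code, although
cifs keeps a similar per-inode write_behind_rc field):

/* Hypothetical sketch of deferred writeback-error reporting. */
struct demo_file_state {
	int write_behind_rc;		/* last background write error, 0 if none */
};

/* background writeback: no application call to return the error through */
static void demo_writepages_done(struct demo_file_state *st, int rc)
{
	if (rc != 0 && st->write_behind_rc == 0)
		st->write_behind_rc = rc;	/* remember it for later */
}

/* ->flush(), run from close(): surface (and clear) any stored error */
static int demo_flush(struct demo_file_state *st)
{
	int rc = st->write_behind_rc;

	st->write_behind_rc = 0;
	return rc;			/* this is what close() reports */
}

int main(void)
{
	struct demo_file_state st = { 0 };

	demo_writepages_done(&st, -5);	/* e.g. -EIO from an earlier write */
	return demo_flush(&st) ? 1 : 0;	/* close() would now fail */
}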

What is the kernel version?

Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at-sign us dot ibm dot com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fatal: unable to write sha1 file git 1.6.2.1
  2009-03-25  0:03         ` Linus Torvalds
@ 2009-03-25  0:24           ` Jeff Layton
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2009-03-25  0:24 UTC (permalink / raw)
  To: Peter; +Cc: Linus Torvalds, Git Mailing List, Steve French

On Tue, 24 Mar 2009 17:03:22 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Tue, 24 Mar 2009, Peter wrote:
> >
> > Thanks a lot , I will check that out tomorrow, in the meantime, this is the
> > result of your patch being applied:
> > 
> > $ git add <big stuff>
> > 
> > fatal: error when closing sha1 file (Bad file descriptor)
> 
> Ok, that's probably cifs_writepages() doing
> 
>                         open_file = find_writable_file(CIFS_I(mapping->host));
>                         if (!open_file) {
>                                 cERROR(1, ("No writable handles for inode"));
>                                 rc = -EBADF;
> 			} else {
> 				..
> 
> so yeah, looks like it's the fchmod() that triggers it.
> 
> I suspect this would be a safer - if slightly slower - way to make sure 
> the file is read-only. It's slower, because it is going to look up the 
> filename once more, but I bet it is going to avoid this particular CIFS 
> bug.
>

Yeah, that should work around it. This CIFS patch should also fix it.
It's untested but it's reasonably simple.

Just out of curiosity, what sort of server are you mounting here?

--------------[snip]------------------

From ea8dc9927fb9542bb1c73b1982cc44bf3d4a9798 Mon Sep 17 00:00:00 2001
From: Jeff Layton <jlayton@redhat.com>
Date: Tue, 24 Mar 2009 19:50:17 -0400
Subject: [PATCH] cifs: flush data on any setattr

We already flush all the dirty pages for an inode before doing
ATTR_SIZE and ATTR_MTIME changes. There's another problem though -- if
we change the mode so that the file becomes read-only then we may not
be able to write data to it afterward.

Fix this by just going back to flushing all the dirty data on any
setattr call. There are probably some cases that can be optimized out,
but we need to take care that we don't cause a regression by doing that.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/cifs/inode.c |   58 ++++++++++++++++++++++++++++--------------------------
 1 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index a8797cc..f4e880d 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -1792,20 +1792,21 @@ cifs_setattr_unix(struct dentry *direntry, struct iattr *attrs)
 		goto out;
 	}
 
-	if ((attrs->ia_valid & ATTR_MTIME) || (attrs->ia_valid & ATTR_SIZE)) {
-		/*
-		   Flush data before changing file size or changing the last
-		   write time of the file on the server. If the
-		   flush returns error, store it to report later and continue.
-		   BB: This should be smarter. Why bother flushing pages that
-		   will be truncated anyway? Also, should we error out here if
-		   the flush returns error?
-		 */
-		rc = filemap_write_and_wait(inode->i_mapping);
-		if (rc != 0) {
-			cifsInode->write_behind_rc = rc;
-			rc = 0;
-		}
+	/*
+	 * Attempt to flush data before changing attributes. We need to do
+	 * this for ATTR_SIZE and ATTR_MTIME for sure, and if we change the
+	 * ownership or mode then we may also need to do this. Here, we take
+	 * the safe way out and just do the flush on all setattr requests. If
+	 * the flush returns error, store it to report later and continue.
+	 *
+	 * BB: This should be smarter. Why bother flushing pages that
+	 * will be truncated anyway? Also, should we error out here if
+	 * the flush returns error?
+	 */
+	rc = filemap_write_and_wait(inode->i_mapping);
+	if (rc != 0) {
+		cifsInode->write_behind_rc = rc;
+		rc = 0;
 	}
 
 	if (attrs->ia_valid & ATTR_SIZE) {
@@ -1903,20 +1904,21 @@ cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs)
 		return -ENOMEM;
 	}
 
-	if ((attrs->ia_valid & ATTR_MTIME) || (attrs->ia_valid & ATTR_SIZE)) {
-		/*
-		   Flush data before changing file size or changing the last
-		   write time of the file on the server. If the
-		   flush returns error, store it to report later and continue.
-		   BB: This should be smarter. Why bother flushing pages that
-		   will be truncated anyway? Also, should we error out here if
-		   the flush returns error?
-		 */
-		rc = filemap_write_and_wait(inode->i_mapping);
-		if (rc != 0) {
-			cifsInode->write_behind_rc = rc;
-			rc = 0;
-		}
+	/*
+	 * Attempt to flush data before changing attributes. We need to do
+	 * this for ATTR_SIZE and ATTR_MTIME for sure, and if we change the
+	 * ownership or mode then we may also need to do this. Here, we take
+	 * the safe way out and just do the flush on all setattr requests. If
+	 * the flush returns error, store it to report later and continue.
+	 *
+	 * BB: This should be smarter. Why bother flushing pages that
+	 * will be truncated anyway? Also, should we error out here if
+	 * the flush returns error?
+	 */
+	rc = filemap_write_and_wait(inode->i_mapping);
+	if (rc != 0) {
+		cifsInode->write_behind_rc = rc;
+		rc = 0;
 	}
 
 	if (attrs->ia_valid & ATTR_SIZE) {
-- 
1.6.0.6

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: fatal: unable to write sha1 file git 1.6.2.1
  2009-03-25  0:17         ` Steven French
@ 2009-03-25  0:49           ` Jeff Layton
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2009-03-25  0:49 UTC (permalink / raw)
  To: Steven French; +Cc: Git Mailing List, Linus Torvalds, Peter

On Tue, 24 Mar 2009 19:17:34 -0500
Steven French <sfrench@us.ibm.com> wrote:

> Jeff Layton <jlayton@redhat.com> wrote on 03/24/2009 06:35:06 PM:
> 
> > On Tue, 24 Mar 2009 15:30:38 -0700 (PDT)
> > Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > 
> > > [...]
> > 
> > Yes. That's probably the right thing to do here. This is looks like a
> > regression that I introduced some time ago in
> > cea218054ad277d6c126890213afde07b4eb1602. Before that delta we always
> > flushed all the data before doing any setattr. After that delta, we
> > just did it on size changes. In a later commit, Steve fixed it so that
> > it got done on ATTR_MTIME too.
> > 
> > We can easily change the cifs_setattr variants to flush all the dirty
> > data on ATTR_MODE changes, and maybe even change it to only flush on
> > certain ATTR_MODE changes, like when we're clearing all the write bits.
> > 
> > I wonder though whether that's sufficient. If we're changing ownership
> > for instance, will we hit the same issue? Maybe the safest approach is
> > just to go back to flushing on any setattr call.
> > 
> > Steve, thoughts?
> 
> Flushing on all setattr calls seems excessive, but the case of
> fchmod(0444) followed by close should not require it, and should
> work over cifs unless the tcp socket drops between the chmod
> and the close/flush (requiring reconnection and reopening
> the file handle).  For the normal case, the chmod should not
> matter, since for cifs, unlike NFSv3, there is a network
> open operation and access checks are done on the open call, and 
> you already have the file open for write).  If /proc/fs/cifs/Stats
> shows the number of reconnects as 0, then we don't have the case 
> of dropping the network between chmod and flush/close.  Since certain
> writebehind errors (in writepages) can not be returned to the application
> until flush/close, it is just as likely that what you are seeing is
> an error from a previous failed write.  The logic for handling
> errors in cifs_writepages is much better in recent kernels.
> 
> What is the kernel version?
> 

Good point -- I stand corrected. I just gave this series of calls a quick
test against a win2k8 server:

unlink("/mnt/win2k8/testfile")          = 0
open("/mnt/win2k8/testfile", O_RDWR|O_CREAT, 0644) = 3
write(3, "\1\0\0\0\0\0\0\0\35\244@\211:\0\0\0\7\0\0\0\0\0\0\0X\r\25-\377\177\0\0`"..., 1024) = 1024
fchmod(3, 0444)                         = 0
close(3)                                = 0

Here's what it looks like on the wire (forgive the IPv6 addresses, it's
just what I had handy)...

Create the file:
  3.842955      feed::2 -> feed::8      SMB NT Create AndX Request, Path: \testfile
  3.843641      feed::8 -> feed::2      SMB NT Create AndX Response, FID: 0x4007

Set the ATTR_READONLY bit:
  3.844255      feed::2 -> feed::8      SMB Trans2 Request, SET_FILE_INFO, FID: 0x4007
  3.845070      feed::8 -> feed::2      SMB Trans2 Response, FID: 0x4007, SET_FILE_INFO

Write the data:
  3.845766      feed::2 -> feed::8      SMB Write AndX Request, FID: 0x4007, 1024 bytes at offset 0
  3.846627      feed::8 -> feed::2      SMB Write AndX Response, FID: 0x4007, 1024 bytes

Close the file:
  3.847043      feed::2 -> feed::8      SMB Close Request, FID: 0x4007
  3.847940      feed::8 -> feed::2      SMB Close Response, FID: 0x4007

...none of that errored out, even though the writes clearly hit the
server after the mode change. Either there's something else going on
here, or this server is misbehaving. It might be interesting to see a
capture.

Some info about what sort of server this is might also be helpful.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: fatal: unable to write sha1 file git 1.6.2.1
  2009-03-24 22:30     ` Linus Torvalds
  2009-03-24 22:42       ` Peter
  2009-03-24 23:35       ` Jeff Layton
@ 2009-03-25 10:52       ` Peter
  2 siblings, 0 replies; 13+ messages in thread
From: Peter @ 2009-03-25 10:52 UTC (permalink / raw)
  To: Linus Torvalds, git


>
> Here's an idea: if this is reproducible for you, does the behavior change 
> if you do
>
> 	[core]
> 		fsyncobjectfiles = true
>
>   
Yes, that works!
Thanks a lot!

Peter

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread

Thread overview: 13+ messages
2009-03-24 18:20 fatal: unable to write sha1 file git 1.6.2.1 Peter
2009-03-24 19:30 ` Nicolas Pitre
2009-03-24 19:31 ` Linus Torvalds
2009-03-24 21:05   ` Peter
2009-03-24 22:30     ` Linus Torvalds
2009-03-24 22:42       ` Peter
2009-03-25  0:03         ` Linus Torvalds
2009-03-25  0:24           ` Jeff Layton
2009-03-24 23:35       ` Jeff Layton
2009-03-25  0:11         ` Linus Torvalds
2009-03-25  0:17         ` Steven French
2009-03-25  0:49           ` Jeff Layton
2009-03-25 10:52       ` Peter
