git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* checkout-index: unable to create file foo (File exists)
@ 2012-11-01 20:25 Brian J. Murrell
  2012-11-04 22:10 ` Pete Wyckoff
  0 siblings, 1 reply; 4+ messages in thread
From: Brian J. Murrell @ 2012-11-01 20:25 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2128 bytes --]

When we use git on a network filesystem, occasionally and sporadically
we will see the following from a git checkout command:

error: git checkout-index: unable to create file foo (File exists)

Through a very basic grepping and following of the source it seems that
the core of the error message is coming from write_entry() in entry.c:

		fd = open_output_fd(path, ce, to_tempfile);
		if (fd < 0) {
			free(new);
			return error("unable to create file %s (%s)",
				path, strerror(errno));
		}

So looking into open_output_fd() there is a call to create_file() which
does:

	return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);

I am able to prevent the problem from happening with 100% success by
simply giving the git checkout a "-q" argument to prevent it from
emitting progress reports.  This would seem to indicate that the problem
likely revolves around the fact that the progress reporting uses SIGALRM.

Given that O_CREAT | O_EXCL are used in the open() call and that SIGALRM
(along with SA_RESTART) is being used frequently to do progress updates,
it seems reasonable to suspect that the problem is that open() is being
interrupted (but only after it creates the file and before completing)
by the progress reporting mechanism's SIGALRM and when the progress
reporting is done, open() is restarted automatically (due to the use of
SA_RESTART) and fails because the file exists and O_CREAT | O_EXCL are
used in the open() call.

Does this seem like a reasonable hypothesis?

If it does, where does the problem lie here?  Is it that SA_RESTART
should not be used since it's not safe with open() and O_CREAT | O_EXCL
(and every system call caller should be handling EINTR) or should the
open() be idempotent so that it can be restarted automatically with
SA_RESTART?  If open(2) is supposed to be idempotent, it would be most
useful to have a citation to standard where that is specified.

If open() is not required to be idempotent, it's use with O_CREAT |
O_EXCL and SA_RESTART seems fatally flawed.

Any insight or opinions would be much appreciated.

Cheers,
b.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: checkout-index: unable to create file foo (File exists)
  2012-11-01 20:25 checkout-index: unable to create file foo (File exists) Brian J. Murrell
@ 2012-11-04 22:10 ` Pete Wyckoff
  2012-11-05 15:25   ` Brian J. Murrell
  0 siblings, 1 reply; 4+ messages in thread
From: Pete Wyckoff @ 2012-11-04 22:10 UTC (permalink / raw)
  To: Brian J. Murrell; +Cc: git

brian@interlinx.bc.ca wrote on Thu, 01 Nov 2012 16:25 -0400:
> When we use git on a network filesystem, occasionally and sporadically
> we will see the following from a git checkout command:
> 
> error: git checkout-index: unable to create file foo (File exists)
> 
> Through a very basic grepping and following of the source it seems that
> the core of the error message is coming from write_entry() in entry.c:
> 
> 		fd = open_output_fd(path, ce, to_tempfile);
> 		if (fd < 0) {
> 			free(new);
> 			return error("unable to create file %s (%s)",
> 				path, strerror(errno));
> 		}
> 
> So looking into open_output_fd() there is a call to create_file() which
> does:
> 
> 	return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
> 
> I am able to prevent the problem from happening with 100% success by
> simply giving the git checkout a "-q" argument to prevent it from
> emitting progress reports.  This would seem to indicate that the problem
> likely revolves around the fact that the progress reporting uses SIGALRM.
> 
> Given that O_CREAT | O_EXCL are used in the open() call and that SIGALRM
> (along with SA_RESTART) is being used frequently to do progress updates,
> it seems reasonable to suspect that the problem is that open() is being
> interrupted (but only after it creates the file and before completing)
> by the progress reporting mechanism's SIGALRM and when the progress
> reporting is done, open() is restarted automatically (due to the use of
> SA_RESTART) and fails because the file exists and O_CREAT | O_EXCL are
> used in the open() call.
> 
> Does this seem like a reasonable hypothesis?

Fascinating problem and observations.

We've been using NFS with git for quite a while and have never
seen such an error.

> If it does, where does the problem lie here?  Is it that SA_RESTART
> should not be used since it's not safe with open() and O_CREAT | O_EXCL
> (and every system call caller should be handling EINTR) or should the
> open() be idempotent so that it can be restarted automatically with
> SA_RESTART?  If open(2) is supposed to be idempotent, it would be most
> useful to have a citation to standard where that is specified.
> 
> If open() is not required to be idempotent, it's use with O_CREAT |
> O_EXCL and SA_RESTART seems fatally flawed.

man 7 signal (linux man-pages 3.42) describes open() as restartable.

Which network filesystem and OS are you using?  The third option is
that there is a bug in the filesystem client.

		-- Pete

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: checkout-index: unable to create file foo (File exists)
  2012-11-04 22:10 ` Pete Wyckoff
@ 2012-11-05 15:25   ` Brian J. Murrell
  2012-11-05 17:53     ` Pete Wyckoff
  0 siblings, 1 reply; 4+ messages in thread
From: Brian J. Murrell @ 2012-11-05 15:25 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1383 bytes --]

On 12-11-04 05:10 PM, Pete Wyckoff wrote:
> 
> Fascinating problem and observations.

I thought so as well.

> We've been using NFS with git for quite a while and have never
> seen such an error.

Could be because NFS manages to operate more atomically given that it's
just the network exporting of local filesystem.

> man 7 signal (linux man-pages 3.42) describes open() as restartable.

Indeed.  The question is just what is to be assumed by the code that has
asked for the restartability.

> Which network filesystem and OS are you using?

The filesystem is Lustre.  So not only is it networked, it is
distributed where the namespace and data store are handled by different
nodes, to it's not at all as atomic as NFS-on-(say-)ext4.  Given that,
it's entirely possible to imagine a scenario where a namespace (MDT in
the Lustre nomenclature) operation could get interrupted after the
namespace entry has been created but before the open(2) completes.  So
the question here is who's responsibility is it to handle that situation?

> The third option is
> that there is a bug in the filesystem client.

Yep.  But before we can go on to determining a bug, the proper/expected
behavior needs to be determined.  I guess that's taking this a bit OT
for this list though.  I'm not really sure where else to go to determine
this though.  :-(

b.




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: checkout-index: unable to create file foo (File exists)
  2012-11-05 15:25   ` Brian J. Murrell
@ 2012-11-05 17:53     ` Pete Wyckoff
  0 siblings, 0 replies; 4+ messages in thread
From: Pete Wyckoff @ 2012-11-05 17:53 UTC (permalink / raw)
  To: Brian J. Murrell; +Cc: git

brian@interlinx.bc.ca wrote on Mon, 05 Nov 2012 10:25 -0500:
> On 12-11-04 05:10 PM, Pete Wyckoff wrote:
> > Which network filesystem and OS are you using?
> 
> The filesystem is Lustre.  So not only is it networked, it is
> distributed where the namespace and data store are handled by different
> nodes, to it's not at all as atomic as NFS-on-(say-)ext4.  Given that,
> it's entirely possible to imagine a scenario where a namespace (MDT in
> the Lustre nomenclature) operation could get interrupted after the
> namespace entry has been created but before the open(2) completes.  So
> the question here is who's responsibility is it to handle that situation?

That's all in the filesystem.  Hopefully it doesn't really work
like that because the fs is incosistent at this point.

ERESTARTSYS handling is done entirely in the kernel, not in glibc
and not git.  A possible in-kernel fix is not to handle any
signals (except KILL) when waiting for the open mechanics to
finish.

> > The third option is
> > that there is a bug in the filesystem client.
> 
> Yep.  But before we can go on to determining a bug, the proper/expected
> behavior needs to be determined.  I guess that's taking this a bit OT
> for this list though.  I'm not really sure where else to go to determine
> this though.  :-(

You could toss this to lustre support.  Or try first to come up
with a reduced testcase with lots of opens and SIGALRMs racing
against each other.  Maybe xfstests or some other suite might
also tickle the bug.

I don't think it is feasible to try to handle this error
condition in applications.

		-- Pete

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-11-05 17:53 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-01 20:25 checkout-index: unable to create file foo (File exists) Brian J. Murrell
2012-11-04 22:10 ` Pete Wyckoff
2012-11-05 15:25   ` Brian J. Murrell
2012-11-05 17:53     ` Pete Wyckoff

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).