git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] builtin/clean.c: Handle disappearing files
@ 2015-05-14 21:16 David Turner
  2015-05-14 21:45 ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: David Turner @ 2015-05-14 21:16 UTC (permalink / raw)
  To: git; +Cc: David Turner

During a git clean, some other process might be deleting files as
well.  If this happens, make git clean no longer die.

Signed-off-by: David Turner <dturner@twitter.com>
---
 builtin/clean.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 98c103f..3ae44c2 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -941,8 +941,16 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
 		if (!cache_name_is_other(ent->name, ent->len))
 			continue;
 
-		if (lstat(ent->name, &st))
-			die_errno("Cannot lstat '%s'", ent->name);
+		/*
+		 * Some concurrent process might have already removed
+		 * ent->name.
+		 */
+		if (lstat(ent->name, &st)) {
+			if (errno == ENOENT || errno == ENOTDIR)
+				continue;
+			else
+				die_errno("Cannot lstat '%s'", ent->name);
+		}
 
 		if (pathspec.nr)
 			matches = dir_path_match(ent, &pathspec, 0, NULL);
-- 
2.0.4.315.gad8727a-twtrsrc

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] builtin/clean.c: Handle disappearing files
  2015-05-14 21:16 [PATCH] builtin/clean.c: Handle disappearing files David Turner
@ 2015-05-14 21:45 ` Junio C Hamano
  2015-05-14 21:52   ` David Turner
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2015-05-14 21:45 UTC (permalink / raw)
  To: David Turner; +Cc: git, David Turner

David Turner <dturner@twopensource.com> writes:

> During a git clean, some other process might be deleting files as
> well.  If this happens, make git clean no longer die.
>
> Signed-off-by: David Turner <dturner@twitter.com>
> ---

I am having a hard time to convince myself that this is a good
change.

For this change to be an improvement that matters, you must be
allowing some other process mucking around with the files in your
working tree when you run "git clean".  The original catches such
situation as an anomaly that the user would want to be aware of (and
investigate), but this patch essentially declares that having such a
random process touching your working tree from sideways is a normal
situation.  I do not think I am willing to make such a declaration
myself; I'd rather want to know why other people are touching my
working tree while I am working in it.

>  builtin/clean.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/clean.c b/builtin/clean.c
> index 98c103f..3ae44c2 100644
> --- a/builtin/clean.c
> +++ b/builtin/clean.c
> @@ -941,8 +941,16 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
>  		if (!cache_name_is_other(ent->name, ent->len))
>  			continue;
>  
> -		if (lstat(ent->name, &st))
> -			die_errno("Cannot lstat '%s'", ent->name);
> +		/*
> +		 * Some concurrent process might have already removed
> +		 * ent->name.
> +		 */
> +		if (lstat(ent->name, &st)) {
> +			if (errno == ENOENT || errno == ENOTDIR)
> +				continue;
> +			else
> +				die_errno("Cannot lstat '%s'", ent->name);
> +		}
>  
>  		if (pathspec.nr)
>  			matches = dir_path_match(ent, &pathspec, 0, NULL);

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] builtin/clean.c: Handle disappearing files
  2015-05-14 21:45 ` Junio C Hamano
@ 2015-05-14 21:52   ` David Turner
  2015-05-14 22:14     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: David Turner @ 2015-05-14 21:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, David Turner

On Thu, 2015-05-14 at 14:45 -0700, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
> 
> > During a git clean, some other process might be deleting files as
> > well.  If this happens, make git clean no longer die.
> >
> > Signed-off-by: David Turner <dturner@twitter.com>
> > ---
> 
> I am having a hard time to convince myself that this is a good
> change.
> 
> For this change to be an improvement that matters, you must be
> allowing some other process mucking around with the files in your
> working tree when you run "git clean".  The original catches such
> situation as an anomaly that the user would want to be aware of (and
> investigate), but this patch essentially declares that having such a
> random process touching your working tree from sideways is a normal
> situation.  I do not think I am willing to make such a declaration
> myself; I'd rather want to know why other people are touching my
> working tree while I am working in it.

Our build tool[1] stores statistics in a hidden file in the working
tree.  After it runs, it daemonizes and uploads those stats, and then
deletes the stats file.  Because the upload might take some time, the
user might run git clean in the meantime.  (I think there might be some
other deletion that happens in the background too; I haven't really
investigated).

Of course, in interactive use, very little harm is done if clean dies
here: the user simply must notice that the clean has failed and retry
it.  But in non-interactive use, scripts could fail.

At least, I think that's what could be causing us to hit this error; I
haven't actually done any research to see if this is true.

[1] https://github.com/pantsbuild/pants

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] builtin/clean.c: Handle disappearing files
  2015-05-14 21:52   ` David Turner
@ 2015-05-14 22:14     ` Junio C Hamano
  2015-05-14 22:33       ` David Turner
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2015-05-14 22:14 UTC (permalink / raw)
  To: David Turner; +Cc: git, David Turner

David Turner <dturner@twopensource.com> writes:

>> For this change to be an improvement that matters, you must be
>> allowing some other process mucking around with the files in your
>> working tree when you run "git clean".  The original catches such
>> situation as an anomaly that the user would want to be aware of (and
>> investigate), but this patch essentially declares that having such a
>> random process touching your working tree from sideways is a normal
>> situation.  I do not think I am willing to make such a declaration
>> myself; I'd rather want to know why other people are touching my
>> working tree while I am working in it.
>
> Our build tool[1] stores statistics in a hidden file in the working
> tree.  After it runs, it daemonizes and uploads those stats, and then
> deletes the stats file.  Because the upload might take some time, the
> user might run git clean in the meantime.  (I think there might be some
> other deletion that happens in the background too; I haven't really
> investigated).
>
> Of course, in interactive use, very little harm is done if clean dies
> here: the user simply must notice that the clean has failed and retry
> it.  But in non-interactive use, scripts could fail.
>
> At least, I think that's what could be causing us to hit this error; I
> haven't actually done any research to see if this is true.

I find that the above argues that this patch is a bad idea.

The change sweeps the problem under the rug, killing the canary in
the mine, instead of motivating you to figure out why it is
happening to you.

Of course, scripts could choose to ignore errors from "git clean" it
runs.  At that point, the "sweeping under the rug" ugliness is not
in Git, which is better ;-).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] builtin/clean.c: Handle disappearing files
  2015-05-14 22:14     ` Junio C Hamano
@ 2015-05-14 22:33       ` David Turner
  2015-05-15  5:26         ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: David Turner @ 2015-05-14 22:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, David Turner

On Thu, 2015-05-14 at 15:14 -0700, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
> 
> >> For this change to be an improvement that matters, you must be
> >> allowing some other process mucking around with the files in your
> >> working tree when you run "git clean".  The original catches such
> >> situation as an anomaly that the user would want to be aware of (and
> >> investigate), but this patch essentially declares that having such a
> >> random process touching your working tree from sideways is a normal
> >> situation.  I do not think I am willing to make such a declaration
> >> myself; I'd rather want to know why other people are touching my
> >> working tree while I am working in it.
> >
> > Our build tool[1] stores statistics in a hidden file in the working
> > tree.  After it runs, it daemonizes and uploads those stats, and then
> > deletes the stats file.  Because the upload might take some time, the
> > user might run git clean in the meantime.  (I think there might be some
> > other deletion that happens in the background too; I haven't really
> > investigated).
> >
> > Of course, in interactive use, very little harm is done if clean dies
> > here: the user simply must notice that the clean has failed and retry
> > it.  But in non-interactive use, scripts could fail.
> >
> > At least, I think that's what could be causing us to hit this error; I
> > haven't actually done any research to see if this is true.
> 
> I find that the above argues that this patch is a bad idea.
> 
> The change sweeps the problem under the rug, killing the canary in
> the mine, instead of motivating you to figure out why it is
> happening to you.

But it's totally legit for processes to create and delete files in the
working tree at any time.  Maybe I'm editing during the clean, and my
editor creates a backup file (or creates a lock file which it's going to
move on top of my old file).

> Of course, scripts could choose to ignore errors from "git clean" it
> runs.  At that point, the "sweeping under the rug" ugliness is not
> in Git, which is better ;-).

The problem is that git clean does not finish the job it has done in
this case.  So my script would have to retry on error (and every script
that git clean uses would have to so). 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] builtin/clean.c: Handle disappearing files
  2015-05-14 22:33       ` David Turner
@ 2015-05-15  5:26         ` Junio C Hamano
  2015-05-15 18:13           ` David Turner
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2015-05-15  5:26 UTC (permalink / raw)
  To: David Turner; +Cc: git, David Turner

David Turner <dturner@twopensource.com> writes:

>> > Of course, in interactive use, very little harm is done if clean dies
>> > here: the user simply must notice that the clean has failed and retry
>> > it.  But in non-interactive use, scripts could fail.
>> >
>> > At least, I think that's what could be causing us to hit this error; I
>> > haven't actually done any research to see if this is true.
>> 
>> I find that the above argues that this patch is a bad idea.
>> 
>> The change sweeps the problem under the rug, killing the canary in
>> the mine, instead of motivating you to figure out why it is
>> happening to you.
>
> But it's totally legit for processes to create and delete files in the
> working tree at any time.  Maybe I'm editing during the clean, and my
> editor creates a backup file (or creates a lock file which it's going to
> move on top of my old file).

Isn't that exactly what users need to be aware of the potential
problem waiting to bite them?  Maybe you are editing during the
clean, which may notice a funny file your editor creates and removes
it, and when the editor wants to do something useful using that
funny file, it no longer is there.  Sure, if your editor is quick
enough and race with "clean" in certain timing, it may cause "clean"
to notice and die.  But if "clean" does not die, that means it won
the race and removed what your editor needed to use, no?

Isn't it even worse for scripts?  If your "build statistics" thing
created a temporary before "clean" started to run (causing 'clean'
to notice that as a cruft), and if "clean" gets to remove it before
the "build statistics" thing finishes what it was doing accumulating
whatever data in that file, when "build statistics" thing finally
tries to use the file, it no longer is there.

And isn't it even more worse for scripts that drive "clean"?  By
letting "clean" what it thinks cruft, while other processes are
"creating and deleting files in the working tree at any time"
without coordination, such a script is actively making the other
processes unreliable.  If "clean" in such a script stops as soon as
it notices such a race before doing any more damage, it would be a
good thing.  Retrying "clean" (to finish cleaning) does not sound
like a remedy; not running "clean" when other people are still using
the working tree is.  It's like somebody is randomly running "make
clean" in the background every once in a while in a working tree
where I am trying to do a real work.  Why is it "totally legit"?
And hiding the problem by making "clean" ignore such a race would
not help the user to fix such a script, would it?

Perhaps there is some assumption you have in the way the working
tree is being used that I haven't considered, and it is entirely
possible that this change may make sense under that special case,
but without knowing what that untold assumption is, the more I hear
about this topic from you, the less I am convinced that this is a
good change.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] builtin/clean.c: Handle disappearing files
  2015-05-15  5:26         ` Junio C Hamano
@ 2015-05-15 18:13           ` David Turner
  2015-05-16 19:21             ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: David Turner @ 2015-05-15 18:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, David Turner

On Thu, 2015-05-14 at 22:26 -0700, Junio C Hamano wrote:
<snip>

OK, I'm sold that this patch is not a great idea. But:

> If your "build statistics" thing
> created a temporary before "clean" started to run (causing 'clean'
> to notice that as a cruft), and if "clean" gets to remove it before
> the "build statistics" thing finishes what it was doing accumulating
> whatever data in that file, when "build statistics" thing finally
> tries to use the file, it no longer is there.

Only if the build statistics file is included in the pathspec for
clean.  

So maybe we should just move the lstat below the pathspec check.  This
is probably faster in some cases anyway.  What do you think?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] builtin/clean.c: Handle disappearing files
  2015-05-15 18:13           ` David Turner
@ 2015-05-16 19:21             ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2015-05-16 19:21 UTC (permalink / raw)
  To: David Turner; +Cc: git, David Turner

David Turner <dturner@twopensource.com> writes:

> So maybe we should just move the lstat below the pathspec check.  This
> is probably faster in some cases anyway.  What do you think?

I also was wondering about the placement of that lstat; because
having it before the pathspec filtering didn't make sense to me at
all.  If we are not told to clean some parts of the tree, we really
shouldn't be looking at that part of the tree.  I think such a
change is desirable and the justification of the change should be
based solely on the above---limit first and then check, don't check
what we were not told to look at.

And if such a change lets you solve your original issue, that is a
nice side effect I am happy to see.  By excluding the paths that are
unstable (because other processes randomly muck with them without
coordinating with the user) explicitly when running "git clean", the
user _is_ saying that s/he is aware of the issue and is actively
avoiding modifications by the third-party would not get in the way,
so I wouldn't call that sweeping it under the rug, like I did for
your original one.

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-05-16 19:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-05-14 21:16 [PATCH] builtin/clean.c: Handle disappearing files David Turner
2015-05-14 21:45 ` Junio C Hamano
2015-05-14 21:52   ` David Turner
2015-05-14 22:14     ` Junio C Hamano
2015-05-14 22:33       ` David Turner
2015-05-15  5:26         ` Junio C Hamano
2015-05-15 18:13           ` David Turner
2015-05-16 19:21             ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).