Re: [PATCH] git-add: allow --ignore-missing always, not just in dry run

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Dieter Plaetinck <dieter@plaetinck.be>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] git-add: allow --ignore-missing always, not just in dry run
Date: Fri, 20 Jan 2012 19:14:51 +0100	[thread overview]
Message-ID: <20120120191451.6dac25dd@plaetinck.be> (raw)
In-Reply-To: <7vk44nidtb.fsf@alter.siamese.dyndns.org>

On Thu, 19 Jan 2012 13:26:40 -0800
Junio C Hamano <gitster@pobox.com> wrote:

> Dieter Plaetinck <dieter@plaetinck.be> writes:
> 
> > So basically, if this tool needs to check which files
> > still/no-longer exist before calling git-add, that's vulnerable to
> > race conditions.
> 
> I do not think you are solving the real problem in your script even
> if you allowed "add --ignore-missing".
> 
> I suspect you are making things even worse by using
> "--ignore-missing" in your script. If a user is actively updating the
> files in the filesystem, at least "git add" without
> "--ignore-missing" would catch the case where you _thought_ the user
> modified but still has the file, but in reality the further updates
> in the working tree removed the file, which is a clear indication
> that the rate you are processing the notify stream is slower than the
> activity generated by the user and allows you to notice that you may
> be better off waiting a bit until things calm down before running
> your automated commit.

I don't understand what you mean. if this happened:
1) modify file
2) modify file
3) remove file
but my script tries to git add --ignore-missing, when 3) already happened
but the script only sees event 2)
then it will not fail, then catch the 3rd event, then do a git rm.
if however, i don't enable ignore-missing, there's a failure on the git add.
I guess what you're saying is "better to get an error, interpret it as `this may not be a "real" error`, just continue"

> 
> Also, with or without "--ignore-missing", I think we have safety
> valves to cause "git add" fail if the file being added is updated
> while git is working on it (i.e. we read and compute the object name,
> and then store it compressed, and check the hash of what is stored
> matches the object name we computed earlier, which would fail if the
> file is updated in the middle at the right time).
> 
> This means that the "--ignore-missing" option will _not_ eliminate all
> cases where "git add" may detect an error and fails. In other words,
> your script needs to deal with error return from "git add" anyway
> even if we applied your patch and you used "--ignore-missing" in your
> script.
> 
> I have to say that the basic premise of your script is simply broken,
> and I am afraid that it is unfixable without an atomic snapshot
> support from the underlying filesystem (i.e. take a snapshot, run
> 'git add' on it, and then release the snapshot).

I assumed that `git add` works atomically. (as in: if you git add a file and
modify that file at the same time, either the old or the new version will be added
to the index, but always successfully).
If I understand this correctly, `git add` can only be successful if at least the file
remains untouched while the git command runs.
This means I should change my entire approach and be aware that git may return failures when the user is changing files while we are adding to the index. 
Maybe I should do something like:
once >0 inotify events happened,
* run git status, for each file in git status, if we've seen events, try to add file to the index.
if the above fails, wait a few seconds, then try again. if failures persist a few times in a row, only then we can be reasonbly sure something is really wrong.

but there will always be the case where a file gets deleted and (re)added, there is always the risk for races:
1) my script runs "git rm" after seeing a "delete" event but the file has been added in the meanwhile... git removes the file => bad
2) my script runs "git add" after seeing a new/changed file but the file has been removed in the meanwhile... git gives an error.
and in the latter case you can't just apply the "just retry" trick I mentioned above because the next event is delete, which would trigger a "git rm",
potentially causing unrecoverable data loss as described in 1).

one example of such things happening in real life is vim which creates (and removes) tmp files called `4913`.
We could just add such filenames to gitignore,
but that seems like a bit of a burden which shouldn't be needed.

To avoid the risks mentioned above and the "file can be altered during git add command",
what i'm really looking for is a way to, for a given file (file for which I've seen inotify events):
synchronize changes of that file to the index even if the file was unknown to git, or doesn't exist anymore;
and do that in an atomic way (or: i'll just retry a few times until it fails consistently, but removal race conditions can lead to data loss)

I tried `git update-index` as you suggested but couldn't have it behave like that.
And something like `git add --all` can still cause a file removal when it shouldn't, and lead to uncoverable data loss on race conditions.

Sorry for the long mail, I tried to keep it as minimal as possible.

thanks!
Dieter

next prev parent reply	other threads:[~2012-01-20 18:16 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-18 21:52 [PATCH] git-add: allow --ignore-missing always, not just in dry run Dieter Plaetinck
2012-01-18 22:56 ` Junio C Hamano
2012-01-19 10:52   ` Dieter Plaetinck
2012-01-19 21:26     ` Junio C Hamano
2012-01-20 18:14       ` Dieter Plaetinck [this message]
2012-01-19 11:03   ` Thomas Rast
2012-01-19 18:46     ` Junio C Hamano
2012-01-20 12:56       ` Thomas Rast
2012-01-20 18:03         ` Junio C Hamano
2012-02-07  4:39 ` Mike Gant

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120120191451.6dac25dd@plaetinck.be \
    --to=dieter@plaetinck.be \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).