From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dieter Plaetinck Subject: Re: [PATCH] git-add: allow --ignore-missing always, not just in dry run Date: Fri, 20 Jan 2012 19:14:51 +0100 Message-ID: <20120120191451.6dac25dd@plaetinck.be> References: <1326923544-8287-1-git-send-email-dieter@plaetinck.be> <7vobu0liwj.fsf@alter.siamese.dyndns.org> <20120119115216.2773a02f@plaetinck.be> <7vk44nidtb.fsf@alter.siamese.dyndns.org> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: git@vger.kernel.org To: Junio C Hamano X-From: git-owner@vger.kernel.org Fri Jan 20 19:16:13 2012 Return-path: Envelope-to: gcvg-git-2@lo.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1RoJ0R-0004jG-Ul for gcvg-git-2@lo.gmane.org; Fri, 20 Jan 2012 19:16:08 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753566Ab2ATSQD (ORCPT ); Fri, 20 Jan 2012 13:16:03 -0500 Received: from smtp01.priorweb.be ([62.182.61.111]:51074 "EHLO smtp.priorweb.be" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753448Ab2ATSQB (ORCPT ); Fri, 20 Jan 2012 13:16:01 -0500 Received: from [178.79.146.162] (port=44394 helo=localhost.localdomain) by smtp.priorweb.be with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.72) (envelope-from ) id 1RoJ0F-0001ZK-Qh; Fri, 20 Jan 2012 19:15:55 +0100 In-Reply-To: <7vk44nidtb.fsf@alter.siamese.dyndns.org> X-Mailer: Claws Mail 3.8.0 (GTK+ 2.24.8; x86_64-unknown-linux-gnu) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Thu, 19 Jan 2012 13:26:40 -0800 Junio C Hamano wrote: > Dieter Plaetinck writes: > > > So basically, if this tool needs to check which files > > still/no-longer exist before calling git-add, that's vulnerable to > > race conditions. > > I do not think you are solving the real problem in your script even > if you allowed "add --ignore-missing". > > I suspect you are making things even worse by using > "--ignore-missing" in your script. If a user is actively updating the > files in the filesystem, at least "git add" without > "--ignore-missing" would catch the case where you _thought_ the user > modified but still has the file, but in reality the further updates > in the working tree removed the file, which is a clear indication > that the rate you are processing the notify stream is slower than the > activity generated by the user and allows you to notice that you may > be better off waiting a bit until things calm down before running > your automated commit. I don't understand what you mean. if this happened: 1) modify file 2) modify file 3) remove file but my script tries to git add --ignore-missing, when 3) already happened but the script only sees event 2) then it will not fail, then catch the 3rd event, then do a git rm. if however, i don't enable ignore-missing, there's a failure on the git add. I guess what you're saying is "better to get an error, interpret it as `this may not be a "real" error`, just continue" > > Also, with or without "--ignore-missing", I think we have safety > valves to cause "git add" fail if the file being added is updated > while git is working on it (i.e. we read and compute the object name, > and then store it compressed, and check the hash of what is stored > matches the object name we computed earlier, which would fail if the > file is updated in the middle at the right time). > > This means that the "--ignore-missing" option will _not_ eliminate all > cases where "git add" may detect an error and fails. In other words, > your script needs to deal with error return from "git add" anyway > even if we applied your patch and you used "--ignore-missing" in your > script. > > I have to say that the basic premise of your script is simply broken, > and I am afraid that it is unfixable without an atomic snapshot > support from the underlying filesystem (i.e. take a snapshot, run > 'git add' on it, and then release the snapshot). I assumed that `git add` works atomically. (as in: if you git add a file and modify that file at the same time, either the old or the new version will be added to the index, but always successfully). If I understand this correctly, `git add` can only be successful if at least the file remains untouched while the git command runs. This means I should change my entire approach and be aware that git may return failures when the user is changing files while we are adding to the index. Maybe I should do something like: once >0 inotify events happened, * run git status, for each file in git status, if we've seen events, try to add file to the index. if the above fails, wait a few seconds, then try again. if failures persist a few times in a row, only then we can be reasonbly sure something is really wrong. but there will always be the case where a file gets deleted and (re)added, there is always the risk for races: 1) my script runs "git rm" after seeing a "delete" event but the file has been added in the meanwhile... git removes the file => bad 2) my script runs "git add" after seeing a new/changed file but the file has been removed in the meanwhile... git gives an error. and in the latter case you can't just apply the "just retry" trick I mentioned above because the next event is delete, which would trigger a "git rm", potentially causing unrecoverable data loss as described in 1). one example of such things happening in real life is vim which creates (and removes) tmp files called `4913`. We could just add such filenames to gitignore, but that seems like a bit of a burden which shouldn't be needed. To avoid the risks mentioned above and the "file can be altered during git add command", what i'm really looking for is a way to, for a given file (file for which I've seen inotify events): synchronize changes of that file to the index even if the file was unknown to git, or doesn't exist anymore; and do that in an atomic way (or: i'll just retry a few times until it fails consistently, but removal race conditions can lead to data loss) I tried `git update-index` as you suggested but couldn't have it behave like that. And something like `git add --all` can still cause a file removal when it shouldn't, and lead to uncoverable data loss on race conditions. Sorry for the long mail, I tried to keep it as minimal as possible. thanks! Dieter