* File owner/group and git @ 2015-11-05 2:03 David Turner 2015-11-05 2:38 ` Junio C Hamano 0 siblings, 1 reply; 3+ messages in thread From: David Turner @ 2015-11-05 2:03 UTC (permalink / raw) To: git mailing list In unpack-trees.c, in verify_uptodate_1, we check ie_match_stat. This returns OWNER_CHANGED if a file has changed ownership since the index was updated. Do we actually care about that particular case? Or really anything other than DATA_CHANGED? (We noticed this due to a bug in our watchman code, which fakes up the owners, but it's a general question). ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: File owner/group and git 2015-11-05 2:03 File owner/group and git David Turner @ 2015-11-05 2:38 ` Junio C Hamano 2015-11-05 17:57 ` David Turner 0 siblings, 1 reply; 3+ messages in thread From: Junio C Hamano @ 2015-11-05 2:38 UTC (permalink / raw) To: David Turner; +Cc: git mailing list David Turner <dturner@twopensource.com> writes: > In unpack-trees.c, in verify_uptodate_1, we check ie_match_stat. This > returns OWNER_CHANGED if a file has changed ownership since the index > was updated. Do we actually care about that particular case? Or really > anything other than DATA_CHANGED? That's a 10-year old code and there aren't that many people left who can answer the original rationale, I am afraid ;-) In general, "Do we actually care?" is not the question we ask in this area of the code. "Does it help us to catch real changes, or does it change spuriously to make it too unreliable a signal to be useful?" is the question that drives the design of this part of the system. DATA_CHANGED is "we know the contents are different without even looking at the data". If the size is different from the last time we hashed the data, the contents must have changed. The inverse is not true (and that is half of the "racy git" issue). Other *_CHANGED are finely classified only because originally we didn't really know which are useful to treat as notable change event, and "changed" variable had sufficient number of bits to hold different classification, so that we could pick and choose which ones we truly care. We knew MTIME was useful in the sense that even if the size is the same, updated mtime is good enough indication that the stuff has changed, even to "make" utility. INODE and CTIME are not so stable on some filesystems (e.g. inum may not be stable on a network share across remount) and in some environments (e.g. some virus scanners touch ctime to mark scanned files, cf. 1ce4790b), and would trigger false positives too often to be useful. We always paid attention to them initially, but there are configurations to tell Git not raise them these days. OWNER probably falls into a category that is stable enough to be useful, as the most likely way for it to change is not by running "chown" on the file in-place (which does not change the contents), but by running "mv" to drop another file owned by somebody else to the original location (which likely does change the contents). At the same time, "mv" a different file into the path would likely trigger changes to INODE and MTIME as well, so it cannot be more than belt-and-suspenders measure to catch modification. In that sense ignoring OWNER would not hurt too much. If it changes spuriously to make it too unreliable a signal to be useful, it certainly is OK to introduce a knob to ignore it. It might even make sense to ignore it unconditionally if the false hit happens too frequently, but offhand my gut reaction is that there may be something wrong in the environment (i.e. system outside Git in which Git runs) if owner/group changes spuriously to cause issues. ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: File owner/group and git 2015-11-05 2:38 ` Junio C Hamano @ 2015-11-05 17:57 ` David Turner 0 siblings, 0 replies; 3+ messages in thread From: David Turner @ 2015-11-05 17:57 UTC (permalink / raw) To: Junio C Hamano; +Cc: git mailing list On Wed, 2015-11-04 at 18:38 -0800, Junio C Hamano wrote: > David Turner <dturner@twopensource.com> writes: > > > In unpack-trees.c, in verify_uptodate_1, we check ie_match_stat. This > > returns OWNER_CHANGED if a file has changed ownership since the index > > was updated. Do we actually care about that particular case? Or really > > anything other than DATA_CHANGED? > > That's a 10-year old code and there aren't that many people left > who can answer the original rationale, I am afraid ;-) > > In general, "Do we actually care?" is not the question we ask in > this area of the code. "Does it help us to catch real changes, or > does it change spuriously to make it too unreliable a signal to be > useful?" is the question that drives the design of this part of the > system. > > DATA_CHANGED is "we know the contents are different without even > looking at the data". If the size is different from the last time > we hashed the data, the contents must have changed. The inverse is > not true (and that is half of the "racy git" issue). > > Other *_CHANGED are finely classified only because originally we > didn't really know which are useful to treat as notable change > event, and "changed" variable had sufficient number of bits to hold > different classification, so that we could pick and choose which > ones we truly care. We knew MTIME was useful in the sense that even > if the size is the same, updated mtime is good enough indication > that the stuff has changed, even to "make" utility. > > INODE and CTIME are not so stable on some filesystems (e.g. inum may > not be stable on a network share across remount) and in some > environments (e.g. some virus scanners touch ctime to mark scanned > files, cf. 1ce4790b), and would trigger false positives too often to > be useful. We always paid attention to them initially, but there > are configurations to tell Git not raise them these days. > > OWNER probably falls into a category that is stable enough to be > useful, as the most likely way for it to change is not by running > "chown" on the file in-place (which does not change the contents), > but by running "mv" to drop another file owned by somebody else to > the original location (which likely does change the contents). At > the same time, "mv" a different file into the path would likely > trigger changes to INODE and MTIME as well, so it cannot be more > than belt-and-suspenders measure to catch modification. In that > sense ignoring OWNER would not hurt too much. > > If it changes spuriously to make it too unreliable a signal to be > useful, it certainly is OK to introduce a knob to ignore it. It > might even make sense to ignore it unconditionally if the false hit > happens too frequently, but offhand my gut reaction is that there > may be something wrong in the environment (i.e. system outside Git > in which Git runs) if owner/group changes spuriously to cause > issues. Thanks. The only case where we saw it was with our watchman code, which lies about ownership (to save space/time). We're going to try ignoring OWNER_CHANGED in our watchman branch, and if that fixes the issue for our users, we'll stop worrying about it on the theory that Duy's watchman stuff is the long-term path forward, and it doesn't have this issue. ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2015-11-05 17:57 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-11-05 2:03 File owner/group and git David Turner 2015-11-05 2:38 ` Junio C Hamano 2015-11-05 17:57 ` David Turner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).