All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Turner <dturner@twopensource.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git mailing list <git@vger.kernel.org>
Subject: Re: File owner/group and git
Date: Thu, 05 Nov 2015 12:57:00 -0500	[thread overview]
Message-ID: <1446746220.4131.57.camel@twopensource.com> (raw)
In-Reply-To: <xmqqoaf9891r.fsf@gitster.mtv.corp.google.com>

On Wed, 2015-11-04 at 18:38 -0800, Junio C Hamano wrote:
> David Turner <dturner@twopensource.com> writes:
> 
> > In unpack-trees.c, in verify_uptodate_1, we check ie_match_stat.  This
> > returns OWNER_CHANGED if a file has changed ownership since the index
> > was updated.  Do we actually care about that particular case?  Or really
> > anything other than DATA_CHANGED?
> 
> That's a 10-year old code and there aren't that many people left
> who can answer the original rationale, I am afraid ;-)
> 
> In general, "Do we actually care?" is not the question we ask in
> this area of the code.  "Does it help us to catch real changes, or
> does it change spuriously to make it too unreliable a signal to be
> useful?" is the question that drives the design of this part of the
> system.
> 
> DATA_CHANGED is "we know the contents are different without even
> looking at the data".  If the size is different from the last time
> we hashed the data, the contents must have changed.  The inverse is
> not true (and that is half of the "racy git" issue).
> 
> Other *_CHANGED are finely classified only because originally we
> didn't really know which are useful to treat as notable change
> event, and "changed" variable had sufficient number of bits to hold
> different classification, so that we could pick and choose which
> ones we truly care.  We knew MTIME was useful in the sense that even
> if the size is the same, updated mtime is good enough indication
> that the stuff has changed, even to "make" utility.
> 
> INODE and CTIME are not so stable on some filesystems (e.g. inum may
> not be stable on a network share across remount) and in some
> environments (e.g. some virus scanners touch ctime to mark scanned
> files, cf. 1ce4790b), and would trigger false positives too often to
> be useful.  We always paid attention to them initially, but there
> are configurations to tell Git not raise them these days.
> 
> OWNER probably falls into a category that is stable enough to be
> useful, as the most likely way for it to change is not by running
> "chown" on the file in-place (which does not change the contents),
> but by running "mv" to drop another file owned by somebody else to
> the original location (which likely does change the contents).  At
> the same time, "mv" a different file into the path would likely
> trigger changes to INODE and MTIME as well, so it cannot be more
> than belt-and-suspenders measure to catch modification.  In that
> sense ignoring OWNER would not hurt too much.
> 
> If it changes spuriously to make it too unreliable a signal to be
> useful, it certainly is OK to introduce a knob to ignore it.  It
> might even make sense to ignore it unconditionally if the false hit
> happens too frequently, but offhand my gut reaction is that there
> may be something wrong in the environment (i.e. system outside Git
> in which Git runs) if owner/group changes spuriously to cause
> issues.

Thanks.

The only case where we saw it was with our watchman code, which lies
about ownership (to save space/time).  We're going to try ignoring
OWNER_CHANGED in our watchman branch, and if that fixes the issue for
our users, we'll stop worrying about it on the theory that Duy's
watchman stuff is the long-term path forward, and it doesn't have this
issue.

      reply	other threads:[~2015-11-05 17:57 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-05  2:03 File owner/group and git David Turner
2015-11-05  2:38 ` Junio C Hamano
2015-11-05 17:57   ` David Turner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1446746220.4131.57.camel@twopensource.com \
    --to=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.