git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Josip Sokcevic <sokcevic@google.com>
Cc: jonathantanmy@google.com, git@vger.kernel.org, git@jeffhostetler.com
Subject: Re: [PATCH v2] diff-lib: Fix check_removed when fsmonitor is on
Date: Thu, 07 Sep 2023 11:07:28 -0700	[thread overview]
Message-ID: <xmqqa5txluvz.fsf@gitster.g> (raw)
In-Reply-To: <20230907170119.1536694-1-sokcevic@google.com> (Josip Sokcevic's message of "Thu, 7 Sep 2023 10:01:19 -0700")

Josip Sokcevic <sokcevic@google.com> writes:

> diff --git a/diff-lib.c b/diff-lib.c
> index d8aa777a73..664613bb1b 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -39,11 +39,22 @@
>  static int check_removed(const struct index_state *istate, const struct cache_entry *ce, struct stat *st)
>  {
>  	assert(is_fsmonitor_refreshed(istate));

Not a problem this patch introduces, but doesn't this call path

  diff_cache()
  -> unpack_trees()
     -> oneway_diff()
        -> do_oneway_diff()
           -> show_new_file(), show_modified()
               -> get_stat_data()
                  -> check_removed()

violate the assertion?  If so, perhaps we should rewrite it into a
more explicit "if (...) BUG(...)" that is not compiled away.

> -	if (!(ce->ce_flags & CE_FSMONITOR_VALID) && lstat(ce->name, st) < 0) {
> -		if (!is_missing_file_error(errno))
> -			return -1;
> -		return 1;
> +	if (ce->ce_flags & CE_FSMONITOR_VALID) {
> +		/*
> +		 * Both check_removed() and its callers expect lstat() to have
> +		 * happened and, in particular, the st_mode field to be set.
> +		 * Simulate this with the contents of ce.
> +		 */
> +		memset(st, 0, sizeof(*st));

It is true that the original, when CE_FSMONITOR_VALID bit is set,
bypasses lstat() altogether and leaves the contents of st completely
uninitialized, but this is still way too insufficient, isn't it?

There are three call sites of the check_removed() function.

 * The first one in run_diff_files() only cares about st.st_mode and
   other members of the structure are not looked at.  This makes
   readers wonder if the "st" parameter to check_removed() should
   become "mode_t *st_mode" to clarify this point, but the primary
   thing I want to say is that this caller will not mind if we leave
   other members of st bogus (like 0-bit filled) as long as the mode
   is set correctly.

 * The second one in run_diff_files() passes the resulting &st to
   match_stat_with_submodule(), which in turn passes it to
   ie_match_stat(), which cares about "struct stat" members that are
   used for quick change detection, like owner, group, mtime.
   Giving it a bogus st will most likely cause it to report a
   change.

 * The third one is in get_stat_data().  This also uses the &st to
   call match_stat_with_submodule(), so it is still totally broken
   to give it a bogus st, the same way as the second caller above.

> +		st->st_mode = ce->ce_mode;

Does this work correctly when the cache entry points at a gitlink,
which uses 0160000 that is not a valid st_mode?  I think you'd want
to use a reverse function of create_ce_mode().

> +	} else {
> +		if (lstat(ce->name, st) < 0) {
> +			if (!is_missing_file_error(errno))
> +				return -1;
> +			return 1;
> +		}
>  	}

At this point, if FSMONITOR_VALID bit is not set, we will always
perform lstat() and get all the members of st populated properly,
which is a definite improvement.

While I think this does not make it worse (it is an existing bug
that the code is broken for a ce with the CE_FSMONITOR_VALID bit
set), we may want to leave a note that we _know_ the code after this
patch is still broken.  "Simulate this with ..." -> "Just setting
st_mode is still insufficient and will break majority of callers".

It may make sense, until we clean it up, to disable the check for
the FSMONITOR_VALID bit in this codepath and always perform lstat().
Optimization matters, but computing quickly in order to return an
incorrect result is optimizing for a wrong thing.  I dunno.

Thanks.

  parent reply	other threads:[~2023-09-07 18:08 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-29  0:56 Strange diff-index with fsmonitor, submodules Jonathan Tan
2023-08-29  1:20 ` Junio C Hamano
2023-08-29 12:45   ` Jeff Hostetler
2023-08-29 16:57 ` Jeff Hostetler
2023-08-29 17:01   ` Jonathan Tan
2023-09-06  6:02 ` [PATCH] [diff-lib] Fix check_removed when fsmonitor is on Josip Sokcevic
2023-09-06 20:37   ` Jonathan Tan
2023-09-07 17:01     ` [PATCH v2] diff-lib: " Josip Sokcevic
2023-09-07 17:22       ` Jonathan Tan
2023-09-07 18:07       ` Junio C Hamano [this message]
2023-09-07 23:08         ` Josip Sokcevic
2023-09-08 15:00           ` Junio C Hamano
2023-09-11 17:09   ` [PATCH v3] " Josip Sokcevic
2023-09-11 22:53     ` Junio C Hamano
2023-09-12  3:03       ` Josip Sokcevic
2023-09-12 17:07         ` Junio C Hamano
2023-09-14 22:39           ` Josip Sokcevic
2023-09-18 16:35             ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqa5txluvz.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=sokcevic@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).