From: Junio C Hamano <gitster@pobox.com>
To: Josip Sokcevic <sokcevic@google.com>
Cc: jonathantanmy@google.com, git@vger.kernel.org, git@jeffhostetler.com
Subject: Re: [PATCH v2] diff-lib: Fix check_removed when fsmonitor is on
Date: Thu, 07 Sep 2023 11:07:28 -0700 [thread overview]
Message-ID: <xmqqa5txluvz.fsf@gitster.g> (raw)
In-Reply-To: <20230907170119.1536694-1-sokcevic@google.com> (Josip Sokcevic's message of "Thu, 7 Sep 2023 10:01:19 -0700")
Josip Sokcevic <sokcevic@google.com> writes:
> diff --git a/diff-lib.c b/diff-lib.c
> index d8aa777a73..664613bb1b 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -39,11 +39,22 @@
> static int check_removed(const struct index_state *istate, const struct cache_entry *ce, struct stat *st)
> {
> assert(is_fsmonitor_refreshed(istate));
Not a problem this patch introduces, but doesn't this call path
diff_cache()
-> unpack_trees()
-> oneway_diff()
-> do_oneway_diff()
-> show_new_file(), show_modified()
-> get_stat_data()
-> check_removed()
violate the assertion? If so, perhaps we should rewrite it into a
more explicit "if (...) BUG(...)" that is not compiled away.
> - if (!(ce->ce_flags & CE_FSMONITOR_VALID) && lstat(ce->name, st) < 0) {
> - if (!is_missing_file_error(errno))
> - return -1;
> - return 1;
> + if (ce->ce_flags & CE_FSMONITOR_VALID) {
> + /*
> + * Both check_removed() and its callers expect lstat() to have
> + * happened and, in particular, the st_mode field to be set.
> + * Simulate this with the contents of ce.
> + */
> + memset(st, 0, sizeof(*st));
It is true that the original, when CE_FSMONITOR_VALID bit is set,
bypasses lstat() altogether and leaves the contents of st completely
uninitialized, but this is still way too insufficient, isn't it?
There are three call sites of the check_removed() function.
* The first one in run_diff_files() only cares about st.st_mode and
other members of the structure are not looked at. This makes
readers wonder if the "st" parameter to check_removed() should
become "mode_t *st_mode" to clarify this point, but the primary
thing I want to say is that this caller will not mind if we leave
other members of st bogus (like 0-bit filled) as long as the mode
is set correctly.
* The second one in run_diff_files() passes the resulting &st to
match_stat_with_submodule(), which in turn passes it to
ie_match_stat(), which cares about "struct stat" members that are
used for quick change detection, like owner, group, mtime.
Giving it a bogus st will most likely cause it to report a
change.
* The third one is in get_stat_data(). This also uses the &st to
call match_stat_with_submodule(), so it is still totally broken
to give it a bogus st, the same way as the second caller above.
> + st->st_mode = ce->ce_mode;
Does this work correctly when the cache entry points at a gitlink,
which uses 0160000 that is not a valid st_mode? I think you'd want
to use a reverse function of create_ce_mode().
> + } else {
> + if (lstat(ce->name, st) < 0) {
> + if (!is_missing_file_error(errno))
> + return -1;
> + return 1;
> + }
> }
At this point, if FSMONITOR_VALID bit is not set, we will always
perform lstat() and get all the members of st populated properly,
which is a definite improvement.
While I think this does not make it worse (it is an existing bug
that the code is broken for a ce with the CE_FSMONITOR_VALID bit
set), we may want to leave a note that we _know_ the code after this
patch is still broken. "Simulate this with ..." -> "Just setting
st_mode is still insufficient and will break majority of callers".
It may make sense, until we clean it up, to disable the check for
the FSMONITOR_VALID bit in this codepath and always perform lstat().
Optimization matters, but computing quickly in order to return an
incorrect result is optimizing for a wrong thing. I dunno.
Thanks.
next prev parent reply other threads:[~2023-09-07 18:08 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-29 0:56 Strange diff-index with fsmonitor, submodules Jonathan Tan
2023-08-29 1:20 ` Junio C Hamano
2023-08-29 12:45 ` Jeff Hostetler
2023-08-29 16:57 ` Jeff Hostetler
2023-08-29 17:01 ` Jonathan Tan
2023-09-06 6:02 ` [PATCH] [diff-lib] Fix check_removed when fsmonitor is on Josip Sokcevic
2023-09-06 20:37 ` Jonathan Tan
2023-09-07 17:01 ` [PATCH v2] diff-lib: " Josip Sokcevic
2023-09-07 17:22 ` Jonathan Tan
2023-09-07 18:07 ` Junio C Hamano [this message]
2023-09-07 23:08 ` Josip Sokcevic
2023-09-08 15:00 ` Junio C Hamano
2023-09-11 17:09 ` [PATCH v3] " Josip Sokcevic
2023-09-11 22:53 ` Junio C Hamano
2023-09-12 3:03 ` Josip Sokcevic
2023-09-12 17:07 ` Junio C Hamano
2023-09-14 22:39 ` Josip Sokcevic
2023-09-18 16:35 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqa5txluvz.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=jonathantanmy@google.com \
--cc=sokcevic@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).