All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  Patrick Steinhardt <ps@pks.im>,
	 Jeff Hostetler <git@jeffhostetler.com>,
	 Jeff Hostetler <jeffhostetler@github.com>
Subject: Re: [PATCH v2 14/16] fsmonitor: support case-insensitive events
Date: Fri, 23 Feb 2024 10:14:48 -0800	[thread overview]
Message-ID: <xmqqjzmvt5d3.fsf@gitster.g> (raw)
In-Reply-To: <288f3f4e54e98a68d72e97125b1520605c138c3c.1708658300.git.gitgitgadget@gmail.com> (Jeff Hostetler via GitGitGadget's message of "Fri, 23 Feb 2024 03:18:18 +0000")

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +/*
> + * Use the name-hash to do a case-insensitive cache-entry lookup with
> + * the pathname and invalidate the cache-entry.
> + *
> + * Returns the number of cache-entries that we invalidated.
> + */
> +static size_t handle_using_name_hash_icase(
> +	struct index_state *istate, const char *name)
> +{
> +	struct cache_entry *ce = NULL;
> +
> +	ce = index_file_exists(istate, name, strlen(name), 1);
> +	if (!ce)
> +		return 0;
> +
> +	/*
> +	 * A case-insensitive search in the name-hash using the
> +	 * observed pathname found a cache-entry, so the observed path
> +	 * is case-incorrect.  Invalidate the cache-entry and use the
> +	 * correct spelling from the cache-entry to invalidate the
> +	 * untracked-cache.  Since we now have sparse-directories in
> +	 * the index, the observed pathname may represent a regular
> +	 * file or a sparse-index directory.
> +	 *
> +	 * Note that we should not have seen FSEvents for a
> +	 * sparse-index directory, but we handle it just in case.
> +	 *
> +	 * Either way, we know that there are not any cache-entries for
> +	 * children inside the cone of the directory, so we don't need to
> +	 * do the usual scan.
> +	 */
> +	trace_printf_key(&trace_fsmonitor,
> +			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
> +			 name, ce->name);
> +
> +	untracked_cache_invalidate_trimmed_path(istate, ce->name, 0);
> +	ce->ce_flags &= ~CE_FSMONITOR_VALID;
> +	return 1;
> +}

You first ask the name-hash to turn the incoming "name" into the
case variant that we know about, i.e. ce->name, and use that to
access the untracked cache.  Clever and makes sense.  But if we have
ce->name, doesn't it mean the name is tracked?  Do we find anything
useful to do in the untracked cache invalidation codepath in that
case?

An FSmonitor event with case-incorrect pathname for a directory may
not be this trivial, I presume, and I expect that is what the
remainder of this patch is about.

> +
> +/*
> + * Use the dir-name-hash to find the correct-case spelling of the
> + * directory.  Use the canonical spelling to invalidate all of the
> + * cache-entries within the matching cone.
> + *
> + * Returns the number of cache-entries that we invalidated.
> + */
> +static size_t handle_using_dir_name_hash_icase(
> +	struct index_state *istate, const char *name)

It is a bit unfortunate that here on the name-hash side we contrast
the two helper function variants as "dir-name" vs "name", while the
original handle_path side use "without_slash" vs "with_slash".

If I understand correctly, it is not like there are two distinct
hashes, "name-hash" vs "dir-name-hash".  Both of these helpers use
the same "name-hash" mechanism, and this function differs from the
previous one in that it is about a directory, which is why it has
"dir" in its name.  I wonder if we renamed the other one with
"nondir" in its name, and the other without_slash and with_slash
pair to match, e.g., handle_nondir_path() vs handle_dir_path(), or
something like that, the resulting names for these four functions
become easier to contrast and understand?

> +{
> +	struct strbuf canonical_path = STRBUF_INIT;
> +	int pos;
> +	size_t len = strlen(name);
> +	size_t nr_in_cone;
> +
> +	if (name[len - 1] == '/')
> +		len--;
> +
> +	if (!index_dir_find(istate, name, len, &canonical_path))
> +		return 0; /* name is untracked */
> +
> +	if (!memcmp(name, canonical_path.buf, canonical_path.len)) {
> +		strbuf_release(&canonical_path);
> +		/*
> +		 * NEEDSWORK: Our caller already tried an exact match
> +		 * and failed to find one.  They called us to do an
> +		 * ICASE match, so we should never get an exact match,
> +		 * so we could promote this to a BUG() here if we
> +		 * wanted to.  It doesn't hurt anything to just return
> +		 * 0 and go on becaus we should never get here.  Or we
> +		 * could just get rid of the memcmp() and this "if"
> +		 * clause completely.
> +		 */
> +		return 0; /* should not happen */
> +	}

"becaus" -> "because".

If we should never get here, having BUG("we should never get here")
would not hurt anything, either.  On the other hand, silently
returning 0 will hide the bug under the carpet, and I am not sure it
is fair to call it "doesn't hurt anything".

> +
> +	trace_printf_key(&trace_fsmonitor,
> +			 "fsmonitor_refresh_callback MAP: '%s' '%s'",
> +			 name, canonical_path.buf);
> +
> +	/*
> +	 * The dir-name-hash only tells us the corrected spelling of
> +	 * the prefix.  We have to use this canonical path to do a
> +	 * lookup in the cache-entry array so that we repeat the
> +	 * original search using the case-corrected spelling.
> +	 */
> +	strbuf_addch(&canonical_path, '/');
> +	pos = index_name_pos(istate, canonical_path.buf,
> +			     canonical_path.len);
> +	nr_in_cone = handle_path_with_trailing_slash(
> +		istate, canonical_path.buf, pos);
> +	strbuf_release(&canonical_path);
> +	return nr_in_cone;
> +}

Nice.  Do we need to give this corrected name to help untracked
cache invalidation from the caller that called us?

> @@ -319,6 +416,19 @@ static void fsmonitor_refresh_callback(struct index_state *istate, char *name)
>  	else
>  		nr_in_cone = handle_path_without_trailing_slash(istate, name, pos);
>  
> +	/*
> +	 * If we did not find an exact match for this pathname or any
> +	 * cache-entries with this directory prefix and we're on a
> +	 * case-insensitive file system, try again using the name-hash
> +	 * and dir-name-hash.
> +	 */
> +	if (!nr_in_cone && ignore_case) {
> +		nr_in_cone = handle_using_name_hash_icase(istate, name);
> +		if (!nr_in_cone)
> +			nr_in_cone = handle_using_dir_name_hash_icase(
> +				istate, name);
> +	}

It might be interesting to learn how often we go through these
"fallback" code paths by tracing.  Maybe it will become too noisy?
I dunno.

>  	if (nr_in_cone)
>  		trace_printf_key(&trace_fsmonitor,
>  				 "fsmonitor_refresh_callback CNT: %d",

  reply	other threads:[~2024-02-23 18:14 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-13 20:52 [PATCH 00/12] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
2024-02-13 20:52 ` [PATCH 01/12] sparse-index: pass string length to index_file_exists() Jeff Hostetler via GitGitGadget
2024-02-13 22:07   ` Junio C Hamano
2024-02-20 17:34     ` Jeff Hostetler
2024-02-13 20:52 ` [PATCH 02/12] name-hash: add index_dir_exists2() Jeff Hostetler via GitGitGadget
2024-02-13 21:43   ` Junio C Hamano
2024-02-20 17:38     ` Jeff Hostetler
2024-02-20 19:34       ` Junio C Hamano
2024-02-15  9:31   ` Patrick Steinhardt
2024-02-13 20:52 ` [PATCH 03/12] t7527: add case-insensitve test for FSMonitor Jeff Hostetler via GitGitGadget
2024-02-13 20:52 ` [PATCH 04/12] fsmonitor: refactor refresh callback on directory events Jeff Hostetler via GitGitGadget
2024-02-15  9:32   ` Patrick Steinhardt
2024-02-20 18:54     ` Jeff Hostetler
2024-02-21 12:54       ` Patrick Steinhardt
2024-02-13 20:52 ` [PATCH 05/12] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
2024-02-14  1:34   ` Junio C Hamano
2024-02-15  9:32   ` Patrick Steinhardt
2024-02-13 20:52 ` [PATCH 06/12] fsmonitor: clarify handling of directory events in callback Jeff Hostetler via GitGitGadget
2024-02-14  7:47   ` Junio C Hamano
2024-02-20 18:56     ` Jeff Hostetler
2024-02-20 19:24       ` Junio C Hamano
2024-02-15  9:32   ` Patrick Steinhardt
2024-02-20 19:10     ` Jeff Hostetler
2024-02-13 20:52 ` [PATCH 07/12] fsmonitor: refactor untracked-cache invalidation Jeff Hostetler via GitGitGadget
2024-02-14 16:46   ` Junio C Hamano
2024-02-15  9:32   ` Patrick Steinhardt
2024-02-13 20:52 ` [PATCH 08/12] fsmonitor: support case-insensitive directory events Jeff Hostetler via GitGitGadget
2024-02-15  9:32   ` Patrick Steinhardt
2024-02-13 20:52 ` [PATCH 09/12] fsmonitor: refactor non-directory callback Jeff Hostetler via GitGitGadget
2024-02-15  9:32   ` Patrick Steinhardt
2024-02-13 20:52 ` [PATCH 10/12] fsmonitor: support case-insensitive non-directory events Jeff Hostetler via GitGitGadget
2024-02-13 20:52 ` [PATCH 11/12] fsmonitor: refactor bit invalidation in refresh callback Jeff Hostetler via GitGitGadget
2024-02-15  9:32   ` Patrick Steinhardt
2024-02-13 20:52 ` [PATCH 12/12] t7527: update case-insenstive fsmonitor test Jeff Hostetler via GitGitGadget
2024-02-23  3:18 ` [PATCH v2 00/16] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
2024-02-23  3:18   ` [PATCH v2 01/16] name-hash: add index_dir_find() Jeff Hostetler via GitGitGadget
2024-02-23  6:37     ` Junio C Hamano
2024-02-23  3:18   ` [PATCH v2 02/16] t7527: add case-insensitve test for FSMonitor Jeff Hostetler via GitGitGadget
2024-02-23  3:18   ` [PATCH v2 03/16] t7527: temporarily disable case-insensitive tests Jeff Hostetler via GitGitGadget
2024-02-23  8:17     ` Junio C Hamano
2024-02-26 17:12       ` Jeff Hostetler
2024-02-23  3:18   ` [PATCH v2 04/16] fsmonitor: refactor refresh callback on directory events Jeff Hostetler via GitGitGadget
2024-02-23  8:18     ` Junio C Hamano
2024-02-23  3:18   ` [PATCH v2 05/16] fsmonitor: clarify handling of directory events in callback helper Jeff Hostetler via GitGitGadget
2024-02-23  3:18   ` [PATCH v2 06/16] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
2024-02-23  8:18     ` Junio C Hamano
2024-02-25 12:30     ` Torsten Bögershausen
2024-02-25 17:24       ` Junio C Hamano
2024-02-23  3:18   ` [PATCH v2 07/16] dir: create untracked_cache_invalidate_trimmed_path() Jeff Hostetler via GitGitGadget
2024-02-25 12:35     ` Torsten Bögershausen
2024-02-23  3:18   ` [PATCH v2 08/16] fsmonitor: refactor untracked-cache invalidation Jeff Hostetler via GitGitGadget
2024-02-23  3:18   ` [PATCH v2 09/16] fsmonitor: move untracked invalidation into helper functions Jeff Hostetler via GitGitGadget
2024-02-23 17:36     ` Junio C Hamano
2024-02-26 18:45       ` Jeff Hostetler
2024-02-23  3:18   ` [PATCH v2 10/16] fsmonitor: return invalidated cache-entry count on directory event Jeff Hostetler via GitGitGadget
2024-02-23  3:18   ` [PATCH v2 11/16] fsmonitor: remove custom loop from non-directory path handler Jeff Hostetler via GitGitGadget
2024-02-23 17:47     ` Junio C Hamano
2024-02-23  3:18   ` [PATCH v2 12/16] fsmonitor: return invalided cache-entry count on non-directory event Jeff Hostetler via GitGitGadget
2024-02-23 17:51     ` Junio C Hamano
2024-02-23  3:18   ` [PATCH v2 13/16] fsmonitor: trace the new invalidated cache-entry count Jeff Hostetler via GitGitGadget
2024-02-23 17:53     ` Junio C Hamano
2024-02-23  3:18   ` [PATCH v2 14/16] fsmonitor: support case-insensitive events Jeff Hostetler via GitGitGadget
2024-02-23 18:14     ` Junio C Hamano [this message]
2024-02-26 20:41       ` Jeff Hostetler
2024-02-26 21:18         ` Junio C Hamano
2024-02-25 13:10     ` Torsten Bögershausen
2024-02-26 20:47       ` Jeff Hostetler
2024-02-23  3:18   ` [PATCH v2 15/16] fsmonitor: refactor bit invalidation in refresh callback Jeff Hostetler via GitGitGadget
2024-02-23 18:18     ` Junio C Hamano
2024-02-23  3:18   ` [PATCH v2 16/16] t7527: update case-insenstive fsmonitor test Jeff Hostetler via GitGitGadget
2024-02-26 21:39   ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 01/14] name-hash: add index_dir_find() Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 02/14] t7527: add case-insensitve test for FSMonitor Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 03/14] fsmonitor: refactor refresh callback on directory events Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 04/14] fsmonitor: clarify handling of directory events in callback helper Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 05/14] fsmonitor: refactor refresh callback for non-directory events Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 06/14] dir: create untracked_cache_invalidate_trimmed_path() Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 07/14] fsmonitor: refactor untracked-cache invalidation Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 08/14] fsmonitor: move untracked-cache invalidation into helper functions Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 09/14] fsmonitor: return invalidated cache-entry count on directory event Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 10/14] fsmonitor: remove custom loop from non-directory path handler Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 11/14] fsmonitor: return invalided cache-entry count on non-directory event Jeff Hostetler via GitGitGadget
2024-03-06 12:58       ` Patrick Steinhardt
2024-02-26 21:39     ` [PATCH v3 12/14] fsmonitor: trace the new invalidated cache-entry count Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 13/14] fsmonitor: refactor bit invalidation in refresh callback Jeff Hostetler via GitGitGadget
2024-02-26 21:39     ` [PATCH v3 14/14] fsmonitor: support case-insensitive events Jeff Hostetler via GitGitGadget
2024-03-06 12:58       ` Patrick Steinhardt
2024-02-27  1:40     ` [PATCH v3 00/14] FSMonitor edge cases on case-insensitive file systems Junio C Hamano
2024-03-06 12:58     ` Patrick Steinhardt
2024-03-06 17:09       ` Junio C Hamano
2024-03-06 18:10       ` Jeff Hostetler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqjzmvt5d3.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jeffhostetler@github.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.