All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  newren@gmail.com,  anh@canva.com,
	 Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v2 5/5] sparse-index: improve lstat caching of sparse paths
Date: Thu, 27 Jun 2024 14:14:22 -0700	[thread overview]
Message-ID: <xmqqy16qrsup.fsf@gitster.g> (raw)
In-Reply-To: <0cb344ac14fa942f781221663e2324fd2225fb5f.1719412192.git.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Wed, 26 Jun 2024 14:29:51 +0000")

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  struct path_found_data {
> +	/**
> +	 * The path stored in 'dir', if non-empty, corresponds to the most-
> +	 * recent path that we checked where:
> +	 *
> +	 *   1. The path should be a directory, according to the index.
> +	 *   2. The path does not exist.
> +	 *   3. The parent path _does_ exist. (This may be the root of the
> +	 *      working directory.)
> +	 */
>  	struct strbuf dir;
> -	int dir_found;
>  	size_t lstat_count;
>  };
>  
>  #define PATH_FOUND_DATA_INIT { \
> -	.dir = STRBUF_INIT, \
> -	.dir_found = 1 \
> +	.dir = STRBUF_INIT \
>  }
>  
>  static void clear_path_found_data(struct path_found_data *data)
> @@ -455,49 +462,108 @@ static void clear_path_found_data(struct path_found_data *data)
>  	strbuf_release(&data->dir);
>  }
>  
> +/**
> + * Return the length of the largest common substring that ends in a

"largest" here is understandable (it means longest).

> + * slash ('/') to indicate the largest common parent directory. Returns

here I find it a bit confusing.  It is the deepest common parent
directory between the two (or "the common parent directory with the
longest path"), but my initial reaction was "largest common parent
directory?  wouldn't the root level by definition the 'largest'
(having the largest number of paths underneath) directory that is
common?)".

> + * zero if no common directory exists.
> + */
> +static size_t max_common_dir_prefix(const char *path1, const char *path2)
> +{
> +	size_t common_prefix = 0;
> +	for (size_t i = 0; path1[i] && path2[i]; i++) {
> +		if (path1[i] != path2[i])
> +			break;
> +
> +		/*
> +		 * If they agree at a directory separator, then add one
> +		 * to make sure it is included in the common prefix string.
> +		 */
> +		if (path1[i] == '/')
> +			common_prefix = i + 1;
> +	}
> +
> +	return common_prefix;
> +}

Looking good.  I assume that these two paths are relative to the
top-level of the working tree (in other words, they do not begin
with a slash)?

>  static int path_found(const char *path, struct path_found_data *data)
>  {
> ...
> +	 * At this point, we know that 'path' doesn't exist, and we know that
> +	 * the parent directory of 'data->dir' does exist. Let's set 'data->dir'
> +	 * to be the top-most non-existing directory of 'path'. If the first
> +	 * parent of 'path' exists, then we will act as though 'path'
> +	 * corresponds to a directory (by adding a slash).
>  	 */
> -	newdir = strrchr(path, '/');
> -	if (!newdir)
> -		return 0; /* Didn't find a parent dir; just return 0 now. */
> +	common_prefix = max_common_dir_prefix(path, data->dir.buf);
> ...
> +	strbuf_setlen(&data->dir, common_prefix);
> +	while (1) {

Oooh, nice.  So you learned /a/b/c/d did not exist, so check /a/b/c,
and then /a/b/ and stop, because you know /a does exist already.
With luck, our next query is for /a/b/c/e or for /a/b/f, and knowing
that /a/b/ does not exist would allow us to say "no, they do not
exist" without having to lstat().  OK.


  reply	other threads:[~2024-06-27 21:14 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-20 16:11 [PATCH 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Derrick Stolee via GitGitGadget
2024-06-20 16:11 ` [PATCH 1/5] sparse-index: refactor skip worktree retry logic Derrick Stolee via GitGitGadget
2024-06-24 22:12   ` Elijah Newren
2024-06-26 12:42     ` Derrick Stolee
2024-06-20 16:11 ` [PATCH 2/5] sparse-index: refactor path_found() Derrick Stolee via GitGitGadget
2024-06-24 22:13   ` Elijah Newren
2024-06-26 12:43     ` Derrick Stolee
2024-06-20 16:11 ` [PATCH 3/5] sparse-index: use strbuf in path_found() Derrick Stolee via GitGitGadget
2024-06-24 22:13   ` Elijah Newren
2024-06-20 16:11 ` [PATCH 4/5] sparse-index: count lstat() calls Derrick Stolee via GitGitGadget
2024-06-24 22:13   ` Elijah Newren
2024-06-20 16:11 ` [PATCH 5/5] sparse-index: improve lstat caching of sparse paths Derrick Stolee via GitGitGadget
2024-06-24 22:14   ` Elijah Newren
2024-06-25  0:08     ` Junio C Hamano
2024-06-26 13:06     ` Derrick Stolee
2024-06-28  0:10       ` Elijah Newren
2024-06-20 19:16 ` [PATCH 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Junio C Hamano
2024-06-20 20:21   ` Derrick Stolee
2024-06-20 21:02     ` Junio C Hamano
2024-06-26 14:29 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2024-06-26 14:29   ` [PATCH v2 1/5] sparse-checkout: refactor skip worktree retry logic Derrick Stolee via GitGitGadget
2024-06-27 20:59     ` Junio C Hamano
2024-06-28  0:51       ` Elijah Newren
2024-06-28  1:49         ` Derrick Stolee
2024-06-28  5:50         ` Junio C Hamano
2024-06-28  0:31     ` Elijah Newren
2024-06-28  1:56       ` Derrick Stolee
2024-06-26 14:29   ` [PATCH v2 2/5] sparse-index: refactor path_found() Derrick Stolee via GitGitGadget
2024-06-26 14:29   ` [PATCH v2 3/5] sparse-index: use strbuf in path_found() Derrick Stolee via GitGitGadget
2024-06-26 14:29   ` [PATCH v2 4/5] sparse-index: count lstat() calls Derrick Stolee via GitGitGadget
2024-06-26 14:29   ` [PATCH v2 5/5] sparse-index: improve lstat caching of sparse paths Derrick Stolee via GitGitGadget
2024-06-27 21:14     ` Junio C Hamano [this message]
2024-06-28  1:56       ` Derrick Stolee
2024-06-27 21:46   ` [PATCH v2 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Junio C Hamano
2024-06-28  0:59     ` Elijah Newren
2024-06-28  1:57       ` Derrick Stolee
2024-06-28 12:43   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2024-06-28 12:43     ` [PATCH v3 1/5] sparse-checkout: refactor skip worktree retry logic Derrick Stolee via GitGitGadget
2024-06-28 12:43     ` [PATCH v3 2/5] sparse-index: refactor path_found() Derrick Stolee via GitGitGadget
2024-06-28 12:43     ` [PATCH v3 3/5] sparse-index: use strbuf in path_found() Derrick Stolee via GitGitGadget
2024-06-28 12:43     ` [PATCH v3 4/5] sparse-index: count lstat() calls Derrick Stolee via GitGitGadget
2024-06-28 12:43     ` [PATCH v3 5/5] sparse-index: improve lstat caching of sparse paths Derrick Stolee via GitGitGadget
2024-06-28 15:07     ` [PATCH v3 0/5] sparse-index: improve clear_skip_worktree_from_present_files() Elijah Newren
2024-06-28 19:34       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqy16qrsup.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=anh@canva.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=newren@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.