All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: szeder.dev@gmail.com, newren@gmail.com,
	Derrick Stolee <dstolee@microsoft.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/1] sparse-checkout: respect core.ignoreCase in cone mode
Date: Fri, 13 Dec 2019 18:09:52 +0000	[thread overview]
Message-ID: <pull.488.v2.git.1576260593.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.488.git.1575920580.gitgitgadget@gmail.com>

This is the first of several cleanups to the sparse-checkout feature that
had a large overhaul in ds/sparse-cone.

We have an internal customer that plans to use sparse-checkout as a core
feature of their version control experience. They use a case-insensitive
filesystem, so as they created their directory dependency graph they did not
keep consistent casing. This data is what they plan to feed to "git
sparse-checkout set".

While I would certainly prefer that the data is cleaned up (and that process
is ongoing), I could not argue with the logic that git add does the "right
thing" when core.ignoreCase is enabled.

Update in V2:

Based on excellent feedback from Junio, we decided to shore up the case
issues from the previous implementation. Now, the sparse-checkout file may
be stored with the "wrong" case but the hash algorithm is replaced with a
case-insensitive check when core.ignoreCase is set. Because the performance
impact is much lower than I anticipated, I no longer think it is important
to add a new config option to enable/disable this logic.

Thanks, -Stolee

Derrick Stolee (1):
  sparse-checkout: respect core.ignoreCase in cone mode

 Documentation/git-sparse-checkout.txt |  5 +++++
 builtin/sparse-checkout.c             | 10 ++++++++--
 dir.c                                 | 15 ++++++++++++---
 t/t1091-sparse-checkout-builtin.sh    | 17 +++++++++++++++++
 4 files changed, 42 insertions(+), 5 deletions(-)


base-commit: cff4e9138d8df45e3b6199171092ee781cdadaeb
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-488%2Fderrickstolee%2Fsparse-case-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-488/derrickstolee/sparse-case-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/488

Range-diff vs v1:

 1:  23705845ce ! 1:  dc97d5a0d2 sparse-checkout: respect core.ignoreCase in cone mode
     @@ -12,23 +12,42 @@
          using a case-insensitive match. Do the same for the sparse-checkout
          feature.
      
     -    The sanitize_cone_input() method is named to imply that other checks
     -    may be added. In fact, such checks are planned including looking for
     -    wildcards that make the paths invalid cone patterns or must be
     -    escaped.
     -
     -    Specifically, if the path has a match in the index, then use that
     -    path instead. If there is no match, still add that path to the
     -    patterns, as the user may expect the directory to appear after a
     -    checkout to another ref. However, we have no matching path to
     -    correct for a case conflict, and must assume that the user provided
     -    the correct case.
     -
     -    Another option would be to do case-insensitive checks while
     -    updating the skip-worktree bits during unpack_trees(). Outside of
     -    the potential performance loss on a more expensive code path, that
     -    also breaks compatibility with older versions of Git as using the
     -    same sparse-checkout file would change the paths that are included.
     +    Perform case-insensitive checks while updating the skip-worktree
     +    bits during unpack_trees(). This is done by changing the hash
     +    algorithm and hashmap comparison methods to optionally use case-
     +    insensitive methods.
     +
     +    When this is enabled, there is a small performance cost in the
     +    hashing algorithm. To tease out the worst possible case, the
     +    following was run on a repo with a deep directory structure:
     +
     +            git ls-tree -d -r --name-only HEAD |
     +                    git sparse-checkout set --stdin
     +
     +    The 'set' command was timed with core.ignoreCase disabled or
     +    enabled. For the repo with a deep history, the numbers were
     +
     +            core.ignoreCase=false: 62s
     +            core.ignoreCase=true:  74s (+19.3%)
     +
     +    For reproducibility, the equivalent test on the Linux kernel
     +    repository had these numbers:
     +
     +            core.ignoreCase=false: 3.1s
     +            core.ignoreCase=true:  3.6s (+16%)
     +
     +    Now, this is not an entirely fair comparison, as most users
     +    will define their sparse cone using more shallow directories,
     +    and the performance improvement from eb42feca97 ("unpack-trees:
     +    hash less in cone mode" 2019-11-21) can remove most of the
     +    hash cost. For a more realistic test, drop the "-r" from the
     +    ls-tree command to store only the first-level directories.
     +    In that case, the Linux kernel repository takes 0.2-0.25s in
     +    each case, and the deep repository takes one second, plus or
     +    minus 0.05s, in each case.
     +
     +    Thus, we _can_ demonstrate a cost to this change, but it is
     +    unlikely to matter to any reasonable sparse-checkout cone.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ -39,9 +58,10 @@
       If the patterns do match the expected format, then Git will use faster hash-
       based algorithms to compute inclusion in the sparse-checkout.
       
     -+If `core.ignoreCase=true`, then the 'git sparse-checkout set' command will
     -+correct for incorrect case when assigning patterns to the sparse-checkout
     -+file.
     ++If `core.ignoreCase=true`, then the pattern-matching algorithm will use a
     ++case-insensitive check. This corrects for case mismatched filenames in the
     ++'git sparse-checkout set' command to reflect the expected cone in the working
     ++directory.
      +
       SEE ALSO
       --------
     @@ -51,71 +71,76 @@
       --- a/builtin/sparse-checkout.c
       +++ b/builtin/sparse-checkout.c
      @@
     - 	}
     - }
     + 	struct pattern_entry *e = xmalloc(sizeof(*e));
     + 	e->patternlen = path->len;
     + 	e->pattern = strbuf_detach(path, NULL);
     +-	hashmap_entry_init(&e->ent, memhash(e->pattern, e->patternlen));
     ++	hashmap_entry_init(&e->ent,
     ++			   ignore_case ?
     ++			   strihash(e->pattern) :
     ++			   strhash(e->pattern));
     + 
     + 	hashmap_add(&pl->recursive_hashmap, &e->ent);
       
     -+static void sanitize_cone_input(struct strbuf *line)
     -+{
     -+	if (ignore_case) {
     -+		struct index_state *istate = the_repository->index;
     -+		const char *name = index_dir_matching_name(istate, line->buf, line->len);
     -+
     -+		if (name) {
     -+			strbuf_setlen(line, 0);
     -+			strbuf_addstr(line, name);
     -+		}
     -+	}
     -+
     -+	if (line->buf[0] != '/')
     -+		strbuf_insert(line, 0, "/", 1);
     -+}
     -+
     - static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl)
     - {
     - 	strbuf_trim(line);
      @@
     - 	if (!line->len)
     - 		return;
     + 		e = xmalloc(sizeof(struct pattern_entry));
     + 		e->patternlen = newlen;
     + 		e->pattern = xstrndup(oldpattern, newlen);
     +-		hashmap_entry_init(&e->ent, memhash(e->pattern, e->patternlen));
     ++		hashmap_entry_init(&e->ent,
     ++				   ignore_case ?
     ++				   strihash(e->pattern) :
     ++				   strhash(e->pattern));
       
     --	if (line->buf[0] != '/')
     --		strbuf_insert(line, 0, "/", 1);
     -+	sanitize_cone_input(line);
     + 		if (!hashmap_get_entry(&pl->parent_hashmap, e, ent, NULL))
     + 			hashmap_add(&pl->parent_hashmap, &e->ent);
     +
     + diff --git a/dir.c b/dir.c
     + --- a/dir.c
     + +++ b/dir.c
     +@@
     + 			 ? ee1->patternlen
     + 			 : ee2->patternlen;
       
     - 	insert_recursive_pattern(pl, line);
     ++	if (ignore_case)
     ++		return strncasecmp(ee1->pattern, ee2->pattern, min_len);
     + 	return strncmp(ee1->pattern, ee2->pattern, min_len);
       }
     -
     - diff --git a/cache.h b/cache.h
     - --- a/cache.h
     - +++ b/cache.h
     + 
      @@
     - int verify_path(const char *path, unsigned mode);
     - int strcmp_offset(const char *s1, const char *s2, size_t *first_change);
     - int index_dir_exists(struct index_state *istate, const char *name, int namelen);
     -+const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen);
     - void adjust_dirname_case(struct index_state *istate, char *name);
     - struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
     + 		translated->pattern = truncated;
     + 		translated->patternlen = given->patternlen - 2;
     + 		hashmap_entry_init(&translated->ent,
     +-				   memhash(translated->pattern, translated->patternlen));
     ++				   ignore_case ?
     ++				   strihash(translated->pattern) :
     ++				   strhash(translated->pattern));
       
     -
     - diff --git a/name-hash.c b/name-hash.c
     - --- a/name-hash.c
     - +++ b/name-hash.c
     + 		if (!hashmap_get_entry(&pl->recursive_hashmap,
     + 				       translated, ent, NULL)) {
      @@
     - 	return dir && dir->nr;
     + 	translated->pattern = xstrdup(given->pattern);
     + 	translated->patternlen = given->patternlen;
     + 	hashmap_entry_init(&translated->ent,
     +-			   memhash(translated->pattern, translated->patternlen));
     ++			   ignore_case ?
     ++			   strihash(translated->pattern) :
     ++			   strhash(translated->pattern));
     + 
     + 	hashmap_add(&pl->recursive_hashmap, &translated->ent);
     + 
     +@@
     + 	/* Check straight mapping */
     + 	p.pattern = pattern->buf;
     + 	p.patternlen = pattern->len;
     +-	hashmap_entry_init(&p.ent, memhash(p.pattern, p.patternlen));
     ++	hashmap_entry_init(&p.ent,
     ++			   ignore_case ?
     ++			   strihash(p.pattern) :
     ++			   strhash(p.pattern));
     + 	return !!hashmap_get_entry(map, &p, ent, NULL);
       }
       
     -+const char *index_dir_matching_name(struct index_state *istate, const char *name, int namelen)
     -+{
     -+	struct dir_entry *dir;
     -+
     -+	lazy_init_name_hash(istate);
     -+	dir = find_dir_entry(istate, name, namelen);
     -+
     -+	return dir ? dir->name : NULL;
     -+}
     -+
     - void adjust_dirname_case(struct index_state *istate, char *name)
     - {
     - 	const char *startPtr = name;
      
       diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
       --- a/t/t1091-sparse-checkout-builtin.sh
     @@ -125,16 +150,20 @@
       '
       
      +test_expect_success 'cone mode: set with core.ignoreCase=true' '
     -+	test_when_finished git -C repo config --unset core.ignoreCase &&
      +	git -C repo sparse-checkout init --cone &&
     -+	git -C repo config core.ignoreCase true &&
     -+	git -C repo sparse-checkout set Folder1 &&
     ++	git -C repo -c core.ignoreCase=true sparse-checkout set folder1 &&
      +	cat >expect <<-EOF &&
      +		/*
      +		!/*/
      +		/folder1/
      +	EOF
     -+	test_cmp expect repo/.git/info/sparse-checkout
     ++	test_cmp expect repo/.git/info/sparse-checkout &&
     ++	ls repo >dir &&
     ++	cat >expect <<-EOF &&
     ++		a
     ++		folder1
     ++	EOF
     ++	test_cmp expect dir
      +'
      +
       test_done

-- 
gitgitgadget

  parent reply	other threads:[~2019-12-13 20:40 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-09 19:42 [PATCH 0/1] sparse-checkout: respect core.ignoreCase in cone mode Derrick Stolee via GitGitGadget
2019-12-09 19:43 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
2019-12-11 18:44   ` Junio C Hamano
2019-12-11 19:11     ` Derrick Stolee
2019-12-11 20:00       ` Junio C Hamano
2019-12-11 20:29         ` Derrick Stolee
2019-12-11 21:37           ` Junio C Hamano
2019-12-12  2:51             ` Derrick Stolee
2019-12-12 20:59   ` Derrick Stolee
2019-12-13 19:05     ` Junio C Hamano
2019-12-13 19:40       ` Derrick Stolee
2019-12-13 22:02         ` Junio C Hamano
2019-12-13 18:09 ` Derrick Stolee via GitGitGadget [this message]
2019-12-13 18:09   ` [PATCH v2 " Derrick Stolee via GitGitGadget
2019-12-13 19:58   ` [PATCH v2 0/1] " Junio C Hamano
2019-12-18 11:41 ` [PATCH " Ed Maste

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.488.v2.git.1576260593.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.