From: Victoria Dye <vdye@github.com>
To: William Sprent via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"William Sprent" <williams@unity3d.com>,
"Derrick Stolee" <derrickstolee@github.com>,
"Elijah Newren" <newren@gmail.com>,
"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH v2] dir: check for single file cone patterns
Date: Wed, 4 Jan 2023 18:48:24 -0500 [thread overview]
Message-ID: <99e3c0f9-ecfa-7c26-eea5-685bc324f674@github.com> (raw)
In-Reply-To: <pull.1446.v2.git.1672734059938.gitgitgadget@gmail.com>
William Sprent via GitGitGadget wrote:
> From: William Sprent <williams@unity3d.com>
>
> The sparse checkout documentation states that the cone mode pattern set
> is limited to patterns that either recursively include directories or
> patterns that match all files in a directory. In the sparse checkout
> file, the former manifest in the form:
>
> /A/B/C/
>
> while the latter become a pair of patterns either in the form:
>
> /A/B/
> !/A/B/*/
>
> or in the special case of matching the toplevel files:
>
> /*
> !/*/
>
> The 'add_pattern_to_hashsets()' function contains checks which serve to
> disable cone-mode when non-cone patterns are encountered. However, these
> do not catch when the pattern list attempts to match a single file or
> directory, e.g. a pattern in the form:
>
> /A/B/C
>
> This causes sparse-checkout to exhibit unexpected behaviour when such a
> pattern is in the sparse-checkout file and cone mode is enabled.
> Concretely, with the pattern like the above, sparse-checkout, in
> non-cone mode, will only include the directory or file located at
> '/A/B/C'. However, with cone mode enabled, sparse-checkout will instead
> just manifest the toplevel files but not any file located at '/A/B/C'.
>
> Relatedly, issues occur when supplying the same kind of filter when
> partial cloning with '--filter=sparse:oid=<oid>'. 'upload-pack' will
> correctly just include the objects that match the non-cone pattern
> matching. Which means that checking out the newly cloned repo with the
> same filter, but with cone mode enabled, fails due to missing objects.
This clearly explains the issue you're trying to fix: cone mode
sparse-checkout requires patterns like '/A/B/', !/A/B/*/', '/A/B/C/', etc.,
but invalid patterns like '/A/B/C' (no trailing slash) currently don't force
the switch to non-cone mode, leading to unexpected behavior.
>
> To fix these issues, add a cone mode pattern check that asserts that
> every pattern is either a directory match or the pattern '/*'. Add a
> test to verify the new pattern check and modify another to reflect that
> non-directory patterns are caught earlier.
I think this is the best way to maintain the current intended behavior of
cone mode sparse-checkout. While the idea of single file "exceptions" to
cone patterns has been brought up in the past (I think most recently at the
Contributor's Summit this past September [1]), it'd be a substantial change
that's definitely out-of-scope of this small bugfix.
[1] https://lore.kernel.org/git/YzXwOsaoCdBhHsX1@nand.local/
>
> Signed-off-by: William Sprent <williams@unity3d.com>
> ---
> dir: check for single file cone patterns
>
> Resubmitting this without the superfluous double negation of the
> 'strcmp' output.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1446%2Fwilliams-unity%2Fws%2Fsparse-checkout-pattern-match-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1446/williams-unity/ws/sparse-checkout-pattern-match-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/1446
>
> Range-diff vs v1:
>
> 1: cfb4b75c378 ! 1: d5406e62f6f dir: check for single file cone patterns
> @@ dir.c: static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_
> }
>
> + if (!(given->flags & PATTERN_FLAG_MUSTBEDIR) &&
> -+ !!strcmp(given->pattern, "/*")) {
> ++ strcmp(given->pattern, "/*")) {
> + /* Not a cone pattern. */
> + warning(_("unrecognized pattern: '%s'"), given->pattern);
> + goto clear_hashmaps;
>
>
> dir.c | 7 +++++++
> t/t1091-sparse-checkout-builtin.sh | 11 ++++++++++-
> 2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/dir.c b/dir.c
> index fbdb24fc819..4e99f0c868f 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -732,6 +732,13 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
> goto clear_hashmaps;
> }
>
> + if (!(given->flags & PATTERN_FLAG_MUSTBEDIR) &&
> + strcmp(given->pattern, "/*")) {
This condition is only entered if:
- the pattern *does not* end in a trailing slash
('!(given->flags & PATTERN_FLAG_MUSTBEDIR)'), AND
- the pattern *does not* start with '/*' ('strcmp(given->pattern, "/*")')
The '/*'-like patterns are already handled by other parts of
'add_pattern_to_hashsets()', so only the '/A/B/C'-like patterns will trigger
the "unrecognized" warning. Looks good!
> + /* Not a cone pattern. */
> + warning(_("unrecognized pattern: '%s'"), given->pattern);
> + goto clear_hashmaps;
> + }
> +
> prev = given->pattern;
> cur = given->pattern + 1;
> next = given->pattern + 2;
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index b563d6c263e..627267be153 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -238,7 +238,7 @@ test_expect_success 'cone mode: match patterns' '
> test_expect_success 'cone mode: warn on bad pattern' '
> test_when_finished mv sparse-checkout repo/.git/info/ &&
> cp repo/.git/info/sparse-checkout . &&
> - echo "!/deep/deeper/*" >>repo/.git/info/sparse-checkout &&
> + echo "!/deep/deeper/*/" >>repo/.git/info/sparse-checkout &&
This test changes because, without the trailing slash on the pattern, the
error would become "unrecognized pattern: !/deep/deeper/*" due to the new
check in 'add_pattern_to_hashsets()'. Changing the test pattern ensures
we're still exercising what the test was originally intended to exercise,
and doesn't indicate a regression.
> git -C repo read-tree -mu HEAD 2>err &&
> test_i18ngrep "unrecognized negative pattern" err
> '
> @@ -667,6 +667,15 @@ test_expect_success 'pattern-checks: starting "*"' '
> check_read_tree_errors repo "a deep" "disabling cone pattern matching"
> '
>
> +test_expect_success 'pattern-checks: non directory pattern' '
> + cat >repo/.git/info/sparse-checkout <<-\EOF &&
> + /deep/deeper1/a
> + EOF
> + check_read_tree_errors repo deep "disabling cone pattern matching" &&
> + check_files repo/deep deeper1 &&
> + check_files repo/deep/deeper1 a
> +'
And this test ensures the new check is working for the appropriate patterns.
This patch looks good to me, thanks for finding & fixing the bug!
> +
> test_expect_success 'pattern-checks: contained glob characters' '
> for c in "[a]" "\\" "?" "*"
> do
>
> base-commit: 7c2ef319c52c4997256f5807564523dfd4acdfc7
next prev parent reply other threads:[~2023-01-04 23:48 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-20 14:27 [PATCH] dir: check for single file cone patterns William Sprent via GitGitGadget
2022-12-20 14:33 ` Ævar Arnfjörð Bjarmason
2022-12-21 12:51 ` William Sprent
2023-01-03 8:20 ` [PATCH v2] " William Sprent via GitGitGadget
2023-01-04 6:07 ` Junio C Hamano
2023-01-04 23:48 ` Victoria Dye [this message]
2023-01-05 2:16 ` Junio C Hamano
2023-01-05 11:39 ` William Sprent
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=99e3c0f9-ecfa-7c26-eea5-685bc324f674@github.com \
--to=vdye@github.com \
--cc=avarab@gmail.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=newren@gmail.com \
--cc=williams@unity3d.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).