* [PATCH 0/2] refs: a couple of --exclude fixes
@ 2025-03-06  1:19 Taylor Blau
  2025-03-06  1:19 ` [PATCH 1/2] refs.c: remove empty '--exclude' patterns Taylor Blau
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Taylor Blau @ 2025-03-06  1:19 UTC
  To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt, SURA

This short patch series fixes a couple of quirks with the --exclude feature in for-each-ref (and the corresponding low-level bits in the reftable and packed backends).

The issue described and fixed in the second patch was reported by SURA here[1]. While working on that issue, I noticed a separate issue that merited fixing in its own patch, which precedes the fix I intended to write ;-).

Thanks in advance for your review!

[1]: https://lore.kernel.org/git/CAD6AYr-ZC32VNfUfMB63H-rQRfTdV=VQfBm67i2mG+6GDCNxkQ@mail.gmail.com/

Taylor Blau (2):
  refs.c: remove empty '--exclude' patterns
  refs.c: unify '--exclude' behavior between files and packed backends

 refs.c                  | 20 ++++++++++++++++++++
 t/t1419-exclude-refs.sh | 12 +++++++++++-
 2 files changed, 31 insertions(+), 1 deletion(-)

base-commit: 6a64ac7b014fa2cfa7a69af3c253bcd53a94b428
-- 
2.49.0.rc1.2.g7e6a5e020ba
* [PATCH 1/2] refs.c: remove empty '--exclude' patterns 2025-03-06 1:19 [PATCH 0/2] refs: a couple of --exclude fixes Taylor Blau @ 2025-03-06 1:19 ` Taylor Blau 2025-03-06 1:19 ` [PATCH 2/2] refs.c: unify '--exclude' behavior between files and packed backends Taylor Blau 2025-03-06 15:34 ` [PATCH v2 0/2] refs: a couple of --exclude fixes Taylor Blau 2 siblings, 0 replies; 17+ messages in thread From: Taylor Blau @ 2025-03-06 1:19 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt, SURA In 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid excluded pattern(s), 2023-07-10), the packed-refs backend learned how to construct "jump lists" to avoid enumerating sections of the packed-refs file that we know the caller is going to throw out anyway. This process works by finding the start- and end-points (that is, where in the packed-refs file corresponds to the range we're going to ignore) for each exclude pattern, then constructing a jump list based on that. At enumeration time we'll consult the jump list to skip past everything in the range(s) found in the previous step, saving time when excluding a large portion of references. But when there is a --exclude pattern which is just the empty string, the behavior is a little funky. When we try and exclude the empty string, the matched range covers the entire packed-refs file, meaning that we won't output any packed references. But the empty pattern doesn't actually match any references to begin with! For example, on my copy of git.git I can do: $ git for-each-ref '' | wc -l 0 So "git for-each-ref --exclude=''" shouldn't actually remove anything from the output, and ought to be equivalent to "git for-each-ref". But it's not, and in fact: $ git for-each-ref | wc -l 2229 $ git for-each-ref --exclude='' | wc -l 480 But why does the '--exclude' version output only some of the references in the repository? 
Here's a hint: $ find .git/refs -type f | wc -l 480 Indeed, because the files backend doesn't implement[^1] the same jump list concept as the packed backend we get the correct result for the loose references, but none of the packed references. Since the empty string exclude pattern doesn't match anything, we can discard them before the packed-refs backend has a chance to even see it (and likewise for reftable, which also implements a similar concept since 1869525066 (refs/reftable: wire up support for exclude patterns, 2024-09-16)). This approach (copying only some of the patterns into a strvec at the refs.c layer) may seem heavy-handed, but it's setting us up to fix another bug in the following commit where the fix will involve modifying the incoming patterns. [^1]: As noted in 59c35fac54. We technically could avoid opening and enumerating the contents of, for e.g., "$GIT_DIR/refs/heads/foo/" if we knew that we were excluding anything under the 'refs/heads/foo' hierarchy. But the --exclude stuff is all best-effort anyway, since the caller is expected to cull out any results that they don't want. 
Noticed-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- refs.c | 16 ++++++++++++++++ t/t1419-exclude-refs.sh | 10 ++++++++++ 2 files changed, 26 insertions(+) diff --git a/refs.c b/refs.c index 91da5325d7..17d3840aff 100644 --- a/refs.c +++ b/refs.c @@ -1699,6 +1699,20 @@ struct ref_iterator *refs_ref_iterator_begin( enum do_for_each_ref_flags flags) { struct ref_iterator *iter; + struct strvec normalized_exclude_patterns = STRVEC_INIT; + + if (exclude_patterns) { + for (size_t i = 0; exclude_patterns[i]; i++) { + const char *pattern = exclude_patterns[i]; + size_t len = strlen(pattern); + if (!len) + continue; + + strvec_push(&normalized_exclude_patterns, pattern); + } + + exclude_patterns = normalized_exclude_patterns.v; + } if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) { static int ref_paranoia = -1; @@ -1719,6 +1733,8 @@ struct ref_iterator *refs_ref_iterator_begin( if (trim) iter = prefix_ref_iterator_begin(iter, "", trim); + strvec_clear(&normalized_exclude_patterns); + return iter; } diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh index c04eeb7211..fd58260a24 100755 --- a/t/t1419-exclude-refs.sh +++ b/t/t1419-exclude-refs.sh @@ -155,4 +155,14 @@ test_expect_success 'meta-characters are discarded' ' assert_no_jumps perf ' +test_expect_success 'empty string exclude pattern is ignored' ' + git update-ref refs/heads/loose $(git rev-parse refs/heads/foo/1) && + + for_each_ref__exclude refs/heads "" >actual 2>perf && + for_each_ref >expect && + + test_cmp expect actual && + assert_no_jumps perf +' + test_done -- 2.49.0.rc1.2.g7e6a5e020ba ^ permalink raw reply related [flat|nested] 17+ messages in thread
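To illustrate why an empty exclude pattern swallows every packed reference, here is a minimal standalone sketch. It is not the code in refs/packed-backend.c (which works on the contents of the packed-refs file rather than an array of strings); it only assumes that an excluded range is the run of sorted refnames that start with the pattern. The names has_prefix(), excluded_range() and count_excluded() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if `refname` starts with `prefix`. */
static int has_prefix(const char *refname, const char *prefix)
{
	return strncmp(refname, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical model of one jump-list entry: given refnames in sorted
 * order, compute the half-open range [*start, *end) of entries that a
 * purely prefix-based exclude would skip.
 */
void excluded_range(const char **refs, size_t nr, const char *pattern,
		    size_t *start, size_t *end)
{
	size_t i = 0;

	assert(refs && pattern && start && end);

	/* Skip everything sorted before the first match. */
	while (i < nr && !has_prefix(refs[i], pattern))
		i++;
	*start = i;

	/* Matches are contiguous because the input is sorted. */
	while (i < nr && has_prefix(refs[i], pattern))
		i++;
	*end = i;
}

/* Number of refs that would be skipped for `pattern`. */
size_t count_excluded(const char **refs, size_t nr, const char *pattern)
{
	size_t start, end;

	excluded_range(refs, nr, pattern, &start, &end);
	return end - start;
}
```

Every string starts with the empty string, so in this model an empty pattern yields a range covering the whole list. That matches the symptom described above, where only the loose references survive.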
* [PATCH 2/2] refs.c: unify '--exclude' behavior between files and packed backends 2025-03-06 1:19 [PATCH 0/2] refs: a couple of --exclude fixes Taylor Blau 2025-03-06 1:19 ` [PATCH 1/2] refs.c: remove empty '--exclude' patterns Taylor Blau @ 2025-03-06 1:19 ` Taylor Blau 2025-03-06 8:47 ` Patrick Steinhardt 2025-03-06 15:34 ` [PATCH v2 0/2] refs: a couple of --exclude fixes Taylor Blau 2 siblings, 1 reply; 17+ messages in thread From: Taylor Blau @ 2025-03-06 1:19 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt, SURA In the packed-refs backend, our implementation of '--exclude' (dating back to 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid excluded pattern(s), 2023-07-10)) considers, for example: $ git for-each-ref --exclude=refs/heads/ba to exclude "refs/heads/bar", "refs/heads/baz", and so on. The files backend, which does not implement '--exclude' (and relies on the caller to cull out results that don't match) naturally will enumerate "refs/heads/bar" and so on. So in the above example, 'for-each-ref' will try and see if "refs/heads/ba" matches "refs/heads/bar" (since the files backend simply enumerated every loose reference), and, realizing that it does not match, output the reference as expected. (A caller that did want to exclude "refs/heads/bar" and "refs/heads/baz" might instead run "git for-each-ref --exclude='refs/heads/ba*'"). This can lead to strange behavior, like seeing a different set of references advertised via 'upload-pack' depending on what set of references were loose versus packed. So there is a subtle bug with '--exclude' which is that in the packed-refs backend we will consider "refs/heads/bar" to be a pattern match against "refs/heads/ba" when we shouldn't. Likewise, the reftable backend (which in this case is bug-compatible with the packed backend) exhibits the same broken behavior. There are a few ways to fix this. 
One is to tighten the rules in cmp_record_to_refname(), which is used to determine the start/end-points of the jump list used by the packed backend. In this new "strict" mode, the comparison function would handle the case where we've reached the end of the pattern by introducing a new check like so:

    while (1) {
            if (*r1 == '\n')
                    return *r2 ? -1 : 0;
            if (!*r2) {
                    if (strict && *r1 != '/') /* <- here */
                            return 1;
                    return start ? 1 : -1;
            }
            if (*r1 != *r2)
                    return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
            r1++;
            r2++;
    }

(eliding the rest of cmp_record_to_refname()). Equivalently, we could teach refs/packed-backend::populate_excluded_jump_list() to append a trailing '/' if one does not already exist, forcing an exclude pattern like "refs/heads/ba" to only match "refs/heads/ba/abc" and so forth.

But since the same problem exists in reftable, we can fix both at once by performing this pre-processing step one layer up in refs.c at the common entrypoint for the two, which is 'refs_ref_iterator_begin()'.

Since that solution is both the simplest and only requires modification in one spot, let's normalize exclude patterns so that they end with a trailing slash. This causes us to unify the behavior between all three backends.

There is some minor test fallout in the "overlapping excluded regions" test, which happens to use 'refs/heads/ba' as an exclude pattern, and expects references under the "refs/heads/bar/*" and "refs/heads/baz/*" hierarchies to be excluded from the results.

But that test fallout is expected, because the test was codifying the buggy behavior to begin with, and should have never been written that way.
Reported-by: SURA <surak8806@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- refs.c | 6 +++++- t/t1419-exclude-refs.sh | 2 +- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/refs.c b/refs.c index 17d3840aff..2d9a1b51f4 100644 --- a/refs.c +++ b/refs.c @@ -1708,7 +1708,11 @@ struct ref_iterator *refs_ref_iterator_begin( if (!len) continue; - strvec_push(&normalized_exclude_patterns, pattern); + if (pattern[len - 1] == '/') + strvec_push(&normalized_exclude_patterns, pattern); + else + strvec_pushf(&normalized_exclude_patterns, "%s/", + pattern); } exclude_patterns = normalized_exclude_patterns.v; diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh index fd58260a24..d955cf9541 100755 --- a/t/t1419-exclude-refs.sh +++ b/t/t1419-exclude-refs.sh @@ -101,7 +101,7 @@ test_expect_success 'adjacent, non-overlapping excluded regions' ' test_expect_success 'overlapping excluded regions' ' for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual 2>perf && - for_each_ref refs/heads/foo refs/heads/quux >expect && + for_each_ref refs/heads/bar refs/heads/foo refs/heads/quux >expect && test_cmp expect actual && assert_jumps 1 perf -- 2.49.0.rc1.2.g7e6a5e020ba ^ permalink raw reply related [flat|nested] 17+ messages in thread
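The effect of the trailing-slash normalization can be sketched outside of git as follows. This is an illustration under stated assumptions, not the code in refs.c: normalize_exclude() and excluded_by() are hypothetical names, the fixed-size buffer stands in for the strvec used by the patch, and the backend's jump list is modeled as a plain prefix comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the normalization step: copy `pattern` into
 * `out`, appending '/' unless it already ends with one. Returns 0 if the
 * pattern is empty (and should be dropped) or does not fit, 1 otherwise.
 */
int normalize_exclude(const char *pattern, char *out, size_t outsz)
{
	size_t len;

	assert(pattern && out);

	len = strlen(pattern);
	if (!len || len + 2 > outsz)
		return 0;

	if (pattern[len - 1] == '/')
		snprintf(out, outsz, "%s", pattern);
	else
		snprintf(out, outsz, "%s/", pattern);
	return 1;
}

/* Model of the low-level skip: a prefix match on the normalized pattern. */
int excluded_by(const char *refname, const char *pattern)
{
	char buf[256];

	if (!normalize_exclude(pattern, buf, sizeof(buf)))
		return 0;
	return strncmp(refname, buf, strlen(buf)) == 0;
}
```

In this model "refs/heads/ba" no longer skips "refs/heads/bar", but still skips "refs/heads/ba/abc". A ref whose name equals the pattern exactly is not skipped at this level either, which is acceptable only because the exclusion is best-effort and the caller still culls its own results.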
* Re: [PATCH 2/2] refs.c: unify '--exclude' behavior between files and packed backends 2025-03-06 1:19 ` [PATCH 2/2] refs.c: unify '--exclude' behavior between files and packed backends Taylor Blau @ 2025-03-06 8:47 ` Patrick Steinhardt 2025-03-06 14:54 ` Taylor Blau 0 siblings, 1 reply; 17+ messages in thread From: Patrick Steinhardt @ 2025-03-06 8:47 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren, SURA On Wed, Mar 05, 2025 at 08:19:53PM -0500, Taylor Blau wrote: Nit: I'd reword the commit subject to not only talk about "files" and "packed" backends. Or maybe not mention the backend at all, as it is the same bug for every backend that implements excludes. Something like "refs: stop matching non-directory prefixes in exclude patterns", for example. [snip] > But since the same problem exists in reftable, we can fix both at once > by performing this pre-processing step one layer up in refs.c at the > common entrypoint for the two, which is 'refs_ref_iterator_begin()'. > > Since that solution is both the simplest and only requires modification > in one spot, let's normalize exclude patterns so that they end with a > trailing slash. This causes us to unify the behavior between all three > backends. Nice. > There is some minor test fallout in the "overlapping excluded regions" > test, which happens to use 'refs/ba' as an exclude pattern, and expects > references under the "refs/heads/bar/*" and "refs/heads/baz/*" > hierarchies to be excluded from the results. Yup, I noticed that this test is asserting the broken behaviour. 
> diff --git a/refs.c b/refs.c > index 17d3840aff..2d9a1b51f4 100644 > --- a/refs.c > +++ b/refs.c > @@ -1708,7 +1708,11 @@ struct ref_iterator *refs_ref_iterator_begin( > if (!len) > continue; > > - strvec_push(&normalized_exclude_patterns, pattern); > + if (pattern[len - 1] == '/') > + strvec_push(&normalized_exclude_patterns, pattern); > + else > + strvec_pushf(&normalized_exclude_patterns, "%s/", > + pattern); > } > > exclude_patterns = normalized_exclude_patterns.v; This looks exactly as expected. > diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh > index fd58260a24..d955cf9541 100755 > --- a/t/t1419-exclude-refs.sh > +++ b/t/t1419-exclude-refs.sh > @@ -101,7 +101,7 @@ test_expect_success 'adjacent, non-overlapping excluded regions' ' > > test_expect_success 'overlapping excluded regions' ' > for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual 2>perf && > - for_each_ref refs/heads/foo refs/heads/quux >expect && > + for_each_ref refs/heads/bar refs/heads/foo refs/heads/quux >expect && > > test_cmp expect actual && > assert_jumps 1 perf I was wondering whether this still tests the right thing. But the ranges still are overlapping, as "refs/heads" and "refs/heads/baz" are. So judging by the test description it seems to still do what's advertised. Patrick ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] refs.c: unify '--exclude' behavior between files and packed backends 2025-03-06 8:47 ` Patrick Steinhardt @ 2025-03-06 14:54 ` Taylor Blau 0 siblings, 0 replies; 17+ messages in thread From: Taylor Blau @ 2025-03-06 14:54 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git, Junio C Hamano, Jeff King, Elijah Newren, SURA On Thu, Mar 06, 2025 at 09:47:37AM +0100, Patrick Steinhardt wrote: > On Wed, Mar 05, 2025 at 08:19:53PM -0500, Taylor Blau wrote: > > Nit: I'd reword the commit subject to not only talk about "files" and > "packed" backends. Or maybe not mention the backend at all, as it is the > same bug for every backend that implements excludes. Something like > "refs: stop matching non-directory prefixes in exclude patterns", for > example. Fair enough. > > diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh > > index fd58260a24..d955cf9541 100755 > > --- a/t/t1419-exclude-refs.sh > > +++ b/t/t1419-exclude-refs.sh > > @@ -101,7 +101,7 @@ test_expect_success 'adjacent, non-overlapping excluded regions' ' > > > > test_expect_success 'overlapping excluded regions' ' > > for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual 2>perf && > > - for_each_ref refs/heads/foo refs/heads/quux >expect && > > + for_each_ref refs/heads/bar refs/heads/foo refs/heads/quux >expect && > > > > test_cmp expect actual && > > assert_jumps 1 perf > > I was wondering whether this still tests the right thing. But the ranges > still are overlapping, as "refs/heads" and "refs/heads/baz" are. So > judging by the test description it seems to still do what's advertised. That's not quite true. for_each_ref__exclude treats its first argument as the positive half of the query, and the remaining arguments as exclusions. So the only excluded regions here are "refs/heads/ba" and "refs/heads/baz", which were overlapping prior to this patch but aren't anymore. 
So this test isn't quite doing what it says it is anymore, and should really be called "non-directory excluded regions" or similar. But we should still have a separate test that does cover overlapping excluded regions, which needs a new layer underneath some hierarchy, e.g., having "refs/heads/bar/x/..." and "refs/heads/bar" as the excluded regions.

I adjusted those and will push out a new version with the results shortly. Thanks for reading closely.

Thanks,
Taylor
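As a rough illustration of why a nested pair of excludes such as "refs/heads/bar" and "refs/heads/bar/x" can end up as a single jump, the sketch below coalesces overlapping skip ranges. It is a generic interval merge under assumed names (struct skip_range, coalesce_ranges()), not the jump-list code of either backend, and it makes no claim about how the real backends treat ranges that are merely adjacent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical half-open range [start, end) of positions to skip. */
struct skip_range {
	size_t start;
	size_t end;
};

/*
 * Coalesce ranges in place. The input must already be sorted by `start`.
 * Returns the number of ranges that remain.
 */
size_t coalesce_ranges(struct skip_range *r, size_t nr)
{
	size_t i, last = 0;

	assert(r || !nr);
	if (!nr)
		return 0;

	for (i = 1; i < nr; i++) {
		if (r[i].start <= r[last].end) {
			/* Overlapping or touching: extend the previous range. */
			if (r[i].end > r[last].end)
				r[last].end = r[i].end;
		} else {
			/* Disjoint: keep it as a separate range. */
			r[++last] = r[i];
		}
	}
	return last + 1;
}
```

A range nested inside another collapses into its parent, so two exclude patterns produce one jump, while two disjoint ranges stay separate.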
* [PATCH v2 0/2] refs: a couple of --exclude fixes 2025-03-06 1:19 [PATCH 0/2] refs: a couple of --exclude fixes Taylor Blau 2025-03-06 1:19 ` [PATCH 1/2] refs.c: remove empty '--exclude' patterns Taylor Blau 2025-03-06 1:19 ` [PATCH 2/2] refs.c: unify '--exclude' behavior between files and packed backends Taylor Blau @ 2025-03-06 15:34 ` Taylor Blau 2025-03-06 15:34 ` [PATCH v2 1/2] refs.c: remove empty '--exclude' patterns Taylor Blau 2025-03-06 15:34 ` [PATCH v2 2/2] refs.c: stop matching non-directory prefixes in exclude patterns Taylor Blau 2 siblings, 2 replies; 17+ messages in thread From: Taylor Blau @ 2025-03-06 15:34 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt, SURA Here is a small reroll of my series to fix a couple of quirks with the --exclude pattern matching. The changes since last time are fairly minor: the second patch's subject line was reworded (thanks to a suggestion by Patrick). Likewise, its test script changed slightly to reflect that "refs/heads/ba" and "refs/heads/bar" are no longer considered overlapping regions. For convenience, a range-diff is below. Thanks again for your review :-). Taylor Blau (2): refs.c: remove empty '--exclude' patterns refs.c: stop matching non-directory prefixes in exclude patterns refs.c | 20 ++++++++++++++++++++ t/t1419-exclude-refs.sh | 26 ++++++++++++++++++++++++-- 2 files changed, 44 insertions(+), 2 deletions(-) Range-diff against v1: 1: c3b5ca5973 = 1: c3b5ca5973 refs.c: remove empty '--exclude' patterns 2: 7e6a5e020b ! 
2: 67c8c5f797 refs.c: unify '--exclude' behavior between files and packed backends @@ Metadata Author: Taylor Blau <me@ttaylorr.com> ## Commit message ## - refs.c: unify '--exclude' behavior between files and packed backends + refs.c: stop matching non-directory prefixes in exclude patterns In the packed-refs backend, our implementation of '--exclude' (dating back to 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid @@ Commit message But that test fallout is expected, because the test was codifying the buggy behavior to begin with, and should have never been written that - way. + way. Split that into its own test (since the range is no longer + overlapping under the stricter interpretation of --exclude patterns + presented here). Create a new test which does have overlapping + regions by using a refs/heads/bar/4/... hierarchy and excluding both + "refs/heads/bar" and "refs/heads/bar/4". Reported-by: SURA <surak8806@gmail.com> Helped-by: Jeff King <peff@peff.net> @@ refs.c: struct ref_iterator *refs_ref_iterator_begin( exclude_patterns = normalized_exclude_patterns.v; ## t/t1419-exclude-refs.sh ## +@@ t/t1419-exclude-refs.sh: test_expect_success 'setup' ' + echo "create refs/heads/$name/$i $base" || return 1 + done || return 1 + done >in && ++ for i in 5 6 7 ++ do ++ echo "create refs/heads/bar/4/$i $base" || return 1 ++ done >>in && + echo "delete refs/heads/main" >>in && + + git update-ref --stdin <in && @@ t/t1419-exclude-refs.sh: test_expect_success 'adjacent, non-overlapping excluded regions' ' + esac + ' - test_expect_success 'overlapping excluded regions' ' +-test_expect_success 'overlapping excluded regions' ' ++test_expect_success 'non-directory excluded regions' ' for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual 2>perf && - for_each_ref refs/heads/foo refs/heads/quux >expect && + for_each_ref refs/heads/bar refs/heads/foo refs/heads/quux >expect && ++ ++ test_cmp expect actual && ++ assert_jumps 1 perf ++' ++ 
++test_expect_success 'overlapping excluded regions' ' ++ for_each_ref__exclude refs/heads refs/heads/bar refs/heads/bar/4 >actual 2>perf && ++ for_each_ref refs/heads/baz refs/heads/foo refs/heads/quux >expect && test_cmp expect actual && assert_jumps 1 perf base-commit: 6a64ac7b014fa2cfa7a69af3c253bcd53a94b428 -- 2.49.0.rc1.2.g67c8c5f7978 ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 1/2] refs.c: remove empty '--exclude' patterns 2025-03-06 15:34 ` [PATCH v2 0/2] refs: a couple of --exclude fixes Taylor Blau @ 2025-03-06 15:34 ` Taylor Blau 2025-03-07 21:32 ` Elijah Newren 2025-03-06 15:34 ` [PATCH v2 2/2] refs.c: stop matching non-directory prefixes in exclude patterns Taylor Blau 1 sibling, 1 reply; 17+ messages in thread From: Taylor Blau @ 2025-03-06 15:34 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt, SURA In 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid excluded pattern(s), 2023-07-10), the packed-refs backend learned how to construct "jump lists" to avoid enumerating sections of the packed-refs file that we know the caller is going to throw out anyway. This process works by finding the start- and end-points (that is, where in the packed-refs file corresponds to the range we're going to ignore) for each exclude pattern, then constructing a jump list based on that. At enumeration time we'll consult the jump list to skip past everything in the range(s) found in the previous step, saving time when excluding a large portion of references. But when there is a --exclude pattern which is just the empty string, the behavior is a little funky. When we try and exclude the empty string, the matched range covers the entire packed-refs file, meaning that we won't output any packed references. But the empty pattern doesn't actually match any references to begin with! For example, on my copy of git.git I can do: $ git for-each-ref '' | wc -l 0 So "git for-each-ref --exclude=''" shouldn't actually remove anything from the output, and ought to be equivalent to "git for-each-ref". But it's not, and in fact: $ git for-each-ref | wc -l 2229 $ git for-each-ref --exclude='' | wc -l 480 But why does the '--exclude' version output only some of the references in the repository? 
Here's a hint: $ find .git/refs -type f | wc -l 480 Indeed, because the files backend doesn't implement[^1] the same jump list concept as the packed backend we get the correct result for the loose references, but none of the packed references. Since the empty string exclude pattern doesn't match anything, we can discard them before the packed-refs backend has a chance to even see it (and likewise for reftable, which also implements a similar concept since 1869525066 (refs/reftable: wire up support for exclude patterns, 2024-09-16)). This approach (copying only some of the patterns into a strvec at the refs.c layer) may seem heavy-handed, but it's setting us up to fix another bug in the following commit where the fix will involve modifying the incoming patterns. [^1]: As noted in 59c35fac54. We technically could avoid opening and enumerating the contents of, for e.g., "$GIT_DIR/refs/heads/foo/" if we knew that we were excluding anything under the 'refs/heads/foo' hierarchy. But the --exclude stuff is all best-effort anyway, since the caller is expected to cull out any results that they don't want. 
Noticed-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- refs.c | 16 ++++++++++++++++ t/t1419-exclude-refs.sh | 10 ++++++++++ 2 files changed, 26 insertions(+) diff --git a/refs.c b/refs.c index 91da5325d7..17d3840aff 100644 --- a/refs.c +++ b/refs.c @@ -1699,6 +1699,20 @@ struct ref_iterator *refs_ref_iterator_begin( enum do_for_each_ref_flags flags) { struct ref_iterator *iter; + struct strvec normalized_exclude_patterns = STRVEC_INIT; + + if (exclude_patterns) { + for (size_t i = 0; exclude_patterns[i]; i++) { + const char *pattern = exclude_patterns[i]; + size_t len = strlen(pattern); + if (!len) + continue; + + strvec_push(&normalized_exclude_patterns, pattern); + } + + exclude_patterns = normalized_exclude_patterns.v; + } if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) { static int ref_paranoia = -1; @@ -1719,6 +1733,8 @@ struct ref_iterator *refs_ref_iterator_begin( if (trim) iter = prefix_ref_iterator_begin(iter, "", trim); + strvec_clear(&normalized_exclude_patterns); + return iter; } diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh index c04eeb7211..fd58260a24 100755 --- a/t/t1419-exclude-refs.sh +++ b/t/t1419-exclude-refs.sh @@ -155,4 +155,14 @@ test_expect_success 'meta-characters are discarded' ' assert_no_jumps perf ' +test_expect_success 'empty string exclude pattern is ignored' ' + git update-ref refs/heads/loose $(git rev-parse refs/heads/foo/1) && + + for_each_ref__exclude refs/heads "" >actual 2>perf && + for_each_ref >expect && + + test_cmp expect actual && + assert_no_jumps perf +' + test_done -- 2.49.0.rc1.2.g67c8c5f7978 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] refs.c: remove empty '--exclude' patterns 2025-03-06 15:34 ` [PATCH v2 1/2] refs.c: remove empty '--exclude' patterns Taylor Blau @ 2025-03-07 21:32 ` Elijah Newren 2025-03-07 23:37 ` Taylor Blau 0 siblings, 1 reply; 17+ messages in thread From: Elijah Newren @ 2025-03-07 21:32 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Patrick Steinhardt, SURA On Thu, Mar 6, 2025 at 7:34 AM Taylor Blau <me@ttaylorr.com> wrote: > > In 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid > excluded pattern(s), 2023-07-10), the packed-refs backend learned how to > construct "jump lists" to avoid enumerating sections of the packed-refs > file that we know the caller is going to throw out anyway. > > This process works by finding the start- and end-points (that is, where > in the packed-refs file corresponds to the range we're going to ignore) > for each exclude pattern, then constructing a jump list based on that. > At enumeration time we'll consult the jump list to skip past everything > in the range(s) found in the previous step, saving time when excluding a > large portion of references. > > But when there is a --exclude pattern which is just the empty string, > the behavior is a little funky. When we try and exclude the empty > string, the matched range covers the entire packed-refs file, meaning > that we won't output any packed references. But the empty pattern > doesn't actually match any references to begin with! For example, on my > copy of git.git I can do: > > $ git for-each-ref '' | wc -l > 0 > > So "git for-each-ref --exclude=''" shouldn't actually remove anything > from the output, and ought to be equivalent to "git for-each-ref". But > it's not, and in fact: > > $ git for-each-ref | wc -l > 2229 > $ git for-each-ref --exclude='' | wc -l > 480 > > But why does the '--exclude' version output only some of the references > in the repository? 
Here's a hint: > > $ find .git/refs -type f | wc -l > 480 > > Indeed, because the files backend doesn't implement[^1] the same jump > list concept as the packed backend we get the correct result for the > loose references, but none of the packed references. > > Since the empty string exclude pattern doesn't match anything, we can > discard them before the packed-refs backend has a chance to even see it > (and likewise for reftable, which also implements a similar concept > since 1869525066 (refs/reftable: wire up support for exclude patterns, > 2024-09-16)). > > This approach (copying only some of the patterns into a strvec at the > refs.c layer) may seem heavy-handed, but it's setting us up to fix > another bug in the following commit where the fix will involve modifying > the incoming patterns. > > [^1]: As noted in 59c35fac54. We technically could avoid opening and > enumerating the contents of, for e.g., "$GIT_DIR/refs/heads/foo/" if > we knew that we were excluding anything under the 'refs/heads/foo' > hierarchy. But the --exclude stuff is all best-effort anyway, since > the caller is expected to cull out any results that they don't want. 
> > Noticed-by: Jeff King <peff@peff.net> > Signed-off-by: Taylor Blau <me@ttaylorr.com> > --- > refs.c | 16 ++++++++++++++++ > t/t1419-exclude-refs.sh | 10 ++++++++++ > 2 files changed, 26 insertions(+) > > diff --git a/refs.c b/refs.c > index 91da5325d7..17d3840aff 100644 > --- a/refs.c > +++ b/refs.c > @@ -1699,6 +1699,20 @@ struct ref_iterator *refs_ref_iterator_begin( > enum do_for_each_ref_flags flags) > { > struct ref_iterator *iter; > + struct strvec normalized_exclude_patterns = STRVEC_INIT; > + > + if (exclude_patterns) { > + for (size_t i = 0; exclude_patterns[i]; i++) { > + const char *pattern = exclude_patterns[i]; > + size_t len = strlen(pattern); > + if (!len) > + continue; > + > + strvec_push(&normalized_exclude_patterns, pattern); > + } > + > + exclude_patterns = normalized_exclude_patterns.v; > + } > > if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) { > static int ref_paranoia = -1; > @@ -1719,6 +1733,8 @@ struct ref_iterator *refs_ref_iterator_begin( > if (trim) > iter = prefix_ref_iterator_begin(iter, "", trim); > > + strvec_clear(&normalized_exclude_patterns); > + > return iter; > } > > diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh > index c04eeb7211..fd58260a24 100755 > --- a/t/t1419-exclude-refs.sh > +++ b/t/t1419-exclude-refs.sh > @@ -155,4 +155,14 @@ test_expect_success 'meta-characters are discarded' ' > assert_no_jumps perf > ' > > +test_expect_success 'empty string exclude pattern is ignored' ' > + git update-ref refs/heads/loose $(git rev-parse refs/heads/foo/1) && > + > + for_each_ref__exclude refs/heads "" >actual 2>perf && > + for_each_ref >expect && > + > + test_cmp expect actual && > + assert_no_jumps perf > +' > + > test_done > -- > 2.49.0.rc1.2.g67c8c5f7978 Makes sense...but doesn't the second patch also fix this issue without the first patch being needed? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] refs.c: remove empty '--exclude' patterns 2025-03-07 21:32 ` Elijah Newren @ 2025-03-07 23:37 ` Taylor Blau 2025-03-07 23:58 ` Elijah Newren 0 siblings, 1 reply; 17+ messages in thread From: Taylor Blau @ 2025-03-07 23:37 UTC (permalink / raw) To: Elijah Newren; +Cc: git, Junio C Hamano, Jeff King, Patrick Steinhardt, SURA On Fri, Mar 07, 2025 at 01:32:49PM -0800, Elijah Newren wrote: > Makes sense...but doesn't the second patch also fix this issue without > the first patch being needed? It does, but the mechanism is pretty round-about. (From a quick glance we'll turn the empty pattern "" into "/" which won't match anything, and thus won't contribute to the jump list). But there are a couple of reasons to keep this patch. Most importantly, it hardens us against potential future regressions here with the empty pattern. And it makes dealing with that case much more explicit by throwing those patterns out before they make their way to the backends instead of the quirk above. Thanks, Taylor ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] refs.c: remove empty '--exclude' patterns 2025-03-07 23:37 ` Taylor Blau @ 2025-03-07 23:58 ` Elijah Newren 0 siblings, 0 replies; 17+ messages in thread From: Elijah Newren @ 2025-03-07 23:58 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Patrick Steinhardt, SURA On Fri, Mar 7, 2025 at 3:37 PM Taylor Blau <me@ttaylorr.com> wrote: > > On Fri, Mar 07, 2025 at 01:32:49PM -0800, Elijah Newren wrote: > > Makes sense...but doesn't the second patch also fix this issue without > > the first patch being needed? > > It does, but the mechanism is pretty round-about. (From a quick glance > we'll turn the empty pattern "" into "/" which won't match anything, and > thus won't contribute to the jump list). How is that round-about? The whole point of patch 2 is to stop matching on excludes as a prefix unless that prefix is a directory name, right? To me, patch 1 merely looks like a special case of patch 2. > But there are a couple of reasons to keep this patch. Most importantly, > it hardens us against potential future regressions here with the empty > pattern. I'm fine with leaving the patch in place, since it doesn't hurt anything. And if the empty pattern is especially problematic, I can see this logic. > And it makes dealing with that case much more explicit by > throwing those patterns out before they make their way to the backends > instead of the quirk above. I don't understand this reason, though. It feels to me like the design behind patch 2 rather than a "quirk"...but maybe the fact that patch 2 doesn't exclude "refs/heads/bar" (at the low-level) even when that exact string is given as an exclude was unintentional or something? If it was an intentional part of patch 2 (as I assumed while reading it), then I don't see how patch 2 excluding the empty string exclude is a "quirk". Am I missing something? (Not that it matters, since I'm fine with keeping the patch for your first reason, I'm just curious...) 
* [PATCH v2 2/2] refs.c: stop matching non-directory prefixes in exclude patterns 2025-03-06 15:34 ` [PATCH v2 0/2] refs: a couple of --exclude fixes Taylor Blau 2025-03-06 15:34 ` [PATCH v2 1/2] refs.c: remove empty '--exclude' patterns Taylor Blau @ 2025-03-06 15:34 ` Taylor Blau 2025-03-06 17:27 ` Junio C Hamano 2025-03-07 21:31 ` Elijah Newren 1 sibling, 2 replies; 17+ messages in thread From: Taylor Blau @ 2025-03-06 15:34 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Jeff King, Elijah Newren, Patrick Steinhardt, SURA In the packed-refs backend, our implementation of '--exclude' (dating back to 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid excluded pattern(s), 2023-07-10)) considers, for example: $ git for-each-ref --exclude=refs/heads/ba to exclude "refs/heads/bar", "refs/heads/baz", and so on. The files backend, which does not implement '--exclude' (and relies on the caller to cull out results that don't match) naturally will enumerate "refs/heads/bar" and so on. So in the above example, 'for-each-ref' will try and see if "refs/heads/ba" matches "refs/heads/bar" (since the files backend simply enumerated every loose reference), and, realizing that it does not match, output the reference as expected. (A caller that did want to exclude "refs/heads/bar" and "refs/heads/baz" might instead run "git for-each-ref --exclude='refs/heads/ba*'"). This can lead to strange behavior, like seeing a different set of references advertised via 'upload-pack' depending on what set of references were loose versus packed. So there is a subtle bug with '--exclude' which is that in the packed-refs backend we will consider "refs/heads/bar" to be a pattern match against "refs/heads/ba" when we shouldn't. Likewise, the reftable backend (which in this case is bug-compatible with the packed backend) exhibits the same broken behavior. There are a few ways to fix this. 
One is to tighten the rules in cmp_record_to_refname(), which is used to
determine the start/end-points of the jump list used by the packed
backend. In this new "strict" mode, the comparison function would handle
the case where we've reached the end of the pattern by introducing a new
check like so:

    while (1) {
            if (*r1 == '\n')
                    return *r2 ? -1 : 0;
            if (!*r2) {
                    if (strict && *r1 != '/') /* <- here */
                            return 1;
                    return start ? 1 : -1;
            }
            if (*r1 != *r2)
                    return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
            r1++;
            r2++;
    }

(eliding the rest of cmp_record_to_refname()). Equivalently, we
could teach refs/packed-backend::populate_excluded_jump_list() to append
a trailing '/' if one does not already exist, forcing an exclude pattern
like "refs/heads/ba" to only match "refs/heads/ba/abc" and so forth.

But since the same problem exists in reftable, we can fix both at once
by performing this pre-processing step one layer up in refs.c at the
common entrypoint for the two, which is 'refs_ref_iterator_begin()'.

Since that solution is both the simplest and only requires modification
in one spot, let's normalize exclude patterns so that they end with a
trailing slash. This causes us to unify the behavior between all three
backends.

There is some minor test fallout in the "overlapping excluded regions"
test, which happens to use 'refs/heads/ba' as an exclude pattern, and
expects references under the "refs/heads/bar/*" and "refs/heads/baz/*"
hierarchies to be excluded from the results.

But that test fallout is expected, because the test was codifying the
buggy behavior to begin with, and should have never been written that
way. Split that into its own test (since the range is no longer
overlapping under the stricter interpretation of --exclude patterns
presented here). Create a new test which does have overlapping
regions by using a refs/heads/bar/4/... hierarchy and excluding both
"refs/heads/bar" and "refs/heads/bar/4".

Reported-by: SURA <surak8806@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- refs.c | 6 +++++- t/t1419-exclude-refs.sh | 16 ++++++++++++++-- 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/refs.c b/refs.c index 17d3840aff..2d9a1b51f4 100644 --- a/refs.c +++ b/refs.c @@ -1708,7 +1708,11 @@ struct ref_iterator *refs_ref_iterator_begin( if (!len) continue; - strvec_push(&normalized_exclude_patterns, pattern); + if (pattern[len - 1] == '/') + strvec_push(&normalized_exclude_patterns, pattern); + else + strvec_pushf(&normalized_exclude_patterns, "%s/", + pattern); } exclude_patterns = normalized_exclude_patterns.v; diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh index fd58260a24..04797aee59 100755 --- a/t/t1419-exclude-refs.sh +++ b/t/t1419-exclude-refs.sh @@ -46,6 +46,10 @@ test_expect_success 'setup' ' echo "create refs/heads/$name/$i $base" || return 1 done || return 1 done >in && + for i in 5 6 7 + do + echo "create refs/heads/bar/4/$i $base" || return 1 + done >>in && echo "delete refs/heads/main" >>in && git update-ref --stdin <in && @@ -99,9 +103,17 @@ test_expect_success 'adjacent, non-overlapping excluded regions' ' esac ' -test_expect_success 'overlapping excluded regions' ' +test_expect_success 'non-directory excluded regions' ' for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual 2>perf && - for_each_ref refs/heads/foo refs/heads/quux >expect && + for_each_ref refs/heads/bar refs/heads/foo refs/heads/quux >expect && + + test_cmp expect actual && + assert_jumps 1 perf +' + +test_expect_success 'overlapping excluded regions' ' + for_each_ref__exclude refs/heads refs/heads/bar refs/heads/bar/4 >actual 2>perf && + for_each_ref refs/heads/baz refs/heads/foo refs/heads/quux >expect && test_cmp expect actual && assert_jumps 1 perf -- 2.49.0.rc1.2.g67c8c5f7978 ^ permalink raw reply related [flat|nested] 17+ messages in thread
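For readers who want to try the normalization step in isolation, here is a minimal standalone sketch of what the refs.c hunk above does to each exclude pattern. It is not the Git code: normalize_exclude_pattern() and has_prefix() are hypothetical helpers invented for this illustration, and plain malloc() stands in for Git's strvec API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Git): return a newly allocated copy of
 * `pattern` that is guaranteed to end in '/'. Returns NULL when the pattern
 * is empty (such patterns are dropped) or when allocation fails.
 */
static char *normalize_exclude_pattern(const char *pattern)
{
	size_t len;
	char *out;

	assert(pattern != NULL);
	len = strlen(pattern);
	if (!len)
		return NULL; /* empty patterns match nothing; skip them */

	/* room for the pattern, an optional '/', and the trailing NUL */
	out = malloc(len + 2);
	if (!out)
		return NULL;
	memcpy(out, pattern, len);
	if (pattern[len - 1] != '/')
		out[len++] = '/';
	out[len] = '\0';
	return out;
}

/*
 * Plain prefix test, standing in (as an assumption) for how a backend
 * decides which range of refs a pattern covers.
 */
static int has_prefix(const char *refname, const char *prefix)
{
	return strncmp(refname, prefix, strlen(prefix)) == 0;
}
```

With the trailing slash in place, "refs/heads/ba" becomes "refs/heads/ba/", which is no longer a prefix of "refs/heads/bar" but still covers "refs/heads/ba/abc". The caller owns the returned string and must free() it.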
* Re: [PATCH v2 2/2] refs.c: stop matching non-directory prefixes in exclude patterns 2025-03-06 15:34 ` [PATCH v2 2/2] refs.c: stop matching non-directory prefixes in exclude patterns Taylor Blau @ 2025-03-06 17:27 ` Junio C Hamano 2025-03-07 9:35 ` Patrick Steinhardt 2025-03-07 21:31 ` Elijah Newren 1 sibling, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2025-03-06 17:27 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Jeff King, Elijah Newren, Patrick Steinhardt, SURA Taylor Blau <me@ttaylorr.com> writes: > So there is a subtle bug with '--exclude' which is that in the > packed-refs backend we will consider "refs/heads/bar" to be a pattern > match against "refs/heads/ba" when we shouldn't. Likewise, the reftable > backend (which in this case is bug-compatible with the packed backend) > exhibits the same broken behavior. > ... > There is some minor test fallout in the "overlapping excluded regions" > test, which happens to use 'refs/ba' as an exclude pattern, and expects > references under the "refs/heads/bar/*" and "refs/heads/baz/*" > hierarchies to be excluded from the results. > > ... test (since the range is no longer > overlapping under the stricter interpretation of --exclude patterns > presented here). The code change, reasoning, and the tests look all good. It just leaves a bit awkward aftertaste. In general, I think our "we have a tree-like structure with patterns to match paths" code paths, like pathspec matching, are structured in such a way that the low level is expected to merely cull candidates early as a performance optimization measure (in other words, they are allowed false positives and say something matches when they do not, but not allowed false negatives) and leave the upper level to further reject the ones that do not match the pattern. 
If packed-refs backend was too loose in its matching and erroneously considered that refs/heads/bar matched refs/heads/ba pattern, I would naïvely expect that the upper layer would catch and reject that refs/heads/bar as not matching. Apparently that was not happening and that is why we need this fix? Is the excluded region optimization expected to be powerful enough to cover all our needs so that we do not need to post-process what it passes? Thanks. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] refs.c: stop matching non-directory prefixes in exclude patterns 2025-03-06 17:27 ` Junio C Hamano @ 2025-03-07 9:35 ` Patrick Steinhardt 2025-03-07 17:31 ` Junio C Hamano 0 siblings, 1 reply; 17+ messages in thread From: Patrick Steinhardt @ 2025-03-07 9:35 UTC (permalink / raw) To: Junio C Hamano; +Cc: Taylor Blau, git, Jeff King, Elijah Newren, SURA On Thu, Mar 06, 2025 at 09:27:21AM -0800, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > So there is a subtle bug with '--exclude' which is that in the > > packed-refs backend we will consider "refs/heads/bar" to be a pattern > > match against "refs/heads/ba" when we shouldn't. Likewise, the reftable > > backend (which in this case is bug-compatible with the packed backend) > > exhibits the same broken behavior. > > ... > > There is some minor test fallout in the "overlapping excluded regions" > > test, which happens to use 'refs/ba' as an exclude pattern, and expects > > references under the "refs/heads/bar/*" and "refs/heads/baz/*" > > hierarchies to be excluded from the results. > > > > ... test (since the range is no longer > > overlapping under the stricter interpretation of --exclude patterns > > presented here). > > The code change, reasoning, and the tests look all good. It just > leaves a bit awkward aftertaste. > > In general, I think our "we have a tree-like structure with patterns > to match paths" code paths, like pathspec matching, are structured > in such a way that the low level is expected to merely cull > candidates early as a performance optimization measure (in other > words, they are allowed false positives and say something matches > when they do not, but not allowed false negatives) and leave the > upper level to further reject the ones that do not match the > pattern. 
> If packed-refs backend was too loose in its matching and
> erroneously considered that refs/heads/bar matched refs/heads/ba
> pattern, I would naïvely expect that the upper layer would catch and
> reject that refs/heads/bar as not matching.

I think you've swapped things around a bit by accident. The problem is
that the patterns were being matched too loosely by the underlying
backends, which had the consequence that the backends marked too many
refs as excluded. As a result, those references won't ever be yielded to
the upper layer at all.

So the upper layer doesn't even have a chance to correct such a mistake:
it cannot correct what it doesn't know. There isn't really a way to
implement such a safety net, either (or at least I cannot think of any):
the whole point of making backends handle the exclude patterns is that
they can skip whole regions entirely and not even try to read them.

> Apparently that was not happening and that is why we need this fix?
>
> Is the excluded region optimization expected to be powerful enough
> to cover all our needs so that we do not need to post-process what
> it passes?

No, it's not. But we can only correct false negatives, not false
positives:

  - A false negative is a ref that matches an exclude pattern but that
    we yield regardless from the backend, and those do get handled by
    the upper layer.

  - A false positive is a ref that does not match an exclude pattern but
    is still treated as matching by the backend. We thus don't yield
    them, and thus the upper layer cannot rectify the bug.

The fix at hand fixes false positives.

What makes me feel a bit uneasy is that for the "files" backend the
optimization depends on the packed state, which is quite awkward overall
as our tests may not uncover issues only because we didn't pack refs. I
don't really see a way to address this potential test gap generically
though.

The "reftable" backend doesn't have the same issue as it does not have the same split between packed and loose refs, so the optimization always kicks in. Patrick ^ permalink raw reply [flat|nested] 17+ messages in thread
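To make the false-positive versus false-negative distinction above concrete, here is a small sketch of the two layers. It is a toy model under stated assumptions, not Git's implementation: backend_skips(), caller_excludes(), and count_shown() are hypothetical names, the backend is modelled as a plain prefix test, and POSIX fnmatch(3) stands in for whatever matcher the caller really uses.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Lower layer (hypothetical): skip refs whose name starts with `prefix`. */
static int backend_skips(const char *refname, const char *prefix)
{
	return strncmp(refname, prefix, strlen(prefix)) == 0;
}

/* Upper layer (hypothetical): fnmatch(3) stands in for Git's matcher. */
static int caller_excludes(const char *refname, const char *pattern)
{
	return fnmatch(pattern, refname, 0) == 0;
}

/*
 * Count the refs that survive both layers. A ref the backend skips is
 * never seen by the caller, so the caller cannot bring it back.
 * Pass backend_prefix == NULL to model a backend with no optimization.
 */
static size_t count_shown(const char **refs, size_t nr,
			  const char *backend_prefix, const char *pattern)
{
	size_t i, shown = 0;

	for (i = 0; i < nr; i++) {
		if (backend_prefix && backend_skips(refs[i], backend_prefix))
			continue; /* never yielded to the caller */
		if (caller_excludes(refs[i], pattern))
			continue; /* backend false negatives are caught here */
		shown++;
	}
	return shown;
}
```

In this model, a backend that skips too little only costs time, because the caller's check still runs on every ref it receives. A backend that skips too much (for example, using the un-normalized prefix "refs/heads/ba") silently drops "refs/heads/bar", and no later step can restore it.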
* Re: [PATCH v2 2/2] refs.c: stop matching non-directory prefixes in exclude patterns 2025-03-07 9:35 ` Patrick Steinhardt @ 2025-03-07 17:31 ` Junio C Hamano 2025-03-07 23:42 ` Taylor Blau 0 siblings, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2025-03-07 17:31 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: Taylor Blau, git, Jeff King, Elijah Newren, SURA Patrick Steinhardt <ps@pks.im> writes: > I think you've swapped things around a bit by accident. The problem is > that the patterns were being matched too loosely by the underlying > backends, which had the consequence that the backends marked too many > refs as excluded. OK, I agree it is confusing. As a selection mechanism for refs to be shown or processed, exclusion should be "we omit it because we clearly know this one should not be in the final result, but we may pass questionable ones, relying on our caller to have the final say". As a selection mechanism for refs to be excluded, the logic should be the other way around, so false positive and false negative are going to be swapped. We want the exclusion at the lower layer to only say "this ref clearly matches with given exclusion pattern", but we used to claim matches for refs that shouldn't match. OK. Thanks for straightening me out. > What makes me feel a bit uneasy is that for the "files" backend the > optimization depends on the packed state, which is quite awkward overall > as our tests may not uncover issues only because we didn't pack refs. I > don't really see a way to address this potential test gap generically > though. True. 
An obvious optimization for "files" _might_ be to lazily walk the directory hierarchy and skip recursive readdir when a directory clearly matches the given exclusion pattern, but the result of such an optimization (in other words, what would seep through the sieve) to be filtered out at the upper layer would be different from what the "packed-refs" backend does for its optimization, and they would be different for reftable or any other future backends. But I think that is the nature of lower-level optimization---each backend takes advantage of intimately knowing how it organizes the underlying data, and how they can omit without looking into a bulk of the section of data deeply would be different. Thanks. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] refs.c: stop matching non-directory prefixes in exclude patterns 2025-03-07 17:31 ` Junio C Hamano @ 2025-03-07 23:42 ` Taylor Blau 0 siblings, 0 replies; 17+ messages in thread From: Taylor Blau @ 2025-03-07 23:42 UTC (permalink / raw) To: Junio C Hamano; +Cc: Patrick Steinhardt, git, Jeff King, Elijah Newren, SURA On Fri, Mar 07, 2025 at 09:31:17AM -0800, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > I think you've swapped things around a bit by accident. The problem is > > that the patterns were being matched too loosely by the underlying > > backends, which had the consequence that the backends marked too many > > refs as excluded. > > OK, I agree it is confusing. As a selection mechanism for refs to > be shown or processed, exclusion should be "we omit it because we > clearly know this one should not be in the final result, but we may > pass questionable ones, relying on our caller to have the final > say". As a selection mechanism for refs to be excluded, the logic > should be the other way around, so false positive and false negative > are going to be swapped. We want the exclusion at the lower layer > to only say "this ref clearly matches with given exclusion pattern", > but we used to claim matches for refs that shouldn't match. > > OK. Thanks for straightening me out. Yes, Patrick is exactly right here. Thanks, Patrick, for beating me to the punch ;-). > > What makes me feel a bit uneasy is that for the "files" backend the > > optimization depends on the packed state, which is quite awkward overall > > as our tests may not uncover issues only because we didn't pack refs. I > > don't really see a way to address this potential test gap generically > > though. > > True. 
An obvious optimization for "files" _might_ be to lazily walk > the directory hierarchy and skip recursive readdir when a directory > clearly matches the given exclusion pattern, but the result of such > an optimization (in other words, what would seep through the sieve) > to be filtered out at the upper layer would be different from what > the "packed-refs" backend does for its optimization, and they would > be different for reftable or any other future backends. I had considered doing this back when I wrote 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid excluded pattern(s), 2023-07-10). But I decided against it for a couple of reasons. First, it's a little more complicated than the packed backend's implementation, since we have to consider the additional context of what layer of the $GIT_DIR/refs directory we're in to construct the full prefix in order to even perform the match. But the second reason was that we should never have so many loose references sitting around for this optimization to even matter. If we're in a case where it does, then the repository in question should "git pack-refs --all" to take advantage of the optimization. > But I think that is the nature of lower-level optimization---each > backend takes advantage of intimately knowing how it organizes the > underlying data, and how they can omit without looking into a bulk > of the section of data deeply would be different. Yep. Thanks, Taylor ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] refs.c: stop matching non-directory prefixes in exclude patterns 2025-03-06 15:34 ` [PATCH v2 2/2] refs.c: stop matching non-directory prefixes in exclude patterns Taylor Blau 2025-03-06 17:27 ` Junio C Hamano @ 2025-03-07 21:31 ` Elijah Newren 2025-03-07 23:46 ` Taylor Blau 1 sibling, 1 reply; 17+ messages in thread From: Elijah Newren @ 2025-03-07 21:31 UTC (permalink / raw) To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Patrick Steinhardt, SURA On Thu, Mar 6, 2025 at 7:34 AM Taylor Blau <me@ttaylorr.com> wrote: > > In the packed-refs backend, our implementation of '--exclude' (dating > back to 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid > excluded pattern(s), 2023-07-10)) considers, for example: > > $ git for-each-ref --exclude=refs/heads/ba > > to exclude "refs/heads/bar", "refs/heads/baz", and so on. > > The files backend, which does not implement '--exclude' (and relies on > the caller to cull out results that don't match) naturally will > enumerate "refs/heads/bar" and so on. > > So in the above example, 'for-each-ref' will try and see if > "refs/heads/ba" matches "refs/heads/bar" (since the files backend simply > enumerated every loose reference), and, realizing that it does not > match, output the reference as expected. (A caller that did want to > exclude "refs/heads/bar" and "refs/heads/baz" might instead run "git > for-each-ref --exclude='refs/heads/ba*'"). > > This can lead to strange behavior, like seeing a different set of > references advertised via 'upload-pack' depending on what set of > references were loose versus packed. > > So there is a subtle bug with '--exclude' which is that in the > packed-refs backend we will consider "refs/heads/bar" to be a pattern > match against "refs/heads/ba" when we shouldn't. Likewise, the reftable > backend (which in this case is bug-compatible with the packed backend) > exhibits the same broken behavior. Yuck; nice to see this being addressed. 
> There are a few ways to fix this. One is to tighten the rules in > cmp_record_to_refname(), which is used to determine the start/end-points > of the jump list used by the packed backend. In this new "strict" mode, > the comparison function would handle the case where we've reached the > end of the pattern by introducing a new check like so: > > while (1) { > if (*r1 == '\n') > return *r2 ? -1 : 0; > if (!*r2) > if (strict && *r1 != '/') /* <- here */ > return 1; > return start ? 1 : -1; > if (*r1 != *r2) > return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1; > r1++; > r2++; > } > > (eliding out the rest of cmp_record_to_refname()). Equivalently, we > could teach refs/packed-backend::populate_excluded_jump_list() to append > a trailing '/' if one does not already exist, forcing an exclude pattern > like "refs/heads/ba" to only match "refs/heads/ba/abc" and so forth. > > But since the same problem exists in reftable, we can fix both at once > by performing this pre-processing step one layer up in refs.c at the > common entrypoint for the two, which is 'refs_ref_iterator_begin()'. > > Since that solution is both the simplest and only requires modification > in one spot, let's normalize exclude patterns so that they end with a > trailing slash. This causes us to unify the behavior between all three > backends. :-) > There is some minor test fallout in the "overlapping excluded regions" > test, which happens to use 'refs/ba' as an exclude pattern, and expects > references under the "refs/heads/bar/*" and "refs/heads/baz/*" > hierarchies to be excluded from the results. > > But that test fallout is expected, because the test was codifying the > buggy behavior to begin with, and should have never been written that > way. Split that into its own test (since the range is no longer > overlapping under the stricter interpretation of --exclude patterns > presented here). Create a new test which does have overlapping > regions by using a refs/heads/bar/4/... 
hierarchy and excluding both > "refs/heads/bar" and "refs/heads/bar/4". Always nice to see tests corrected. > Reported-by: SURA <surak8806@gmail.com> > Helped-by: Jeff King <peff@peff.net> > Signed-off-by: Taylor Blau <me@ttaylorr.com> > --- > refs.c | 6 +++++- > t/t1419-exclude-refs.sh | 16 ++++++++++++++-- > 2 files changed, 19 insertions(+), 3 deletions(-) > > diff --git a/refs.c b/refs.c > index 17d3840aff..2d9a1b51f4 100644 > --- a/refs.c > +++ b/refs.c > @@ -1708,7 +1708,11 @@ struct ref_iterator *refs_ref_iterator_begin( > if (!len) > continue; > > - strvec_push(&normalized_exclude_patterns, pattern); > + if (pattern[len - 1] == '/') > + strvec_push(&normalized_exclude_patterns, pattern); > + else > + strvec_pushf(&normalized_exclude_patterns, "%s/", > + pattern); Doesn't this mean that if the user requested to exclude "refs/heads/bar" and "refs/heads/bar" exists, that we won't exclude it because it doesn't have a trailing slash? From reading other comments in this thread, I guess that ends up being okay, because we only promise to filter out what we can cheaply filter, and we rely on our caller to double-check everything and do the real filtering. ...but it gives me some ugly dir.c vibes, reminding me of 95c11ecc73f2 (Fix error-prone fill_directory() API; make it only return matches, 2020-04-01) and a slew of related bugs preceding it. Granted, dir.c had this tri-state to deal with (tracked, untracked-but-ignored, untracked-and-not-ignored) and simplifying of whole directories, which don't apply here, so maybe the similarity of "fast-filtering-only-and-rely-on-caller" won't be a problem since the upper level filtering is so much more straightforward. Should this at least be called out in the commit message, though? 
> } > > exclude_patterns = normalized_exclude_patterns.v; > diff --git a/t/t1419-exclude-refs.sh b/t/t1419-exclude-refs.sh > index fd58260a24..04797aee59 100755 > --- a/t/t1419-exclude-refs.sh > +++ b/t/t1419-exclude-refs.sh > @@ -46,6 +46,10 @@ test_expect_success 'setup' ' > echo "create refs/heads/$name/$i $base" || return 1 > done || return 1 > done >in && > + for i in 5 6 7 > + do > + echo "create refs/heads/bar/4/$i $base" || return 1 > + done >>in && > echo "delete refs/heads/main" >>in && > > git update-ref --stdin <in && > @@ -99,9 +103,17 @@ test_expect_success 'adjacent, non-overlapping excluded regions' ' > esac > ' > > -test_expect_success 'overlapping excluded regions' ' > +test_expect_success 'non-directory excluded regions' ' > for_each_ref__exclude refs/heads refs/heads/ba refs/heads/baz >actual 2>perf && > - for_each_ref refs/heads/foo refs/heads/quux >expect && > + for_each_ref refs/heads/bar refs/heads/foo refs/heads/quux >expect && > + > + test_cmp expect actual && > + assert_jumps 1 perf > +' > + > +test_expect_success 'overlapping excluded regions' ' > + for_each_ref__exclude refs/heads refs/heads/bar refs/heads/bar/4 >actual 2>perf && > + for_each_ref refs/heads/baz refs/heads/foo refs/heads/quux >expect && > > test_cmp expect actual && > assert_jumps 1 perf > -- > 2.49.0.rc1.2.g67c8c5f7978 Other than the one surprise noted above, looks good to me. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] refs.c: stop matching non-directory prefixes in exclude patterns 2025-03-07 21:31 ` Elijah Newren @ 2025-03-07 23:46 ` Taylor Blau 0 siblings, 0 replies; 17+ messages in thread From: Taylor Blau @ 2025-03-07 23:46 UTC (permalink / raw) To: Elijah Newren; +Cc: git, Junio C Hamano, Jeff King, Patrick Steinhardt, SURA

On Fri, Mar 07, 2025 at 01:31:31PM -0800, Elijah Newren wrote:
> > diff --git a/refs.c b/refs.c
> > index 17d3840aff..2d9a1b51f4 100644
> > --- a/refs.c
> > +++ b/refs.c
> > @@ -1708,7 +1708,11 @@ struct ref_iterator *refs_ref_iterator_begin(
> >                 if (!len)
> >                         continue;
> >
> > -               strvec_push(&normalized_exclude_patterns, pattern);
> > +               if (pattern[len - 1] == '/')
> > +                       strvec_push(&normalized_exclude_patterns, pattern);
> > +               else
> > +                       strvec_pushf(&normalized_exclude_patterns, "%s/",
> > +                                    pattern);
>
> Doesn't this mean that if the user requested to exclude
> "refs/heads/bar" and "refs/heads/bar" exists, that we won't exclude it
> because it doesn't have a trailing slash?
>
> From reading other comments in this thread, I guess that ends up being
> okay, because we only promise to filter out what we can cheaply
> filter, and we rely on our caller to double-check everything and do
> the real filtering.
>
> ...but it gives me some ugly dir.c vibes, reminding me of 95c11ecc73f2
> (Fix error-prone fill_directory() API; make it only return matches,
> 2020-04-01) and a slew of related bugs preceding it. Granted, dir.c
> had this tri-state to deal with (tracked, untracked-but-ignored,
> untracked-and-not-ignored) and simplifying of whole directories, which
> don't apply here, so maybe the similarity of
> "fast-filtering-only-and-rely-on-caller" won't be a problem since the
> upper level filtering is so much more straightforward.
>
> Should this at least be called out in the commit message, though?

Yeah, I think the fact that we don't have a tri-state to deal with here,
as was the case in 95c11ecc73f2, makes this a little easier to reason
about.

And you're right: if we have a pattern like "refs/heads/bar" and we see a leaf in our reference hierarchy called "refs/heads/bar", the packed backend will not exclude it. This is OK because the exclude pattern stuff is all considered "best-effort" and callers are expected to do their own filtering. Note that the exclude patterns (at least in the packed backend) don't know how to handle meta-characters (there's a big comment in refs/packed-backend.c explaining why). So we can't guarantee the absence of false positives without performing the same post-processing as the caller would. Even prior to this commit, a literal match in the excluded patterns would result in a region whose start- and end-points are the same, and we'd throw it out before it made its way into the jump list. Thanks, Taylor ^ permalink raw reply [flat|nested] 17+ messages in thread
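The point about a leaf ref producing an empty region can be illustrated with a small sketch over a sorted list of refnames. This is only a model under stated assumptions: struct region, find_region(), and region_is_empty() are hypothetical names, and a linear scan over an in-memory array stands in for the packed backend's search of the packed-refs file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Half-open range [start, end) of indices into a sorted refname list. */
struct region {
	size_t start, end;
};

static int has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical linear scan: find the range of entries in `sorted` (which
 * must be in byte order) that begin with `prefix`. Entries sharing a
 * prefix are contiguous in a byte-sorted list, so one pass is enough.
 */
static struct region find_region(const char **sorted, size_t nr,
				 const char *prefix)
{
	struct region r;

	r.start = 0;
	while (r.start < nr && strcmp(sorted[r.start], prefix) < 0)
		r.start++;
	r.end = r.start;
	while (r.end < nr && has_prefix(sorted[r.end], prefix))
		r.end++;
	return r;
}

/* An empty region contributes nothing and can be dropped from the list. */
static int region_is_empty(struct region r)
{
	return r.start == r.end;
}
```

With a list containing the leaf "refs/heads/bar", the normalized pattern "refs/heads/bar/" yields an empty region in this model, so the leaf is still handed to the caller, which is then responsible for filtering it.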